The HD-ASCII File Format
This component implements the HD-ASCII file format, originally
designed and implemented by Jan Simon in Matlab for use in the Gaitlab
Heidelberg.
The file format is a container format for any data organized in
matrices. The data types are modelled on the types available in the
Matlab environment. Each matrix of data is named, but there is no
defined order.
Disadvantages
The HD-ASCII file format is extensively used in the Gaitlab Heidelberg,
but it is not recommended for Nimue-based applications because of
several limitations:
- The precision of the numbers cannot be defined adequately.
This can result in elusive bugs.
- To ascertain which data is available, the complete file must
be read, because there is no order of the data and no header with the
labels.
- If time series are saved together with some additional parameters,
there is no way to detect the length of the time series, because it is
not defined which dimension of the multidimensional matrices
corresponds to time.
- It is not possible to save data from more than one trial,
e.g. data of a complete session.
- It mixes data types with saved data. Each matrix has a
header, and if this is closed with a "0" no data for this matrix is
saved.
All of these limitations are overcome by Nimue's default .d3d file
format, which is open and flexible.
File extensions
To distinguish different sets of data in the Heidelberg Gaitlab,
different file extensions are defined.
| File Extension | Frames       | Description                       | Fixed set of variables |
| .glk           | single trial | Lower Body Kinetics (PiG)         | X                      |
| .glkn          | single trial | time normalized .glk              | X                      |
| .glm           | single trial | Lower/Upper Body Kinematics (PiG) | X                      |
| .glmn          | single trial | time normalized .glm              | X                      |
| .gle           | single trial | EMG data                          | X                      |
| .glen          | single trial | time normalized .gle              | X                      |
| .gla           | mean *       | Lower/Upper Body Kinematics (PiG) | X                      |
| .pkl           |              | Meta Data                         | X                      |
| .glx           | single trial | experimental                      |                        |
| .glxn          | single trial | time normalized .glx              |                        |
| .gxa           | mean *       | experimental                      |                        |
| .gaf           | mean *       | Foot Kinematics                   | X                      |
| .glf           | single trial | Foot Kinematics                   | X                      |
| .glfn          | single trial | time normalized .glf              | X                      |
* session mean; normalized to 101 frames
If a single trial is split into strides and time normalized, a separate
file is created for each stride. These files are numbered as the
following example shows:
xxx.glx --> xxx_00.glxn, xxx_01.glxn, xxx_02.glxn
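The numbering scheme can be sketched in a few lines of Python. This is only an illustration: the function name `stride_file_names` is hypothetical, and the zero-based, two-digit index and the appended 'n' are inferred from the single example above, not from a formal rule.

```python
from pathlib import PurePath


def stride_file_names(trial_file, stride_count):
    """Return the names of the per-stride files for a single-trial file."""
    path = PurePath(trial_file)
    # Assumption: zero-based, two-digit index and an 'n' appended to the
    # extension, as suggested by xxx.glx --> xxx_00.glxn.
    return [f"{path.stem}_{i:02d}{path.suffix}n" for i in range(stride_count)]


print(stride_file_names("xxx.glx", 3))
```

Only the file name is returned here; any directory part of the input is dropped.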
General
Most of the following documentation is a formatted copy
of Jan's original documentation.
The ASC-HD files were invented as simple, human-readable data
files that are not restricted to particular computer types or software.
Therefore the data are saved in ASCII mode with the 7-bit
character set. A file can include an arbitrary number of variables,
which can have the types double, string (which is a character array) or
string list. Each variable can have an arbitrary number of dimensions of
any size, limited only by the physical properties of the computer
system. In addition the first line contains a header with a section for
a user-defined control string. The number of significant figures for
double values is also defined in the header.
Details
The order of variables in the file is arbitrary.
Although there is no reason to prefer DOS, the ASC-HD style files
should use DOS line breaks, which have the ASCII codes [13, 10]. The
files are closed with a trailing line break, as is usual on Unix.
The names of the variables are built from the
characters 'a' to 'z', 'A' to 'Z', '0' to '9' and '_', with a
leading letter. Dots '.' are allowed to separate sub-variables, so they
must not be leading, trailing or adjacent to each other.
Examples of valid names: A, A1, A2_, A_B, A.C, A.D.E, b, bcd
Examples of invalid names: 1A, _B, .C, D., e..f, g.h., j$
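The naming rules can be expressed as a regular expression. The following Python sketch is not taken from PutASC or GetASC; the function name `is_valid_name` is hypothetical, and it assumes that every dot-separated part (not only the first) must start with a letter, which the text above does not state explicitly.

```python
import re

# One part: a letter followed by letters, digits or underscores.
# Assumption: each sub-variable part must also start with a letter.
_PART = r"[A-Za-z][A-Za-z0-9_]*"
_NAME = re.compile(rf"{_PART}(?:\.{_PART})*")


def is_valid_name(name):
    """Return True if name follows the ASC-HD variable naming rules."""
    return _NAME.fullmatch(name) is not None


for candidate in ["A.D.E", "A2_", "1A", "e..f", "g.h.", "j$"]:
    print(candidate, is_valid_name(candidate))
```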
In the files the section for each variable starts with a tag line
containing the name surrounded by square brackets and followed by the
dimensions of the data. The separators between the dimensions determine
the type of the data: doubles use the colon ':', strings the dollar '$'
and string lists the ampersand '&'. Comments start with a '#'
and are allowed in the tag lines only.
The values start in the following line with different strategies for
the data types:
Character arrays (strings)
The strings are the rows of a character array. In contrast to string
lists and double arrays, only [1 x N] or [M x N] character arrays are
allowed. The single strings are written as
lines and separated by line breaks. Therefore a character array must not
contain line breaks (ASCII 10 and 13). The number of characters is
equal for all strings of a character array, so strings are padded with
trailing spaces (Matlab style). The length of the strings need not be
specified, because it is implicitly determined by the number of
characters in the lines of the file. Therefore only the index N matters
for [N x M] character arrays. In consequence the tag lines
"[StringName]$4", "[StringName]$1$4" and "[StringName]$4$1" behave
exactly the same.
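As an illustration of the padding rule, the following Python sketch builds the section for a character array from a list of rows. The function name `char_array_section` is hypothetical, and the tag line uses the short form with only the number of rows.

```python
def char_array_section(name, rows):
    """Return the tag line and value lines for a character array."""
    for row in rows:
        if "\n" in row or "\r" in row:
            raise ValueError("character arrays must not contain line breaks")
    # All rows get the same length by padding with trailing spaces.
    width = max((len(row) for row in rows), default=0)
    return [f"[{name}]${len(rows)}"] + [row.ljust(width) for row in rows]


for line in char_array_section("E", ["Du", "hier"]):
    print(repr(line))
```

Here the rows are padded to the length of the longest row; a writer that knows the declared array width would pad to that width instead.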
String lists (Matlab's cell string)
The strings of a string list can vary in length and no trailing
spaces are needed. The single strings of the list are written as lines
separated by line breaks. Therefore strings must not contain line
breaks (ASCII code 10 or 13). For string lists with more than 2
dimensions the first index is varied first, then the second and so
on, the so-called columnwise order (see examples).
Double arrays
For doubles with 2 dimensions the rows (varying 2nd index) are stored
in lines separated by spaces. For more than 2 dimensions the 2nd index
remains the number of elements per line, while the trailing dimensions
are kept together (see examples and algorithm). This means, that row
vectors (dimension [1 x N]) are written in a single line, while column
vectors (dimension [M x 1]) fill M lines. It is not allowed to write
column vectors in a line, because the number of lines must match the
product of all but the 2nd dimension under all circumstances. Then the
reading function can skip a value section by jumping over a
well-defined number of lines to find the next tag line.
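The line count that lets a reader skip a double section can be sketched as follows. The function name `double_value_lines` is hypothetical, and treating every array with a zero dimension as having no value lines is an assumption based on the empty-array rules of this format.

```python
from math import prod


def double_value_lines(dims):
    """Number of value lines of a double array with full dimensions."""
    if len(dims) < 2:
        raise ValueError("at least 2 dimensions are required")
    # Assumption: empty arrays (any dimension 0) have no value lines.
    if 0 in dims:
        return 0
    # Product of all dimensions except the 2nd one.
    return prod(list(dims[:1]) + list(dims[2:]))


print(double_value_lines([2, 3, 4]))
```

If the tag line of such a variable is at line index t, the next tag line can be expected at or after index t + 1 + double_value_lines(dims).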
For doubles and string lists at least 2 dimensions are assumed (as
usual in Matlab). Scalars, column vectors and row vectors have the
sizes [1 x 1], [N x 1] and [1 x M], respectively. In the tag lines the
sizes of [1 x M] row vectors can be abbreviated by omitting the first
index. Scalar variables need no dimensions at all, so only the
separator that determines the type of data has to be included in the
tag line. For scalar doubles even this separator is optional. See the
examples for illustration.
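The tag line rules, including the abbreviations, can be sketched as a small Python parser. This is an illustration only and not the GetASC implementation; the name `parse_tag_line` and the returned tuple layout are hypothetical. For strings the dimensions are returned as written, because their interpretation differs from the other types.

```python
KINDS = {":": "double", "$": "string", "&": "string list"}


def parse_tag_line(line):
    """Split a tag line into (name, kind, dims)."""
    text = line.split("#", 1)[0].strip()        # drop a trailing comment
    if not text.startswith("[") or "]" not in text:
        raise ValueError(f"not a tag line: {line!r}")
    name, rest = text[1:].split("]", 1)
    rest = rest.strip()
    sep = rest[0] if rest else ":"              # separator optional for scalar doubles
    if sep not in KINDS:
        raise ValueError(f"unknown separator in: {line!r}")
    fields = rest[1:].split(sep) if rest[1:] else []
    if not all(field.isdecimal() for field in fields):
        raise ValueError(f"bad dimensions in: {line!r}")
    dims = [int(field) for field in fields]
    if KINDS[sep] != "string":
        if not dims:
            dims = [1, 1]                       # scalar without dimensions
        elif dims == [0]:
            dims = [0, 0]                       # empty [0 x 0] array
        elif len(dims) == 1:
            dims = [1, dims[0]]                 # abbreviated row vector
    return name, KINDS[sep], dims


print(parse_tag_line("[TagName]:2:3   # This is the comment"))
```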
Additional empty lines are allowed before a tag line, but not inside
the value sections.
The IEEE equivalents of NaN (not a number) and +/-Inf (infinity) double
values are written as 'NaN' and 'Inf' or '-Inf', respectively.
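A possible way to write and read single double values, including the special values, is sketched below in Python. The function names are hypothetical, and the use of a '%g'-like format with a configurable number of significant digits is an assumption; the exact number format used by PutASC is not specified here.

```python
import math


def format_double(value, digits=6):
    """Format one double with the given number of significant digits."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "Inf" if value > 0 else "-Inf"
    return format(value, f".{digits}g")


def parse_double(token):
    """Parse one value; Python's float() accepts 'NaN', 'Inf' and '-Inf'."""
    return float(token)


print(format_double(0.1234567891), format_double(float("-inf")))
```

The first printed value shows the loss of precision that occurs when a number has more significant digits than the file stores.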
Examples: doubles
For a scalar double 'A' with value 2 the tag line and data line is:
[A]:1:1
2
The tag line can be abbreviated (all types, separator depends on type):
[A]:1
or (all types, separator depends on type)
[A]:
or just (separator can be omitted for doubles only)
[A]
For row vectors of size [1 x N] the first dimension need not be
mentioned. The tag and data lines for 'B' with value [3,4] are:
[B]:2
3 4
The long form of the tag line without abbreviations is valid, too:
[B]:1:2
A double matrix of size [2 x 3] with values [1,2,3;4,5,6] with name 'C':
[C]:2:3
1 2 3
4 5 6
Multi-dimensional doubles
The approach to use the second dimension as lines and keeping the
trailing dimensions together enforces a complex ordering of values. In
consequence the array is split at the first dimension in parts.
A generic Matlab algorithm for an array [A] with dimensions [Dims] is:
Ax = transpose(reshape(A, Dims(1), prod(Dims(2:end))));
fprintf(FID, Format, Ax);
Here [Format] is a format string with [Dims(2)] keys like
'%g '. Remember that Matlab's fprintf function inserts the values
columnwise and the dimensions of [Ax] do not correlate directly with
[Dims].
For reading an additional reshaping is needed:
Ax = fscanf(FID, '%g', prod(Dims))
Ay = transpose(reshape(Ax, prod(Dims(2:end)), Dims(1)));
A = reshape(Ay, Dims);
The algorithms implemented in PutASC and GetASC differ from these
demonstrations.
For multi-dimensional string lists the procedure is much easier: There
is no reason to keep anything together, so they are saved columnwise.
This equals the representation in memory in Matlab.
Example: D is a [2 x 3 x 4] array containing the numbers from 1 to 24
in columnwise order:
D(1,1,1) = 1; D(2,1,1) = 2; D(1,2,1) = 3 and so on.
In the ASC-HD file this is written as:
[D]:2:3:4
1 3 5
7 9 11
13 15 17
19 21 23
2 4 6
8 10 12
14 16 18
20 22 24
The number of columns is the 2nd dimension, 3; the number of lines is
the product of all other dimensions, 2*4 = 8.
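The same ordering can be written with explicit index arithmetic. The following Python sketch works on a flat list in columnwise (Matlab memory) order and is not the PutASC or GetASC algorithm; the function names and the '%g'-like number format are assumptions made for illustration.

```python
from math import prod


def doubles_to_lines(flat, dims):
    """Lay out columnwise-ordered values as ASC-HD value lines."""
    d1, d2, rest = dims[0], dims[1], prod(dims[2:])
    lines = []
    for i in range(d1):              # the array is split at the 1st dimension
        for k in range(rest):        # trailing dimensions are kept together
            row = [flat[i + d1 * (j + d2 * k)] for j in range(d2)]
            lines.append(" ".join(format(v, "g") for v in row))
    return lines


def lines_to_doubles(lines, dims):
    """Inverse of doubles_to_lines: return the values in columnwise order."""
    d1, d2, rest = dims[0], dims[1], prod(dims[2:])
    if len(lines) != d1 * rest:
        raise ValueError("unexpected number of value lines")
    flat = [0.0] * (d1 * d2 * rest)
    for n, line in enumerate(lines):
        i, k = divmod(n, rest)       # line n holds the values (i, :, k)
        values = [float(token) for token in line.split()]
        if len(values) != d2:
            raise ValueError(f"value line {n + 1}: expected {d2} values")
        for j, v in enumerate(values):
            flat[i + d1 * (j + d2 * k)] = v
    return flat


D = list(range(1, 25))               # [2 x 3 x 4], values 1..24 columnwise
for line in doubles_to_lines(D, [2, 3, 4]):
    print(line)
```

For the array D this prints the eight value lines shown in the example, and lines_to_doubles restores the original columnwise order.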
Examples: strings
A single string is a row vector of characters like 'abc'. It has the
dimension [1 x 3], but the number of columns is determined by the
number of characters per line. The variable is called 'D':
[D]$
abc
This is equivalent to:
[D]$1
abc
A character array of size [2 x 6] named 'E' with contents
['Du    '; 'hier  '], spaces shown as dots:
[E]$2
Du....
hier..
Pay attention to the 2nd dimension: the length of each line determines
the length of the rows! Therefore $N, $1$N and $N$1 are treated
identically, in contrast to the other types.
Examples: string lists
The string list {'Du'; 'hier'} has size [2 x 1], and
the lengths of the single strings can vary (see Matlab CELLSTR):
[E]&2&1
Du
hier
The string list with size [1 x 2] can have abbreviated dimensions again:
[F]&2
Du
hier
is equivalent to:
[F]&1&2
Du
hier
Spaces and empty strings are allowed in string lists. To illustrate
this, each line of the example is surrounded by > and <.
G = {'', ' ', 'Hello '} in ASC-HD style:
>[G]&3<
><
> <
>Hello <
The strings of multi-dimensional string lists are written in columnwise
order: For list H of size [2 x 3 x 4] the string H(1,1,1) is written
first, then H(2,1,1) follows, the next is H(1,2,1) and so on. This
is much more straightforward than the procedure for multi-dimensional
doubles, because there is no need to join several elements in single lines.
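The columnwise order of the elements can be enumerated with a short Python generator. The name `columnwise_indices` is hypothetical; the indices are 1-based to match the Matlab notation used above.

```python
from itertools import product


def columnwise_indices(dims):
    """Yield 1-based index tuples with the first index varying fastest."""
    # product() varies its last argument fastest, so the dimensions are
    # reversed going in and each index tuple is reversed coming out.
    ranges = [range(1, n + 1) for n in reversed(dims)]
    for index in product(*ranges):
        yield index[::-1]


for index in list(columnwise_indices([2, 3, 4]))[:4]:
    print(index)
```

The n-th value line of a string list section then holds the string at the n-th index produced by this generator.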
Examples: Empty arrays
Empty arrays of size [0 x 0] get a single 0 dimension: if 'A' is an
empty double, 'B' an empty string and 'C' an empty string list, then
the tag lines look as follows:
[A]:0
[B]$0
[C]&0
No lines with values appear in these cases. Empty arrays with any
dimension differing from 0 are stored with the complete dimensions:
'A' is a double of size [0 x 2 x 3], then:
[A]:0:2:3
and no values appear.
Comments and empty lines
The tag line can be commented with the '#' character. Spaces after the
dimensions section are ignored.
[TagName]:2:3 # This is the comment
Empty lines can be inserted before a tag line:
[Tag1]$
String1

[Tag2]&
StringListElement1
Header line
Old style: In ASC-HD v2.0 the header line looks as follows:
#!ASCII v2.0 GaitLabs Heidelberg Standard
or with header 'Specific Header' defined in inputs of PutASC:
#!ASCII v2.0: Specific header
The next version of the ASC-HD syntax is 4.0 (there is no 3.0!) valid
since fASC4.00:
The standard header line starts with a string equivalent to:
#!ASCII v4.0 ASC-HD [Digits 6]
This means that the file is an ASCII file of the version 4.0 in the
ASC-HD style. Double numbers are saved with 6 significant digits. No
individual header was defined.
For individual headers any string can be appended to this standard
header using a colon ':' as separator:
#!ASCII v4.0 ASC-HD [Digits 6]:Individual part (23-Apr-2006)...
Then GetASC returns the output Header:
'Individual part (23-Apr-2006)...'
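A header line of either version can be taken apart as sketched below in Python. This is not the GetASC implementation: the name `parse_header` and the returned tuple (version, digits, individual header) are hypothetical, and stripping surrounding whitespace from the individual part is an assumption.

```python
import re


def parse_header(line):
    """Return (version, digits, individual_header) of a header line."""
    match = re.match(r"#!ASCII v(\d+\.\d+)(.*)$", line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not an ASC-HD header line: {line!r}")
    version, rest = match.groups()
    # The first colon separates the standard part from the individual part.
    standard, colon, individual = rest.partition(":")
    digits = re.search(r"\[Digits (\d+)\]", standard)
    return (
        version,
        int(digits.group(1)) if digits else None,   # absent in v2.0 headers
        individual.strip() if colon else None,
    )


print(parse_header("#!ASCII v4.0 ASC-HD [Digits 6]:Individual part (23-Apr-2006)..."))
```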
If new variables are appended to an existing file, the number of digits
must not be decreased to avoid a loss of data. Be aware that management
of data precision is a critical drawback when using ASCII files.
Comments
Disk space is cheap today. Therefore it is recommended to use
'complete' data files: alongside the data, store enough information to
allow the contents to be re-created. Especially for ASC-HD files, which
are designed with respect to long-term accessibility and for exchange
with other labs, completeness and reproducibility are the basis of
scientific data acquisition. So some meta-variables are a good idea:
date and time of creation, host name, institution, properties of
measurement and computation, examined subject or object, or at least a
pointer to a location where this information can be found.
Guidelines for implementations of reading and writing routines
Line breaks deviating from the preferred DOS style should be accepted.
Before appending new variables to a file, check the trailing line
break. Extracting single variables from an ASC-HD file is a popular
feature, but avoid naively searching for '[' to locate tag lines: it
can be data of a string or string list! Users will insert errors in
ASCII files whenever possible. A fair reading function counts the found
elements and reports the name of the variable and the line number of
problems. It is not yet defined how to treat characters with more than 7
bits. Think of an additional Unicode type, the TeX-like ["a] for the
German Umlaut-a, the ISO-Latin-1 [ä]