Decode zs2 files generated by Zwick Universal Testing Machines

zs2decode reads the Zwick Roell zs2 and zp2 file formats. It can generate XML files for convenient extraction of measurements and metadata. The proprietary zs2 file format is generated by testXpert II, the control software of Zwick Roell materials testing machines.

zs2 files contain data in a compressed, binary-encoded, XML-like structure. The binary data stream consists of chunks, similar to the PNG file format.

zs2decode is a Python (2.7, 3.3, 3.4, 3.5, 3.6) implementation of a zs2 file decoder. It can be obtained from the GitHub repository. In order to convert a zs2 file into XML, the following script can be used:

import zs2decode.parser
import zs2decode.util

zs2_file_name = 'my_data_file.zs2'
xml_output_file = 'my_data_file.xml'

# load and decompress file
data_stream = zs2decode.parser.load(zs2_file_name)
# separate binary data stream into chunks
raw_chunks = zs2decode.parser.data_stream_to_chunks(data_stream)
# convert binary chunk data into lists of Python objects
chunks = zs2decode.parser.parse_chunks(raw_chunks)
# output as XML file
with open(xml_output_file, 'wb') as f:
    f.write(zs2decode.util.chunks_to_XML(chunks))

The output of zs2decode can be used to extract metadata, acquisition parameters and measurements. An example script is provided to extract raw measurements from the XML output.

The zs2 file format specification is not distributed publicly. All information in this document comes from reverse-engineering of files written by testXpert II version 3.1 and from user feedback on the GitHub issue tracker. This project is not associated with Zwick Roell.

The zs2 file format is unrelated to the zs file format for compressed sets by Nathaniel J. Smith.


The zs2 file format

File structure

Compression

zs2 files are gzip-compressed binary files. The first step in decoding a file is to unpack it, either with a utility program such as gunzip or 7-zip, or by opening it with the Python module gzip. This results in a binary data stream.

gzip compression
  data stream  
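The decompression step can be sketched with the standard library alone. This is a minimal illustration, assuming Python 3 (where gzip.decompress is available); the function name decompress_zs2 and the payload bytes are made up for the example and are not part of zs2decode or of a real zs2 file.

```python
import gzip

def decompress_zs2(raw_bytes):
    """Return the decompressed data stream for the raw bytes of a zs2 file."""
    return gzip.decompress(raw_bytes)

# Synthetic round trip: these bytes are placeholders, not a real zs2 header.
payload = b"\x01\x02\x03\x04" + b"\xff"
compressed = gzip.compress(payload)
data_stream = decompress_zs2(compressed)
print(len(data_stream))
```

For a file on disk, the bytes would be read with open(name, 'rb') first, or the file could be opened directly with gzip.open.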

Data stream structure

The data stream starts with a header signature and is followed by chunks. A typical zs2 file contains over 100,000 chunks.

Header (4 bytes)
Chunk 1
Chunk 2
...
Chunk n

Byte order

The byte order of the binary file is little-endian. For example, a 32-bit representation of the integer value 1 in the data stream would be 0x01 0x00 0x00 0x00.
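As a quick check of the byte order, the four bytes from the example can be unpacked with Python's struct module, where the "<" prefix selects little-endian. This is only an illustration of the convention, not code taken from zs2decode.

```python
import struct

# 32-bit little-endian representation of the integer 1
raw = bytes([0x01, 0x00, 0x00, 0x00])
value = struct.unpack("<I", raw)[0]
print(value)

# Packing goes the other way and yields the same byte sequence.
packed = struct.pack("<I", 1)
```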

Chunks

Chunks contain information on data stream structure, metadata or data.

There are two different chunk formats. One format is used by the “End-of-Section” chunk, the other format is used by all other chunks. The latter has the following structure: an ASCII-encoded name of the chunk type, prefixed with one byte giving the length of the name in bytes (Section ASCII strings), followed by the data type code of the particular chunk (Section Data type codes) and the actual chunk data. In contrast, the End-of-Section chunk is simply a single byte of value 0xFF. The two chunk structures can be told apart because the length byte of a chunk type name is never 0xFF.

The following chunk structures are possible:

Chunk type name                   Data type code   Chunk data
ASCII-encoded, at least 2 bytes   1 byte           1 or more bytes

or, for the “End-of-Section” chunk,

End of Section      
0xFF      

The total length of the chunk can be anywhere from 1 byte upward. In particular, the total chunk length is generally not a multiple of 2 or 4 bytes.

Note

Less than 5% of the existing chunk types have no data type.

An example of a chunk with chunk type name ID would be:

Field   Chunk type name          Data type   Chunk data
        Length   Name
Value   2        I      D        0x66        48154
Byte    1        2      3        4           5      6
Hex     0x02     0x49   0x44     0x66        0x1A   0xBC
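The ID example can be decoded with a few lines of Python. The sketch below is an assumption-laden illustration, not the zs2decode implementation: read_chunk_header is a hypothetical helper, and the 2-byte value is read as unsigned because the example above shows 48154.

```python
import struct

END_OF_SECTION = 0xFF

def read_chunk_header(stream, pos):
    """Return (name, type_code, next_pos); name is None for End-of-Section."""
    length = stream[pos]
    if length == END_OF_SECTION:
        return None, None, pos + 1
    name = stream[pos + 1:pos + 1 + length].decode("ascii")
    type_code = stream[pos + 1 + length]
    return name, type_code, pos + 2 + length

# The six bytes of the ID chunk from the example
chunk = bytes([0x02, 0x49, 0x44, 0x66, 0x1A, 0xBC])
name, type_code, data_pos = read_chunk_header(chunk, 0)
# Data type 0x66 holds a 2-byte integer; unsigned is assumed here.
value = struct.unpack_from("<H", chunk, data_pos)[0]
print(name, hex(type_code), value)
```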

Chunk type naming

Chunk type names are Pascal ShortString-style ASCII strings as defined in Section ASCII strings. Chunk type names are chosen to be readable ASCII text, consisting mostly of the underscore (_), digits 0 to 9, and English letters A to Z and a to z. Very few chunk type names include other characters. The chunk type name length is limited to 254 characters since an indicated length of 255 (0xFF) represents an “End-of-Section” chunk. Also, chunk type names of length 0 (0x00) do not exist.

The three chunk types Key, Elem, and Val represent list items. Digits are used at the end of chunk type names to enumerate the list items. Within each list, numbers are consecutive in decimal format, starting with zero. For example, the list element Elem0 will be followed by Elem1. Elem9 will be followed by Elem10 etc. If a list has only one entry, the number will be zero (e.g. Key0).
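When walking through a chunk list it can be handy to split such a name into its base and its index. The following is a small sketch of that idea; split_list_item_name is a hypothetical helper and not part of zs2decode.

```python
import re

# Base name followed by a decimal index, e.g. Elem10
_LIST_ITEM = re.compile(r"^(Key|Elem|Val)([0-9]+)$")

def split_list_item_name(name):
    """Return (base, index) for list item names, or None for other names."""
    match = _LIST_ITEM.match(name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))

print(split_list_item_name("Elem10"))
```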

Note

By convention, most chunk type names (approximately 95%) start with a capital letter A to Z and use CamelCase spelling for compound words. Names are derived from either English or German. The shortest chunk type names are x, y, X, and Y. The longest chunk type name is AssignmentBetweenOrganizationDataAndTestProgramParamIds at 55 characters.

Chunk type names with special characters are rare. Those names may start with nt&)m_ prepended to a common CamelCase name, e.g. nt&)m_CompressionType.

Order of chunks

The order of some chunks is significant as they can establish a partitioning into sections (chunks of data type 0xDD start a section that corresponding “End-of-Section” chunks end), chunk lists (starting with the Count chunk), or key-value assignment (Key chunks immediately preceding an Elem chunk). Beyond that, chunk order seems to be free but follows predictable, machine-generated patterns.

Note

The actual degree of flexibility in chunk ordering is defined by the implementation of the testXpert II parser, which is not known.

End-of-Section chunks

“End-of-Section” chunks contain only one byte, 0xFF. They can be discriminated from regular chunks in that chunk type names of length 255 (0xFF) do not exist. End-of-Section chunks terminate the most recent section started by a 0xDD chunk.

End of data stream

The end of the data stream is marked by the “End-of-Section” chunk that terminates the root section of the data stream (the first chunk in the data stream is of type 0xDD).
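The nesting rule can be illustrated by counting section depth. The sketch below assumes the chunks have already been reduced to a sequence of data type codes, with End-of-Section chunks represented by 0xFF; both the representation and the name find_stream_end are hypothetical.

```python
SECTION_START = 0xDD
END_OF_SECTION = 0xFF

def find_stream_end(type_codes):
    """Return the index of the End-of-Section chunk closing the root section."""
    depth = 0
    for index, code in enumerate(type_codes):
        if code == SECTION_START:
            depth += 1
        elif code == END_OF_SECTION:
            depth -= 1
            if depth == 0:
                return index
    return None  # root section was never closed

# Root section containing one value chunk and one nested section
codes = [0xDD, 0x66, 0xDD, 0xFF, 0x11, 0xFF]
end_index = find_stream_end(codes)
print(end_index)
```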

Data type codes

The 1-byte data type code determines the type and, in most cases, the length of the chunk data section in bytes. A chunk type may appear with different data type codes throughout the data stream. The following type codes exist:

Data type code   Length of chunk data   Type of data
0x11             4                      Integer [1]
0x22             4                      Unsigned integer: value
0x33             4                      Signed integer: coordinates
0x44             4                      Unsigned integer: flag, color code
0x55             2                      Integer [1]
0x66             2                      Integer [1]
0x88             1                      Unsigned byte: type code
0x99             1                      Boolean: 0=False, 1=True
0xAA             at least 4             Unicode string [2]
0xBB             4                      Single precision floating point number
0xCC             8                      Double precision floating point number
0xDD             at least 1             Document section start [3]
0xEE             at least 6             List of data [2]

Data types 0x00, 0x77, and 0xFF do not appear.

[1] The interpretation of integers of data type codes 0x11, 0x55 and 0x66 depends on context. They may be either signed or unsigned, depending on the chunk type rather than the data type code. Data type code 0x11 is used for a range of purposes, including color codes (which would typically be interpreted as unsigned hexadecimal values) and flags of value 0xffffffff (which would typically be written as signed -1 rather than unsigned 4294967295).
[2] The length of the chunk data field for data types 0xAA and 0xEE is encoded as part of the chunk data. See also Section Lists of data.
[3] Data type 0xDD indicates that a chunk marks the beginning of a structural or logical section. The length of the chunk data field is encoded as part of the chunk data. Chunk data contain an ASCII-encoded section descriptor that may be empty (see Section ASCII strings).
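The fixed-length rows of the table map naturally onto struct format strings. The following sketch is one possible way to express that mapping, not the parser used by zs2decode; FIXED_TYPES and decode_fixed are hypothetical names, and the signed flag stands in for the context-dependent choice described in footnote [1].

```python
import struct

# type code -> (byte length, struct format); None where signedness depends on context
FIXED_TYPES = {
    0x11: (4, None), 0x22: (4, "<I"), 0x33: (4, "<i"), 0x44: (4, "<I"),
    0x55: (2, None), 0x66: (2, None), 0x88: (1, "<B"), 0x99: (1, "<?"),
    0xBB: (4, "<f"), 0xCC: (8, "<d"),
}

def decode_fixed(type_code, data, signed=False):
    """Decode the chunk data of a fixed-length data type."""
    length, fmt = FIXED_TYPES[type_code]
    if fmt is None:
        letter = "i" if length == 4 else "h"
        fmt = "<" + (letter if signed else letter.upper())
    return struct.unpack(fmt, data[:length])[0]

# The same four bytes read as a signed flag or as an unsigned value
flag = decode_fixed(0x11, b"\xff\xff\xff\xff", signed=True)
color = decode_fixed(0x11, b"\xff\xff\xff\xff", signed=False)
print(flag, color)
```

Variable-length types (0xAA, 0xDD, 0xEE) are deliberately left out of the mapping because their length is encoded in the chunk data itself.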

Chunk data

Data values

The chunk data section of all data types except 0xAA, 0xDD, and 0xEE contains one numerical or boolean value.

In multi-byte data sections, data are arranged LSB to MSB and interpreted according to the table on data type codes.

Data structures

All variable-length structures are stored following a common pattern. There are three types of variable-length data structures,

  • ASCII strings,
  • lists, and
  • unicode strings.

Each of them is preceded by the length of the structure in multiples of the units it contains. For example, unicode strings are preceded by the number of logical characters rather than bytes, and lists are preceded by the number of entries in the list. (List entries are either numbers, strings, or n-tuples.) As a result, empty lists and empty strings are represented by a length indicator of 0.

ASCII strings

ASCII-encoded strings are not intended to be shown to the user but help structure the document. They appear in two places: the chunk type name, and the section descriptor in chunks of data type 0xDD.

ASCII string
          Length   Characters
Byte      0        1       ...   n
Content   n        first   ...   last

Chunk type names are at least one character in length while empty ASCII strings may appear as section descriptors.

Empty ASCII string
          Length   Characters
Byte      0
Content   0x00

Lists of data

Chunk data of variable length are always encoded in a particular list format. Lists start with an indication of the number of items in the list. This list length is encoded as a 4-byte integer and may be 0 if no list items follow. Bit 31 of the list length is 0 as this bit is used as a marker for strings. Hence, lists can have up to 2,147,483,647 entries. The list length parameter is followed by exactly the number of list items specified. All list items have the same data type. List items may be n-tuples with constituents comprising different data types.

Example of an empty list:

Number of items in the list
0
1 2 3 4
0x00 0x00 0x00 0x00

Example of a list containing 2 single-precision floating point numbers, 10.1 and 1.0:

Number of items in the list Single-precision float Single-precision float
2 10.1 1.0
1 2 3 4 5 6 7 8 9 10 11 12
0x02 0x00 0x00 0x00 0x9A 0x99 0x21 0x41 0x00 0x00 0x80 0x3F

Example of a list of 2 tuples that combine a 4-byte integer with a single-precision floating point number, (1, 10.1) and (2, 1.0):

Number of items Tuple 1 Tuple 2
2 1 10.1 2 1.0
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
0x02 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x9A 0x99 0x21 0x41 0x02 0x00 0x00 0x00 0x00 0x00 0x80 0x3F
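The three list examples can be reproduced with a short reader built on struct. This is a sketch under the assumption that the item layout is known in advance and given as a struct format string; read_list is a hypothetical helper, not a zs2decode function.

```python
import struct

def read_list(data, pos, item_format):
    """Read a list at pos; item_format is a struct format such as 'f' or 'If'."""
    (count,) = struct.unpack_from("<I", data, pos)
    if count & 0x80000000:
        raise ValueError("bit 31 is set: this is a string, not a list")
    pos += 4
    item = struct.Struct("<" + item_format)
    items = []
    for _ in range(count):
        values = item.unpack_from(data, pos)
        # Single values are unwrapped, n-tuples are kept as tuples.
        items.append(values[0] if len(values) == 1 else values)
        pos += item.size
    return items, pos

floats = bytes([0x02, 0x00, 0x00, 0x00, 0x9A, 0x99, 0x21, 0x41,
                0x00, 0x00, 0x80, 0x3F])
tuples = bytes([0x02, 0x00, 0x00, 0x00,
                0x01, 0x00, 0x00, 0x00, 0x9A, 0x99, 0x21, 0x41,
                0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x3F])
float_items, float_end = read_list(floats, 0, "f")
tuple_items, tuple_end = read_list(tuples, 0, "If")
print(float_items, tuple_items)
```

Note that 10.1 is not exactly representable in single precision, so the decoded value differs from 10.1 in the seventh decimal place.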

Unicode strings

All characters and strings intended to be displayed to humans are encoded in unicode UCS-2/UTF-16 format. Each character unit is 2 bytes long. Strings are lists of 2-byte elements with bit 31 of the list length set to 1 (“bit-31 marker”).

For example, the Norwegian interjection Skål would be represented as

String length with bit-31 marker S k å l
1 2 3 4 5 6 7 8 9 10 11 12
0x04 0x00 0x00 0x80 0x53 0x00 0x6B 0x00 0xE5 0x00 0x6C 0x00
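Decoding such a string amounts to masking out the bit-31 marker and decoding the following bytes as little-endian UTF-16. The helper below is an illustrative sketch; read_unicode_string is a hypothetical name.

```python
import struct

def read_unicode_string(data, pos):
    """Return (text, next_pos) for a unicode string starting at pos."""
    (length,) = struct.unpack_from("<I", data, pos)
    if not length & 0x80000000:
        raise ValueError("bit-31 marker missing: not a string")
    n_chars = length & 0x7FFFFFFF   # number of 2-byte character units
    start = pos + 4
    end = start + 2 * n_chars
    return data[start:end].decode("utf-16-le"), end

raw = bytes([0x04, 0x00, 0x00, 0x80, 0x53, 0x00, 0x6B, 0x00,
             0xE5, 0x00, 0x6C, 0x00])
text, next_pos = read_unicode_string(raw, 0)
print(text)
```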

Data type 0xAA

Chunk data of chunks with data type 0xAA contain exactly one unicode string (see Section Lists of data). For example, data type code and chunk data of the string “Hi” would be:

Data type Chunk Data
String length with bit-31 marker H i
0 1 2 3 4 5 6 7 8
0xAA 0x02 0x00 0x00 0x80 0x48 0x00 0x69 0x00

Data type 0xDD

Chunks of type 0xDD start a structural section that is ended by a corresponding End-of-Section chunk. The chunk data contain exactly one ASCII-encoded string that serves as a section descriptor. For example, data type code and section descriptor “Hi” would be:

Data type   Chunk data
            Length   H      i
0           1        2      3
0xDD        0x02     0x48   0x69

Without section descriptor, data type code and chunk data would be:

Data type Chunk data
Length
0 1
0xDD 0x00

Data type 0xEE

Chunk data of type 0xEE contain one list. The chunk data start with a 2-byte long header that specifies the type of data in the array, followed by a list as defined in Section Lists of data.

There are at least five different list data types defined as part of data type 0xEE, which are 0x0000, 0x0004, 0x0005, 0x0011, and 0x0016.

Data type   Sub-type   Byte-length of elements   Type of list elements
0xEE        0x0000     n/a                       n/a: empty list
0xEE        0x0004     4                         single-precision floating point
0xEE        0x0005     8                         double-precision floating point
0xEE        0x0011     1                         bytes of structured data record
0xEE        0x0016     4                         integer or boolean

The byte-list of sub-type 0x0011 is a wrapper for a mixed-type data record whose interpretation depends on the chunk type (see Section Chunk type-specific data structures). This sub-type is used by the ZIMT script for measurement parameters and settings, and to store the event audit log.

Sub-types 0x0004 and 0x0005 are used to store measurement time series recorded by the testing machine.

Placeholder lists have sub-type 0x0000, followed by an empty list.

Sub-type 0x0016 seems to be used only to hold boolean values, with 0x00000000 and 0x00000001 representing False and True, respectively.

For example, data type code and chunk data of a list of sub-type 0x0016, representing a list with one integer element of value 0x12345678, would be:

Data type Chunk Data
Sub-type Number of list entries List element
0 1 2 3 4 5 6 7 8 9 10
0xEE 0x16 0x00 0x01 0x00 0x00 0x00 0x78 0x56 0x34 0x12
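A reader for 0xEE chunk data can combine the 2-byte sub-type with the list format. The sketch below covers only the five sub-types listed above and reads sub-type 0x0016 as unsigned integers, which is an assumption; read_ee_data and SUBTYPE_FORMATS are hypothetical names.

```python
import struct

# sub-type -> struct format character of one list element
SUBTYPE_FORMATS = {0x0004: "f", 0x0005: "d", 0x0011: "B", 0x0016: "I"}

def read_ee_data(data, pos=0):
    """Return (sub_type, items, next_pos) for chunk data of data type 0xEE."""
    (sub_type,) = struct.unpack_from("<H", data, pos)
    (count,) = struct.unpack_from("<I", data, pos + 2)
    pos += 6
    if sub_type == 0x0000:
        return sub_type, [], pos      # placeholder: empty list
    fmt = "<%d%s" % (count, SUBTYPE_FORMATS[sub_type])
    items = list(struct.unpack_from(fmt, data, pos))
    return sub_type, items, pos + struct.calcsize(fmt)

# Chunk data from the example, without the leading data type byte 0xEE
chunk_data = bytes([0x16, 0x00, 0x01, 0x00, 0x00, 0x00,
                    0x78, 0x56, 0x34, 0x12])
sub_type, items, next_pos = read_ee_data(chunk_data)
print(hex(sub_type), [hex(i) for i in items])
```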

Chunk lists

Chunk lists are elements of the document structure. They consist of a chunk of type Count specifying the number of items in the chunk list, followed by a succession of exactly that number of list items. Chunk lists can be nested.

The three chunk types Key, Elem, and Val represent list items. They always end with an ordinal number in decimal representation (see Section Chunk type naming), i.e., 0 in the example in the table:

Chunk type name   Use
Key0              Singular list item with information stored in chunk data of Key0. This chunk may immediately precede an Elem chunk of the same enumeration (i.e., Elem0 in this case).
Elem0             Singular list item with information stored in chunk data of Elem0, or marker of the beginning of a list item with information stored in subsequent chunks (data type 0xDD).
Val0              Singular list item, information is stored in chunk data of Val0.

The Count chunk is preceded by a structural chunk of data type 0xDD that indicates the type of content or purpose of the list. That preceding chunk type does not need to be unique in the data stream.

Chunk type-specific data structures

Chunks with data of type 0xEE and sub-type 0x0011 contain data organized as a record.

Chunk type-specific data structures are used by the event audit system to store the event log, and by the ZIMT scripting language to store properties and parameters of variables. This record format appears to be indicated by one or more values in record data (one of them being the first byte of the record data). The record formats of the chunk types given here are guesses.

Field   Data type   Sub-type      Byte length of record data   Record data
Byte    0           1      2      3     4     5     6          7 ... n
Hex     0xEE        0x11   0x00   LSB               MSB

ZIMT parameter and property chunks

The record data of the chunks with data type 0xEE and sub-type 0x0011 is described in the tables below. For the sake of brevity, the following definition of record elements will be used:

byte: 1 byte
byte (boolean): 1 byte with values limited to 0x00 and 0x01
word: 2-byte integer
long: 4-byte integer
single: 4-byte single-precision float
double: 8-byte double-precision float
string: variable-length unicode string (see Unicode strings)
list: variable-length data list (see Lists of data)
tuple: ordered group of values

ZIMT property chunks

Chunk type name    Record data
                   Number   Type
QS_Par             1        byte 0x01
                   1        byte (boolean)
                   2        byte
                   1        byte (boolean)

Chunk type name    Record data
                   Number   Type
QS_ValPar          1        byte 0x01
                   1        double
                   1        string
                   1        word
                   9        byte 0x00

Record contains value, unit, and ID.

Chunk type name    Record data
                   Number   Type
QS_TextPar         1        byte 0x01
                   4        string

String 2 contains language information, strings 3 and 4 are empty.

Chunk type name    Record data
                   Number   Type
QS_SelPar          1        byte 0x02
                   1        long
                   1        list of longs
                   4        string

The first long is 0xFFFFFFFF if the list is not empty.
String 2 contains language information, strings 3 and 4 are empty.

Chunk type name    Record data
                   Number   Type
QS_ValArrPar       1        byte 0x02
                   1        string
                   1        word
                   1        byte, usually 0x00
                   1        list of longs

The word contains an ID value.

Chunk type name    Record data
                   Number   Type
QS_ValArrParElem   1        byte 0x02
                   1        list of tuples of type (long, double)

Tuples are pairs of index and value.

Chunk type name    Record data
                   Number   Type
QS_ArrPar          1        byte 0x02
                   1        list of longs
                   1        byte
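As an illustration of how such a record could be read, the sketch below walks through the QS_ValPar layout (byte 0x01, double, string, word, 9 bytes). It assumes that the double is the value, the string the unit, and the word the ID, following the order given in the note on that record; parse_qs_valpar is a hypothetical helper and the test record is synthetic.

```python
import struct

def parse_qs_valpar(record):
    """Return (value, unit, ident) from QS_ValPar record data."""
    if record[0] != 0x01:
        raise ValueError("unexpected record format byte")
    (value,) = struct.unpack_from("<d", record, 1)
    (marker,) = struct.unpack_from("<I", record, 9)
    n_chars = marker & 0x7FFFFFFF            # strip the bit-31 string marker
    unit_end = 13 + 2 * n_chars
    unit = record[13:unit_end].decode("utf-16-le")
    (ident,) = struct.unpack_from("<H", record, unit_end)
    return value, unit, ident                # trailing 9 bytes are ignored

# Synthetic record: value 2.5, unit "mm", ID 7, nine trailing zero bytes
record = (b"\x01" + struct.pack("<d", 2.5)
          + struct.pack("<I", 0x80000002) + "mm".encode("utf-16-le")
          + struct.pack("<H", 7) + bytes(9))
parsed = parse_qs_valpar(record)
print(parsed)
```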

ZIMT parameter chunks

Chunk type name    Record data
                   Number   Type
QS_ParProp         1        byte 0x07
                   9        byte (boolean)
                   1        word
                   9        string
                   3        word
                   5        string
                   1        long 0x00000000
                   2        word
                   1        byte
                   1        string
                   4        byte (boolean)

Bytes 6 and 9 seem to always be 0x00.
The 3 words are 0x0000, 0xFFFF, 0xFFFF.
The last 4 bytes are 0x00, 0x01, 0x00, 0x01.

or

Chunk type name    Record data
                   Number   Type
QS_ParProp         1        byte 0x07
                   9        byte (boolean)
                   1        word
                   9        string
                   3        word
                   5        string
                   1        long 0x00000002
                   2        word
                   1        byte
                   1        long
                   1        string
                   4        byte (boolean)

The last 4 bytes are 0x00, 0x01, 0x00, 0x01.

Chunk type name    Record data
                   Number   Type
QS_ValProp         1        byte 0x01
                   1        byte (boolean)
                   2        byte
                   1        byte (boolean)

Chunk type name    Record data
                   Number   Type
QS_TextProp        1        byte 0x01
                   4        byte
                   4        byte (boolean)

The last byte is 0x01.

Chunk type name    Record data
                   Number   Type
QS_SelProp         1        byte 0x04
                   3        byte (values)
                   1        list of 4 strings
                   1        list of 4 strings
                   1        list of strings
                   1        list of strings
                   1        list of words
                   1        list of longs
                   1        list of strings

Record data may end after the first three bytes.
If present, all lists are of the same length.

Chunk type name    Record data
                   Number   Type
QS_ValArrParProp   1        byte 0x02
                   4        byte
                   1        word
                   4        byte

Chunk type name    Record data
                   Number   Type
QS_SkalProp        1        byte 0x02
                   2        string
                   2        byte (boolean)

First string may contain a ZIMT script.
The booleans seem to indicate validity of the respective strings.

Chunk type name    Record data
                   Number   Type
QS_ValSetting      1        byte 0x02
                   2        string
                   1        long
                   1        string
                   3        byte
                   1        word
                   2        byte
                   1        list of words
                   1        list of strings
                   1        byte
                   10       byte

The leading strings are usually empty.
The long is small-valued.
The word is either 0x0000 or 0xFFFF.
If not empty, the list of words contains ID values.
If not empty, the last string contains a variable name.

Chunk type name    Record data
                   Number   Type
QS_NumFmt          1        byte 0x02
                   4        byte
                   1        double

The value of the double float is usually 0.1.

Chunk type name    Record data
                   Number   Type
QS_Plaus           1        byte 0x01
                   9        byte, usually 0x00
                   6        byte, usually 0xFF or 0x00
                   1        word, usually 0xFFFE or 0x0000
                   6        byte, usually 0xFF or 0x00
                   1        word, usually 0x7FFE or 0x00
                   6        byte, usually 0x00

Note that data in this chunk differ from QS_Tol only in length.

Chunk type name    Record data
                   Number   Type
QS_Tol             1        byte 0x01
                   9        byte, usually 0x00
                   6        byte, usually 0xFF or 0x00
                   1        word, usually 0xFFFE or 0x0000
                   6        byte, usually 0xFF or 0x00
                   1        word, usually 0x7FFE or 0x00
                   3        byte, usually 0x00

Note that data in this chunk differ from QS_Plaus only in length.

Event audit chunk

The event audit log is stored in a chunk type with name Entry. The description below represents the parsing algorithm used before version 0.3.0. In the current implementation, the chunk is parsed heuristically as bytes and strings.

[START OBSOLETE DESCRIPTION]

The first byte of the record (i.e., format code) corresponding to the description here is 0x02. A large number of Entry–Record-Format-Codes (ERFC) and associated records are defined. However, it appears to be possible to split the record data into its constituents without interpreting the format code explicitly. The procedure is described in the Section Parsing algorithm.

In addition to strings, the following prefixed data types are defined that are specific to Entry chunks:

Prefix   Data block length (bytes)   Data block interpretation   Total length of data type (bytes)
0x07     8                           1 double                    9
0x64     4                           1 long                      5
0x01     4                           4 bytes                     5
0x04     1                           1 byte                      2
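The prefixed data types in this table could be read with a small lookup keyed on the prefix byte. Like the description it accompanies, this is a sketch of the obsolete approach; read_prefixed is a hypothetical helper, and reading the long as unsigned is an assumption.

```python
import struct

# prefix byte -> (data block length, struct format of the data block)
PREFIXED_TYPES = {
    0x07: (8, "<d"),
    0x64: (4, "<I"),
    0x01: (4, "<4B"),
    0x04: (1, "<B"),
}

def read_prefixed(data, pos):
    """Return (value, next_pos) for a prefixed data type starting at pos."""
    length, fmt = PREFIXED_TYPES[data[pos]]
    values = struct.unpack_from(fmt, data, pos + 1)
    value = values[0] if len(values) == 1 else values
    return value, pos + 1 + length

print(read_prefixed(b"\x64\x2a\x00\x00\x00", 0))
```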

Data type and chunk data of an Entry chunk start as follows:

Field   Data type   Sub-type      Byte length of format code   Format   3-tuple       String
                                  and record data              (ERFC)
Byte    0           1      2      3     4     5     6          7        8   9   10    11 ...
Hex     0xEE        0x11   0x00   LSB               MSB        0x02

Parsing algorithm

The following algorithm appears to be able to parse record data into a list, regardless of record format code. The algorithm is completely heuristic and is able to extract a lot of meaningful information. However, it should be replaced with an algorithm evaluating the ERFC code.

  1. Go to start of record.
  2. Read and output ERFC byte.
  3. Interpret next 3 bytes as 3-tuple and output.
  4. While there are bytes left to parse:
     1. If a string follows: interpret and output the string, continue at 4.
     2. If the next byte belongs to a prefixed data type and another prefixed data type or string follows the current data block: interpret the prefixed data type and output, continue at 4.
     3. If another prefixed data type or string follows 4 bytes later: interpret 4 bytes as 2 words and output, continue at 4.
     4. If another prefixed data type or string follows 2 bytes later: output 2 bytes, continue at 4.
     5. Output next byte, continue at 4.

The test for follow-up prefixed data type or string needs to verify that either the end of the string is reached or

  1. that the following data starts with a prefix defined for prefixed data types or with a string length followed by 0x00 0x80, indicating strings, and
  2. that the following number of bytes is sufficient to hold the entire prefixed data type or string.

The purpose of the follow-up test is to prevent the detection of spurious unicode string markers LSB MSB 0x00 0x80 in the binary representation of double-precision floating point numbers.
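The string half of this follow-up test might look as follows. The sketch checks for the byte pattern LSB MSB 0x00 0x80 and verifies that enough bytes remain to hold the string; it does not cover prefixed data types, and string_follows is a hypothetical name rather than the function used in zs2decode.

```python
import struct

def string_follows(data, pos):
    """Heuristic test for a unicode string starting at pos."""
    # Bytes 3 and 4 of the length field must be 0x00 0x80 (bit-31 marker).
    if data[pos + 2:pos + 4] != b"\x00\x80":
        return False
    (n_chars,) = struct.unpack_from("<H", data, pos)
    # The remaining bytes must be able to hold all 2-byte characters.
    return pos + 4 + 2 * n_chars <= len(data)

complete = b"\x02\x00\x00\x80H\x00i\x00"
truncated = b"\x02\x00\x00\x80H\x00"
print(string_follows(complete, 0), string_follows(truncated, 0))
```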

Interpretation

Each Entry record begins with a common header, followed by a detailed, entry-specific record. The common header contains the following entries:

  1. Entry-record-format-code
  2. 3-tuple
  3. User name currently logged into the system
  4. Time in seconds, possibly since loading/saving a file.
  5. An ID (always the same)
  6. Empty string
  7. Another ID (always the same)
  8. The value 0
  9. A string giving a human-readable, brief description of the event
  10. Internal string describing the originator of the event

[END OBSOLETE DESCRIPTION]
