Copyright (c) Hyperion Entertainment and contributors.
MTRX IFF Matrix Data Storage
Jump to navigation
Jump to search
MTRX
Numerical data storage (MathVision - Seven Seas) MTRX FORM, for matrix data storage 19-July-1990 Submitted by: Doug Houck Seven Seas Software (address, etc) INTRODUCTION: Numerical data, as it comes from the real world, is an ill-mannered beast. Often much is assumed about the data, such as the number of dimensions, formatting, compression, limits, and sizes. As such, data is not portable. The MTRX FORM will both store the data, and completely describe its format, such that programs no longer need to guess the parameters of a data file. There needs to be but one program to read ascii files and output MTRX IFF files. A matrix, by our definition, is composed of three types of things. Firstly, the atomic data, such as an integer, or floating point number. Secondly, arrays, which are simply lists of things which are all the same. Thirdly, structures, which are lists of things which are different. Both arrays and structures may be composed of things besides atomic data - they may contain other structures and arrays as well. This concept of nesting structures may be repeated to any desired depth. For example, a list of data pairs could be encoded as an array of structures, where each structure contains two numbers. A two-dimensional array is simply an array of arrays. Since space conservation is often desirable, there is provision for representing each number with fewer bits, and compressing the bits together. CHUNKS The MTRX FORM is composed of the definition of the structure, followed by the BODY which contains the data which is defined. Usually, there is only one set of data, but a smarter IFF read could use the definition as a PROPerty, with identically formatted data sets (BODYs) in a LIST. FORM MTRX definition (ARRY | STRU | DTYP) BODY ARRY: The array chunk defines a counted list of similar items. The first (required) chunk in an ARRY is ELEM, which gives the number of elements in the array. Optionally, there may be limits given, (LOWR and UPPR), which could be used in scaling during sampling of the data. Lastly is the definition of an element of the array, which may be a nested definition like everything else. ARRY ::= "ARRY" #{ ELEM [LOWR] [UPPR] [PACK] ARRY|STRU|DTYP } STRU: The structure chunk defines a counted list of dissimilar things. The first (required) chunk in a STRU is FLDS, which gives the number of fields in the structure. Lastly are definitions of each field in the structure. Again, each field may have a nested definition like everything else. STRU ::= "STRU" #{ FLDS ([PACK] ARRY|STRU|DTYP)* } VALU: The value contains a datatype, and then a constant of that type. The datatype contains the size of the constant, so this chunk has variable size. VALU is used in the ARRY chunk to give the scaling limits of the array. BODY: This is the actual data we went to so much effort to describe. It is stored in "row-first" format, that is, items at the bottom of the nested description are stored next to each other. In most cases, it should be sufficient to simply block-read the whole chunk from disk, unless the reader needs to adjust byte-ordering or store in a more time-efficient format in memory. Data is assumed to be byte-aligned. PACK: The PACK chunk is necessary when the bit length of the data is not a multiple of 8, that is, not byte-aligned, and the user wishes to conserve space by packing data items together. PACK is simply a number - the number of items to bit-pack before aligning on a byte. A PACK is in effect for the remainder of its nested scope, or until overridden by a new specification. A STRU or ARRY is assumed to have a PACK of 1 by default - it is not affected by PACKs in definitions above. A PACK of 0 means to byte-align before processing the next definition. The PACK specifier should be normalized. For example, when packing a large array of 3-bit numbers, PACK should be 8 since 3*8 = 24. In this case 8 is the smallest PACK number which aligns on a byte naturally. DTYP: The DataType is the most interesting chunk, as it attempts to define every conceivable type of numeric data with 32 bits. The 32 bits are broken down into three fields, 1) the size in bits, 2) the Class, and 3) SubClass. The Class makes the most major distinction, separating integers from floating point numbers from Binary Coded Decimal and etc. Within each class is a SubClass, which gives the specific encoding used. Finally, the Size tells what how much room the data occupies. The basic division of datatypes is given in the tree structure below. Class SubClass Size Final Specific Type ===== ======== ==== =================== | Binary Unsigned - 0 ------------ 8 UByte | 16 UWord | 32 ULong | Binary Signed --- 0 ------------ 8 Byte | 16 Word | 32 ULong | Real ------------Ieee38 -------- 32 Ieee Single Precision | | | Ieee308 ------- 64 Double Precision | | 32 Truncated Double Precision | | | FFP ----------- 32 Motorola Fast Floating Point | Text ----------- Text0 --------- ?? Null-terminated text | | | CText --------- ?? Number of characters in first byte | | | FText --------- ?? Fixed length, space padded | BCD ------------ Nibble -------- ?? | Character ----- ?? A design goal was to create a classification system which other people can easily plug into. Many data types are simply size variations on existing data types. For example, a 4-bit integer can be specified by giving the size as four bits in the Signed Binary class. Be aware that not all MTRX readers may support your new type, but there will not be any type clashes or ambiguities by following these rules. If you have a truly unique Class or SubClass, you will need to register it with Commodore to prevent clashes. A second design goal was to create a format which is easily decoded by software. By aligning fields on bytes, you have the option of redefining the datatype as a structure, so as to avoid shifting when accessing the fields. Since the numbers are sequentially assigned, they are suitable as array indicies, and may be optimized in a C switch statement. A third design goal was allowing for naive and sophisticated readers. In checking for a certain datatype, a naive reader can simply compare the whole datatype with a small set of known types, which assumes that each different Size defines a unique datatype. Sophisticated readers will consider the Class, SubClass and Size separately, so as to support arbitrary size integers, and truncated Floating Point numbers, for example. * * MTRX ::= "FORM" #{ "MTRX" ARRY|STRU|DTYP BODY } Matrix * ARRY ::= "ARRY" #{ ELEM [LOWR] [UPPR] [PACK] ARRY|STRU|DTYP } Array * STRU ::= "STRU" #{ FLDS ([PACK] ARRY|STRU|DTYP)* } Structure * ELEM ::= "ELEM" #{ elements } Array elements * LOWR ::= "LOWR" { VALU } Minimum limit * UPPR ::= "UPPR" { VALU } Maximum limit * VALU ::= #{ dtyp value } Value (in union) * dtyp ::= { size, subclass, class } Data Type (scalar) * DTYP ::= "DTYP" #{ dtyp } * FLDS ::= "FLDS" #{ number of fields } Number of Fields * PACK ::= "PACK" #{ units packed b4 byte alignment } Packing * BODY ::= "BODY" #{ inner-first binary dump } Data * * [] means optional * # means the size of the unit following * * means one or more of *