Comparison of data-serialization formats
This is a comparison of data-serialization formats, various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file formats.
Overview
- a. The current default format is binary.
- b. The "classic" format is plain text, and an XML format is also supported.
- c. Theoretically possible due to abstraction, but no implementation is included.
- d. The primary format is binary, but a text format is available.
- e. Means that generic tools/libraries know how to encode, decode, and dereference a reference to another piece of data in the same document. A tool may require the IDL file, but no more. Excludes custom, non-standardized referencing techniques.
- f. ASN.1 does offer OIDs, a standard format for globally unique identifiers, as well as a standard notation for referencing a component of a value. Thus it would be possible to reference a component of an encoded value present in a document by combining an OID and an "absolute reference" to the component of the value. However, there is no standard way to indicate that a field contains such an absolute reference. Therefore, a generic ASN.1 tool/library cannot automatically encode/decode/resolve references within a document without help from custom-written program code.
- g. VelocyPack offers a value type to store pointers to other VPack items. It is allowed if the VPack data resides in memory, but not if stored on disk or sent over a network.
- h. The primary format is binary, but a text format is available.
- i. The primary format is binary, but text and JSON formats are available.
- j. The primary format is binary, but a JSON encoder is available.
Syntax comparison of human-readable formats
- a. Omitted XML elements are commonly decoded by XML data binding tools as NULLs. Shown here is another possible encoding; XML schema does not define an encoding for this datatype.
- b. The RFC CSV specification only deals with delimiters, newlines, and quote characters; it does not directly deal with serializing programming data structures.
- c. The netstrings specification only deals with nested byte strings; anything else is outside the scope of the specification.
- d. PHP will unserialize any floating-point number correctly, but will serialize it to its full decimal expansion. For example, 3.14 will be serialized to 3.140000000000000124344978758017532527446746826171875.
- e. XML data bindings and SOAP serialization tools provide type-safe XML serialization of programming data structures into XML. Shown are XML values that can be placed in XML elements and attributes.
- f. This syntax is not compatible with the Internet-Draft, but is used by some dialects of Lisp.
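The long expansion quoted in footnote d is the exact decimal value of the binary64 number closest to 3.14. The following is a minimal sketch of where those digits come from, using Python's standard `decimal` module rather than PHP itself; the helper name `exact_decimal_expansion` is hypothetical and chosen only for illustration.

```python
from decimal import Decimal


def exact_decimal_expansion(x: float) -> str:
    """Return the exact decimal value of the binary64 number stored for x.

    Decimal(float) converts the stored binary value without rounding,
    so every digit of the expansion is preserved.
    """
    return str(Decimal(x))


expansion = exact_decimal_expansion(3.14)
print(expansion)                 # full expansion of the stored double
print(repr(3.14))                # shortest string that round-trips
print(float(expansion) == 3.14)  # parsing the long form gives the same double
```

Both the long and the short spellings parse back to the same double, which is why a deserializer reads either form correctly even though the serialized text is much longer than the literal the programmer wrote.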
Comparison of binary formats
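Several entries in the table below rely on variable-length integers. As one illustration, the Protocol Buffers row mentions varint encoding of "ZigZag"-encoded values. The sketch below shows the idea under the usual description of that scheme: ZigZag maps signed integers to unsigned ones so that small magnitudes stay small, and a varint stores 7 payload bits per byte with the high bit marking continuation. The function names are hypothetical and this is not the Protocol Buffers library API.

```python
def zigzag_encode(n: int, bits: int = 32) -> int:
    """Map a signed integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    mask = (1 << bits) - 1
    # Python's >> on negative ints is an arithmetic shift, as the scheme requires.
    return ((n << 1) ^ (n >> (bits - 1))) & mask


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low group first)."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte, high bit clear
            return bytes(out)


def encode_sint32(n: int) -> bytes:
    """ZigZag-encode a signed 32-bit value, then write it as a varint."""
    return encode_varint(zigzag_encode(n, 32))


print(encode_sint32(-1).hex())   # small negative numbers take a single byte
print(encode_varint(300).hex())  # two bytes: low 7 bits first
```

The design point is that a plain two's-complement varint would spend the maximum number of bytes on every negative number, while ZigZag keeps values near zero short regardless of sign.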
Format | Null | Booleans | Integer | Floating-point | String | Array | Associative array/Object |
ASN.1 | NULL type | BOOLEAN type | INTEGER type | REAL: base-10 real values are represented as character strings in ISO 6093 format; binary real values are represented in a binary format that includes the mantissa, the base, and the exponent; the special values NaN, -INF, +INF, and negative zero are also supported | Multiple valid types | data specifications SET OF and SEQUENCE OF | user-definable type |
Binn | \x00 | True: \x01 False: \x02 | big-endian 2's complement signed and unsigned 8/16/32/64 bits | single: big-endian binary32 double: big-endian binary64 | UTF-8 encoded, null terminated, preceded by int8 or int32 string length in bytes | Typecode + 1-4 bytes size + 1-4 bytes items count + list items | Typecode + 1-4 bytes size + 1-4 bytes items count + key/value pairs |
BSON | \x0A | True: \x08\x01 False: \x08\x00 | int32: 32-bit little-endian 2's complement or int64: 64-bit little-endian 2's complement | double: little-endian binary64 | UTF-8 encoded, preceded by int32 encoded string length in bytes | BSON embedded document with numeric keys | BSON embedded document |
Concise Binary Object Representation | \xf6 | True: \xf5 False: \xf4 | Small positive/negative \x00-\x17 & \x20-\x37 8-bit: positive \x18, negative \x38 16-bit: positive \x19, negative \x39 32-bit: positive \x1A, negative \x3A 64-bit: positive \x1B, negative \x3B Negative x encoded as (-x - 1) | IEEE half/single/double \xf9 - \xfb Decimals and bigfloats encoded as \xc4 tag + 2-item array of integer mantissa & exponent | Length and content Bytestring \x40 - \x5f UTF-8 \x60 - \x7f Indefinite partial strings \x5f and \x7f stitched together until \xff. | Length and items \x80 - \x9e Indefinite list \x9f terminated by \xff entry. | Length and items \xa0 - \xbe Indefinite map \xbf terminated by \xff key. |
Efficient XML Interchange | xsi:nil is not allowed in binary context | 1-2 bit integer interpreted as boolean. | Boolean sign, plus arbitrary length 7-bit octets, parsed until most-significant bit is 0, in little-endian. The schema can set the zero-point to any arbitrary number. Unsigned skips the boolean flag. | Float: integer mantissa and integer exponent. Decimal: boolean sign, integer whole value, integer fractional | Length prefixed Integer-encoded Unicode. Integers may represent enumerations or string table entries instead. | Length prefixed set of items. | Not in protocol. |
FlatBuffers | Encoded as absence of field in parent object | True: one byte \x01 False: \x00 | little-endian 2's complement signed and unsigned 8/16/32/64 bits | floats: little-endian binary32 doubles: little-endian binary64 | UTF-8 encoded, preceded by 32 bit integer length of string in bytes | Vectors of any other type, preceded by 32 bit integer length of number of elements | Tables or Vectors sorted by key |
Ion | \x0f | True: \x11 False: \x10 | positive \x2x, negative \x3x. Zero is always encoded in tag byte. BigInts over 13 bytes have 1+ byte overhead for length. | \x44 \x48 Zero is always encoded in tag byte. | UTF-8: \x8x Other strings: \x9x Arbitrary length and overhead. | \xbx Arbitrary length and overhead. Length in octets. | Structs: \xdx Annotations: \xex |
MessagePack | \xc0 | True: \xc3 False: \xc2 | Single byte "fixnum" or typecode + big-endian int8/16/32/64 | Typecode + IEEE single/double | Typecode + up to 15 bytes or typecode + length as uint8/16/32 + bytes; encoding is unspecified | As "fixarray" or typecode + 2–4 bytes length + array items | As "fixmap" or typecode + 2–4 bytes length + key-value pairs |
Netstrings | Not in protocol. | Not in protocol. | Not in protocol. | Length encoded as an ASCII string + ':' + data + ',' Length counts only octets between ':' and ',' | Not in protocol. | Not in protocol. | Not in protocol. |
OGDL Binary | |||||||
Property list | |||||||
Protocol Buffers | | | Variable encoding length signed 32-bit: varint encoding of "ZigZag"-encoded value (n << 1) XOR (n >> 31) Variable encoding length signed 64-bit: varint encoding of "ZigZag"-encoded value (n << 1) XOR (n >> 63) Constant encoding length 32-bit: 32 bits in little-endian 2's complement Constant encoding length 64-bit: 64 bits in little-endian 2's complement | floats: little-endian binary32 doubles: little-endian binary64 | UTF-8 encoded, preceded by varint-encoded integer length of string in bytes | Repeated value with the same tag or, for varint-encoded integers only, values packed contiguously and prefixed by tag and total byte length | |
Recursive Length Prefix | Not in protocol. \x80 often used. | Not in protocol. Integer 0/1 often used. | 0 - 127: \x00 - \x7f Other values: Strings of big-endian encoded bytes, of arbitrary length, beginning with \x80 - \xbf | Integer encodings may be interpreted as IEEE float. | Length prefixed, up to 55 bytes: \x80 - \xb7 followed by data. 56+ bytes: \xb8 - \xbf followed by 1-8 byte integer length of string followed by data. | Length prefixed, up to 55 bytes: \xc0 - \xf7 followed by data. 56+ bytes: \xf8 - \xff followed by 1-8 byte integer length of data followed by data. Length is always in bytes, not in list items. | Not in protocol. May be encoded as lists of key/value pair lists or other formats. |
Smile | \x21 | True: \x23 False: \x22 | Single byte "small", zigzag-encoded varints, or BigInteger | IEEE single/double, BigDecimal | Length-prefixed "short" Strings, marker-terminated "long" Strings and back-references | Arbitrary-length heterogeneous arrays with end-marker | Arbitrary-length key/value pairs with end-marker |
Structured Data eXchange Formats | | | big-endian signed 24-bit or 32-bit integer | big-endian IEEE double | either UTF-8 or ISO 8859-1 encoded | list of elements with identical ID and size, preceded by array header with int16 length | chunks can contain other chunks to arbitrary depth |
Thrift |