Comparison of data-serialization formats


This is a comparison of data-serialization formats, various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file formats.

Overview

Format | Null | Booleans | Integer | Floating-point | String | Array | Associative array/Object
ASN.1
    Null: NULL type
    Booleans: BOOLEAN
    Integer: INTEGER:
      • BER: variable-length big-endian binary representation;
      • PER Unaligned: a fixed number of bits if the integer type has a finite range; a variable number of bits otherwise;
      • PER Aligned: a fixed number of bits if the integer type has a finite range and the size of the range is less than 65536; a variable number of octets otherwise;
      • OER: one, two, or four octets if the integer type has a finite range that fits in that number of octets; a variable number of octets otherwise
    Floating-point: REAL:
      • base-10 real values are represented as character strings in ISO 6093 format;
      • binary real values are represented in a binary format that includes the mantissa, the base, and the exponent;
      • the special values NaN, -INF, +INF, and negative zero are also supported
    String: Multiple valid types
    Array: Data specifications SET OF and SEQUENCE OF
    Associative array/Object: User-definable type
Binn
    Null: \x00
    Booleans: True: \x01; False: \x02
    Integer: big-endian 2's complement signed and unsigned 8/16/32/64 bits
    Floating-point: single: big-endian binary32; double: big-endian binary64
    String: UTF-8 encoded, null terminated, preceded by int8 or int32 string length in bytes
    Array: Typecode + 1-4 bytes size + 1-4 bytes items count + list items
    Associative array/Object: Typecode + 1-4 bytes size + 1-4 bytes items count + key/value pairs
BSON
    Null: \x0A
    Booleans: True: \x08\x01; False: \x08\x00
    Integer: int32: 32-bit little-endian 2's complement, or int64: 64-bit little-endian 2's complement
    Floating-point: double: little-endian binary64
    String: UTF-8 encoded, preceded by int32 encoded string length in bytes
    Array: BSON embedded document with numeric keys
    Associative array/Object: BSON embedded document
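As a rough illustration of the BSON entries above, the following Python sketch builds a small document by hand. The helper names bson_element and bson_document are hypothetical and not part of any BSON library; the type codes (\x01 double, \x02 string, \x08 boolean, \x0A null, \x10 int32, \x12 int64) are taken from the BSON specification. Note that in a real document the element name sits between the type byte and the value, so the byte pairs listed in the table are not contiguous on the wire.

```python
import struct


def bson_element(name: str, value) -> bytes:
    """Encode one element as: type byte + NUL-terminated name + value."""
    key = name.encode("utf-8") + b"\x00"
    if value is None:
        return b"\x0a" + key
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + key + struct.pack("<i", value)  # int32, little-endian
        return b"\x12" + key + struct.pack("<q", value)      # int64, little-endian
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)      # binary64, little-endian
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # The int32 length prefix counts the trailing NUL byte as well.
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    raise TypeError("type not covered by this sketch")


def bson_document(fields: dict) -> bytes:
    """Wrap elements in a document: int32 total size + elements + NUL."""
    body = b"".join(bson_element(k, v) for k, v in fields.items()) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body
```

Under these assumptions, bson_document({"hello": "world"}) yields a 22-byte document beginning with the little-endian size \x16\x00\x00\x00.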
Concise Binary Object Representation
    Null: \xf6
    Booleans: True: \xf5; False: \xf4
    Integer:
      • Small positive/negative: \x00-\x17 & \x20-\x37
      • 8-bit: positive \x18, negative \x38
      • 16-bit: positive \x19, negative \x39
      • 32-bit: positive \x1A, negative \x3A
      • 64-bit: positive \x1B, negative \x3B
      • Negative x encoded as (−x − 1)
    Floating-point:
      • IEEE half/single/double: \xf9 - \xfb
      • Decimals and bigfloats encoded as \xc4 tag + 2-item array of integer mantissa & exponent
    String:
      • Length and content
      • Bytestring: \x40 - \x5f
      • UTF-8: \x60 - \x7f
      • Indefinite partial strings \x5f and \x7f stitched together until \xff
    Array:
      • Length and items: \x80 - \x9e
      • Indefinite list \x9f terminated by \xff entry
    Associative array/Object:
      • Length and items: \xa0 - \xbe
      • Indefinite map \xbf terminated by \xff key
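The integer prefixes listed for CBOR can be sketched in Python as follows. The function name encode_cbor_int is hypothetical; the layout (major type 0 for non-negative values, major type 1 storing −1 − n for negative values, and the \x18 to \x1B width markers) follows RFC 8949. Bignums, which need a tag, are outside this sketch.

```python
import struct


def encode_cbor_int(n: int) -> bytes:
    """Encode an integer in the range -2**64 .. 2**64 - 1 as a CBOR item."""
    if n >= 0:
        major, arg = 0x00, n        # major type 0: unsigned integer
    else:
        major, arg = 0x20, -1 - n   # major type 1: stores -1 - n
    if arg < 24:
        return bytes([major | arg])  # value fits in the initial byte
    if arg < 1 << 8:
        return bytes([major | 0x18]) + struct.pack(">B", arg)
    if arg < 1 << 16:
        return bytes([major | 0x19]) + struct.pack(">H", arg)
    if arg < 1 << 32:
        return bytes([major | 0x1A]) + struct.pack(">I", arg)
    if arg < 1 << 64:
        return bytes([major | 0x1B]) + struct.pack(">Q", arg)
    raise OverflowError("value needs a bignum tag, not covered by this sketch")
```

For example, 1000 becomes \x19\x03\xe8 (16-bit positive) and −1000 becomes \x39\x03\xe7 (16-bit negative, storing 999).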
Efficient XML Interchange
    Null: xsi:nil is not allowed in binary context
    Booleans: 1-2 bit integer interpreted as boolean.
    Integer: Boolean sign, plus arbitrary length 7-bit octets, parsed until most-significant bit is 0, in little-endian. The schema can set the zero-point to any arbitrary number. Unsigned skips the boolean flag.
    Floating-point:
      • Float: integer mantissa and integer exponent.
      • Decimal: boolean sign, integer whole value, integer fractional
    String: Length prefixed Integer-encoded Unicode. Integers may represent enumerations or string table entries instead.
    Array: Length prefixed set of items.
    Associative array/Object: Not in protocol.
FlatBuffers
    Null: Encoded as absence of field in parent object
    Booleans: True: one byte \x01; False: \x00
    Integer: little-endian 2's complement signed and unsigned 8/16/32/64 bits
    Floating-point: floats: little-endian binary32; doubles: little-endian binary64
    String: UTF-8 encoded, preceded by 32 bit integer length of string in bytes
    Array: Vectors of any other type, preceded by 32 bit integer length of number of elements
    Associative array/Object: Tables or Vectors sorted by key
Ion
    Null: \x0f
    Booleans: True: \x11; False: \x10
    Integer:
      • positive \x2x, negative \x3x
      • Zero is always encoded in tag byte
      • BigInts over 13 bytes have 1+ byte overhead for length
    Floating-point:
      • \x44
      • \x48
      • Zero is always encoded in tag byte
    String:
      • UTF-8: \x8x
      • Other strings: \x9x
      • Arbitrary length and overhead
    Array:
      • \xbx
      • Arbitrary length and overhead. Length in octets.
    Associative array/Object:
      • Structs: \xdx
      • Annotations: \xex
MessagePack
    Null: \xc0
    Booleans: True: \xc3; False: \xc2
    Integer: Single byte "fixnum", or typecode + big-endian int8/16/32/64
    Floating-point: Typecode + IEEE single/double
    String: Typecode + up to 15 bytes, or typecode + length as uint8/16/32 + bytes; encoding is unspecified
    Array: As "fixarray", or typecode + 2–4 bytes length + array items
    Associative array/Object: As "fixmap", or typecode + 2–4 bytes length + key-value pairs
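The two MessagePack integer forms (single-byte "fixnum" versus typecode plus big-endian integer) might be sketched in Python as below. The function name msgpack_int is hypothetical; the typecodes (\xcc to \xcf for unsigned, \xd0 to \xd3 for signed) come from the MessagePack specification. Choosing the unsigned family for every positive value is an assumption of this sketch; the specification only asks encoders to prefer the shortest form.

```python
import struct

# (typecode, struct format, exclusive upper bound of the magnitude)
_UNSIGNED = [(0xCC, ">B", 1 << 8), (0xCD, ">H", 1 << 16),
             (0xCE, ">I", 1 << 32), (0xCF, ">Q", 1 << 64)]
_SIGNED = [(0xD0, ">b", 1 << 7), (0xD1, ">h", 1 << 15),
           (0xD2, ">i", 1 << 31), (0xD3, ">q", 1 << 63)]


def msgpack_int(n: int) -> bytes:
    """Encode an integer using the shortest MessagePack integer form."""
    if 0 <= n <= 0x7F:
        return bytes([n])             # positive fixnum: the byte is the value
    if -32 <= n < 0:
        return struct.pack(">b", n)   # negative fixnum: \xe0 .. \xff
    if n > 0:
        for code, fmt, limit in _UNSIGNED:
            if n < limit:
                return bytes([code]) + struct.pack(fmt, n)
    else:
        for code, fmt, limit in _SIGNED:
            if n >= -limit:
                return bytes([code]) + struct.pack(fmt, n)
    raise OverflowError("integer does not fit in 64 bits")
```

With this sketch, 127 still fits in one byte, while 128 needs the two-byte form \xcc\x80.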
Netstrings
    Null: Not in protocol.
    Booleans: Not in protocol.
    Integer: Not in protocol.
    Floating-point: Not in protocol.
    String: Length encoded as an ASCII string + ':' + data + ','. Length counts only octets between ':' and ','.
    Array: Not in protocol.
    Associative array/Object: Not in protocol.
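The netstring framing described above (ASCII length, ':', data, ',') is simple enough to sketch in a few lines of Python. The function names encode_netstring and decode_netstring are hypothetical, and the decoder's error handling is only one possible choice.

```python
from typing import Tuple


def encode_netstring(data: bytes) -> bytes:
    """Frame data as <length in ASCII decimal> ':' <data> ','."""
    return str(len(data)).encode("ascii") + b":" + data + b","


def decode_netstring(buf: bytes) -> Tuple[bytes, bytes]:
    """Return (payload, remaining bytes) for the first netstring in buf."""
    length_text, sep, rest = buf.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("missing or malformed length prefix")
    length = int(length_text)
    # The length counts only the octets between ':' and ','.
    if len(rest) < length + 1 or rest[length:length + 1] != b",":
        raise ValueError("payload truncated or missing trailing comma")
    return rest[:length], rest[length + 1:]
```

For instance, the 12-byte payload "hello world!" is framed as 12:hello world!, and the empty string as 0:,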
OGDL Binary
Property list
Protocol Buffers
    Integer:
      • Variable encoding length signed 32-bit: varint encoding of "ZigZag"-encoded value (n << 1) XOR (n >> 31)
      • Variable encoding length signed 64-bit: varint encoding of "ZigZag"-encoded value (n << 1) XOR (n >> 63)
      • Constant encoding length 32-bit: 32 bits in little-endian 2's complement
      • Constant encoding length 64-bit: 64 bits in little-endian 2's complement
    Floating-point: floats: little-endian binary32; doubles: little-endian binary64
    String: UTF-8 encoded, preceded by varint-encoded integer length of string in bytes
    Array: Repeated value with the same tag or, for varint-encoded integers only, values packed contiguously and prefixed by tag and total byte length
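The variable-length signed encoding combines two steps: a "ZigZag" mapping that folds negative numbers into small unsigned ones, then a base-128 varint. The following Python sketch shows both for the 32-bit case. The function names zigzag32 and encode_varint are hypothetical; the formula (n << 1) XOR (n >> 31), with an arithmetic right shift, is the one given in the Protocol Buffers encoding documentation.

```python
def zigzag32(n: int) -> int:
    """Map a signed 32-bit integer to unsigned: (n << 1) XOR (n >> 31)."""
    # Python's >> on negative ints is an arithmetic shift, as the formula needs;
    # the mask trims the result back to 32 bits.
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low 7 bits first."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)
```

ZigZag maps 0, −1, 1, −2 to 0, 1, 2, 3, so small magnitudes of either sign stay short: encode_varint(zigzag32(-1)) is the single byte \x01, whereas a plain two's-complement varint of −1 would need the maximum length.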
Recursive Length Prefix
    Null: Not in protocol. \x80 often used.
    Booleans: Not in protocol. Integer 0/1 often used.
    Integer:
      • 0 - 127: \x00 - \x7f
      • Other values: Strings of big-endian encoded bytes, of arbitrary length, beginning with \x80 - \xbf
    Floating-point: Integer encodings may be interpreted as IEEE float.
    String:
      • Length prefixed, up to 55 bytes: \x80 - \xb7 followed by data.
      • 56+ bytes: \xb8 - \xbf followed by 1-8 byte integer length of string followed by data.
    Array:
      • Length prefixed, up to 55 bytes: \xc0 - \xf7 followed by data.
      • 56+ bytes: \xf8 - \xff followed by 1-8 byte integer length of data followed by data.
      • Length is always in bytes, not in list items.
    Associative array/Object: Not in protocol. May be encoded as lists of key/value pair lists or other formats.
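The prefix ranges for Recursive Length Prefix strings and lists can be sketched in Python as follows. The names rlp_encode and _length_prefix are hypothetical; the sketch handles only byte strings and (nested) lists, which are the two types the protocol itself defines.

```python
def _length_prefix(length: int, offset: int) -> bytes:
    """Build the prefix for a payload of the given length in bytes."""
    if length <= 55:
        return bytes([offset + length])           # short form: one byte
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    # Long form: the prefix byte records how many length bytes follow.
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Encode a byte string or a (nested) list of byte strings."""
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item                           # single byte 0-127 is itself
        return _length_prefix(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(x) for x in item)
        # The list length is the payload size in bytes, not the item count.
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError("this sketch only handles bytes and lists")
```

Under this sketch the string "dog" encodes as \x83dog, and the list of "cat" and "dog" as \xc8\x83cat\x83dog, where \xc8 reflects the 8 payload bytes rather than the 2 items.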
Smile
    Null: \x21
    Booleans: True: \x23; False: \x22
    Integer: Single byte "small", zigzag-encoded varints, or BigInteger
    Floating-point: IEEE single/double, BigDecimal
    String: Length-prefixed "short" Strings, marker-terminated "long" Strings and back-references
    Array: Arbitrary-length heterogeneous arrays with end-marker
    Associative array/Object: Arbitrary-length key/value pairs with end-marker
Structured Data eXchange Formats
    Integer: big-endian signed 24-bit or 32-bit integer
    Floating-point: big-endian IEEE double
    String: either UTF-8 or ISO 8859-1 encoded
    Array: list of elements with identical ID and size, preceded by array header with int16 length
    Associative array/Object: chunks can contain other chunks to arbitrary depth
Thrift