This is an experimental block format for IPLD. It leverages the constraints of the IPLD data model to shave as many bytes off the block format as we can without compression.
Features:
Constraints:
links | values | structure
A block is broken into 3 parts and each part has different parsing rules
| ...cids | null byte |
| length | ...values(len | data) |
| structure |
The first section is all the links (cids) and is ended by a null byte.
The second section is all the values (map keys, string values, byte values). The length of this section is the first varint, so there is no closing delimiter.
The third section is the actual value type structure, which will reference cids and string/byte values from the prior sections. Other types are written inline into the structure.
// each line is an structure example
[ 0 ] // 0 int
[ 1 ] // 1 int
[ 101, 102 ] // 102 int (penalty byte due to conflict with table entries)
[ 102, reference varint ] // string
[ 103, reference varint ] // bytes
[ 108, ...(reference varint, table entry) ] // map
[ 109, ...(table entry) ] // list
0 - 100 : inline varint
100 : list delimiter
101 : varint (limited allowable use)
102 : utf8 string reference
103 : bytes reference
104 : null
105 : true
106 : false
107 : float
108 : map
109 : list
110 : cid reference
111 : signed varint
112 : signed float
113 : zero point float (not implemented)
114 : typed map (not implemented)
115 : typed list (not implemented)
116+ : inline varint
Writing this way enforces deterministic ordering and de-duplication. It also keeps the encoded length numbers as small as possible, leading to a smaller varint.
The block stucture is written as a single table entry.
If the root block stucture is a map or list the trailing delimiter (0 for map, 100 for list) MUST be removed.
Writing this way enforces deterministic order of map keys. It also keeps the encoded index numbers as small as possible, leading to a smaller varint.
No two map keys can reference the same value index. MUST throw.
For any whole positive
For negative integers:
For positive floats:
For negative floats:
read() varint for increase from prior length (begins at zero)We use read() to refer to the next read after this varint is read
read() is the full varint valueread() is a varint for the index of the value and this value is a string value.read() is a varint for the index of the value and this value is a bytes value.read()s are varints parsed into a float value.
read()s are in this map until a null byte
read() is a reference to the value index and should be represented by a string key. The index is offset by +1 so that a null byte can be used to end the map.read() is the map value and is a table entry table entryread()s are in this list until a 100 delimiter
read() is a table entry you hit a 100 delimiterread() is a varint for the index of the cid value.read() is a signed varint, follow the varint parsing rules but make the value negativeread() is a signed float, follow float parsing rules but make the left side negativeread() is a zero point float, the next varint is the integer representing the mantissa below zero. (not implemented)Sort:
[ 1, 2, 3, 4 ]
[ 1, 5, 5, 5 ]
[ 2, 0, 0, 0, 0, 0]
Serializes as:
[ 4 ] [ 1, 2, 3, 4 ]
[ 0 ] [ 1, 5, 5, 5 ]
[ 2 ] [ 2, 0, 0, 0, 0, 0]
The link section of the block is sorted with CID specific rules in order to compress the links by de-duplicating common prefixes.
The sorting of CIDs:
[ 18 ] [ length=32 ] [ digest-1 ]
[ 18 ] [ length=32 ] [ digest-2 ]
[ 1 , codec=1, hashfn=1 ] [ length=0 ] [ ]
[ 1 , codec=1, hashfn=1 ] [ length=1 ] [ 1 ]
[ 1 , codec=1, hashfn=1 ] [ length=1 ] [ 2 ]
[ 1 , codec=1, hashfn=1 ] [ length=245 ] [ digest-1 ]
[ 1 , codec=1, hashfn=1 ] [ length=245 ] [ digest-2 ]
[ 1 , codec=1, hashfn=1 ] [ length=245 ] [ digest-3 ]
[ 1 , codec=1, hashfn=1 ] [ length=250 ] [ digest-1 ]
[ 1 , codec=1, hashfn=1 ] [ length=250 ] [ digest-2 ]
[ 1 , codec=1, hashfn=1 ] [ length=250 ] [ digest-3 ]
[ 1 , codec=2, hashfn=1 ] [ length=250 ] [ digest-1 ]
[ 1 , codec=2, hashfn=1 ] [ length=250 ] [ digest-2 ]
[ 1 , codec=2, hashfn=2 ] [ length=250 ] [ digest-1 ]
[ 1 , codec=2, hashfn=2 ] [ length=250 ] [ digest-2 ]
Is serialized as:
[ 18 ] [ length=32 ] [ digest-1 ]
[ length=32 ] [ digest-2 ]
[ 1 , codec=1, hashfn=1 ] [ length=0 ] [ ]
[ 1 , codec=1, hashfn=1 ] [ length=1 ] [ 1 ]
[ 1 , codec=1, hashfn=1 ] [ length=1 ] [ 2 ]
[ 1 , codec=1, hashfn=1 ] [ length=245 ] [ digest-2 ]
[ length=245 ] [ digest-3 ]
[ length=250 ] [ digest-1 ]
[ length=250 ] [ digest-2 ]
[ length=250 ] [ digest-3 ]
[ 1 , codec=2, hashfn=1 ] [ length=250 ] [ digest-1 ]
[ length=250 ] [ digest-2 ]
[ 1 , codec=2, hashfn=2 ] [ length=250 ] [ digest-1 ]
[ length=250 ] [ digest-2 ]
All optimizations are required in order to guarantee determinism.
When no links or values are present the two nullbytes should be dropped. If the first byte in the structure is less than 19 (lower will conflict with CIDv0 and other potential future multiformats), you must prepend 101.
Any map or list that only has entries that are of the same type, and that type is not integer, MUST use typed lists and maps. Integer typed lists don't use this feature because it would be an extra byte.
All empty maps and lists must be typed. This is so that any schema validation can be done without additional parsing when the list is typed to anyting but an integer.
Note that all the following examples do not have any links or values and as such do not have proceeding null bytes.
[ 1, 2 ]
/* serializes to */
109 // list
1 // 1
2 // 2
// omit trailing delimiter when list or map is root structure
[ 1, [ 2, 3 ] ]
/* serializes to */
109 // list
1 // 1
109 // list
2 // 2
3 // 2
100 // end list
// omit trailing delimiter when list or map is root structure
[ 1, [ null ], 3 ]
/* serializes to */
109 // list
1 // 1
109 // list
104 // null
100 // end list
3 // 3
// omit trailing delimiter when list or map is root structure
The following examples have values and no CIDs, which is why they begin with a null byte.
{ "hello": "world" }
/* serializes to */
0 // no links
12 // length of values header
5 // +5 length offset
104, 101, 108, 108, 111 // "hello"
0 // +0 length offset
119, 111, 114, 108, 100 // "world"
108 // map
1 // (+1 map key offset) - 1
1 // value index
// omit trailing delimiter when list or map is root structure
[ { "hello": "world", "world": "hello" } ]
/* serializes to */
0 // no links
12 // length of values header
5 // +5 length offset
104, 101, 108, 108, 111 // "hello"
0 // +0 length offset
119, 111, 114, 108, 100 // "world"
109
108 // map
1 // (+1 map key offset ) + 1
1 // value index
1 // (+1 map key offset ) - 1
0 // value index
0 // map end
// omit trailing delimiter when list or map is root structure
0 - 100 : inline varint 100 : list delimiter 101 : varint (limited allowable use) 102 : utf8 string reference 103 : bytes reference 104 : null 105 : true 106 : false 107 : float 108 : map 109 : list 110 : cid reference 111 : signed varint 112 : signed float 113 : zero point float (not implemented) 114 : string typed map (not implemented) 115 : string typed list (not implemented) 116 : byte value typed map 117 : byte value type list 119 : float typed map 120 : float typed list 121 : signed varint map 122 : signed varint list 123 : signed float map 124 : signed float list 125 : empty map 126 : empty list 118+ : inline varint
Typed list
[ 'x', 'y', 'z' ]
/* serializes to */
0 // no links
6 // value header length
1, 120 // +1 offset length, "x"
0, 121 // +0 offset length, "y"
0, 122 // +0 offset length, "z"
115 // string typed list
1 // value index + 1
2 // value index + 1
3 // value index + 1
// trailing zero is ommitted during structure inlining