STreaming ARchives: stricter, verifiable, deterministic, highly compressible alternatives to CAR files for atproto repositories.

clean up verify pseudo-code

actually not hating it

phil 1d884db8 147ff4dd

+88 -67
star-lite/readme.md
````diff
@@ -28,9 +28,7 @@
 
 ## Format
 
-TODO: subtree -- probably just set header cbor len to 0 to omit the commit? but we still probably want the `data` root hash...
-
-STAR-lite is just a flat list of every key/record pair in the repository, in lexicographic key order, with a commit object in its header.
+STAR-lite is a flat list of every key/record pair in the repository, in lexicographic key order, with a commit object in its header. It's suited for single-pass streaming.
 
 ```
 |--------- header ---------| |------------------ data (records) -------------------|
@@ -62,7 +60,7 @@
 
 When `len == 0`, no commit object is included in the archive. This is useful for archiving unsigned subtrees of a full repository tree -- the contents can still be verified from the preceding CID field.
 
-When `len > 4096`, a parser may reject the commit object as being implausibly large. (TODO: we probably can set an exact limit. DID max is 2048 in atproto, rev must be TID format, etc).
+When `len > 4096`, a parser should reject the commit object as being implausibly large. (TODO: we can probably set an exact limit. DID max is 2048 in atproto, rev must be TID format, etc).
 
 Otherwise, when `len > 0`, a partial commit object of exactly `len` bytes follows, in CBOR format. The partial commit has the same fields as an [atproto Commit Object][commit] except that the `data` field must be omitted.
 
@@ -71,7 +69,7 @@
 
 ### Data: keys and records
 
-zero or more records until EOF. Each is:
+Zero or more records until EOF. Each is:
 
 | field | type |
 | ----------- | --------------------------------------- |
@@ -83,6 +81,8 @@
 The maximum key length comes from the combined limits of the `<collection>/<rkey>` syntax for atproto repo paths: 317 for the [collection][nsid] + 1 for the `/` slash + 512 for the [rkey][rkey].
 
 The maximum record size of 1MiB (1,048,576 bytes) comes from the atproto [*recommended data limits*][reclen].
+
+Parsers must reject archives that exceed maximum values.
 
 
 ### Varints
@@ -90,12 +90,22 @@
 Length prefixes in STAR are encoded as unsigned variable-length integers ([varint][varint], a variant of [LEB128][leb128]).
 
 
+### Compression
+
+STAR-lite is intended to be externally compressed with zstd for transport or storage.
+
+TODO: include recommended zstd configs
+
+TODO: include an actual table or graphs showing compression performance. should show vs CAR, and also compare gzip (maybe brotli?) to zstd settings
+
+
 ### Rules
 
 - keys must be in strict lexicographic byte order.
 - duplicate keys are not allowed.
-- keys must be valid atproto repo paths: the format specifies utf-8, but in practice the required `<collection>/<rkey>` repo path format currently restricts characters to a small subset of ASCII.
-- records must be encoded as [DRISL][drisl], the deterministic subset of CBOR used by atproto.
+- keys should be valid atproto repo paths: the format specifies utf-8, but in practice the required repo path format `<collection>/<rkey>` restricts characters to a small subset of ASCII.
+- records should be encoded as [DRISL][drisl], the deterministic subset of CBOR used by atproto, though parsers are not required to interpret record bytes at all.
+- any parse error should be treated as fatal for the entire archive.
 
 
 ## Efficient MST-aware operations
@@ -107,7 +117,7 @@
 
 ### MST node stack
 
-We don't need to materialize the entire MST at once for a depth-first tree-reconstructing walk across it: a small stack of in-progress MST nodes (one per layer of the tree) is sufficient.
+We don't need to materialize the entire MST at once for a depth-first tree-reconstructing walk across it: a narrow stack of MST nodes (one per layer of the tree) is sufficient state.
 
 When a key's layer is *greater than the previous* key's layer, all in-progress MST nodes from lower layers are complete, and can be **frozen**: encoded in atproto MST node format to compute their CIDs, recursively resolving into a CID link from the current key's node.
 
@@ -117,7 +127,75 @@
 - serialized into runs of CAR-format blocks,
 - any other transformation
 
-Once the entire tree has been walked and frozen, the highest-layer MST node can finally be considered frozen to produce the root node CID, which be match the CID in a STAR-lite file's header.
+Once the entire tree has been walked and frozen, the highest-layer MST node can finally be considered frozen to produce the root node CID, which must match the CID in a STAR-lite file's header.
+
+
+### Archive verification
+
+Verification requires MST reconstruction just like CAR conversion, but never requires temporary disk storage. Each record must be hashed to compute its CID, but its byte contents can be immediately discarded.
+
+Layer-0 MST nodes are materialized with computed record CIDs, then encoded, then hashed, to produce node CIDs. The encoded node bytes (and referenced record CIDs) are discarded, since only the node CID is needed to materialize its parent MST node.
+
+The final output is the root MST node's CID, which verifies the entire archive if it matches the `data` field from the commit object.
+
+Verification asserts the integrity of the repository contents: verifying the signature of the archive's [commit object][commit] (if present) is a separate process, outside the scope of STAR. See atproto [commit signatures][commit-sigs].
+
+
+```python
+# MstNode interface:
+#   is_empty() => bool       true if the node has no subtree or value links
+#   reset_to_empty()         clears the node to `empty` state
+#   link_record(key, cid)    appends an entry with a key and value link
+#   link_subtree(cid)        inserts a node link as the "left" child (empty node),
+#                            or as the right-most entry's "right"
+#   to_cbor() => bytes       canonical DAG-CBOR encoding of the MST node
+
+def reconstruct_root_cid(key_record_pairs):
+    """Compute the MST root CID from repo contents
+
+    key_record_pairs must be in lexicographic key order (= depth-first mst walk)
+    """
+    stack: list[MstNode] = []
+    prev_layer = -1
+
+    # the actual walk. everything left of the stack is finalized.
+    # anything remaining in the stack gets rolled up at the end.
+    for (key, record_cbor) in key_record_pairs:
+        key_layer = compute_mst_layer(key)
+
+        # grow the stack if needed, init with empty nodes.
+        while len(stack) <= key_layer:
+            stack.append(MstNode())
+
+        # finalize lower levels if this key is at a higher level than last.
+        # higher key means everything lower in the stack is to-our-left now.
+        if key_layer > prev_layer:
+            for node, parent in zip(stack[:key_layer], stack[1:]):
+                if node.is_empty():
+                    continue  # skip possible empty bottom-most nodes
+                parent.link_subtree(compute_cid(node.to_cbor()))
+                node.reset_to_empty()
+
+        # add a node entry for the current record
+        stack[key_layer].link_record(key, compute_cid(record_cbor))
+
+        prev_layer = key_layer
+
+    # finalize remaining stack
+    for node, parent in zip(stack[:-1], stack[1:]):
+        if node.is_empty():
+            continue
+        parent.link_subtree(compute_cid(node.to_cbor()))
+        node.reset_to_empty()
+
+    # get the finished root node, finally.
+    if len(stack) > 0:
+        root = stack[-1]
+    else:
+        root = MstNode()  # empty repo: atproto CAR writes one single empty node
+
+    return compute_cid(root.to_cbor())
+```
 
 
 ### Conversion to CAR
@@ -142,7 +220,7 @@
 #### pseudo-code
 
 ```python
-## WIP!
+# wip!
 
 def to_stream_ordered_car(key_record_pairs):
     stack = []
@@ -205,63 +283,6 @@
         output.extend(byte_log[run.what:run.whattt])
 
     return node_cid, output
-```
-
-
-### Archive verification
-
-Verification requires MST reconstruction just like CAR conversion, but never requires temporary disk storage. Each record must be hashed to compute its CID, but its byte contents can be immediately discarded.
-
-Layer-0 MST nodes are materialized with computed record CIDs, then encoded, then hashed, to produce node CIDs. The encoded node bytes (and referenced record CIDs) are discarded, since we only need the node CID to help materialize a MST node.
-
-The final output is the root MST node's CID, which verifies the entire archive if it matches the `data` field from the commit object.
-
-Verification asserts the integrity of the repository contents: verifying the signature of the archive's [commit object][commit] (if present) is a separate process, outside the scope of STAR. See atproto [commit signatures][commit-sigs]
-
-
-```python
-## WIP!!
-
-def verify(key_record_pairs, expected_root_cid):
-    stack: list[MstNode] = []
-    prev_layer = -1
-
-    for (key, record) in key_record_pairs:
-        record_cid = compute_cid(record)
-        key_layer = layer_of(key)
-
-        # grow the stack if needed, init with empty nodes
-        while len(stack) <= key_layer:
-            stack.append(MstNode())
-
-        # when `key` is at a higher layer than last, freeze all layers below
-        if key_layer > prev_layer:
-            for i in range(key_layer):
-                if stack[i].is_empty():
-                    continue
-                (node_cid, _) = encode_mst_node(stack[i])
-                stack[i + 1].attach_subtree(node_cid)
-                stack[i] = MstNode()  # empty it
-
-        # every key-record pair must insert to a node `entry`
-        stack[key_layer].entries.append(Leaf(key, record_cid, car_run=None))
-
-        prev_layer = key_layer
-
-    # Fold remaining stack bottom-up.
-    node_cid = None
-    for node in stack:
-        if node_cid is not None:
-            node.attach_subtree(node_cid)
-            node_cid = None
-        if not node.is_empty():
-            (node_cid, _) = encode_mst_node(node)
-
-    # Empty repo: canonical empty MST node CID.
-    if node_cid is None:
-        (node_cid, _) = encode_mst_node(MstNode())
-
-    return node_cid == expected_root_cid
 ```
 
 
````
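The data-section rules in this readme change (varint length prefixes, strict byte-order keys with no duplicates, hard size limits, and any parse error being fatal) can be sketched as a streaming reader. This is a hedged sketch, not the STAR reference parser. The field table is elided from this diff, so the per-entry layout used here (varint key length, key bytes, varint record length, record bytes) is an assumption. Rejecting non-minimal varints is also an assumption, borrowed from the multiformats unsigned-varint rules. `write_varint`, `read_varint`, `iter_records` and `encode_entry` are hypothetical names.

```python
import io
from typing import BinaryIO, Iterator, Optional, Tuple

# Limits quoted in the readme: NSID (317) + "/" (1) + rkey (512), and 1 MiB records.
MAX_KEY_LEN = 317 + 1 + 512
MAX_RECORD_LEN = 1024 * 1024


def write_varint(n: int) -> bytes:
    """Unsigned LEB128-style varint: 7 payload bits per byte, least significant first."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
        n >>= 7
    out.append(n)
    return bytes(out)


def read_varint(stream: BinaryIO) -> Optional[int]:
    """Decode one varint, or return None on a clean EOF before its first byte."""
    result, shift = 0, 0
    while True:
        b = stream.read(1)
        if not b:
            if shift == 0:
                return None
            raise ValueError("truncated varint")
        byte = b[0]
        if byte == 0 and shift > 0:
            # assumption: minimal encoding required, as in multiformats unsigned-varint
            raise ValueError("non-minimal varint")
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result
        shift += 7
        if shift >= 63:  # a 10th byte would be needed
            raise ValueError("varint too long")


def read_exact(stream: BinaryIO, n: int) -> bytes:
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("truncated archive")
    return data


def iter_records(stream: BinaryIO) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (key, record) pairs; any violation aborts the whole archive.

    ASSUMED entry layout: varint key length, key, varint record length, record.
    """
    prev_key = None
    while True:
        key_len = read_varint(stream)
        if key_len is None:
            return  # EOF exactly on an entry boundary: archive complete
        if key_len > MAX_KEY_LEN:
            raise ValueError("key too long")
        key = read_exact(stream, key_len)
        if prev_key is not None and key <= prev_key:
            raise ValueError("keys must be strictly increasing (no duplicates)")
        record_len = read_varint(stream)
        if record_len is None:
            raise ValueError("truncated archive")
        if record_len > MAX_RECORD_LEN:
            raise ValueError("record too large")
        yield key, read_exact(stream, record_len)  # record bytes stay opaque
        prev_key = key


def encode_entry(key: bytes, record: bytes) -> bytes:
    """Inverse of one iter_records step, under the same assumed layout."""
    return write_varint(len(key)) + key + write_varint(len(record)) + record
```

Because errors are raised from inside the generator, a consumer will already have seen every entry before the bad one, so a verifier should only accept the archive once iteration finishes without an exception.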
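The `reconstruct_root_cid` pseudo-code leaves `compute_mst_layer` and `compute_cid` undefined. Below is a minimal sketch of both, assuming the usual atproto conventions. A key's MST layer is the number of leading zero bits in the SHA-256 digest of the key, divided by two and rounded down (fanout 4). Record and node CIDs are binary CIDv1 with the dag-cbor codec (0x71) and a SHA-256 multihash. The helpers `layer_from_digest` and `cid_to_string` are hypothetical, and it's worth double-checking against the atproto repository and data-model specs before relying on them.

```python
import base64
import hashlib


def layer_from_digest(digest: bytes) -> int:
    """Leading zero bits of the digest, halved: each MST layer is 2 bits (fanout 4)."""
    zeros = 0
    for byte in digest:
        if byte == 0:
            zeros += 8
            continue
        zeros += 8 - byte.bit_length()  # leading zeros inside the first nonzero byte
        break
    return zeros // 2


def compute_mst_layer(key: bytes) -> int:
    return layer_from_digest(hashlib.sha256(key).digest())


def compute_cid(block: bytes) -> bytes:
    """Binary CIDv1: version 0x01, dag-cbor codec 0x71, sha2-256 (0x12) with 32-byte length (0x20)."""
    return bytes([0x01, 0x71, 0x12, 0x20]) + hashlib.sha256(block).digest()


def cid_to_string(cid: bytes) -> str:
    """Multibase base32 text form: 'b' prefix, lowercase, no padding."""
    return "b" + base64.b32encode(cid).decode("ascii").lower().rstrip("=")
```

With these fixed prefix bytes, every string produced by `cid_to_string` starts with `bafyrei`. That gives a quick sanity check when comparing against CIDs seen in real atproto repositories.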
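The node-stack walk in the readme should produce the same tree as a straightforward recursive, top-down construction. The following toy version of the `reconstruct_root_cid` loop lets you check that claim: it returns tree *shape* instead of a CID. Nodes are nested tuples of `("leaf", key)` and `("tree", child)` items, standing in for atproto MST node encoding and hashing. The layer function is passed in so that skipped layers (for example a layer-0 key followed by a layer-3 key) are easy to exercise. `stream_build` and `recursive_build` are hypothetical illustration names, not part of STAR.

```python
from typing import Callable, Sequence

# Toy stand-in for an MST node: a tuple of ("leaf", key) and ("tree", child_node)
# items in key order. Real code would encode the node and hash it to a CID instead.
Node = tuple


def stream_build(keys: Sequence[bytes], layer_of: Callable[[bytes], int]) -> Node:
    """One pass over sorted keys, keeping one open node per layer."""
    stack: list = []  # stack[i] is the open node at layer i
    prev_layer = -1
    for key in keys:
        key_layer = layer_of(key)
        while len(stack) <= key_layer:
            stack.append([])
        if key_layer > prev_layer:
            # every open node below this key's layer is now complete: freeze bottom-up
            for i in range(key_layer):
                if stack[i]:
                    stack[i + 1].append(("tree", tuple(stack[i])))
                    stack[i] = []
        stack[key_layer].append(("leaf", key))
        prev_layer = key_layer
    # roll whatever is still open up into the top (root) node
    for i in range(len(stack) - 1):
        if stack[i]:
            stack[i + 1].append(("tree", tuple(stack[i])))
            stack[i] = []
    return tuple(stack[-1]) if stack else ()


def recursive_build(keys: Sequence[bytes], layer_of: Callable[[bytes], int], layer: int) -> Node:
    """Reference: keys at `layer` become entries; runs of lower keys become subtrees."""
    node, run = [], []
    for key in keys:
        if layer_of(key) == layer:
            if run:
                node.append(("tree", recursive_build(run, layer_of, layer - 1)))
                run = []
            node.append(("leaf", key))
        else:
            run.append(key)
    if run:
        node.append(("tree", recursive_build(run, layer_of, layer - 1)))
    return tuple(node)
```

Both versions keep intermediate nodes that hold only a subtree link when layers are skipped. The streaming version never holds more than one open node per layer, which is why verification memory is bounded by tree height rather than repository size.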