STreaming ARchives: stricter, verifiable, deterministic, highly compressible alternatives to CAR files for atproto repositories.

rough star-lite outline

phil 5c1077b9 3c6ebc8a

star-lite/readme.md
# STAR-lite

**ST**reaming **AR**chive repository format (extra light version)

A stricter, simpler, still verifiable, more compressible alternative to [CAR](https://ipld.io/specs/transport/car/carv1/#format-description).

STAR-lite specifies both a binary encoding and a memory-bounded algorithm for converting any sequence of in-order key/record pairs into a stream-ordered CAR file.

Because this conversion is cheap, STAR-lite is suitable as a network transport format or for long-term archiving and backup, without sacrificing interoperability.


### compared to CARs:

- All MST node blocks and all MST CIDs are omitted, eliminating the least-compressible content.
- Strict content ordering, deterministic encoding.
- Bounded-memory conversion back to stream-ordered CAR.

STAR-lite files shine when zstd-compressed.


### compared to STAR-L0 and STAR-L1:

- Smallest archive format (with zstd compression)
- Content verification requires a complete scan of all content
- No support for sparse archives or "CAR slice"-like proofs (yet)
- Disk spilling required for memory-bounded streaming of large archives


## format

STAR-lite is just a flat list of every key/record pair in the repository, in lexicographic key order, with a commit object in its header.

```
|------ header ------| |------------------ data (records) -------------------|
[ magic | len | cbor ] [ len | str | len | cbor ] [ len | str | len | cbor ] …
```

| name  | type                                     |
| ----- | ---------------------------------------- |
| magic | three-byte marker identifying the format |
| len   | unsigned varint                          |
| str   | utf-8 bytes                              |
| cbor  | cbor bytes                               |


### magic

Three bytes: `0x2A 0x6C 0x00`, ASCII for `*l\0`: "star", "**l**ite", version 0.
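
To make the layout concrete, here is a minimal Python sketch of an encoder. It assumes the commit and every record are already encoded as DRISL CBOR bytes, so no CBOR library is involved. It also assumes the entries already arrive in key order. `encode_uvarint` and `encode_star_lite` are hypothetical helper names, not part of any published library. The `len` fields use the unsigned varint encoding described under **varints**.

```python
MAGIC = b"\x2a\x6c\x00"  # "*l\0": star, lite, version 0


def encode_uvarint(n: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, low group first, MSB = 'more bytes follow'."""
    if n < 0:
        raise ValueError("unsigned varints cannot encode negative numbers")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # continuation bit set: more groups follow
        else:
            out.append(group)  # last group: continuation bit clear
            return bytes(out)


def encode_star_lite(commit_cbor: bytes, entries) -> bytes:
    """Frame a pre-encoded commit and (key, record_cbor) pairs as STAR-lite.

    Assumes `entries` is already in strict lexicographic key order and that
    all CBOR is already DRISL-encoded; this sketch does not re-check either.
    """
    out = bytearray(MAGIC)
    out += encode_uvarint(len(commit_cbor)) + commit_cbor  # header
    for key, record_cbor in entries:
        key_bytes = key.encode("utf-8")
        out += encode_uvarint(len(key_bytes)) + key_bytes
        out += encode_uvarint(len(record_cbor)) + record_cbor
    return bytes(out)
```

For example, `encode_uvarint(300)` yields `b"\xac\x02"`. The first byte carries the low seven bits (`0x2C`) with the continuation bit set. The second byte carries the remaining `0x02`.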


### header len + cbor: Commit

A length-prefixed CBOR blob containing an atproto signed Commit object, encoded exactly as described in the atproto repository spec. Readers may ignore the Commit, but archive content verification must parse at least its `data` field.

TODO: specify maximum commit size

TODO: like the other STAR formats, we should actually define a slightly modified commit object, specifically with a nullable data CID for empty repos. Otherwise, we should include the magic CID of an empty atproto MST node's hash, as you get with CARs.


### data: keys and records

Zero or more records until EOF. Each is:

| field       | type                                    |
| ----------- | --------------------------------------- |
| key len     | varint (TODO: min and max)              |
| key str     | utf-8 bytes, exactly `key len` long     |
| record len  | varint, max: 1,048,576 (1 MiB)          |
| record cbor | cbor bytes, exactly `record len` long   |


### varints

Unsigned LEB128 / multiformats unsigned-varint: 7 bits per byte, MSB is the continuation flag, little-endian byte order.

TODO: say we defer to that spec -- which one specifically? (match to CAR's)

TODO: do we need to resolve any ambiguities from the spec? e.g., that encoders must use the minimum number of bytes (no redundant trailing zero groups, such as `0x81 0x00` instead of `0x01`)? (we want to be deterministic). Also any security notes, like the maximum number of bytes to read before bailing on a varint?


### rules

- keys must be in strict lexicographic byte order.
- duplicate keys are not allowed.
- keys must be valid atproto repo paths: the format specifies utf-8, but in practice the required `<collection>/<rkey>` repo path format currently restricts characters to a small subset of ASCII.
- records must be encoded as [DRISL](https://dasl.ing/drisl.html), the deterministic subset of CBOR used by atproto.
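
A reader can enforce the framing and ordering rules while streaming. Below is a minimal Python sketch that rests on several assumptions:

- It caps varints at 9 bytes and rejects non-minimal encodings. This is one possible answer to the open varint TODO, not settled spec.
- It approximates the repo-path rule by requiring exactly one `/` in each key.
- It does not check that records are valid DRISL.

All function names are hypothetical.

```python
import io

MAGIC = b"\x2a\x6c\x00"
MAX_RECORD_LEN = 1_048_576  # 1 MiB, from the record len row above
MAX_VARINT_BYTES = 9  # assumption: 63-bit cap, as in multiformats unsigned-varint


def read_uvarint(stream):
    """Read one minimally encoded unsigned varint; return None on clean EOF."""
    value = 0
    for i in range(MAX_VARINT_BYTES):
        chunk = stream.read(1)
        if not chunk:
            if i == 0:
                return None  # EOF exactly at a varint boundary
            raise ValueError("truncated varint")
        byte = chunk[0]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            if byte == 0 and i > 0:
                raise ValueError("non-minimal varint (redundant zero group)")
            return value
    raise ValueError("varint too long")


def read_exact(stream, n):
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("unexpected EOF")
    return data


def read_header(stream) -> bytes:
    """Check the magic and return the raw commit CBOR (not parsed here)."""
    if stream.read(3) != MAGIC:
        raise ValueError("not a STAR-lite v0 archive")
    length = read_uvarint(stream)
    if length is None:
        raise ValueError("missing commit")
    return read_exact(stream, length)


def iter_entries(stream):
    """Yield (key, record_cbor) pairs, enforcing order and size rules."""
    prev = None
    while (key_len := read_uvarint(stream)) is not None:
        key = read_exact(stream, key_len)
        # bytes comparison is byte-wise lexicographic; <= also rejects duplicates
        if prev is not None and key <= prev:
            raise ValueError("keys out of order or duplicated")
        text = key.decode("utf-8")  # UnicodeDecodeError is a ValueError
        if text.count("/") != 1:  # simplified stand-in for full repo-path validation
            raise ValueError("key is not <collection>/<rkey>")
        record_len = read_uvarint(stream)
        if record_len is None:
            raise ValueError("missing record")
        if record_len > MAX_RECORD_LEN:
            raise ValueError("record too large")
        prev = key
        yield text, read_exact(stream, record_len)


if __name__ == "__main__":
    demo = io.BytesIO(b"*l\x00\x01\xa0\x03a/b\x01\xa0")
    print(read_header(demo), list(iter_entries(demo)))
```

The sketch compares keys as raw `bytes`. That yields exactly the byte-wise lexicographic order the rules require, and the same `<=` check also rejects duplicates.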


## efficient MST recovery

*for archive verification and conversion to CAR*

The simple way to verify an archive is to insert each `(key, record)` pair into an atproto MST builder library to reconstruct the full MST. Then, assert that the MST's root `CID` matches the CID in the Commit's `data` field.

For large repositories, building the MST this way may require significant memory or significant storage I/O. STAR-lite includes a bounded-memory, efficient disk-spilling algorithm to recover the MST for verification or conversion to other atproto formats.


### archive verification

Verification uses the same MST recovery technique as CAR conversion (below), but evicts subtrees by simply dropping them rather than spilling them to disk. Only the root MST node's CID is required for verification, and a dropped subtree only needs to be represented in its parent by its own root CID.


### conversion to CAR

Stream-ordered CARs (in "preorder traversal" block order) are a depth-first walk over the Merkle Search Tree, and keys encountered during a depth-first MST walk are in strict lexicographic order.

There is a useful symmetry here:

- every subtree of an MST occupies a contiguous region of the stream-ordered CAR
- every subtree of an MST spans a contiguous range of lexicographically ordered keys

So any subtree-spanning range of keys (and records) can be materialized directly into its stream-ordered sequence of CAR blocks, independently of the rest of the archive.

MST subtrees can't be *emitted* until the entire MST has been reconstructed. Stream ordering requires the MST root node to come before every other MST block, and the root is the very last node we can serialize.

What we can do is write serialized segments of the final CAR to disk temporarily as the MST is reconstructed, to stay within a strict memory budget.
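
The key-range symmetry is what lets subtrees be finalized early. In the atproto MST, a key's layer comes from the leading zero bits of its SHA-256 hash, counted in 2-bit chunks. Consider a key and the previous key at an equal or higher layer: the keys strictly between them form exactly the subtree hanging to the left of the later key. When keys arrive in sorted order, that subtree is therefore complete the moment the later key is read.

The minimal Python sketch below reports each subtree at the moment it freezes. The helper names are hypothetical, and the sketch only tracks index ranges; it builds no nodes or CIDs.

```python
import hashlib


def mst_layer(key: bytes) -> int:
    """atproto MST layer: leading zero bits of SHA-256(key), in 2-bit chunks."""
    zeros = 0
    for byte in hashlib.sha256(key).digest():
        if byte:
            zeros += 8 - byte.bit_length()  # leading zeros within this byte
            break
        zeros += 8
    return zeros // 2


def frozen_subtrees(layers):
    """Return (start, end) ranges, in the order they become final, such that
    keys[start:end] is the complete subtree hanging to the left of key `end`.

    `layers[i]` is the MST layer of the i-th key in sorted order. A range
    becomes final as soon as key `end` is read; no later key can join it.
    """
    ranges = []
    stack = []  # indices of earlier keys; layers non-increasing bottom to top
    for j, layer in enumerate(layers):
        # earlier keys on lower layers sit inside j's left subtree,
        # so they can never bound a subtree that ends after j
        while stack and layers[stack[-1]] < layer:
            stack.pop()
        # previous key on an equal or higher layer (or the start of the archive)
        start = stack[-1] + 1 if stack else 0
        if start < j:
            ranges.append((start, j))
        stack.append(j)
    return ranges


if __name__ == "__main__":
    keys = sorted(f"app.bsky.feed.post/{i:04d}" for i in range(20))
    layers = [mst_layer(k.encode()) for k in keys]
    print(layers, frozen_subtrees(layers))
```

For example, `frozen_subtrees([0, 0, 1, 0, 2, 0])` returns `[(0, 2), (0, 4)]`:

- The first two keys freeze when the layer-1 key arrives.
- That range, plus the layer-1 key and its right neighbour, freezes when the layer-2 key arrives.

Frozen ranges nest, so an evicting builder could choose to spill the largest one available. Ranges still open at EOF are finalized at the end.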

Streaming out the final stream-ordered CAR can then use `copy_file_range` (or an equivalent) to splice the spilled segments in at the right places.


### algorithm

```
read and check magic
read commit (length-prefixed cbor)
init mst

// TODO: fix this up to eagerly serialize subtrees

for (key, record) in star_lite_entries:
    mst.insert(key, record)
    if mst.memory_usage() > limit:
        // find the leftmost subtree whose rightmost key < `key` (its structure is now frozen)
        subtree := mst.evict_leftmost_finalized_subtree()
        root_cid := subtree.root_cid()
        segment_path := temp_dir.create_segment(root_cid)
        subtree.write_blocks_in_car_order(segment_path)
        // replace the in-memory subtree pointer with a marker
        mst.replace_with_marker(root_cid, segment_path)

// EOF
root_cid := mst.finalize()
if root_cid != commit.data:
    fail("archive does not match its commit")

// stream out
init car := AtprotoCar(commit)
for block_or_marker in mst.depth_first_walk():
    match block_or_marker:
        Block(cid, bytes) => car.write_block(cid, bytes)
        Marker(segment) => car.splice_file(segment.path)
```

Memory is bounded because there is a practical (low) limit to MST height.



### conversion from CAR

TODO: but basically: use repo-stream for a bounded-memory MST walk if you can't be certain that the CAR is stream-ordered. If it *is* stream-ordered, any streaming walker will work, and it's pretty simple to write out.

Note: it is **not possible** to know whether an atproto CAR is stream-ordered except by either knowing that it was encoded that way in advance, or by reading the **entire** archive first to verify.
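
Verifying stream order after the fact amounts to replaying a preorder walk and comparing it against the file order, which is why the whole archive must be read first. The minimal Python sketch below makes several assumptions:

- It ignores the commit block.
- Each MST or record block's `links` lists only the children the walk follows, already in visit order. Real MST nodes would have to be decoded to obtain these.
- A block reachable twice, such as identical records, is stored once, at its first visit.

The function name is hypothetical.

```python
def is_stream_ordered(root, blocks) -> bool:
    """True if `blocks` (a list of (cid, links) in file order) is exactly the
    depth-first preorder walk starting from `root`.

    Assumptions (hypothetical model, not a real CAR parser): `links` holds only
    the children the MST walk follows, already in visit order, and a block that
    is reachable twice is stored once, at its first visit.
    """
    children = dict(blocks)
    expected, seen, stack = [], set(), [root]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue  # duplicate reference: already emitted at its first visit
        seen.add(cid)
        expected.append(cid)
        # push children reversed so the first link is visited first
        stack.extend(reversed(children.get(cid, [])))
    # a missing, extra, or misplaced block makes the sequences differ
    return [cid for cid, _ in blocks] == expected
```

The answer depends on every block in the file. For an untrusted CAR, it is usually simpler to use a walker that does not rely on ordering at all.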