[ magic | cid | len | cbor ] [ len | str | len | cbor ] [ len | str | len | cbor ] …
```

| name  | description                             |
| ----- | --------------------------------------- |
| magic | three-byte mark to identify the format  |
| cid   | atproto-format binary CID link          |
| len   | unsigned varint                         |
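
The layout and field table above can be read with little code. The following is a minimal sketch. It assumes `len` is a multiformats unsigned varint (LEB128) and `cid` is a raw binary CIDv1 with no multibase prefix. The helper names are hypothetical. The actual magic bytes are not shown in this section, so the magic is returned unchecked. Any commit bytes sit between the header `len` and the first key/record pair.

```python
from typing import Iterator, Tuple


def read_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint at buf[pos]; return (value, next_pos)."""
    value = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: this was the last byte
            return value, pos
        shift += 7
        if shift >= 63:  # multiformats varints are at most 9 bytes
            raise ValueError("varint too long")


def read_cid(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """ASSUMPTION: raw binary CIDv1 = version, codec, hash code, digest length, digest."""
    start = pos
    version, pos = read_uvarint(buf, pos)
    if version != 1:
        raise ValueError("expected a CIDv1")
    _codec, pos = read_uvarint(buf, pos)
    _hash_code, pos = read_uvarint(buf, pos)
    digest_len, pos = read_uvarint(buf, pos)
    end = pos + digest_len
    if end > len(buf):
        raise ValueError("truncated CID")
    return buf[start:end], end


def read_header_prefix(buf: bytes) -> Tuple[bytes, bytes, int, int]:
    """Return (magic, cid, commit_len, offset of the commit bytes)."""
    if len(buf) < 3:
        raise ValueError("truncated magic")
    magic = buf[:3]  # expected value not given here; a real reader should check it
    cid, pos = read_cid(buf, 3)
    commit_len, pos = read_uvarint(buf, pos)
    return magic, cid, commit_len, pos


def iter_pairs(buf: bytes, pos: int) -> Iterator[Tuple[str, bytes]]:
    """Yield (key, record CBOR bytes) for each [ len | str | len | cbor ] entry."""
    prev = None
    while pos < len(buf):
        key_len, pos = read_uvarint(buf, pos)
        key = buf[pos:pos + key_len]
        pos += key_len
        rec_len, pos = read_uvarint(buf, pos)
        record = buf[pos:pos + rec_len]
        pos += rec_len
        if len(key) != key_len or len(record) != rec_len:
            raise ValueError("truncated entry")
        if prev is not None and key <= prev:
            raise ValueError("keys must be strictly increasing")
        prev = key
        yield key.decode("utf-8"), record
```

`iter_pairs` also rejects out-of-order keys, since the streaming MST walk depends on lexicographic ordering.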

### Header len + cbor: optional partial commit object

When `len == 0`, the commit object is omitted. The header CID (above) still proves the integrity of the repository contents, but the repository's identity (DID), its cryptographic signature, and other commit metadata are not included. One possible use case is archiving subtrees of a repository. Note that this is not the same as a "CAR slice", which can prove a subset of a repository all the way up to its root.

When `0 < len <= 4096`, a partial commit object of exactly `len` bytes follows, in CBOR format. The partial commit has the same fields as an [atproto Commit Object][commit], except that the `data` field must be omitted.

To verify the commit signature, use the header CID (above) as the `data` field to compute the commit's signed CID.

When `len > 4096`, a parser should reject the commit object as implausibly large. (TODO: we can probably set an exact limit. The maximum DID length in atproto is 2048, `rev` must be in TID format, etc.)
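
The header `len` rules above can be sketched in code, together with the step of restoring `data` before signature verification. This sketch assumes the partial commit has already been decoded from CBOR into a plain dict. CBOR decoding, CID-link typing, and the signature check itself are left out. `read_commit_section` and `restore_commit` are hypothetical names.

```python
from typing import Any, Dict, Optional, Tuple

MAX_COMMIT_LEN = 4096  # limit from the text above; may be tightened (see TODO)


def read_commit_section(buf: bytes, pos: int, commit_len: int) -> Tuple[Optional[bytes], int]:
    """Apply the header `len` rules; return (partial commit CBOR or None, next offset)."""
    if commit_len == 0:
        return None, pos  # commit omitted: integrity still checkable, no signature
    if commit_len > MAX_COMMIT_LEN:
        raise ValueError(f"implausibly large commit object: {commit_len} bytes")
    end = pos + commit_len
    if end > len(buf):
        raise ValueError("truncated commit object")
    return buf[pos:end], end  # CBOR decoding is left to the caller


def restore_commit(partial: Dict[str, Any], header_cid: Any) -> Dict[str, Any]:
    """Re-insert the header CID as `data` to rebuild the full commit for signature checks.

    ASSUMPTION: `partial` is the already-decoded CBOR map; `header_cid` stands in
    for a CID-link value in whatever form the caller's CBOR library uses.
    """
    if "data" in partial:
        raise ValueError("partial commit must omit the `data` field")
    return {**partial, "data": header_cid}
```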

### Data: keys and records

While any atproto MST library can reconstruct the full repository MST by inserting each `(key, record)` pair, materializing the entire MST at once incurs significant memory or I/O overhead.

We exploit the lexicographic key ordering of STAR-lite files (or any stream of lex-ordered key-record pairs) to **walk a fully reconstructed MST without holding the entire tree in memory**. Only a narrow *stack* of MST nodes (one per layer) needs to be buffered.

This enables efficient transformations, such as verifying repository integrity or converting to other formats like stream-ordered atproto CARv1.
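
The stack idea can be illustrated with a minimal sketch that compares a one-pass builder against a recursive builder that materializes the whole key set. The assumptions are significant and should be read carefully:

- A key's layer follows the atproto rule: leading zero bits of SHA-256 over the key, divided by two.
- `node_hash` is a placeholder that hashes a Python `repr`. It is not the real DAG-CBOR node encoding with key prefix compression.
- Every non-empty gap below a key becomes a subtree exactly one layer down, including entry-less intermediate nodes. Check the atproto repository spec for the exact rules.

`stream_root` and `recursive_root` are hypothetical names.

```python
import hashlib
from typing import Iterable, List, Optional, Tuple

Pair = Tuple[Optional[str], str]  # (hash of the subtree left of the key or None, key)


def key_layer(key: str) -> int:
    """atproto MST layer: leading zero bits of SHA-256(key), halved (fanout 4)."""
    n = int.from_bytes(hashlib.sha256(key.encode()).digest(), "big")
    return (256 - n.bit_length()) // 2


def node_hash(layer: int, pairs: List[Pair], right: Optional[str]) -> str:
    """PLACEHOLDER node hash; NOT the real DAG-CBOR MST node encoding."""
    return hashlib.sha256(repr((layer, pairs, right)).encode()).hexdigest()


def stream_root(sorted_keys: Iterable[str]) -> str:
    """Root hash from lex-sorted keys, holding one open node per layer."""
    stack: List[List[Pair]] = [[]]  # stack[h]: entries of the open node at layer h

    def close_below(layer: int) -> Optional[str]:
        # A key at `layer` ends every open node beneath it: hash them bottom-up
        # and return the subtree that becomes the key's left pointer.
        sub: Optional[str] = None
        for h in range(layer):
            if stack[h] or sub is not None:  # gaps holding no keys stay None
                sub = node_hash(h, stack[h], sub)
            stack[h] = []  # finalized: only its hash survives
        return sub

    for key in sorted_keys:
        layer = key_layer(key)
        while len(stack) <= layer:  # first key this high: add layers on top
            stack.append([])
        left = close_below(layer)
        stack[layer].append((left, key))
    top = len(stack) - 1
    right = close_below(top)
    return node_hash(top, stack[top], right)


def recursive_root(sorted_keys: List[str]) -> str:
    """Reference builder that materializes the whole key set (what streaming avoids)."""
    def build(h: int, keys: List[str]) -> str:
        pairs: List[Pair] = []
        gap: List[str] = []  # keys below layer h since the last layer-h key
        for k in keys:
            if key_layer(k) == h:
                pairs.append((subtree(h, gap), k))
                gap = []
            else:
                gap.append(k)
        return node_hash(h, pairs, subtree(h, gap))

    def subtree(h: int, gap: List[str]) -> Optional[str]:
        return build(h - 1, gap) if gap else None

    top = max((key_layer(k) for k in sorted_keys), default=0)
    return build(top, sorted_keys)
```

Both functions should agree on the root for the same sorted keys. Only `stream_root` works in a single pass while buffering one open node per layer. A real converter would replace `node_hash` with the atproto node encoding followed by a CID computation.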

### Archive verification

Each record must be hashed to compute its CID, but its bytes can be discarded immediately afterward. Likewise, each MST node can be discarded once it has been encoded and hashed into a CID.

The final output is the root MST node's CID. It must match the `cid` field from the header; otherwise the archive is corrupt.
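
The two hashing steps described above, per-record CIDs and the final root comparison, can be sketched as follows. The sketch assumes, following atproto convention, that records and MST nodes are addressed by CIDv1 with the dag-cbor codec and SHA-256, and that each stored record is exactly its DAG-CBOR bytes. Building the root from node CIDs is not repeated here. All names are hypothetical.

```python
import base64
import hashlib
from typing import Iterable, Iterator

# ASSUMPTION: CIDv1 (0x01), dag-cbor codec (0x71), sha2-256 (0x12), 32-byte digest (0x20)
CIDV1_DAG_CBOR_SHA256 = bytes([0x01, 0x71, 0x12, 0x20])


def dag_cbor_cid(block: bytes) -> bytes:
    """Binary CID of a DAG-CBOR block (record or MST node)."""
    return CIDV1_DAG_CBOR_SHA256 + hashlib.sha256(block).digest()


def cid_to_str(cid: bytes) -> str:
    """Multibase base32 string form: 'b' prefix, lowercase, no padding."""
    return "b" + base64.b32encode(cid).decode("ascii").lower().rstrip("=")


def record_cids(records: Iterable[bytes]) -> Iterator[bytes]:
    """Hash each record to its CID; the record bytes are not retained."""
    for record in records:
        yield dag_cbor_cid(record)


def check_root(computed_root_cid: bytes, header_cid: bytes) -> None:
    """The rebuilt MST root must equal the header `cid`, or the archive is corrupt."""
    if computed_root_cid != header_cid:
        raise ValueError(
            f"corrupt archive: root {cid_to_str(computed_root_cid)} "
            f"!= header {cid_to_str(header_cid)}"
        )
```

With these constants every CID's string form begins with `bafyrei`, the familiar prefix of atproto record CIDs.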

_Note: as mentioned above, this is only an integrity check. **Authenticity** requires verifying the signature in the [commit object][commit], which means resolving a DID to find a public key. This is the [same process as with CAR files][commit-sigs] and is outside the scope of STAR-lite._

#### Pseudo-code