···85858686- `len` (varint): The length of the following atproto commit object in bytes.
87878888-- `commit`: An atproto commit object in `DAG-CBOR` derived from the [repo spec](https://www.ietf.org/archive/id/draft-holmgren-at-repository-00.html#name-commit-objects):
8888+- `commit` (DAG-CBOR): An atproto commit object in `DAG-CBOR` derived from the [repo spec](https://www.ietf.org/archive/id/draft-holmgren-at-repository-00.html#name-commit-objects):
89899090 - `did` (string, nullable): same as repo spec
9191 - `version` (integer, required): corresponding CAR repo format version, currently fixed value of `3`
···109109- `node`: TODO: we need a new node format. It must be convertible back to a repo-spec style node.
110110111111- `record`: The atproto record. Its CID can be computed over the bytes of its `block` (see below).
112112+113113+### node / base
114114+115115+```
116116+|----- node -----|
117117+[ len | mst node ]
118118+```
119119+120120+- `len` (varint): the length of the following CBOR block, in bytes.
121121+122122+- `mst node` (DAG-CBOR): object with the following schema
123123+ - `l` (hash link, nullable)
124124+125125+note1: it's a bit tempting to redesign the MST nodes, because the _reason_ (and lack of special-ness) for `l` being separate from the entries in `e` took a long time for me to understand. but the existing format definitely works so maybe sticking close to it is the move?
126126+127127+note2: a magic special zero hash-link is a pretty gross way to shoehorn in a sentinel! null was already taken because subtrees are always optional
128128+129129+(this section is very much in flux)
130130+131131+was thinking of making base (depth=0) nodes special (implicit cid) and then further simplifying to a simple array of entries since they can't have subtrees (`l` or `t`s).
132132+133133+buuuutttt it's probably simpler just to give the node a nullable `cid` property that's required when depth=0.
134134+135135+on the other track, i was thinking nodes could be rewritten as a pair of arrays
136136+137137+```
138138+index: [ 0 , 1 , 2 , 3 ]
139139+140140+new
141141+entries: [ (keyA, linkA) , (keyB, linkB) , (keyC, linkC) ] xxxxxxxxxxxxxxx
142142+trees: [ * tree before A , * tree before B , <null> , *tree after C ]
143143+144144+vs old repo spec
145145+mst node:[ tree in `l` , keyA's `t` , keyB's null `t`, keyC's `t` ]
146146+```
147147+148148+i think most languages can handle a pair of arrays ok with zip? but the equal-or-one-shorter length of `entries` compared to `trees` seems like asking for bugs.
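
to make the length concern concrete, here is a sketch of the pair-of-arrays idea. `PairNode` and `Slot` are hypothetical names, links are plain strings as stand-ins for hash links, and it assumes `trees` is always exactly one longer than `entries`. a bare `zip` stops at the shorter array, so it would silently drop the trailing tree; the explicit check is there to catch that.

```rust
/// Hypothetical pair-of-arrays node: `trees[i]` sorts before `entries[i]`,
/// and the last tree sorts after the last entry.
pub struct PairNode {
    pub entries: Vec<(String, String)>, // (key, record link) stand-ins
    pub trees: Vec<Option<String>>,     // subtree links, None where there is no subtree
}

/// One item in key order: either a subtree link or a (key, record link) entry.
#[derive(Debug, PartialEq)]
pub enum Slot<'a> {
    Tree(&'a str),
    Entry(&'a str, &'a str),
}

impl PairNode {
    /// Walk the node in key order.
    /// Returns None if `trees` is not exactly one longer than `entries`.
    pub fn in_order(&self) -> Option<Vec<Slot<'_>>> {
        if self.trees.len() != self.entries.len() + 1 {
            return None;
        }
        let mut out = Vec::new();
        for (i, tree) in self.trees.iter().enumerate() {
            if let Some(link) = tree {
                out.push(Slot::Tree(link.as_str()));
            }
            // The final tree has no entry after it, so `get` returns None there.
            if let Some((key, value)) = self.entries.get(i) {
                out.push(Slot::Entry(key.as_str(), value.as_str()));
            }
        }
        Some(out)
    }
}
```

every reader would need the same length check, which is roughly the "asking for bugs" worry above.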
149149+150150+so let's keep it simple (similar to the repo spec), trying again:
151151+152152+153153+```
154154+|----- node -----|
155155+[ len | mst node ]
156156+```
157157+158158+- `len` (varint): the length of the following CBOR block, in bytes.
159159+160160+- `mst node` (DAG-CBOR): object with the following schema
161161+ - `cid` (hash link, nullable): the CID of this MST node. must be `null` for nodes at `depth=0`; required to be non-null for nodes at any higher `depth`.
162162+ - `l` (hash link, nullable): reference to a subtree at a lower depth containing only keys to the left of this node. when the referenced node is included in the archive, it must be given a special zeroed-out link reference (all zero bytes (deal with hash link prefixes or whatever... probably can assume sha256 but careful for lossless reversibility back to CAR))
163163+ - `e` (array, required): ordered array of entry objects, each containing:
164164+ - `p` (integer, required): number of bytes shared with the previous entry (TODO key compression actually)
165165+ - `k` (byte string, required): key suffix remaining
166166+ - `v` (hash link, **nullable**): reference to the record data for this key. must be null if the STAR includes the record; must _not_ be null if the record is not included in the STAR
167167+ - `t` (hash link, nullable): link to a subtree that sorts to the right of this entry's key and to the left of the next entry's key. see `l` above.
168168+169169+NOTE: the option to not include `v` (and requiring its hash link to be present in that case) keeps the option open for `key->CID`-only archives, which can be nice for things like diffing a repo to handle a firehose `#sync` event, or perhaps to exclude large records specifically from the archive. (make this cohesive with optional vs null handling if using that)
170170+171171+TODO: nullable vs optional? (in general??)
172172+173173+tempting to do something like:
174174+175175+- omitted means there is no subtree
176176+- null means there is a subtree and it's included (CID to-calculate)
177177+- non-null means there is a subtree and it's *not* included (MST slice or sparse tree)
178178+179179+hmmm: having separate optional and null cases might make deserializing into some languages tricky. i'm not sure if serde can handle that well? omitempty + nullable => `Option<Option<T>>`? should probably check other languages.
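
one way to keep the three cases from leaking into the rest of the code is to fold the `Option<Option<T>>` shape into an explicit enum right at the decoding boundary. this is only a sketch: `Subtree` is a hypothetical type, and whether a given decoder (serde or otherwise) can actually tell "omitted" from "null" is not shown here and would need checking.

```rust
/// Three-way state for a subtree slot, mirroring the omitted / null / link idea.
#[derive(Debug, PartialEq)]
pub enum Subtree<L> {
    /// Field omitted: there is no subtree.
    Absent,
    /// Field is null: the subtree is included in the archive, CID to be calculated.
    Included,
    /// Field is a link: the subtree exists but is not included.
    External(L),
}

impl<L> From<Option<Option<L>>> for Subtree<L> {
    /// Outer Option = "was the field present", inner Option = "was it non-null".
    fn from(raw: Option<Option<L>>) -> Self {
        match raw {
            None => Subtree::Absent,
            Some(None) => Subtree::Included,
            Some(Some(link)) => Subtree::External(link),
        }
    }
}
```

languages without a cheap way to express the outer "present or not" layer would need some other encoding, which is the open question above.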
180180+112181113182### record
114183