STreaming ARchives: stricter, verifiable, deterministic, highly compressible alternatives to CAR files for atproto repositories.

Clone this repository

https://tangled.org/microcosm.blue/star https://tangled.org/did:plc:lulmyldiq4sb2ikags5sfb25/star
git@tangled.org:microcosm.blue/star git@tangled.org:did:plc:lulmyldiq4sb2ikags5sfb25/star

For self-hosted knots, clone URLs may differ based on your setup.

STAR: STreaming ARchive formats

Stricter, verifiable, deterministic, highly compressible alternatives to CAR files for atproto repositories.

CAR STAR-lite STAR-L0 STAR-L1
verifiable
existing tools
archive size worst best good near-best
streamable ❌^1 ✅^2 ✅ best
bounded memory ❌^1 ✅^2 ✅ best
speed worst^1 good/best^3 best better
complexity ✅ best ✅ best ok tricky
strict
deterministic
slices, sparse ❌^4
subtree

Read more:

  • CAR: best interoperability

    A standardized content-addressed block format

  • STAR-lite: best compression

    A flat key-record encoding with no MST

  • STAR-L0/L1: best for streaming verification

    A strictly-ordered block format with implicit CIDs and MST recovery at lower layers


Notes:

  1. See this issue on the IETF atproto repo draft: in general, it's not possible to correctly treat a CAR repo as stream-ordered without knowing (out of band) that it was encoded that way, so parsers must buffer the entire repository. Disk spilling can bound memory usage, as repo-stream does, but requires many random I/O reads. Stream-ordered CARs are competitive with STAR variants on some axes, but given the unresolved issues, they are not considered in this comparison.

  2. STAR-lite streaming verification or conversion-to-CAR requires disk spilling to achieve bounded memory, but the I/O is optimized for a small number of one-time in-order reads from disk.

  3. STAR-lite values can be emitted immediately and trivially from its encoded form with zero buffering required. However, MST recovery (or pre-verification) requires either two passes or disk spilling -- but it's still more efficient than CAR.

  4. STAR-lite could support MST slices and probably sparse MSTs, but this is not specified yet. MST slices in particular would be valuable.
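
The buffering argument in note 1 can be illustrated with a toy model. This is only a sketch under loud assumptions: blocks are plain `(id, child_ids)` tuples rather than real CAR blocks with CIDs, there is no MST logic, and the function name `peak_buffered` is hypothetical. It walks a tree depth-first while consuming blocks as a stream, and reports how many blocks had to be held in memory at once.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for a block: (block id, ids of its child blocks).
Block = Tuple[str, List[str]]


def peak_buffered(blocks: Iterable[Block], root: str) -> int:
    """Walk the tree depth-first from `root`, reading `blocks` as a stream.

    Blocks that arrive before the walk needs them stay in memory.
    Returns the largest number of blocks held at any one time.
    """
    stream = iter(blocks)
    held: Dict[str, List[str]] = {}
    stack = [root]
    peak = 0
    while stack:
        want = stack.pop()
        # Pull from the stream until the block the walk needs has arrived.
        while want not in held:
            item = next(stream, None)
            if item is None:
                raise ValueError(f"block {want!r} missing from stream")
            block_id, children = item
            held[block_id] = children
            peak = max(peak, len(held))
        children = held.pop(want)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(children))
    return peak


# The same six blocks, once in depth-first (stream) order, once reversed.
tree = [
    ("root", ["a", "b"]),
    ("a", ["a1", "a2"]),
    ("a1", []),
    ("a2", []),
    ("b", ["b1"]),
    ("b1", []),
]
in_order = peak_buffered(tree, "root")
reversed_order = peak_buffered(list(reversed(tree)), "root")
```

With the blocks in walk order, only one block is ever held; with the order reversed, all six are held before the walk can start. A parser that cannot know in advance which case it has been handed has to be prepared for the second one.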