STAR: STreaming ARchive formats
Stricter, verifiable, deterministic, highly compressible alternatives to CAR files for atproto repositories.
| | CAR | STAR-lite | STAR-L0 | STAR-L1 |
|---|---|---|---|---|
| verifiable | ✅ | ✅ | ✅ | ✅ |
| existing tools | ✅ | ❌ | ❌ | ❌ |
| archive size | worst | best | good | near-best |
| streamable | ❌^1 | ✅^2 | ✅ best | ✅ |
| bounded memory | ❌^1 | ✅^2 | ✅ best | ✅ |
| speed | worst^1 | good/best^3 | best | better |
| complexity | ✅ best | ✅ best | ok | tricky |
| strict | ❌ | ✅ | ✅ | ✅ |
| deterministic | ❌ | ✅ | ✅ | ✅ |
| slices, sparse | ✅ | ❌^4 | ✅ | ✅ |
| subtree | ❌ | ✅ | ✅ | ✅ |
Read more:
- CAR: best interoperability.
  A standardized content-addressed block format.
- STAR-lite: best compression.
  A flat key-record encoding with no MST.
- STAR-L0/L1: best for streaming verification.
  A strictly-ordered block format with implicit CIDs and MST recovery at lower layers.
Notes:
1. See this issue on the IETF atproto repo draft: in general, a CAR repo cannot be correctly treated as stream-ordered without knowing (out of band) that it was encoded that way, so parsers must buffer the entire repository. Disk spilling can bound memory usage, as repo-stream does, but requires many random I/O reads. Stream-ordered CARs are competitive with STAR variants on some axes, but given the unresolved issues, they are not considered in this comparison.
2. STAR-lite streaming verification or conversion to CAR requires disk spilling to achieve bounded memory, but the I/O is optimized for a small number of one-time, in-order reads from disk.
3. STAR-lite values can be emitted immediately and trivially from the encoded form, with zero buffering required. However, MST recovery (or pre-verification) requires either two passes or disk spilling, though it is still more efficient than CAR.
4. STAR-lite could support MST slices, and probably sparse MSTs, but this is not specified yet. MST slices in particular would be valuable.
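
The buffering problem described in note 1 can be illustrated with a toy model. The sketch below is not the CAR or STAR wire format: blocks are JSON objects, a hex SHA-256 digest stands in for a CID, and every name in it (`cid`, `make_block`, `walk_buffered`, `walk_streaming`) is hypothetical. It also stores links explicitly, whereas STAR-L0/L1 use implicit CIDs. It only shows why an unordered block stream forces the reader to index every block before walking the tree, while a strictly pre-ordered stream can be checked block by block with a small stack of pending links.

```python
import hashlib
import json


def cid(data: bytes) -> str:
    """Toy stand-in for a CID: hex SHA-256 of the block bytes."""
    return hashlib.sha256(data).hexdigest()


def make_block(value, links=()):
    """Encode a toy block holding a value and links to child blocks."""
    return json.dumps({"value": value, "links": list(links)}, sort_keys=True).encode()


def walk_buffered(root, blocks):
    """Unordered stream: index every block first, then walk from the root."""
    index = {cid(b): b for b in blocks}  # grows with the size of the repository
    out, stack = [], [root]
    while stack:
        node = json.loads(index[stack.pop()])
        out.append(node["value"])
        stack.extend(reversed(node["links"]))
    return out, len(index)


def walk_streaming(root, blocks):
    """Strictly pre-ordered stream: each block must be the next expected one."""
    expected = [root]  # stack of link hashes still waiting for their block
    out, peak = [], 1
    for b in blocks:
        if not expected or cid(b) != expected.pop():
            raise ValueError("block out of order or unexpected")
        node = json.loads(b)
        out.append(node["value"])
        expected.extend(reversed(node["links"]))
        peak = max(peak, len(expected))
    if expected:
        raise ValueError("stream ended before all linked blocks arrived")
    return out, peak


# A five-block tree: root -> (mid -> (a, b), c)
a, b, c = make_block("a"), make_block("b"), make_block("c")
mid = make_block("mid", [cid(a), cid(b)])
root = make_block("root", [cid(mid), cid(c)])

ordered = [root, mid, a, b, c]    # pre-order, as a strict format would require
shuffled = [c, a, root, b, mid]   # any order, as an unordered archive may contain

values_buffered, held_blocks = walk_buffered(cid(root), shuffled)
values_streamed, peak_pending = walk_streaming(cid(root), ordered)
```

In this model the buffered reader holds all five blocks before it can emit anything, while the streaming reader never tracks more than three pending hashes and rejects the shuffled input outright. The pending stack scales with tree depth and fan-out rather than with the total number of blocks, which is the property the "bounded memory" row of the table refers to.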