STreaming ARchives: stricter, verifiable, deterministic, highly compressible alternatives to CAR files for atproto repositories.

STAR-lite compression results

STAR-lite strips the Merkle Search Tree (MST) and CAR block CIDs from a repository. The resulting files are only modestly smaller: about 75% of the original size by total bytes, across real atproto repos.

The real savings appear when you compress an archive: despite the modest reduction in total input bytes, a compressed STAR-lite file is roughly 55% of the size of the same repo's CAR file compressed with the same coder, which is close to 2x better.

Note: keys in CAR files are compressed within MST nodes, but STAR-lite files store keys uncompressed, which tilts the uncompressed size comparison slightly in CAR's favour. Key compression has minimal benefit once a general-purpose compressor is applied, so STAR-lite skips it.
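As a rough illustration of what key compression means here, the sketch below prefix-compresses a sorted list of keys: each entry stores how many leading bytes it shares with the previous key, plus the remaining suffix. The function names, the tuple layout, and the sample record keys are hypothetical and are not taken from the STAR or CAR formats.

```python
def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Count the leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_compress(keys):
    """Encode each key as (shared_len, suffix) relative to the previous key."""
    entries = []
    prev = b""
    for key in keys:
        shared = shared_prefix_len(prev, key)
        entries.append((shared, key[shared:]))
        prev = key
    return entries


def prefix_decompress(entries):
    """Rebuild the full keys from (shared_len, suffix) entries."""
    keys = []
    prev = b""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


# Made-up record keys, sorted as they would be in a key-ordered structure.
sample_keys = [
    b"app.bsky.feed.like/3kaaa",
    b"app.bsky.feed.like/3kaab",
    b"app.bsky.feed.post/3kaaa",
]
entries = prefix_compress(sample_keys)
```

Sorted keys share long prefixes, which is why this saves bytes in an uncompressed file. A general-purpose compressor finds the same repetition on its own, so the saving largely disappears after compression.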

Advice

zstd 3 is highly effective for STAR-lite files. zstd 9 is slightly better (a size-weighted ratio of 0.183 versus 0.194 of raw CAR size).

zstd 9 --long=27 offers modest benefits for STAR-lite files over 16MiB. For smaller repos its benefit is negligible, and its larger initial allocation makes it significantly slower on very small repos. Note that most atproto repos are very small.

zstd 22: significantly better compression for very large (>100 MiB) STAR-lite files, although only 3 repos of that size were measured at this level. It is very slow, so it is possibly useful in archival scenarios.

dictionaries: not recommended. zstd supports dictionary training, but it performed poorly on our large sample of real atproto repositories. It mainly benefits very small files, and while 74% of the repos in our sample are under 100 KiB (in CAR format), those files combined take up less than 3% of the total disk space. Dictionaries actually make compression worse for repos over 1 MiB, which account for 84% of the input sample when weighted by disk usage.

gzip: a fine option for web servers compressing at the transport layer, and maximally compatible. It compresses to within about 10% of zstd 3.
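The advice above can be condensed into a small helper that picks zstd command-line arguments from a file's size. This is only a sketch: the function name and the `archival` flag are hypothetical, the 16 MiB and 100 MiB thresholds are the ones quoted above, and level 3 is assumed as the default. The flags themselves (`-3`, `-9`, `--long=27`, and `--ultra -22`, since zstd requires `--ultra` for levels above 19) are standard zstd CLI options.

```python
MIB = 1024 * 1024


def zstd_args(size_bytes: int, archival: bool = False) -> list[str]:
    """Suggest zstd CLI arguments for a STAR-lite file of the given size.

    Thresholds follow the advice in the text; treat them as rough guides.
    """
    if archival and size_bytes > 100 * MIB:
        # Very slow; only worthwhile when compression time does not matter.
        return ["--ultra", "-22"]
    if size_bytes > 16 * MIB:
        # Long-distance matching with a 2**27-byte (128 MiB) window.
        return ["-9", "--long=27"]
    # Most repos are very small, where level 3 is already highly effective.
    return ["-3"]
```

For example, `["zstd", *zstd_args(size), path]` could be passed to `subprocess.run`, where `size` and `path` are placeholders for the file's size and location.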

Results

Measured on a snapshot of 490,328 repos from the Bluesky Morel PDS, taken in November 2025.

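Ratios are STAR/CAR (lower is better). "raw" baseline = uncompressed CAR; "coder" baseline = CAR compressed with the same coder. "mean" and "med" are the per-repo mean and median; "wt" is weighted by size (total STAR bytes over total CAR bytes), which matches the raw totals listed with each table.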
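To make the three summary columns concrete, here is a minimal sketch of how they could be computed from per-repo sizes. It assumes "wt" means total STAR bytes divided by total CAR bytes, which is consistent with the overall raw figures (171.51 / 229.57 ≈ 0.747). The function name and the sample sizes are made up for illustration.

```python
from statistics import mean, median


def ratio_summary(pairs):
    """Summarise STAR/CAR ratios for (star_bytes, car_bytes) pairs, one per repo."""
    pairs = list(pairs)
    ratios = [star / car for star, car in pairs]
    return {
        "mean": mean(ratios),   # every repo counts equally
        "med": median(ratios),  # the typical repo
        "wt": sum(s for s, _ in pairs) / sum(c for _, c in pairs),  # weighted by size
    }


# Three hypothetical repos: small, medium, large.
summary = ratio_summary([(60, 100), (700, 1000), (8000, 10000)])
```

In this made-up sample the large repo dominates "wt" (about 0.79) while "mean" and "med" stay at 0.7, which is the same effect that separates the columns in the tables: most repos are tiny, but most bytes live in large repos.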

overall

N=490328, raw CAR=229.57 GiB, raw STAR=171.51 GiB.

| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---|---|---|---|---|---|
| raw | 0.668 | 0.678 | 0.747 | | | |
| gzip | 0.293 | 0.232 | 0.214 | 0.568 | 0.556 | 0.552 |
| zstd --fast 1 | 0.334 | 0.295 | 0.285 | 0.633 | 0.636 | 0.671 |
| zstd 3 | 0.281 | 0.223 | 0.194 | 0.567 | 0.551 | 0.554 |
| zstd 9 | 0.276 | 0.218 | 0.183 | 0.563 | 0.544 | 0.543 |
| zstd 9 +long | 0.276 | 0.218 | 0.177 | 0.563 | 0.544 | 0.525 |
| zstd 22 (N=29434) | 0.271 | 0.213 | 0.159 | 0.557 | 0.536 | 0.492 |
| zstd 9 +dict (N=147100) | 0.225 | 0.203 | 0.196 | 0.549 | 0.539 | 0.561 |

< 100 KiB

N=362162, raw CAR=5.84 GiB, raw STAR=4.05 GiB.

| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---|---|---|---|---|---|
| raw | 0.645 | 0.647 | 0.693 | | | |
| gzip | 0.322 | 0.263 | 0.215 | 0.580 | 0.571 | 0.517 |
| zstd --fast 1 | 0.355 | 0.316 | 0.270 | 0.631 | 0.633 | 0.605 |
| zstd 3 | 0.313 | 0.259 | 0.210 | 0.578 | 0.566 | 0.523 |
| zstd 9 | 0.309 | 0.255 | 0.204 | 0.576 | 0.564 | 0.517 |
| zstd 9 +long | 0.309 | 0.256 | 0.204 | 0.577 | 0.564 | 0.517 |
| zstd 22 (N=21666) | 0.306 | 0.252 | 0.199 | 0.576 | 0.565 | 0.511 |
| zstd 9 +dict (N=108819) | 0.238 | 0.212 | 0.190 | 0.553 | 0.541 | 0.512 |

100 KiB – 1 MiB

N=92499, raw CAR=29.91 GiB, raw STAR=21.95 GiB.

| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---|---|---|---|---|---|
| raw | 0.728 | 0.726 | 0.734 | | | |
| gzip | 0.208 | 0.210 | 0.210 | 0.527 | 0.529 | 0.531 |
| zstd --fast 1 | 0.266 | 0.271 | 0.272 | 0.622 | 0.628 | 0.635 |
| zstd 3 | 0.193 | 0.194 | 0.192 | 0.530 | 0.528 | 0.532 |
| zstd 9 | 0.184 | 0.185 | 0.183 | 0.519 | 0.517 | 0.521 |
| zstd 9 +long | 0.185 | 0.186 | 0.184 | 0.520 | 0.518 | 0.523 |
| zstd 22 (N=5664) | 0.176 | 0.177 | 0.174 | 0.502 | 0.499 | 0.502 |
| zstd 9 +dict (N=27643) | 0.185 | 0.188 | 0.186 | 0.525 | 0.525 | 0.527 |

1 MiB – 10 MiB

N=31177, raw CAR=90.81 GiB, raw STAR=68.12 GiB.

| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---|---|---|---|---|---|
| raw | 0.749 | 0.744 | 0.750 | | | |
| gzip | 0.216 | 0.217 | 0.216 | 0.554 | 0.551 | 0.555 |
| zstd --fast 1 | 0.288 | 0.290 | 0.289 | 0.673 | 0.670 | 0.675 |
| zstd 3 | 0.191 | 0.193 | 0.193 | 0.544 | 0.538 | 0.551 |
| zstd 9 | 0.179 | 0.182 | 0.179 | 0.530 | 0.527 | 0.531 |
| zstd 9 +long | 0.180 | 0.182 | 0.179 | 0.531 | 0.527 | 0.530 |
| zstd 22 (N=1828) | 0.169 | 0.173 | 0.168 | 0.507 | 0.505 | 0.507 |
| zstd 9 +dict (N=9334) | 0.199 | 0.202 | 0.201 | 0.568 | 0.564 | 0.573 |

10 MiB – 100 MiB

N=4447, raw CAR=95.83 GiB, raw STAR=71.94 GiB.

| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---|---|---|---|---|---|
| raw | 0.752 | 0.748 | 0.751 | | | |
| gzip | 0.216 | 0.218 | 0.215 | 0.562 | 0.558 | 0.559 |
| zstd --fast 1 | 0.290 | 0.293 | 0.289 | 0.687 | 0.680 | 0.684 |
| zstd 3 | 0.198 | 0.200 | 0.197 | 0.572 | 0.568 | 0.567 |
| zstd 9 | 0.187 | 0.189 | 0.187 | 0.566 | 0.560 | 0.565 |
| zstd 9 +long | 0.174 | 0.177 | 0.173 | 0.526 | 0.526 | 0.525 |
| zstd 22 (N=273) | 0.162 | 0.164 | 0.160 | 0.497 | 0.500 | 0.494 |
| zstd 9 +dict (N=1288) | 0.201 | 0.202 | 0.200 | 0.574 | 0.571 | 0.569 |

≥ 100 MiB

N=43, raw CAR=7.17 GiB, raw STAR=5.45 GiB.

| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---|---|---|---|---|---|
| raw | 0.761 | 0.753 | 0.760 | | | |
| gzip | 0.200 | 0.211 | 0.186 | 0.554 | 0.550 | 0.533 |
| zstd --fast 1 | 0.265 | 0.285 | 0.247 | 0.674 | 0.669 | 0.655 |
| zstd 3 | 0.182 | 0.195 | 0.172 | 0.557 | 0.555 | 0.537 |
| zstd 9 | 0.172 | 0.185 | 0.162 | 0.557 | 0.563 | 0.535 |
| zstd 9 +long | 0.156 | 0.166 | 0.149 | 0.510 | 0.510 | 0.496 |
| zstd 22 (N=3) | 0.096 | 0.108 | 0.081 | 0.409 | 0.417 | 0.349 |
| zstd 9 +dict (N=16) | 0.175 | 0.194 | 0.148 | 0.534 | 0.559 | 0.500 |
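The shares quoted in the advice on dictionaries can be cross-checked against the per-bucket totals reported with the tables. The snippet below is just that arithmetic, using the N and raw CAR figures as printed (rounded to 0.01 GiB, so results are approximate); the variable names are illustrative.

```python
# (bucket label, number of repos, raw CAR size in GiB), copied from the tables.
buckets = [
    ("< 100 KiB", 362162, 5.84),
    ("100 KiB - 1 MiB", 92499, 29.91),
    ("1 MiB - 10 MiB", 31177, 90.81),
    ("10 MiB - 100 MiB", 4447, 95.83),
    (">= 100 MiB", 43, 7.17),
]

total_repos = sum(n for _, n, _ in buckets)
total_gib = sum(gib for _, _, gib in buckets)

# Repos under 100 KiB: share of the sample by count, then by bytes.
small_count_share = buckets[0][1] / total_repos
small_bytes_share = buckets[0][2] / total_gib

# Repos of 1 MiB and above: share of the sample by bytes.
large_bytes_share = sum(gib for _, _, gib in buckets[2:]) / total_gib

print(f"under 100 KiB: {small_count_share:.1%} of repos, {small_bytes_share:.1%} of bytes")
print(f"1 MiB and above: {large_bytes_share:.1%} of bytes")
```

The results come out at roughly 74% of repos but under 3% of bytes for the smallest bucket, and about 84% of bytes for repos of 1 MiB and above, in line with the figures given in the advice.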