STreaming ARchives: stricter, verifiable, deterministic, highly compressible alternatives to CAR files for atproto repositories.

# STAR-lite compression results

STAR-lite strips the Merkle Search Tree (MST) and CAR block CIDs from a repository. The resulting files are only slightly smaller: about 75% of the original, across real atproto repos.

The real savings appear when you compress an archive: despite the modest reduction in total input bytes, **STAR-lite files compress around 2x better than CAR files**.

Note: keys in CAR files are compressed within MST nodes, but are stored uncompressed in STAR-lite files, which slightly shifts the uncompressed size comparison in CAR's favour. Key compression has minimal benefit once the whole file is compressed, so STAR-lite skips it.


## Advice

**zstd 3** is highly effective for STAR-lite files. zstd 9 is slightly better.

**zstd 9 --long=27** offers modest benefits **for STAR-lite files over 16MiB**. For smaller repos its benefit is negligible, and its larger initial allocation makes it significantly slower on very small repos. Note that most atproto repos are very small.

**zstd 22**: significantly better compression for very large (>100MiB) STAR-lite files, but very slow. Possibly useful in archival scenarios.

**dictionaries**: **not recommended**. zstd has dictionary training, but it performed poorly on our large sample of real atproto repositories: it mainly benefits very small files, and while 74% of our sample is under 100KiB (in CAR format), those files combined take up less than 3% of the total disk space. Using dictionaries actually **compresses worse** for repos over 1MiB, which represent 84% of the input sample when weighted by disk usage.

**gzip**: a fine, maximally compatible option for web servers compressing at the transport layer. It compresses to within about 10% of zstd 3.


## Results

Measured on a snapshot of 490,328 repos from the Bluesky Morel PDS from November 2025.

Ratios are STAR/CAR (lower is better).
"raw" baseline = uncompressed CAR; "coder" baseline = CAR compressed with the same coder.

### overall

N=490328, raw CAR=229.57 GiB, raw STAR=171.51 GiB.

| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---:|---:|---:|---:|---:|---:|
| raw | 0.668 | 0.678 | 0.747 | — | — | — |
| gzip | 0.293 | 0.232 | 0.214 | 0.568 | 0.556 | 0.552 |
| zstd --fast 1 | 0.334 | 0.295 | 0.285 | 0.633 | 0.636 | 0.671 |
| zstd 3 | 0.281 | 0.223 | 0.194 | 0.567 | 0.551 | 0.554 |
| zstd 9 | 0.276 | 0.218 | 0.183 | 0.563 | 0.544 | 0.543 |
| zstd 9 +long | 0.276 | 0.218 | 0.177 | 0.563 | 0.544 | 0.525 |
| zstd 22 (N=29434) | 0.271 | 0.213 | 0.159 | 0.557 | 0.536 | 0.492 |
| zstd 9 +dict (N=147100) | 0.225 | 0.203 | 0.196 | 0.549 | 0.539 | 0.561 |

### < 100 KiB

N=362162, raw CAR=5.84 GiB, raw STAR=4.05 GiB.

| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---:|---:|---:|---:|---:|---:|
| raw | 0.645 | 0.647 | 0.693 | — | — | — |
| gzip | 0.322 | 0.263 | 0.215 | 0.580 | 0.571 | 0.517 |
| zstd --fast 1 | 0.355 | 0.316 | 0.270 | 0.631 | 0.633 | 0.605 |
| zstd 3 | 0.313 | 0.259 | 0.210 | 0.578 | 0.566 | 0.523 |
| zstd 9 | 0.309 | 0.255 | 0.204 | 0.576 | 0.564 | 0.517 |
| zstd 9 +long | 0.309 | 0.256 | 0.204 | 0.577 | 0.564 | 0.517 |
| zstd 22 (N=21666) | 0.306 | 0.252 | 0.199 | 0.576 | 0.565 | 0.511 |
| zstd 9 +dict (N=108819) | 0.238 | 0.212 | 0.190 | 0.553 | 0.541 | 0.512 |

### 100 KiB – 1 MiB

N=92499, raw CAR=29.91 GiB, raw STAR=21.95 GiB.

| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---:|---:|---:|---:|---:|---:|
| raw | 0.728 | 0.726 | 0.734 | — | — | — |
| gzip | 0.208 | 0.210 | 0.210 | 0.527 | 0.529 | 0.531 |
| zstd --fast 1 | 0.266 | 0.271 | 0.272 | 0.622 | 0.628 | 0.635 |
| zstd 3 | 0.193 | 0.194 | 0.192 | 0.530 | 0.528 | 0.532 |
| zstd 9 | 0.184 | 0.185 | 0.183 | 0.519 | 0.517 | 0.521 |
| zstd 9 +long | 0.185 | 0.186 | 0.184 | 0.520 | 0.518 | 0.523 |
| zstd 22 (N=5664) | 0.176 | 0.177 | 0.174 | 0.502 | 0.499 | 0.502 |
| zstd 9 +dict (N=27643) | 0.185 | 0.188 | 0.186 | 0.525 | 0.525 | 0.527 |

### 1 MiB – 10 MiB

N=31177, raw CAR=90.81 GiB, raw STAR=68.12 GiB.

| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---:|---:|---:|---:|---:|---:|
| raw | 0.749 | 0.744 | 0.750 | — | — | — |
| gzip | 0.216 | 0.217 | 0.216 | 0.554 | 0.551 | 0.555 |
| zstd --fast 1 | 0.288 | 0.290 | 0.289 | 0.673 | 0.670 | 0.675 |
| zstd 3 | 0.191 | 0.193 | 0.193 | 0.544 | 0.538 | 0.551 |
| zstd 9 | 0.179 | 0.182 | 0.179 | 0.530 | 0.527 | 0.531 |
| zstd 9 +long | 0.180 | 0.182 | 0.179 | 0.531 | 0.527 | 0.530 |
| zstd 22 (N=1828) | 0.169 | 0.173 | 0.168 | 0.507 | 0.505 | 0.507 |
| zstd 9 +dict (N=9334) | 0.199 | 0.202 | 0.201 | 0.568 | 0.564 | 0.573 |

### 10 MiB – 100 MiB

N=4447, raw CAR=95.83 GiB, raw STAR=71.94 GiB.

| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---:|---:|---:|---:|---:|---:|
| raw | 0.752 | 0.748 | 0.751 | — | — | — |
| gzip | 0.216 | 0.218 | 0.215 | 0.562 | 0.558 | 0.559 |
| zstd --fast 1 | 0.290 | 0.293 | 0.289 | 0.687 | 0.680 | 0.684 |
| zstd 3 | 0.198 | 0.200 | 0.197 | 0.572 | 0.568 | 0.567 |
| zstd 9 | 0.187 | 0.189 | 0.187 | 0.566 | 0.560 | 0.565 |
| zstd 9 +long | 0.174 | 0.177 | 0.173 | 0.526 | 0.526 | 0.525 |
| zstd 22 (N=273) | 0.162 | 0.164 | 0.160 | 0.497 | 0.500 | 0.494 |
| zstd 9 +dict (N=1288) | 0.201 | 0.202 | 0.200 | 0.574 | 0.571 | 0.569 |

### ≥ 100 MiB

N=43, raw CAR=7.17 GiB, raw STAR=5.45 GiB.

| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---:|---:|---:|---:|---:|---:|
| raw | 0.761 | 0.753 | 0.760 | — | — | — |
| gzip | 0.200 | 0.211 | 0.186 | 0.554 | 0.550 | 0.533 |
| zstd --fast 1 | 0.265 | 0.285 | 0.247 | 0.674 | 0.669 | 0.655 |
| zstd 3 | 0.182 | 0.195 | 0.172 | 0.557 | 0.555 | 0.537 |
| zstd 9 | 0.172 | 0.185 | 0.162 | 0.557 | 0.563 | 0.535 |
| zstd 9 +long | 0.156 | 0.166 | 0.149 | 0.510 | 0.510 | 0.496 |
| zstd 22 (N=3) | 0.096 | 0.108 | 0.081 | 0.409 | 0.417 | 0.349 |
| zstd 9 +dict (N=16) | 0.175 | 0.194 | 0.148 | 0.534 | 0.559 | 0.500 |
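
The three aggregate columns in the tables above can be read as follows. This is a minimal sketch, not the script that produced the results: the function name `summarize` and the sample sizes are hypothetical, and reading "wt" as total STAR bytes over total CAR bytes is an assumption, though it matches the overall raw row (171.51 GiB / 229.57 GiB ≈ 0.747).

```python
from statistics import mean, median


def summarize(sizes):
    """Aggregate per-repo STAR/CAR size ratios.

    `sizes` is a list of (baseline_bytes, star_bytes) pairs, one per repo.
    For the "raw" baseline, baseline_bytes is the uncompressed CAR size;
    for the "coder" baseline, it is the CAR size after the same coder.
    """
    ratios = [star / base for base, star in sizes]
    total_base = sum(base for base, _ in sizes)
    total_star = sum(star for _, star in sizes)
    return {
        "mean": mean(ratios),            # every repo counts equally
        "med": median(ratios),           # typical repo
        "wt": total_star / total_base,   # assumed: weighted by bytes on disk
    }


# Hypothetical sizes in bytes: two small repos and one large one.
sample = [(1_000, 600), (10_000, 7_000), (1_000_000, 760_000)]
summary = summarize(sample)
print(summary)
```

Because most repos are tiny while most bytes live in a few large repos, "mean" and "med" describe the typical repo, whereas "wt" is closer to what a host storing the whole sample would observe.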