STAR-lite compression results#
STAR-lite strips the Merkle Search Tree (MST) and CAR block CIDs from a repository. The resulting files are only slightly smaller: about 75% of the original, across real atproto repos.
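The real savings appear after compression: although the uncompressed input shrinks only modestly, STAR-lite files compress almost 2x better than CAR files. With gzip or zstd 3, a compressed STAR-lite file is about 55% the size of the same repo's CAR file compressed with the same coder.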
Note: CAR files store keys prefix-compressed inside MST nodes, whereas STAR-lite stores each key in full. This makes uncompressed STAR-lite files a little larger than they could be, which tilts the raw comparison slightly in CAR's favour. Key compression has minimal benefit once a general-purpose compressor is applied, so STAR-lite skips it.
Advice#
zstd 3 is highly effective for STAR-lite files. zstd 9 is slightly better.
zstd 9 --long=27 offers modest benefits for STAR-lite files over 16 MiB. For smaller repos the benefit is negligible, and the larger initial allocation makes it significantly slower on very small repos. Note that most atproto repos are very small.
zstd 22: significantly better compression for very large (>100 MiB) STAR-lite files, though that result rests on only 3 files in our sample. It is very slow, so it is possibly useful in archival scenarios.
dictionaries: not recommended. zstd supports dictionary training, but it performed poorly on our large sample of real atproto repositories. Dictionaries mainly benefit very small files, and while 74% of our sample is under 100 KiB (in CAR format), those files together take up less than 3% of the total disk space. For repos over 1 MiB, which account for 84% of the input sample when weighted by disk usage, dictionaries actually make compression worse.
gzip: a fine, maximally compatible option for web servers compressing at the transport layer. It compresses to within about 10% of zstd 3.
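The advice above can be condensed into a small helper that picks zstd command-line arguments from the file size. This is only a sketch and not part of STAR-lite: the function name `zstd_args` and the `archival` switch are hypothetical, the thresholds are the ones quoted in the advice, and `--ultra` is included because the zstd CLI requires it for levels above 19.

```python
MIB = 1024 * 1024

def zstd_args(size_bytes: int, archival: bool = False) -> list[str]:
    """Return zstd CLI arguments for a STAR-lite file of the given size."""
    # Level 22 only pays off on very large files and is very slow,
    # so it is opt-in via the hypothetical `archival` flag.
    if archival and size_bytes > 100 * MIB:
        return ["--ultra", "-22"]
    # Long-distance matching (2**27 bytes = 128 MiB window) helps above ~16 MiB.
    if size_bytes > 16 * MIB:
        return ["-9", "--long=27"]
    # Most repos are small: level 3 is already highly effective,
    # and level 9 is only slightly better.
    return ["-3"]

print(zstd_args(50 * 1024))                  # ['-3']
print(zstd_args(32 * MIB))                   # ['-9', '--long=27']
print(zstd_args(200 * MIB, archival=True))   # ['--ultra', '-22']
```

The returned list is meant to be spliced into a command line, for example `["zstd", *zstd_args(size), path]`. Whether level 3 or level 9 is the better default for small files depends on how much CPU time you are willing to spend.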
Results#
Measured on a snapshot of 490,328 repos from the Bluesky Morel PDS, taken in November 2025.
Ratios are STAR/CAR (lower is better). "raw" baseline = uncompressed CAR; "coder" baseline = CAR compressed with the same coder.
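To make the column definitions concrete, here is a minimal sketch of how the three summary statistics could be computed from per-repo sizes. It assumes that "wt" is the size-weighted ratio (total STAR bytes divided by total baseline bytes), which is consistent with the raw totals reported for the overall sample but is not stated explicitly. The function name and the sample sizes are made up for illustration.

```python
from statistics import mean, median

def summarize(star_sizes, baseline_sizes):
    """Return (mean, median, weighted) STAR/baseline ratios."""
    # Per-repo ratio: each repo counts equally in mean and median.
    ratios = [s / b for s, b in zip(star_sizes, baseline_sizes)]
    # Assumed meaning of "wt": ratio of totals, so large repos dominate.
    weighted = sum(star_sizes) / sum(baseline_sizes)
    return mean(ratios), median(ratios), weighted

# Illustrative byte counts for three repos (not taken from the dataset).
star = [60, 700, 7500]
car = [100, 1000, 10000]

m, med, wt = summarize(star, car)
print(f"mean={m:.3f} med={med:.3f} wt={wt:.3f}")
# prints: mean=0.683 med=0.700 wt=0.744
```

For the "raw" columns the baseline would be the uncompressed CAR sizes; for the "coder" columns it would be the CAR sizes after compression with the same coder as the STAR-lite files.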
overall#
N=490328, raw CAR=229.57 GiB, raw STAR=171.51 GiB.
| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---|---|---|---|---|---|
| raw | 0.668 | 0.678 | 0.747 | — | — | — |
| gzip | 0.293 | 0.232 | 0.214 | 0.568 | 0.556 | 0.552 |
| zstd --fast 1 | 0.334 | 0.295 | 0.285 | 0.633 | 0.636 | 0.671 |
| zstd 3 | 0.281 | 0.223 | 0.194 | 0.567 | 0.551 | 0.554 |
| zstd 9 | 0.276 | 0.218 | 0.183 | 0.563 | 0.544 | 0.543 |
| zstd 9 +long | 0.276 | 0.218 | 0.177 | 0.563 | 0.544 | 0.525 |
| zstd 22 (N=29434) | 0.271 | 0.213 | 0.159 | 0.557 | 0.536 | 0.492 |
| zstd 9 +dict (N=147100) | 0.225 | 0.203 | 0.196 | 0.549 | 0.539 | 0.561 |
< 100 KiB#
N=362162, raw CAR=5.84 GiB, raw STAR=4.05 GiB.
| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---|---|---|---|---|---|
| raw | 0.645 | 0.647 | 0.693 | — | — | — |
| gzip | 0.322 | 0.263 | 0.215 | 0.580 | 0.571 | 0.517 |
| zstd --fast 1 | 0.355 | 0.316 | 0.270 | 0.631 | 0.633 | 0.605 |
| zstd 3 | 0.313 | 0.259 | 0.210 | 0.578 | 0.566 | 0.523 |
| zstd 9 | 0.309 | 0.255 | 0.204 | 0.576 | 0.564 | 0.517 |
| zstd 9 +long | 0.309 | 0.256 | 0.204 | 0.577 | 0.564 | 0.517 |
| zstd 22 (N=21666) | 0.306 | 0.252 | 0.199 | 0.576 | 0.565 | 0.511 |
| zstd 9 +dict (N=108819) | 0.238 | 0.212 | 0.190 | 0.553 | 0.541 | 0.512 |
100 KiB – 1 MiB#
N=92499, raw CAR=29.91 GiB, raw STAR=21.95 GiB.
| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---|---|---|---|---|---|
| raw | 0.728 | 0.726 | 0.734 | — | — | — |
| gzip | 0.208 | 0.210 | 0.210 | 0.527 | 0.529 | 0.531 |
| zstd --fast 1 | 0.266 | 0.271 | 0.272 | 0.622 | 0.628 | 0.635 |
| zstd 3 | 0.193 | 0.194 | 0.192 | 0.530 | 0.528 | 0.532 |
| zstd 9 | 0.184 | 0.185 | 0.183 | 0.519 | 0.517 | 0.521 |
| zstd 9 +long | 0.185 | 0.186 | 0.184 | 0.520 | 0.518 | 0.523 |
| zstd 22 (N=5664) | 0.176 | 0.177 | 0.174 | 0.502 | 0.499 | 0.502 |
| zstd 9 +dict (N=27643) | 0.185 | 0.188 | 0.186 | 0.525 | 0.525 | 0.527 |
1 MiB – 10 MiB#
N=31177, raw CAR=90.81 GiB, raw STAR=68.12 GiB.
| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---|---|---|---|---|---|
| raw | 0.749 | 0.744 | 0.750 | — | — | — |
| gzip | 0.216 | 0.217 | 0.216 | 0.554 | 0.551 | 0.555 |
| zstd --fast 1 | 0.288 | 0.290 | 0.289 | 0.673 | 0.670 | 0.675 |
| zstd 3 | 0.191 | 0.193 | 0.193 | 0.544 | 0.538 | 0.551 |
| zstd 9 | 0.179 | 0.182 | 0.179 | 0.530 | 0.527 | 0.531 |
| zstd 9 +long | 0.180 | 0.182 | 0.179 | 0.531 | 0.527 | 0.530 |
| zstd 22 (N=1828) | 0.169 | 0.173 | 0.168 | 0.507 | 0.505 | 0.507 |
| zstd 9 +dict (N=9334) | 0.199 | 0.202 | 0.201 | 0.568 | 0.564 | 0.573 |
10 MiB – 100 MiB#
N=4447, raw CAR=95.83 GiB, raw STAR=71.94 GiB.
| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---|---|---|---|---|---|
| raw | 0.752 | 0.748 | 0.751 | — | — | — |
| gzip | 0.216 | 0.218 | 0.215 | 0.562 | 0.558 | 0.559 |
| zstd --fast 1 | 0.290 | 0.293 | 0.289 | 0.687 | 0.680 | 0.684 |
| zstd 3 | 0.198 | 0.200 | 0.197 | 0.572 | 0.568 | 0.567 |
| zstd 9 | 0.187 | 0.189 | 0.187 | 0.566 | 0.560 | 0.565 |
| zstd 9 +long | 0.174 | 0.177 | 0.173 | 0.526 | 0.526 | 0.525 |
| zstd 22 (N=273) | 0.162 | 0.164 | 0.160 | 0.497 | 0.500 | 0.494 |
| zstd 9 +dict (N=1288) | 0.201 | 0.202 | 0.200 | 0.574 | 0.571 | 0.569 |
≥ 100 MiB#
N=43, raw CAR=7.17 GiB, raw STAR=5.45 GiB.
| setting | mean (raw) | med (raw) | wt (raw) | mean (coder) | med (coder) | wt (coder) |
|---|---|---|---|---|---|---|
| raw | 0.761 | 0.753 | 0.760 | — | — | — |
| gzip | 0.200 | 0.211 | 0.186 | 0.554 | 0.550 | 0.533 |
| zstd --fast 1 | 0.265 | 0.285 | 0.247 | 0.674 | 0.669 | 0.655 |
| zstd 3 | 0.182 | 0.195 | 0.172 | 0.557 | 0.555 | 0.537 |
| zstd 9 | 0.172 | 0.185 | 0.162 | 0.557 | 0.563 | 0.535 |
| zstd 9 +long | 0.156 | 0.166 | 0.149 | 0.510 | 0.510 | 0.496 |
| zstd 22 (N=3) | 0.096 | 0.108 | 0.081 | 0.409 | 0.417 | 0.349 |
| zstd 9 +dict (N=16) | 0.175 | 0.194 | 0.148 | 0.534 | 0.559 | 0.500 |
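
The shares quoted in the dictionary advice can be re-derived from the bucket headers above. The following check uses only the N and raw CAR figures copied from this page; the variable names are illustrative.

```python
# (label, repo count, raw CAR size in GiB), copied from the bucket headers.
buckets = [
    ("< 100 KiB", 362162, 5.84),
    ("100 KiB - 1 MiB", 92499, 29.91),
    ("1 MiB - 10 MiB", 31177, 90.81),
    ("10 MiB - 100 MiB", 4447, 95.83),
    (">= 100 MiB", 43, 7.17),
]

total_n = sum(n for _, n, _ in buckets)
total_gib = sum(g for _, _, g in buckets)

# Repos under 100 KiB: share of the sample by count and by disk usage.
small_count_share = buckets[0][1] / total_n
small_bytes_share = buckets[0][2] / total_gib
# Repos of 1 MiB and larger: share of the sample by disk usage.
large_bytes_share = sum(g for _, _, g in buckets[2:]) / total_gib

print(f"{small_count_share:.1%} {small_bytes_share:.1%} {large_bytes_share:.1%}")
# prints: 73.9% 2.5% 84.4%
```

This matches the rounded figures in the advice: about 74% of repos are under 100 KiB, they hold less than 3% of the bytes, and repos over 1 MiB hold about 84% of the bytes.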