docs: update benchmarks with fresh run and k256 v0.0.2 results

sig-verify: zig 9,845 v/s (k256 v0.0.2 Fe26), go 15,128 v/s
decode: zig 235k fps (verified), go 15.6k fps

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

zzstoatzz 49126ae1 59500a6f

+20 -20
README.md
···

 | SDK | frames/sec (median) | MB/s | blocks/frame | errors |
 |-----|--------:|-----:|-----:|-----:|
-| zig ([zat](https://tangled.sh/@zzstoatzz.io/zat), arena reuse) | 311,428 | 1,482.8 | 9.98 | 0 |
-| go ([indigo](https://github.com/bluesky-social/indigo)) | 15,560 | 75.3 | 9.98 | 0 |
+| zig ([zat](https://tangled.sh/@zzstoatzz.io/zat), arena reuse) | 235,049 | 1,133.6 | 9.98 | 0 |
+| go ([indigo](https://github.com/bluesky-social/indigo)) | 15,587 | 75.6 | 9.98 | 0 |

 ### decode-only (no CID hash verification)

···

 | SDK | frames/sec (median) | MB/s | blocks/frame | errors |
 |-----|--------:|-----:|-----:|-----:|
-| zig (zat, arena reuse) | 630,543 | 3,094.7 | 9.98 | 0 |
-| zig (zat, alloc per frame) | 525,906 | 2,552.0 | 9.98 | 0 |
-| rust (raw, arena reuse) | 244,113 | 1,171.0 | 9.98 | 0 |
-| rust (raw, alloc per frame) | 186,962 | 919.4 | 9.98 | 0 |
-| rust ([jacquard](https://github.com/rsform/jacquard)) | 47,881 | 238.9 | 9.98 | 0 |
-| go (raw, fxamacker/cbor) | 41,398 | 200.7 | 9.98 | 0 |
-| python ([atproto](https://github.com/MarshalX/atproto)) | 29,675 | 146.1 | 9.98 | 0 |
-| go (indigo) | 15,560 | 75.3 | 9.98 | 0 |
+| zig (zat, arena reuse) | 529,424 | 2,638.0 | 9.98 | 0 |
+| zig (zat, alloc per frame) | 521,925 | 2,529.7 | 9.98 | 0 |
+| rust (raw, arena reuse) | 226,146 | 1,097.4 | 9.98 | 0 |
+| rust (raw, alloc per frame) | 200,763 | 1,012.9 | 9.98 | 0 |
+| rust ([jacquard](https://github.com/rsform/jacquard)) | 56,523 | 275.8 | 9.98 | 0 |
+| go (raw, fxamacker/cbor) | 40,137 | 187.0 | 9.98 | 0 |
+| python ([atproto](https://github.com/MarshalX/atproto)) | 33,842 | 163.0 | 9.98 | 0 |
+| go (indigo) | 15,587 | 75.6 | 9.98 | 0 |

 note: indigo appears in both tables. its number is the same because it always verifies — there is no option to disable it in go-car v1.

···

 - signature verification — separate from decode, not measured here
 - MST validation — separate from decode, not measured here

-there are no correctness differences between the two decode paths. the ~20x gap is entirely implementation cost.
+there are no correctness differences between the two decode paths. the ~15x gap is entirely implementation cost.

-## where the ~20x comes from
+## where the ~15x comes from

 we traced indigo's decode path at the instruction level. the cost compounds from several architectural differences:

···

 | CAR block reads | `make([]byte, section_len)` + copy per block; CID parsed twice (once to read, once to verify) | reads directly from input slice; CID parsed once | ~1.5x |
 | blocks field | `make([]uint8, len)` + `io.ReadFull` copies entire CAR payload | slices into input buffer | ~1.2x |

-these factors multiply. refmt's reflection overhead × per-value heap allocation × GC pressure × byte copying = ~20x on this workload.
+these factors multiply. refmt's reflection overhead × per-value heap allocation × GC pressure × byte copying = ~15x on this workload.

 note: indigo's `cbor-gen` (code-generated unmarshal for the commit struct) is fast — the bottleneck is `cbornode.DecodeInto` (refmt/reflection) for the per-block DAG-CBOR decode, which runs ~10 times per frame.

 ## fairness notes

-- **CID verification**: only zat and indigo verify block hashes. this is ~2x overhead for zat (311k vs 630k fps). the decode-only table exists for architectural comparison, but the production-correct table is the one that matters for real-world use
+- **CID verification**: only zat and indigo verify block hashes. this is ~2x overhead for zat (235k vs 529k fps). the decode-only table exists for architectural comparison, but the production-correct table is the one that matters for real-world use
 - **zig** and **rust (raw)** both use arena allocation + zero-copy string/byte decoding. the "alloc per frame" variants are the fair cross-language comparison; "arena reuse" shows the production pattern
 - **rust (jacquard)** is the real AT Protocol SDK that rust developers use. it pays for serde-based owned deserialization (`String`, `BTreeMap`), async CAR parsing (tokio poll/wake per block via iroh-car), and per-object heap allocation
 - **go (raw)** uses fxamacker/cbor (no reflection for known struct types), a hand-rolled sync CAR parser (no CID hash verification), and no indigo dependency. GC pressure remains the fundamental constraint — Go's experimental arena package (`GOEXPERIMENT=arenas`) is on hold and not recommended for production

···

 | SDK | variant | verifies/sec (median) | entries | P-256 | secp256k1 | errors |
 |-----|---------|--------:|-----:|-----:|-----:|-----:|
-| go ([indigo](https://github.com/bluesky-social/indigo)) | full pipeline | 15,109 | 3,072 | 0 | 3,072 | 0 |
-| go (indigo) | crypto-only | 15,012 | 3,072 | 0 | 3,072 | 0 |
-| zig ([zat](https://tangled.sh/@zzstoatzz.io/zat) + [k256](https://tangled.sh/@zzstoatzz.io/k256)) | full pipeline | 9,796 | 3,072 | 0 | 3,072 | 0 |
-| zig (zat + k256) | crypto-only | 9,716 | 3,072 | 0 | 3,072 | 0 |
+| go ([indigo](https://github.com/bluesky-social/indigo)) | full pipeline | 15,128 | 3,072 | 0 | 3,072 | 0 |
+| go (indigo) | crypto-only | 14,047 | 3,072 | 0 | 3,072 | 0 |
+| zig ([zat](https://tangled.sh/@zzstoatzz.io/zat) + [k256](https://tangled.sh/@zzstoatzz.io/k256)) | full pipeline | 9,845 | 3,072 | 0 | 3,072 | 0 |
+| zig (zat + k256) | crypto-only | 9,818 | 3,072 | 0 | 3,072 | 0 |

-Go leads sig verification by ~1.5x. indigo uses [decred/dcrd](https://github.com/decred/dcrd/tree/master/dcrec/secp256k1) — a highly optimized secp256k1 implementation with specialized 10×26-bit field arithmetic. zig uses [k256](https://tangled.sh/@zzstoatzz.io/k256) with GLV endomorphism, precomputed base point tables, and Jacobian point arithmetic, on top of zig stdlib's fiat-crypto field operations.
+Go leads sig verification by ~1.5x. indigo uses [decred/dcrd](https://github.com/decred/dcrd/tree/master/dcrec/secp256k1) — a highly optimized secp256k1 implementation with specialized 10×26-bit field arithmetic and NAF point multiplication. k256 v0.0.2 uses the same 10×26-bit field representation, GLV endomorphism, precomputed base point tables, and Jacobian point arithmetic. the remaining gap is primarily stdlib scalar operations (~42% of verify time) and normalize overhead in the field arithmetic.

 the crypto-only vs full-pipeline numbers being nearly identical confirms ECDSA is the bottleneck, not CBOR re-encoding overhead.

···

 | lang | SDK | version | CBOR engine | CAR engine |
 |------|-----|---------|-------------|------------|
-| zig | [zat](https://tangled.sh/@zzstoatzz.io/zat) | 0.2.2 | hand-rolled | hand-rolled (+ SHA-256 CID verify, size limits) |
+| zig | [zat](https://tangled.sh/@zzstoatzz.io/zat) v0.2.2 + [k256](https://tangled.sh/@zzstoatzz.io/k256) v0.0.2 | — | hand-rolled | hand-rolled (+ SHA-256 CID verify, size limits) |
 | rust | raw (minicbor + bumpalo) | — | [minicbor](https://crates.io/crates/minicbor) (zero-copy) | hand-rolled (sync) |
 | rust | [jacquard](https://github.com/rsform/jacquard) | 0.9 | [ciborium](https://crates.io/crates/ciborium) (header) + [serde_ipld_dagcbor](https://crates.io/crates/serde_ipld_dagcbor) (body) | [iroh-car](https://crates.io/crates/iroh-car) (async) |
 | go | raw (fxamacker/cbor) | — | [fxamacker/cbor](https://github.com/fxamacker/cbor) | hand-rolled (sync, no CID verify) |
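
the rounded ratios quoted in the updated prose (~15x decode gap, ~2x CID verification overhead, ~1.5x sig-verify gap) can be re-derived from the new table rows. a minimal sketch, assuming only the medians shown in this diff; the variable names and the `ratio` helper are illustrative and not part of the benchmark harness:

```python
# Medians copied from the "+" rows of the tables in this diff.
verified_fps = {"zig (zat)": 235_049, "go (indigo)": 15_587}
decode_only_zat_fps = 529_424  # zig (zat, arena reuse), no CID hash verification
sig_verifies_per_sec = {"go (indigo)": 15_128, "zig (zat + k256)": 9_845}


def ratio(faster: float, slower: float) -> float:
    """Return how many times faster `faster` is than `slower`, to one decimal."""
    return round(faster / slower, 1)


# Production-correct decode: zat vs indigo (the "~15x" in the prose).
decode_gap = ratio(verified_fps["zig (zat)"], verified_fps["go (indigo)"])

# Cost of CID hash verification for zat (the "~2x" in the fairness notes).
cid_overhead = ratio(decode_only_zat_fps, verified_fps["zig (zat)"])

# Signature verification, full pipeline: indigo vs zat + k256 (the "~1.5x").
sig_gap = ratio(
    sig_verifies_per_sec["go (indigo)"],
    sig_verifies_per_sec["zig (zat + k256)"],
)

print(decode_gap, cid_overhead, sig_gap)
```

these are single-run medians, so the first decimal place should be read as indicative rather than exact.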