docs: add correctness parity analysis and gap decomposition

+37 -1

1 changed file

README.md

| go (indigo) | `evt.Deserialize` → typed `RepoCommit` via code-gen CBOR → `car.NewBlockReader` (+ SHA-256 verify) → `cbornode.DecodeInto` per block |
| python | `Frame.from_bytes` + `parse_subscribe_repos_message` → `CAR.from_bytes` (libipld decodes all blocks internally) |

## correctness parity

we traced the full decode path of every SDK to check that no SDK wins by skipping correctness work.

**what both zat and indigo do per frame:**
- decode the full CBOR payload (every commit field: repo, rev, ops, timestamp, etc.)
- parse the CAR header and every block section
- parse each block's CID structure (version, codec, multihash)
- SHA-256 hash each block and compare the result against the CID's digest
- decode every block as DAG-CBOR

**what neither side does:**
- DAG-CBOR deterministic-encoding validation (sorted map keys, minimally encoded integers); indigo's refmt-based decoder does not check this
- signature verification: separate from decode, not measured here
- MST validation: separate from decode, not measured here

**the only asymmetry:** indigo enforces size limits on CBOR map lengths and a 2MB cap on the `blocks` field. these are integer comparisons, so effectively free.

the ~20x gap between zat and indigo (both with CID verification) is therefore entirely implementation cost, not a difference in correctness work.

## where the ~20x comes from

we traced indigo's decode path at the instruction level. the cost compounds from several architectural differences:

| factor | indigo | zat | approx. cost multiplier |
|--------|--------|-----|-------------------------|
| CBOR decode | refmt: token pump → reflection → `reflect.SetMapIndex` per entry | hand-written, direct dispatch | ~3-4x |
| string/byte handling | a Go `string` heap allocation per value (repo, rev, path, action, per-block keys) | zero-copy slices into the input buffer | ~2-3x |
| memory management | per-object GC'd heap allocation; every map, array, and int is boxed | arena allocator, 24-byte `Value` union | ~2-3x |
| CAR block reads | `make([]byte, section_len)` + copy per block; CID parsed twice (once to read, once to verify) | reads directly from the input slice; CID parsed once | ~1.5x |
| blocks field | `make([]uint8, len)` + `io.ReadFull` copies the entire CAR payload | slices into the input buffer | ~1.2x |

these factors multiply: refmt's reflection overhead × per-value heap allocation × GC pressure × byte copying gives ~20x on this workload. taking the low end of each range, 3 × 2 × 2 × 1.5 × 1.2 ≈ 21.6x; the high ends would give ~65x, which suggests the rows overlap rather than compounding independently (the string-handling and memory-management rows partly count the same heap allocations).

note: indigo's `cbor-gen` (code-generated unmarshal for the commit struct) is fast. the bottleneck is `cbornode.DecodeInto` (refmt/reflection) for the per-block DAG-CBOR decode, which runs once per block, ~10 times per frame.

## fairness notes

- **CID verification**: only zat and indigo verify block hashes. for zat this roughly halves throughput (311k fps with verification vs 630k fps without). the decode-only table exists for architectural comparison, but the production-correct table is the one that matters for real-world use
- **zig** and **rust (raw)** both use arena allocation + zero-copy string/byte decoding. the "alloc per frame" variants are the fair cross-language comparison; "arena reuse" shows the production pattern
- **rust (jacquard)** is the real AT Protocol SDK that rust developers use. it pays for serde-based owned deserialization (`String`, `BTreeMap`), async CAR parsing (a tokio poll/wake per block via iroh-car), and per-object heap allocation
- **go (raw)** uses fxamacker/cbor (no reflection for known struct types), a hand-rolled synchronous CAR parser (no CID hash verification), and no indigo dependency. GC pressure remains the fundamental constraint: Go's experimental arena package (`GOEXPERIMENT=arenas`) is on hold and not recommended for production
- **go (indigo)** (bluesky's own production relay) uses code-generated CBOR unmarshal (no reflection at the frame level) but pays for go-car's per-block CID hash verification and for cbornode's reflection-based DAG-CBOR decode via the unmaintained refmt library
- **python** is faster than jacquard despite being "Python": its hot path is `libipld` (Rust via PyO3), which does the entire CAR parse + per-block DAG-CBOR decode in one synchronous native-extension call
- **error handling**: all SDKs call decode functions that never abort the run on failure; errors are counted and the frame is skipped
- **capture coupling**: the corpus capture tool uses zat's CBOR decoder for the commit-with-ops header peek. this is standard CBOR parsing (not zat's typed firehose decoder), but it does mean frames that zat's CBOR decoder rejects won't appear in the corpus
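the per-block check listed under **correctness parity** (parse the CID, SHA-256 the block, compare digests) can be sketched in Go as below. this is a minimal illustration, not zat's or indigo's code: it assumes binary CIDv1 with a sha2-256 multihash (CIDv0 is not handled), and `parseCIDv1`, `verifyBlock`, and `makeCID` are hypothetical names. the parsed digest is a subslice of the input, so one CID parse is enough to both locate the block and verify it.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

// multicodec / multihash codes from the multiformats tables
const (
	codecDagCBOR = 0x71 // dag-cbor
	mhSHA256     = 0x12 // sha2-256
)

// cid holds the fields of a binary CIDv1. digest aliases the input buffer.
type cid struct {
	version, codec, mhCode uint64
	digest                 []byte
}

// parseCIDv1 parses a binary CIDv1 at the front of buf and returns it with
// the number of bytes consumed. CIDv0 (a bare multihash) is not handled.
func parseCIDv1(buf []byte) (cid, int, error) {
	var c cid
	off := 0
	next := func() (uint64, error) {
		v, n := binary.Uvarint(buf[off:])
		if n <= 0 {
			return 0, errors.New("bad varint")
		}
		off += n
		return v, nil
	}
	var err error
	if c.version, err = next(); err != nil {
		return c, 0, err
	}
	if c.version != 1 {
		return c, 0, fmt.Errorf("unsupported CID version %d", c.version)
	}
	if c.codec, err = next(); err != nil {
		return c, 0, err
	}
	if c.mhCode, err = next(); err != nil {
		return c, 0, err
	}
	size, err := next()
	if err != nil {
		return c, 0, err
	}
	if size > uint64(len(buf)-off) {
		return c, 0, errors.New("truncated digest")
	}
	c.digest = buf[off : off+int(size)] // no copy
	return c, off + int(size), nil
}

// verifyBlock checks that data hashes to the SHA-256 digest recorded in c.
func verifyBlock(c cid, data []byte) error {
	if c.mhCode != mhSHA256 {
		return fmt.Errorf("unsupported multihash 0x%x", c.mhCode)
	}
	sum := sha256.Sum256(data)
	if !bytes.Equal(sum[:], c.digest) {
		return errors.New("CID digest mismatch")
	}
	return nil
}

// makeCID builds a CIDv1 (dag-cbor, sha2-256) for data; demo helper only.
func makeCID(data []byte) []byte {
	sum := sha256.Sum256(data)
	out := []byte{0x01, codecDagCBOR, mhSHA256, 0x20} // version, codec, hash code, 32-byte digest
	return append(out, sum[:]...)
}

func main() {
	block := []byte{0xa1, 0x61, 0x61, 0x01} // DAG-CBOR for {"a": 1}
	c, n, err := parseCIDv1(makeCID(block))
	fmt.Println(n, err, verifyBlock(c, block))                  // 36 <nil> <nil>
	fmt.Println(verifyBlock(c, []byte{0xa1, 0x61, 0x61, 0x02})) // CID digest mismatch
}
```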
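for contrast with the **what neither side does** list, here is a rough sketch of the two deterministic-encoding checks a strict DAG-CBOR decoder would add: every CBOR head must use its shortest encoding, and map keys must be ordered shorter-first, then bytewise. `minimalArg` and `keysSorted` are hypothetical helpers; a real strict decoder would run them on every head and every map while walking each block, which is the extra work both benchmarks skip.

```go
package main

import (
	"bytes"
	"fmt"
)

// minimalArg reports whether a CBOR head (major types 0-5) uses the shortest
// encoding for its argument. info is the low 5 bits of the initial byte; val
// is the decoded argument (an integer value or a string/array/map length).
func minimalArg(info byte, val uint64) bool {
	switch info {
	case 24: // 1-byte argument follows the initial byte
		return val >= 24
	case 25: // 2-byte argument
		return val > 0xff
	case 26: // 4-byte argument
		return val > 0xffff
	case 27: // 8-byte argument
		return val > 0xffffffff
	default: // 0..23 encode the value directly; 28..31 are reserved or indefinite-length
		return info < 24
	}
}

// keysSorted reports whether raw map keys follow DAG-CBOR order: shorter keys
// first, bytewise order among keys of equal length. Duplicates fail.
func keysSorted(keys [][]byte) bool {
	for i := 1; i < len(keys); i++ {
		a, b := keys[i-1], keys[i]
		if len(a) > len(b) || (len(a) == len(b) && bytes.Compare(a, b) >= 0) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(minimalArg(24, 10), minimalArg(24, 200))                      // false true
	fmt.Println(keysSorted([][]byte{[]byte("a"), []byte("b"), []byte("aa")})) // true
	fmt.Println(keysSorted([][]byte{[]byte("aa"), []byte("b")}))              // false
}
```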
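indigo's size caps (**the only asymmetry**) and zat's zero-copy byte handling meet at the same spot: reading a CBOR byte-string head. the sketch below assumes a 2 MiB cap (the README says "2MB"; indigo's exact constant may differ) and uses hypothetical names. it rejects an oversized length with a single integer comparison before touching the payload, and returns a subslice of the input instead of doing a `make` + copy.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// maxBlocksLen is an assumed 2 MiB cap standing in for indigo's "2MB" limit
// on the blocks field; the real constant may differ.
const maxBlocksLen = 2 << 20

var errTooLarge = errors.New("byte string exceeds cap")

// readBytes decodes a CBOR byte string (major type 2) at the front of buf and
// returns a subslice of buf (no copy) plus the number of bytes consumed. The
// cap is enforced with one comparison before the payload is touched.
func readBytes(buf []byte, limit uint64) ([]byte, int, error) {
	if len(buf) == 0 || buf[0]>>5 != 2 {
		return nil, 0, errors.New("not a byte string")
	}
	info := buf[0] & 0x1f
	var n uint64
	hdr := 1
	switch {
	case info < 24:
		n = uint64(info)
	case info == 24 && len(buf) >= 2:
		n, hdr = uint64(buf[1]), 2
	case info == 25 && len(buf) >= 3:
		n, hdr = uint64(binary.BigEndian.Uint16(buf[1:])), 3
	case info == 26 && len(buf) >= 5:
		n, hdr = uint64(binary.BigEndian.Uint32(buf[1:])), 5
	case info == 27 && len(buf) >= 9:
		n, hdr = binary.BigEndian.Uint64(buf[1:]), 9
	default:
		return nil, 0, errors.New("bad or truncated length")
	}
	if n > limit { // the whole size check: one integer comparison
		return nil, 0, errTooLarge
	}
	if n > uint64(len(buf)-hdr) {
		return nil, 0, errors.New("truncated byte string")
	}
	end := hdr + int(n)
	return buf[hdr:end], end, nil
}

func main() {
	input := []byte{0x43, 'c', 'a', 'r', 0xff} // bstr(3) "car" + one trailing byte
	b, used, err := readBytes(input, maxBlocksLen)
	fmt.Println(string(b), used, err) // car 4 <nil>
	b[0] = 'C'                        // b aliases input, so this writes through
	fmt.Println(string(input[1:4]))   // Car

	huge := []byte{0x5a, 0x00, 0x40, 0x00, 0x00} // bstr head claiming 4 MiB
	_, _, err = readBytes(huge, maxBlocksLen)
	fmt.Println(err) // byte string exceeds cap
}
```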
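the "CAR block reads" and "blocks field" rows come down to copying versus slicing. a minimal sketch of a CARv1 walk that slices: it skips the header without decoding it, parses each section's CID once (only far enough to find where the block data starts), and returns views into the input buffer. it assumes binary CIDv1 CIDs, and `readCAR`, `cidLen`, and `block` are hypothetical names, not go-car or zat APIs.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// block is one CAR section split into CID bytes and block data. Both fields
// alias the input buffer; nothing is copied.
type block struct {
	cid, data []byte
}

// uvarint wraps binary.Uvarint and rejects empty or malformed varints.
func uvarint(buf []byte) (uint64, int, error) {
	v, n := binary.Uvarint(buf)
	if n <= 0 {
		return 0, 0, errors.New("bad varint")
	}
	return v, n, nil
}

// cidLen returns the length of the binary CIDv1 at the front of buf:
// version, codec, multihash code, digest size, digest. CIDv0 is not handled.
func cidLen(buf []byte) (int, error) {
	off := 0
	for i := 0; i < 4; i++ {
		v, n, err := uvarint(buf[off:])
		if err != nil {
			return 0, err
		}
		off += n
		switch {
		case i == 0 && v != 1:
			return 0, fmt.Errorf("unsupported CID version %d", v)
		case i == 3: // v is the digest size
			if v > uint64(len(buf)-off) {
				return 0, errors.New("truncated digest")
			}
			off += int(v)
		}
	}
	return off, nil
}

// readCAR skips the CARv1 header (without decoding it) and returns each
// section as a block. Each CID is parsed once, to find where its data starts.
func readCAR(car []byte) ([]block, error) {
	hlen, n, err := uvarint(car)
	if err != nil || hlen > uint64(len(car)-n) {
		return nil, errors.New("bad CAR header")
	}
	rest := car[n+int(hlen):]
	var out []block
	for len(rest) > 0 {
		slen, n, err := uvarint(rest) // section = CID bytes + block data
		if err != nil || slen > uint64(len(rest)-n) {
			return nil, errors.New("bad section length")
		}
		sec := rest[n : n+int(slen)]
		cl, err := cidLen(sec)
		if err != nil {
			return nil, err
		}
		out = append(out, block{cid: sec[:cl], data: sec[cl:]})
		rest = rest[n+int(slen):]
	}
	return out, nil
}

func main() {
	data := []byte{0xa1, 0x61, 0x61, 0x01} // DAG-CBOR for {"a": 1}
	// placeholder CIDv1 (dag-cbor, sha2-256) with an all-zero digest
	cid := append([]byte{0x01, 0x71, 0x12, 0x20}, make([]byte, 32)...)
	car := []byte{0x01, 0xa0, byte(len(cid) + len(data))} // 1-byte stand-in header, then section length
	car = append(car, cid...)
	car = append(car, data...)

	blocks, err := readCAR(car)
	fmt.Println(len(blocks), err, len(blocks[0].cid), len(blocks[0].data)) // 1 <nil> 36 4
	fmt.Println(&blocks[0].data[0] == &car[len(car)-4])                    // true: no copy
}
```

because every `block` aliases the input, the caller must keep the frame buffer alive and unmodified for as long as the blocks are in use, which is the same constraint an arena-plus-zero-copy design accepts.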
