docs: update devlog with CID verification and go-raw results

+31 -13

1 changed file

devlog/002-firehose-and-benchmarks.md

@@ -29,35 +29,49 @@
 - **zero-copy everywhere** — CBOR strings and byte strings are slices into the input buffer, not copies. CIDs reference the raw bytes directly. the only allocations are for array/map containers (which go into the arena).
 - **inline map key reading** — CBOR map keys in DAG-CBOR are always text strings, so we inline the key read instead of going through the full `decodeAt` → `Value` union construction per key.
 
+### CID hash verification (0.2.1)
+
+`car.read()` now SHA-256 hashes each block and compares against the digest in the CID. this is the correct behavior for untrusted data from the network — it proves block content wasn't corrupted or tampered with. `readWithOptions(.{ .verify_block_hashes = false })` skips verification for trusted local data.
+
+of the SDKs we benchmarked, only zat and go's indigo (via go-car) verify CID hashes. rust's iroh-car and python's libipld do not.
+
 ### round-robin host rotation (0.1.6)
 
 both clients now rotate through multiple hosts on reconnect. the firehose defaults to `bsky.network` plus three `firehose.network` regional endpoints. jetstream defaults to 12+ hosts. backoff resets when switching to a fresh host.
 
 ## the benchmarks
 
-we built [atproto-bench](https://tangled.sh/@zzstoatzz.io/atproto-bench) — a cross-SDK benchmark that captures ~10 seconds of live firehose traffic, then decodes the full corpus with four SDKs.
+we built [atproto-bench](https://tangled.sh/@zzstoatzz.io/atproto-bench) — a cross-SDK benchmark that captures ~10 seconds of live firehose traffic, then decodes the full corpus with each SDK.
 
 every SDK does the same work per frame: decode CBOR header → decode CBOR payload → parse CAR → decode every CAR block as DAG-CBOR. block counts and error counts are reported per SDK so you can verify parity. per-pass variance (min/median/max) is reported so you can see how stable the numbers are.
 
 the corpus is captured with a CBOR header peek (check `t == "#commit"` and `ops` is non-empty) using zat's CBOR decoder. this is standard CBOR parsing — not zat's typed firehose decoder — but it does mean frames that zat's CBOR decoder rejects won't appear in the corpus.
 
-the original version of these benchmarks had work asymmetry: zat only decoded op-linked blocks (~2.3k per corpus), while rust and go decoded all CAR blocks (~23k). python parsed CAR structure but didn't iterate blocks. the numbers below are from the corrected version where all SDKs decode every block.
-
-### results
+### results: production-correct (with CID verification)
 
 3,298 frames (16.2 MB), 5 measured passes, macOS arm64 (M3 Max):
 
 | SDK | frames/sec (median) | MB/s | blocks/frame |
 |-----|--------:|-----:|-----:|
-| zig (zat, arena reuse) | 628,091 | 3,044.8 | 9.98 |
-| zig (zat, alloc per frame) | 559,825 | 2,662.0 | 9.98 |
+| zig (zat, arena reuse) | 311,428 | 1,482.8 | 9.98 |
+| go (indigo) | 15,560 | 75.3 | 9.98 |
+
+both SDKs: 0 errors. zat is ~20x faster than indigo when both do the full correct work (decode + SHA-256 CID verification per block).
+
+### results: decode-only (no CID verification)
+
+| SDK | frames/sec (median) | MB/s | blocks/frame |
+|-----|--------:|-----:|-----:|
+| zig (zat, arena reuse) | 630,543 | 3,094.7 | 9.98 |
+| zig (zat, alloc per frame) | 525,906 | 2,552.0 | 9.98 |
 | rust (raw, arena reuse) | 244,113 | 1,171.0 | 9.98 |
 | rust (raw, alloc per frame) | 186,962 | 919.4 | 9.98 |
 | rust (jacquard) | 47,881 | 238.9 | 9.98 |
+| go (raw, fxamacker/cbor) | 41,398 | 200.7 | 9.98 |
 | python (atproto) | 29,675 | 146.1 | 9.98 |
-| go (indigo) | 11,548 | 58.0 | 9.98 |
+| go (indigo) | 15,560 | 75.3 | 9.98 |
 
-all SDKs: 0 errors. run-to-run variance is ~30-40% — compare ratios within a single run, not across runs.
+all SDKs: 0 errors. run-to-run variance is ~30-40% — compare ratios within a single run, not across runs. indigo's number is the same in both tables because go-car v1 always verifies.
 
 ### why zat is fast
 
@@ -79,15 +93,19 @@
 
 the difference between these two (~5x) is entirely SDK architecture, not language. the remaining difference between rust (raw) and zig (~2.5x) is language-level: enum layout, arena implementation, codegen.
 
-### python
+### how architecture affects go
 
-python's atproto SDK uses libipld (Rust via PyO3) under the hood, which does the entire CAR parse + per-block DAG-CBOR decode in one synchronous C-extension call. python beats jacquard because libipld avoids async overhead and uses a different (faster) Rust CBOR library internally.
+we include two go implementations:
 
-### go
+**go (raw)** uses fxamacker/cbor (struct-tag-based decode, no reflection for known types), a hand-rolled sync CAR parser that skips CID hash verification, and no indigo dependency. result: ~41k fps — ~2.7x faster than indigo.
+
+**go (indigo)** uses cbor-gen (code-generated, already reflection-free at the frame level) but pays for go-car's per-block SHA-256 CID verification and cbornode's reflection-based DAG-CBOR decode via the unmaintained refmt library. result: ~15k fps.
+
+the go-raw improvement comes from two things: a faster per-block CBOR library (fxamacker vs refmt) and skipping CID hashing. GC pressure is the fundamental ceiling in Go — every string, byte slice, and decoded value is heap-allocated, and Go has no arena allocator.
 
-indigo — bluesky's own production relay — is the slowest. go-car is synchronous (no async overhead excuse), and cbor-gen is code-generated (no reflection). the cost is GC pressure: every string, byte slice, and block is a heap allocation that the garbage collector has to sweep. at ~10 blocks/frame, that's a lot of short-lived objects per decode.
+### python
 
-indigo handles the live firehose fine at ~1k events/sec. but it explains why bluesky runs beefy relay infrastructure.
+python's atproto SDK uses libipld (Rust via PyO3) under the hood, which does the entire CAR parse + per-block DAG-CBOR decode in one synchronous C-extension call. python beats jacquard because libipld avoids async overhead and uses a different (faster) Rust CBOR library internally.
 
 ### does this matter?
 
