atproto utils for zig zat.dev
atproto sdk zig
26
fork

Configure Feed

Select the types of activity you want to include in your feed.

at main 83 lines 5.6 kB view raw view rendered
1# the sig-verify saga 2 3the previous devlogs covered decode throughput — zat's CBOR/CAR pipeline vs Go's indigo. this one is about the other half of a relay's hot path: ECDSA signature verification. 4 5every commit on the AT Protocol network is signed. a relay verifying incoming commits needs to: decode the signed commit, strip the `sig` field, re-encode as deterministic DAG-CBOR, SHA-256 hash the result, then verify the ECDSA signature against the account's public key. the corpus is ~100% secp256k1 (the network migrated from P-256 early on). 6 7## starting point 8 9zig's stdlib includes `std.crypto.ecc.Secp256k1` — projective coordinates, constant-time scalar arithmetic, no endomorphism optimization. first attempt: just call `mulPublic` and check. 10 11~2,800 verifies/sec. Go's indigo (using [decred/dcrd](https://github.com/decred/dcrd/tree/master/dcrec/secp256k1)) was doing ~15,000. a 5.4x gap. 12 13## k256 — three optimizations 14 15we built [k256](https://tangled.sh/@zzstoatzz.io/k256), an optimized secp256k1 verifier on top of zig's stdlib scalar and field primitives. 16 17**1. endomorphism (GLV decomposition)** 18 19secp256k1 has an efficient endomorphism: multiplying the x-coordinate by a constant beta is equivalent to multiplying the point by lambda (mod n). this lets you decompose a scalar multiplication `k*P` into `k1*P + k2*beta(P)` where k1 and k2 are half-width (~128-bit) scalars. half the doublings for the dominant operation. 20 21**2. 4-way Shamir (interleaved multi-scalar multiplication)** 22 23ECDSA verification computes `u1*G + u2*Q` where G is the generator and Q is the public key. with endomorphism applied to both, that's four half-width scalar multiplications. instead of computing them separately and adding, Shamir's trick interleaves them in a single pass through the bits — one shared doubling chain, adding precomputed combinations at each step. 24 25**3. projective comparison** 26 27the final check in ECDSA is whether the x-coordinate of the computed point equals `r` (mod n). naive: convert to affine (expensive field inversion), compare. better: compare in projective coordinates — `r * Z == X (mod p)`. this avoids the inversion entirely, which matters when it's on the hot path of every verify. 28 29important subtlety: zig's stdlib uses projective coordinates (x = X/Z), not Jacobian (x = X/Z²). got this wrong initially by assuming Jacobian — reading the stdlib source (`std.crypto.ecc.Secp256k1`) caught it. 30 31these three changes brought us to ~9,800 v/s (k256 v0.0.2). Go was still at ~15,000. 32 33## 5x52-bit field rewrite 34 35profiling showed field multiplication was the bottleneck. zig's stdlib represents secp256k1 field elements as 10 limbs of 26 bits each. multiplication requires 100 partial products (10×10). libsecp256k1 (Bitcoin Core's C library) uses 5 limbs of 52 bits — only 25 partial products, leveraging u128 multiply-accumulate. 36 37ported the `field_5x52_int128_impl.h` routines to zig. the representation change cuts multiply cost by ~4x (25 vs 100 products), and the wider limbs mean fewer carries and less bookkeeping. 38 39this got us to ~15,800 v/s — roughly even with Go (k256 v0.0.3). 40 41## Fermat scalar inversion 42 43the remaining gap was in scalar inversion. zig's stdlib uses the divstep algorithm (binary extended Euclidean) — 769 iterations of conditional moves in a serialized dependency chain. constant-time, designed for signing safety where the scalar is secret. 44 45for verification, all inputs are public. Fermat's little theorem gives us `s^{-1} = s^{n-2} mod n` via an addition chain: 253 squarings + 40 multiplications. ported the chain from [secp256k1-voi](https://gitlab.com/yawning/secp256k1-voi) (generated by [addchain](https://github.com/mmcloughlin/addchain) v0.4.0). 46 47this brought preparsed-key to ~19,100 v/s (k256 v0.0.4). 48 49## adding blacksky's rsky to the bench 50 51before writing any of this up, we wanted to check: what if someone else is doing this better? 52 53[rsky](https://github.com/blacksky-algorithms/rsky) is blacksky's full AT Protocol implementation in Rust — the most complete outside of bluesky's own Go stack. their relay uses RustCrypto's [k256](https://crates.io/crates/k256) (pure Rust, complete addition formulas, GLV endomorphism) for secp256k1, and ciborium + serde_ipld_dagcbor + rs-car-sync for the decode pipeline. 54 55we added both decode and sig-verify benchmarks using their exact crate stack. 56 57### results (single run, M3 Max) 58 59**decode with CID verification (frames/sec)** 60 61| SDK | frames/sec | MB/s | 62|---|---:|---:| 63| zig (zat) | 290,461 | 1,409 | 64| rust (rsky stack) | 38,905 | 187 | 65| go (indigo) | 15,074 | 73 | 66 67**sig-verify — preparsed-key (verifies/sec)** 68 69| SDK | verifies/sec | 70|---|---:| 71| rust (rsky stack) | 20,748 | 72| zig (k256) | 19,110 | 73| go (dcrd) | 17,934 | 74 75RustCrypto's k256 has a slight edge on pure crypto (~9% over zig). the full pipeline gap is larger because serde_ipld_dagcbor is efficient at the CBOR re-encoding step. on decode, zig is 7.5x faster than rsky and 19x faster than Go. 76 77the overall picture: all three SDKs are competitive on signature verification — it's table stakes. decode throughput is where the architectural differences (zero-copy CBOR, arena allocation, hand-rolled CAR parsing) actually compound. 78 79## what's in this release 80 81- **k256 v0.0.4**: 5×52-bit field, Fermat scalar inversion, zig zen cleanup (Fe26→Fe, AffinePoint26→AffinePoint, removed redundant constants) 82- **zat v0.2.3**: changelog backfill for 0.2.1 (CID verification) and 0.2.2 (CAR size limits), this devlog 83- **atproto-bench**: rsky stack added for both decode and sig-verify, three-way comparison