the sig-verify saga#

the previous devlogs covered decode throughput — zat's CBOR/CAR pipeline vs Go's indigo. this one is about the other half of a relay's hot path: ECDSA signature verification.

every commit on the AT Protocol network is signed. a relay verifying incoming commits needs to: decode the signed commit, strip the sig field, re-encode as deterministic DAG-CBOR, SHA-256 hash the result, then verify the ECDSA signature against the account's public key. the corpus is ~100% secp256k1 (the network migrated from P-256 early on).

starting point#

zig's stdlib includes std.crypto.ecc.Secp256k1 — projective coordinates, constant-time scalar arithmetic, no endomorphism optimization. first attempt: just call mulPublic and check.

~2,800 verifies/sec. Go's indigo (using decred/dcrd) was doing ~15,000. a 5.4x gap.

k256 — three optimizations#

we built k256, an optimized secp256k1 verifier on top of zig's stdlib scalar and field primitives.

1. endomorphism (GLV decomposition)

secp256k1 has an efficient endomorphism: multiplying the x-coordinate by a constant beta is equivalent to multiplying the point by lambda (mod n). this lets you decompose a scalar multiplication k*P into k1*P + k2*beta(P) where k1 and k2 are half-width (~128-bit) scalars. half the doublings for the dominant operation.

2. 4-way Shamir (interleaved multi-scalar multiplication)

ECDSA verification computes u1*G + u2*Q where G is the generator and Q is the public key. with endomorphism applied to both, that's four half-width scalar multiplications. instead of computing them separately and adding, Shamir's trick interleaves them in a single pass through the bits — one shared doubling chain, adding precomputed combinations at each step.

3. projective comparison

the final check in ECDSA is whether the x-coordinate of the computed point equals r (mod n). naive: convert to affine (expensive field inversion), compare. better: compare in projective coordinates — r * Z == X (mod p). this avoids the inversion entirely, which matters when it's on the hot path of every verify.

important subtlety: zig's stdlib uses projective coordinates (x = X/Z), not Jacobian (x = X/Z²). got this wrong initially by assuming Jacobian — reading the stdlib source (std.crypto.ecc.Secp256k1) caught it.

these three changes brought us to ~9,800 v/s (k256 v0.0.2). Go was still at ~15,000.

5x52-bit field rewrite#

profiling showed field multiplication was the bottleneck. zig's stdlib represents secp256k1 field elements as 10 limbs of 26 bits each. multiplication requires 100 partial products (10×10). libsecp256k1 (Bitcoin Core's C library) uses 5 limbs of 52 bits — only 25 partial products, leveraging u128 multiply-accumulate.

ported the field_5x52_int128_impl.h routines to zig. the representation change cuts multiply cost by ~4x (25 vs 100 products), and the wider limbs mean fewer carries and less bookkeeping.

this got us to ~15,800 v/s — roughly even with Go (k256 v0.0.3).

Fermat scalar inversion#

the remaining gap was in scalar inversion. zig's stdlib uses the divstep algorithm (binary extended Euclidean) — 769 iterations of conditional moves in a serialized dependency chain. constant-time, designed for signing safety where the scalar is secret.

for verification, all inputs are public. Fermat's little theorem gives us s^{-1} = s^{n-2} mod n via an addition chain: 253 squarings + 40 multiplications. ported the chain from secp256k1-voi (generated by addchain v0.4.0).

this brought preparsed-key to ~19,100 v/s (k256 v0.0.4).

adding blacksky's rsky to the bench#

before writing any of this up, we wanted to check: what if someone else is doing this better?

rsky is blacksky's full AT Protocol implementation in Rust — the most complete outside of bluesky's own Go stack. their relay uses RustCrypto's k256 (pure Rust, complete addition formulas, GLV endomorphism) for secp256k1, and ciborium + serde_ipld_dagcbor + rs-car-sync for the decode pipeline.

we added both decode and sig-verify benchmarks using their exact crate stack.

results (single run, M3 Max)#

decode with CID verification (frames/sec)

SDK	frames/sec	MB/s
zig (zat)	290,461	1,409
rust (rsky stack)	38,905	187
go (indigo)	15,074	73

sig-verify — preparsed-key (verifies/sec)

SDK	verifies/sec
rust (rsky stack)	20,748
zig (k256)	19,110
go (dcrd)	17,934

RustCrypto's k256 has a slight edge on pure crypto (~9% over zig). the full pipeline gap is larger because serde_ipld_dagcbor is efficient at the CBOR re-encoding step. on decode, zig is 7.5x faster than rsky and 19x faster than Go.

the overall picture: all three SDKs are competitive on signature verification — it's table stakes. decode throughput is where the architectural differences (zero-copy CBOR, arena allocation, hand-rolled CAR parsing) actually compound.

what's in this release#

k256 v0.0.4: 5×52-bit field, Fermat scalar inversion, zig zen cleanup (Fe26→Fe, AffinePoint26→AffinePoint, removed redundant constants)
zat v0.2.3: changelog backfill for 0.2.1 (CID verification) and 0.2.2 (CAR size limits), this devlog
atproto-bench: rsky stack added for both decode and sig-verify, three-way comparison

Configure Feed