commits

root cause: every call to verify() ran
`AffinePoint.fromStdlib(public_key.affineCoordinates())`, and stdlib's
Secp256k1.affineCoordinates() unconditionally inverts Z, even when the
point was decoded from SEC1 (where Z is always 1). that field inversion
cost ~12 µs per call, which Tracy instrumentation showed was 19% of the
verify budget. it was pure waste, since the result is deterministic per
PublicKey.

fix: PublicKey now caches an AffinePoint at fromSec1 time, so the field
inversion is paid once at key construction and costs nothing per verify.
the cache adds ~80 bytes per PublicKey (2 × Fe = 10 × u64), which is
negligible.
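A minimal Python sketch of the caching pattern. Python big ints stand in for the Zig field type, and `PublicKey`, `from_affine`, and `jacobian_to_affine` here are hypothetical names, not the library's API. It assumes the usual Jacobian convention x = X/Z², y = Y/Z³ and needs Python 3.8+ for `pow(z, -1, p)`:

```python
# Sketch only: Python big ints stand in for the library's Zig field type.
P = 2**256 - 2**32 - 977  # secp256k1 field prime


def jacobian_to_affine(X, Y, Z):
    """Jacobian (X, Y, Z) represents the affine point (X / Z^2, Y / Z^3)."""
    zinv = pow(Z, -1, P)  # the expensive part: one field inversion
    zinv2 = zinv * zinv % P
    return X * zinv2 % P, Y * zinv2 % P * zinv % P


class PublicKey:
    """Hypothetical stand-in: converts to affine once, at construction."""

    def __init__(self, X, Y, Z=1):
        self.jacobian = (X, Y, Z)
        # Paid once per key; a verify routine would read self.affine
        # instead of re-deriving it (and re-inverting Z) on every call.
        self.affine = jacobian_to_affine(X, Y, Z)

    @classmethod
    def from_affine(cls, x, y):
        # A point decoded from SEC1 bytes already has Z = 1.
        return cls(x, y, 1)
```

The design choice is simply moving a deterministic computation off the hot path; in this sketch, a SEC1-decoded key's cached value is the decoded (x, y) itself.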

signature change: verify_mod.verify() now takes an AffinePoint instead of
a Secp256k1. this is soft-breaking only for direct callers of the
low-level verify function; the public high-level APIs
(Signature.verifyMsg, Signature.verifyPrehashed, PublicKey.fromSec1) are
unchanged.

measured (ReleaseFast, M1, 200k iterations × 8 warm runs):
before: mean 18,630 v/s, ~52.7 µs/op
after: mean 23,299 v/s, ~42.9 µs/op
delta: +25% mean, +11.6% worst-case (worst-after vs best-before)
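The relationship between the two delta figures can be written down directly. Here is a small Python sketch, assuming per-run throughputs in verifies/sec are available (`compare_runs` is a hypothetical helper, not part of scripts/bench_verify.zig):

```python
from statistics import mean


def compare_runs(before, after):
    """Summarize per-run throughputs (verifies/sec) from two builds.

    Returns (mean_delta, worst_case_delta) as fractions: the speedup of the
    means, and a conservative bound pitting the slowest 'after' run against
    the fastest 'before' run.
    """
    mean_delta = mean(after) / mean(before) - 1.0
    worst_case_delta = min(after) / max(before) - 1.0
    return mean_delta, worst_case_delta
```

With only the two reported means, the first value is 23,299 / 18,630 - 1 ≈ 0.25, the +25% above. The worst-case figure needs the individual run results, which aren't listed here.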

added correctness safety net:
- tests/verify_test.zig: 2 new stress tests that run under the standard
  `zig build test`:
  - "stress: 2000 random verify cases match stdlib exactly":
    randomized (keypair, msg, signature) triples; bit-exact agreement
    with stdlib verify required
  - "stress: 500 corrupted signature cases match stdlib exactly":
    random bit-flip corruptions; rejection parity with stdlib
  these catch regressions in scalar reduction, field arithmetic, table
  indexing, sign handling, and edge cases before a benchmark would.
all 42 tests pass pre- and post-change.
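The differential shape of these tests can be sketched in Python. This stand-in rests on stated assumptions: a textbook affine ECDSA (`verify_ref`) plays the role of the stdlib reference, and a Shamir's-trick variant (`verify_fast`) plays the optimized path. The function names, SHA-256 message hashing, and case counts are illustrative, not taken from tests/verify_test.zig:

```python
import hashlib
import random

# secp256k1: y^2 = x^3 + 7 over F_P; G generates a group of prime order N.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(p1, p2):
    """Affine point addition; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P

def mul(k, point):
    """Double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result

def shamir(a, p1, b, p2):
    """a*p1 + b*p2 with one shared doubling chain (Shamir's trick)."""
    choices = {(1, 1): add(p1, p2), (1, 0): p1, (0, 1): p2}
    result = None
    for i in reversed(range(max(a.bit_length(), b.bit_length()))):
        result = add(result, result)
        result = add(result, choices.get(((a >> i) & 1, (b >> i) & 1)))
    return result

def digest(msg):
    return int.from_bytes(hashlib.sha256(msg).digest(), "big")

def sign(d, msg, rng):
    """Textbook ECDSA signing with a random nonce (test fixtures only)."""
    while True:
        k = rng.randrange(1, N)
        r = mul(k, G)[0] % N
        s = pow(k, -1, N) * (digest(msg) + r * d) % N
        if r and s:
            return r, s

def _verify(pub, msg, sig, combine):
    r, s = sig
    if not (0 < r < N and 0 < s < N):
        return False
    w = pow(s, -1, N)
    R = combine(digest(msg) * w % N, r * w % N, pub)
    return R is not None and R[0] % N == r

def verify_ref(pub, msg, sig):
    # Reference path: two independent scalar multiplications.
    return _verify(pub, msg, sig, lambda u1, u2, q: add(mul(u1, G), mul(u2, q)))

def verify_fast(pub, msg, sig):
    # Candidate path under test: one joint Shamir loop.
    return _verify(pub, msg, sig, lambda u1, u2, q: shamir(u1, G, u2, q))

def stress(random_cases, corrupt_cases, seed=1):
    """Differential test: the candidate must agree with the reference."""
    rng = random.Random(seed)
    for i in range(random_cases + corrupt_cases):
        d = rng.randrange(1, N)
        pub, msg = mul(d, G), rng.getrandbits(256).to_bytes(32, "big")
        r, s = sign(d, msg, rng)
        if i < random_cases:
            assert verify_ref(pub, msg, (r, s))  # valid signatures must verify
        else:
            bit = rng.randrange(512)  # flip one random bit of r or s
            if bit < 256:
                r ^= 1 << bit
            else:
                s ^= 1 << (bit - 256)
        assert verify_fast(pub, msg, (r, s)) == verify_ref(pub, msg, (r, s))
```

Comparing results instead of only asserting rejection means an arithmetic bug in the fast path shows up as a mismatch on either valid or corrupted inputs.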

added scripts/bench_verify.zig + `zig build bench-verify` target for
reproducible throughput measurement on future optimization work.

3w ago

zzstoatzz +1

9ce3161e

chore: bump version to 0.0.4 v0.0.4

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2mo ago

zzstoatzz +1

b9b79ea2

feat: 5×52-bit field, Fermat scalar inversion, type cleanup

rewrite the field element from 10×26-bit to 5×52-bit unsaturated limbs
(ported from libsecp256k1 field_5x52_int128_impl.h). a schoolbook mul
now needs 25 limb products instead of 100, for roughly a 2x field
speedup on arm64.
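The product counts follow from schoolbook multiplication: n limbs per operand give n² partial products, so 5 × 5 = 25 versus 10 × 10 = 100. Below is a Python sketch of just that counting, plus a naive reduction using 2^256 ≡ 2^32 + 977 (mod p). It is not the ported libsecp256k1 routine and leaves out the carry and magnitude bookkeeping a real implementation needs; `to_limbs`, `schoolbook`, and `reduce_p` are hypothetical names:

```python
P = 2**256 - 2**32 - 977  # secp256k1 field prime
FOLD = 0x1000003D1        # 2^256 mod P, i.e. 2^32 + 977


def to_limbs(a, bits, count):
    """Split a into `count` little-endian limbs of `bits` bits each."""
    mask = (1 << bits) - 1
    return [(a >> (bits * i)) & mask for i in range(count)]


def schoolbook(a_limbs, b_limbs, bits):
    """Column-wise limb product; returns (full product, #partial products)."""
    n = len(a_limbs)
    cols = [0] * (2 * n - 1)
    products = 0
    for i in range(n):
        for j in range(n):
            cols[i + j] += a_limbs[i] * b_limbs[j]
            products += 1
    return sum(c << (bits * k) for k, c in enumerate(cols)), products


def reduce_p(x):
    """Fold bits above 2^256 back in using 2^256 = 2^32 + 977 (mod P)."""
    while x >> 256:
        x = (x & ((1 << 256) - 1)) + (x >> 256) * FOLD
    return x - P if x >= P else x
```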

add Fermat scalar inversion s^(n-2) via addition chain (253 sq + 40 mul),
replacing stdlib divstep (769 iterations). ported from secp256k1-voi.
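Since n is prime, Fermat's little theorem gives s⁻¹ ≡ s^(n-2) (mod n). Below is a Python sketch that uses plain left-to-right square-and-multiply rather than the secp256k1-voi addition chain, with operation counters for comparison (`fermat_inverse` is a hypothetical name):

```python
# secp256k1 group order (prime).
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def fermat_inverse(s, n=N):
    """Return (s^(n-2) mod n, squarings, multiplies) for prime n.

    Left-to-right square-and-multiply over the bits of n - 2; this is
    NOT the tuned addition chain the commit ports from secp256k1-voi.
    """
    base = s % n
    if base == 0:
        raise ZeroDivisionError("0 has no inverse mod n")
    result, squarings, multiplies = base, 0, 0
    for bit in bin(n - 2)[3:]:  # skip '0b' and the leading 1 bit
        result = result * result % n
        squarings += 1
        if bit == "1":
            result = result * base % n
            multiplies += 1
    return result, squarings, multiplies
```

Plain binary exponentiation needs 255 squarings and one multiply per remaining set bit of n - 2. Because the top 127 bits of n - 2 are all ones, that comes to well over 100 multiplies, which is the overhead a tuned chain like the 253 sq + 40 mul one avoids by reusing precomputed powers.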

lazy runtime initialization for the 32×256 base point table (the comptime
interpreter can't handle u128 arithmetic at this scale).

rename Fe26→Fe and AffinePoint26→AffinePoint so the names match the
5×52 implementation. remove the redundant P1/P2/P3 constants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2mo ago

zzstoatzz +1

629313e6

feat: direct base table mul + projective-table GLV v0.0.3

- extend G_TABLE from [16][256] to [32][256] for full 256-bit scalar coverage
- u1*G: direct byte-at-a-time lookup, no GLV split, zero doublings
- u2*Q: Jacobian precompute tables, no batchToAffine field inversion
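The direct u1*G lookup works as follows. With a table T[i][b] = b · 256^i · G, any scalar's 32 little-endian bytes select 32 entries whose sum is u1·G, with no doublings. The Python sketch below rests on a few assumptions: affine adds stand in for the mixed adds the v0.0.1 commit mentions, and the table is built lazily on first use, in the spirit of the runtime initialization noted in the 5×52 commit. `g_table`, `mul_g`, and `mul_ref` are hypothetical names:

```python
# secp256k1: y^2 = x^3 + 7 over F_P; G has prime order N.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(p1, p2):
    """Affine point addition; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P

def mul_ref(k, point):
    """Plain double-and-add, used only to cross-check mul_g."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result

_G_TABLE = None  # built on first use instead of at import time

def g_table():
    """Return table with table[i][b] == b * 256**i * G (None when b == 0)."""
    global _G_TABLE
    if _G_TABLE is None:
        table, base = [], G
        for _ in range(32):
            row, acc = [None], None
            for _ in range(255):
                acc = add(acc, base)
                row.append(acc)
            table.append(row)
            base = add(acc, base)  # 255*base + base == 256*base
        _G_TABLE = table
    return _G_TABLE

def mul_g(k):
    """k*G from 32 byte-indexed lookups: at most 31 real adds, no doublings."""
    table, result = g_table(), None
    for i, byte in enumerate((k % N).to_bytes(32, "little")):
        result = add(result, table[i][byte])  # byte 0 -> None -> no-op
    return result
```

The trade-off is 32 × 256 = 8192 precomputed entries in exchange for removing every doubling from the base-point side.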

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2mo ago

zzstoatzz +1

24bba1cd

feat: 10×26-bit field arithmetic (Fe26) v0.0.2

Replace stdlib 4×64-bit Montgomery field with direct 10×26-bit
representation for secp256k1. All point arithmetic, batch affine
conversion, endomorphism, and verification now operate in Fe26.

~9% faster than the v0.0.1 baseline, using a safe normalize-on-output strategy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2mo ago

zzstoatzz +1

473ce55d

feat: precomputed base table + jacobian point arithmetic v0.0.1

Separate base-point and public-key multiply paths:
- u1*G via 16x256 comptime byte table (~32 mixed adds, zero doublings)
- u2*Q via 2-way Jacobian Shamir (a=0 dbl 2M+5S, mixed add 7M+4S)
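Both costs match standard formulas from the Explicit-Formulas Database: dbl-2009-l for a = 0, and madd-2007-bl for Jacobian + affine. Whether the library uses exactly these variants is an assumption. The Python sketch below marks each multiplication (M) and squaring (S) and checks the results against affine arithmetic; `dbl`, `madd`, and `to_affine` are hypothetical names:

```python
# secp256k1 field prime and base point (curve y^2 = x^3 + 7, so a = 0).
P = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(p1, p2):
    """Affine reference addition (finite points, no inverse pairs)."""
    (x1, y1), (x2, y2) = p1, p2
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P

def to_affine(X, Y, Z):
    zinv = pow(Z, -1, P)
    return X * zinv * zinv % P, Y * zinv * zinv * zinv % P

def dbl(X1, Y1, Z1):
    """Jacobian doubling for a = 0 (dbl-2009-l): 2M + 5S."""
    A = X1 * X1 % P                          # S
    B = Y1 * Y1 % P                          # S
    C = B * B % P                            # S
    D = 2 * ((X1 + B) ** 2 - A - C) % P      # S
    E = 3 * A % P
    F = E * E % P                            # S
    X3 = (F - 2 * D) % P
    Y3 = (E * (D - X3) - 8 * C) % P          # M
    Z3 = 2 * Y1 * Z1 % P                     # M
    return X3, Y3, Z3

def madd(X1, Y1, Z1, x2, y2):
    """Jacobian + affine addition (madd-2007-bl): 7M + 4S.

    Assumes distinct, finite, non-inverse inputs (no special cases).
    """
    Z1Z1 = Z1 * Z1 % P                       # S
    U2 = x2 * Z1Z1 % P                       # M
    S2 = y2 * Z1 % P * Z1Z1 % P              # 2M
    H = (U2 - X1) % P
    HH = H * H % P                           # S
    I = 4 * HH % P
    J = H * I % P                            # M
    r = 2 * (S2 - Y1) % P
    V = X1 * I % P                           # M
    X3 = (r * r - J - 2 * V) % P             # S
    Y3 = (r * (V - X3) - 2 * Y1 * J) % P     # 2M
    Z3 = ((Z1 + H) ** 2 - Z1Z1 - HH) % P     # S
    return X3, Y3, Z3
```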

Also set version to 0.0.1 for patch-level releases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2mo ago

zzstoatzz +1

05a207aa

feat: optimized secp256k1 ECDSA verification

3 algorithmic optimizations over zig stdlib, no assembly:

1. endomorphism via 1 field multiply (not ~65 doublings)
2. single 4-way Shamir loop (128 doublings, not 256)
3. projective-space comparison (no field inversion)
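Items 1 and 3 can be sketched in a few lines of Python; GLV scalar splitting and the 4-way loop of item 2 are omitted. BETA and LAMBDA are the standard secp256k1 endomorphism constants as used by libsecp256k1, with λ·(x, y) = (β·x, y). `endo` and `r_matches` are hypothetical names:

```python
# secp256k1 parameters (y^2 = x^3 + 7 over F_P, base point G of order N).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
# Cube roots of unity mod P and mod N, paired so LAMBDA*(x, y) == (BETA*x, y).
BETA = 0x7AE96A2B657C07106E64479EAC3434E99CF0497512F58995C1396C28719501EE
LAMBDA = 0x5363AD4CC05C30E0A5261C028812645A122E22EA20816678DF02967C1B23BD72

def add(p1, p2):
    """Affine point addition; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P

def mul(k, point):
    """Double-and-add scalar multiplication (reference only)."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result

def endo(point):
    """LAMBDA * point via a single field multiplication."""
    x, y = point
    return BETA * x % P, y

def r_matches(r, X, Z):
    """True iff (X / Z^2 mod P) mod N == r, computed without inverting Z.

    Since P > N, the affine x can be r or r + N; both are tried.
    """
    X, zz = X % P, Z * Z % P
    if r * zz % P == X:
        return True
    return r + N < P and (r + N) * zz % P == X
```

The r + N branch exists because p > n: an x-coordinate in [n, p) also reduces to r, so a comparison that skips the inversion has to try both candidates.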

3.3x faster than stdlib on a 3072-entry atproto corpus.
drop-in replacement, API-compatible with std.crypto.sign.ecdsa.EcdsaSecp256k1Sha256.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2mo ago

chore: bump version to 0.0.5 v0.0.5 main

9085eb1c

zzstoatzz

cache PublicKey affine form — 22% verify throughput win

4d04dfa0

zzstoatzz
