commits
reverts all four host retention commits that caused production
restart loops. the interaction between reconciliation, dormant
logic, startup jitter, and cold-start ramp with ~2,800 hosts
was not testable via unit tests and each deploy regressed
relay-eval coverage (alternating 0%/97% from kubelet kills).
back to 80eca78 behavior: exhausted hosts stop after 15 failures,
cron handles re-discovery. stable baseline for 24+ hours at 97-99%.
the feature needs a local test harness that validates startup ramp
behavior against a realistic host table before any production deploy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
all ~2,800 subscriber threads were starting DNS+TLS handshakes
simultaneously during startup, starving the HTTP server thread and
causing kubelet to kill the pod on liveness probe timeout.
each subscriber now sleeps a deterministic jitter (0-30s, based on
host_id hash) before its first connection attempt. threads still
spawn quickly (50/batch, 100ms yield) but actual handshakes are
spread across a 30-second window instead of hitting all at once.
jitter only applies to startup — requestCrawl and reconciliation
spawn workers with zero jitter since they're one-at-a-time.
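a minimal sketch of the jitter shape (helper name and hash choice are
assumptions, not the actual zlay code):

    const std = @import("std");

    /// deterministic per host: hash the id and spread first connects
    /// across a 0-30s window.
    fn startupJitterNs(host_id: u64) u64 {
        const h = std.hash.Wyhash.hash(0, std.mem.asBytes(&host_id));
        return (h % 30_000) * std.time.ns_per_ms;
    }

    // in the worker, before the first connection attempt:
    //     std.Thread.sleep(startupJitterNs(host_id));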
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
the previous change kept dormant subscriber threads running forever,
meaning thread count could only go up. dormant now correctly stops
the worker thread (freeing resources) while preserving the DB row
for discovery to re-activate later. reconciliation loop queries only
active hosts — dormant hosts wait for requestCrawl.
separated "don't forget the host" (DB row persists) from "don't stop
the thread" (thread exits on dormancy). removed unused
listReconnectableHosts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Io.Future(void).wait() doesn't exist — futures use .await(io) and
are not threadsafe across fibers. replace startup_future.wait() with
a simple initial sleep (the first reconciliation pass runs 5 min
after startup anyway, well after spawnWorkers finishes).
also use 1-second sleep increments for shutdown responsiveness,
matching the subscriber backoff pattern.
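shape of the incremental sleep, as a sketch (the shutdown flag is an
assumption about how the loop is signalled):

    const std = @import("std");

    /// sleep `interval_s` in 1-second slices so a shutdown request is
    /// noticed within a second instead of at the end of the interval.
    fn sleepResponsive(interval_s: u64, shutdown: *std.atomic.Value(bool)) void {
        var remaining = interval_s;
        while (remaining > 0) : (remaining -= 1) {
            if (shutdown.load(.acquire)) return;
            std.Thread.sleep(std.time.ns_per_s);
        }
    }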
caught by `zig build` (exe target) — `zig build test` misses this
due to lazy analysis when no test references the code path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
subscribers now retry with exponential backoff capped at 30 min
(was 60s cap with hard kill at 15 failures). on successful connect,
backoff resets to 1s and host flips back to active. hosts that fail
15+ consecutive times are marked dormant (observable) but the
subscriber keeps retrying. a reconciliation loop every 5 min
respawns any active/dormant host missing from the worker map.
this eliminates dependence on the external reconnect cron for host
retention — it can be reduced to discovery-only.
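the retry curve, as a sketch (assumes a per-subscriber consecutive-failure
counter; not the actual code):

    const std = @import("std");

    /// 1s, 2s, 4s, ... doubling per consecutive failure, capped at 30 min.
    /// reset the counter on a successful connect.
    fn backoffNs(consecutive_failures: u32) u64 {
        const base_ns: u64 = std.time.ns_per_s;
        const max_ns: u64 = 30 * std.time.ns_per_min;
        const shift: u6 = @intCast(@min(consecutive_failures, 11));
        return @min(base_ns << shift, max_ns);
    }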
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
now that Backend is Io.Threaded, remove references to cross-Io
constraints, Evented fiber context requirements, and Uring thread
warnings that no longer apply. the historical context is preserved
in docs/evented-attempt.md and docs/notes.md.
no behavioral changes — comments only.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ReleaseSafe caught an unreachable panic in posix.setsockopt when a
ConsumerTooSlow kick raced with the websocket server's readLoop thread.
the race: dropSlowConsumer called conn.close() from the broadcast thread
while readLoop (server thread) was about to call setsockopt on the same
socket fd. setsockopt got EBADF, zig stdlib hit unreachable → panic.
under ReleaseFast this was silent undefined behavior — likely existed on
every prior build.
fix: move conn.close() from dropSlowConsumer (broadcast thread) to the
end of writeLoop (consumer's own thread). writeLoop exits when alive is
set to false, drains remaining frames, then closes the connection. this
unblocks readLoop's pending read without racing on socket state.
also bump BUFFER_CAP 8192 → 65536 to reduce ConsumerTooSlow frequency
(cherry-pick of the ee4e368 change onto the current tree).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
the Evented (io_uring fiber) backend has been the source of every major
issue since the 0.16 migration: 8 cross-Io crash classes, a ReleaseSafe
GPF from a zig codegen bug, and a persistent ~10-15% coverage degradation
that nobody could trace. the zig team marks Evented as experimental.
Io.Threaded restores thread-per-PDS (~2,800 OS threads instead of ~35
fibers), which was the proven model on 0.15 at 99%+ coverage. the entire
cross-Io problem class vanishes. ReleaseSafe works again. DNS works
natively. the uring networking patch becomes inert.
one-line change: const Backend = Io.Evented → Io.Threaded.
all io.concurrent() call sites, Io.Future, Io.Mutex, Io.Condition are
backend-agnostic through the std.Io abstraction. pool_io becomes
redundant but harmless (both runtimes are now Threaded).
builds clean: zig build test, zig fmt, and
zig build -Dtarget=x86_64-linux-gnu -Doptimize=ReleaseSafe all pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
the image atcr.io/zzstoatzz.io/zlay:ReleaseFast-zat21-b91382b was built
with zat alpha.21 via a locally-modified build.zig.zon on the Hetzner
build server that was never committed back. this commit reproduces that
state on top of b91382b so the canary behavioral delta vs production is
exactly one commit (1eec324, the FrameWork hostname UAF fix).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FrameWork.hostname was a borrowed slice from sub.options.hostname,
documented as "stable lifetime". it isn't: slurper.runWorker frees
sub.options.hostname after sub.run() returns, but FrameWorks for
that subscriber may still be queued in the frame pool. once the
allocator reuses that memory, pool workers read garbage when
logging chain breaks, host authority decisions, etc.
repro: zlay-reconnect cronjob spawns ~1839 hosts in 134s. some
subscribers churn within that window. corrupted hostnames appear
in logs as DIDs (the freed slot got reused for a DID dup) or with
stack-pointer-shaped bytes overlaying the suffix.
fix: dupe hostname alongside data when submitting to the pool,
free both in processFrame. one extra alloc/free per frame.
forward-only rewind: every commit on main between b91382b and 4f3d1d4
has been superseded or is suspected of being implicated in the
2026-04-09 HTTP / delivery outage. rather than force-pushing history
backward, this commit creates a new snapshot whose tree matches b91382b
exactly, parented on 4f3d1d4. git pull --ff-only continues to work.
the superseded commits remain in ancestry and can be referenced via
the ops-changelog:
- 4f3d1d4 gcLoop: disable malloc_trim, bump interval 10min→1h
- 795cc41 host_authority: slot recovery + pool metrics + preload account count
- bbba92c fix build: drop unused err1 capture in resolveHostAuthority
- 584571a disable keep_alive on host authority resolver pool + log resolve errors
- ee4e368 bump per-consumer buffer 8192→65536 + host_authority reject breakdown
- 31825b2 subscriber: extract prepareFrameWork + add UAF regression test
- 1eec324 fix UAF: dupe FrameWork.hostname per submit (will be re-applied on top)
- 168d9f1 bump websocket.zig + zat: fix requestCrawl POST hang
- fbdffbe mark DB success on did_cache hits
- 3dc21b9 fix gcLoop: silently exited after one tick
- e5f415f update README, CLAUDE.md, Dockerfile for current state
this commit and the two following it (cherry-pick 1eec324 + pin zat
alpha.21) constitute canary 1 per docs/zlay-canary-plan-2026-04-09.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diagnosis of the 2026-04-09 ~10-minute pod-flap cycle. the bbba92c
pod pattern (healthy ~10 min → /metrics and /_readyz stuck → NotReady
→ restart → repeat) lines up exactly with gcLoop's 10-min cadence.
two separable suspects inside gcLoop, either alone sufficient to
flunk probes:
1. dp.gc() holds DiskPersist.mutex for its entire duration (DB
iteration + per-file unlinks, event_log.zig:977-1033). every
frame worker blocks on persist() during gc. this alone explains
the earlier "0.035 events/sec to consumers" measurement.
2. malloc_trim(0) on a ~1.5 GiB RSS process with MALLOC_ARENA_MAX=4.
glibc holds per-arena locks during the free-list walk, stalling
every allocator caller — including the Evented fiber serving
/metrics and /_readyz. long enough to trip probe timeouts.
this is a stabilization commit, not a root-cause fix:
- disable malloc_trim(0) entirely (comment preserved). prefer
MALLOC_MMAP_THRESHOLD_ tuning or an out-of-band maintenance window
if reclaim becomes an issue.
- bump gc_interval_s 10 min → 1 hour. bounds blast radius of the
persist-mutex hold until gc() is properly narrowed.
- add clock_gettime(.MONOTONIC) timing around dp.gc(). next incident
tells us whether dp.gc() itself or something adjacent is the stall.
- new doc: docs/zlay-gcloop-stall-2026-04-09.md with the hypothesis,
code pointers, validation plan, and follow-up work list (mutex
narrowing; broadcaster writeLoop polling as a separate bug).
3dc21b9 (2026-04-06 "fix gcLoop: silently exited after one tick")
is what unmasked this — before that fix gcLoop ran once and died,
so malloc_trim + gc ran exactly once per pod lifetime. after that
fix they run every 10 min, which is when zlay started flapping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bundles four changes from the 2026-04-09 external review (relay
docs/zlay-external-review-2026-04-09.md). all four target the two
unresolved problems from the 2026-04-08 incident: ~99% host_authority
pool rejection and HTTP probe starvation during cold-start spawn.
1. preload effective_account_count in listActiveHostsImpl (item 1)
spawnWorker was doing a per-host blocking DbRequest for
getEffectiveAccountCount(host_id) on every host during cold-start
— ~2,770 round-trips through the DbRequestQueue, each yielding the
spawn fiber. fold the COUNT/JOIN/GROUP BY into the existing batch
query so the value is preloaded into Host.effective_account_count.
addHost (one-off requestCrawl path) keeps the inline fetch since
it's not in the cold-start hot path.
2. resolver slot recovery (item 2)
resolveHostAuthority used to retry resolve() on the SAME pool slot
after a failure. if a slot's std.http.Client gets into any kind of
bad state, the retry is wasted and the slot stays bad forever. on
first-attempt failure, deinit + re-init the slot via the new
recycleHostResolver helper before retrying. directly tests the
leading "poisoned slot" hypothesis without making any zig stdlib
claims. dormant in production while keep_alive=false (no persistent
connections to corrupt) but ready for the next canary.
3. pool/loop metrics (item 4)
six new counters/gauges in broadcaster.zig Stats:
host_resolver_acquire_wait_us_total — pool contention timing
host_resolver_in_use — current slot count held
host_resolver_resets_total — slot recovery firings
host_resolver_resolve_fail_total — first-attempt failures
resolve_loop_resolve_ok_total — background loop ok
resolve_loop_resolve_fail_total — background loop fail
the resolve_loop counters reveal something we've been operating
blind on: the background signing-key resolveLoop has been
log.debug+continue on errors, never measured. when these first
ship, whatever number resolve_loop_resolve_fail shows is the
baseline, NOT a regression — that's the whole point.
4. configurable host_resolver_pool_size (item 5)
was const = 4. heap-allocate from start() based on env var
HOST_RESOLVER_POOL_SIZE (default 4, max 64). with keep_alive=false,
pool width is a real startup throughput knob — bumping it lets more
is_new checks run concurrently during reconnect storms. tune from
   ops based on the new acquire_wait_us metric. (env parsing for this
   knob is sketched at the end of this message.)
5. zat dep bumped to v0.3.0-alpha.23 — surfaces the underlying
std.http.Client.fetch error kind through resolver.resolve, so the
existing sampleLogReject("resolve", did, @errorName(err), ...) call
in resolveHostAuthority will print the actual transport error
(UnknownHostName, ConnectionRefused, TlsAlert, etc.) instead of
always DidResolutionFailed.
not in this batch (per reviewer's "do not yet" list):
- spawn batch loop tuning — slim per-host work first, re-measure
- re-enabling keep_alive=true globally — canary first, after this
metrics shipment lets us see what the broken path returns
- splitting liveness onto a dedicated thread — see if probe flap
survives the slimmed startup fiber first
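sketch of item 4's env parsing (default 4, max 64; function name and the
allocator plumbing are assumptions):

    const std = @import("std");

    fn hostResolverPoolSize(alloc: std.mem.Allocator) usize {
        const raw = std.process.getEnvVarOwned(alloc, "HOST_RESOLVER_POOL_SIZE") catch return 4;
        defer alloc.free(raw);
        const n = std.fmt.parseInt(usize, raw, 10) catch return 4;
        return @min(@max(n, 1), 64);
    }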
584571a tried to discard the first-attempt error via `_ = err1` to
document intent, but zig 0.16 rejects that pattern with "error set is
discarded". build failed on the operator's ReleaseFast pipeline and
slipped past my local `zig build test` because the test binary is
lazy — no test references resolveHostAuthority, so zig never analyzed
the function body. `zig build` (the exe) does reach it via
frame_worker.processFrame and trips the error immediately.
just drop the capture. the comment explaining why only err2 is logged
is preserved.
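the shape of the change, as a self-contained toy (flakyResolve stands in
for the real resolver call):

    const std = @import("std");

    fn flakyResolve() !u32 {
        return error.DidResolutionFailed;
    }

    // zig 0.16 rejects `catch |err1| { _ = err1; ... }`; dropping the
    // capture compiles and keeps the retry-once intent.
    fn resolveWithRetry() !u32 {
        return flakyResolve() catch {
            // first-attempt error deliberately unnamed — only the retry's
            // error (err2 in the real code) is surfaced.
            return flakyResolve();
        };
    }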
effectively all host_authority rejects on 2026-04-08 were in the resolve
branch (39,621 / 40,072 over 48 min). plc.directory is reachable from the pod,
cold resolvers in resolveLoop work fine, and websockets to 2785 PDSes
are healthy — isolates the failure to the pooled + long-lived keep_alive
HTTP path. pool was added on 0.15 (1639565) and never re-validated
after the 0.16 migration (9cc1ba3).
workaround: disable keep_alive on the pool. cost is one TLS handshake
per is_new / host_changed DID, which is low-rate enough to absorb.
keep the pool itself for socket churn savings across fiber callers.
also wire sampleLogReject into the resolve and parse_did branches with
@errorName of the resolver error — previous commit incremented counters
for those branches but never logged, so we had no diagnostic data when
the reject rate spiked. if the workaround doesn't fully fix it we now
see the actual error kind without a second redeploy cycle.
the 8192-entry per-consumer ring = ~33s of headroom at 250fps. pulsar's
60-min snapshot accumulated repeated ConsumerTooSlow kicks within that
window (ops_changelog 2026-04-01). bumping to 65536 gives ~4.4min of
headroom — enough to absorb transient write stalls without dropping
consumers mid-run.
checkPdsHost had five silent-reject branches collapsed into one
failed_host_authority counter, which hid the 100% rejection rate
diagnosed 2026-04-08. split into per-branch counters emitted as
relay_host_authority_reject{branch=...} alongside a 1-in-2048 sampled
warn log so we can tell whether the DID doc lookup is failing, the
endpoint is unparseable, the host isn't in our table, or the resolved
host genuinely differs from the incoming host.
follow-up to 1eec324 (fix UAF: dupe FrameWork.hostname per submit).
the dupe-at-submit logic was inline in FrameHandler.onMessage, which
made it hard to regression-test the invariant. extracted a small
Subscriber method that returns a FrameWork with heap-owned data +
hostname, and added a unit test that:
- builds a FrameWork from a to-be-freed hostname buffer
- asserts the returned slices have distinct pointers from the inputs
- simulates slurper.runWorker teardown by freeing the source hostname
- reads the FrameWork.hostname again — would trip the testing
allocator's use-after-free detection if the dupe was elided
no behavior change at the submit site. tested via zig build test.
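the test's shape, roughly (FrameWork and prepareFrameWork here are
stand-ins, not the real definitions):

    const std = @import("std");

    const FrameWork = struct { data: []u8, hostname: []u8 };

    fn prepareFrameWork(alloc: std.mem.Allocator, data: []const u8, hostname: []const u8) !FrameWork {
        const data_copy = try alloc.dupe(u8, data);
        errdefer alloc.free(data_copy);
        return .{ .data = data_copy, .hostname = try alloc.dupe(u8, hostname) };
    }

    test "FrameWork owns its hostname after the source is freed" {
        const alloc = std.testing.allocator;
        const src = try alloc.dupe(u8, "example.host.test");
        const fw = try prepareFrameWork(alloc, "frame-bytes", src);
        defer alloc.free(fw.data);
        defer alloc.free(fw.hostname);
        try std.testing.expect(fw.hostname.ptr != src.ptr);
        alloc.free(src); // simulate slurper.runWorker teardown
        try std.testing.expectEqualStrings("example.host.test", fw.hostname);
    }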
FrameWork.hostname was a borrowed slice from sub.options.hostname,
documented as "stable lifetime". it isn't: slurper.runWorker frees
sub.options.hostname after sub.run() returns, but FrameWorks for
that subscriber may still be queued in the frame pool. once the
allocator reuses that memory, pool workers read garbage when
logging chain breaks, host authority decisions, etc.
repro: zlay-reconnect cronjob spawns ~1839 hosts in 134s. some
subscribers churn within that window. corrupted hostnames appear
in logs as DIDs (the freed slot got reused for a DID dup) or with
stack-pointer-shaped bytes overlaying the suffix.
fix: dupe hostname alongside data when submitting to the pool,
free both in processFrame. one extra alloc/free per frame.
websocket.zig 3c6794a fixes Handshake.parse hanging on POST requests
with bodies. The previous endsWith("\r\n\r\n") check only matched
header-only GETs, so any POST handed to httpFallback (e.g.
requestCrawl) caused parse to return null indefinitely and the worker
to read the same data forever until the connection's idle timeout.
zat alpha.22 carries the same websocket bump so the transitive
dependency resolves to a single module (otherwise zig links both
hashes and errors with `file exists in modules 'build' and 'build0'`).
Symptom: zlay-reconnect cronjob has been failing for ~12h, never
re-announcing the ~2,950 PDS hosts from atproto-scraping.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
isDbHealthy() is a 30s freshness check on last_db_success, but the
markDbSuccess() call sites only fire on cache misses + the 10-min GC
tick. with a hot did_cache in steady state, miss rate can dip for 30+
seconds, leaving the health flag stale and tripping k8s liveness probes
even though the relay is healthy.
cache hits use DB-derived data, so they're a valid signal that the
data path is functioning. mark success on the fast path. cost is one
clock_gettime + atomic store per ingestion event (~20 ns vDSO).
the GC tick still provides real DB liveness as a backstop.
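the health signal, as a sketch (the real fields live on DiskPersist;
names assumed):

    const std = @import("std");

    var last_db_success = std.atomic.Value(i64).init(0);

    /// called from the did_cache hit path and from the GC tick.
    fn markDbSuccess() void {
        last_db_success.store(std.time.timestamp(), .monotonic);
    }

    /// 30s freshness check read by the liveness handler.
    fn isDbHealthy() bool {
        return std.time.timestamp() - last_db_success.load(.monotonic) < 30;
    }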
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gcLoop was using io.sleep on pool_io (Threaded) from a plain std.Thread.
the first tick happened to succeed, the second hit an error path, and
catch return swallowed it — silently killing the loop. one malloc_trim
fired at the 10-min mark and then nothing for 13.5+ hours.
fix: switch to std.c.nanosleep directly. plain threads can't safely
call into Io scheduler primitives, even on the matching backend, because
they aren't registered with that backend's scheduler.
drop the io parameter from gcLoop since dp.gc() uses its own bound io
internally.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- README: document Evented backend, cross-Io architecture, DbRequestQueue,
link to devlog 008, fix zat dep link, note ReleaseFast requirement
- CLAUDE.md: Evented not Threaded, ReleaseFast not ReleaseSafe
- Dockerfile: fix build flag to ReleaseFast (comments said ReleaseFast
but flag was still ReleaseSafe)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LLVM register allocator bug: under ReleaseSafe, the stack probe and
canary instrumentation cause LLVM to skip materializing the SwitchMessage
address into %rsi before fiber.zig's inline asm context switch. %rsi is
left holding a stale value from Thread.current(), causing a GPF.
Debug, ReleaseFast, and ReleaseSmall all pass. Only ReleaseSafe triggers
it — the combination of optimization + safety instrumentation changes
the code layout enough to expose the miscompilation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Evented + ReleaseSafe GPFs immediately on startup in fiber.zig
contextSwitch — confirmed in production deploy, same as the repro.
this was already documented in devlog 008 and stdlib-patches.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
the SIGSEGV that prompted the Threaded revert was a TCP split mid-CRLF
in the websocket handshake reader (fixed in 9ac64da), not a fiber
context-switch issue. re-enabling Evented + ReleaseSafe to see it through.
if the repro GPF (scripts/repro_evented.zig) hits production code paths,
fallback is ReleaseFast or one-line flip back to Threaded.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
websocket client read() panics when TCP delivers \r at buffer boundary
without \n — line_start advances past pos, causing start > end slice.
under ReleaseFast this was the silent SIGSEGV every 30-90 min.
websocket.zig 9ac64da, zat v0.3.0-alpha.17.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Io.Evented (io_uring fibers) has a probabilistic SIGSEGV in
std.Io.fiber.contextSwitch that crashes every 30-90 min under load.
After 28 commits fixing cross-Io crashes, heap corruption, and mutex
incompatibilities, this stdlib bug is the remaining blocker — and
upstream fiber.zig is unchanged as of dev.3091 with no fix in sight.
Switching to Io.Threaded restores the thread-per-PDS model (~2,800
threads, stable) and lets us use ReleaseSafe again. The Evented work
is preserved in patches/, scripts/repro_evented.zig, and the new
docs/evented-attempt.md for when upstream catches up.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
some PDSes omit the tooBig field from commit events. the lexicon marks it
required, and consumers like hydrant reject frames without it.
resequenceFrame now detects #commit frames and injects tooBig: false when
the field is missing. non-commit frames are unaffected.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
the Evented pg.Pool (ev_db) approach was broken — io_uring netLookup is
unimplemented upstream, so no DNS and no outbound TCP from Evented fibers.
this replaces ev_db with a DbRequestQueue (MPSC FIFO ring buffer) that
routes general DB traffic through pool_io (Threaded) worker threads:
- add DbRequest + DbRequestQueue to event_log.zig (4096-slot ring,
spinlock push, CAS pop, 2 worker threads)
- convert xrpc/admin handlers to typed DbRequest structs with
@fieldParentPtr callbacks that write JSON into stack buffers
- convert slurper: pullHosts on own std.Thread (parallel with
spawnWorkers), addHost phased (DB via queue, HTTP via temp thread)
- convert broadcaster firstSeq to DbRequest
- convert backfiller/cleaner from Io.Future to std.Thread + direct
persist.db access, with cooperative shutdown checks
- remove all *Ev methods and ev_db infrastructure from DiskPersist
- explicit join paths for backfiller/cleaner/db-workers on shutdown
DbRequest.wait() never returns before done — preserves stack lifetime
of caller-embedded request structs during shutdown drain.
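the callback pattern, sketched (type and field names are approximations
of the real DbRequest machinery):

    const std = @import("std");

    const DbRequest = struct {
        run: *const fn (*DbRequest) void,
        done: std.atomic.Value(bool) = std.atomic.Value(bool).init(false),

        /// never returns before done — keeps caller-embedded requests
        /// valid on the caller's stack during shutdown drain.
        fn wait(req: *DbRequest) void {
            while (!req.done.load(.acquire)) std.Thread.yield() catch {};
        }
    };

    const AccountCountRequest = struct {
        base: DbRequest = .{ .run = &run },
        host_id: u64,
        count: u64 = 0,

        fn run(base: *DbRequest) void {
            const self: *AccountCountRequest = @fieldParentPtr("base", base);
            // worker thread: run the pg query here, write into the
            // caller-owned struct, then publish completion.
            self.count = 0; // placeholder for the real query result
            self.base.done.store(true, .release);
        }
    };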
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ensureEvDb() used @panic on initUri failure, so a transient postgres
hiccup during lazy init crashed the entire relay. it now returns !*pg.Pool
and resets state to uninit on failure so the next call retries.
callers handle the error gracefully:
- xrpc handlers: respond 503 ServiceUnavailable
- slurper: skip host on db error (safe default)
- broadcaster firstSeq: fall back to memory history
- admin: use 0 for account count on error
- backfiller/cleaner: log and bail from current run
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pg.Pool.initUri does TCP connects via io_uring, which requires the Evented
event loop to be running. creating it during main() init (before the
scheduler starts) fails with NetworkDown — io_uring can't submit ops yet.
fix: store ev_io/db_url/pool_size config on DiskPersist, create the pool
on first use via ensureEvDb(). uses CAS-based init-once — first fiber to
call it creates the pool, concurrent fibers yield-wait via ev_io.sleep().
also changes backfiller/cleaner from storing *pg.Pool to *DiskPersist,
accessing the pool lazily via self.persist.ensureEvDb().
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
the relay SIGSEGV'd at ~396 hosts during startup — DiskPersist's pg.Pool
was created on Threaded Io, but ~40 call sites across slurper, API handlers,
broadcaster, backfiller, and cleaner called it from Evented fibers, triggering
Thread.current() on a NULL threadlocal.
Part A — dual pg.Pool:
add ev_db (Evented pg.Pool) to DiskPersist. pure DB methods get *Impl(db)
internal implementations + thin *Ev wrappers. Evented callers use ev_db;
pool_io callers keep using self.db. 13 methods refactored.
Part B — Threaded service paths:
- admin ban: uidForDidEv on Evented side, takedown routed through host_ops
queue (fire-and-forget). worker executes takeDownUser + persist + broadcast
on pool_io under persist_order.
- playback: cross-Io request/reply via MPSC Treiber stack. Evented fiber
posts PlaybackRequest, pool_io worker executes playback() under mutex,
sets done atomic. fiber spin-waits on failure to prevent use-after-free
on stack-local request.
persist_order held through full persist → resequence → broadcast_queue.push()
to guarantee insertion order matches seq assignment order.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
move resequenceFrame, heap dupe, and broadcast_queue.push() outside
the ordering lock. persist_order now covers only the DB persist call
and seq store — the minimum needed for monotonic sequence assignment.
this eliminates the cascade where producers spin on persist_order
while another producer is blocked in a full broadcast_queue.push().
slight out-of-order in the ring is acceptable — seq is embedded in
frame data and consumers/history track by seq.
metrics showed persist_order_spins_total dominating at ~1,100 hosts
(548M spins) while push_lock_spins was zero — confirming the critical
section width was the bottleneck.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
separates push-lock contention from queue-full contention so operator
can distinguish: producers fighting for the CAS lock vs producers
blocked on a full ring buffer.
new metric: relay_broadcast_queue_push_lock_spins_total
clarified: relay_broadcast_queue_full_total HELP text (ring capacity)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
instrumentation for the ~1,300 host CPU cliff:
- relay_persist_order_spins_total: spin iterations on the ordering lock
- relay_broadcast_queue_full_total: spin iterations on full broadcast queue
- relay_broadcast_queue_depth_hwm: high-water mark of queue depth
- relay_broadcast_no_consumers_total: frames that skipped SharedFrame alloc
zero-consumer fast path: when no consumers are connected, broadcast()
returns after history.push() without allocating SharedFrame or taking
consumers_mutex. saves one heap alloc + one mutex per frame.
also includes the cursor coalesce fix (CursorMap) and slot reuse
(free list with unregister on subscriber exit) from previous commits.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
the host_ops MPSC queue was processing every cursor flush (~348/sec at
1,391 hosts) as an individual DB write. when the queue filled, Evented
producers busy-spun — causing 5.4 cores sustained CPU.
split into two mechanisms:
- CursorMap: subscribers atomically store latest seq (one store, no lock).
worker thread sweeps every 5s, batch-flushes only changed cursors.
- HostOpsQueue: kept for rare ops only (failures, status updates).
also adds slot reuse via free list — unregister on subscriber exit
prevents slot exhaustion from host churn.
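the subscriber-side store and the sweep, sketched (slot layout assumed):

    const std = @import("std");

    const CursorSlot = struct {
        host_id: u64 = 0,
        latest_seq: std.atomic.Value(i64) = std.atomic.Value(i64).init(0),
        flushed_seq: i64 = 0, // only touched by the sweeper thread
    };

    /// subscriber side: one atomic store, no lock, no queue slot.
    fn recordCursor(slot: *CursorSlot, seq: i64) void {
        slot.latest_seq.store(seq, .monotonic);
    }

    /// sweeper thread, every 5s: batch-flush only the cursors that moved.
    fn sweep(slots: []CursorSlot) void {
        for (slots) |*slot| {
            const seq = slot.latest_seq.load(.monotonic);
            if (seq != slot.flushed_seq) {
                slot.flushed_seq = seq;
                // add (slot.host_id, seq) to the batched DB write here
            }
        }
    }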
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- NOTES.md → docs/notes.md
- repro_evented.zig → scripts/repro_evented.zig
- update references in Dockerfile, main.zig, stdlib-patches.md
- delete stale build artifacts (check_mutex.o, repro_evented binary)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
subscriber fibers (Evented) were calling DiskPersist methods that acquire
pg.Pool connections via Threaded futex — NULL Thread.current() on Evented
fibers caused heap corruption (~16min crash cycle).
add MPSC host_ops queue (atomic spinlock, same pattern as BroadcastQueue):
- subscriber pushes ops instead of calling dp.* directly
- single background thread (std.Thread on pool_io) pops and executes
- covers: cursor flush (~450/s), failure tracking, status updates
- cursor loaded at spawn time (slurper passes last_seq through)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
covers the Uring networking patch, DNS fallback, ReleaseSafe GPF,
debug_io override, Io.Event single-waiter, and cross-Io Mutex issues.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- keep gc_thread handle and join it during shutdown before dp.deinit() runs —
dp is stack-owned, detaching left a use-after-free window
- add markDbSuccess() call at end of gc() so the health signal isn't solely
dependent on uidForDid (event ingestion path)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- move GC loop from io.concurrent() (Evented fiber) to std.Thread.spawn() with pool_io (Threaded)
— dp.gc() takes Threaded mutex + queries pg.Pool, which dereferences NULL Thread.current()
threadlocal when called from Evented context → heap corruption / SIGSEGV
- replace direct pg.Pool "SELECT 1" health checks with atomic last_db_success timestamp
— metrics server and API router both run on Evented, pg.Pool runs on Threaded
— isDbHealthy() reads an atomic set by Threaded workers, safe from any Io context
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
broadcast() removed dead consumers from the list AND called shutdown()+destroy(),
but Handler.close() still held the pointer and later called removeConsumer() on
freed memory. Now broadcast() only unlinks from the list — removeConsumer() is
the sole owner of shutdown + destroy.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
the previous fix (6674812) correctly identified that Evented Io.Mutex
from a plain thread causes SIGSEGV, but the fix used Threaded futex
from within an Evented fiber. Threaded futexWait blocks the Uring OS
thread, preventing it from processing io_uring completions for other
fibers — deadlocking the event loop during CA bundle scan.
fix: the resyncer now runs entirely on pool_io (Threaded) via a plain
std.Thread. no Evented io involvement at all. this is correct because:
- enqueue() from frame workers: Threaded futex on plain thread ✓
- dequeue() in worker: Threaded futex on plain thread ✓
- HTTP client: blocking I/O on plain thread ✓
the fundamental constraint: Io.Mutex cannot be shared across Io types.
Threaded futex on Evented fiber → blocks Uring thread → deadlock.
Evented futex on plain thread → NULL threadlocal → SIGSEGV.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
root cause: Resyncer.enqueue() called from frame worker threads (plain
std.Thread) used Evented Io for mutex/cond ops. when contended,
futexWaitUncancelable enters the Uring fiber scheduler which calls
Thread.current() — a threadlocal only set on Uring threads. on plain
threads it's null, and ReleaseFast silently dereferences NULL at struct
field offsets (0x28, 0x30, 0x38) → SIGSEGV.
fix: add queue_io (pool_io/Threaded) to Resyncer for cross-thread
synchronization. Evented io kept for HTTP client and fiber spawning.
also fixes two consumer bugs:
- dropSlowConsumer spawned plain std.Thread that called Evented future
cancel → same NULL deref class. removed cleanup thread, deferred
destruction to Handler.close → removeConsumer.
- removeConsumer unconditionally decremented consumer count after
dropSlowConsumer already did → double-decrement. now only decrements
when consumer is found in the list.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
spawning ~2,800 TLS handshakes simultaneously starves the single
Evented event loop — health checks on both :3000 and :3001 time out,
liveness probe kills the pod. batch-spawn with io.sleep yields between
batches so the event loop stays responsive during ramp.
STARTUP_BATCH_SIZE env var (default 50), 100ms yield between batches.
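the ramp, as a self-contained toy (spawnOne and the plain sleep stand in
for the real spawn call and the io.sleep yield):

    const std = @import("std");

    fn rampUp(total: usize, batch_size: usize, spawnOne: *const fn (usize) void) void {
        var i: usize = 0;
        while (i < total) : (i += 1) {
            spawnOne(i);
            if ((i + 1) % batch_size == 0) {
                // stand-in for the io.sleep yield back to the event loop
                std.Thread.sleep(100 * std.time.ns_per_ms);
            }
        }
    }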
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
IORING_OP_BIND and IORING_OP_LISTEN require kernel 6.11+. Our server
is on 6.8 — the kernel returns EINVAL for unknown opcodes, surfacing
as error.Unexpected. Use direct linux.bind()/linux.listen() syscalls
instead (instant, non-blocking — same approach as getsockname).
IORING_OP_ACCEPT (5.5), IORING_OP_CONNECT (5.5), IORING_OP_SOCKET
(5.19), and IORING_OP_SENDMSG/RECVMSG/READV (5.3-5.6) all work fine.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements 6 io_uring network vtable entries (listen, accept, connect,
read, write, send) that are stubbed as Unavailable upstream (zig#31723).
The patch is applied at build time inside the Docker container only —
it modifies the zig stdlib bundled in the container image, not the host
zig installation or any downstream consumer of zat/websocket.zig.
DNS (netLookup) is deliberately NOT patched — subscribers resolve
hostnames through pool_io (Threaded) and pass the connected stream
to the websocket client for TLS + framing via Evented io.
Dep bumps:
- websocket.zig 80c6434: initWithStream() respects config.tls
(was hardcoded to null). Non-breaking — existing callers default
to tls=false and get identical behavior.
- zat v0.3.0-alpha.16: picks up the websocket bump so both zlay
and zat resolve to the same websocket version.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Io.Uring fiber context-switch GPFs under ReleaseSafe on x86_64-linux.
Reproduced with a 30-line minimal program (repro_evented.zig):
- Debug (safety ON, opts OFF): passes
- ReleaseFast (safety OFF, opts ON): passes
- ReleaseSmall (safety OFF, opts ON): passes
- ReleaseSafe (safety ON, opts ON): GPF in fiber.zig contextSwitch
This is a zig 0.16-dev stdlib bug (confirmed in both dev.3059 and
dev.3066) — the LLVM optimizer miscompiles safety-checked code in
the fiber context switch path.
Fix: build with ReleaseFast instead of ReleaseSafe. Backend stays
on Io.Evented (fibers on io_uring), eliminating thread-per-PDS.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
flip backend from Io.Threaded to Io.Evented. frame workers run on a
dedicated Io.Threaded (pool_io) for CPU-heavy decode/validate, then
hand off to the broadcaster fiber via an MPSC ring buffer.
key design: one publication path for all relay-sequenced events.
workers, subscriber inline, and admin all use the same pattern:
persist_order spinlock → dp.persist → resequence → queue.push.
the broadcaster fiber drains FIFO (= seq order) and fans out.
persist_order is an atomic spinlock (not Io.Mutex) so both Threaded
workers and Evented admin can participate — Io.Mutex futex
implementations are incompatible across domains.
- DiskPersist initialized with pool_io (workers are its callers)
- BroadcastQueue: lossless spin-wait push (matches Indigo semantics)
- admin ban: same ordered path as workers, no direct broadcast()
- pool_io wired to HttpContext for future cross-domain use
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pingLoop swallowed error.Canceled from io.sleep(), preventing
ping_future.cancel() from stopping the task before client.deinit()
freed the stream/TLS buffers. next writeFrame hit freed memory → GPF.
two changes:
- io.sleep() catch {} → catch return (cancellation-cooperative)
- check client.isClosed() before writeFrame (defense-in-depth)
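the loop shape, as a toy (sleepFn stands in for io.sleep, which returns
error.Canceled when the future is cancelled):

    fn pingLoop(sleepFn: *const fn () anyerror!void, sendPing: *const fn () void) void {
        while (true) {
            sleepFn() catch return; // was `catch {}`, which swallowed Canceled
            sendPing();
        }
    }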
also bumps zat to v0.3.0-alpha.11 and websocket.zig to 104608b.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
zat alpha.10: pins websocket with httpFallback fix
websocket 4222f98: dispatch non-upgrade HTTP to httpFallback handler,
restoring /_healthz and /_readyz on port 3000
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
documents the three deployment crashes and their fixes, the unresolved
health probe issue (httpFallback not wired in websocket server), where
all repos/files/env vars live, and what needs to happen next.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fixes GPF in websocket client write path: concurrent writes from ping
loop, auto-pong, and close were interleaving frame headers/payloads on
the shared stream, corrupting memory during memcpy in Writer.zig.
websocket.zig 0261b7d adds _write_lock: Io.Mutex to the client, matching
the server-side Conn.lock pattern. zat alpha.9 pins the same websocket
commit, resolving the diamond dependency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pg.zig pool used Io.Event with reset() for connection-available signaling.
Io.Event.reset() assumes no pending waiters — violated when 16 frame
workers contend for 5 connections. Updated pg.zig replaces Event with a
monotonic futex counter (safe for any number of concurrent waiters).
Also:
- make DB pool size configurable via DB_POOL_SIZE env (default 20)
- previous hardcoded 5 guaranteed constant contention with 16 workers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Evented (io_uring) futexWait/Wake dispatch through fiber-local
Thread.current() state. Frame pool workers are plain std.Thread —
they lack that state, so Io.Mutex contention segfaults immediately.
Threaded backend uses direct kernel futex syscalls that work from
any execution context. io.concurrent still spawns real OS threads
(same concurrency model as 0.15, just 0.16 APIs).
Evented can be revisited once frame workers either move to
io.concurrent or cross-boundary mutexes get a dedicated Threaded
sync_io.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
reverts all four host retention commits that caused production
restart loops. the interaction between reconciliation, dormant
logic, startup jitter, and cold-start ramp with ~2,800 hosts
was not testable via unit tests and each deploy regressed
relay-eval coverage (alternating 0%/97% from kubelet kills).
back to 80eca78 behavior: exhausted hosts stop after 15 failures,
cron handles re-discovery. stable baseline for 24+ hours at 97-99%.
the feature needs a local test harness that validates startup ramp
behavior against a realistic host table before any production deploy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
all ~2,800 subscriber threads were starting DNS+TLS handshakes
simultaneously during startup, starving the HTTP server thread and
causing kubelet to kill the pod on liveness probe timeout.
each subscriber now sleeps a deterministic jitter (0-30s, based on
host_id hash) before its first connection attempt. threads still
spawn quickly (50/batch, 100ms yield) but actual handshakes are
spread across a 30-second window instead of hitting all at once.
jitter only applies to startup — requestCrawl and reconciliation
spawn workers with zero jitter since they're one-at-a-time.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
the previous change kept dormant subscriber threads running forever,
meaning thread count could only go up. dormant now correctly stops
the worker thread (freeing resources) while preserving the DB row
for discovery to re-activate later. reconciliation loop queries only
active hosts — dormant hosts wait for requestCrawl.
separated "don't forget the host" (DB row persists) from "don't stop
the thread" (thread exits on dormancy). removed unused
listReconnectableHosts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Io.Future(void).wait() doesn't exist — futures use .await(io) and
are not threadsafe across fibers. replace startup_future.wait() with
a simple initial sleep (the first reconciliation pass runs 5 min
after startup anyway, well after spawnWorkers finishes).
also use 1-second sleep increments for shutdown responsiveness,
matching the subscriber backoff pattern.
caught by `zig build` (exe target) — `zig build test` misses this
due to lazy analysis when no test references the code path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
subscribers now retry with exponential backoff capped at 30 min
(was 60s cap with hard kill at 15 failures). on successful connect,
backoff resets to 1s and host flips back to active. hosts that fail
15+ consecutive times are marked dormant (observable) but the
subscriber keeps retrying. a reconciliation loop every 5 min
respawns any active/dormant host missing from the worker map.
this eliminates dependence on the external reconnect cron for host
retention — it can be reduced to discovery-only.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
now that Backend is Io.Threaded, remove references to cross-Io
constraints, Evented fiber context requirements, and Uring thread
warnings that no longer apply. the historical context is preserved
in docs/evented-attempt.md and docs/notes.md.
no behavioral changes — comments only.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ReleaseSafe caught an unreachable panic in posix.setsockopt when a
ConsumerTooSlow kick raced with the websocket server's readLoop thread.
the race: dropSlowConsumer called conn.close() from the broadcast thread
while readLoop (server thread) was about to call setsockopt on the same
socket fd. setsockopt got EBADF, zig stdlib hit unreachable → panic.
under ReleaseFast this was silent undefined behavior — likely existed on
every prior build.
fix: move conn.close() from dropSlowConsumer (broadcast thread) to the
end of writeLoop (consumer's own thread). writeLoop exits when alive is
set to false, drains remaining frames, then closes the connection. this
unblocks readLoop's pending read without racing on socket state.
also bump BUFFER_CAP 8192 → 65536 to reduce ConsumerTooSlow frequency
(cherry-pick of the ee4e368 change onto the current tree).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
the Evented (io_uring fiber) backend has been the source of every major
issue since the 0.16 migration: 8 cross-Io crash classes, a ReleaseSafe
GPF from a zig codegen bug, and a persistent ~10-15% coverage degradation
that nobody could trace. the zig team marks Evented as experimental.
Io.Threaded restores thread-per-PDS (~2,800 OS threads instead of ~35
fibers), which was the proven model on 0.15 at 99%+ coverage. the entire
cross-Io problem class vanishes. ReleaseSafe works again. DNS works
natively. the uring networking patch becomes inert.
one-line change: const Backend = Io.Evented → Io.Threaded.
all io.concurrent() call sites, Io.Future, Io.Mutex, Io.Condition are
backend-agnostic through the std.Io abstraction. pool_io becomes
redundant but harmless (both runtimes are now Threaded).
builds clean: zig build test, zig fmt, and
zig build -Dtarget=x86_64-linux-gnu -Doptimize=ReleaseSafe all pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
the image atcr.io/zzstoatzz.io/zlay:ReleaseFast-zat21-b91382b was built
with zat alpha.21 via a locally-modified build.zig.zon on the Hetzner
build server that was never committed back. this commit reproduces that
state on top of b91382b so the canary behavioral delta vs production is
exactly one commit (1eec324, the FrameWork hostname UAF fix).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FrameWork.hostname was a borrowed slice from sub.options.hostname,
documented as "stable lifetime". it isn't: slurper.runWorker frees
sub.options.hostname after sub.run() returns, but FrameWorks for
that subscriber may still be queued in the frame pool. once the
allocator reuses that memory, pool workers read garbage when
logging chain breaks, host authority decisions, etc.
repro: zlay-reconnect cronjob spawns ~1839 hosts in 134s. some
subscribers churn within that window. corrupted hostnames appear
in logs as DIDs (the freed slot got reused for a DID dup) or with
stack-pointer-shaped bytes overlaying the suffix.
fix: dupe hostname alongside data when submitting to the pool,
free both in processFrame. one extra alloc/free per frame.
forward-only rewind: every commit on main between b91382b and 4f3d1d4
has been superseded or is suspected of being implicated in the
2026-04-09 HTTP / delivery outage. rather than force-pushing history
backward, this commit creates a new snapshot whose tree matches b91382b
exactly, parented on 4f3d1d4. git pull --ff-only continues to work.
the superseded commits remain in ancestry and can be referenced via
the ops-changelog:
- 4f3d1d4 gcLoop: disable malloc_trim, bump interval 10min→1h
- 795cc41 host_authority: slot recovery + pool metrics + preload account count
- bbba92c fix build: drop unused err1 capture in resolveHostAuthority
- 584571a disable keep_alive on host authority resolver pool + log resolve errors
- ee4e368 bump per-consumer buffer 8192→65536 + host_authority reject breakdown
- 31825b2 subscriber: extract prepareFrameWork + add UAF regression test
- 1eec324 fix UAF: dupe FrameWork.hostname per submit (will be re-applied on top)
- 168d9f1 bump websocket.zig + zat: fix requestCrawl POST hang
- fbdffbe mark DB success on did_cache hits
- 3dc21b9 fix gcLoop: silently exited after one tick
- e5f415f update README, CLAUDE.md, Dockerfile for current state
this commit and the two following it (cherry-pick 1eec324 + pin zat
alpha.21) constitute canary 1 per docs/zlay-canary-plan-2026-04-09.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diagnosis of the 2026-04-09 ~10-minute pod-flap cycle. the bbba92c
pod pattern (healthy ~10 min → /metrics and /_readyz stuck → NotReady
→ restart → repeat) lines up exactly with gcLoop's 10-min cadence.
two separable suspects inside gcLoop, either alone sufficient to
flunk probes:
1. dp.gc() holds DiskPersist.mutex for its entire duration (DB
iteration + per-file unlinks, event_log.zig:977-1033). every
frame worker blocks on persist() during gc. this alone explains
the earlier "0.035 events/sec to consumers" measurement.
2. malloc_trim(0) on a ~1.5 GiB RSS process with MALLOC_ARENA_MAX=4.
glibc holds per-arena locks during the free-list walk, stalling
every allocator caller — including the Evented fiber serving
/metrics and /_readyz. long enough to trip probe timeouts.
this is a stabilization commit, not a root-cause fix:
- disable malloc_trim(0) entirely (comment preserved). prefer
MALLOC_MMAP_THRESHOLD_ tuning or an out-of-band maintenance window
if reclaim becomes an issue.
- bump gc_interval_s 10 min → 1 hour. bounds blast radius of the
persist-mutex hold until gc() is properly narrowed.
- add clock_gettime(.MONOTONIC) timing around dp.gc(). next incident
tells us whether dp.gc() itself or something adjacent is the stall.
- new doc: docs/zlay-gcloop-stall-2026-04-09.md with the hypothesis,
code pointers, validation plan, and follow-up work list (mutex
narrowing; broadcaster writeLoop polling as a separate bug).
3dc21b9 (2026-04-06 "fix gcLoop: silently exited after one tick")
is what unmasked this — before that fix gcLoop ran once and died,
so malloc_trim + gc ran exactly once per pod lifetime. after that
fix they run every 10 min, which is when zlay started flapping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bundles four changes from the 2026-04-09 external review (relay
docs/zlay-external-review-2026-04-09.md). all four target the two
unresolved problems from the 2026-04-08 incident: ~99% host_authority
pool rejection and HTTP probe starvation during cold-start spawn.
1. preload effective_account_count in listActiveHostsImpl (item 1)
spawnWorker was doing a per-host blocking DbRequest for
getEffectiveAccountCount(host_id) on every host during cold-start
— ~2,770 round-trips through the DbRequestQueue, each yielding the
spawn fiber. fold the COUNT/JOIN/GROUP BY into the existing batch
query so the value is preloaded into Host.effective_account_count.
addHost (one-off requestCrawl path) keeps the inline fetch since
it's not in the cold-start hot path.
2. resolver slot recovery (item 2)
resolveHostAuthority used to retry resolve() on the SAME pool slot
after a failure. if a slot's std.http.Client gets into any kind of
bad state, the retry is wasted and the slot stays bad forever. on
first-attempt failure, deinit + re-init the slot via the new
recycleHostResolver helper before retrying. directly tests the
leading "poisoned slot" hypothesis without making any zig stdlib
claims. dormant in production while keep_alive=false (no persistent
connections to corrupt) but ready for the next canary.
3. pool/loop metrics (item 4)
six new counters/gauges in broadcaster.zig Stats:
host_resolver_acquire_wait_us_total — pool contention timing
host_resolver_in_use — current slot count held
host_resolver_resets_total — slot recovery firings
host_resolver_resolve_fail_total — first-attempt failures
resolve_loop_resolve_ok_total — background loop ok
resolve_loop_resolve_fail_total — background loop fail
the resolve_loop counters reveal something we've been operating
blind on: the background signing-key resolveLoop has been
log.debug+continue on errors, never measured. when these first
ship, whatever number resolve_loop_resolve_fail shows is the
baseline, NOT a regression — that's the whole point.
4. configurable host_resolver_pool_size (item 5)
was const = 4. heap-allocate from start() based on env var
HOST_RESOLVER_POOL_SIZE (default 4, max 64). with keep_alive=false,
pool width is a real startup throughput knob — bumping it lets more
is_new checks run concurrently during reconnect storms. tune from
ops based on the new acquire_wait_us metric.
5. zat dep bumped to v0.3.0-alpha.23 — surfaces the underlying
std.http.Client.fetch error kind through resolver.resolve, so the
existing sampleLogReject("resolve", did, @errorName(err), ...) call
in resolveHostAuthority will print the actual transport error
(UnknownHostName, ConnectionRefused, TlsAlert, etc.) instead of
always DidResolutionFailed.
not in this batch (per reviewer's "do not yet" list):
- spawn batch loop tuning — slim per-host work first, re-measure
- re-enabling keep_alive=true globally — canary first, after this
metrics shipment lets us see what the broken path returns
- splitting liveness onto a dedicated thread — see if probe flap
survives the slimmed startup fiber first
584571a tried to discard the first-attempt error via `_ = err1` to
document intent, but zig 0.16 rejects that pattern with "error set is
discarded". build failed on the operator's ReleaseFast pipeline and
slipped past my local `zig build test` because the test binary is
lazy — no test references resolveHostAuthority, so zig never analyzed
the function body. `zig build` (the exe) does reach it via
frame_worker.processFrame and trips the error immediately.
just drop the capture. the comment explaining why only err2 is logged
is preserved.
100% of host_authority rejects on 2026-04-08 were in the resolve branch
(39,621 / 40,072 over 48min). plc.directory is reachable from the pod,
cold resolvers in resolveLoop work fine, and websockets to 2785 PDSes
are healthy — isolates the failure to the pooled + long-lived keep_alive
HTTP path. pool was added on 0.15 (1639565) and never re-validated
after the 0.16 migration (9cc1ba3).
workaround: disable keep_alive on the pool. cost is one TLS handshake
per is_new / host_changed DID, which is low-rate enough to absorb.
keep the pool itself for socket churn savings across fiber callers.
also wire sampleLogReject into the resolve and parse_did branches with
@errorName of the resolver error — previous commit incremented counters
for those branches but never logged, so we had no diagnostic data when
the reject rate spiked. if the workaround doesn't fully fix it we now
see the actual error kind without a second redeploy cycle.
the 8192-entry per-consumer ring = ~33s of headroom at 250fps. pulsar's
60-min snapshot accumulated repeated ConsumerTooSlow kicks within that
window (ops_changelog 2026-04-01). bumping to 65536 gives ~4.4min of
headroom — enough to absorb transient write stalls without dropping
consumers mid-run.
checkPdsHost had five silent-reject branches collapsed into one
failed_host_authority counter, which hid the 100% rejection rate
diagnosed 2026-04-08. split into per-branch counters emitted as
relay_host_authority_reject{branch=...} alongside a 1-in-2048 sampled
warn log so we can tell whether the DID doc lookup is failing, the
endpoint is unparseable, the host isn't in our table, or the resolved
host genuinely differs from the incoming host.
follow-up to 1eec324 (fix UAF: dupe FrameWork.hostname per submit).
the dupe-at-submit logic was inline in FrameHandler.onMessage, which
made it hard to regression-test the invariant. extracted a small
Subscriber method that returns a FrameWork with heap-owned data +
hostname, and added a unit test that:
- builds a FrameWork from a to-be-freed hostname buffer
- asserts the returned slices have distinct pointers from the inputs
- simulates slurper.runWorker teardown by freeing the source hostname
- reads the FrameWork.hostname again — would trip the testing
allocator's use-after-free detection if the dupe was elided
no behavior change at the submit site. tested via zig build test.
FrameWork.hostname was a borrowed slice from sub.options.hostname,
documented as "stable lifetime". it isn't: slurper.runWorker frees
sub.options.hostname after sub.run() returns, but FrameWorks for
that subscriber may still be queued in the frame pool. once the
allocator reuses that memory, pool workers read garbage when
logging chain breaks, host authority decisions, etc.
repro: zlay-reconnect cronjob spawns ~1839 hosts in 134s. some
subscribers churn within that window. corrupted hostnames appear
in logs as DIDs (the freed slot got reused for a DID dup) or with
stack-pointer-shaped bytes overlaying the suffix.
fix: dupe hostname alongside data when submitting to the pool,
free both in processFrame. one extra alloc/free per frame.
websocket.zig 3c6794a fixes Handshake.parse hanging on POST requests
with bodies. The previous endsWith("\r\n\r\n") check only matched
header-only GETs, so any POST handed to httpFallback (e.g.
requestCrawl) caused parse to return null indefinitely and the worker
to read the same data forever until the connection's idle timeout.
zat alpha.22 carries the same websocket bump so the transitive
dependency resolves to a single module (otherwise zig links both
hashes and errors with `file exists in modules 'build' and 'build0'`).
Symptom: zlay-reconnect cronjob has been failing for ~12h, never
re-announcing the ~2,950 PDS hosts from atproto-scraping.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
isDbHealthy() is a 30s freshness check on last_db_success, but the
markDbSuccess() call sites only fire on cache misses + the 10-min GC
tick. with a hot did_cache in steady state, miss rate can dip for 30+
seconds, leaving the health flag stale and tripping k8s liveness probes
even though the relay is healthy.
cache hits use DB-derived data, so they're a valid signal that the
data path is functioning. mark success on the fast path. cost is one
clock_gettime + atomic store per ingestion event (~20 ns vDSO).
the GC tick still provides real DB liveness as a backstop.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gcLoop was using io.sleep on pool_io (Threaded) from a plain std.Thread.
the first tick happened to succeed, the second hit an error path, and
catch return swallowed it — silently killing the loop. one malloc_trim
fired at the 10-min mark and then nothing for 13.5+ hours.
fix: switch to std.c.nanosleep directly. plain threads can't safely
call into Io scheduler primitives, even on the matching backend, because
they aren't registered with that backend's scheduler.
drop the io parameter from gcLoop since dp.gc() uses its own bound io
internally.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- README: document Evented backend, cross-Io architecture, DbRequestQueue,
link to devlog 008, fix zat dep link, note ReleaseFast requirement
- CLAUDE.md: Evented not Threaded, ReleaseFast not ReleaseSafe
- Dockerfile: fix build flag to ReleaseFast (comments said ReleaseFast
but flag was still ReleaseSafe)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LLVM register allocator bug: under ReleaseSafe, the stack probe and
canary instrumentation cause LLVM to skip materializing the SwitchMessage
address into %rsi before fiber.zig's inline asm context switch. %rsi is
left holding a stale value from Thread.current(), causing a GPF.
Debug, ReleaseFast, and ReleaseSmall all pass. Only ReleaseSafe triggers
it — the combination of optimization + safety instrumentation changes
the code layout enough to expose the miscompilation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
the SIGSEGV that prompted the Threaded revert was a TCP split mid-CRLF
in the websocket handshake reader (fixed in 9ac64da), not a fiber
context-switch issue. re-enabling Evented + ReleaseSafe to see it through.
if the repro GPF (scripts/repro_evented.zig) hits production code paths,
fallback is ReleaseFast or one-line flip back to Threaded.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
websocket client read() panics when TCP delivers \r at buffer boundary
without \n — line_start advances past pos, causing start > end slice.
under ReleaseFast this was the silent SIGSEGV every 30-90 min.
websocket.zig 9ac64da, zat v0.3.0-alpha.17.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Io.Evented (io_uring fibers) has a probabilistic SIGSEGV in
std.Io.fiber.contextSwitch that crashes every 30-90 min under load.
After 28 commits fixing cross-Io crashes, heap corruption, and mutex
incompatibilities, this stdlib bug is the remaining blocker — and
upstream fiber.zig is unchanged as of dev.3091 with no fix in sight.
Switching to Io.Threaded restores the thread-per-PDS model (~2,800
threads, stable) and lets us use ReleaseSafe again. The Evented work
is preserved in patches/, scripts/repro_evented.zig, and the new
docs/evented-attempt.md for when upstream catches up.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
some PDSes omit the tooBig field from commit events. the lexicon marks it
required, and consumers like hydrant reject frames without it.
resequenceFrame now detects #commit frames and injects tooBig: false when
the field is missing. non-commit frames are unaffected.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
the Evented pg.Pool (ev_db) approach was broken — io_uring netLookup is
unimplemented upstream, so no DNS and no outbound TCP from Evented fibers.
this replaces ev_db with a DbRequestQueue (MPSC FIFO ring buffer) that
routes general DB traffic through pool_io (Threaded) worker threads:
- add DbRequest + DbRequestQueue to event_log.zig (4096-slot ring,
spinlock push, CAS pop, 2 worker threads)
- convert xrpc/admin handlers to typed DbRequest structs with
@fieldParentPtr callbacks that write JSON into stack buffers
- convert slurper: pullHosts on own std.Thread (parallel with
spawnWorkers), addHost phased (DB via queue, HTTP via temp thread)
- convert broadcaster firstSeq to DbRequest
- convert backfiller/cleaner from Io.Future to std.Thread + direct
persist.db access, with cooperative shutdown checks
- remove all *Ev methods and ev_db infrastructure from DiskPersist
- explicit join paths for backfiller/cleaner/db-workers on shutdown
DbRequest.wait() never returns before done — preserves stack lifetime
of caller-embedded request structs during shutdown drain.
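sketch of that guarantee, with the typed payload and @fieldParentPtr
callback omitted; the real wait may block rather than spin:
    const DbRequest = struct {
        done: std.atomic.Value(bool) = std.atomic.Value(bool).init(false),
        // ... typed payload + callback that writes JSON into a stack buffer ...

        // worker thread side, after executing the request
        fn complete(self: *DbRequest) void {
            self.done.store(true, .release);
        }

        // caller side: never returns before complete(), so the request can live
        // on the caller's stack even while the queue is drained at shutdown
        fn wait(self: *DbRequest) void {
            while (!self.done.load(.acquire)) std.atomic.spinLoopHint();
        }
    };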
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ensureEvDb() used @panic on initUri failure, meaning a transient postgres
hiccup during lazy init would crash the entire relay. now returns !*pg.Pool
and resets state to uninit on failure so the next call retries.
callers handle the error gracefully:
- xrpc handlers: respond 503 ServiceUnavailable
- slurper: skip host on db error (safe default)
- broadcaster firstSeq: fall back to memory history
- admin: use 0 for account count on error
- backfiller/cleaner: log and bail from current run
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pg.Pool.initUri does TCP connects via io_uring, which requires the Evented
event loop to be running. creating it during main() init (before the
scheduler starts) fails with NetworkDown — io_uring can't submit ops yet.
fix: store ev_io/db_url/pool_size config on DiskPersist, create the pool
on first use via ensureEvDb(). uses CAS-based init-once — first fiber to
call it creates the pool, concurrent fibers yield-wait via ev_io.sleep().
also changes backfiller/cleaner from storing *pg.Pool to *DiskPersist,
accessing the pool lazily via self.persist.ensureEvDb().
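sketch of the init-once shape; the state field, createPool helper, and
yieldBriefly are stand-ins, and the error-reset path shown is the behavior
described in the commit above:
    const State = enum(u8) { uninit, initializing, ready };

    fn ensureEvDb(self: *DiskPersist) !*pg.Pool {
        while (true) {
            switch (@atomicLoad(State, &self.ev_db_state, .acquire)) {
                .ready => return self.ev_db.?,
                .uninit => {
                    // null result means this fiber won the CAS and owns creation
                    const won = @cmpxchgStrong(
                        State,
                        &self.ev_db_state,
                        .uninit,
                        .initializing,
                        .acq_rel,
                        .acquire,
                    ) == null;
                    if (won) {
                        const pool = createPool(self) catch |err| {
                            // reset so the next caller retries instead of the relay dying
                            @atomicStore(State, &self.ev_db_state, .uninit, .release);
                            return err;
                        };
                        self.ev_db = pool;
                        @atomicStore(State, &self.ev_db_state, .ready, .release);
                        return pool;
                    }
                },
                // another fiber is mid-init: yield and re-check
                .initializing => yieldBriefly(self), // ev_io.sleep() in the real code
            }
        }
    }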
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
the relay SIGSEGV'd at ~396 hosts during startup — DiskPersist's pg.Pool
was created on Threaded Io, but ~40 call sites across slurper, API handlers,
broadcaster, backfiller, and cleaner called it from Evented fibers, triggering
Thread.current() on a NULL threadlocal.
Part A — dual pg.Pool:
add ev_db (Evented pg.Pool) to DiskPersist. pure DB methods get *Impl(db)
internal implementations + thin *Ev wrappers. Evented callers use ev_db;
pool_io callers keep using self.db. 13 methods refactored.
Part B — Threaded service paths:
- admin ban: uidForDidEv on Evented side, takedown routed through host_ops
queue (fire-and-forget). worker executes takeDownUser + persist + broadcast
on pool_io under persist_order.
- playback: cross-Io request/reply via MPSC Treiber stack. Evented fiber
posts PlaybackRequest, pool_io worker executes playback() under mutex,
sets a done atomic. the fiber spin-waits on that flag even on failure
paths, preventing use-after-free of the stack-local request.
persist_order held through full persist → resequence → broadcast_queue.push()
to guarantee insertion order matches seq assignment order.
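sketch of the Treiber-stack hop; request fields and the worker's reply
handling are elided, and it assumes atomic ops on an optional pointer:
    const PlaybackRequest = struct {
        next: ?*PlaybackRequest = null,
        done: std.atomic.Value(bool) = std.atomic.Value(bool).init(false),
        // ... cursor, reply sink, error slot ...
    };

    const RequestStack = struct {
        head: std.atomic.Value(?*PlaybackRequest) = std.atomic.Value(?*PlaybackRequest).init(null),

        // Evented fiber side: lock-free push, no Io involvement
        fn push(self: *RequestStack, req: *PlaybackRequest) void {
            var old = self.head.load(.acquire);
            while (true) {
                req.next = old;
                old = self.head.cmpxchgWeak(old, req, .release, .acquire) orelse return;
            }
        }

        // pool_io worker side (single consumer): grab the whole stack at once
        fn takeAll(self: *RequestStack) ?*PlaybackRequest {
            return self.head.swap(null, .acq_rel);
        }
    };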
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
move resequenceFrame, heap dupe, and broadcast_queue.push() outside
the ordering lock. persist_order now covers only the DB persist call
and seq store — the minimum needed for monotonic sequence assignment.
this eliminates the cascade where producers spin on persist_order
while another producer is blocked in a full broadcast_queue.push().
slight out-of-order in the ring is acceptable — seq is embedded in
frame data and consumers/history track by seq.
metrics showed persist_order_spins_total dominating at ~1,100 hosts
(548M spins) while push_lock_spins was zero — confirming the critical
section width was the bottleneck.
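roughly the publication path after the change (names from this log; the
persist/resequence signatures and the last_seq field are assumptions):
    fn publish(self: *Broadcaster, frame: []const u8) !void {
        // inside persist_order: just enough to tie seq assignment to insertion order
        self.persist_order.lock();
        const seq = self.dp.persist(frame) catch |err| {
            self.persist_order.unlock();
            return err;
        };
        self.last_seq.store(seq, .release);
        self.persist_order.unlock();

        // outside the lock: the expensive parts that used to serialize every producer
        const reseq = try resequenceFrame(self.alloc, frame, seq);
        const copy = try self.alloc.dupe(u8, reseq);
        self.broadcast_queue.push(copy); // may spin on a full ring, no longer under persist_order
    }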
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
separates push-lock contention from queue-full contention so an operator
can distinguish: producers fighting for the CAS lock vs producers
blocked on a full ring buffer.
new metric: relay_broadcast_queue_push_lock_spins_total
clarified: relay_broadcast_queue_full_total HELP text (ring capacity)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
instrumentation for the ~1,300 host CPU cliff:
- relay_persist_order_spins_total: spin iterations on the ordering lock
- relay_broadcast_queue_full_total: spin iterations on full broadcast queue
- relay_broadcast_queue_depth_hwm: high-water mark of queue depth
- relay_broadcast_no_consumers_total: frames that skipped SharedFrame alloc
zero-consumer fast path: when no consumers are connected, broadcast()
returns after history.push() without allocating SharedFrame or taking
consumers_mutex. saves one heap alloc + one mutex per frame.
also includes the cursor coalesce fix (CursorMap) and slot reuse
(free list with unregister on subscriber exit) from previous commits.
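sketch of the fast path, assuming a consumer count kept in an atomic next
to the consumers list and a SharedFrame.create helper:
    fn broadcast(self: *Broadcaster, frame: []const u8) !void {
        try self.history.push(frame);

        // nobody connected: skip the SharedFrame alloc and consumers_mutex entirely
        if (self.consumer_count.load(.acquire) == 0) {
            _ = self.no_consumers_total.fetchAdd(1, .monotonic); // relay_broadcast_no_consumers_total
            return;
        }

        const shared = try SharedFrame.create(self.alloc, frame);
        self.consumers_mutex.lock();
        defer self.consumers_mutex.unlock();
        _ = shared; // fan-out to each registered consumer elided
    }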
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
the host_ops MPSC queue was processing every cursor flush (~348/sec at
1,391 hosts) as an individual DB write. when the queue filled, Evented
producers busy-spun — causing 5.4 cores sustained CPU.
split into two mechanisms:
- CursorMap: subscribers atomically store latest seq (one store, no lock).
worker thread sweeps every 5s, batch-flushes only changed cursors.
- HostOpsQueue: kept for rare ops only (failures, status updates).
also adds slot reuse via free list — unregister on subscriber exit
prevents slot exhaustion from host churn.
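sketch of the cursor coalescing; the slot count, dirty bit, and flush
callback are assumptions, the atomic-store + periodic sweep split is what
the commit describes:
    pub const CursorMap = struct {
        const max_slots = 4096; // assumed capacity

        const Slot = struct {
            seq: std.atomic.Value(u64) = std.atomic.Value(u64).init(0),
            dirty: std.atomic.Value(bool) = std.atomic.Value(bool).init(false),
        };

        slots: [max_slots]Slot = [_]Slot{.{}} ** max_slots,

        // subscriber side: one atomic store per cursor update, no lock, no queue
        pub fn record(self: *CursorMap, slot: usize, seq: u64) void {
            self.slots[slot].seq.store(seq, .release);
            self.slots[slot].dirty.store(true, .release);
        }

        // worker side, every ~5s: batch-flush only slots that changed since the last sweep
        pub fn sweep(self: *CursorMap, flush: *const fn (slot: usize, seq: u64) void) void {
            for (&self.slots, 0..) |*s, i| {
                if (s.dirty.swap(false, .acq_rel)) flush(i, s.seq.load(.acquire));
            }
        }
    };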
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
subscriber fibers (Evented) were calling DiskPersist methods that acquire
pg.Pool connections via Threaded futex — NULL Thread.current() on Evented
fibers caused heap corruption (~16min crash cycle).
add MPSC host_ops queue (atomic spinlock, same pattern as BroadcastQueue):
- subscriber pushes ops instead of calling dp.* directly
- single background thread (std.Thread on pool_io) pops and executes
- covers: cursor flush (~450/s), failure tracking, status updates
- cursor loaded at spawn time (slurper passes last_seq through)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- keep gc_thread handle and join it during shutdown before dp.deinit() runs —
dp is stack-owned, detaching left a use-after-free window
- add markDbSuccess() call at end of gc() so the health signal isn't solely
dependent on uidForDid (event ingestion path)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- move GC loop from io.concurrent() (Evented fiber) to std.Thread.spawn() with pool_io (Threaded)
— dp.gc() takes Threaded mutex + queries pg.Pool, which dereferences NULL Thread.current()
threadlocal when called from Evented context → heap corruption / SIGSEGV
- replace direct pg.Pool "SELECT 1" health checks with atomic last_db_success timestamp
— metrics server and API router both run on Evented, pg.Pool runs on Threaded
— isDbHealthy() reads an atomic set by Threaded workers, safe from any Io context
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
broadcast() removed dead consumers from the list AND called shutdown()+destroy(),
but Handler.close() still held the pointer and later called removeConsumer() on
freed memory. Now broadcast() only unlinks from the list — removeConsumer() is
the sole owner of shutdown + destroy.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
the previous fix (6674812) correctly identified that Evented Io.Mutex
from a plain thread causes SIGSEGV, but the fix used Threaded futex
from within an Evented fiber. Threaded futexWait blocks the Uring OS
thread, preventing it from processing io_uring completions for other
fibers — deadlocking the event loop during CA bundle scan.
fix: the resyncer now runs entirely on pool_io (Threaded) via a plain
std.Thread. no Evented io involvement at all. this is correct because:
- enqueue() from frame workers: Threaded futex on plain thread ✓
- dequeue() in worker: Threaded futex on plain thread ✓
- HTTP client: blocking I/O on plain thread ✓
the fundamental constraint: Io.Mutex cannot be shared across Io types.
Threaded futex on Evented fiber → blocks Uring thread → deadlock.
Evented futex on plain thread → NULL threadlocal → SIGSEGV.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
root cause: Resyncer.enqueue() called from frame worker threads (plain
std.Thread) used Evented Io for mutex/cond ops. when contended,
futexWaitUncancelable enters the Uring fiber scheduler which calls
Thread.current() — a threadlocal only set on Uring threads. on plain
threads it's null, and ReleaseFast silently dereferences NULL at struct
field offsets (0x28, 0x30, 0x38) → SIGSEGV.
fix: add queue_io (pool_io/Threaded) to Resyncer for cross-thread
synchronization. Evented io kept for HTTP client and fiber spawning.
also fixes two consumer bugs:
- dropSlowConsumer spawned plain std.Thread that called Evented future
cancel → same NULL deref class. removed cleanup thread, deferred
destruction to Handler.close → removeConsumer.
- removeConsumer unconditionally decremented consumer count after
dropSlowConsumer already did → double-decrement. now only decrements
when consumer is found in the list.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
spawning ~2,800 TLS handshakes simultaneously starves the single
Evented event loop — health checks on both :3000 and :3001 time out,
liveness probe kills the pod. batch-spawn with io.sleep yields between
batches so the event loop stays responsive during ramp.
STARTUP_BATCH_SIZE env var (default 50), 100ms yield between batches.
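sketch of the ramp loop; the io parameter type and sleep call are
assumptions about the 0.16-dev std.Io interface, and the host/spawn types
are stand-ins:
    fn spawnWorkers(self: *Slurper, io: std.Io, hosts: []const Host) !void {
        const batch = self.startup_batch_size; // STARTUP_BATCH_SIZE, default 50
        for (hosts, 0..) |host, i| {
            try self.spawnSubscriber(io, host);
            if ((i + 1) % batch == 0) {
                // hand the event loop back so :3000/:3001 health checks keep answering
                io.sleep(100 * std.time.ns_per_ms) catch return;
            }
        }
    }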
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
IORING_OP_BIND and IORING_OP_LISTEN require kernel 6.11+. Our server
is on 6.8 — the kernel returns EINVAL for unknown opcodes, surfacing
as error.Unexpected. Use direct linux.bind()/linux.listen() syscalls
instead (instant, non-blocking — same approach as getsockname).
IORING_OP_ACCEPT (5.5), IORING_OP_CONNECT (5.5), IORING_OP_SOCKET
(5.19), and IORING_OP_SENDMSG/RECVMSG/READV (5.3-5.6) all work fine.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements 6 io_uring network vtable entries (listen, accept, connect,
read, write, send) that are stubbed as Unavailable upstream (zig#31723).
The patch is applied at build time inside the Docker container only —
it modifies the zig stdlib bundled in the container image, not the host
zig installation or any downstream consumer of zat/websocket.zig.
DNS (netLookup) is deliberately NOT patched — subscribers resolve
hostnames through pool_io (Threaded) and pass the connected stream
to the websocket client for TLS + framing via Evented io.
Dep bumps:
- websocket.zig 80c6434: initWithStream() respects config.tls
(was hardcoded to null). Non-breaking — existing callers default
to tls=false and get identical behavior.
- zat v0.3.0-alpha.16: picks up the websocket bump so both zlay
and zat resolve to the same websocket version.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Io.Uring fiber context-switch GPFs under ReleaseSafe on x86_64-linux.
Reproduced with a 30-line minimal program (repro_evented.zig):
- Debug (safety ON, opts OFF): passes
- ReleaseFast (safety OFF, opts ON): passes
- ReleaseSmall (safety OFF, opts ON): passes
- ReleaseSafe (safety ON, opts ON): GPF in fiber.zig contextSwitch
This is a zig 0.16-dev stdlib bug (confirmed in both dev.3059 and
dev.3066) — the LLVM optimizer miscompiles safety-checked code in
the fiber context switch path.
Fix: build with ReleaseFast instead of ReleaseSafe. Backend stays
on Io.Evented (fibers on io_uring), eliminating thread-per-PDS.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
flip backend from Io.Threaded to Io.Evented. frame workers run on a
dedicated Io.Threaded (pool_io) for CPU-heavy decode/validate, then
hand off to the broadcaster fiber via an MPSC ring buffer.
key design: one publication path for all relay-sequenced events.
workers, subscriber inline, and admin all use the same pattern:
persist_order spinlock → dp.persist → resequence → queue.push.
the broadcaster fiber drains FIFO (= seq order) and fans out.
persist_order is an atomic spinlock (not Io.Mutex) so both Threaded
workers and Evented admin can participate — Io.Mutex futex
implementations are incompatible across domains.
- DiskPersist initialized with pool_io (workers are its callers)
- BroadcastQueue: lossless spin-wait push (matches Indigo semantics)
- admin ban: same ordered path as workers, no direct broadcast()
- pool_io wired to HttpContext for future cross-domain use
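sketch of an Io-agnostic spinlock of the kind described for persist_order:
no futex and no Io, so Threaded workers and Evented fibers can both take it:
    const SpinLock = struct {
        locked: std.atomic.Value(bool) = std.atomic.Value(bool).init(false),

        fn lock(self: *SpinLock) void {
            // plain atomic exchange; never calls into any Io scheduler
            while (self.locked.swap(true, .acquire)) std.atomic.spinLoopHint();
        }

        fn unlock(self: *SpinLock) void {
            self.locked.store(false, .release);
        }
    };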
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pingLoop swallowed error.Canceled from io.sleep(), preventing
ping_future.cancel() from stopping the task before client.deinit()
freed the stream/TLS buffers. next writeFrame hit freed memory → GPF.
two changes:
- io.sleep() catch {} → catch return (cancellation-cooperative)
- check client.isClosed() before writeFrame (defense-in-depth)
also bumps zat to v0.3.0-alpha.11 and websocket.zig to 104608b.
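the loop shape after the fix, as a sketch (interval, sleep signature, and
the ping helper are stand-ins):
    fn pingLoop(self: *Subscriber, io: std.Io) void {
        while (true) {
            // catch return, not catch {}: error.Canceled has to end the task so
            // ping_future.cancel() wins before client.deinit() frees the buffers
            io.sleep(30 * std.time.ns_per_s) catch return;
            if (self.client.isClosed()) return; // defense-in-depth from this commit
            self.sendPing() catch return; // hypothetical wrapper around the ws write
        }
    }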
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fixes GPF in websocket client write path: concurrent writes from ping
loop, auto-pong, and close were interleaving frame headers/payloads on
the shared stream, corrupting memory during memcpy in Writer.zig.
websocket.zig 0261b7d adds _write_lock: Io.Mutex to the client, matching
the server-side Conn.lock pattern. zat alpha.9 pins the same websocket
commit, resolving the diamond dependency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pg.zig pool used Io.Event with reset() for connection-available signaling.
Io.Event.reset() assumes no pending waiters — violated when 16 frame
workers contend for 5 connections. Updated pg.zig replaces Event with a
monotonic futex counter (safe for any number of concurrent waiters).
Also:
- make DB pool size configurable via DB_POOL_SIZE env (default 20)
- previous hardcoded 5 guaranteed constant contention with 16 workers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Evented (io_uring) futexWait/Wake dispatch through fiber-local
Thread.current() state. Frame pool workers are plain std.Thread —
they lack that state, so Io.Mutex contention segfaults immediately.
Threaded backend uses direct kernel futex syscalls that work from
any execution context. io.concurrent still spawns real OS threads
(same concurrency model as 0.15, just 0.16 APIs).
Evented can be revisited once frame workers either move to
io.concurrent or cross-boundary mutexes get a dedicated Threaded
sync_io.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>