atproto relay implementation in zig zlay.waow.tech

update README, CLAUDE.md, Dockerfile for current state

- README: document Evented backend, cross-Io architecture, DbRequestQueue,
  link to devlog 008, fix zat dep link, note ReleaseFast requirement
- CLAUDE.md: Evented not Threaded, ReleaseFast not ReleaseSafe
- Dockerfile: fix build flag to ReleaseFast (comments said ReleaseFast
  but flag was still ReleaseSafe)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

+22 -15
+9 -6
CLAUDE.md
···
 # zlay

-AT Protocol relay in zig 0.16. reader thread per PDS + shared frame
-processing pool. ReleaseSafe in production. Io.Threaded backend (Evented
-attempt shelved — see docs/evented-attempt.md).
+AT Protocol relay in zig 0.16. Io.Evented backend (io_uring fibers,
+~47 OS threads for ~2,800 PDS connections). ReleaseFast in production
+(Evented + ReleaseSafe GPFs on startup — upstream zig bug, see
+scripts/fiber_gpf_issue.md).

 ## before pushing

 - `zig fmt --check .` and `zig build test`
 - MUST use `-Dtarget=x86_64-linux-gnu` for production (musl breaks RocksDB)
-- ReleaseFast has a known double-free — do not use
+- MUST use `-Doptimize=ReleaseFast` — ReleaseSafe GPFs under Evented
+- do NOT use Debug builds in production (2.5 GiB vs 1.5 GiB RSS)

 ## deploy

-configs at `../@zzstoatzz.io/relay/` — `just zlay publish-remote ReleaseSafe`
+configs at `../@zzstoatzz.io/relay/` — `just zlay publish-remote ReleaseFast`

 KUBECONFIG is set automatically by the zlay module (`zlay/kubeconfig.yaml`).

···
 - [docs/deployment.md](docs/deployment.md) — build flags, infra, resource usage
 - [docs/gotchas.md](docs/gotchas.md) — zig/pg.zig/rocksdb-zig/deploy traps
 - [docs/incident-2026-03-04.md](docs/incident-2026-03-04.md) — ReleaseSafe RSS analysis
-- [docs/evented-attempt.md](docs/evented-attempt.md) — Evented backend attempt and why we reverted
+- [docs/evented-attempt.md](docs/evented-attempt.md) — Evented backend migration story
+- [scripts/fiber_gpf_issue.md](scripts/fiber_gpf_issue.md) — upstream zig bug report
+1 -1
Dockerfile
···
 # contextSwitch, confirmed 2026-04-05). This is a zig codegen bug, not our code.
 # ReleaseFast avoids the bad optimization path. The previous production SIGSEGV
 # under ReleaseFast was a websocket handshake bug, now fixed (9ac64da).
-RUN zig build -Doptimize=ReleaseSafe -Dcpu=baseline -Dtarget=x86_64-linux-gnu
+RUN zig build -Doptimize=ReleaseFast -Dcpu=baseline -Dtarget=x86_64-linux-gnu

 FROM --platform=linux/amd64 debian:bookworm-slim
 RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates && rm -rf /var/lib/apt/lists/*
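The Dockerfile fix above corrects drift between the comments (which said ReleaseFast) and the actual build flag (still ReleaseSafe). A minimal sketch of a check that could catch that kind of drift before pushing follows. It assumes the build is a single `RUN zig build ...` line, as in this Dockerfile; the function name `check_build_flags` and its default path are hypothetical and not part of the repository.

```bash
#!/bin/sh
# Hypothetical pre-push check (not part of zlay): verify that the Dockerfile's
# `zig build` line carries the flags that CLAUDE.md marks as MUST-use.
check_build_flags() {
    dockerfile="${1:-Dockerfile}"

    # Match only RUN lines that invoke `zig build`, so a comment that merely
    # mentions a flag cannot satisfy the check.
    build_line=$(grep -E '^RUN[[:space:]]+zig[[:space:]]+build' "$dockerfile") || {
        echo "no 'RUN zig build' line found in $dockerfile" >&2
        return 2
    }

    # Flags taken from the "before pushing" list in CLAUDE.md.
    for flag in -Doptimize=ReleaseFast -Dtarget=x86_64-linux-gnu; do
        case "$build_line" in
            *"$flag"*) ;;
            *)
                echo "missing $flag in: $build_line" >&2
                return 1
                ;;
        esac
    done

    echo "ok: $build_line"
}
```

Running `check_build_flags Dockerfile` against the pre-fix file would fail on the missing `-Doptimize=ReleaseFast`, even though the surrounding comments already mentioned ReleaseFast. If a Dockerfile had several `RUN zig build` lines, this sketch would accept a flag found on any of them, so a stricter per-line check may be preferable there.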
+12 -8
README.md
···

 - **direct PDS crawl** — the bootstrap relay is called once at startup for the host list via `listHosts`, then all data flows directly from each PDS.

+- **Io.Evented backend** — uses zig 0.16's [`std.Io`](https://ziglang.org/documentation/master/std/#std.Io) with the [Evented](https://ziglang.org/documentation/master/std/#std.Io.Evented) backend (io_uring). ~2,800 PDS subscriber fibers run on ~47 OS threads — a 60x reduction from the 0.15 thread-per-PDS model.
+
+- **cross-Io architecture** — networking runs on Evented fibers, database access runs on Threaded workers. a lock-free [DbRequestQueue](https://tangled.org/zzstoatzz.io/zlay/blob/main/src/db_request_queue.zig) bridges the two using atomic spinlocks — no futex, no cross-Io boundary violations. see [devlog 008](https://zat.dev/#devlog/008-the-io-migration.md) for the full migration story.
+
 - **optimistic signature validation** — on signing key cache miss, the frame passes through immediately and the DID is queued for background resolution. all subsequent commits are verified against the cached key. the cache caps at a configurable size and evicts the least recently used entry when full.

 - **inline collection index** — indexes `(DID, collection)` pairs in the event processing pipeline using RocksDB. serves `listReposByCollection` from the relay process — no sidecar. the index design draws on [fig](https://tangled.org/microcosm.blue)'s work on [lightrail](https://tangled.org/microcosm.blue/lightrail).
-
-- **reader thread per PDS + frame processing pool** — each PDS gets a lightweight reader thread (cursor tracking, rate limiting, header decode). heavy work (full CBOR decode, validation, DB persist, broadcast) runs on a shared pool of frame workers (configurable, default 16).

 ## spec compliance

···

 | dependency | purpose |
 |---|---|
-| [zat](https://tangled.org/zzstoatzz.io/zat) | AT Protocol primitives (CBOR, CAR, signatures, DID resolution) |
-| [websocket.zig](https://github.com/zzstoatzz/websocket.zig) | WebSocket client/server (fork with HTTP fallback + TCP split fixes) |
+| [zat](https://tangled.org/zat.dev/zat) | AT Protocol primitives (CBOR, CAR, signatures, DID resolution) |
+| [websocket.zig](https://github.com/zzstoatzz/websocket.zig) | WebSocket client/server (fork with write lock, HTTP fallback, TCP split fix) |
 | [pg.zig](https://github.com/karlseguin/pg.zig) | PostgreSQL driver |
 | [rocksdb-zig](https://github.com/Syndica/rocksdb-zig) | RocksDB bindings |

 ## build

-requires zig 0.15 and a C/C++ toolchain (for RocksDB).
+requires zig 0.16 and a C/C++ toolchain (for RocksDB).

 ```bash
-zig build # build (debug)
-zig build test # run tests
-zig build -Doptimize=ReleaseSafe # release build (production default)
+zig build # build (debug)
+zig build test # run tests
+zig build -Doptimize=ReleaseFast # release build (production)
 ```
+
+note: `ReleaseSafe` GPFs on startup with the Evented backend due to a [zig stdlib bug](https://tangled.org/zzstoatzz.io/zlay/blob/main/scripts/fiber_gpf_issue.md) in `fiber.zig` context switching. use `ReleaseFast` for production.

 ## configuration
