atproto relay implementation in zig zlay.waow.tech

docs: slim CLAUDE.md, move details to docs/gotchas.md

CLAUDE.md loads into every conversation — keep it to guardrails only
(23 lines, down from 100). moved pg.zig API patterns, rocksdb-zig
traps, websocket.zig fork details, metrics gotchas, and deploy
operational tips to docs/gotchas.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

+55 -90
+13 -90
CLAUDE.md
@@ -1,100 +1,23 @@
 # zlay
 
-AT Protocol relay in zig 0.15.2. subscribes to every PDS, validates commit
-signatures, serves merged firehose to downstream consumers.
-
-## architecture
-
-- **reader thread per PDS** (~2,750) — lightweight: TLS read, header decode,
-  cursor tracking, rate limiting. submits raw frames to pool.
-- **frame pool** (16 workers) — CBOR decode, ECDSA verify, DB persist, broadcast.
-- **validator** (4-8 resolver threads) — background DID resolution, signing key cache.
-- dependencies: zat (AT proto primitives), websocket.zig, pg.zig, rocksdb-zig.
-
-## build
+AT Protocol relay in zig 0.15.2. reader thread per PDS + shared frame
+processing pool. ReleaseSafe in production.
 
-```bash
-zig build -Doptimize=ReleaseSafe -Dtarget=x86_64-linux-gnu # production
-zig build test # run tests
-zig fmt --check . # CI checks this
-```
+## before pushing
 
-always run `zig fmt --check .` and `zig build test` before pushing.
+- `zig fmt --check .` and `zig build test`
+- MUST use `-Dtarget=x86_64-linux-gnu` for production (musl breaks RocksDB)
+- ReleaseFast has a known double-free — do not use
 
 ## deploy
 
-deploy configs live at `../@zzstoatzz.io/relay/` (justfile, helm, terraform).
-
-```bash
-cd ../relay
-just zlay-publish-remote ReleaseSafe # build on server, deploy
-just zlay-publish-remote # debug build (fallback)
-```
+configs at `../@zzstoatzz.io/relay/` — `just zlay-publish-remote ReleaseSafe`
 
-**critical rules:**
-- MUST use `-Dtarget=x86_64-linux-gnu` — musl produces illegal instructions in RocksDB
-- MUST set `KUBECONFIG="$(pwd)/zlay-kubeconfig.yaml"` — default context is docker-desktop
-- never change probe paths and image in separate operations
-- immutable SHA tags, never `:latest`
-- ReleaseFast has a known double-free — do not use
+MUST set `KUBECONFIG="$(pwd)/zlay-kubeconfig.yaml"` — default context is docker-desktop.
 
-## key files
+## docs
 
-| file | purpose |
-|---|---|
-| `src/main.zig` | entry point, thread stacks (8 MiB), signal handling |
-| `src/subscriber.zig` | per-PDS reader thread, FrameHandler |
-| `src/frame_worker.zig` | pool worker: decode, validate, persist, broadcast |
-| `src/thread_pool.zig` | generic ring-buffer thread pool |
-| `src/validator.zig` | signature verify, DID resolver, LRU key cache |
-| `src/broadcaster.zig` | fan-out to consumers, prometheus metrics |
-| `src/event_log.zig` | disk persist, postgres, cursor replay |
-| `src/collection_index.zig` | RocksDB (collection, did) index |
-| `src/slurper.zig` | multi-host crawl manager |
-| `src/api.zig` | HTTP API endpoints (served via httpFallback on WS port) |
-
-## current production state
-
-- running at zlay.waow.tech, ~2,750 PDS hosts
-- ReleaseSafe, 8 MiB thread stacks, ~1.1 GiB RSS
-- ports: 3000 (WS + HTTP), 3001 (metrics only)
-- probes: `/_healthz` (liveness), `/_readyz` (readiness, DB check)
-- `relay_build_info{git_sha,optimize}` metric confirms what's running
-
-## postgres
-
-schema: `account`, `account_repo` (rev/cid), `log_file_refs`, `host`, `domain_ban`, `backfill_progress`.
-
-pg.zig patterns:
-- `pool.rowUnsafe()` for single-row queries — `.get(T, col)` returns `T`
-- `pool.query()` + `result.nextUnsafe()` for multi-row
-- `QueryRow.deinit()` returns `!void` — use `defer row.deinit() catch {}`
-- DB-dependent tests skip via `requireDatabaseUrl()` → `error.SkipZigTest`
-
-## websocket.zig fork
-
-fork of karlseguin/websocket.zig with patches:
-- `httpFallback`: `_handleHandshake` catches parse errors, routes to `H.httpFallback`
-  if present — this is how all HTTP API endpoints are served on the WS port
-- lenient handshake parser: non-WS headers skip instead of error
-- all HTTP + WS on port 3000, metrics-only on 3001
-
-## operational notes
-
-- containers are minimal debian — no curl, wget, or shell utilities.
-  use busybox pod or `kubectl port-forward` to check metrics
-- two separate kubeconfigs: Go relay (`kubeconfig.yaml`) vs zlay (`zlay-kubeconfig.yaml`)
-- `$ZLAY_KUBECONFIG` is only set inside `just` — raw shell needs the absolute path
-- after `helm upgrade`, must `kubectl set image` back to SHA tag (helm uses `latest` from values)
-- `f.readAll()` not `f.reader().readAll()` — `File.reader()` takes a buffer arg in zig 0.15
-- mallinfo() overflows at 2 GiB (c_int fields) — smaps metrics are the reliable RSS source
-- rocksdb-zig `ColumnFamilyOptions` is a stub — use `extern fn rocksdb_set_options_cf` post-open
-
-## zig gotchas relevant here
-
-- `&.{...}` in loops creates stack-local arrays that alias — heap-allocate instead
-- operator precedence: `(byte & 0xf0) == 0` not `byte & 0xf0 == 0`
-- `ArrayList` is unmanaged in 0.15 — pass allocator to each method
-- rocksdb-zig iterator Data: do NOT call `.deinit()` on entries (SIGABRT)
-- rocksdb-zig DB.open: path must be null-terminated
-- pg.zig `QueryRow.deinit()` returns `!void` — use `defer row.deinit() catch {}`
+- [docs/design.md](docs/design.md) — architecture, threading, memory model
+- [docs/deployment.md](docs/deployment.md) — build flags, infra, resource usage
+- [docs/gotchas.md](docs/gotchas.md) — zig/pg.zig/rocksdb-zig/deploy traps
+- [docs/incident-2026-03-04.md](docs/incident-2026-03-04.md) — ReleaseSafe RSS analysis
+42
docs/gotchas.md
@@ -0,0 +1,42 @@
+# gotchas
+
+hard-won knowledge from debugging zlay. check here before guessing.
+
+## zig 0.15
+
+- `&.{...}` in loops creates stack-local arrays that alias across iterations — all references point to same memory. heap-allocate instead.
+- operator precedence: `(byte & 0xf0) == 0` not `byte & 0xf0 == 0`
+- `ArrayList` is unmanaged — pass allocator to each method
+- `f.readAll()` not `f.reader().readAll()` — `File.reader()` takes a buffer arg in zig 0.15
+
+## pg.zig
+
+- `pool.rowUnsafe()` for single-row queries — `.get(T, col)` returns `T`
+- `pool.query()` + `result.nextUnsafe()` for multi-row
+- `QueryRow.deinit()` returns `!void` — use `defer row.deinit() catch {}`
+- DB-dependent tests skip via `requireDatabaseUrl()` → `error.SkipZigTest`
+
+## rocksdb-zig
+
+- iterator `Data` has `.free = rocksdb_free` but points to internal buffers — do NOT call `.deinit()` on iterator entries (SIGABRT). only deinit `Data` from `db.get()`
+- `DB.open` passes `dir.ptr` raw to C API — path must be null-terminated (use `realpathAlloc` or zero-init buffer)
+- `ColumnFamilyOptions` is a stub — use `extern fn rocksdb_set_options_cf` to set options post-open
+
+## websocket.zig fork
+
+fork of karlseguin/websocket.zig with patches:
+- `httpFallback`: `_handleHandshake` catches parse errors, routes to `H.httpFallback` if present — this is how all HTTP API endpoints are served on the WS port
+- lenient handshake parser: non-WS headers skip instead of error
+
+## metrics
+
+- mallinfo() overflows at 2 GiB (c_int fields) — smaps metrics are the reliable RSS source
+- `relay_build_info{git_sha,optimize}` confirms what binary is running
+
+## deploy
+
+- containers are minimal debian — no curl, wget, or shell utilities. use busybox pod or `kubectl port-forward`
+- two separate kubeconfigs: Go relay (`kubeconfig.yaml`) vs zlay (`zlay-kubeconfig.yaml`)
+- `$ZLAY_KUBECONFIG` is only set inside `just` — raw shell needs the absolute path
+- after `helm upgrade`, must `kubectl set image` back to SHA tag (helm uses `latest` from values)
+- never change probe paths and image in separate operations (see [incident-2026-03-04.md](incident-2026-03-04.md))
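
Two of the notes in docs/gotchas.md (the unmanaged `ArrayList` and the `defer row.deinit() catch {}` pattern) can be illustrated with a minimal sketch, assuming zig 0.15. `FakeRow`, `readValue`, and `collect` are hypothetical names invented for this example. `FakeRow` only imitates the `!void` return type of `QueryRow.deinit()` and is not pg.zig's actual type.

```zig
const std = @import("std");

// Hypothetical stand-in for a pg.zig row handle. Only the `!void` return
// type of `deinit` is taken from the notes; everything else is made up.
const FakeRow = struct {
    released: *bool,
    fail: bool = false,

    fn deinit(self: FakeRow) !void {
        self.released.* = true;
        if (self.fail) return error.ConnectionLost;
    }
};

// A plain `defer row.deinit();` is rejected by the compiler because the
// error union would be silently discarded, so the error is swallowed
// explicitly with `catch {}`.
fn readValue(released: *bool) u32 {
    const row = FakeRow{ .released = released };
    defer row.deinit() catch {};
    return 42;
}

// In zig 0.15 `std.ArrayList` stores no allocator: start from `.empty`
// and pass the allocator to every call that may allocate or free.
fn collect(gpa: std.mem.Allocator) !usize {
    var list: std.ArrayList(u8) = .empty;
    defer list.deinit(gpa);
    try list.append(gpa, 0x0f);
    try list.appendSlice(gpa, &[_]u8{ 0x10, 0x20 });
    return list.items.len;
}

test "gotchas sketch" {
    var released = false;
    try std.testing.expectEqual(@as(u32, 42), readValue(&released));
    try std.testing.expect(released);
    try std.testing.expectEqual(@as(usize, 3), try collect(std.testing.allocator));
}
```

Saved to a file, this should run with `zig test <file>`. The testing allocator also reports a leak if `list.deinit(gpa)` is dropped.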