# architecture

## infrastructure

- **Hetzner Cloud CPX41** — 16 vCPU (AMD), 32 GB RAM, 240 GB NVMe, 20 TB bandwidth @ ~$30/mo
- **k3s** — single-node kubernetes, installed via cloud-init
- **traefik** — ingress controller (ships with k3s)
- **cert-manager** — automatic TLS via Let's Encrypt

## workloads

### relay

the core service. [`ghcr.io/bluesky-social/indigo`](https://github.com/bluesky-social/indigo/pkgs/container/indigo), deployed via [bjw-s/app-template](https://github.com/bjw-s-labs/helm-charts) with `hostNetwork: true` for lower-overhead networking.

connects to every PDS on the network and aggregates their writes into a single firehose stream (`com.atproto.sync.subscribeRepos`). backed by postgresql for state.

the relay maintains an in-process identity cache (hashicorp LRU, 5M entries, 24h TTL) — every event requires a DID document lookup, and this cache keeps the relay from hammering PLC. memory usage climbs over the first day as the cache fills, then plateaus once eviction matches insertion. `GOMEMLIMIT=6GiB` is set so the Go runtime returns memory to the OS under pressure rather than holding onto it indefinitely.

### lightrail

a sidecar serving `com.atproto.sync.listReposByCollection` — the endpoint TAP crawlers use to enumerate which accounts have records in a given collection.

[lightrail](https://tangled.org/microcosm.blue/lightrail) is fig's Rust collection directory, replacing the previous Go-based collectiondir (which had unbounded memory growth). lightrail subscribes to the relay's firehose (`--subscribe https://relay.waow.tech`), indexes `(DID, collection)` pairs in [fjall](https://github.com/fjall-rs/fjall), and detects collection creation/deletion using MST adjacent key proofs from sync 1.1 commit ops — no `describeRepo` calls needed for most events.

**backfill:** lightrail handles its own via `--deep-crawl`, discovering hosts from the relay's `listHosts` and crawling each one's `listRepos`. no manual backfill step needed.

**admin:** `GET /admin` serves an HTML dashboard; `GET /admin/status` returns JSON. both require HTTP basic auth (password from `LIGHTRAIL_ADMIN_PASSWORD` env var).

routed via traefik ingress path matching (`/xrpc/com.atproto.sync.listReposByCollection`) so the relay's existing endpoints are unaffected.

### jetstream

[`ghcr.io/bluesky-social/jetstream`](https://github.com/bluesky-social/jetstream) subscribes to the relay's firehose over localhost and re-serves it as JSON websocket events. a lightweight alternative for consumers that don't need CBOR/CAR decoding.

### postgresql

relay's backing database, deployed via [bitnami/postgresql](https://github.com/bitnami/charts/tree/main/bitnami/postgresql). stores relay state (PDS host list, cursor positions, etc.).

### monitoring

prometheus + grafana via [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack). scrapes the relay (`:2471/metrics`), jetstream, and lightrail (`:6789/metrics`). kubelet scraping is enabled for container-level disk I/O metrics. public read-only access at `relay-metrics.waow.tech`.

the relay and lightrail ServiceMonitors are standalone manifests (`kubectl apply -f`) rather than inline in the helm values — the `additionalServiceMonitors` field in kube-prometheus-stack silently fails when targeting services in a different namespace.

## PDS connection maintenance

relays try to reconnect to PDS hosts when connections drop, but eventually give up after repeated failures (exponential backoff).
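
a quick way to see how many hosts the relay currently knows about is the public `listHosts` endpoint (the same one lightrail's `--deep-crawl` uses for host discovery). a minimal sketch, assuming the standard `limit`/`cursor` parameters and `jq`:

```bash
# first page of hosts the relay is tracking; follow the returned
# cursor for the full list
curl -s 'https://relay.waow.tech/xrpc/com.atproto.sync.listHosts?limit=1000' \
  | jq '.hosts | length'
```
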
PDS hosts re-announce themselves to bluesky's relay when they come back online, but not to third-party relays like ours. this causes a natural decay in connected host count over time.

fix: a k8s CronJob (`indigo/deploy/reconnect-cronjob.yaml`) runs every 4 hours, fetching the [community PDS list](https://github.com/mary-ext/atproto-scraping) and sending `requestCrawl` for each host. this can also be run manually via `just indigo reconnect`.

## steady-state specs (indigo relay)

| metric | value |
|--------|-------|
| storage (relay data) | ~21 GB |
| storage (postgres) | ~2.4 GB |
| storage (lightrail fjall) | ~3 GB (~6.8M repos indexed) |
| CPU usage | 5-15% |
| network throughput | ~600 events/sec typical, 2000 peak |
| connected PDS hosts | ~2800 |
| memory (relay) | ~6 GiB (plateaus at GOMEMLIMIT) |
| memory (lightrail) | ~4 GiB during resync, expected lower at steady state |

---

## zlay (zig relay)

a second relay implementation in [Zig](https://ziglang.org/), deployed on a separate Hetzner node. source: [tangled.org/zzstoatzz.io/zlay](https://tangled.org/zzstoatzz.io/zlay). runs at `zlay.waow.tech`.

### how it differs from indigo

**same model, different internals.** zlay crawls PDS hosts directly — it is not a fan-out relay. `RELAY_UPSTREAM` (default: `bsky.network`) is a bootstrap seed used once at startup to populate the host list via `listHosts`. after that, all data flows directly from each PDS.

**inline collection index.** instead of running collectiondir as a sidecar, zlay indexes `(DID, collection)` pairs directly in its event processing pipeline, inspired by [fig](https://tangled.org/microcosm.blue)'s [lightrail](https://tangled.org/microcosm.blue/lightrail). storage is [RocksDB](https://rocksdb.org/) with two column families (`rbc` for collection→DID lookups, `cbr` for DID→collection cleanup). serves `listReposByCollection` from the relay's HTTP port — no separate service.

**optimistic validation.** on a signing key cache miss, zlay passes the frame through immediately and queues the DID for background resolution. the first commit from an unknown account is unvalidated; subsequent commits are verified. indigo blocks until resolution completes.

**split ports.** 3000 for the WebSocket firehose, 3001 for HTTP (health, stats, metrics, admin, XRPC). indigo serves everything on port 2470 (with metrics on 2471).

**fibers, not goroutines.** zig 0.16's `Io.Evented` backend runs ~2,800 subscriber tasks on ~47 OS threads via io_uring fibers. requires ReleaseFast due to a zig stdlib GPF in fiber context switching under ReleaseSafe (tracked via `scripts/repro_evented.zig`). predictable memory (no GC).

### deployment

separate Hetzner CPX41 in Hillsboro, OR (`hil`), independent k3s cluster. terraform in `zlay/infra/`.

```bash
just zlay init            # terraform init
just zlay infra           # create server
just zlay kubeconfig      # pull kubeconfig
just zlay deploy          # full deploy (cert-manager, postgres, relay, monitoring)
just zlay publish-remote  # build and push image
just zlay status          # check pods + health
just zlay logs            # tail logs
```

### collection index backfill

the collection index is live-only — it indexes `create` ops as they flow through the firehose. historical data is backfilled by importing from a source relay (bsky.network) via `com.atproto.sync.listReposByCollection`.

the backfiller discovers collections from two sources (lexicon garden llms.txt + RocksDB scan), then pages through each collection on the source relay, adding DIDs to RocksDB. progress is tracked in postgres for crash-resumability.
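
a rough sketch of that per-collection paging loop against the source relay, assuming standard cursor semantics and `jq` (the collection name here is just an example):

```bash
# page through one collection on the source relay, printing DIDs --
# roughly what the backfiller does for each discovered collection
SRC="https://bsky.network"
COLLECTION="app.bsky.feed.like"   # example; the backfiller iterates many
URL="${SRC}/xrpc/com.atproto.sync.listReposByCollection?collection=${COLLECTION}&limit=1000"
CURSOR=""
while :; do
  RESP=$(curl -s "${URL}${CURSOR:+&cursor=${CURSOR}}")
  echo "$RESP" | jq -r '.repos[].did'
  CURSOR=$(echo "$RESP" | jq -r '.cursor // empty')
  [ -z "$CURSOR" ] && break
done
```
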
backfill is triggered via `POST /admin/backfill-collections`; status is available via `GET`. see the [zlay backfill docs](https://tangled.org/zzstoatzz.io/zlay/tree/main/docs/backfill.md) for full details, or use `scripts/backfill-status` in this repo.

### verification

`scripts/zlay-smoketest` tests endpoint conformance, pagination, and set completeness against a reference relay. `scripts/collectiondir-diff` compares `listReposByCollection` results between any two endpoints (use `--limit` values ≤ 1000 for zlay).

[pulsar](https://tangled.org/mackuba.eu/pulsar) (by @mackuba.eu) provides live firehose coverage comparison — it subscribes to multiple relays simultaneously and counts unique DIDs over a time window.

### steady-state specs (zlay)

| metric | value |
|--------|-------|
| connected PDS hosts | ~2,830 |
| OS threads | ~47 (Evented backend, io_uring fibers) |
| collection index DIDs | ~30.4M (backfill 1,017/1,287 collections) |
| memory (steady state) | ~1.2 GiB (zig 0.16, Evented/ReleaseFast) |
| memory limit | 10 GiB |
| PVC | 20 GiB |
| `listReposByCollection` max limit | 1000 |
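
for a quick manual spot check along the lines of `scripts/collectiondir-diff`, you can diff the first page of DIDs returned by both deployments for a small collection. a sketch only (the collection name is a placeholder, and a single page is only meaningful for collections that fit within the 1000-item limit):

```bash
# diff first-page DIDs for one collection between the two deployments
COLLECTION="com.example.records"   # placeholder: pick a small real collection
for HOST in relay.waow.tech zlay.waow.tech; do
  curl -s "https://${HOST}/xrpc/com.atproto.sync.listReposByCollection?collection=${COLLECTION}&limit=1000" \
    | jq -r '.repos[].did' | sort > "/tmp/${HOST}.dids"
done
diff /tmp/relay.waow.tech.dids /tmp/zlay.waow.tech.dids
```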