# architecture

## infrastructure

- **Hetzner Cloud CPX41** — 16 vCPU (AMD), 32 GB RAM, 240 GB NVMe, 20 TB bandwidth @ ~$30/mo
- **k3s** — single-node kubernetes, installed via cloud-init
- **traefik** — ingress controller (ships with k3s)
- **cert-manager** — automatic TLS via Let's Encrypt

## workloads

### relay

the core service. [`ghcr.io/bluesky-social/indigo`](https://github.com/bluesky-social/indigo/pkgs/container/indigo), deployed via [bjw-s/app-template](https://github.com/bjw-s-labs/helm-charts) with `hostNetwork: true` for lower-overhead networking.

connects to every PDS on the network and aggregates their writes into a single firehose stream (`com.atproto.sync.subscribeRepos`). backed by postgresql for state.

the relay maintains an in-process identity cache (hashicorp LRU, 5M entries, 24h TTL) — every event requires a DID document lookup, and this cache keeps the relay from hammering PLC. memory usage climbs over the first day as the cache fills, then plateaus once eviction matches insertion. `GOMEMLIMIT=6GiB` is set so the Go runtime returns memory to the OS under pressure rather than holding onto it indefinitely.

### lightrail

a sidecar serving `com.atproto.sync.listReposByCollection` — the endpoint TAP crawlers use to enumerate which accounts have records in a given collection.

[lightrail](https://tangled.org/microcosm.blue/lightrail) is fig's Rust collection directory, replacing the previous Go-based collectiondir (which had unbounded memory growth). lightrail subscribes to the relay's firehose (`--subscribe https://relay.waow.tech`), indexes `(DID, collection)` pairs in [fjall](https://github.com/fjall-rs/fjall), and detects collection creation/deletion using MST adjacent key proofs from sync 1.1 commit ops — no `describeRepo` calls needed for most events.

**backfill:** lightrail handles its own via `--deep-crawl`, discovering hosts from the relay's `listHosts` and crawling each one's `listRepos`. no manual backfill step needed.

**admin:** `GET /admin` serves an HTML dashboard; `GET /admin/status` returns JSON. both require HTTP basic auth (password from `LIGHTRAIL_ADMIN_PASSWORD` env var).

routed via traefik ingress path matching (`/xrpc/com.atproto.sync.listReposByCollection`) so the relay's existing endpoints are unaffected.

### jetstream

[`ghcr.io/bluesky-social/jetstream`](https://github.com/bluesky-social/jetstream) subscribes to the relay's firehose over localhost and re-serves it as JSON websocket events. a lightweight alternative for consumers that don't need CBOR/CAR decoding.

### postgresql

relay's backing database, deployed via [bitnami/postgresql](https://github.com/bitnami/charts/tree/main/bitnami/postgresql). stores relay state (PDS host list, cursor positions, etc.).

### monitoring

prometheus + grafana via [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack). scrapes the relay (`:2471/metrics`), jetstream, and lightrail (`:6789/metrics`). kubelet scraping is enabled for container-level disk I/O metrics. public read-only access at `relay-metrics.waow.tech`.

the relay and lightrail ServiceMonitors are standalone manifests (`kubectl apply -f`) rather than inline in the helm values — the `additionalServiceMonitors` field in kube-prometheus-stack silently fails when targeting services in a different namespace.

## PDS connection maintenance

relays try to reconnect to PDS hosts when connections drop, but eventually give up after repeated failures (exponential backoff).
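
a quick way to see how many hosts the relay currently knows about is the public `listHosts` endpoint (the same one lightrail's `--deep-crawl` uses for host discovery). a minimal sketch, assuming the standard `limit`/`cursor` parameters and `jq`:

```bash
# first page of hosts the relay is tracking; follow the returned
# cursor for the full list
curl -s 'https://relay.waow.tech/xrpc/com.atproto.sync.listHosts?limit=1000' \
  | jq '.hosts | length'
```
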
PDS hosts re-announce themselves to bluesky's relay when they come back online, but not to third-party relays like ours. this causes a natural decay in connected host count over time.

fix: a k8s CronJob (`indigo/deploy/reconnect-cronjob.yaml`) runs every 4 hours, fetching the [community PDS list](https://github.com/mary-ext/atproto-scraping) and sending `requestCrawl` for each host. this can also be run manually via `just indigo reconnect`.

## steady-state specs (indigo relay)

| metric | value |
|--------|-------|
| storage (relay data) | ~21 GB |
| storage (postgres) | ~2.4 GB |
| storage (lightrail fjall) | ~3 GB (~6.8M repos indexed) |
| CPU usage | 5-15% |
| network throughput | ~600 events/sec typical, 2000 peak |
| connected PDS hosts | ~2800 |
| memory (relay) | ~6 GiB (plateaus at GOMEMLIMIT) |
| memory (lightrail) | ~4 GiB during resync, expected lower at steady state |

---

## zlay (zig relay)

a second relay implementation in [Zig](https://ziglang.org/), deployed on a separate Hetzner node. source: [tangled.org/zzstoatzz.io/zlay](https://tangled.org/zzstoatzz.io/zlay). runs at `zlay.waow.tech`.

### how it differs from indigo

**same model, different internals.** zlay crawls PDS hosts directly — it is not a fan-out relay. `RELAY_UPSTREAM` (default: `bsky.network`) is a bootstrap seed used once at startup to populate the host list via `listHosts`. after that, all data flows directly from each PDS.

**inline collection index.** instead of running collectiondir as a sidecar, zlay indexes `(DID, collection)` pairs directly in its event processing pipeline, inspired by [fig](https://tangled.org/microcosm.blue)'s [lightrail](https://tangled.org/microcosm.blue/lightrail). storage is [RocksDB](https://rocksdb.org/) with two column families (`rbc` for collection→DID lookups, `cbr` for DID→collection cleanup). serves `listReposByCollection` from the relay's HTTP port — no separate service.

**optimistic validation.** on a signing key cache miss, zlay passes the frame through immediately and queues the DID for background resolution. the first commit from an unknown account is unvalidated; subsequent commits are verified. indigo blocks until resolution completes.

**split ports.** 3000 for the WebSocket firehose, 3001 for HTTP (health, stats, metrics, admin, XRPC). indigo serves everything on port 2470 (with metrics on 2471).

**fibers, not goroutines.** zig 0.16's `Io.Evented` backend runs ~2,800 subscriber tasks on ~47 OS threads via io_uring fibers. requires ReleaseFast due to a zig stdlib GPF in fiber context switching under ReleaseSafe (tracked via `scripts/repro_evented.zig`). predictable memory (no GC).

### deployment

separate Hetzner CPX41 in Hillsboro, OR (`hil`), independent k3s cluster. terraform in `zlay/infra/`.

```bash
just zlay init            # terraform init
just zlay infra           # create server
just zlay kubeconfig      # pull kubeconfig
just zlay deploy          # full deploy (cert-manager, postgres, relay, monitoring)
just zlay publish-remote  # build and push image
just zlay status          # check pods + health
just zlay logs            # tail logs
```

### collection index backfill

the collection index is live-only — it indexes `create` ops as they flow through the firehose. historical data is backfilled by importing from a source relay (bsky.network) via `com.atproto.sync.listReposByCollection`.

the backfiller discovers collections from two sources (lexicon garden llms.txt + RocksDB scan), then pages through each collection on the source relay, adding DIDs to RocksDB. progress is tracked in postgres for crash-resumability.
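
a rough sketch of that per-collection paging loop against the source relay, assuming standard cursor semantics and `jq` (the collection name here is just an example):

```bash
# page through one collection on the source relay, printing DIDs --
# roughly what the backfiller does for each discovered collection
SRC="https://bsky.network"
COLLECTION="app.bsky.feed.like"   # example; the backfiller iterates many
URL="${SRC}/xrpc/com.atproto.sync.listReposByCollection?collection=${COLLECTION}&limit=1000"
CURSOR=""
while :; do
  RESP=$(curl -s "${URL}${CURSOR:+&cursor=${CURSOR}}")
  echo "$RESP" | jq -r '.repos[].did'
  CURSOR=$(echo "$RESP" | jq -r '.cursor // empty')
  [ -z "$CURSOR" ] && break
done
```
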
backfill is triggered via `POST /admin/backfill-collections`; status is available via `GET`. see the [zlay backfill docs](https://tangled.org/zzstoatzz.io/zlay/tree/main/docs/backfill.md) for full details, or use `scripts/backfill-status` in this repo.

### verification

`scripts/zlay-smoketest` tests endpoint conformance, pagination, and set completeness against a reference relay. `scripts/collectiondir-diff` compares `listReposByCollection` results between any two endpoints (use `--limit` values ≤ 1000 for zlay).

[pulsar](https://tangled.org/mackuba.eu/pulsar) (by @mackuba.eu) provides live firehose coverage comparison — it subscribes to multiple relays simultaneously and counts unique DIDs over a time window.

### steady-state specs (zlay)

| metric | value |
|--------|-------|
| connected PDS hosts | ~2,830 |
| OS threads | ~47 (Evented backend, io_uring fibers) |
| collection index DIDs | ~30.4M (backfill 1,017/1,287 collections) |
| memory (steady state) | ~1.2 GiB (zig 0.16, Evented/ReleaseFast) |
| memory limit | 10 GiB |
| PVC | 20 GiB |
| `listReposByCollection` max limit | 1000 |
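
for a quick manual spot check along the lines of `scripts/collectiondir-diff`, you can diff the first page of DIDs returned by both deployments for a small collection. a sketch only (the collection name is a placeholder, and a single page is only meaningful for collections that fit within the 1000-item limit):

```bash
# diff first-page DIDs for one collection between the two deployments
COLLECTION="com.example.records"   # placeholder: pick a small real collection
for HOST in relay.waow.tech zlay.waow.tech; do
  curl -s "https://${HOST}/xrpc/com.atproto.sync.listReposByCollection?collection=${COLLECTION}&limit=1000" \
    | jq -r '.repos[].did' | sort > "/tmp/${HOST}.dids"
done
diff /tmp/relay.waow.tech.dids /tmp/zlay.waow.tech.dids
```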