relay-eval: ad-hoc eval + coverage-history recipes

+53

.claude/skills/relay-eval/SKILL.md

+---
+name: relay-eval
+description: evaluate relay firehose coverage against reference relays using the relay-eval server. use for ad-hoc coverage checks, canary validation, and historical trend analysis.
+---
+
+evaluate relay coverage via `just relay-eval`. runs on a dedicated server over ssh, so no local bandwidth cost. context (if any): $ARGUMENTS
+
+## commands
+
+```bash
+# ad-hoc eval (the main one)
+just relay-eval eval zlay.waow.tech                      # 60s, zlay vs bsky.network
+just relay-eval eval zlay.waow.tech 30                   # faster 30s check
+just relay-eval eval zlay.waow.tech,relay.waow.tech 120  # multiple relays
+
+# historical
+just relay-eval runs-range                  # how far back data goes
+just relay-eval coverage-history > out.csv  # full CSV dump
+
+# other
+just relay-eval run     # trigger full 5-min scheduled run
+just relay-eval status  # service health + db summary
+just relay-eval report  # latest full run results
+```
+
+bsky.network is auto-included as a reference relay.
+
+## interpreting results
+
+```
+host            unique_dids  union_dids  pct   connected
+zlay.waow.tech  4973         5044        98.6  1
+bsky.network    5034         5044        99.8  1
+```
+
+- `pct` = coverage. 95-99%+ is good. <90% with connected=1 = missing events.
+- `connected` = 1 means relay stayed up the full window. 0 = dropped.
+
+## canary workflow
+
+1. deploy canary
+2. wait ~15-20 min for ramp
+3. `just relay-eval eval zlay.waow.tech 60`
+4. if >95%: repeat at T+30, T+60 to confirm stability
+5. if <90%: investigate with `just zlay probe delta`
+
+## public API (no ssh, quick spot-check)
+
+```bash
+curl -sS 'https://relay-eval.waow.tech/api/trend?limit=50' \
+  | jq -r '.[] | select(.host=="zlay.waow.tech")
+      | "\(.ts) \(.dids)/\(.union) \(100*.dids/.union | floor)%"'
+```
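The thresholds under "interpreting results" can be written down as a small helper. This is a minimal sketch and not part of the PR: the function name `classify_row` is hypothetical, and the "borderline" label for the 90-95% band is an assumption, since the skill text does not say how to treat that range.

```python
def classify_row(pct: float, connected: int) -> str:
    """Label one row of `just relay-eval eval` output.

    Thresholds follow the skill text: >=95% is good, <90% while
    connected means missing events. The 90-95% band is not defined
    there, so it is labelled "borderline" here (an assumption).
    """
    if connected != 1:
        return "dropped"  # relay did not stay up the full window
    if pct >= 95.0:
        return "good"
    if pct < 90.0:
        return "missing events"
    return "borderline"


# The two rows from the sample output above.
rows = [("zlay.waow.tech", 98.6, 1), ("bsky.network", 99.8, 1)]
for host, pct, connected in rows:
    print(host, classify_row(pct, connected))
```

Checking `connected` first keeps a dropped relay from being misread as a coverage regression.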

+68

.claude/skills/relay-history/SKILL.md

+---
+name: relay-history
+description: fetch and summarize historical relay-eval coverage data. use for checking stability over time, spotting outage patterns, and comparing relays.
+---
+
+fetch historical coverage from the relay-eval public API. no SSH required. context (if any): $ARGUMENTS
+
+## API
+
+base: `https://relay-eval.waow.tech/api/trend`
+
+query params:
+- `limit`: number of most-recent runs to return (default varies; use 500 for deep history, 50 for quick checks)
+
+response is a flat JSON array of `{ts, union, host, dids}` objects, one per relay per run. filter client-side by host.
+
+## commands
+
+### quick status (last 12 runs for a relay)
+
+```bash
+curl -sS 'https://relay-eval.waow.tech/api/trend?limit=200' | jq -r '
+  [.[] | select(.host=="zlay.waow.tech")] | sort_by(.ts) | .[-12:] |
+  .[] | "\(.ts) \(if .dids > 0 then "\(.dids)/\(.union) \(100*.dids/.union | floor)%" else "DOWN (0%)" end)"
+'
+```
+
+### summary with outage count + streak
+
+```bash
+curl -sS 'https://relay-eval.waow.tech/api/trend?limit=500' | jq -r '
+  [.[] | select(.host=="HOST")] | sort_by(.ts) as $all |
+  [$all[] | select(.dids == 0)] | length as $zeros |
+  [$all[] | select(.dids > 0) | (100*.dids/.union)] | (add / length | floor) as $avg |
+  ($all | reverse | reduce .[] as $r ({"n":0,"done":false};
+    if .done then . elif $r.dids == 0 then .done=true else .n+=1 end) | .n) as $streak |
+  "runs: \($all|length), outages: \($zeros), avg coverage (when up): \($avg)%, current streak: \($streak)"
+'
+```
+
+replace `HOST` with the relay hostname (e.g. `zlay.waow.tech`, `relay.waow.tech`).
+
+### compare two relays
+
+```bash
+curl -sS 'https://relay-eval.waow.tech/api/trend?limit=100' | jq -r '
+  group_by(.ts) | .[-12:] | .[] |
+  (.[0].ts) as $ts | (.[0].union) as $u |
+  [.[] | select(.host == "zlay.waow.tech" or .host == "bsky.network")] |
+  sort_by(.host) |
+  "\($ts) " + ([.[] | "\(.host): \(if .dids>0 then "\(100*.dids/.union|floor)%" else "DOWN" end)"] | join(" "))
+'
+```
+
+### list all tracked relays
+
+```bash
+curl -sS 'https://relay-eval.waow.tech/api/trend?limit=1' | jq -r '[.[].host] | sort[]'
+```
+
+## interpreting results
+
+- **97-99%** coverage is normal and healthy
+- **0%** means the relay was unreachable or restarting during that eval window
+- **alternating 0%/97%** pattern = pod crash-looping (restarts every ~30-60 min)
+- **< 90% with no 0% runs** = relay is up but missing events (investigation needed)
+- **streak** = consecutive runs from the most recent without a 0%. higher = more stable.
+- runs happen every ~30 min, so streak of 12 = ~6 hours stable
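For readers who would rather post-process the trend JSON outside jq, the "summary with outage count + streak" logic can be sketched in Python. This is an illustration under assumptions, not code from the PR: `summarize` is a hypothetical name, the sample rows are made up, and the only thing taken from the skill text is the `{ts, union, host, dids}` row shape.

```python
import math
from typing import Dict, List, Optional


def summarize(rows: List[Dict], host: str) -> Dict[str, Optional[int]]:
    """Outage count, floor of mean coverage when up, and current streak for one host."""
    runs = sorted((r for r in rows if r["host"] == host), key=lambda r: r["ts"])
    up = [100 * r["dids"] / r["union"] for r in runs if r["dids"] > 0]
    streak = 0
    for r in reversed(runs):  # walk back from the most recent run
        if r["dids"] == 0:
            break
        streak += 1
    return {
        "runs": len(runs),
        "outages": sum(1 for r in runs if r["dids"] == 0),
        # None when the host was never up, instead of dividing by zero
        "avg_pct_when_up": math.floor(sum(up) / len(up)) if up else None,
        "streak": streak,
    }


# Illustrative rows only; real data comes from /api/trend.
sample = [
    {"ts": "2026-04-10T00:00:00Z", "union": 5000, "host": "zlay.waow.tech", "dids": 4900},
    {"ts": "2026-04-10T00:30:00Z", "union": 5000, "host": "zlay.waow.tech", "dids": 0},
    {"ts": "2026-04-10T01:00:00Z", "union": 5000, "host": "zlay.waow.tech", "dids": 4850},
    {"ts": "2026-04-10T01:30:00Z", "union": 5000, "host": "zlay.waow.tech", "dids": 4950},
    {"ts": "2026-04-10T01:30:00Z", "union": 5000, "host": "bsky.network", "dids": 4990},
]
print(summarize(sample, "zlay.waow.tech"))
```

Unlike the jq one-liner, this version returns `None` for the average when a host has no non-zero runs rather than failing on an empty list.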

+141

docs/relay-eval-recipes.md

+# relay-eval: pulling zlay coverage stats
+
+two paths, both usable directly by the zlay engineer:
+
+1. **full historical CSV from the db**: via justfile recipe, ssh'd to
+   the relay-eval server. this is what you want for multi-week
+   inflection-point analysis.
+2. **quick public JSON API**: via curl + jq. limited to what the web
+   server exposes (no `connected` column), good for fast spot-checks.
+
+## path 1: CSV dump of the runs + relay_stats join (multi-week)
+
+requires ssh access to the relay-eval server (`just relay-eval ssh`
+already works for operators; engineer can use the same key or run the
+recipe locally from this repo).
+
+```bash
+# how much data is there?
+just relay-eval runs-range
+# => 1343 runs, earliest: 2026-03-13T03:10:59Z, latest: 2026-04-10T02:06:17Z
+
+# default: zlay + relay.waow.tech + bsky.network as reference curves
+just relay-eval coverage-history > coverage.csv
+
+# or customize the host list (quote the comma list carefully):
+just relay-eval coverage-history "'zlay.waow.tech','asia.firehose.network'"
+```
+
+CSV columns: `timestamp,host,unique_dids,union_dids,coverage_pct,connected`.
+
+`connected` is 1/0 (the eval run tracked whether the relay stayed
+connected the full 5-min window; 0 = dropped mid-run). use
+`WHERE connected = 1` in downstream analysis if you want to exclude
+disconnect-artifact zeros.
+
+`relay.waow.tech` and `bsky.network` are the "known-stable" reference
+curves. a zlay-specific regression shows as zlay diverging while those
+stay flat. a network-wide shift shows as all three dipping together.
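The "zlay diverging while the references stay flat" check can be sketched with the standard library alone. This is an illustration under assumptions, not part of the PR: `zlay_gap` is a hypothetical helper, the sample CSV values are made up, and "gap" is defined here as the target's `coverage_pct` minus the mean of the reference relays at the same timestamp.

```python
import csv
import io
from collections import defaultdict

# Reference relays named in the doc as "known-stable".
REFERENCES = ("relay.waow.tech", "bsky.network")


def zlay_gap(csv_text: str, target: str = "zlay.waow.tech") -> dict:
    """Per-timestamp gap, in percentage points, between target and the reference mean."""
    by_ts = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip disconnect-artifact rows and rows with no computed coverage.
        if row["connected"] != "1" or not row["coverage_pct"]:
            continue
        by_ts[row["timestamp"]][row["host"]] = float(row["coverage_pct"])
    gaps = {}
    for ts, cov in sorted(by_ts.items()):
        refs = [cov[h] for h in REFERENCES if h in cov]
        if target in cov and refs:
            gaps[ts] = cov[target] - sum(refs) / len(refs)
    return gaps


# Illustrative data in the column layout produced by coverage-history.
SAMPLE = """timestamp,host,unique_dids,union_dids,coverage_pct,connected
2026-04-10T01:00:00Z,zlay.waow.tech,4900,5000,98.0,1
2026-04-10T01:00:00Z,relay.waow.tech,4950,5000,99.0,1
2026-04-10T01:00:00Z,bsky.network,5000,5000,100.0,1
2026-04-10T01:30:00Z,zlay.waow.tech,0,5000,0.0,0
2026-04-10T01:30:00Z,bsky.network,4990,5000,99.8,1
"""
print(zlay_gap(SAMPLE))
```

A gap that stays near zero suggests zlay is tracking the references; a gap that grows while the reference values hold steady points at a zlay-specific regression. The second timestamp is omitted from the result because zlay's row there has `connected = 0`.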
+
+loading into pandas / duckdb:
+
+```python
+import pandas as pd
+df = pd.read_csv("coverage.csv", parse_dates=["timestamp"])
+pivot = df.pivot(index="timestamp", columns="host", values="coverage_pct")
+pivot.plot()  # or: df.groupby("host")["coverage_pct"].describe()
+```
+
+## path 2: public JSON API (no ssh, last ~N runs)
+
+last ~25 hours of zlay coverage, newest first, no ssh required:
+
+```bash
+curl -sS 'https://relay-eval.waow.tech/api/trend?limit=50' \
+  | jq -r 'group_by(.ts) | sort_by(.[0].ts) | reverse | .[]
+      | {ts: .[0].ts,
+         u: .[0].union,
+         z: (map(select(.host=="zlay.waow.tech"))[0].dids // 0),
+         b: (map(select(.host=="bsky.network"))[0].dids // 0)}
+      | "\(.ts) zlay=\(.z)/\(.u) (\(100*.z/.u | floor)%) bsky=\(.b) (\(100*.b/.u | floor)%)"'
+```
+
+bump `limit=50` higher for more history. each row is one 5-min sampling
+run. the API only returns "valid" runs (>50% of relays stayed connected
+the full window), so you never have to filter broken runs yourself.
+
+**caveats of path 2 vs path 1**: the public API does NOT expose the
+per-relay `connected` flag, and it truncates at whatever `limit` you
+pass. for the "when did zlay's curve depart from the reference
+relays" question, use path 1. for "is zlay healthy right now", path 2
+is faster to type.
+
+## what the fields mean
+
+```
+{"ts": "...", "union": N, "host": "...", "dids": M}
+```
+
+- `ts`: run start timestamp
+- `union`: total unique DIDs seen across *all* tracked relays during
+  the 5-minute window. this is the "ground truth" denominator.
+- `host`: relay hostname
+- `dids`: unique DIDs that host relayed during the window
+- `dids/union`: that relay's coverage ratio. 1.0 = saw everything
+  the best of the rest saw. 0.0 = saw nothing.
+
+relays comparable to zlay in scope: `bsky.network` (the reference
+implementation), `relay.waow.tech` (our indigo instance, same network
+as zlay but different code), `relay.fire.hose.cam`, etc. use any of
+them as an "is it me or the network" sanity check: if bsky.network
+also drops in the same window, the dip isn't zlay-specific.
+
+## more focused queries
+
+**just today's zlay coverage** (one line per run):
+```bash
+curl -sS 'https://relay-eval.waow.tech/api/trend?limit=50' \
+  | jq -r '.[] | select(.host=="zlay.waow.tech")
+      | "\(.ts) \(.dids)/\(.union) \(100*.dids/.union | floor)%"'
+```
+
+**the latest run, all relays ranked**:
+```bash
+curl -sS 'https://relay-eval.waow.tech/api/latest' | jq .
+```
+
+**raw trend JSON for loading into pandas / duckdb / sqlite**:
+```bash
+curl -sS 'https://relay-eval.waow.tech/api/trend?limit=500' > trend.json
+```
+(500 runs ≈ 10 days of history)
+
+## caveats (from `relay-eval-methodology.md`)
+
+- `union` can include outliers. a relay replaying historical events will
+  inflate its `dids` above the 1.0 ratio. relay-eval's own `og.svg`
+  rendering uses median-based outlier detection for rankings; the
+  trend API just gives you the raw numbers. if a specific relay is
+  >1.3× the median, treat it as "replaying" not "live".
+- `diffs.classification` (via `/api/latest`) separates missed DIDs
+  into `coverage_gap` (real bug: PDS was reachable and resolvable)
+  vs `unresolvable` (DID lookup failed, ambiguous) vs `deactivated`
+  (account is dead). `coverage_gap` is the number to fixate on.
+- `/api/latest` truncates to 30 `diff_samples` per relay. for the
+  full missed-DID set you'd need to query sqlite directly, but you
+  probably don't need that for the canary-2 diagnostic work.
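The ">1.3× the median" rule of thumb from the first caveat can be sketched as follows. This is an assumption-laden illustration, not relay-eval's actual outlier logic: it takes the median over per-relay `dids` within a single run, `flag_replaying` and `replayer.example` are hypothetical names, and the counts are made up.

```python
from statistics import median
from typing import Dict, List


def flag_replaying(dids_by_host: Dict[str, int], factor: float = 1.3) -> List[str]:
    """Hosts whose DID count exceeds `factor` times the median for one run."""
    med = median(dids_by_host.values())
    return sorted(h for h, d in dids_by_host.items() if d > factor * med)


# Illustrative counts for a single run.
run = {
    "zlay.waow.tech": 4973,
    "bsky.network": 5034,
    "relay.waow.tech": 5010,
    "replayer.example": 9000,  # hypothetical relay replaying history
}
print(flag_replaying(run))
```

Using the median rather than the mean keeps one replaying relay from dragging the baseline up and hiding itself.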
+
+## using this as a canary acceptance signal
+
+the cleanest diagnostic use: **after deploying a canary, watch the
+next 2-3 relay-eval runs** (roughly 60-90 minutes post-deploy, since
+runs are every ~30 min and the first sample after a cold-start deploy
+may catch ramp-up). if zlay's coverage ratio stays above, say, 0.90
+for 3 consecutive runs, the canary is serving evaluators at ingest
+rate. a single low-coverage run isn't conclusive; a multi-run sag is.
+
+this runs automatically and covers the 0 → N → 0 failure curves
+operators miss between manual probes. the operator's `just zlay probe`
+sweep is lower-latency (you can probe any time) but relay-eval is
+lower-effort (runs itself) and more authoritative (simultaneous
+sampling across ~16 relays).
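The acceptance rule described above (coverage ratio above roughly 0.90 for 3 consecutive runs) can be sketched as a small predicate. This is a minimal illustration, not part of the PR: `canary_ok` is a hypothetical name, and both the 0.90 threshold and the run count are the doc's "say" values exposed as parameters.

```python
from typing import Sequence


def canary_ok(ratios: Sequence[float], threshold: float = 0.90, needed: int = 3) -> bool:
    """True if the most recent `needed` runs all have coverage above `threshold`.

    `ratios` is dids/union per run, oldest first. Fewer than `needed`
    runs is treated as "not yet conclusive" and returns False.
    """
    recent = list(ratios)[-needed:]
    return len(recent) == needed and all(r > threshold for r in recent)


# A low ramp-up sample followed by three healthy runs passes.
print(canary_ok([0.50, 0.97, 0.98, 0.96]))
# A sag inside the last three runs fails.
print(canary_ok([0.97, 0.85, 0.98]))
```

Only the trailing window is inspected, so an early ramp-up sample does not fail the canary, which matches the note that the first post-deploy run may catch ramp-up.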

+57 -1

relay-eval/justfile

···
     "SELECT count(*) || ' runs, latest: ' || max(timestamp) FROM runs;" 2>/dev/null || echo "(no db)"
     EOF
 
-# trigger a manual eval run (blocks until complete)
+# trigger a full eval run (all relays, 5-min window, blocks until complete)
 run:
     ssh {{ server }} 'systemctl start relay-eval.service && journalctl -u relay-eval --no-pager -n 30'
 
+# ad-hoc eval of specific relay(s) against bsky.network as reference
+# usage:
+#   just relay-eval eval zlay.waow.tech                      # 60s window, zlay vs bsky
+#   just relay-eval eval zlay.waow.tech 30                   # 30s window
+#   just relay-eval eval zlay.waow.tech,relay.waow.tech 120  # multiple relays, 2 min
+# results land in the same db + web dashboard as scheduled runs
+eval relay window="60":
+    #!/usr/bin/env bash
+    set -euo pipefail
+    # always include bsky.network as reference unless it's already in the list
+    RELAYS="{{ relay }}"
+    if ! echo "$RELAYS" | grep -q "bsky.network"; then
+        RELAYS="${RELAYS},bsky.network"
+    fi
+    echo "==> evaluating: $RELAYS (window={{ window }}s)"
+    ssh {{ server }} "/opt/relay-eval-src/relay-eval/zig-out/bin/relay-eval eval \
+        --relays $RELAYS \
+        --window {{ window }} \
+        --db /var/lib/relay-eval/relay-eval.db"
+    echo ""
+    echo "==> results:"
+    ssh {{ server }} "sqlite3 -header -column /var/lib/relay-eval/relay-eval.db \
+        \"SELECT rs.host, rs.unique_dids, r.union_dids, \
+        ROUND(100.0 * rs.unique_dids / NULLIF(r.union_dids, 0), 1) AS pct, \
+        rs.connected \
+        FROM runs r JOIN relay_stats rs ON rs.run_id = r.id \
+        WHERE r.id = (SELECT max(id) FROM runs);\""
+
 # tail eval logs
 logs-eval:
     ssh {{ server }} journalctl -u relay-eval -f
···
 # tail web server logs
 logs-web:
     ssh {{ server }} journalctl -u relay-eval-web -f
+
+# dump zlay + reference relays coverage history as CSV (stdout)
+# usage:
+#   just relay-eval coverage-history                       # default hosts, all runs
+#   just relay-eval coverage-history "'zlay.waow.tech'"    # single host (inner quotes must reach SQL)
+#   just relay-eval coverage-history "'a','b','c'"         # custom SQL list
+# pipe to a file: just relay-eval coverage-history > coverage.csv
+coverage-history hosts="'zlay.waow.tech','relay.waow.tech','bsky.network'":
+    #!/usr/bin/env bash
+    set -euo pipefail
+    ssh {{ server }} "sqlite3 -header -csv /var/lib/relay-eval/relay-eval.db \"
+        SELECT
+            r.timestamp,
+            rs.host,
+            rs.unique_dids,
+            r.union_dids,
+            ROUND(100.0 * rs.unique_dids / NULLIF(r.union_dids, 0), 2) AS coverage_pct,
+            rs.connected
+        FROM runs r
+        JOIN relay_stats rs ON rs.run_id = r.id
+        WHERE rs.host IN ({{ hosts }})
+        ORDER BY r.timestamp ASC;
+    \""
+
+# count + time range of the runs table
+runs-range:
+    ssh {{ server }} "sqlite3 /var/lib/relay-eval/relay-eval.db \
+        \"SELECT count(*) || ' runs, earliest: ' || min(timestamp) || ', latest: ' || max(timestamp) FROM runs;\""
 
 # query latest run results
 report:
