···3232/stress
3333/supercollider
3434/hepa
3535+/relay
35363637# Don't ignore this file itself, or other specific dotfiles
3738!.gitignore
···49505051# Relay dash output
5152/public/
5353+/cmd/relay/public
+9-10
Makefile
···8686 docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "plugins.security.disabled=true" -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=0penSearch-Pal0mar" opensearch-palomar
87878888.PHONY: run-dev-relay
8989-run-dev-relay: .env ## Runs 'bigsky' Relay for local dev
9090- GOLOG_LOG_LEVEL=info go run ./cmd/bigsky --admin-key localdev
9191-# --crawl-insecure-ws
8989+run-dev-relay: .env ## Runs relay for local dev
9090+ LOG_LEVEL=info go run ./cmd/relay --admin-password localdev serve
92919392.PHONY: run-dev-ident
9493run-dev-ident: .env ## Runs 'bluepages' identity directory for local dev
9594 GOLOG_LOG_LEVEL=info go run ./cmd/bluepages serve
96959796.PHONY: build-relay-image
9898-build-relay-image: ## Builds 'bigsky' Relay docker image
9999- docker build -t bigsky -f cmd/bigsky/Dockerfile .
9797+build-relay-image: ## Builds relay docker image
9898+ docker build -t relay -f cmd/relay/Dockerfile .
10099101101-.PHONY: build-relay-ui
102102-build-relay-ui: ## Build Relay dash web app
103103- cd ts/bgs-dash; yarn install --frozen-lockfile; yarn build
100100+.PHONY: build-relay-admin-ui
101101+build-relay-admin-ui: ## Build relay admin web UI
102102+ cd cmd/relay/relay-admin-ui; yarn install --frozen-lockfile; yarn build
104103 mkdir -p public
105105- cp -r ts/bgs-dash/dist/* public/
104104+ cp -r cmd/relay/relay-admin-ui/dist/* public/
106105107106.PHONY: run-relay-image
108107run-relay-image:
109109- docker run -p 2470:2470 bigsky /bigsky --admin-key localdev
108108+ docker run -p 2470:2470 relay /relay serve --admin-password localdev
110109# --crawl-insecure-ws
111110112111.PHONY: run-dev-search
+3-2
README.md
···8899**Go Services:**
10101111-- **bigsky** ([README](./cmd/bigsky/README.md)): relay reference implementation, running at `bsky.network`
1111+- **relay** ([README](./cmd/relay/README.md)): relay reference implementation
1212+- **rainbow** ([README](./cmd/rainbow/README.md)): firehose "splitter" or "fan-out" service
1213- **palomar** ([README](./cmd/palomar/README.md)): fulltext search service for <https://bsky.app>
1314- **hepa** ([README](./cmd/hepa/README.md)): auto-moderation bot for [Ozone](https://ozone.tools)
1415···47484849Individual commands can be run like:
49505050- go run ./cmd/bigsky
5151+ go run ./cmd/relay
51525253The [HACKING](./HACKING.md) file has a list of commands and packages in this repository and some other development tips.
5354
+52
cmd/relay/HACKING.md
···11+22+33+## Behaviors
44+55+Details about how the relay operates which might not be obvious!
66+77+- unknown/unexpected fields on overall firehose messages (eg, `#commit`) are *not* passed-through, so it is critical to upgrade the relay when there are protocol changes
88+- records and commit objects *are* passed through verbatim: they are serialized in `blocks` fields on `#commit` and `#sync` messages
99+- some admin UI changes are persisted across restarts (stored in database), others are not (ephemeral)
1010+ - ephemeral (but can be configured via env vars): new-hosts-per-day limit; enable/disable requestCrawl
1111+ - persisted (in database): account takedowns, domain bans, host bans, host account limit
1212+- the "lenient mode" configuration flag is intended as a short-term migration tool for [atproto Sync 1.1](https://github.com/bluesky-social/proposals/tree/main/0006-sync-iteration) and will be removed over time
1313+- once an upstream host websocket is established, the sequence numbers on that socket must always increase; messages with lower sequence will be dropped. but this is only strictly enforced over the life of the socket connection; if the relay restarts and the host emits older sequence numbers, those messages will start coming through
1414+- for a new host (no known previous sequence number), the relay will connect at "current" firehose offset, not "oldest" offset and backfill
1515+- for a known host, the relay will attempt to reconnect (eg, after a drop or restart) at the last persisted sequence number. persisting should happen every few seconds, or at clean shutdown of the daemon, but it is possible for the cursor to be slightly out of sync, resulting in replay of messages
1616+- account-level `#commit` revisions must always increase, and these revisions are stored for every valid `#commit` or `#sync` message from the account. repeated or lower revision messages are dropped. messages with revisions corresponding to a TID "in the future" (beyond a fudge period of a few minutes) are also dropped
1717+- messages for an account (DID) which come from a host connection that is not the current PDS host for that account are dropped. If there is a mismatch, the relay will re-resolve the identity (DID document) and double-check before dropping the message, in case there was an account migration not reflected yet in local caches.
1818+- if a host sends no messages for a long period, the relay will drop the connection and set the host status to "idle"; this is common for low-traffic PDS instances (eg, handful of accounts). The expectation is that the host would then send a `requestCrawl` ping next time there is a new event.
1919+- when the relay restarts, it connects to all "active" hosts
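
As a reading aid for the per-socket sequence rule in the list above, here is a minimal Go sketch. The `hostConn` type and `accept` method are hypothetical names for illustration, not the relay's actual code, and treating a repeated sequence number as a drop is an assumption drawn from "must always increase".

```go
package main

import "fmt"

// hostConn is a hypothetical stand-in for per-socket state.
// It is NOT the relay's real type; it only illustrates the rule.
type hostConn struct {
	lastSeq int64 // highest sequence number accepted on this socket
}

// accept reports whether a message with the given sequence number should be
// processed. Lower (and, by assumption, repeated) sequence numbers are dropped.
func (c *hostConn) accept(seq int64) bool {
	if seq <= c.lastSeq {
		return false
	}
	c.lastSeq = seq
	return true
}

func main() {
	conn := &hostConn{}
	for _, seq := range []int64{10, 11, 9, 12} {
		fmt.Println(seq, conn.accept(seq))
	}
}
```

Because the state lives on the connection object, a fresh connection after a restart starts from its own baseline, which matches the note that enforcement only holds for the life of the socket.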
2020+2121+2222+## Internal Implementation Details
2323+2424+- the parallel event scheduler prevents multiple tasks for the same account (DID) from being processed at the same time
2525+- note the potential for race-conditions with messages about the same account (DID) coming from different hosts around the same time: in this case there is no guarantee about ordering
2626+- the relay keeps track of which events have been received-but-not-processed by sequence number, and only increments the `lastSeq` for actually-processed events. the "inflight" set of messages (sequence numbers) can grow rather large for active hosts, if there are many events for a single account (only one processed per account at a time)
2727+2828+2929+## Code Organization and History
3030+3131+*Note: this was written in April 2025, and is likely to get out of date*
3232+3333+This codebase started as a fork of the prior `bigsky` / "BGS" relay implementation. The host and account state management, and message validation, were re-written. The "slurper" got a refactor, and some event stream and disk persistence code got lighter changes.
3434+3535+- `Service` struct: overall service executable/daemon. Implements protocol and admin HTTP endpoints.
3636+- `relay.Relay` struct: core relay service logic, message validation and processing, state and database management
3737+- `relay.Slurper` struct: maintains active subscriptions (WebSocket connections) to upstream hosts (eg, PDS instances)
3838+- `relay/models` package: database models
3939+- `stream` package: fork of `indigo:events` package, including websocket "frame" type, listeners, and some event stream rate-limiting
4040+- `stream.XRPCStreamEvent` struct: relatively critical/central serialization type
4141+- `stream.eventmgr.EventManager`: manages output firehose: disk persistence, sequencing, etc
4242+- `testing` package: end-to-end integration tests
4343+4444+The `stream` code should probably get merged back in with the `indigo:events` at some point, but there are many small differences so it won't be a quick/trivial change.
4545+4646+4747+## Verification Tools and Tests
4848+4949+- `goat` has several firehose verify flags
5050+- `./testing/` contains a framework for end-to-end relay integration tests
5151+- commit-level MST slice validation tests are in `indigo:atproto/repo`
5252+- there are some interop test resources at: https://github.com/bluesky-social/atproto-interop-tests
+65-34
cmd/relay/README.md
···6677This is a reference implementation of an atproto relay, written and operated by Bluesky.
8899-In atproto, a relay subscribes to multiple PDS hosts and outputs a combined "firehose" event stream. Downstream services can subscribe to this single firehose a get all relevant events for the entire network, or a specific sub-graph of the network. The relay maintains a mirror of repo data from all accounts on the upstream PDS instances, and verifies repo data structure integrity and identity signatures. It is agnostic to applications, and does not validate data against atproto Lexicon schemas.
99+In [atproto](https://atproto.com), a relay subscribes to multiple PDS hosts and outputs a combined "firehose" event stream. Downstream services can subscribe to this single firehose and get all relevant events for the entire network, or a specific sub-graph of the network. The relay verifies repo data structure integrity and identity signatures. It is application-agnostic, and does not validate data records against atproto Lexicon schemas.
10101111This relay implementation is designed to subscribe to the entire global network. The current state of the codebase is informally expected to scale to around 100 million accounts in the network, and tens of thousands of repo events per second (peak).
12121313Features and design decisions:
14141515-- runs on a single server
1616-- crawling and account state: stored in SQL database
1717-- SQL driver: gorm, with PostgreSQL in production and sqlite for testing
1515+- runs on a single server (not a distributed system)
1616+- upstream host and account state is stored in a SQL database
1717+- SQL driver: [gorm](https://gorm.io), supporting PostgreSQL in production and sqlite for testing
1818- highly concurrent: not particularly CPU intensive
1919- single golang binary for easy deployment
2020- observability: logging, prometheus metrics, OTEL traces
2121- admin web interface: configure limits, add upstream PDS instances, etc
22222323-This software is not yet as packaged, documented, and supported for self-hosting as our PDS distribution or Ozone service. But it is relatively simple and inexpensive to get running.
2323+This daemon is relatively simple to self-host, though it isn't as well documented or supported as the PDS reference implementation (see details below).
24242525-A note and reminder about relays in general are that they are more of a convenience in the protocol than a hard requirement. The "firehose" API is the exact same on the PDS and on a relay. Any service which subscribes to the relay could instead connect to one or more PDS instances directly.
2525+See `./HACKING.md` for more documentation of specific behaviors of this implementation.
262627272828## Development Tips
29293030The README and Makefile at the top level of this git repo have some generic helpers for testing, linting, formatting code, etc.
31313232-To re-build and run the relay locally:
3232+To build the admin web interface, and then build and run the relay locally:
33333434+ make build-relay-admin-ui
3435 make run-dev-relay
35363636-You can re-build and run the command directly to get a list of configuration flags and env vars; env vars will be loaded from `.env` if that file exists:
3737+You can run the command directly to get a list of configuration flags and environment variables. The environment will be loaded from a `.env` file if one exists:
37383838- RELAY_ADMIN_PASSWORD=dummy go run ./cmd/relay/ --help
3939+ go run ./cmd/relay/ --help
39404040-By default, the daemon will use sqlite for databases (in the directory `./data/relay/`) and the HTTP API will be bound to localhost port 2470.
4141+You can also build and run the command directly:
4242+4343+ go build ./cmd/relay
4444+ ./relay serve
4545+4646+By default, the daemon will use sqlite for databases (in the directory `./data/relay/`), and the HTTP API will be bound to localhost port 2470.
41474248When the daemon isn't running, sqlite database files can be inspected with:
4349···4551 [...]
4652 sqlite> .schema
47534848-Wipe all local data:
5454+To wipe all local data (careful!):
49555050- # careful! double-check this destructive command
5656+ # double-check before running this destructive command
5157 rm -rf ./data/relay/*
52585353-There is a basic web dashboard, though it will not be included unless built and copied to a local directory `./public/`. Run `make build-relay-ui`, and then when running the daemon the dashboard will be available at: <http://localhost:2470/dash/>. Paste in the admin key, eg `dummy`.
5959+There is a basic web dashboard, though it will not be included unless built and copied to a local directory `./public/`. Run `make build-relay-admin-ui`, and then when running the daemon the dashboard will be available at: <http://localhost:2470/dash/>. Paste in the admin key, eg `dummy`.
54605561The local admin routes can also be accessed by passing the admin password using HTTP Basic auth (with username `admin`), for example:
5662···60666167 http post :2470/admin/pds/requestCrawl -a admin:dummy hostname=pds.example.com
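
For readers without HTTPie, the same admin call can be approximated from Go with the standard library. This is a sketch under assumptions: the JSON body shape (`{"hostname": ...}`) is inferred from the HTTPie command above, `newRequestCrawl` is a hypothetical helper name, and the admin routes are described in this README as unstable.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newRequestCrawl builds (but does not send) an admin requestCrawl request.
// The helper name and body shape are illustrative assumptions.
func newRequestCrawl(relayURL, password, hostname string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"hostname": hostname})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, relayURL+"/admin/pds/requestCrawl", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// HTTP Basic auth with the fixed username "admin", as described above.
	req.SetBasicAuth("admin", password)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newRequestCrawl("http://localhost:2470", "dummy", "pds.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it against a running local relay:
	//   resp, err := http.DefaultClient.Do(req)
}
```

The request is only constructed here so the example runs without a relay listening on port 2470.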
62686969+The `goat` command line tool (also part of the indigo git repository) includes helpers for administering, inspecting, and debugging relays:
63706464-## Docker Containers
7171+ RELAY_HOST=http://localhost:2470 goat firehose --verify-mst
7272+ RELAY_HOST=http://localhost:2470 goat relay admin host list
65736666-One way to deploy is running a docker image. You can pull and/or run a specific version of relay, referenced by git commit, from the Bluesky Github container registry. For example:
7474+## API Endpoints
67756868- docker pull ghcr.io/bluesky-social/indigo:relay-fd66f93ce1412a3678a1dd3e6d53320b725978a6
6969- docker run ghcr.io/bluesky-social/indigo:relay-fd66f93ce1412a3678a1dd3e6d53320b725978a6
7676+This relay implements the core atproto "sync" API endpoints:
70777171-There is a Dockerfile in this directory, which can be used to build customized/patched versions of the relay as a container, republish them, run locally, deploy to servers, deploy to an orchestrated cluster, etc. See docs and guides for docker and cluster management systems for details.
7878+- `GET /xrpc/com.atproto.sync.subscribeRepos` (WebSocket)
7979+- `GET /xrpc/com.atproto.sync.getRepo` (HTTP redirect to account's PDS)
8080+- `GET /xrpc/com.atproto.sync.getRepoStatus`
8181+- `GET /xrpc/com.atproto.sync.listRepos` (optional)
8282+- `GET /xrpc/com.atproto.sync.getLatestCommit` (optional)
72838484+It also implements some relay-specific endpoints:
73857474-## Database Setup
8686+- `POST /xrpc/com.atproto.sync.requestCrawl`
8787+- `GET /xrpc/com.atproto.sync.listHosts`
8888+- `GET /xrpc/com.atproto.sync.getHostStatus`
75897676-PostgreSQL and Sqlite are both supported. Database configuration is passed via the `DATABASE_URL` environment variable, or the corresponding CLI arg.
9090+Documentation can be found in the [atproto specifications](https://atproto.com/specs/sync) for repository synchronization, event streams, data formats, account status, etc.
77917878-For PostgreSQL, the user and database must already be configured. Some example SQL commands are:
9292+This implementation also has some off-protocol admin endpoints under `/admin/`. These have legacy schemas from an earlier implementation, are not well documented, and should not be considered a stable API to build upon. The intention is to refactor them into Lexicon-specified APIs.
79938080- CREATE DATABASE relay;
9494+## Configuration and Operation
81958282- CREATE USER ${username} WITH PASSWORD '${password}';
8383- GRANT ALL PRIVILEGES ON DATABASE relay TO ${username};
9696+*NOTE: this document is not a complete guide to operating a relay as a public service. That requires planning around acceptable use policies, financial sustainability, infrastructure selection, etc. This is just a quick overview of the mechanics of getting a relay up and running.*
84978585-This service currently uses `gorm` to automatically run database migrations as the regular user. There is no concept of running a separate set of migrations under more privileged database user.
8686-8787-8888-## Deployment
8989-9090-*NOTE: this is not a complete guide to operating a relay. There are decisions to be made and communicated about policies, bandwidth use, PDS crawling and rate-limits, financial sustainability, etc, which are not covered here. This is just a quick overview of how to technically get a relay up and running.*
9191-9292-In a real-world system, you will probably want to use PostgreSQL.
9393-9494-Some notable configuration env vars to set:
9898+Some notable configuration env vars:
959996100- `RELAY_ADMIN_PASSWORD`
97101- `DATABASE_URL`: eg, `postgres://relay:CHANGEME@localhost:5432/relay`
···103107There is a health check endpoint at `/xrpc/_health`. Prometheus metrics are exposed by default on port 2471, path `/metrics`. The service logs fairly verbosely to stdout; use `LOG_LEVEL` to control log volume (`warn`, `info`, etc).
104108105109Be sure to double-check bandwidth usage and pricing if running a public relay! Bandwidth prices can vary widely between providers, and popular cloud services (AWS, Google Cloud, Azure) are very expensive compared to alternatives like OVH or Hetzner.
110110+111111+The relay admin interface has flexibility for many situations, but in some operational incidents it may be necessary to run SQL commands to do cleanups. This should be done when the relay is not actively operating. It is also recommended to run SQL commands in a transaction that can be rolled back in case of a typo or mistake.
112112+113113+### PostgreSQL
115115+PostgreSQL is recommended for any non-trivial relay deployments. Database configuration is passed via the `DATABASE_URL` environment variable, or the corresponding CLI arg.
116116+117117+The user and database must already be configured. For example:
118118+119119+ CREATE DATABASE relay;
120120+121121+ CREATE USER ${username} WITH PASSWORD '${password}';
122122+ GRANT ALL PRIVILEGES ON DATABASE relay TO ${username};
124124+This service currently uses `gorm` to automatically run database migrations as the regular user. There is no support for running database migrations separately under a more privileged database user.
125125+126126+### Docker
128128+The relay is relatively easy to build and operate as a simple executable, but there is also a Dockerfile in this directory. It can be used to build customized/patched versions of the relay as a container, republish them, run locally, deploy to servers, deploy to an orchestrated cluster, etc.
129129+130130+We strongly recommend running docker in "host networking" mode when operating a full-network relay.
131131+132132+### Bootstrapping Host List
133133+134134+The relay comes with a helper command to pull a list of hosts from an existing relay. You should shut the relay down first and run this as a separate command:
135135+136136+ ./relay pull-hosts