update relay README · alice.mosphere.at/indigo@859b021

+16 -32

1 changed file

cmd/relay/README.md

@@ -2,13 +2,13 @@
 atproto Relay Service
 ===============================
 
-*NOTE: "Relays" used to be called "Big Graph Servers", or "BGS", or "bigsky". Many variables and packages still reference "bgs"*
+*NOTE: "relays" used to be called "Big Graph Servers", or "BGS", or "bigsky". Many variables and packages still reference "bgs"*
 
-This is the implementation of an atproto Relay which is running in the production network, written and operated by Bluesky.
+This is the implementation of an atproto relay which is running in the production network, written and operated by Bluesky.
 
-In atproto, a Relay subscribes to multiple PDS hosts and outputs a combined "firehose" event stream. Downstream services can subscribe to this single firehose a get all relevant events for the entire network, or a specific sub-graph of the network. The Relay maintains a mirror of repo data from all accounts on the upstream PDS instances, and verifies repo data structure integrity and identity signatures. It is agnostic to applications, and does not validate data against atproto Lexicon schemas.
+In atproto, a relay subscribes to multiple PDS hosts and outputs a combined "firehose" event stream. Downstream services can subscribe to this single firehose a get all relevant events for the entire network, or a specific sub-graph of the network. The relay maintains a mirror of repo data from all accounts on the upstream PDS instances, and verifies repo data structure integrity and identity signatures. It is agnostic to applications, and does not validate data against atproto Lexicon schemas.
 
-This Relay implementation is designed to subscribe to the entire global network. The current state of the codebase is informally expected to scale to around 50 million accounts in the network, and thousands of repo events per second (peak).
+This relay implementation is designed to subscribe to the entire global network. The current state of the codebase is informally expected to scale to around 100 million accounts in the network, and tens of thousands of repo events per second (peak).
 
 Features and design decisions:
 
@@ -20,16 +20,16 @@
 - observability: logging, prometheus metrics, OTEL traces
 - admin web interface: configure limits, add upstream PDS instances, etc
 
-This software is not as packaged, documented, and supported for self-hosting as our PDS distribution or Ozone service. But it is relatively simple and inexpensive to get running.
+This software is not yet as packaged, documented, and supported for self-hosting as our PDS distribution or Ozone service. But it is relatively simple and inexpensive to get running.
 
-A note and reminder about Relays in general are that they are more of a convenience in the protocol than a hard requirement. The "firehose" API is the exact same on the PDS and on a Relay. Any service which subscribes to the Relay could instead connect to one or more PDS instances directly.
+A note and reminder about relays in general are that they are more of a convenience in the protocol than a hard requirement. The "firehose" API is the exact same on the PDS and on a relay. Any service which subscribes to the relay could instead connect to one or more PDS instances directly.
 
 
 ## Development Tips
 
 The README and Makefile at the top level of this git repo have some generic helpers for testing, linting, formatting code, etc.
 
-To re-build and run the Relay locally:
+To re-build and run the relay locally:
 
     make run-dev-relay
 
@@ -37,18 +37,18 @@
 
     RELAY_ADMIN_KEY=localdev go run ./cmd/relay/ --help
 
-By default, the daemon will use sqlite for databases (in the directory `./data/bigsky/`), CAR data will be stored as individual shard files in `./data/bigsky/carstore/`), and the HTTP API will be bound to localhost port 2470.
+By default, the daemon will use sqlite for databases (in the directory `./data/relay/`) and the HTTP API will be bound to localhost port 2470.
 
 When the daemon isn't running, sqlite database files can be inspected with:
 
-    sqlite3 data/bigsky/bgs.sqlite
+    sqlite3 data/relay/relay.sqlite
     [...]
     sqlite> .schema
 
 Wipe all local data:
 
     # careful! double-check this destructive command
-    rm -rf ./data/bigsky/*
+    rm -rf ./data/relay/*
 
 There is a basic web dashboard, though it will not be included unless built and copied to a local directory `./public/`. Run `make build-relay-ui`, and then when running the daemon the dashboard will be available at: <http://localhost:2470/dash/>. Paste in the admin key, eg `localdev`.
 
@@ -63,35 +63,31 @@
 
 ## Docker Containers
 
-One way to deploy is running a docker image. You can pull and/or run a specific version of bigsky, referenced by git commit, from the Bluesky Github container registry. For example:
+One way to deploy is running a docker image. You can pull and/or run a specific version of relay, referenced by git commit, from the Bluesky Github container registry. For example:
 
     docker pull ghcr.io/bluesky-social/indigo:relay-fd66f93ce1412a3678a1dd3e6d53320b725978a6
     docker run ghcr.io/bluesky-social/indigo:relay-fd66f93ce1412a3678a1dd3e6d53320b725978a6
 
-There is a Dockerfile in this directory, which can be used to build customized/patched versions of the Relay as a container, republish them, run locally, deploy to servers, deploy to an orchestrated cluster, etc. See docs and guides for docker and cluster management systems for details.
+There is a Dockerfile in this directory, which can be used to build customized/patched versions of the relay as a container, republish them, run locally, deploy to servers, deploy to an orchestrated cluster, etc. See docs and guides for docker and cluster management systems for details.
 
 
 ## Database Setup
 
-PostgreSQL and Sqlite are both supported. When using Sqlite, separate files are used for Relay metadata and CarStore metadata. With PostgreSQL a single database server, user, and logical database can all be reused: table names will not conflict.
-
-Database configuration is passed via the `DATABASE_URL` and `CARSTORE_DATABASE_URL` environment variables, or the corresponding CLI args.
+PostgreSQL and Sqlite are both supported. Database configuration is passed via the `DATABASE_URL` environment variable, or the corresponding CLI arg.
 
 For PostgreSQL, the user and database must already be configured. Some example SQL commands are:
 
-    CREATE DATABASE bgs;
-    CREATE DATABASE carstore;
+    CREATE DATABASE relay;
 
     CREATE USER ${username} WITH PASSWORD '${password}';
-    GRANT ALL PRIVILEGES ON DATABASE bgs TO ${username};
-    GRANT ALL PRIVILEGES ON DATABASE carstore TO ${username};
+    GRANT ALL PRIVILEGES ON DATABASE relay TO ${username};
 
 This service currently uses `gorm` to automatically run database migrations as the regular user. There is no concept of running a separate set of migrations under more privileged database user.
 
 
 ## Deployment
 
-*NOTE: this is not a complete guide to operating a Relay. There are decisions to be made and communicated about policies, bandwidth use, PDS crawling and rate-limits, financial sustainability, etc, which are not covered here. This is just a quick overview of how to technically get a relay up and running.*
+*NOTE: this is not a complete guide to operating a relay. There are decisions to be made and communicated about policies, bandwidth use, PDS crawling and rate-limits, financial sustainability, etc, which are not covered here. This is just a quick overview of how to technically get a relay up and running.*
 
 In a real-world system, you will probably want to use PostgreSQL.
 
@@ -99,21 +95,9 @@
 
 - `ENVIRONMENT`: eg, `production`
 - `DATABASE_URL`: see section below
-- `DATA_DIR`: misc data will go in a subdirectory
 - `GOLOG_LOG_LEVEL`: log verbosity
-- `RESOLVE_ADDRESS`: DNS server to use
-- `FORCE_DNS_UDP`: recommend "true"
 
 There is a health check endpoint at `/xrpc/_health`. Prometheus metrics are exposed by default on port 2471, path `/metrics`. The service logs fairly verbosely to stderr; use `GOLOG_LOG_LEVEL` to control log volume.
-
-As a rough guideline for the compute resources needed to run a full-network Relay, in June 2024 an example Relay for over 5 million repositories used:
-
-- roughly 1 TByte of disk for PostgreSQL
-- roughly 1 TByte of disk for event playback buffer
-- roughly 5k disk I/O operations per second (all combined)
-- roughly 100% of one CPU core (quite low CPU utilization)
-- roughly 5GB of RAM for `relay`, and as much RAM as available for PostgreSQL and page cache
-- on the order of 1 megabit inbound bandwidth (crawling PDS instances) and 1 megabit outbound per connected client. 1 mbit continuous is approximately 350 GByte/month
 
 Be sure to double-check bandwidth usage and pricing if running a public relay! Bandwidth prices can vary widely between providers, and popular cloud services (AWS, Google Cloud, Azure) are very expensive compared to alternatives like OVH or Hetzner.
 
