this repo has no description
0
fork

Configure Feed

Select the types of activity you want to include in your feed.

readme

dholms 024b63f8 45a6f90e

+180
+180
cmd/nexus/README.md
··· 1 + `nexus`: atproto sync utility 2 + ======================================== 3 + 4 + Nexus is a single-tenant service that subscribes to an atproto relay and outputs filtered, verified events for a subset of repos. 5 + 6 + Nexus simplifies firehose consumption by handling verification, backfill, and filtering. Your application connects to nexus and receives simple JSON events for only the repos and collections you care about. Historical data for configured repos is automatically fetched from PDSs and delivered before live events begin. 7 + 8 + Features and design decisions: 9 + 10 + - verifies repo structure, MST integrity, and identity signatures 11 + - automatic backfill: fetches full repo history from PDS when adding new repos 12 + - filtered output: by DID list, by collection, or full network mode 13 + - ordering guarantees: live events wait for historical backfill to complete 14 + - delivery modes: WebSocket with acks, fire-and-forget, or webhook 15 + - single golang binary, SQLite backend 16 + - designed for moderate scale (thousands of repos, 10k+ events/sec) 17 + 18 + This tool is useful for building applications that need to track specific accounts or collections without dealing with the complexity of repo verification and backfill orchestration. 19 + 20 + ## Running Locally 21 + 22 + `go run ./cmd/nexus --disable-acks=true` 23 + 24 + By default, the service uses SQLite at `./nexus.db` and binds to port `:8080`. 25 + 26 + ## Quick Start 27 + 28 + ```bash 29 + # Run nexus 30 + go run ./cmd/nexus --disable-acks=true 31 + # By default, the service uses SQLite at `./nexus.db` and binds to port `:8080`. 32 + 33 + # In a separate terminal, connect to receive events: 34 + websocat ws://localhost:8080/channel 35 + 36 + # Add a repo to track 37 + curl -X POST http://localhost:8080/add-repos \ 38 + -H "Content-Type: application/json" \ 39 + -d '{"dids": ["did:plc:z72i7hdynmk6r22z27h6tvur"]}' # @bsky.app repo 40 + ``` 41 + 42 + Each repo will be backfilled from its PDS, then live events will stream as they arrive from the relay. 43 + 44 + ## HTTP API 45 + 46 + - `GET /health`: returns `{"status":"ok"}` 47 + - `POST /add-repos`: add DIDs to track (triggers backfill of added repos) 48 + - `POST /remove-repos`: remove DIDs (stops sync, deletes data) 49 + - `GET /channel`: WebSocket endpoint to receive events 50 + 51 + Note: only one WebSocket client can connect at a time. 52 + 53 + ## Configuration 54 + 55 + Environment variables or CLI flags: 56 + 57 + - `NEXUS_DB_PATH`: path to SQLite database file (default: `./nexus.db`) 58 + - `NEXUS_RELAY_URL`: atproto relay URL (default: `https://relay1.us-east.bsky.network`) 59 + - `NEXUS_BIND`: HTTP server address (default: `:8080`) 60 + - `NEXUS_FIREHOSE_PARALLELISM`: concurrent firehose event processors (default: `10`) 61 + - `NEXUS_RESYNC_PARALLELISM`: concurrent resync workers (default: `5`) 62 + - `NEXUS_CURSOR_SAVE_INTERVAL`: how often to save cursor (default: `5s`, set to `0` to disable) 63 + - `NEXUS_FULL_NETWORK_MODE`: track all repos on the network (default: `false`) 64 + - `NEXUS_SIGNAL_COLLECTION`: track all repos with at least one record in this collection (e.g. `app.bsky.actor.profile`) 65 + - `NEXUS_COLLECTION_FILTERS`: comma-separated collection filters, wildcards accepted (e.g., `app.bsky.feed.post,app.bsky.graph.*`) 66 + - `NEXUS_DISABLE_ACKS`: fire-and-forget mode, no client acks (default: `false`) 67 + - `NEXUS_WEBHOOK_URL`: webhook URL for event delivery (disables WebSocket mode) 68 + - `NEXUS_LOG_LEVEL`: log verbosity (`debug`, `info`, `warn`, `error`, default: `info`) 69 + 70 + ## Delivery Modes 71 + 72 + Nexus supports three delivery modes: 73 + 74 + **WebSocket with acks** (default): Client sends acks each event once it has been processed/persisted. Ensures that no data is lost and client does not need to handle cursors. It's recommended to use a client library such as (@TODO) when using this mode. 75 + 76 + **Fire-and-forget**: Set `NEXUS_DISABLE_ACKS=true`. Events are sent and considered "acked" once the client receives them. Simpler but may result in data loss. Recommended for testing purposes or when data integrity is not critical. 77 + 78 + **Webhook**: Set `NEXUS_WEBHOOK_URL=http://...`. Events are POSTed as JSON. Events considered "acked" once the webhook responds with a 200. Recommended for lower throughput serverless environments. 79 + 80 + 81 + ## Network Boundary Modes 82 + 83 + Nexus syncs a subset of repos in the network. It can operate in three modes for determining this network boundary. 84 + 85 + **Dynamically Configured** (default): Nexus starts out tracking no repos. Specific repos can then by added via `/add-repos` and removed via `/remove-repos`. 86 + 87 + **Collection Signal**: Set `NEXUS_SIGNAL_COLLECTION=com.example.nsid`. Track all repos that have at least one record in the specified collection. Many applications create a "declaration" or "profile" in a repo when that repo uses that application 88 + 89 + **Full Network**: Set `NEXUS_FULL_NETWORK_MODE=true`. Enumerates and tracks all findable repos on the entire network. Resource-intensive and takes days/weeks to complete backfill. 90 + 91 + ## Event Format 92 + 93 + Events are delivered as JSON: 94 + 95 + **Record events** (create, update, delete): 96 + 97 + ```json 98 + { 99 + "id": 12345, 100 + "type": "record", 101 + "record": { 102 + "did": "did:plc:abc123", 103 + "collection": "app.bsky.feed.post", 104 + "rkey": "3kb3fge5lm32x", 105 + "action": "create", 106 + "cid": "bafyreig...", 107 + "record": { 108 + "text": "Hello world!", 109 + "$type": "app.bsky.feed.post", 110 + "createdAt": "2024-10-07T12:00:00.000Z" 111 + }, 112 + "live": true 113 + } 114 + } 115 + ``` 116 + 117 + **User events** (handle or status changes): 118 + 119 + ```json 120 + { 121 + "id": 12346, 122 + "type": "user", 123 + "user": { 124 + "did": "did:plc:abc123", 125 + "handle": "alice.bsky.social", 126 + "isActive": true, 127 + "status": "active" 128 + } 129 + } 130 + ``` 131 + 132 + ## Backfill 133 + 134 + When a repo is added (via `/add-repos`, full network mode, or collection discovery): 135 + 136 + 1. **Historical backfill**: Nexus fetches the full repo from the account's PDS using `com.atproto.sync.getRepo` 137 + 2. **Live event buffering**: Any firehose events for this repo during backfill are held in memory 138 + 3. **Ordering guarantee**: Historical events (marked `live: false`) are delivered first 139 + 4. **Cutover**: After historical events complete, buffered live events are drained 140 + 5. **Live streaming**: New firehose events are delivered immediately (marked `live: true`) 141 + 142 + This ensures your application receives a complete, ordered view of each repo without gaps or duplicates. 143 + 144 + ### Per-Repo Ordering Rules 145 + 146 + Nexus offloads cursor management and takes care of delivery guarantees. Events are delivered *at least once*. Events may be delivered more than once if Nexus crashes and restarts before receiving an ack for a given event or if the event times out before being acked (default 10s). 147 + 148 + There is no global ordering of events across repos. However Nexus will ensure ordering within each repo and will avoid sending the next event until the previous event has completed processing. 149 + 150 + Events for the same repo are delivered with strict ordering: 151 + 152 + - **Live events** (`live: true`) are synchronization barriers - all prior events must complete before a live event can be sent, and the live event must complete (acked) before any subsequent events are sent 153 + - **Historical events** (`live: false`, in the case of backfill/resyncs) can be sent concurrently with each other, but cannot be sent while a live event is in-flight 154 + 155 + Example sequence: `H1, H2, L1, H3, H4, L2, H5` 156 + - H1 and H2 sent concurrently 157 + - Wait for H1 and H2 to complete, then send L1 (alone) 158 + - Wait for L1 to complete, then send H3 and H4 concurrently 159 + - Wait for H3 and H4 to complete, then send L2 (alone) 160 + - Wait for L2 to complete, then send H5 161 + 162 + This ensures live events act as ordering checkpoints while allowing historical backfill to run quickly. 163 + 164 + ## Collection Filtering 165 + 166 + Collection filters use wildcards but only at the period breaks in NSIDs. For example: 167 + 168 + `NEXUS_COLLECTION_FILTERS=app.bsky.feed.post,app.bsky.graph.*` 169 + 170 + Filters apply to record events only. User events are always delivered for tracked repos. 171 + 172 + ## Operations 173 + 174 + Nexus logs to stdout in JSON format. The firehose consumer automatically reconnects with exponential backoff on relay failures. Cursor position is saved periodically (default 5 seconds) and restored on restart. 175 + 176 + SQLite is tuned for high write throughput: WAL mode, 10-second busy timeout, `synchronous=NORMAL`, 64MB cache, batched deletes. The outbox buffers up to 1M pending events in memory. 177 + 178 + Resync is automatic: if a commit does not validate according to [Sync v1.1](https://github.com/bluesky-social/proposals/tree/main/0006-sync-iteration) semantics, the repo is marked `desynced` until it can be refetched from the authoritative PDS. Live events during resync are buffered and replayed after completion. Failures trigger exponential backoff (1 minute → 1 hour max). 179 + 180 + Identity resolution uses a cached directory (24-hour TTL). DNS lookups are skipped for `*.bsky.social` handles. The cache warms up at startup and may cause a burst of PLC directory requests.