`nexus`: atproto sync utility
========================================

Nexus is a single-tenant service that subscribes to an atproto relay and outputs filtered, verified events for a subset of repos.

Nexus simplifies firehose consumption by handling verification, backfill, and filtering. Your application connects to Nexus and receives simple JSON events for only the repos and collections you care about. Historical data for configured repos is automatically fetched from their PDSs and delivered before live events begin.

Features and design decisions:

- verifies repo structure, MST integrity, and identity signatures
- automatic backfill: fetches full repo history from the PDS when new repos are added
- filtered output: by DID list, by collection, or full network mode
- ordering guarantees: live events wait for historical backfill to complete
- delivery modes: WebSocket with acks, fire-and-forget, or webhook
- single Go binary, SQLite backend
- designed for moderate scale (thousands of repos, 10k+ events/sec)

This tool is useful for building applications that need to track specific accounts or collections without dealing with the complexity of repo verification and backfill orchestration.

## Running Locally

`go run ./cmd/nexus --disable-acks=true`

By default, the service uses SQLite at `./nexus.db` and binds to port `:8080`.

## Quick Start

```bash
# Run nexus
go run ./cmd/nexus --disable-acks=true

# In a separate terminal, connect to receive events:
websocat ws://localhost:8080/channel

# Add a repo to track
curl -X POST http://localhost:8080/add-repos \
  -H "Content-Type: application/json" \
  -d '{"dids": ["did:plc:z72i7hdynmk6r22z27h6tvur"]}' # @bsky.app repo
```

Each repo is backfilled from its PDS first; live events then stream as they arrive from the relay.

## HTTP API

- `GET /health`: returns `{"status":"ok"}`
- `POST /add-repos`: add DIDs to track (triggers backfill of the added repos)
- `POST /remove-repos`: remove DIDs (stops sync and deletes their data)
- `GET /channel`: WebSocket endpoint to receive events

Note: only one WebSocket client can connect at a time.
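
The `/add-repos` and `/remove-repos` endpoints are easy to call from Go as well. The sketch below only builds the request, using the `{"dids": [...]}` body from the Quick Start. It assumes `/remove-repos` accepts the same body shape, and the helper name `newRepoRequest` is hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// repoList mirrors the {"dids": [...]} body shown for POST /add-repos.
// Assumption: POST /remove-repos accepts the same shape.
type repoList struct {
	DIDs []string `json:"dids"`
}

// newRepoRequest builds (but does not send) a POST request for a Nexus
// endpoint such as "/add-repos" or "/remove-repos".
func newRepoRequest(baseURL, path string, dids []string) (*http.Request, error) {
	body, err := json.Marshal(repoList{DIDs: dids})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+path, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newRepoRequest("http://localhost:8080", "/add-repos",
		[]string{"did:plc:z72i7hdynmk6r22z27h6tvur"})
	if err != nil {
		panic(err)
	}
	// Against a running instance, send it with http.DefaultClient.Do(req).
	fmt.Println(req.Method, req.URL)
}
```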

## Configuration

Options can be set via environment variables or CLI flags:

- `NEXUS_DB_PATH`: path to SQLite database file (default: `./nexus.db`)
- `NEXUS_RELAY_URL`: atproto relay URL (default: `https://relay1.us-east.bsky.network`)
- `NEXUS_BIND`: HTTP server address (default: `:8080`)
- `NEXUS_FIREHOSE_PARALLELISM`: concurrent firehose event processors (default: `10`)
- `NEXUS_RESYNC_PARALLELISM`: concurrent resync workers (default: `5`)
- `NEXUS_CURSOR_SAVE_INTERVAL`: how often to save the firehose cursor (default: `5s`; set to `0` to disable)
- `NEXUS_FULL_NETWORK_MODE`: track all repos on the network (default: `false`)
- `NEXUS_SIGNAL_COLLECTION`: track all repos with at least one record in this collection (e.g., `app.bsky.actor.profile`)
- `NEXUS_COLLECTION_FILTERS`: comma-separated collection filters; wildcards accepted (e.g., `app.bsky.feed.post,app.bsky.graph.*`)
- `NEXUS_DISABLE_ACKS`: fire-and-forget mode, no client acks (default: `false`)
- `NEXUS_WEBHOOK_URL`: webhook URL for event delivery (disables WebSocket mode)
- `NEXUS_LOG_LEVEL`: log verbosity (`debug`, `info`, `warn`, or `error`; default: `info`)
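
To make the defaults above concrete, here is a hypothetical Go loader that reads the documented variables through an injectable `getenv` function. The `config` type and `loadConfig` name are illustrative, not Nexus's internal code, and the sketch covers only the environment-variable side, not CLI flags.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// config is an illustrative view of the documented options (not Nexus's own type).
type config struct {
	DBPath              string
	RelayURL            string
	Bind                string
	FirehoseParallelism int
	ResyncParallelism   int
	CursorSaveInterval  time.Duration // 0 disables periodic cursor saves
	FullNetwork         bool
	SignalCollection    string
	CollectionFilters   []string
	DisableAcks         bool
	WebhookURL          string
	LogLevel            string
}

// loadConfig applies the documented default to any unset variable.
func loadConfig(getenv func(string) string) (config, error) {
	get := func(key, def string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return def
	}
	cfg := config{
		DBPath:           get("NEXUS_DB_PATH", "./nexus.db"),
		RelayURL:         get("NEXUS_RELAY_URL", "https://relay1.us-east.bsky.network"),
		Bind:             get("NEXUS_BIND", ":8080"),
		SignalCollection: get("NEXUS_SIGNAL_COLLECTION", ""),
		WebhookURL:       get("NEXUS_WEBHOOK_URL", ""),
		LogLevel:         get("NEXUS_LOG_LEVEL", "info"),
	}
	var err error
	if cfg.FirehoseParallelism, err = strconv.Atoi(get("NEXUS_FIREHOSE_PARALLELISM", "10")); err != nil {
		return cfg, err
	}
	if cfg.ResyncParallelism, err = strconv.Atoi(get("NEXUS_RESYNC_PARALLELISM", "5")); err != nil {
		return cfg, err
	}
	// time.ParseDuration accepts a bare "0", which disables cursor saving.
	if cfg.CursorSaveInterval, err = time.ParseDuration(get("NEXUS_CURSOR_SAVE_INTERVAL", "5s")); err != nil {
		return cfg, err
	}
	if cfg.FullNetwork, err = strconv.ParseBool(get("NEXUS_FULL_NETWORK_MODE", "false")); err != nil {
		return cfg, err
	}
	if cfg.DisableAcks, err = strconv.ParseBool(get("NEXUS_DISABLE_ACKS", "false")); err != nil {
		return cfg, err
	}
	// Comma-separated filters, e.g. "app.bsky.feed.post,app.bsky.graph.*".
	for _, part := range strings.Split(get("NEXUS_COLLECTION_FILTERS", ""), ",") {
		if f := strings.TrimSpace(part); f != "" {
			cfg.CollectionFilters = append(cfg.CollectionFilters, f)
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid configuration:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Injecting `getenv` instead of calling `os.Getenv` directly keeps the loader testable with a plain map.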

## Delivery Modes

Nexus supports three delivery modes:

**WebSocket with acks** (default): The client acks each event once it has processed or persisted it. This ensures that no data is lost and that the client does not need to manage cursors. Using a client library such as (@TODO) is recommended in this mode.

**Fire-and-forget**: Set `NEXUS_DISABLE_ACKS=true`. Events are sent and considered "acked" once the client receives them. Simpler, but may result in data loss. Recommended for testing or when data integrity is not critical.

**Webhook**: Set `NEXUS_WEBHOOK_URL=http://...`. Events are POSTed as JSON and considered "acked" once the webhook responds with a 200. Recommended for lower-throughput serverless environments.
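
In webhook mode, acknowledgement is tied to the HTTP status, so a receiver should return 200 only after an event has been handled. The Go sketch below uses only the standard library. It assumes each POST carries a single event in the format described under Event Format; the `webhookHandler`, `deliver`, and `process` names and the `/webhook` path are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// event holds the envelope fields this sketch needs.
type event struct {
	ID   int64  `json:"id"`
	Type string `json:"type"`
}

// webhookHandler replies 200 only after process succeeds; any other status
// leaves the event unacked from Nexus's point of view.
func webhookHandler(process func(event) error) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		var ev event
		if err := json.NewDecoder(r.Body).Decode(&ev); err != nil {
			http.Error(w, "malformed event", http.StatusBadRequest)
			return
		}
		if err := process(ev); err != nil {
			http.Error(w, "processing failed", http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
}

// deliver simulates one webhook POST in-process and returns the status code.
func deliver(h http.Handler, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/webhook", strings.NewReader(body))
	req.Header.Set("Content-Type", "application/json")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	ok := webhookHandler(func(ev event) error {
		fmt.Printf("processed event %d (%s)\n", ev.ID, ev.Type)
		return nil
	})
	fmt.Println(deliver(ok, `{"id": 12345, "type": "record"}`)) // 200

	failing := webhookHandler(func(event) error { return errors.New("database unavailable") })
	fmt.Println(deliver(failing, `{"id": 12346, "type": "user"}`)) // 500: not acked

	// In a real deployment, serve the handler with http.ListenAndServe
	// and point NEXUS_WEBHOOK_URL at it.
}
```

A malformed body also gets a non-200 status here; depending on how Nexus retries unacked events, you may prefer to log and acknowledge events that can never be parsed.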

## Network Boundary Modes

Nexus syncs a subset of the repos in the network. It can determine this network boundary in one of three modes.

**Dynamically Configured** (default): Nexus starts out tracking no repos. Specific repos can then be added via `/add-repos` and removed via `/remove-repos`.

**Collection Signal**: Set `NEXUS_SIGNAL_COLLECTION=com.example.nsid`. Nexus tracks all repos that have at least one record in the specified collection. This works well because many applications create a "declaration" or "profile" record in a repo when that account starts using the application.

**Full Network**: Set `NEXUS_FULL_NETWORK_MODE=true`. Nexus enumerates and tracks all findable repos on the entire network. This mode is resource-intensive, and backfill takes days or weeks to complete.

## Event Format

Events are delivered as JSON:

**Record events** (create, update, delete):

```json
{
  "id": 12345,
  "type": "record",
  "record": {
    "did": "did:plc:abc123",
    "collection": "app.bsky.feed.post",
    "rkey": "3kb3fge5lm32x",
    "action": "create",
    "cid": "bafyreig...",
    "record": {
      "text": "Hello world!",
      "$type": "app.bsky.feed.post",
      "createdAt": "2024-10-07T12:00:00.000Z"
    },
    "live": true
  }
}
```

**User events** (handle or status changes):

```json
{
  "id": 12346,
  "type": "user",
  "user": {
    "did": "did:plc:abc123",
    "handle": "alice.bsky.social",
    "isActive": true,
    "status": "active"
  }
}
```
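
For consumers written in Go, the two event shapes above map naturally onto tagged structs. The types and the `describe` helper below are a sketch based only on the fields shown in these examples; they are not types exported by Nexus. The inner `record` is kept as `json.RawMessage` so each collection can be decoded with its own schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event mirrors the JSON envelope; Type selects which of Record or User is set.
type Event struct {
	ID     int64        `json:"id"`
	Type   string       `json:"type"` // "record" or "user"
	Record *RecordEvent `json:"record,omitempty"`
	User   *UserEvent   `json:"user,omitempty"`
}

type RecordEvent struct {
	DID        string          `json:"did"`
	Collection string          `json:"collection"`
	Rkey       string          `json:"rkey"`
	Action     string          `json:"action"` // "create", "update", or "delete"
	CID        string          `json:"cid"`
	Record     json.RawMessage `json:"record"` // decode per collection
	Live       bool            `json:"live"`
}

type UserEvent struct {
	DID      string `json:"did"`
	Handle   string `json:"handle"`
	IsActive bool   `json:"isActive"`
	Status   string `json:"status"`
}

// describe decodes one event and summarizes it, dispatching on Type.
func describe(data []byte) (string, error) {
	var ev Event
	if err := json.Unmarshal(data, &ev); err != nil {
		return "", err
	}
	switch {
	case ev.Type == "record" && ev.Record != nil:
		r := ev.Record
		return fmt.Sprintf("%s %s/%s/%s live=%t", r.Action, r.DID, r.Collection, r.Rkey, r.Live), nil
	case ev.Type == "user" && ev.User != nil:
		return fmt.Sprintf("user %s handle=%s status=%s", ev.User.DID, ev.User.Handle, ev.User.Status), nil
	default:
		return "", fmt.Errorf("unrecognized event type %q", ev.Type)
	}
}

func main() {
	msg := []byte(`{"id": 12346, "type": "user", "user": {"did": "did:plc:abc123",
		"handle": "alice.bsky.social", "isActive": true, "status": "active"}}`)
	s, err := describe(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // user did:plc:abc123 handle=alice.bsky.social status=active
}
```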

## Backfill

When a repo is added (via `/add-repos`, full network mode, or collection discovery):

1. **Historical backfill**: Nexus fetches the full repo from the account's PDS using `com.atproto.sync.getRepo`
2. **Live event buffering**: Any firehose events for this repo during backfill are held in memory
3. **Ordering guarantee**: Historical events (marked `live: false`) are delivered first
4. **Cutover**: After historical events complete, buffered live events are drained
5. **Live streaming**: New firehose events are delivered immediately (marked `live: true`)

This ensures your application receives a complete, ordered view of each repo, with no gaps or duplicates between its historical and live portions.
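
The buffering and cutover steps can be modeled as a small per-repo state machine. The Go sketch below is single-goroutine and uses hypothetical names (`repoSync`, `onFirehose`, `finishBackfill`). It adds one assumption not stated above: buffered live events whose revision is not newer than the fetched snapshot's revision are dropped, since the snapshot already reflects them (atproto repo revisions are TIDs, which sort lexicographically).

```go
package main

import "fmt"

// evt is a minimal stand-in for one record event.
type evt struct {
	Name string // e.g. collection/rkey
	Rev  string // repo revision for live events (illustrative strings below, not real TIDs)
	Live bool
}

// repoSync tracks one repo's backfill state.
type repoSync struct {
	backfilling bool
	buffered    []evt
	emit        func(evt)
}

func (r *repoSync) startBackfill() { r.backfilling = true }

// onFirehose handles a live event from the relay: buffer it during
// backfill, otherwise deliver it immediately.
func (r *repoSync) onFirehose(e evt) {
	e.Live = true
	if r.backfilling {
		r.buffered = append(r.buffered, e)
		return
	}
	r.emit(e)
}

// finishBackfill delivers historical events first, then drains buffered
// live events that are newer than the snapshot (the cutover).
func (r *repoSync) finishBackfill(snapshotRev string, historical []evt) {
	for _, e := range historical {
		e.Live = false
		r.emit(e)
	}
	for _, e := range r.buffered {
		if e.Rev > snapshotRev { // older revisions are already in the snapshot
			r.emit(e)
		}
	}
	r.buffered = nil
	r.backfilling = false
}

func main() {
	r := &repoSync{emit: func(e evt) { fmt.Printf("%s live=%t\n", e.Name, e.Live) }}
	r.startBackfill()
	r.onFirehose(evt{Name: "post-during-backfill", Rev: "3kc"})
	r.finishBackfill("3kb", []evt{{Name: "old-post-1"}, {Name: "old-post-2"}})
	r.onFirehose(evt{Name: "post-after-cutover", Rev: "3kd"})
}
```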

### Per-Repo Ordering Rules

Nexus handles cursor management and delivery guarantees for you. Events are delivered *at least once*: an event may be delivered more than once if Nexus crashes and restarts before receiving its ack, or if the event times out before being acked (default timeout: 10s). Consumers should therefore process events idempotently.
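
One common way to make writes idempotent, sketched below in Go with hypothetical names, is to key stored records by DID, collection, and rkey, so that a redelivered create, update, or delete leaves the same state as the first delivery. The in-memory map stands in for a real table.

```go
package main

import "fmt"

// recordKey identifies one record: DID + collection + rkey.
type recordKey struct{ DID, Collection, Rkey string }

// recordEvent carries the record-event fields this sketch needs.
type recordEvent struct {
	DID, Collection, Rkey, Action, CID string
}

// store maps each record to its latest CID.
type store map[recordKey]string

// apply upserts on create/update and deletes on delete, so applying the
// same event twice yields the same state as applying it once.
func (s store) apply(ev recordEvent) error {
	k := recordKey{ev.DID, ev.Collection, ev.Rkey}
	switch ev.Action {
	case "create", "update":
		s[k] = ev.CID
	case "delete":
		delete(s, k) // deleting a missing key is a no-op
	default:
		return fmt.Errorf("unknown action %q", ev.Action)
	}
	return nil
}

func main() {
	s := store{}
	ev := recordEvent{DID: "did:plc:abc123", Collection: "app.bsky.feed.post",
		Rkey: "3kb3fge5lm32x", Action: "create", CID: "bafyreig..."}
	for i := 0; i < 2; i++ { // simulate a redelivery
		if err := s.apply(ev); err != nil {
			panic(err)
		}
	}
	fmt.Println(len(s)) // 1
}
```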

There is no global ordering of events across repos. However, Nexus preserves ordering within each repo, holding back each event until the events it depends on have finished processing, as described below.

Events for the same repo are delivered with strict ordering:

- **Live events** (`live: true`) are synchronization barriers: all prior events must complete before a live event can be sent, and the live event must complete (be acked) before any subsequent events are sent
- **Historical events** (`live: false`, from backfills and resyncs) can be sent concurrently with each other, but cannot be sent while a live event is in flight

Example sequence: `H1, H2, L1, H3, H4, L2, H5`
- H1 and H2 sent concurrently
- Wait for H1 and H2 to complete, then send L1 (alone)
- Wait for L1 to complete, then send H3 and H4 concurrently
- Wait for H3 and H4 to complete, then send L2 (alone)
- Wait for L2 to complete, then send H5

This ensures live events act as ordering checkpoints while allowing historical backfill to run quickly.
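
The rules above can be modeled as splitting one repo's event sequence into dispatch rounds: every event in a round may be in flight at once, and each round must finish before the next starts. The Go function below is an illustrative model with hypothetical names, not Nexus's scheduler; in practice, events arrive over time rather than as a fixed slice.

```go
package main

import (
	"fmt"
	"strings"
)

// ev is a minimal stand-in for a per-repo event.
type ev struct {
	Name string
	Live bool
}

// dispatchRounds groups events so that consecutive historical events share
// a round, while each live event forms a round of its own.
func dispatchRounds(events []ev) [][]ev {
	var rounds [][]ev
	for _, e := range events {
		n := len(rounds)
		if !e.Live && n > 0 && !rounds[n-1][0].Live {
			rounds[n-1] = append(rounds[n-1], e) // join the current historical round
			continue
		}
		rounds = append(rounds, []ev{e}) // live event, or first historical after a live one
	}
	return rounds
}

func main() {
	seq := []ev{{"H1", false}, {"H2", false}, {"L1", true}, {"H3", false},
		{"H4", false}, {"L2", true}, {"H5", false}}
	for i, round := range dispatchRounds(seq) {
		names := make([]string, len(round))
		for j, e := range round {
			names[j] = e.Name
		}
		fmt.Printf("round %d: %s\n", i+1, strings.Join(names, ", "))
	}
}
```

Running it prints the same five rounds as the example sequence: `H1, H2`, then `L1`, then `H3, H4`, then `L2`, then `H5`.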

## Collection Filtering

Collection filters accept wildcards, but only at period boundaries in NSIDs: `app.bsky.graph.*` is valid, while `app.bsky.gr*` is not. For example:

`NEXUS_COLLECTION_FILTERS=app.bsky.feed.post,app.bsky.graph.*`

Filters apply to record events only. User events are always delivered for tracked repos.
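
A matcher for these filters might look like the Go sketch below. It is an illustration under stated assumptions, not Nexus's implementation: a filter ending in `.*` matches every NSID under that prefix (including deeper segments), a `*` anywhere else is not expanded, an empty filter list means "no filtering", and user events bypass filtering as described above. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// matchCollection reports whether nsid passes any filter. A filter is either
// an exact NSID or a prefix ending in ".*"; wildcards elsewhere never expand.
func matchCollection(filters []string, nsid string) bool {
	for _, f := range filters {
		if strings.HasSuffix(f, ".*") {
			// Keep the trailing period so "app.bsky.graph.*" cannot match "app.bsky.graphx.foo".
			if strings.HasPrefix(nsid, strings.TrimSuffix(f, "*")) {
				return true
			}
			continue
		}
		if f == nsid {
			return true
		}
	}
	return false
}

// shouldDeliver filters record events only; user events always pass.
// Assumption: an empty filter list means no filtering.
func shouldDeliver(filters []string, eventType, collection string) bool {
	if eventType != "record" || len(filters) == 0 {
		return true
	}
	return matchCollection(filters, collection)
}

func main() {
	filters := []string{"app.bsky.feed.post", "app.bsky.graph.*"}
	for _, nsid := range []string{"app.bsky.feed.post", "app.bsky.graph.follow", "app.bsky.feed.like"} {
		fmt.Println(nsid, shouldDeliver(filters, "record", nsid))
	}
}
```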

## Operations

Nexus logs to stdout in JSON format. The firehose consumer automatically reconnects with exponential backoff on relay failures. The cursor position is saved periodically (default: every 5 seconds) and restored on restart.

SQLite is tuned for high write throughput: WAL mode, a 10-second busy timeout, `synchronous=NORMAL`, a 64 MB cache, and batched deletes. The outbox buffers up to 1M pending events in memory.

Resync is automatic: if a commit does not validate according to [Sync v1.1](https://github.com/bluesky-social/proposals/tree/main/0006-sync-iteration) semantics, the repo is marked `desynced` until it can be refetched from the authoritative PDS. Live events received during a resync are buffered and replayed after it completes. Failures trigger exponential backoff (1 minute → 1 hour max).
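
The resync retry schedule can be sketched as a capped exponential backoff. In the Go function below, only the bounds (1 minute to 1 hour) come from this README; the doubling factor and the `resyncBackoff` name are assumptions, and production code would typically add jitter.

```go
package main

import (
	"fmt"
	"time"
)

const (
	minResyncBackoff = time.Minute
	maxResyncBackoff = time.Hour
)

// resyncBackoff returns the wait before retry number attempt (0-based),
// doubling from one minute and capping at one hour. The loop stops at the
// cap, so very large attempt counts cannot overflow.
func resyncBackoff(attempt int) time.Duration {
	d := minResyncBackoff
	for i := 0; i < attempt && d < maxResyncBackoff; i++ {
		d *= 2
	}
	if d > maxResyncBackoff {
		d = maxResyncBackoff
	}
	return d
}

func main() {
	for attempt := 0; attempt < 8; attempt++ {
		fmt.Printf("attempt %d: wait %v\n", attempt, resyncBackoff(attempt))
	}
}
```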

Identity resolution uses a cached directory (24-hour TTL). DNS lookups are skipped for `*.bsky.social` handles. The cache warms up at startup and may cause a burst of PLC directory requests.
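
A cached identity directory of this kind amounts to a TTL cache in front of a resolver. The Go sketch below uses hypothetical names and a caller-supplied `resolve` function in place of real DID and handle resolution. It is not safe for concurrent use; a production cache would also need locking and eviction.

```go
package main

import (
	"fmt"
	"time"
)

// identity is a placeholder for a resolved DID and handle.
type identity struct{ DID, Handle string }

type cacheEntry struct {
	id      identity
	expires time.Time
}

// identityCache serves lookups from memory until an entry's TTL elapses.
type identityCache struct {
	ttl     time.Duration
	now     func() time.Time
	resolve func(did string) (identity, error)
	entries map[string]cacheEntry
}

// newIdentityCache uses time.Now when now is nil.
func newIdentityCache(ttl time.Duration, now func() time.Time, resolve func(string) (identity, error)) *identityCache {
	if now == nil {
		now = time.Now
	}
	return &identityCache{ttl: ttl, now: now, resolve: resolve, entries: map[string]cacheEntry{}}
}

// Lookup returns an unexpired cached entry, or resolves and caches the DID.
func (c *identityCache) Lookup(did string) (identity, error) {
	if e, ok := c.entries[did]; ok && c.now().Before(e.expires) {
		return e.id, nil
	}
	id, err := c.resolve(did)
	if err != nil {
		return identity{}, err // failures are not cached
	}
	c.entries[did] = cacheEntry{id: id, expires: c.now().Add(c.ttl)}
	return id, nil
}

func main() {
	calls := 0
	c := newIdentityCache(24*time.Hour, nil, func(did string) (identity, error) {
		calls++
		return identity{DID: did, Handle: "alice.bsky.social"}, nil
	})
	for i := 0; i < 3; i++ {
		if _, err := c.Lookup("did:plc:abc123"); err != nil {
			panic(err)
		}
	}
	fmt.Println("resolver calls:", calls) // resolver calls: 1
}
```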