Webhooks for the AT Protocol airglow.run
atproto atprotocol automation webhook

docs: performance.md

Hugo 066250eb 55e94998

+31
docs/performance.md
# Performance: Jetstream Event Fan-Out

## Current Architecture

Airglow maintains a **single WebSocket connection** to Jetstream, regardless of how many user subscriptions exist.

### How it works

1. **One WebSocket, deduplicated collections:** `JetstreamConsumer` is a singleton. On startup (and whenever subscriptions change), it loads all active subscriptions from the database and groups them by lexicon (collection). Only the **unique collection names** are sent as `wantedCollections` params to Jetstream. If 100 users subscribe to `app.bsky.feed.post`, Jetstream sends events for that collection once.

2. **In-memory fan-out:** When an event arrives, the consumer looks up all subscriptions for that collection in a `Map<string, Subscription[]>` and iterates through them, evaluating each subscription's conditions. Only matching subscriptions trigger their actions (webhook delivery or record creation).

3. **WebSocket reconnection on collection changes:** When a subscription is created or deleted, if the set of watched collections changes, the consumer closes and reopens the WebSocket with updated `wantedCollections` params. If only the subscriptions within an existing collection change, no reconnection is needed; the in-memory map is simply updated.

### Why this works well at current scale

- The `Map` lookup by collection is O(1).
- Condition matching is a simple linear scan per subscription, which is fast for dozens or even hundreds of subscriptions per collection.
- All fan-out happens in-process with no network overhead.
- A single WebSocket minimizes Jetstream resource usage.

## Potential Future Improvements

As the number of subscriptions per collection grows (thousands+), the linear scan through conditions on every event could become a bottleneck. Some options:

- **Condition indexing:** Build inverted indexes on condition fields/values so only potentially matching subscriptions are evaluated, rather than scanning all of them.
- **Batch/parallel condition evaluation:** Evaluate conditions for multiple subscriptions concurrently rather than sequentially.
- **Sharded consumers:** Run multiple consumer instances, each responsible for a subset of collections or users, to distribute the fan-out load.
- **Pre-filtering with Jetstream features:** If Jetstream adds more granular filtering (e.g. by DID or record fields), leverage that to reduce the volume of events the consumer needs to process.

None of these are needed today. The current design is simple, correct, and efficient for the expected scale.
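
For readers who want a concrete picture of the three steps under "How it works", the following is a minimal sketch and not Airglow's actual code. The `Subscription`, `Condition`, and `JetstreamEvent` shapes, the equality-only condition model, and the function names `groupByCollection`, `needsReconnect`, and `matchSubscriptions` are all assumptions made for illustration; only the `Map<string, Subscription[]>` structure and the `wantedCollections` parameter come from the description above.

```typescript
// Hypothetical shapes: the real Airglow types are not shown in this document.
interface Condition {
  field: string;  // top-level record field to compare (assumed model)
  equals: string; // value that field must equal
}

interface Subscription {
  id: string;
  collection: string; // lexicon NSID, e.g. "app.bsky.feed.post"
  conditions: Condition[];
}

interface JetstreamEvent {
  collection: string;
  record: Record<string, string>; // simplified: flat string fields only
}

// Step 1: group subscriptions by collection. The map keys are the
// deduplicated names that would be sent as `wantedCollections`.
function groupByCollection(subs: Subscription[]): Map<string, Subscription[]> {
  const byCollection = new Map<string, Subscription[]>();
  for (const sub of subs) {
    const bucket = byCollection.get(sub.collection);
    if (bucket) bucket.push(sub);
    else byCollection.set(sub.collection, [sub]);
  }
  return byCollection;
}

// Step 3: a reconnect is needed only when the set of collection names differs.
function needsReconnect(
  before: Map<string, Subscription[]>,
  after: Map<string, Subscription[]>
): boolean {
  if (before.size !== after.size) return true;
  return Array.from(after.keys()).some((name) => !before.has(name));
}

// Step 2: O(1) lookup by collection, then a linear scan over that bucket.
function matchSubscriptions(
  byCollection: Map<string, Subscription[]>,
  event: JetstreamEvent
): Subscription[] {
  const candidates = byCollection.get(event.collection) ?? [];
  return candidates.filter((sub) =>
    sub.conditions.every((c) => event.record[c.field] === c.equals)
  );
}

// Usage: three subscriptions across two collections.
const subs: Subscription[] = [
  { id: "a", collection: "app.bsky.feed.post", conditions: [] },
  {
    id: "b",
    collection: "app.bsky.feed.post",
    conditions: [{ field: "text", equals: "hello" }],
  },
  { id: "c", collection: "app.bsky.feed.like", conditions: [] },
];

const index = groupByCollection(subs);
const wantedCollections = Array.from(index.keys());
const matchedIds = matchSubscriptions(index, {
  collection: "app.bsky.feed.post",
  record: { text: "bonjour" },
}).map((s) => s.id);

console.log(wantedCollections, matchedIds);
```

In this sketch, the two subscriptions to `app.bsky.feed.post` collapse into a single `wantedCollections` entry, and only subscription `a` (which has no conditions) matches the sample event. Adding another subscription to an already-watched collection leaves the key set unchanged, so `needsReconnect` returns `false` and only the in-memory map would need updating.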