Performance: Jetstream Event Fan-Out#
Current Architecture#
Airglow maintains a single WebSocket connection to Jetstream, regardless of how many user automations exist.
How it works#
-
One WebSocket, deduplicated collections —
JetstreamConsumeris a singleton. On startup (and whenever automations change), it loads all active automations from the database and groups them by collection + operation (e.g.app.bsky.feed.post+create). Only the unique collection names are sent aswantedCollectionsparams to Jetstream. If 100 users watchapp.bsky.feed.post, Jetstream sends events for that collection once. -
In-memory fan-out — When an event arrives, the consumer looks up all automations for that collection + operation in a
Map<string, Automation[]>and iterates through them, evaluating each automation's conditions. Only matching automations trigger their actions (webhook delivery or record creation). -
WebSocket reconnection on collection changes — When an automation is created or deleted, if the set of watched collections changes, the consumer closes and reopens the WebSocket with updated
wantedCollectionsparams. If only the automations within an existing collection change, no reconnection is needed — the in-memory map is simply updated.
Why this works well at current scale#
- The
Maplookup by collection + operation is O(1). - Condition matching is a simple linear scan per automation — fast for dozens or even hundreds of automations per collection.
- All fan-out happens in-process with no network overhead.
- A single WebSocket minimizes Jetstream resource usage.
Potential Future Improvements#
As the number of automations per collection grows (thousands+), the linear scan through conditions on every event could become a bottleneck. Some options:
- Condition indexing — Build inverted indexes on condition fields/values so only potentially matching automations are evaluated, rather than scanning all of them.
- Batch/parallel condition evaluation — Evaluate conditions for multiple automations concurrently rather than sequentially.
- Sharded consumers — Run multiple consumer instances, each responsible for a subset of collections or users, to distribute the fan-out load.
- Pre-filtering with Jetstream features — If Jetstream adds more granular filtering (e.g. by DID or record fields), leverage that to reduce the volume of events the consumer needs to process.
None of these are needed today. The current design is simple, correct, and efficient for the expected scale.