···1919# and can be added to the global gitignore or merged into this file. For a more nuclear
2020# option (not recommended) you can uncomment the following to ignore the entire idea folder.
2121#.idea/
2222+2323+# Application Data
2424+/data
+59
CLAUDE.md
···11+# CLAUDE.md
22+33+## Project Overview
44+55+Ramjet is a single-node Rust service that consumes the ATProtocol firehose, persists records for tracked collections in fjall v3 (pure-Rust LSM-tree), and re-emits events via WebSocket fan-out with priority ordering.
66+77+## Common Commands
88+99+- **Build**: `cargo build`
1010+- **Run tests**: `cargo test`
1111+- **Check code**: `cargo check`
1212+- **Format code**: `cargo fmt`
1313+- **Run**: `cargo run -- --db-path /tmp/ramjet-test --listen-addr 127.0.0.1:8080 --tracked-collections "garden.lexicon.**"`
1414+1515+## Architecture
1616+1717+Five async pipelines connected by channels:
1818+1. **Firehose Ingester** — WebSocket client connects to relay, decodes CBOR frames (#commit, #identity, #account), sends `IngestEvent` over mpsc channel. Auto-reconnects with 2s delay.
1919+2. **Batch Writer** — Collects events into configurable batches (size + timeout), writes atomically to fjall via `WriteBatch`. Routes tracked collections to records keyspace + high-priority fan-out, forward-only collections to low-priority fan-out. Filters denied repos. Queues tooBig commits for backfill.
2020+3. **WebSocket Fan-out** — Serves `com.atproto.sync.subscribeRepos`. Replays from events keyspace on cursor, then streams live via biased `tokio::select!` (high > low > client recv).
2121+4. **Identity Worker** — Processes identity events from mpsc channel. Resolves DID documents, updates `handle_to_did` and `did_to_doc` keyspaces, removes stale handle mappings.
2222+5. **Backfill Worker** — Polls `backfill_queue\x00` prefix in meta keyspace every 5s. Fetches CAR from PDS, parses with `atproto_repo::repo::MemoryRepository`, writes records.
2323+2424+Eight fjall keyspaces: `records`, `events`, `meta`, `repo_state`, `did_to_doc`, `handle_to_did`, `blobs`, `blob_meta`.
2525+2626+## Module Structure
2727+2828+- `src/config.rs` — CLI args (clap), `CollectionPattern`/`CollectionMatcher`, `ServiceConfig`
2929+- `src/errors.rs` — `RamjetError` enum (thiserror)
3030+- `src/types.rs` — Core domain types: `RecordValue`, `CommitOp`, `CommitOps`, `RepoState`, `OpType`, `AccountStatus`, `EventType`, `FanOutEvent`, `SharedFanOutEvent`
3131+- `src/storage/mod.rs` — `FjallDb` struct (8 keyspaces), `open()`, `batch()`, `get_cursor()`, `set_cursor()`, `get_repo_state()`, `current_sequence()`, `oldest_event_sequence()`
3232+- `src/storage/keys.rs` — Key encoding/decoding with `\x00` separators, versioned record keys (`did\x00collection\x00rkey\x00rev`), BE u64 for event keys
3333+- `src/storage/encoding.rs` — Binary encode/decode for `RecordValue` (CID + data, tombstone = empty), `RepoState`, compact events
3434+- `src/server/mod.rs` — `AppState`, `build_router()` with 12 routes
3535+- `src/server/health.rs` — `GET /_health`
3636+- `src/server/metrics.rs` — Prometheus metrics (`prometheus-client 0.23`)
3737+- `src/server/xrpc.rs` — XRPC handlers: getRecord, listRecords, describeRepo, resolveIdentity, resolveHandle, resolveDid
3838+- `src/server/admin.rs` — Admin handlers with JWT auth: getState, setState, resync
3939+- `src/server/websocket.rs` — WebSocket fan-out with cursor replay and biased priority delivery
4040+- `src/pipeline/mod.rs` — Module re-exports
4141+- `src/pipeline/ingester.rs` — Firehose WebSocket client with CBOR decoding (atproto-dasl)
4242+- `src/pipeline/writer.rs` — Batch writer with collection routing and deny-list filtering
4343+- `src/pipeline/fanout.rs` — `FanOutChannels` with high/low priority broadcast channels (capacity 8192)
4444+- `src/pipeline/identity.rs` — Identity resolution worker
4545+- `src/pipeline/backfill.rs` — Backfill worker with CAR parsing
4646+4747+## Error Handling
4848+4949+Errors follow: `error-ramjet-{domain}-{number} {message}: {details}`
5050+5151+Use `thiserror` for error enums, `anyhow::Result` for function returns.
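As a rough illustration of the error string format above, here is a minimal sketch using a hand-written `Display` impl. The variant name, the `storage` domain, and the number `1` are hypothetical and are not taken from `src/errors.rs`; the real `RamjetError` derives its messages with `thiserror` instead.

```rust
use std::fmt;

/// Hypothetical error type used only to illustrate the message format.
#[derive(Debug)]
pub enum ExampleError {
    KeyspaceOpenFailed { details: String },
}

impl fmt::Display for ExampleError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Shape: error-ramjet-{domain}-{number} {message}: {details}
            ExampleError::KeyspaceOpenFailed { details } => {
                write!(f, "error-ramjet-storage-1 failed to open keyspace: {details}")
            }
        }
    }
}

impl std::error::Error for ExampleError {}
```

With `thiserror`, the same string would live in an `#[error("...")]` attribute on the variant rather than in a manual `Display` impl.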
5252+5353+## Key Conventions
5454+5555+- Use `tracing` for structured logging. All calls must be fully qualified (`tracing::info!`, etc.).
5656+- fjall v3 iterators yield `Guard` items — call `guard.into_inner()` to get `Result<(UserKey, UserValue)>`.
5757+- ATProto crates are path dependencies from `../../atproto-crates-studious-guide/amman-v2/crates/`.
5858+- Run `cargo fmt` before committing any changes.
5959+- Edition 2024, rust-version 1.94. `unsafe_code = "forbid"`.
···11-# ramjet-mature-wolfhound
11+# Ramjet
22+33+Single-node Rust service that consumes the ATProtocol firehose, persists records for tracked collections in [fjall](https://github.com/fjall-rs/fjall) (pure-Rust LSM-tree), and re-emits events via WebSocket fan-out with priority ordering.
44+55+## Motivation
66+77+ATProtocol applications often need a mix of precisely tracked collections and best-effort network-wide intake. Ramjet solves this with separately configured **tracked** and **forwarded** collection sets:
88+99+- **Tracked collections** are persisted to storage and emitted on the high-priority fan-out channel — these are the records your application depends on for correctness.
1010+- **Forwarded collections** are emitted on the low-priority fan-out channel without being persisted — useful for analytics, search indexing, or other consumers that want broad visibility without the storage cost.
1111+1212+For example, [Lexicon Garden](https://lexicon.garden) needs accurately tracked lexicon schema and AppView-specific collections, but also wants all records for analytics. [Smoke Signal](https://smokesignal.events) similarly tracks event, RSVP, profile, and location records from different collections explicitly, while still consuming the full network stream.
1313+1414+Both collection configurations can be updated at runtime, and the admin API supports on-demand backfill and resync operations for any repository — making it straightforward to add new tracked collections and catch up on historical data.
1515+1616+## Quick start
1717+1818+```bash
1919+cargo build --release
2020+2121+cargo run --release -- \
2222+ --db-path /var/lib/ramjet/data \
2323+ --listen-addr 0.0.0.0:8080 \
2424+ --relay-host bsky.network \
2525+ --tracked-collections "app.bsky.feed.*" \
2626+ --forward-collections "app.bsky.**"
2727+```
2828+2929+## Configuration
3030+3131+All options are available as CLI flags or environment variables.
3232+3333+| Flag | Env | Default | Description |
3434+|---|---|---|---|
3535+| `--db-path` | `RAMJET_DB_PATH` | `./data/ramjet.db` | fjall database directory |
3636+| `--listen-addr` | `RAMJET_LISTEN_ADDR` | `0.0.0.0:8080` | HTTP listen address |
3737+| `--relay-host` | `RAMJET_RELAY_HOST` | `bsky.network` | Firehose relay hostname |
3838+| `--tracked-collections` | `RAMJET_TRACKED_COLLECTIONS` | `*` | Collections to persist (space-separated patterns) |
3939+| `--forward-collections` | `RAMJET_FORWARD_COLLECTIONS` | `*` | Collections to forward via WebSocket only |
4040+| `--event-retention-hours` | `RAMJET_EVENT_RETENTION_HOURS` | `72` | Hours to retain events |
4141+| `--batch-size` | `RAMJET_BATCH_SIZE` | `500` | Ingester batch size |
4242+| `--batch-timeout-ms` | `RAMJET_BATCH_TIMEOUT_MS` | `100` | Batch flush timeout (ms) |
4343+| `--admin-dids` | `RAMJET_ADMIN_DIDS` | *(empty)* | Comma-separated admin DIDs for protected endpoints |
4444+| `--zstd-dict-path` | `RAMJET_ZSTD_DICT_PATH` | *(none)* | Path to zstd dictionary for event compression |
4545+| `--backfill` | `RAMJET_BACKFILL` | *(empty)* | Comma-separated DIDs to backfill on startup |
4646+| `--consumer-group` | `RAMJET_CONSUMER_GROUPS` | *(empty)* | Consumer group definitions (`name:partitions`, comma-separated) |
4747+4848+### Collection patterns
4949+5050+- `*` — match all collections
5151+- `com.example.feed` — exact NSID match
5252+- `com.example.*` — single-segment wildcard (matches `com.example.feed` but not `com.example.feed.like`)
5353+- `com.example.**` — glob wildcard (matches `com.example.feed`, `com.example.feed.like`, etc.)
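The four pattern forms can be sketched as a small matcher. This is an illustrative sketch only, not the `CollectionMatcher` in `src/config.rs`: the function names are hypothetical, and it assumes that `.**` requires at least one segment after the prefix (so `com.example.**` does not match `com.example` itself).

```rust
/// Returns the part of `nsid` after `prefix` and a separating dot, if any.
fn tail_after<'a>(nsid: &'a str, prefix: &str) -> Option<&'a str> {
    nsid.strip_prefix(prefix)?.strip_prefix('.')
}

/// Hypothetical matcher for the pattern forms listed above.
pub fn matches_pattern(pattern: &str, nsid: &str) -> bool {
    if pattern == "*" {
        // Match every collection.
        return true;
    }
    if let Some(prefix) = pattern.strip_suffix(".**") {
        // Glob wildcard: one or more trailing segments (assumption).
        return tail_after(nsid, prefix).is_some_and(|rest| !rest.is_empty());
    }
    if let Some(prefix) = pattern.strip_suffix(".*") {
        // Single-segment wildcard: exactly one trailing segment.
        return tail_after(nsid, prefix)
            .is_some_and(|rest| !rest.is_empty() && !rest.contains('.'));
    }
    // Exact NSID match.
    pattern == nsid
}
```

For example, `matches_pattern("com.example.*", "com.example.feed.like")` is `false`, while the same NSID matches `com.example.**`.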
5454+5555+### Zstd dictionary compression
5656+5757+When `--zstd-dict-path` is provided, Ramjet compresses events in the events keyspace using a pre-trained zstd dictionary. The dictionary is identified by hash and stored in the meta keyspace so dictionary changes are detected on restart. Clients can download the active dictionary via the `GET /dictionary` endpoint (supports `If-None-Match` with CID-based ETags).
5858+5959+### Consumer groups
6060+6161+Consumer groups partition events across multiple WebSocket connections by DID hash, similar to Kafka consumer groups. Define groups at startup:
6262+6363+```bash
6464+cargo run --release -- \
6565+ --consumer-group indexers:3 \
6666+ --consumer-group notifiers:2
6767+```
6868+6969+Clients connect with `?group=indexers&partition=0` to receive only events for DIDs that hash to partition 0. All events for the same DID always route to the same partition. Omitting `group`/`partition` gives broadcast mode (all events).
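The routing rule can be pictured as a stable hash of the DID taken modulo the partition count. The sketch below is an assumption-laden illustration: the hash Ramjet uses is not stated here, so FNV-1a stands in for it, and `partition_for` and `should_deliver` are hypothetical names.

```rust
/// FNV-1a 64-bit hash. Used here only because it is simple and stable
/// across processes; it is a stand-in for whatever hash Ramjet uses.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Maps a DID to a partition index in `0..partitions`.
/// Returns `None` when the group has no partitions.
pub fn partition_for(did: &str, partitions: u32) -> Option<u32> {
    if partitions == 0 {
        return None;
    }
    Some((fnv1a64(did.as_bytes()) % u64::from(partitions)) as u32)
}

/// True when an event for `did` belongs to the partition a client asked for.
pub fn should_deliver(did: &str, partitions: u32, assigned: u32) -> bool {
    partition_for(did, partitions) == Some(assigned)
}
```

Because the mapping depends only on the DID and the partition count, every event for a given DID lands on the same partition, and exactly one partition in a group receives it.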
7070+7171+## Architecture
7272+7373+Five async pipelines connected by channels:
7474+7575+```
7676+Firehose ──→ Ingester ──mpsc──→ Writer ──broadcast──→ WebSocket Fan-out
7777+                                  │          │               ↑
7878+                                  │          └─partitioned──→┤ (consumer groups)
7979+                                  ├─→ fjall storage          │
8080+                                  └─→ events keyspace ───────┘ (cursor replay)
8181+8282+Backfill Worker ──→ CAR fetch ──→ fjall storage
8383+Identity Worker ──→ DID/handle resolution ──→ fjall storage
8484+```
8585+8686+1. **Firehose Ingester** — connects to the relay via WebSocket, decodes CBOR frames (`#commit`, `#identity`, `#account`), and sends events over an mpsc channel. Auto-reconnects with a 2s delay.
8787+2. **Batch Writer** — collects events into configurable batches (size + timeout), writes to fjall atomically via `WriteBatch`, and broadcasts to fan-out channels with priority routing (tracked = high, forwarded = low). Filters denied repos. Queues `tooBig` commits and desynced repos for backfill.
8888+3. **WebSocket Fan-out** — serves `dev.ngerakines.ramjet.stream.subscribe` with cursor-based replay from the events keyspace, then switches to live streaming with biased priority delivery (high > low > client recv). Supports both broadcast and partitioned (consumer group) modes.
8989+4. **Identity Worker** — resolves DID documents and handle mappings via concurrent resolution (bounded by a semaphore), updating the `did_to_doc` and `handle_to_did` keyspaces.
9090+5. **Backfill Worker** — polls a backfill queue every 5s, fetches full repo CARs from PDS endpoints using `DiskRepository` (spills to disk for large repos), parses records, and writes to storage. Repos exceeding 5 consecutive failures are auto-denied.
9191+9292+### Storage
9393+9494+Eight fjall keyspaces:
9595+9696+| Keyspace | Key format | Value | Purpose |
9797+|---|---|---|---|
9898+| `records` | `did\0collection\0rkey\0rev` | CID + DAG-CBOR (empty = tombstone) | Versioned record history |
9999+| `events` | `u64 BE` | Compact binary event | Event log for cursor replay |
100100+| `meta` | string key | varies | Cursor, sequence counter, backfill queue, RIBLT cache |
101101+| `repo_state` | DID bytes | `RepoState` (rev + status + denied + backfilled) | Per-repo state |
102102+| `did_to_doc` | DID bytes | Timestamped JSON DID document | Identity cache |
103103+| `handle_to_did` | handle bytes | DID string | Handle-to-DID mapping |
104104+| `blobs` | blob key | blob data | Blob storage |
105105+| `blob_meta` | blob key | metadata | Blob metadata |
106106+107107+Records are versioned by including the repository revision in the key. Each create/update appends a new entry; deletes append a tombstone (empty value). The latest version of a record is the last entry in a prefix scan over `did\0collection\0rkey\0`. This provides a built-in change history without a separate temporal keyspace.
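To make the "last entry in a prefix scan" rule concrete, here is a small sketch that uses an in-memory `BTreeMap` as a stand-in for the sorted `records` keyspace. It does not use the fjall API, the helper names are hypothetical, and it assumes that revisions sort lexicographically in commit order.

```rust
use std::collections::BTreeMap;

/// Builds a versioned record key: did\0collection\0rkey\0rev.
pub fn record_key(did: &str, collection: &str, rkey: &str, rev: &str) -> Vec<u8> {
    [did, collection, rkey, rev].join("\0").into_bytes()
}

/// Returns the value of the newest version of a record, if any exists.
/// An empty slice means the newest version is a tombstone.
pub fn latest_version<'a>(
    store: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    did: &str,
    collection: &str,
    rkey: &str,
) -> Option<&'a [u8]> {
    // Scan prefix: did\0collection\0rkey\0
    let mut prefix = [did, collection, rkey].join("\0").into_bytes();
    prefix.push(0);

    store
        .range(prefix.clone()..)
        .take_while(|(key, _)| key.starts_with(&prefix))
        .last()
        .map(|(_, value)| value.as_slice())
}
```

The trailing `\0` in the scan prefix keeps `rkey` values that share a prefix (for example `rk1` and `rk10`) from being mixed into the same scan.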
108108+109109+### Event encoding
110110+111111+Events are stored in a compact binary format with tag-based versioning:
112112+- **V1 tags** (0x01-0x03): 1B tag + 8B sequence + variable fields
113113+- **V2 tags** (0x04-0x06): 1B tag + 8B sequence + 8B timestamp (microseconds) + variable fields
114114+115115+Commit operations, identity events, and account events each have a dedicated compact encoding. The V2 format adds microsecond timestamps for latency tracking. Events are optionally compressed with zstd dictionary compression before storage.
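The fixed-size header described above can be sketched as follows. This is a hedged illustration rather than the code in `src/storage/encoding.rs`: the type and function names are hypothetical, and big-endian integers are an assumption (only the event keys are documented as big-endian).

```rust
/// Decoded fixed-size header of a compact event (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct EventHeader {
    pub tag: u8,
    pub sequence: u64,
    /// Microseconds; present only for V2 tags (0x04-0x06).
    pub timestamp_us: Option<u64>,
    /// Offset at which the variable fields begin.
    pub body_offset: usize,
}

/// Reads a big-endian u64 from the first 8 bytes (endianness is an assumption).
fn read_u64_be(bytes: &[u8]) -> Option<u64> {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes.get(..8)?);
    Some(u64::from_be_bytes(buf))
}

/// Encodes a V2 header: 1B tag + 8B sequence + 8B timestamp.
pub fn encode_v2_header(tag: u8, sequence: u64, timestamp_us: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(17);
    out.push(tag);
    out.extend_from_slice(&sequence.to_be_bytes());
    out.extend_from_slice(&timestamp_us.to_be_bytes());
    out
}

/// Decodes a V1 or V2 header; returns `None` for unknown tags or short input.
pub fn decode_header(data: &[u8]) -> Option<EventHeader> {
    let (&tag, rest) = data.split_first()?;
    let sequence = read_u64_be(rest)?;
    match tag {
        0x01..=0x03 => Some(EventHeader { tag, sequence, timestamp_us: None, body_offset: 9 }),
        0x04..=0x06 => {
            let timestamp_us = read_u64_be(rest.get(8..)?)?;
            Some(EventHeader { tag, sequence, timestamp_us: Some(timestamp_us), body_offset: 17 })
        }
        _ => None,
    }
}
```

Because the tag alone determines the header length, V1 and V2 events can share one keyspace and be told apart on read.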
116116+117117+## API
118118+119119+### Health and metrics
120120+121121+- `GET /_health` — returns `{"status":"ok","version":"0.1.0"}`
122122+- `GET /metrics` — Prometheus text format (includes HTTP metrics, pipeline counters, tokio task metrics, fan-out queue depths)
123123+124124+### XRPC endpoints
125125+126126+| Method | Endpoint | Description |
127127+|---|---|---|
128128+| GET | `com.atproto.repo.getRecord` | Fetch a single record by repo/collection/rkey |
129129+| GET | `com.atproto.repo.listRecords` | List records in a collection with cursor pagination |
130130+| GET | `com.atproto.repo.describeRepo` | Repo metadata, collections, rev, DID document |
131131+| GET | `com.atproto.identity.resolveIdentity` | Resolve DID or handle to DID document (supports `Cache-Control: max-stale=N`) |
132132+| GET | `com.atproto.identity.resolveHandle` | Resolve handle to DID |
133133+| GET | `com.atproto.identity.resolveDid` | Fetch DID document |
134134+| GET | `dev.ngerakines.ramjet.stream.subscribe` | WebSocket event stream with cursor replay and consumer group support |
135135+| GET | `dev.ngerakines.ramjet.repos.getReconciliation` | RIBLT sketch for set reconciliation |
136136+137137+### Set reconciliation (RIBLT)
138138+139139+The `getReconciliation` endpoint returns an [RIBLT](https://github.com/ngerakines/riblt-rs) sketch for a DID's tracked records, allowing consumers to efficiently determine which records they're missing without transferring the full record list.
140140+141141+```
142142+GET /xrpc/dev.ngerakines.ramjet.repos.getReconciliation?did=did:plc:abc123
143143+```
144144+145145+Returns `application/x-riblt` binary with response headers:
146146+- `ETag` — repo rev (supports `If-None-Match` for 304 Not Modified)
147147+- `X-Riblt-Records` — number of records encoded
148148+- `X-Riblt-Rev` — repo revision at time of sketch generation
149149+150150+Optional `collection` parameter filters the sketch to a single collection.
151151+152152+**Client workflow:**
153153+1. Build a local sketch from your records using the same symbol encoding (`collection\0rkey\0cid`)
154154+2. Fetch Ramjet's sketch
155155+3. Subtract and decode — the symmetric difference tells you which records you're missing (and which you have that Ramjet doesn't)
156156+4. Fetch missing records via `getRecord` or `listRecords`
157157+158158+Sketch sizes are allocated in 25,000-cell increments, bumping to the next tier when the record count exceeds 50% past the current size. Sketches are cached per-DID and automatically invalidated when records change.
159159+160160+### Admin endpoints (JWT auth required)
161161+162162+| Method | Endpoint | Description |
163163+|---|---|---|
164164+| GET | `dev.ngerakines.ramjet.repos.getState` | Get repo state (rev, status, denied) |
165165+| POST | `dev.ngerakines.ramjet.repos.setState` | Set repo denied flag |
166166+| POST | `dev.ngerakines.ramjet.repos.resync` | Queue a repo for backfill |
167167+| POST | `dev.ngerakines.ramjet.repos.rebuildReconciliation` | Force rebuild of RIBLT sketch cache |
168168+169169+### Other endpoints
170170+171171+| Method | Endpoint | Description |
172172+|---|---|---|
173173+| GET | `/dictionary` | Download the active zstd dictionary (supports `If-None-Match` with CID ETags) |
174174+175175+## Development
176176+177177+```bash
178178+cargo check # type-check
179179+cargo test # run tests (50 unit tests)
180180+cargo fmt # format code
181181+cargo build # debug build
182182+```
183183+184184+### Binaries
185185+186186+- `ramjet` — main service
187187+- `ramjet-writer` — benchmark binary for ingester-to-writer pipeline analysis
188188+- `ramjet-data` — data inspection tool
189189+- `ramjet-dictgen` — zstd dictionary generator
190190+- `ramjet-forecast` — capacity forecasting
191191+- `ramjet-consumer` — example WebSocket consumer
192192+- `rjtop` — TUI dashboard
193193+194194+## License
195195+196196+MIT
+104
docs/zstd-compression-plan.md
···11+# Zstd Dictionary Compression for Events and Records Keyspaces
22+33+## Goal
44+55+Compress data in the events and records keyspaces using zstd with a pre-trained dictionary. Small payloads (100 B to a few KB) compress poorly with plain zstd, but are expected to reach 3-6x compression with a dictionary that captures common patterns (DID prefixes, collection NSIDs, CBOR structure bytes).
66+77+## Part 1: Dictionary Training Tool
88+99+Create `src/bin/ramjet_dictgen.rs` that:
1010+1111+1. Opens the fjall database read-only
1212+2. Iterates the events keyspace collecting 10K-100K raw compact event values into `Vec<Vec<u8>>`
1313+3. Calls `zstd::dict::from_samples(&samples, dict_size)` (COVER algorithm)
1414+4. Writes dictionary bytes to a file
1515+5. Prints stats: sample count, dictionary size, average compression ratio with/without dictionary
1616+1717+### CLI args (clap)
1818+1919+- `--db-path` - path to fjall database (required)
2020+- `--output` - output dictionary file path (required)
2121+- `--max-samples` - max events to sample (default 50000)
2222+- `--dict-size` - dictionary size in bytes (default 65536 = 64KB)
2323+2424+### Dependency
2525+2626+```toml
2727+zstd = "0.13"
2828+```
2929+3030+### Notes
3131+3232+- Samples should represent the actual distribution of commit/identity/account events
3333+- 64KB dictionary is a good starting point; test 32KB and 112KB too
3434+- Retrain when tracked collections change significantly
3535+3636+## Part 2: Compression Integration in Ramjet
3737+3838+### Config changes (`src/config.rs`)
3939+4040+- Add `--zstd-dict-path <PATH>` optional CLI arg to `CliArgs` struct
4141+- Add `zstd_dict_path: Option<PathBuf>` field to `ServiceConfig`
4242+- When absent, events are stored uncompressed (current behavior)
4343+4444+### Storage changes (`src/storage/mod.rs`)
4545+4646+- Add `zstd_dict: Option<Vec<u8>>` field to `FjallDb`
4747+- Extend `FjallDb::open(path: &Path)` to accept an optional dictionary path, or load it in `main.rs` before passing to a new constructor
4848+- Add helper methods:
4949+ - `compress_event(&self, data: &[u8]) -> Vec<u8>` - returns compressed if dict loaded, raw otherwise
5050+ - `decompress_event(&self, data: &[u8]) -> anyhow::Result<Vec<u8>>` - auto-detects format
5151+5252+### Format detection for backwards compatibility
5353+5454+- Zstd frames start with the magic bytes `28 B5 2F FD` (the magic number `0xFD2FB528` stored little-endian)
5555+- Compact events start with `0x01`-`0x06` (v1: `0x01` commit, `0x02` identity, `0x03` account; v2: `0x04` commit_op, `0x05` identity_v2, `0x06` account_v2)
5656+- Legacy JSON starts with `0x7B` (`{`)
5757+- Check first 4 bytes to determine format; decompress only if zstd magic detected
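The detection rules above could look roughly like this. It is a sketch under stated assumptions: `StoredFormat` and `detect_format` are hypothetical names, the function covers only the event formats listed here (not record values), and empty input is treated as its own case so tombstones can pass through untouched.

```rust
/// Formats a stored event value may be in (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum StoredFormat {
    Empty,
    Zstd,
    Compact,
    LegacyJson,
    Unknown,
}

/// Zstd frame magic as it appears on disk.
const ZSTD_MAGIC: [u8; 4] = [0x28, 0xB5, 0x2F, 0xFD];

/// Classifies a stored value by its leading bytes.
pub fn detect_format(data: &[u8]) -> StoredFormat {
    if data.is_empty() {
        return StoredFormat::Empty;
    }
    if data.starts_with(&ZSTD_MAGIC) {
        return StoredFormat::Zstd;
    }
    match data[0] {
        // Compact event tags, V1 (0x01-0x03) and V2 (0x04-0x06).
        0x01..=0x06 => StoredFormat::Compact,
        // `{` opens a legacy JSON event.
        0x7B => StoredFormat::LegacyJson,
        _ => StoredFormat::Unknown,
    }
}
```

A `decompress_event` helper would then call the zstd decompressor only for `StoredFormat::Zstd` and return the bytes unchanged for the other known formats.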
5858+5959+### Writer changes (`src/pipeline/writer.rs`)
6060+6161+- After `encode_compact_commit_op`/`encode_compact_identity_v2`/`encode_compact_account_v2`, compress via `db.compress_event(&compact_event)`
6262+- After `RecordValue::encode()`, compress via `db.compress_event(&encoded)` before inserting into records keyspace
6363+- Create `zstd::bulk::Compressor::with_dictionary(level, &dict)` per batch (cheap to create)
6464+- Compression level 3 (default) is a good balance
6565+6666+### Backfill changes (`src/pipeline/backfill.rs`)
6767+6868+- After `RecordValue::encode()`, compress via `db.compress_event(&encoded)` before inserting into records keyspace
6969+7070+### Replay changes (`src/server/websocket.rs`)
7171+7272+- Before calling `compact_event_to_cbor`, decompress if zstd magic detected
7373+- Create `zstd::bulk::Decompressor::with_dictionary(&dict)` per connection
7474+- Use `max_decompressed_size = 4 * 1024 * 1024` (4MB cap)
7575+7676+### XRPC changes (`src/server/xrpc.rs`)
7777+7878+- `getRecord` and `listRecords` decompress record values via `db.decompress_event()` before `RecordValue::decode()`
7979+- Tombstones (empty values) pass through decompression unchanged
8080+8181+### Meta keyspace
8282+8383+- Store `zstd_dict_id` (hash of dictionary bytes) in meta keyspace at startup
8484+- Log a warning if the stored dict ID doesn't match the loaded dictionary (dictionary was swapped without retraining)
8585+8686+### Memory budget
8787+8888+- Each Compressor/Decompressor instance: ~128-256KB + dictionary copy
8989+- Writer needs 1 compressor (reused per batch)
9090+- Each replaying WebSocket connection needs 1 decompressor
9191+9292+## Migration Path
9393+9494+1. Deploy with `--zstd-dict-path` absent - no change to existing behavior
9595+2. Run `ramjet_dictgen` against a live database to train dictionary
9696+3. Redeploy with `--zstd-dict-path /path/to/dict`
9797+4. New events and records are compressed; old data (compact binary, legacy JSON, uncompressed records) still readable via format detection
9898+5. No data migration needed - mixed compressed/uncompressed data coexists in both keyspaces
9999+100100+## Expected Results
101101+102102+- Identity/account events (~50-80 bytes compact): ~2-3x compression with dictionary
103103+- Commit events with record bodies (~200B-5KB compact): ~3-6x compression with dictionary
104104+- CPU overhead: negligible at zstd level 3 for small payloads
+423
src/bin/ramjet_consumer.rs
···11+//! Benchmark binary: connects to a ramjet instance's stream.subscribe
22+//! WebSocket endpoint and logs ingestion rate with per-collection
33+//! breakdown for pipeline analysis and publication baseline.
44+55+use std::collections::HashMap;
66+use std::sync::atomic::{AtomicU64, Ordering};
77+use std::sync::{Arc, Mutex};
88+use std::time::{Duration, Instant};
99+1010+use clap::Parser;
1111+use futures_util::StreamExt;
1212+use tokio::sync::mpsc;
1313+1414+#[derive(Parser, Debug)]
1515+#[command(
1616+ name = "ramjet-consumer",
1717+ about = "Benchmark fan-out consumer with per-collection stats"
1818+)]
1919+struct Args {
2020+ /// Ramjet host to connect to.
2121+ #[arg(long, default_value = "localhost:8080")]
2222+ host: String,
2323+2424+ /// Cursor to resume from (omit to start live).
2525+ #[arg(long)]
2626+ cursor: Option<u64>,
2727+2828+ /// Stats reporting interval in seconds.
2929+ #[arg(long, default_value = "10")]
3030+ stats_interval: u64,
3131+3232+ /// Number of top collections to show in stats output.
3333+ #[arg(long, default_value = "10")]
3434+ top_collections: usize,
3535+3636+ /// Sample rate for printing events to stdout (0.0 = none, 1.0 = all).
3737+ #[arg(long)]
3838+ sample: Option<f64>,
3939+}
4040+4141+/// Minimal event struct for fast CBOR deserialization — skips record data.
4242+#[derive(serde::Deserialize)]
4343+struct MinimalEvent {
4444+ #[serde(default)]
4545+ kind: Option<String>,
4646+ #[serde(default)]
4747+ seq: Option<u64>,
4848+ #[serde(default)]
4949+ commit: Option<MinimalCommit>,
5050+}
5151+5252+#[derive(serde::Deserialize)]
5353+struct MinimalCommit {
5454+ #[serde(default)]
5555+ collection: Option<String>,
5656+}
5757+5858+struct CollectionData {
5959+ interval: HashMap<String, u64>,
6060+ totals: HashMap<String, u64>,
6161+}
6262+6363+struct Stats {
6464+ // Written by receive loop
6565+ events_total: AtomicU64,
6666+ bytes_received: AtomicU64,
6767+6868+ // Written by processing task
6969+ commits_total: AtomicU64,
7070+ identity_total: AtomicU64,
7171+ account_total: AtomicU64,
7272+ unknown_total: AtomicU64,
7373+ last_seq: AtomicU64,
7474+ seq_gaps: AtomicU64,
7575+ seq_gap_events: AtomicU64,
7676+ process_drops: AtomicU64,
7777+7878+ /// Single mutex for collection accounting.
7979+ collections: Mutex<CollectionData>,
8080+}
8181+8282+impl Stats {
8383+ fn new() -> Self {
8484+ Self {
8585+ events_total: AtomicU64::new(0),
8686+ commits_total: AtomicU64::new(0),
8787+ identity_total: AtomicU64::new(0),
8888+ account_total: AtomicU64::new(0),
8989+ unknown_total: AtomicU64::new(0),
9090+ bytes_received: AtomicU64::new(0),
9191+ last_seq: AtomicU64::new(0),
9292+ seq_gaps: AtomicU64::new(0),
9393+ seq_gap_events: AtomicU64::new(0),
9494+ process_drops: AtomicU64::new(0),
9595+ collections: Mutex::new(CollectionData {
9696+ interval: HashMap::new(),
9797+ totals: HashMap::new(),
9898+ }),
9999+ }
100100+ }
101101+102102+ fn take_interval_counts(&self) -> HashMap<String, u64> {
103103+ let mut data = self.collections.lock().unwrap();
104104+ std::mem::take(&mut data.interval)
105105+ }
106106+107107+ fn total_collections(&self) -> usize {
108108+ self.collections.lock().unwrap().totals.len()
109109+ }
110110+}
111111+112112+#[tokio::main]
113113+async fn main() -> anyhow::Result<()> {
114114+ let args = Args::parse();
115115+116116+ tracing_subscriber::fmt()
117117+ .with_env_filter(
118118+ tracing_subscriber::EnvFilter::try_from_default_env()
119119+ .unwrap_or_else(|_| tracing_subscriber::EnvFilter::new("info")),
120120+ )
121121+ .init();
122122+123123+ tracing::info!(
124124+ host = %args.host,
125125+ cursor = ?args.cursor,
126126+ stats_interval = args.stats_interval,
127127+ top_collections = args.top_collections,
128128+ "starting ramjet-consumer benchmark"
129129+ );
130130+131131+ let stats = Arc::new(Stats::new());
132132+133133+ // Processing channel — receive loop sends raw payloads, processing task handles parsing
134134+ let (process_tx, process_rx) = mpsc::channel::<Vec<u8>>(8192);
135135+136136+ // Spawn processing task
137137+ tokio::spawn({
138138+ let stats = stats.clone();
139139+ let sample_rate = args.sample.unwrap_or(0.0);
140140+ async move {
141141+ process_events(process_rx, stats, sample_rate).await;
142142+ }
143143+ });
144144+145145+ // Spawn stats reporter
146146+ let _stats_handle = tokio::spawn({
147147+ let stats = stats.clone();
148148+ let interval = Duration::from_secs(args.stats_interval);
149149+ let top_n = args.top_collections;
150150+ async move {
151151+ report_stats_loop(stats, interval, top_n).await;
152152+ }
153153+ });
154154+155155+ // Connect and consume, using last_seq as cursor on reconnection
156156+ let mut cursor = args.cursor;
157157+ loop {
158158+ if let Err(e) = receive_loop(&args.host, cursor, &stats, &process_tx).await {
159159+ tracing::warn!(error = %e, "consumer disconnected, reconnecting in 2s");
160160+ } else {
161161+ tracing::info!("connection closed, reconnecting in 2s");
162162+ }
163163+ // Use last received seq as cursor for reconnection
164164+ let last = stats.last_seq.load(Ordering::Relaxed);
165165+ if last > 0 {
166166+ cursor = Some(last);
167167+ }
168168+ tokio::time::sleep(Duration::from_secs(2)).await;
169169+ }
170170+171171+ #[allow(unreachable_code)]
172172+ Ok(())
173173+}
174174+175175+/// Lightweight receive loop — counts bytes/events and forwards raw payloads
176176+/// to the processing task. No CBOR parsing happens here.
177177+async fn receive_loop(
178178+ host: &str,
179179+ cursor: Option<u64>,
180180+ stats: &Stats,
181181+ process_tx: &mpsc::Sender<Vec<u8>>,
182182+) -> anyhow::Result<()> {
183183+ let url = if let Some(cursor) = cursor {
184184+ format!("ws://{host}/xrpc/dev.ngerakines.ramjet.stream.subscribe?cursor={cursor}")
185185+ } else {
186186+ format!("ws://{host}/xrpc/dev.ngerakines.ramjet.stream.subscribe")
187187+ };
188188+189189+ tracing::info!(%url, cursor = ?cursor, "connecting to ramjet");
190190+191191+ let uri: http::Uri = url.parse()?;
192192+ let (mut ws, _response) = tokio_websockets::ClientBuilder::from_uri(uri)
193193+ .connect()
194194+ .await?;
195195+196196+ tracing::info!("connected");
197197+198198+ let mut last_event_at = Instant::now();
199199+ let mut silence_warned = false;
200200+201201+ loop {
202202+ let deadline = if silence_warned {
203203+ last_event_at + Duration::from_secs(10 * 60)
204204+ } else {
205205+ last_event_at + Duration::from_secs(5 * 60)
206206+ };
207207+208208+ tokio::select! {
209209+ biased;
210210+211211+ _ = tokio::time::sleep_until(tokio::time::Instant::from_std(deadline)) => {
212212+ if !silence_warned {
213213+ tracing::warn!(
214214+ silence_secs = last_event_at.elapsed().as_secs(),
215215+ "no events received for 5 minutes, will exit in 5 minutes",
216216+ );
217217+ silence_warned = true;
218218+ } else {
219219+ anyhow::bail!(
220220+ "no events received for 10 minutes, exiting"
221221+ );
222222+ }
223223+ }
224224+225225+ msg = ws.next() => {
226226+ let Some(result) = msg else {
227227+ return Ok(());
228228+ };
229229+ let message = result?;
230230+231231+ if !message.is_binary() {
232232+ continue;
233233+ }
234234+235235+ last_event_at = Instant::now();
236236+ silence_warned = false;
237237+238238+ let payload = message.into_payload();
239239+ let data: &[u8] = payload.as_ref();
240240+241241+ stats
242242+ .bytes_received
243243+ .fetch_add(data.len() as u64, Ordering::Relaxed);
244244+ stats.events_total.fetch_add(1, Ordering::Relaxed);
245245+246246+ if process_tx.try_send(data.to_vec()).is_err() {
247247+ stats.process_drops.fetch_add(1, Ordering::Relaxed);
248248+ }
249249+ }
250250+ }
251251+ }
252252+}
253253+254254+/// Processing task — receives raw payloads from the channel, does minimal
255255+/// CBOR deserialization (skipping record data), and updates stats.
256256+async fn process_events(mut rx: mpsc::Receiver<Vec<u8>>, stats: Arc<Stats>, sample_rate: f64) {
257257+ let mut prev_seq: Option<u64> = None;
258258+259259+ while let Some(data) = rx.recv().await {
260260+ // Minimal deserialization — only extracts kind, seq, commit.collection
261261+ let Ok(event) = atproto_dasl::drisl::from_slice::<MinimalEvent>(data.as_slice()) else {
262262+ stats.unknown_total.fetch_add(1, Ordering::Relaxed);
263263+ continue;
264264+ };
265265+266266+ // Sequence tracking and gap detection
267267+ if let Some(seq) = event.seq {
268268+ if let Some(prev) = prev_seq {
269269+ let gap = seq.saturating_sub(prev);
270270+ if gap > 1 {
271271+ stats.seq_gaps.fetch_add(1, Ordering::Relaxed);
272272+ stats.seq_gap_events.fetch_add(gap - 1, Ordering::Relaxed);
273273+ }
274274+ }
275275+ prev_seq = Some(seq);
276276+ stats.last_seq.store(seq, Ordering::Relaxed);
277277+ }
278278+279279+ // Optional sampling — full parse to Ipld for printing
280280+ if sample_rate > 0.0 {
281281+ use rand::RngExt;
282282+ let should_print =
283283+ sample_rate >= 1.0 || rand::rng().random_range(0.0..1.0) < sample_rate;
284284+ if should_print {
285285+ if let Ok(full) =
286286+ atproto_dasl::drisl::from_slice::<atproto_dasl::Ipld>(data.as_slice())
287287+ {
288288+ print_event(&full);
289289+ }
290290+ }
291291+ }
292292+293293+ match event.kind.as_deref() {
294294+ Some("commit") => {
295295+ stats.commits_total.fetch_add(1, Ordering::Relaxed);
296296+ if let Some(commit) = &event.commit {
297297+ if let Some(collection) = &commit.collection {
298298+ let mut collections = stats.collections.lock().unwrap();
299299+ *collections.interval.entry(collection.clone()).or_default() += 1;
300300+ *collections.totals.entry(collection.clone()).or_default() += 1;
301301+ }
302302+ }
303303+ }
304304+ Some("identity") => {
305305+ stats.identity_total.fetch_add(1, Ordering::Relaxed);
306306+ }
307307+ Some("account") => {
308308+ stats.account_total.fetch_add(1, Ordering::Relaxed);
309309+ }
310310+ _ => {
311311+ stats.unknown_total.fetch_add(1, Ordering::Relaxed);
312312+ }
313313+ }
314314+ }
315315+}
316316+317317+async fn report_stats_loop(stats: Arc<Stats>, interval: Duration, top_n: usize) {
318318+ let mut ticker = tokio::time::interval(interval);
319319+ ticker.tick().await; // skip first immediate tick
320320+321321+ let mut prev_events: u64 = 0;
322322+ let mut prev_commits: u64 = 0;
323323+ let mut prev_bytes: u64 = 0;
324324+ let mut prev_time = Instant::now();
325325+326326+ loop {
327327+ ticker.tick().await;
328328+329329+ let now = Instant::now();
330330+ let elapsed = now.duration_since(prev_time).as_secs_f64();
331331+ prev_time = now;
332332+333333+ let events = stats.events_total.load(Ordering::Relaxed);
334334+ let commits = stats.commits_total.load(Ordering::Relaxed);
335335+ let identity = stats.identity_total.load(Ordering::Relaxed);
336336+ let account = stats.account_total.load(Ordering::Relaxed);
337337+ let unknown = stats.unknown_total.load(Ordering::Relaxed);
338338+ let bytes = stats.bytes_received.load(Ordering::Relaxed);
339339+ let last_seq = stats.last_seq.load(Ordering::Relaxed);
340340+ let seq_gaps = stats.seq_gaps.load(Ordering::Relaxed);
341341+ let seq_gap_events = stats.seq_gap_events.load(Ordering::Relaxed);
342342+ let process_drops = stats.process_drops.load(Ordering::Relaxed);
343343+344344+ let d_events = events - prev_events;
345345+ let d_commits = commits - prev_commits;
346346+ let d_bytes = bytes - prev_bytes;
347347+348348+ prev_events = events;
349349+ prev_commits = commits;
350350+ prev_bytes = bytes;
351351+352352+ let event_rate = d_events as f64 / elapsed;
353353+ let commit_rate = d_commits as f64 / elapsed;
354354+ let byte_rate = d_bytes as f64 / elapsed;
355355+356356+ // Get and format top collections for this interval
357357+ let interval_counts = stats.take_interval_counts();
358358+ let top_collections = top_n_collections(&interval_counts, top_n);
359359+ let total_collections = stats.total_collections();
360360+361361+ tracing::info!(
362362+ events_per_sec = format!("{event_rate:.0}"),
363363+ commits_per_sec = format!("{commit_rate:.0}"),
364364+ throughput = format!("{}/s", format_bytes(byte_rate as u64)),
365365+ interval_events = d_events,
366366+ interval_commits = d_commits,
367367+ total_events = events,
368368+ total_commits = commits,
369369+ total_identity = identity,
370370+ total_account = account,
371371+ total_unknown = unknown,
372372+ total_bytes = format_bytes(bytes),
373373+ last_seq = last_seq,
374374+ unique_collections = total_collections,
375375+ seq_gaps = seq_gaps,
376376+ seq_gap_events = seq_gap_events,
377377+ process_drops = process_drops,
378378+ "stats"
379379+ );
380380+381381+ if !top_collections.is_empty() {
382382+ for (collection, count) in &top_collections {
383383+ let rate = *count as f64 / elapsed;
384384+ tracing::info!(
385385+ collection = %collection,
386386+ ops = count,
387387+ ops_per_sec = format!("{rate:.1}"),
388388+ " collection"
389389+ );
390390+ }
391391+ }
392392+ }
393393+}
394394+395395+fn print_event(event: &atproto_dasl::Ipld) {
396396+ match serde_json::to_string(event) {
397397+ Ok(json) => println!("{json}"),
398398+ Err(_) => println!("{event:?}"),
399399+ }
400400+}
401401+402402+fn top_n_collections(counts: &HashMap<String, u64>, n: usize) -> Vec<(String, u64)> {
403403+ let mut entries: Vec<(String, u64)> = counts.iter().map(|(k, v)| (k.clone(), *v)).collect();
404404+ entries.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0))); // ties broken by name for stable output
405405+ entries.truncate(n);
406406+ entries
407407+}
408408+409409+fn format_bytes(bytes: u64) -> String {
410410+ const KIB: u64 = 1024;
411411+ const MIB: u64 = 1024 * KIB;
412412+ const GIB: u64 = 1024 * MIB;
413413+414414+ if bytes >= GIB {
415415+ format!("{:.2} GiB", bytes as f64 / GIB as f64)
416416+ } else if bytes >= MIB {
417417+ format!("{:.2} MiB", bytes as f64 / MIB as f64)
418418+ } else if bytes >= KIB {
419419+ format!("{:.2} KiB", bytes as f64 / KIB as f64)
420420+ } else {
421421+ format!("{bytes} B")
422422+ }
423423+}
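The sequence-gap accounting in the event loop of this file can be tried in isolation. The following is a minimal sketch, assuming a hypothetical `GapTracker` type that is not part of this change; it only mirrors the `prev_seq` / `saturating_sub` logic using plain integers instead of the atomic counters on `Stats`.

```rust
/// Hypothetical helper (not in the PR) mirroring the gap accounting above.
#[derive(Default)]
struct GapTracker {
    prev_seq: Option<u64>,
    gaps: u64,
    missed_events: u64,
}

impl GapTracker {
    /// Record a sequence number and return how many events were skipped before it.
    fn observe(&mut self, seq: u64) -> u64 {
        let mut missed = 0;
        if let Some(prev) = self.prev_seq {
            // saturating_sub keeps a replayed or repeated seq (seq <= prev) from underflowing.
            let gap = seq.saturating_sub(prev);
            if gap > 1 {
                missed = gap - 1;
                self.gaps += 1;
                self.missed_events += missed;
            }
        }
        self.prev_seq = Some(seq);
        missed
    }
}

fn main() {
    let mut tracker = GapTracker::default();
    for seq in [10, 11, 15, 15, 16] {
        tracker.observe(seq);
    }
    println!("gaps={} missed={}", tracker.gaps, tracker.missed_events);
}
```

A jump from 11 to 15 counts as one gap covering three missed events, while a repeated or older sequence number is ignored rather than treated as a gap.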
+685
src/bin/ramjet_data.rs
···11+//! Interactive TUI for exploring a ramjet fjall database.
22+//!
33+//! Overview screen shows database metrics and keyspace list.
44+//! Selecting a keyspace opens a key browser with live value preview.
55+66+use std::path::PathBuf;
77+88+use clap::Parser;
99+use crossterm::event::{self, Event, KeyCode, KeyEventKind};
1010+use crossterm::terminal::{disable_raw_mode, enable_raw_mode};
1111+use ratatui::DefaultTerminal;
1212+use ratatui::layout::{Constraint, Direction, Layout};
1313+use ratatui::prelude::Stylize;
1414+use ratatui::style::{Color, Modifier, Style};
1515+use ratatui::text::{Line, Span, Text};
1616+use ratatui::widgets::{Block, Borders, List, ListItem, ListState, Paragraph, Wrap};
1717+1818+use ramjet::storage::FjallDb;
1919+2020+#[derive(Parser)]
2121+#[command(name = "ramjet-data", about = "Interactive database explorer")]
2222+struct Args {
2323+ /// Path to the fjall database directory.
2424+ #[arg(long, env = "RAMJET_DB_PATH", default_value = "./data/ramjet.db")]
2525+ db_path: PathBuf,
2626+}
2727+2828+/// Names of all 8 keyspaces, matching FjallDb field order.
2929+const KEYSPACE_NAMES: [&str; 8] = [
3030+ "records",
3131+ "events",
3232+ "meta",
3333+ "repo_state",
3434+ "did_to_doc",
3535+ "handle_to_did",
3636+ "blobs",
3737+ "blob_meta",
3838+];
3939+4040+struct App {
4141+ db: FjallDb,
4242+ db_path: PathBuf,
4343+ screen: Screen,
4444+ // Overview
4545+ overview_state: ListState,
4646+ keyspace_stats: Vec<KeyspaceInfo>,
4747+ disk_size: u64,
4848+ // Explorer
4949+ explorer: ExplorerState,
5050+}
5151+5252+#[derive(PartialEq)]
5353+enum Screen {
5454+ Overview,
5555+ Explorer,
5656+}
5757+5858+struct KeyspaceInfo {
5959+ name: String,
6060+ count: u64,
6161+ exact: bool,
6262+}
6363+6464+struct ExplorerState {
6565+ keyspace_name: String,
6666+ entries: Vec<(Vec<u8>, Vec<u8>)>,
6767+ list_state: ListState,
6868+ scroll_offset: u16,
6969+}
7070+7171+impl ExplorerState {
7272+ fn new() -> Self {
7373+ Self {
7474+ keyspace_name: String::new(),
7575+ entries: Vec::new(),
7676+ list_state: ListState::default(),
7777+ scroll_offset: 0,
7878+ }
7979+ }
8080+8181+ fn selected_value(&self) -> Option<&[u8]> {
8282+ self.list_state
8383+ .selected()
8484+ .and_then(|i| self.entries.get(i))
8585+ .map(|(_, v)| v.as_slice())
8686+ }
8787+8888+ fn selected_key(&self) -> Option<&[u8]> {
8989+ self.list_state
9090+ .selected()
9191+ .and_then(|i| self.entries.get(i))
9292+ .map(|(k, _)| k.as_slice())
9393+ }
9494+}
9595+9696+impl App {
9797+ fn new(db: FjallDb, db_path: PathBuf) -> Self {
9898+ let mut app = Self {
9999+ db,
100100+ db_path,
101101+ screen: Screen::Overview,
102102+ overview_state: ListState::default(),
103103+ keyspace_stats: Vec::new(),
104104+ disk_size: 0,
105105+ explorer: ExplorerState::new(),
106106+ };
107107+ app.overview_state.select(Some(0));
108108+ app.refresh_overview();
109109+ app
110110+ }
111111+112112+ fn refresh_overview(&mut self) {
113113+ self.keyspace_stats = KEYSPACE_NAMES
114114+ .iter()
115115+ .map(|name| {
116116+ let ks = self.get_keyspace(name);
117117+ let (count, exact) = count_keyspace(ks);
118118+ KeyspaceInfo {
119119+ name: name.to_string(),
120120+ count,
121121+ exact,
122122+ }
123123+ })
124124+ .collect();
125125+ self.disk_size = dir_size(&self.db_path);
126126+ }
127127+128128+ fn get_keyspace(&self, name: &str) -> &fjall::Keyspace {
129129+ match name {
130130+ "records" => &self.db.records,
131131+ "events" => &self.db.events,
132132+ "meta" => &self.db.meta,
133133+ "repo_state" => &self.db.repo_state,
134134+ "did_to_doc" => &self.db.did_to_doc,
135135+ "handle_to_did" => &self.db.handle_to_did,
136136+ "blobs" => &self.db.blobs,
137137+ "blob_meta" => &self.db.blob_meta,
138138+ _ => &self.db.meta,
139139+ }
140140+ }
141141+142142+ fn enter_explorer(&mut self) {
143143+ let Some(idx) = self.overview_state.selected() else {
144144+ return;
145145+ };
146146+ let Some(info) = self.keyspace_stats.get(idx) else {
147147+ return;
148148+ };
149149+150150+ self.explorer.keyspace_name = info.name.clone();
151151+ self.explorer.scroll_offset = 0;
152152+153153+ let ks = self.get_keyspace(&info.name);
154154+ let mut entries = Vec::new();
155155+ for guard in ks.iter() {
156156+ let Ok((k, v)) = guard.into_inner() else {
157157+ continue;
158158+ };
159159+ let key: &[u8] = &k;
160160+ let val: &[u8] = &v;
161161+ entries.push((key.to_vec(), val.to_vec()));
162162+ if entries.len() >= 50_000 {
163163+ break;
164164+ }
165165+ }
166166+ self.explorer.entries = entries;
167167+ self.explorer.list_state = ListState::default();
168168+ if !self.explorer.entries.is_empty() {
169169+ self.explorer.list_state.select(Some(0));
170170+ }
171171+172172+ self.screen = Screen::Explorer;
173173+ }
174174+}
175175+176176+fn main() -> anyhow::Result<()> {
177177+ let args = Args::parse();
178178+ let db = FjallDb::open(&args.db_path, None)?;
179179+ let mut app = App::new(db, args.db_path);
180180+181181+ enable_raw_mode()?;
182182+ let mut terminal = ratatui::init();
183183+ let result = run_app(&mut terminal, &mut app);
184184+ ratatui::restore();
185185+ disable_raw_mode()?;
186186+187187+ result
188188+}
189189+190190+fn run_app(terminal: &mut DefaultTerminal, app: &mut App) -> anyhow::Result<()> {
191191+ loop {
192192+ terminal.draw(|frame| match app.screen {
193193+ Screen::Overview => draw_overview(frame, app),
194194+ Screen::Explorer => draw_explorer(frame, app),
195195+ })?;
196196+197197+ if let Event::Key(key) = event::read()? {
198198+ if key.kind != KeyEventKind::Press {
199199+ continue;
200200+ }
201201+202202+ match app.screen {
203203+ Screen::Overview => match key.code {
204204+ KeyCode::Char('q') => return Ok(()),
205205+ KeyCode::Up => {
206206+ let i = app.overview_state.selected().unwrap_or(0);
207207+ if i > 0 {
208208+ app.overview_state.select(Some(i - 1));
209209+ }
210210+ }
211211+ KeyCode::Down => {
212212+ let i = app.overview_state.selected().unwrap_or(0);
213213+ if i + 1 < KEYSPACE_NAMES.len() {
214214+ app.overview_state.select(Some(i + 1));
215215+ }
216216+ }
217217+ KeyCode::Enter => {
218218+ app.enter_explorer();
219219+ }
220220+ KeyCode::Char('r') => {
221221+ app.refresh_overview();
222222+ }
223223+ _ => {}
224224+ },
225225+ Screen::Explorer => match key.code {
226226+ KeyCode::Esc => {
227227+ app.refresh_overview();
228228+ app.screen = Screen::Overview;
229229+ }
230230+ KeyCode::Up => {
231231+ let i = app.explorer.list_state.selected().unwrap_or(0);
232232+ if i > 0 {
233233+ app.explorer.list_state.select(Some(i - 1));
234234+ app.explorer.scroll_offset = 0;
235235+ }
236236+ }
237237+ KeyCode::Down => {
238238+ let i = app.explorer.list_state.selected().unwrap_or(0);
239239+ if i + 1 < app.explorer.entries.len() {
240240+ app.explorer.list_state.select(Some(i + 1));
241241+ app.explorer.scroll_offset = 0;
242242+ }
243243+ }
244244+ KeyCode::PageUp => {
245245+ let i = app.explorer.list_state.selected().unwrap_or(0);
246246+ let new_i = i.saturating_sub(20);
247247+ app.explorer.list_state.select(Some(new_i));
248248+ app.explorer.scroll_offset = 0;
249249+ }
250250+ KeyCode::PageDown => {
251251+ let i = app.explorer.list_state.selected().unwrap_or(0);
252252+ let max = app.explorer.entries.len().saturating_sub(1);
253253+ let new_i = (i + 20).min(max);
254254+ app.explorer.list_state.select(Some(new_i));
255255+ app.explorer.scroll_offset = 0;
256256+ }
257257+ KeyCode::Home => {
258258+ if !app.explorer.entries.is_empty() {
259259+ app.explorer.list_state.select(Some(0));
260260+ app.explorer.scroll_offset = 0;
261261+ }
262262+ }
263263+ KeyCode::End => {
264264+ if !app.explorer.entries.is_empty() {
265265+ app.explorer
266266+ .list_state
267267+ .select(Some(app.explorer.entries.len() - 1));
268268+ app.explorer.scroll_offset = 0;
269269+ }
270270+ }
271271+ KeyCode::Char('j') => {
272272+ app.explorer.scroll_offset = app.explorer.scroll_offset.saturating_add(1);
273273+ }
274274+ KeyCode::Char('k') => {
275275+ app.explorer.scroll_offset = app.explorer.scroll_offset.saturating_sub(1);
276276+ }
277277+ _ => {}
278278+ },
279279+ }
280280+ }
281281+ }
282282+}
283283+284284+fn draw_overview(frame: &mut ratatui::Frame, app: &mut App) {
285285+ let chunks = Layout::default()
286286+ .direction(Direction::Vertical)
287287+ .constraints([Constraint::Length(7), Constraint::Min(5)])
288288+ .split(frame.area());
289289+290290+ // Database info panel
291291+ let cursor = app
292292+ .db
293293+ .get_cursor()
294294+ .ok()
295295+ .flatten()
296296+ .map(|c| c.to_string())
297297+ .unwrap_or_else(|| "none".to_string());
298298+299299+ let event_seq = app.db.current_sequence().to_string();
300300+301301+ let total_keys: u64 = app.keyspace_stats.iter().map(|ks| ks.count).sum();
302302+ let any_inexact = app.keyspace_stats.iter().any(|ks| !ks.exact);
303303+ let keys_prefix = if any_inexact { ">" } else { "" };
304304+305305+ let info_text = Text::from(vec![
306306+ Line::from(vec![
307307+ Span::styled("Path: ", Style::default().fg(Color::DarkGray)),
308308+ Span::raw(app.db_path.display().to_string()),
309309+ ]),
310310+ Line::from(vec![
311311+ Span::styled("Disk: ", Style::default().fg(Color::DarkGray)),
312312+ Span::raw(format_bytes(app.disk_size)),
313313+ Span::raw(" "),
314314+ Span::styled("Keys: ", Style::default().fg(Color::DarkGray)),
315315+ Span::raw(format!("{keys_prefix}{}", format_count(total_keys))),
316316+ ]),
317317+ Line::from(vec![
318318+ Span::styled("Cursor: ", Style::default().fg(Color::DarkGray)),
319319+ Span::raw(cursor),
320320+ Span::raw(" "),
321321+ Span::styled("Event seq: ", Style::default().fg(Color::DarkGray)),
322322+ Span::raw(event_seq),
323323+ ]),
324324+ Line::from(""),
325325+ Line::from(Span::styled(
326326+ "[↑↓] navigate [Enter] explore [r] refresh [q] quit",
327327+ Style::default().fg(Color::DarkGray),
328328+ )),
329329+ ]);
330330+331331+ let info_block = Paragraph::new(info_text).block(
332332+ Block::default()
333333+ .borders(Borders::ALL)
334334+ .title(" Database Overview "),
335335+ );
336336+ frame.render_widget(info_block, chunks[0]);
337337+338338+ // Keyspace list
339339+ let items: Vec<ListItem> = app
340340+ .keyspace_stats
341341+ .iter()
342342+ .map(|ks| {
343343+ let count_str = format_count(ks.count);
344344+ let prefix = if ks.exact { "" } else { ">" };
345345+ ListItem::new(Line::from(vec![
346346+ Span::styled(
347347+ format!("{:<15}", ks.name),
348348+ Style::default().add_modifier(Modifier::BOLD),
349349+ ),
350350+ Span::styled(
351351+ format!("{prefix}{count_str:>12} keys"),
352352+ Style::default().fg(Color::Cyan),
353353+ ),
354354+ ]))
355355+ })
356356+ .collect();
357357+358358+ let list = List::new(items)
359359+ .block(Block::default().borders(Borders::ALL).title(" Keyspaces "))
360360+ .highlight_style(
361361+ Style::default()
362362+ .bg(Color::DarkGray)
363363+ .add_modifier(Modifier::BOLD),
364364+ )
365365+ .highlight_symbol("▸ ");
366366+367367+ frame.render_stateful_widget(list, chunks[1], &mut app.overview_state);
368368+}
369369+370370+fn draw_explorer(frame: &mut ratatui::Frame, app: &mut App) {
371371+ let chunks = Layout::default()
372372+ .direction(Direction::Horizontal)
373373+ .constraints([Constraint::Percentage(35), Constraint::Percentage(65)])
374374+ .split(frame.area());
375375+376376+ // Key list (left panel)
377377+ let items: Vec<ListItem> = app
378378+ .explorer
379379+ .entries
380380+ .iter()
381381+ .map(|(k, v)| {
382382+ let key_display = format_key_display(&app.explorer.keyspace_name, k);
383383+ let size_str = format_bytes_short(v.len() as u64);
384384+ ListItem::new(Line::from(vec![
385385+ Span::raw(key_display),
386386+ Span::styled(
387387+ format!(" ({size_str})"),
388388+ Style::default().fg(Color::DarkGray),
389389+ ),
390390+ ]))
391391+ })
392392+ .collect();
393393+394394+ let total = app.explorer.entries.len();
395395+ let selected = app.explorer.list_state.selected();
396396+ let key_title = format!(
397397+ " {} [{}/{total}] ",
398398+ app.explorer.keyspace_name,
399399+ selected.map_or(0, |i| (i + 1).min(total)) // 1-based; 0 when the keyspace is empty
400400+ );
401401+402402+ let list = List::new(items)
403403+ .block(
404404+ Block::default()
405405+ .borders(Borders::ALL)
406406+ .title(key_title)
407407+ .title_bottom(
408408+ Line::from(
409409+ " [Esc] back [↑↓ PgUp PgDn Home End] navigate [j/k] scroll value ",
410410+ )
411411+ .fg(Color::DarkGray),
412412+ ),
413413+ )
414414+ .highlight_style(
415415+ Style::default()
416416+ .bg(Color::DarkGray)
417417+ .add_modifier(Modifier::BOLD),
418418+ )
419419+ .highlight_symbol("▸ ");
420420+421421+ frame.render_stateful_widget(list, chunks[0], &mut app.explorer.list_state);
422422+423423+ // Value preview (right panel)
424424+ let value_text = if let (Some(key), Some(value)) =
425425+ (app.explorer.selected_key(), app.explorer.selected_value())
426426+ {
427427+ format_value_preview(&app.explorer.keyspace_name, key, value)
428428+ } else {
429429+ "No entry selected".to_string()
430430+ };
431431+432432+ let value_paragraph = Paragraph::new(value_text)
433433+ .block(
434434+ Block::default()
435435+ .borders(Borders::ALL)
436436+ .title(" Value Preview "),
437437+ )
438438+ .wrap(Wrap { trim: false })
439439+ .scroll((app.explorer.scroll_offset, 0));
440440+441441+ frame.render_widget(value_paragraph, chunks[1]);
442442+}
443443+444444+/// Format a key for display based on which keyspace it belongs to.
445445+fn format_key_display(keyspace: &str, key: &[u8]) -> String {
446446+ match keyspace {
447447+ "records" => {
448448+ if let Ok((did, coll, rkey, rev)) = ramjet::storage::keys::decode_record_key(key) {
449449+ return format!("{did} / {coll} / {rkey} @ {rev}");
450450+ }
451451+ }
452452+ "events" => {
453453+ if let Ok(seq) = ramjet::storage::keys::decode_event_key(key) {
454454+ return format!("seq:{seq}");
455455+ }
456456+ }
457457+ _ => {}
458458+ }
459459+460460+ // Fallback: try UTF-8, then hex
461461+ if let Ok(s) = std::str::from_utf8(key) {
462462+ if s.chars().all(|c| !c.is_control()) {
463463+ return s.to_string();
464464+ }
465465+ }
466466+ format_hex(key)
467467+}
468468+469469+/// Format a value for the preview pane with keyspace-aware decoding.
470470+fn format_value_preview(keyspace: &str, key: &[u8], value: &[u8]) -> String {
471471+ let mut out = String::new();
472472+473473+ // Key info header
474474+ out.push_str(&format!("Key: {}\n", format_key_display(keyspace, key)));
475475+ out.push_str(&format!(
476476+ "Size: {} ({} bytes)\n",
477477+ format_bytes(value.len() as u64),
478478+ value.len()
479479+ ));
480480+ out.push_str("───────────────────────────────────\n");
481481+482482+ // Keyspace-specific decoding
483483+ match keyspace {
484484+ "records" => {
485485+ if ramjet::types::RecordValue::is_tombstone(value) {
486486+ out.push_str("TOMBSTONE (deleted record)\n");
487487+ return out;
488488+ }
489489+ if let Ok(rv) = ramjet::types::RecordValue::decode(value) {
490490+ out.push_str(&format!("CID: {}\n", rv.cid_string()));
491491+ out.push_str(&format!("Data: {} bytes\n\n", rv.data.len()));
492492+ if !rv.data.is_empty() {
493493+ // Try DAG-CBOR → JSON
494494+ if let Ok(val) = atproto_dasl::from_slice::<serde_json::Value>(&rv.data) {
495495+ if let Ok(pretty) = serde_json::to_string_pretty(&val) {
496496+ out.push_str(&pretty);
497497+ return out;
498498+ }
499499+ }
500500+ // Try plain JSON
501501+ if let Ok(val) = serde_json::from_slice::<serde_json::Value>(&rv.data) {
502502+ if let Ok(pretty) = serde_json::to_string_pretty(&val) {
503503+ out.push_str(&pretty);
504504+ return out;
505505+ }
506506+ }
507507+ out.push_str(&format_hex_block(&rv.data));
508508+ }
509509+ return out;
510510+ }
511511+ }
512512+ "repo_state" => {
513513+ if let Ok(rs) = ramjet::types::RepoState::decode(value) {
514514+ out.push_str(&format!("Rev: {}\n", rs.rev));
515515+ out.push_str(&format!("Status: {}\n", rs.status.as_str()));
516516+ out.push_str(&format!("Denied: {}\n", rs.denied));
517517+ return out;
518518+ }
519519+ }
520520+ "events" => {
521521+ // JSON values
522522+ if let Ok(val) = serde_json::from_slice::<serde_json::Value>(value) {
523523+ if let Ok(pretty) = serde_json::to_string_pretty(&val) {
524524+ out.push_str(&pretty);
525525+ return out;
526526+ }
527527+ }
528528+ }
529529+ "did_to_doc" => {
530530+ // Timestamped DID document: [8B BE timestamp][JSON]
531531+ let (ts, json_bytes) = ramjet::storage::encoding::decode_timestamped_doc(value);
532532+ if ts > 0 {
533533+ out.push_str(&format!("updated_at: {ts} (unix secs)\n\n"));
534534+ }
535535+ if let Ok(val) = serde_json::from_slice::<serde_json::Value>(json_bytes) {
536536+ if let Ok(pretty) = serde_json::to_string_pretty(&val) {
537537+ out.push_str(&pretty);
538538+ return out;
539539+ }
540540+ }
541541+ }
542542+ "handle_to_did" => {
543543+ if let Ok(s) = std::str::from_utf8(value) {
544544+ out.push_str(s);
545545+ return out;
546546+ }
547547+ }
548548+ "meta" => {
549549+ // Try as u64 BE
550550+ if value.len() == 8 {
551551+ let val = u64::from_be_bytes(value.try_into().unwrap());
552552+ out.push_str(&format!("{val} (u64)\n"));
553553+ return out;
554554+ }
555555+ // Try as UTF-8
556556+ if let Ok(s) = std::str::from_utf8(value) {
557557+ if !s.is_empty() && s.chars().all(|c| !c.is_control()) {
558558+ out.push_str(s);
559559+ return out;
560560+ }
561561+ }
562562+ // Try as JSON
563563+ if let Ok(val) = serde_json::from_slice::<serde_json::Value>(value) {
564564+ if let Ok(pretty) = serde_json::to_string_pretty(&val) {
565565+ out.push_str(&pretty);
566566+ return out;
567567+ }
568568+ }
569569+ }
570570+ _ => {}
571571+ }
572572+573573+ // Fallback: hex dump
574574+ out.push_str(&format_hex_block(value));
575575+ out
576576+}
577577+578578+fn format_hex(bytes: &[u8]) -> String {
579579+ bytes.iter().map(|b| format!("{b:02x}")).collect()
580580+}
581581+582582+fn format_hex_block(bytes: &[u8]) -> String {
583583+ let mut out = String::new();
584584+ for (i, chunk) in bytes.chunks(16).enumerate() {
585585+ let offset = i * 16;
586586+ out.push_str(&format!("{offset:08x} "));
587587+ for (j, b) in chunk.iter().enumerate() {
588588+ out.push_str(&format!("{b:02x} "));
589589+ if j == 7 {
590590+ out.push(' ');
591591+ }
592592+ }
593593+ // Pad if short
594594+ let pad = 16 - chunk.len();
595595+ for _ in 0..pad {
596596+ out.push_str(" ");
597597+ }
598598+ if chunk.len() < 8 {
599599+ out.push(' ');
600600+ }
601601+ out.push_str(" |");
602602+ for b in chunk {
603603+ if b.is_ascii_graphic() || *b == b' ' {
604604+ out.push(*b as char);
605605+ } else {
606606+ out.push('.');
607607+ }
608608+ }
609609+ out.push_str("|\n");
610610+ }
611611+ out
612612+}
613613+614614+/// Returns (count, exact). If the keyspace has more than 1M keys,
615615+/// counting stops early and `exact` is false.
616616+fn count_keyspace(ks: &fjall::Keyspace) -> (u64, bool) {
617617+ let limit: u64 = 1_000_000;
618618+ let mut count: u64 = 0;
619619+ for guard in ks.iter() {
620620+ if guard.into_inner().is_ok() {
621621+ count += 1;
622622+ }
623623+ if count > limit {
624624+ return (limit, false);
625625+ }
626626+ }
627627+ (count, true)
628628+}
629629+630630+fn dir_size(path: &std::path::Path) -> u64 {
631631+ let Ok(entries) = std::fs::read_dir(path) else {
632632+ return 0;
633633+ };
634634+ let mut total: u64 = 0;
635635+ for entry in entries.flatten() {
636636+ let Ok(meta) = entry.metadata() else {
637637+ continue;
638638+ };
639639+ if meta.is_dir() {
640640+ total += dir_size(&entry.path());
641641+ } else {
642642+ total += meta.len();
643643+ }
644644+ }
645645+ total
646646+}
647647+648648+fn format_bytes(bytes: u64) -> String {
649649+ const KIB: u64 = 1024;
650650+ const MIB: u64 = 1024 * KIB;
651651+ const GIB: u64 = 1024 * MIB;
652652+653653+ if bytes >= GIB {
654654+ format!("{:.2} GiB", bytes as f64 / GIB as f64)
655655+ } else if bytes >= MIB {
656656+ format!("{:.2} MiB", bytes as f64 / MIB as f64)
657657+ } else if bytes >= KIB {
658658+ format!("{:.2} KiB", bytes as f64 / KIB as f64)
659659+ } else {
660660+ format!("{bytes} B")
661661+ }
662662+}
663663+664664+fn format_bytes_short(bytes: u64) -> String {
665665+ const KIB: u64 = 1024;
666666+ const MIB: u64 = 1024 * KIB;
667667+668668+ if bytes >= MIB {
669669+ format!("{:.1}M", bytes as f64 / MIB as f64)
670670+ } else if bytes >= KIB {
671671+ format!("{:.1}K", bytes as f64 / KIB as f64)
672672+ } else {
673673+ format!("{bytes}B")
674674+ }
675675+}
676676+677677+fn format_count(n: u64) -> String {
678678+ if n >= 1_000_000 {
679679+ format!("{:.2}M", n as f64 / 1_000_000.0)
680680+ } else if n >= 1_000 {
681681+ format!("{:.1}K", n as f64 / 1_000.0)
682682+ } else {
683683+ n.to_string()
684684+ }
685685+}
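The `meta` branch of `format_value_preview` tries several decodings in a fixed order. The sketch below is a std-only reduction of that chain, assuming a hypothetical function name `preview_meta`. It leaves out the JSON step so it needs no external crates, and it uses a flat hex string as the final fallback in place of the full hex dump.

```rust
/// Hypothetical, std-only reduction of the `meta` decoding order:
/// big-endian u64, then printable UTF-8, then hex.
fn preview_meta(value: &[u8]) -> String {
    // Exactly 8 bytes: interpret as a big-endian u64 first.
    if let Ok(bytes) = <[u8; 8]>::try_from(value) {
        return format!("{} (u64)", u64::from_be_bytes(bytes));
    }
    // Otherwise accept non-empty UTF-8 that contains no control characters.
    if let Ok(s) = std::str::from_utf8(value) {
        if !s.is_empty() && s.chars().all(|c| !c.is_control()) {
            return s.to_string();
        }
    }
    // Fallback: flat lowercase hex.
    value.iter().map(|b| format!("{b:02x}")).collect()
}

fn main() {
    println!("{}", preview_meta(&42u64.to_be_bytes()));
    println!("{}", preview_meta(b"hello"));
    println!("{}", preview_meta(&[0x00, 0xff]));
}
```

One consequence of this ordering is that any 8-byte value, including an 8-character ASCII string, is displayed as a number.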
+121
src/bin/ramjet_dictgen.rs
···11+//! Zstd dictionary training tool for the events keyspace.
22+//!
33+//! Samples compact binary events from a fjall database and trains a zstd
44+//! dictionary using the COVER algorithm. Prints compression statistics
55+//! comparing dictionary-based vs standard compression.
66+77+use std::path::PathBuf;
88+99+use clap::Parser;
1010+1111+use ramjet::storage::FjallDb;
1212+1313+#[derive(Parser)]
1414+#[command(
1515+ name = "ramjet-dictgen",
1616+ about = "Train a zstd dictionary from the events keyspace"
1717+)]
1818+struct Args {
1919+ /// Path to the fjall database directory.
2020+ #[arg(long)]
2121+ db_path: PathBuf,
2222+2323+ /// Output dictionary file path.
2424+ #[arg(long)]
2525+ output: PathBuf,
2626+2727+ /// Maximum number of events to sample.
2828+ #[arg(long, default_value = "50000")]
2929+ max_samples: usize,
3030+3131+ /// Dictionary size in bytes.
3232+ #[arg(long, default_value = "65536")]
3333+ dict_size: usize,
3434+}
3535+3636+fn main() -> anyhow::Result<()> {
3737+ let args = Args::parse();
3838+3939+ let db = FjallDb::open(&args.db_path, None)?;
4040+4141+ println!(
4242+ "Sampling up to {} events from {}...",
4343+ args.max_samples,
4444+ args.db_path.display()
4545+ );
4646+4747+ let mut samples: Vec<Vec<u8>> = Vec::with_capacity(args.max_samples);
4848+ for guard in db.events.iter() {
4949+ let Ok((_key, value)) = guard.into_inner() else {
5050+ continue;
5151+ };
5252+ let bytes: &[u8] = &value;
5353+ samples.push(bytes.to_vec());
5454+ if samples.len() >= args.max_samples {
5555+ break;
5656+ }
5757+ }
5858+5959+ if samples.is_empty() {
6060+ anyhow::bail!("no events found in the database");
6161+ }
6262+6363+ println!("Collected {} samples", samples.len());
6464+6565+ // Compute stats on raw samples
6666+ let total_raw: usize = samples.iter().map(|s| s.len()).sum();
6767+ let avg_raw = total_raw as f64 / samples.len() as f64;
6868+6969+ println!("Average sample size: {avg_raw:.1} bytes");
7070+ println!("Total raw size: {total_raw} bytes");
7171+ println!();
7272+7373+ // Train dictionary
7474+ println!(
7575+ "Training dictionary ({} bytes) with COVER algorithm...",
7676+ args.dict_size
7777+ );
7878+ let dict = zstd::dict::from_samples(&samples, args.dict_size)?;
7979+ println!("Dictionary trained: {} bytes", dict.len());
8080+8181+ // Compress with dictionary and measure
8282+ let mut total_compressed_dict: usize = 0;
8383+ let mut total_compressed_plain: usize = 0;
8484+8585+ let mut compressor_dict = zstd::bulk::Compressor::with_dictionary(3, &dict)?;
8686+ let mut compressor_plain = zstd::bulk::Compressor::new(3)?;
8787+8888+ for sample in &samples {
8989+ let compressed = compressor_dict.compress(sample)?;
9090+ total_compressed_dict += compressed.len();
9191+9292+ let compressed = compressor_plain.compress(sample)?;
9393+ total_compressed_plain += compressed.len();
9494+ }
9595+9696+ let ratio_dict = total_raw as f64 / total_compressed_dict as f64;
9797+ let ratio_plain = total_raw as f64 / total_compressed_plain as f64;
9898+9999+ println!();
100100+ println!("Compression Results ({} samples)", samples.len());
101101+ println!("-----------------------------------");
102102+ println!(
103103+ " Without dictionary: {} -> {} ({:.2}x)",
104104+ total_raw, total_compressed_plain, ratio_plain
105105+ );
106106+ println!(
107107+ " With dictionary: {} -> {} ({:.2}x)",
108108+ total_raw, total_compressed_dict, ratio_dict
109109+ );
110110+ println!(
111111+ " Dictionary gain: {:.2}x over plain zstd",
112112+ ratio_dict / ratio_plain
113113+ );
114114+ println!();
115115+116116+ // Write dictionary
117117+ std::fs::write(&args.output, &dict)?;
118118+ println!("Dictionary written to {}", args.output.display());
119119+120120+ Ok(())
121121+}
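The statistics printed by this tool come down to three ratios. The sketch below reproduces that arithmetic without zstd, assuming a hypothetical `compression_report` function and `CompressionReport` struct that are not part of this change. Unlike the tool, it returns `None` for a zero compressed size instead of dividing by zero.

```rust
/// Hypothetical container for the three figures the tool prints.
struct CompressionReport {
    ratio_plain: f64,
    ratio_dict: f64,
    gain: f64,
}

/// Compute compression ratios from byte totals; `None` if a compressed total is zero.
fn compression_report(total_raw: usize, plain: usize, dict: usize) -> Option<CompressionReport> {
    if plain == 0 || dict == 0 {
        return None;
    }
    let ratio_plain = total_raw as f64 / plain as f64;
    let ratio_dict = total_raw as f64 / dict as f64;
    Some(CompressionReport {
        ratio_plain,
        ratio_dict,
        // Equivalent to plain / dict, since total_raw cancels out.
        gain: ratio_dict / ratio_plain,
    })
}

fn main() {
    if let Some(r) = compression_report(1000, 500, 250) {
        println!(
            "plain {:.2}x, dict {:.2}x, gain {:.2}x",
            r.ratio_plain, r.ratio_dict, r.gain
        );
    }
}
```

Because the raw size cancels, the "dictionary gain" is simply the plain compressed size divided by the dictionary compressed size.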
+630
src/bin/ramjet_forecast.rs
···11+//! Storage forecast tool for capacity planning.
22+//!
33+//! Analyzes a sample database to compute ingestion rates and project
44+//! storage requirements across multiple time horizons.
55+66+use std::collections::HashMap;
77+use std::path::PathBuf;
88+99+use clap::Parser;
1010+1111+use ramjet::storage::FjallDb;
1212+use ramjet::storage::encoding::{self, decode_timestamped_doc};
1313+use ramjet::storage::keys;
1414+1515+#[derive(Parser)]
1616+#[command(name = "ramjet-forecast", about = "Storage capacity forecast tool")]
1717+struct Args {
1818+ /// Path to the fjall database directory.
1919+ #[arg(long, env = "RAMJET_DB_PATH", default_value = "./data/ramjet.db")]
2020+ db_path: PathBuf,
2121+2222+ /// Maximum number of keys to sample per keyspace (0 = unlimited).
2323+ #[arg(long, default_value = "0")]
2424+ sample_limit: u64,
2525+}
2626+2727+fn main() -> anyhow::Result<()> {
2828+ let args = Args::parse();
2929+3030+ let db = FjallDb::open(&args.db_path, None)?;
3131+ let disk_bytes = dir_size(&args.db_path);
3232+3333+ println!("Ramjet Storage Forecast");
3434+ println!("=======================");
3535+ println!("Database path: {}", args.db_path.display());
3636+ println!("Disk usage: {}", format_bytes(disk_bytes));
3737+ println!();
3838+3939+ // -- Events keyspace analysis (primary rate source) --
4040+ let event_stats = analyze_events(&db, args.sample_limit);
4141+ let sample_duration_secs = event_stats.duration_secs;
4242+4343+ if sample_duration_secs < 60.0 {
4444+ println!(
4545+ "WARNING: Sample duration is only {:.0}s. Run the service longer for accurate projections.",
4646+ sample_duration_secs
4747+ );
4848+ println!();
4949+ }
5050+5151+ println!("Sample Period");
5252+ println!("-------------");
5353+ println!(
5454+ "Duration: {:.1} hours ({:.0} seconds)",
5555+ sample_duration_secs / 3600.0,
5656+ sample_duration_secs
5757+ );
5858+ println!("Total events: {}", format_count(event_stats.total_events));
5959+ println!(
6060+ "Events/sec: {:.1}",
6161+ event_stats.total_events as f64 / sample_duration_secs
6262+ );
6363+ println!();
6464+6565+ // -- Event type breakdown --
6666+ println!("Event Breakdown");
6767+ println!("---------------");
6868+ for (event_type, count) in &event_stats.by_type {
6969+ let pct = *count as f64 / event_stats.total_events.max(1) as f64 * 100.0;
7070+ let rate = *count as f64 / sample_duration_secs;
7171+ println!(
7272+ " {:<12} {:>10} ({:>5.1}%) {:.1}/s",
7373+ event_type,
7474+ format_count(*count),
7575+ pct,
7676+ rate
7777+ );
7878+ }
7979+ println!();
8080+8181+ // -- Keyspace analysis --
8282+ let keyspace_stats = analyze_keyspaces(&db, args.sample_limit);
8383+8484+ println!("Keyspace Analysis");
8585+ println!("-----------------");
8686+ println!(
8787+ " {:<15} {:>12} {:>12} {:>12} {:>12}",
8888+ "Keyspace", "Keys", "Avg Key", "Avg Value", "Est. Size"
8989+ );
9090+ let mut total_estimated = 0u64;
9191+ for ks in &keyspace_stats {
9292+ let est_size = ks.count as u64 * (ks.avg_key_size + ks.avg_value_size);
9393+ total_estimated += est_size;
9494+ println!(
9595+ " {:<15} {:>12} {:>10} B {:>10} B {:>12}",
9696+ ks.name,
9797+ format_count(ks.count),
9898+ ks.avg_key_size,
9999+ ks.avg_value_size,
100100+ format_bytes(est_size)
101101+ );
102102+ }
103103+ println!(
104104+ " {:<15} {:>12} {:>10} {:>10} {:>12}",
105105+ "",
106106+ "",
107107+ "",
108108+ "",
109109+ format_bytes(total_estimated)
110110+ );
111111+ println!();
112112+113113+ // -- Collection breakdown in records keyspace --
114114+ let collection_stats = analyze_collections(&db, args.sample_limit);
115115+ if !collection_stats.is_empty() {
116116+ println!("Records by Collection");
117117+ println!("---------------------");
118118+ println!(
119119+ " {:<45} {:>10} {:>10} {:>12}",
120120+ "Collection", "Records", "Avg Size", "Est. Size"
121121+ );
122122+ let mut sorted: Vec<_> = collection_stats.iter().collect();
123123+ sorted.sort_by(|a, b| b.1.total_bytes.cmp(&a.1.total_bytes));
124124+ for (collection, stats) in &sorted {
125125+ let avg = if stats.count > 0 {
126126+ stats.total_bytes / stats.count as u64
127127+ } else {
128128+ 0
129129+ };
130130+ println!(
131131+ " {:<45} {:>10} {:>8} B {:>12}",
132132+ truncate(collection, 45),
133133+ format_count(stats.count),
134134+ avg,
135135+ format_bytes(stats.total_bytes)
136136+ );
137137+ }
138138+ println!();
139139+ }
140140+141141+ // -- Identity stats --
142142+ let identity_stats = analyze_identities(&db, args.sample_limit);
143143+ println!("Identity Cache");
144144+ println!("--------------");
145145+ println!(
146146+ " DID documents: {}",
147147+ format_count(identity_stats.doc_count)
148148+ );
149149+ println!(
150150+ " Handle mappings: {}",
151151+ format_count(identity_stats.handle_count)
152152+ );
153153+ println!(" Avg doc size: {} B", identity_stats.avg_doc_size);
154154+ println!(
155155+ " Unique repos: {}",
156156+ format_count(identity_stats.unique_repos)
157157+ );
158158+ println!();
159159+160160+ // -- Projections --
161161+ if sample_duration_secs < 1.0 {
162162+ println!("Insufficient data for projections (need at least 1 second of events).");
163163+ return Ok(());
164164+ }
165165+166166+ // Calculate growth rates from the sample
167167+ let events_per_sec = event_stats.total_events as f64 / sample_duration_secs;
168168+ let events_bytes_per_sec = keyspace_stats
169169+ .iter()
170170+ .find(|ks| ks.name == "events")
171171+ .map(|ks| {
172172+ let total = ks.count as f64 * (ks.avg_key_size + ks.avg_value_size) as f64;
173173+ total / sample_duration_secs
174174+ })
175175+ .unwrap_or(0.0);
176176+177177+ let records_per_sec = {
178178+ let rec_ks = keyspace_stats.iter().find(|ks| ks.name == "records");
179179+ match rec_ks {
180180+ Some(ks) => ks.count as f64 / sample_duration_secs,
181181+ None => 0.0,
182182+ }
183183+ };
184184+ let records_bytes_per_sec = keyspace_stats
185185+ .iter()
186186+ .find(|ks| ks.name == "records")
187187+ .map(|ks| {
188188+ let total = ks.count as f64 * (ks.avg_key_size + ks.avg_value_size) as f64;
189189+ total / sample_duration_secs
190190+ })
191191+ .unwrap_or(0.0);
192192+193193+ let identity_bytes_per_sec = {
194194+ let docs = keyspace_stats.iter().find(|ks| ks.name == "did_to_doc");
195195+ let handles = keyspace_stats.iter().find(|ks| ks.name == "handle_to_did");
196196+ let doc_total = docs
197197+ .map(|ks| ks.count as f64 * (ks.avg_key_size + ks.avg_value_size) as f64)
198198+ .unwrap_or(0.0);
199199+ let handle_total = handles
200200+ .map(|ks| ks.count as f64 * (ks.avg_key_size + ks.avg_value_size) as f64)
201201+ .unwrap_or(0.0);
202202+ (doc_total + handle_total) / sample_duration_secs
203203+ };
204204+205205+ // Total data growth rate
206206+ let total_bytes_per_sec = events_bytes_per_sec + records_bytes_per_sec + identity_bytes_per_sec;
207207+208208+ // Note: events keyspace has retention, so it eventually plateaus.
209209+ // Records and identity keyspaces grow unbounded (updates overwrite).
210210+211211+ println!("Growth Rates (from sample)");
212212+ println!("-------------------------");
213213+ println!(
214214+ " Events: {:.1}/s {}/s",
215215+ events_per_sec,
216216+ format_bytes(events_bytes_per_sec as u64)
217217+ );
218218+ println!(
219219+ " Records: {:.1}/s {}/s",
220220+ records_per_sec,
221221+ format_bytes(records_bytes_per_sec as u64)
222222+ );
223223+ println!(
224224+ " Identity: {}/s",
225225+ format_bytes(identity_bytes_per_sec as u64)
226226+ );
227227+ println!(
228228+ " Total: {}/s",
229229+ format_bytes(total_bytes_per_sec as u64)
230230+ );
231231+ println!();
232232+233233+ // Projection horizons
234234+ let horizons = [
235235+ ("1 day", 86400.0),
236236+ ("1 week", 86400.0 * 7.0),
237237+ ("30 days", 86400.0 * 30.0),
238238+ ("90 days", 86400.0 * 90.0),
239239+ ("1 year", 86400.0 * 365.0),
240240+ ];
241241+242242+ println!("Storage Projections");
243243+ println!("-------------------");
244244+ println!("Assumes constant ingestion rate from sample period.");
245245+ println!("Note: events keyspace has configurable retention (default 72h).");
246246+ println!(" Identity/repo_state keyspaces grow with unique repos, not linearly.");
247247+ println!();
248248+ println!(
249249+ " {:<10} {:>14} {:>14} {:>14} {:>14}",
250250+ "Horizon", "Events", "Records", "Identity", "Total"
251251+ );
252252+253253+ for (label, secs) in &horizons {
254254+ let events_size = events_bytes_per_sec * secs;
255255+ let records_size = records_bytes_per_sec * secs;
256256+ let identity_size = identity_bytes_per_sec * secs;
257257+ let total = events_size + records_size + identity_size;
258258+259259+ println!(
260260+ " {:<10} {:>14} {:>14} {:>14} {:>14}",
261261+ label,
262262+ format_bytes(events_size as u64),
263263+ format_bytes(records_size as u64),
264264+ format_bytes(identity_size as u64),
265265+ format_bytes(total as u64)
266266+ );
267267+ }
268268+ println!();
269269+270270+ // Event retention cap for events keyspace
271271+ println!("Events Keyspace with Retention Cap");
272272+ println!("----------------------------------");
273273+ let retention_hours = [24, 48, 72, 168, 720];
274274+ println!(" {:<16} {:>14}", "Retention", "Steady-State");
275275+ for hours in retention_hours {
276276+ let steady_state = events_bytes_per_sec * hours as f64 * 3600.0;
277277+ println!(
278278+ " {:<16} {:>14}",
279279+ format!("{hours}h"),
280280+ format_bytes(steady_state as u64)
281281+ );
282282+ }
283283+ println!();
284284+285285+ // Unique repo growth estimate
286286+ let repos_per_sec = identity_stats.unique_repos as f64 / sample_duration_secs;
287287+ println!("Unique Repo Growth");
288288+ println!("------------------");
289289+ println!(
290290+ " Current repos: {}",
291291+ format_count(identity_stats.unique_repos)
292292+ );
293293+ println!(" Growth rate: {:.1} repos/s", repos_per_sec);
294294+ for (label, secs) in &horizons {
295295+ let projected = identity_stats.unique_repos as f64 + repos_per_sec * secs;
296296+ println!(" {:<10} {}", label, format_count(projected as u64));
297297+ }
298298+ println!();
299299+300300+ // Per-collection projections for top collections
301301+ if !collection_stats.is_empty() {
302302+ println!("Per-Collection Storage Projections (30 days)");
303303+ println!("--------------------------------------------");
304304+ let thirty_days = 86400.0 * 30.0;
305305+ let mut sorted: Vec<_> = collection_stats.iter().collect();
306306+ sorted.sort_by(|a, b| b.1.total_bytes.cmp(&a.1.total_bytes));
307307+308308+ println!(
309309+ " {:<45} {:>12} {:>14}",
310310+ "Collection", "Records/day", "30d Storage"
311311+ );
312312+ for (collection, stats) in sorted.iter().take(20) {
313313+ let rate_per_day = stats.count as f64 / sample_duration_secs * 86400.0;
314314+ let bytes_30d = stats.total_bytes as f64 / sample_duration_secs * thirty_days;
315315+ println!(
316316+ " {:<45} {:>12} {:>14}",
317317+ truncate(collection, 45),
318318+ format_count(rate_per_day as u64),
319319+ format_bytes(bytes_30d as u64)
320320+ );
321321+ }
322322+ }
323323+324324+ Ok(())
325325+}
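
The projection table in `main` multiplies each byte rate by the full horizon, while the retention table computes a separate steady-state size for the events keyspace. A minimal sketch of how the two ideas could be combined into one helper is shown here, assuming a time-based retention window. `projected_bytes` and its parameters are hypothetical names for illustration and do not appear in the diff.

```rust
/// Project storage for one keyspace at a constant ingest rate.
///
/// `retention_secs` is `None` for keyspaces that are never trimmed and
/// `Some(window)` for keyspaces with time-based retention, where the size
/// stops growing once the horizon exceeds the window.
/// (Hypothetical helper, not part of the diff.)
fn projected_bytes(bytes_per_sec: f64, horizon_secs: f64, retention_secs: Option<f64>) -> u64 {
    let effective_secs = match retention_secs {
        // With retention, at most `window` seconds of data are kept.
        Some(window) => horizon_secs.min(window),
        // Without retention, the whole horizon accumulates.
        None => horizon_secs,
    };
    (bytes_per_sec * effective_secs) as u64
}
```

With a 72 h window, a 30-day projection for the events keyspace would plateau at the 72 h steady-state value instead of growing linearly.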
326326+327327+// -- Analysis types --
328328+329329+struct EventStats {
330330+ total_events: u64,
331331+ duration_secs: f64,
332332+ by_type: Vec<(String, u64)>,
333333+}
334334+335335+struct KeyspaceStats {
336336+ name: String,
337337+ count: u64,
338338+ avg_key_size: u64,
339339+ avg_value_size: u64,
340340+}
341341+342342+struct CollectionInfo {
343343+ count: u64,
344344+ total_bytes: u64,
345345+}
346346+347347+struct IdentityStats {
348348+ doc_count: u64,
349349+ handle_count: u64,
350350+ avg_doc_size: u64,
351351+ unique_repos: u64,
352352+}
353353+354354+// -- Analysis functions --
355355+356356+fn analyze_events(db: &FjallDb, sample_limit: u64) -> EventStats {
357357+ let mut total: u64 = 0;
358358+ let mut by_type: HashMap<String, u64> = HashMap::new();
359359+ let mut first_seq: Option<u64> = None;
360360+ let mut last_seq: Option<u64> = None;
361361+362362+ for guard in db.events.iter() {
363363+ let Ok((key, value)) = guard.into_inner() else {
364364+ continue;
365365+ };
366366+367367+ if let Ok(seq) = keys::decode_event_key(&key) {
368368+ if first_seq.is_none() {
369369+ first_seq = Some(seq);
370370+ }
371371+ last_seq = Some(seq);
372372+ }
373373+374374+ total += 1;
375375+376376+ let slice: &[u8] = &value;
377377+ if let Ok(event) = encoding::decode_compact_event(slice) {
378378+ let t = match event {
379379+ encoding::CompactEvent::Commit { .. } => "commit",
380380+ encoding::CompactEvent::CommitOp { .. } => "commit",
381381+ encoding::CompactEvent::Identity { .. } => "identity",
382382+ encoding::CompactEvent::Account { .. } => "account",
383383+ };
384384+ *by_type.entry(t.to_string()).or_default() += 1;
385385+ } else if slice.first() == Some(&b'{') {
386386+ // Legacy JSON events
387387+ if let Ok(event) = serde_json::from_slice::<serde_json::Value>(slice) {
388388+ let t = event
389389+ .get("t")
390390+ .and_then(|v| v.as_str())
391391+ .unwrap_or("unknown")
392392+ .to_string();
393393+ *by_type.entry(t).or_default() += 1;
394394+ }
395395+ }
396396+397397+ if sample_limit > 0 && total >= sample_limit {
398398+ break;
399399+ }
400400+ }

    // Estimate the sample duration from the span of sequence numbers.
    // Sequence numbers are assumed to be timestamp-like; the unit is guessed
    // from the size of the span, so this is a heuristic, not a measurement.
    let duration_secs = match (first_seq, last_seq) {
        (Some(first), Some(last)) if last > first => {
            let diff = last - first;
            if diff > 1_000_000_000 {
                // Span looks like microseconds (more than ~1000 s of data).
                diff as f64 / 1_000_000.0
            } else if diff > 1_000_000 {
                // Span is treated as milliseconds. A microsecond span shorter
                // than ~1000 s also lands here and is overestimated.
                diff as f64 / 1_000.0
            } else {
                // Span looks like a plain counter: fall back to the event
                // count and assume roughly 500 events/sec.
                total as f64 / 500.0
            }
        }
        _ => 0.0,
    };
422422+423423+ let mut sorted_types: Vec<_> = by_type.into_iter().collect();
424424+ sorted_types.sort_by(|a, b| b.1.cmp(&a.1));
425425+426426+ EventStats {
427427+ total_events: total,
428428+ duration_secs: duration_secs.max(1.0),
429429+ by_type: sorted_types,
430430+ }
431431+}
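
The duration estimate in `analyze_events` can be isolated as a pure function, which makes the thresholds easier to test. The following is a sketch under that assumption: `estimate_duration_secs` is a hypothetical name, and the thresholds are copied from the code above rather than taken from any specification.

```rust
/// Guess the sample duration in seconds from the first and last sequence
/// numbers, using the same thresholds as `analyze_events`.
/// (Hypothetical helper, not part of the diff.)
fn estimate_duration_secs(first: Option<u64>, last: Option<u64>, total_events: u64) -> f64 {
    match (first, last) {
        (Some(first), Some(last)) if last > first => {
            let diff = last - first;
            if diff > 1_000_000_000 {
                // Treated as a microsecond span.
                diff as f64 / 1_000_000.0
            } else if diff > 1_000_000 {
                // Treated as a millisecond span.
                diff as f64 / 1_000.0
            } else {
                // Treated as a plain counter; assume ~500 events/sec.
                total_events as f64 / 500.0
            }
        }
        // Missing or non-increasing sequence numbers give no estimate.
        _ => 0.0,
    }
}
```

One consequence of the thresholds is that a microsecond span shorter than about 1000 s falls into the millisecond branch, so short samples are overestimated by a factor of 1000.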
432432+433433+fn analyze_keyspaces(db: &FjallDb, sample_limit: u64) -> Vec<KeyspaceStats> {
434434+ let keyspaces: Vec<(&str, &fjall::Keyspace)> = vec![
435435+ ("records", &db.records),
436436+ ("events", &db.events),
437437+ ("meta", &db.meta),
438438+ ("repo_state", &db.repo_state),
439439+ ("did_to_doc", &db.did_to_doc),
440440+ ("handle_to_did", &db.handle_to_did),
441441+ ("blobs", &db.blobs),
442442+ ("blob_meta", &db.blob_meta),
443443+ ];
444444+445445+ keyspaces
446446+ .into_iter()
447447+ .map(|(name, ks)| {
448448+ let mut count: u64 = 0;
449449+ let mut total_key_bytes: u64 = 0;
450450+ let mut total_value_bytes: u64 = 0;
451451+452452+ for guard in ks.iter() {
453453+ let Ok((key, value)) = guard.into_inner() else {
454454+ continue;
455455+ };
456456+ count += 1;
457457+ total_key_bytes += key.len() as u64;
458458+ total_value_bytes += value.len() as u64;
459459+460460+ if sample_limit > 0 && count >= sample_limit {
461461+ break;
462462+ }
463463+ }
464464+465465+ KeyspaceStats {
466466+ name: name.to_string(),
467467+ count,
468468+ avg_key_size: if count > 0 {
469469+ total_key_bytes / count
470470+ } else {
471471+ 0
472472+ },
473473+ avg_value_size: if count > 0 {
474474+ total_value_bytes / count
475475+ } else {
476476+ 0
477477+ },
478478+ }
479479+ })
480480+ .collect()
481481+}
482482+483483+fn analyze_collections(db: &FjallDb, sample_limit: u64) -> HashMap<String, CollectionInfo> {
484484+ let mut collections: HashMap<String, CollectionInfo> = HashMap::new();
485485+ let mut count: u64 = 0;
486486+487487+ for guard in db.records.iter() {
488488+ let Ok((key, value)) = guard.into_inner() else {
489489+ continue;
490490+ };
491491+492492+ if let Ok((_, collection, _, _)) = keys::decode_record_key(&key) {
493493+ let entry = collections
494494+ .entry(collection.to_string())
495495+ .or_insert(CollectionInfo {
496496+ count: 0,
497497+ total_bytes: 0,
498498+ });
499499+ entry.count += 1;
500500+ entry.total_bytes += key.len() as u64 + value.len() as u64;
501501+ }
502502+503503+ count += 1;
504504+ if sample_limit > 0 && count >= sample_limit {
505505+ break;
506506+ }
507507+ }
508508+509509+ collections
510510+}
511511+512512+fn analyze_identities(db: &FjallDb, sample_limit: u64) -> IdentityStats {
513513+ let mut doc_count: u64 = 0;
514514+ let mut total_doc_bytes: u64 = 0;
515515+516516+ let mut count: u64 = 0;
517517+ for guard in db.did_to_doc.iter() {
518518+ let Ok((_key, value)) = guard.into_inner() else {
519519+ continue;
520520+ };
521521+ doc_count += 1;
522522+ let slice: &[u8] = &value;
523523+ let (_ts, json_bytes) = decode_timestamped_doc(slice);
524524+ total_doc_bytes += json_bytes.len() as u64;
525525+526526+ count += 1;
527527+ if sample_limit > 0 && count >= sample_limit {
528528+ break;
529529+ }
530530+ }
531531+532532+ let mut handle_count: u64 = 0;
533533+ count = 0;
534534+ for guard in db.handle_to_did.iter() {
535535+ let Ok(_) = guard.into_inner() else {
536536+ continue;
537537+ };
538538+ handle_count += 1;
539539+540540+ count += 1;
541541+ if sample_limit > 0 && count >= sample_limit {
542542+ break;
543543+ }
544544+ }
545545+546546+ // Count unique repos from repo_state
547547+ let mut unique_repos: u64 = 0;
548548+ count = 0;
549549+ for guard in db.repo_state.iter() {
550550+ let Ok(_) = guard.into_inner() else {
551551+ continue;
552552+ };
553553+ unique_repos += 1;
554554+555555+ count += 1;
556556+ if sample_limit > 0 && count >= sample_limit {
557557+ break;
558558+ }
559559+ }
560560+561561+ IdentityStats {
562562+ doc_count,
563563+ handle_count,
564564+ avg_doc_size: if doc_count > 0 {
565565+ total_doc_bytes / doc_count
566566+ } else {
567567+ 0
568568+ },
569569+ unique_repos,
570570+ }
571571+}
572572+573573+// -- Formatting helpers --
574574+575575+fn format_bytes(bytes: u64) -> String {
576576+ const KB: u64 = 1024;
577577+ const MB: u64 = 1024 * KB;
578578+ const GB: u64 = 1024 * MB;
579579+ const TB: u64 = 1024 * GB;
580580+581581+ if bytes >= TB {
582582+ format!("{:.2} TB", bytes as f64 / TB as f64)
583583+ } else if bytes >= GB {
584584+ format!("{:.2} GB", bytes as f64 / GB as f64)
585585+ } else if bytes >= MB {
586586+ format!("{:.2} MB", bytes as f64 / MB as f64)
587587+ } else if bytes >= KB {
588588+ format!("{:.2} KB", bytes as f64 / KB as f64)
589589+ } else {
590590+ format!("{bytes} B")
591591+ }
592592+}
593593+594594+fn format_count(n: u64) -> String {
595595+ if n >= 1_000_000_000 {
596596+ format!("{:.2}B", n as f64 / 1_000_000_000.0)
597597+ } else if n >= 1_000_000 {
598598+ format!("{:.2}M", n as f64 / 1_000_000.0)
599599+ } else if n >= 1_000 {
600600+ format!("{:.1}K", n as f64 / 1_000.0)
601601+ } else {
602602+ format!("{n}")
603603+ }
604604+}
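
fn truncate(s: &str, max: usize) -> String {
    if s.len() <= max {
        s.to_string()
    } else {
        // Leave room for "..." and back up to a char boundary so the slice
        // cannot panic on multi-byte UTF-8 or when max < 3.
        let mut end = max.saturating_sub(3);
        while !s.is_char_boundary(end) {
            end -= 1;
        }
        format!("{}...", &s[..end])
    }
}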

fn dir_size(path: &std::path::Path) -> u64 {
    // Recursively sum file sizes; unreadable directories and entries are skipped.
    let Ok(entries) = std::fs::read_dir(path) else {
        return 0;
    };
    let mut total: u64 = 0;
    for entry in entries.flatten() {
        let Ok(meta) = entry.metadata() else {
            continue;
        };
        if meta.is_dir() {
            total += dir_size(&entry.path());
        } else {
            total += meta.len();
        }
    }
    total
}
+721
src/bin/ramjet_writer.rs
···11+//! Benchmark binary: ingests from a relay and writes to fjall,
22+//! periodically logging ingestion rate, write rate, and disk usage
33+//! for pipeline analysis and capacity planning.
44+55+use std::collections::HashMap;
66+use std::path::PathBuf;
77+use std::sync::Arc;
88+use std::sync::atomic::{AtomicU64, Ordering};
99+use std::time::{Duration, Instant};
1010+1111+use clap::Parser;
1212+use tokio::sync::mpsc;
1313+use tokio_util::sync::CancellationToken;
1414+1515+use ramjet::config::{CollectionMatcher, ServiceConfig};
1616+use ramjet::pipeline::ingester::{IngestEvent, run_ingester};
1717+use ramjet::server::dictionary::compute_raw_cid;
1818+use ramjet::server::metrics::Metrics;
1919+use ramjet::storage::FjallDb;
2020+use ramjet::storage::keys;
2121+use ramjet::types::{AccountStatus, OpType, RecordValue};
2222+2323+#[derive(Parser, Debug)]
2424+#[command(
2525+ name = "ramjet-writer",
2626+ about = "Benchmark ingester→writer pipeline with periodic stats"
2727+)]
2828+struct Args {
2929+ /// Path to the fjall database directory.
3030+ #[arg(long, env = "RAMJET_DB_PATH", default_value = "./data/ramjet-bench.db")]
3131+ db_path: PathBuf,
3232+3333+ /// Upstream relay WebSocket host.
3434+ #[arg(long, env = "RAMJET_RELAY_HOST", default_value = "bsky.network")]
3535+ relay_host: String,
3636+3737+ /// Collection patterns to persist (space-separated).
3838+ #[arg(long, env = "RAMJET_TRACKED_COLLECTIONS", default_value = "*")]
3939+ tracked_collections: String,
4040+4141+ /// Maximum events per write batch.
4242+ #[arg(long, default_value = "500")]
4343+ batch_size: usize,
4444+4545+ /// Maximum wait time (ms) for batch fill before flushing.
4646+ #[arg(long, default_value = "100")]
4747+ batch_timeout_ms: u64,
4848+4949+ /// Stats reporting interval in seconds.
5050+ #[arg(long, default_value = "10")]
5151+ stats_interval: u64,
5252+5353+ /// Sample rate for printing events to stdout (0.0 = none, 1.0 = all).
5454+ #[arg(long)]
5555+ sample: Option<f64>,
5656+5757+ /// Path to a local zstd dictionary file.
5858+ #[arg(long, env = "RAMJET_ZSTD_DICT_PATH")]
5959+ zstd_dict_path: Option<PathBuf>,
6060+6161+ /// URL of the ramjet server to fetch the zstd dictionary from.
6262+ #[arg(long, env = "RAMJET_URL")]
6363+ ramjet_url: Option<String>,
6464+6565+ /// Disable zstd dictionary usage entirely.
6666+ #[arg(long, default_value = "false")]
6767+ no_zstd: bool,
6868+}
6969+7070+/// Counters shared between the writer loop and the stats reporter.
7171+struct Stats {
7272+ events_ingested: AtomicU64,
7373+ events_written: AtomicU64,
7474+ batches_committed: AtomicU64,
7575+ records_written: AtomicU64,
7676+ records_deleted: AtomicU64,
7777+ bytes_written: AtomicU64,
7878+}
7979+8080+impl Stats {
8181+ fn new() -> Self {
8282+ Self {
8383+ events_ingested: AtomicU64::new(0),
8484+ events_written: AtomicU64::new(0),
8585+ batches_committed: AtomicU64::new(0),
8686+ records_written: AtomicU64::new(0),
8787+ records_deleted: AtomicU64::new(0),
8888+ bytes_written: AtomicU64::new(0),
8989+ }
9090+ }
9191+}
9292+9393+#[tokio::main]
9494+async fn main() -> anyhow::Result<()> {
9595+ let args = Args::parse();
9696+9797+ tracing_subscriber::fmt()
9898+ .with_env_filter(
9999+ tracing_subscriber::EnvFilter::try_from_default_env()
100100+ .unwrap_or_else(|_| tracing_subscriber::EnvFilter::new("info")),
101101+ )
102102+ .init();
103103+104104+ tracing::info!(
105105+ db_path = %args.db_path.display(),
106106+ relay_host = %args.relay_host,
107107+ tracked = %args.tracked_collections,
108108+ batch_size = args.batch_size,
109109+ batch_timeout_ms = args.batch_timeout_ms,
110110+ stats_interval = args.stats_interval,
111111+ "starting ramjet-writer benchmark"
112112+ );
113113+114114+ let dict_path = resolve_dictionary(&args).await?;
115115+ let db = Arc::new(FjallDb::open(&args.db_path, dict_path.as_deref())?);
116116+ tracing::info!("fjall database opened");
117117+118118+ if let Ok(Some(cursor)) = db.get_cursor() {
119119+ tracing::info!(cursor, "resuming from stored cursor");
120120+ }
121121+122122+ let tracked = CollectionMatcher::new(&args.tracked_collections);
123123+ let cancel = CancellationToken::new();
124124+ let metrics = Arc::new(Metrics::new());
125125+ let stats = Arc::new(Stats::new());
126126+127127+ // Build a minimal ServiceConfig for the ingester
128128+ let config = Arc::new(ServiceConfig {
129129+ db_path: args.db_path.clone(),
130130+ relay_host: args.relay_host.clone(),
131131+ listen_addr: "127.0.0.1:0".parse().unwrap(),
132132+ tracked_collections: CollectionMatcher::new(&args.tracked_collections),
133133+ forward_collections: CollectionMatcher::new(""),
134134+ event_retention_hours: 0,
135135+ batch_size: args.batch_size,
136136+ batch_timeout_ms: args.batch_timeout_ms,
137137+ admin_dids: Default::default(),
138138+ zstd_dict_path: None,
139139+ backfill_dids: Vec::new(),
140140+ consumer_groups: Vec::new(),
141141+ });
142142+143143+ let (tx, rx) = mpsc::channel(4096);
144144+ // Identity channel (unused in benchmark, but required by ingester)
145145+ let (identity_tx, _identity_rx) = mpsc::channel(1024);
146146+147147+ // Spawn ingester
148148+ let ingester_handle = tokio::spawn({
149149+ let config = config.clone();
150150+ let db = db.clone();
151151+ let metrics = metrics.clone();
152152+ let cancel = cancel.clone();
153153+ async move {
154154+ if let Err(e) = run_ingester(config, db, tx, identity_tx, metrics, cancel).await {
155155+ tracing::error!(error = %e, "ingester failed");
156156+ }
157157+ }
158158+ });
159159+160160+ // Spawn stats reporter
161161+ let stats_handle = tokio::spawn({
162162+ let stats = stats.clone();
163163+ let db_path = args.db_path.clone();
164164+ let cancel = cancel.clone();
165165+ let interval = Duration::from_secs(args.stats_interval);
166166+ async move {
167167+ report_stats_loop(stats, db_path, interval, cancel).await;
168168+ }
169169+ });
170170+171171+ // Run writer in the main task
172172+ run_writer_loop(
173173+ db,
174174+ tracked,
175175+ rx,
176176+ stats,
177177+ config.batch_size,
178178+ config.batch_timeout_ms,
179179+ args.sample.unwrap_or(0.0),
180180+ cancel.clone(),
181181+ )
182182+ .await;
183183+184184+ cancel.cancel();
185185+ let _ = tokio::join!(ingester_handle, stats_handle);
186186+ tracing::info!("ramjet-writer benchmark shut down");
187187+ Ok(())
188188+}
189189+190190+/// Resolve the zstd dictionary to use, fetching from a ramjet server if configured.
191191+///
192192+/// Returns the path to the dictionary file, or `None` if no dictionary should be used.
193193+async fn resolve_dictionary(args: &Args) -> anyhow::Result<Option<PathBuf>> {
194194+ if args.no_zstd {
195195+ tracing::info!("zstd dictionary disabled via --no-zstd");
196196+ return Ok(None);
197197+ }
198198+199199+ // Compute the CID of the local dictionary if one was provided.
200200+ let local_dict = args
201201+ .zstd_dict_path
202202+ .as_ref()
203203+ .map(|p| std::fs::read(p))
204204+ .transpose()?;
205205+ let local_cid = local_dict.as_ref().map(|d| compute_raw_cid(d));
206206+207207+ let Some(ramjet_url) = &args.ramjet_url else {
208208+ // No server URL — just use the local dictionary if provided.
209209+ return Ok(args.zstd_dict_path.clone());
210210+ };
211211+212212+ let url = format!("{}/dictionary", ramjet_url.trim_end_matches('/'));
213213+ let client = reqwest::Client::new();
214214+ let mut request = client.get(&url);
215215+ if let Some(cid) = &local_cid {
216216+ request = request.header("If-None-Match", cid.as_str());
217217+ }
218218+219219+ let response = match request.send().await {
220220+ Ok(r) => r,
221221+ Err(e) => {
222222+ tracing::warn!(error = %e, "failed to fetch dictionary from server, using local");
223223+ return Ok(args.zstd_dict_path.clone());
224224+ }
225225+ };
226226+227227+ match response.status() {
228228+ reqwest::StatusCode::NOT_MODIFIED => {
229229+ tracing::info!("server dictionary matches local (CID unchanged)");
230230+ Ok(args.zstd_dict_path.clone())
231231+ }
232232+ reqwest::StatusCode::NOT_FOUND => {
233233+ tracing::info!("server has no zstd dictionary");
234234+ Ok(args.zstd_dict_path.clone())
235235+ }
236236+ reqwest::StatusCode::OK => {
237237+ let server_cid = response
238238+ .headers()
239239+ .get("etag")
240240+ .and_then(|v| v.to_str().ok())
241241+ .map(|s| s.to_string());
242242+243243+ if let (Some(local), Some(remote)) = (&local_cid, &server_cid) {
244244+ if local != remote {
245245+ tracing::warn!(
246246+ local_cid = %local,
247247+ server_cid = %remote,
248248+ "local zstd dictionary differs from server, using server's"
249249+ );
250250+ }
251251+ }
252252+253253+ let bytes = response.bytes().await?;
254254+ let dict_path = args.db_path.join("dictionary.zstd");
255255+ std::fs::create_dir_all(&args.db_path)?;
256256+ std::fs::write(&dict_path, &bytes)?;
257257+ tracing::info!(
258258+ dict_size = bytes.len(),
259259+ cid = server_cid.as_deref().unwrap_or("unknown"),
260260+ path = %dict_path.display(),
261261+ "fetched zstd dictionary from server"
262262+ );
263263+ Ok(Some(dict_path))
264264+ }
265265+ status => {
266266+ tracing::warn!(
267267+ %status,
268268+ "unexpected response from dictionary endpoint, using local"
269269+ );
270270+ Ok(args.zstd_dict_path.clone())
271271+ }
272272+ }
273273+}
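
The branching in `resolve_dictionary` amounts to a small decision table: the server's dictionary is used only on a `200 OK`, and every other outcome falls back to the local path. A sketch of that table as a pure function follows. `DictSource` and `choose_dictionary` are hypothetical names, and the HTTP status is passed as a plain `u16` so the example needs no HTTP client.

```rust
/// Which dictionary the writer ends up using. (Hypothetical type.)
#[derive(Debug, PartialEq)]
enum DictSource {
    /// `--no-zstd` was given.
    Disabled,
    /// Whatever `--zstd-dict-path` holds, which may be nothing.
    Local,
    /// The dictionary fetched from the server.
    Server,
}

/// Mirror of the decision made in `resolve_dictionary`.
/// `status` is `None` when the request itself failed.
fn choose_dictionary(no_zstd: bool, has_server_url: bool, status: Option<u16>) -> DictSource {
    if no_zstd {
        return DictSource::Disabled;
    }
    if !has_server_url {
        return DictSource::Local;
    }
    match status {
        Some(200) => DictSource::Server,
        // 304 (unchanged), 404 (server has none), any other status,
        // or a failed request all fall back to the local dictionary.
        _ => DictSource::Local,
    }
}
```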
274274+275275+async fn run_writer_loop(
276276+ db: Arc<FjallDb>,
277277+ tracked: CollectionMatcher,
278278+ mut rx: mpsc::Receiver<IngestEvent>,
279279+ stats: Arc<Stats>,
280280+ batch_size: usize,
281281+ batch_timeout_ms: u64,
282282+ sample_rate: f64,
283283+ cancel: CancellationToken,
284284+) {
285285+ let batch_timeout = Duration::from_millis(batch_timeout_ms);
286286+ use rand::RngExt;
287287+ let mut rng = rand::rng();
288288+289289+ loop {
290290+ let mut events = Vec::with_capacity(batch_size);
291291+292292+ tokio::select! {
293293+ biased;
294294+ _ = cancel.cancelled() => break,
295295+ event = rx.recv() => {
296296+ match event {
297297+ Some(e) => events.push(e),
298298+ None => break,
299299+ }
300300+ }
301301+ }
302302+303303+ let deadline = tokio::time::Instant::now() + batch_timeout;
304304+ while events.len() < batch_size {
305305+ tokio::select! {
306306+ biased;
307307+ _ = cancel.cancelled() => break,
308308+ _ = tokio::time::sleep_until(deadline) => break,
309309+ event = rx.recv() => {
310310+ match event {
311311+ Some(e) => events.push(e),
312312+ None => break,
313313+ }
314314+ }
315315+ }
316316+ }
317317+318318+ if events.is_empty() {
319319+ continue;
320320+ }
321321+322322+ if sample_rate > 0.0 {
323323+ for event in &events {
324324+ if sample_rate >= 1.0 || rng.random_range(0.0..1.0) < sample_rate {
325325+ print_event(event);
326326+ }
327327+ }
328328+ }
329329+330330+ stats
331331+ .events_ingested
332332+ .fetch_add(events.len() as u64, Ordering::Relaxed);
333333+334334+ match write_batch(&db, &tracked, &events, &stats) {
335335+ Ok(()) => {
336336+ stats.batches_committed.fetch_add(1, Ordering::Relaxed);
337337+ stats
338338+ .events_written
339339+ .fetch_add(events.len() as u64, Ordering::Relaxed);
340340+ }
341341+ Err(e) => {
342342+ tracing::error!(error = %e, "batch write failed");
343343+ }
344344+ }
345345+ }
346346+}
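
The batching rule in `run_writer_loop` is: wait for a first event, then keep collecting until either the batch is full or a deadline measured from that first event passes. The sketch below shows the same rule with the standard library's blocking channel instead of tokio, so it is an analogue of the loop and not the code from the diff. `collect_batch` is a hypothetical name, and cancellation is left out.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Collect up to `batch_size` items, flushing early once `timeout` has
/// elapsed since the first item arrived. (Hypothetical helper.)
fn collect_batch<T>(rx: &Receiver<T>, batch_size: usize, timeout: Duration) -> Vec<T> {
    let mut batch = Vec::with_capacity(batch_size);

    // Block until the first item arrives; a closed channel yields an empty batch.
    match rx.recv() {
        Ok(item) => batch.push(item),
        Err(_) => return batch,
    }

    // The deadline is fixed when the first item arrives, so a slow trickle
    // of items cannot keep the batch open indefinitely.
    let deadline = Instant::now() + timeout;
    while batch.len() < batch_size {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(item) => batch.push(item),
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    batch
}
```

Fixing the deadline at the first event bounds the added latency per batch to `timeout`, at the cost of smaller batches when traffic is light.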
347347+348348+fn write_batch(
349349+ db: &FjallDb,
350350+ tracked: &CollectionMatcher,
351351+ events: &[IngestEvent],
352352+ stats: &Stats,
353353+) -> anyhow::Result<()> {
354354+ let mut batch = db.batch();
355355+ let mut max_seq: u64 = 0;
356356+ let mut batch_bytes: u64 = 0;
357357+358358+ for event in events {
359359+ match event {
360360+ IngestEvent::Commit {
361361+ seq,
362362+ did,
363363+ rev,
364364+ ops,
365365+ blocks,
366366+ ..
367367+ } => {
368368+ max_seq = max_seq.max(*seq);
369369+370370+ let block_map = parse_blocks_car(blocks);
371371+ let mut commit_ops = Vec::new();
372372+373373+ for op in ops {
374374+ if !tracked.matches(&op.collection) {
375375+ continue;
376376+ }
377377+378378+ let record_key = keys::encode_record_key(did, &op.collection, &op.rkey, rev);
379379+380380+ match op.op {
381381+ OpType::Create | OpType::Update => {
382382+ let cid_bytes = op.cid.as_deref().unwrap_or_default().to_vec();
383383+ let data = block_map.get(&cid_bytes).cloned().unwrap_or_default();
384384+ let record_value = RecordValue {
385385+ cid: cid_bytes,
386386+ data,
387387+ };
388388+ let encoded = record_value.encode();
389389+ batch_bytes += (record_key.len() + encoded.len()) as u64;
390390+ batch.insert(&db.records, &record_key, encoded);
391391+ stats.records_written.fetch_add(1, Ordering::Relaxed);
392392+ }
393393+ OpType::Delete => {
394394+ // Tombstone: empty value
395395+ batch.insert(&db.records, &record_key, b"");
396396+ stats.records_deleted.fetch_add(1, Ordering::Relaxed);
397397+ }
398398+ }
399399+400400+ commit_ops.push(op.clone());
401401+ }
402402+403403+ let mut repo_state = match db.get_repo_state(did) {
404404+ Ok(Some(rs)) => rs,
405405+ Ok(None) => {
406406+ tracing::debug!(%did, "new repo (no prior state)");
407407+ Default::default()
408408+ }
409409+ Err(e) => {
410410+ tracing::warn!(%did, error = %e, "corrupt repo_state, resetting");
411411+ Default::default()
412412+ }
413413+ };
414414+ repo_state.rev = rev.clone();
415415+ let state_encoded = repo_state.encode();
416416+ batch_bytes += (did.len() + state_encoded.len()) as u64;
417417+ batch.insert(&db.repo_state, did.as_bytes(), state_encoded);
418418+ }
419419+420420+ IngestEvent::Identity { seq, did, handle } => {
421421+ max_seq = max_seq.max(*seq);
422422+ if let Some(h) = handle {
423423+ let key = h.to_lowercase();
424424+ batch_bytes += (key.len() + did.len()) as u64;
425425+ batch.insert(&db.handle_to_did, key.as_bytes(), did.as_bytes());
426426+ }
427427+ }
428428+429429+ IngestEvent::Account {
430430+ seq,
431431+ did,
432432+ active,
433433+ status,
434434+ ..
435435+ } => {
436436+ max_seq = max_seq.max(*seq);
437437+ let mut repo_state = match db.get_repo_state(did) {
438438+ Ok(Some(rs)) => rs,
439439+ Ok(None) => {
440440+ tracing::debug!(%did, "new account (no prior state)");
441441+ Default::default()
442442+ }
443443+ Err(e) => {
444444+ tracing::warn!(%did, error = %e, "corrupt repo_state, resetting");
445445+ Default::default()
446446+ }
447447+ };
448448+ repo_state.status = if *active {
449449+ AccountStatus::Active
450450+ } else {
451451+ match status.as_deref() {
452452+ Some("deactivated") => AccountStatus::Deactivated,
453453+ Some("suspended") => AccountStatus::Suspended,
454454+ Some("deleted") => AccountStatus::Deleted,
455455+ Some("takendown") => AccountStatus::Takendown,
456456+ _ => AccountStatus::Deactivated,
457457+ }
458458+ };
459459+ let state_encoded = repo_state.encode();
460460+ batch_bytes += (did.len() + state_encoded.len()) as u64;
461461+ batch.insert(&db.repo_state, did.as_bytes(), state_encoded);
462462+ }
463463+464464+ IngestEvent::Sync { seq, .. } => {
465465+ max_seq = max_seq.max(*seq);
466466+ }
467467+ }
468468+ }
469469+470470+ if max_seq > 0 {
471471+ batch.insert(&db.meta, b"cursor", &max_seq.to_be_bytes());
472472+ }
473473+474474+ batch.commit()?;
475475+ stats
476476+ .bytes_written
477477+ .fetch_add(batch_bytes, Ordering::Relaxed);
478478+ Ok(())
479479+}
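
`write_batch` stores the cursor as `max_seq.to_be_bytes()`. Big-endian encoding has the useful property that byte-wise comparison of the encoded values matches numeric comparison, which matters for any key or value compared lexicographically. The sketch below illustrates the round trip. `encode_cursor` and `decode_cursor` are hypothetical names, and how `FjallDb::get_cursor` actually decodes the value is not shown in the diff.

```rust
use std::convert::TryFrom;

/// Encode a sequence number as 8 big-endian bytes. (Hypothetical helper.)
fn encode_cursor(seq: u64) -> [u8; 8] {
    seq.to_be_bytes()
}

/// Decode a cursor, returning `None` if the slice is not exactly 8 bytes.
/// (Hypothetical helper.)
fn decode_cursor(bytes: &[u8]) -> Option<u64> {
    let arr = <[u8; 8]>::try_from(bytes).ok()?;
    Some(u64::from_be_bytes(arr))
}
```

Little-endian bytes would not sort numerically: 255 encodes with a leading `0xFF` byte and would compare greater than 256.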
480480+481481+async fn report_stats_loop(
482482+ stats: Arc<Stats>,
483483+ db_path: PathBuf,
484484+ interval: Duration,
485485+ cancel: CancellationToken,
486486+) {
487487+ let mut ticker = tokio::time::interval(interval);
488488+ ticker.tick().await; // skip first immediate tick
489489+490490+ let mut prev_ingested: u64 = 0;
491491+ let mut prev_written: u64 = 0;
492492+ let mut prev_batches: u64 = 0;
493493+ let mut prev_records: u64 = 0;
494494+ let mut prev_deleted: u64 = 0;
495495+ let mut prev_bytes: u64 = 0;
496496+ let mut prev_time = Instant::now();
497497+498498+ loop {
499499+ tokio::select! {
500500+ biased;
501501+ _ = cancel.cancelled() => break,
502502+ _ = ticker.tick() => {}
503503+ }
504504+505505+ let now = Instant::now();
506506+ let elapsed = now.duration_since(prev_time).as_secs_f64();
507507+ prev_time = now;
508508+509509+ let ingested = stats.events_ingested.load(Ordering::Relaxed);
510510+ let written = stats.events_written.load(Ordering::Relaxed);
511511+ let batches = stats.batches_committed.load(Ordering::Relaxed);
512512+ let records = stats.records_written.load(Ordering::Relaxed);
513513+ let deleted = stats.records_deleted.load(Ordering::Relaxed);
514514+ let bytes = stats.bytes_written.load(Ordering::Relaxed);
515515+516516+ let d_ingested = ingested - prev_ingested;
517517+ let d_written = written - prev_written;
518518+ let d_batches = batches - prev_batches;
519519+ let d_records = records - prev_records;
520520+ let d_deleted = deleted - prev_deleted;
521521+ let d_bytes = bytes - prev_bytes;
522522+523523+ prev_ingested = ingested;
524524+ prev_written = written;
525525+ prev_batches = batches;
526526+ prev_records = records;
527527+ prev_deleted = deleted;
528528+ prev_bytes = bytes;
529529+530530+ let ingest_rate = d_ingested as f64 / elapsed;
531531+ let write_rate = d_written as f64 / elapsed;
532532+ let record_rate = d_records as f64 / elapsed;
533533+ let byte_rate = d_bytes as f64 / elapsed;
534534+535535+ let disk_bytes = dir_size(&db_path);
536536+537537+ tracing::info!(
538538+ events_per_sec = format!("{ingest_rate:.0}"),
539539+ writes_per_sec = format!("{write_rate:.0}"),
540540+ records_per_sec = format!("{record_rate:.0}"),
541541+ batches = d_batches,
542542+ records_written = d_records,
543543+ records_deleted = d_deleted,
544544+ write_bytes = format_bytes(d_bytes),
545545+ write_throughput = format!("{}/s", format_bytes(byte_rate as u64)),
546546+ total_events = ingested,
547547+ total_records = records,
548548+ total_bytes_written = format_bytes(bytes),
549549+ disk_usage = format_bytes(disk_bytes),
550550+ "stats"
551551+ );
552552+ }
553553+}
554554+555555+fn dir_size(path: &std::path::Path) -> u64 {
556556+ walkdir(path)
557557+}
558558+559559+fn walkdir(path: &std::path::Path) -> u64 {
560560+ let Ok(entries) = std::fs::read_dir(path) else {
561561+ return 0;
562562+ };
563563+ let mut total: u64 = 0;
564564+ for entry in entries.flatten() {
565565+ let Ok(meta) = entry.metadata() else {
566566+ continue;
567567+ };
568568+ if meta.is_dir() {
569569+ total += walkdir(&entry.path());
570570+ } else {
571571+ total += meta.len();
572572+ }
573573+ }
574574+ total
575575+}
576576+577577+fn print_event(event: &IngestEvent) {
578578+ match event {
579579+ IngestEvent::Commit {
580580+ seq,
581581+ did,
582582+ rev,
583583+ ops,
584584+ blocks,
585585+ ..
586586+ } => {
587587+ let block_map = parse_blocks_car(blocks);
588588+ let mut json_ops = Vec::new();
589589+ for op in ops {
590590+ let cid_str = op
591591+ .cid
592592+ .as_ref()
593593+ .map(|c| {
594594+ cid::Cid::read_bytes(c.as_slice())
595595+ .map(|c| c.to_string())
596596+ .unwrap_or_else(|_| format!("{}B", c.len()))
597597+ })
598598+ .unwrap_or_default();
599599+ let mut op_json = serde_json::json!({
600600+ "op": format!("{:?}", op.op),
601601+ "collection": op.collection,
602602+ "rkey": op.rkey,
603603+ "cid": cid_str,
604604+ });
605605+ if let Some(cid_bytes) = &op.cid {
606606+ if let Some(data) = block_map.get(cid_bytes.as_slice()) {
607607+ if let Ok(ipld) =
608608+ atproto_dasl::drisl::from_slice::<atproto_dasl::Ipld>(data)
609609+ {
610610+ op_json["record"] = ipld_to_json(&ipld);
611611+ } else {
612612+ op_json["record_bytes"] = serde_json::Value::Number(data.len().into());
613613+ }
614614+ }
615615+ }
616616+ json_ops.push(op_json);
617617+ }
618618+ let commit = serde_json::json!({
619619+ "t": "#commit",
620620+ "seq": seq,
621621+ "did": did,
622622+ "rev": rev,
623623+ "blocks_bytes": blocks.len(),
624624+ "ops": json_ops,
625625+ });
626626+ println!(
627627+ "COMMIT {}",
628628+ serde_json::to_string_pretty(&commit).unwrap_or_default()
629629+ );
630630+ }
631631+ IngestEvent::Identity { seq, did, handle } => {
632632+ println!(
633633+ "IDENTITY seq={seq} did={did} handle={}",
634634+ handle.as_deref().unwrap_or("-")
635635+ );
636636+ }
637637+ IngestEvent::Account {
638638+ seq,
639639+ did,
640640+ active,
641641+ status,
642642+ ..
643643+ } => {
644644+ println!(
645645+ "ACCOUNT seq={seq} did={did} active={active} status={}",
646646+ status.as_deref().unwrap_or("-")
647647+ );
648648+ }
649649+ IngestEvent::Sync { seq, did, rev } => {
650650+ println!("SYNC seq={seq} did={did} rev={rev}");
651651+ }
652652+ }
653653+}
654654+655655+fn ipld_to_json(ipld: &atproto_dasl::Ipld) -> serde_json::Value {
656656+ use atproto_dasl::Ipld;
657657+ match ipld {
658658+ Ipld::Null => serde_json::Value::Null,
659659+ Ipld::Bool(b) => serde_json::Value::Bool(*b),
660660+ Ipld::Integer(i) => serde_json::json!(*i),
661661+ Ipld::Float(f) => serde_json::json!(*f),
662662+ Ipld::String(s) => serde_json::Value::String(s.clone()),
663663+ Ipld::Bytes(b) => {
664664+ use base64::Engine;
665665+ serde_json::json!({ "$bytes": base64::prelude::BASE64_STANDARD.encode(b) })
666666+ }
667667+ Ipld::List(list) => serde_json::Value::Array(list.iter().map(ipld_to_json).collect()),
668668+ Ipld::Map(map) => {
669669+ let obj: serde_json::Map<String, serde_json::Value> = map
670670+ .iter()
671671+ .map(|(k, v)| (k.clone(), ipld_to_json(v)))
672672+ .collect();
673673+ serde_json::Value::Object(obj)
674674+ }
675675+ Ipld::Link(cid) => {
676676+ serde_json::json!({ "$link": cid.to_string() })
677677+ }
678678+ }
679679+}
680680+681681+fn parse_blocks_car(blocks: &[u8]) -> HashMap<Vec<u8>, Vec<u8>> {
682682+ let mut map = HashMap::new();
683683+ if blocks.is_empty() {
684684+ return map;
685685+ }
686686+687687+ let mut cursor = std::io::Cursor::new(blocks);
688688+689689+ if atproto_dasl::CarHeader::read_from(&mut cursor).is_err() {
690690+ return map;
691691+ }
692692+693693+ loop {
694694+ match atproto_dasl::CarBlock::read_from(&mut cursor) {
695695+ Ok(Some(block)) => {
696696+ let cid_bytes = block.cid.to_bytes();
697697+ map.insert(cid_bytes, block.data);
698698+ }
699699+ Ok(None) => break,
700700+ Err(_) => break,
701701+ }
702702+ }
703703+704704+ map
705705+}
706706+707707+fn format_bytes(bytes: u64) -> String {
708708+ const KIB: u64 = 1024;
709709+ const MIB: u64 = 1024 * KIB;
710710+ const GIB: u64 = 1024 * MIB;
711711+712712+ if bytes >= GIB {
713713+ format!("{:.2} GiB", bytes as f64 / GIB as f64)
714714+ } else if bytes >= MIB {
715715+ format!("{:.2} MiB", bytes as f64 / MIB as f64)
716716+ } else if bytes >= KIB {
717717+ format!("{:.2} KiB", bytes as f64 / KIB as f64)
718718+ } else {
719719+ format!("{bytes} B")
720720+ }
721721+}
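As a worked example of the throughput arithmetic in `report_stats_loop`, the sketch below reproduces `format_bytes` from the diff and adds a hypothetical helper, `throughput_label`, that divides a byte delta by the elapsed seconds and formats the result the way the `write_throughput` field does. The helper name and the sample numbers are assumptions for illustration only; they are not part of the change.

```rust
/// Copy of `format_bytes` from the diff: binary (IEC) units, two decimals.
fn format_bytes(bytes: u64) -> String {
    const KIB: u64 = 1024;
    const MIB: u64 = 1024 * KIB;
    const GIB: u64 = 1024 * MIB;

    if bytes >= GIB {
        format!("{:.2} GiB", bytes as f64 / GIB as f64)
    } else if bytes >= MIB {
        format!("{:.2} MiB", bytes as f64 / MIB as f64)
    } else if bytes >= KIB {
        format!("{:.2} KiB", bytes as f64 / KIB as f64)
    } else {
        format!("{bytes} B")
    }
}

/// Hypothetical helper (not in the diff): bytes written since the last
/// report, divided by the elapsed seconds, rendered as "<size>/s".
fn throughput_label(delta_bytes: u64, elapsed_secs: f64) -> String {
    let byte_rate = delta_bytes as f64 / elapsed_secs;
    // `as u64` truncates toward zero, matching the cast in `report_stats_loop`.
    format!("{}/s", format_bytes(byte_rate as u64))
}
```

With these definitions, `throughput_label(3 * 1024 * 1024, 2.0)` yields `1.50 MiB/s` and `format_bytes(512)` yields `512 B`.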
···11+//! Configuration types for the Ramjet service.
22+//!
33+//! Includes CLI argument parsing via clap, collection pattern matching
44+//! for routing firehose events, and service-level configuration.
55+66+use std::collections::HashSet;
77+use std::net::SocketAddr;
88+use std::path::PathBuf;
99+1010+use clap::Parser;
1111+1212+/// Collection pattern for matching ATProtocol NSIDs.
1313+///
1414+/// Supports exact match, match-all (`*`), single-segment wildcard (`.*`), and
1515+/// multi-segment glob wildcard (`.**`) patterns; an empty pattern matches nothing.
1616+#[derive(Debug, Clone)]
1717+pub enum CollectionPattern {
1818+ /// Empty pattern — matches nothing.
1919+ None,
2020+ /// Wildcard `*` — matches every collection.
2121+ MatchAll,
2222+ /// Exact NSID match (e.g., `com.example`).
2323+ Exact(String),
2424+ /// Single-segment wildcard (e.g., `com.example.*` matches `com.example.foo`).
2525+ SingleWild(String),
2626+ /// Glob wildcard (e.g., `com.example.**` matches `com.example.foo.bar`).
2727+ GlobWild(String),
2828+}
2929+3030+impl CollectionPattern {
3131+ /// Parse a pattern string into a `CollectionPattern`.
3232+ pub fn parse(pattern: &str) -> Self {
3333+ let trimmed = pattern.trim();
3434+ if trimmed.is_empty() {
3535+ CollectionPattern::None
3636+ } else if trimmed == "*" {
3737+ CollectionPattern::MatchAll
3838+ } else if let Some(prefix) = trimmed.strip_suffix(".**") {
3939+ CollectionPattern::GlobWild(format!("{}.", prefix))
4040+ } else if let Some(prefix) = trimmed.strip_suffix(".*") {
4141+ CollectionPattern::SingleWild(format!("{}.", prefix))
4242+ } else {
4343+ CollectionPattern::Exact(trimmed.to_string())
4444+ }
4545+ }
4646+4747+ /// Test whether a collection NSID matches this pattern.
4848+ pub fn matches(&self, collection: &str) -> bool {
4949+ match self {
5050+ CollectionPattern::None => false,
5151+ CollectionPattern::MatchAll => true,
5252+ CollectionPattern::Exact(expected) => collection == expected,
5353+ CollectionPattern::SingleWild(prefix) => {
5454+ collection.starts_with(prefix.as_str())
5555+ && !collection[prefix.len()..].contains('.')
5656+ && collection.len() > prefix.len()
5757+ }
5858+ CollectionPattern::GlobWild(prefix) => {
5959+ collection.starts_with(prefix.as_str()) && collection.len() > prefix.len()
6060+ }
6161+ }
6262+ }
6363+}
6464+6565+/// A set of collection patterns evaluated with OR semantics.
6666+#[derive(Debug, Clone)]
6767+pub struct CollectionMatcher {
6868+ patterns: Vec<CollectionPattern>,
6969+}
7070+7171+impl CollectionMatcher {
7272+ /// Create a matcher from a whitespace-separated pattern string.
7373+ ///
7474+ /// - `"*"` matches everything
7575+ /// - `""` matches nothing
7676+ /// - `"app.bsky.** community.lexicon.**"` matches multiple prefixes
7777+ pub fn new(pattern_str: &str) -> Self {
7878+ let trimmed = pattern_str.trim();
7979+ if trimmed.is_empty() {
8080+ return Self {
8181+ patterns: vec![CollectionPattern::None],
8282+ };
8383+ }
8484+ Self {
8585+ patterns: trimmed
8686+ .split_whitespace()
8787+ .map(CollectionPattern::parse)
8888+ .collect(),
8989+ }
9090+ }
9191+9292+ /// Test whether a collection NSID matches any pattern in this matcher.
9393+ pub fn matches(&self, collection: &str) -> bool {
9494+ self.patterns.iter().any(|p| p.matches(collection))
9595+ }
9696+}
9797+9898+/// Ramjet service configuration, parsed from CLI arguments and environment variables.
9999+#[derive(Parser, Debug)]
100100+#[command(
101101+ name = "ramjet",
102102+ version,
103103+ about = "ATProtocol event-stream, record, and blob service"
104104+)]
105105+pub struct CliArgs {
106106+ /// Path to the fjall database directory.
107107+ #[arg(long, env = "RAMJET_DB_PATH", default_value = "./data/ramjet.db")]
108108+ pub db_path: PathBuf,
109109+110110+ /// Upstream relay WebSocket host.
111111+ #[arg(long, env = "RAMJET_RELAY_HOST", default_value = "bsky.network")]
112112+ pub relay_host: String,
113113+114114+ /// HTTP bind address.
115115+ #[arg(long, env = "RAMJET_LISTEN_ADDR", default_value = "0.0.0.0:8080")]
116116+ pub listen_addr: SocketAddr,
117117+118118+ /// Space-separated collection patterns to persist (e.g., `"garden.lexicon.** com.atproto.lexicon.**"`).
119119+ #[arg(long, env = "RAMJET_TRACKED_COLLECTIONS", default_value = "*")]
120120+ pub tracked_collections: String,
121121+122122+ /// Space-separated collection patterns to forward as low-priority events.
123123+ /// `"*"` forwards everything not already tracked (default).
124124+ /// `""` forwards nothing.
125125+ #[arg(long, env = "RAMJET_FORWARD_COLLECTIONS", default_value = "*")]
126126+ pub forward_collections: String,
127127+128128+ /// Event retention window in hours.
129129+ #[arg(long, env = "RAMJET_EVENT_RETENTION_HOURS", default_value = "72")]
130130+ pub event_retention_hours: u64,
131131+132132+ /// Maximum events per WriteBatch.
133133+ #[arg(long, env = "RAMJET_BATCH_SIZE", default_value = "500")]
134134+ pub batch_size: usize,
135135+136136+ /// Maximum wait time (ms) for batch fill before flushing.
137137+ #[arg(long, env = "RAMJET_BATCH_TIMEOUT_MS", default_value = "100")]
138138+ pub batch_timeout_ms: u64,
139139+140140+ /// Comma-separated list of admin DIDs.
141141+ #[arg(long, env = "ADMIN_DIDS", default_value = "")]
142142+ pub admin_dids: String,
143143+144144+ /// Path to a zstd dictionary file for event compression.
145145+ /// When absent, events are stored uncompressed.
146146+ #[arg(long, env = "RAMJET_ZSTD_DICT_PATH")]
147147+ pub zstd_dict_path: Option<PathBuf>,
148148+149149+ /// Comma-separated list of DIDs to backfill on startup (skips already-backfilled repos).
150150+ #[arg(long, env = "RAMJET_BACKFILL", default_value = "")]
151151+ pub backfill: String,
152152+153153+ /// Consumer group definitions (repeatable or comma-separated). Format: `name:partition_count`.
154154+ /// Example: `--consumer-group indexers:3 --consumer-group notifiers:2`
155155+ #[arg(long, env = "RAMJET_CONSUMER_GROUPS", value_delimiter = ',')]
156156+ pub consumer_group: Vec<String>,
157157+}
158158+159159+/// A pre-defined consumer group with a fixed number of partitions.
160160+#[derive(Debug, Clone)]
161161+pub struct ConsumerGroup {
162162+ /// Group name (e.g., "indexers").
163163+ pub name: String,
164164+ /// Number of partitions. Must be >= 1.
165165+ pub partition_count: u16,
166166+}
167167+168168+impl ConsumerGroup {
169169+ /// Parse a `name:count` string into a `ConsumerGroup`.
170170+ pub fn parse(s: &str) -> Option<Self> {
171171+ let (name, count_str) = s.split_once(':')?;
172172+ let name = name.trim();
173173+ let count: u16 = count_str.trim().parse().ok()?;
174174+ if name.is_empty() || count == 0 {
175175+ return None;
176176+ }
177177+ Some(Self {
178178+ name: name.to_string(),
179179+ partition_count: count,
180180+ })
181181+ }
182182+}
183183+184184+/// Resolved service configuration with parsed matchers and sets.
185185+pub struct ServiceConfig {
186186+ /// Path to the fjall database directory.
187187+ pub db_path: PathBuf,
188188+ /// Upstream relay WebSocket host.
189189+ pub relay_host: String,
190190+ /// HTTP bind address.
191191+ pub listen_addr: SocketAddr,
192192+ /// Matcher for collections to persist.
193193+ pub tracked_collections: CollectionMatcher,
194194+ /// Matcher for collections to forward as low-priority.
195195+ pub forward_collections: CollectionMatcher,
196196+ /// Event retention window in hours.
197197+ pub event_retention_hours: u64,
198198+ /// Maximum events per WriteBatch.
199199+ pub batch_size: usize,
200200+ /// Maximum wait time (ms) for batch fill.
201201+ pub batch_timeout_ms: u64,
202202+ /// Set of admin DIDs.
203203+ pub admin_dids: HashSet<String>,
204204+ /// Optional path to a zstd dictionary file for event compression.
205205+ pub zstd_dict_path: Option<PathBuf>,
206206+ /// DIDs to backfill on startup.
207207+ pub backfill_dids: Vec<String>,
208208+ /// Pre-defined consumer groups.
209209+ pub consumer_groups: Vec<ConsumerGroup>,
210210+}
211211+212212+impl From<CliArgs> for ServiceConfig {
213213+ fn from(args: CliArgs) -> Self {
214214+ let admin_dids: HashSet<String> = args
215215+ .admin_dids
216216+ .split(',')
217217+ .map(|s| s.trim().to_string())
218218+ .filter(|s| !s.is_empty())
219219+ .collect();
220220+221221+ let backfill_dids: Vec<String> = args
222222+ .backfill
223223+ .split(',')
224224+ .map(|s| s.trim().to_string())
225225+ .filter(|s| !s.is_empty())
226226+ .collect();
227227+228228+ let consumer_groups: Vec<ConsumerGroup> = args
229229+ .consumer_group
230230+ .iter()
231231+ .filter_map(|s| {
232232+ let parsed = ConsumerGroup::parse(s);
233233+ if parsed.is_none() {
234234+ tracing::warn!(value = %s, "ignoring invalid --consumer-group value, expected name:count");
235235+ }
236236+ parsed
237237+ })
238238+ .collect();
239239+240240+ Self {
241241+ db_path: args.db_path,
242242+ relay_host: args.relay_host,
243243+ listen_addr: args.listen_addr,
244244+ tracked_collections: CollectionMatcher::new(&args.tracked_collections),
245245+ forward_collections: CollectionMatcher::new(&args.forward_collections),
246246+ event_retention_hours: args.event_retention_hours,
247247+ batch_size: args.batch_size,
248248+ batch_timeout_ms: args.batch_timeout_ms,
249249+ admin_dids,
250250+ zstd_dict_path: args.zstd_dict_path,
251251+ backfill_dids,
252252+ consumer_groups,
253253+ }
254254+ }
255255+}
256256+257257+#[cfg(test)]
258258+mod tests {
259259+ use super::*;
260260+261261+ #[test]
262262+ fn test_exact_match() {
263263+ let p = CollectionPattern::parse("com.example");
264264+ assert!(p.matches("com.example"));
265265+ assert!(!p.matches("com.example.foo"));
266266+ assert!(!p.matches("com"));
267267+ }
268268+269269+ #[test]
270270+ fn test_single_wildcard() {
271271+ let p = CollectionPattern::parse("com.example.*");
272272+ assert!(p.matches("com.example.foo"));
273273+ assert!(p.matches("com.example.bar"));
274274+ assert!(!p.matches("com.example"));
275275+ assert!(!p.matches("com.example.foo.baz"));
276276+ assert!(!p.matches("com"));
277277+ }
278278+279279+ #[test]
280280+ fn test_glob_wildcard() {
281281+ let p = CollectionPattern::parse("com.example.**");
282282+ assert!(p.matches("com.example.foo"));
283283+ assert!(p.matches("com.example.bar"));
284284+ assert!(p.matches("com.example.foo.bar"));
285285+ assert!(p.matches("com.example.bar.baz.qux"));
286286+ assert!(!p.matches("com.example"));
287287+ assert!(!p.matches("com"));
288288+ }
289289+290290+ #[test]
291291+ fn test_match_all() {
292292+ let p = CollectionPattern::parse("*");
293293+ assert!(p.matches("anything"));
294294+ assert!(p.matches("com.example.foo"));
295295+ }
296296+297297+ #[test]
298298+ fn test_none() {
299299+ let p = CollectionPattern::parse("");
300300+ assert!(!p.matches("anything"));
301301+ }
302302+303303+ #[test]
304304+ fn test_matcher_multiple_patterns() {
305305+ let m = CollectionMatcher::new("garden.lexicon.** com.atproto.lexicon.**");
306306+ assert!(m.matches("garden.lexicon.foo"));
307307+ assert!(m.matches("garden.lexicon.foo.bar"));
308308+ assert!(m.matches("com.atproto.lexicon.def"));
309309+ assert!(!m.matches("app.bsky.feed.post"));
310310+ assert!(!m.matches("garden.lexicon"));
311311+ }
312312+313313+ #[test]
314314+ fn test_matcher_star() {
315315+ let m = CollectionMatcher::new("*");
316316+ assert!(m.matches("anything"));
317317+ }
318318+319319+ #[test]
320320+ fn test_matcher_empty() {
321321+ let m = CollectionMatcher::new("");
322322+ assert!(!m.matches("anything"));
323323+ }
324324+325325+ #[test]
326326+ fn test_admin_dids_parsing() {
327327+ let args = CliArgs {
328328+ db_path: PathBuf::from("/tmp"),
329329+ relay_host: "bsky.network".to_string(),
330330+ listen_addr: "0.0.0.0:8080".parse().unwrap(),
331331+ tracked_collections: "*".to_string(),
332332+ forward_collections: "*".to_string(),
333333+ event_retention_hours: 72,
334334+ batch_size: 500,
335335+ batch_timeout_ms: 100,
336336+ admin_dids: "did:plc:abc, did:plc:def".to_string(),
337337+ zstd_dict_path: None,
338338+ backfill: String::new(),
339339+ consumer_group: Vec::new(),
340340+ };
341341+ let config = ServiceConfig::from(args);
342342+ assert!(config.admin_dids.contains("did:plc:abc"));
343343+ assert!(config.admin_dids.contains("did:plc:def"));
344344+ assert_eq!(config.admin_dids.len(), 2);
345345+ }
346346+}
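The test module above covers pattern matching but not the `name:count` format accepted by `--consumer-group`. The sketch below copies `ConsumerGroup::parse` from the diff into a standalone form so its edge cases can be exercised; the `PartialEq` derive and the sample inputs are additions for illustration, not part of the change.

```rust
/// Copy of `ConsumerGroup` from the diff, with `PartialEq` added so the
/// results can be compared directly (an addition for this sketch only).
#[derive(Debug, Clone, PartialEq)]
pub struct ConsumerGroup {
    pub name: String,
    pub partition_count: u16,
}

impl ConsumerGroup {
    /// Parse a `name:count` string; returns `None` on any malformed input.
    pub fn parse(s: &str) -> Option<Self> {
        // Split on the first ':' only; everything after it must be the count.
        let (name, count_str) = s.split_once(':')?;
        let name = name.trim();
        // Non-numeric, negative, or > u16::MAX counts fail here.
        let count: u16 = count_str.trim().parse().ok()?;
        if name.is_empty() || count == 0 {
            return None;
        }
        Some(Self {
            name: name.to_string(),
            partition_count: count,
        })
    }
}
```

Under this logic `indexers:3` and ` notifiers : 2 ` parse successfully, while `indexers`, `:3`, `indexers:0`, `indexers:70000`, and `a:b:3` are all rejected, which is what triggers the "ignoring invalid --consumer-group value" warning in `ServiceConfig::from`.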
+59
src/errors.rs
···11+//! Error types for the Ramjet service.
22+//!
33+//! All errors follow the convention: `error-ramjet-{domain}-{number} {message}: {details}`
44+55+/// Top-level error type for Ramjet operations.
66+#[derive(Debug, thiserror::Error)]
77+pub enum RamjetError {
88+ /// Fjall storage engine error.
99+ #[error("error-ramjet-storage-1 fjall operation failed: {0}")]
1010+ Fjall(#[from] fjall::Error),
1111+1212+ /// Key encoding or decoding error.
1313+ #[error("error-ramjet-storage-2 key encoding failed: {reason}")]
1414+ KeyEncoding {
1515+ /// Description of the encoding failure.
1616+ reason: String,
1717+ },
1818+1919+ /// Value encoding or decoding error.
2020+ #[error("error-ramjet-storage-3 value decoding failed: {reason}")]
2121+ ValueDecoding {
2222+ /// Description of the decoding failure.
2323+ reason: String,
2424+ },
2525+2626+ /// Invalid configuration.
2727+ #[error("error-ramjet-config-1 invalid configuration: {reason}")]
2828+ Config {
2929+ /// Description of the configuration error.
3030+ reason: String,
3131+ },
3232+3333+ /// HTTP server error.
3434+ #[error("error-ramjet-server-1 server error: {reason}")]
3535+ Server {
3636+ /// Description of the server error.
3737+ reason: String,
3838+ },
3939+4040+ /// Pipeline processing error.
4141+ #[error("error-ramjet-pipeline-1 pipeline error: {reason}")]
4242+ Pipeline {
4343+ /// Description of the pipeline error.
4444+ reason: String,
4545+ },
4646+4747+ /// I/O error.
4848+ #[error("error-ramjet-io-1 I/O error: {0}")]
4949+ Io(#[from] std::io::Error),
5050+5151+ /// Failed to load zstd dictionary file.
5252+ #[error("error-ramjet-storage-4 failed to load zstd dictionary at {path}: {source}")]
5353+ DictLoad {
5454+ /// Path to the dictionary file.
5555+ path: std::path::PathBuf,
5656+ /// Underlying I/O error.
5757+ source: std::io::Error,
5858+ },
5959+}
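To make the `error-ramjet-{domain}-{number} {message}: {details}` convention concrete, here is a std-only sketch that writes out by hand roughly what the `thiserror` attributes above express. `DemoError` and `error_code` are hypothetical names used only for this illustration; the real `RamjetError` relies on the derive macro rather than manual impls.

```rust
use std::fmt;

/// Hypothetical two-variant error used only to illustrate the convention.
#[derive(Debug)]
pub enum DemoError {
    Config { reason: String },
    Io(std::io::Error),
}

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DemoError::Config { reason } => {
                write!(f, "error-ramjet-config-1 invalid configuration: {reason}")
            }
            DemoError::Io(err) => write!(f, "error-ramjet-io-1 I/O error: {err}"),
        }
    }
}

impl std::error::Error for DemoError {
    /// Expose the wrapped I/O error as the source, where there is one.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            DemoError::Config { .. } => None,
            DemoError::Io(err) => Some(err),
        }
    }
}

/// Hand-written conversion so that `?` turns an `io::Error` into `DemoError`.
impl From<std::io::Error> for DemoError {
    fn from(err: std::io::Error) -> Self {
        DemoError::Io(err)
    }
}

/// Hypothetical helper: pull the stable code (the first whitespace-delimited
/// token) out of a rendered message, e.g. for log filtering.
pub fn error_code(message: &str) -> Option<&str> {
    message
        .split_whitespace()
        .next()
        .filter(|code| code.starts_with("error-ramjet-"))
}
```

Because the code is always the first token, a rendered message such as `error-ramjet-config-1 invalid configuration: batch_size must be > 0` can be matched on `error-ramjet-config-1` without parsing the free-form details.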
+13
src/lib.rs
···11+//! Ramjet: Rust-native ATProtocol event-stream, record, and blob service.
22+//!
33+//! Built on fjall v3 (pure-Rust LSM-tree) for storage, axum for HTTP/WS,
44+//! and the `atproto-crates` ecosystem for protocol types and identity.
55+66+#![forbid(unsafe_code)]
77+88+pub mod config;
99+pub mod errors;
1010+pub mod pipeline;
1111+pub mod server;
1212+pub mod storage;
1313+pub mod types;
+237
src/main.rs
···11+//! Ramjet service entry point.
22+//!
33+//! Initializes configuration, storage, metrics, and the HTTP server,
44+//! then spawns the async pipeline tasks.
55+66+use std::sync::Arc;
77+88+use atproto_identity::resolve::{
99+ HickoryDnsResolver, InnerIdentityResolver, SharedIdentityResolver,
1010+};
1111+use atproto_identity::traits::IdentityResolver;
1212+use clap::Parser;
1313+use tokio::net::TcpListener;
1414+use tokio::sync::{Semaphore, mpsc};
1515+use tokio_util::sync::CancellationToken;
1616+1717+use ramjet::config::{CliArgs, ServiceConfig};
1818+use ramjet::pipeline::backfill::run_backfill_worker;
1919+use ramjet::pipeline::fanout::FanOutChannels;
2020+use ramjet::pipeline::identity::run_identity_worker;
2121+use ramjet::pipeline::ingester::run_ingester;
2222+use ramjet::pipeline::writer::run_writer;
2323+use ramjet::server::metrics::Metrics;
2424+use ramjet::server::{AppState, build_router};
2525+use ramjet::storage::FjallDb;
2626+2727+#[tokio::main]
2828+async fn main() -> anyhow::Result<()> {
2929+ // Parse CLI arguments and build config
3030+ let args = CliArgs::parse();
3131+ let config = ServiceConfig::from(args);
3232+3333+ // Initialize tracing
3434+ tracing_subscriber::fmt()
3535+ .with_env_filter(
3636+ tracing_subscriber::EnvFilter::try_from_default_env()
3737+ .unwrap_or_else(|_| tracing_subscriber::EnvFilter::new("info")),
3838+ )
3939+ .init();
4040+4141+ tracing::info!(
4242+ db_path = %config.db_path.display(),
4343+ listen_addr = %config.listen_addr,
4444+ relay_host = %config.relay_host,
4545+ "starting ramjet"
4646+ );
4747+4848+ // Open storage
4949+ let db = Arc::new(FjallDb::open(
5050+ &config.db_path,
5151+ config.zstd_dict_path.as_deref(),
5252+ )?);
5353+ tracing::info!("fjall database opened");
5454+5555+ if let Ok(Some(cursor)) = db.get_cursor() {
5656+ tracing::info!(cursor, "resuming from stored cursor");
5757+ } else {
5858+ tracing::info!("no stored cursor, will start from live head");
5959+ }
6060+6161+ // Queue any DIDs specified via --backfill that haven't been backfilled yet.
6262+ for did in &config.backfill_dids {
6363+ let already_backfilled = db
6464+ .get_repo_state(did)
6565+ .ok()
6666+ .flatten()
6767+ .is_some_and(|rs| rs.backfilled);
6868+ if already_backfilled {
6969+ tracing::info!(%did, "skipping --backfill DID, already backfilled");
7070+ } else {
7171+ let queue_key = format!("backfill_queue\x00{did}");
7272+ db.meta.insert(queue_key.as_bytes(), b"")?;
7373+ tracing::info!(%did, "queued --backfill DID for backfill");
7474+ }
7575+ }
7676+7777+ // Create shared state
7878+ let config = Arc::new(config);
7979+ let fanout = Arc::new(FanOutChannels::with_consumer_groups(
8080+ 8192,
8181+ &config.consumer_groups,
8282+ ));
8383+ for group in &config.consumer_groups {
8484+ tracing::info!(
8585+ group = %group.name,
8686+ partitions = group.partition_count,
8787+ "registered consumer group"
8888+ );
8989+ }
9090+ let metrics = Arc::new(Metrics::new());
9191+ let cancel = CancellationToken::new();
9292+9393+ // Build identity resolver
9494+ let dns_resolver = HickoryDnsResolver::create_resolver(&[]);
9595+ let http_client = reqwest::Client::new();
9696+ let identity_resolver: Arc<dyn IdentityResolver> =
9797+ Arc::new(SharedIdentityResolver(Arc::new(InnerIdentityResolver {
9898+ dns_resolver: Arc::new(dns_resolver),
9999+ http_client,
100100+ plc_hostname: "plc.directory".to_string(),
101101+ })));
102102+103103+ let state = AppState {
104104+ db: db.clone(),
105105+ config: config.clone(),
106106+ metrics: metrics.clone(),
107107+ fanout: fanout.clone(),
108108+ resolver: identity_resolver.clone(),
109109+ };
110110+111111+ // Ingester → Writer channel
112112+ let (ingest_tx, ingest_rx) = mpsc::channel(81920);
113113+114114+ // Identity event channel
115115+ let (identity_tx, identity_rx) = mpsc::channel(40960);
116116+117117+ // Task monitor for tokio-metrics instrumentation
118118+ let task_monitor = metrics.tokio_metrics.monitor().clone();
119119+120120+ // Spawn tokio-metrics collector (updates prometheus gauges every 10s)
121121+ let tokio_metrics_handle = tokio::spawn({
122122+ let tokio_metrics = metrics.tokio_metrics.clone();
123123+ let cancel = cancel.clone();
124124+ async move {
125125+ tokio_metrics
126126+ .run_collector(std::time::Duration::from_secs(10), cancel)
127127+ .await;
128128+ }
129129+ });
130130+131131+ // Spawn pipeline tasks (all instrumented with tokio-metrics)
132132+ let ingester_handle = tokio::spawn(task_monitor.instrument({
133133+ let config = config.clone();
134134+ let db = db.clone();
135135+ let identity_tx = identity_tx.clone();
136136+ let metrics = metrics.clone();
137137+ let cancel = cancel.clone();
138138+ async move {
139139+ if let Err(e) = run_ingester(config, db, ingest_tx, identity_tx, metrics, cancel).await
140140+ {
141141+ tracing::error!(error = %e, "ingester failed");
142142+ }
143143+ }
144144+ }));
145145+146146+ let writer_handle = tokio::spawn(task_monitor.instrument({
147147+ let config = config.clone();
148148+ let db = db.clone();
149149+ let fanout = fanout.clone();
150150+ let metrics = metrics.clone();
151151+ let cancel = cancel.clone();
152152+ async move {
153153+ if let Err(e) = run_writer(config, db, ingest_rx, fanout, metrics, cancel).await {
154154+ tracing::error!(error = %e, "writer failed");
155155+ }
156156+ }
157157+ }));
158158+159159+ let identity_handle = tokio::spawn(task_monitor.instrument({
160160+ let db = db.clone();
161161+ let cancel = cancel.clone();
162162+ let semaphore = Arc::new(Semaphore::new(10));
163163+ let resolver = identity_resolver.clone();
164164+ let metrics = metrics.clone();
165165+ async move {
166166+ if let Err(e) =
167167+ run_identity_worker(db, resolver, semaphore, metrics, identity_rx, cancel).await
168168+ {
169169+ tracing::error!(error = %e, "identity worker failed");
170170+ }
171171+ }
172172+ }));
173173+174174+ let backfill_handle = tokio::spawn(task_monitor.instrument({
175175+ let db = db.clone();
176176+ let config = config.clone();
177177+ let fanout = fanout.clone();
178178+ let resolver = identity_resolver.clone();
179179+ let metrics = metrics.clone();
180180+ let cancel = cancel.clone();
181181+ async move {
182182+ if let Err(e) = run_backfill_worker(db, config, fanout, resolver, metrics, cancel).await
183183+ {
184184+ tracing::error!(error = %e, "backfill worker failed");
185185+ }
186186+ }
187187+ }));
188188+189189+ // Build router and bind
190190+ let router = build_router(state);
191191+ let listener = TcpListener::bind(config.listen_addr).await?;
192192+ tracing::info!(addr = %config.listen_addr, "HTTP server listening");
193193+194194+ // Serve with graceful shutdown
195195+ axum::serve(listener, router)
196196+ .with_graceful_shutdown(shutdown_signal(cancel.clone()))
197197+ .await?;
198198+199199+ // Cancel all pipeline tasks
200200+ cancel.cancel();
201201+ let _ = tokio::join!(
202202+ ingester_handle,
203203+ writer_handle,
204204+ identity_handle,
205205+ backfill_handle,
206206+ tokio_metrics_handle,
207207+ );
208208+209209+ tracing::info!("ramjet shut down");
210210+ Ok(())
211211+}
212212+213213+async fn shutdown_signal(cancel: CancellationToken) {
214214+ let ctrl_c = async {
215215+ tokio::signal::ctrl_c()
216216+ .await
217217+ .expect("failed to install Ctrl+C handler");
218218+ };
219219+220220+ #[cfg(unix)]
221221+ let terminate = async {
222222+ tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate())
223223+ .expect("failed to install SIGTERM handler")
224224+ .recv()
225225+ .await;
226226+ };
227227+228228+ #[cfg(not(unix))]
229229+ let terminate = std::future::pending::<()>();
230230+231231+ tokio::select! {
232232+ () = ctrl_c => tracing::info!("received Ctrl+C"),
233233+ () = terminate => tracing::info!("received SIGTERM"),
234234+ }
235235+236236+ cancel.cancel();
237237+}
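The `--backfill` step in `main` queues a DID by writing a `backfill_queue\x00{did}` key into the `meta` keyspace, and the backfill worker later recovers the DID by stripping that prefix. A minimal std-only sketch of the round trip follows; `queue_key` and `did_from_key` are hypothetical helper names, since the diff builds and strips the key inline.

```rust
/// Prefix used for backfill queue entries in the `meta` keyspace (from the diff).
const BACKFILL_QUEUE_PREFIX: &[u8] = b"backfill_queue\x00";

/// Hypothetical helper: build the key that `main` writes for a queued DID.
fn queue_key(did: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(BACKFILL_QUEUE_PREFIX.len() + did.len());
    key.extend_from_slice(BACKFILL_QUEUE_PREFIX);
    key.extend_from_slice(did.as_bytes());
    key
}

/// Hypothetical helper: recover the DID from a key, as the worker's prefix
/// scan does. Returns `None` for keys outside the queue prefix or for
/// suffixes that are not valid UTF-8.
fn did_from_key(key: &[u8]) -> Option<&str> {
    let did_bytes = key.strip_prefix(BACKFILL_QUEUE_PREFIX)?;
    std::str::from_utf8(did_bytes).ok()
}
```

For example, `did_from_key(&queue_key("did:plc:abc"))` returns `Some("did:plc:abc")`, while an unrelated `meta` key such as `b"cursor"` returns `None`. Centralising the two operations in helpers like these would keep the prefix in one place, rather than repeating the `format!` call at each site.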
+411
src/pipeline/backfill.rs
···11+//! Backfill worker pipeline.
22+//!
33+//! Fetches full repository data from PDS endpoints for repos that need
44+//! resynchronization. Downloads are buffered via [`SpillableBuffer`] (spills
55+//! to disk when the response exceeds 10 MiB) and parsed with
66+//! [`DiskRepository`] so that large repos never blow up memory.
77+88+use std::collections::HashMap;
99+use std::sync::Arc;
1010+use std::time::{Instant, SystemTime, UNIX_EPOCH};
1111+1212+use atproto_dasl::Ipld;
1313+use atproto_identity::traits::IdentityResolver;
1414+use atproto_repo::SpillableBuffer;
1515+use bytes::Bytes;
1616+use tokio_util::sync::CancellationToken;
1717+1818+use crate::config::ServiceConfig;
1919+use crate::pipeline::fanout::FanOutChannels;
2020+use crate::server::metrics::{KeyspaceOpLabels, Metrics, StorageOp};
2121+use crate::server::reconciliation;
2222+use crate::storage::FjallDb;
2323+use crate::storage::encoding::{
2424+ self, CompactOp, decode_timestamped_doc, encode_timestamped_doc, ipld_map, to_dag_cbor_bytes,
2525+};
2626+use crate::storage::keys;
2727+use crate::types::{EventType, FanOutEvent, OpType, RecordValue};
2828+2929+const BACKFILL_QUEUE_PREFIX: &[u8] = b"backfill_queue\x00";
3030+const POLL_INTERVAL_SECS: u64 = 5;
3131+const MAX_BACKFILL_FAILURES: u32 = 5;
3232+3333+/// Run the backfill worker task.
3434+pub async fn run_backfill_worker(
3535+ db: Arc<FjallDb>,
3636+ config: Arc<ServiceConfig>,
3737+ fanout: Arc<FanOutChannels>,
3838+ resolver: Arc<dyn IdentityResolver>,
3939+ metrics: Arc<Metrics>,
4040+ cancel: CancellationToken,
4141+) -> anyhow::Result<()> {
4242+ tracing::info!("backfill worker started");
4343+4444+ let client = reqwest::Client::builder()
4545+ .timeout(std::time::Duration::from_secs(60))
4646+ .build()?;
4747+4848+ let mut failure_counts: HashMap<String, u32> = HashMap::new();
4949+5050+ loop {
5151+ tokio::select! {
5252+ biased;
5353+ _ = cancel.cancelled() => break,
5454+ _ = tokio::time::sleep(std::time::Duration::from_secs(POLL_INTERVAL_SECS)) => {}
5555+ }
5656+5757+ let mut queue_entries = Vec::new();
5858+ for guard in db.meta.prefix(BACKFILL_QUEUE_PREFIX) {
5959+ let Ok((key, _value)) = guard.into_inner() else {
6060+ continue;
6161+ };
6262+ let key_bytes: &[u8] = &key;
6363+ if let Some(did_bytes) = key_bytes.strip_prefix(BACKFILL_QUEUE_PREFIX) {
6464+ if let Ok(did) = std::str::from_utf8(did_bytes) {
6565+ queue_entries.push(did.to_string());
6666+ }
6767+ }
6868+ }
6969+7070+ metrics.backfill_queue_depth.set(queue_entries.len() as u64);
7171+7272+ for did in &queue_entries {
7373+ if cancel.is_cancelled() {
7474+ break;
7575+ }
7676+7777+ let start = Instant::now();
7878+ match process_backfill(&db, &config, &fanout, &*resolver, &metrics, &client, did).await
7979+ {
8080+ Ok(()) => {
8181+ let queue_key = format!("backfill_queue\x00{did}");
8282+ let _ = db.meta.remove(queue_key.as_bytes());
8383+ failure_counts.remove(did.as_str());
8484+8585+ // Mark repo as backfilled so it won't be re-queued.
8686+ if let Ok(Some(mut repo_state)) = db.get_repo_state(did) {
8787+ repo_state.backfilled = true;
8888+ let _ = db.repo_state.insert(did.as_bytes(), repo_state.encode());
8989+ metrics
9090+ .storage_ops_total
9191+ .get_or_create(&KeyspaceOpLabels {
9292+ keyspace: "repo_state".to_string(),
9393+ op: StorageOp::Write,
9494+ })
9595+ .inc();
9696+ }
9797+9898+ metrics.backfill_repos_total.inc();
9999+ metrics
100100+ .backfill_duration_seconds
101101+ .observe(start.elapsed().as_secs_f64());
102102+ tracing::info!(%did, "backfill completed");
103103+ }
104104+ Err(e) => {
105105+ metrics.backfill_failures_total.inc();
106106+ let count = failure_counts.entry(did.clone()).or_insert(0);
107107+ *count += 1;
108108+109109+ if *count >= MAX_BACKFILL_FAILURES {
110110+ tracing::warn!(
111111+ %did,
112112+ failures = *count,
113113+ error = %e,
114114+ "backfill exceeded max failures, denying repo"
115115+ );
116116+117117+ let queue_key = format!("backfill_queue\x00{did}");
118118+ let _ = db.meta.remove(queue_key.as_bytes());
119119+ failure_counts.remove(did.as_str());
120120+121121+ let mut repo_state =
122122+ db.get_repo_state(did).ok().flatten().unwrap_or_default();
123123+ repo_state.denied = true;
124124+ let _ = db.repo_state.insert(did.as_bytes(), repo_state.encode());
125125+ } else {
126126+ tracing::warn!(
127127+ %did,
128128+ failures = *count,
129129+ error = %e,
130130+ "backfill failed, will retry"
131131+ );
132132+ }
133133+ }
134134+ }
135135+136136+ tokio::time::sleep(std::time::Duration::from_millis(500)).await;
137137+ }
138138+ }
139139+140140+ tracing::info!("backfill worker shutting down");
141141+ Ok(())
142142+}
143143+144144+/// Memory threshold for the spillable download buffer (10 MB).
145145+const SPILL_THRESHOLD: usize = 10 * 1024 * 1024;
146146+147147+async fn process_backfill(
148148+ db: &FjallDb,
149149+ config: &ServiceConfig,
150150+ fanout: &FanOutChannels,
151151+ resolver: &dyn IdentityResolver,
152152+ metrics: &Metrics,
153153+ client: &reqwest::Client,
154154+ did: &str,
155155+) -> anyhow::Result<()> {
156156+ let pds_url = find_pds_endpoint(db, resolver, did).await?;
157157+158158+ tracing::info!(%did, %pds_url, "starting backfill");
159159+160160+ let url = format!("{pds_url}/xrpc/com.atproto.sync.getRepo?did={did}");
161161+ let response = client.get(&url).send().await?;
162162+163163+ if !response.status().is_success() {
164164+ anyhow::bail!(
165165+ "getRepo returned {}: {}",
166166+ response.status(),
167167+ response.text().await.unwrap_or_default()
168168+ );
169169+ }
170170+171171+ // Use content-length hint to decide memory vs disk up front.
172172+ let content_length = response.content_length();
173173+ let mut buffer = SpillableBuffer::with_size_hint(SPILL_THRESHOLD, content_length).await?;
174174+175175+ let mut response = response;
176176+ while let Some(chunk) = response.chunk().await? {
177177+ buffer.write_chunk(&chunk).await?;
178178+ }
179179+180180+ tracing::info!(
181181+ %did,
182182+ bytes = buffer.bytes_written(),
183183+ on_disk = buffer.is_on_disk(),
184184+ "downloaded repo CAR"
185185+ );
186186+187187+ let reader = buffer.into_reader().await?;
188188+189189+ // Parse the CAR with disk-backed storage so large MSTs don't blow up memory.
190190+ let backfill_config = atproto_repo::RepoConfig::default()
191191+ .with_limits(atproto_dasl::LimitsConfig::high_throughput());
192192+ let repo = atproto_repo::DiskRepository::from_car(reader, backfill_config).await?;
193193+194194+ let commit_rev = repo.rev().to_string();
195195+ let collections = repo.list_collections().await?;
196196+197197+ let mut batch = db.batch();
198198+ let mut record_count: u64 = 0;
199199+200200+ let time_us = now_micros();
201201+202202+ /// A backfilled record pending fan-out after batch commit.
203203+ struct PendingEvent {
204204+ seq: u64,
205205+ collection: String,
206206+ payload: Bytes,
207207+ }
208208+209209+ let mut pending_events: Vec<PendingEvent> = Vec::new();
210210+211211+ for collection in &collections {
212212+ if !config.tracked_collections.matches(collection) {
213213+ continue;
214214+ }
215215+216216+ let entries = repo.list_collection(collection).await?;
217217+ for (path, cid) in &entries {
218218+ let record_bytes = repo.get_record_bytes(path).await?;
219219+ let Some(data) = record_bytes else { continue };
220220+221221+ let record_key = keys::encode_record_key(did, collection, &path.rkey, &commit_rev);
222222+ let cid_bytes: Vec<u8> = cid.to_bytes();
223223+224224+ let record_value = RecordValue {
225225+ cid: cid_bytes.clone(),
226226+ data: data.clone(),
227227+ };
228228+229229+ let compressed_record = db.compress_event(&record_value.encode());
230230+ batch.insert(&db.records, &record_key, compressed_record);
231231+ record_count += 1;
232232+233233+ // Allocate event sequence and persist compact event for replay.
234234+ let event_seq = db.next_event_seq();
235235+ let compact_op = CompactOp {
236236+ action: OpType::Create,
237237+ collection: collection.clone(),
238238+ rkey: path.rkey.clone(),
239239+ cid: Some(cid_bytes.clone()),
240240+ data: data.clone(),
241241+ };
242242+ let compact_event = encoding::encode_compact_commit_op(
243243+ event_seq,
244244+ time_us,
245245+ did,
246246+ &commit_rev,
247247+ &compact_op,
248248+ );
249249+ let compressed = db.compress_event(&compact_event);
250250+ let event_key = keys::encode_event_key(event_seq);
251251+ batch.insert(&db.events, &event_key, &compressed);
252252+ metrics
253253+ .storage_ops_total
254254+ .get_or_create(&KeyspaceOpLabels {
255255+ keyspace: "events".to_string(),
256256+ op: StorageOp::Write,
257257+ })
258258+ .inc();
259259+260260+ // Build fan-out payload.
261261+ let payload = create_backfill_commit_payload(
262262+ event_seq,
263263+ time_us,
264264+ did,
265265+ &commit_rev,
266266+ collection,
267267+ &path.rkey,
268268+ &cid_bytes,
269269+ &data,
270270+ );
271271+ pending_events.push(PendingEvent {
272272+ seq: event_seq,
273273+ collection: collection.clone(),
274274+ payload,
275275+ });
276276+ }
277277+ }
278278+279279+ let mut repo_state = db.get_repo_state(did)?.unwrap_or_default();
280280+ repo_state.rev = commit_rev;
281281+ batch.insert(&db.repo_state, did.as_bytes(), repo_state.encode());
282282+283283+ // Persist the current event sequence counter.
284284+ batch.insert(&db.meta, b"event_seq", &db.current_sequence().to_be_bytes());
285285+286286+ batch.commit()?;
287287+288288+ // Invalidate RIBLT sketch cache after records change.
289289+ reconciliation::invalidate_sketch_cache(db, did);
290290+291291+ // Fan-out events only after persistence succeeds.
292292+ for event in pending_events {
293293+ let fan_event = Arc::new(FanOutEvent {
294294+ seq: event.seq,
295295+ did: did.into(),
296296+ event_type: EventType::Commit {
297297+ collection: event.collection,
298298+ },
299299+ payload: event.payload,
300300+ });
301301+ let _ = fanout.high_priority_tx.send(fan_event.clone());
302302+ fanout.send_partitioned(&fan_event);
303303+ }
304304+305305+ metrics
306306+ .storage_ops_total
307307+ .get_or_create(&KeyspaceOpLabels {
308308+ keyspace: "records".to_string(),
309309+ op: StorageOp::Write,
310310+ })
311311+ .inc_by(record_count);
312312+ metrics
313313+ .storage_ops_total
314314+ .get_or_create(&KeyspaceOpLabels {
315315+ keyspace: "repo_state".to_string(),
316316+ op: StorageOp::Write,
317317+ })
318318+ .inc();
319319+ metrics.backfill_records_total.inc_by(record_count);
320320+321321+ tracing::info!(%did, record_count, "backfill wrote records");
322322+ Ok(())
323323+}
324324+325325+/// Current time in microseconds since Unix epoch.
326326+fn now_micros() -> u64 {
327327+ std::time::SystemTime::now()
328328+ .duration_since(std::time::UNIX_EPOCH)
329329+ .unwrap_or_default()
330330+ .as_micros() as u64
331331+}
332332+333333+/// Create the DAG-CBOR payload for a backfilled record (create operation).
334334+fn create_backfill_commit_payload(
335335+ seq: u64,
336336+ _time_us: u64,
337337+ did: &str,
338338+ rev: &str,
339339+ collection: &str,
340340+ rkey: &str,
341341+ cid_bytes: &[u8],
342342+ data: &[u8],
343343+) -> Bytes {
344344+ let cid_str = cid::Cid::read_bytes(cid_bytes)
345345+ .map(|c| c.to_string())
346346+ .unwrap_or_default();
347347+348348+ let mut commit_fields: Vec<(&str, Ipld)> = vec![
349349+ ("rev", Ipld::String(rev.to_string())),
350350+ ("operation", Ipld::String("create".to_string())),
351351+ ("collection", Ipld::String(collection.to_string())),
352352+ ("rkey", Ipld::String(rkey.to_string())),
353353+ ("cid", Ipld::String(cid_str)),
354354+ ];
355355+356356+ if let Ok(ipld) = atproto_dasl::drisl::from_slice::<Ipld>(data) {
357357+ commit_fields.push(("record", ipld));
358358+ }
359359+360360+ let event = ipld_map(vec![
361361+ ("seq", Ipld::Integer(seq.into())),
362362+ ("did", Ipld::String(did.to_string())),
363363+ ("kind", Ipld::String("commit".to_string())),
364364+ ("commit", ipld_map(commit_fields)),
365365+ ]);
366366+367367+ to_dag_cbor_bytes(&event).into()
368368+}
369369+370370+async fn find_pds_endpoint(
371371+ db: &FjallDb,
372372+ resolver: &dyn IdentityResolver,
373373+ did: &str,
374374+) -> anyhow::Result<String> {
375375+ // Try cached DID document first.
376376+ if let Some(doc_bytes) = db.did_to_doc.get(did.as_bytes())? {
377377+ let slice: &[u8] = &doc_bytes;
378378+ let (_ts, json_bytes) = decode_timestamped_doc(slice);
379379+ if let Ok(pds) = extract_pds_from_json(json_bytes) {
380380+ return Ok(pds);
381381+ }
382382+ }
383383+384384+ // No cached document (or no PDS in it) — resolve live and cache.
385385+ tracing::info!(%did, "resolving DID document for backfill");
386386+ let document = resolver.resolve(did).await?;
387387+ let doc_json = serde_json::to_vec(&document)?;
388388+ let now = SystemTime::now()
389389+ .duration_since(UNIX_EPOCH)
390390+ .unwrap_or_default()
391391+ .as_secs();
392392+ let value = encode_timestamped_doc(now, &doc_json);
393393+ let _ = db.did_to_doc.insert(did.as_bytes(), &value);
394394+395395+ extract_pds_from_json(&doc_json)
396396+}
397397+398398+fn extract_pds_from_json(json_bytes: &[u8]) -> anyhow::Result<String> {
399399+ let doc: serde_json::Value = serde_json::from_slice(json_bytes)?;
400400+ if let Some(services) = doc.get("service").and_then(|v| v.as_array()) {
401401+ for svc in services {
402402+ let svc_type = svc.get("type").and_then(|v| v.as_str()).unwrap_or("");
403403+ if svc_type == "AtprotoPersonalDataServer" {
404404+ if let Some(endpoint) = svc.get("serviceEndpoint").and_then(|v| v.as_str()) {
405405+ return Ok(endpoint.trim_end_matches('/').to_string());
406406+ }
407407+ }
408408+ }
409409+ }
410410+ anyhow::bail!("no PDS endpoint found in DID document")
411411+}
+115
src/pipeline/fanout.rs
···11+//! Fan-out channels for WebSocket event delivery.
22+//!
33+//! Two broadcast channels (high-priority and low-priority) connect
44+//! the batch writer to per-consumer WebSocket tasks. Optional consumer
55+//! groups add per-partition channels for sharded delivery.
66+77+use std::collections::HashMap;
88+use std::hash::{DefaultHasher, Hash, Hasher};
99+1010+use crate::config::ConsumerGroup;
1111+use crate::types::SharedFanOutEvent;
1212+use tokio::sync::broadcast;
1313+1414+/// Default capacity for broadcast channels.
1515+const DEFAULT_CAPACITY: usize = 8192;
1616+1717+/// Broadcast channels for event fan-out to WebSocket consumers.
1818+pub struct FanOutChannels {
1919+ /// High-priority channel for tracked collections, identity, account, and sync events.
2020+ pub high_priority_tx: broadcast::Sender<SharedFanOutEvent>,
2121+ /// Low-priority channel for forwarded (non-tracked) collection events.
2222+ pub low_priority_tx: broadcast::Sender<SharedFanOutEvent>,
2323+ /// Per-group per-partition channels. Key is group name.
2424+ consumer_groups: HashMap<String, GroupChannels>,
2525+}
2626+2727+/// Per-partition broadcast channels for a consumer group.
2828+struct GroupChannels {
2929+ /// One sender per partition, indexed by partition number.
3030+ partitions: Vec<broadcast::Sender<SharedFanOutEvent>>,
3131+}
3232+3333+impl FanOutChannels {
3434+ /// Create new fan-out channels with the given capacity.
3535+ pub fn new(capacity: usize) -> Self {
3636+ let (high_priority_tx, _) = broadcast::channel(capacity);
3737+ let (low_priority_tx, _) = broadcast::channel(capacity);
3838+ Self {
3939+ high_priority_tx,
4040+ low_priority_tx,
4141+ consumer_groups: HashMap::new(),
4242+ }
4343+ }
4444+4545+ /// Create fan-out channels with consumer group support.
4646+ pub fn with_consumer_groups(capacity: usize, groups: &[ConsumerGroup]) -> Self {
4747+ let mut channels = Self::new(capacity);
4848+ for group in groups {
4949+ let partitions = (0..group.partition_count)
5050+ .map(|_| broadcast::channel(capacity).0)
5151+ .collect();
5252+ channels
5353+ .consumer_groups
5454+ .insert(group.name.clone(), GroupChannels { partitions });
5555+ }
5656+ channels
5757+ }
5858+5959+ /// Subscribe to the high-priority channel.
6060+ pub fn subscribe_high(&self) -> broadcast::Receiver<SharedFanOutEvent> {
6161+ self.high_priority_tx.subscribe()
6262+ }
6363+6464+ /// Subscribe to the low-priority channel.
6565+ pub fn subscribe_low(&self) -> broadcast::Receiver<SharedFanOutEvent> {
6666+ self.low_priority_tx.subscribe()
6767+ }
6868+6969+ /// Subscribe to a specific partition of a consumer group.
7070+ /// Returns `None` if the group or partition doesn't exist.
7171+ pub fn subscribe_partition(
7272+ &self,
7373+ group: &str,
7474+ partition: u16,
7575+ ) -> Option<broadcast::Receiver<SharedFanOutEvent>> {
7676+ let gc = self.consumer_groups.get(group)?;
7777+ gc.partitions
7878+ .get(partition as usize)
7979+ .map(|tx| tx.subscribe())
8080+ }
8181+8282+ /// Get the partition count for a consumer group.
8383+ /// Returns `None` if the group doesn't exist.
8484+ pub fn partition_count(&self, group: &str) -> Option<u16> {
8585+ self.consumer_groups
8686+ .get(group)
8787+ .map(|gc| gc.partitions.len() as u16)
8888+ }
8989+9090+ /// Route an event to the correct partition channel for each consumer group.
9191+ /// The DID is hashed to determine the target partition.
9292+ pub fn send_partitioned(&self, event: &SharedFanOutEvent) {
9393+ if self.consumer_groups.is_empty() {
9494+ return;
9595+ }
9696+ let hash = did_hash(&event.did);
9797+ for gc in self.consumer_groups.values() {
9898+ let partition = (hash % gc.partitions.len() as u64) as usize;
9999+ let _ = gc.partitions[partition].send(event.clone());
100100+ }
101101+ }
102102+}
103103+104104+impl Default for FanOutChannels {
105105+ fn default() -> Self {
106106+ Self::new(DEFAULT_CAPACITY)
107107+ }
108108+}
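
/// Hash a DID string for partition assignment.
///
/// The result is deterministic within one build of the service. `DefaultHasher`'s
/// algorithm is unspecified and may change between Rust releases, so partition
/// assignments are not guaranteed to survive a toolchain upgrade.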
111111+pub fn did_hash(did: &str) -> u64 {
112112+ let mut hasher = DefaultHasher::new();
113113+ did.hash(&mut hasher);
114114+ hasher.finish()
115115+}
+187
src/pipeline/identity.rs
···11+//! Identity resolution pipeline.
22+//!
33+//! Processes `#identity` events by resolving full DID documents via an
44+//! `IdentityResolver` and updating the `did_to_doc` and `handle_to_did`
55+//! keyspaces.
66+77+use std::sync::Arc;
88+use std::time::{Instant, SystemTime, UNIX_EPOCH};
99+1010+use atproto_identity::traits::IdentityResolver;
1111+use tokio::sync::Semaphore;
1212+use tokio::sync::mpsc;
1313+use tokio_util::sync::CancellationToken;
1414+1515+use crate::server::metrics::{
1616+ IdentityOutcome, IdentityOutcomeLabels, KeyspaceOpLabels, Metrics, StorageOp,
1717+};
1818+use crate::storage::FjallDb;
1919+use crate::storage::encoding::{decode_timestamped_doc, encode_timestamped_doc};
2020+2121+/// An identity event to process.
2222+#[derive(Debug)]
2323+pub struct IdentityEvent {
2424+ pub did: String,
2525+ pub handle: Option<String>,
2626+}
2727+2828+/// Run the identity resolution worker.
2929+pub async fn run_identity_worker(
3030+ db: Arc<FjallDb>,
3131+ resolver: Arc<dyn IdentityResolver>,
3232+ semaphore: Arc<Semaphore>,
3333+ metrics: Arc<Metrics>,
3434+ mut rx: mpsc::Receiver<IdentityEvent>,
3535+ cancel: CancellationToken,
3636+) -> anyhow::Result<()> {
3737+ tracing::info!("identity worker started");
3838+3939+ loop {
4040+ tokio::select! {
4141+ biased;
4242+ _ = cancel.cancelled() => break,
4343+ event = rx.recv() => {
4444+ let Some(event) = event else { break };
4545+ let permit = semaphore.clone().acquire_owned().await?;
4646+ let db = db.clone();
4747+ let resolver = resolver.clone();
4848+ let metrics = metrics.clone();
4949+ tokio::spawn(async move {
5050+ process_identity_event(&db, &*resolver, &metrics, &event).await;
5151+ drop(permit);
5252+ });
5353+ }
5454+ }
5555+ }
5656+5757+ tracing::info!("identity worker shutting down");
5858+ Ok(())
5959+}
6060+6161+async fn process_identity_event(
6262+ db: &FjallDb,
6363+ resolver: &dyn IdentityResolver,
6464+ metrics: &Metrics,
6565+ event: &IdentityEvent,
6666+) {
6767+ let start = Instant::now();
6868+6969+ // Resolve the full DID document
7070+ let document = match resolver.resolve(&event.did).await {
7171+ Ok(doc) => doc,
7272+ Err(e) => {
7373+ let elapsed = start.elapsed().as_secs_f64();
7474+ metrics.identity_resolve_duration_seconds.observe(elapsed);
7575+ metrics
7676+ .identity_resolves_total
7777+ .get_or_create(&IdentityOutcomeLabels {
7878+ outcome: IdentityOutcome::Failure,
7979+ })
8080+ .inc();
8181+8282+ tracing::warn!(did = %event.did, error = %e, "failed to resolve DID document");
8383+ // If we have a handle from the firehose event, store a minimal mapping
8484+ // even if full resolution fails
8585+ if let Some(ref handle) = event.handle {
8686+ if let Err(e) = update_handle_mapping(db, &event.did, handle) {
8787+ tracing::warn!(did = %event.did, error = %e, "failed to update handle mapping");
8888+ }
8989+ }
9090+ return;
9191+ }
9292+ };
9393+9494+ let elapsed = start.elapsed().as_secs_f64();
9595+ metrics.identity_resolve_duration_seconds.observe(elapsed);
9696+ metrics
9797+ .identity_resolves_total
9898+ .get_or_create(&IdentityOutcomeLabels {
9999+ outcome: IdentityOutcome::Success,
100100+ })
101101+ .inc();

    // Extract handle from the resolved document's alsoKnownAs
    let new_handle = document
        .also_known_as
        .iter()
        .find_map(|alias| alias.strip_prefix("at://"));

    // Remove old handle mappings that differ from the new one. This must run
    // before the cached document is overwritten below, because
    // `remove_stale_handle_mappings` reads the previous aliases from `did_to_doc`.
    if let Some(new_handle) = new_handle {
        if let Err(e) = remove_stale_handle_mappings(db, &event.did, new_handle) {
            tracing::warn!(did = %event.did, error = %e, "failed to remove stale handle mappings");
        }
    }

    // Serialize and store the full DID document with timestamp
    let doc_json = match serde_json::to_vec(&document) {
        Ok(j) => j,
        Err(e) => {
            tracing::warn!(did = %event.did, error = %e, "failed to serialize DID document");
            return;
        }
    };
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or_default()
        .as_secs();
    let value = encode_timestamped_doc(now, &doc_json);
    if let Err(e) = db.did_to_doc.insert(event.did.as_bytes(), &value) {
        tracing::warn!(did = %event.did, error = %e, "failed to store DID document");
        return;
    }
    metrics
        .storage_ops_total
        .get_or_create(&KeyspaceOpLabels {
            keyspace: "did_to_doc".to_string(),
            op: StorageOp::Write,
        })
        .inc();

    // Store the mapping for the current handle
    if let Some(new_handle) = new_handle {
        let _ = db
            .handle_to_did
            .insert(new_handle.to_lowercase().as_bytes(), event.did.as_bytes());
        metrics
            .storage_ops_total
            .get_or_create(&KeyspaceOpLabels {
                keyspace: "handle_to_did".to_string(),
                op: StorageOp::Write,
            })
            .inc();
    }
150150+151151+ tracing::debug!(
152152+ did = %event.did,
153153+ handle = ?new_handle,
154154+ elapsed_ms = elapsed * 1000.0,
155155+ "stored DID document"
156156+ );
157157+}
158158+159159+/// Update handle→DID mapping without a full DID document resolution.
160160+fn update_handle_mapping(db: &FjallDb, did: &str, handle: &str) -> anyhow::Result<()> {
161161+ remove_stale_handle_mappings(db, did, handle)?;
162162+ db.handle_to_did
163163+ .insert(handle.to_lowercase().as_bytes(), did.as_bytes())?;
164164+ Ok(())
165165+}
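
/// Remove old handle→DID mappings for this DID that differ from the new handle.
///
/// The old handles are read from the document currently cached in `did_to_doc`,
/// so this must be called before that entry is overwritten with a new document.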
168168+fn remove_stale_handle_mappings(db: &FjallDb, did: &str, new_handle: &str) -> anyhow::Result<()> {
169169+ if let Ok(Some(doc_bytes)) = db.did_to_doc.get(did.as_bytes()) {
170170+ let slice: &[u8] = &doc_bytes;
171171+ let (_ts, json_bytes) = decode_timestamped_doc(slice);
172172+ if let Ok(doc) = serde_json::from_slice::<serde_json::Value>(json_bytes) {
173173+ if let Some(aliases) = doc.get("alsoKnownAs").and_then(|v| v.as_array()) {
174174+ for alias in aliases {
175175+ if let Some(old_handle) = alias.as_str().and_then(|s| s.strip_prefix("at://")) {
176176+ let old_lower = old_handle.to_lowercase();
177177+ let new_lower = new_handle.to_lowercase();
178178+ if old_lower != new_lower {
179179+ let _ = db.handle_to_did.remove(old_lower.as_bytes());
180180+ }
181181+ }
182182+ }
183183+ }
184184+ }
185185+ }
186186+ Ok(())
187187+}
+570
src/pipeline/ingester.rs
···11+//! Firehose ingester pipeline.
22+//!
33+//! Maintains a WebSocket connection to the upstream relay, decodes
44+//! CBOR frames, and dispatches events to the batch writer channel.
55+66+use std::collections::VecDeque;
77+use std::sync::Arc;
88+use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};
99+1010+use atproto_dasl::Ipld;
1111+use futures_util::{SinkExt, StreamExt};
1212+use tokio::sync::mpsc;
1313+use tokio_util::sync::CancellationToken;
1414+1515+use crate::config::ServiceConfig;
1616+use crate::pipeline::identity::IdentityEvent;
1717+use crate::server::metrics::Metrics;
1818+use crate::storage::FjallDb;
1919+use crate::storage::encoding::decode_timestamped_doc;
2020+use crate::types::{CommitOp, OpType};
2121+2222+/// How old a DID document can be before we re-resolve it.
2323+const IDENTITY_REFRESH_SECS: u64 = 24 * 60 * 60;
2424+2525+/// How long without receiving an event before we log a warning.
2626+const SILENCE_WARN_SECS: u64 = 3 * 60;
2727+2828+/// How long after the silence warning before we force-disconnect.
2929+const SILENCE_DISCONNECT_SECS: u64 = 7 * 60;
3030+3131+/// Local buffer capacity between WebSocket reads and writer channel sends.
3232+/// Decouples relay reading from writer backpressure so the relay connection
3333+/// stays alive during transient writer stalls.
3434+const LOCAL_BUFFER_CAP: usize = 16384;
3535+3636+/// How often to send a client-initiated WebSocket ping to the relay.
3737+const PING_INTERVAL_SECS: u64 = 30;
3838+3939+/// Decoded event from the firehose, sent to the batch writer.
4040+#[derive(Debug)]
4141+pub enum IngestEvent {
4242+ /// A commit with record operations.
4343+ Commit {
4444+ seq: u64,
4545+ did: String,
4646+ rev: String,
4747+ since: Option<String>,
4848+ ops: Vec<CommitOp>,
4949+ blocks: Vec<u8>,
5050+ too_big: bool,
5151+ },
5252+ /// An identity change event.
5353+ Identity {
5454+ seq: u64,
5555+ did: String,
5656+ handle: Option<String>,
5757+ },
5858+ /// An account status change event.
5959+ Account {
6060+ seq: u64,
6161+ did: String,
6262+ active: bool,
6363+ status: Option<String>,
6464+ },
6565+ /// A sync (state-assertion) event indicating the current rev of a repo.
6666+ Sync { seq: u64, did: String, rev: String },
6767+}
6868+6969+impl IngestEvent {
7070+ pub fn seq(&self) -> u64 {
7171+ match self {
7272+ Self::Commit { seq, .. }
7373+ | Self::Identity { seq, .. }
7474+ | Self::Account { seq, .. }
7575+ | Self::Sync { seq, .. } => *seq,
7676+ }
7777+ }
7878+}
7979+8080+/// Run the firehose ingester task.
8181+pub async fn run_ingester(
8282+ config: Arc<ServiceConfig>,
8383+ db: Arc<FjallDb>,
8484+ tx: mpsc::Sender<IngestEvent>,
8585+ identity_tx: mpsc::Sender<IdentityEvent>,
8686+ metrics: Arc<Metrics>,
8787+ cancel: CancellationToken,
8888+) -> anyhow::Result<()> {
8989+ tracing::info!(relay = %config.relay_host, "firehose ingester started");
9090+9191+ loop {
9292+ if cancel.is_cancelled() {
9393+ break;
9494+ }
9595+9696+ let cursor = db.get_cursor().unwrap_or(None);
9797+ let url = if let Some(c) = cursor {
9898+ format!(
9999+ "wss://{}/xrpc/com.atproto.sync.subscribeRepos?cursor={}",
100100+ config.relay_host, c
101101+ )
102102+ } else {
103103+ format!(
104104+ "wss://{}/xrpc/com.atproto.sync.subscribeRepos",
105105+ config.relay_host
106106+ )
107107+ };
108108+109109+ tracing::info!(%url, "connecting to relay");
110110+111111+ let uri: http::Uri = match url.parse() {
112112+ Ok(u) => u,
113113+ Err(e) => {
114114+ tracing::error!(error = %e, "invalid relay URI");
115115+ tokio::time::sleep(std::time::Duration::from_secs(5)).await;
116116+ continue;
117117+ }
118118+ };
119119+120120+ let connect_result = tokio_websockets::ClientBuilder::from_uri(uri)
121121+ .connect()
122122+ .await;
123123+124124+ let (ws, _response) = match connect_result {
125125+ Ok(r) => r,
126126+ Err(e) => {
127127+ tracing::warn!(error = %e, "failed to connect to relay, retrying in 5s");
128128+ metrics.firehose_reconnects_total.inc();
129129+ metrics.firehose_connection_state.set(0);
130130+ tokio::time::sleep(std::time::Duration::from_secs(5)).await;
131131+ continue;
132132+ }
133133+ };
134134+135135+ metrics.firehose_connection_state.set(1);
136136+ metrics.firehose_reconnects_total.inc();
137137+ tracing::info!("connected to relay");
138138+139139+ let (mut ws_sink, mut ws_stream) = ws.split();
140140+141141+ let connected_at = Instant::now();
142142+ let mut last_event_at = Instant::now();
143143+ let mut events_this_connection: u64 = 0;
144144+ let mut silence_warned = false;
145145+ let mut buf: VecDeque<IngestEvent> = VecDeque::with_capacity(LOCAL_BUFFER_CAP);
146146+ let mut ping_interval = tokio::time::interval(Duration::from_secs(PING_INTERVAL_SECS));
147147+ ping_interval.reset(); // don't fire immediately
148148+149149+ 'connection: loop {
150150+ // Drain local buffer to writer channel (non-blocking)
151151+ while let Some(event) = buf.pop_front() {
152152+ let process_start = Instant::now();
153153+ match tx.try_send(event) {
154154+ Ok(()) => {
155155+ metrics
156156+ .ingester_process_duration_seconds
157157+ .observe(process_start.elapsed().as_secs_f64());
158158+ }
159159+ Err(mpsc::error::TrySendError::Full(event)) => {
160160+ buf.push_front(event);
161161+ break;
162162+ }
163163+ Err(mpsc::error::TrySendError::Closed(_)) => {
164164+ tracing::error!("writer channel closed");
165165+ cancel.cancel();
166166+ return Ok(());
167167+ }
168168+ }
169169+ }
170170+ metrics.ingester_buffer_depth.set(buf.len() as u64);
171171+172172+ if buf.len() >= LOCAL_BUFFER_CAP {
173173+ // Buffer full, channel full — must block on send to make progress.
174174+ // The relay connection stays alive via TCP buffers while we wait.
175175+ // Also keep sending pings so the relay doesn't consider us dead.
176176+ tokio::select! {
177177+ biased;
178178+ _ = cancel.cancelled() => {
179179+ tracing::info!("ingester cancellation requested");
180180+ break 'connection;
181181+ }
182182+ _ = ping_interval.tick() => {
183183+ if ws_sink.send(tokio_websockets::Message::ping("")).await.is_err() {
184184+ tracing::warn!("failed to send ping, disconnecting");
185185+ break 'connection;
186186+ }
187187+ }
188188+ permit = tx.reserve() => {
189189+ match permit {
190190+ Ok(permit) => {
191191+ let process_start = Instant::now();
192192+ permit.send(buf.pop_front().unwrap());
193193+ metrics
194194+ .ingester_process_duration_seconds
195195+ .observe(process_start.elapsed().as_secs_f64());
196196+ }
197197+ Err(_) => {
198198+ tracing::error!("writer channel closed");
199199+ cancel.cancel();
200200+ return Ok(());
201201+ }
202202+ }
203203+ }
204204+ }
205205+ continue;
206206+ }
207207+208208+ // Buffer has room — read from relay
209209+ let deadline = if silence_warned {
210210+ last_event_at + Duration::from_secs(SILENCE_WARN_SECS + SILENCE_DISCONNECT_SECS)
211211+ } else {
212212+ last_event_at + Duration::from_secs(SILENCE_WARN_SECS)
213213+ };
214214+215215+ tokio::select! {
216216+ biased;
217217+218218+ _ = cancel.cancelled() => {
219219+ tracing::info!("ingester cancellation requested");
220220+ break 'connection;
221221+ }
222222+223223+ _ = tokio::time::sleep_until(tokio::time::Instant::from_std(deadline)) => {
224224+ if !silence_warned {
225225+ tracing::warn!(
226226+ silence_secs = last_event_at.elapsed().as_secs(),
227227+ events_this_connection,
228228+ buf_len = buf.len(),
229229+ "no events received from relay for 3 minutes, will force-disconnect in 7 minutes",
230230+ );
231231+ silence_warned = true;
232232+ } else {
233233+ tracing::warn!(
234234+ silence_secs = last_event_at.elapsed().as_secs(),
235235+ events_this_connection,
236236+ buf_len = buf.len(),
237237+ "no events received from relay for 10 minutes, force-disconnecting",
238238+ );
239239+ break 'connection;
240240+ }
241241+ }
242242+243243+ _ = ping_interval.tick() => {
244244+ if ws_sink.send(tokio_websockets::Message::ping("")).await.is_err() {
245245+ tracing::warn!("failed to send ping, disconnecting");
246246+ break 'connection;
247247+ }
248248+ }
249249+250250+ msg = ws_stream.next() => {
251251+ let Some(result) = msg else {
252252+ tracing::warn!(
253253+ events_this_connection,
254254+ uptime_secs = connected_at.elapsed().as_secs(),
255255+ "relay connection closed (stream ended)"
256256+ );
257257+ break 'connection;
258258+ };
259259+260260+ let message = match result {
261261+ Ok(m) => m,
262262+ Err(e) => {
263263+ tracing::warn!(
264264+ error = %e,
265265+ events_this_connection,
266266+ uptime_secs = connected_at.elapsed().as_secs(),
267267+ "websocket error, disconnecting"
268268+ );
269269+ break 'connection;
270270+ }
271271+ };
272272+273273+ if message.is_pong() {
274274+ continue;
275275+ }
276276+277277+ if !message.is_binary() {
278278+ continue;
279279+ }
280280+281281+ let payload = message.into_payload();
282282+ let data: &[u8] = payload.as_ref();
283283+284284+ match decode_frame(data) {
285285+ Ok(Some(event)) => {
286286+ if event.seq() == 0 {
287287+ tracing::warn!("received event with seq 0, skipping");
288288+ last_event_at = Instant::now();
289289+ continue;
290290+ }
291291+ last_event_at = Instant::now();
292292+ silence_warned = false;
293293+ events_this_connection += 1;
294294+ metrics.firehose_events_total.inc();
295295+ let seq = event.seq();
296296+ metrics.firehose_seq.set(seq);
297297+298298+ let event_type = match &event {
299299+ IngestEvent::Commit { .. } => "#commit",
300300+ IngestEvent::Identity { .. } => "#identity",
301301+ IngestEvent::Account { .. } => "#account",
302302+ IngestEvent::Sync { .. } => "#sync",
303303+ };
304304+ metrics
305305+ .firehose_events_by_type
306306+ .get_or_create(&crate::server::metrics::EventTypeLabels {
307307+ event_type: event_type.to_string(),
308308+ })
309309+ .inc();
310310+311311+ maybe_refresh_identity(&event, &config, &db, &identity_tx).await;
312312+313313+ buf.push_back(event);
314314+ metrics.ingester_buffer_depth.set(buf.len() as u64);
315315+ }
316316+ Ok(None) => {
317317+ last_event_at = Instant::now();
318318+ silence_warned = false;
319319+ }
320320+ Err(e) => {
321321+ tracing::debug!(error = %e, "failed to decode firehose frame");
322322+ }
323323+ }
324324+ }
325325+ }
326326+ }
327327+328328+ metrics.firehose_connection_state.set(0);
329329+330330+ if cancel.is_cancelled() {
331331+ break;
332332+ }
333333+334334+ tracing::info!("reconnecting to relay in 2s");
335335+ tokio::time::sleep(std::time::Duration::from_secs(2)).await;
336336+ }
337337+338338+ tracing::info!("firehose ingester shutting down");
339339+ Ok(())
340340+}
341341+342342+/// If the event is a commit with tracked collections and the DID's identity
343343+/// hasn't been resolved in the last 24 hours, queue a resolution.
344344+async fn maybe_refresh_identity(
345345+ event: &IngestEvent,
346346+ config: &ServiceConfig,
347347+ db: &FjallDb,
348348+ identity_tx: &mpsc::Sender<IdentityEvent>,
349349+) {
350350+ let IngestEvent::Commit { did, ops, .. } = event else {
351351+ return;
352352+ };
353353+354354+ // Check if any op is for a tracked collection
355355+ let has_tracked = ops
356356+ .iter()
357357+ .any(|op| config.tracked_collections.matches(&op.collection));
358358+ if !has_tracked {
359359+ return;
360360+ }
361361+362362+ // Check if the DID document is missing or stale
363363+ let needs_refresh = match db.did_to_doc.get(did.as_bytes()) {
364364+ Ok(Some(bytes)) => {
365365+ let slice: &[u8] = &bytes;
366366+ let (ts, _) = decode_timestamped_doc(slice);
367367+ let now = SystemTime::now()
368368+ .duration_since(UNIX_EPOCH)
369369+ .unwrap_or_default()
370370+ .as_secs();
371371+ now.saturating_sub(ts) > IDENTITY_REFRESH_SECS
372372+ }
373373+ _ => true,
374374+ };
375375+376376+ if needs_refresh {
377377+ let _ = identity_tx.try_send(IdentityEvent {
378378+ did: did.clone(),
379379+ handle: None,
380380+ });
381381+ }
382382+}
383383+384384+/// Decode a firehose frame (two consecutive CBOR items: header + payload).
385385+fn decode_frame(data: &[u8]) -> anyhow::Result<Option<IngestEvent>> {
386386+ // The firehose sends two CBOR items concatenated in one binary frame.
387387+ // Use atproto-dasl's non-strict deserializer to decode them sequentially.
388388+ let mut reader = std::io::Cursor::new(data);
389389+ let mut de = atproto_dasl::drisl::de::Deserializer::non_strict(&mut reader);
390390+ let header: Ipld = serde::Deserialize::deserialize(&mut de)?;
391391+ let body: Ipld = serde::Deserialize::deserialize(&mut de)?;
392392+393393+ let Ipld::Map(ref header_map) = header else {
394394+ anyhow::bail!("header is not a map");
395395+ };
396396+397397+ let op = header_map
398398+ .get("op")
399399+ .and_then(|v| ipld_integer(v))
400400+ .unwrap_or(0);
401401+ if op != 1 {
402402+ return Ok(None); // only op=1 carries a message; op=-1 (error) and anything else is skipped
403403+ }
404404+405405+ let event_type = header_map
406406+ .get("t")
407407+ .and_then(|v| ipld_string(v))
408408+ .unwrap_or_default();
409409+410410+ let Ipld::Map(ref body_map) = body else {
411411+ anyhow::bail!("body is not a map");
412412+ };
413413+414414+ match event_type.as_str() {
415415+ "#commit" => decode_commit(body_map),
416416+ "#identity" => decode_identity(body_map),
417417+ "#account" => decode_account(body_map),
418418+ "#sync" => decode_sync(body_map),
419419+ _ => Ok(None),
420420+ }
421421+}
422422+423423+fn decode_commit(
424424+ body: &std::collections::BTreeMap<String, Ipld>,
425425+) -> anyhow::Result<Option<IngestEvent>> {
426426+ let seq = get_integer(body, "seq")
427427+ .and_then(|v| u64::try_from(v).ok())
428428+ .unwrap_or(0);
429429+ let did = get_string(body, "repo").unwrap_or_default();
430430+ let rev = get_string(body, "rev").unwrap_or_default();
431431+ let since = get_string_opt(body, "since");
432432+ let too_big = get_bool(body, "tooBig").unwrap_or(false);
433433+434434+ let blocks = get_bytes(body, "blocks").unwrap_or_default();
435435+436436+ let mut ops = Vec::new();
437437+ if let Some(Ipld::List(op_array)) = body.get("ops") {
438438+ for op_val in op_array {
439439+ if let Ipld::Map(op_map) = op_val {
440440+ let action = get_string(op_map, "action").unwrap_or_default();
441441+ let path = get_string(op_map, "path").unwrap_or_default();
442442+ let cid_bytes = get_link_bytes(op_map, "cid").or_else(|| get_bytes(op_map, "cid"));
443443+444444+ let op_type = match action.as_str() {
445445+ "create" => OpType::Create,
446446+ "update" => OpType::Update,
447447+ "delete" => OpType::Delete,
448448+ _ => continue,
449449+ };
450450+451451+ let (collection, rkey) = match path.split_once('/') {
452452+ Some((c, r)) => (c.to_string(), r.to_string()),
453453+ None => continue,
454454+ };
455455+456456+ ops.push(CommitOp {
457457+ op: op_type,
458458+ collection,
459459+ rkey,
460460+ cid: cid_bytes,
461461+ });
462462+ }
463463+ }
464464+ }
465465+466466+ Ok(Some(IngestEvent::Commit {
467467+ seq,
468468+ did,
469469+ rev,
470470+ since,
471471+ ops,
472472+ blocks,
473473+ too_big,
474474+ }))
475475+}
476476+477477+fn decode_identity(
478478+ body: &std::collections::BTreeMap<String, Ipld>,
479479+) -> anyhow::Result<Option<IngestEvent>> {
480480+ let seq = get_integer(body, "seq")
481481+ .and_then(|v| u64::try_from(v).ok())
482482+ .unwrap_or(0);
483483+ let did = get_string(body, "did").unwrap_or_default();
484484+ let handle = get_string_opt(body, "handle");
485485+486486+ Ok(Some(IngestEvent::Identity { seq, did, handle }))
487487+}
488488+489489+fn decode_account(
490490+ body: &std::collections::BTreeMap<String, Ipld>,
491491+) -> anyhow::Result<Option<IngestEvent>> {
492492+ let seq = get_integer(body, "seq")
493493+ .and_then(|v| u64::try_from(v).ok())
494494+ .unwrap_or(0);
495495+ let did = get_string(body, "did").unwrap_or_default();
496496+ let active = get_bool(body, "active").unwrap_or(true);
497497+ let status = get_string_opt(body, "status");
498498+499499+ Ok(Some(IngestEvent::Account {
500500+ seq,
501501+ did,
502502+ active,
503503+ status,
504504+ }))
505505+}
506506+507507+fn decode_sync(
508508+ body: &std::collections::BTreeMap<String, Ipld>,
509509+) -> anyhow::Result<Option<IngestEvent>> {
510510+ let seq = get_integer(body, "seq")
511511+ .and_then(|v| u64::try_from(v).ok())
512512+ .unwrap_or(0);
513513+ let did = get_string(body, "repo").unwrap_or_default();
514514+ let rev = get_string(body, "rev").unwrap_or_default();
515515+516516+ Ok(Some(IngestEvent::Sync { seq, did, rev }))
517517+}
518518+519519+// Ipld map helpers
520520+521521+fn ipld_string(v: &Ipld) -> Option<String> {
522522+ match v {
523523+ Ipld::String(s) => Some(s.clone()),
524524+ _ => None,
525525+ }
526526+}
527527+528528+fn ipld_integer(v: &Ipld) -> Option<i128> {
529529+ match v {
530530+ Ipld::Integer(i) => Some(*i),
531531+ _ => None,
532532+ }
533533+}
534534+535535+fn get_string(map: &std::collections::BTreeMap<String, Ipld>, key: &str) -> Option<String> {
536536+ map.get(key).and_then(ipld_string)
537537+}
538538+539539+fn get_string_opt(map: &std::collections::BTreeMap<String, Ipld>, key: &str) -> Option<String> {
540540+ match map.get(key) {
541541+ Some(Ipld::String(s)) => Some(s.clone()),
542542+ Some(Ipld::Null) | None => None,
543543+ _ => None,
544544+ }
545545+}
546546+547547+fn get_integer(map: &std::collections::BTreeMap<String, Ipld>, key: &str) -> Option<i128> {
548548+ map.get(key).and_then(ipld_integer)
549549+}
550550+551551+fn get_bool(map: &std::collections::BTreeMap<String, Ipld>, key: &str) -> Option<bool> {
552552+ match map.get(key) {
553553+ Some(Ipld::Bool(b)) => Some(*b),
554554+ _ => None,
555555+ }
556556+}
557557+558558+fn get_bytes(map: &std::collections::BTreeMap<String, Ipld>, key: &str) -> Option<Vec<u8>> {
559559+ match map.get(key) {
560560+ Some(Ipld::Bytes(b)) => Some(b.clone()),
561561+ _ => None,
562562+ }
563563+}
564564+565565+fn get_link_bytes(map: &std::collections::BTreeMap<String, Ipld>, key: &str) -> Option<Vec<u8>> {
566566+ match map.get(key) {
567567+ Some(Ipld::Link(cid)) => Some(cid.to_bytes()),
568568+ _ => None,
569569+ }
570570+}
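The per-op handling in `decode_commit` can be illustrated in isolation. The following is a minimal sketch using only the standard library; the local `OpType` enum and the helper names `parse_action`, `split_path`, and `decode_op` are hypothetical stand-ins and are not part of the crate. It shows the two skip conditions: an unknown `action`, and a `path` without a `/` separator.

```rust
/// Hypothetical stand-in for the crate's `OpType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OpType {
    Create,
    Update,
    Delete,
}

/// Map a firehose `action` string to an op type; unknown actions yield `None`.
fn parse_action(action: &str) -> Option<OpType> {
    match action {
        "create" => Some(OpType::Create),
        "update" => Some(OpType::Update),
        "delete" => Some(OpType::Delete),
        _ => None,
    }
}

/// Split a repo path of the form `collection/rkey` at the first `/`.
fn split_path(path: &str) -> Option<(&str, &str)> {
    path.split_once('/')
}

/// Decode one op, returning `None` in the cases where `decode_commit` skips the op.
fn decode_op<'a>(action: &str, path: &'a str) -> Option<(OpType, &'a str, &'a str)> {
    let op = parse_action(action)?;
    let (collection, rkey) = split_path(path)?;
    Some((op, collection, rkey))
}
```

Because the split happens at the first `/`, any further slashes end up in the record key rather than the collection.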
+14
src/pipeline/mod.rs
···11+//! Async pipeline components for the Ramjet service.
22+//!
33+//! Three main pipelines connect via channels:
44+//! 1. **Ingester** — firehose WebSocket → mpsc channel
55+//! 2. **Writer** — mpsc channel → fjall WriteBatch → broadcast channels
66+//! 3. **Fan-out** — broadcast channels → per-client WebSocket
77+//!
88+//! Plus supporting workers for identity resolution and backfill.
99+1010+pub mod backfill;
1111+pub mod fanout;
1212+pub mod identity;
1313+pub mod ingester;
1414+pub mod writer;
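The writer stage drains its channel in batches bounded by both a size and a timeout. As a rough illustration of that pattern only, here is a sketch that swaps the async tokio channel for the blocking `std::sync::mpsc` channel so it runs without any dependencies; `collect_batch` is a hypothetical helper, not a function in this crate, and cancellation is left out.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Collect one batch: block for the first item, then keep receiving until
/// the batch is full, the timeout elapses, or the channel closes.
/// Returns `None` once the channel is closed and nothing was received.
fn collect_batch<T>(rx: &Receiver<T>, batch_size: usize, timeout: Duration) -> Option<Vec<T>> {
    let mut batch = Vec::with_capacity(batch_size);
    // Wait indefinitely for the first item so idle periods do not spin.
    batch.push(rx.recv().ok()?);

    // The deadline only starts once the first item has arrived.
    let deadline = Instant::now() + timeout;
    while batch.len() < batch_size {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(item) => batch.push(item),
            // Timeout or closed channel: flush what we have.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    Some(batch)
}
```

Starting the deadline after the first item means a quiet channel costs nothing, while a busy one is flushed at most one timeout after the batch was opened.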
+649
src/pipeline/writer.rs
···11+//! Batch writer pipeline.
22+//!
33+//! Drains the ingester channel in batches, constructs fjall WriteBatch
44+//! operations with atomic writes across keyspaces, and publishes events
55+//! to the fan-out broadcast channels.
66+77+use std::collections::HashMap;
88+use std::sync::Arc;
99+1010+use tokio::sync::mpsc;
1111+use tokio_util::sync::CancellationToken;
1212+1313+use crate::config::ServiceConfig;
1414+use crate::pipeline::fanout::FanOutChannels;
1515+use crate::pipeline::ingester::IngestEvent;
1616+use crate::server::metrics::{
1717+ CollectionOpLabels, EventOp, FanoutPriority, FanoutPriorityLabels, KeyspaceOpLabels, Metrics,
1818+ StorageOp,
1919+};
2020+use crate::storage::FjallDb;
2121+use atproto_dasl::Ipld;
2222+use bytes::Bytes;
2323+2424+use crate::server::reconciliation;
2525+use crate::storage::encoding::{self, CompactOp, ipld_map, to_dag_cbor_bytes};
2626+use crate::storage::keys;
2727+use crate::types::{
2828+ AccountStatus, CommitOp, EventType, FanOutEvent, OpType, RecordValue, RepoState,
2929+ SharedFanOutEvent,
3030+};
3131+3232+/// Current time in microseconds since Unix epoch.
3333+fn now_micros() -> u64 {
3434+ std::time::SystemTime::now()
3535+ .duration_since(std::time::UNIX_EPOCH)
3636+ .unwrap_or_default()
3737+ .as_micros() as u64
3838+}
3939+4040+/// Run the batch writer task.
4141+pub async fn run_writer(
4242+ config: Arc<ServiceConfig>,
4343+ db: Arc<FjallDb>,
4444+ mut rx: mpsc::Receiver<IngestEvent>,
4545+ fanout: Arc<FanOutChannels>,
4646+ metrics: Arc<Metrics>,
4747+ cancel: CancellationToken,
4848+) -> anyhow::Result<()> {
4949+ tracing::info!(
5050+ batch_size = config.batch_size,
5151+ batch_timeout_ms = config.batch_timeout_ms,
5252+ "batch writer started"
5353+ );
5454+5555+ let batch_timeout = std::time::Duration::from_millis(config.batch_timeout_ms);
5656+5757+ loop {
5858+ let mut events = Vec::with_capacity(config.batch_size);
5959+6060+ tokio::select! {
6161+ biased;
6262+ _ = cancel.cancelled() => break,
6363+ event = rx.recv() => {
6464+ match event {
6565+ Some(e) => events.push(e),
6666+ None => break,
6767+ }
6868+ }
6969+ }
7070+7171+ let deadline = tokio::time::Instant::now() + batch_timeout;
7272+ while events.len() < config.batch_size {
7373+ tokio::select! {
7474+ biased;
7575+ _ = cancel.cancelled() => break,
7676+ _ = tokio::time::sleep_until(deadline) => break,
7777+ event = rx.recv() => {
7878+ match event {
7979+ Some(e) => events.push(e),
8080+ None => break,
8181+ }
8282+ }
8383+ }
8484+ }
8585+8686+ if events.is_empty() {
8787+ continue;
8888+ }
8989+9090+ let batch_len = events.len();
9191+ let db2 = db.clone();
9292+ let config2 = config.clone();
9393+ let metrics2 = metrics.clone();
9494+ let result =
9595+ tokio::task::spawn_blocking(move || process_batch(&db2, &config2, &metrics2, &events))
9696+ .await;
9797+9898+ let pending = match result {
9999+ Ok(Ok(pending)) => pending,
100100+ Ok(Err(e)) => {
101101+ tracing::error!(error = %e, "batch write failed");
102102+ continue;
103103+ }
104104+ Err(e) => {
105105+ tracing::error!(error = %e, "spawn_blocking panicked");
106106+ continue;
107107+ }
108108+ };
110110+ // Fan-out: broadcast sends are non-blocking and fast, so they stay in the async context.
111111+ for (fan_event, is_high) in pending {
112112+ if is_high {
113113+ let _ = fanout.high_priority_tx.send(fan_event.clone());
114114+ } else {
115115+ let _ = fanout.low_priority_tx.send(fan_event.clone());
116116+ }
117117+ fanout.send_partitioned(&fan_event);
118118+ metrics.writer_events_fanned_total.inc();
119119+ }
120120+121121+ metrics
122122+ .fanout_queue_depth
123123+ .get_or_create(&FanoutPriorityLabels {
124124+ priority: FanoutPriority::High,
125125+ })
126126+ .set(fanout.high_priority_tx.len() as u64);
127127+ metrics
128128+ .fanout_queue_depth
129129+ .get_or_create(&FanoutPriorityLabels {
130130+ priority: FanoutPriority::Low,
131131+ })
132132+ .set(fanout.low_priority_tx.len() as u64);
133133+134134+ metrics.writer_batch_size.observe(batch_len as f64);
135135+ metrics.writer_batches_total.inc();
136136+ }
137137+138138+ tracing::info!("batch writer shutting down");
139139+ Ok(())
140140+}
141141+142142+fn process_batch(
143143+ db: &FjallDb,
144144+ config: &ServiceConfig,
145145+ metrics: &Metrics,
146146+ events: &[IngestEvent],
147147+) -> anyhow::Result<Vec<(SharedFanOutEvent, bool)>> {
148148+ let mut batch = db.batch();
149149+ let mut max_seq: u64 = 0;
150150+ let mut pending_fanout: Vec<(SharedFanOutEvent, bool)> = Vec::new();
151151+ let mut repo_states: HashMap<String, RepoState> = HashMap::new();
152152+ let mut riblt_invalidated: std::collections::HashSet<String> = std::collections::HashSet::new();
153153+154154+ // Helper: get repo state from in-batch cache, falling back to db.
155155+ let get_repo_state =
156156+ |did: &str, cache: &HashMap<String, RepoState>, db: &FjallDb| -> RepoState {
157157+ if let Some(state) = cache.get(did) {
158158+ return state.clone();
159159+ }
160160+ match db.get_repo_state(did) {
161161+ Ok(Some(rs)) => rs,
162162+ Ok(None) | Err(_) => Default::default(),
163163+ }
164164+ };
165165+166166+ for event in events {
167167+ match event {
168168+ IngestEvent::Commit {
169169+ seq,
170170+ did,
171171+ rev,
172172+ since,
173173+ ops,
174174+ blocks,
175175+ too_big,
176176+ } => {
177177+ max_seq = max_seq.max(*seq);
178178+179179+ let state = get_repo_state(did, &repo_states, db);
180180+ if state.denied {
181181+ metrics.writer_commits_denied.inc();
182182+ continue;
183183+ }
184184+185185+ if ops.is_empty() {
186186+ metrics.writer_commits_empty_ops.inc();
187187+ }
188188+189189+ // Parse blocks CAR once to extract record data by CID
190190+ let block_map = parse_blocks_car(blocks);
191191+192192+ // Collect filtered ops with routing info
193193+ struct FilteredOp {
194194+ op: CommitOp,
195195+ is_tracked: bool,
196196+ data: Vec<u8>,
197197+ }
198198+199199+ let mut filtered_ops: Vec<FilteredOp> = Vec::new();
200200+ let mut has_tracked_op = false;
201201+202202+ for op in ops {
203203+ let is_tracked = config.tracked_collections.matches(&op.collection);
204204+ let is_forwarded = config.forward_collections.matches(&op.collection);
205205+206206+ if !is_tracked && !is_forwarded {
207207+ continue;
208208+ }
209209+210210+ if is_tracked {
211211+ has_tracked_op = true;
212212+ riblt_invalidated.insert(did.clone());
213213+214214+ let record_key =
215215+ keys::encode_record_key(did, &op.collection, &op.rkey, rev);
216216+217217+ match op.op {
218218+ OpType::Create | OpType::Update => {
219219+ let cid_bytes = op.cid.as_deref().unwrap_or_default().to_vec();
220220+ let data = block_map.get(&cid_bytes).cloned().unwrap_or_default();
221221+ let record_value = RecordValue {
222222+ cid: cid_bytes,
223223+ data,
224224+ };
225225+ let compressed_record = db.compress_event(&record_value.encode());
226226+ batch.insert(&db.records, &record_key, compressed_record);
227227+ metrics
228228+ .storage_ops_total
229229+ .get_or_create(&KeyspaceOpLabels {
230230+ keyspace: "records".to_string(),
231231+ op: StorageOp::Write,
232232+ })
233233+ .inc();
234234+ }
235235+ OpType::Delete => {
236236+ batch.insert(&db.records, &record_key, b"");
237237+ metrics
238238+ .storage_ops_total
239239+ .get_or_create(&KeyspaceOpLabels {
240240+ keyspace: "records".to_string(),
241241+ op: StorageOp::Delete,
242242+ })
243243+ .inc();
244244+ }
245245+ }
246246+247247+ let event_op = match op.op {
248248+ OpType::Create => EventOp::Create,
249249+ OpType::Update => EventOp::Update,
250250+ OpType::Delete => EventOp::Delete,
251251+ };
252252+ metrics
253253+ .writer_ops_by_collection
254254+ .get_or_create(&CollectionOpLabels {
255255+ collection: op.collection.clone(),
256256+ op: event_op,
257257+ })
258258+ .inc();
259259+ }
260260+261261+ let data = op
262262+ .cid
263263+ .as_ref()
264264+ .and_then(|cid_bytes| block_map.get(cid_bytes.as_slice()))
265265+ .cloned()
266266+ .unwrap_or_default();
267267+268268+ filtered_ops.push(FilteredOp {
269269+ op: op.clone(),
270270+ is_tracked,
271271+ data,
272272+ });
273273+ }
274274+275275+ // Update repo_state using in-batch cache.
276276+ let mut repo_state = get_repo_state(did, &repo_states, db);
277277+278278+ // Detect desync: if `since` doesn't match our stored rev, we
279279+ // missed intermediate commits and need to re-backfill.
280280+ let is_desynced = if let Some(s) = since {
281281+ !repo_state.rev.is_empty() && s != &repo_state.rev
282282+ } else {
283283+ false
284284+ };
285285+286286+ let prev_rev = repo_state.rev.clone();
287287+ repo_state.rev = rev.clone();
288288+ let needs_backfill = has_tracked_op && !repo_state.backfilled;
289289+ let needs_resync = has_tracked_op && repo_state.backfilled && is_desynced;
290290+ if needs_resync {
291291+ repo_state.backfilled = false;
292292+ tracing::info!(
293293+ %did,
294294+ since = since.as_deref().unwrap_or(""),
295295+ stored_rev = %prev_rev,
296296+ "commit desync detected, queueing re-backfill"
297297+ );
298298+ }
299299+ batch.insert(&db.repo_state, did.as_bytes(), repo_state.encode());
300300+ repo_states.insert(did.clone(), repo_state);
301301+ metrics
302302+ .storage_ops_total
303303+ .get_or_create(&KeyspaceOpLabels {
304304+ keyspace: "repo_state".to_string(),
305305+ op: StorageOp::Write,
306306+ })
307307+ .inc();
308308+309309+ if *too_big || needs_backfill || needs_resync {
310310+ let queue_key = format!("backfill_queue\x00{did}");
311311+ batch.insert(&db.meta, queue_key.as_bytes(), b"");
312312+ }
313313+314314+ if filtered_ops.is_empty() && !ops.is_empty() {
315315+ metrics.writer_commits_filtered.inc();
316316+ }
317317+318318+ // Emit one event per filtered op (granular fan-out)
319319+ let time_us = now_micros();
320320+ for fop in &filtered_ops {
321321+ let event_seq = db.next_event_seq();
322322+323323+ let compact_op = CompactOp {
324324+ action: fop.op.op,
325325+ collection: fop.op.collection.clone(),
326326+ rkey: fop.op.rkey.clone(),
327327+ cid: fop.op.cid.clone(),
328328+ data: fop.data.clone(),
329329+ };
330330+331331+ let compact_event = encoding::encode_compact_commit_op(
332332+ event_seq,
333333+ time_us,
334334+ did,
335335+ rev,
336336+ &compact_op,
337337+ );
338338+ let compressed = db.compress_event(&compact_event);
339339+ let event_key = keys::encode_event_key(event_seq);
340340+ batch.insert(&db.events, &event_key, &compressed);
341341+ metrics
342342+ .storage_ops_total
343343+ .get_or_create(&KeyspaceOpLabels {
344344+ keyspace: "events".to_string(),
345345+ op: StorageOp::Write,
346346+ })
347347+ .inc();
348348+349349+ let payload =
350350+ create_commit_op_payload(event_seq, time_us, did, rev, &fop.op, &block_map);
351351+ let fan_event = Arc::new(FanOutEvent {
352352+ seq: event_seq,
353353+ did: did.as_str().into(),
354354+ event_type: EventType::Commit {
355355+ collection: fop.op.collection.clone(),
356356+ },
357357+ payload,
358358+ });
359359+360360+ pending_fanout.push((fan_event, fop.is_tracked));
361361+ }
362362+ }
363363+364364+ IngestEvent::Identity { seq, did, handle } => {
365365+ max_seq = max_seq.max(*seq);
366366+367367+ if let Some(h) = handle {
368368+ batch.insert(
369369+ &db.handle_to_did,
370370+ h.to_lowercase().as_bytes(),
371371+ did.as_bytes(),
372372+ );
373373+ metrics
374374+ .storage_ops_total
375375+ .get_or_create(&KeyspaceOpLabels {
376376+ keyspace: "handle_to_did".to_string(),
377377+ op: StorageOp::Write,
378378+ })
379379+ .inc();
380380+ }
381381+382382+ let event_seq = db.next_event_seq();
383383+ let time_us = now_micros();
384384+385385+ let compact_event = encoding::encode_compact_identity_v2(
386386+ event_seq,
387387+ time_us,
388388+ did,
389389+ handle.as_deref(),
390390+ );
391391+ let compressed = db.compress_event(&compact_event);
392392+ let event_key = keys::encode_event_key(event_seq);
393393+ batch.insert(&db.events, &event_key, &compressed);
394394+ metrics
395395+ .storage_ops_total
396396+ .get_or_create(&KeyspaceOpLabels {
397397+ keyspace: "events".to_string(),
398398+ op: StorageOp::Write,
399399+ })
400400+ .inc();
401401+402402+ let payload = create_identity_payload(event_seq, time_us, did, handle.as_deref());
403403+ let fan_event = Arc::new(FanOutEvent {
404404+ seq: event_seq,
405405+ did: did.as_str().into(),
406406+ event_type: EventType::Identity,
407407+ payload,
408408+ });
409409+410410+ pending_fanout.push((fan_event, true));
411411+ }
412412+413413+ IngestEvent::Account {
414414+ seq,
415415+ did,
416416+ active,
417417+ status,
418418+ ..
419419+ } => {
420420+ max_seq = max_seq.max(*seq);
421421+422422+ let mut repo_state = get_repo_state(did, &repo_states, db);
423423+ repo_state.status = if *active {
424424+ AccountStatus::Active
425425+ } else {
426426+ match status.as_deref() {
427427+ Some("deactivated") => AccountStatus::Deactivated,
428428+ Some("suspended") => AccountStatus::Suspended,
429429+ Some("deleted") => AccountStatus::Deleted,
430430+ Some("takendown") => AccountStatus::Takendown,
431431+ _ => AccountStatus::Deactivated,
432432+ }
433433+ };
434434+ batch.insert(&db.repo_state, did.as_bytes(), repo_state.encode());
435435+ repo_states.insert(did.clone(), repo_state);
436436+ metrics
437437+ .storage_ops_total
438438+ .get_or_create(&KeyspaceOpLabels {
439439+ keyspace: "repo_state".to_string(),
440440+ op: StorageOp::Write,
441441+ })
442442+ .inc();
443443+444444+ let event_seq = db.next_event_seq();
445445+ let time_us = now_micros();
446446+447447+ let compact_event = encoding::encode_compact_account_v2(
448448+ event_seq,
449449+ time_us,
450450+ did,
451451+ *active,
452452+ status.as_deref(),
453453+ );
454454+ let compressed = db.compress_event(&compact_event);
455455+ let event_key = keys::encode_event_key(event_seq);
456456+ batch.insert(&db.events, &event_key, &compressed);
457457+ metrics
458458+ .storage_ops_total
459459+ .get_or_create(&KeyspaceOpLabels {
460460+ keyspace: "events".to_string(),
461461+ op: StorageOp::Write,
462462+ })
463463+ .inc();
464464+465465+ let payload =
466466+ create_account_payload(event_seq, time_us, did, *active, status.as_deref());
467467+ let fan_event = Arc::new(FanOutEvent {
468468+ seq: event_seq,
469469+ did: did.as_str().into(),
470470+ event_type: EventType::Account,
471471+ payload,
472472+ });
473473+474474+ pending_fanout.push((fan_event, true));
475475+ }
476476+477477+ IngestEvent::Sync { seq, did, rev } => {
478478+ max_seq = max_seq.max(*seq);
479479+480480+ let repo_state = get_repo_state(did, &repo_states, db);
481481+482482+ // Detect desync: if our stored rev doesn't match the sync
483483+ // event's rev, we're out of date and need to re-backfill.
484484+ if !repo_state.rev.is_empty() && repo_state.rev != *rev {
485485+ let stored_rev = repo_state.rev.clone();
486486+ let queue_key = format!("backfill_queue\x00{did}");
487487+ batch.insert(&db.meta, queue_key.as_bytes(), b"");
488488+489489+ // Mark as not backfilled so the backfill worker will
490490+ // re-process it.
491491+ let mut updated_state = repo_state;
492492+ updated_state.backfilled = false;
493493+ batch.insert(&db.repo_state, did.as_bytes(), updated_state.encode());
494494+ repo_states.insert(did.clone(), updated_state);
495495+ metrics
496496+ .storage_ops_total
497497+ .get_or_create(&KeyspaceOpLabels {
498498+ keyspace: "repo_state".to_string(),
499499+ op: StorageOp::Write,
500500+ })
501501+ .inc();
502502+503503+ tracing::info!(
504504+ %did,
505505+ %stored_rev,
506506+ sync_rev = %rev,
507507+ "sync desync detected, queueing re-backfill"
508508+ );
509509+ }
510510+ }
511511+ }
512512+ }
513513+514514+ if max_seq > 0 {
515515+ batch.insert(&db.meta, b"cursor", &max_seq.to_be_bytes());
516516+ }
517517+ batch.insert(&db.meta, b"event_seq", &db.current_sequence().to_be_bytes());
518518+519519+ // COMMIT — only return fan-out events after persistence succeeds
520520+ batch.commit()?;
521521+522522+ // Invalidate RIBLT sketch caches for DIDs with tracked record changes.
523523+ for did in &riblt_invalidated {
524524+ reconciliation::invalidate_sketch_cache(db, did);
525525+ }
526526+527527+ Ok(pending_fanout)
528528+}
529529+530530+/// Create the DAG-CBOR payload for a single commit operation.
531531+fn create_commit_op_payload(
532532+ seq: u64,
533533+ _time_us: u64,
534534+ did: &str,
535535+ rev: &str,
536536+ op: &CommitOp,
537537+ block_map: &HashMap<Vec<u8>, Vec<u8>>,
538538+) -> Bytes {
539539+ let operation = match op.op {
540540+ OpType::Create => "create",
541541+ OpType::Update => "update",
542542+ OpType::Delete => "delete",
543543+ };
544544+545545+ let mut commit_fields: Vec<(&str, Ipld)> = vec![
546546+ ("rev", Ipld::String(rev.to_string())),
547547+ ("operation", Ipld::String(operation.to_string())),
548548+ ("collection", Ipld::String(op.collection.clone())),
549549+ ("rkey", Ipld::String(op.rkey.clone())),
550550+ ];
551551+552552+ // Deletes omit record and cid
553553+ if op.op != OpType::Delete {
554554+ if let Some(cid_bytes) = &op.cid {
555555+ let cid_str = cid::Cid::read_bytes(cid_bytes.as_slice())
556556+ .map(|c| c.to_string())
557557+ .unwrap_or_default();
558558+ commit_fields.push(("cid", Ipld::String(cid_str)));
559559+560560+ if let Some(data) = block_map.get(cid_bytes.as_slice()) {
561561+ if let Ok(ipld) = atproto_dasl::drisl::from_slice::<Ipld>(data) {
562562+ commit_fields.push(("record", ipld));
563563+ }
564564+ }
565565+ }
566566+ }
567567+568568+ let event = ipld_map(vec![
569569+ ("seq", Ipld::Integer(seq.into())),
570570+ ("did", Ipld::String(did.to_string())),
571571+ ("kind", Ipld::String("commit".to_string())),
572572+ ("commit", ipld_map(commit_fields)),
573573+ ]);
574574+575575+ to_dag_cbor_bytes(&event).into()
576576+}
577577+578578+/// Create the DAG-CBOR payload for an identity event.
579579+fn create_identity_payload(seq: u64, _time_us: u64, did: &str, handle: Option<&str>) -> Bytes {
580580+ let mut identity_fields: Vec<(&str, Ipld)> = Vec::new();
581581+ if let Some(h) = handle {
582582+ identity_fields.push(("handle", Ipld::String(h.to_string())));
583583+ }
584584+585585+ let event = ipld_map(vec![
586586+ ("seq", Ipld::Integer(seq.into())),
587587+ ("did", Ipld::String(did.to_string())),
588588+ ("kind", Ipld::String("identity".to_string())),
589589+ ("identity", ipld_map(identity_fields)),
590590+ ]);
591591+592592+ to_dag_cbor_bytes(&event).into()
593593+}
594594+595595+/// Create the DAG-CBOR payload for an account event.
596596+fn create_account_payload(
597597+ seq: u64,
598598+ _time_us: u64,
599599+ did: &str,
600600+ active: bool,
601601+ status: Option<&str>,
602602+) -> Bytes {
603603+ let mut account_fields: Vec<(&str, Ipld)> = vec![("active", Ipld::Bool(active))];
604604+ if let Some(s) = status {
605605+ account_fields.push(("status", Ipld::String(s.to_string())));
606606+ }
607607+608608+ let event = ipld_map(vec![
609609+ ("seq", Ipld::Integer(seq.into())),
610610+ ("did", Ipld::String(did.to_string())),
611611+ ("kind", Ipld::String("account".to_string())),
612612+ ("account", ipld_map(account_fields)),
613613+ ]);
614614+615615+ to_dag_cbor_bytes(&event).into()
616616+}
617617+618618+/// Parse a blocks CAR slice into a map of CID bytes → record data.
619619+///
620620+/// The firehose sends record data inside a CAR (Content-Addressable aRchive)
621621+/// in the `blocks` field of `#commit` events. A block whose CID matches an op's
622622+/// CID holds that op's DAG-CBOR encoded record; other blocks are never looked up.
623623+fn parse_blocks_car(blocks: &[u8]) -> HashMap<Vec<u8>, Vec<u8>> {
624624+ let mut map = HashMap::new();
625625+ if blocks.is_empty() {
626626+ return map;
627627+ }
628628+629629+ let mut cursor = std::io::Cursor::new(blocks);
630630+631631+ // Skip the CAR header
632632+ if atproto_dasl::CarHeader::read_from(&mut cursor).is_err() {
633633+ return map;
634634+ }
635635+636636+ // Read all blocks
637637+ loop {
638638+ match atproto_dasl::CarBlock::read_from(&mut cursor) {
639639+ Ok(Some(block)) => {
640640+ let cid_bytes = block.cid.to_bytes();
641641+ map.insert(cid_bytes, block.data);
642642+ }
643643+ Ok(None) => break,
644644+ Err(_) => break,
645645+ }
646646+ }
647647+648648+ map
649649+}
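`parse_blocks_car` delegates the framing to `atproto_dasl`. For readers unfamiliar with the layout, a CARv1 stream is a sequence of sections, each prefixed by an unsigned LEB128 varint length: the first section is the header, and every later one is a CID followed by the block data. The sketch below only splits a buffer into those sections and does not parse CIDs; `read_uvarint` and `split_sections` are hypothetical helpers written for illustration, and how the library implements this internally is not shown in the source.

```rust
/// Read an unsigned LEB128 varint, returning the value and the bytes consumed.
/// Returns `None` on truncated input or a value that does not fit in a `u64`.
fn read_uvarint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        // The tenth byte may only contribute the single remaining bit.
        if i == 9 && byte > 1 {
            return None;
        }
        value |= u64::from(byte & 0x7f) << (7 * i as u32);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Split a buffer into varint-length-prefixed sections, stopping at the
/// first malformed or truncated section and keeping what was read so far.
fn split_sections(mut buf: &[u8]) -> Vec<&[u8]> {
    let mut sections = Vec::new();
    while !buf.is_empty() {
        let Some((len, used)) = read_uvarint(buf) else { break };
        let Ok(len) = usize::try_from(len) else { break };
        let rest = &buf[used..];
        if len > rest.len() {
            break;
        }
        let (section, tail) = rest.split_at(len);
        sections.push(section);
        buf = tail;
    }
    sections
}
```

Stopping at the first bad section mirrors the way `parse_blocks_car` breaks out of its loop on a read error and returns the blocks collected up to that point.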