# hydrant

`hydrant` is an AT Protocol indexer built on the `fjall` database. it's designed to be flexible: it supports both full-network indexing and filtered indexing (e.g., by DID), allows querying with XRPCs (not only `com.atproto.*`!), provides an ordered event stream, and more. oh, and it can also act as a relay!

you can see [random.wisp.place](https://tangled.org/did:plc:dfl62fgb7wtjj3fcbb72naae/random.wisp.place) (standalone binary using the http API) or the [statusphere example](../examples/statusphere.rs) (hydrant-as-library) for examples. for rust docs, look at https://hydrant.klbr.net/ for now.

**WARNING: *the db format is only partially stable.*** we provide migrations in hydrant itself, so nothing should go wrong! you should still keep backups just in case!

## what's here

- [getting started](getting-started.md): building, running, reverse proxying
- [configuration](configuration.md): all environment variables
- [build features](build-features.md): optional cargo features (`relay`, `backlinks`, etc.)
- [concepts](concepts/README.md): how the stream works, relay comparison, multi-relay support
- [rest api](api/README.md): management API reference
- [xrpc](xrpc/README.md): data access via XRPC

## quick start

```bash
cargo build --release
export HYDRANT_DATABASE_PATH=./hydrant.db
./target/release/hydrant
```
# docs/api/crawler.md
## POST /crawler/sources

add a crawler source at runtime.

| field | description |
| :--- | :--- |
| `url` | URL of the crawler source |
| `mode` | `"relay"` or `"by_collection"` |

the source is written to the database before the producer task is started, so it is safe to add sources and then immediately restart without losing them.
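as a sketch, assuming the API is listening on the default `API_PORT` of `3000` (the source URL here is one of the documented crawler defaults):

```shell
# add a by-collection crawler source; adjust host/port to your deployment
curl -X POST http://localhost:3000/crawler/sources \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://lightrail.microcosm.blue", "mode": "by_collection"}'
```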
## DELETE /crawler/sources

remove a crawler source at runtime.

| field | description |
| :--- | :--- |
| `url` | URL of the source to remove |

the producer task is stopped immediately.
## DELETE /crawler/cursors

reset stored cursors for a given crawler URL.

| field | description |
| :--- | :--- |
| `key` | URL of the crawler source to reset |

clears the list-repos crawler cursor as well as any by-collection cursors associated with that URL. causes the next crawler pass to restart from the beginning.
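for example (assuming the default `API_PORT` of `3000`), forcing a full re-crawl from a source:

```shell
# clear all cursors for this crawler URL so the next pass starts over
curl -X DELETE http://localhost:3000/crawler/cursors \
  -H 'Content-Type: application/json' \
  -d '{"key": "https://lightrail.microcosm.blue"}'
```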
# docs/api/firehose.md
## POST /firehose/sources

add a firehose source at runtime.

| field | description |
| :--- | :--- |
| `url` | URL of the firehose source |
| `is_pds` | whether the source is a direct PDS connection (default `false`) |

the source is persisted to the database before the ingestor task is started.
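as a sketch, assuming the default `API_PORT` of `3000` and a hypothetical PDS hostname:

```shell
# subscribe directly to a PDS firehose rather than a relay
curl -X POST http://localhost:3000/firehose/sources \
  -H 'Content-Type: application/json' \
  -d '{"url": "wss://pds.example.com", "is_pds": true}'
```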
## DELETE /firehose/sources

remove a firehose relay at runtime.

| field | description |
| :--- | :--- |
| `url` | URL of the source to remove |

the ingestor task is stopped immediately.
## DELETE /firehose/cursors

reset the stored cursor for a given firehose relay URL.

| field | description |
| :--- | :--- |
| `key` | URL of the firehose source to reset |

causes the next firehose connection to restart from the beginning.
# docs/api/ingestion.md
## PATCH /ingestion

enable or disable ingestion components at runtime without restarting. only provided fields are updated.

| field | description |
| :--- | :--- |
| `crawler` | enable or disable the crawler |
| `firehose` | enable or disable the firehose |
| `backfill` | enable or disable the backfill worker |

when disabled, each component finishes its current task before pausing (e.g. the backfill worker completes any in-flight repo syncs, the firehose finishes processing the current message). they resume immediately when re-enabled.
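since only provided fields are updated, a partial body works, e.g. (assuming the default `API_PORT` of `3000`):

```shell
# pause backfill while leaving the crawler and firehose untouched
curl -X PATCH http://localhost:3000/ingestion \
  -H 'Content-Type: application/json' \
  -d '{"backfill": false}'
```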
# docs/api/pds.md
## PUT /pds/tiers

assign a PDS to a named rate tier.

| field | description |
| :--- | :--- |
| `host` | PDS hostname (e.g. `pds.example.com`) |
| `tier` | name of the rate tier to assign; returns `400` if unknown |

assignments are persisted to the database and survive restarts. re-assigning the same host updates the tier in place without creating a duplicate.
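for example (default `API_PORT` of `3000`, hypothetical hostname):

```shell
# put a busy PDS on the built-in trusted tier
curl -X PUT http://localhost:3000/pds/tiers \
  -H 'Content-Type: application/json' \
  -d '{"host": "pds.example.com", "tier": "trusted"}'
```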
## DELETE /pds/tiers

remove an explicit tier assignment for a PDS. query parameter:

| param | description |
| :--- | :--- |
| `host` | PDS hostname (e.g. `?host=pds.example.com`) |

reverts the host to glob-rule resolution (not necessarily `default`; a matching `TIER_RULES` pattern still applies).
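note the parameter goes in the query string rather than a JSON body, e.g. (default `API_PORT` of `3000`, hypothetical hostname):

```shell
# drop the explicit assignment; glob rules (or default) apply again
curl -X DELETE 'http://localhost:3000/pds/tiers?host=pds.example.com'
```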
# docs/configuration.md
| variable | default | description |
| :--- | :--- | :--- |
| `DATABASE_PATH` | `./hydrant.db` | path to the database folder |
| `RUST_LOG` | `info` | log filter directives (e.g., `debug`, `hydrant=trace`). [tracing env-filter syntax](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html) |
| `API_PORT` | `3000` | port for the API server |
| `ENABLE_DEBUG` | `false` | enable debug endpoints |
| `DEBUG_PORT` | `API_PORT + 1` | port for debug endpoints (if enabled) |
## indexing mode

| variable | default | description |
| :--- | :--- | :--- |
| `FULL_NETWORK` | `false` (indexer), `true` (relay) | if `true`, discover and index all repos in the network |
| `EPHEMERAL` | `false` (indexer), `true` (relay) | if enabled, no records are stored (in indexer mode). events are deleted after `EPHEMERAL_TTL` |
| `EPHEMERAL_TTL` | `60min`, `3d` (relay) | how long to keep events before deletion |
| `ONLY_INDEX_LINKS` | `false` | indexer only. if enabled, record blocks are not stored; only the index (records, counts, events) is kept. `getRecord`, `listRecords`, and `getRepo` will return errors. the event stream still works, but create/update events will not include record values |
## filter

| variable | default | description |
| :--- | :--- | :--- |
| `FILTER_SIGNALS` | | comma-separated list of NSID patterns that trigger auto-discovery in filter mode (e.g. `app.bsky.feed.post,app.bsky.graph.*`) |
| `FILTER_COLLECTIONS` | | comma-separated list of NSID patterns limiting which records are stored. empty = store all |
| `FILTER_EXCLUDES` | | comma-separated list of DIDs to exclude from indexing |
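as a sketch of a filtered deployment (using the `HYDRANT_` variable prefix seen in the quick start; the collection choices are illustrative):

```shell
# discover repos via their posts, but store posts and likes
export HYDRANT_FILTER_SIGNALS=app.bsky.feed.post
export HYDRANT_FILTER_COLLECTIONS=app.bsky.feed.post,app.bsky.feed.like
```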
## firehose

| variable | default | description |
| :--- | :--- | :--- |
| `RELAY_HOST` | `wss://relay.fire.hose.cam/` (indexer), empty (relay) | URL of a single firehose source |
| `RELAY_HOSTS` | | comma-separated list of firehose sources. if unset, falls back to `RELAY_HOST`. prefix a URL with `pds::` to mark it as a direct PDS connection (e.g. `pds::wss://pds.example.com`); bare URLs are treated as relays. defaults to empty in relay mode, where PDSs are expected to be seeded via `SEED_HOSTS` or the firehose management API |
| `SEED_HOSTS` | `https://bsky.network` (relay) | comma-separated list of base URLs to call `com.atproto.sync.listHosts` on at startup. hydrant adds every non-banned host as a PDS firehose source |
| `ENABLE_FIREHOSE` | `true` | whether to ingest relay subscriptions |
| `FIREHOSE_WORKERS` | `8` (`24` full network) | number of concurrent workers for firehose events |
| `CURSOR_SAVE_INTERVAL` | `3sec` | how often to persist the firehose cursor |
## crawler

| variable | default | description |
| :--- | :--- | :--- |
| `CRAWLER_URLS` | relay hosts (full network), `https://lightrail.microcosm.blue` (filter) | comma-separated list of `[mode::]url` crawler sources. mode is `relay` or `by_collection`; bare URLs use the default mode. set to an empty string to disable crawling |
| `ENABLE_CRAWLER` | `true` if full network or sources configured | whether to actively query the network for unknown repositories |
| `CRAWLER_MAX_PENDING_REPOS` | `2000` | max pending repos before the crawler pauses |
| `CRAWLER_RESUME_PENDING_REPOS` | `1000` | pending-repo count at which the crawler resumes |
## backfill

| variable | default | description |
| :--- | :--- | :--- |
| `BACKFILL_CONCURRENCY_LIMIT` | `16` (`64` full network) | maximum number of concurrent backfill tasks |
| `REPO_FETCH_TIMEOUT` | `5min` | timeout for fetching a repository |
| `VERIFY_SIGNATURES` | `full` | signature verification level: `full`, `backfill-only`, or `none` |
| `PLC_URL` | `https://plc.wtf`, `https://plc.directory` (full network) | base URL(s) of the PLC directory, comma-separated |
| `IDENTITY_CACHE_SIZE` | `100000` | number of identity entries to cache in memory |
## performance

| variable | default | description |
| :--- | :--- | :--- |
| `CACHE_SIZE` | `256` | size of the database cache in MB |
## rate limiting (relay mode)

| variable | default | description |
| :--- | :--- | :--- |
| `NEW_HOST_LIMIT` | `50` | how many new hosts can be added via `com.atproto.sync.requestCrawl` per day |
| `RATE_TIERS` | | comma-separated list of named rate tier definitions in `name:base/mul/hourly/daily[/account_limit]` format (e.g. `trusted:5000/10.0/18000000/432000000/10000000`). the optional account limit prevents new accounts from being created on a PDS once reached. built-in tiers (`default`, `trusted`) are always present and can be overridden |
| `TIER_RULES` | | comma-separated ordered list of glob rules in `pattern:tier_name` format (e.g. `*.bsky.network:trusted`). rules are evaluated in order; first match wins. explicit API assignments via `PUT /pds/tiers` take precedence; the `default` tier is the final fallback. uses standard glob wildcards (`*`, `?`) matched against the PDS hostname |
7474+| `TIER_RULES` | | comma-separated ordered list of glob rules in `pattern:tier_name` format (e.g. `*.bsky.network:trusted`). rules are evaluated in order; first match wins. explicit API assignments via `PUT /pds/tiers` take precedence; the `default` tier is the final fallback. uses standard glob wildcards (`*`, `?`) matched against the PDS hostname |