very fast AT Protocol indexer with flexible filtering, XRPC queries, a cursor-backed event stream, and more, built on fjall

[docs] add request body tables, restore config descriptions from readme

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

dawn 4489948e af546208

+95 -54
+22 -24
docs/README.md
- # hydrant
-
  `hydrant` is an AT Protocol indexer built on the `fjall` database. it's built to be flexible, supporting both full-network indexing and filtered indexing (e.g., by DID), allowing querying with XRPCs (not only `com.atproto.*`!), providing an ordered event stream, etc. oh and it can also act as a relay!

  you can see [random.wisp.place](https://tangled.org/did:plc:dfl62fgb7wtjj3fcbb72naae/random.wisp.place) (standalone binary using http API) or the [statusphere example](../examples/statusphere.rs) (hydrant-as-library) for examples. for rust docs look at https://hydrant.klbr.net/ for now.

  **WARNING: *the db format is only partially stable.*** we provide migrations in hydrant itself, so nothing should go wrong! you should still probably keep backups just in case!

  ## what's here

  - [getting started](getting-started.md): building, running, reverse proxying
  - [configuration](configuration.md): all environment variables
  - [build features](build-features.md): optional cargo features (`relay`, `backlinks`, etc.)
  - [concepts](concepts/README.md): how the stream works, relay comparison, multi-relay support
  - [rest api](api/README.md): management API reference
  - [xrpc](xrpc/README.md): data access via XRPC

  ## quick start

  ```bash
  cargo build --release
  export HYDRANT_DATABASE_PATH=./hydrant.db
  ./target/release/hydrant
  ```
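For a sense of what "querying with XRPCs" looks like against a running instance, here is a minimal sketch. It assumes the XRPC endpoints are served on the same default `API_PORT` (3000) as the management API, and the DID and collection values are placeholders, not real data; the docs only confirm that `getRecord`, `listRecords`, and `getRepo` are available.

```bash
# hypothetical query against a local hydrant instance on the default API port;
# repo/collection values are illustrative placeholders
curl "http://localhost:3000/xrpc/com.atproto.repo.listRecords?repo=did:plc:example&collection=app.bsky.feed.post&limit=5"
```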
+18 -3
docs/api/crawler.md
  ## POST /crawler/sources

- add a crawler source at runtime. body: `{ "url": string, "mode": "relay" | "by_collection" }`.
+ add a crawler source at runtime.
+
+ | field | description |
+ | :--- | :--- |
+ | `url` | URL of the crawler source |
+ | `mode` | `"relay"` or `"by_collection"` |

  the source is written to the database before the producer task is started, so it is safe to add sources and then immediately restart without losing them.

···

  ## DELETE /crawler/sources

- remove a crawler source at runtime. body: `{ "url": string }`.
+ remove a crawler source at runtime.
+
+ | field | description |
+ | :--- | :--- |
+ | `url` | URL of the source to remove |

  the producer task is stopped immediately.

···

  ## DELETE /crawler/cursors

- reset stored cursors for a given crawler URL. body: `{ "key": "..." }` where key is a URL. clears the list-repos crawler cursor as well as any by-collection cursors associated with that URL. causes the next crawler pass to restart from the beginning.
+ reset stored cursors for a given crawler URL.
+
+ | field | description |
+ | :--- | :--- |
+ | `key` | URL of the crawler source to reset |
+
+ clears the list-repos crawler cursor as well as any by-collection cursors associated with that URL. causes the next crawler pass to restart from the beginning.
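For reference, the body shapes the old prose described (and the new tables document) map to plain curl calls. This is a sketch assuming the management API listens on the default `API_PORT` (3000) and accepts JSON; the source URL is just the default filter-mode crawler from the configuration docs, used here as an example.

```bash
# add a by_collection crawler source, then remove it again
curl -X POST http://localhost:3000/crawler/sources \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://lightrail.microcosm.blue", "mode": "by_collection"}'

curl -X DELETE http://localhost:3000/crawler/sources \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://lightrail.microcosm.blue"}'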
+18 -3
docs/api/firehose.md
  ## POST /firehose/sources

- add a firehose source at runtime. body: `{ "url": string, "is_pds": bool }`. `is_pds` defaults to `false`.
+ add a firehose source at runtime.
+
+ | field | description |
+ | :--- | :--- |
+ | `url` | URL of the firehose source |
+ | `is_pds` | whether the source is a direct PDS connection (default `false`) |

  the source is persisted to the database before the ingestor task is started.

···

  ## DELETE /firehose/sources

- remove a firehose relay at runtime. body: `{ "url": string }`.
+ remove a firehose relay at runtime.
+
+ | field | description |
+ | :--- | :--- |
+ | `url` | URL of the source to remove |

  the ingestor task is stopped immediately.

···

  ## DELETE /firehose/cursors

- reset the stored cursor for a given firehose relay URL. body: `{ "key": "..." }` where key is a URL. causes the next firehose connection to restart from the beginning.
+ reset the stored cursor for a given firehose relay URL.
+
+ | field | description |
+ | :--- | :--- |
+ | `key` | URL of the firehose source to reset |
+
+ causes the next firehose connection to restart from the beginning.
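A matching sketch for the firehose endpoints, again assuming the default `API_PORT` (3000) and JSON bodies; the PDS host is a hypothetical example, not a documented default.

```bash
# register a direct PDS firehose source, then reset its stored cursor
curl -X POST http://localhost:3000/firehose/sources \
  -H 'Content-Type: application/json' \
  -d '{"url": "wss://pds.example.com", "is_pds": true}'

curl -X DELETE http://localhost:3000/firehose/cursors \
  -H 'Content-Type: application/json' \
  -d '{"key": "wss://pds.example.com"}'
```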
+7 -1
docs/api/ingestion.md
  ## PATCH /ingestion

- enable or disable ingestion components at runtime without restarting. body: `{ "crawler"?: bool, "firehose"?: bool, "backfill"?: bool }`. only provided fields are updated.
+ enable or disable ingestion components at runtime without restarting. only provided fields are updated.
+
+ | field | description |
+ | :--- | :--- |
+ | `crawler` | enable or disable the crawler |
+ | `firehose` | enable or disable the firehose |
+ | `backfill` | enable or disable the backfill worker |

  when disabled, each component finishes its current task before pausing (e.g. the backfill worker completes any in-flight repo syncs, the firehose finishes processing the current message). they resume immediately when re-enabled.
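Since only provided fields are updated, a partial body is enough to toggle a single component. A sketch, again assuming the default `API_PORT` (3000):

```bash
# pause just the backfill worker, leaving crawler and firehose untouched, then resume it
curl -X PATCH http://localhost:3000/ingestion \
  -H 'Content-Type: application/json' \
  -d '{"backfill": false}'

curl -X PATCH http://localhost:3000/ingestion \
  -H 'Content-Type: application/json' \
  -d '{"backfill": true}'
```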
+10 -3
docs/api/pds.md
  ## PUT /pds/tiers

- assign a PDS to a named rate tier. body: `{ "host": string, "tier": string }`.
+ assign a PDS to a named rate tier.

- `host` is the PDS hostname (e.g. `pds.example.com`). `tier` must be one of the configured tier names; returns `400` if unknown.
+ | field | description |
+ | :--- | :--- |
+ | `host` | PDS hostname (e.g. `pds.example.com`) |
+ | `tier` | name of the rate tier to assign; returns `400` if unknown |

  assignments are persisted to the database and survive restarts. re-assigning the same host updates the tier in place without creating a duplicate.

  ## DELETE /pds/tiers

- remove an explicit tier assignment for a PDS. query parameter: `?host=<hostname>` (e.g. `?host=pds.example.com`).
+ remove an explicit tier assignment for a PDS. query parameter:
+
+ | param | description |
+ | :--- | :--- |
+ | `host` | PDS hostname (e.g. `?host=pds.example.com`) |

  reverts the host to glob-rule resolution (not necessarily `default`; a matching `TIER_RULES` pattern still applies).
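The tier endpoints use a JSON body for assignment and a query parameter for removal, as documented above. A sketch assuming the default `API_PORT` (3000); `trusted` is one of the built-in tiers named in the configuration docs and the hostname is the docs' own example value.

```bash
# pin a PDS to the built-in "trusted" tier, then drop the explicit assignment
# so glob-rule resolution applies again
curl -X PUT http://localhost:3000/pds/tiers \
  -H 'Content-Type: application/json' \
  -d '{"host": "pds.example.com", "tier": "trusted"}'

curl -X DELETE 'http://localhost:3000/pds/tiers?host=pds.example.com'
```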
+20 -20
docs/configuration.md
  | variable | default | description |
  | :--- | :--- | :--- |
  | `DATABASE_PATH` | `./hydrant.db` | path to the database folder |
- | `RUST_LOG` | `info` | log filter ([tracing env-filter syntax](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html)) |
+ | `RUST_LOG` | `info` | log filter directives (e.g., `debug`, `hydrant=trace`). [tracing env-filter syntax](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html) |
  | `API_PORT` | `3000` | port for the API server |
  | `ENABLE_DEBUG` | `false` | enable debug endpoints |
- | `DEBUG_PORT` | `API_PORT + 1` | port for debug endpoints |
+ | `DEBUG_PORT` | `API_PORT + 1` | port for debug endpoints (if enabled) |

  ## indexing mode

  | variable | default | description |
  | :--- | :--- | :--- |
  | `FULL_NETWORK` | `false` (indexer), `true` (relay) | if `true`, discover and index all repos in the network |
- | `EPHEMERAL` | `false` (indexer), `true` (relay) | if enabled, no records are stored; events are deleted after `EPHEMERAL_TTL` |
+ | `EPHEMERAL` | `false` (indexer), `true` (relay) | if enabled, no records are stored (in indexer mode). events are deleted after a certain duration (`EPHEMERAL_TTL`) |
  | `EPHEMERAL_TTL` | `60min`, `3d` (relay) | how long to keep events before deletion |
- | `ONLY_INDEX_LINKS` | `false` | don't store record blocks, only the index. `getRecord`, `listRecords`, and `getRepo` will fail; the event stream still works but create/update events won't include record values |
+ | `ONLY_INDEX_LINKS` | `false` | indexer only. if enabled, record blocks are not stored, only the index (records, counts, events) is kept. `getRecord`, `listRecords`, and `getRepo` will return errors. the event stream still works but create/update events will not include record values |

  ## filter

  | variable | default | description |
  | :--- | :--- | :--- |
- | `FILTER_SIGNALS` | | comma-separated NSID patterns triggering auto-discovery in filter mode (e.g. `app.bsky.feed.post,app.bsky.graph.*`) |
- | `FILTER_COLLECTIONS` | | comma-separated NSID patterns limiting which records are stored. empty = store all |
- | `FILTER_EXCLUDES` | | comma-separated DIDs to always skip |
+ | `FILTER_SIGNALS` | | comma-separated list of NSID patterns to use for the filter (e.g. `app.bsky.feed.post,app.bsky.graph.*`) |
+ | `FILTER_COLLECTIONS` | | comma-separated list of NSID patterns to use for the collections filter. empty = store all |
+ | `FILTER_EXCLUDES` | | comma-separated list of DIDs to exclude from indexing |

  ## firehose

  | variable | default | description |
  | :--- | :--- | :--- |
- | `RELAY_HOST` | `wss://relay.fire.hose.cam/` (indexer), empty (relay) | single firehose source URL |
- | `RELAY_HOSTS` | | comma-separated firehose sources. prefix with `pds::` for direct PDS connections. overrides `RELAY_HOST` |
- | `SEED_HOSTS` | `https://bsky.network` (relay) | relay URLs to call `com.atproto.sync.listHosts` on at startup, adding every non-banned PDS as a firehose source |
+ | `RELAY_HOST` | `wss://relay.fire.hose.cam/` (indexer), empty (relay) | URL of a single firehose source |
+ | `RELAY_HOSTS` | | comma-separated list of firehose sources. if unset, falls back to `RELAY_HOST`. prefix a URL with `pds::` to mark it as a direct PDS connection (e.g. `pds::wss://pds.example.com`). bare URLs are treated as relays. defaults to empty in relay mode; PDS' are expected to be seeded via `SEED_HOSTS` or the firehose management API |
+ | `SEED_HOSTS` | `https://bsky.network` (relay) | comma-separated list of base URLs to call `com.atproto.sync.listHosts` on at startup. hydrant adds every non-banned host as a PDS firehose source |
  | `ENABLE_FIREHOSE` | `true` | whether to ingest relay subscriptions |
- | `FIREHOSE_WORKERS` | `8` (`24` full network) | concurrent workers for firehose events |
+ | `FIREHOSE_WORKERS` | `8` (`24` full network) | number of concurrent workers for firehose events |
  | `CURSOR_SAVE_INTERVAL` | `3sec` | how often to persist the firehose cursor |

  ## crawler

  | variable | default | description |
  | :--- | :--- | :--- |
- | `CRAWLER_URLS` | relay hosts (full network), `https://lightrail.microcosm.blue` (filter) | comma-separated `[mode::]url` crawler sources |
- | `ENABLE_CRAWLER` | `true` if full network or sources configured | whether to actively query the network |
+ | `CRAWLER_URLS` | relay hosts (full network), `https://lightrail.microcosm.blue` (filter) | comma-separated list of `[mode::]url` crawler sources. mode is `relay` or `by_collection`; bare URLs use the default mode. set to empty string to disable crawling |
+ | `ENABLE_CRAWLER` | `true` if full network or sources configured | whether to actively query the network for unknown repositories |
  | `CRAWLER_MAX_PENDING_REPOS` | `2000` | max pending repos before the crawler pauses |
  | `CRAWLER_RESUME_PENDING_REPOS` | `1000` | pending-repo count at which the crawler resumes |

···

  | variable | default | description |
  | :--- | :--- | :--- |
- | `BACKFILL_CONCURRENCY_LIMIT` | `16` (`64` full network) | max concurrent backfill tasks |
+ | `BACKFILL_CONCURRENCY_LIMIT` | `16` (`64` full network) | maximum number of concurrent backfill tasks |
  | `REPO_FETCH_TIMEOUT` | `5min` | timeout for fetching a repository |
- | `VERIFY_SIGNATURES` | `full` | signature verification: `full`, `backfill-only`, or `none` |
- | `PLC_URL` | `https://plc.wtf`, `https://plc.directory` (full network) | PLC directory base URL(s), comma-separated |
+ | `VERIFY_SIGNATURES` | `full` | signature verification level: `full`, `backfill-only`, or `none` |
+ | `PLC_URL` | `https://plc.wtf`, `https://plc.directory` (full network) | base URL(s) of the PLC directory, comma-separated |
  | `IDENTITY_CACHE_SIZE` | `100000` | number of identity entries to cache in memory |

  ## performance

  | variable | default | description |
  | :--- | :--- | :--- |
- | `CACHE_SIZE` | `256` | database cache size in MB |
+ | `CACHE_SIZE` | `256` | size of the database cache in MB |

  ## rate limiting (relay mode)

  | variable | default | description |
  | :--- | :--- | :--- |
- | `NEW_HOST_LIMIT` | `50` | max new hosts addable via `com.atproto.sync.requestCrawl` per day |
- | `RATE_TIERS` | | comma-separated tier definitions in `name:base/mul/hourly/daily[/account_limit]` format |
- | `TIER_RULES` | | comma-separated ordered glob rules in `pattern:tier_name` format; first match wins |
+ | `NEW_HOST_LIMIT` | `50` | in relay mode, how many new hosts can be added via `com.atproto.sync.requestCrawl` per day |
+ | `RATE_TIERS` | | comma-separated list of named rate tier definitions in `name:base/mul/hourly/daily[/account_limit]` format (e.g. `trusted:5000/10.0/18000000/432000000/10000000`). the optional account limit prevents new accounts from being created on a PDS once reached. built-in tiers (`default`, `trusted`) are always present and can be overridden |
+ | `TIER_RULES` | | comma-separated ordered list of glob rules in `pattern:tier_name` format (e.g. `*.bsky.network:trusted`). rules are evaluated in order; first match wins. explicit API assignments via `PUT /pds/tiers` take precedence; the `default` tier is the final fallback. uses standard glob wildcards (`*`, `?`) matched against the PDS hostname |
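Taken together, the restored `RELAY_HOSTS`, `RATE_TIERS`, and `TIER_RULES` descriptions imply a relay-mode setup along these lines. This is a sketch using the bare variable names from the table (the README quick start prefixes at least `DATABASE_PATH` with `HYDRANT_`, so the exact prefix may need adjusting); the hosts, tier numbers, and glob rule are the examples given in the table itself, not recommended values.

```bash
# illustrative relay-mode environment, values taken from the table's own examples
export RELAY_HOSTS="wss://relay.fire.hose.cam/,pds::wss://pds.example.com"
export SEED_HOSTS="https://bsky.network"
# name:base/mul/hourly/daily[/account_limit]
export RATE_TIERS="trusted:5000/10.0/18000000/432000000/10000000"
# ordered glob rules; first match wins, explicit PUT /pds/tiers assignments override
export TIER_RULES="*.bsky.network:trusted"
export NEW_HOST_LIMIT=50
```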