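---
title: configuration
---

hydrant is configured via environment variables, all prefixed with `HYDRANT_` (except `RUST_LOG`). the tables below list each variable without the prefix, so `DATABASE_PATH` is set as `HYDRANT_DATABASE_PATH`. a `.env` file in the working directory is loaded automatically.
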
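as a minimal sketch of the prefix rule, the shell exports below set two variables before starting hydrant. the values are illustrative: the database path is the documented default, and the log filter is just one valid env-filter directive list. the same `KEY=value` pairs can go in `.env` instead, though quoting rules there depend on the loader.

```sh
# every hydrant variable carries the HYDRANT_ prefix...
export HYDRANT_DATABASE_PATH='./hydrant.db'
# ...except RUST_LOG, which is read unprefixed
export RUST_LOG='info,hydrant=debug'
```
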
## core

| variable | default | description |
| :--- | :--- | :--- |
| `DATABASE_PATH` | `./hydrant.db` | path to the database folder |
| `RUST_LOG` | `info` | log filter directives (e.g. `debug`, `hydrant=trace`), in [tracing env-filter syntax](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html) |
| `API_BIND` | `0.0.0.0:3000,[::]:3000` | comma-separated list of `<ip>:<port>` socket addresses to bind the API server to. literal IPs only (hostnames are not resolved). when an ipv4 and an ipv6 entry share the same port, the v6 listener is set to v6-only to avoid a bind collision; a lone `[::]:<port>` listens dual-stack |
| `ENABLE_DEBUG` | `false` | enable debug endpoints |
| `DEBUG_PORT` | first `API_BIND` port + 1 | port for debug endpoints (if enabled) |

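the `API_BIND` rules are easier to see with concrete values. a sketch, using loopback addresses and port 8080 purely as examples:

```sh
# ipv4 and ipv6 loopback on the same port: per the table, the v6 listener
# is set to v6-only so the two binds do not collide
export HYDRANT_API_BIND='127.0.0.1:8080,[::1]:8080'

# alternative: a lone wildcard v6 entry listens dual-stack
# export HYDRANT_API_BIND='[::]:8080'

# debug endpoints are off by default; with no DEBUG_PORT set, the default
# is the first API_BIND port + 1, which is 8081 here
export HYDRANT_ENABLE_DEBUG='true'
```
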
## indexing mode

| variable | default | description |
| :--- | :--- | :--- |
| `FULL_NETWORK` | `false` (indexer), `true` (relay) | if `true`, discover and index all repos in the network |
| `EPHEMERAL` | `false` (indexer), `true` (relay) | if enabled, no records are stored (in indexer mode). events are deleted after a certain duration (`EPHEMERAL_TTL`) |
| `EPHEMERAL_TTL` | `60min` (indexer), `3d` (relay) | how long to keep events before deletion |
| `ONLY_INDEX_LINKS` | `false` | indexer only. if enabled, record blocks are not stored, only the index (records, counts, events) is kept. `getRecord`, `listRecords`, and `getRepo` will return errors. the event stream still works but create/update events will not include record values |

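a sketch of an ephemeral indexer, assuming the boolean variables accept `true`/`false` as written in the defaults column. `3d` is borrowed from the relay default; other duration spellings are not documented here.

```sh
# store no records and delete events after three days
export HYDRANT_EPHEMERAL='true'
export HYDRANT_EPHEMERAL_TTL='3d'

# alternative for a persistent indexer: keep the index but drop record blocks.
# getRecord, listRecords, and getRepo will then return errors
# export HYDRANT_ONLY_INDEX_LINKS='true'
```
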
## filter

| variable | default | description |
| :--- | :--- | :--- |
| `FILTER_SIGNALS` | | comma-separated list of NSID patterns to use for the filter (e.g. `app.bsky.feed.post,app.bsky.graph.*`) |
| `FILTER_COLLECTIONS` | | comma-separated list of NSID patterns to use for the collections filter. empty = store all |
| `FILTER_EXCLUDES` | | comma-separated list of DIDs to exclude from indexing |

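an illustrative filter setup. the NSID patterns come from the table's own example; the DIDs are placeholders, not real accounts. the single quotes keep the shell from touching `*`.

```sh
# NSID patterns used for the filter
export HYDRANT_FILTER_SIGNALS='app.bsky.feed.post,app.bsky.graph.*'
# NSID patterns for the collections filter; leave empty to store all
export HYDRANT_FILTER_COLLECTIONS='app.bsky.feed.post'
# DIDs to exclude from indexing (placeholder values)
export HYDRANT_FILTER_EXCLUDES='did:plc:exampleexampleexample111,did:plc:exampleexampleexample222'
```
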
## firehose

| variable | default | description |
| :--- | :--- | :--- |
| `RELAY_HOST` | `wss://relay.fire.hose.cam/` (indexer), empty (relay) | URL of a single firehose source |
| `RELAY_HOSTS` | | comma-separated list of firehose sources. if unset, falls back to `RELAY_HOST`. prefix a URL with `pds::` to mark it as a direct PDS connection (e.g. `pds::wss://pds.example.com`). bare URLs are treated as relays. defaults to empty in relay mode; PDSes are expected to be seeded via `SEED_HOSTS` or the firehose management API |
| `SEED_HOSTS` | `https://bsky.network` (relay) | comma-separated list of base URLs to call `com.atproto.sync.listHosts` on at startup. hydrant adds every non-banned host as a PDS firehose source |
| `ENABLE_FIREHOSE` | `true` | whether to ingest relay subscriptions |
| `FIREHOSE_WORKERS` | `8` (`24` full network) | number of concurrent workers for firehose events |
| `CURSOR_SAVE_INTERVAL` | `3sec` | how often to persist the firehose cursor |

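a sketch mixing a relay and a direct PDS source in `RELAY_HOSTS`. the relay URL is the documented indexer default and `pds.example.com` is the table's placeholder host.

```sh
# one relay (bare URL) plus one direct PDS connection (pds:: prefix)
export HYDRANT_RELAY_HOSTS='wss://relay.fire.hose.cam/,pds::wss://pds.example.com'
# persist the firehose cursor every 3 seconds (the documented default)
export HYDRANT_CURSOR_SAVE_INTERVAL='3sec'
```
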
## crawler

| variable | default | description |
| :--- | :--- | :--- |
| `CRAWLER_URLS` | relay hosts (full network), `https://lightrail.microcosm.blue` (filter) | comma-separated list of `[mode::]url` crawler sources. mode is `relay` or `by_collection`; bare URLs use the default mode. set to empty string to disable crawling |
| `ENABLE_CRAWLER` | `true` if full network or sources configured | whether to actively query the network for unknown repositories |
| `CRAWLER_MAX_PENDING_REPOS` | `2000` | max pending repos before the crawler pauses |
| `CRAWLER_RESUME_PENDING_REPOS` | `1000` | pending-repo count at which the crawler resumes |

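`CRAWLER_MAX_PENDING_REPOS` and `CRAWLER_RESUME_PENDING_REPOS` read as a high/low watermark pair: the crawler pauses at the first and resumes at the second, so the resume value presumably belongs below the max. a sketch with doubled thresholds and two hypothetical sources (both `example.com` hosts are placeholders):

```sh
# one source per mode, written as mode::url
export HYDRANT_CRAWLER_URLS='relay::https://relay.example.com,by_collection::https://index.example.com'
# pause at 4000 pending repos, resume at 2000
export HYDRANT_CRAWLER_MAX_PENDING_REPOS='4000'
export HYDRANT_CRAWLER_RESUME_PENDING_REPOS='2000'

# to disable crawling instead, set the list to an empty string
# export HYDRANT_CRAWLER_URLS=''
```
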
## backfill & identity

| variable | default | description |
| :--- | :--- | :--- |
| `BACKFILL_CONCURRENCY_LIMIT` | `16` (`64` full network) | maximum number of concurrent backfill tasks |
| `REPO_FETCH_TIMEOUT` | `5min` | timeout for fetching a repository |
| `VERIFY_SIGNATURES` | `full` | signature verification level: `full`, `backfill-only`, or `none` |
| `PLC_URL` | `https://plc.wtf`, `https://plc.directory` (full network) | base URL(s) of the PLC directory, comma-separated |
| `IDENTITY_CACHE_SIZE` | `100000` | number of identity entries to cache in memory |

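a sketch of overriding the backfill and identity defaults. the concurrency value is an arbitrary example; the PLC URLs are the two that appear in the defaults column.

```sh
# one of the three documented levels: full, backfill-only, none
export HYDRANT_VERIFY_SIGNATURES='backfill-only'
# more than one PLC directory, comma-separated
export HYDRANT_PLC_URL='https://plc.wtf,https://plc.directory'
export HYDRANT_BACKFILL_CONCURRENCY_LIMIT='32'
```
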
## performance

| variable | default | description |
| :--- | :--- | :--- |
| `CACHE_SIZE` | `256` | size of the database cache in MB |

## rate limiting (relay mode)

| variable | default | description |
| :--- | :--- | :--- |
| `NEW_HOST_LIMIT` | `50` | in relay mode, how many new hosts can be added via `com.atproto.sync.requestCrawl` per day |
| `RATE_TIERS` | | comma-separated list of named rate tier definitions in `name:base/mul/hourly/daily[/account_limit]` format (e.g. `trusted:5000/10.0/18000000/432000000/10000000`). the optional account limit prevents new accounts from being created on a PDS once reached. built-in tiers (`default`, `trusted`) are always present and can be overridden |
| `TIER_RULES` | | comma-separated ordered list of glob rules in `pattern:tier_name` format (e.g. `*.bsky.network:trusted`). rules are evaluated in order; first match wins. explicit API assignments via `PUT /pds/tiers` take precedence; the `default` tier is the final fallback. uses standard glob wildcards (`*`, `?`) matched against the PDS hostname |
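
a sketch of tiers and rules together. the `trusted` definition and the `*.bsky.network` rule are the table's own examples; the `community` tier, its numbers, and the `example.com` hosts are made up for illustration.

```sh
# format: name:base/mul/hourly/daily[/account_limit]
export HYDRANT_RATE_TIERS='trusted:5000/10.0/18000000/432000000/10000000,community:1000/2.0/3600000/86400000'

# rules are checked in order and the first match wins, so the narrower
# pattern goes first: pds-1.example.com gets community, while
# other.example.com falls through to the *.example.com rule
export HYDRANT_TIER_RULES='*.bsky.network:trusted,pds-?.example.com:community,*.example.com:default'
```

hosts that match no rule and have no explicit `PUT /pds/tiers` assignment end up on the `default` tier.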