very fast AT Protocol indexer with flexible filtering, XRPC queries, a cursor-backed event stream, and more, built on fjall
rust fjall at-protocol atproto indexer

[docs] initial documentation + add verbiage to devshell

Splits the README into structured wiki-ready markdown under docs/:
- docs/README.md — overview and index
- docs/getting-started.md — building, running, reverse proxying
- docs/configuration.md — all env vars
- docs/build-features.md — cargo feature flags
- docs/concepts/{vs-tap,relay}.md — stream behavior and relay/seeding concepts
- docs/api/{filter,ingestion,crawler,firehose,pds,repos,database}.md — REST API
- docs/xrpc/{atproto,hydrant,backlinks}.md — XRPC reference

flake.nix: adds github:90-008/verbiage as an input and includes the
verbiage package in the devshell so `verbiage docs/` can be used to
develop docs locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

dawn 9f5c0046 d6d96734

+506 -1
+24
docs/README.md
+ # hydrant
+
+ `hydrant` is an AT Protocol indexer built on the `fjall` database. it's built to be flexible, supporting both full-network indexing and filtered indexing (e.g., by DID), allowing querying with XRPCs (not only `com.atproto.*`!), providing an ordered event stream, etc. oh and it can also act as a relay!
+
+ you can see [random.wisp.place](https://tangled.org/did:plc:dfl62fgb7wtjj3fcbb72naae/random.wisp.place) (standalone binary using http API) or the [statusphere example](../examples/statusphere.rs) (hydrant-as-library) for examples. for rust docs look at https://hydrant.klbr.net/ for now.
+
+ **WARNING: *the db format is only partially stable.*** we provide migrations in hydrant itself, so nothing should go wrong! you should still probably keep backups just in case!
+
+ ## what's here
+
+ - [getting started](getting-started.md): building, running, reverse proxying
+ - [configuration](configuration.md): all environment variables
+ - [build features](build-features.md): optional cargo features (`relay`, `backlinks`, etc.)
+ - [concepts](concepts/README.md): how the stream works, relay comparison, multi-relay support
+ - [rest api](api/README.md): management API reference
+ - [xrpc](xrpc/README.md): data access via XRPC
+
+ ## quick start
+
+ ```bash
+ cargo build --release
+ export HYDRANT_DATABASE_PATH=./hydrant.db
+ ./target/release/hydrant
+ ```
+19
docs/api/README.md
+ # rest api
+
+ hydrant's REST API is split into public endpoints (safe to expose) and management endpoints (keep private). see [getting started](../getting-started.md#reverse-proxying) for guidance on what to expose.
+
+ ## public
+
+ - `GET /stream`: subscribe to the event stream. query params: `cursor` (optional, start from a specific event ID).
+ - `GET /stats`: get stats about the database (counts of repos, records, events; sizes of keyspaces on disk).
+ - `GET /health` / `GET /_health`: health check.
+
+ ## management
+
+ - [filter](filter.md): NSID filter configuration
+ - [ingestion](ingestion.md): enable/disable crawler, firehose, backfill at runtime
+ - [crawler](crawler.md): crawler source management
+ - [firehose](firehose.md): firehose source management
+ - [pds](pds.md): rate-limit tier assignments
+ - [repos](repos.md): explicit repository tracking, resyncing, untracking
+ - [database](database.md): compression training, compaction
+17
docs/api/crawler.md
+ # crawler management
+
+ - `GET /crawler/sources`: list all currently active crawler sources.
+   - returns a JSON array of `{ "url": string, "mode": "relay" | "by_collection", "persisted": bool }`.
+   - `persisted: true` means the source was added via the API and is stored in the database, so it will survive a restart. `persisted: false` means the source came from `CRAWLER_URLS` and is not written to the database.
+ - `POST /crawler/sources`: add a crawler source at runtime.
+   - body: `{ "url": string, "mode": "relay" | "by_collection" }`.
+   - the source is written to the database before the producer task is started, so it is safe to add sources and then immediately restart without losing them.
+   - if a source with the same URL already exists (whether from `CRAWLER_URLS` or a previous `POST`), it is replaced: the running task is stopped and a new one is started with the new mode. any cursor state for that URL is preserved.
+   - returns `201 Created` on success.
+ - `DELETE /crawler/sources`: remove a crawler source at runtime.
+   - body: `{ "url": string }`.
+   - the producer task is stopped immediately.
+   - if the source was added via the API (`persisted: true`), it is removed from the database and will not reappear on restart. if it came from `CRAWLER_URLS` (`persisted: false`), only the running task is stopped; the source will reappear on the next restart since `CRAWLER_URLS` is re-applied at startup.
+   - cursor state is not cleared. use `DELETE /crawler/cursors` separately if you want the source to restart from the beginning when re-added.
+   - returns `200 OK` if the source was found and removed, `404 Not Found` otherwise.
+ - `DELETE /crawler/cursors`: reset stored cursors for a given crawler URL. body: `{ "key": "..." }` where key is a URL. clears the list-repos crawler cursor as well as any by-collection cursors associated with that URL. causes the next crawler pass to restart from the beginning.
+4
docs/api/database.md
+ # database operations
+
+ - `POST /db/train`: train zstd compression dictionaries for the `repos`, `blocks`, and `events` keyspaces. dictionaries are written to disk; a restart is required to apply them. the crawler, firehose, and backfill worker are paused for the duration and restored on completion.
+ - `POST /db/compact`: trigger a full major compaction of all database keyspaces in parallel. the crawler, firehose, and backfill worker are paused for the duration and restored on completion.
+36
docs/api/filter.md
+ # filter management
+
+ - `GET /filter`: get the current filter configuration.
+ - `PATCH /filter`: update the filter configuration.
+
+ ## filter mode
+
+ the `mode` field controls what gets indexed:
+
+ | mode | behaviour |
+ | :--- | :--- |
+ | `filter` | auto-discovers and backfills any account whose firehose commit touches a collection matching one of the `signals` patterns. you can also explicitly track individual repositories via the `/repos` endpoint regardless of matching signals. |
+ | `full` | index the entire network. `signals` are ignored for discovery, but `excludes` and `collections` still apply. |
+
+ ## fields
+
+ | field | type | description |
+ | :--- | :--- | :--- |
+ | `mode` | `"filter"` \| `"full"` | indexing mode (see above). |
+ | `signals` | set update | NSID patterns (e.g. `app.bsky.feed.post` or `app.bsky.*`) that trigger auto-discovery in `filter` mode. |
+ | `collections` | set update | NSID patterns used to filter which records are stored. if empty, all collections are stored. applies in all modes. |
+ | `excludes` | set update | set of DIDs to always skip, regardless of mode. checked before any other filter logic. |
+
+ ## set updates
+
+ each set field accepts one of two forms:
+
+ - **replace**: an array replaces the entire set, e.g. `["app.bsky.feed.post", "app.bsky.graph.*"]`
+ - **patch**: an object maps items to `true` (add) or `false` (remove), e.g. `{"app.bsky.feed.post": true, "app.bsky.graph.*": false}`
+
+ ## NSID patterns
+
+ `signals` and `collections` support an optional `.*` suffix to match an entire namespace:
+
+ - `app.bsky.feed.post`: exact match only
+ - `app.bsky.feed.*`: matches any collection under `app.bsky.feed`
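the set-update and NSID-pattern rules in `docs/api/filter.md` can be sketched in a few lines of rust. this is only an illustration, not hydrant's code: `nsid_matches`, `SetUpdate` and `apply_set_update` are hypothetical names, and it assumes that a `.*` pattern matches at any depth below the namespace but not the bare namespace itself.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns true if `collection` matches `pattern`.
/// Assumption: a trailing `.*` matches anything below that namespace
/// (at any depth); every other pattern must match exactly.
fn nsid_matches(pattern: &str, collection: &str) -> bool {
    match pattern.strip_suffix(".*") {
        Some(namespace) => collection
            .strip_prefix(namespace)
            // require a `.` boundary so `app.bsky.feed.*` does not match `app.bsky.feedback.x`
            .is_some_and(|rest| rest.starts_with('.') && rest.len() > 1),
        None => pattern == collection,
    }
}

/// The two forms a set field accepts (hypothetical in-memory model).
enum SetUpdate {
    /// An array: replaces the entire set.
    Replace(Vec<String>),
    /// An object: `true` adds the item, `false` removes it.
    Patch(BTreeMap<String, bool>),
}

fn apply_set_update(set: &mut BTreeSet<String>, update: SetUpdate) {
    match update {
        SetUpdate::Replace(items) => *set = items.into_iter().collect(),
        SetUpdate::Patch(changes) => {
            for (item, add) in changes {
                if add {
                    set.insert(item);
                } else {
                    set.remove(&item);
                }
            }
        }
    }
}
```

with this model, a patch never touches items it doesn't mention, which is what makes it safe to send small incremental updates.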
+18
docs/api/firehose.md
+ # firehose management
+
+ - `GET /firehose/sources`: list all currently active firehose sources.
+   - returns a JSON array of `{ "url": string, "persisted": bool, "is_pds": bool }`.
+   - `persisted: true` means the source was added via the API and is stored in the database, so it will survive a restart. `persisted: false` means the source came from `RELAY_HOSTS` and is not written to the database.
+   - `is_pds: true` means the source is a direct PDS connection with host authority enforcement enabled.
+ - `POST /firehose/sources`: add a firehose source at runtime.
+   - body: `{ "url": string, "is_pds": bool }`. `is_pds` defaults to `false`.
+   - the source is persisted to the database before the ingestor task is started.
+   - if a source with the same URL already exists, it is replaced: the running task is stopped and a new one is started. any existing cursor state for that URL is preserved.
+   - returns `201 Created` on success.
+ - `DELETE /firehose/sources`: remove a firehose source at runtime.
+   - body: `{ "url": string }`.
+   - the ingestor task is stopped immediately.
+   - if the source was added via the API (`persisted: true`), it is removed from the database and will not reappear on restart. if it came from `RELAY_HOSTS` (`persisted: false`), only the running task is stopped; the source reappears on the next restart.
+   - cursor state is not cleared. use `DELETE /firehose/cursors` separately if you want the source to restart from the beginning when re-added.
+   - returns `200 OK` if the source was found and removed, `404 Not Found` otherwise.
+ - `DELETE /firehose/cursors`: reset the stored cursor for a given firehose source URL. body: `{ "key": "..." }` where key is a URL. causes the next firehose connection to restart from the beginning.
+7
docs/api/ingestion.md
+ # ingestion control
+
+ - `GET /ingestion`: get the current ingestion status.
+   - returns `{ "crawler": bool, "firehose": bool, "backfill": bool }`.
+ - `PATCH /ingestion`: enable or disable ingestion components at runtime without restarting.
+   - body: `{ "crawler"?: bool, "firehose"?: bool, "backfill"?: bool }`. only provided fields are updated.
+   - when disabled, each component finishes its current task before pausing (e.g. the backfill worker completes any in-flight repo syncs, the firehose finishes processing the current message). they resume immediately when re-enabled.
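the "only provided fields are updated" rule for `PATCH /ingestion` maps naturally onto optional fields. a minimal sketch, assuming a hypothetical in-memory model (`IngestionStatus`, `IngestionPatch` and `apply_patch` are illustrative names, not hydrant's types):

```rust
/// Mirrors the shape returned by `GET /ingestion`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct IngestionStatus {
    crawler: bool,
    firehose: bool,
    backfill: bool,
}

/// Mirrors the `PATCH /ingestion` body: every field is optional.
#[derive(Debug, Default)]
struct IngestionPatch {
    crawler: Option<bool>,
    firehose: Option<bool>,
    backfill: Option<bool>,
}

/// Fields left as `None` keep their current value.
fn apply_patch(current: IngestionStatus, patch: &IngestionPatch) -> IngestionStatus {
    IngestionStatus {
        crawler: patch.crawler.unwrap_or(current.crawler),
        firehose: patch.firehose.unwrap_or(current.firehose),
        backfill: patch.backfill.unwrap_or(current.backfill),
    }
}
```

so a body of `{ "firehose": false }` pauses only the firehose and leaves the crawler and backfill worker as they were.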
+35
docs/api/pds.md
+ # PDS management
+
+ hydrant rate-limits firehose events per PDS. each PDS is assigned to a named rate tier that controls how aggressively hydrant limits events from it. two built-in tiers are always present: `default` (conservative limits for unknown operators) and `trusted` (higher limits for well-behaved operators). additional tiers can be defined via `RATE_TIERS`.
+
+ the per-second limit scales with the number of active accounts on the PDS: `max(per_second_base, accounts × per_second_account_mul)`.
+
+ you can also define an optional `account_limit` for a rate tier. if a PDS exceeds this number of active accounts, hydrant will reject any new account creation events from it.
+
+ the built-in tiers are defined as follows:
+ - `default`: `50` per sec (floor), `+0.5` per account. max `3_600_000`/hr, `86_400_000`/day. `100` account limit.
+ - `trusted`: `5000` per sec (floor), `+10.0` per account. max `18_000_000`/hr, `432_000_000`/day. `10_000_000` account limit.
+
+ tiers are resolved in this order:
+
+ 1. **explicit API assignment**, set via `PUT /pds/tiers`, stored in the database, survives restarts.
+ 2. **glob rules**, from `TIER_RULES`, evaluated in order; first match wins.
+ 3. **`default` tier**, applied if no rule or explicit assignment matches.
+
+ deleting an API assignment reverts the host to glob-rule resolution, not necessarily back to `default`. if a rule like `*.bsky.network:trusted` matches the host, it will become trusted again without any further action.
+
+ - `GET /pds/tiers`: list all current tier assignments alongside the available tier definitions.
+   - returns `{ "assignments": [{ "host": string, "tier": string }], "rate_tiers": { <name>: { "per_second_base": int, "per_second_account_mul": float, "per_hour": int, "per_day": int } } }`.
+   - `assignments` only contains PDSes with an explicit API assignment. hosts without one resolve via glob rules or fall back to `default`.
+ - `PUT /pds/tiers`: assign a PDS to a named rate tier.
+   - body: `{ "host": string, "tier": string }`.
+   - `host` is the PDS hostname (e.g. `pds.example.com`).
+   - `tier` must be one of the configured tier names. returns `400` if unknown.
+   - assignments are persisted to the database and survive restarts.
+   - re-assigning the same host updates the tier in place without creating a duplicate.
+ - `DELETE /pds/tiers`: remove an explicit tier assignment for a PDS.
+   - query parameter: `?host=<hostname>` (e.g. `?host=pds.example.com`).
+   - reverts the host to glob-rule resolution (not necessarily `default`; a matching `TIER_RULES` pattern still applies).
+   - returns `200` even if no assignment existed.
+ - `GET /pds/rate-tiers`: list the available rate tier definitions.
+   - returns a map of tier name to `{ "per_second_base", "per_second_account_mul", "per_hour", "per_day", "account_limit" }`.
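the per-second formula and the three-step tier resolution from `docs/api/pds.md` can be sketched as follows. treat this as a rough model under stated assumptions rather than hydrant's implementation: `RateTier`, `per_second_limit`, `glob_matches` and `resolve_tier` are hypothetical names, and the glob matcher only understands `*` (hydrant's real glob syntax may support more).

```rust
use std::collections::HashMap;

struct RateTier {
    per_second_base: u64,
    per_second_account_mul: f64,
}

/// max(per_second_base, accounts × per_second_account_mul)
fn per_second_limit(tier: &RateTier, active_accounts: u64) -> f64 {
    (tier.per_second_base as f64).max(active_accounts as f64 * tier.per_second_account_mul)
}

/// Minimal glob (assumption): `*` matches any run of characters, including none.
fn glob_matches(pattern: &str, text: &str) -> bool {
    let (p, t) = (pattern.as_bytes(), text.as_bytes());
    let (mut pi, mut ti) = (0, 0);
    // (pattern index just after the last `*`, text index that `*` currently ends at)
    let mut backtrack: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == b'*' {
            backtrack = Some((pi + 1, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((star_pi, star_ti)) = backtrack {
            // let the last `*` swallow one more character and retry
            pi = star_pi;
            ti = star_ti + 1;
            backtrack = Some((star_pi, star_ti + 1));
        } else {
            return false;
        }
    }
    while pi < p.len() && p[pi] == b'*' {
        pi += 1;
    }
    pi == p.len()
}

/// 1. explicit assignment, 2. ordered glob rules (first match wins), 3. `default`.
fn resolve_tier<'a>(
    host: &str,
    explicit: &'a HashMap<String, String>,
    rules: &'a [(String, String)],
) -> &'a str {
    if let Some(tier) = explicit.get(host) {
        return tier.as_str();
    }
    for (pattern, tier) in rules {
        if glob_matches(pattern, host) {
            return tier.as_str();
        }
    }
    "default"
}
```

with the built-in `default` tier (`50` base, `0.5` per account), a PDS with 10 active accounts stays at the floor of 50 events per second, while one with 1000 accounts gets 500.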
+11
docs/api/repos.md
+ # repository management
+
+ all `/repos` endpoints that return lists respond with NDJSON by default. send `Accept: application/json` or `Content-Type: application/json` to get a JSON array instead.
+
+ - `GET /repos`: get a list of repositories and their sync status. supports pagination and filtering:
+   - `limit`: max results (default 100, max 1000)
+   - `cursor`: DID key for pagination.
+ - `GET /repos/{did}`: get the sync status and metadata of a specific repository. also returns the handle, PDS URL and the atproto signing key (these aren't available until the repo has been backfilled at least once).
+ - `PUT /repos`: explicitly track repositories. accepts an NDJSON body of `{"did": "..."}` (or JSON array of the same). only affects repositories that are not known or are untracked. returns a list of the DIDs that were queued for backfill.
+ - `DELETE /repos`: untrack repositories. accepts an NDJSON body of `{"did": "..."}` (or JSON array of the same). only affects repositories that are currently tracked. returns a list of the DIDs that were untracked.
+ - `POST /repos/resync`: force a new backfill for one or more repositories. accepts an NDJSON body of `{"did": "..."}` (or JSON array of the same). only affects repositories hydrant already knows about. returns a list of the DIDs that were queued.
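for reference, an NDJSON body for `PUT /repos`, `DELETE /repos` or `POST /repos/resync` is just one `{"did": "..."}` object per line. a small sketch of building one; `ndjson_body` is a hypothetical helper, and it assumes the inputs are valid DIDs (which contain no characters that need JSON escaping):

```rust
/// Builds an NDJSON body: one `{"did": "..."}` object per line.
/// Assumption: inputs are valid DIDs, so no JSON string escaping is needed.
fn ndjson_body(dids: &[&str]) -> String {
    dids.iter()
        .map(|did| format!("{{\"did\": \"{}\"}}\n", did))
        .collect()
}
```

for arbitrary strings you would want a real JSON serializer instead of `format!`.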
+16
docs/build-features.md
+ # build features
+
+ `hydrant` has several optional compile-time features:
+
+ | feature | default | description |
+ | :--- | :--- | :--- |
+ | `indexer` | yes | makes hydrant act as an indexer. incompatible with the relay feature. |
+ | `indexer_stream` | yes | enables the event stream for the indexer. requires indexer feature. |
+ | `relay` | no | makes hydrant act as a relay. incompatible with the indexer feature. |
+ | `backlinks` | no | enables the backlinks indexer and XRPC endpoints (`blue.microcosm.links.*`). requires indexer feature. |
+
+ to build with a specific feature:
+
+ ```bash
+ cargo build --release --features backlinks
+ ```
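the constraints in the feature table (`indexer` and `relay` are mutually exclusive; `indexer_stream` and `backlinks` need `indexer`) can be written down as a small check. this is purely an illustration of the table, with a hypothetical `validate_features` helper; how hydrant actually enforces these constraints at compile time is not shown in this diff.

```rust
/// Checks a set of enabled feature names against the table's constraints.
fn validate_features(features: &[&str]) -> Result<(), String> {
    let has = |name: &str| features.iter().any(|f| *f == name);

    if has("indexer") && has("relay") {
        return Err("`indexer` and `relay` are incompatible".to_string());
    }
    for dependent in ["indexer_stream", "backlinks"] {
        if has(dependent) && !has("indexer") {
            return Err(format!("`{}` requires `indexer`", dependent));
        }
    }
    Ok(())
}
```

since `indexer` and `indexer_stream` are default features, a relay build presumably has to opt out of the defaults first; check the crate's `Cargo.toml` before relying on that.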
+4
docs/concepts/README.md
+ # concepts
+
+ - [hydrant vs tap](vs-tap.md): design comparison, stream behavior
+ - [relay & seeding](relay.md): multi-relay support, firehose seeding, crawler sources
+52
docs/concepts/relay.md
+ # relay, seeding & crawler sources
+
+ ## multiple relay support
+
+ `hydrant` supports connecting to multiple relays simultaneously for firehose ingestion. when `RELAY_HOSTS` is configured with multiple URLs:
+
+ - one independent firehose stream loop is spawned per relay
+ - each relay maintains its own firehose cursor state
+ - all ingestion loops share the same worker pool and database
+
+ commit events are de-duplicated according to the repo `rev`. account / identity events are de-duplicated using the `time` field.
+
+ ## direct PDS connections
+
+ a firehose source can also be a direct connection to a PDS rather than a relay. prefix the URL with `pds::` to mark it as such:
+
+ ```
+ HYDRANT_RELAY_HOSTS=wss://bsky.network,pds::wss://pds.example.com
+ ```
+
+ hydrant enforces host authority only when a source is marked as a direct PDS (`is_pds: true`). relays (`is_pds: false`, the default) are exempt from this check, since they forward commits from many PDSes by design.
+
+ ## firehose seeding
+
+ in relay mode, `RELAY_HOSTS` defaults to empty. set `SEED_HOSTS` to one or more relay base URLs and hydrant will call `com.atproto.sync.listHosts` on each at startup, adding every returned PDS as a firehose source:
+
+ ```
+ HYDRANT_SEED_HOSTS=https://bsky.network
+ ```
+
+ seeding runs as a background task so the main firehose loop is not blocked. seed URLs are fetched concurrently (up to four at a time) and the full `listHosts` pagination is consumed for each. if a request fails partway through, the hosts collected so far are still added and the failure is logged.
+
+ each discovered host is added as a persistent PDS firehose source (`is_pds: true`), equivalent to calling `POST /firehose/sources`.
+
+ banned hosts (`status: "banned"`) are skipped. all other statuses are included since the firehose ingestor retries on disconnect and transiently-unavailable hosts will reconnect on their own.
+
+ seeding runs from the latest cursor on restart, so new PDSes added to the upstream relay since the last start are picked up automatically (if they haven't already been picked up through the firehose). sources that are already running are detected and skipped, so re-seeding is idempotent.
+
+ ## crawler sources
+
+ the crawler is configured separately from the firehose via `CRAWLER_URLS`. each source is a `[mode::]url` entry where the mode prefix is optional and defaults to `by_collection` in filter mode or `list_repos` in full-network mode.
+
+ - `list_repos`: enumerates the network via `com.atproto.sync.listRepos`, checks each repo's collections via `describeRepo`.
+ - `by_collection`: queries `com.atproto.sync.listReposByCollection` for each configured signal. more efficient for filtered indexing since it only surfaces repos that have matching records. cursors are stored per collection. note that it won't crawl anything if no signals are specified.
+
+ ```
+ HYDRANT_CRAWLER_URLS=by_collection::https://lightrail.microcosm.blue,list_repos::wss://bsky.network
+ ```
+
+ each source maintains its own cursor so restarts resume mid-pass.
+
+ sources can also be added and removed at runtime via the `/crawler/sources` API (see [crawler management](../api/crawler.md)). dynamically added sources are persisted to the database and survive restarts. `CRAWLER_URLS` sources are startup-only: they are not written to the database and will always reappear after a restart regardless of runtime changes (unless you change the config of course).
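the `pds::` prefix for firehose sources and the `[mode::]url` syntax for crawler sources described in `docs/concepts/relay.md` could be parsed roughly like this. it is a sketch under assumptions, not hydrant's parser: all type and function names are hypothetical, and an unrecognised prefix is treated as part of the URL rather than as an error.

```rust
#[derive(Debug, PartialEq)]
struct FirehoseSource {
    url: String,
    is_pds: bool,
}

/// Parses a comma-separated `RELAY_HOSTS`-style value; `pds::` marks a direct PDS.
fn parse_firehose_sources(value: &str) -> Vec<FirehoseSource> {
    value
        .split(',')
        .map(str::trim)
        .filter(|entry| !entry.is_empty())
        .map(|entry| match entry.strip_prefix("pds::") {
            Some(url) => FirehoseSource { url: url.to_string(), is_pds: true },
            None => FirehoseSource { url: entry.to_string(), is_pds: false },
        })
        .collect()
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum CrawlerMode {
    ListRepos,
    ByCollection,
}

#[derive(Debug, PartialEq)]
struct CrawlerSource {
    mode: CrawlerMode,
    url: String,
}

/// Parses one `[mode::]url` entry. Without a known mode prefix, the mode
/// defaults to `list_repos` in full-network mode and `by_collection` otherwise.
fn parse_crawler_source(entry: &str, full_network: bool) -> CrawlerSource {
    let default_mode = if full_network {
        CrawlerMode::ListRepos
    } else {
        CrawlerMode::ByCollection
    };
    let (mode, url) = match entry.split_once("::") {
        Some(("list_repos", url)) => (CrawlerMode::ListRepos, url),
        Some(("by_collection", url)) => (CrawlerMode::ByCollection, url),
        // assumption: anything else (e.g. an IPv6 literal containing `::`) is a plain URL
        _ => (default_mode, entry),
    };
    CrawlerSource { mode, url: url.to_string() }
}
```

only matching known mode names keeps URLs that happen to contain `::` from being misread as having a mode prefix.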
+17
docs/concepts/vs-tap.md
+ # hydrant vs tap
+
+ while [`tap`](https://github.com/bluesky-social/indigo/tree/main/cmd/tap) is designed as a firehose consumer that simply propagates events while handling sync, `hydrant` is more flexible: it lets you query the database for records directly, and it also provides an ordered view of events, so you can use a cursor to fetch events from a specific point. it can act as either an indexer or an ephemeral view of some window of events.
+
+ you can also read [this blogpost](https://90008.leaflet.pub/3mhp3t4kuw22e) for a longer comparison.
+
+ ## stream behavior
+
+ the `WS /stream` (hydrant) and `WS /channel` (tap) endpoints have different designs:
+
+ | aspect | `tap` (`/channel`) | `hydrant` (`/stream`) |
+ | :--- | :--- | :--- |
+ | distribution | sharded work queue: events are load-balanced across connected clients. if 5 clients connect, each receives ~20% of events. | broadcast: every connected client receives a full copy of the event stream. if 5 clients connect, all 5 receive 100% of events. |
+ | cursors | server-managed: clients ACK messages. the server tracks progress and redelivers unacked messages. | client-managed: client provides `?cursor=123`. the server streams from that point. |
+ | persistence | events are stored in an outbox and sent to the consumer, and removed from the outbox when acked. nothing is replayable. | `record` events are replayable. `identity`/`account` are ephemeral. use `GET /repos/:did` to query identity / account info (handle, pds, signing key, etc.). |
+ | backfill | backfill events are mixed into the live queue and prioritized (per-repo, acting as synchronization barrier) by the server. | backfill simply inserts historical events (`live: false`) into the global event log. streaming is just reading this log sequentially. synchronization is the same as tap, `live: true` vs `live: false`. |
+ | event types | `record`, `identity` (includes status) | `record`, `identity` (handle, cache-buster), `account` (status) |
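the distribution difference in the table can be made concrete with a toy model. this is only an illustration of the two delivery shapes: `shard` uses round-robin as a stand-in for tap's load balancing, `broadcast` assumes the cursor is inclusive (the docs only say the server "streams from that point"), and neither function reflects either project's actual code.

```rust
/// tap-style sharded queue: each event is delivered to exactly one client.
/// Round-robin is an assumption standing in for the real load balancing.
fn shard(events: &[u64], clients: usize) -> Vec<Vec<u64>> {
    assert!(clients > 0, "need at least one client");
    let mut out = vec![Vec::new(); clients];
    for (i, event_id) in events.iter().enumerate() {
        out[i % clients].push(*event_id);
    }
    out
}

/// hydrant-style broadcast: every client reads the same ordered log,
/// starting from its own cursor (assumed inclusive here).
fn broadcast(events: &[u64], cursors: &[u64]) -> Vec<Vec<u64>> {
    cursors
        .iter()
        .map(|cursor| events.iter().copied().filter(|id| id >= cursor).collect())
        .collect()
}
```

with 10 events and 5 clients, `shard` hands each client 2 events, while `broadcast` hands every client all events at or after its cursor, so a client that reconnects with an older cursor simply re-reads part of the log.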
+74
docs/configuration.md
+ # configuration
+
+ hydrant is configured via environment variables, all prefixed with `HYDRANT_` (except `RUST_LOG`). a `.env` file in the working directory is loaded automatically.
+
+ ## core
+
+ | variable | default | description |
+ | :--- | :--- | :--- |
+ | `DATABASE_PATH` | `./hydrant.db` | path to the database folder |
+ | `RUST_LOG` | `info` | log filter ([tracing env-filter syntax](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html)) |
+ | `API_PORT` | `3000` | port for the API server |
+ | `ENABLE_DEBUG` | `false` | enable debug endpoints |
+ | `DEBUG_PORT` | `API_PORT + 1` | port for debug endpoints |
+
+ ## indexing mode
+
+ | variable | default | description |
+ | :--- | :--- | :--- |
+ | `FULL_NETWORK` | `false` (indexer), `true` (relay) | if `true`, discover and index all repos in the network |
+ | `EPHEMERAL` | `false` (indexer), `true` (relay) | if enabled, no records are stored; events are deleted after `EPHEMERAL_TTL` |
+ | `EPHEMERAL_TTL` | `60min`, `3d` (relay) | how long to keep events before deletion |
+ | `ONLY_INDEX_LINKS` | `false` | don't store record blocks, only the index. `getRecord`, `listRecords`, and `getRepo` will fail; the event stream still works but create/update events won't include record values |
+
+ ## filter
+
+ | variable | default | description |
+ | :--- | :--- | :--- |
+ | `FILTER_SIGNALS` | | comma-separated NSID patterns triggering auto-discovery in filter mode (e.g. `app.bsky.feed.post,app.bsky.graph.*`) |
+ | `FILTER_COLLECTIONS` | | comma-separated NSID patterns limiting which records are stored. empty = store all |
+ | `FILTER_EXCLUDES` | | comma-separated DIDs to always skip |
+
+ ## firehose
+
+ | variable | default | description |
+ | :--- | :--- | :--- |
+ | `RELAY_HOST` | `wss://relay.fire.hose.cam/` (indexer), empty (relay) | single firehose source URL |
+ | `RELAY_HOSTS` | | comma-separated firehose sources. prefix with `pds::` for direct PDS connections. overrides `RELAY_HOST` |
+ | `SEED_HOSTS` | `https://bsky.network` (relay) | relay URLs to call `com.atproto.sync.listHosts` on at startup, adding every non-banned PDS as a firehose source |
+ | `ENABLE_FIREHOSE` | `true` | whether to ingest relay subscriptions |
+ | `FIREHOSE_WORKERS` | `8` (`24` full network) | concurrent workers for firehose events |
+ | `CURSOR_SAVE_INTERVAL` | `3sec` | how often to persist the firehose cursor |
+
+ ## crawler
+
+ | variable | default | description |
+ | :--- | :--- | :--- |
+ | `CRAWLER_URLS` | relay hosts (full network), `https://lightrail.microcosm.blue` (filter) | comma-separated `[mode::]url` crawler sources |
+ | `ENABLE_CRAWLER` | `true` if full network or sources configured | whether to actively query the network |
+ | `CRAWLER_MAX_PENDING_REPOS` | `2000` | max pending repos before the crawler pauses |
+ | `CRAWLER_RESUME_PENDING_REPOS` | `1000` | pending-repo count at which the crawler resumes |
+
+ ## backfill & identity
+
+ | variable | default | description |
+ | :--- | :--- | :--- |
+ | `BACKFILL_CONCURRENCY_LIMIT` | `16` (`64` full network) | max concurrent backfill tasks |
+ | `REPO_FETCH_TIMEOUT` | `5min` | timeout for fetching a repository |
+ | `VERIFY_SIGNATURES` | `full` | signature verification: `full`, `backfill-only`, or `none` |
+ | `PLC_URL` | `https://plc.wtf`, `https://plc.directory` (full network) | PLC directory base URL(s), comma-separated |
+ | `IDENTITY_CACHE_SIZE` | `100000` | number of identity entries to cache in memory |
+
+ ## performance
+
+ | variable | default | description |
+ | :--- | :--- | :--- |
+ | `CACHE_SIZE` | `256` | database cache size in MB |
+
+ ## rate limiting (relay mode)
+
+ | variable | default | description |
+ | :--- | :--- | :--- |
+ | `NEW_HOST_LIMIT` | `50` | max new hosts addable via `com.atproto.sync.requestCrawl` per day |
+ | `RATE_TIERS` | | comma-separated tier definitions in `name:base/mul/hourly/daily[/account_limit]` format |
+ | `TIER_RULES` | | comma-separated ordered glob rules in `pattern:tier_name` format; first match wins |
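to make the `RATE_TIERS` format (`name:base/mul/hourly/daily[/account_limit]`) easier to picture, here is a rough parsing sketch. it is not hydrant's parser: `RateTier`, `parse_rate_tier` and `parse_rate_tiers` are hypothetical, the mapping of `base`/`mul`/`hourly`/`daily` onto the field names from the `/pds/rate-tiers` response is an assumption, and only plain digits are accepted (whether hydrant allows `_` separators in the env var is not stated).

```rust
#[derive(Debug, PartialEq)]
struct RateTier {
    name: String,
    per_second_base: u64,
    per_second_account_mul: f64,
    per_hour: u64,
    per_day: u64,
    account_limit: Option<u64>,
}

/// Parses one `name:base/mul/hourly/daily[/account_limit]` definition.
/// Returns `None` on any malformed input.
fn parse_rate_tier(def: &str) -> Option<RateTier> {
    let (name, limits) = def.split_once(':')?;
    let parts: Vec<&str> = limits.split('/').collect();
    if name.is_empty() || !(parts.len() == 4 || parts.len() == 5) {
        return None;
    }
    Some(RateTier {
        name: name.to_string(),
        per_second_base: parts[0].parse().ok()?,
        per_second_account_mul: parts[1].parse().ok()?,
        per_hour: parts[2].parse().ok()?,
        per_day: parts[3].parse().ok()?,
        // the fifth segment is optional
        account_limit: match parts.get(4) {
            Some(value) => Some(value.parse().ok()?),
            None => None,
        },
    })
}

/// Parses the whole comma-separated value; fails if any entry is malformed.
fn parse_rate_tiers(value: &str) -> Option<Vec<RateTier>> {
    value
        .split(',')
        .map(|def| parse_rate_tier(def.trim()))
        .collect()
}
```

for example, a made-up definition like `bulk:100/1.0/7200000/172800000/500` would yield a tier named `bulk` with a floor of 100 events per second and an account limit of 500.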
+55
docs/getting-started.md
+ # getting started
+
+ ## requirements
+
+ hydrant is written in rust and requires the rust toolchain (including `cargo`), plus `make` and `cmake` for some dependencies. you will also need the clang toolchain and the [wild linker](https://github.com/wild-linker/wild).
+
+ ## building from source
+
+ ```bash
+ cargo build --release
+ ```
+
+ the binary will be at `target/release/hydrant`.
+
+ to build with optional features (e.g. `backlinks`):
+
+ ```bash
+ cargo build --release --features backlinks
+ ```
+
+ see [build features](build-features.md) for the full list.
+
+ ## running
+
+ set the required environment variables and run the binary:
+
+ ```bash
+ export HYDRANT_DATABASE_PATH=./hydrant.db
+ ./target/release/hydrant
+ ```
+
+ see [configuration](configuration.md) for all available variables. if a `.env` file exists in the working directory it will be loaded automatically.
+
+ ## reverse proxying
+
+ it is **highly recommended** to run hydrant behind a reverse proxy (like nginx or caddy) if you intend to expose the XRPC or event stream APIs to the public. hydrant's API includes several management endpoints that do not require or support authentication. **you MUST NOT expose these management endpoints to the public internet.**
+
+ ### public endpoints (safe to proxy)
+
+ - `/xrpc/*`: XRPC endpoints.
+ - `/stream`: hydrant's ordered event stream.
+ - `/stats`: general database statistics.
+ - `/health` / `/_health`: health check.
+
+ ### management endpoints (keep private)
+
+ - `/repos`: explicit repository tracking/resyncing/untracking.
+ - `/filter`: management of NSID filter patterns.
+ - `/ingestion`: manual control over component lifecycle (crawler, firehose, etc.).
+ - `/crawler/sources`: management of crawler relays.
+ - `/firehose/sources`: management of firehose relays.
+ - `/pds/tiers`: rate-limit tier assignments.
+ - `/db/train` / `/db/compact`: database maintenance tasks.
+ - `*/cursors`: cursor management.
+ - `/debug/*`: introspection and testing endpoints.
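the public/management split above is easiest to enforce as an allowlist: forward only the paths listed as public and reject everything else, so new management endpoints stay private by default. a minimal sketch of that check, assuming a hypothetical `is_public_path` helper in whatever proxy layer sits in front of hydrant (it is not part of hydrant itself):

```rust
/// Returns true only for the paths listed as safe to proxy.
/// Everything else (management, cursors, debug) is rejected by default.
fn is_public_path(request_target: &str) -> bool {
    // ignore the query string, e.g. `/stream?cursor=123`
    let path = request_target.split('?').next().unwrap_or(request_target);
    path.starts_with("/xrpc/") || matches!(path, "/stream" | "/stats" | "/health" | "/_health")
}
```

an allowlist is preferable to blocking known management paths, since a denylist silently exposes anything it forgot to mention.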
+7
docs/xrpc/README.md
+ # xrpc
+
+ `hydrant` implements the following XRPC endpoints under `/xrpc/`. only expose `/xrpc/*` publicly, see [getting started](../getting-started.md#reverse-proxying) for guidance.
+
+ - [com.atproto.*](atproto.md): standard AT Protocol endpoints
+ - [systems.gaze.hydrant.*](hydrant.md): hydrant-specific extensions
+ - [blue.microcosm.links.*](backlinks.md): backlinks (requires `--features backlinks`)
+16
docs/xrpc/atproto.md
+ # com.atproto.*
+
+ these are standard atproto endpoints. you can look at [the atproto api reference](https://docs.bsky.app/docs/category/http-reference) for more info.
+
+ the following are implemented currently:
+ - `com.atproto.repo.getRecord`
+ - `com.atproto.repo.listRecords`
+ - `com.atproto.repo.describeRepo` (also see `systems.gaze.hydrant.describeRepo`)
+ - `com.atproto.sync.getRepo` (`since` parameter not implemented!)
+ - `com.atproto.sync.getHostStatus`
+ - `com.atproto.sync.listHosts`
+ - `com.atproto.sync.getRepoStatus`
+ - `com.atproto.sync.listRepos`
+ - `com.atproto.sync.getLatestCommit`
+ - `com.atproto.sync.requestCrawl` (adds the host to firehose sources in relay mode)
+ - `com.atproto.sync.subscribeRepos` (WebSocket firehose stream, requires `relay` feature)
+32
docs/xrpc/backlinks.md
+ # blue.microcosm.links.*
+
+ hydrant implements a subset of [microcosm constellation](https://constellation.microcosm.blue/) when it's built with the `backlinks` cargo feature (`cargo build --features backlinks`).
+
+ when enabled, hydrant indexes all AT URI and DID references found inside stored records into a reverse index. this lets you efficiently answer "what records link to this subject?".
+
+ ## blue.microcosm.links.getBacklinks
+
+ return records that link to a given subject.
+
+ | param | required | description |
+ | :--- | :--- | :--- |
+ | `subject` | yes | AT URI or DID to look up backlinks for. |
+ | `source` | no | filter by source collection, e.g. `app.bsky.feed.like`. also accepts `collection:path` form to further filter by field path, e.g. `app.bsky.feed.like:subject.uri`. the path is matched against the dotted field path within the record (`.` is prepended automatically). |
+ | `limit` | no | max results to return (default 50, max 100). |
+ | `cursor` | no | opaque pagination cursor from a previous response. |
+ | `reverse` | no | if `true`, return results in reverse order (default `false`). |
+
+ returns `{ backlinks: [{ uri, cid }], cursor? }`.
+
+ results are ordered by source record rkey (ascending by default, descending when `reverse=true`). the cursor is stable across new insertions for TID rkey records.
+
+ ## blue.microcosm.links.getBacklinksCount
+
+ return the number of records that link to a given subject.
+
+ | param | required | description |
+ | :--- | :--- | :--- |
+ | `subject` | yes | AT URI or DID to count backlinks for. |
+ | `source` | no | filter by source collection (same format as `getBacklinks`). |
+
+ returns `{ count }`.
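the `source` parameter's two forms (`collection` and `collection:path`) can be illustrated with a small parser. this is a sketch of the documented behavior only: `parse_source` is a hypothetical helper, and it assumes the caller passes the path without a leading `.`, since the docs say one is prepended automatically.

```rust
/// Splits a `source` value into the collection NSID and an optional field path.
/// The path gets a leading `.` prepended, as the parameter description states.
fn parse_source(source: &str) -> (String, Option<String>) {
    match source.split_once(':') {
        Some((collection, path)) => (collection.to_string(), Some(format!(".{}", path))),
        None => (source.to_string(), None),
    }
}
```

so `app.bsky.feed.like:subject.uri` filters on the `app.bsky.feed.like` collection and the `.subject.uri` field path, while a bare `app.bsky.feed.like` matches links from any field in that collection.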
+24
docs/xrpc/hydrant.md
+ # systems.gaze.hydrant.*
+
+ these are some non-standard XRPCs that might be useful.
+
+ ## systems.gaze.hydrant.countRecords
+
+ return the total number of stored records in a collection.
+
+ | param | required | description |
+ | :--- | :--- | :--- |
+ | `identifier` | yes | DID or handle of the repository. |
+ | `collection` | yes | NSID of the collection. |
+
+ returns `{ count }`.
+
+ ## systems.gaze.hydrant.describeRepo
+
+ return account and identity information about this repo. this is equivalent to `com.atproto.repo.describeRepo`, except we don't return the full DID document. the handle is bi-directionally verified; if it's invalid or does not exist, we return `handle.invalid`.
+
+ | param | required | description |
+ | :--- | :--- | :--- |
+ | `identifier` | yes | DID or handle of the repository. |
+
+ returns `{ did, handle, pds, collections }`.
+36 -1
flake.lock
···
          "type": "github"
        }
      },
+     "nixpkgs_3": {
+       "locked": {
+         "lastModified": 1776329215,
+         "narHash": "sha256-a8BYi3mzoJ/AcJP8UldOx8emoPRLeWqALZWu4ZvjPXw=",
+         "owner": "nixos",
+         "repo": "nixpkgs",
+         "rev": "b86751bc4085f48661017fa226dee99fab6c651b",
+         "type": "github"
+       },
+       "original": {
+         "owner": "nixos",
+         "ref": "nixpkgs-unstable",
+         "repo": "nixpkgs",
+         "type": "github"
+       }
+     },
      "parts": {
        "inputs": {
          "nixpkgs-lib": [
···
        "inputs": {
          "nci": "nci",
          "nixpkgs": "nixpkgs_2",
-         "parts": "parts_2"
+         "parts": "parts_2",
+         "verbiage": "verbiage"
        }
      },
      "rust-overlay": {
···
        "original": {
          "owner": "numtide",
          "repo": "treefmt-nix",
+         "type": "github"
+       }
+     },
+     "verbiage": {
+       "inputs": {
+         "nixpkgs": "nixpkgs_3"
+       },
+       "locked": {
+         "lastModified": 1776647130,
+         "narHash": "sha256-XCMjiqN2bvJ016q7JEW6tKOfk3pPrzPsACmbtSJPY50=",
+         "owner": "90-008",
+         "repo": "verbiage",
+         "rev": "cf8cfa6e7cb4d9ab1fc4b5088c01235b09928850",
+         "type": "github"
+       },
+       "original": {
+         "owner": "90-008",
+         "repo": "verbiage",
          "type": "github"
        }
      }
+2
flake.nix
··· 2 2 inputs.parts.url = "github:hercules-ci/flake-parts"; 3 3 inputs.nixpkgs.url = "github:nixos/nixpkgs/nixpkgs-unstable"; 4 4 inputs.nci.url = "github:90-008/nix-cargo-integration"; 5 + inputs.verbiage.url = "github:90-008/verbiage"; 5 6 6 7 outputs = 7 8 inp: ··· 36 37 clang 37 38 wild 38 39 psmisc 40 + inputs'.verbiage.packages.default 39 41 ]); 40 42 }); 41 43 };