···11-#### table-of-contents
22-33--> [hydrant](#hydrant)</br>
44--> [vs tap](#vs-tap) | [stream](#stream-behavior) | [multi-relay](#multiple-relay-support) | [seeding](#firehose-seeding) | [crawler sources](#crawler-sources)</br>
55--> [building](#building-and-running) | [proxying](#reverse-proxying) | [configuration](#configuration) | [build features](#build-features)</br>
66--> [rest api](#rest-api) | [filter](#filter-management) | [ingestion](#ingestion-control) | [crawler](#crawler-management) | [firehose](#firehose-management) | [pds](#pds-management) | [repos](#repository-management)</br>
77--> [xrpc api](#data-access-xrpc) | [atproto](#comatproto) | [backlinks](#bluemicrocosmlinks) | [identity](#bluemicrocosmidentity) | [custom](#systemsgazehydrant)
88-91# hydrant
102113`hydrant` is an AT Protocol indexer built on the `fjall` database. it's built to
···179[random.wisp.place](https://tangled.org/did:plc:dfl62fgb7wtjj3fcbb72naae/random.wisp.place)
1810(standalone binary using http API) or the [statusphere
1911example](./examples/statusphere.rs) (hydrant-as-library) for examples on how to
2020-use hydrant (for rust docs look at https://hydrant.klbr.net/ for now).
2121-2222-**WARNING: *the db format is only partially stable.*** we provide migrations in hydrant
2323-itself, so nothing should go wrong! you should still probably keep backups just in case!
2424-2525-## vs tap
2626-2727-<small>[<- back to toc](#table-of-contents)</small>
2828-2929-you can read [this blogpost](https://90008.leaflet.pub/3mhp3t4kuw22e) or read on below.
3030-3131-while [`tap`](https://github.com/bluesky-social/indigo/tree/main/cmd/tap) is
3232-designed as a firehose consumer that simply propagates events while handling
3333-sync, `hydrant` is more flexible: it allows you to directly query the database for
3434-records, and it also provides an ordered view of events, allowing the use of a
3535-cursor to fetch events from a specific point. it can act as either an indexer or
3636-an ephemeral view of some window of events.
3737-3838-### stream behavior
3939-4040-<small>[<- back to toc](#table-of-contents)</small>
4141-4242-the `WS /stream` (hydrant) and `WS /channel` (tap) endpoints have different designs:
4343-4444-| aspect | `tap` (`/channel`) | `hydrant` (`/stream`) |
4545-| :--- | :--- | :--- |
4646-| distribution | sharded work queue: events are load-balanced across connected clients. if 5 clients connect, each receives ~20% of events. | broadcast: every connected client receives a full copy of the event stream. if 5 clients connect, all 5 receive 100% of events. |
4747-| cursors | server-managed: clients ACK messages. the server tracks progress and redelivers unacked messages. | client-managed: client provides `?cursor=123`. the server streams from that point. |
4848-| persistence | events are stored in an outbox and sent to the consumer, and removed from the outbox when acked. nothing is replayable. | `record` events are replayable. `identity`/`account` are ephemeral. use `GET /repos/:did` to query identity / account info (handle, pds, signing key, etc.). |
4949-| backfill | backfill events are mixed into the live queue and prioritized (per-repo, acting as synchronization barrier) by the server. | backfill simply inserts historical events (`live: false`) into the global event log. streaming is just reading this log sequentially. synchronization is the same as tap, `live: true` vs `live: false`. |
5050-| event types | `record`, `identity` (includes status) | `record`, `identity` (handle, cache-buster), `account` (status) |
5151-5252-### multiple relay support
5353-5454-<small>[<- back to toc](#table-of-contents)</small>
5555-5656-`hydrant` supports connecting to multiple relays simultaneously for firehose
5757-ingestion. when `RELAY_HOSTS` is configured with multiple URLs:
5858-5959-- one independent firehose stream loop is spawned per relay
6060-- each relay maintains its own firehose cursor state
6161-- all ingestion loops share the same worker pool and database
6262-6363-commit events are de-duplicated according to the repo `rev`. account / identity
6464-events are de-duplicated using the `time` field. todo: decide what to do on
6565-relay-side account takedowns or if relays set the `time` field.
6666-6767-#### direct PDS connections
6868-6969-a firehose source can also be a direct connection to a PDS rather than a relay.
7070-prefix the URL with `pds::` to mark it as such:
7171-7272-```
7373-HYDRANT_RELAY_HOSTS=wss://bsky.network,pds::wss://pds.example.com
7474-```
7575-7676-hydrant enforces host authority only when a source is marked as a direct PDS
7777-(`is_pds: true`). relays (`is_pds: false`, the default) are exempt from this check,
7878-since they forward commits from many PDSes by design. this does mean you are
7979-trusting the relay on this.
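as a rough illustration of the `pds::` prefix convention described above, here is a minimal python sketch. the `FirehoseSource` type and `parse_firehose_sources` function are hypothetical names made up for this example; hydrant's actual parser is written in rust and may differ in how it treats whitespace or malformed entries.

```python
from typing import NamedTuple


class FirehoseSource(NamedTuple):
    url: str
    is_pds: bool  # True means host authority would be enforced for this source


def parse_firehose_sources(value: str) -> list[FirehoseSource]:
    """Split a comma-separated RELAY_HOSTS-style value into sources (sketch)."""
    prefix = "pds::"
    sources = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # assumption: empty entries are simply ignored
        if entry.startswith(prefix):
            sources.append(FirehoseSource(entry[len(prefix):], True))
        else:
            sources.append(FirehoseSource(entry, False))  # bare URLs are relays
    return sources


sources = parse_firehose_sources("wss://bsky.network,pds::wss://pds.example.com")
```

with the example value from above, the first source comes out as a relay and the second as a direct PDS connection.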
8080-8181-#### firehose seeding
8282-8383-<small>[<- back to toc](#table-of-contents)</small>
8484-8585-in relay mode, `RELAY_HOSTS` defaults to empty. set `SEED_HOSTS` to one or more
8686-relay base URLs and hydrant will call `com.atproto.sync.listHosts` on each at
8787-startup, adding every returned PDS as a firehose source:
8888-8989-```
9090-HYDRANT_SEED_HOSTS=https://bsky.network
9191-```
9292-9393-seeding runs as a background task so the main firehose loop is not blocked. seed
9494-URLs are fetched concurrently (up to four at a time) and the full `listHosts`
9595-pagination is consumed for each. if a request fails partway through, the hosts
9696-collected so far are still added and the failure is logged.
9797-9898-each discovered host is added as a persistent PDS firehose source (`is_pds: true`),
9999-equivalent to calling `POST /firehose/sources`.
100100-101101-banned hosts (`status: "banned"`) are skipped. all other statuses are included
102102-since the firehose ingestor retries on disconnect and transiently-unavailable
103103-hosts will reconnect on their own.
104104-105105-seeding runs from the latest cursor on restart, so new PDSes added to the upstream relay
106106-since the last start are picked up automatically (if they haven't already been picked up through the firehose).
107107-sources that are already running are detected and skipped, so re-seeding is idempotent.
108108-109109-### crawler sources
110110-111111-<small>[<- back to toc](#table-of-contents)</small>
112112-113113-the crawler is configured separately from the firehose via `CRAWLER_URLS`. each
114114-source is a `[mode::]url` entry where the mode prefix is optional and defaults
115115-to `by_collection` in filter mode or `list_repos` in full-network mode.
116116-117117-- `list_repos`: enumerates the network via `com.atproto.sync.listRepos`, checks
118118- each repo's collections via `describeRepo`.
119119-- `by_collection`: queries `com.atproto.sync.listReposByCollection` for each
120120- configured signal. more efficient for filtered indexing since it only surfaces
121121- repos that have matching records. cursors are stored per collection. note that
122122- it won't crawl anything if no signals are specified.
123123-124124-```
125125-CRAWLER_URLS=by_collection::https://lightrail.microcosm.blue,list_repos::wss://bsky.network
126126-```
127127-128128-each source maintains its own cursor so restarts resume mid-pass.
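the `[mode::]url` format can be sketched in python as follows. this is only an illustration under stated assumptions: `parse_crawler_sources` is a hypothetical helper, and the mode names are the ones used in this section (`list_repos`, `by_collection`); other parts of this document refer to the first mode as `relay`, so check which spelling your build accepts.

```python
KNOWN_MODES = ("list_repos", "by_collection")


def parse_crawler_sources(value: str, full_network: bool) -> list[tuple[str, str]]:
    """Return (mode, url) pairs from a CRAWLER_URLS-style value (sketch)."""
    # the default mode depends on whether we index the full network
    default_mode = "list_repos" if full_network else "by_collection"
    sources = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # assumption: empty entries are ignored
        for mode in KNOWN_MODES:
            # only strip a prefix we recognise, so URLs that happen to
            # contain "::" (e.g. IPv6 literals) are left untouched
            if entry.startswith(mode + "::"):
                sources.append((mode, entry[len(mode) + 2:]))
                break
        else:
            sources.append((default_mode, entry))
    return sources


configured = parse_crawler_sources(
    "by_collection::https://lightrail.microcosm.blue,list_repos::wss://bsky.network",
    full_network=False,
)
```

a bare URL such as `https://example.com` would get `by_collection` in filter mode and `list_repos` in full-network mode.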
129129-130130-sources can also be added and removed at runtime via the `/crawler/sources` API
131131-(see [here](#crawler-management)). dynamically added sources are persisted to the
132132-database and survive restarts. `CRAWLER_URLS` sources are startup-only: they are
133133-not written to the database and will always reappear after a restart regardless of
134134-runtime changes (unless you change the config of course).
135135-136136-## building and running
137137-138138-<small>[<- back to toc](#table-of-contents)</small>
139139-140140-hydrant is written in rust and requires the rust toolchain (including `cargo`), plus `make` and `cmake`
141141-for some dependencies. you will also need the clang toolchain and the [wild linker](https://github.com/wild-linker/wild).
142142-143143-### from source
144144-145145-to build a production binary:
146146-147147-```bash
148148-cargo build --release
149149-```
150150-151151-the binary will be located at `target/release/hydrant`.
152152-153153-#### build features
154154-155155-see [build features](#build-features) for optional features (like `relay` or `backlinks`). to build with a specific feature:
156156-157157-```bash
158158-cargo build --release --features backlinks
159159-```
160160-161161-### running
162162-163163-you can run hydrant by executing the binary. make sure to provide the necessary
164164-environment variables (see [configuration](#configuration)).
165165-166166-```bash
167167-export HYDRANT_DATABASE_PATH=./hydrant.db
168168-./target/release/hydrant
169169-```
170170-171171-### reverse proxying
172172-173173-<small>[<- back to toc](#table-of-contents)</small>
174174-175175-it is **highly recommended** to run hydrant behind a reverse proxy (like nginx or
176176-caddy) if you intend to expose the XRPC or event stream APIs to the public. hydrant's
177177-API includes several management endpoints that do not require or support authentication.
178178-**you MUST NOT expose these management endpoints to the public internet.**
179179-180180-#### public endpoints (safe to proxy)
181181-182182-you should only expose the following paths:
183183-184184-- `/xrpc/*`: XRPC endpoints.
185185-- `/stream`: hydrant's ordered event stream.
186186-- `/stats`: general database statistics.
187187-- `/health` / `/_health`: health check.
188188-189189-#### management endpoints (keep private)
190190-191191-the following endpoints allow modifying the indexer state and should be kept internal:
192192-193193-- `/repos`: explicit repository tracking/resyncing/untracking.
194194-- `/filter`: management of NSID filter patterns.
195195-- `/ingestion`: manual control over component lifecycle (crawler, firehose, etc.).
196196-- `/crawler/sources`: management of crawler relays.
197197-- `/firehose/sources`: management of firehose relays.
198198-- `/pds/tiers`: rate-limit tier assignments.
199199-- `/db/train` / `/db/compact`: database maintenance tasks.
200200-- `*/cursors`: cursor management.
201201-- `/debug/*`: introspection and testing endpoints.
202202-203203-## configuration
204204-205205-<small>[<- back to toc](#table-of-contents)</small>
206206-207207-`hydrant` is configured via environment variables. all variables are prefixed
208208-with `HYDRANT_` (except `RUST_LOG`). if a `.env` file exists in the working
209209-directory, it will also be loaded automatically.
210210-211211-| variable | default | description |
212212-| :--- | :--- | :--- |
213213-| `DATABASE_PATH` | `./hydrant.db` | path to the database folder. |
214214-| `RUST_LOG` | `info` | log filter directives (e.g., `debug`, `hydrant=trace`). [`tracing` env-filter syntax](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html). |
215215-| `RELAY_HOST` | `wss://relay.fire.hose.cam/` (indexer), empty (relay) | URL of a firehose source. |
216216-| `RELAY_HOSTS` | | comma-separated list of firehose sources. if unset, falls back to `RELAY_HOST`. prefix a URL with `pds::` to mark it as a direct PDS connection (e.g. `pds::wss://pds.example.com`). bare URLs are treated as relays. defaults to empty in relay mode, where PDSes are expected to be seeded via `SEED_HOSTS` or the firehose management API. |
217217-| `SEED_HOSTS` | `https://bsky.network` (relay) | comma-separated list of base URLs to call `com.atproto.sync.listHosts` on at startup. hydrant adds every non-banned host as a PDS firehose source. see [firehose seeding](#firehose-seeding). |
218218-| `CRAWLER_URLS` | relay hosts in full-network mode, `https://lightrail.microcosm.blue` in filter mode | comma-separated list of `[mode::]url` crawler sources. mode is `relay` or `by_collection`; bare URLs use the default mode. set to empty string to disable crawling. |
219219-| `PLC_URL` | `https://plc.wtf`, `https://plc.directory` if full network | base URL(s) of the PLC directory (comma-separated for multiple). |
220220-| `EPHEMERAL` | `false` (indexer), `true` (relay) | if enabled, no records are stored (in indexer mode). events are deleted after a certain duration (`EPHEMERAL_TTL`). |
221221-| `EPHEMERAL_TTL` | `60min`, `3d` in relay mode | decides after how long events should be deleted. |
222222-| `ONLY_INDEX_LINKS` | `false` | indexer only. if enabled, record blocks are not stored, only the index (records, counts, events) is kept. `getRecord`, `listRecords`, and `getRepo` will return errors. the event stream still works but create/update events will not include record values. |
223223-| `FULL_NETWORK` | `false` (indexer), `true` (relay) | if `true`, discovers and indexes all repositories in the network. |
224224-| `FILTER_SIGNALS` | | comma-separated list of NSID patterns to use for the filter (e.g. `app.bsky.feed.post,app.bsky.graph.*`). |
225225-| `FILTER_COLLECTIONS` | | comma-separated list of NSID patterns to use for the collections filter. |
226226-| `FILTER_EXCLUDES` | | comma-separated list of DIDs to exclude from indexing. |
227227-| `FIREHOSE_WORKERS` | `8` (`24` if full network) | number of concurrent workers for firehose events. |
228228-| `BACKFILL_CONCURRENCY_LIMIT` | `16` (`64` if full network) | maximum number of concurrent backfill tasks. |
229229-| `VERIFY_SIGNATURES` | `full` | signature verification level: `full`, `backfill-only`, or `none`. |
230230-| `CURSOR_SAVE_INTERVAL` | `3sec` | interval (in seconds) to save the firehose cursor. |
231231-| `REPO_FETCH_TIMEOUT` | `5min` | timeout (in seconds) for fetching repositories. |
232232-| `CACHE_SIZE` | `256` | size of the database cache in MB. |
233233-| `IDENTITY_CACHE_SIZE` | `100000` | number of identity entries to cache. |
234234-| `API_PORT` | `3000` | port for the API server. |
235235-| `ENABLE_DEBUG` | `false` | enable debug endpoints. |
236236-| `DEBUG_PORT` | `API_PORT + 1` | port for debug endpoints (if enabled). |
237237-| `ENABLE_FIREHOSE` | `true` | whether to ingest relay subscriptions. |
238238-| `ENABLE_CRAWLER` | `true` if full network or crawler sources are configured, `false` otherwise | whether to actively query the network for unknown repositories. |
239239-| `CRAWLER_MAX_PENDING_REPOS` | `2000` | max pending repos for crawler. |
240240-| `CRAWLER_RESUME_PENDING_REPOS` | `1000` | resume threshold for crawler pending repos. |
241241-| `NEW_HOST_LIMIT` | `50` | in relay mode, decides how many new hosts can be added via `com.atproto.sync.requestCrawl` in a day. |
242242-| `RATE_TIERS` | | comma-separated list of named rate tier definitions in `name:base/mul/hourly/daily[/account_limit]` format (e.g. `trusted:5000/10.0/18000000/432000000/10000000`). the optional account limit prevents new accounts from being created on this PDS once reached. built-in tiers (`default`, `trusted`) are always present and can be overridden. |
243243-| `TIER_RULES` | | comma-separated ordered list of glob rules in `pattern:tier_name` format (e.g. `*.bsky.network:trusted`). rules are evaluated in order; first match wins. explicit API assignments via `PUT /pds/tiers` take precedence over rules; the `default` tier is the final fallback. uses standard glob wildcards (`*`, `?`) matched against the PDS hostname. |
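to make the `RATE_TIERS` format (`name:base/mul/hourly/daily[/account_limit]`) more concrete, here is a small python sketch that parses it. `RateTier` and `parse_rate_tiers` are hypothetical names; the field names are borrowed from the `/pds/tiers` response described in this document, and the mapping from the positional parts to those fields is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RateTier:
    per_second_base: int
    per_second_account_mul: float
    per_hour: int
    per_day: int
    account_limit: Optional[int] = None  # optional fifth component


def parse_rate_tiers(value: str) -> dict[str, RateTier]:
    """Parse a comma-separated list of tier definitions (sketch)."""
    tiers = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, _, spec = entry.partition(":")
        parts = spec.split("/")
        if not name or len(parts) not in (4, 5):
            raise ValueError(f"invalid rate tier definition: {entry!r}")
        tiers[name] = RateTier(
            per_second_base=int(parts[0]),
            per_second_account_mul=float(parts[1]),
            per_hour=int(parts[2]),
            per_day=int(parts[3]),
            account_limit=int(parts[4]) if len(parts) == 5 else None,
        )
    return tiers


tiers = parse_rate_tiers("trusted:5000/10.0/18000000/432000000/10000000")
```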
244244-245245-## build features
246246-247247-<small>[<- back to toc](#table-of-contents)</small>
248248-249249-`hydrant` has several optional compile-time features:
250250-251251-| feature | default | description |
252252-| :--- | :--- | :--- |
253253-| `indexer` | yes | makes hydrant act as an indexer. incompatible with the relay feature. |
254254-| `indexer_stream` | yes | enables the event stream for the indexer. requires indexer feature. |
255255-| `relay` | no | makes hydrant act as a relay. incompatible with the indexer feature. |
256256-| `backlinks` | no | enables the backlinks indexer and XRPC endpoints (`blue.microcosm.links.*`). requires indexer feature. |
257257-258258-## REST api
259259-260260-<small>[<- back to toc](#table-of-contents)</small>
261261-262262-### event stream
263263-264264-- `GET /stream`: subscribe to the event stream.
265265- - query parameters:
266266- - `cursor` (optional): start streaming from a specific event ID.
267267-268268-### stats
269269-270270-- `GET /stats`: get stats about the database:
271271- - `counts`: counts of repos, records, events, errors, etc.
272272- - `sizes`: sizes of the database keyspaces on disk, in bytes.
273273-274274-### filter management
275275-276276-<small>[<- back to toc](#table-of-contents)</small>
277277-278278-- `GET /filter`: get the current filter configuration.
279279-- `PATCH /filter`: update the filter configuration.
280280-281281-#### filter mode
282282-283283-the `mode` field controls what gets indexed:
284284-285285-| mode | behaviour |
286286-| :--- | :--- |
287287-| `filter` | auto-discovers and backfills any account whose firehose commit touches a collection matching one of the `signals` patterns. you can also explicitly track individual repositories via the `/repos` endpoint regardless of matching signals. |
288288-| `full` | index the entire network. `signals` are ignored for discovery, but `excludes` and `collections` still apply. |
289289-290290-#### fields
291291-292292-| field | type | description |
293293-| :--- | :--- | :--- |
294294-| `mode` | `"filter"` \| `"full"` | indexing mode (see above). |
295295-| `signals` | set update | NSID patterns (e.g. `app.bsky.feed.post` or `app.bsky.*`) that trigger auto-discovery in `filter` mode. |
296296-| `collections` | set update | NSID patterns used to filter which records are stored. if empty, all collections are stored. applies in all modes. |
297297-| `excludes` | set update | set of DIDs to always skip, regardless of mode. checked before any other filter logic. |
298298-299299-#### set updates
300300-301301-each set field accepts one of two forms:
302302-303303-- **replace**: an array replaces the entire set, e.g. `["did:plc:abc", "did:web:example.org"]`
304304-- **patch**: an object maps items to `true` (add) or `false` (remove), e.g. `{"did:plc:abc": true, "did:web:example.org": false}`
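the two forms can be modelled with a short python sketch. `apply_set_update` is a hypothetical function that only mirrors the semantics described above (array replaces, object patches); it is not hydrant's implementation.

```python
from typing import Union

SetUpdate = Union[list[str], dict[str, bool]]


def apply_set_update(current: set[str], update: SetUpdate) -> set[str]:
    """Return the new set after applying a replace or patch update (sketch)."""
    if isinstance(update, list):
        return set(update)  # replace: the array becomes the entire set
    result = set(current)
    for item, present in update.items():
        if present:
            result.add(item)  # true adds the item
        else:
            result.discard(item)  # false removes it; a missing item is a no-op
    return result


excludes = {"did:plc:abc"}
patched = apply_set_update(excludes, {"did:plc:abc": False, "did:web:example.org": True})
replaced = apply_set_update(excludes, ["did:plc:xyz"])
```

here `patched` ends up containing only `did:web:example.org`, while `replaced` contains only `did:plc:xyz`.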
305305-306306-#### NSID patterns
307307-308308-`signals` and `collections` support an optional `.*` suffix to match an entire namespace:
309309-310310-- `app.bsky.feed.post`: exact match only
311311-- `app.bsky.feed.*`: matches any collection under `app.bsky.feed`
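a minimal python sketch of this matching rule, assuming that a `.*` pattern matches any NSID nested under the namespace (at any depth) but not the bare namespace itself. `nsid_matches` is a hypothetical name and the depth behaviour is an assumption, not something this document specifies.

```python
def nsid_matches(pattern: str, nsid: str) -> bool:
    """Check an NSID against an exact or `.*`-suffixed pattern (sketch)."""
    if pattern.endswith(".*"):
        # drop only the "*" so the trailing dot stays: "app.bsky.feed."
        return nsid.startswith(pattern[:-1])
    return nsid == pattern  # no suffix: exact match only


exact = nsid_matches("app.bsky.feed.post", "app.bsky.feed.post")
wildcard = nsid_matches("app.bsky.feed.*", "app.bsky.feed.like")
other_namespace = nsid_matches("app.bsky.feed.*", "app.bsky.graph.follow")
```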
312312-313313-### ingestion control
314314-315315-<small>[<- back to toc](#table-of-contents)</small>
316316-317317-- `GET /ingestion`: get the current ingestion status.
318318- - returns `{ "crawler": bool, "firehose": bool, "backfill": bool }`.
319319-- `PATCH /ingestion`: enable or disable ingestion components at runtime without
320320- restarting.
321321- - body: `{ "crawler"?: bool, "firehose"?: bool, "backfill"?: bool }`. only provided fields are updated.
322322- - when disabled, each component finishes its current task before pausing (e.g.
323323- the backfill worker completes any in-flight repo syncs, the firehose
324324- finishes processing the current message). they resume immediately when
325325- re-enabled.
326326-327327-### crawler management
328328-329329-<small>[<- back to toc](#table-of-contents)</small>
330330-331331-- `GET /crawler/sources`: list all currently active crawler sources.
332332- - returns a JSON array of `{ "url": string, "mode": "relay" | "by_collection", "persisted": bool }`.
333333- - `persisted: true` means the source was added via the API and is stored in the
334334- database, so it will survive a restart. `persisted: false` means the source
335335- came from `CRAWLER_URLS` and is not written to the database.
336336-- `POST /crawler/sources`: add a crawler source at runtime.
337337- - body: `{ "url": string, "mode": "relay" | "by_collection" }`.
338338- - the source is written to the database before the producer task is started, so
339339- it is safe to add sources and then immediately restart without losing them.
340340- - if a source with the same URL already exists (whether from `CRAWLER_URLS` or
341341- a previous `POST`), it is replaced: the running task is stopped and a new one
342342- is started with the new mode. any cursor state for that URL is preserved.
343343- - returns `201 Created` on success.
344344-- `DELETE /crawler/sources`: remove a crawler source at runtime.
345345- - body: `{ "url": string }`.
346346- - the producer task is stopped immediately.
347347- - if the source was added via the API (`persisted: true`), it is removed from
348348- the database and will not reappear on restart. if it came from `CRAWLER_URLS`
349349- (`persisted: false`), only the running task is stopped; the source will
350350- reappear on the next restart since `CRAWLER_URLS` is re-applied at startup
351351- (unless you remove it manually from your configuration, of course).
352352- - cursor state is not cleared. use `DELETE /crawler/cursors` separately if you want
353353- the source to restart from the beginning when re-added.
354354- - returns `200 OK` if the source was found and removed, `404 Not Found` otherwise.
355355-- `DELETE /crawler/cursors`: reset stored cursors for a given crawler URL. body: `{ "key": "..." }`
356356- where key is a URL. clears the list-repos crawler cursor as well as any by-collection
357357- cursors associated with that URL. causes the next crawler pass to restart from the beginning.
358358-359359-### firehose management
1212+use hydrant.
36013361361-<small>[<- back to toc](#table-of-contents)</small>
362362-363363-- `GET /firehose/sources`: list all currently active firehose sources.
364364- - returns a JSON array of `{ "url": string, "persisted": bool, "is_pds": bool }`.
365365- - `persisted: true` means the source was added via the API and is stored in the
366366- database, so it will survive a restart. `persisted: false` means the source
367367- came from `RELAY_HOSTS` and is not written to the database.
368368- - `is_pds: true` means the source is a direct PDS connection with host authority enforcement enabled.
369369-- `POST /firehose/sources`: add a firehose source at runtime.
370370- - body: `{ "url": string, "is_pds": bool }`. `is_pds` defaults to `false`.
371371- - the source is persisted to the database before the ingestor task is started.
372372- - if a source with the same URL already exists, it is replaced: the running
373373- task is stopped and a new one is started. any existing cursor state for that
374374- URL is preserved.
375375- - returns `201 Created` on success.
376376-- `DELETE /firehose/sources`: remove a firehose source at runtime.
377377- - body: `{ "url": string }`.
378378- - the ingestor task is stopped immediately.
379379- - if the source was added via the API (`persisted: true`), it is removed from
380380- the database and will not reappear on restart. if it came from `RELAY_HOSTS`
381381- (`persisted: false`), only the running task is stopped; the source reappears
382382- on the next restart.
383383- - cursor state is not cleared. use `DELETE /firehose/cursors` separately if you want
384384- the relay to restart from the beginning when re-added.
385385- - returns `200 OK` if the relay was found and removed, `404 Not Found` otherwise.
386386-- `DELETE /firehose/cursors`: reset the stored cursor for a given firehose relay URL. body: `{ "key": "..." }`
387387- where key is a URL. causes the next firehose connection to restart from the beginning.
388388-389389-### PDS management
390390-391391-<small>[<- back to toc](#table-of-contents)</small>
392392-393393-hydrant rate-limits firehose events per PDS. each PDS is assigned to a named
394394-rate tier that controls how aggressively hydrant limits events from it. two
395395-built-in tiers are always present: `default` (conservative limits for unknown
396396-operators) and `trusted` (higher limits for well-behaved operators). additional
397397-tiers can be defined via `RATE_TIERS`.
398398-399399-the per-second limit scales with the number of active accounts on the PDS:
400400-`max(per_second_base, accounts × per_second_account_mul)`.
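as a worked example of the formula above, here is a tiny python sketch using the built-in tier numbers listed in this section. `per_second_limit` is a hypothetical helper; only the `max(...)` expression itself comes from this document.

```python
def per_second_limit(per_second_base: int, per_second_account_mul: float, accounts: int) -> float:
    """Per-second event limit for a PDS with the given number of active accounts."""
    return max(per_second_base, accounts * per_second_account_mul)


# `default` tier (base 50, +0.5 per account): a small PDS stays on the floor
default_small = per_second_limit(50, 0.5, 40)      # max(50, 20.0)

# `trusted` tier (base 5000, +10.0 per account)
trusted_small = per_second_limit(5000, 10.0, 100)  # max(5000, 1000.0)
trusted_large = per_second_limit(5000, 10.0, 1000) # max(5000, 10000.0)
```

the base acts as a floor: the per-account term only matters once `accounts × per_second_account_mul` exceeds it, which for the `trusted` numbers happens above 500 active accounts.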
401401-402402-you can also define an optional `account_limit` for a rate tier. if a PDS
403403-exceeds this number of active accounts, hydrant will reject any new account
404404-creation events from it.
405405-406406-the built-in tiers are defined as follows:
407407-- `default`: `50` per sec (floor), `+0.5` per account. max `3_600_000`/hr, `86_400_000`/day. `100` account limit.
408408-- `trusted`: `5000` per sec (floor), `+10.0` per account. max `18_000_000`/hr, `432_000_000`/day. `10_000_000` account limit.
409409-410410-- `GET /pds/tiers`: list all current tier assignments alongside the available
411411- tier definitions.
412412- - returns `{ "assignments": [{ "host": string, "tier": string }], "rate_tiers": { <name>: { "per_second_base": int, "per_second_account_mul": float, "per_hour": int, "per_day": int } } }`.
413413- - `assignments` only contains PDSes with an explicit API assignment. hosts without one resolve via glob rules or fall back to `default`.
414414-- `PUT /pds/tiers`: assign a PDS to a named rate tier.
415415- - body: `{ "host": string, "tier": string }`.
416416- - `host` is the PDS hostname (e.g. `pds.example.com`).
417417- - `tier` must be one of the configured tier names. returns `400` if unknown.
418418- - assignments are persisted to the database and survive restarts.
419419- - re-assigning the same host updates the tier in place without creating a duplicate.
420420-- `DELETE /pds/tiers`: remove an explicit tier assignment for a PDS.
421421- - query parameter: `?host=<hostname>` (e.g. `?host=pds.example.com`).
422422- - reverts the host to glob-rule resolution (not necessarily `default`; a matching `TIER_RULES` pattern still applies).
423423- - returns `200` even if no assignment existed.
424424-- `GET /pds/rate-tiers`: list the available rate tier definitions.
425425- - returns a map of tier name to `{ "per_second_base", "per_second_account_mul", "per_hour", "per_day", "account_limit" }`.
426426-427427-tiers are resolved in this order:
428428-429429-1. **explicit API assignment**, set via `PUT /pds/tiers`, stored in the database, survives restarts.
430430-2. **glob rules**, from `TIER_RULES`, evaluated in order; first match wins.
431431-3. **`default` tier**, applied if no rule or explicit assignment matches.
432432-433433-deleting an API assignment reverts the host to glob-rule resolution, not necessarily back to `default`. if a rule like `*.bsky.network:trusted` matches the host, it will become trusted again without any further action.
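the resolution order can be sketched in python like this. `resolve_tier` is a hypothetical function; it uses the standard library's `fnmatch.fnmatchcase` as a stand-in for hydrant's glob matcher, which is an assumption (`fnmatchcase` also understands `[...]` character classes, which may or may not be supported by hydrant).

```python
from fnmatch import fnmatchcase


def resolve_tier(host: str, assignments: dict[str, str], rules: list[tuple[str, str]]) -> str:
    """Resolve the rate tier for a PDS hostname (sketch)."""
    # 1. an explicit API assignment always wins
    if host in assignments:
        return assignments[host]
    # 2. glob rules are evaluated in order; first match wins
    for pattern, tier in rules:
        if fnmatchcase(host, pattern):
            return tier
    # 3. final fallback
    return "default"


rules = [("*.bsky.network", "trusted")]
by_rule = resolve_tier("pds1.bsky.network", {}, rules)
by_assignment = resolve_tier("pds1.bsky.network", {"pds1.bsky.network": "default"}, rules)
fallback = resolve_tier("pds.example.com", {}, rules)
```

removing the entry from `assignments` in the second call would make the host resolve to `trusted` again via the rule, which mirrors the behaviour described above for deleted API assignments.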
434434-435435-### repository management
436436-437437-<small>[<- back to toc](#table-of-contents)</small>
438438-439439-all `/repos` endpoints that return lists respond with NDJSON by default. send `Accept: application/json` or `Content-Type: application/json` to get a JSON array instead.
440440-441441-- `GET /repos`: get a list of repositories and their sync status. supports pagination and filtering:
442442- - `limit`: max results (default 100, max 1000)
443443- - `cursor`: did key for paginating.
444444-- `GET /repos/{did}`: get the sync status and metadata of a specific repository.
445445- also returns the handle, PDS URL and the atproto signing key (these won't be
446446- available before the repo has been backfilled at least once).
447447-- `PUT /repos`: explicitly track repositories. accepts an NDJSON body of `{"did": "..."}` (or JSON array of the same).
448448- only affects repositories that are not known or are untracked.
449449- returns a list of the DIDs that were queued for backfill.
450450-- `DELETE /repos`: untrack repositories.
451451- accepts an NDJSON body of `{"did": "..."}` (or JSON array of the same).
452452- only affects repositories that are currently tracked.
453453- returns a list of the DIDs that were untracked.
454454-- `POST /repos/resync`: force a new backfill for one or more repositories.
455455- accepts an NDJSON body of `{"did": "..."}` (or JSON array of the same).
456456- only affects repositories hydrant already knows about.
457457- returns a list of the DIDs that were queued.
458458-459459-### database operations
460460-461461-- `POST /db/train`: train zstd compression dictionaries for the `repos`,
462462- `blocks`, and `events` keyspaces. dictionaries are written to disk; a restart
463463- is required to apply them. the crawler, firehose, and backfill worker are
464464- paused for the duration and restored on completion.
465465-- `POST /db/compact`: trigger a full major compaction of all database keyspaces
466466- in parallel. the crawler, firehose, and backfill worker are paused for the
467467- duration and restored on completion.
468468-469469-## data access (xrpc)
470470-471471-<small>[<- back to toc](#table-of-contents)</small>
472472-473473-`hydrant` implements the following XRPC endpoints under `/xrpc/`:
474474-475475-### com.atproto.*
476476-477477-<small>[<- back to toc](#table-of-contents)</small>
478478-479479-these are standard atproto endpoints. you can look at [the atproto api reference](https://docs.bsky.app/docs/category/http-reference) for more info.
480480-481481-the following are implemented currently:
482482-- `com.atproto.repo.getRecord`
483483-- `com.atproto.repo.listRecords`
484484-- `com.atproto.repo.describeRepo` (also see `systems.gaze.hydrant.describeRepo`)
485485-- `com.atproto.sync.getRepo` (`since` parameter not implemented!)
486486-- `com.atproto.sync.getHostStatus`
487487-- `com.atproto.sync.listHosts`
488488-- `com.atproto.sync.getRepoStatus`
489489-- `com.atproto.sync.listRepos`
490490-- `com.atproto.sync.getLatestCommit`
491491-- `com.atproto.sync.requestCrawl` (adds the host to firehose sources in relay mode)
492492-- `com.atproto.sync.subscribeRepos` (WebSocket firehose stream, requires `relay` feature)
493493-494494-### systems.gaze.hydrant.*
495495-496496-<small>[<- back to toc](#table-of-contents)</small>
497497-498498-these are some non-standard XRPCs that might be useful.
499499-500500-#### systems.gaze.hydrant.countRecords
501501-502502-return the total number of stored records in a collection.
503503-504504-| param | required | description |
505505-| :--- | :--- | :--- |
506506-| `identifier` | yes | DID or handle of the repository. |
507507-| `collection` | yes | NSID of the collection. |
508508-509509-returns `{ count }`.
510510-511511-#### systems.gaze.hydrant.describeRepo
512512-513513-return account and identity information about this repo.
514514-this is equal to `com.atproto.repo.describeRepo`, except we don't return the full DID document.
515515-the handle is bi-directionally verified; if it's invalid or does not exist, we return
516516-"handle.invalid".
517517-518518-| param | required | description |
519519-| :--- | :--- | :--- |
520520-| `identifier` | yes | DID or handle of the repository. |
521521-522522-returns `{ did, handle, pds, collections }`.
523523-524524-### blue.microcosm.links.*
525525-526526-<small>[<- back to toc](#table-of-contents)</small>
527527-528528-hydrant implements a subset of [microcosm constellation](https://constellation.microcosm.blue/)
529529-when it's built with the `backlinks` cargo feature (`cargo build --features backlinks`).
530530-531531-when enabled, hydrant indexes all AT URI and DID references found inside stored records into a
532532-reverse index. this lets you efficiently answer "what records link to this subject?".
533533-534534-#### blue.microcosm.links.getBacklinks
535535-536536-return records that link to a given subject.
537537-538538-| param | required | description |
539539-| :--- | :--- | :--- |
540540-| `subject` | yes | AT URI or DID to look up backlinks for. |
541541-| `source` | no | filter by source collection, e.g. `app.bsky.feed.like`. also accepts `collection:path` form to further filter by field path, e.g. `app.bsky.feed.like:subject.uri`. the path is matched against the dotted field path within the record (`.` is prepended automatically). |
542542-| `limit` | no | max results to return (default 50, max 100). |
543543-| `cursor` | no | opaque pagination cursor from a previous response. |
544544-| `reverse` | no | if `true`, return results in reverse order (default `false`). |
545545-546546-returns `{ backlinks: [{ uri, cid }], cursor? }`.
547547-548548-results are ordered by source record rkey (ascending by default, descending when `reverse=true`).
549549-the cursor is stable across new insertions for TID rkey records.
550550-551551-#### blue.microcosm.links.getBacklinksCount
552552-553553-return the number of records that link to a given subject.
554554-555555-| param | required | description |
556556-| :--- | :--- | :--- |
557557-| `subject` | yes | AT URI or DID to count backlinks for. |
558558-| `source` | no | filter by source collection (same format as `getBacklinks`). |
559559-560560-returns `{ count }`.
1414+for more information, look at the [documentation](https://hydrant.klbr.net).
+1
docs/README.md
···10101111## what's here
12121313+- [vs tap](concepts/vs-tap.md): comparison against tap, the go sync utility
1314- [getting started](getting-started.md): building, running, reverse proxying
1415- [configuration](configuration.md): all environment variables
1516- [build features](build-features.md): optional cargo features (`relay`, `backlinks`, etc.)