very fast AT Protocol indexer with flexible filtering, xrpc queries, a cursor-backed event stream, and more, built on fjall
rust fjall at-protocol atproto indexer


[docs] point to docs

dawn 310cfe5e 37ca6791

+3 -548
+2 -548
README.md
#### table-of-contents

-> [hydrant](#hydrant)</br>
-> [vs tap](#vs-tap) | [stream](#stream-behavior) | [multi-relay](#multiple-relay-support) | [seeding](#firehose-seeding) | [crawler sources](#crawler-sources)</br>
-> [building](#building-and-running) | [proxying](#reverse-proxying) | [configuration](#configuration) | [build features](#build-features)</br>
-> [rest api](#rest-api) | [filter](#filter-management) | [ingestion](#ingestion-control) | [crawler](#crawler-management) | [firehose](#firehose-management) | [pds](#pds-management) | [repos](#repository-management)</br>
-> [xrpc api](#data-access-xrpc) | [atproto](#comatproto) | [backlinks](#bluemicrocosmlinks) | [identity](#bluemicrocosmidentity) | [custom](#systemsgazehydrant)

# hydrant

`hydrant` is an AT Protocol indexer built on the `fjall` database. it's built to
[…]
[random.wisp.place](https://tangled.org/did:plc:dfl62fgb7wtjj3fcbb72naae/random.wisp.place)
(standalone binary using the http API) or the [statusphere
example](./examples/statusphere.rs) (hydrant-as-library) for examples of how to
use hydrant (for rust docs look at https://hydrant.klbr.net/ for now).

**WARNING: *the db format is only partially stable.*** we provide migrations in hydrant
itself, so nothing should go wrong! you should still probably keep backups just in case!

## vs tap

<small>[<- back to toc](#table-of-contents)</small>

you can read [this blogpost](https://90008.leaflet.pub/3mhp3t4kuw22e) or read on below.

while [`tap`](https://github.com/bluesky-social/indigo/tree/main/cmd/tap) is
designed as a firehose consumer that simply propagates events while handling
sync, `hydrant` is more flexible: it allows you to directly query the database for
records, and it also provides an ordered view of events, allowing the use of a
cursor to fetch events from a specific point.
it can act either as an indexer or as an ephemeral view of some window of events.

### stream behavior

<small>[<- back to toc](#table-of-contents)</small>

the `WS /stream` (hydrant) and `WS /channel` (tap) endpoints have different designs:

| aspect | `tap` (`/channel`) | `hydrant` (`/stream`) |
| :--- | :--- | :--- |
| distribution | sharded work queue: events are load-balanced across connected clients. if 5 clients connect, each receives ~20% of events. | broadcast: every connected client receives a full copy of the event stream. if 5 clients connect, all 5 receive 100% of events. |
| cursors | server-managed: clients ACK messages. the server tracks progress and redelivers unacked messages. | client-managed: the client provides `?cursor=123`. the server streams from that point. |
| persistence | events are stored in an outbox, sent to the consumer, and removed from the outbox when acked. nothing is replayable. | `record` events are replayable. `identity`/`account` are ephemeral. use `GET /repos/:did` to query identity / account info (handle, pds, signing key, etc.). |
| backfill | backfill events are mixed into the live queue and prioritized (per-repo, acting as a synchronization barrier) by the server. | backfill simply inserts historical events (`live: false`) into the global event log. streaming is just reading this log sequentially. synchronization works the same as tap: `live: true` vs `live: false`. |
| event types | `record`, `identity` (includes status) | `record`, `identity` (handle, cache-buster), `account` (status) |

### multiple relay support

<small>[<- back to toc](#table-of-contents)</small>

`hydrant` supports connecting to multiple relays simultaneously for firehose
ingestion.
when `RELAY_HOSTS` is configured with multiple URLs:

- one independent firehose stream loop is spawned per relay
- each relay maintains its own firehose cursor state
- all ingestion loops share the same worker pool and database

commit events are de-duplicated according to the repo `rev`. account / identity
events are de-duplicated using the `time` field. todo: decide what to do on
relay-side account takedowns or if relays set the `time` field.

#### direct PDS connections

a firehose source can also be a direct connection to a PDS rather than a relay.
prefix the URL with `pds::` to mark it as such:

```
HYDRANT_RELAY_HOSTS=wss://bsky.network,pds::wss://pds.example.com
```

hydrant enforces host authority only when a source is marked as a direct PDS
(`is_pds: true`). relays (`is_pds: false`, the default) are exempt from this check,
since they forward commits from many PDSes by design. note that this means you
are trusting the relay here.

#### firehose seeding

<small>[<- back to toc](#table-of-contents)</small>

in relay mode, `RELAY_HOSTS` defaults to empty. set `SEED_HOSTS` to one or more
relay base URLs and hydrant will call `com.atproto.sync.listHosts` on each at
startup, adding every returned PDS as a firehose source:

```
HYDRANT_SEED_HOSTS=https://bsky.network
```

seeding runs as a background task so the main firehose loop is not blocked. seed
URLs are fetched concurrently (up to four at a time) and the full `listHosts`
pagination is consumed for each. if a request fails partway through, the hosts
collected so far are still added and the failure is logged.

each discovered host is added as a persistent PDS firehose source (`is_pds: true`),
equivalent to calling `POST /firehose/sources`.

banned hosts (`status: "banned"`) are skipped.
all other statuses are included,
since the firehose ingestor retries on disconnect and transiently-unavailable
hosts will reconnect on their own.

seeding re-runs from the latest cursor on restart, so new PDSes added to the upstream
relay since the last start are picked up automatically (if they haven't already arrived
through the firehose). sources that are already running are detected and skipped, so
re-seeding is idempotent.

### crawler sources

<small>[<- back to toc](#table-of-contents)</small>

the crawler is configured separately from the firehose via `CRAWLER_URLS`. each
source is a `[mode::]url` entry where the mode prefix is optional and defaults
to `by_collection` in filter mode or `list_repos` in full-network mode.

- `list_repos`: enumerates the network via `com.atproto.sync.listRepos`, then checks
  each repo's collections via `describeRepo`.
- `by_collection`: queries `com.atproto.sync.listReposByCollection` for each
  configured signal. more efficient for filtered indexing since it only surfaces
  repos that have matching records. cursors are stored per collection. note that
  it won't crawl anything if no signals are specified.

```
CRAWLER_URLS=by_collection::https://lightrail.microcosm.blue,list_repos::wss://bsky.network
```

each source maintains its own cursor, so restarts resume mid-pass.

sources can also be added and removed at runtime via the `/crawler/sources` API
(see [here](#crawler-management)). dynamically added sources are persisted to the
database and survive restarts. `CRAWLER_URLS` sources are startup-only: they are
not written to the database and will always reappear after a restart regardless of
runtime changes (unless you change the config, of course).
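
for instance, adding and listing crawler sources at runtime might look like this (a sketch only: it assumes hydrant's API is listening on `localhost:3000`, and the source URL is just an example):

```bash
# add a by_collection crawler source at runtime
curl -X POST localhost:3000/crawler/sources \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://lightrail.microcosm.blue", "mode": "by_collection"}'

# list active sources; "persisted": true marks API-added sources
# that survive restarts, in contrast to CRAWLER_URLS entries
curl localhost:3000/crawler/sources
```

see [crawler management](#crawler-management) for the full request and response shapes.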
## building and running

<small>[<- back to toc](#table-of-contents)</small>

hydrant is written in rust and requires the rust toolchain (including `cargo`), plus
`make` and `cmake` for some dependencies. you will also need the clang toolchain and
the [wild linker](https://github.com/wild-linker/wild).

### from source

to build a production binary:

```bash
cargo build --release
```

the binary will be located at `target/release/hydrant`.

#### build features

see [build features](#build-features) for optional features (like `relay` or `backlinks`). to build with a specific feature:

```bash
cargo build --release --features backlinks
```

### running

you can run hydrant by executing the binary. make sure to provide the necessary
environment variables (see [configuration](#configuration)).

```bash
export HYDRANT_DATABASE_PATH=./hydrant.db
./target/release/hydrant
```

### reverse proxying

<small>[<- back to toc](#table-of-contents)</small>

it is **highly recommended** to run hydrant behind a reverse proxy (like nginx or
caddy) if you intend to expose the XRPC or event stream APIs to the public. hydrant's
API includes several management endpoints that do not require or support authentication.
**you MUST NOT expose these management endpoints to the public internet.**

#### public endpoints (safe to proxy)

you should only expose the following paths:

- `/xrpc/*`: XRPC endpoints.
- `/stream`: hydrant's ordered event stream.
- `/stats`: general database statistics.
- `/health` / `/_health`: health check.
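
as a sketch, a minimal caddy config that exposes only these public paths and answers 404 for everything else (the hostname and upstream port are assumptions; adjust to your setup):

```
hydrant.example.com {
	@public path /xrpc/* /stream /stats /health /_health
	handle @public {
		reverse_proxy localhost:3000
	}
	handle {
		respond 404
	}
}
```

the same allow-list approach works with nginx `location` blocks: proxy only the public prefixes and let everything else fall through to a deny.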
#### management endpoints (keep private)

the following endpoints allow modifying the indexer state and should be kept internal:

- `/repos`: explicit repository tracking/resyncing/untracking.
- `/filter`: management of NSID filter patterns.
- `/ingestion`: manual control over component lifecycle (crawler, firehose, etc.).
- `/crawler/sources`: management of crawler relays.
- `/firehose/sources`: management of firehose relays.
- `/pds/tiers`: rate-limit tier assignments.
- `/db/train` / `/db/compact`: database maintenance tasks.
- `*/cursors`: cursor management.
- `/debug/*`: introspection and testing endpoints.

## configuration

<small>[<- back to toc](#table-of-contents)</small>

`hydrant` is configured via environment variables. all variables are prefixed
with `HYDRANT_` (except `RUST_LOG`). if a `.env` file exists in the working
directory, it will also be loaded automatically.

| variable | default | description |
| :--- | :--- | :--- |
| `DATABASE_PATH` | `./hydrant.db` | path to the database folder. |
| `RUST_LOG` | `info` | log filter directives (e.g., `debug`, `hydrant=trace`). [`tracing` env-filter syntax](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html). |
| `RELAY_HOST` | `wss://relay.fire.hose.cam/` (indexer), empty (relay) | URL of a firehose source. |
| `RELAY_HOSTS` | | comma-separated list of firehose sources. if unset, falls back to `RELAY_HOST`. prefix a URL with `pds::` to mark it as a direct PDS connection (e.g. `pds::wss://pds.example.com`). bare URLs are treated as relays. defaults to empty in relay mode, where PDSes are expected to be seeded via `SEED_HOSTS` or the firehose management API. |
| `SEED_HOSTS` | `https://bsky.network` (relay) | comma-separated list of base URLs to call `com.atproto.sync.listHosts` on at startup. hydrant adds every non-banned host as a PDS firehose source. see [firehose seeding](#firehose-seeding). |
| `CRAWLER_URLS` | relay hosts in full-network mode, `https://lightrail.microcosm.blue` in filter mode | comma-separated list of `[mode::]url` crawler sources. mode is `relay` or `by_collection`; bare URLs use the default mode. set to an empty string to disable crawling. |
| `PLC_URL` | `https://plc.wtf`, `https://plc.directory` if full network | base URL(s) of the PLC directory (comma-separated for multiple). |
| `EPHEMERAL` | `false` (indexer), `true` (relay) | if enabled, no records are stored (in indexer mode). events are deleted after a certain duration (`EPHEMERAL_TTL`). |
| `EPHEMERAL_TTL` | `60min`, `3d` in relay mode | how long events are kept before being deleted. |
| `ONLY_INDEX_LINKS` | `false` | indexer only. if enabled, record blocks are not stored; only the index (records, counts, events) is kept. `getRecord`, `listRecords`, and `getRepo` will return errors. the event stream still works, but create/update events will not include record values. |
| `FULL_NETWORK` | `false` (indexer), `true` (relay) | if `true`, discovers and indexes all repositories in the network. |
| `FILTER_SIGNALS` | | comma-separated list of NSID patterns to use for the filter (e.g. `app.bsky.feed.post,app.bsky.graph.*`). |
| `FILTER_COLLECTIONS` | | comma-separated list of NSID patterns to use for the collections filter. |
| `FILTER_EXCLUDES` | | comma-separated list of DIDs to exclude from indexing. |
| `FIREHOSE_WORKERS` | `8` (`24` if full network) | number of concurrent workers for firehose events. |
| `BACKFILL_CONCURRENCY_LIMIT` | `16` (`64` if full network) | maximum number of concurrent backfill tasks. |
| `VERIFY_SIGNATURES` | `full` | signature verification level: `full`, `backfill-only`, or `none`. |
| `CURSOR_SAVE_INTERVAL` | `3sec` | interval at which the firehose cursor is saved. |
| `REPO_FETCH_TIMEOUT` | `5min` | timeout for fetching repositories. |
| `CACHE_SIZE` | `256` | size of the database cache in MB. |
| `IDENTITY_CACHE_SIZE` | `100000` | number of identity entries to cache. |
| `API_PORT` | `3000` | port for the API server. |
| `ENABLE_DEBUG` | `false` | enable debug endpoints. |
| `DEBUG_PORT` | `API_PORT + 1` | port for debug endpoints (if enabled). |
| `ENABLE_FIREHOSE` | `true` | whether to ingest relay subscriptions. |
| `ENABLE_CRAWLER` | `true` if full network or crawler sources are configured, `false` otherwise | whether to actively query the network for unknown repositories. |
| `CRAWLER_MAX_PENDING_REPOS` | `2000` | max pending repos for the crawler. |
| `CRAWLER_RESUME_PENDING_REPOS` | `1000` | resume threshold for crawler pending repos. |
| `NEW_HOST_LIMIT` | `50` | in relay mode, how many new hosts can be added via `com.atproto.sync.requestCrawl` per day. |
| `RATE_TIERS` | | comma-separated list of named rate tier definitions in `name:base/mul/hourly/daily[/account_limit]` format (e.g. `trusted:5000/10.0/18000000/432000000/10000000`). the optional account limit prevents new accounts from being created on a PDS once reached. built-in tiers (`default`, `trusted`) are always present and can be overridden. |
| `TIER_RULES` | | comma-separated ordered list of glob rules in `pattern:tier_name` format (e.g. `*.bsky.network:trusted`). rules are evaluated in order; first match wins. explicit API assignments via `PUT /pds/tiers` take precedence over rules; the `default` tier is the final fallback. uses standard glob wildcards (`*`, `?`) matched against the PDS hostname. |

## build features

<small>[<- back to toc](#table-of-contents)</small>

`hydrant` has several optional compile-time features:

| feature | default | description |
| :--- | :--- | :--- |
| `indexer` | yes | makes hydrant act as an indexer. incompatible with the `relay` feature. |
| `indexer_stream` | yes | enables the event stream for the indexer. requires the `indexer` feature. |
| `relay` | no | makes hydrant act as a relay. incompatible with the `indexer` feature. |
| `backlinks` | no | enables the backlinks indexer and XRPC endpoints (`blue.microcosm.links.*`). requires the `indexer` feature. |

## REST api

<small>[<- back to toc](#table-of-contents)</small>

### event stream

- `GET /stream`: subscribe to the event stream.
  - query parameters:
    - `cursor` (optional): start streaming from a specific event ID.

### stats

- `GET /stats`: get stats about the database:
  - `counts`: counts of repos, records, events, errors, etc.
  - `sizes`: sizes of the database keyspaces on disk, in bytes.

### filter management

<small>[<- back to toc](#table-of-contents)</small>

- `GET /filter`: get the current filter configuration.
- `PATCH /filter`: update the filter configuration.

#### filter mode

the `mode` field controls what gets indexed:

| mode | behaviour |
| :--- | :--- |
| `filter` | auto-discovers and backfills any account whose firehose commit touches a collection matching one of the `signals` patterns. you can also explicitly track individual repositories via the `/repos` endpoint, regardless of matching signals. |
| `full` | index the entire network. `signals` are ignored for discovery, but `excludes` and `collections` still apply. |

#### fields

| field | type | description |
| :--- | :--- | :--- |
| `mode` | `"filter"` \| `"full"` | indexing mode (see above). |
| `signals` | set update | NSID patterns (e.g. `app.bsky.feed.post` or `app.bsky.*`) that trigger auto-discovery in `filter` mode. |
| `collections` | set update | NSID patterns used to filter which records are stored. if empty, all collections are stored. applies in all modes. |
| `excludes` | set update | set of DIDs to always skip, regardless of mode. checked before any other filter logic. |

#### set updates

each set field accepts one of two forms:

- **replace**: an array replaces the entire set, e.g. `["did:plc:abc", "did:web:example.org"]`
- **patch**: an object maps items to `true` (add) or `false` (remove), e.g. `{"did:plc:abc": true, "did:web:example.org": false}`

#### NSID patterns

`signals` and `collections` support an optional `.*` suffix to match an entire namespace:

- `app.bsky.feed.post`: exact match only
- `app.bsky.feed.*`: matches any collection under `app.bsky.feed`

### ingestion control

<small>[<- back to toc](#table-of-contents)</small>

- `GET /ingestion`: get the current ingestion status.
  - returns `{ "crawler": bool, "firehose": bool, "backfill": bool }`.
- `PATCH /ingestion`: enable or disable ingestion components at runtime without
  restarting.
  - body: `{ "crawler"?: bool, "firehose"?: bool, "backfill"?: bool }`. only provided fields are updated.
  - when disabled, each component finishes its current task before pausing (e.g.
    the backfill worker completes any in-flight repo syncs, the firehose
    finishes processing the current message). they resume immediately when
    re-enabled.
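
a quick sketch of pausing and resuming backfill with this endpoint (assumes the API on `localhost:3000`):

```bash
# pause backfill only; crawler and firehose keep running.
# in-flight repo syncs finish before the worker actually pauses.
curl -X PATCH localhost:3000/ingestion \
  -H 'Content-Type: application/json' \
  -d '{"backfill": false}'

# verify the resulting { "crawler", "firehose", "backfill" } status
curl localhost:3000/ingestion

# resume
curl -X PATCH localhost:3000/ingestion \
  -H 'Content-Type: application/json' \
  -d '{"backfill": true}'
```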
### crawler management

<small>[<- back to toc](#table-of-contents)</small>

- `GET /crawler/sources`: list all currently active crawler sources.
  - returns a JSON array of `{ "url": string, "mode": "relay" | "by_collection", "persisted": bool }`.
  - `persisted: true` means the source was added via the API and is stored in the
    database; it will survive a restart. `persisted: false` means the source
    came from `CRAWLER_URLS` and is not written to the database.
- `POST /crawler/sources`: add a crawler source at runtime.
  - body: `{ "url": string, "mode": "relay" | "by_collection" }`.
  - the source is written to the database before the producer task is started, so
    it is safe to add sources and then immediately restart without losing them.
  - if a source with the same URL already exists (whether from `CRAWLER_URLS` or
    a previous `POST`), it is replaced: the running task is stopped and a new one
    is started with the new mode. any cursor state for that URL is preserved.
  - returns `201 Created` on success.
- `DELETE /crawler/sources`: remove a crawler source at runtime.
  - body: `{ "url": string }`.
  - the producer task is stopped immediately.
  - if the source was added via the API (`persisted: true`), it is removed from
    the database and will not reappear on restart. if it came from `CRAWLER_URLS`
    (`persisted: false`), only the running task is stopped; the source will
    reappear on the next restart since `CRAWLER_URLS` is re-applied at startup
    (unless you remove it from your configuration, of course).
  - cursor state is not cleared. use `DELETE /crawler/cursors` separately if you want
    the source to restart from the beginning when re-added.
  - returns `200 OK` if the source was found and removed, `404 Not Found` otherwise.
- `DELETE /crawler/cursors`: reset stored cursors for a given crawler URL. body: `{ "key": "..." }`,
  where the key is a URL. clears the list-repos crawler cursor as well as any by-collection
  cursors associated with that URL. causes the next crawler pass to restart from the beginning.

### firehose management

<small>[<- back to toc](#table-of-contents)</small>

- `GET /firehose/sources`: list all currently active firehose sources.
  - returns a JSON array of `{ "url": string, "persisted": bool, "is_pds": bool }`.
  - `persisted: true` means the source was added via the API and is stored in the
    database; it will survive a restart. `persisted: false` means the source
    came from `RELAY_HOSTS` and is not written to the database.
  - `is_pds: true` means the source is a direct PDS connection with host authority enforcement enabled.
- `POST /firehose/sources`: add a firehose source at runtime.
  - body: `{ "url": string, "is_pds": bool }`. `is_pds` defaults to `false`.
  - the source is persisted to the database before the ingestor task is started.
  - if a source with the same URL already exists, it is replaced: the running
    task is stopped and a new one is started. any existing cursor state for that
    URL is preserved.
  - returns `201 Created` on success.
- `DELETE /firehose/sources`: remove a firehose relay at runtime.
  - body: `{ "url": string }`.
  - the ingestor task is stopped immediately.
  - if the source was added via the API (`persisted: true`), it is removed from
    the database and will not reappear on restart. if it came from `RELAY_HOSTS`
    (`persisted: false`), only the running task is stopped; the source reappears
    on the next restart.
  - cursor state is not cleared. use `DELETE /firehose/cursors` separately if you want
    the relay to restart from the beginning when re-added.
  - returns `200 OK` if the relay was found and removed, `404 Not Found` otherwise.
- `DELETE /firehose/cursors`: reset the stored cursor for a given firehose relay URL. body: `{ "key": "..." }`,
  where the key is a URL. causes the next firehose connection to restart from the beginning.

### PDS management

<small>[<- back to toc](#table-of-contents)</small>

hydrant rate-limits firehose events per PDS. each PDS is assigned to a named
rate tier that controls how aggressively hydrant limits events from it. two
built-in tiers are always present: `default` (conservative limits for unknown
operators) and `trusted` (higher limits for well-behaved operators). additional
tiers can be defined via `RATE_TIERS`.

the per-second limit scales with the number of active accounts on the PDS:
`max(per_second_base, accounts × per_second_account_mul)`.

you can also define an optional `account_limit` for a rate tier. if a PDS
exceeds this number of active accounts, hydrant will reject any new account
creation events from it.

the built-in tiers are defined as follows:

- `default`: `50` per sec (floor), `+0.5` per account. max `3_600_000`/hr, `86_400_000`/day. `100` account limit.
- `trusted`: `5000` per sec (floor), `+10.0` per account. max `18_000_000`/hr, `432_000_000`/day. `10_000_000` account limit.

- `GET /pds/tiers`: list all current tier assignments alongside the available
  tier definitions.
  - returns `{ "assignments": [{ "host": string, "tier": string }], "rate_tiers": { <name>: { "per_second_base": int, "per_second_account_mul": float, "per_hour": int, "per_day": int } } }`.
  - `assignments` only contains PDSes with an explicit API assignment. hosts without one resolve via glob rules or fall back to `default`.
- `PUT /pds/tiers`: assign a PDS to a named rate tier.
  - body: `{ "host": string, "tier": string }`.
  - `host` is the PDS hostname (e.g. `pds.example.com`).
  - `tier` must be one of the configured tier names. returns `400` if unknown.
  - assignments are persisted to the database and survive restarts.
  - re-assigning the same host updates the tier in place without creating a duplicate.
- `DELETE /pds/tiers`: remove an explicit tier assignment for a PDS.
  - query parameter: `?host=<hostname>` (e.g. `?host=pds.example.com`).
  - reverts the host to glob-rule resolution (not necessarily `default`; a matching `TIER_RULES` pattern still applies).
  - returns `200` even if no assignment existed.
- `GET /pds/rate-tiers`: list the available rate tier definitions.
  - returns a map of tier name to `{ "per_second_base", "per_second_account_mul", "per_hour", "per_day", "account_limit" }`.

tiers are resolved in this order:

1. **explicit API assignment**: set via `PUT /pds/tiers`, stored in the database, survives restarts.
2. **glob rules**: from `TIER_RULES`, evaluated in order; first match wins.
3. **`default` tier**: applied if no rule or explicit assignment matches.

deleting an API assignment reverts the host to glob-rule resolution, not necessarily back to `default`. if a rule like `*.bsky.network:trusted` matches the host, it will become trusted again without any further action.

### repository management

<small>[<- back to toc](#table-of-contents)</small>

all `/repos` endpoints that return lists respond with NDJSON by default. send `Accept: application/json` or `Content-Type: application/json` to get a JSON array instead.

- `GET /repos`: get a list of repositories and their sync status. supports pagination and filtering:
  - `limit`: max results (default 100, max 1000)
  - `cursor`: did key for paginating.
- `GET /repos/{did}`: get the sync status and metadata of a specific repository.
  also returns the handle, PDS URL, and the atproto signing key (these are not
  available until the repo has been backfilled at least once).
- `PUT /repos`: explicitly track repositories. accepts an NDJSON body of `{"did": "..."}` (or a JSON array of the same).
  only affects repositories that are not known or are untracked.
  returns a list of the DIDs that were queued for backfill.
- `DELETE /repos`: untrack repositories.
  accepts an NDJSON body of `{"did": "..."}` (or a JSON array of the same).
  only affects repositories that are currently tracked.
  returns a list of the DIDs that were untracked.
- `POST /repos/resync`: force a new backfill for one or more repositories.
  accepts an NDJSON body of `{"did": "..."}` (or a JSON array of the same).
  only affects repositories hydrant already knows about.
  returns a list of the DIDs that were queued.

### database operations

- `POST /db/train`: train zstd compression dictionaries for the `repos`,
  `blocks`, and `events` keyspaces. dictionaries are written to disk; a restart
  is required to apply them. the crawler, firehose, and backfill worker are
  paused for the duration and restored on completion.
- `POST /db/compact`: trigger a full major compaction of all database keyspaces
  in parallel. the crawler, firehose, and backfill worker are paused for the
  duration and restored on completion.

## data access (xrpc)

<small>[<- back to toc](#table-of-contents)</small>

`hydrant` implements the following XRPC endpoints under `/xrpc/`:

### com.atproto.*

<small>[<- back to toc](#table-of-contents)</small>

these are standard atproto endpoints. see [the atproto api reference](https://docs.bsky.app/docs/category/http-reference) for more info.
the following are currently implemented:

- `com.atproto.repo.getRecord`
- `com.atproto.repo.listRecords`
- `com.atproto.repo.describeRepo` (also see `systems.gaze.hydrant.describeRepo`)
- `com.atproto.sync.getRepo` (`since` parameter not implemented!)
- `com.atproto.sync.getHostStatus`
- `com.atproto.sync.listHosts`
- `com.atproto.sync.getRepoStatus`
- `com.atproto.sync.listRepos`
- `com.atproto.sync.getLatestCommit`
- `com.atproto.sync.requestCrawl` (adds the host to firehose sources in relay mode)
- `com.atproto.sync.subscribeRepos` (WebSocket firehose stream, requires the `relay` feature)

### systems.gaze.hydrant.*

<small>[<- back to toc](#table-of-contents)</small>

these are some non-standard XRPCs that might be useful.

#### systems.gaze.hydrant.countRecords

returns the total number of stored records in a collection.

| param | required | description |
| :--- | :--- | :--- |
| `identifier` | yes | DID or handle of the repository. |
| `collection` | yes | NSID of the collection. |

returns `{ count }`.

#### systems.gaze.hydrant.describeRepo

returns account and identity information about this repo.
this is equivalent to `com.atproto.repo.describeRepo`, except we don't return the full DID document.
the handle is bi-directionally verified; if it's invalid or the handle does not exist, we return
`handle.invalid`.

| param | required | description |
| :--- | :--- | :--- |
| `identifier` | yes | DID or handle of the repository. |

returns `{ did, handle, pds, collections }`.
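
for example, querying these custom endpoints with curl (the instance hostname and DID are placeholders):

```bash
# count stored posts in one repo's app.bsky.feed.post collection
curl 'https://hydrant.example.com/xrpc/systems.gaze.hydrant.countRecords?identifier=did:plc:abc&collection=app.bsky.feed.post'

# account/identity info for the same repo: { did, handle, pds, collections }
curl 'https://hydrant.example.com/xrpc/systems.gaze.hydrant.describeRepo?identifier=did:plc:abc'
```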
### blue.microcosm.links.*

<small>[<- back to toc](#table-of-contents)</small>

hydrant implements a subset of [microcosm constellation](https://constellation.microcosm.blue/)
when it's built with the `backlinks` cargo feature (`cargo build --features backlinks`).

when enabled, hydrant indexes all AT URI and DID references found inside stored records into a
reverse index. this lets you efficiently answer "what records link to this subject?".

#### blue.microcosm.links.getBacklinks

returns records that link to a given subject.

| param | required | description |
| :--- | :--- | :--- |
| `subject` | yes | AT URI or DID to look up backlinks for. |
| `source` | no | filter by source collection, e.g. `app.bsky.feed.like`. also accepts the `collection:path` form to further filter by field path, e.g. `app.bsky.feed.like:subject.uri`. the path is matched against the dotted field path within the record (`.` is prepended automatically). |
| `limit` | no | max results to return (default 50, max 100). |
| `cursor` | no | opaque pagination cursor from a previous response. |
| `reverse` | no | if `true`, return results in reverse order (default `false`). |

returns `{ backlinks: [{ uri, cid }], cursor? }`.

results are ordered by source record rkey (ascending by default, descending when `reverse=true`).
the cursor is stable across new insertions for TID rkey records.

#### blue.microcosm.links.getBacklinksCount

returns the number of records that link to a given subject.

| param | required | description |
| :--- | :--- | :--- |
| `subject` | yes | AT URI or DID to count backlinks for. |
| `source` | no | filter by source collection (same format as `getBacklinks`). |

returns `{ count }`.

for more information, look at the [documentation](https://hydrant.klbr.net).
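
as a sketch, looking up the likes of a post through these endpoints (the hostname, DID, and rkey are placeholders):

```bash
# records in app.bsky.feed.like whose subject.uri field points at the post
curl 'https://hydrant.example.com/xrpc/blue.microcosm.links.getBacklinks?subject=at://did:plc:abc/app.bsky.feed.post/3k2a&source=app.bsky.feed.like:subject.uri'

# just the like count
curl 'https://hydrant.example.com/xrpc/blue.microcosm.links.getBacklinksCount?subject=at://did:plc:abc/app.bsky.feed.post/3k2a&source=app.bsky.feed.like:subject.uri'
```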
+1
docs/README.md
## what's here

- [vs tap](concepts/vs-tap.md): comparison against tap, the go sync utility
- [getting started](getting-started.md): building, running, reverse proxying
- [configuration](configuration.md): all environment variables
- [build features](build-features.md): optional cargo features (`relay`, `backlinks`, etc.)