very fast AT Protocol indexer with flexible filtering, xrpc queries, a cursor-backed event stream, and more, built on fjall
rust fjall at-protocol atproto indexer


[docs] convert nested lists to headings in api docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

dawn af546208 612f69e4

+126 -65
+31 -15
docs/api/crawler.md
```diff
  # crawler management

- - `GET /crawler/sources`: list all currently active crawler sources.
-   - returns a JSON array of `{ "url": string, "mode": "relay" | "by_collection", "persisted": bool }`.
-   - `persisted: true` means the source was added via the API and is stored in the database, it will survive a restart. `persisted: false` means the source came from `CRAWLER_URLS` and is not written to the database.
- - `POST /crawler/sources`: add a crawler source at runtime.
-   - body: `{ "url": string, "mode": "relay" | "by_collection" }`.
-   - the source is written to the database before the producer task is started, so it is safe to add sources and then immediately restart without losing them.
-   - if a source with the same URL already exists (whether from `CRAWLER_URLS` or a previous `POST`), it is replaced: the running task is stopped and a new one is started with the new mode. any cursor state for that URL is preserved.
-   - returns `201 Created` on success.
- - `DELETE /crawler/sources`: remove a crawler source at runtime.
-   - body: `{ "url": string }`.
-   - the producer task is stopped immediately.
-   - if the source was added via the API (`persisted: true`), it is removed from the database and will not reappear on restart. if it came from `CRAWLER_URLS` (`persisted: false`), only the running task is stopped, the source will reappear on the next restart since `CRAWLER_URLS` is re-applied at startup.
-   - cursor state is not cleared. use `DELETE /crawler/cursors` separately if you want the source to restart from the beginning when re-added.
-   - returns `200 OK` if the source was found and removed, `404 Not Found` otherwise.
- - `DELETE /crawler/cursors`: reset stored cursors for a given crawler URL. body: `{ "key": "..." }` where key is a URL. clears the list-repos crawler cursor as well as any by-collection cursors associated with that URL. causes the next crawler pass to restart from the beginning.
+ ## GET /crawler/sources
+
+ list all currently active crawler sources. returns a JSON array of `{ "url": string, "mode": "relay" | "by_collection", "persisted": bool }`.
+
+ `persisted: true` means the source was added via the API and is stored in the database; it will survive a restart. `persisted: false` means the source came from `CRAWLER_URLS` and is not written to the database.
+
+ ## POST /crawler/sources
+
+ add a crawler source at runtime. body: `{ "url": string, "mode": "relay" | "by_collection" }`.
+
+ the source is written to the database before the producer task is started, so it is safe to add sources and then immediately restart without losing them.
+
+ if a source with the same URL already exists (whether from `CRAWLER_URLS` or a previous `POST`), it is replaced: the running task is stopped and a new one is started with the new mode. any cursor state for that URL is preserved.
+
+ returns `201 Created` on success.
+
+ ## DELETE /crawler/sources
+
+ remove a crawler source at runtime. body: `{ "url": string }`.
+
+ the producer task is stopped immediately.
+
+ if the source was added via the API (`persisted: true`), it is removed from the database and will not reappear on restart. if it came from `CRAWLER_URLS` (`persisted: false`), only the running task is stopped; the source will reappear on the next restart since `CRAWLER_URLS` is re-applied at startup.
+
+ cursor state is not cleared. use `DELETE /crawler/cursors` separately if you want the source to restart from the beginning when re-added.
+
+ returns `200 OK` if the source was found and removed, `404 Not Found` otherwise.
+
+ ## DELETE /crawler/cursors
+
+ reset stored cursors for a given crawler URL. body: `{ "key": "..." }` where key is a URL. clears the list-repos crawler cursor as well as any by-collection cursors associated with that URL. causes the next crawler pass to restart from the beginning.
```
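The `POST /crawler/sources` call documented above can be sketched with Python's standard library. The base URL is an assumption (hydrant's listen address depends on your configuration); the request is built but not sent.

```python
import json
from urllib.request import Request

# hypothetical base URL -- point this at your own hydrant instance
BASE = "http://localhost:8080"

def add_crawler_source(url: str, mode: str) -> Request:
    """Build the POST /crawler/sources request.

    body shape from the docs: { "url": string, "mode": "relay" | "by_collection" }
    """
    # mirror the API's accepted modes client-side
    if mode not in ("relay", "by_collection"):
        raise ValueError(f"unknown mode: {mode!r}")
    return Request(
        f"{BASE}/crawler/sources",
        data=json.dumps({"url": url, "mode": mode}).encode(),
        method="POST",
        headers={"Content-Type": "application/json"},
    )

req = add_crawler_source("https://example-relay.invalid", "by_collection")
```

Per the docs, sending this for an already-known URL replaces the running task with the new mode while preserving cursor state, so it doubles as a "change mode" operation.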
+31 -16
docs/api/firehose.md
```diff
  # firehose management

- - `GET /firehose/sources`: list all currently active firehose sources.
-   - returns a JSON array of `{ "url": string, "persisted": bool, "is_pds": bool }`.
-   - `persisted: true` means the source was added via the API and is stored in the database, it will survive a restart. `persisted: false` means the source came from `RELAY_HOSTS` and is not written to the database.
-   - `is_pds: true` means the source is a direct PDS connection with host authority enforcement enabled.
- - `POST /firehose/sources`: add a firehose source at runtime.
-   - body: `{ "url": string, "is_pds": bool }`. `is_pds` defaults to `false`.
-   - the source is persisted to the database before the ingestor task is started.
-   - if a source with the same URL already exists, it is replaced: the running task is stopped and a new one is started. any existing cursor state for that URL is preserved.
-   - returns `201 Created` on success.
- - `DELETE /firehose/sources`: remove a firehose relay at runtime.
-   - body: `{ "url": string }`.
-   - the ingestor task is stopped immediately.
-   - if the source was added via the API (`persisted: true`), it is removed from the database and will not reappear on restart. if it came from `RELAY_HOSTS` (`persisted: false`), only the running task is stopped; the source reappears on the next restart.
-   - cursor state is not cleared. use `DELETE /firehose/cursors` separately if you want the relay to restart from the beginning when re-added.
-   - returns `200 OK` if the relay was found and removed, `404 Not Found` otherwise.
- - `DELETE /firehose/cursors`: reset the stored cursor for a given firehose relay URL. body: `{ "key": "..." }` where key is a URL. causes the next firehose connection to restart from the beginning.
+ ## GET /firehose/sources
+
+ list all currently active firehose sources. returns a JSON array of `{ "url": string, "persisted": bool, "is_pds": bool }`.
+
+ `persisted: true` means the source was added via the API and is stored in the database; it will survive a restart. `persisted: false` means the source came from `RELAY_HOSTS` and is not written to the database. `is_pds: true` means the source is a direct PDS connection with host authority enforcement enabled.
+
+ ## POST /firehose/sources
+
+ add a firehose source at runtime. body: `{ "url": string, "is_pds": bool }`. `is_pds` defaults to `false`.
+
+ the source is persisted to the database before the ingestor task is started.
+
+ if a source with the same URL already exists, it is replaced: the running task is stopped and a new one is started. any existing cursor state for that URL is preserved.
+
+ returns `201 Created` on success.
+
+ ## DELETE /firehose/sources
+
+ remove a firehose relay at runtime. body: `{ "url": string }`.
+
+ the ingestor task is stopped immediately.
+
+ if the source was added via the API (`persisted: true`), it is removed from the database and will not reappear on restart. if it came from `RELAY_HOSTS` (`persisted: false`), only the running task is stopped; the source reappears on the next restart.
+
+ cursor state is not cleared. use `DELETE /firehose/cursors` separately if you want the relay to restart from the beginning when re-added.
+
+ returns `200 OK` if the relay was found and removed, `404 Not Found` otherwise.
+
+ ## DELETE /firehose/cursors
+
+ reset the stored cursor for a given firehose relay URL. body: `{ "key": "..." }` where key is a URL. causes the next firehose connection to restart from the beginning.
```
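Because `DELETE /firehose/sources` deliberately leaves cursor state behind, fully forgetting a relay takes two calls, as the docs note. A sketch, again with an assumed base URL and the requests built but not sent:

```python
import json
from urllib.request import Request

# hypothetical base URL -- hydrant's listen address is configuration-dependent
BASE = "http://localhost:8080"

def _delete(path: str, body: dict) -> Request:
    return Request(
        f"{BASE}{path}",
        data=json.dumps(body).encode(),
        method="DELETE",
        headers={"Content-Type": "application/json"},
    )

def remove_relay_and_reset(url: str) -> list:
    # step 1: stop the ingestor task (cursor state survives this)
    # step 2: clear the stored cursor so a future re-add starts from scratch
    return [
        _delete("/firehose/sources", {"url": url}),
        _delete("/firehose/cursors", {"key": url}),
    ]

reqs = remove_relay_and_reset("wss://relay.example.invalid")
```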
+9 -5
docs/api/ingestion.md
```diff
  # ingestion control

- - `GET /ingestion`: get the current ingestion status.
-   - returns `{ "crawler": bool, "firehose": bool, "backfill": bool }`.
- - `PATCH /ingestion`: enable or disable ingestion components at runtime without restarting.
-   - body: `{ "crawler"?: bool, "firehose"?: bool, "backfill"?: bool }`. only provided fields are updated.
-   - when disabled, each component finishes its current task before pausing (e.g. the backfill worker completes any in-flight repo syncs, the firehose finishes processing the current message). they resume immediately when re-enabled.
+ ## GET /ingestion
+
+ get the current ingestion status. returns `{ "crawler": bool, "firehose": bool, "backfill": bool }`.
+
+ ## PATCH /ingestion
+
+ enable or disable ingestion components at runtime without restarting. body: `{ "crawler"?: bool, "firehose"?: bool, "backfill"?: bool }`. only provided fields are updated.
+
+ when disabled, each component finishes its current task before pausing (e.g. the backfill worker completes any in-flight repo syncs, the firehose finishes processing the current message). they resume immediately when re-enabled.
```
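The "only provided fields are updated" rule for `PATCH /ingestion` amounts to overlaying the body on the current status. A minimal model of that merge semantics (the rejection of unknown fields is an assumption about server-side validation):

```python
def apply_patch(status: dict, patch: dict) -> dict:
    """PATCH /ingestion semantics: fields present in the body change,
    everything else keeps its current value."""
    allowed = {"crawler", "firehose", "backfill"}
    unknown = set(patch) - allowed
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return {**status, **patch}

state = {"crawler": True, "firehose": True, "backfill": True}
# pause only backfill; crawler and firehose are untouched
state = apply_patch(state, {"backfill": False})
```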
+31 -22
docs/api/pds.md
```diff
  you can also define an optional `account_limit` for a rate tier. if a PDS exceeds this number of active accounts, hydrant will reject any new account creation events from it.

- the built-in tiers are defined as follows:
- - `default`: `50` per sec (floor), `+0.5` per account. max `3_600_000`/hr, `86_400_000`/day. `100` account limit.
- - `trusted`: `5000` per sec (floor), `+10.0` per account. max `18_000_000`/hr, `432_000_000`/day. `10_000_000` account limit.
+ the built-in tiers are:

- tiers are resolved in this order:
+ | tier | per_second_base | per_second_account_mul | per_hour | per_day | account_limit |
+ | :--- | :--- | :--- | :--- | :--- | :--- |
+ | `default` | 50 | +0.5 | 3,600,000 | 86,400,000 | 100 |
+ | `trusted` | 5000 | +10.0 | 18,000,000 | 432,000,000 | 10,000,000 |

- 1. **explicit API assignment**, set via `PUT /pds/tiers`, stored in the database, survives restarts.
- 2. **glob rules**, from `TIER_RULES`, evaluated in order; first match wins.
- 3. **`default` tier**, applied if no rule or explicit assignment matches.
+ tiers are resolved in this order: explicit API assignment (set via `PUT /pds/tiers`, stored in the database, survives restarts), then glob rules (from `TIER_RULES`, evaluated in order; first match wins), then the `default` tier (applied if nothing else matches).

  deleting an API assignment reverts the host to glob-rule resolution, not necessarily back to `default`. if a rule like `*.bsky.network:trusted` matches the host, it will become trusted again without any further action.

- - `GET /pds/tiers`: list all current tier assignments alongside the available tier definitions.
-   - returns `{ "assignments": [{ "host": string, "tier": string }], "rate_tiers": { <name>: { "per_second_base": int, "per_second_account_mul": float, "per_hour": int, "per_day": int } } }`.
-   - `assignments` only contains PDSes with an explicit API assignment. hosts without one resolve via glob rules or fall back to `default`.
- - `PUT /pds/tiers`: assign a PDS to a named rate tier.
-   - body: `{ "host": string, "tier": string }`.
-   - `host` is the PDS hostname (e.g. `pds.example.com`).
-   - `tier` must be one of the configured tier names. returns `400` if unknown.
-   - assignments are persisted to the database and survive restarts.
-   - re-assigning the same host updates the tier in place without creating a duplicate.
- - `DELETE /pds/tiers`: remove an explicit tier assignment for a PDS.
-   - query parameter: `?host=<hostname>` (e.g. `?host=pds.example.com`).
-   - reverts the host to glob-rule resolution (not necessarily `default`, a matching `TIER_RULES` pattern still applies).
-   - returns `200` even if no assignment existed.
- - `GET /pds/rate-tiers`: list the available rate tier definitions.
-   - returns a map of tier name to `{ "per_second_base", "per_second_account_mul", "per_hour", "per_day", "account_limit" }`.
+ ## GET /pds/tiers
+
+ list all current tier assignments alongside the available tier definitions. returns `{ "assignments": [{ "host": string, "tier": string }], "rate_tiers": { <name>: { "per_second_base": int, "per_second_account_mul": float, "per_hour": int, "per_day": int } } }`.
+
+ `assignments` only contains PDSes with an explicit API assignment. hosts without one resolve via glob rules or fall back to `default`.
+
+ ## PUT /pds/tiers
+
+ assign a PDS to a named rate tier. body: `{ "host": string, "tier": string }`.
+
+ `host` is the PDS hostname (e.g. `pds.example.com`). `tier` must be one of the configured tier names; returns `400` if unknown.
+
+ assignments are persisted to the database and survive restarts. re-assigning the same host updates the tier in place without creating a duplicate.
+
+ ## DELETE /pds/tiers
+
+ remove an explicit tier assignment for a PDS. query parameter: `?host=<hostname>` (e.g. `?host=pds.example.com`).
+
+ reverts the host to glob-rule resolution (not necessarily `default`; a matching `TIER_RULES` pattern still applies).
+
+ returns `200` even if no assignment existed.
+
+ ## GET /pds/rate-tiers
+
+ list the available rate tier definitions. returns a map of tier name to `{ "per_second_base", "per_second_account_mul", "per_hour", "per_day", "account_limit" }`.
```
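The tier table and three-step resolution order can be modelled directly. A sketch, assuming `TIER_RULES` patterns use shell-style globs (the exact matching semantics are an assumption, as is the arithmetic combining the base floor with the per-account multiplier):

```python
from fnmatch import fnmatch

# built-in tier table, as documented
RATE_TIERS = {
    "default": {"per_second_base": 50, "per_second_account_mul": 0.5,
                "per_hour": 3_600_000, "per_day": 86_400_000, "account_limit": 100},
    "trusted": {"per_second_base": 5000, "per_second_account_mul": 10.0,
                "per_hour": 18_000_000, "per_day": 432_000_000, "account_limit": 10_000_000},
}

def resolve_tier(host: str, assignments: dict, rules: list) -> str:
    """explicit assignment first, then the first matching glob rule,
    then `default`."""
    if host in assignments:
        return assignments[host]
    for pattern, tier in rules:
        if fnmatch(host, pattern):
            return tier
    return "default"

def per_second_allowance(tier: str, active_accounts: int) -> float:
    # base floor plus a per-account bonus; the hourly/daily caps still apply
    t = RATE_TIERS[tier]
    return t["per_second_base"] + t["per_second_account_mul"] * active_accounts
```

The "deleting an assignment is not the same as reverting to `default`" behaviour falls out of this order: drop the host from `assignments` and a matching rule like `("*.bsky.network", "trusted")` takes over again.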
+24 -7
docs/api/repos.md
```diff
  all `/repos` endpoints that return lists respond with NDJSON by default. send `Accept: application/json` or `Content-Type: application/json` to get a JSON array instead.

- - `GET /repos`: get a list of repositories and their sync status. supports pagination and filtering:
-   - `limit`: max results (default 100, max 1000)
-   - `cursor`: did key for paginating.
- - `GET /repos/{did}`: get the sync status and metadata of a specific repository. also returns the handle, PDS URL and the atproto signing key (these won't be available before the repo has been backfilled once at least).
- - `PUT /repos`: explicitly track repositories. accepts an NDJSON body of `{"did": "..."}` (or JSON array of the same). only affects repositories that are not known or are untracked. returns a list of the DIDs that were queued for backfill.
- - `DELETE /repos`: untrack repositories. accepts an NDJSON body of `{"did": "..."}` (or JSON array of the same). only affects repositories that are currently tracked. returns a list of the DIDs that were untracked.
- - `POST /repos/resync`: force a new backfill for one or more repositories. accepts an NDJSON body of `{"did": "..."}` (or JSON array of the same). only affects repositories hydrant already knows about. returns a list of the DIDs that were queued.
+ ## GET /repos
+
+ get a list of repositories and their sync status. supports pagination and filtering:
+
+ | param | description |
+ | :--- | :--- |
+ | `limit` | max results (default 100, max 1000) |
+ | `cursor` | did key for paginating |
+
+ ## GET /repos/{did}
+
+ get the sync status and metadata of a specific repository. also returns the handle, PDS URL and the atproto signing key (these won't be available before the repo has been backfilled at least once).
+
+ ## PUT /repos
+
+ explicitly track repositories. accepts an NDJSON body of `{"did": "..."}` (or JSON array of the same). only affects repositories that are not known or are untracked. returns a list of the DIDs that were queued for backfill.
+
+ ## DELETE /repos
+
+ untrack repositories. accepts an NDJSON body of `{"did": "..."}` (or JSON array of the same). only affects repositories that are currently tracked. returns a list of the DIDs that were untracked.
+
+ ## POST /repos/resync
+
+ force a new backfill for one or more repositories. accepts an NDJSON body of `{"did": "..."}` (or JSON array of the same). only affects repositories hydrant already knows about. returns a list of the DIDs that were queued.
```
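The NDJSON body shared by `PUT /repos`, `DELETE /repos` and `POST /repos/resync` is just one JSON object per line. A small helper (the DIDs shown are placeholders):

```python
import json

def ndjson_body(dids):
    """Serialize DIDs as NDJSON: one {"did": ...} object per line,
    the body shape these /repos endpoints accept."""
    return "".join(json.dumps({"did": d}) + "\n" for d in dids)

body = ndjson_body(["did:plc:aaa", "did:plc:bbb"])
```

Sending the same DIDs as a JSON array (`[{"did": "..."}, ...]`) with a `Content-Type: application/json` header works too, per the note at the top of the file.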