# crawler management
## GET /crawler/sources

list all currently active crawler sources. returns a JSON array of `{ "url": string, "mode": "relay" | "by_collection", "persisted": bool }`.

`persisted: true` means the source was added via the API and is stored in the database; it will survive a restart. `persisted: false` means the source came from `CRAWLER_URLS` and is not written to the database.

## POST /crawler/sources

add a crawler source at runtime. body: `{ "url": string, "mode": "relay" | "by_collection" }`.

the source is written to the database before the producer task is started, so it is safe to add sources and then immediately restart without losing them.

if a source with the same URL already exists (whether from `CRAWLER_URLS` or a previous `POST`), it is replaced: the running task is stopped and a new one is started with the new mode. any cursor state for that URL is preserved.

returns `201 Created` on success.
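the request body above can be sketched as a small client-side helper. `add_source_request` and `VALID_MODES` are illustrative names for this sketch, not part of hydrant; only the JSON shape comes from the docs.

```python
# illustrative helper for building the POST /crawler/sources body;
# only the {"url", "mode"} shape and the two mode values come from the docs.
VALID_MODES = {"relay", "by_collection"}

def add_source_request(url: str, mode: str) -> dict:
    """Build the JSON body for POST /crawler/sources."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return {"url": url, "mode": mode}
```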
## DELETE /crawler/sources

remove a crawler source at runtime. body: `{ "url": string }`.

the producer task is stopped immediately.

if the source was added via the API (`persisted: true`), it is removed from the database and will not reappear on restart. if it came from `CRAWLER_URLS` (`persisted: false`), only the running task is stopped; the source will reappear on the next restart since `CRAWLER_URLS` is re-applied at startup.

cursor state is not cleared. use `DELETE /crawler/cursors` separately if you want the source to restart from the beginning when re-added.

returns `200 OK` if the source was found and removed, `404 Not Found` otherwise.

## DELETE /crawler/cursors

reset stored cursors for a given crawler URL. body: `{ "key": "..." }` where key is a URL. clears the list-repos crawler cursor as well as any by-collection cursors associated with that URL. causes the next crawler pass to restart from the beginning.
docs/api/firehose.md
# firehose management
## GET /firehose/sources

list all currently active firehose sources. returns a JSON array of `{ "url": string, "persisted": bool, "is_pds": bool }`.

`persisted: true` means the source was added via the API and is stored in the database; it will survive a restart. `persisted: false` means the source came from `RELAY_HOSTS` and is not written to the database. `is_pds: true` means the source is a direct PDS connection with host authority enforcement enabled.

## POST /firehose/sources

add a firehose source at runtime. body: `{ "url": string, "is_pds": bool }`. `is_pds` defaults to `false`.

the source is persisted to the database before the ingestor task is started.

if a source with the same URL already exists, it is replaced: the running task is stopped and a new one is started. any existing cursor state for that URL is preserved.

returns `201 Created` on success.

## DELETE /firehose/sources

remove a firehose relay at runtime. body: `{ "url": string }`.

the ingestor task is stopped immediately.

if the source was added via the API (`persisted: true`), it is removed from the database and will not reappear on restart. if it came from `RELAY_HOSTS` (`persisted: false`), only the running task is stopped; the source reappears on the next restart.

cursor state is not cleared. use `DELETE /firehose/cursors` separately if you want the relay to restart from the beginning when re-added.

returns `200 OK` if the relay was found and removed, `404 Not Found` otherwise.
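the delete/restart behaviour above can be modelled as a toy state machine. the function names and data shapes below are illustrative, not hydrant's internals; only the persisted-vs-env semantics come from the docs.

```python
# toy model of the semantics above: deleting an API-added source removes it
# for good, while a RELAY_HOSTS-derived source only has its task stopped and
# is re-applied on the next restart. names and shapes are illustrative.
def delete_source(sources: dict, url: str) -> bool:
    """Stop (and, if persisted, forget) a source; False maps to 404."""
    if url not in sources:
        return False
    del sources[url]
    return True

def restart(sources: dict, env_hosts: set) -> dict:
    """On startup, RELAY_HOSTS is re-applied on top of persisted sources."""
    restored = {u: {"persisted": False} for u in env_hosts}
    restored.update({u: s for u, s in sources.items() if s["persisted"]})
    return restored
```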
## DELETE /firehose/cursors

reset the stored cursor for a given firehose relay URL. body: `{ "key": "..." }` where key is a URL. causes the next firehose connection to restart from the beginning.
docs/api/ingestion.md
# ingestion control
## GET /ingestion

get the current ingestion status. returns `{ "crawler": bool, "firehose": bool, "backfill": bool }`.

## PATCH /ingestion

enable or disable ingestion components at runtime without restarting. body: `{ "crawler"?: bool, "firehose"?: bool, "backfill"?: bool }`. only provided fields are updated.

when disabled, each component finishes its current task before pausing (e.g. the backfill worker completes any in-flight repo syncs, the firehose finishes processing the current message). they resume immediately when re-enabled.
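the partial-update semantics can be modelled as a shallow merge. `apply_patch` is a sketch under that assumption, not hydrant's implementation; only the three component names and the "only provided fields" rule come from the docs.

```python
# sketch of the PATCH semantics: only the fields present in the request
# body are updated, the rest keep their current values. illustrative only.
COMPONENTS = {"crawler", "firehose", "backfill"}

def apply_patch(status: dict, patch: dict) -> dict:
    unknown = set(patch) - COMPONENTS
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return {**status, **patch}
```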
docs/api/pds.md
you can also define an optional `account_limit` for a rate tier. if a PDS exceeds this number of active accounts, hydrant will reject any new account creation events from it.
the built-in tiers are:

| tier | per_second_base | per_second_account_mul | per_hour | per_day | account_limit |
| :--- | :--- | :--- | :--- | :--- | :--- |
| `default` | 50 | +0.5 | 3,600,000 | 86,400,000 | 100 |
| `trusted` | 5000 | +10.0 | 18,000,000 | 432,000,000 | 10,000,000 |

tiers are resolved in this order:

1. **explicit API assignment**: set via `PUT /pds/tiers`, stored in the database, survives restarts.
2. **glob rules**: from `TIER_RULES`, evaluated in order; first match wins.
3. **`default` tier**: applied if no rule or explicit assignment matches.
deleting an API assignment reverts the host to glob-rule resolution, not necessarily back to `default`. if a rule like `*.bsky.network:trusted` matches the host, it will become trusted again without any further action.
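the resolution order can be sketched with shell-style globs via `fnmatch`. the `(pattern, tier)` rule shape and the function name are assumptions for illustration; the three-step order comes from the docs.

```python
from fnmatch import fnmatch

# sketch of the tier resolution order described above; the rule shape is
# an illustrative assumption, not hydrant's internal representation.
def resolve_tier(host: str, assignments: dict, tier_rules: list) -> str:
    if host in assignments:            # 1. explicit API assignment
        return assignments[host]
    for pattern, tier in tier_rules:   # 2. glob rules, first match wins
        if fnmatch(host, pattern):
            return tier
    return "default"                   # 3. fallback
```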
## GET /pds/tiers

list all current tier assignments alongside the available tier definitions. returns `{ "assignments": [{ "host": string, "tier": string }], "rate_tiers": { <name>: { "per_second_base": int, "per_second_account_mul": float, "per_hour": int, "per_day": int } } }`.

`assignments` only contains PDSes with an explicit API assignment. hosts without one resolve via glob rules or fall back to `default`.

## PUT /pds/tiers

assign a PDS to a named rate tier. body: `{ "host": string, "tier": string }`.

`host` is the PDS hostname (e.g. `pds.example.com`). `tier` must be one of the configured tier names; returns `400` if unknown.

assignments are persisted to the database and survive restarts. re-assigning the same host updates the tier in place without creating a duplicate.

## DELETE /pds/tiers

remove an explicit tier assignment for a PDS. query parameter: `?host=<hostname>` (e.g. `?host=pds.example.com`).

reverts the host to glob-rule resolution (not necessarily `default`; a matching `TIER_RULES` pattern still applies).

returns `200` even if no assignment existed.

## GET /pds/rate-tiers

list the available rate tier definitions. returns a map of tier name to `{ "per_second_base", "per_second_account_mul", "per_hour", "per_day", "account_limit" }`.
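assuming the per-second allowance scales linearly from the floor by active accounts, as the `per_second_base`/`per_second_account_mul` field names suggest, the budget derivation would look like the sketch below. the exact enforcement logic is an assumption; only the field names and the `default` tier values come from the docs.

```python
# assumed interpretation of the tier fields: the per-second allowance is a
# floor plus a per-account increment; per_hour/per_day cap total volume.
def per_second_budget(tier: dict, active_accounts: int) -> float:
    return tier["per_second_base"] + tier["per_second_account_mul"] * active_accounts

# values taken from the built-in `default` tier table above
default_tier = {"per_second_base": 50, "per_second_account_mul": 0.5,
                "per_hour": 3_600_000, "per_day": 86_400_000}
```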
docs/api/repos.md
all `/repos` endpoints that return lists respond with NDJSON by default. send `Accept: application/json` or `Content-Type: application/json` to get a JSON array instead.
## GET /repos

get a list of repositories and their sync status. supports pagination and filtering:

| param | description |
| :--- | :--- |
| `limit` | max results (default 100, max 1000) |
| `cursor` | did key for paginating |

## GET /repos/{did}

get the sync status and metadata of a specific repository. also returns the handle, PDS URL and the atproto signing key (these won't be available before the repo has been backfilled at least once).

## PUT /repos

explicitly track repositories. accepts an NDJSON body of `{"did": "..."}` (or a JSON array of the same). only affects repositories that are not known or are untracked. returns a list of the DIDs that were queued for backfill.

## DELETE /repos

untrack repositories. accepts an NDJSON body of `{"did": "..."}` (or a JSON array of the same). only affects repositories that are currently tracked. returns a list of the DIDs that were untracked.

## POST /repos/resync

force a new backfill for one or more repositories. accepts an NDJSON body of `{"did": "..."}` (or a JSON array of the same). only affects repositories hydrant already knows about. returns a list of the DIDs that were queued.
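the NDJSON bodies accepted by `PUT /repos`, `DELETE /repos` and `POST /repos/resync` can be built with a one-line helper; `dids_to_ndjson` is an illustrative name, only the one-object-per-line shape comes from the docs.

```python
import json

# illustrative helper for the NDJSON bodies accepted by PUT /repos,
# DELETE /repos and POST /repos/resync: one {"did": ...} object per line.
def dids_to_ndjson(dids) -> str:
    return "".join(json.dumps({"did": d}) + "\n" for d in dids)
```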