very fast AT Protocol indexer with flexible filtering, xrpc queries, cursor-backed event stream, and more, built on fjall
rust fjall at-protocol atproto indexer


[docs] add table of contents to readme

dawn aec2e459 8e837802

+61 -27
README.md
···
+ #### table-of-contents
+
+ -> [hydrant](#hydrant)</br>
+ -> [vs tap](#vs-tap)</br>
+ -> [configuration](#configuration)</br>
+ -> [rest api](#rest-api) | [filter](#filter-management) | [ingestion](#ingestion-control) | [crawler](#crawler-management) | [firehose](#firehose-management) | [repos](#repository-management)</br>
+ -> [xrpc api](#data-access-xrpc) | [backlinks](#bluemicrocosmlinks) | [atproto](#comatproto) | [custom](#systemsgazehydrant)
+
  # hydrant

  `hydrant` is an AT Protocol indexer built on the `fjall` database. it's built to
···
  you dont mind losing your existing backfilled data in hydrant if you already
  processed them.).

- ## vs `tap`
+ ## vs tap
+
+ <small>[<- back to toc](#table-of-contents)</small>

  while [`tap`](https://github.com/bluesky-social/indigo/tree/main/cmd/tap) is
  designed as a firehose consumer and simply just propagates events while handling
···

  ## configuration

+ <small>[<- back to toc](#table-of-contents)</small>
+
  `hydrant` is configured via environment variables. all variables are prefixed
  with `HYDRANT_` (except `RUST_LOG`). if a `.env` file exists in the working
  directory, it will also be loaded automatically.
···

  ## REST api

+ <small>[<- back to toc](#table-of-contents)</small>
+
+ ### event stream
+
+ - `GET /stream`: subscribe to the event stream.
+ - query parameters:
+ - `cursor` (optional): start streaming from a specific event ID.
+
+ ### stats
+
+ - `GET /stats`: get stats about the database:
+ - `counts`: counts of repos, records, events, and errors, etc.
+ - `sizes`: sizes of the database keyspaces on disk, in bytes.
+
  ### filter management
+
+ <small>[<- back to toc](#table-of-contents)</small>

  - `GET /filter`: get the current filter configuration.
  - `PATCH /filter`: update the filter configuration.
···

  ### ingestion control

+ <small>[<- back to toc](#table-of-contents)</small>
+
  - `GET /ingestion`: get the current ingestion status.
  - returns `{ "crawler": bool, "firehose": bool, "backfill": bool }`.
  - `PATCH /ingestion`: enable or disable ingestion components at runtime without
···
  finishes processing the current message). they resume immediately when
  re-enabled.

- ### crawler source management
+ ### crawler management
+
+ <small>[<- back to toc](#table-of-contents)</small>

  - `GET /crawler/sources`: list all currently active crawler sources.
  - returns a JSON array of `{ "url": string, "mode": "relay" | "by_collection", "persisted": bool }`.
···
  the source to restart from the beginning when re-added.
  - returns `200 OK` if the source was found and removed, `404 Not Found` otherwise.

- ### firehose source management
+ ### firehose management
+
+ <small>[<- back to toc](#table-of-contents)</small>

  - `GET /firehose/sources`: list all currently active firehose relay sources.
  - returns a JSON array of `{ "url": string, "persisted": bool }`.
···
  the relay to restart from the beginning when re-added.
  - returns `200 OK` if the relay was found and removed, `404 Not Found` otherwise.

- ### database operations
-
- - `POST /db/train`: train zstd compression dictionaries for the `repos`,
- `blocks`, and `events` keyspaces. dictionaries are written to disk; a restart
- is required to apply them. the crawler, firehose, and backfill worker are
- paused for the duration and restored on completion.
- - `POST /db/compact`: trigger a full major compaction of all database keyspaces
- in parallel. the crawler, firehose, and backfill worker are paused for the
- duration and restored on completion.
- - `DELETE /cursors`: reset all stored cursors for a given URL. body: `{ "key": "..." }`
- where key is a URL. clears both the firehose cursor and the relay crawler cursor,
- as well as any by-collection cursors associated with that URL. causes the next
- firehose connection and crawler pass to restart from the beginning.
-
  ### repository management

+ <small>[<- back to toc](#table-of-contents)</small>
+
  - `GET /repos`: get an NDJSON stream of repositories and their sync status. supports pagination and filtering:
  - `limit`: max results (default 100, max 1000)
  - `cursor`: opaque key for paginating.
···
  - `PUT /repos`: explicitly track repositories. accepts an NDJSON body of `{"did": "..."}` (or JSON array of the same).
  - `DELETE /repos`: untrack repositories. accepts an NDJSON body of `{"did": "..."}` (or JSON array of the same).

- ### event stream
-
- - `GET /stream`: subscribe to the event stream.
- - query parameters:
- - `cursor` (optional): start streaming from a specific event ID.
-
- ### stats
+ ### database operations

- - `GET /stats`: get stats about the database:
- - `counts`: counts of repos, records, events, and errors, etc.
- - `sizes`: sizes of the database keyspaces on disk, in bytes.
+ - `POST /db/train`: train zstd compression dictionaries for the `repos`,
+ `blocks`, and `events` keyspaces. dictionaries are written to disk; a restart
+ is required to apply them. the crawler, firehose, and backfill worker are
+ paused for the duration and restored on completion.
+ - `POST /db/compact`: trigger a full major compaction of all database keyspaces
+ in parallel. the crawler, firehose, and backfill worker are paused for the
+ duration and restored on completion.
+ - `DELETE /cursors`: reset all stored cursors for a given URL. body: `{ "key": "..." }`
+ where key is a URL. clears both the firehose cursor and the relay crawler cursor,
+ as well as any by-collection cursors associated with that URL. causes the next
+ firehose connection and crawler pass to restart from the beginning.

  ## data access (xrpc)
+
+ <small>[<- back to toc](#table-of-contents)</small>

  `hydrant` implements the following XRPC endpoints under `/xrpc/`:

  ### com.atproto.*

+ <small>[<- back to toc](#table-of-contents)</small>
+
+ these are standard atproto endpoints. you can look at [the atproto api reference](https://docs.bsky.app/docs/category/http-reference) for more info.
+
  the following are implemented currently:
  - `com.atproto.repo.getRecord`
  - `com.atproto.repo.listRecords`

  ### systems.gaze.hydrant.*
+
+ <small>[<- back to toc](#table-of-contents)</small>

  these are some non-standard XRPCs that might be useful.

···
  returns `{ count }`.

  ### blue.microcosm.links.*
+
+ <small>[<- back to toc](#table-of-contents)</small>

  hydrant implements a subset of [microcosm constellation](https://constellation.microcosm.blue/)
  when it's built with the `backlinks` cargo feature (`cargo build --features backlinks`).
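
a minimal sketch of how a client might prepare requests for two of the endpoints described in the README text above: the NDJSON body accepted by `PUT /repos` and `DELETE /repos`, and the `GET /stream` URL with its optional `cursor`. the helper names `repos_ndjson_body` and `stream_url` are hypothetical and not part of hydrant, the event ID is assumed here to be an unsigned integer, and the base URL is whatever address your instance listens on. only the standard library is used; actually sending the request is left to the HTTP client of your choice.

```rust
/// Build an NDJSON body for `PUT /repos` or `DELETE /repos`:
/// one `{"did": "..."}` object per line.
/// (hypothetical helper, not part of hydrant)
fn repos_ndjson_body(dids: &[&str]) -> String {
    let mut body = String::new();
    for did in dids {
        body.push_str("{\"did\": \"");
        // Escape the characters that JSON strings require to be escaped.
        for ch in did.chars() {
            match ch {
                '"' => body.push_str("\\\""),
                '\\' => body.push_str("\\\\"),
                c if (c as u32) < 0x20 => {
                    body.push_str(&format!("\\u{:04x}", c as u32));
                }
                c => body.push(c),
            }
        }
        body.push_str("\"}\n");
    }
    body
}

/// Build the URL for `GET /stream`, optionally resuming from an event ID.
/// Assumption: event IDs fit in a `u64`; the source only says "event ID".
/// (hypothetical helper, not part of hydrant)
fn stream_url(base: &str, cursor: Option<u64>) -> String {
    // Avoid a double slash if the base URL already ends with one.
    let base = base.trim_end_matches('/');
    match cursor {
        Some(id) => format!("{}/stream?cursor={}", base, id),
        None => format!("{}/stream", base),
    }
}
```

for example, `stream_url("http://localhost:3000", Some(42))` yields `http://localhost:3000/stream?cursor=42`; the host and port are placeholders, not hydrant defaults.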