···

 the crawler is configured separately from the firehose via `CRAWLER_URLS`. each
 source is a `[mode::]url` entry where the mode prefix is optional and defaults
-to `by_collection` in filter mode or `relay` in full-network mode.
+to `by_collection` in filter mode or `list_repos` in full-network mode.

-- `relay`: enumerates the network via `com.atproto.sync.listRepos`, then checks
-  each repo's collections via `describeRepo`. used for full-network discovery.
+- `list_repos`: enumerates the network via `com.atproto.sync.listRepos`, checks
+  each repo's collections via `describeRepo`.
 - `by_collection`: queries `com.atproto.sync.listReposByCollection` for each
   configured signal. more efficient for filtered indexing since it only surfaces
-  repos that have matching records. cursors are stored per collection.
+  repos that have matching records. cursors are stored per collection. note that
+  it won't crawl anything if no signals are specified.

 ```
-CRAWLER_URLS=by_collection::https://lightrail.microcosm.blue,relay::wss://bsky.network
+CRAWLER_URLS=by_collection::https://lightrail.microcosm.blue,list_repos::wss://bsky.network
 ```

 each source maintains its own cursor so restarts resume mid-pass.
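
as an illustration of the `[mode::]url` syntax after this rename, here is a minimal python sketch of how a comma-separated `CRAWLER_URLS` value could be split into sources. it is not the project's actual parser: `parse_crawler_urls`, `CrawlerSource`, `VALID_MODES` and the `full_network` flag are hypothetical names, and rejecting unknown prefixes such as the old `relay` is an assumption about the intended behaviour.

```python
from typing import NamedTuple

# the two mode names documented above; everything else here is an assumption
VALID_MODES = {"by_collection", "list_repos"}


class CrawlerSource(NamedTuple):
    mode: str
    url: str


def parse_crawler_urls(value: str, full_network: bool) -> list[CrawlerSource]:
    """split a CRAWLER_URLS-style string into (mode, url) pairs."""
    # documented defaults: by_collection in filter mode, list_repos in full-network mode
    default_mode = "list_repos" if full_network else "by_collection"
    sources = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and empty entries
        prefix, sep, rest = entry.partition("::")
        # a "::" that appears after the scheme (e.g. an IPv6 host) is part of the url
        if not sep or "://" in prefix:
            sources.append(CrawlerSource(default_mode, entry))
            continue
        if prefix not in VALID_MODES:
            raise ValueError(f"unknown crawler mode: {prefix!r}")
        sources.append(CrawlerSource(prefix, rest))
    return sources
```

with the example value above and `full_network=False`, this yields one `by_collection` source for `https://lightrail.microcosm.blue` and one `list_repos` source for `wss://bsky.network`. an entry with no prefix picks up the default for the current mode.
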
···
   the source to restart from the beginning when re-added.
   - returns `200 OK` if the source was found and removed, `404 Not Found` otherwise.
 - `DELETE /crawler/cursors`: reset stored cursors for a given crawler URL. body: `{ "key": "..." }`
-  where key is a URL. clears the relay crawler cursor as well as any by-collection
+  where key is a URL. clears the list-repos crawler cursor as well as any by-collection
   cursors associated with that URL. causes the next crawler pass to restart from the beginning.

 ### firehose management