···
 - [x] config: db mem limit `--fjall-cache-mb`
 - [x] config: per-host request rate self-throttling `--crawl-qps` (name from collectiondir)
 - [x] resync: estimate CAR size from `getRecord` mst height; `getRepo` if it's likely very small
+- [x] admin view of backfill state etc
 - [ ] special did:web ident cache behaviour to keep reusing a stale resolution on failure
-- [ ] admin view of backfill state etc
 - [ ] vanity stats for optimizations, like how many in-flight repos were saved from resync due to high-water-mark firehose cursor persistence
 - [ ] if the upstream is a PDS (check with describeServer?) then make it only accept events for DIDs that have it as their PDS
 - [ ] use `since` on getRepo for resync to get a smaller partial export in many cases (and then more-carefully do the actual resync)
···
 - [ ] check response headers and adjust self-throttling rate limits per-host if present
 - [ ] make backfill go _really fast_
 - [ ] clean up commit validation (eg we're checking signatures twice, lenient handling is weird)
+- [ ] metrics for db size
 
 going to be annoying but doable
 - [ ] multi-relay subscriber
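The checked resync item above (estimate CAR size from the `getRecord` MST height, and use `getRepo` if the repo is likely very small) can be illustrated with a rough sketch. It relies on the atproto MST rule that a key's layer is the number of leading zero bits in the SHA-256 hash of the key divided by two (rounded down). This gives a fanout of about 4, so a root at layer h suggests on the order of 4^h records. The bytes-per-record figure, the size cutoff, and every name here (`estimate_records`, `estimate_car_bytes`, `plan_resync`, `ResyncPlan`) are hypothetical. This is not lightrail's actual code.

```rust
/// Hypothetical average encoded size of one record block in a CAR export.
const ASSUMED_BYTES_PER_RECORD: u64 = 500;
/// Hypothetical cutoff below which a full `getRepo` export counts as "very small".
const SMALL_CAR_MAX_BYTES: u64 = 1 << 20; // 1 MiB

/// Illustrative resync strategies. `Fallback` stands in for whatever the
/// non-small path is (the TODO item does not specify it).
#[derive(Debug, PartialEq, Eq)]
enum ResyncPlan {
    GetRepo,
    Fallback,
}

/// Order-of-magnitude record count for an MST whose root sits at `root_layer`.
/// In the atproto MST a key's layer is floor(leading zero bits of SHA-256(key) / 2),
/// so each layer up holds ~1/4 as many keys and n keys reach about log4(n) layers.
fn estimate_records(root_layer: u32) -> u64 {
    4u64.saturating_pow(root_layer)
}

/// Rough CAR size estimate, saturating instead of overflowing for tall trees.
fn estimate_car_bytes(root_layer: u32) -> u64 {
    estimate_records(root_layer).saturating_mul(ASSUMED_BYTES_PER_RECORD)
}

/// Choose `getRepo` only when the whole export is likely very small.
fn plan_resync(root_layer: u32) -> ResyncPlan {
    if estimate_car_bytes(root_layer) <= SMALL_CAR_MAX_BYTES {
        ResyncPlan::GetRepo
    } else {
        ResyncPlan::Fallback
    }
}

fn main() {
    for layer in 0..=7 {
        println!(
            "root layer {layer}: ~{} records, ~{} bytes -> {:?}",
            estimate_records(layer),
            estimate_car_bytes(layer),
            plan_resync(layer)
        );
    }
}
```

With these made-up constants, root layers 0 through 5 choose `getRepo` and layer 6 and above fall back. Layers come from key hashes, so a small repo can occasionally have a tall tree (and vice versa). This makes the estimate a heuristic for skipping expensive paths, not a size guarantee.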
readme.md
···
 
 **status: almost working well but _not stable yet!!_**
 
+**...full-network backfill mostly works but needs tuning and tweaking. firehose collection extraction mostly works but needs more verification**.
+
 Lightrail uses the _adjacent keys_ in firehose commit CAR slices to detect first-record-added-to and last-record-removed-from collections in atproto repos, _statelessly_. Since most commits don't change repos' collection lists, this eliminates most of the work to maintain an accurate repos-by-collection index.
 
 Compared to Bluesky's [`collectiondir`](https://github.com/bluesky-social/indigo/tree/main/cmd/collectiondir) service, lightrail: