collection directory backfill#
lightrail (current)#
lightrail handles its own backfill automatically via --deep-crawl. on startup it:
- discovers PDS hosts from the relay's
com.atproto.sync.listHostsendpoint - crawls each host's
com.atproto.sync.listReposto enumerate all DIDs - resyncs each DID — fetches
describeRepoto get collections, indexes(DID, collection)pairs in fjall
progress is tracked internally (resync queue in fjall). pod restarts resume from where they left off.
monitoring: GET /admin/status (basic auth) returns resync_queue_depth, resyncs_completed_total, and upstream_backfill_complete. the grafana dashboard has panels for these metrics.
timing: full resync of ~6.8M repos took ~2.5 days. rate is governed by --crawl-qps (default 8) and PDS response times.
no manual intervention needed. lightrail manages the entire lifecycle — host discovery, crawling, retry on failure, and rate-limit cooldown.
collectiondir (legacy, replaced 2026-03-27)#
the previous Go-based collectiondir required manual backfill via scripts/backfill. that workflow is no longer needed. the collectiondir helm release is scaled to 0 but kept for rollback — see just indigo collectiondir-publish.