collection directory backfill#

lightrail (current)#

lightrail handles its own backfill automatically via --deep-crawl. on startup it:

discovers PDS hosts from the relay's com.atproto.sync.listHosts endpoint
crawls each host's com.atproto.sync.listRepos to enumerate all DIDs
resyncs each DID — fetches describeRepo to get collections, indexes (DID, collection) pairs in fjall

progress is tracked internally (resync queue in fjall). pod restarts resume from where they left off.

monitoring: GET /admin/status (basic auth) returns resync_queue_depth, resyncs_completed_total, and upstream_backfill_complete. the grafana dashboard has panels for these metrics.

timing: full resync of ~6.8M repos took ~2.5 days. rate is governed by --crawl-qps (default 8) and PDS response times.

no manual intervention needed. lightrail manages the entire lifecycle — host discovery, crawling, retry on failure, and rate-limit cooldown.

collectiondir (legacy, replaced 2026-03-27)#

the previous Go-based collectiondir required manual backfill via scripts/backfill. that workflow is no longer needed. the collectiondir helm release is scaled to 0 but kept for rollback — see just indigo collectiondir-publish.

Configure Feed

Configure Feed

collection directory backfill#

lightrail (current)#

collectiondir (legacy, replaced 2026-03-27)#