//! Walks `com.atproto.sync.listRepos` and probes each repository to populate
//! the rbc/cbr index before or alongside the live firehose feed.
//!
-//! Large repos are enumerated via binary-search `getRecord` probing (`probe`).
+//! Large repos are enumerated via sequential `getRecord` probing (`probe`):
+//! one request per collection, walking right-adjacent MST keys from the minimum
+//! legal key to the end of the repo.
//! Small repos take the fast path of fetching the full repo CAR (`small_repo`).

pub mod list_repos;
+27-12
src/backfill/probe.rs
-//! Binary-search `getRecord` probing for large-repo backfill.
+//! `getRecord` probing for large-repo backfill.
+//!
+//! MST keys have the form `<collection>/<rkey>`, where `collection` is an NSID
+//! and `rkey` is a Record Key, both subject to format restrictions and a total
+//! byte-length cap defined in the AT Protocol specs.
+//!
+//! `getRecord` always includes the keys adjacent to the queried key in its CAR
+//! slice response, even when the record does not exist. The probing algorithm
+//! exploits this to enumerate every collection with one request per collection:
+//!
+//! 1. Query `getRecord` with the **minimum legal MST key**, the
+//!    lexicographically lowest string that is a valid `<collection>/<rkey>`.
+//!    The record won't exist, but the right-adjacent key in the CAR slice is
+//!    the lowest key actually present in the repo, revealing the first
+//!    collection.
+//! 2. For that collection, compute the **maximum legal rkey** and query
+//!    `getRecord` with `<collection>/<max_rkey>`. The right-adjacent key in
+//!    the response is the first key of the *next* collection in the repo.
+//! 3. Repeat step 2 for each newly discovered collection until no right-adjacent
+//!    key is returned, signalling that all collections have been found.
//!
-//! Since every ATProto collection has a known minimum and maximum possible rkey,
-//! `getRecord` returns adjacent keys even when the requested key does not exist.
-//! This lets us binary-search the MST to enumerate all collections without
-//! fetching the full repo CAR.
+//! Each discovered `(did, collection)` pair is written to the rbc/cbr index
+//! via `db::index::insert`.

use crate::db::DbRef;
use crate::error::Result;

-/// Probe `did` to enumerate its collections via `getRecord` binary search.
+/// Probe `did` to enumerate its collections via sequential `getRecord` requests.
///
-/// 1. Issue a `getRecord` for the midpoint of the NSID key space.
-/// 2. Feed the returned CAR slice to `mst::adjacent::extract_adjacent`.
-/// 3. Use adjacent keys to narrow the search and recurse until all collection
-///    boundaries are discovered.
-/// 4. Write results to the rbc/cbr index via `db::index::insert`.
+/// Starts from the minimum legal MST key and follows right-adjacent keys one
+/// collection at a time until the end of the repo is reached. One XRPC request
+/// is issued per collection present in the repo.
pub async fn probe_repo(host: &str, did: &str, db: DbRef) -> Result<()> {
    let _ = (host, did, db);
-    todo!("binary-search getRecord probing to enumerate collections")
+    todo!("sequential getRecord probing: walk right-adjacent keys to enumerate collections")
}
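
The walk described in the new `probe.rs` docs can be sketched offline, with no network access. This is only an illustration under stated assumptions, not the PR's implementation: a sorted `BTreeSet` stands in for the repo's MST, and `right_adjacent` stands in for the right-adjacent key that a `getRecord` CAR slice would reveal. All function names here (`right_adjacent`, `collection_of`, `past_collection`, `enumerate_collections`) are hypothetical. The empty string is a placeholder for the minimum legal MST key, and `<collection>0` is a placeholder for `<collection>/<max_rkey>`; neither is a legal key, and a real probe would build both from the AT Protocol format limits.

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

/// Stand-in for the right-adjacent key in a `getRecord` CAR slice:
/// the smallest key in the repo that is strictly greater than `probe`.
fn right_adjacent<'a>(repo: &'a BTreeSet<String>, probe: &str) -> Option<&'a str> {
    let bounds: (Bound<&str>, Bound<&str>) = (Bound::Excluded(probe), Bound::Unbounded);
    repo.range::<str, _>(bounds).next().map(String::as_str)
}

/// Collection part of an MST key of the form `<collection>/<rkey>`.
fn collection_of(key: &str) -> &str {
    key.split_once('/').map_or(key, |(collection, _)| collection)
}

/// Hypothetical upper-bound probe for a collection. '0' is the ASCII character
/// right after '/', so `<collection>0` sorts above every `<collection>/<rkey>`.
/// A real probe would use `<collection>/<max_rkey>` instead.
fn past_collection(collection: &str) -> String {
    format!("{collection}0")
}

/// Walks the repo one collection at a time. Returns the collections found,
/// in key order, and the number of probes issued.
fn enumerate_collections(repo: &BTreeSet<String>) -> (Vec<String>, usize) {
    let mut collections = Vec::new();
    let mut probes = 1;
    // Step 1: "" is a placeholder for the minimum legal MST key.
    let mut next = right_adjacent(repo, "");
    while let Some(key) = next {
        let collection = collection_of(key);
        collections.push(collection.to_owned());
        // Steps 2 and 3: jump past every key in this collection.
        probes += 1;
        next = right_adjacent(repo, &past_collection(collection));
    }
    (collections, probes)
}
```

In this simulation the walk issues one initial probe plus one probe per collection, and it stops when a probe returns no right-adjacent key. Calling `enumerate_collections` on a set of keys spanning three collections returns those three collection names regardless of how many records each one holds.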