collection directory service
lookup repos by collection (who has app.bsky.feed.post records?)
firehose consumer and PDS crawler via listRepos and describeRepo
daily-active-users collections
# Collection Directory

Maintain a directory of which repos use which collections of records.

e.g. "app.bsky.feed.post" is used by did:alice did:bob

Firehose consumer and crawler of PDS via listRepos and describeRepo.

The primary query is:

```
/v1/getDidsForCollection?collection={}&cursor={}
```

It returns JSON:

```json
{"dids":["did:A", "..."],
"cursor":"opaque text"}
```

The query parameter `collection` may be repeated up to 10 times. The collections must always be sent in the same order, or the cursor will break.

If multiple collections are specified, the result stream is not guaranteed to be de-duplicated by DID, so DIDs may be repeated.
(A merge window is used so that the service is _likely_ to not send duplicate DIDs.)
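
A minimal client-side sketch of the paging rules above, in Python. It keeps the `collection` parameters in the caller's order on every request and drops repeated DIDs. The helper names `build_query` and `iter_dids`, the caller-supplied `fetch_json` function, and the rule that paging ends when no `cursor` is returned are all assumptions of this sketch, not documented behavior.

```python
from urllib.parse import urlencode

MAX_COLLECTIONS = 10  # limit on repeated `collection` parameters


def build_query(collections, cursor=None):
    """Build the request path, keeping `collections` in the caller's order."""
    if not 1 <= len(collections) <= MAX_COLLECTIONS:
        raise ValueError("expected 1 to 10 collections")
    params = [("collection", c) for c in collections]
    if cursor:
        params.append(("cursor", cursor))
    return "/v1/getDidsForCollection?" + urlencode(params)


def iter_dids(collections, fetch_json):
    """Yield each DID once, following cursors page by page.

    `fetch_json(path)` is a hypothetical caller-supplied function that
    performs the HTTP GET and returns the decoded JSON object.
    Assumption: a missing or empty cursor means there are no more pages.
    """
    seen = set()
    cursor = None
    while True:
        page = fetch_json(build_query(collections, cursor))
        for did in page.get("dids", []):
            if did not in seen:  # the service may repeat DIDs
                seen.add(did)
                yield did
        cursor = page.get("cursor")
        if not cursor:
            break
```

The `seen` set grows with the number of distinct DIDs, so a very large result may call for a different de-duplication strategy.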

### Analytics queries

```
/v1/listCollections?c={}&cursor={}&limit={50<=limit<=1000}
```

`listCollections` returns JSON with a map from collection name to the approximate number of DIDs using it.
With no `c` parameter it returns all known collections with cursor paging.
With up to 20 repeated `c` parameters it returns only those collections (no paging).
The result may be cached and up to several minutes out of date.
```json
{"collections":{"app.bsky.feed.post": 123456789, "some collection": 42},
"cursor":"opaque text"}
```
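
As a rough illustration of the two modes of `listCollections`, the Python sketch below builds the request path and ranks a decoded response by count. The helper names `list_collections_path` and `top_collections` are hypothetical, and leaving `cursor` and `limit` off a named lookup is this sketch's reading of "no paging", not a documented requirement.

```python
from urllib.parse import urlencode


def list_collections_path(names=(), cursor=None, limit=None):
    """Build a listCollections path for named lookups or for paged listing."""
    names = list(names)
    if len(names) > 20:
        raise ValueError("at most 20 `c` parameters are allowed")
    if names:
        # Named lookups are not paged, so cursor and limit are left out.
        return "/v1/listCollections?" + urlencode([("c", n) for n in names])
    params = []
    if cursor:
        params.append(("cursor", cursor))
    if limit is not None:
        if not 50 <= limit <= 1000:
            raise ValueError("limit must be between 50 and 1000")
        params.append(("limit", limit))
    query = urlencode(params)
    return "/v1/listCollections" + ("?" + query if query else "")


def top_collections(response, n):
    """Return the n (name, count) pairs with the largest approximate counts."""
    counts = response.get("collections", {})
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
```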

## Design

### Schema

The primary database schema is the tuple (collection, seen time in int64 milliseconds, did).

This allows efficient cursor-based fetching of more DIDs for a collection.
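
To see why this tuple supports cursor fetching, here is a small in-memory Python sketch that keeps the tuples sorted and resumes a scan just past the last one returned. The `CollectionIndex` class is a hypothetical stand-in for the real database, and the cursor format `(seen_ms, did)` is an assumption; the service's actual cursor is opaque.

```python
import bisect


class CollectionIndex:
    """In-memory stand-in for a store sorted by (collection, seen_ms, did)."""

    def __init__(self):
        self._keys = []  # kept sorted at all times

    def add(self, collection, seen_ms, did):
        key = (collection, seen_ms, did)
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def page(self, collection, cursor=None, limit=100):
        """Return (dids, next_cursor); next_cursor is None when nothing is left.

        Assumed cursor format: the (seen_ms, did) of the last row returned.
        """
        if cursor is None:
            # (collection,) sorts before every full key of that collection.
            start = bisect.bisect_left(self._keys, (collection,))
        else:
            start = bisect.bisect_right(self._keys, (collection, *cursor))
        dids, last = [], None
        for coll, seen_ms, did in self._keys[start:start + limit]:
            if coll != collection:
                break  # ran past the end of this collection's key range
            dids.append(did)
            last = (seen_ms, did)
        return dids, last
```

Because the collection is the leading element, all rows for one collection are contiguous, and each page costs one binary search plus a short sequential scan.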

e.g. A new service starts consuming the firehose for events it wants in collection `com.newservice.data.thing`.
It then calls the collection directory for a list of repos which may have already created data in this collection,
and makes `getRepo` calls to those repos' PDSes to get prior data.
By the time it is done paging forward through the collection directory results and fetching those repos,
it will have both the backfilled data and the new data it has collected live off the firehose.
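
The backfill flow described above could be sketched in Python roughly as follows. Everything here is illustrative: `backfill_then_merge`, the `get_repo_records` callable, and the `(did, record_key)` keying are hypothetical, and the sketch ignores deletes. It assumes the firehose consumer is already running and filling `live_records` before the backfill starts.

```python
def backfill_then_merge(dids, get_repo_records, live_records):
    """Combine prior records fetched per repo with records seen live.

    dids: iterable of DIDs returned by the collection directory.
    get_repo_records(did): hypothetical callable returning
        {record_key: record} for the collection of interest, e.g. by
        fetching and parsing the output of `getRepo`.
    live_records: dict {(did, record_key): record} maintained by the
        firehose consumer while the backfill runs.
    """
    merged = {}
    for did in dids:
        for rkey, record in get_repo_records(did).items():
            merged[(did, rkey)] = record
    # Assumption of this sketch: the live copy is preferred when a record
    # was both backfilled and seen on the firehose.
    merged.update(live_records)
    return merged
```

Keying on `(did, record_key)` means a record that arrives both ways collapses to a single entry.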