`collectiondir`: Directory of Accounts by Collection
====================================================

This is a small atproto microservice that maintains a directory of which accounts in the network (DIDs) have data (records) in which collections (NSIDs).

It primarily serves the `com.atproto.sync.listReposByCollection` API endpoint:
```
GET /xrpc/com.atproto.sync.listReposByCollection?collection=com.atproto.lexicon.schema&limit=3

{
  "repos": [
    { "did": "did:plc:4sm3vprfyl55ui3yhjd7w4po" },
    { "did": "did:plc:xhkqwjmxuo65vwbwuiz53qor" },
    { "did": "did:plc:w3aonw33w3mz3mwws34x5of6" }
  ],
  "cursor": "QQAAAEkAAAGVgFFLb2RpZDpwbGM6dzNhb253MzN3M216M213d3MzNHg1b2Y2AA=="
}
```

Features and design points:

- persists data in a local key/value database (pebble)
- consumes from the firehose to stay up to date with record creation
- can bootstrap the full network using `com.atproto.sync.listRepos` and `com.atproto.repo.describeRepo`
- single golang binary for easy deployment
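The bootstrap path can be pictured as: page through `com.atproto.sync.listRepos` to enumerate DIDs, call `com.atproto.repo.describeRepo` for each one, and write one directory row per collection it reports. The Go sketch below covers only that last step. It relies on `describeRepo` responses carrying a `collections` array of NSIDs; the `entry` type, `entriesFromDescribe`, and stamping every row with the bootstrap time are assumptions for illustration, not the service's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// describeRepoOutput keeps only the describeRepo response fields this sketch needs.
type describeRepoOutput struct {
	DID         string   `json:"did"`
	Collections []string `json:"collections"`
}

// entry is a hypothetical (collection, seen time, did) directory row.
type entry struct {
	Collection string
	SeenMillis int64
	DID        string
}

// entriesFromDescribe turns one describeRepo response body into directory rows,
// stamping each with the time the bootstrap observed it (an assumption).
func entriesFromDescribe(body []byte, seenMillis int64) ([]entry, error) {
	var out describeRepoOutput
	if err := json.Unmarshal(body, &out); err != nil {
		return nil, err
	}
	rows := make([]entry, 0, len(out.Collections))
	for _, c := range out.Collections {
		rows = append(rows, entry{Collection: c, SeenMillis: seenMillis, DID: out.DID})
	}
	return rows, nil
}

func main() {
	fmt.Println(entriesFromDescribe([]byte(`{"did":"did:plc:example","collections":["app.bsky.feed.post"]}`), 1700000000000))
}
```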


## Analytics Endpoint

```
/v1/listCollections?c={}&cursor={}&limit={50<=limit<=1000}
```

`listCollections` returns JSON with a map from collection name to the approximate number of DIDs with records in that collection.
With no `c` parameter it returns all known collections, paged with `cursor`.
With up to 20 repeated `c` parameters it returns only those collections (no paging).
The result may be served from a cache and can be up to several minutes out of date.

```json
{"collections":{"app.bsky.feed.post": 123456789, "some collection": 42},
"cursor":"opaque text"}
```
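A Go sketch of building both kinds of `listCollections` request and decoding the response. The 20-collection cap and the 50 to 1000 `limit` range come from the description above; clamping `limit` on the client side, the example host, and all function and type names are assumptions, not the service's own client code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
)

// listCollectionsResponse mirrors the documented JSON shape.
type listCollectionsResponse struct {
	Collections map[string]int64 `json:"collections"`
	Cursor      string           `json:"cursor"`
}

const maxFilterCollections = 20 // documented cap on repeated c parameters

// listCollectionsURL builds either a filtered query (repeated c, no paging)
// or a paged query (optional cursor plus a limit clamped into 50..1000).
func listCollectionsURL(host string, collections []string, cursor string, limit int) (string, error) {
	q := url.Values{}
	if len(collections) > 0 {
		if len(collections) > maxFilterCollections {
			return "", fmt.Errorf("at most %d collections per request, got %d", maxFilterCollections, len(collections))
		}
		for _, c := range collections {
			q.Add("c", c) // repeated parameter, values keep their order
		}
	} else {
		// Client-side clamping is a choice of this sketch, not documented server behavior.
		if limit < 50 {
			limit = 50
		} else if limit > 1000 {
			limit = 1000
		}
		q.Set("limit", strconv.Itoa(limit))
		if cursor != "" {
			q.Set("cursor", cursor)
		}
	}
	return host + "/v1/listCollections?" + q.Encode(), nil
}

// decodeListCollections parses a response body into the struct above.
func decodeListCollections(body []byte) (listCollectionsResponse, error) {
	var r listCollectionsResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	fmt.Println(listCollectionsURL("https://collectiondir.example.com", []string{"app.bsky.feed.post"}, "", 0))
}
```

Because counts are approximate and possibly cached, callers should treat them as estimates rather than exact totals.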
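

## Database Schema

The primary database is keyed by (collection, seen time as int64 milliseconds, DID).

Since the key/value store keeps keys sorted, all entries for one collection are contiguous and ordered by seen time, so fetching more DIDs for a collection is a range scan that resumes from the cursor position.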

For example, a new service starts consuming the firehose for events in collection `com.newservice.data.thing`.
It then calls the collection directory for a list of repos which may already have created data in this collection,
and makes `getRepo` calls to those repos' PDSes to fetch their prior data.
By the time it has finished paging through the collection directory results and fetching those repos,
it will have both the backfilled data and the new data collected live off the firehose.