lightweight com.atproto.sync.listReposByCollection

always more todos, always always

phil 9deee8c1 b8e39320

+10 -5
hacking.md
@@ -104,15 +104,16 @@
 - [ ] resync short-circuit: tiny repos may actually return their entire CAR for getRecord
 - [ ] commit CAR handling: generate a list of keys with gaps noted, to reliably detect missing adjacent keys
 - [ ] repo-stream: drop record block contents with processor fn
-
+- [ ] meta/metrics keyspace for general stats
+  - [ ] total repos (hyperloglog estimate?)
+  - [ ] resync queue size
 
 very much still todo but i'm getting tired
 - [x] config: add a `--heavy` mode that always uses `getRepo` and never `describeRepo`
 - [x] config: db mem limit `--fjall-cache-mb`
 - [x] config: per-host request rate self-throttling `--crawl-qps` (name from collectiondir)
 - [ ] resync: estimate CAR size from `getRecord` mst height; `getRepo` if it's likely very small
-- [ ] multi-relay subscriber
-- [ ] special did:web behaviour to keep reusing a stale resolution on failure
+- [ ] special did:web ident cache behaviour to keep reusing a stale resolution on failure
 - [ ] admin view of backfill state etc
 - [ ] vanity stats for optimizations, like how many in-flight repos were saved from resync due to high-water-mark firehose cursor persistence
 - [ ] if the upstream is a PDS (check with describeServer?) then make only accept events for DIDs that have it as their PDS
@@ -120,6 +121,10 @@
 - [ ] combine the throttled http client instance, the db, and the admin info into an appstate fineeeee
 - [ ] bad word filtering? (collectiondir has it)
 - [ ] check response headers and adjust self-throttling rate limits per-host if present
+- [ ] make backfill go _really fast_
+
+going to be annoying but doable
+- [ ] multi-relay subscriber
 
 
 ### special-casing
@@ -130,9 +135,9 @@
 ## some choices
 
 - tokio for async runtime: works good
-- jacquard almost everywhere: makes things *so much* easier
+- jacquard almost everywhere: works good
 - repo-stream for CAR processing
-- fjall: workload is write-heavy so LSM is a good fit, space efficiency also very desirable
+- fjall: workload is write-heavy so LSM works good, space efficiency also very nice
 
 
 ## resync: getting a repo's full collection list