Fast and robust atproto CAR file processing in rust

update timings etc

authored by

phil and committed by tangled.org 3326eb0c b9b4da67

+29 -3
readme.md
···

 current car processing times (records processed into their length usize, phil's dev machine):

-- 128MiB CAR file: `350ms`
+- 450MiB CAR file (huge): `1.3s`
+- 128MiB (huge): `350ms`
 - 5.0MiB: `6.8ms`
 - 279KiB: `170us`
 - 3.4KiB: `5.2us`
···
 static GLOBAL: MiMalloc = MiMalloc;
 ```

-- 128MiB CAR file: `310ms` (-13%)
+- 450MiB CAR file: `1.1s` (-15%)
+- 128MiB: `310ms` (-13%)
 - 5.0MiB: `6.1ms` (-10%)
 - 279KiB: `160us` (-5%)
 - 3.4KiB: `5.7us` (-9%)
 - empty: `660ns` (-7%)

+processing CARs requires buffering blocks, so it can consume a lot of memory. repo-stream's in-memory driver has minimal memory overhead, but there are two ways to make it work with less mem (you can do either or both!)

-running the huge-car benchmark
+1. spill blocks to disk
+2. inline block processing
+
+#### spill blocks to disk
+
+this is a little slower but can greatly reduce the memory used. there's nothing special you need to do for this.
+
+
+#### inline block processing
+
+if you don't need to store the complete records, you can have repo-stream try to optimistically apply a processing function to the raw blocks as they are streamed in.
+
+
+#### constrained mem perf comparison
+
+sketchy benchmark but hey. mimalloc is enabled, and the processing spills to disk. inline processing reduces entire records to 8 bytes (usize of the raw record block size):
+
+- 450MiB CAR file: `5.0s` (4.5x slowdown for disk)
+- 128MiB: `1.27s` (4.1x slowdown)
+
+fortunately, most CARs in the ATmosphere are very small, so for eg. backfill purposes, the vast majority of inputs will not face this slowdown.
+
+
+#### running the huge-car benchmark

 - to avoid committing it to the repo, you have to pass it in through the env for now.

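The "inline block processing" idea added in this diff can be illustrated with a small standalone sketch. It does not use repo-stream's API, which this diff does not show. `RawBlock` and `process_inline` are hypothetical names invented for the example. It only demonstrates the general technique: a processing function runs on each raw block as it arrives, so only the small output is kept and the raw bytes can be freed right away. With `|raw| raw.len()` as the function, each record shrinks to a single `usize` (8 bytes on 64-bit targets), as in the benchmark description.

```rust
/// Hypothetical stand-in for a block streamed out of a CAR file.
/// Not a repo-stream type; it exists only for this sketch.
struct RawBlock {
    key: String,
    bytes: Vec<u8>,
}

/// Apply `process` to every block as it is pulled from the iterator and
/// keep only the result. The raw bytes are never collected, so memory use
/// is bounded by the outputs rather than by the size of the input.
fn process_inline<I, T, F>(blocks: I, process: F) -> Vec<(String, T)>
where
    I: IntoIterator<Item = RawBlock>,
    F: Fn(&[u8]) -> T,
{
    blocks
        .into_iter()
        .map(|block| {
            let out = process(&block.bytes);
            // `block.bytes` is dropped when this closure returns.
            (block.key, out)
        })
        .collect()
}

fn main() {
    // Two fake blocks; the sizes are arbitrary.
    let blocks = vec![
        RawBlock { key: "a".to_string(), bytes: vec![0u8; 3_400] },
        RawBlock { key: "b".to_string(), bytes: vec![0u8; 279_000] },
    ];

    // Reduce each record to its raw length, as the benchmark does.
    let lens = process_inline(blocks, |raw| raw.len());

    assert_eq!(lens[0], ("a".to_string(), 3_400));
    assert_eq!(lens[1], ("b".to_string(), 279_000));
    println!("{lens:?}");
}
```

The readme says the real driver applies the function "optimistically", which suggests that some blocks may still need buffering. This sketch ignores that case and assumes every block can be processed the moment it is read.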