fix: cap did_cache at 500K entries to prevent OOM

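the DID→UID lookup cache in event_log.zig was completely unbounded:
no max size, no eviction, no TTL. every DID seen on the firehose got
cached forever. with 61M+ DIDs on the network, the map grows linearly
with the number of distinct DIDs seen until the process is OOM-killed
(at the ~80 bytes per entry implied by the sizing below, roughly 5 GB).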

cap at 500K entries (~40 MB). when the cap is reached, clear the entire
map instead of evicting selectively: there's no per-entry timestamp to
sort by, and the postgres fallback is fast enough (~0.5ms per miss)
that a full clear is fine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

zzstoatzz 4 months ago 0f30da76 2a885c6f

+22 -11

1 changed file

src/event_log.zig

@@ -94,6 +94,7 @@
     // DID → UID cache (matches indigo's bidirectional ARC cache)
     did_cache: std.StringHashMapUnmanaged(u64) = .{},
     did_cache_mutex: std.Thread.Mutex = .{},
+    max_did_cache_size: u32 = 500_000,
 
     // write buffer (flushed periodically or when threshold hit)
     outbuf: std.ArrayListUnmanaged(u8) = .{},
@@ -276,13 +277,7 @@
             var r = row;
             defer r.deinit() catch {};
             const uid: u64 = @intCast(r.get(i64, 0));
-            // populate cache
-            const did_duped = try self.allocator.dupe(u8, did);
-            self.did_cache_mutex.lock();
-            defer self.did_cache_mutex.unlock();
-            self.did_cache.put(self.allocator, did_duped, uid) catch {
-                self.allocator.free(did_duped);
-            };
+            self.didCachePut(did, uid);
             return uid;
         }
 
@@ -303,15 +298,31 @@
         defer row.deinit() catch {};
         const uid: u64 = @intCast(row.get(i64, 0));
 
-        // populate cache
-        const did_duped = try self.allocator.dupe(u8, did);
+        self.didCachePut(did, uid);
+        return uid;
+    }
+
+    /// insert into did_cache, evicting if at capacity.
+    /// this is a pure lookup cache over postgres — clearing it only costs
+    /// ~0.5ms per miss on the next lookup for that DID.
+    fn didCachePut(self: *DiskPersist, did: []const u8, uid: u64) void {
         self.did_cache_mutex.lock();
         defer self.did_cache_mutex.unlock();
+
+        // evict when at capacity: free all keys and clear the map.
+        // unlike the validator cache there's no per-entry timestamp to sort by,
+        // and the postgres fallback is fast enough that a full clear is fine.
+        if (self.did_cache.count() >= self.max_did_cache_size) {
+            log.info("did_cache at capacity ({d}), clearing", .{self.did_cache.count()});
+            var it = self.did_cache.iterator();
+            while (it.next()) |entry| self.allocator.free(entry.key_ptr.*);
+            self.did_cache.clearRetainingCapacity();
+        }
+
+        const did_duped = self.allocator.dupe(u8, did) catch return;
         self.did_cache.put(self.allocator, did_duped, uid) catch {
             self.allocator.free(did_duped);
         };
-
-        return uid;
     }
 
     /// per-DID sync state for chain tracking
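a standalone sketch of the same clear-on-full strategy, for trying the behaviour outside of DiskPersist. this is not the code from the commit: BoundedDidCache, max_entries, freeKeys and the test are hypothetical names made up for illustration, and it assumes a Zig version where the unmanaged containers still accept `.{}` initialization, as in the diff above. it also adds one thing the commit does not do, an early return when the DID is already cached. as far as I can tell, `put` on an existing key overwrites the value but keeps the old key slice, so the freshly duplicated key would leak without that check.

```zig
const std = @import("std");

/// Hypothetical stand-in for the did_cache fields on DiskPersist.
const BoundedDidCache = struct {
    allocator: std.mem.Allocator,
    map: std.StringHashMapUnmanaged(u64) = .{},
    mutex: std.Thread.Mutex = .{},
    max_entries: u32 = 500_000,

    /// Free every owned key, then release the table itself.
    fn deinit(self: *BoundedDidCache) void {
        self.freeKeys();
        self.map.deinit(self.allocator);
    }

    /// The map owns its key slices, so they must be freed before a clear.
    fn freeKeys(self: *BoundedDidCache) void {
        var it = self.map.keyIterator();
        while (it.next()) |key| self.allocator.free(key.*);
    }

    fn get(self: *BoundedDidCache, did: []const u8) ?u64 {
        self.mutex.lock();
        defer self.mutex.unlock();
        return self.map.get(did);
    }

    /// Best-effort insert: allocation failures are swallowed, because a
    /// missing cache entry only costs one extra database lookup later.
    fn put(self: *BoundedDidCache, did: []const u8, uid: u64) void {
        self.mutex.lock();
        defer self.mutex.unlock();

        // Already cached: update in place without duplicating the key.
        // (Assumption: this check is an addition, not part of the commit.)
        if (self.map.getPtr(did)) |existing| {
            existing.* = uid;
            return;
        }

        // At capacity: drop everything rather than pick victims.
        if (self.map.count() >= self.max_entries) {
            self.freeKeys();
            self.map.clearRetainingCapacity();
        }

        const owned = self.allocator.dupe(u8, did) catch return;
        self.map.put(self.allocator, owned, uid) catch self.allocator.free(owned);
    }
};

test "cache is cleared when it reaches capacity" {
    // std.testing.allocator fails the test if any key is leaked.
    var cache = BoundedDidCache{ .allocator = std.testing.allocator, .max_entries = 2 };
    defer cache.deinit();

    cache.put("did:plc:aaa", 1);
    cache.put("did:plc:bbb", 2);
    cache.put("did:plc:aaa", 1); // duplicate: no new key, no clear
    try std.testing.expectEqual(@as(u32, 2), cache.map.count());

    cache.put("did:plc:ccc", 3); // at capacity: full clear, then insert
    try std.testing.expectEqual(@as(u32, 1), cache.map.count());
    try std.testing.expectEqual(@as(?u64, null), cache.get("did:plc:aaa"));
    try std.testing.expectEqual(@as(?u64, 3), cache.get("did:plc:ccc"));
}
```

saved as a single file, this should run with `zig test`. the trade-off is the one the commit message describes: a full clear throws away hot entries along with cold ones, which is only acceptable because every miss falls back to a cheap postgres lookup.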
