Dim-table the DID columns; document the migration learnings
Schema change for storage and query efficiency:
- Add `dids` dim table (BIGINT id PK, VARCHAR did)
- Fact tables (types, urls, backlinks, mentions) now reference DIDs
through BIGINT did_id / subject_did_id columns
- An in-process map[string]int64, hydrated from `dids` at startup, is
the source of truth for ID assignment; new IDs go to a dedicated dim
appender that flushes before the fact appenders, so a fact row is
never visible without its referenced dim row
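
The ID-assignment and flush-ordering scheme above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: `interner`, `sink`, `dimRow`, `factRow`, `idFor`, `addURL` and `flush` are hypothetical names, and the in-memory `sink` stands in for the real database appenders.

```go
package main

import "fmt"

// dimRow is one pending row for the `dids` dim table.
type dimRow struct {
	ID  int64
	DID string
}

// factRow is one pending fact row that references a DID by its ID.
type factRow struct {
	DIDID int64
	URL   string
}

// sink is a hypothetical stand-in for the database appenders; it only
// records the order in which rows would become visible.
type sink struct {
	log []string
}

// interner assigns BIGINT IDs to DID strings and buffers rows for the
// dim and fact appenders.
type interner struct {
	ids    map[string]int64
	nextID int64
	dims   []dimRow
	facts  []factRow
}

// newInterner hydrates the map from the rows already stored in `dids`.
func newInterner(existing []dimRow) *interner {
	in := &interner{ids: make(map[string]int64), nextID: 1}
	for _, r := range existing {
		in.ids[r.DID] = r.ID
		if r.ID >= in.nextID {
			in.nextID = r.ID + 1
		}
	}
	return in
}

// idFor returns the ID for did, allocating a new one and queueing a
// dim row when the DID has not been seen before.
func (in *interner) idFor(did string) int64 {
	if id, ok := in.ids[did]; ok {
		return id
	}
	id := in.nextID
	in.nextID++
	in.ids[did] = id
	in.dims = append(in.dims, dimRow{ID: id, DID: did})
	return id
}

// addURL queues a fact row keyed by the DID's integer ID.
func (in *interner) addURL(did, url string) {
	in.facts = append(in.facts, factRow{DIDID: in.idFor(did), URL: url})
}

// flush writes all pending dim rows first, then the fact rows, so a
// fact row never appears before the dim row it references.
func (in *interner) flush(s *sink) {
	for _, d := range in.dims {
		s.log = append(s.log, fmt.Sprintf("dim %d %s", d.ID, d.DID))
	}
	in.dims = in.dims[:0]
	for _, f := range in.facts {
		s.log = append(s.log, fmt.Sprintf("fact %d %s", f.DIDID, f.URL))
	}
	in.facts = in.facts[:0]
}

func main() {
	in := newInterner([]dimRow{{ID: 1, DID: "did:plc:alice"}})
	in.addURL("did:plc:bob", "https://example.com/a")
	in.addURL("did:plc:alice", "https://example.com/b")
	s := &sink{}
	in.flush(s)
	for _, line := range s.log {
		fmt.Println(line)
	}
}
```

Because the map is the only place IDs are allocated, no lookup against `dids` is needed per row; the trade-off is that the map must be fully hydrated before the first fact row is accepted.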
This required dropping the existing 5.35 GB DuckDB file, because
DROP TABLE + CHECKPOINT doesn't return space to the OS in 1.5.x. The
cursor in sqlite is preserved, and the harvester replays back to
Jetstream's retention boundary on reconnect.
Also captured today's findings in learnings.md: DuckDB's process-level
exclusive lock (read-only opens fail too), the file-shrink limitation,
the live-machine entrypoint-swap pattern for ad-hoc queries, the
shared-cpu-1x:1024 OOM, and per-day storage growth / pruning strategy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>