Real-time index of opencode sessions

Add tree-aware watch regenesis architecture notes

rektide a1079d61 5eee8013

+282
doc/discovery/regenesis-watchman.md
# Regenesis Watchman: Change-Driven Tree Coherence

> This document revisits watch/change architecture in light of the current staged tree work. It is not a copy of the original watchman plan; it focuses on coherence risks, change routing, and the registries we now operate.

## Problem

The crate now has a stronger staged model (plan -> resolve -> hydrate) and is moving toward mmap-first tree leaves. That improves ergonomics and memory behavior, but it also raises consistency challenges when storage changes while staged work is in progress.

We need a change architecture that guarantees:

1. Structural queries stay coherent with filesystem reality.
2. Tree stages can detect when their inputs are stale.
3. Reference leaves remain valid snapshots while allowing lazy reloads for subsequent requests.
4. Consumers receive structured change signals, not opaque path strings.

## Current State

Current repositories/registries already present in code:

### Filesystem repositories (source of truth)

- Session files: `storage/session/<project>/<session>.json`
- Message files: `storage/message/<session>/<message>.json`
- Part files: `storage/part/<message>/<part>.json`
- Session diffs: `storage/session_diff/<session>.json`

Path modeling is in [`/src/storage/paths.rs`](/src/storage/paths.rs).

### In-memory structural registries

`SessionIndex` in [`/src/index.rs`](/src/index.rs) currently stores:

- `session_metas`
- `message_metas`
- `part_refs`
- `session_ids_by_project`
- `message_ids_by_session`
- `part_ids_by_message`

These are the primary relationship registries where structural changes must flow.
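
To make the fan-out across these maps concrete, here is a minimal sketch of a session removal that touches every forward and reverse map in one call. `ToyIndex`, the `Id` alias, and the simplified value types are assumptions for illustration; only the field names come from the list above, and the real `SessionIndex` types are not shown in this document.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical stand-in: the real ID and metadata types are not shown here.
type Id = String;

#[derive(Default)]
struct ToyIndex {
    session_metas: HashMap<Id, Id>, // session id -> project id (simplified)
    message_metas: HashMap<Id, Id>, // message id -> session id (simplified)
    part_refs: HashMap<Id, Id>,     // part id -> message id (simplified)
    session_ids_by_project: HashMap<Id, BTreeSet<Id>>,
    message_ids_by_session: HashMap<Id, BTreeSet<Id>>,
    part_ids_by_message: HashMap<Id, BTreeSet<Id>>,
}

impl ToyIndex {
    fn insert_session(&mut self, project: &str, session: &str) {
        self.session_metas.insert(session.to_string(), project.to_string());
        self.session_ids_by_project
            .entry(project.to_string())
            .or_default()
            .insert(session.to_string());
    }

    fn insert_message(&mut self, session: &str, message: &str) {
        self.message_metas.insert(message.to_string(), session.to_string());
        self.message_ids_by_session
            .entry(session.to_string())
            .or_default()
            .insert(message.to_string());
    }

    fn insert_part(&mut self, message: &str, part: &str) {
        self.part_refs.insert(part.to_string(), message.to_string());
        self.part_ids_by_message
            .entry(message.to_string())
            .or_default()
            .insert(part.to_string());
    }

    /// Removes a session plus its messages and parts from every map.
    /// Returns the number of entities removed (0 if the session is unknown).
    fn remove_session(&mut self, session: &str) -> usize {
        let project = match self.session_metas.remove(session) {
            Some(project) => project,
            None => return 0,
        };
        let mut removed = 1;

        // Fix the project -> sessions reverse index, dropping empty entries.
        let now_empty = match self.session_ids_by_project.get_mut(&project) {
            Some(siblings) => {
                siblings.remove(session);
                siblings.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.session_ids_by_project.remove(&project);
        }

        // Cascade to messages, then to the parts of each message.
        for message in self.message_ids_by_session.remove(session).unwrap_or_default() {
            self.message_metas.remove(&message);
            removed += 1;
            for part in self.part_ids_by_message.remove(&message).unwrap_or_default() {
                self.part_refs.remove(&part);
                removed += 1;
            }
        }
        removed
    }
}
```

Skipping any one of these steps would leave a dangling reverse-index entry, which is the kind of partial apply the coherence concerns in this document are about.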

### In-memory mapping registry

`MappedFileCache` in [`/src/storage/mmap.rs`](/src/storage/mmap.rs) stores:

- `path -> Arc<MappedFile>`

This is the byte-level registry for payload access and snapshot semantics.

### Flow-stage registries (ephemeral)

Flow decomposition in [`/src/materializer.rs`](/src/materializer.rs) adds staged containers:

- `SessionFlowScope`
- `MessageFlowScope`
- `SessionFlowResult`

These are per-request repositories and can become stale if changes happen mid-flow.

## Key Concerns Introduced by Watch-Driven Updates

### 1) Multi-registry coherence

A single file change can affect multiple registries (for example, deleting a session affects session map, reverse indexes, message maps, part refs, mmap cache entries).

Concern:

- Partial apply can leave index and cache inconsistent.

### 2) Stage staleness during plan -> resolve -> hydrate

A flow plan can be generated at generation `g`, but resolve/hydrate may run after `g+1` changes arrive.

Concern:

- planned IDs may refer to deleted or replaced entities.

### 3) Path-key vs entity-key drift

Watchman emits paths; tree APIs use entity IDs and relationships.

Concern:

- path-only invalidation is insufficient for tree-level coherence and CDC semantics.

### 4) `is_fresh_instance` and continuity break

When Watchman loses continuity, incremental guarantees are gone.

Concern:

- all registries need synchronized recovery policy, not ad-hoc partial clears.

### 5) Backpressure and event loss behavior

Burst writes can overrun in-process queues.

Concern:

- if drops occur, consumers need explicit resync signaling and cursor semantics.
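
One way to make drops explicit is sketched below, using a bounded `std::sync::mpsc::sync_channel`. When the queue is full the producer drops the change, remembers that it did so, and sends a resync marker carrying the last delivered cursor before resuming normal delivery. `FeedItem`, `BoundedFeed`, and `bounded_feed` are hypothetical names for illustration, not types from the crate.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

// Hypothetical event shape: a change with a cursor, or a gap marker.
#[derive(Debug, PartialEq)]
enum FeedItem {
    Change { cursor: u64, path: String },
    ResyncRequired { last_delivered_cursor: u64 },
}

struct BoundedFeed {
    tx: SyncSender<FeedItem>,
    next_cursor: u64,    // cursor assigned to the next published change
    last_delivered: u64, // highest cursor that actually entered the queue
    overflowed: bool,    // true while a gap has not yet been announced
}

fn bounded_feed(capacity: usize) -> (BoundedFeed, Receiver<FeedItem>) {
    let (tx, rx) = sync_channel(capacity);
    let feed = BoundedFeed { tx, next_cursor: 1, last_delivered: 0, overflowed: false };
    (feed, rx)
}

impl BoundedFeed {
    /// Tries to enqueue a change without blocking.
    /// Returns false if the change was dropped.
    fn publish(&mut self, path: &str) -> bool {
        let cursor = self.next_cursor;
        self.next_cursor += 1;

        if self.overflowed {
            // Announce the gap before delivering anything newer.
            let marker = FeedItem::ResyncRequired {
                last_delivered_cursor: self.last_delivered,
            };
            if self.tx.try_send(marker).is_err() {
                return false; // still no room: this change is dropped too
            }
            self.overflowed = false;
        }

        let item = FeedItem::Change { cursor, path: path.to_string() };
        match self.tx.try_send(item) {
            Ok(()) => {
                self.last_delivered = cursor;
                true
            }
            Err(TrySendError::Full(_)) => {
                self.overflowed = true;
                false
            }
            Err(TrySendError::Disconnected(_)) => false,
        }
    }
}
```

A consumer that sees `ResyncRequired` knows every cursor after `last_delivered_cursor` and before the next `Change` was lost, so it can trigger a rebuild instead of silently diverging.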

### 6) Mmap lifetime versus "latest" view

`Arc<MappedFile>` intentionally provides snapshot behavior for in-flight users.

Concern:

- without generation metadata, callers cannot know whether a given leaf is latest or stale.

### 7) Lock contention under hot write streams

Naive write-heavy index updates can block read paths.

Concern:

- staged tree APIs lose responsiveness under high change rates.

## Registries/Repositories Where Changes Should Flow

For this architecture, change propagation should explicitly target these repositories in order:

1. **Change ingest repository**
   - raw file changes from watch backend
   - canonicalized `FileChange` records

2. **Classification repository**
   - `FileChange -> EntityKey + ChangeOp`
   - path and identity joined into one event

3. **Structural registry repository** (SessionIndex maps)
   - upsert/remove metadata and reverse indexes

4. **Mapping registry repository** (MappedFileCache)
   - evict or mark stale path mappings

5. **Generation registry repository**
   - `generation`, `entity_dirty`, `path_dirty`, and per-batch cursor sequence

6. **Flow coherence repository**
   - plan/resolve/hydrate generation stamps and stale checks

7. **CDC/event repository**
   - low-level `CdcEvent`
   - session-level `SessionUpdate`

8. **Observability repository**
   - metrics, traces, lag, drop counters

## Draft Architecture (Tree-Aware)

```mermaid
flowchart TD
    Feed[Watchman/notify/manual feed] --> Ingest[FileChange ingest repository]
    Ingest --> Classify[Entity classifier repository]
    Classify --> ApplyIndex[SessionIndex apply repository]
    Classify --> ApplyCache[MappedFileCache apply repository]
    Classify --> ApplyGen[Generation registry]

    ApplyIndex --> Plan[Flow planner]
    ApplyGen --> Plan
    Plan --> Resolve[Ref-tree resolver]
    ApplyGen --> Resolve
    Resolve --> Hydrate[Optional hydrator]

    Classify --> CDC[CdcEvent repository]
    CDC --> SessionUpdates[SessionUpdate repository]
    CDC --> Metrics[Observability repository]
```

## Key Design Choices

### 1) Single-writer apply loop

Decision:

- One writer task applies all structural/cache/generation updates in ordered batches.

Why:

- Ensures deterministic registry mutation order.
- Avoids interleaving bugs across related maps.

### 2) Generation as flow contract, not just cache invalidation

Decision:

- Every plan/resolve/hydrate stage carries generation bounds.

Why:

- lets staged APIs detect stale plans and either retry or return typed stale errors.

### 3) Entity-first classification

Decision:

- Convert paths into `EntityKey` early and keep both path + key in events.

Why:

- trees and CDC operate on identity keys; cache invalidation still needs paths.

### 4) Explicit resync protocol

Decision:

- on continuity loss or queue overflow, emit `ResyncStarted/ResyncCompleted` and rebuild structural registries in one controlled pass.

Why:

- prevents silent divergence between local registries and filesystem truth.

### 5) Snapshot-friendly reference leaves

Decision:

- reference leaves remain valid for in-flight reads even when newer generations exist.

Why:

- preserves safe snapshot semantics while allowing next reads to observe fresh data.

## Tree-Specific Coherence Rules

1. `SessionFlowScope` includes `planned_generation`.
2. Resolve verifies `current_generation >= planned_generation` and checks per-entity dirtiness.
3. Hydrate verifies leaf freshness by entity/path generation before parse.
4. If stale, return typed stale result or internally retry from re-plan policy.

## Draft Types to Add

```rust
pub struct FlowGenerationGuard {
    pub planned_generation: u64,
    pub resolved_generation: u64,
}

pub enum FlowStaleness {
    Fresh,
    StaleEntity { key: EntityKey, dirty_generation: u64 },
    StalePath { path: PathBuf, dirty_generation: u64 },
}

pub struct ApplyBatchReport {
    pub generation: u64,
    pub applied_events: usize,
    pub structural_updates: usize,
    pub cache_updates: usize,
}
```

## Risks and Mitigations

Risk: Registry fan-out complexity grows quickly.

- Mitigation: strict apply pipeline and shared `ApplyContext` used by all mutation handlers.

Risk: Over-invalidation reduces cache efficiency.

- Mitigation: track both entity and path scopes; invalidate minimally.

Risk: Recovery storms on repeated backend disruptions.

- Mitigation: coalesce resync triggers and debounce full rebuilds.

Risk: Feature incompatibility in watch backend dependencies.

- Mitigation: keep `ChangeFeed` abstraction and test manual/notify feeds as parity backstops.
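
The last mitigation could look roughly like the sketch below: a small trait that every backend implements, plus a manual feed that tests can script. The trait shape, the `next_batch` method, the `FileChange` variants, and `ManualFeed` are all assumptions for illustration; this document names `ChangeFeed` and `FileChange` but does not define them.

```rust
use std::collections::VecDeque;
use std::path::PathBuf;

// Assumed shape of a canonicalized change record.
#[derive(Debug, Clone, PartialEq, Eq)]
enum FileChange {
    Upserted(PathBuf),
    Removed(PathBuf),
}

// Hypothetical backend abstraction: Watchman, notify, and manual feeds
// would each implement this.
trait ChangeFeed {
    /// Returns the next batch of changes, or an empty vector when idle.
    fn next_batch(&mut self) -> Vec<FileChange>;
}

/// A scripted feed for tests: batches come out in the order they were pushed.
#[derive(Default)]
struct ManualFeed {
    pending: VecDeque<Vec<FileChange>>,
}

impl ManualFeed {
    fn push_batch(&mut self, batch: Vec<FileChange>) {
        self.pending.push_back(batch);
    }
}

impl ChangeFeed for ManualFeed {
    fn next_batch(&mut self) -> Vec<FileChange> {
        self.pending.pop_front().unwrap_or_default()
    }
}

/// Collects changes from any backend until it reports an empty batch.
/// Note that an explicitly pushed empty batch also stops the drain.
fn drain(feed: &mut dyn ChangeFeed) -> Vec<FileChange> {
    let mut all = Vec::new();
    loop {
        let batch = feed.next_batch();
        if batch.is_empty() {
            break;
        }
        all.extend(batch);
    }
    all
}
```

Because `drain` only depends on the trait, a parity test can replay the same scripted batches through `ManualFeed` and compare the resulting registry state against what a real backend produced.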

## Acceptance Criteria

1. All registry updates for one change batch are applied atomically in one writer loop.
2. Tree stage outputs carry generation context and can report staleness.
3. Path and entity invalidation are both represented and test-covered.
4. Resync protocol is explicit and observable.
5. CDC streams are emitted from the same canonical apply path used by index/cache updates.

## What This Does Not Require

- It does not require implementing the exact original watchman plan.
- It does not require immediate durable event log persistence.
- It does require registry coherence and tree-stage correctness regardless of backend.
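
As a backend-independent illustration of tree-stage correctness, the sketch below stamps each touched entity with the generation of the batch that changed it, then checks a plan's entities against its `planned_generation`. `GenerationRegistry`, `apply_batch`, `check`, and the `EntityKey` variants are assumptions, and `FlowStaleness` is trimmed to the entity case of the draft type.

```rust
use std::collections::HashMap;

// Hypothetical identity key; the real `EntityKey` is not defined in this document.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum EntityKey {
    Session(String),
    Message(String),
}

// Trimmed version of the draft `FlowStaleness` (path variant omitted).
#[derive(Debug, PartialEq)]
enum FlowStaleness {
    Fresh,
    StaleEntity { key: EntityKey, dirty_generation: u64 },
}

#[derive(Default)]
struct GenerationRegistry {
    generation: u64,
    entity_dirty: HashMap<EntityKey, u64>, // entity -> last generation that touched it
}

impl GenerationRegistry {
    /// Bumps the generation once per batch and stamps every touched entity.
    fn apply_batch(&mut self, touched: &[EntityKey]) -> u64 {
        self.generation += 1;
        for key in touched {
            self.entity_dirty.insert(key.clone(), self.generation);
        }
        self.generation
    }

    /// Reports the first planned entity that changed after the plan was made.
    fn check(&self, planned_generation: u64, planned: &[EntityKey]) -> FlowStaleness {
        for key in planned {
            if let Some(&dirty) = self.entity_dirty.get(key) {
                if dirty > planned_generation {
                    return FlowStaleness::StaleEntity {
                        key: key.clone(),
                        dirty_generation: dirty,
                    };
                }
            }
        }
        FlowStaleness::Fresh
    }
}
```

Checking per entity rather than comparing global generations means a plan stays fresh when unrelated sessions change, which keeps retries rare under busy write streams.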