Add recombination plan for flow and registry type unification

rektide 9e37815e a1079d61

+266
doc/discovery/regenesis-recomb.md
# Regenesis Recomb: Type Unification Across Flow Registries

> This document proposes type recombination and unification work to reduce impedance between flow-stage registries, index/storage registries, and the rebuilt public API.

## Problem

The API has improved significantly with staged flow methods and clearer index boundaries, but we still carry type friction between layers:

- path-oriented storage types
- ID-oriented graph/index types
- flow-stage scope/result types
- watch/change event types (planned)

This friction shows up as repeated conversions, duplicated key shapes, and uneven error/report surfaces.

## Current State

Key registries and stage outputs in the codebase:

- Structural index registries in [`/src/index.rs`](/src/index.rs)
  - `session_metas`, `message_metas`, `part_refs`
  - reverse indexes by project/session/message
- Mapping registry in [`/src/storage/mmap.rs`](/src/storage/mmap.rs)
  - `path -> Arc<MappedFile>`
- Flow-stage registries in [`/src/materializer.rs`](/src/materializer.rs)
  - `SessionFlowScope`, `MessageFlowScope`, `SessionFlowResult`
- Path resolution + typed reads in [`/src/storage/paths.rs`](/src/storage/paths.rs), [`/src/storage/reader.rs`](/src/storage/reader.rs)

What is good:

1. Staged flow now exists and is configurable with Bon.
2. Index graph uses typed IDs, not raw strings.
3. Mapping cache is explicit and shared.

Where impedance remains:

1. Key shapes vary by subsystem (path tuples vs ad-hoc struct fields).
2. Stage outputs bundle hydrated and structural concerns inconsistently.
3. Error surfaces are mostly operation-centric, not stage/key-centric.
4. Planned watch/event model is not yet fully integrated with flow result types.

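
To make the first impedance point concrete, here is a minimal, hypothetical sketch of one session identity appearing in two shapes. It is not taken from the codebase: `SessionPathTuple`, `SessionScopeFields`, and both conversion functions are invented names for illustration, and the real types in `/src/storage/paths.rs` and `/src/materializer.rs` may look different.

```rust
// Hypothetical illustration only: none of these names come from the codebase.

/// Storage-side shape: identity expressed as a tuple of path segments,
/// in the order (project_id, session_id).
pub type SessionPathTuple = (String, String);

/// Flow-side shape: the same identity carried as ad-hoc struct fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SessionScopeFields {
    pub project_id: String,
    pub session_id: String,
}

/// Each layer boundary needs a hand-written conversion like this one...
pub fn scope_from_tuple(t: &SessionPathTuple) -> SessionScopeFields {
    SessionScopeFields {
        project_id: t.0.clone(),
        session_id: t.1.clone(),
    }
}

/// ...and its inverse, which has to be kept in sync by hand.
pub fn tuple_from_scope(s: &SessionScopeFields) -> SessionPathTuple {
    (s.project_id.clone(), s.session_id.clone())
}
```

Nothing in the type system ties the two shapes together, so a swapped tuple position or a renamed field only surfaces at runtime.
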

## Unification Goal

Adopt one canonical domain key model and one canonical staged result model, then derive all registries and APIs from those shared types.

```mermaid
flowchart LR
    Keys[Canonical keys] --> Index[Index registries]
    Keys --> Paths[Path registry adapters]
    Keys --> Flow[Plan/resolve/hydrate registries]
    Keys --> CDC[Change events]

    Stages[Canonical stage result envelope] --> Flow
    Stages --> CDC
    Stages --> Reports[Diagnostics/reports]
```

## Draft Type Recombination

### 1) Canonical keys package

Introduce one key module (for example `src/domain/key.rs`) that defines:

```rust
pub struct SessionKey {
    pub project_id: String,
    pub session_id: SessionId,
}

pub struct MessageKey {
    pub session_id: SessionId,
    pub message_id: MessageId,
}

pub struct PartKey {
    pub message_id: MessageId,
    pub part_id: PartId,
}

pub enum EntityKey {
    Session(SessionKey),
    Message(MessageKey),
    Part(PartKey),
    SessionDiff { session_id: SessionId },
}
```

Why:

- eliminates ad-hoc tuple/field copies
- gives index, flow, and CDC the same identity language

### 2) Canonical path binding type

Bind identity and path together once:

```rust
pub struct EntityPath {
    pub key: EntityKey,
    pub path: std::path::PathBuf,
}
```

Why:

- path-only and key-only systems both need this bridge
- avoids repeated path classification logic per subsystem

### 3) Canonical stage envelope

Define a generic stage output envelope:

```rust
pub struct StageResult<T, R> {
    pub value: T,
    pub report: R,
    pub generation: Option<u64>,
}
```

Apply to:

- planning (`StageResult<SessionPlan, PlanReport>`)
- resolve (`StageResult<SessionRefTree, ResolveReport>`)
- hydrate (`StageResult<SessionHydratedTree, HydrateReport>`)

Why:

- consistent return contracts
- diagnostics always available without a side channel

### 4) Canonical leaf reference type

Align tree + cache with one leaf shape:

```rust
pub struct EntityRef {
    pub key: EntityKey,
    pub span: MappedSpan,
}
```

Why:

- one reference primitive across resolver and hydrator
- natural join with CDC keys

### 5) Canonical report item shape

Unify error/skip diagnostics:

```rust
pub struct StageIssue {
    pub stage: StageName,
    pub key: Option<EntityKey>,
    pub path: Option<std::path::PathBuf>,
    pub kind: IssueKind,
    pub message: String,
}
```

Why:

- avoids bespoke report enums per stage where not needed
- improves observability and test assertions

## API Improvements from Unification

### A) Materializer stage API convergence

Move from mixed return shapes to a consistent staged API family:

- `plan_session_tree(options) -> StageResult<SessionPlan, PlanReport>`
- `resolve_session_tree(plan) -> StageResult<SessionRefTree, ResolveReport>`
- `hydrate_session_tree(ref_tree) -> StageResult<SessionHydratedTree, HydrateReport>`

### B) Convenience wrappers become thin adapters

Existing convenience methods remain but delegate to canonical stages:

- `load_session_tree(id)`
- `load_message_with_parts(id)`

### C) Error + report policy toggles

Use typed policy enum instead of ad-hoc behavior:

```rust
pub enum ConsistencyPolicy {
    Strict,
    Tolerant,
}
```

This policy applies equally to index build, flow resolve, and hydrate.
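
One way a stage might honor this policy is sketched below, under several assumptions. `StageResult`, `StageIssue`, and `ConsistencyPolicy` follow the draft shapes in this document, while `resolve_keys`, the tuple-struct `EntityKey` stand-in, the single-variant `StageName` and `IssueKind` enums, and the "file found" flag are hypothetical simplifications so the example compiles on its own. The draft does not say whether strict mode should return an error or an envelope with a failing report; returning `Err` here is an assumption.

```rust
// Sketch only: simplified stand-ins so the example is self-contained.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EntityKey(pub String); // stand-in for the draft `EntityKey` enum

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StageName { Resolve }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IssueKind { MissingFile }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StageIssue {
    pub stage: StageName,
    pub key: Option<EntityKey>,
    pub path: Option<std::path::PathBuf>,
    pub kind: IssueKind,
    pub message: String,
}

#[derive(Debug)]
pub struct StageResult<T, R> {
    pub value: T,
    pub report: R,
    pub generation: Option<u64>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsistencyPolicy { Strict, Tolerant }

/// Hypothetical resolve step. Each input pairs a key with a flag saying
/// whether its backing file was found.
pub fn resolve_keys(
    policy: ConsistencyPolicy,
    inputs: &[(EntityKey, bool)],
) -> Result<StageResult<Vec<EntityKey>, Vec<StageIssue>>, StageIssue> {
    let mut resolved = Vec::new();
    let mut issues = Vec::new();
    for (key, found) in inputs {
        if *found {
            resolved.push(key.clone());
            continue;
        }
        let issue = StageIssue {
            stage: StageName::Resolve,
            key: Some(key.clone()),
            path: None,
            kind: IssueKind::MissingFile,
            message: format!("no backing file for {}", key.0),
        };
        match policy {
            // Strict: the first inconsistency aborts the stage.
            ConsistencyPolicy::Strict => return Err(issue),
            // Tolerant: record the issue in the report and keep going.
            ConsistencyPolicy::Tolerant => issues.push(issue),
        }
    }
    Ok(StageResult { value: resolved, report: issues, generation: None })
}
```

Because both modes produce the same `StageIssue` shape, a test can assert on `stage`, `key`, and `kind` regardless of which policy was active.
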

## Registry-by-Registry Unification Map

### Structural index registry

Unify map keys:

- `HashMap<SessionKey, SessionMeta>` for session-level metadata where project context is required
- keep reverse indexes keyed by canonical keys where possible

### Mapping registry

Keep `path -> Arc<MappedFile>` internally, but surface `EntityRef` for external stage outputs.

### Flow registries

Replace bespoke scope structs with canonical plan graph nodes keyed by `EntityKey` variants.

### CDC/event registry

Emit canonical keys in all events; avoid introducing parallel key enums.

## Key Design Choices

### 1) Canonical key types live in one domain module

Decision:

- no duplicate key structs in `index`, `materializer`, and `watch` modules.

### 2) Stage outputs always carry reports

Decision:

- make diagnostics first-class in method signatures.

### 3) Reference-first trees remain the default flow product

Decision:

- hydration is layered and explicit.

### 4) Policy is explicit and reusable

Decision:

- strict/tolerant behavior is configured once and honored across stages.

## Implementation Plan

1. Add canonical key module and migrate internal signatures.
2. Add shared stage/report envelope types.
3. Refactor flow stage types to use canonical keys.
4. Introduce `EntityRef` and align resolve outputs.
5. Rewire convenience API methods to staged wrappers.
6. Update tests to assert report contents and key consistency.

## Acceptance Criteria

1. No duplicate identity key models across index/materializer/watch layers.
2. All stage APIs return consistent envelope + report structures.
3. Flow outputs and CDC events share the same key vocabulary.
4. Convenience methods are wrappers over canonical staged APIs.
5. Tests verify strict/tolerant policy behavior using unified reports.

## Expected Outcomes

- Lower cognitive overhead when traversing code.
- Fewer conversion bugs between path/key/flow/event layers.
- Better API ergonomics for library consumers.
- Cleaner path to full API docs because contracts are uniform.
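
As a closing illustration of acceptance criteria 1 and 3, the sketch below shows an index map and a change event sharing the same key types. It assumes the canonical keys derive `Hash` and `Eq`, which the draft structs do not yet show. `ChangeEvent`, `session_meta_for_event`, the `title` field on `SessionMeta`, and the `String`-backed ID newtypes are hypothetical, and `EntityKey` is trimmed to two variants for brevity.

```rust
use std::collections::HashMap;

// Stand-ins for the typed IDs; assumed here to be simple newtypes.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SessionId(pub String);
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct MessageId(pub String);

// Canonical keys from the draft, with the derives a HashMap key needs.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SessionKey {
    pub project_id: String,
    pub session_id: SessionId,
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct MessageKey {
    pub session_id: SessionId,
    pub message_id: MessageId,
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum EntityKey {
    Session(SessionKey),
    Message(MessageKey),
}

/// Hypothetical metadata record stored in the structural index.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SessionMeta {
    pub title: String,
}

/// Hypothetical change event: it carries a canonical key instead of
/// defining a parallel key enum.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ChangeEvent {
    pub key: EntityKey,
    pub generation: u64,
}

/// Joins an event against the session index with no conversion step.
pub fn session_meta_for_event<'a>(
    index: &'a HashMap<SessionKey, SessionMeta>,
    event: &ChangeEvent,
) -> Option<&'a SessionMeta> {
    match &event.key {
        EntityKey::Session(key) => index.get(key),
        // A MessageKey has no project_id, so it cannot be turned into a
        // SessionKey without a reverse index; this sketch returns None.
        EntityKey::Message(_) => None,
    }
}
```

The `Message` arm suggests one place where the reverse indexes mentioned under "Structural index registry" would still be needed: the draft `MessageKey` alone does not carry enough context to reach a `SessionKey`.
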