Real-time index of opencode sessions

doc: append alternative design sketches appendix to watchman discovery

Five alternatives: eager channel reload, content-hash validation,
two-tier hot/cold index, coarse epoch invalidation, Arc eviction.
Each with code sketch, pros/cons, and a comparison table.

rektide fa116161 279d48eb

+174
doc/discovery/watchman.md
- [Watchman Subscribe API](https://facebook.github.io/watchman/docs/cmd/subscribe)
- [Watchman Expression Syntax](https://facebook.github.io/watchman/docs/expr/allof)
- [Clockspec Documentation](https://facebook.github.io/watchman/docs/clockspec)

---

## Appendix: Alternative Design Sketches

### A. Channel-Driven Eager Reload

Instead of dirty-tracking, the watcher task classifies events into `SessionEvent` variants and sends them over a `tokio::sync::mpsc` channel. A dedicated consumer task eagerly reloads affected objects.

```rust
// Watcher produces typed events
let (tx, mut rx) = tokio::sync::mpsc::channel::<Vec<SessionEvent>>(64);

// Consumer drains and reloads
tokio::spawn(async move {
    while let Some(batch) = rx.recv().await {
        let mut index = index.write();
        for event in batch {
            match event {
                SessionEvent::SessionChanged { project_id, session_id, .. } => {
                    // The spawned task returns `()`, so `?` is unavailable here;
                    // a session that fails to load is skipped.
                    if let Ok(info) = reader.read_session(&project_id, &session_id) {
                        index.upsert_session(session_id, info);
                    }
                }
                SessionEvent::PartDeleted { message_id, part_id } => {
                    index.remove_part(&part_id);
                    cache.invalidate(&paths.part_file(&message_id, &part_id));
                }
                // ...
            }
        }
    }
});
```

**Pros:** Index is always current; no stale reads possible. Simple mental model: events go in, state comes out.

**Cons:** Reloads files that may never be accessed. Under burst writes (active session producing many parts), the consumer falls behind and holds a write lock during reload I/O. Requires `RwLock<SessionIndex>`, so every read now contends with the reload task.

---

### B. Content-Hash Validation

Use watchman's `content.sha1hex` field to detect whether a file *actually* changed content (not just mtime).
Cache entries store their content hash; on access, compare against the dirty tracker's recorded hash.

```rust
query_result_type! {
    struct HashedFile {
        name: NameField,
        exists: ExistsField,
        content_sha1: ContentSha1HexField,
    }
}

struct CacheEntry {
    mmap: Arc<MappedFile>,
    content_hash: String,
}

impl TrackedCache {
    fn get(&self, path: &Path) -> Result<Arc<MappedFile>> {
        if let Some(entry) = self.files.read().get(path) {
            if let Some(new_hash) = self.tracker.dirty_hash(path) {
                if entry.content_hash == new_hash {
                    // mtime changed but content didn't: still clean
                    return Ok(Arc::clone(&entry.mmap));
                }
            } else {
                return Ok(Arc::clone(&entry.mmap));
            }
        }
        self.reload(path)
    }
}
```

**Pros:** Avoids redundant reloads when only metadata changed (e.g., `touch` without content change). Especially useful if opencode does atomic-rename writes that bump mtime even when content is identical.

**Cons:** Requesting `content.sha1hex` makes watchman compute the hash server-side on every change, which adds latency to event delivery. Overkill for JSON files that almost always have different content when mtime changes.

---

### C. Two-Tier Index: Hot Metadata + Cold Content

Split the index into two layers. The **hot tier** holds lightweight metadata (ids, timestamps, counts) and is always eagerly rebuilt on change. The **cold tier** holds deserialized content (`SessionInfo`, `Message`, `Part`) and is lazily loaded.

```rust
/// Always current: rebuilt eagerly from directory listings
pub struct HotIndex {
    sessions: HashMap<SessionId, HotSessionMeta>,  // id, project, mtime only
    messages: HashMap<MessageId, HotMessageMeta>,  // id, session_id, role only
    parts: HashMap<PartId, HotPartMeta>,           // id, message_id, path only
}

/// Loaded on demand: may be stale, checked via mtime
pub struct ColdCache {
    sessions: HashMap<SessionId, (i64, SessionInfo)>,  // (mtime, data)
    messages: HashMap<MessageId, (i64, Message)>,
    parts: HashMap<PartId, (i64, Part)>,
}

impl ColdCache {
    fn get_session(&self, id: &SessionId, hot: &HotIndex) -> Option<&SessionInfo> {
        let hot_meta = hot.sessions.get(id)?;
        let (cached_mtime, data) = self.sessions.get(id)?;
        if *cached_mtime >= hot_meta.mtime {
            Some(data)
        } else {
            None // caller reloads
        }
    }
}
```

**Pros:** Structural queries ("how many sessions?", "list messages for session X") are always instant and never stale. Only content access triggers I/O. Natural separation of concerns.

**Cons:** Hot tier rebuild still requires directory listing I/O on every change batch. Two data structures to maintain. Mtime comparison is racy: a file could be written twice within the same second.

---

### D. Epoch-Based Invalidation (Coarse-Grained)

Instead of per-file tracking, maintain a single epoch per entity *directory*. When any file in `session/<projectID>/` changes, bump that directory's epoch. All session metadata for that project is considered stale.

```rust
pub struct EpochTracker {
    session_epochs: RwLock<HashMap<String, u64>>,  // project_id → epoch
    message_epochs: RwLock<HashMap<SessionId, u64>>,
    part_epochs: RwLock<HashMap<MessageId, u64>>,
}
```

**Pros:** Extremely simple: one counter per directory, no per-file bookkeeping. Low memory overhead. Trivial to implement.

**Cons:** Over-invalidates. If one session in a project changes, every session in that project is marked stale. Acceptable if projects typically have few active sessions, problematic if hundreds of sessions share a project.

---

### E. `Arc::strong_count` Eviction

Rather than tracking dirtiness, just evict changed files from the cache. Consumers holding an `Arc<MappedFile>` continue using their snapshot; new accesses get a fresh mmap. No generation tracking needed; the `Arc` reference count naturally manages lifetime.

```rust
impl MappedFileCache {
    /// Called by watcher: remove the cache entry.
    /// Existing Arc holders keep their snapshot.
    /// Next access triggers a fresh mmap.
    fn evict(&self, path: &Path) {
        self.files.write().remove(path);
    }
}

// Consumer code: snapshot semantics
let mmap = cache.get(&path)?; // Arc<MappedFile>
// ... use mmap for duration of request ...
// if file was evicted mid-use, this Arc still holds the old mapping
// next cache.get() will produce the new version
```

**Pros:** Simplest possible implementation: just cache eviction, no new types. Naturally provides snapshot isolation for in-flight reads. No generation counters, no dirty maps.

**Cons:** No way to know *whether* something changed without trying to access it. No way to batch-check "is my view current?".
Loses the stale-data-on-error fallback: if the file is mid-write when we remap, we get a partial read (though atomic rename avoids this).

---

### Comparison

| Approach | Complexity | Read Overhead | Write Overhead | Stale Window |
|---|---|---|---|---|
| **Generation (chosen)** | Medium | 1 atomic load | Hash insert + atomic tick | Until next access |
| **A. Eager channel** | Medium | None (always fresh) | Full re-parse per change | None |
| **B. Content hash** | Medium | Hash compare | Watchman hashes server-side | Until next access |
| **C. Two-tier** | High | Hot=none, Cold=mtime check | Dir listing + hot rebuild | Hot=none, Cold=until access |
| **D. Epoch** | Low | 1 atomic load | 1 atomic tick per dir | Until next access (over-invalidates) |
| **E. Arc eviction** | Low | Cache miss on evicted | Hash remove | Until next access |
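
The coarse-grained scheme in section D only shows the tracker's fields, so the bump-and-compare cycle may be easier to judge with a small working example. The following is a minimal sketch under stated assumptions: it uses one map keyed by a directory string and `std::sync::RwLock`, and the names `DirEpochs`, `Stamped`, `bump`, `current`, and `is_fresh` are hypothetical, not part of this commit.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical single-map variant of the `EpochTracker` idea:
/// one monotonically increasing counter per watched directory.
#[derive(Default)]
pub struct DirEpochs {
    epochs: RwLock<HashMap<String, u64>>, // directory key -> epoch
}

impl DirEpochs {
    /// Called by the watcher when any file under `dir` changes.
    /// Returns the new epoch for that directory.
    pub fn bump(&self, dir: &str) -> u64 {
        let mut map = self.epochs.write().unwrap();
        let epoch = map.entry(dir.to_string()).or_insert(0);
        *epoch += 1;
        *epoch
    }

    /// Current epoch for `dir`; 0 means no change has been observed yet.
    pub fn current(&self, dir: &str) -> u64 {
        self.epochs.read().unwrap().get(dir).copied().unwrap_or(0)
    }
}

/// A cached value tagged with the directory epoch at which it was loaded.
pub struct Stamped<T> {
    pub epoch: u64,
    pub value: T,
}

impl<T> Stamped<T> {
    /// Fresh only if the directory epoch has not moved since load time.
    pub fn is_fresh(&self, tracker: &DirEpochs, dir: &str) -> bool {
        self.epoch == tracker.current(dir)
    }
}
```

A reader would stamp each loaded value with `tracker.current(dir)` and call `is_fresh` before reuse. A single `bump` on a directory marks every value stamped under that directory as stale, which is the over-invalidation listed under Cons, while values stamped under other directories are unaffected.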