personal memory agent
docs: update OBSERVE.md for standalone observer architecture

Fix stale 3-mode state machine (now 2-mode since tmux extraction).
Document the observer architecture: standalone agents uploading via
HTTP ingest API. Add sol remote CLI reference for managing observers.
Note solstone-tmux and solstone-macos as standalone packages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
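
The upload path named in the commit message (observers posting segments to `/app/remote/ingest/<key>` as HTTP multipart) could look roughly like the sketch below. Only the endpoint path comes from the docs. The base URL, the form field name `file`, and the helper `build_ingest_request` are assumptions for illustration, not the actual `remote_client.py` API.

```python
import uuid
from urllib.request import Request


def build_ingest_request(base_url: str, key: str, filename: str, payload: bytes) -> Request:
    """Build (but do not send) a multipart POST for one captured file.

    Hypothetical helper: the field name "file" and the single-part layout
    are assumptions; only the /app/remote/ingest/<key> path is documented.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        payload,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return Request(
        url=f"{base_url.rstrip('/')}/app/remote/ingest/{key}",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the upload. Building the request separately keeps the sketch testable without a running server.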

+53 -27
docs/OBSERVE.md
···

  Multimodal capture and AI-powered analysis of desktop activity.

- On macOS, capture is handled by the `solstone-macos` native companion app rather than a built-in Python observer.
+ ## Observer Architecture
+
+ Observers are independent capture agents that upload segments to solstone via the HTTP ingest API (`/app/remote/ingest/<key>`). Each observer runs as its own process with its own lifecycle — solstone core is the journal + processing engine.
+
+ | Observer | What it captures | Repo | Runs as |
+ |----------|-----------------|------|---------|
+ | **solstone-macos** | Screen + audio on macOS | `solstone-macos` | Native menu bar app |
+ | **solstone-tmux** | Tmux terminal sessions | `solstone-tmux` | systemd user service / standalone |
+ | **Linux observer** | Screen + audio on Linux | Built-in (`observe/linux/`) | Supervisor-managed process |
+
+ ### Managing observers
+
+ ```bash
+ # List all registered observers
+ sol remote list
+
+ # Register a new observer
+ sol remote create <name>
+
+ # Check observer status
+ sol remote status <name>
+
+ # Rename an observer
+ sol remote rename <old> <new>
+
+ # Revoke an observer's key
+ sol remote revoke <name>
+ ```

  ## Commands

···
  ## Architecture

  ```
- sol observer (platform-detected capture)
+ Observers (standalone or built-in)
+ ↓ HTTP multipart upload
+ Remote Ingest API (/app/remote/ingest/<key>)

- Raw media files (*.flac, *.ogg, *.opus, *.wav, *.webm, tmux_*.jsonl)
+ Raw media files (*.flac, *.webm, tmux_*.jsonl)

  sol sense (coordination)
  ├── sol transcribe → audio.jsonl
  └── sol describe → screen.jsonl
  ```

- ## Observer State Machine
+ ## Linux Observer State Machine

- The Linux observer operates in three modes based on activity:
+ The Linux observer operates in two modes based on desktop activity:

  ```
-         SCREENCAST
-        ↗          ↘
-   (screen)    (screen idle)
-      ↑              ↓
-    IDLE ←----→ TMUX
-       (tmux active)
+ SCREENCAST ←→ IDLE
  ```
-
- **Mode priority**: Screen activity always wins over tmux (owner is physically present).

  | Mode | Trigger | Captures |
  |------|---------|----------|
  | SCREENCAST | Screen active (not idle/locked/power-save) | Video + Audio |
- | TMUX | Screen idle but tmux has recent client activity | Terminal content + Audio |
- | IDLE | Both screen and tmux inactive | Audio only (if threshold met) |
+ | IDLE | Screen idle, locked, or power-save | Audio only (if threshold met) |

  **Segment boundaries** are triggered by:
- - Transitions to/from SCREENCAST mode (owner returns to or leaves desktop)
+ - Transitions between SCREENCAST and IDLE modes
  - Mute state changes
  - 5-minute window elapsed
-
- TMUX ↔ IDLE transitions do **not** trigger boundaries, allowing tmux segments to run the full 5-minute window like screencast segments.

  ## Key Components

- - **observer.py** - Unified entry point with platform detection
- - **linux/observer.py** - Linux capture using native APIs
- - **linux/screencast.py** - XDG Portal screencast with PipeWire + GStreamer
- - **gnome/activity.py** - GNOME-specific activity detection (idle, lock, power save)
- - **sense.py** - File watcher that dispatches transcription and description jobs
- - **transcribe.py** - Audio transcription with faster-whisper and sentence-level embeddings
- - **describe.py** - Vision analysis with Gemini, category-based prompts
- - **categories/** - Category-specific prompts for screen content (see [SCREEN_CATEGORIES.md](SCREEN_CATEGORIES.md))
+ - **observer.py** — Unified entry point with platform detection
+ - **linux/observer.py** — Linux capture: audio + screencast + activity detection
+ - **linux/screencast.py** — XDG Portal screencast with PipeWire + GStreamer
+ - **gnome/activity.py** — GNOME-specific activity detection (idle, lock, power save)
+ - **remote_client.py** — HTTP upload client for observer → server communication
+ - **sense.py** — File watcher that dispatches transcription and description jobs
+ - **transcribe.py** — Audio transcription with faster-whisper and sentence-level embeddings
+ - **describe.py** — Vision analysis with Gemini, category-based prompts
+ - **categories/** — Category-specific prompts for screen content (see [SCREEN_CATEGORIES.md](SCREEN_CATEGORIES.md))

- Standalone tmux capture is handled by the external `solstone-tmux` package, which runs as its own systemd user service.
+ ## Standalone Observers
+
+ **Tmux capture** is handled by the `solstone-tmux` package, which runs as its own systemd user service. See `solstone-tmux` repo for setup instructions.
+
+ **macOS capture** is handled by the `solstone-macos` native Swift app. See `solstone-macos` repo.
+
+ Both upload segments via the same HTTP ingest API used by the built-in Linux observer.

  ## Output Formats
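
The two-mode state machine and the three segment-boundary triggers documented in this diff can be sketched as follows. This is a minimal illustration built only from the rules in the table and list above. `Mode`, `select_mode`, and `SegmentTracker` are hypothetical names, not the identifiers used in `linux/observer.py`, and the audio threshold check for IDLE is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SCREENCAST = "screencast"
    IDLE = "idle"


SEGMENT_WINDOW_SECONDS = 5 * 60  # the documented 5-minute window


def select_mode(screen_idle: bool, locked: bool, power_save: bool) -> Mode:
    """SCREENCAST only while the screen is active; any idle signal means IDLE."""
    if screen_idle or locked or power_save:
        return Mode.IDLE
    return Mode.SCREENCAST


@dataclass
class SegmentTracker:
    """Tracks the current segment and reports when a boundary is crossed."""

    mode: Mode
    muted: bool
    started_at: float  # seconds, any monotonic clock

    def update(self, mode: Mode, muted: bool, now: float) -> bool:
        """Return True when a new segment should start at time `now`."""
        boundary = (
            mode is not self.mode                                  # SCREENCAST <-> IDLE
            or muted != self.muted                                 # mute state change
            or now - self.started_at >= SEGMENT_WINDOW_SECONDS     # window elapsed
        )
        self.mode, self.muted = mode, muted
        if boundary:
            self.started_at = now
        return boundary
```

With the TMUX mode gone, every mode transition is a boundary, so the tracker needs no special case for transitions that keep a segment open.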