# Observe Module

Multimodal capture and AI-powered analysis of desktop activity.

## Observer Architecture

Observers are independent capture agents that upload segments to solstone via the HTTP ingest API (`/app/observer/ingest/<key>`). Each observer runs as its own process with its own lifecycle; solstone core is the journal + processing engine.

| Observer | What it captures | Repo | Runs as |
|----------|-----------------|------|---------|
| **solstone-linux** | Screen + audio on Linux | `solstone-linux` | systemd user service / standalone |
| **solstone-macos** | Screen + audio on macOS | `solstone-macos` | Native menu bar app |
| **solstone-tmux** | Tmux terminal sessions | `solstone-tmux` | systemd user service / standalone |

### Managing observers

```bash
# List all registered observers
sol observer list

# Register a new observer
sol observer create <name>

# Install and pair an observer
sol observer install <name>

# Check observer status
sol observer status <name>

# Rename an observer
sol observer rename <old> <new>

# Revoke an observer's key
sol observer revoke <name>
```

## Commands

| Command | Purpose |
|---------|---------|
| `sol observer` | Screen and audio capture (auto-detects platform) |
| `sol observe-linux` | Screen and audio capture on Linux (direct) |
| `sol transcribe` | Audio transcription with faster-whisper |
| `sol describe` | Visual analysis of screen recordings |
| `sol grab` | Walk available screen frames and optionally write frame images |
| `sol sense` | Unified observation coordination |

## Architecture

```
Observers (standalone or built-in)
       ↓ HTTP multipart upload
Observer Ingest API (/app/observer/ingest/<key>)

Raw media files (*.flac, *.webm, tmux_*.jsonl)

sol sense (coordination)
  ├── sol transcribe → audio.jsonl
  └── sol describe → screen.jsonl
```

## Linux Observer State Machine

The Linux observer operates in two modes based on desktop activity:

```
SCREENCAST ←→ IDLE
```

| Mode | Trigger | Captures |
|------|---------|----------|
| SCREENCAST | Screen active (not idle/locked/power-save) | Video + Audio |
| IDLE | Screen idle, locked, or power-save | Audio only (if threshold met) |

**Segment boundaries** are triggered by:
- Transitions between SCREENCAST and IDLE modes
- Mute state changes
- 5-minute window elapsed

## Key Components

- **observer.py**: Unified entry point with platform detection
- **linux/observer.py**: Linux capture: audio + screencast + activity detection
- **linux/screencast.py**: XDG Portal screencast with PipeWire + GStreamer
- **gnome/activity.py**: GNOME-specific activity detection (idle, lock, power save)
- **observer_client.py**: HTTP upload client for observer → server communication
- **sense.py**: File watcher that dispatches transcription and description jobs
- **transcribe.py**: Audio transcription with faster-whisper and sentence-level embeddings
- **describe.py**: Vision analysis with Gemini, category-based prompts
- **categories/**: Category-specific prompts for screen content (see [SCREEN_CATEGORIES.md](SCREEN_CATEGORIES.md))

## Standalone Observers

**Tmux capture** is handled by the `solstone-tmux` package, which runs as its own systemd user service. See `solstone-tmux` repo for setup instructions.

**macOS capture** is handled by the `solstone-macos` native Swift app. See `solstone-macos` repo.

Both upload segments via the same HTTP ingest API used by the built-in Linux observer.

## Output Formats

See [captures.md](../talent/journal/references/captures.md) for detailed extract schemas:
- Audio transcripts: `audio.jsonl` with timestamps (speaker detection not included)
- Screen analysis: `screen.jsonl` with frame-by-frame categorization

## Configuration

Requires the journal directory at project root. API keys for transcription/vision services configured in `.env`.
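
As a rough illustration of the Linux observer state machine described earlier, the sketch below shows one way mode selection and segment boundaries could be decided. It is not taken from the solstone code: the names `Mode`, `Snapshot`, `pick_mode`, `needs_new_segment`, and `SEGMENT_WINDOW_SECONDS` are hypothetical, and the audio threshold for IDLE capture is left out. Only the two modes and the three boundary triggers come from this README.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical constant name; the 5-minute value is the window stated above.
SEGMENT_WINDOW_SECONDS = 5 * 60


class Mode(Enum):
    SCREENCAST = "screencast"  # video + audio
    IDLE = "idle"              # audio only (if threshold met)


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical view of the observer's state at one instant."""
    mode: Mode
    muted: bool
    timestamp: float  # seconds on any monotonic clock


def pick_mode(idle: bool, locked: bool, power_save: bool) -> Mode:
    """SCREENCAST only while the screen is active; otherwise IDLE."""
    if idle or locked or power_save:
        return Mode.IDLE
    return Mode.SCREENCAST


def needs_new_segment(segment_start: Snapshot, now: Snapshot) -> bool:
    """Return True when any of the three documented triggers has fired."""
    mode_changed = now.mode != segment_start.mode
    mute_changed = now.muted != segment_start.muted
    window_elapsed = (
        now.timestamp - segment_start.timestamp >= SEGMENT_WINDOW_SECONDS
    )
    return mode_changed or mute_changed or window_elapsed
```

In this sketch a capture loop would take a `Snapshot` when a segment opens, compare it with the current state on each tick, and close the segment and open a new one whenever `needs_new_segment` returns `True`.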