# Observe Module

Multimodal capture and AI-powered analysis of desktop activity.

## Observer Architecture

Observers are independent capture agents that upload segments to solstone via the HTTP ingest API (`/app/observer/ingest/<key>`). Each observer runs as its own process with its own lifecycle — solstone core is the journal + processing engine.

| Observer | What it captures | Repo | Runs as |
|----------|-----------------|------|---------|
| **solstone-linux** | Screen + audio on Linux | `solstone-linux` | systemd user service / standalone |
| **solstone-macos** | Screen + audio on macOS | `solstone-macos` | Native menu bar app |
| **solstone-tmux** | Tmux terminal sessions | `solstone-tmux` | systemd user service / standalone |

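As a rough illustration of the upload path described above, the following sketch builds a multipart POST request against the ingest endpoint using only the standard library. The base URL, the form field name `file`, and the helper names are assumptions made for this example; the real observers and `observer_client.py` may structure the request differently.

```python
import urllib.request
import uuid


def ingest_url(base_url: str, key: str) -> str:
    """Build the ingest endpoint URL for an observer key."""
    return f"{base_url.rstrip('/')}/app/observer/ingest/{key}"


def encode_multipart(field: str, filename: str, payload: bytes,
                     content_type: str = "application/octet-stream"):
    """Encode a single file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def build_upload_request(base_url: str, key: str, filename: str,
                         payload: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying one segment file."""
    # "file" is an assumed field name, not confirmed by the ingest API.
    body, ctype = encode_multipart("file", filename, payload)
    return urllib.request.Request(
        ingest_url(base_url, key),
        data=body,
        headers={"Content-Type": ctype},
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without a running server.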
### Managing observers

```bash
# List all registered observers
sol observer list

# Register a new observer
sol observer create <name>

# Install and pair an observer
sol observer install <name>

# Check observer status
sol observer status <name>

# Rename an observer
sol observer rename <old> <new>

# Revoke an observer's key
sol observer revoke <name>
```

## Commands

| Command | Purpose |
|---------|---------|
| `sol observer` | Screen and audio capture (auto-detects platform) |
| `sol observe-linux` | Screen and audio capture on Linux (direct) |
| `sol transcribe` | Audio transcription with faster-whisper |
| `sol describe` | Visual analysis of screen recordings |
| `sol grab` | Walk available screen frames and optionally write frame images |
| `sol sense` | Unified observation coordination |

## Architecture

```
Observers (standalone or built-in)
       ↓ HTTP multipart upload
Observer Ingest API (/app/observer/ingest/<key>)
       ↓
Raw media files (*.flac, *.webm, tmux_*.jsonl)
       ↓
sol sense (coordination)
   ├── sol transcribe → audio.jsonl
   └── sol describe → screen.jsonl
```

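The coordination step in the diagram can be pictured as a lookup from raw file type to processing command. The sketch below assumes that `.flac` files go to `sol transcribe` and `.webm` files go to `sol describe`; that mapping, the `HANDLERS` table, and the `plan_job` helper are illustrative guesses and are not taken from `sense.py`.

```python
from __future__ import annotations

from pathlib import Path

# Assumed mapping: raw media suffix -> (command, output file it produces).
HANDLERS = {
    ".flac": ("sol transcribe", "audio.jsonl"),
    ".webm": ("sol describe", "screen.jsonl"),
}


def plan_job(path: str) -> tuple[str, str] | None:
    """Return (command, output file) for a raw media file, or None.

    Files with other suffixes (for example tmux_*.jsonl captures) get no
    job in this sketch.
    """
    return HANDLERS.get(Path(path).suffix.lower())
```

For example, `plan_job("segment.flac")` yields the transcription command and `audio.jsonl`, while `plan_job("tmux_0.jsonl")` yields `None`.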
## Linux Observer State Machine

The Linux observer operates in two modes based on desktop activity:

```
SCREENCAST ←→ IDLE
```

| Mode | Trigger | Captures |
|------|---------|----------|
| SCREENCAST | Screen active (not idle/locked/power-save) | Video + Audio |
| IDLE | Screen idle, locked, or power-save | Audio only (if threshold met) |

**Segment boundaries** are triggered by:
- Transitions between SCREENCAST and IDLE modes
- Mute state changes
- 5-minute window elapsed

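The mode table and boundary rules above can be expressed as two small pure functions. This is a minimal sketch under stated assumptions: the `Activity` fields, the function names, and the use of elapsed seconds are invented for illustration and do not mirror `linux/observer.py`.

```python
from dataclasses import dataclass
from enum import Enum

SEGMENT_WINDOW_SECONDS = 5 * 60  # the 5-minute window from the rules above


class Mode(Enum):
    SCREENCAST = "screencast"
    IDLE = "idle"


@dataclass(frozen=True)
class Activity:
    """Snapshot of desktop state (hypothetical field names)."""
    idle: bool = False
    locked: bool = False
    power_save: bool = False
    muted: bool = False


def mode_for(activity: Activity) -> Mode:
    """IDLE when the screen is idle, locked, or in power-save; else SCREENCAST."""
    if activity.idle or activity.locked or activity.power_save:
        return Mode.IDLE
    return Mode.SCREENCAST


def should_cut_segment(prev: Activity, curr: Activity, elapsed_seconds: float) -> bool:
    """True when any of the three boundary triggers fires."""
    return (
        mode_for(prev) != mode_for(curr)                # mode transition
        or prev.muted != curr.muted                     # mute state change
        or elapsed_seconds >= SEGMENT_WINDOW_SECONDS    # window elapsed
    )
```

Keeping the decision logic free of I/O makes it easy to test without a running desktop session.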
## Key Components

- **observer.py** — Unified entry point with platform detection
- **linux/observer.py** — Linux capture: audio + screencast + activity detection
- **linux/screencast.py** — XDG Portal screencast with PipeWire + GStreamer
- **gnome/activity.py** — GNOME-specific activity detection (idle, lock, power save)
- **observer_client.py** — HTTP upload client for observer → server communication
- **sense.py** — File watcher that dispatches transcription and description jobs
- **transcribe.py** — Audio transcription with faster-whisper and sentence-level embeddings
- **describe.py** — Vision analysis with Gemini, category-based prompts
- **categories/** — Category-specific prompts for screen content (see [SCREEN_CATEGORIES.md](SCREEN_CATEGORIES.md))

## Standalone Observers

**Tmux capture** is handled by the `solstone-tmux` package, which runs as its own systemd user service. See the `solstone-tmux` repo for setup instructions.

**macOS capture** is handled by the `solstone-macos` native Swift app. See the `solstone-macos` repo.

Both upload segments via the same HTTP ingest API used by the built-in Linux observer.

## Output Formats

See [captures.md](../talent/journal/references/captures.md) for detailed extract schemas:
- Audio transcripts: `audio.jsonl` with timestamps (speaker detection not included)
- Screen analysis: `screen.jsonl` with frame-by-frame categorization

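Both outputs are JSON Lines files, so they can be consumed one record at a time. The helper below is a generic reader sketch; it makes no assumption about field names, which are documented in captures.md, and `iter_jsonl` is a hypothetical name.

```python
import json
from typing import Any, Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

A file object is itself an iterable of lines, so `with open("audio.jsonl", encoding="utf-8") as fh: records = list(iter_jsonl(fh))` would load a transcript into memory.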
## Configuration

Requires the journal directory at the project root. API keys for the transcription and vision services are configured in `.env`.
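
A quick pre-flight check for these two requirements might look like the sketch below. The directory name `journal`, the helper names, and any key names passed in are assumptions for illustration; the project may load `.env` through a dedicated library and expect different variable names.

```python
from __future__ import annotations

from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_setup(root: str, required_keys: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    base = Path(root)
    problems = []
    # Assumes the journal directory is literally named "journal".
    if not (base / "journal").is_dir():
        problems.append("missing journal directory")
    env_path = base / ".env"
    env = parse_env(env_path.read_text(encoding="utf-8")) if env_path.is_file() else {}
    for key in required_keys:
        if not env.get(key):
            problems.append(f"missing key in .env: {key}")
    return problems
```

Calling `check_setup(".", ["EXAMPLE_API_KEY"])` from the project root would report whatever is missing, where `EXAMPLE_API_KEY` stands in for the real key names.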