personal memory agent
Add extensible category formatter system for screen analysis

Introduce per-category formatters in observe/categories/ that enable rich
markdown output from vision analysis results:

- Add categories/ package with meeting.py formatter for structured output
- Dynamic category discovery from .json files (no hardcoded list)
- Formatter dispatch in screen.py with fallback to default rendering
- Meeting formatter shows participants with video/muted icons, screen share

Rename describe/ to categories/ to avoid import conflict with describe.py.
Add README.md documenting how to add new categories.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

+431 -46
+68
observe/categories/README.md
+ # Screen Description Categories
+
+ This directory contains category prompts and formatters for vision analysis of screencast frames.
+
+ ## Adding a New Category
+
+ Each category requires 2-3 files:
+
+ ### 1. `<category>.json` (required)
+
+ Metadata specifying the output format:
+
+ ```json
+ {
+   "output": "markdown"
+ }
+ ```
+
+ Set `"output": "json"` if the prompt produces structured JSON.
+
+ ### 2. `<category>.txt` (required)
+
+ The vision prompt template sent to the model. Should instruct the model to:
+ - Analyze the screenshot for this specific category
+ - Return content in the format specified by `.json` (markdown or JSON)
+
+ ### 3. `<category>.py` (optional)
+
+ Custom formatter for rich markdown output. If not provided, default formatting applies:
+ - Markdown content: displayed with category header
+ - JSON content: displayed in a code block
+
+ To add a custom formatter, create a `format` function:
+
+ ```python
+ def format(content: Any, context: dict) -> str:
+     """Format category content to markdown.
+
+     Args:
+         content: The category content (str for markdown, dict for JSON)
+         context: Dict with:
+             - frame: Full frame dict from JSONL
+             - file_path: Path to JSONL file
+             - timestamp_str: Formatted time like "14:30:22"
+
+     Returns:
+         Formatted markdown string (empty string to skip)
+     """
+     # Your formatting logic here
+     return "**Header:**\n\nFormatted content..."
+ ```
+
+ ## Current Categories
+
+ | Category | Output | Formatter | Description |
+ |----------|--------|-----------|-------------|
+ | meeting | json | ✓ | Video conferencing with participants |
+ | messaging | markdown | - | Chat and email apps |
+ | browsing | markdown | - | Web browsing content |
+ | reading | markdown | - | Documents and PDFs |
+ | productivity | markdown | - | Spreadsheets, calendars, etc. |
+
+ ## How It Works
+
+ 1. `observe/describe.py` runs initial categorization to identify primary/secondary categories
+ 2. For categories with prompts here, a follow-up request extracts detailed content
+ 3. Results are stored in JSONL under the category name (e.g., `"meeting": {...}`)
+ 4. `observe/screen.py` formats JSONL to markdown, using custom formatters when available
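To illustrate the `format(content, context)` contract that the README describes, here is a minimal sketch of a formatter for a markdown category. The "Messaging" header and the use of `timestamp_str` in it are assumptions made for illustration; this commit does not ship such a formatter.

```python
from typing import Any


def format(content: Any, context: dict) -> str:
    """Hypothetical formatter for a markdown category (not part of this commit)."""
    # Markdown categories store a plain string; return "" for anything else.
    if not isinstance(content, str) or not content.strip():
        return ""

    # timestamp_str is one of the context keys listed in the README.
    timestamp = context.get("timestamp_str", "")
    header = f"**Messaging** ({timestamp})" if timestamp else "**Messaging**"
    return f"{header}\n\n{content.strip()}\n"


example = format("**Alice**: Hello!", {"timestamp_str": "14:30:22"})
```

The function is named `format` because that is the attribute the loader in `observe/screen.py` looks up; it shadows the builtin only inside the category module.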
+9
observe/categories/__init__.py
+ """Category prompts and formatters for screen description.
+
+ This package contains:
+ - <category>.json: Metadata (output format: json/markdown)
+ - <category>.txt: Vision prompt template
+ - <category>.py: Optional formatter for rich markdown output
+
+ See README.md for adding new categories.
+ """
+61
observe/categories/meeting.py
+ """Formatter for meeting category content.
+
+ Renders meeting analysis JSON to rich markdown with participants and screen share.
+ """
+
+ from typing import Any
+
+
+ def format(content: Any, context: dict) -> str:
+     """Format meeting analysis to markdown.
+
+     Args:
+         content: Meeting analysis dict with platform, participants, screen_share
+         context: Dict with frame, file_path, timestamp_str
+
+     Returns:
+         Formatted markdown string
+     """
+     if not isinstance(content, dict):
+         return ""
+
+     lines = []
+
+     # Platform header
+     platform = content.get("platform", "unknown")
+     lines.append(f"**Meeting** ({platform})")
+     lines.append("")
+
+     # Participants
+     participants = content.get("participants", [])
+     if participants:
+         lines.append("**Participants:**")
+         for p in participants:
+             # Handle both dict format (new) and string format (legacy)
+             if isinstance(p, dict):
+                 name = p.get("name", "Unknown")
+                 status = p.get("status", "unknown")
+                 video = "📹" if p.get("video") else "🔇"
+                 lines.append(f"- {video} {name} ({status})")
+             else:
+                 # Legacy: participant is just a name string
+                 lines.append(f"- {p}")
+         lines.append("")
+
+     # Screen share
+     screen_share = content.get("screen_share")
+     if screen_share:
+         presenter = screen_share.get("presenter")
+         description = screen_share.get("description", "")
+         formatted_text = screen_share.get("formatted_text", "")
+
+         presenter_str = f" by {presenter}" if presenter else ""
+         lines.append(f"**Screen Share{presenter_str}:**")
+         if description:
+             lines.append(f"*{description}*")
+             lines.append("")
+         if formatted_text:
+             lines.append(formatted_text.strip())
+             lines.append("")
+
+     return "\n".join(lines)
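The participant branch of the meeting formatter can be exercised in isolation. The sketch below copies that branch into a standalone helper; `render_participant` is a hypothetical name used only here, and the sample participants are made up.

```python
def render_participant(p) -> str:
    """Mirror the participant branch of the meeting formatter (illustrative copy)."""
    if isinstance(p, dict):
        name = p.get("name", "Unknown")
        status = p.get("status", "unknown")
        # The icon depends only on the `video` flag; mute state appears in `status`.
        icon = "📹" if p.get("video") else "🔇"
        return f"- {icon} {name} ({status})"
    # Legacy frames store each participant as a bare name string.
    return f"- {p}"


rows = [
    render_participant(p)
    for p in (
        {"name": "Alice", "status": "speaking", "video": True},
        {"name": "Bob"},  # missing keys fall back to defaults
        "Carol",  # legacy string entry
    )
]
```

Note that 🔇 is emitted whenever `video` is falsy, so it marks a disabled camera rather than a muted microphone.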
+2 -2
observe/describe.py
···

  def _discover_category_prompts() -> dict[str, dict]:
      """
-     Discover available category prompts from describe/ directory.
+     Discover available category prompts from categories/ directory.

      Each category has a .txt prompt and .json metadata file.

···
      dict[str, dict]
          Mapping of category name to metadata (including 'prompt' text)
      """
-     describe_dir = Path(__file__).parent / "describe"
+     describe_dir = Path(__file__).parent / "categories"
      if not describe_dir.exists():
          logger.warning(f"Category prompts directory not found: {describe_dir}")
          return {}
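The diff shows only the head of `_discover_category_prompts`. A rough sketch of the pairing it describes, inferred from the docstring and from the tests in this commit, might look as follows. Taking the directory as a parameter and the name `discover_category_prompts` are assumptions for illustration, not the actual implementation.

```python
import json
import tempfile
from pathlib import Path


def discover_category_prompts(categories_dir: Path) -> dict[str, dict]:
    """Sketch: pair each <category>.json with its <category>.txt prompt."""
    if not categories_dir.exists():
        return {}

    prompts: dict[str, dict] = {}
    for meta_path in sorted(categories_dir.glob("*.json")):
        prompt_path = meta_path.with_suffix(".txt")
        if not prompt_path.exists():
            # Metadata without a prompt is treated as incomplete and skipped.
            continue
        metadata = json.loads(meta_path.read_text())
        metadata["prompt"] = prompt_path.read_text()
        prompts[meta_path.stem] = metadata
    return prompts


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "test.json").write_text('{"output": "markdown"}')
    (root / "test.txt").write_text("Test prompt content")
    (root / "incomplete.json").write_text('{"output": "json"}')
    found = discover_category_prompts(root)
    missing = discover_category_prompts(root / "nonexistent")
```

Passing the directory in avoids the need to patch `Path` in tests, at the cost of changing the function's signature.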
observe/describe/browsing.json → observe/categories/browsing.json
observe/describe/browsing.txt → observe/categories/browsing.txt
observe/describe/meeting.json → observe/categories/meeting.json
observe/describe/meeting.txt → observe/categories/meeting.txt
observe/describe/messaging.json → observe/categories/messaging.json
observe/describe/messaging.txt → observe/categories/messaging.txt
observe/describe/productivity.json → observe/categories/productivity.json
observe/describe/productivity.txt → observe/categories/productivity.txt
observe/describe/reading.json → observe/categories/reading.json
observe/describe/reading.txt → observe/categories/reading.txt
+97 -22
observe/screen.py
···
  import json
  import logging
  from datetime import datetime
+ from importlib import import_module
  from pathlib import Path
- from typing import Any
+ from typing import Any, Callable

  from observe.utils import load_analysis_frames, parse_screen_filename

  logger = logging.getLogger(__name__)
+
+ # Cache for discovered category formatters
+ _formatter_cache: dict[str, Callable | None] = {}
+
+
+ def _discover_categories() -> list[str]:
+     """Discover available categories from observe/categories/ directory.
+
+     Categories are defined by .json metadata files in the categories/ package.
+
+     Returns:
+         List of category names (e.g., ["meeting", "messaging", ...])
+     """
+     describe_dir = Path(__file__).parent / "categories"
+     if not describe_dir.exists():
+         return []
+     return sorted(p.stem for p in describe_dir.glob("*.json"))
+
+
+ # Discover categories at module load time
+ CATEGORIES = _discover_categories()
+
+
+ def _load_category_formatter(category: str) -> Callable | None:
+     """Load formatter for a category from observe.categories.<category>.
+
+     Args:
+         category: Category name (e.g., "meeting", "messaging")
+
+     Returns:
+         The format function or None if not found
+     """
+     if category in _formatter_cache:
+         return _formatter_cache[category]
+
+     try:
+         module = import_module(f"observe.categories.{category}")
+         formatter = getattr(module, "format", None)
+         _formatter_cache[category] = formatter
+         return formatter
+     except (ImportError, AttributeError) as e:
+         logger.debug(f"No formatter for category {category}: {e}")
+         _formatter_cache[category] = None
+         return None
+
+
+ def _format_category_content(category: str, content: Any, context: dict) -> str:
+     """Format category-specific content to markdown.
+
+     Tries discovered formatter first, falls back to default rendering.
+
+     Args:
+         category: Category name
+         content: Category content (str for markdown, dict for JSON)
+         context: Dict with frame, file_path, timestamp_str
+
+     Returns:
+         Formatted markdown string
+     """
+     # Try discovered formatter
+     formatter = _load_category_formatter(category)
+     if formatter:
+         result = formatter(content, context)
+         if result:
+             return result
+
+     # Default formatting
+     if isinstance(content, str):
+         return f"**{category.title()}:**\n\n{content.strip()}\n"
+     elif isinstance(content, dict):
+         return f"**{category.title()}:**\n\n```json\n{json.dumps(content, indent=2)}\n```\n"
+     return ""


  def format_screen(
···
              lines.append(description)
              lines.append("")

-         # Add category-specific content if present
+         # Build context for category formatters
+         timestamp_str = f"{abs_hour:02d}:{abs_minute:02d}:{abs_second:02d}"
+         format_context = {
+             "frame": frame,
+             "file_path": file_path,
+             "timestamp_str": timestamp_str,
+         }
+
+         # Add category-specific content using formatter dispatch
          # New format uses category name as key (e.g., "meeting", "messaging")
          # Old format used "extracted_text" and "meeting_analysis"
-         text_categories = ["messaging", "browsing", "reading", "productivity"]
-         for cat in text_categories:
-             if cat in frame:
-                 lines.append(f"**{cat.title()}:**")
-                 lines.append("")
-                 lines.append(frame[cat].strip())
-                 lines.append("")
-                 break
-         else:
-             # Fall back to legacy extracted_text field
+         has_category_content = False
+         for cat in CATEGORIES:
+             content = frame.get(cat)
+             # Also check legacy "meeting_analysis" key for meeting
+             if cat == "meeting" and not content:
+                 content = frame.get("meeting_analysis")
+             if content:
+                 formatted = _format_category_content(cat, content, format_context)
+                 if formatted:
+                     lines.append(formatted)
+                     has_category_content = True
+
+         # Fall back to legacy extracted_text field if no category content
+         if not has_category_content:
              extracted_text = frame.get("extracted_text")
              if extracted_text:
                  lines.append("**Extracted Text:**")
···
                  lines.append(extracted_text.strip())
                  lines.append("```")
                  lines.append("")
-
-         # Add meeting analysis if present (new: "meeting", old: "meeting_analysis")
-         meeting = frame.get("meeting") or frame.get("meeting_analysis")
-         if meeting:
-             lines.append("**Meeting:**")
-             lines.append("")
-             lines.append("```json")
-             lines.append(json.dumps(meeting, indent=2))
-             lines.append("```")
-             lines.append("")

          # Calculate absolute unix timestamp in milliseconds
          frame_timestamp_ms = base_timestamp_ms + int(frame_offset * 1000)
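The dispatch order in `_format_category_content` can be shown with a self-contained sketch. To avoid importing the `observe` package, the formatter is passed in as an argument here instead of being loaded with `import_module`; that parameter, the function name `format_category_content`, and the `always_skip` formatter are assumptions for illustration.

```python
import json
from typing import Any, Callable, Optional

FENCE = "`" * 3  # built programmatically so this example has no nested fence


def format_category_content(
    category: str,
    content: Any,
    context: dict,
    formatter: Optional[Callable[[Any, dict], str]] = None,
) -> str:
    """Sketch of the dispatch: custom formatter first, then default rendering."""
    if formatter:
        result = formatter(content, context)
        if result:
            return result

    # Default formatting: markdown strings get a header, dicts a JSON code block.
    if isinstance(content, str):
        return f"**{category.title()}:**\n\n{content.strip()}\n"
    if isinstance(content, dict):
        body = json.dumps(content, indent=2)
        return f"**{category.title()}:**\n\n{FENCE}json\n{body}\n{FENCE}\n"
    return ""


def always_skip(content: Any, context: dict) -> str:
    """Hypothetical formatter that returns an empty string."""
    return ""


custom = format_category_content(
    "meeting", {"platform": "zoom"}, {}, lambda c, ctx: f"**Meeting** ({c['platform']})"
)
fallback = format_category_content("messaging", "  Hello!  ", {})
skipped = format_category_content("reading", "Some text", {}, always_skip)
```

On this reading of the diff, a formatter that returns an empty string does not suppress its category: the default rendering still runs, which appears to differ from the README's "empty string to skip" wording.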
+22 -20
tests/test_describe_config.py
···
  """Tests for observe/describe.py category prompt discovery."""

- import json
  from pathlib import Path
  from unittest.mock import patch

  import pytest

+ # Import the processor module
+ from observe import describe as describe_module
+

  def test_category_prompts_discovered():
      """Test that category prompts are discovered on import."""
-     from observe.describe import CATEGORY_PROMPTS
+     CATEGORY_PROMPTS = describe_module.CATEGORY_PROMPTS

      # Should have discovered some category prompts
      assert len(CATEGORY_PROMPTS) > 0
···

  def test_category_prompts_have_required_fields():
      """Test that discovered categories have required metadata."""
-     from observe.describe import CATEGORY_PROMPTS
+     CATEGORY_PROMPTS = describe_module.CATEGORY_PROMPTS

      for category, metadata in CATEGORY_PROMPTS.items():
          # Each category should have 'output' and 'prompt' fields
···

  def test_meeting_category_is_json():
      """Test that meeting category outputs JSON."""
-     from observe.describe import CATEGORY_PROMPTS
+     CATEGORY_PROMPTS = describe_module.CATEGORY_PROMPTS

      assert "meeting" in CATEGORY_PROMPTS
      assert CATEGORY_PROMPTS["meeting"]["output"] == "json"
···

  def test_text_categories_are_markdown():
      """Test that text-based categories output markdown."""
-     from observe.describe import CATEGORY_PROMPTS
+     CATEGORY_PROMPTS = describe_module.CATEGORY_PROMPTS

      text_categories = ["messaging", "browsing", "reading", "productivity"]
      for category in text_categories:
···

  def test_discover_category_prompts_with_missing_dir(tmp_path):
      """Test that discovery handles missing directory gracefully."""
-     from observe.describe import _discover_category_prompts
+     _discover_category_prompts = describe_module._discover_category_prompts

-     with patch("observe.describe.Path") as mock_path:
+     with patch.object(describe_module, "Path") as mock_path:
          # Mock to point to non-existent directory
          mock_describe_dir = tmp_path / "nonexistent"
          mock_path.return_value.parent.__truediv__.return_value = mock_describe_dir
···

  def test_discover_category_prompts_with_valid_dir(tmp_path):
      """Test that discovery works with valid category files."""
-     from observe.describe import _discover_category_prompts
+     _discover_category_prompts = describe_module._discover_category_prompts

      # Create test category directory
-     describe_dir = tmp_path / "describe"
-     describe_dir.mkdir()
+     categories_dir = tmp_path / "categories"
+     categories_dir.mkdir()

      # Create test category files
-     (describe_dir / "test.json").write_text('{"output": "markdown"}')
-     (describe_dir / "test.txt").write_text("Test prompt content")
+     (categories_dir / "test.json").write_text('{"output": "markdown"}')
+     (categories_dir / "test.txt").write_text("Test prompt content")

-     with patch("observe.describe.Path") as mock_path:
-         mock_path.return_value.parent.__truediv__.return_value = describe_dir
+     with patch.object(describe_module, "Path") as mock_path:
+         mock_path.return_value.parent.__truediv__.return_value = categories_dir

          result = _discover_category_prompts()
          assert "test" in result
···

  def test_discover_category_prompts_skips_incomplete(tmp_path):
      """Test that discovery skips categories without matching txt file."""
-     from observe.describe import _discover_category_prompts
+     _discover_category_prompts = describe_module._discover_category_prompts

      # Create test category directory
-     describe_dir = tmp_path / "describe"
-     describe_dir.mkdir()
+     categories_dir = tmp_path / "categories"
+     categories_dir.mkdir()

      # Create JSON without matching txt
-     (describe_dir / "incomplete.json").write_text('{"output": "json"}')
+     (categories_dir / "incomplete.json").write_text('{"output": "json"}')

-     with patch("observe.describe.Path") as mock_path:
-         mock_path.return_value.parent.__truediv__.return_value = describe_dir
+     with patch.object(describe_module, "Path") as mock_path:
+         mock_path.return_value.parent.__truediv__.return_value = categories_dir

          result = _discover_category_prompts()
          assert "incomplete" not in result
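The mock chain used in these tests, `mock_path.return_value.parent.__truediv__.return_value`, stands in for the expression `Path(__file__).parent / "categories"`. A minimal standalone sketch of how that chain resolves, using a bare `MagicMock` in place of the patched name and a made-up target directory:

```python
from pathlib import Path
from unittest.mock import MagicMock

# Stand-in for the patched `Path` name inside the module under test.
mock_path = MagicMock()
categories_dir = Path("categories")  # hypothetical target directory

# Path(...) call -> .parent attribute -> "/" operator, each configured on the mock.
mock_path.return_value.parent.__truediv__.return_value = categories_dir

# What the code under test effectively evaluates once `Path` is patched:
resolved = mock_path("observe/describe.py").parent / "some-subdir"
```

Because the `/` operator is mocked, `resolved` is `categories_dir` regardless of the right-hand operand, so these tests do not verify that the module asks for the `"categories"` subdirectory specifically.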
+2 -1
tests/test_formatters.py
···

          chunks, meta = format_screen(entries)

-         assert "**Meeting:**" in chunks[0]["markdown"]
+         # New meeting formatter uses "**Meeting** (platform)" format
+         assert "**Meeting**" in chunks[0]["markdown"]
          assert "Alice" in chunks[0]["markdown"]

      def test_format_screen_extracts_metadata(self):
+170 -1
tests/test_screen_formatter.py
···

  from pathlib import Path

- from observe.screen import format_screen, format_screen_text
+ from observe.screen import (
+     CATEGORIES,
+     _discover_categories,
+     _load_category_formatter,
+     format_screen,
+     format_screen_text,
+ )


  def test_format_screen_extracts_segment_from_directory():
···

      assert "indexer" in meta
      assert meta["indexer"]["topic"] == "screen"
+
+
+ def test_load_category_formatter_finds_meeting():
+     """Test that meeting formatter can be loaded from categories/."""
+     formatter = _load_category_formatter("meeting")
+     assert formatter is not None
+     assert callable(formatter)
+
+
+ def test_load_category_formatter_returns_none_for_missing():
+     """Test that missing formatter returns None without error."""
+     formatter = _load_category_formatter("nonexistent_category")
+     assert formatter is None
+
+
+ def test_load_category_formatter_caches_result():
+     """Test that formatter loading is cached."""
+     # Clear cache first
+     from observe.screen import _formatter_cache
+
+     _formatter_cache.clear()
+
+     # First call loads
+     formatter1 = _load_category_formatter("meeting")
+     # Second call should return cached
+     formatter2 = _load_category_formatter("meeting")
+
+     assert formatter1 is formatter2
+     assert "meeting" in _formatter_cache
+
+
+ def test_meeting_formatter_output():
+     """Test that meeting formatter produces expected markdown."""
+     from observe.categories.meeting import format as meeting_format
+
+     content = {
+         "platform": "zoom",
+         "participants": [
+             {"name": "Alice", "status": "speaking", "video": True},
+             {"name": "Bob", "status": "muted", "video": False},
+         ],
+         "screen_share": {
+             "presenter": "Alice",
+             "description": "Showing slides",
+             "formatted_text": "# Slide Title\n\nBullet points...",
+         },
+     }
+
+     result = meeting_format(content, {})
+
+     assert "**Meeting** (zoom)" in result
+     assert "📹 Alice (speaking)" in result
+     assert "🔇 Bob (muted)" in result
+     assert "**Screen Share by Alice:**" in result
+     assert "*Showing slides*" in result
+     assert "# Slide Title" in result
+
+
+ def test_format_screen_uses_meeting_formatter():
+     """Test that format_screen uses the meeting formatter for meeting content."""
+     frames = [
+         {
+             "timestamp": 0,
+             "analysis": {
+                 "primary": "meeting",
+                 "visual_description": "Video call",
+             },
+             "meeting": {
+                 "platform": "meet",
+                 "participants": [
+                     {"name": "Test User", "status": "active", "video": True},
+                 ],
+                 "screen_share": None,
+             },
+         },
+     ]
+
+     context = {
+         "file_path": Path("20240101/120000/screen.jsonl"),
+         "include_entity_context": False,
+     }
+
+     chunks, meta = format_screen(frames, context)
+     markdown = chunks[0]["markdown"]
+
+     # Should use meeting formatter, not JSON dump
+     assert "**Meeting** (meet)" in markdown
+     assert "📹 Test User (active)" in markdown
+     # Should NOT have JSON code block (that was the old format)
+     assert "```json" not in markdown
+
+
+ def test_format_screen_falls_back_for_missing_formatter():
+     """Test that categories without .py formatter use default formatting."""
+     frames = [
+         {
+             "timestamp": 0,
+             "analysis": {
+                 "primary": "messaging",
+                 "visual_description": "Chat app",
+             },
+             "messaging": "**Alice**: Hello!\n**Bob**: Hi there!",
+         },
+     ]
+
+     context = {"include_entity_context": False}
+
+     chunks, meta = format_screen(frames, context)
+     markdown = chunks[0]["markdown"]
+
+     # Should use default text formatting
+     assert "**Messaging:**" in markdown
+     assert "**Alice**: Hello!" in markdown
+
+
+ def test_format_screen_handles_multiple_categories():
+     """Test that both primary and secondary categories are formatted."""
+     frames = [
+         {
+             "timestamp": 0,
+             "analysis": {
+                 "primary": "meeting",
+                 "secondary": "productivity",
+                 "overlap": False,
+                 "visual_description": "Meeting with shared doc",
+             },
+             "meeting": {
+                 "platform": "teams",
+                 "participants": [{"name": "User", "status": "active", "video": True}],
+                 "screen_share": None,
+             },
+             "productivity": "| Task | Status |\n|------|--------|\n| Review | Done |",
+         },
+     ]
+
+     context = {"include_entity_context": False}
+
+     chunks, meta = format_screen(frames, context)
+     markdown = chunks[0]["markdown"]
+
+     # Both categories should be present
+     assert "**Meeting** (teams)" in markdown
+     assert "**Productivity:**" in markdown
+     assert "| Task | Status |" in markdown
+
+
+ def test_categories_list():
+     """Test that CATEGORIES includes expected values."""
+     assert "meeting" in CATEGORIES
+     assert "messaging" in CATEGORIES
+     assert "browsing" in CATEGORIES
+     assert "reading" in CATEGORIES
+     assert "productivity" in CATEGORIES
+
+
+ def test_discover_categories_finds_json_files():
+     """Test that _discover_categories finds categories from .json files."""
+     categories = _discover_categories()
+     # Should find categories defined by .json files in categories/
+     assert len(categories) > 0
+     assert "meeting" in categories
+     # Should be sorted
+     assert categories == sorted(categories)