# Screen Description Categories
This directory contains category definitions for vision analysis of screencast frames.
## Adding a New Category

Each category requires a `.md` file with metadata in JSON frontmatter. The file can optionally include extraction prompt content.
### 1. `<category>.md` (required)

Defines the category with JSON frontmatter and an optional extraction prompt:

```
{
  "description": "One-line description for categorization prompt",
  "output": "markdown"
}

Optional extraction prompt content goes here...
```
| Field | Required | Default | Description |
|---|---|---|---|
| `description` | Yes | - | Single-line description used in the categorization prompt |
| `output` | No | `"markdown"` | Response format for extraction: `"json"` or `"markdown"` |
Model selection is handled via the providers configuration in `journal.json`. Each category uses the context pattern `observe.describe.<category>` for routing. See `config.md` for details on configuring providers per context.
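Purely as an illustration of the routing idea, a per-category provider override might be expressed like the sketch below; the key names and structure here are assumptions invented for the example (the actual schema is documented in `config.md`):

```json
{
  "providers": {
    "observe.describe.terminal": {
      "model": "example-vision-model"
    }
  }
}
```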
Categories with prompt content after the frontmatter are "extractable" - they can receive detailed content extraction after initial categorization. The prompt is sent to the model for analysis and should instruct the model to:

- Analyze the screenshot for this specific category
- Return content in the format specified by `output` (markdown or JSON)
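As a concrete illustration, a hypothetical `terminal.md` category file with an extraction prompt might look like this (the category name, description, and prompt wording are all invented for the example):

```
{
  "description": "Terminal or shell session with commands and output",
  "output": "json"
}

Analyze the screenshot for terminal content. Identify the visible
command and its output, and return JSON with the keys "command"
and "output".
```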
### 2. `<category>.py` (optional)
Custom formatter for rich markdown output. If not provided, default formatting applies:
- Markdown content: displayed with category header
- JSON content: displayed in a code block
To add a custom formatter, create a `format` function:

```python
from typing import Any


def format(content: Any, context: dict) -> str:
    """Format category content to markdown.

    Args:
        content: The category content (str for markdown, dict for JSON)
        context: Dict with:
            - frame: Full frame dict from JSONL
            - file_path: Path to JSONL file
            - timestamp_str: Formatted time like "14:30:22"

    Returns:
        Formatted markdown string (empty string to skip)
    """
    # Your formatting logic here
    return "**Header:**\n\nFormatted content..."
```
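As a sketch, a formatter for a hypothetical `terminal` category whose extraction returns JSON with `command` and `output` keys (both key names invented for this example) might look like:

```python
from typing import Any


def format(content: Any, context: dict) -> str:
    """Render a terminal frame's JSON content as a fenced shell block."""
    # Skip frames that lack the expected dict structure (returning "" skips).
    if not isinstance(content, dict) or not content.get("command"):
        return ""
    ts = context.get("timestamp_str", "")
    header = f"**Terminal ({ts}):**" if ts else "**Terminal:**"
    body = f"$ {content['command']}\n{content.get('output', '')}".rstrip()
    return f"{header}\n\n```\n{body}\n```"
```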
## How It Works
- `observe/describe.py` discovers all `.md` files and builds the categorization prompt dynamically
- Phase 1 (Categorization): All frames get initial category analysis (primary/secondary)
- Phase 2 (Selection): AI or fallback logic selects which frames get detailed extraction (configurable via `describe.max_extractions`)
- Phase 3 (Extraction): Selected frames with extractable categories (those with extraction prompts in their `.md` files) get detailed content extraction
- Results are stored in JSONL with `enhanced: true/false` indicating extraction status
- `observe/screen.py` formats JSONL to markdown, using custom formatters when available
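The formatter lookup and fallback in the last step can be sketched roughly as follows; `load_formatter` and `render` are illustrative names for this example, not the actual functions in `observe/screen.py`:

```python
import importlib.util
import json
from pathlib import Path
from typing import Any, Callable, Optional


def load_formatter(category_dir: Path, category: str) -> Optional[Callable[..., str]]:
    """Import <category>.py and return its format() function, if present."""
    path = category_dir / f"{category}.py"
    if not path.exists():
        return None
    spec = importlib.util.spec_from_file_location(category, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, "format", None)


def render(category: str, content: Any, formatter: Optional[Callable[..., str]]) -> str:
    """Apply a custom formatter, falling back to the default layout."""
    if formatter is not None:
        return formatter(content, {})
    if isinstance(content, dict):
        # Default for JSON content: a code block (header style is an assumption)
        return f"## {category}\n\n```json\n{json.dumps(content, indent=2)}\n```"
    # Default for markdown content: displayed under a category header
    return f"## {category}\n\n{content}"
```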