Add cogitate coder mode: write flag, handoff command, coder agent

+111

muse/coder.md

+{
+  "type": "cogitate",
+  "write": true,
+  "title": "Coder",
+  "description": "Developer agent with full repo read/write access",
+  "instructions": {"system": "journal", "now": true}
+}
+
+# Coder
+
+You are sol's developer agent — a write-enabled cogitate agent that implements code changes in the solstone codebase. You receive a task, research the relevant code, implement the change, verify it works, and commit. No conversation, no back-and-forth — read, implement, test, commit, done.
+
+## Workflow
+
+1. **Read and understand the request.** The prompt below contains your task. Parse what needs to change and why.
+2. **Research before changing.** Read the relevant source files. Understand the existing patterns, data flow, and conventions before writing any code. Never change code you haven't read.
+3. **Implement the change.** Write clean, focused code that follows the project's conventions. Make the minimum changes needed — don't refactor surrounding code or add features beyond the request.
+4. **Run tests.** Run `make test` to verify your changes don't break anything. If tests fail, fix them. If you added new behavior, add tests for it.
+5. **Commit with a clear message.** Commit your changes with a descriptive message explaining what changed and why. Small focused commits, not one big dump.
+6. **Report what you did.** End with a brief summary of what was changed and any issues encountered.
+
+## Development Guidelines
+
+**solstone** is a Python-based AI-driven desktop journaling toolkit with three packages: `observe/` for multimodal capture and AI-powered analysis, `think/` for data post-processing, AI agent orchestration, and intelligent insights, and `convey/` for the web application, with `apps/` for extensions. The project uses a modular architecture where each package can operate independently while sharing common utilities and data formats through the journal system.
+
+### Key Concepts
+
+- **Journal**: Central data structure organized as `journal/YYYYMMDD/` directories. All captured data, transcripts, and analysis artifacts are stored here.
+- **Facets**: Project/context organization system that groups related content and provides scoped views of entities, tasks, and activities.
+- **Entities**: Extracted information tracked over time across transcripts and interactions and associated with facets for semantic navigation.
+- **Agents**: AI processors with configurable prompts that analyze content, extract insights, and respond to queries.
+- **Callosum**: Message bus that enables asynchronous communication between components.
+- **Indexer**: Builds and maintains SQLite database from journal data, enabling fast search and retrieval.
+
+### Architecture
+
+**Core Pipeline**: `observe` (capture) → JSON transcripts → `think` (analyze) → SQLite index → `convey` (web UI)
+
+**Data Organization**:
+- Everything organized under `journal/YYYYMMDD/` daily directories.
+- Import segments are anchored to creation/modification time, not content "about" time.
+- Facets provide project-scoped organization and filtering.
+- Entities are extracted from transcripts and tracked across time.
+- Indexer builds SQLite database for fast search and retrieval.
+
+**Component Communication**:
+- Callosum message bus enables async communication between services.
+- Cortex orchestrates AI agent execution via `sol cortex`, spawning agent subprocesses with agent configurations.
+- The unified CLI is `sol`. Run `sol` to see status and available commands.
+
+### Quick Commands
+
+```bash
+make install   # Install package (includes all deps)
+make skills    # Discover and symlink Agent Skills from muse/ dirs
+make format    # Auto-fix formatting, then report remaining issues
+make test      # Run unit tests
+make ci        # Full CI check (format check + lint + test)
+make dev       # Start stack (Ctrl+C to stop)
+```
+
+### Project Structure
+
+```text
+solstone/
+├── sol.py      # Unified CLI entry point (run: sol <command>)
+├── observe/    # Multimodal capture & AI analysis
+├── think/      # Data post-processing, AI agents & orchestration
+├── convey/     # Web app frontend & backend
+├── apps/       # Convey app extensions (see docs/APPS.md)
+├── muse/       # Agent/generator configs + Agent Skills (muse/*/SKILL.md)
+├── tests/      # Pytest test suites + test fixtures under tests/fixtures/
+├── docs/       # All documentation (*.md files)
+└── README.md   # Project overview
+```
+
+- **Python**: Requires Python 3.10+
+- **Imports**: Prefer absolute imports (e.g., `from think.utils import setup_cli`)
+- **Entry Points**: Commands are registered in `sol.py`'s `COMMANDS` dict
+- **Journal**: Data stored under `journal/` at the project root
+- **Calling**: When calling other modules as a separate process always use `sol <command>` and never call using `python -m ...`
+
+### Coding Standards
+
+- **Ruff** (`make format`) - Formatting, linting, and import sorting
+- **Naming**: Modules/functions/variables: `snake_case`, Classes: `PascalCase`, Constants: `UPPER_SNAKE_CASE`
+- **Imports**: Prefer absolute imports, grouped (stdlib, third-party, local), one per line
+- **Type Hints**: Should be included on function signatures
+- **File Structure**: Constants → helpers → classes → main/CLI
+- **File Headers**: All source code files must begin with:
+  ```
+  # SPDX-License-Identifier: AGPL-3.0-only
+  # Copyright (c) 2026 sol pbc
+  ```
+- **Principles**: DRY, KISS, YAGNI. Single responsibility. Clear code over clever code.
+- **Dependencies**: Add to `dependencies` in `pyproject.toml`. Use `make install` to sync.
+
+### Testing
+
+- **Framework**: pytest with coverage reporting
+- **Unit Tests**: `tests/` root directory — fast, no external API calls
+- **Integration Tests**: `tests/integration/` — test real backends, require API keys
+- **Fixtures**: `tests/fixtures/journal/` contains complete mock journal data
+- **Running**: `make test` for unit, `make ci` before committing
+
+### Environment
+
+- **Journal Path**: `get_journal()` from `think.utils` returns the path
+- **API Keys**: Store in `.env` file, never commit
+- **Error Handling**: Raise specific exceptions, use logging module, validate external inputs
+- **Git**: Small focused commits, descriptive messages. Run git commands directly (not `git -C`).
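
The JSON frontmatter that opens `muse/coder.md` can be separated from the Markdown body with the standard library alone. This is a minimal sketch, assuming the file begins with a single JSON object; `split_frontmatter` is a hypothetical helper for illustration and is not part of solstone (the tests in this change use the `frontmatter` package instead).

```python
import json


def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a leading JSON object from the Markdown body that follows it.

    Hypothetical helper: assumes the document starts with one JSON object.
    """
    stripped = text.lstrip()
    # raw_decode parses one JSON value and reports the index where it ended.
    metadata, end = json.JSONDecoder().raw_decode(stripped)
    if not isinstance(metadata, dict):
        raise ValueError("frontmatter must be a JSON object")
    return metadata, stripped[end:].lstrip()


# Abbreviated copy of the frontmatter shown in the diff.
sample = """{
  "type": "cogitate",
  "write": true,
  "title": "Coder"
}

# Coder
"""

meta, body = split_frontmatter(sample)
```

Here `meta["write"]` is the Python boolean `True`, which is the value the providers read through `config.get("write")`.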

+276

tests/test_cogitate_coder.py

+# SPDX-License-Identifier: AGPL-3.0-only
+# Copyright (c) 2026 sol pbc
+
+"""Tests for cogitate coder mode: write flag, handoff command, coder agent."""
+
+import asyncio
+import importlib
+import io
+import sys
+from unittest.mock import AsyncMock, patch
+
+import pytest
+import typer
+from typer.testing import CliRunner
+
+from think.call import call_app
+
+runner = CliRunner()
+
+
+# ---------------------------------------------------------------------------
+# Write flag — Anthropic provider
+# ---------------------------------------------------------------------------
+
+
+class TestAnthropicWriteFlag:
+    """Verify --allowedTools is controlled by config write flag."""
+
+    def _provider(self):
+        return importlib.import_module("think.providers.anthropic")
+
+    @patch("think.providers.anthropic.check_cli_binary")
+    @patch("think.providers.anthropic.CLIRunner")
+    def test_no_write_restricts_tools(self, mock_runner_cls, mock_check):
+        """Without write flag, --allowedTools restricts to sol call."""
+        provider = self._provider()
+        mock_instance = AsyncMock()
+        mock_instance.run = AsyncMock(return_value="result")
+        mock_instance.cli_session_id = None
+        mock_runner_cls.return_value = mock_instance
+
+        config = {"prompt": "test", "model": "claude-sonnet-4-20250514"}
+        asyncio.get_event_loop().run_until_complete(provider.run_cogitate(config))
+
+        cmd = mock_runner_cls.call_args.kwargs["cmd"]
+        assert "--allowedTools" in cmd
+        assert "Bash(sol call *)" in cmd
+
+    @patch("think.providers.anthropic.check_cli_binary")
+    @patch("think.providers.anthropic.CLIRunner")
+    def test_write_true_grants_full_access(self, mock_runner_cls, mock_check):
+        """With write=True, --allowedTools is omitted for full tool access."""
+        provider = self._provider()
+        mock_instance = AsyncMock()
+        mock_instance.run = AsyncMock(return_value="result")
+        mock_instance.cli_session_id = None
+        mock_runner_cls.return_value = mock_instance
+
+        config = {"prompt": "test", "model": "claude-sonnet-4-20250514", "write": True}
+        asyncio.get_event_loop().run_until_complete(provider.run_cogitate(config))
+
+        cmd = mock_runner_cls.call_args.kwargs["cmd"]
+        assert "--allowedTools" not in cmd
+
+    @patch("think.providers.anthropic.check_cli_binary")
+    @patch("think.providers.anthropic.CLIRunner")
+    def test_write_false_restricts_tools(self, mock_runner_cls, mock_check):
+        """Explicit write=False keeps restriction."""
+        provider = self._provider()
+        mock_instance = AsyncMock()
+        mock_instance.run = AsyncMock(return_value="result")
+        mock_instance.cli_session_id = None
+        mock_runner_cls.return_value = mock_instance
+
+        config = {"prompt": "test", "model": "claude-sonnet-4-20250514", "write": False}
+        asyncio.get_event_loop().run_until_complete(provider.run_cogitate(config))
+
+        cmd = mock_runner_cls.call_args.kwargs["cmd"]
+        assert "--allowedTools" in cmd
+
+
+# ---------------------------------------------------------------------------
+# Write flag — OpenAI provider
+# ---------------------------------------------------------------------------
+
+
+class TestOpenAIWriteFlag:
+    """Verify sandbox mode is controlled by config write flag."""
+
+    def _provider(self):
+        return importlib.import_module("think.providers.openai")
+
+    @patch("think.providers.openai.CLIRunner")
+    def test_no_write_uses_readonly_sandbox(self, mock_runner_cls):
+        """Without write flag, sandbox is read-only."""
+        provider = self._provider()
+        mock_instance = AsyncMock()
+        mock_instance.run = AsyncMock(return_value="result")
+        mock_instance.cli_session_id = None
+        mock_runner_cls.return_value = mock_instance
+
+        config = {"prompt": "test", "model": "gpt-5.2"}
+        asyncio.get_event_loop().run_until_complete(provider.run_cogitate(config))
+
+        cmd = mock_runner_cls.call_args.kwargs["cmd"]
+        # Find the -s flag and its value
+        s_idx = cmd.index("-s")
+        assert cmd[s_idx + 1] == "read-only"
+
+    @patch("think.providers.openai.CLIRunner")
+    def test_write_true_uses_write_sandbox(self, mock_runner_cls):
+        """With write=True, sandbox is write."""
+        provider = self._provider()
+        mock_instance = AsyncMock()
+        mock_instance.run = AsyncMock(return_value="result")
+        mock_instance.cli_session_id = None
+        mock_runner_cls.return_value = mock_instance
+
+        config = {"prompt": "test", "model": "gpt-5.2", "write": True}
+        asyncio.get_event_loop().run_until_complete(provider.run_cogitate(config))
+
+        cmd = mock_runner_cls.call_args.kwargs["cmd"]
+        s_idx = cmd.index("-s")
+        assert cmd[s_idx + 1] == "write"
+
+    @patch("think.providers.openai.CLIRunner")
+    def test_write_true_with_session_resume(self, mock_runner_cls):
+        """Write flag works correctly with session resume path."""
+        provider = self._provider()
+        mock_instance = AsyncMock()
+        mock_instance.run = AsyncMock(return_value="result")
+        mock_instance.cli_session_id = None
+        mock_runner_cls.return_value = mock_instance
+
+        config = {
+            "prompt": "test",
+            "model": "gpt-5.2",
+            "write": True,
+            "session_id": "sess-123",
+        }
+        asyncio.get_event_loop().run_until_complete(provider.run_cogitate(config))
+
+        cmd = mock_runner_cls.call_args.kwargs["cmd"]
+        s_idx = cmd.index("-s")
+        assert cmd[s_idx + 1] == "write"
+        assert "resume" in cmd
+
+
+# ---------------------------------------------------------------------------
+# Write flag — Google provider
+# ---------------------------------------------------------------------------
+
+
+class TestGoogleWriteFlag:
+    """Verify --allowed-tools is controlled by config write flag."""
+
+    def _provider(self):
+        return importlib.import_module("think.providers.google")
+
+    @patch("think.providers.google.CLIRunner")
+    def test_no_write_restricts_tools(self, mock_runner_cls):
+        """Without write flag, --allowed-tools restricts to sol call."""
+        provider = self._provider()
+        mock_instance = AsyncMock()
+        mock_instance.run = AsyncMock(return_value="result")
+        mock_instance.cli_session_id = None
+        mock_runner_cls.return_value = mock_instance
+
+        config = {"prompt": "test", "model": "gemini-2.5-flash"}
+        asyncio.get_event_loop().run_until_complete(provider.run_cogitate(config))
+
+        cmd = mock_runner_cls.call_args.kwargs["cmd"]
+        assert "--allowed-tools" in cmd
+        assert "run_shell_command(sol call)" in cmd
+
+    @patch("think.providers.google.CLIRunner")
+    def test_write_true_grants_full_access(self, mock_runner_cls):
+        """With write=True, --allowed-tools is omitted."""
+        provider = self._provider()
+        mock_instance = AsyncMock()
+        mock_instance.run = AsyncMock(return_value="result")
+        mock_instance.cli_session_id = None
+        mock_runner_cls.return_value = mock_instance
+
+        config = {"prompt": "test", "model": "gemini-2.5-flash", "write": True}
+        asyncio.get_event_loop().run_until_complete(provider.run_cogitate(config))
+
+        cmd = mock_runner_cls.call_args.kwargs["cmd"]
+        assert "--allowed-tools" not in cmd
+
+
+# ---------------------------------------------------------------------------
+# sol call handoff command
+# ---------------------------------------------------------------------------
+
+
+class TestHandoffCommand:
+    """Tests for sol call handoff subcommand."""
+
+    @patch("think.cortex_client.cortex_request", return_value="1710864123456")
+    def test_handoff_success(self, mock_cortex):
+        """Handoff reads stdin, calls cortex_request, prints agent_id."""
+        result = runner.invoke(call_app, ["handoff", "coder"], input="Fix the bug\n")
+
+        assert result.exit_code == 0
+        assert "1710864123456" in result.output
+        mock_cortex.assert_called_once_with(prompt="Fix the bug", name="coder")
+
+    def test_handoff_empty_stdin(self):
+        """Empty stdin produces error and exit code 1."""
+        result = runner.invoke(call_app, ["handoff", "coder"], input="")
+
+        assert result.exit_code == 1
+        assert "no prompt" in result.output.lower() or "no prompt" in (
+            result.stderr or ""
+        ).lower()
+
+    def test_handoff_whitespace_only_stdin(self):
+        """Whitespace-only stdin produces error."""
+        result = runner.invoke(call_app, ["handoff", "coder"], input=" \n \n")
+
+        assert result.exit_code == 1
+
+    @patch("think.cortex_client.cortex_request", return_value=None)
+    def test_handoff_cortex_failure(self, mock_cortex):
+        """When cortex_request returns None, handoff reports error."""
+        result = runner.invoke(
+            call_app, ["handoff", "coder"], input="Fix the bug\n"
+        )
+
+        assert result.exit_code == 1
+        assert "failed" in result.output.lower() or "failed" in (
+            result.stderr or ""
+        ).lower()
+
+
+# ---------------------------------------------------------------------------
+# muse/coder.md existence and frontmatter
+# ---------------------------------------------------------------------------
+
+
+class TestCoderAgent:
+    """Verify muse/coder.md exists with correct frontmatter."""
+
+    def test_coder_md_exists(self):
+        """muse/coder.md must exist in the repo."""
+        from pathlib import Path
+
+        coder_path = Path(__file__).parent.parent / "muse" / "coder.md"
+        assert coder_path.exists(), "muse/coder.md not found"
+
+    def test_coder_frontmatter(self):
+        """coder.md must have write: true and type: cogitate."""
+        import frontmatter
+        from pathlib import Path
+
+        coder_path = Path(__file__).parent.parent / "muse" / "coder.md"
+        post = frontmatter.load(coder_path)
+
+        assert post.metadata.get("type") == "cogitate"
+        assert post.metadata.get("write") is True
+        assert post.metadata.get("title") == "Coder"
+        assert "description" in post.metadata
+
+    def test_coder_has_developer_instructions(self):
+        """coder.md body must contain development guidelines."""
+        from pathlib import Path
+
+        coder_path = Path(__file__).parent.parent / "muse" / "coder.md"
+        content = coder_path.read_text(encoding="utf-8")
+
+        # Should contain key sections from body skill content
+        assert "Development Guidelines" in content
+        assert "make test" in content
+        assert "Coding Standards" in content
+        assert "Project Structure" in content

+29

think/call.py

···
 
 import importlib
 import logging
+import sys
 from pathlib import Path
 
 import typer
···
     if facet:
         parts.append(f"[{facet}]")
     typer.echo(f"Navigate: {' '.join(parts)}")
+
+
+@call_app.command("handoff")
+def handoff(
+    agent: str = typer.Argument(help="Agent name to hand off to (e.g. coder)."),
+) -> None:
+    """Spawn a cogitate agent with a request from stdin (fire-and-forget).
+
+    Reads a prompt from stdin, sends it to cortex as an agent request,
+    prints the agent_id to stdout, and exits immediately.
+
+    Example::
+
+        echo 'Fix the matching bug' | sol call handoff coder
+    """
+    prompt = sys.stdin.read()
+    if not prompt.strip():
+        typer.echo("Error: no prompt provided on stdin.", err=True)
+        raise typer.Exit(1)
+
+    from think.cortex_client import cortex_request
+
+    agent_id = cortex_request(prompt=prompt.strip(), name=agent)
+    if agent_id is None:
+        typer.echo("Error: failed to send cortex request.", err=True)
+        raise typer.Exit(1)
+
+    typer.echo(agent_id)
 
 
 def main() -> None:
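
The branches of the `handoff` command can be exercised without Typer or a running cortex. This is a minimal sketch, assuming a hypothetical `dispatch` function and a stand-in `fake_send` callable in place of `cortex_request`; only the error messages and exit codes are taken from the diff.

```python
import io
from typing import Callable, Optional, TextIO


def dispatch(
    agent: str,
    stdin: TextIO,
    send: Callable[..., Optional[str]],
) -> tuple[int, str]:
    """Return (exit_code, message), following the handoff command's branches."""
    prompt = stdin.read().strip()
    if not prompt:
        # Empty or whitespace-only input is rejected before anything is sent.
        return 1, "Error: no prompt provided on stdin."

    agent_id = send(prompt=prompt, name=agent)
    if agent_id is None:
        return 1, "Error: failed to send cortex request."

    return 0, agent_id


def fake_send(prompt: str, name: str) -> Optional[str]:
    """Stand-in for cortex_request (assumption: it returns an id string or None)."""
    return "1710864123456"


ok = dispatch("coder", io.StringIO("Fix the bug\n"), fake_send)
empty = dispatch("coder", io.StringIO("  \n"), fake_send)
failed = dispatch("coder", io.StringIO("Fix the bug\n"), lambda **kwargs: None)
```

Passing the sender in as a parameter keeps the sketch testable. The real command imports `cortex_request` inside the function body, which is why the tests patch `think.cortex_client.cortex_request`, not a name in `think.call`.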

+4 -2

think/providers/anthropic.py

···
         "stream-json",
         "--permission-mode",
         "plan",
-        "--allowedTools",
-        "Bash(sol call *)",
         "--model",
         model,
     ]
+
+    # Restrict tool access unless write mode is enabled
+    if not config.get("write"):
+        cmd.extend(["--allowedTools", "Bash(sol call *)"])
 
     if system_instruction:
         cmd.extend(["--system-prompt", system_instruction])
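
The write-flag gating in this hunk reduces to a small pure function. This is a sketch assuming a hypothetical `tool_restriction_args` helper; the flag and pattern strings are copied from the diff, and the rest is illustrative only.

```python
def tool_restriction_args(config: dict) -> list[str]:
    """Return the extra CLI arguments that restrict tool access.

    Hypothetical helper: an empty list means no restriction is appended.
    """
    if config.get("write"):
        return []
    return ["--allowedTools", "Bash(sol call *)"]


missing_key = tool_restriction_args({"prompt": "test"})
explicit_false = tool_restriction_args({"prompt": "test", "write": False})
write_enabled = tool_restriction_args({"prompt": "test", "write": True})
```

Because the check is `config.get("write")`, a missing key and an explicit `False` behave the same way, so agents stay restricted unless their configuration opts in.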

+4 -2

think/providers/google.py

···
         "-o",
         "stream-json",
         "--yolo",
-        "--allowed-tools",
-        "run_shell_command(sol call)",
         "-m",
         model,
     ]
+
+    # Restrict tool access unless write mode is enabled
+    if not config.get("write"):
+        cmd.extend(["--allowed-tools", "run_shell_command(sol call)"])
 
     # Resume from previous session if continuing
     if session_id:

+5 -2

think/providers/openai.py

···
 
     # Build command — sandbox is read-only; "sol call" commands bypass
     # the sandbox via exec-policy rules in .codex/rules/solstone.rules
+    # Write-enabled agents get full sandbox access
+    sandbox = "write" if config.get("write") else "read-only"
+
     session_id = config.get("session_id")
     if session_id:
         cmd = [
···
             session_id,
             "--json",
             "-s",
-            "read-only",
+            sandbox,
             "-m",
             model,
         ]
     else:
-        cmd = ["codex", "exec", "--json", "-s", "read-only", "-m", model]
+        cmd = ["codex", "exec", "--json", "-s", sandbox, "-m", model]
 
     if effort:
         cmd.extend(["-c", f'model_reasoning_effort="{effort}"'])
