🐱 Medium-horizon agent planning MCP server
Remove flawed automatic validation setup

Slopped together, doesn't actually represent the intended use cases.

-1584
-137
validation/README.md
··· 1 - # 9plan Agent Validation System 2 - 3 - This directory contains infrastructure for validating the 9plan MCP server using the **"Claude sandwich"** pattern - an outer AI agent controls an inner AI agent to test the server in realistic conditions. 4 - 5 - ## Overview 6 - 7 - The validation system tests 9plan by having an outer Claude: 8 - 1. Spawn an inner Claude that thinks it's just building an app 9 - 2. Guide the inner Claude step-by-step through building "Notekeeper" (a test project) 10 - 3. Verify the inner Claude correctly uses 9plan tools 11 - 4. Check invariants and state between steps 12 - 13 - This provides more realistic testing than unit tests because the inner agent behaves naturally, not knowing it's being tested. 14 - 15 - ## Directory Structure 16 - 17 - ``` 18 - validation/ 19 - ├── README.md # This file 20 - ├── outer-claude-guide.md # Instructions for the outer Claude 21 - ├── sandbox/ 22 - │ └── notekeeper/ # Test project for inner Claude to build 23 - │ ├── package.json # Pre-configured 24 - │ ├── tsconfig.json # Pre-configured 25 - │ └── src/ # Empty - inner Claude populates this 26 - └── scenarios/ 27 - ├── schema.md # Scenario file format documentation 28 - ├── notekeeper-full.yaml # Complete Notekeeper build scenario 29 - ├── decomposition-test.yaml # Test plan decomposition workflow 30 - ├── error-conditions.yaml # Test error handling 31 - ├── dependency-resolution.yaml # Test semantic search for dependencies 32 - └── session-recovery.yaml # Test session resume after context loss 33 - ``` 34 - 35 - ## How It Works 36 - 37 - ### The "Claude Sandwich" Pattern 38 - 39 - ``` 40 - ┌─────────────────────────────────────────────────────┐ 41 - │ OUTER CLAUDE (the puppetmaster) │ 42 - │ │ │ 43 - │ ├─→ Reads outer-claude-guide.md │ 44 - │ ├─→ Loads scenario file (e.g., notekeeper-full) │ 45 - │ │ │ 46 - │ ├─→ Runs: claude -p "Create session" --json │ 47 - │ │ └─→ INNER CLAUDE creates 9plan session │ 48 - │ │ │ 49 - │ ├─→ Validates state 
(admin tools, filesystem) │ 50 - │ │ │ 51 - │ ├─→ Runs: claude -p "Add plans" --resume $sid │ 52 - │ │ └─→ INNER CLAUDE adds plans to queue │ 53 - │ │ │ 54 - │ ├─→ Validates state... │ 55 - │ │ │ 56 - │ └─→ Continues until scenario complete │ 57 - └─────────────────────────────────────────────────────┘ 58 - ``` 59 - 60 - ### Key Components 61 - 62 - 1. **Outer Claude** - Reads scenarios, runs commands, validates state 63 - 2. **Inner Claude** - Builds the app using 9plan (doesn't know it's a test) 64 - 3. **Notekeeper** - Simple CLI app used as test project 65 - 4. **Scenarios** - YAML files describing what to test and expected outcomes 66 - 5. **Admin Tools** - `9plan_admin_*` tools for state verification 67 - 68 - ## Running Validation 69 - 70 - ### Prerequisites 71 - 72 - 1. 9plan MCP server is built (`npm run build`) 73 - 2. `.mcp.json` is configured in project root 74 - 3. `claude` CLI is available 75 - 76 - ### Running a Scenario 77 - 78 - Ask your outer Claude (the one you're talking to): 79 - 80 - ``` 81 - Please run the 9plan validation scenario at validation/scenarios/notekeeper-full.yaml 82 - ``` 83 - 84 - The outer Claude will: 85 - 1. Read `outer-claude-guide.md` for instructions 86 - 2. Parse the scenario file 87 - 3. Execute step-by-step, spawning inner Claude instances 88 - 4. Validate state between steps 89 - 5. Report pass/fail with details 90 - 91 - ### Manual Testing 92 - 93 - You can also run individual steps manually. 
**Important**: You must include `--mcp-config` and `--allowedTools` flags: 94 - 95 - ```powershell 96 - # Define allowed tools 97 - $ALLOWED = "Read,Write,mcp__9plan__9plan_session_create,mcp__9plan__9plan_session_resume,mcp__9plan__9plan_queue_add,mcp__9plan__9plan_queue_pull,mcp__9plan__9plan_plan_complete,mcp__9plan__9plan_plan_defer,mcp__9plan__9plan_plan_discard,mcp__9plan__9plan_history_search,mcp__9plan__9plan_history_get" 98 - 99 - # Start a session 100 - $r = claude -p "Create a 9plan session for Notekeeper" --mcp-config ".mcp.json" --allowedTools $ALLOWED --output-format json 101 - $sid = ($r | ConvertFrom-Json).session_id 102 - 103 - # Continue with more prompts (using --resume)... 104 - claude -p "Add a plan for the storage module" --resume $sid --mcp-config ".mcp.json" --allowedTools $ALLOWED 105 - ``` 106 - 107 - **Note on `--allowedTools`**: MCP tools use the format `mcp__<server>__<toolname>`. The inner Claude cannot prompt for permissions in non-interactive mode, so all needed tools must be pre-approved. 108 - 109 - ## Scenarios 110 - 111 - | Scenario | Description | 112 - |----------|-------------| 113 - | `notekeeper-full` | Build complete Notekeeper app from scratch | 114 - | `decomposition-test` | Test plan decomposition and parent aggregation | 115 - | `error-conditions` | Verify error handling (pull with active, etc.) 
| 116 - | `dependency-resolution` | Test semantic search for input dependencies | 117 - | `session-recovery` | Test resuming session after context loss | 118 - 119 - ## Admin Tools 120 - 121 - These tools help the outer Claude verify state between steps: 122 - 123 - | Tool | Purpose | 124 - |------|---------| 125 - | `9plan_admin_validate` | Check all invariants hold | 126 - | `9plan_admin_sessions` | List all sessions | 127 - | `9plan_admin_state` | Dump detailed session state | 128 - 129 - ## Success Criteria 130 - 131 - Validation passes when: 132 - - [ ] All scenarios execute without errors 133 - - [ ] Invariants hold at every step (checked via admin tools) 134 - - [ ] Inner Claude successfully builds Notekeeper 135 - - [ ] Notekeeper CLI works (`add`, `list`, `search`, `delete` commands) 136 - - [ ] History search returns expected results 137 - - [ ] Decomposition/aggregation workflow completes correctly
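The removed README notes that MCP tools are named `mcp__<server>__<toolname>` and that every tool must be pre-approved through `--allowedTools`. A minimal sketch of how that long comma-separated value could be generated rather than hand-typed is shown here. The helper names (`mcpToolName`, `buildAllowedTools`) are hypothetical and not part of 9plan or the `claude` CLI; only the naming pattern and the tool names come from the README.

```typescript
// Hypothetical helpers: only the `mcp__<server>__<tool>` pattern and the
// tool names below are taken from the removed README.
const SERVER = "9plan";

const BUILTIN_TOOLS = ["Read", "Write"];

const NINEPLAN_TOOLS = [
  "9plan_session_create",
  "9plan_session_resume",
  "9plan_queue_add",
  "9plan_queue_pull",
  "9plan_plan_complete",
  "9plan_plan_defer",
  "9plan_plan_discard",
  "9plan_history_search",
  "9plan_history_get",
];

/** Qualify a bare MCP tool name as `mcp__<server>__<tool>`. */
function mcpToolName(server: string, tool: string): string {
  return `mcp__${server}__${tool}`;
}

/** Join built-in and qualified MCP tool names into one comma-separated value. */
function buildAllowedTools(server: string, builtins: string[], tools: string[]): string {
  return [...builtins, ...tools.map((t) => mcpToolName(server, t))].join(",");
}

const allowed = buildAllowedTools(SERVER, BUILTIN_TOOLS, NINEPLAN_TOOLS);
console.log(allowed);
```

Keeping the bare tool names in one list means a renamed or added tool is changed in a single place instead of in every copy of the flag value.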
-229
validation/outer-claude-guide.md
··· 1 - # Outer Claude Validation Guide 2 - 3 - You are the **outer Claude** - the puppetmaster who will control an inner Claude to validate the 9plan MCP server. This guide explains how to run validation scenarios. 4 - 5 - ## Pre-Validation Checklist 6 - 7 - Before starting, verify: 8 - 9 - - [ ] 9plan MCP server is built: Run `npm run build` in project root 10 - - [ ] `.mcp.json` exists in project root with 9plan configured 11 - - [ ] The `claude` CLI is available and authenticated 12 - - [ ] The sandbox project exists at `validation/sandbox/notekeeper/` 13 - 14 - ## Critical: CLI Flags for Inner Claude 15 - 16 - Every `claude -p` command MUST include these flags: 17 - 18 - ### `--mcp-config ".mcp.json"` 19 - Loads the 9plan MCP server configuration so the inner Claude can use 9plan tools. 20 - 21 - ### `--allowedTools "..."` 22 - Pre-approves tools for non-interactive use. The inner Claude cannot prompt for permissions in `-p` mode, so you must pre-approve all tools it will need. 23 - 24 - **Tool naming format**: MCP tools use the pattern `mcp__<server>__<tool>`. For 9plan, this is `mcp__9plan__<toolname>`. 25 - 26 - **Standard allowedTools for full validation:** 27 - ``` 28 - --allowedTools "Read,Write,mcp__9plan__9plan_session_create,mcp__9plan__9plan_session_resume,mcp__9plan__9plan_queue_add,mcp__9plan__9plan_queue_pull,mcp__9plan__9plan_plan_complete,mcp__9plan__9plan_plan_defer,mcp__9plan__9plan_plan_discard,mcp__9plan__9plan_history_search,mcp__9plan__9plan_history_get" 29 - ``` 30 - 31 - **Why `Read,Write`?** The inner Claude needs file access to read plan files and create implementation files. 32 - 33 - ## How to Run Validation 34 - 35 - ### Step 1: Load the Scenario 36 - 37 - Read the scenario file you want to run (e.g., `validation/scenarios/notekeeper-full.yaml`). 
This contains: 38 - - `task_description`: What to tell the inner Claude 39 - - `steps`: Sequence of prompts and expected outcomes 40 - - `verification`: Commands to run at the end 41 - 42 - ### Step 2: Start Inner Claude Session 43 - 44 - Run the first prompt to create a session: 45 - 46 - ```powershell 47 - $ALLOWED = "Read,Write,mcp__9plan__9plan_session_create,mcp__9plan__9plan_session_resume,mcp__9plan__9plan_queue_add,mcp__9plan__9plan_queue_pull,mcp__9plan__9plan_plan_complete,mcp__9plan__9plan_plan_defer,mcp__9plan__9plan_plan_discard,mcp__9plan__9plan_history_search,mcp__9plan__9plan_history_get" 48 - 49 - $result = claude -p "<first prompt from scenario>" --mcp-config ".mcp.json" --allowedTools $ALLOWED --output-format json 50 - $sessionId = ($result | ConvertFrom-Json).session_id 51 - ``` 52 - 53 - Save the `session_id` - you'll need it for `--resume` in subsequent steps. Also note: defining `$ALLOWED` once makes subsequent commands cleaner. 54 - 55 - ### Step 3: Validate State 56 - 57 - Between each step, check that the inner Claude did the right thing: 58 - 59 - 1. **Check filesystem** - Use `ls`, `cat`, or Read tool to verify files were created/modified 60 - 2. **Use admin tools** - Call `9plan_admin_state` to see queue/active plan state 61 - 3. **Check for errors** - Look for error messages in the inner Claude's response 62 - 63 - ### Step 4: Send Next Prompt 64 - 65 - Continue the conversation with `--resume`: 66 - 67 - ```powershell 68 - $result = claude -p "<next prompt from scenario>" --resume $sessionId --mcp-config ".mcp.json" --allowedTools $ALLOWED --output-format json 69 - ``` 70 - 71 - ### Step 5: Repeat Until Done 72 - 73 - Continue steps 3-4 until all prompts in the scenario are complete. 
74 - 75 - ### Step 6: Final Verification 76 - 77 - Run the verification commands from the scenario: 78 - 79 - ```powershell 80 - # Example: Test the Notekeeper CLI 81 - cd validation/sandbox/notekeeper 82 - npm run build 83 - node dist/index.js add "Test note" 84 - node dist/index.js list 85 - ``` 86 - 87 - ## Handling Failures 88 - 89 - ### Inner Claude Makes Mistake 90 - 91 - If the inner Claude does something unexpected: 92 - 1. Note what went wrong 93 - 2. Decide if this is a 9plan bug or expected agent behavior 94 - 3. You may need to guide the inner Claude with a corrective prompt 95 - 96 - ### State Verification Fails 97 - 98 - If `9plan_admin_validate` returns issues: 99 - 1. This is likely a 9plan bug 100 - 2. Document the exact state and what went wrong 101 - 3. Report the failure 102 - 103 - ### Inner Claude Gets Stuck 104 - 105 - If the inner Claude seems confused or stuck: 106 - 1. Try a more specific prompt 107 - 2. Check if the scenario description is unclear 108 - 3. You may need to provide hints 109 - 110 - ## Prompting the Inner Claude 111 - 112 - ### Good Prompts 113 - 114 - - Be specific about what you want done 115 - - Reference 9plan tools naturally (e.g., "create a session", "add a plan") 116 - - Don't mention that this is a test 117 - 118 - ### Bad Prompts 119 - 120 - - "Test the 9plan server" (reveals it's a test) 121 - - "Use 9plan_session_create" (too prescriptive about tool names) 122 - - Vague instructions that could be interpreted multiple ways 123 - 124 - ## Example Validation Flow 125 - 126 - ```powershell 127 - # Define allowed tools once 128 - $ALLOWED = "Read,Write,mcp__9plan__9plan_session_create,mcp__9plan__9plan_session_resume,mcp__9plan__9plan_queue_add,mcp__9plan__9plan_queue_pull,mcp__9plan__9plan_plan_complete,mcp__9plan__9plan_plan_defer,mcp__9plan__9plan_plan_discard,mcp__9plan__9plan_history_search,mcp__9plan__9plan_history_get" 129 - 130 - # Step 1: Create session 131 - $r1 = claude -p "I want to build a simple CLI 
note-taking app called Notekeeper. Start by creating a 9plan session to track this work." --mcp-config ".mcp.json" --allowedTools $ALLOWED --output-format json 132 - $sid = ($r1 | ConvertFrom-Json).session_id 133 - 134 - # Validate: Session should be created 135 - # - Check $env:LOCALAPPDATA/9plan/Data/sessions/ for new directory 136 - # - The inner Claude should tell you the session name 137 - 138 - # Step 2: Add plans 139 - $r2 = claude -p "Now let's plan out the work. Add plans for: 1) the storage module, 2) add command, 3) list command, 4) search command, 5) delete command. Add them in the order they should be executed." --resume $sid --mcp-config ".mcp.json" --allowedTools $ALLOWED --output-format json 140 - 141 - # Validate: Plans should be in queue 142 - # - Check plans/ directory in the session folder 143 - # - Verify plan files exist (e.g., k7f3m.txt) 144 - 145 - # Step 3: Execute first plan 146 - $r3 = claude -p "Pull the first plan and implement the storage module in validation/sandbox/notekeeper/src/storage.ts" --resume $sid --mcp-config ".mcp.json" --allowedTools $ALLOWED --output-format json 147 - 148 - # Validate: Storage module created 149 - # - Check validation/sandbox/notekeeper/src/storage.ts exists 150 - # - Check plan file was deleted from plans/ 151 - # - History should now contain the completed plan 152 - 153 - # Continue with remaining plans... 
154 - ``` 155 - 156 - ## Using Admin Tools 157 - 158 - Between steps, use these tools to verify state: 159 - 160 - ### 9plan_admin_validate 161 - 162 - Returns whether all invariants hold: 163 - ```json 164 - { 165 - "valid": true, 166 - "invariants": { 167 - "single_active_plan": true, 168 - "queue_order_preserved": true, 169 - "files_match_database": true 170 - }, 171 - "issues": [] 172 - } 173 - ``` 174 - 175 - ### 9plan_admin_state 176 - 177 - Returns detailed state dump: 178 - ```json 179 - { 180 - "session_name": "copper-velvet-morning", 181 - "queue": [ 182 - {"id": "k7f3m", "goal": "Create storage module", "position": 1} 183 - ], 184 - "active_plan": null, 185 - "completed_count": 0, 186 - "plan_files": ["k7f3m.txt"] 187 - } 188 - ``` 189 - 190 - ### 9plan_admin_sessions 191 - 192 - Lists all sessions: 193 - ```json 194 - { 195 - "sessions": [ 196 - {"name": "copper-velvet-morning", "created": "2024-01-15", "plans": 5} 197 - ] 198 - } 199 - ``` 200 - 201 - ## Success Criteria 202 - 203 - A scenario passes when: 204 - 205 - 1. **All prompts execute** - Inner Claude responds to each step 206 - 2. **State is valid at each step** - `9plan_admin_validate` returns no issues 207 - 3. **Expected outcomes match** - Files created, plans completed, etc. 208 - 4. **Verification passes** - Final commands work as expected 209 - 210 - ## Reporting Results 211 - 212 - After running a scenario, report: 213 - 214 - ``` 215 - ## Validation Report: [scenario-name] 216 - 217 - **Status**: PASS / FAIL 218 - 219 - **Steps Completed**: X/Y 220 - 221 - **Issues Found**: 222 - - (list any problems) 223 - 224 - **Inner Claude Behavior**: 225 - - (notes on how the inner Claude performed) 226 - 227 - **9plan Bugs Found**: 228 - - (list any bugs in the MCP server) 229 - ```
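The removed guide shows the JSON shape returned by `9plan_admin_validate` and says a step passes only when it reports no issues. As a rough sketch of how that check could be automated, the snippet below parses such a result and names the invariants that failed. The `ValidateResult` type is inferred from the single example in the guide, the element type of `issues` is assumed to be a string, and `summarize` is a hypothetical helper.

```typescript
// Shape inferred from the guide's example output; `issues` being an array
// of strings is an assumption, since the example only shows it empty.
interface ValidateResult {
  valid: boolean;
  invariants: Record<string, boolean>;
  issues: string[];
}

/** Names of all invariants reported as false. */
function failedInvariants(result: ValidateResult): string[] {
  return Object.entries(result.invariants)
    .filter(([, ok]) => !ok)
    .map(([name]) => name);
}

/** Reduce a raw JSON result to "PASS" or a "FAIL: ..." line. */
function summarize(raw: string): string {
  const result = JSON.parse(raw) as ValidateResult;
  const failed = failedInvariants(result);
  if (result.valid && failed.length === 0 && result.issues.length === 0) {
    return "PASS";
  }
  return `FAIL: ${[...failed, ...result.issues].join("; ")}`;
}

const sample = JSON.stringify({
  valid: false,
  invariants: {
    single_active_plan: false,
    queue_order_preserved: true,
    files_match_database: true,
  },
  issues: ["two plans marked active"],
});
console.log(summarize(sample)); // FAIL: single_active_plan; two plans marked active
```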
-24
validation/sandbox/notekeeper/package.json
-{
-  "name": "notekeeper",
-  "version": "1.0.0",
-  "description": "A simple CLI note-taking app for 9plan validation testing",
-  "type": "module",
-  "main": "dist/index.js",
-  "scripts": {
-    "build": "tsc",
-    "start": "node dist/index.js",
-    "clean": "rimraf dist"
-  },
-  "keywords": [
-    "cli",
-    "notes",
-    "validation"
-  ],
-  "author": "",
-  "license": "MIT",
-  "devDependencies": {
-    "typescript": "^5.0.0",
-    "rimraf": "^5.0.0",
-    "@types/node": "^20.0.0"
-  }
-}
-63
validation/sandbox/notekeeper/src/storage.ts
-/**
- * Storage module for notekeeper CLI app
- * Provides save/load functions for persisting notes data to disk
- */
-
-import { readFile, writeFile, mkdir } from 'node:fs/promises';
-import { dirname } from 'node:path';
-
-/** The shape of our stored data */
-export interface NotesData {
-  notes: Array<{
-    id: string;
-    content: string;
-    createdAt: string;
-  }>;
-}
-
-/** Default empty state when no data file exists */
-const DEFAULT_DATA: NotesData = {
-  notes: [],
-};
-
-/** Default storage file path */
-const DEFAULT_STORAGE_PATH = './data/notes.json';
-
-/**
- * Load notes data from disk
- * Returns default empty state if file doesn't exist
- *
- * @param filePath - Path to the storage file (defaults to ./data/notes.json)
- * @returns The loaded notes data, or default empty state if file not found
- */
-export async function load(filePath: string = DEFAULT_STORAGE_PATH): Promise<NotesData> {
-  try {
-    const content = await readFile(filePath, 'utf-8');
-    const data = JSON.parse(content) as NotesData;
-    return data;
-  } catch (error) {
-    // Handle file not found gracefully - return default state
-    if (error instanceof Error && 'code' in error && error.code === 'ENOENT') {
-      return { ...DEFAULT_DATA, notes: [] };
-    }
-    // Re-throw other errors (parse errors, permission errors, etc.)
-    throw error;
-  }
-}
-
-/**
- * Save notes data to disk
- * Creates parent directories if they don't exist
- *
- * @param data - The notes data to save
- * @param filePath - Path to the storage file (defaults to ./data/notes.json)
- */
-export async function save(data: NotesData, filePath: string = DEFAULT_STORAGE_PATH): Promise<void> {
-  // Ensure parent directory exists
-  const dir = dirname(filePath);
-  await mkdir(dir, { recursive: true });
-
-  // Write data with pretty formatting for readability
-  const content = JSON.stringify(data, null, 2);
-  await writeFile(filePath, content, 'utf-8');
-}
-20
validation/sandbox/notekeeper/tsconfig.json
-{
-  "compilerOptions": {
-    "target": "ES2022",
-    "module": "NodeNext",
-    "moduleResolution": "NodeNext",
-    "lib": ["ES2022"],
-    "outDir": "./dist",
-    "rootDir": "./src",
-    "strict": true,
-    "esModuleInterop": true,
-    "skipLibCheck": true,
-    "forceConsistentCasingInFileNames": true,
-    "resolveJsonModule": true,
-    "declaration": true,
-    "declarationMap": true,
-    "sourceMap": true
-  },
-  "include": ["src/**/*"],
-  "exclude": ["node_modules", "dist"]
-}
-152
validation/scenarios/decomposition-test.yaml
··· 1 - schema_version: "1.0" 2 - scenario_id: decomposition-test 3 - description: | 4 - Test plan decomposition and parent aggregation workflow. 5 - Verifies that parent plans can be deferred, child plans executed, 6 - and parent plans re-pulled for aggregation. 7 - 8 - prerequisites: 9 - 9plan_built: true 10 - 11 - steps: 12 - # Step 1: Create session 13 - - id: create_session 14 - description: "Create a session for decomposition testing" 15 - prompt: | 16 - Create a 9plan session to test building a small utility library. 17 - 18 - expected: 19 - tools_called: 20 - - 9plan_session_create 21 - state: 22 - session_exists: true 23 - 24 - # Step 2: Add parent plan that's too big 25 - - id: add_parent_plan 26 - description: "Add a plan that needs decomposition" 27 - prompt: | 28 - Add a plan for "Build complete utility library" that includes: 29 - - String utilities (capitalize, truncate, etc.) 30 - - Array utilities (unique, flatten, etc.) 31 - - Date utilities (format, parse, etc.) 32 - 33 - This is a big plan that covers multiple areas. 34 - 35 - expected: 36 - tools_called: 37 - - 9plan_queue_add 38 - state: 39 - queue_length: 1 40 - 41 - # Step 3: Pull and recognize need for decomposition 42 - - id: pull_and_decompose 43 - description: "Pull the plan and decompose it" 44 - prompt: | 45 - Pull the plan. This is too big to do in one go - let's break it down. 46 - 47 - Add three sub-plans: 48 - 1. String utilities module 49 - 2. Array utilities module 50 - 3. Date utilities module 51 - 52 - Then defer the parent plan to the back of the queue so we can 53 - aggregate the results later. 
54 - 55 - expected: 56 - tools_called: 57 - - 9plan_queue_pull 58 - - 9plan_queue_add # Adding subplans 59 - - 9plan_plan_defer 60 - state: 61 - # After: 3 subplans at front, parent at back = 4 total 62 - queue_length: 4 63 - 64 - validation: 65 - admin_validate: true 66 - # Verify parent was deferred with reason 67 - custom_check: | 68 - # The parent plan's notes should mention decomposition 69 - 70 - # Step 4: Execute first subplan 71 - - id: execute_subplan_1 72 - description: "Execute string utilities subplan" 73 - prompt: | 74 - Pull and complete the string utilities plan. 75 - Just describe what would be in it - no need to actually implement. 76 - 77 - expected: 78 - tools_called: 79 - - 9plan_queue_pull 80 - - 9plan_plan_complete 81 - state: 82 - queue_length: 3 83 - completed_count: 1 84 - 85 - # Step 5: Execute second subplan 86 - - id: execute_subplan_2 87 - description: "Execute array utilities subplan" 88 - prompt: | 89 - Pull and complete the array utilities plan. 90 - 91 - expected: 92 - tools_called: 93 - - 9plan_queue_pull 94 - - 9plan_plan_complete 95 - state: 96 - queue_length: 2 97 - completed_count: 2 98 - 99 - # Step 6: Execute third subplan 100 - - id: execute_subplan_3 101 - description: "Execute date utilities subplan" 102 - prompt: | 103 - Pull and complete the date utilities plan. 104 - 105 - expected: 106 - tools_called: 107 - - 9plan_queue_pull 108 - - 9plan_plan_complete 109 - state: 110 - queue_length: 1 # Only parent remains 111 - completed_count: 3 112 - 113 - # Step 7: Re-pull parent and aggregate 114 - - id: aggregate_parent 115 - description: "Re-pull parent plan and aggregate child outcomes" 116 - prompt: | 117 - Pull the remaining plan (the parent). 118 - Use 9plan_history_search or 9plan_history_get to find the child outcomes. 119 - Then complete the parent with an aggregated summary of all the work done. 
120 - 121 - expected: 122 - tools_called: 123 - - 9plan_queue_pull 124 - - 9plan_history_search # or 9plan_history_get 125 - - 9plan_plan_complete 126 - state: 127 - queue_length: 0 128 - completed_count: 4 129 - queue_empty: true 130 - 131 - validation: 132 - admin_validate: true 133 - 134 - verification: 135 - history_searches: 136 - - query: "string utilities" 137 - min_results: 1 138 - 139 - - query: "array utilities" 140 - min_results: 1 141 - 142 - - query: "date utilities" 143 - min_results: 1 144 - 145 - - query: "utility library" 146 - min_results: 1 147 - description: "Should find the parent plan" 148 - 149 - final_state: 150 - queue_empty: true 151 - all_plans_completed: true 152 - completed_count: 4
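The `queue_length` and `completed_count` values expected by the removed decomposition scenario (4, then 3, 2, 1, 0 with four completions) follow from a simple queue model: one active plan at a time, new plans appended, and a deferred plan sent to the back. The sketch below is an in-memory model written only to make that arithmetic checkable. It is not the 9plan implementation, and the class, method names, and error messages are hypothetical.

```typescript
// Simplified in-memory model, NOT the 9plan server: one active plan at a
// time, `add` appends to the queue, `defer` sends the active plan to the back.
interface Plan {
  id: string;
  goal: string;
}

class PlanQueue {
  private queue: Plan[] = [];
  private active: Plan | null = null;
  private completed: Plan[] = [];

  /** Append a plan to the back of the queue. */
  add(plan: Plan): void {
    this.queue.push(plan);
  }

  /** Make the plan at the front of the queue the single active plan. */
  pull(): Plan {
    if (this.active !== null) throw new Error("a plan is already active");
    const next = this.queue.shift();
    if (next === undefined) throw new Error("queue is empty");
    this.active = next;
    return next;
  }

  /** Move the active plan into history. */
  complete(): void {
    if (this.active === null) throw new Error("no active plan");
    this.completed.push(this.active);
    this.active = null;
  }

  /** Send the active plan to the back of the queue. */
  defer(): void {
    if (this.active === null) throw new Error("no active plan");
    this.queue.push(this.active);
    this.active = null;
  }

  get queueLength(): number {
    return this.queue.length;
  }

  get completedCount(): number {
    return this.completed.length;
  }
}

// Replay the scenario: pull the parent, add three sub-plans, defer the parent.
const q = new PlanQueue();
q.add({ id: "parent", goal: "Build complete utility library" });
q.pull();
for (const goal of ["String utilities", "Array utilities", "Date utilities"]) {
  q.add({ id: goal.toLowerCase().replace(" ", "-"), goal });
}
q.defer();

const lengths: number[] = [q.queueLength];
for (let i = 0; i < 4; i++) {
  q.pull();
  q.complete();
  lengths.push(q.queueLength);
}
console.log(lengths, q.completedCount); // [ 4, 3, 2, 1, 0 ] 4
```

Because the queue is empty while the parent is active, appending the sub-plans and then deferring the parent leaves the sub-plans ahead of it, which matches the scenario's "3 subplans at front, parent at back" comment.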
-148
validation/scenarios/dependency-resolution.yaml
··· 1 - schema_version: "1.0" 2 - scenario_id: dependency-resolution 3 - description: | 4 - Test semantic search for resolving input dependencies between plans. 5 - Verifies that 9plan_history_search correctly finds completed plans 6 - that match input descriptions. 7 - 8 - prerequisites: 9 - 9plan_built: true 10 - 11 - steps: 12 - # Step 1: Create session 13 - - id: create_session 14 - description: "Create a session for dependency testing" 15 - prompt: | 16 - Create a 9plan session to build a simple API client library. 17 - 18 - expected: 19 - tools_called: 20 - - 9plan_session_create 21 - state: 22 - session_exists: true 23 - 24 - # Step 2: Add foundation plan 25 - - id: add_foundation 26 - description: "Add a plan that produces outputs other plans will need" 27 - prompt: | 28 - Add a plan for creating an HTTP client wrapper. 29 - The outputs should include: 30 - - httpClient module with get(), post(), put(), delete() methods 31 - - Error handling types (ApiError, NetworkError) 32 - 33 - expected: 34 - tools_called: 35 - - 9plan_queue_add 36 - state: 37 - queue_length: 1 38 - 39 - # Step 3: Add dependent plan 40 - - id: add_dependent 41 - description: "Add a plan that depends on the foundation" 42 - prompt: | 43 - Add a plan for creating user API methods. 44 - The inputs should reference the httpClient module from the previous plan. 45 - The outputs should include: 46 - - userApi module with getUser(), createUser(), updateUser() methods 47 - 48 - expected: 49 - tools_called: 50 - - 9plan_queue_add 51 - state: 52 - queue_length: 2 53 - 54 - # Step 4: Add another dependent plan 55 - - id: add_another_dependent 56 - description: "Add another plan that also depends on the foundation" 57 - prompt: | 58 - Add a plan for creating posts API methods. 59 - The inputs should also reference the httpClient module. 
60 - The outputs should include: 61 - - postsApi module with getPosts(), createPost() methods 62 - 63 - expected: 64 - tools_called: 65 - - 9plan_queue_add 66 - state: 67 - queue_length: 3 68 - 69 - # Step 5: Execute foundation plan 70 - - id: execute_foundation 71 - description: "Pull and complete the HTTP client plan" 72 - prompt: | 73 - Pull the first plan (HTTP client) and complete it. 74 - Describe what was created in the outcome - mention the httpClient module, 75 - the get/post/put/delete methods, and the error types. 76 - 77 - expected: 78 - tools_called: 79 - - 9plan_queue_pull 80 - - 9plan_plan_complete 81 - state: 82 - queue_length: 2 83 - completed_count: 1 84 - 85 - # Step 6: Execute dependent plan with dependency resolution 86 - - id: execute_with_resolution 87 - description: "Pull user API plan and resolve its dependency" 88 - prompt: | 89 - Pull the next plan (user API). 90 - Before implementing, search the history for "httpClient module" to find 91 - where the HTTP client was created. Then complete the plan, mentioning 92 - that you found and used the httpClient from the previous work. 93 - 94 - expected: 95 - tools_called: 96 - - 9plan_queue_pull 97 - - 9plan_history_search 98 - - 9plan_plan_complete 99 - state: 100 - queue_length: 1 101 - completed_count: 2 102 - response_contains: 103 - - "httpClient" # Should mention finding the dependency 104 - 105 - validation: 106 - # Verify the search actually found the foundation plan 107 - custom_check: | 108 - # history_search for "httpClient" should return 1 result 109 - 110 - # Step 7: Execute final plan with same dependency 111 - - id: execute_final 112 - description: "Pull posts API plan and resolve same dependency" 113 - prompt: | 114 - Pull the final plan (posts API). 115 - Search history for the httpClient again and complete the plan. 
116 - 117 - expected: 118 - tools_called: 119 - - 9plan_queue_pull 120 - - 9plan_history_search 121 - - 9plan_plan_complete 122 - state: 123 - queue_length: 0 124 - completed_count: 3 125 - 126 - verification: 127 - # Verify semantic search works for various queries 128 - history_searches: 129 - - query: "httpClient get post" 130 - min_results: 1 131 - description: "Should find HTTP client plan" 132 - 133 - - query: "user API getUser" 134 - min_results: 1 135 - description: "Should find user API plan" 136 - 137 - - query: "posts API createPost" 138 - min_results: 1 139 - description: "Should find posts API plan" 140 - 141 - - query: "ApiError NetworkError" 142 - min_results: 1 143 - description: "Should find HTTP client by error types" 144 - 145 - final_state: 146 - queue_empty: true 147 - all_plans_completed: true 148 - completed_count: 3
-138
validation/scenarios/error-conditions.yaml
··· 1 - schema_version: "1.0" 2 - scenario_id: error-conditions 3 - description: | 4 - Test error handling for invalid operations. 5 - Verifies that 9plan returns appropriate errors for invalid state transitions. 6 - 7 - prerequisites: 8 - 9plan_built: true 9 - 10 - steps: 11 - # Step 1: Create session 12 - - id: create_session 13 - description: "Create a session for error testing" 14 - prompt: | 15 - Create a 9plan session for testing error conditions. 16 - 17 - expected: 18 - tools_called: 19 - - 9plan_session_create 20 - state: 21 - session_exists: true 22 - 23 - # Step 2: Try to pull from empty queue 24 - - id: pull_empty_queue 25 - description: "Try to pull when queue is empty" 26 - prompt: | 27 - Try to pull a plan from the queue. 28 - The queue should be empty, so this should fail or indicate there's nothing to pull. 29 - 30 - expected: 31 - tools_called: 32 - - 9plan_queue_pull 33 - response_contains: 34 - - "empty" # Should mention queue is empty 35 - 36 - # Step 3: Try to complete without active plan 37 - - id: complete_no_active 38 - description: "Try to complete when no plan is active" 39 - prompt: | 40 - Try to complete a plan with outcome "test". 41 - There's no active plan, so this should fail. 42 - 43 - expected: 44 - tools_called: 45 - - 9plan_plan_complete 46 - response_contains: 47 - - "no active" # Should mention no active plan 48 - 49 - # Step 4: Try to defer without active plan 50 - - id: defer_no_active 51 - description: "Try to defer when no plan is active" 52 - prompt: | 53 - Try to defer a plan with reason "testing". 54 - There's no active plan, so this should fail. 55 - 56 - expected: 57 - tools_called: 58 - - 9plan_plan_defer 59 - response_contains: 60 - - "no active" 61 - 62 - # Step 5: Add a plan and pull it 63 - - id: setup_active_plan 64 - description: "Add and pull a plan to set up active state" 65 - prompt: | 66 - Add a simple test plan and then pull it so we have an active plan. 
67 - 68 - expected: 69 - tools_called: 70 - - 9plan_queue_add 71 - - 9plan_queue_pull 72 - state: 73 - active_plan: true 74 - 75 - # Step 6: Try to pull again while plan is active 76 - - id: pull_while_active 77 - description: "Try to pull when a plan is already active" 78 - prompt: | 79 - Try to pull another plan. 80 - We already have an active plan, so this should fail. 81 - 82 - expected: 83 - tools_called: 84 - - 9plan_queue_pull 85 - response_contains: 86 - - "already active" # Should mention plan already active 87 - 88 - # Step 7: Clean up - complete the active plan 89 - - id: cleanup 90 - description: "Complete the active plan to clean up" 91 - prompt: | 92 - Complete the active plan with outcome "test completed". 93 - 94 - expected: 95 - tools_called: 96 - - 9plan_plan_complete 97 - state: 98 - active_plan: false 99 - completed_count: 1 100 - 101 - # Step 8: Try to get non-existent plan from history 102 - - id: history_get_invalid 103 - description: "Try to get a plan that doesn't exist" 104 - prompt: | 105 - Try to get plan details for a non-existent plan ID like "xxxxx". 106 - 107 - expected: 108 - tools_called: 109 - - 9plan_history_get 110 - response_contains: 111 - - "not found" # Should indicate plan not found 112 - 113 - # Step 9: Try to resume non-existent session 114 - - id: resume_invalid_session 115 - description: "Try to resume a session that doesn't exist" 116 - prompt: | 117 - Try to resume a session called "nonexistent-fake-session". 
118 - 119 - expected: 120 - tools_called: 121 - - 9plan_session_resume 122 - response_contains: 123 - - "not found" # Should indicate session not found 124 - 125 - verification: 126 - final_state: 127 - # After all tests, should have 1 completed plan 128 - completed_count: 1 129 - queue_empty: true 130 - 131 - # All error conditions should have been tested 132 - error_tests_passed: 133 - - pull_empty_queue 134 - - complete_no_active 135 - - defer_no_active 136 - - pull_while_active 137 - - history_get_invalid 138 - - resume_invalid_session
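Each step in the removed error-conditions scenario lists `response_contains` fragments such as "empty", "no active", and "already active". How the outer agent was meant to match them is not specified in the scenario. A minimal sketch, assuming case-insensitive substring matching and a hypothetical `missingFragments` helper:

```typescript
/**
 * Return the expected fragments that do not occur in the response.
 * Case-insensitive substring matching is an assumption; the scenario
 * files do not say how `response_contains` is evaluated.
 */
function missingFragments(response: string, expected: string[]): string[] {
  const haystack = response.toLowerCase();
  return expected.filter((fragment) => !haystack.includes(fragment.toLowerCase()));
}

// Illustrative responses only; these are not real 9plan or agent output.
const ok = missingFragments("There is No Active plan to complete.", ["no active"]);
const bad = missingFragments("Pulled plan k7f3m.", ["already active"]);
console.log(ok, bad); // [] [ 'already active' ]
```

An empty result means every fragment was found. Matching free-form agent text this way is brittle, since a correct response may phrase the same error differently.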
-175
validation/scenarios/notekeeper-full.yaml
···
-schema_version: "1.0"
-scenario_id: notekeeper-full
-description: |
-  Complete Notekeeper CLI application build from scratch.
-  Tests the full 9plan workflow: session creation, planning, execution, and completion.
-
-prerequisites:
-  9plan_built: true
-  sandbox_clean: true  # validation/sandbox/notekeeper/src/ should be empty
-
-# The overall task the inner Claude is working on
-task_description: |
-  Build a simple CLI note-taking application called Notekeeper.
-  It should support: add, list, search, and delete commands.
-  Notes are stored in a JSON file.
-
-steps:
-  # Step 1: Create the session
-  - id: create_session
-    description: "Create a 9plan session for the Notekeeper project"
-    prompt: |
-      I want to build a simple CLI note-taking app called Notekeeper.
-      It should let users add notes, list all notes, search notes, and delete notes.
-      Notes will be stored in a JSON file.
-
-      Start by creating a 9plan session to track this work.
-
-    expected:
-      tools_called:
-        - 9plan_session_create
-      state:
-        session_exists: true
-        queue_length: 0
-      response_contains:
-        - "Session"
-
-    validation:
-      admin_validate: true
-
-  # Step 2: Bootstrap plans
-  - id: bootstrap_plans
-    description: "Add initial plans for all components"
-    prompt: |
-      Now let's plan out the work. Think at the FEATURE level, not individual functions.
-
-      We need two main pieces:
-      1. Storage layer - Note type definition and JSON file persistence (loadNotes, saveNotes, generateId)
-      2. CLI layer - All commands (add, list, search, delete) plus the entry point that routes to them
-
-      Add these as 2 plans in the order they should be executed.
-      The storage module should be first since the CLI depends on it.
-
-    expected:
-      tools_called:
-        - 9plan_queue_add
-      state:
-        queue_length: 2  # 2 plans: storage layer, CLI layer
-      files:
-        - pattern: "~/.9plan/sessions/*/plans/*.txt"
-          count: 2
-
-    validation:
-      admin_validate: true
-
-  # Step 3: Execute storage layer
-  - id: execute_storage
-    description: "Pull and implement the storage layer"
-    prompt: |
-      Pull the first plan and implement the storage layer.
-      Create the files in validation/sandbox/notekeeper/src/.
-
-      The Note type should have: id (string), content (string), createdAt (string).
-      The storage module should export: loadNotes(), saveNotes(), generateId().
-
-    expected:
-      tools_called:
-        - 9plan_queue_pull
-        - 9plan_plan_complete
-      state:
-        queue_length: 1
-        completed_count: 1
-      files:
-        - pattern: "validation/sandbox/notekeeper/src/types.ts"
-          exists: true
-          contains:
-            - "interface Note"
-            - "id"
-            - "content"
-        - pattern: "validation/sandbox/notekeeper/src/storage.ts"
-          exists: true
-          contains:
-            - "loadNotes"
-            - "saveNotes"
-
-    validation:
-      admin_validate: true
-
-  # Step 4: Execute CLI layer (all commands + entry point)
-  - id: execute_cli
-    description: "Pull and implement all CLI commands and entry point"
-    prompt: |
-      Pull the final plan and implement the complete CLI layer:
-      - add command: takes content string, saves new note
-      - list command: displays all notes with IDs and content
-      - search command: finds notes by keyword
-      - delete command: removes note by ID
-      - Entry point: parses args and routes to the right command
-
-      Usage: notekeeper <command> [args]
-      Commands: add <content>, list, search <keyword>, delete <id>
-
-      Create these in validation/sandbox/notekeeper/src/commands/ and src/index.ts.
-
-    expected:
-      tools_called:
-        - 9plan_queue_pull
-        - 9plan_plan_complete
-      state:
-        queue_length: 0
-        completed_count: 2
-        queue_empty: true
-      files:
-        - pattern: "validation/sandbox/notekeeper/src/commands/add.ts"
-          exists: true
-        - pattern: "validation/sandbox/notekeeper/src/commands/list.ts"
-          exists: true
-        - pattern: "validation/sandbox/notekeeper/src/commands/search.ts"
-          exists: true
-        - pattern: "validation/sandbox/notekeeper/src/commands/delete.ts"
-          exists: true
-        - pattern: "validation/sandbox/notekeeper/src/index.ts"
-          exists: true
-
-    validation:
-      admin_validate: true
-
-# Final verification after all steps
-verification:
-  # Build the project
-  commands:
-    - command: "cd validation/sandbox/notekeeper && npm install"
-      description: "Install dependencies"
-      success: true
-
-    - command: "cd validation/sandbox/notekeeper && npm run build"
-      description: "Build TypeScript"
-      success: true
-
-    - command: "node validation/sandbox/notekeeper/dist/index.js add \"Test note from validation\""
-      description: "Test add command"
-      output_contains: "added"
-
-    - command: "node validation/sandbox/notekeeper/dist/index.js list"
-      description: "Test list command"
-      output_contains: "Test note from validation"
-
-    - command: "node validation/sandbox/notekeeper/dist/index.js search test"
-      description: "Test search command"
-      output_contains: "Test note from validation"
-
-  # Verify history search works
-  history_searches:
-    - query: "storage loadNotes saveNotes"
-      min_results: 1
-      description: "Should find the storage layer plan"
-
-    - query: "CLI commands add list search delete"
-      min_results: 1
-      description: "Should find the CLI layer plan"
-
-  # Final state check
-  final_state:
-    queue_empty: true
-    all_plans_completed: true
-    completed_count: 2
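The `expected.files` entries in `notekeeper-full.yaml` pair a glob `pattern` with `exists`, `count`, and `contains` checks. The removed files do not show how the outer Claude was meant to evaluate them, so the following is only a minimal Python sketch of one plausible reading. The function name `check_file_expectation` and the `root` parameter are hypothetical and not part of 9plan.

```python
import glob
import os
from pathlib import Path


def check_file_expectation(entry: dict, root: str = ".") -> list:
    """Return failure messages for one `expected.files` entry (empty list = pass).

    Assumption: `pattern` is a glob, `~` expands to the home directory, and
    relative patterns are resolved against `root` (the repository root).
    """
    pattern = os.path.expanduser(entry["pattern"])
    if not os.path.isabs(pattern):
        pattern = os.path.join(root, pattern)
    matches = sorted(glob.glob(pattern))
    failures = []

    # `exists: true/false` compares against whether anything matched.
    if "exists" in entry and bool(matches) != entry["exists"]:
        failures.append(f"{entry['pattern']}: exists={bool(matches)}, expected {entry['exists']}")

    # `count: N` requires exactly N matching paths.
    if "count" in entry and len(matches) != entry["count"]:
        failures.append(f"{entry['pattern']}: matched {len(matches)}, expected {entry['count']}")

    # `contains` requires every substring to appear in every matched file.
    for needle in entry.get("contains", []):
        for path in matches:
            if needle not in Path(path).read_text(encoding="utf-8"):
                failures.append(f"{path}: missing {needle!r}")

    return failures
```

Returning a list of messages rather than a boolean lets a harness report every failed expectation for a step at once.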
-232
validation/scenarios/schema.md
···
-# Validation Scenario File Schema
-
-This document describes the YAML format for validation scenario files.
-
-## Overview
-
-Scenario files define a sequence of steps for the outer Claude to execute against an inner Claude, along with expected outcomes and verification commands.
-
-## Schema Version
-
-All scenario files must specify the schema version:
-
-```yaml
-schema_version: "1.0"
-```
-
-## Top-Level Fields
-
-| Field | Type | Required | Description |
-|-------|------|----------|-------------|
-| `schema_version` | string | Yes | Schema version (currently "1.0") |
-| `scenario_id` | string | Yes | Unique identifier for this scenario |
-| `description` | string | Yes | Human-readable description |
-| `prerequisites` | object | No | What must be true before running |
-| `steps` | array | Yes | Sequence of prompts and validations |
-| `verification` | object | No | Final verification commands |
-
-## Prerequisites
-
-Optional conditions that must be met before running:
-
-```yaml
-prerequisites:
-  9plan_built: true           # npm run build completed
-  sandbox_clean: true         # sandbox/notekeeper/src/ is empty
-  no_existing_sessions: true  # no 9plan sessions exist
-```
-
-## Steps
-
-Each step contains a prompt to send and expected outcomes:
-
-```yaml
-steps:
-  - id: create_session
-    description: "Create a 9plan session for the project"
-    prompt: |
-      I want to build a simple CLI note-taking app called Notekeeper.
-      Start by creating a 9plan session to track this work.
-
-    expected:
-      # What the inner Claude should do
-      tools_called:
-        - 9plan_session_create
-
-      # State after this step
-      state:
-        session_exists: true
-        queue_length: 0
-        active_plan: null
-
-      # Files that should exist
-      files:
-        - pattern: "~/.9plan/sessions/*/session.db"
-          exists: true
-        - pattern: "~/.9plan/sessions/*/plans/"
-          is_directory: true
-
-    # Optional: How to validate
-    validation:
-      admin_validate: true  # Run 9plan_admin_validate
-      custom_check: |
-        # PowerShell to run for custom validation
-        Test-Path ~/.9plan/sessions/*/session.db
-```
-
-## Step Fields
-
-| Field | Type | Required | Description |
-|-------|------|----------|-------------|
-| `id` | string | Yes | Unique step identifier |
-| `description` | string | Yes | What this step does |
-| `prompt` | string | Yes | Prompt to send to inner Claude |
-| `expected` | object | No | Expected outcomes |
-| `validation` | object | No | How to validate this step |
-| `on_failure` | string | No | What to do if step fails ("abort", "continue", "retry") |
-
-## Expected Outcomes
-
-### tools_called
-
-List of MCP tools the inner Claude should call:
-
-```yaml
-expected:
-  tools_called:
-    - 9plan_session_create
-    - 9plan_queue_add
-```
-
-### state
-
-Expected 9plan state after this step:
-
-```yaml
-expected:
-  state:
-    session_exists: true
-    queue_length: 3
-    active_plan: null
-    completed_count: 0
-```
-
-### files
-
-Expected file system state:
-
-```yaml
-expected:
-  files:
-    - pattern: "validation/sandbox/notekeeper/src/storage.ts"
-      exists: true
-      contains:
-        - "export function loadNotes"
-        - "export function saveNotes"
-    - pattern: "~/.9plan/sessions/*/plans/*.txt"
-      count: 3  # Exactly 3 plan files
-```
-
-### response_contains
-
-Keywords that should appear in inner Claude's response:
-
-```yaml
-expected:
-  response_contains:
-    - "session created"
-    - "Session:"
-```
-
-## Verification
-
-Final verification after all steps complete:
-
-```yaml
-verification:
-  # Commands to run
-  commands:
-    - command: "cd validation/sandbox/notekeeper && npm run build"
-      success: true
-
-    - command: "node validation/sandbox/notekeeper/dist/index.js add 'Test note'"
-      output_contains: "Note added"
-
-    - command: "node validation/sandbox/notekeeper/dist/index.js list"
-      output_contains: "Test note"
-
-  # History searches to verify
-  history_searches:
-    - query: "storage module"
-      min_results: 1
-
-    - query: "add command"
-      results_contain:
-        goal_keywords: ["add", "command"]
-
-  # Final state check
-  final_state:
-    queue_empty: true
-    all_plans_completed: true
-```
-
-## Complete Example
-
-```yaml
-schema_version: "1.0"
-scenario_id: simple-session-test
-description: "Test basic session creation and plan lifecycle"
-
-prerequisites:
-  9plan_built: true
-
-steps:
-  - id: create_session
-    description: "Create a session"
-    prompt: "Create a 9plan session for testing"
-    expected:
-      tools_called:
-        - 9plan_session_create
-      state:
-        session_exists: true
-    validation:
-      admin_validate: true
-
-  - id: add_plan
-    description: "Add a test plan"
-    prompt: "Add a plan to test something simple"
-    expected:
-      tools_called:
-        - 9plan_queue_add
-      state:
-        queue_length: 1
-
-  - id: pull_plan
-    description: "Pull the plan"
-    prompt: "Pull the plan and mark it complete with a simple outcome"
-    expected:
-      tools_called:
-        - 9plan_queue_pull
-        - 9plan_plan_complete
-      state:
-        queue_length: 0
-        completed_count: 1
-
-verification:
-  history_searches:
-    - query: "test"
-      min_results: 1
-
-  final_state:
-    queue_empty: true
-```
-
-## Scenario Files in This Directory
-
-| File | Description |
-|------|-------------|
-| `notekeeper-full.yaml` | Complete Notekeeper build scenario |
-| `decomposition-test.yaml` | Test plan decomposition and aggregation |
-| `error-conditions.yaml` | Test error handling |
-| `dependency-resolution.yaml` | Test semantic search for dependencies |
-| `session-recovery.yaml` | Test session resume after context loss |
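The field tables in `schema.md` mark `schema_version`, `scenario_id`, `description`, and `steps` as required at the top level, and `id`, `description`, and `prompt` as required per step. As a rough illustration of how those rules could be enforced on an already-parsed scenario (a plain `dict`, however it was loaded), here is a small Python sketch. The name `validate_scenario` is hypothetical, and treating "required" as "key present" is an assumption made so that steps with `prompt: null`, as used in `session-recovery.yaml`, still pass.

```python
REQUIRED_TOP_LEVEL = ("schema_version", "scenario_id", "description", "steps")
REQUIRED_STEP = ("id", "description", "prompt")
ON_FAILURE_VALUES = {"abort", "continue", "retry"}


def validate_scenario(scenario: dict) -> list:
    """Return schema violations for a parsed scenario (empty list = valid)."""
    errors = []

    for field in REQUIRED_TOP_LEVEL:
        if field not in scenario:
            errors.append(f"missing top-level field: {field}")

    seen_ids = set()
    for index, step in enumerate(scenario.get("steps", [])):
        # Key presence only: a step may carry `prompt: null`.
        for field in REQUIRED_STEP:
            if field not in step:
                errors.append(f"step {index}: missing field: {field}")

        # Step ids are documented as unique.
        step_id = step.get("id")
        if step_id is not None:
            if step_id in seen_ids:
                errors.append(f"step {index}: duplicate id: {step_id}")
            seen_ids.add(step_id)

        # `on_failure` is optional but limited to three documented values.
        on_failure = step.get("on_failure")
        if on_failure is not None and on_failure not in ON_FAILURE_VALUES:
            errors.append(f"step {index}: unknown on_failure value: {on_failure}")

    return errors
```

Note that the scenario files also use keys the tables do not list (`task_description`, `notes`, `special_instruction`), so this sketch deliberately ignores unknown keys instead of rejecting them.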
-153
validation/scenarios/session-recovery.yaml
···
-schema_version: "1.0"
-scenario_id: session-recovery
-description: |
-  Test session resume after context loss.
-  Simulates a scenario where the agent loses context and needs to resume
-  a session using 9plan_session_resume.
-
-prerequisites:
-  9plan_built: true
-
-# Special instruction for outer Claude:
-# After step 3, you will start a NEW inner Claude session (new --resume chain)
-# to simulate context loss. The new session should use session_resume.
-
-steps:
-  # Phase 1: Initial work (first inner Claude session)
-
-  - id: create_session
-    description: "Create a session and do some work"
-    prompt: |
-      Create a 9plan session for building a calculator app.
-
-    expected:
-      tools_called:
-        - 9plan_session_create
-      state:
-        session_exists: true
-      # IMPORTANT: Outer Claude must save the session name for later
-
-  - id: add_plans
-    description: "Add several plans"
-    prompt: |
-      Add plans for:
-      1. Basic operations (add, subtract, multiply, divide)
-      2. Advanced operations (power, sqrt, log)
-      3. Memory functions (store, recall, clear)
-
-    expected:
-      tools_called:
-        - 9plan_queue_add
-      state:
-        queue_length: 3
-
-  - id: start_work
-    description: "Pull and complete the first plan"
-    prompt: |
-      Pull the first plan (basic operations) and complete it.
-
-    expected:
-      tools_called:
-        - 9plan_queue_pull
-        - 9plan_plan_complete
-      state:
-        queue_length: 2
-        completed_count: 1
-
-  # --- CONTEXT LOSS SIMULATION ---
-  # Outer Claude: Start a new inner Claude session here (no --resume)
-  # This simulates the agent losing context mid-task
-
-  - id: context_loss
-    description: "Simulate context loss by starting fresh inner Claude"
-    special_instruction: |
-      OUTER CLAUDE: Start a completely new inner Claude session.
-      Do NOT use --resume. This simulates the agent losing all context.
-      Save the session name from step 1 to give to the new session.
-    prompt: null  # No prompt - this is an instruction for outer Claude
-
-  # Phase 2: Recovery (new inner Claude session)
-
-  - id: resume_session
-    description: "Resume the session using session name"
-    # Outer Claude should tell the new inner Claude about the session
-    prompt: |
-      You were working on a calculator app but lost context.
-      Resume the 9plan session named "{SESSION_NAME_FROM_STEP_1}".
-      (Outer Claude: substitute the actual session name here)
-
-    expected:
-      tools_called:
-        - 9plan_session_resume
-      state:
-        session_exists: true
-      response_contains:
-        - "resumed"
-
-  - id: check_state_after_resume
-    description: "Check the queue state after resuming"
-    prompt: |
-      Check what plans are in the queue. You should see some plans remaining
-      from before the context loss.
-
-    expected:
-      state:
-        queue_length: 2     # 2 plans should remain
-        completed_count: 1  # 1 was completed before
-
-  - id: continue_work
-    description: "Continue working after resume"
-    prompt: |
-      Pull the next plan and complete it.
-
-    expected:
-      tools_called:
-        - 9plan_queue_pull
-        - 9plan_plan_complete
-      state:
-        queue_length: 1
-        completed_count: 2
-
-  - id: finish_work
-    description: "Complete the remaining work"
-    prompt: |
-      Pull and complete the final plan.
-
-    expected:
-      tools_called:
-        - 9plan_queue_pull
-        - 9plan_plan_complete
-      state:
-        queue_length: 0
-        completed_count: 3
-
-verification:
-  # Verify all plans were completed despite context loss
-  history_searches:
-    - query: "basic operations add subtract"
-      min_results: 1
-      description: "First plan should be in history"
-
-    - query: "advanced operations power sqrt"
-      min_results: 1
-      description: "Second plan should be in history"
-
-    - query: "memory functions store recall"
-      min_results: 1
-      description: "Third plan should be in history"
-
-  final_state:
-    queue_empty: true
-    all_plans_completed: true
-    completed_count: 3
-
-# Notes for outer Claude:
-notes: |
-  This scenario is special because it requires starting a new inner Claude
-  session mid-way through to simulate context loss.
-
-  Steps 1-3: Use one inner Claude session (with --resume between steps)
-  Step 4: Instruction to outer Claude - no inner Claude action
-  Steps 5-8: Use a NEW inner Claude session (fresh start, no --resume from before)
-
-  The key validation is that session_resume allows recovery and work continues.
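The notes in `session-recovery.yaml` describe the driving logic in prose: steps 1-3 share one inner conversation, step 4 carries no prompt, and steps 5-8 run in a fresh conversation that receives the saved session name. A possible way to turn that into a plan of inner invocations is sketched below in Python. The function `plan_invocations` is hypothetical, and treating every `prompt: null` step as a context-loss boundary is an assumption drawn from this one scenario, not a documented rule.

```python
PLACEHOLDER = "{SESSION_NAME_FROM_STEP_1}"


def plan_invocations(steps: list, session_name: str) -> list:
    """Return (step_id, prompt, fresh) tuples for the steps that reach inner Claude.

    `fresh` is True when the prompt should start a new inner conversation
    (no --resume) and False when it should continue the previous one.
    """
    invocations = []
    fresh = True  # the very first prompt always starts a new conversation
    for step in steps:
        prompt = step.get("prompt")
        if prompt is None:
            # Outer-only instruction: nothing is sent, but the next prompt
            # must start from scratch to simulate the lost context.
            fresh = True
            continue
        # Fill in the session name the outer Claude saved after step 1.
        invocations.append((step["id"], prompt.replace(PLACEHOLDER, session_name), fresh))
        fresh = False
    return invocations
```

For the scenario above this yields three chained prompts, a break, and then a new chain whose first prompt names the session to resume.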
-113
validation/scripts/invoke-inner.ps1
···
-<#
-.SYNOPSIS
-    Invoke inner Claude with pre-approved 9plan tools
-.DESCRIPTION
-    Helper script for the "Claude sandwich" validation pattern.
-    Bundles the --mcp-config and --allowedTools flags so you don't have to type them every time.
-.PARAMETER Prompt
-    The prompt to send to inner Claude
-.PARAMETER Resume
-    Optional session ID to resume a previous conversation
-.PARAMETER OutputFormat
-    Output format: "text" (default) or "json"
-.EXAMPLE
-    .\invoke-inner.ps1 -Prompt "Create a 9plan session for Notekeeper"
-.EXAMPLE
-    .\invoke-inner.ps1 -Prompt "Add plans for storage module" -Resume "abc123" -OutputFormat json
-#>
-param(
-    [Parameter(Mandatory=$true)]
-    [string]$Prompt,
-
-    [Parameter(Mandatory=$false)]
-    [string]$Resume,
-
-    [Parameter(Mandatory=$false)]
-    [ValidateSet("text", "json")]
-    [string]$OutputFormat = "text",
-
-    [Parameter(Mandatory=$false)]
-    [switch]$Bootstrap
-)
-
-# All 9plan MCP tools + file access for implementation
-$ALLOWED_TOOLS = @(
-    # File operations (needed for reading plans and writing code)
-    "Read",
-    "Write",
-    # 9plan session tools
-    "mcp__9plan__9plan_session_create",
-    "mcp__9plan__9plan_session_resume",
-    # 9plan queue tools
-    "mcp__9plan__9plan_queue_add",
-    "mcp__9plan__9plan_queue_pull",
-    # 9plan plan lifecycle tools
-    "mcp__9plan__9plan_plan_complete",
-    "mcp__9plan__9plan_plan_defer",
-    "mcp__9plan__9plan_plan_discard",
-    # 9plan history tools
-    "mcp__9plan__9plan_history_search",
-    "mcp__9plan__9plan_history_get",
-    # 9plan admin tools (for validation)
-    "mcp__9plan__9plan_admin_validate",
-    "mcp__9plan__9plan_admin_sessions",
-    "mcp__9plan__9plan_admin_state"
-) -join ","
-
-# Bootstrap prompt content (condensed from src/prompts/bootstrap.ts)
-$BOOTSTRAP_PROMPT = @"
-You have access to 9plan, a work queue system for tracking complex tasks.
-
-WORKFLOW:
-1. CREATE SESSION: Use 9plan_session_create with a task description
-2. DECOMPOSE: Break the task into discrete, self-contained plans
-3. ENQUEUE: Add plans with 9plan_queue_add (use "back" position, add in dependency order)
-4. EXECUTE: Pull plans with 9plan_queue_pull, implement, then complete with 9plan_plan_complete
-
-PLAN SCOPE - AVOID OVER-DECOMPOSITION:
-Plans should be at the FEATURE level, not the function level.
-- GOOD: "Implement CLI commands for notes (add, list, search, delete)" - groups related functionality
-- BAD: "Implement add command" then "Implement list command" - too granular, should be ONE plan
-- NEVER A PLAN: "Write tests for X", "Commit changes", "Run the build" - these are part of completing plans, not separate plans
-Rule of thumb: A simple CLI app needs 1-2 plans, not 6-7.
-
-PLAN STRUCTURE - Each plan MUST have:
-- Context: Where this fits in the overall task
-- Goal: Specific, measurable objective (at FEATURE level, not function level)
-- Inputs: What this plan needs from other plans (by description)
-- Outputs: What this plan produces that others may need
-- Approach: Concrete, actionable steps. Include enough detail (code samples, structure, edge cases) that the plan is executable without additional context.
-- Testing: REQUIRED! Specific commands to verify the implementation works (e.g., "Run npm run build - should compile without errors")
-- Success Criteria: How you'll know it's done
-
-TESTING IS REQUIRED - Every plan must have specific verification commands. Before completing a plan, RUN THE TESTS.
-Good: "Run node dist/index.js list - should show all notes"
-Bad: "(none)" or "verify it works" - TOO VAGUE, will be rejected
-
-DEPENDENCY RESOLUTION:
-- Plans reference dependencies by description, not ID
-- Use 9plan_history_search to find completed plan outputs when needed
-
-Use these tools to track work formally so progress survives context limits.
-"@
-
-# Build the command
-$cmdArgs = @(
-    "-p", $Prompt,
-    "--mcp-config", ".mcp.json",
-    "--allowedTools", $ALLOWED_TOOLS,
-    "--output-format", $OutputFormat
-)
-
-if ($Bootstrap) {
-    $cmdArgs += "--append-system-prompt"
-    $cmdArgs += $BOOTSTRAP_PROMPT
-}
-
-if ($Resume) {
-    $cmdArgs += "--resume"
-    $cmdArgs += $Resume
-}
-
-# Execute
-& claude @cmdArgs
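The argument assembly in `invoke-inner.ps1` is the part of the script most likely to be reused outside PowerShell, so a rough Python equivalent is sketched here. It mirrors only what the script does: the flag names are copied from the script and have not been checked against any particular `claude` CLI version, and the function name `build_claude_args` is hypothetical.

```python
def build_claude_args(prompt, allowed_tools, output_format="text",
                      resume=None, bootstrap_prompt=None, mcp_config=".mcp.json"):
    """Build the argument list that invoke-inner.ps1 passes to `claude`.

    Flags and their order follow the removed script; nothing is executed here.
    """
    # Mirrors the script's [ValidateSet("text", "json")] on -OutputFormat.
    if output_format not in ("text", "json"):
        raise ValueError(f"unsupported output format: {output_format}")

    args = [
        "-p", prompt,
        "--mcp-config", mcp_config,
        "--allowedTools", ",".join(allowed_tools),  # script joins tools with ","
        "--output-format", output_format,
    ]

    # -Bootstrap appends the condensed bootstrap text as extra system prompt.
    if bootstrap_prompt:
        args += ["--append-system-prompt", bootstrap_prompt]

    # -Resume continues an earlier inner conversation by its ID.
    if resume:
        args += ["--resume", resume]

    return args
```

A caller would pass the result to something like `subprocess.run(["claude", *args])`. Keeping the list construction separate from execution makes it possible to check the arguments without spawning an inner agent.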