feat: add agent validation infrastructure · karashiiro.moe/9plan@853d71c

+12

.mcp.json

+ {
+   "mcpServers": {
+     "9plan": {
+       "type": "stdio",
+       "command": "node",
+       "args": ["dist/index.js"],
+       "env": {
+         "NINEPLAN_LOG_LEVEL": "info"
+       }
+     }
+   }
+ }

+161

docs/api-examples.md

··· 535 535 536 536 --- 537 537 538 + ## Admin Tools 539 + 540 + ### 9plan_admin_validate 541 + 542 + Checks session state against all invariants. 543 + 544 + **Request:** 545 + ```json 546 + { 547 + "tool": "9plan_admin_validate", 548 + "arguments": {} 549 + } 550 + ``` 551 + 552 + **Response (Success - all invariants pass):** 553 + ``` 554 + [Session: amber-quiet-river] 555 + 556 + Validation passed: All invariants hold. 557 + 558 + Invariants: 559 + ✓ single_active_plan: At most one plan is active 560 + ✓ queue_order_preserved: Queue positions are sequential 561 + ✓ files_match_database: Plan files exist for all queued/active plans 562 + ✓ no_orphaned_files: No plan files without database entries 563 + 564 + Issues: (none) 565 + ``` 566 + 567 + **Response (Issues detected):** 568 + ``` 569 + [Session: amber-quiet-river] 570 + 571 + Validation failed: 2 issues detected. 572 + 573 + Invariants: 574 + ✓ single_active_plan: At most one plan is active 575 + ✓ queue_order_preserved: Queue positions are sequential 576 + ✗ files_match_database: Plan files exist for all queued/active plans 577 + ✗ no_orphaned_files: No plan files without database entries 578 + 579 + Issues: 580 + - Missing plan file: plans/k7f3m.txt (plan k7f3m is queued) 581 + - Orphaned file: plans/old123.txt (no database entry) 582 + ``` 583 + 584 + --- 585 + 586 + ### 9plan_admin_sessions 587 + 588 + Lists all sessions on the system. 589 + 590 + **Request:** 591 + ```json 592 + { 593 + "tool": "9plan_admin_sessions", 594 + "arguments": {} 595 + } 596 + ``` 597 + 598 + **Response (Success):** 599 + ``` 600 + Sessions found: 3 601 + 602 + 1. amber-quiet-river 603 + Created: 2024-01-15 10:30 604 + Task: Build Ghost-powered blog application 605 + Queue: 2 plans 606 + Completed: 4 plans 607 + Active: Yes (k7f3m) 608 + 609 + 2. copper-velvet-morning 610 + Created: 2024-01-14 14:00 611 + Task: Implement user authentication 612 + Queue: 0 plans 613 + Completed: 6 plans 614 + Active: No 615 + 616 + 3. 
silver-ocean-dawn 617 + Created: 2024-01-13 09:15 618 + Task: Refactor database layer 619 + Queue: 3 plans 620 + Completed: 1 plan 621 + Active: No 622 + ``` 623 + 624 + **Response (No sessions):** 625 + ``` 626 + No sessions found. 627 + 628 + Use 9plan_session_create to create a new session. 629 + ``` 630 + 631 + --- 632 + 633 + ### 9plan_admin_state 634 + 635 + Returns detailed state dump for current session. 636 + 637 + **Request:** 638 + ```json 639 + { 640 + "tool": "9plan_admin_state", 641 + "arguments": {} 642 + } 643 + ``` 644 + 645 + **Response (Success):** 646 + ``` 647 + [Session: amber-quiet-river] 648 + 649 + Session State Dump 650 + ================== 651 + 652 + Session: amber-quiet-river 653 + Task: Build Ghost-powered blog application with authentication, post fetching, pagination, and response caching 654 + 655 + Queue (2 plans): 656 + Position 1: m2x9p - "Implement response caching system" 657 + Position 2: p4r2k - "Build post display layer with pagination" 658 + 659 + Active Plan: 660 + ID: k7f3m 661 + Goal: Create authenticated Ghost API client module 662 + File: /Users/dev/.9plan/sessions/amber-quiet-river/plans/k7f3m.txt 663 + 664 + Completed Plans: 4 665 + 666 + Plan Files in Directory: 667 + - k7f3m.txt (active) 668 + - m2x9p.txt (queued) 669 + - p4r2k.txt (queued) 670 + 671 + File/Database Match: ✓ Yes 672 + ``` 673 + 674 + **Response (Empty session):** 675 + ``` 676 + [Session: amber-quiet-river] 677 + 678 + Session State Dump 679 + ================== 680 + 681 + Session: amber-quiet-river 682 + Task: Build Ghost-powered blog application 683 + 684 + Queue: (empty) 685 + 686 + Active Plan: (none) 687 + 688 + Completed Plans: 6 689 + 690 + Plan Files in Directory: (none) 691 + 692 + File/Database Match: ✓ Yes 693 + 694 + Task complete! All plans have been executed. 695 + ``` 696 + 697 + --- 698 + 538 699 ## Common Patterns 539 700 540 701 ### Resolving Dependencies

+46

docs/design.md

··· 166 166 167 167 Returns the full completed plan including context, goal, inputs, outputs, approach, success criteria, notes, and outcome. Use this when you know the specific plan ID (e.g., from Notes indicating which child plans were created during decomposition). 168 168 169 + ### Admin Tools 170 + 171 + These tools support validation and debugging. They are prefixed with `admin` to distinguish them from core workflow tools. 172 + 173 + **`9plan_admin_validate`** 174 + Checks session state against all invariants. 175 + 176 + Returns: 177 + - `valid`: boolean indicating if all invariants pass 178 + - `invariants`: object with status of each invariant check 179 + - `single_active_plan`: at most one plan is active 180 + - `queue_order_preserved`: queue positions are sequential 181 + - `files_match_database`: plan files exist for all queued/active plans 182 + - `no_orphaned_files`: no plan files without database entries 183 + - `issues`: array of any detected problems 184 + 185 + Use this to verify the session is in a consistent state after operations. 186 + 187 + **`9plan_admin_sessions`** 188 + Lists all sessions on the system. 189 + 190 + Returns: 191 + - `sessions`: array of session objects with: 192 + - `name`: three-word identifier 193 + - `created`: creation timestamp 194 + - `task_description`: overall task summary 195 + - `queue_length`: number of plans in queue 196 + - `completed_count`: number of completed plans 197 + - `has_active_plan`: whether a plan is currently active 198 + 199 + Use this to find existing sessions or clean up old ones. 200 + 201 + **`9plan_admin_state`** 202 + Returns detailed state dump for the current session. 
203 + 204 + Returns: 205 + - `session_name`: current session identifier 206 + - `task_description`: overall task summary 207 + - `queue`: array of queued plans (id, goal, position) 208 + - `active_plan`: currently active plan (id, goal, file_path) or null 209 + - `completed_count`: number of completed plans 210 + - `plan_files`: list of files in plans/ directory 211 + - `file_db_match`: whether files match database state 212 + 213 + Use this for detailed inspection during debugging or validation. 214 + 169 215 --- 170 216 171 217 ## Prompts

+96 -3

docs/implementation-plan.md

··· 65 65 │ 9plan MCP Server │ 66 66 │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │ 67 67 │ │ Tools │ │ Prompts │ │ Server Instructions │ │ 68 - │ │ (9 total) │ │ (bootstrap) │ │ (plan format spec) │ │ 68 + │ │ (12 total) │ │ (bootstrap) │ │ (plan format spec) │ │ 69 69 │ └──────┬──────┘ └─────────────┘ └─────────────────────────┘ │ 70 70 │ │ │ 71 71 │ ┌──────▼──────────────────────────────────────────────────┐ │ ··· 213 213 │ │ ├── plan-complete.ts 214 214 │ │ ├── plan-discard.ts 215 215 │ │ ├── history-search.ts 216 - │ │ └── history-get.ts 216 + │ │ ├── history-get.ts 217 + │ │ ├── admin-validate.ts # Invariant checking 218 + │ │ ├── admin-sessions.ts # List all sessions 219 + │ │ └── admin-state.ts # Detailed state dump 217 220 │ │ 218 221 │ ├── prompts/ 219 222 │ │ └── bootstrap.ts ··· 502 505 503 506 --- 504 507 508 + ### 6.5 Admin Tools 509 + 510 + Admin tools support validation and debugging. Prefixed with `admin_` to distinguish from core workflow tools. 511 + 512 + #### `9plan_admin_validate` 513 + 514 + Checks session state against all invariants. 515 + 516 + **Input**: None (uses current session) 517 + 518 + **Output**: 519 + ```typescript 520 + { 521 + valid: boolean, 522 + invariants: { 523 + single_active_plan: boolean, 524 + queue_order_preserved: boolean, 525 + files_match_database: boolean, 526 + no_orphaned_files: boolean 527 + }, 528 + issues: string[] 529 + } 530 + ``` 531 + 532 + **Use Case**: Verify session consistency during debugging or validation testing. 533 + 534 + --- 535 + 536 + #### `9plan_admin_sessions` 537 + 538 + Lists all sessions on the system. 
539 + 540 + **Input**: None 541 + 542 + **Output**: 543 + ```typescript 544 + { 545 + sessions: Array<{ 546 + name: string, 547 + created: string, 548 + task_description: string | null, 549 + queue_length: number, 550 + completed_count: number, 551 + has_active_plan: boolean 552 + }> 553 + } 554 + ``` 555 + 556 + **Use Case**: Find existing sessions, clean up old ones, verify session creation. 557 + 558 + --- 559 + 560 + #### `9plan_admin_state` 561 + 562 + Returns detailed state dump for current session. 563 + 564 + **Input**: None (uses current session) 565 + 566 + **Output**: 567 + ```typescript 568 + { 569 + session_name: string, 570 + task_description: string | null, 571 + queue: Array<{ id: string, goal: string, position: number }>, 572 + active_plan: { id: string, goal: string, file_path: string } | null, 573 + completed_count: number, 574 + plan_files: string[], 575 + file_db_match: boolean 576 + } 577 + ``` 578 + 579 + **Use Case**: Detailed inspection during validation or debugging. 580 + 581 + --- 582 + 505 583 ## 7. 
Prompt Specifications 506 584 507 585 ### `bootstrap` ··· 955 1033 - [ ] FTS5 query implementation 956 1034 - [ ] Unit tests for search 957 1035 958 - ### Phase 6: Polish 1036 + ### Phase 6: Admin Tools 1037 + 1038 + - [ ] `9plan_admin_validate` 1039 + - [ ] `9plan_admin_sessions` 1040 + - [ ] `9plan_admin_state` 1041 + - [ ] Admin tool unit tests 1042 + 1043 + ### Phase 7: Polish 959 1044 960 1045 - [ ] `bootstrap` prompt 961 1046 - [ ] Server instructions 962 1047 - [ ] Error messages 963 1048 - [ ] Integration tests 964 1049 - [ ] Documentation 1050 + 1051 + ### Phase 8: Agent Validation Infrastructure 1052 + 1053 + - [ ] `validation/README.md` - Human overview 1054 + - [ ] `validation/outer-claude-guide.md` - Instructions for validation agent 1055 + - [ ] `validation/scenarios/*.yaml` - Validation scenario files 1056 + - [ ] `validation/sandbox/notekeeper/` - Test project scaffold 1057 + - [ ] `.mcp.json` - MCP configuration for testing 965 1058 966 1059 --- 967 1060
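Reviewer sketch for section 6.5: one way the four invariants behind `9plan_admin_validate` could be checked. This is not the implementation from this commit. `PlanRow`, `validateSession`, and the in-memory inputs are hypothetical stand-ins for the real store. It assumes queue positions are 1-based (as in the `9plan_admin_state` example output) and that files of completed plans are removed from `plans/`, so any file without a queued or active plan counts as orphaned.

```typescript
// Hypothetical snapshot of the plan table; the real schema is not shown in this diff.
interface PlanRow {
  id: string;
  status: "queued" | "active" | "completed";
  position: number | null; // assumed 1-based queue position, null unless queued
}

// Mirrors the output shape documented for 9plan_admin_validate.
interface ValidationResult {
  valid: boolean;
  invariants: {
    single_active_plan: boolean;
    queue_order_preserved: boolean;
    files_match_database: boolean;
    no_orphaned_files: boolean;
  };
  issues: string[];
}

function validateSession(plans: PlanRow[], planFiles: string[]): ValidationResult {
  const issues: string[] = [];

  // single_active_plan: at most one plan is active
  const active = plans.filter((p) => p.status === "active");
  const singleActive = active.length <= 1;
  if (!singleActive) {
    issues.push(`Multiple active plans: ${active.map((p) => p.id).join(", ")}`);
  }

  // queue_order_preserved: sorted positions must be exactly 1..n
  const positions = plans
    .filter((p) => p.status === "queued")
    .map((p) => p.position)
    .sort((a, b) => (a ?? 0) - (b ?? 0));
  const orderOk = positions.every((pos, i) => pos === i + 1);
  if (!orderOk) {
    issues.push(`Queue positions are not sequential: ${positions.join(", ")}`);
  }

  // files_match_database: every queued/active plan has a file
  const fileSet = new Set(planFiles);
  const live = plans.filter((p) => p.status !== "completed");
  const missing = live.filter((p) => !fileSet.has(`${p.id}.txt`));
  for (const p of missing) {
    issues.push(`Missing plan file: plans/${p.id}.txt (plan ${p.id} is ${p.status})`);
  }

  // no_orphaned_files: every file belongs to a queued/active plan (assumption)
  const expected = new Set(live.map((p) => `${p.id}.txt`));
  const orphaned = planFiles.filter((f) => !expected.has(f));
  for (const f of orphaned) {
    issues.push(`Orphaned file: plans/${f} (no database entry)`);
  }

  return {
    valid: issues.length === 0,
    invariants: {
      single_active_plan: singleActive,
      queue_order_preserved: orderOk,
      files_match_database: missing.length === 0,
      no_orphaned_files: orphaned.length === 0,
    },
    issues,
  };
}
```

Keeping the check a pure function over a snapshot would let the Phase 6 unit tests exercise each invariant without touching the filesystem or database.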

+25 -1

src/tools/queue-pull.ts

···
    const plan = store.pullPlan();
    const planPath = store.getSessionPath() + `/plans/${plan.id}.txt`;

+   // Format plan content for display
+   const planContent = `## Context
+ ${plan.context ?? "(none)"}
+
+ ## Goal
+ ${plan.goal}
+
+ ## Inputs
+ ${plan.inputs ?? "(none)"}
+
+ ## Outputs
+ ${plan.outputs ?? "(none)"}
+
+ ## Approach
+ ${plan.approach ?? "(none)"}
+
+ ## Success Criteria
+ ${plan.successCriteria ?? "(none)"}
+
+ ## Notes
+ ${plan.notes ?? "(none)"}`;
+
    const response = formatResponse(
      sessionName,
      `Active plan: ${plan.id}
  Path: ${planPath}

- Read the plan file for full context. Review for any ambiguities before starting execution.
+ ${planContent}

+ ---
+ Review for any ambiguities before starting execution.
  If the plan has inputs from other plans, use 9plan_history_search to find their outputs.
  If the plan's Notes indicate it was previously decomposed, use 9plan_history_get to retrieve child outcomes.`,
    );
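Reviewer sketch: the inline template in this hunk could be factored into a small helper so the section list lives in one place. `PlanLike` and `formatPlanContent` are hypothetical names. The field names come from the hunk, but the optional/nullable typing is an assumption about the real plan type.

```typescript
// Hypothetical shape; only the field names are taken from the diff.
interface PlanLike {
  id: string;
  goal: string;
  context?: string | null;
  inputs?: string | null;
  outputs?: string | null;
  approach?: string | null;
  successCriteria?: string | null;
  notes?: string | null;
}

// Renders the same "## Section" layout as the inline template,
// falling back to "(none)" for empty sections.
function formatPlanContent(plan: PlanLike): string {
  const sections: Array<[string, string | null | undefined]> = [
    ["Context", plan.context],
    ["Goal", plan.goal],
    ["Inputs", plan.inputs],
    ["Outputs", plan.outputs],
    ["Approach", plan.approach],
    ["Success Criteria", plan.successCriteria],
    ["Notes", plan.notes],
  ];
  return sections
    .map(([title, body]) => `## ${title}\n${body ?? "(none)"}`)
    .join("\n\n");
}
```

A helper like this would also be easy to unit test independently of `store.pullPlan()`.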

+137

validation/README.md

··· 1 + # 9plan Agent Validation System 2 + 3 + This directory contains infrastructure for validating the 9plan MCP server using the **"Claude sandwich"** pattern - an outer AI agent controls an inner AI agent to test the server in realistic conditions. 4 + 5 + ## Overview 6 + 7 + The validation system tests 9plan by having an outer Claude: 8 + 1. Spawn an inner Claude that thinks it's just building an app 9 + 2. Guide the inner Claude step-by-step through building "Notekeeper" (a test project) 10 + 3. Verify the inner Claude correctly uses 9plan tools 11 + 4. Check invariants and state between steps 12 + 13 + This provides more realistic testing than unit tests because the inner agent behaves naturally, not knowing it's being tested. 14 + 15 + ## Directory Structure 16 + 17 + ``` 18 + validation/ 19 + ├── README.md # This file 20 + ├── outer-claude-guide.md # Instructions for the outer Claude 21 + ├── sandbox/ 22 + │ └── notekeeper/ # Test project for inner Claude to build 23 + │ ├── package.json # Pre-configured 24 + │ ├── tsconfig.json # Pre-configured 25 + │ └── src/ # Empty - inner Claude populates this 26 + └── scenarios/ 27 + ├── schema.md # Scenario file format documentation 28 + ├── notekeeper-full.yaml # Complete Notekeeper build scenario 29 + ├── decomposition-test.yaml # Test plan decomposition workflow 30 + ├── error-conditions.yaml # Test error handling 31 + ├── dependency-resolution.yaml # Test semantic search for dependencies 32 + └── session-recovery.yaml # Test session resume after context loss 33 + ``` 34 + 35 + ## How It Works 36 + 37 + ### The "Claude Sandwich" Pattern 38 + 39 + ``` 40 + ┌─────────────────────────────────────────────────────┐ 41 + │ OUTER CLAUDE (the puppetmaster) │ 42 + │ │ │ 43 + │ ├─→ Reads outer-claude-guide.md │ 44 + │ ├─→ Loads scenario file (e.g., notekeeper-full) │ 45 + │ │ │ 46 + │ ├─→ Runs: claude -p "Create session" --json │ 47 + │ │ └─→ INNER CLAUDE creates 9plan session │ 48 + │ │ │ 49 + │ ├─→ Validates state 
(admin tools, filesystem) │ 50 + │ │ │ 51 + │ ├─→ Runs: claude -p "Add plans" --resume $sid │ 52 + │ │ └─→ INNER CLAUDE adds plans to queue │ 53 + │ │ │ 54 + │ ├─→ Validates state... │ 55 + │ │ │ 56 + │ └─→ Continues until scenario complete │ 57 + └─────────────────────────────────────────────────────┘ 58 + ``` 59 + 60 + ### Key Components 61 + 62 + 1. **Outer Claude** - Reads scenarios, runs commands, validates state 63 + 2. **Inner Claude** - Builds the app using 9plan (doesn't know it's a test) 64 + 3. **Notekeeper** - Simple CLI app used as test project 65 + 4. **Scenarios** - YAML files describing what to test and expected outcomes 66 + 5. **Admin Tools** - `9plan_admin_*` tools for state verification 67 + 68 + ## Running Validation 69 + 70 + ### Prerequisites 71 + 72 + 1. 9plan MCP server is built (`npm run build`) 73 + 2. `.mcp.json` is configured in project root 74 + 3. `claude` CLI is available 75 + 76 + ### Running a Scenario 77 + 78 + Ask your outer Claude (the one you're talking to): 79 + 80 + ``` 81 + Please run the 9plan validation scenario at validation/scenarios/notekeeper-full.yaml 82 + ``` 83 + 84 + The outer Claude will: 85 + 1. Read `outer-claude-guide.md` for instructions 86 + 2. Parse the scenario file 87 + 3. Execute step-by-step, spawning inner Claude instances 88 + 4. Validate state between steps 89 + 5. Report pass/fail with details 90 + 91 + ### Manual Testing 92 + 93 + You can also run individual steps manually. 
**Important**: You must include `--mcp-config` and `--allowedTools` flags: 94 + 95 + ```powershell 96 + # Define allowed tools 97 + $ALLOWED = "Read,Write,mcp__9plan__9plan_session_create,mcp__9plan__9plan_session_resume,mcp__9plan__9plan_queue_add,mcp__9plan__9plan_queue_pull,mcp__9plan__9plan_plan_complete,mcp__9plan__9plan_plan_defer,mcp__9plan__9plan_plan_discard,mcp__9plan__9plan_history_search,mcp__9plan__9plan_history_get" 98 + 99 + # Start a session 100 + $r = claude -p "Create a 9plan session for Notekeeper" --mcp-config ".mcp.json" --allowedTools $ALLOWED --output-format json 101 + $sid = ($r | ConvertFrom-Json).session_id 102 + 103 + # Continue with more prompts (using --resume)... 104 + claude -p "Add a plan for the storage module" --resume $sid --mcp-config ".mcp.json" --allowedTools $ALLOWED 105 + ``` 106 + 107 + **Note on `--allowedTools`**: MCP tools use the format `mcp__<server>__<toolname>`. The inner Claude cannot prompt for permissions in non-interactive mode, so all needed tools must be pre-approved. 108 + 109 + ## Scenarios 110 + 111 + | Scenario | Description | 112 + |----------|-------------| 113 + | `notekeeper-full` | Build complete Notekeeper app from scratch | 114 + | `decomposition-test` | Test plan decomposition and parent aggregation | 115 + | `error-conditions` | Verify error handling (pull with active, etc.) 
| 116 + | `dependency-resolution` | Test semantic search for input dependencies | 117 + | `session-recovery` | Test resuming session after context loss | 118 + 119 + ## Admin Tools 120 + 121 + These tools help the outer Claude verify state between steps: 122 + 123 + | Tool | Purpose | 124 + |------|---------| 125 + | `9plan_admin_validate` | Check all invariants hold | 126 + | `9plan_admin_sessions` | List all sessions | 127 + | `9plan_admin_state` | Dump detailed session state | 128 + 129 + ## Success Criteria 130 + 131 + Validation passes when: 132 + - [ ] All scenarios execute without errors 133 + - [ ] Invariants hold at every step (checked via admin tools) 134 + - [ ] Inner Claude successfully builds Notekeeper 135 + - [ ] Notekeeper CLI works (`add`, `list`, `search`, `delete` commands) 136 + - [ ] History search returns expected results 137 + - [ ] Decomposition/aggregation workflow completes correctly
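Reviewer sketch for the `--allowedTools` note in this README: the long comma-separated value can be derived from the tool names using the `mcp__<server>__<toolname>` format. `buildAllowedTools` and `NINEPLAN_TOOLS` are hypothetical helpers, not part of this commit. The tool list is copied from the `$ALLOWED` definition in the README.

```typescript
// Tool names as they appear in the README's $ALLOWED definition.
const NINEPLAN_TOOLS = [
  "9plan_session_create",
  "9plan_session_resume",
  "9plan_queue_add",
  "9plan_queue_pull",
  "9plan_plan_complete",
  "9plan_plan_defer",
  "9plan_plan_discard",
  "9plan_history_search",
  "9plan_history_get",
] as const;

// Builds the value for --allowedTools: built-in tools first,
// then each MCP tool prefixed as mcp__<server>__<toolname>.
function buildAllowedTools(
  server: string,
  mcpTools: readonly string[],
  builtins: readonly string[] = ["Read", "Write"],
): string {
  const prefixed = mcpTools.map((tool) => `mcp__${server}__${tool}`);
  return [...builtins, ...prefixed].join(",");
}

const allowed = buildAllowedTools("9plan", NINEPLAN_TOOLS);
```

Generating the string this way avoids copy-paste drift between the README and the guide when a tool is added.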

+229

validation/outer-claude-guide.md

··· 1 + # Outer Claude Validation Guide 2 + 3 + You are the **outer Claude** - the puppetmaster who will control an inner Claude to validate the 9plan MCP server. This guide explains how to run validation scenarios. 4 + 5 + ## Pre-Validation Checklist 6 + 7 + Before starting, verify: 8 + 9 + - [ ] 9plan MCP server is built: Run `npm run build` in project root 10 + - [ ] `.mcp.json` exists in project root with 9plan configured 11 + - [ ] The `claude` CLI is available and authenticated 12 + - [ ] The sandbox project exists at `validation/sandbox/notekeeper/` 13 + 14 + ## Critical: CLI Flags for Inner Claude 15 + 16 + Every `claude -p` command MUST include these flags: 17 + 18 + ### `--mcp-config ".mcp.json"` 19 + Loads the 9plan MCP server configuration so the inner Claude can use 9plan tools. 20 + 21 + ### `--allowedTools "..."` 22 + Pre-approves tools for non-interactive use. The inner Claude cannot prompt for permissions in `-p` mode, so you must pre-approve all tools it will need. 23 + 24 + **Tool naming format**: MCP tools use the pattern `mcp__<server>__<tool>`. For 9plan, this is `mcp__9plan__<toolname>`. 25 + 26 + **Standard allowedTools for full validation:** 27 + ``` 28 + --allowedTools "Read,Write,mcp__9plan__9plan_session_create,mcp__9plan__9plan_session_resume,mcp__9plan__9plan_queue_add,mcp__9plan__9plan_queue_pull,mcp__9plan__9plan_plan_complete,mcp__9plan__9plan_plan_defer,mcp__9plan__9plan_plan_discard,mcp__9plan__9plan_history_search,mcp__9plan__9plan_history_get" 29 + ``` 30 + 31 + **Why `Read,Write`?** The inner Claude needs file access to read plan files and create implementation files. 32 + 33 + ## How to Run Validation 34 + 35 + ### Step 1: Load the Scenario 36 + 37 + Read the scenario file you want to run (e.g., `validation/scenarios/notekeeper-full.yaml`). 
This contains: 38 + - `task_description`: What to tell the inner Claude 39 + - `steps`: Sequence of prompts and expected outcomes 40 + - `verification`: Commands to run at the end 41 + 42 + ### Step 2: Start Inner Claude Session 43 + 44 + Run the first prompt to create a session: 45 + 46 + ```powershell 47 + $ALLOWED = "Read,Write,mcp__9plan__9plan_session_create,mcp__9plan__9plan_session_resume,mcp__9plan__9plan_queue_add,mcp__9plan__9plan_queue_pull,mcp__9plan__9plan_plan_complete,mcp__9plan__9plan_plan_defer,mcp__9plan__9plan_plan_discard,mcp__9plan__9plan_history_search,mcp__9plan__9plan_history_get" 48 + 49 + $result = claude -p "<first prompt from scenario>" --mcp-config ".mcp.json" --allowedTools $ALLOWED --output-format json 50 + $sessionId = ($result | ConvertFrom-Json).session_id 51 + ``` 52 + 53 + Save the `session_id` - you'll need it for `--resume` in subsequent steps. Also note: defining `$ALLOWED` once makes subsequent commands cleaner. 54 + 55 + ### Step 3: Validate State 56 + 57 + Between each step, check that the inner Claude did the right thing: 58 + 59 + 1. **Check filesystem** - Use `ls`, `cat`, or Read tool to verify files were created/modified 60 + 2. **Use admin tools** - Call `9plan_admin_state` to see queue/active plan state 61 + 3. **Check for errors** - Look for error messages in the inner Claude's response 62 + 63 + ### Step 4: Send Next Prompt 64 + 65 + Continue the conversation with `--resume`: 66 + 67 + ```powershell 68 + $result = claude -p "<next prompt from scenario>" --resume $sessionId --mcp-config ".mcp.json" --allowedTools $ALLOWED --output-format json 69 + ``` 70 + 71 + ### Step 5: Repeat Until Done 72 + 73 + Continue steps 3-4 until all prompts in the scenario are complete. 
74 + 75 + ### Step 6: Final Verification 76 + 77 + Run the verification commands from the scenario: 78 + 79 + ```powershell 80 + # Example: Test the Notekeeper CLI 81 + cd validation/sandbox/notekeeper 82 + npm run build 83 + node dist/index.js add "Test note" 84 + node dist/index.js list 85 + ``` 86 + 87 + ## Handling Failures 88 + 89 + ### Inner Claude Makes Mistake 90 + 91 + If the inner Claude does something unexpected: 92 + 1. Note what went wrong 93 + 2. Decide if this is a 9plan bug or expected agent behavior 94 + 3. You may need to guide the inner Claude with a corrective prompt 95 + 96 + ### State Verification Fails 97 + 98 + If `9plan_admin_validate` returns issues: 99 + 1. This is likely a 9plan bug 100 + 2. Document the exact state and what went wrong 101 + 3. Report the failure 102 + 103 + ### Inner Claude Gets Stuck 104 + 105 + If the inner Claude seems confused or stuck: 106 + 1. Try a more specific prompt 107 + 2. Check if the scenario description is unclear 108 + 3. You may need to provide hints 109 + 110 + ## Prompting the Inner Claude 111 + 112 + ### Good Prompts 113 + 114 + - Be specific about what you want done 115 + - Reference 9plan tools naturally (e.g., "create a session", "add a plan") 116 + - Don't mention that this is a test 117 + 118 + ### Bad Prompts 119 + 120 + - "Test the 9plan server" (reveals it's a test) 121 + - "Use 9plan_session_create" (too prescriptive about tool names) 122 + - Vague instructions that could be interpreted multiple ways 123 + 124 + ## Example Validation Flow 125 + 126 + ```powershell 127 + # Define allowed tools once 128 + $ALLOWED = "Read,Write,mcp__9plan__9plan_session_create,mcp__9plan__9plan_session_resume,mcp__9plan__9plan_queue_add,mcp__9plan__9plan_queue_pull,mcp__9plan__9plan_plan_complete,mcp__9plan__9plan_plan_defer,mcp__9plan__9plan_plan_discard,mcp__9plan__9plan_history_search,mcp__9plan__9plan_history_get" 129 + 130 + # Step 1: Create session 131 + $r1 = claude -p "I want to build a simple CLI 
note-taking app called Notekeeper. Start by creating a 9plan session to track this work." --mcp-config ".mcp.json" --allowedTools $ALLOWED --output-format json 132 + $sid = ($r1 | ConvertFrom-Json).session_id 133 + 134 + # Validate: Session should be created 135 + # - Check $env:LOCALAPPDATA/9plan/Data/sessions/ for new directory 136 + # - The inner Claude should tell you the session name 137 + 138 + # Step 2: Add plans 139 + $r2 = claude -p "Now let's plan out the work. Add plans for: 1) the storage module, 2) add command, 3) list command, 4) search command, 5) delete command. Add them in the order they should be executed." --resume $sid --mcp-config ".mcp.json" --allowedTools $ALLOWED --output-format json 140 + 141 + # Validate: Plans should be in queue 142 + # - Check plans/ directory in the session folder 143 + # - Verify plan files exist (e.g., k7f3m.txt) 144 + 145 + # Step 3: Execute first plan 146 + $r3 = claude -p "Pull the first plan and implement the storage module in validation/sandbox/notekeeper/src/storage.ts" --resume $sid --mcp-config ".mcp.json" --allowedTools $ALLOWED --output-format json 147 + 148 + # Validate: Storage module created 149 + # - Check validation/sandbox/notekeeper/src/storage.ts exists 150 + # - Check plan file was deleted from plans/ 151 + # - History should now contain the completed plan 152 + 153 + # Continue with remaining plans... 
+ ```
+
+ ## Using Admin Tools
+
+ Between steps, use these tools to verify state:
+
+ ### 9plan_admin_validate
+
+ Returns whether all invariants hold:
+ ```json
+ {
+   "valid": true,
+   "invariants": {
+     "single_active_plan": true,
+     "queue_order_preserved": true,
+     "files_match_database": true,
+     "no_orphaned_files": true
+   },
+   "issues": []
+ }
+ ```
+
+ ### 9plan_admin_state
+
+ Returns a detailed state dump:
+ ```json
+ {
+   "session_name": "copper-velvet-morning",
+   "task_description": null,
+   "queue": [
+     {"id": "k7f3m", "goal": "Create storage module", "position": 1}
+   ],
+   "active_plan": null,
+   "completed_count": 0,
+   "plan_files": ["k7f3m.txt"],
+   "file_db_match": true
+ }
+ ```
+
+ ### 9plan_admin_sessions
+
+ Lists all sessions:
+ ```json
+ {
+   "sessions": [
+     {"name": "copper-velvet-morning", "created": "2024-01-15", "task_description": null, "queue_length": 5, "completed_count": 0, "has_active_plan": false}
+   ]
+ }
+ ```
+
+ ## Success Criteria
+
+ A scenario passes when:
+
+ 1. **All prompts execute** - Inner Claude responds to each step
+ 2. **State is valid at each step** - `9plan_admin_validate` returns no issues
+ 3. **Expected outcomes match** - Files created, plans completed, etc.
+ 4. **Verification passes** - Final commands work as expected
+
+ ## Reporting Results
+
+ After running a scenario, report:
+
+ ```
+ ## Validation Report: [scenario-name]
+
+ **Status**: PASS / FAIL
+
+ **Steps Completed**: X/Y
+
+ **Issues Found**:
+ - (list any problems)
+
+ **Inner Claude Behavior**:
+ - (notes on how the inner Claude performed)
+
+ **9plan Bugs Found**:
+ - (list any bugs in the MCP server)
+ ```
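Reviewer sketch: `docs/api-examples.md` in this commit shows the admin tools answering in plain text (`✓`/`✗` lines followed by an `Issues:` list), while this guide shows JSON. If the text form is what the outer agent actually receives, it could be turned into the structured shape with a small parser. `parseValidateResponse` is a hypothetical helper, and the line format it expects is taken only from the examples in `docs/api-examples.md`.

```typescript
interface ParsedValidation {
  valid: boolean;
  invariants: Record<string, boolean>;
  issues: string[];
}

// Parses the plain-text 9plan_admin_validate response shown in docs/api-examples.md.
// Assumes one "✓ name: description" or "✗ name: description" line per invariant
// and "- ..." bullet lines after an "Issues:" heading.
function parseValidateResponse(text: string): ParsedValidation {
  const invariants: Record<string, boolean> = {};
  const issues: string[] = [];
  let inIssues = false;

  for (const raw of text.split("\n")) {
    const line = raw.trim();
    const match = /^([✓✗])\s+(\w+):/.exec(line);
    if (match) {
      invariants[match[2]] = match[1] === "✓";
      continue;
    }
    if (line.startsWith("Issues:")) {
      inIssues = true;
      continue;
    }
    if (inIssues && line.startsWith("- ")) {
      issues.push(line.slice(2));
    }
  }

  // Treat a response with no recognizable invariant lines as not valid.
  const sawInvariants = Object.keys(invariants).length > 0;
  const valid =
    sawInvariants && Object.values(invariants).every(Boolean) && issues.length === 0;
  return { valid, invariants, issues };
}
```

If the server does return JSON as the guide shows, `JSON.parse` is enough and this parser is unnecessary.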

+24

validation/sandbox/notekeeper/package.json

+ {
+   "name": "notekeeper",
+   "version": "1.0.0",
+   "description": "A simple CLI note-taking app for 9plan validation testing",
+   "type": "module",
+   "main": "dist/index.js",
+   "scripts": {
+     "build": "tsc",
+     "start": "node dist/index.js",
+     "clean": "rimraf dist"
+   },
+   "keywords": [
+     "cli",
+     "notes",
+     "validation"
+   ],
+   "author": "",
+   "license": "MIT",
+   "devDependencies": {
+     "typescript": "^5.0.0",
+     "rimraf": "^5.0.0",
+     "@types/node": "^20.0.0"
+   }
+ }

+63

validation/sandbox/notekeeper/src/storage.ts

+ /**
+  * Storage module for notekeeper CLI app
+  * Provides save/load functions for persisting notes data to disk
+  */
+
+ import { readFile, writeFile, mkdir } from 'node:fs/promises';
+ import { dirname } from 'node:path';
+
+ /** The shape of our stored data */
+ export interface NotesData {
+   notes: Array<{
+     id: string;
+     content: string;
+     createdAt: string;
+   }>;
+ }
+
+ /** Default empty state when no data file exists */
+ const DEFAULT_DATA: NotesData = {
+   notes: [],
+ };
+
+ /** Default storage file path */
+ const DEFAULT_STORAGE_PATH = './data/notes.json';
+
+ /**
+  * Load notes data from disk
+  * Returns default empty state if file doesn't exist
+  *
+  * @param filePath - Path to the storage file (defaults to ./data/notes.json)
+  * @returns The loaded notes data, or default empty state if file not found
+  */
+ export async function load(filePath: string = DEFAULT_STORAGE_PATH): Promise<NotesData> {
+   try {
+     const content = await readFile(filePath, 'utf-8');
+     const data = JSON.parse(content) as NotesData;
+     return data;
+   } catch (error) {
+     // Handle file not found gracefully - return default state
+     if (error instanceof Error && 'code' in error && error.code === 'ENOENT') {
+       return { ...DEFAULT_DATA, notes: [] };
+     }
+     // Re-throw other errors (parse errors, permission errors, etc.)
+     throw error;
+   }
+ }
+
+ /**
+  * Save notes data to disk
+  * Creates parent directories if they don't exist
+  *
+  * @param data - The notes data to save
+  * @param filePath - Path to the storage file (defaults to ./data/notes.json)
+  */
+ export async function save(data: NotesData, filePath: string = DEFAULT_STORAGE_PATH): Promise<void> {
+   // Ensure parent directory exists
+   const dir = dirname(filePath);
+   await mkdir(dir, { recursive: true });
+
+   // Write data with pretty formatting for readability
+   const content = JSON.stringify(data, null, 2);
+   await writeFile(filePath, content, 'utf-8');
+ }
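Reviewer sketch: `load` casts the parsed JSON with `as NotesData`, so a malformed file would pass through unchecked. A runtime guard could close that gap. `isNotesData` is a hypothetical addition, not part of this commit. The interface is repeated here only to keep the example self-contained.

```typescript
interface Note {
  id: string;
  content: string;
  createdAt: string;
}

interface NotesData {
  notes: Note[];
}

// Narrowing check for a single note entry.
function isNote(value: unknown): value is Note {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.id === "string" &&
    typeof record.content === "string" &&
    typeof record.createdAt === "string"
  );
}

// Runtime guard for the stored data shape; intended to run on the
// result of JSON.parse before the value is treated as NotesData.
function isNotesData(value: unknown): value is NotesData {
  if (typeof value !== "object" || value === null) return false;
  const notes = (value as { notes?: unknown }).notes;
  return Array.isArray(notes) && notes.every(isNote);
}
```

In `load`, the cast could then become a check that throws a descriptive error when `isNotesData(parsed)` is false, which keeps the "re-throw other errors" behaviour for corrupt files.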

+20

validation/sandbox/notekeeper/tsconfig.json

+ {
+   "compilerOptions": {
+     "target": "ES2022",
+     "module": "NodeNext",
+     "moduleResolution": "NodeNext",
+     "lib": ["ES2022"],
+     "outDir": "./dist",
+     "rootDir": "./src",
+     "strict": true,
+     "esModuleInterop": true,
+     "skipLibCheck": true,
+     "forceConsistentCasingInFileNames": true,
+     "resolveJsonModule": true,
+     "declaration": true,
+     "declarationMap": true,
+     "sourceMap": true
+   },
+   "include": ["src/**/*"],
+   "exclude": ["node_modules", "dist"]
+ }

+152

validation/scenarios/decomposition-test.yaml

···
+schema_version: "1.0"
+scenario_id: decomposition-test
+description: |
+  Test plan decomposition and parent aggregation workflow.
+  Verifies that parent plans can be deferred, child plans executed,
+  and parent plans re-pulled for aggregation.
+
+prerequisites:
+  9plan_built: true
+
+steps:
+  # Step 1: Create session
+  - id: create_session
+    description: "Create a session for decomposition testing"
+    prompt: |
+      Create a 9plan session to test building a small utility library.
+
+    expected:
+      tools_called:
+        - 9plan_session_create
+      state:
+        session_exists: true
+
+  # Step 2: Add parent plan that's too big
+  - id: add_parent_plan
+    description: "Add a plan that needs decomposition"
+    prompt: |
+      Add a plan for "Build complete utility library" that includes:
+      - String utilities (capitalize, truncate, etc.)
+      - Array utilities (unique, flatten, etc.)
+      - Date utilities (format, parse, etc.)
+
+      This is a big plan that covers multiple areas.
+
+    expected:
+      tools_called:
+        - 9plan_queue_add
+      state:
+        queue_length: 1
+
+  # Step 3: Pull and recognize need for decomposition
+  - id: pull_and_decompose
+    description: "Pull the plan and decompose it"
+    prompt: |
+      Pull the plan. This is too big to do in one go - let's break it down.
+
+      Add three sub-plans:
+      1. String utilities module
+      2. Array utilities module
+      3. Date utilities module
+
+      Then defer the parent plan to the back of the queue so we can
+      aggregate the results later.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_queue_add  # Adding subplans
+        - 9plan_plan_defer
+      state:
+        # After: 3 subplans at front, parent at back = 4 total
+        queue_length: 4
+
+    validation:
+      admin_validate: true
+      # Verify parent was deferred with reason
+      custom_check: |
+        # The parent plan's notes should mention decomposition
+
+  # Step 4: Execute first subplan
+  - id: execute_subplan_1
+    description: "Execute string utilities subplan"
+    prompt: |
+      Pull and complete the string utilities plan.
+      Just describe what would be in it - no need to actually implement.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_plan_complete
+      state:
+        queue_length: 3
+        completed_count: 1
+
+  # Step 5: Execute second subplan
+  - id: execute_subplan_2
+    description: "Execute array utilities subplan"
+    prompt: |
+      Pull and complete the array utilities plan.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_plan_complete
+      state:
+        queue_length: 2
+        completed_count: 2
+
+  # Step 6: Execute third subplan
+  - id: execute_subplan_3
+    description: "Execute date utilities subplan"
+    prompt: |
+      Pull and complete the date utilities plan.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_plan_complete
+      state:
+        queue_length: 1  # Only parent remains
+        completed_count: 3
+
+  # Step 7: Re-pull parent and aggregate
+  - id: aggregate_parent
+    description: "Re-pull parent plan and aggregate child outcomes"
+    prompt: |
+      Pull the remaining plan (the parent).
+      Use 9plan_history_search or 9plan_history_get to find the child outcomes.
+      Then complete the parent with an aggregated summary of all the work done.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_history_search  # or 9plan_history_get
+        - 9plan_plan_complete
+      state:
+        queue_length: 0
+        completed_count: 4
+        queue_empty: true
+
+    validation:
+      admin_validate: true
+
+verification:
+  history_searches:
+    - query: "string utilities"
+      min_results: 1
+
+    - query: "array utilities"
+      min_results: 1
+
+    - query: "date utilities"
+      min_results: 1
+
+    - query: "utility library"
+      min_results: 1
+      description: "Should find the parent plan"
+
+  final_state:
+    queue_empty: true
+    all_plans_completed: true
+    completed_count: 4
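The `queue_length` values expected in this scenario follow from simple bookkeeping. The sketch below replays it in TypeScript, assuming a hypothetical in-memory `PlanQueue` class (not 9plan's actual implementation) in which new plans are appended and a deferred plan moves to the back of the queue.

```typescript
// Hypothetical in-memory model of the queue; class and method names are
// illustrative assumptions, not part of 9plan.
class PlanQueue {
  private queued: string[] = [];
  private active: string | null = null;
  private completed: string[] = [];

  add(goal: string): void {
    this.queued.push(goal);
  }

  pull(): string {
    if (this.active !== null) throw new Error("a plan is already active");
    const next = this.queued.shift();
    if (next === undefined) throw new Error("queue is empty");
    this.active = next;
    return next;
  }

  // Deferring returns the active plan to the back of the queue.
  defer(): void {
    if (this.active === null) throw new Error("no active plan");
    this.queued.push(this.active);
    this.active = null;
  }

  complete(): void {
    if (this.active === null) throw new Error("no active plan");
    this.completed.push(this.active);
    this.active = null;
  }

  queueLength(): number {
    return this.queued.length;
  }

  completedCount(): number {
    return this.completed.length;
  }
}

// Replays the scenario and records queue_length after each step that changes it.
function runDecomposition(): number[] {
  const queue = new PlanQueue();
  const lengths: number[] = [];

  queue.add("Build complete utility library");
  lengths.push(queue.queueLength()); // after add_parent_plan

  queue.pull(); // the parent becomes active and leaves the queue
  queue.add("String utilities module");
  queue.add("Array utilities module");
  queue.add("Date utilities module");
  queue.defer(); // the parent goes to the back
  lengths.push(queue.queueLength()); // after pull_and_decompose

  // Three subplans, then the re-pulled parent.
  for (let i = 0; i < 4; i++) {
    queue.pull();
    queue.complete();
    lengths.push(queue.queueLength());
  }
  return lengths;
}
```

Under these assumptions `runDecomposition()` yields `[1, 4, 3, 2, 1, 0]`, which matches the `queue_length` expectations of steps 2 through 7.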

+148

validation/scenarios/dependency-resolution.yaml

···
+schema_version: "1.0"
+scenario_id: dependency-resolution
+description: |
+  Test semantic search for resolving input dependencies between plans.
+  Verifies that 9plan_history_search correctly finds completed plans
+  that match input descriptions.
+
+prerequisites:
+  9plan_built: true
+
+steps:
+  # Step 1: Create session
+  - id: create_session
+    description: "Create a session for dependency testing"
+    prompt: |
+      Create a 9plan session to build a simple API client library.
+
+    expected:
+      tools_called:
+        - 9plan_session_create
+      state:
+        session_exists: true
+
+  # Step 2: Add foundation plan
+  - id: add_foundation
+    description: "Add a plan that produces outputs other plans will need"
+    prompt: |
+      Add a plan for creating an HTTP client wrapper.
+      The outputs should include:
+      - httpClient module with get(), post(), put(), delete() methods
+      - Error handling types (ApiError, NetworkError)
+
+    expected:
+      tools_called:
+        - 9plan_queue_add
+      state:
+        queue_length: 1
+
+  # Step 3: Add dependent plan
+  - id: add_dependent
+    description: "Add a plan that depends on the foundation"
+    prompt: |
+      Add a plan for creating user API methods.
+      The inputs should reference the httpClient module from the previous plan.
+      The outputs should include:
+      - userApi module with getUser(), createUser(), updateUser() methods
+
+    expected:
+      tools_called:
+        - 9plan_queue_add
+      state:
+        queue_length: 2
+
+  # Step 4: Add another dependent plan
+  - id: add_another_dependent
+    description: "Add another plan that also depends on the foundation"
+    prompt: |
+      Add a plan for creating posts API methods.
+      The inputs should also reference the httpClient module.
+      The outputs should include:
+      - postsApi module with getPosts(), createPost() methods
+
+    expected:
+      tools_called:
+        - 9plan_queue_add
+      state:
+        queue_length: 3
+
+  # Step 5: Execute foundation plan
+  - id: execute_foundation
+    description: "Pull and complete the HTTP client plan"
+    prompt: |
+      Pull the first plan (HTTP client) and complete it.
+      Describe what was created in the outcome - mention the httpClient module,
+      the get/post/put/delete methods, and the error types.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_plan_complete
+      state:
+        queue_length: 2
+        completed_count: 1
+
+  # Step 6: Execute dependent plan with dependency resolution
+  - id: execute_with_resolution
+    description: "Pull user API plan and resolve its dependency"
+    prompt: |
+      Pull the next plan (user API).
+      Before implementing, search the history for "httpClient module" to find
+      where the HTTP client was created. Then complete the plan, mentioning
+      that you found and used the httpClient from the previous work.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_history_search
+        - 9plan_plan_complete
+      state:
+        queue_length: 1
+        completed_count: 2
+      response_contains:
+        - "httpClient"  # Should mention finding the dependency
+
+    validation:
+      # Verify the search actually found the foundation plan
+      custom_check: |
+        # history_search for "httpClient" should return 1 result
+
+  # Step 7: Execute final plan with same dependency
+  - id: execute_final
+    description: "Pull posts API plan and resolve same dependency"
+    prompt: |
+      Pull the final plan (posts API).
+      Search history for the httpClient again and complete the plan.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_history_search
+        - 9plan_plan_complete
+      state:
+        queue_length: 0
+        completed_count: 3
+
+verification:
+  # Verify semantic search works for various queries
+  history_searches:
+    - query: "httpClient get post"
+      min_results: 1
+      description: "Should find HTTP client plan"
+
+    - query: "user API getUser"
+      min_results: 1
+      description: "Should find user API plan"
+
+    - query: "posts API createPost"
+      min_results: 1
+      description: "Should find posts API plan"
+
+    - query: "ApiError NetworkError"
+      min_results: 1
+      description: "Should find HTTP client by error types"
+
+  final_state:
+    queue_empty: true
+    all_plans_completed: true
+    completed_count: 3
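To make the `history_searches` checks concrete, here is a deliberately naive stand-in for history search written in TypeScript. It ranks completed plans by keyword overlap, which is not the semantic search that 9plan performs; the `CompletedPlan` shape and both function names are assumptions made for illustration.

```typescript
// Hypothetical record of a completed plan; field names are assumptions.
interface CompletedPlan {
  id: string;
  goal: string;
  outcome: string;
}

// Lowercases the text and splits it into alphanumeric tokens.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Keyword-overlap ranking: counts how many query tokens occur in each plan's
// goal or outcome, drops plans with no overlap, and sorts best match first.
function searchHistory(history: CompletedPlan[], query: string): CompletedPlan[] {
  const queryTokens = tokenize(query);
  return history
    .map((plan) => {
      const words = tokenize(plan.goal + " " + plan.outcome);
      const score = queryTokens.filter((token) => words.indexOf(token) !== -1).length;
      return { plan, score };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.plan);
}
```

With this stand-in, a query such as "ApiError NetworkError" only finds the HTTP client plan if its recorded outcome names the error types, which is why step 5 asks the inner Claude to mention them explicitly.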

+138

validation/scenarios/error-conditions.yaml

···
+schema_version: "1.0"
+scenario_id: error-conditions
+description: |
+  Test error handling for invalid operations.
+  Verifies that 9plan returns appropriate errors for invalid state transitions.
+
+prerequisites:
+  9plan_built: true
+
+steps:
+  # Step 1: Create session
+  - id: create_session
+    description: "Create a session for error testing"
+    prompt: |
+      Create a 9plan session for testing error conditions.
+
+    expected:
+      tools_called:
+        - 9plan_session_create
+      state:
+        session_exists: true
+
+  # Step 2: Try to pull from empty queue
+  - id: pull_empty_queue
+    description: "Try to pull when queue is empty"
+    prompt: |
+      Try to pull a plan from the queue.
+      The queue should be empty, so this should fail or indicate there's nothing to pull.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+      response_contains:
+        - "empty"  # Should mention queue is empty
+
+  # Step 3: Try to complete without active plan
+  - id: complete_no_active
+    description: "Try to complete when no plan is active"
+    prompt: |
+      Try to complete a plan with outcome "test".
+      There's no active plan, so this should fail.
+
+    expected:
+      tools_called:
+        - 9plan_plan_complete
+      response_contains:
+        - "no active"  # Should mention no active plan
+
+  # Step 4: Try to defer without active plan
+  - id: defer_no_active
+    description: "Try to defer when no plan is active"
+    prompt: |
+      Try to defer a plan with reason "testing".
+      There's no active plan, so this should fail.
+
+    expected:
+      tools_called:
+        - 9plan_plan_defer
+      response_contains:
+        - "no active"
+
+  # Step 5: Add a plan and pull it
+  - id: setup_active_plan
+    description: "Add and pull a plan to set up active state"
+    prompt: |
+      Add a simple test plan and then pull it so we have an active plan.
+
+    expected:
+      tools_called:
+        - 9plan_queue_add
+        - 9plan_queue_pull
+      state:
+        active_plan: true
+
+  # Step 6: Try to pull again while plan is active
+  - id: pull_while_active
+    description: "Try to pull when a plan is already active"
+    prompt: |
+      Try to pull another plan.
+      We already have an active plan, so this should fail.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+      response_contains:
+        - "already active"  # Should mention plan already active
+
+  # Step 7: Clean up - complete the active plan
+  - id: cleanup
+    description: "Complete the active plan to clean up"
+    prompt: |
+      Complete the active plan with outcome "test completed".
+
+    expected:
+      tools_called:
+        - 9plan_plan_complete
+      state:
+        active_plan: false
+        completed_count: 1
+
+  # Step 8: Try to get non-existent plan from history
+  - id: history_get_invalid
+    description: "Try to get a plan that doesn't exist"
+    prompt: |
+      Try to get plan details for a non-existent plan ID like "xxxxx".
+
+    expected:
+      tools_called:
+        - 9plan_history_get
+      response_contains:
+        - "not found"  # Should indicate plan not found
+
+  # Step 9: Try to resume non-existent session
+  - id: resume_invalid_session
+    description: "Try to resume a session that doesn't exist"
+    prompt: |
+      Try to resume a session called "nonexistent-fake-session".
+
+    expected:
+      tools_called:
+        - 9plan_session_resume
+      response_contains:
+        - "not found"  # Should indicate session not found
+
+verification:
+  final_state:
+    # After all tests, should have 1 completed plan
+    completed_count: 1
+    queue_empty: true
+
+  # All error conditions should have been tested
+  error_tests_passed:
+    - pull_empty_queue
+    - complete_no_active
+    - defer_no_active
+    - pull_while_active
+    - history_get_invalid
+    - resume_invalid_session
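The invalid transitions exercised in steps 2 through 6 can be summarised as guards on a small state machine. The TypeScript below is a sketch under assumptions: `SessionState`, `ToolResult`, and the message wording are hypothetical, chosen only so that each message contains the substring the scenario looks for ("empty", "no active", "already active").

```typescript
// Hypothetical session state; 9plan's real storage is not shown in this commit.
interface SessionState {
  queue: string[];
  active: string | null;
  completed: string[];
}

interface ToolResult {
  ok: boolean;
  message: string;
}

// Pulling is rejected while a plan is active or when nothing is queued.
function pull(state: SessionState): ToolResult {
  if (state.active !== null) {
    return { ok: false, message: "A plan is already active" };
  }
  const next = state.queue.shift();
  if (next === undefined) {
    return { ok: false, message: "Queue is empty" };
  }
  state.active = next;
  return { ok: true, message: "Pulled: " + next };
}

// Completing requires an active plan.
function complete(state: SessionState, outcome: string): ToolResult {
  if (state.active === null) {
    return { ok: false, message: "There is no active plan to complete" };
  }
  state.completed.push(state.active + ": " + outcome);
  state.active = null;
  return { ok: true, message: "Completed" };
}

// Deferring requires an active plan and returns it to the back of the queue.
function defer(state: SessionState, reason: string): ToolResult {
  if (state.active === null) {
    return { ok: false, message: "There is no active plan to defer" };
  }
  state.queue.push(state.active + " (deferred: " + reason + ")");
  state.active = null;
  return { ok: true, message: "Deferred" };
}
```

Each guard leaves the state untouched when it rejects a call, which is what lets the scenario finish with exactly one completed plan and an empty queue.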

+247

validation/scenarios/notekeeper-full.yaml

···
+schema_version: "1.0"
+scenario_id: notekeeper-full
+description: |
+  Complete Notekeeper CLI application build from scratch.
+  Tests the full 9plan workflow: session creation, planning, execution, and completion.
+
+prerequisites:
+  9plan_built: true
+  sandbox_clean: true  # validation/sandbox/notekeeper/src/ should be empty
+
+# The overall task the inner Claude is working on
+task_description: |
+  Build a simple CLI note-taking application called Notekeeper.
+  It should support: add, list, search, and delete commands.
+  Notes are stored in a JSON file.
+
+steps:
+  # Step 1: Create the session
+  - id: create_session
+    description: "Create a 9plan session for the Notekeeper project"
+    prompt: |
+      I want to build a simple CLI note-taking app called Notekeeper.
+      It should let users add notes, list all notes, search notes, and delete notes.
+      Notes will be stored in a JSON file.
+
+      Start by creating a 9plan session to track this work.
+
+    expected:
+      tools_called:
+        - 9plan_session_create
+      state:
+        session_exists: true
+        queue_length: 0
+      response_contains:
+        - "Session"
+
+    validation:
+      admin_validate: true
+
+  # Step 2: Bootstrap plans
+  - id: bootstrap_plans
+    description: "Add initial plans for all components"
+    prompt: |
+      Now let's plan out the work. We need:
+      1. A Note type definition and storage module (read/write JSON)
+      2. An "add" command to create new notes
+      3. A "list" command to show all notes
+      4. A "search" command to find notes by keyword
+      5. A "delete" command to remove notes
+      6. A CLI entry point that routes to the right command
+
+      Add these as plans in the order they should be executed.
+      The storage module should be first since other commands depend on it.
+
+    expected:
+      tools_called:
+        - 9plan_queue_add
+      state:
+        queue_length: 6  # 6 plans: storage, add, list, search, delete, CLI
+      files:
+        - pattern: "~/.9plan/sessions/*/plans/*.txt"
+          count: 6
+
+    validation:
+      admin_validate: true
+
+  # Step 3: Execute storage module
+  - id: execute_storage
+    description: "Pull and implement the storage module"
+    prompt: |
+      Pull the first plan and implement the storage module.
+      Create the files in validation/sandbox/notekeeper/src/.
+
+      The Note type should have: id (string), content (string), createdAt (string).
+      The storage module should export: loadNotes(), saveNotes(), generateId().
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_plan_complete
+      state:
+        queue_length: 5
+        completed_count: 1
+      files:
+        - pattern: "validation/sandbox/notekeeper/src/types.ts"
+          exists: true
+          contains:
+            - "interface Note"
+            - "id"
+            - "content"
+        - pattern: "validation/sandbox/notekeeper/src/storage.ts"
+          exists: true
+          contains:
+            - "loadNotes"
+            - "saveNotes"
+
+    validation:
+      admin_validate: true
+
+  # Step 4: Execute add command
+  - id: execute_add
+    description: "Pull and implement the add command"
+    prompt: |
+      Pull the next plan and implement the add command.
+      It should take a note content string and save a new note.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_plan_complete
+      state:
+        queue_length: 4
+        completed_count: 2
+      files:
+        - pattern: "validation/sandbox/notekeeper/src/commands/add.ts"
+          exists: true
+
+    validation:
+      admin_validate: true
+
+  # Step 5: Execute list command
+  - id: execute_list
+    description: "Pull and implement the list command"
+    prompt: |
+      Pull the next plan and implement the list command.
+      It should display all notes with their IDs and content.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_plan_complete
+      state:
+        queue_length: 3
+        completed_count: 3
+      files:
+        - pattern: "validation/sandbox/notekeeper/src/commands/list.ts"
+          exists: true
+
+    validation:
+      admin_validate: true
+
+  # Step 6: Execute search command
+  - id: execute_search
+    description: "Pull and implement the search command"
+    prompt: |
+      Pull the next plan and implement the search command.
+      It should search note content for a keyword and show matching notes.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_plan_complete
+      state:
+        queue_length: 2
+        completed_count: 4
+      files:
+        - pattern: "validation/sandbox/notekeeper/src/commands/search.ts"
+          exists: true
+
+    validation:
+      admin_validate: true
+
+  # Step 7: Execute delete command
+  - id: execute_delete
+    description: "Pull and implement the delete command"
+    prompt: |
+      Pull the next plan and implement the delete command.
+      It should delete a note by ID.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_plan_complete
+      state:
+        queue_length: 1
+        completed_count: 5
+      files:
+        - pattern: "validation/sandbox/notekeeper/src/commands/delete.ts"
+          exists: true
+
+    validation:
+      admin_validate: true
+
+  # Step 8: Execute CLI entry point
+  - id: execute_cli
+    description: "Pull and implement the CLI entry point"
+    prompt: |
+      Pull the final plan and implement the CLI entry point.
+      It should parse command line arguments and route to the right command.
+
+      Usage: notekeeper <command> [args]
+      Commands: add <content>, list, search <keyword>, delete <id>
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_plan_complete
+      state:
+        queue_length: 0
+        completed_count: 6
+        queue_empty: true
+      files:
+        - pattern: "validation/sandbox/notekeeper/src/index.ts"
+          exists: true
+
+    validation:
+      admin_validate: true
+
+# Final verification after all steps
+verification:
+  # Build the project
+  commands:
+    - command: "cd validation/sandbox/notekeeper && npm install"
+      description: "Install dependencies"
+      success: true
+
+    - command: "cd validation/sandbox/notekeeper && npm run build"
+      description: "Build TypeScript"
+      success: true
+
+    - command: "node validation/sandbox/notekeeper/dist/index.js add \"Test note from validation\""
+      description: "Test add command"
+      output_contains: "added"
+
+    - command: "node validation/sandbox/notekeeper/dist/index.js list"
+      description: "Test list command"
+      output_contains: "Test note from validation"
+
+    - command: "node validation/sandbox/notekeeper/dist/index.js search test"
+      description: "Test search command"
+      output_contains: "Test note from validation"
+
+  # Verify history search works
+  history_searches:
+    - query: "storage module loadNotes"
+      min_results: 1
+      description: "Should find the storage module plan"
+
+    - query: "add command"
+      min_results: 1
+      description: "Should find the add command plan"
+
+  # Final state check
+  final_state:
+    queue_empty: true
+    all_plans_completed: true
+    completed_count: 6
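For orientation, the note operations the inner Claude is asked to build might look roughly like the TypeScript sketch below. It is an assumption-laden illustration, not the code the scenario produces: only the `Note` fields and the name `generateId` come from the prompts, while `addNote`, `searchNotes`, `deleteNote`, `serializeNotes`, and `parseNotes` are hypothetical pure helpers that leave file I/O out.

```typescript
// Note shape as described in the execute_storage prompt.
interface Note {
  id: string;
  content: string;
  createdAt: string;
}

// One possible ID scheme (an assumption): timestamp plus a random suffix.
function generateId(): string {
  return Date.now().toString(36) + Math.floor(Math.random() * 1e6).toString(36);
}

// Returns a new array with the note appended; the input is left unmodified.
function addNote(notes: Note[], content: string): Note[] {
  const note: Note = {
    id: generateId(),
    content,
    createdAt: new Date().toISOString(),
  };
  return notes.concat([note]);
}

// Case-insensitive substring match, so "test" matches "Test note from validation".
function searchNotes(notes: Note[], keyword: string): Note[] {
  const needle = keyword.toLowerCase();
  return notes.filter((note) => note.content.toLowerCase().indexOf(needle) !== -1);
}

// Removes the note with the given ID, if present.
function deleteNote(notes: Note[], id: string): Note[] {
  return notes.filter((note) => note.id !== id);
}

// JSON round-trip helpers standing in for saveNotes()/loadNotes() without the file access.
function serializeNotes(notes: Note[]): string {
  return JSON.stringify(notes, null, 2);
}

function parseNotes(json: string): Note[] {
  return JSON.parse(json) as Note[];
}
```

The case-insensitive match matters for the final verification: the `search test` command is expected to print "Test note from validation", which a case-sensitive search would miss.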

+232

validation/scenarios/schema.md

···
+# Validation Scenario File Schema
+
+This document describes the YAML format for validation scenario files.
+
+## Overview
+
+Scenario files define a sequence of steps for the outer Claude to execute against an inner Claude, along with expected outcomes and verification commands.
+
+## Schema Version
+
+All scenario files must specify the schema version:
+
+```yaml
+schema_version: "1.0"
+```
+
+## Top-Level Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `schema_version` | string | Yes | Schema version (currently "1.0") |
+| `scenario_id` | string | Yes | Unique identifier for this scenario |
+| `description` | string | Yes | Human-readable description |
+| `prerequisites` | object | No | What must be true before running |
+| `steps` | array | Yes | Sequence of prompts and validations |
+| `verification` | object | No | Final verification commands |
+
+## Prerequisites
+
+Optional conditions that must be met before running:
+
+```yaml
+prerequisites:
+  9plan_built: true            # npm run build completed
+  sandbox_clean: true          # validation/sandbox/notekeeper/src/ is empty
+  no_existing_sessions: true   # no 9plan sessions exist
+```
+
+## Steps
+
+Each step contains a prompt to send and expected outcomes:
+
+```yaml
+steps:
+  - id: create_session
+    description: "Create a 9plan session for the project"
+    prompt: |
+      I want to build a simple CLI note-taking app called Notekeeper.
+      Start by creating a 9plan session to track this work.
+
+    expected:
+      # What the inner Claude should do
+      tools_called:
+        - 9plan_session_create
+
+      # State after this step
+      state:
+        session_exists: true
+        queue_length: 0
+        active_plan: null
+
+      # Files that should exist
+      files:
+        - pattern: "~/.9plan/sessions/*/session.db"
+          exists: true
+        - pattern: "~/.9plan/sessions/*/plans/"
+          is_directory: true
+
+    # Optional: How to validate
+    validation:
+      admin_validate: true  # Run 9plan_admin_validate
+      custom_check: |
+        # PowerShell to run for custom validation
+        Test-Path ~/.9plan/sessions/*/session.db
+```
+
+## Step Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `id` | string | Yes | Unique step identifier |
+| `description` | string | Yes | What this step does |
+| `prompt` | string | Yes | Prompt to send to inner Claude |
+| `expected` | object | No | Expected outcomes |
+| `validation` | object | No | How to validate this step |
+| `on_failure` | string | No | What to do if step fails ("abort", "continue", "retry") |
+
+## Expected Outcomes
+
+### tools_called
+
+List of MCP tools the inner Claude should call:
+
+```yaml
+expected:
+  tools_called:
+    - 9plan_session_create
+    - 9plan_queue_add
+```
+
+### state
+
+Expected 9plan state after this step:
+
+```yaml
+expected:
+  state:
+    session_exists: true
+    queue_length: 3
+    active_plan: null
+    completed_count: 0
+```
+
+### files
+
+Expected file system state:
+
+```yaml
+expected:
+  files:
+    - pattern: "validation/sandbox/notekeeper/src/storage.ts"
+      exists: true
+      contains:
+        - "export function loadNotes"
+        - "export function saveNotes"
+    - pattern: "~/.9plan/sessions/*/plans/*.txt"
+      count: 3  # Exactly 3 plan files
+```
+
+### response_contains
+
+Keywords that should appear in inner Claude's response:
+
+```yaml
+expected:
+  response_contains:
+    - "session created"
+    - "Session:"
+```
+
+## Verification
+
+Final verification after all steps complete:
+
+```yaml
+verification:
+  # Commands to run
+  commands:
+    - command: "cd validation/sandbox/notekeeper && npm run build"
+      success: true
+
+    - command: "node validation/sandbox/notekeeper/dist/index.js add 'Test note'"
+      output_contains: "Note added"
+
+    - command: "node validation/sandbox/notekeeper/dist/index.js list"
+      output_contains: "Test note"
+
+  # History searches to verify
+  history_searches:
+    - query: "storage module"
+      min_results: 1
+
+    - query: "add command"
+      results_contain:
+        goal_keywords: ["add", "command"]
+
+  # Final state check
+  final_state:
+    queue_empty: true
+    all_plans_completed: true
+```
+
+## Complete Example
+
+```yaml
+schema_version: "1.0"
+scenario_id: simple-session-test
+description: "Test basic session creation and plan lifecycle"
+
+prerequisites:
+  9plan_built: true
+
+steps:
+  - id: create_session
+    description: "Create a session"
+    prompt: "Create a 9plan session for testing"
+    expected:
+      tools_called:
+        - 9plan_session_create
+      state:
+        session_exists: true
+    validation:
+      admin_validate: true
+
+  - id: add_plan
+    description: "Add a test plan"
+    prompt: "Add a plan to test something simple"
+    expected:
+      tools_called:
+        - 9plan_queue_add
+      state:
+        queue_length: 1
+
+  - id: pull_plan
+    description: "Pull the plan"
+    prompt: "Pull the plan and mark it complete with a simple outcome"
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_plan_complete
+      state:
+        queue_length: 0
+        completed_count: 1
+
+verification:
+  history_searches:
+    - query: "test"
+      min_results: 1
+
+  final_state:
+    queue_empty: true
+```
+
+## Scenario Files in This Directory
+
+| File | Description |
+|------|-------------|
+| `notekeeper-full.yaml` | Complete Notekeeper build scenario |
+| `decomposition-test.yaml` | Test plan decomposition and aggregation |
+| `error-conditions.yaml` | Test error handling |
+| `dependency-resolution.yaml` | Test semantic search for dependencies |
+| `session-recovery.yaml` | Test session resume after context loss |
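The required-field rules in the tables above lend themselves to a small structural check. The following TypeScript sketch assumes the scenario file has already been parsed into a plain object by some YAML library (not shown); `validateScenario` and its message strings are hypothetical and are not part of this commit.

```typescript
type Dict = { [key: string]: unknown };

function isDict(value: unknown): value is Dict {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Field lists taken from the "Top-Level Fields" and "Step Fields" tables.
const REQUIRED_TOP_LEVEL = ["schema_version", "scenario_id", "description", "steps"];
const REQUIRED_STEP = ["id", "description", "prompt"];
const ON_FAILURE_VALUES = ["abort", "continue", "retry"];

// Returns a list of human-readable problems; an empty list means the
// structure satisfies the required-field rules.
function validateScenario(doc: unknown): string[] {
  if (!isDict(doc)) return ["scenario must be a mapping"];
  const problems: string[] = [];

  for (const field of REQUIRED_TOP_LEVEL) {
    if (!(field in doc)) problems.push("missing top-level field: " + field);
  }
  if ("schema_version" in doc && doc["schema_version"] !== "1.0") {
    problems.push("unsupported schema_version");
  }

  const steps = doc["steps"];
  if (steps === undefined) return problems;
  if (!Array.isArray(steps)) {
    problems.push("steps must be an array");
    return problems;
  }

  const seenIds: string[] = [];
  steps.forEach((step: unknown, index: number) => {
    if (!isDict(step)) {
      problems.push("step " + index + ": must be a mapping");
      return;
    }
    // Presence check only, so an explicit `prompt: null` still counts as present.
    for (const field of REQUIRED_STEP) {
      if (!(field in step)) problems.push("step " + index + ": missing field: " + field);
    }
    const id = step["id"];
    if (typeof id === "string") {
      if (seenIds.indexOf(id) !== -1) problems.push("step " + index + ": duplicate id: " + id);
      seenIds.push(id);
    }
    const onFailure = step["on_failure"];
    if (
      onFailure !== undefined &&
      (typeof onFailure !== "string" || ON_FAILURE_VALUES.indexOf(onFailure) === -1)
    ) {
      problems.push("step " + index + ": invalid on_failure value");
    }
  });
  return problems;
}
```

A check like this could run before any prompt is sent, so a malformed scenario fails fast instead of partway through a run.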

+153

validation/scenarios/session-recovery.yaml

···
+schema_version: "1.0"
+scenario_id: session-recovery
+description: |
+  Test session resume after context loss.
+  Simulates a scenario where the agent loses context and needs to resume
+  a session using 9plan_session_resume.
+
+prerequisites:
+  9plan_built: true
+
+# Special instruction for outer Claude:
+# After step 3, you will start a NEW inner Claude session (new --resume chain)
+# to simulate context loss. The new session should use session_resume.
+
+steps:
+  # Phase 1: Initial work (first inner Claude session)
+
+  - id: create_session
+    description: "Create a session and do some work"
+    prompt: |
+      Create a 9plan session for building a calculator app.
+
+    expected:
+      tools_called:
+        - 9plan_session_create
+      state:
+        session_exists: true
+    # IMPORTANT: Outer Claude must save the session name for later
+
+  - id: add_plans
+    description: "Add several plans"
+    prompt: |
+      Add plans for:
+      1. Basic operations (add, subtract, multiply, divide)
+      2. Advanced operations (power, sqrt, log)
+      3. Memory functions (store, recall, clear)
+
+    expected:
+      tools_called:
+        - 9plan_queue_add
+      state:
+        queue_length: 3
+
+  - id: start_work
+    description: "Pull and complete the first plan"
+    prompt: |
+      Pull the first plan (basic operations) and complete it.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_plan_complete
+      state:
+        queue_length: 2
+        completed_count: 1
+
+  # --- CONTEXT LOSS SIMULATION ---
+  # Outer Claude: Start a new inner Claude session here (no --resume)
+  # This simulates the agent losing context mid-task
+
+  - id: context_loss
+    description: "Simulate context loss by starting fresh inner Claude"
+    special_instruction: |
+      OUTER CLAUDE: Start a completely new inner Claude session.
+      Do NOT use --resume. This simulates the agent losing all context.
+      Save the session name from step 1 to give to the new session.
+    prompt: null  # No prompt - this is an instruction for outer Claude
+
+  # Phase 2: Recovery (new inner Claude session)
+
+  - id: resume_session
+    description: "Resume the session using session name"
+    # Outer Claude should tell the new inner Claude about the session
+    prompt: |
+      You were working on a calculator app but lost context.
+      Resume the 9plan session named "{SESSION_NAME_FROM_STEP_1}".
+      (Outer Claude: substitute the actual session name here)
+
+    expected:
+      tools_called:
+        - 9plan_session_resume
+      state:
+        session_exists: true
+      response_contains:
+        - "resumed"
+
+  - id: check_state_after_resume
+    description: "Check the queue state after resuming"
+    prompt: |
+      Check what plans are in the queue. You should see some plans remaining
+      from before the context loss.
+
+    expected:
+      state:
+        queue_length: 2  # 2 plans should remain
+        completed_count: 1  # 1 was completed before
+
+  - id: continue_work
+    description: "Continue working after resume"
+    prompt: |
+      Pull the next plan and complete it.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_plan_complete
+      state:
+        queue_length: 1
+        completed_count: 2
+
+  - id: finish_work
+    description: "Complete the remaining work"
+    prompt: |
+      Pull and complete the final plan.
+
+    expected:
+      tools_called:
+        - 9plan_queue_pull
+        - 9plan_plan_complete
+      state:
+        queue_length: 0
+        completed_count: 3
+
+verification:
+  # Verify all plans were completed despite context loss
+  history_searches:
+    - query: "basic operations add subtract"
+      min_results: 1
+      description: "First plan should be in history"
+
+    - query: "advanced operations power sqrt"
+      min_results: 1
+      description: "Second plan should be in history"
+
+    - query: "memory functions store recall"
+      min_results: 1
+      description: "Third plan should be in history"
+
+  final_state:
+    queue_empty: true
+    all_plans_completed: true
+    completed_count: 3
+
+# Notes for outer Claude:
+notes: |
+  This scenario is special because it requires starting a new inner Claude
+  session mid-way through to simulate context loss.
+
+  Steps 1-3: Use one inner Claude session (with --resume between steps)
+  Step 4: Instruction to outer Claude - no inner Claude action
+  Steps 5-8: Use a NEW inner Claude session (fresh start, no --resume from before)
+
+  The key validation is that session_resume allows recovery and work continues.
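The hand-off of the session name across the simulated context loss can be sketched as two small helpers. This TypeScript is illustrative only: it assumes the session-creation response carries a `[Session: <name>]` header like the one shown in the admin tool examples, and both function names are hypothetical.

```typescript
// Placeholder used verbatim in the resume_session prompt.
const SESSION_PLACEHOLDER = "{SESSION_NAME_FROM_STEP_1}";

// Extracts the session name from a response, assuming a "[Session: name]"
// header (an assumption about the create response format).
function extractSessionName(response: string): string | null {
  const match = /\[Session: ([a-z0-9-]+)\]/.exec(response);
  const name = match ? match[1] : undefined;
  return name === undefined ? null : name;
}

// Replaces every occurrence of the placeholder with the saved session name.
function fillSessionName(prompt: string, sessionName: string): string {
  return prompt.split(SESSION_PLACEHOLDER).join(sessionName);
}
```

The outer Claude would call `extractSessionName` on the step 1 response, keep the result, and pass it through `fillSessionName` before sending the `resume_session` prompt to the fresh inner session.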

+68

validation/scripts/invoke-inner.ps1

···
+<#
+.SYNOPSIS
+    Invoke inner Claude with pre-approved 9plan tools
+.DESCRIPTION
+    Helper script for the "Claude sandwich" validation pattern.
+    Bundles the --mcp-config and --allowedTools flags so you don't have to type them every time.
+.PARAMETER Prompt
+    The prompt to send to inner Claude
+.PARAMETER Resume
+    Optional session ID to resume a previous conversation
+.PARAMETER OutputFormat
+    Output format: "text" (default) or "json"
+.EXAMPLE
+    .\invoke-inner.ps1 -Prompt "Create a 9plan session for Notekeeper"
+.EXAMPLE
+    .\invoke-inner.ps1 -Prompt "Add plans for storage module" -Resume "abc123" -OutputFormat json
+#>
+param(
+    [Parameter(Mandatory=$true)]
+    [string]$Prompt,
+
+    [Parameter(Mandatory=$false)]
+    [string]$Resume,
+
+    [Parameter(Mandatory=$false)]
+    [ValidateSet("text", "json")]
+    [string]$OutputFormat = "text"
+)
+
+# All 9plan MCP tools + file access for implementation
+$ALLOWED_TOOLS = @(
+    # File operations (needed for reading plans and writing code)
+    "Read",
+    "Write",
+    # 9plan session tools
+    "mcp__9plan__9plan_session_create",
+    "mcp__9plan__9plan_session_resume",
+    # 9plan queue tools
+    "mcp__9plan__9plan_queue_add",
+    "mcp__9plan__9plan_queue_pull",
+    # 9plan plan lifecycle tools
+    "mcp__9plan__9plan_plan_complete",
+    "mcp__9plan__9plan_plan_defer",
+    "mcp__9plan__9plan_plan_discard",
+    # 9plan history tools
+    "mcp__9plan__9plan_history_search",
+    "mcp__9plan__9plan_history_get",
+    # 9plan admin tools (for validation)
+    "mcp__9plan__9plan_admin_validate",
+    "mcp__9plan__9plan_admin_sessions",
+    "mcp__9plan__9plan_admin_state"
+) -join ","
+
+# Build the command ($args is a PowerShell automatic variable, so use a distinct name)
+$claudeArgs = @(
+    "-p", $Prompt,
+    "--mcp-config", ".mcp.json",
+    "--allowedTools", $ALLOWED_TOOLS,
+    "--output-format", $OutputFormat
+)
+
+if ($Resume) {
+    $claudeArgs += "--resume"
+    $claudeArgs += $Resume
+}
+
+# Execute
+& claude @claudeArgs
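For readers more at home in the project's own language, the argument list the script assembles can be mirrored in TypeScript. This is a sketch, not a replacement for the script: `buildClaudeArgs` is a hypothetical helper, and the flags and tool names are copied from `invoke-inner.ps1` rather than checked against the `claude` CLI.

```typescript
type OutputFormat = "text" | "json";

// Tool names copied from invoke-inner.ps1.
const ALLOWED_TOOLS: string[] = [
  "Read",
  "Write",
  "mcp__9plan__9plan_session_create",
  "mcp__9plan__9plan_session_resume",
  "mcp__9plan__9plan_queue_add",
  "mcp__9plan__9plan_queue_pull",
  "mcp__9plan__9plan_plan_complete",
  "mcp__9plan__9plan_plan_defer",
  "mcp__9plan__9plan_plan_discard",
  "mcp__9plan__9plan_history_search",
  "mcp__9plan__9plan_history_get",
  "mcp__9plan__9plan_admin_validate",
  "mcp__9plan__9plan_admin_sessions",
  "mcp__9plan__9plan_admin_state",
];

// Builds the same argument list as the script: the fixed flags first,
// then "--resume <id>" only when a session ID is supplied.
function buildClaudeArgs(
  prompt: string,
  outputFormat: OutputFormat = "text",
  resume?: string
): string[] {
  const args: string[] = [
    "-p", prompt,
    "--mcp-config", ".mcp.json",
    "--allowedTools", ALLOWED_TOOLS.join(","),
    "--output-format", outputFormat,
  ];
  if (resume) {
    args.push("--resume", resume);
  }
  return args;
}
```

Keeping the arguments as an array, as the script does with splatting, avoids quoting problems when the prompt contains spaces or quotes.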
