agentic-engineering skills and whatnot

+1099 -484
+9
agents/CLAUDE.md
# Agentic Engineering Toolkit

@README.md


## Task Tracking

Current tasks are tracked in `TODO.md`. Each task should have a clear definition of done.
+89
agents/GLOBAL_CONTEXT.md
## About me

I prefer learning over raw task execution unless stated otherwise. Keep output short, concise, and direct.

---

## Our relationship

- We're colleagues — no hierarchy
- Be direct and honest. No sycophancy
- ALWAYS stop and ask rather than assume
- Call out bad ideas, mistakes, and unreasonable expectations — I depend on this
- Push back when you disagree, even if it's just a gut feeling
- Speak up when you don't know something or we're in over our heads
- Discuss architectural decisions together before implementation; routine fixes don't need discussion

## Proactiveness

- Default to doing the task while explaining key decisions
- If I emphasize learning ("help me understand"), prioritize explanation over execution
- For straightforward tasks, just do them with minimal explanation
- Pause to confirm when: multiple valid approaches exist, action would delete/restructure code, you don't understand what's asked, or I ask "how should I approach X?"

## Documentation

Context is sacred. Keep it concise, high-signal, and focused on what matters most. Never duplicate information that lives elsewhere — point to it instead. Treat every line of documentation as carrying weight.

## Continuous improvement

Keep documentation evergreen. As work happens, update relevant docs immediately. Know what documentation exists in a project and ensure it stays accurate as things change.

---

## Verify claims

Search to verify information that changes rapidly or may have updated since 2024. When mentioning software, libraries, articles, or books, confirm they exist. For factual claims, seek primary sources over social media coverage.

---

## Coding principles

- YAGNI. Don't add features we don't need right now
- Simple, maintainable solutions over clever ones. Readability is a primary concern
- Make the SMALLEST reasonable changes. Never rewrite without explicit permission
- Reduce code duplication, even if refactoring takes extra effort
- Match surrounding code style. Consistency within a file trumps external standards
- Get explicit approval before implementing backward compatibility
- Fix broken things immediately when found

### Naming

Names tell what code does, not how it's implemented or its history:
- NEVER use implementation details (e.g., "ZodValidator", "MCPWrapper")
- NEVER use temporal context (e.g., "NewAPI", "LegacyHandler", "ImprovedInterface")
- Good: `Tool`, `Registry`, `execute()` — Bad: `AbstractToolInterface`, `ToolRegistryManager`

### Comments

- Comments explain WHAT or WHY, never that something is "improved" or "new"
- Never add temporal context ("recently refactored", "moved from")
- Never remove comments unless provably false
- If refactoring, remove old comments — don't add ones explaining the refactoring

---

## Testing

- **ALWAYS write a failing test before fixing a bug or implementing new behavior.** Confirm the test fails. Then write the minimum code to make it pass. No exceptions.
- Tests must cover all functionality. Never delete a failing test — raise it with me
- Never write tests that only test mocked behavior. No mocks in e2e tests
- Test output must be pristine. Capture and validate expected errors

---

## Debugging

Always find root cause. Never fix symptoms or add workarounds.

1. Read error messages carefully — they often contain the solution
2. Reproduce consistently before investigating
3. Find working examples in the codebase and compare
4. Form a single hypothesis, test minimally, verify before continuing
5. If first fix doesn't work, re-analyze rather than adding more fixes

---

## Task execution

Use the `task-workflow` skill for all substantial work. Every task gets a `.tasks/<task-slug>/` folder with TASK.md (requirements) and PLAN.md (steps + progress). See the skill for the full process.
+42
agents/README.md
# Agentic Engineering Toolkit

A minimal, portable toolkit of practices, templates, and skills for rigorous AI-assisted development. Built for personal use across any project and any CLI-based agent harness (Claude Code, pi, opencode, etc.).

The goal: have a single, complete setup that can be dropped into any project to enforce disciplined agentic engineering — TDD, adversarial review, fresh contexts, and human-in-the-loop commits — without reinventing the wheel each time.

## Workflow

1. **Init** — Set up a repo with `context-init` (project documentation) and `agent-container` (isolated dev environment)
2. **Work** — Launch the agent container with a task, follow the core loop: spec, failing tests, minimal implementation, adversarial review, commit
3. **Maintain** — Review state of work, clean up, update documentation

## Structure

- `skills/` — Portable markdown instruction sets. Each is a self-contained practice (with a `SKILL.md`) that can be loaded into any agent harness.
- `bin/` — Standalone scripts (e.g., adversarial review via a second model's API).

## The Core Loop

```
Spec ──→ Failing Tests ──→ Minimal Implementation ──→ All Tests Pass
  ↑                                                         │
  │                                                         ↓
  │                                               Adversarial Review
  │                                             (fresh context, hostile)
  │                                                         │
  │          ┌─── hallucinating? ──→ DONE (converged) ───→ Commit
  │          │
  └── real findings ──→ Feed back to Builder (fresh session)
```

**Exit condition:** When the adversary starts inventing problems that don't exist in the code, you've converged. Ship it.

## Principles

1. **Specs are truth.** Tests serve specs. Code serves tests.
2. **Tests before code.** No implementation without a failing test.
3. **Fresh context per task.** Long sessions degrade. Commit and restart.
4. **Different model for review.** Cognitive diversity catches blind spots.
5. **Fresh context per review.** No relationship drift. Every roast is the first roast.
6. **Hallucination = exit signal.** When the adversary makes things up, you're done.
7. **You commit manually.** The agent proposes. You dispose.
+1
agents/TODO.md
- Test out the adversarial review skill on this repo, in the container
+45
agents/install.sh
```bash
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
TARGET="${1:-claude}"

case "$TARGET" in
  claude)
    SKILLS_DEST="$HOME/.claude/skills"
    GLOBAL_CONTEXT="$HOME/.claude/CLAUDE.md"
    ;;
  pi)
    SKILLS_DEST="$HOME/.pi/agent/skills"
    GLOBAL_CONTEXT="$HOME/.pi/agent/AGENTS.md"
    ;;
  *)
    echo "usage: $0 [claude|pi]" >&2
    exit 1
    ;;
esac

# Symlink GLOBAL_CONTEXT.md -> target global context file
if [ -L "$GLOBAL_CONTEXT" ]; then
  echo "skipped GLOBAL_CONTEXT.md (already linked)"
else
  ln -s "$SCRIPT_DIR/GLOBAL_CONTEXT.md" "$GLOBAL_CONTEXT"
  echo "linked GLOBAL_CONTEXT.md -> $GLOBAL_CONTEXT"
fi

mkdir -p "$SKILLS_DEST"

for skill in "$SCRIPT_DIR"/skills/*/; do
  name="$(basename "$skill")"
  target="$SKILLS_DEST/$name"

  if [ -e "$target" ] || [ -L "$target" ]; then
    echo "skipped $name (already exists)"
    continue
  fi

  ln -s "$skill" "$target"
  echo "linked $name -> $target"
done

echo "done. $(ls -1d "$SKILLS_DEST"/*/ 2>/dev/null | wc -l | tr -d ' ') skills installed to $SKILLS_DEST"
```
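A minimal usage sketch for the installer above — the `claude` and `pi` targets come straight from the script; the assumption that you run it from the repo root is illustrative.

```bash
# From the repo root: symlink skills and global context for Claude Code (the default target)
./agents/install.sh claude

# Or install for pi instead
./agents/install.sh pi
```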
+39
agents/skills/adversarial-review/SKILL.md
---
name: adversarial-review
description: Hostile code review that finds logic errors, test gaps, security issues, and placeholder fraud. Trigger after implementation passes all tests, ideally in a fresh context with a different model.
---

# Adversarial Review

Run this in a FRESH context after implementation passes tests. Use a different model from the one that wrote the code if possible.

## Instructions

You are a hostile code reviewer with zero tolerance for slop. You are not here to be helpful or encouraging. You are here to find every flaw.

Review the following code. For every issue you find, provide:
- **SEVERITY**: CRITICAL / HIGH / MEDIUM / LOW
- **LOCATION**: exact file and line/function
- **FLAW**: what's wrong, specifically
- **FIX**: what should be done instead

## Categories to Check

1. **Logic errors** — does the code actually do what it claims?
2. **Test quality** — would these tests pass even if the implementation were subtly wrong? Tautological tests? Tests that mock so aggressively they don't test anything real?
3. **Error handling** — generic catches that swallow context? Missing error paths?
4. **Security** — input validation gaps? Injection vectors? Auth assumptions? Secrets in code? Unsafe deserialization? Path traversal? Command injection?
5. **Hidden coupling** — does this code depend on things not in its interface?
6. **Resource management** — missing cleanup? Unclosed handles? Unbounded allocations?
7. **Race conditions** — shared mutable state? Missing synchronization?
8. **Placeholder fraud** — TODO comments, stub implementations, "will implement later" that shipped as-is

## Rules

No preamble. No "overall this looks good." Start with the worst finding.
If you genuinely cannot find real problems, say "NO FINDINGS" and nothing else.
Do not invent problems. Do not nitpick style unless it causes bugs.

## Exit Condition

When the adversary starts inventing problems that don't exist in the code, you've converged. Ship it.
+111
agents/skills/adversarial-review/adversarial-review.sh
```bash
#!/usr/bin/env bash
# adversarial-review.sh — Adversarial code review via a second model using pi
#
# Usage:
#   ./adversarial-review.sh src/auth.rs src/auth_test.rs
#   ./adversarial-review.sh src/api/                  # reviews all files in directory
#   ./adversarial-review.sh --security src/crypto.rs  # security-focused review
#
# Configuration:
#   ADVERSARY_MODEL — model to use (default: google-gemini-cli/gemini-2.5-flash)
#   ADVERSARY_TOOLS — comma-separated tool list (default: read)
#
# Requires: pi (https://pi.dev)
#
# The script always uses a fresh context (--no-session).
# This is intentional — it prevents relationship drift.

set -euo pipefail

# --- Configuration ---
MODEL="${ADVERSARY_MODEL:-google-gemini-cli/gemini-2.5-flash}"
TOOLS="${ADVERSARY_TOOLS:-read}"
MODE="general"

# --- Parse flags ---
FILES=()
for arg in "$@"; do
  case "$arg" in
    --security) MODE="security" ;;
    --help|-h)
      head -16 "$0" | grep '^#' | sed 's/^# \?//'
      exit 0
      ;;
    *) FILES+=("$arg") ;;
  esac
done

if [ ${#FILES[@]} -eq 0 ]; then
  echo "Usage: $0 [--security] <file-or-directory> ..." >&2
  exit 1
fi

# --- Validate targets exist ---
TARGETS=()
for target in "${FILES[@]}"; do
  if [ -e "$target" ]; then
    TARGETS+=("$(cd "$(dirname "$target")" && pwd)/$(basename "$target")")
  else
    echo "Warning: $target not found, skipping" >&2
  fi
done

if [ ${#TARGETS[@]} -eq 0 ]; then
  echo "No valid files or directories. Check your paths." >&2
  exit 1
fi

# --- Select system prompt ---
if [ "$MODE" = "security" ]; then
  SYSTEM_PROMPT='You are a security auditor. Assume all inputs are hostile. Assume the network is hostile.

Read the provided files and check:
1. Input validation: injection vectors (SQL, command, path, template, header)
2. Auth: paths that skip auth, privilege escalation, token validation
3. Secrets: hardcoded credentials, secrets in logs or error messages
4. Dependencies: known CVEs, unmaintained packages
5. Crypto: weak algorithms, hardcoded IVs, missing MACs
6. Data exposure: sensitive info in errors, logs, API responses
7. Resource exhaustion: unbounded allocation, CPU, disk
8. Deserialization: untrusted data deserialized unsafely

For each finding provide:
- SEVERITY: CRITICAL / HIGH / MEDIUM / LOW
- CWE: number if applicable
- LOCATION: file and function
- ATTACK: how to exploit it
- FIX: specific remediation

No hedging. Concrete findings only. If no real issues, say NO FINDINGS.'
else
  SYSTEM_PROMPT='You are a hostile code reviewer with zero tolerance. No preamble. No encouragement.

For every issue provide:
- SEVERITY: CRITICAL / HIGH / MEDIUM / LOW
- LOCATION: file and line/function
- FLAW: what is wrong
- FIX: what to do instead

Check: logic errors, test quality (would tests pass with a subtly wrong implementation?),
error handling, security, hidden coupling, resource management, race conditions,
placeholder fraud (TODOs shipped as implementation).

Start with the worst finding. Do not invent problems. Do not nitpick style unless it causes bugs.
If genuinely no issues, say NO FINDINGS.'
fi

# --- Build the user prompt ---
FILE_LIST=$(printf '%s\n' "${TARGETS[@]}")
USER_PROMPT="Review the following files:

${FILE_LIST}"

# --- Run review ---
echo "Adversarial review via pi (model: $MODEL, tools: $TOOLS)..." >&2

pi -p \
  --no-session \
  --model "$MODEL" \
  --tools "$TOOLS" \
  --system-prompt "$SYSTEM_PROMPT" \
  "$USER_PROMPT"
```
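A usage sketch for the script above, showing the configuration override the header documents — the `<provider/model>` placeholder is illustrative, not a specific model the script requires.

```bash
# General review of an implementation and its tests (fresh pi context every run)
./adversarial-review.sh src/auth.rs src/auth_test.rs

# Security-focused pass with a different reviewer model
ADVERSARY_MODEL="<provider/model>" ./adversarial-review.sh --security src/crypto.rs
```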
+217
agents/skills/agent-container/SKILL.md
---
name: agent-container
description: Set up a sandboxed Docker container for running an AI coding agent (pi) against the current project. Trigger when starting a new project that needs an isolated agent environment, or when adding agent infrastructure to an existing project.
---

# Agent Container

Set up a Docker container for running an AI coding agent (pi) against the current project. The container provides a sandboxed environment where the agent can operate freely in yolo mode — auto-approving all actions — without risk to the host system.

## What This Produces

All files go in a `.agent/` directory at the project root:

- `.agent/Dockerfile` — a container image with pi, project dependencies, and tooling
- `.agent/docker-compose.yml` — compose file that volume-mounts the project and runs the agent
- `.agent/secrets` — key=value mapping of env vars to 1Password `op://` references (committed, no actual secrets)
- `.agent/run.sh` — one-command script to resolve secrets, build (if needed), and launch the agent
- `.dockerignore` additions (if needed)

## Process

### Step 1: Analyze the Project

Before writing anything, examine the project to determine:

- **Language(s) and runtimes** — what needs to be installed (e.g., Node, Python, Go, Rust)
- **Package manager** — npm, pnpm, yarn, pip, cargo, etc.
- **System dependencies** — anything the project needs beyond the language runtime (e.g., database clients, native libraries)
- **Test runner** — how tests are run, since the agent will need to run them
- **Linter/formatter** — same reasoning

Look at: `package.json`, `pyproject.toml`, `Cargo.toml`, `go.mod`, `Makefile`, `Dockerfile` (existing), CI config, and the project's development documentation.

### Step 2: Choose a Base Image

Pick the simplest base image that covers the project's needs:

- Node project → `node:<version>-slim`
- Python project → `python:<version>-slim`
- Multi-language → `ubuntu:latest` or `debian:bookworm-slim` with manual installs
- If the project already has a Dockerfile, use the same base image or a compatible one

Always prefer `-slim` variants to keep the image small.

### Step 3: Write the Dockerfile

Create the `.agent/` directory and write `.agent/Dockerfile`:

```dockerfile
FROM <base-image>

# System dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    curl \
    <project-specific-system-deps> \
    && rm -rf /var/lib/apt/lists/*

# Install Node (if not already in base image) — required for pi
# Use the NodeSource setup or official node image as base
RUN npm install -g @mariozechner/pi-coding-agent

# Project dependencies
WORKDIR /workspace
COPY package.json package-lock.json ./  # (adapt for the project's package manager)
RUN npm install                         # (adapt for the project's package manager)

# The rest of the project is volume-mounted at runtime, not copied
```

Key principles:
- **pi is always installed globally via npm** — this is non-negotiable
- **Node must be available** — even if the project isn't a Node project, Node is needed for pi
- **Project deps are installed in the image** for faster startup, but source code is volume-mounted
- **Git must be installed** — the agent uses it heavily
- **Keep layers cacheable** — deps before source code
- **Build context is the project root** (set in compose), so `COPY` paths are relative to the project root, not `.agent/`

### Step 4: Write the Compose File

Write `.agent/docker-compose.yml`:

```yaml
services:
  agent:
    build:
      context: ..
      dockerfile: .agent/Dockerfile
    volumes:
      - ..:/workspace
      - node_modules:/workspace/node_modules  # (adapt: preserve installed deps)
      - ${PI_AGENT_DIR}/skills:/root/.pi/agent/skills
      - ${PI_AGENT_DIR}/auth.json:/root/.pi/agent/auth.json
      - ${PI_AGENT_DIR}/AGENTS.md:/root/.pi/agent/AGENTS.md
      - ${PI_AGENT_DIR}/bin:/root/.pi/agent/bin
    working_dir: /workspace
    stdin_open: true
    tty: true
    command: pi

volumes:
  node_modules:  # (adapt for the project's dependency cache)
```

Key principles:
- **Build context is `..` (the project root)** so `COPY` in the Dockerfile can access project files
- **Volume-mount the project** so changes persist on the host
- **Use a named volume for dependency directories** (node_modules, venv, target, etc.) to avoid overwriting the installed deps from the image build
- **Secrets are passed at runtime by the run script** — never bake them into the image or compose file
- **stdin_open and tty are required** — pi needs an interactive terminal
- **Mount specific pi agent config** so the container picks up skills, auth, AGENTS.md, and bin — but not sessions or other host-only state
- **Default command is `pi`** so the run script drops straight into the agent

### Step 5: Mount Pi Agent Config

The agent inside the container needs access to specific parts of the host's pi agent configuration. Mount only what the agent needs — not the entire `~/.pi/agent/` directory (which contains sessions and other host-only state).

**Mounting:** Add volume mounts in `docker-compose.yml` for each item:

```yaml
volumes:
  - ..:/workspace
  - node_modules:/workspace/node_modules
  - ${PI_AGENT_DIR}/skills:/root/.pi/agent/skills
  - ${PI_AGENT_DIR}/auth.json:/root/.pi/agent/auth.json
  - ${PI_AGENT_DIR}/AGENTS.md:/root/.pi/agent/AGENTS.md
  - ${PI_AGENT_DIR}/bin:/root/.pi/agent/bin
```

**In the run script**, export `PI_AGENT_DIR="${HOME}/.pi/agent"` before invoking compose.

### Step 6: Write the Secrets File

Create `.agent/secrets` — a plain key=value file mapping environment variable names to 1Password secret references:

```
ANTHROPIC_API_KEY=op://Personal/Anthropic API/credential
GITHUB_TOKEN=$GITHUB_TOKEN
```

Two formats are supported:
- `KEY=op://vault/item/field` — fetched from 1Password CLI at runtime
- `KEY=$ENV_VAR` — read from a host environment variable

This file is committed to the repo. It contains no secrets — only pointers to where secrets live. Ask the user which secrets the agent needs and what their references are.

Lines starting with `#` are comments. Blank lines are ignored.

### Step 7: Write the Run Script

Copy the reference implementation from this skill into the project:

    cp <path-to-this-skill>/run.sh .agent/run.sh
    chmod +x .agent/run.sh

The reference `run.sh` (located alongside this SKILL.md) handles skills resolution and secret injection. Adapt it if the project needs additional setup before launching the agent.

Key principles:
- **1Password CLI (`op`) is the primary secret source** — secrets are fetched at runtime via `op read`
- **Falls back to host environment variables** — if `op` isn't installed or the read fails, the script checks for a matching env var on the host
- **Warns on missing secrets** — so you know immediately if something isn't configured
- **Secrets are passed as `-e` flags** to `docker compose run`, so the compose file doesn't need to know about specific secrets
- **Uses `SCRIPT_DIR`** so it works from any working directory
- **`--build` flag** rebuilds the image if the Dockerfile changed, otherwise uses cache (fast)
- **`--rm` flag** cleans up the container after exit
- **Passes `"$@"`** so the user can override the command (e.g., `.agent/run.sh bash` to get a shell)

### Step 8: Update .dockerignore

If `.dockerignore` doesn't already cover these, add:

```
.env
.env.*
node_modules/
```

### Step 9: Document Usage

Add a section to the project's README or development docs explaining:

```
## Running the AI Agent

Secrets are resolved from `.agent/secrets` via 1Password CLI (`op`).
Make sure you're signed in (`op signin`) before running.

Launch the containerized agent:

    .agent/run.sh

If you don't use 1Password, export the required env vars instead:

    export ANTHROPIC_API_KEY=your-key-here
    .agent/run.sh

The agent (pi) starts in yolo mode inside a sandboxed container with full
access to the project via volume mount. Changes the agent makes are
reflected on your host filesystem.

To get a shell in the container instead:

    .agent/run.sh bash

To rebuild the image manually:

    docker compose -f .agent/docker-compose.yml build
```

## Rules

- Always analyze the project first. Do not write a generic Dockerfile — tailor it to what the project actually needs.
- Never bake secrets or API keys into the image.
- Prefer slim base images.
- All agent container files live in `.agent/` to keep the project root clean and avoid conflicts with existing Docker configuration.
- If the project already has Docker configuration, study it and stay consistent with its patterns.
- Ask the user before writing any files. Show drafts and get confirmation.
+39
agents/skills/agent-container/run.sh
```bash
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SECRETS_FILE="$SCRIPT_DIR/secrets"

# Resolve pi agent config directory
export PI_AGENT_DIR="${HOME}/.pi/agent"

# Resolve secrets from .agent/secrets file
# Supported formats:
#   KEY=op://vault/item/field — fetch from 1Password CLI
#   KEY=$ENV_VAR              — read from host environment variable
ENV_ARGS=()
if [[ -f "$SECRETS_FILE" ]]; then
  while IFS='=' read -r key ref; do
    [[ -z "$key" || "$key" == \#* ]] && continue

    value=""
    if [[ "$ref" == op://* ]]; then
      if command -v op &>/dev/null && op read "$ref" &>/dev/null; then
        value="$(op read "$ref")"
      else
        echo "Warning: could not resolve $key via op (is op installed and signed in?)" >&2
      fi
    elif [[ "$ref" == \$* ]]; then
      varname="${ref#\$}"
      value="${!varname:-}"
    fi

    if [[ -z "$value" ]]; then
      echo "Warning: could not resolve $key" >&2
    else
      ENV_ARGS+=(-e "$key=$value")
    fi
  done <"$SECRETS_FILE"
fi

docker compose -f "$SCRIPT_DIR/docker-compose.yml" run --rm --build "${ENV_ARGS[@]}" agent "$@"
```
+94
agents/skills/codebase-rescue/SKILL.md
---
name: codebase-rescue
description: Stabilize a messy, mid-goal codebase through structured audit, characterization tests, and boundary fixes. Trigger when a project is partially working with drifting architecture or broken integration points.
---

# Codebase Rescue

Use this when a codebase is mid-goal and becoming messy. This skill walks through a structured process to stabilize before resuming development.

## When to Use

You have a codebase that's mid-goal. Some parts work, some don't. The architecture may be drifting. Integration points are broken. The instinct is to keep pushing forward or to rewrite — both are usually wrong. The right move is: stop, map, stabilize, resume with discipline.

---

## Phase 0: Stop the Bleeding

Before anything else:

1. Commit everything as-is on a branch and tag it — this is the "before" snapshot.
2. Write down (in plain English) what the system is supposed to do. Not how — what. 3-5 sentences. This is the north star.
3. Write down what's currently broken, from memory. Don't investigate yet. Just dump what you know.

## Phase 1: Audit

Analyze the codebase structure. For each module/component, determine:
- What it's supposed to do (infer from code, comments, any specs)
- What it actually does right now (working, partial, broken, stub)
- What tests exist and whether they pass
- What its dependencies are

Identify every integration point between components. For each boundary:
- Is the contract (types, expected behavior) defined clearly?
- Do both sides agree on the contract?
- Is there a test that verifies this boundary works?

Produce a prioritized issue list:
- **CRITICAL**: broken and blocks other work
- **HIGH**: will break under real usage
- **MEDIUM**: works but fragile or wrong
- **LOW**: tech debt that can wait

Output this as a TODO.md. Do NOT fix anything yet.

## Phase 2: Stabilize What Works

Before fixing broken things, lock down what currently works. For each working component:

1. Read the code
2. Write tests that capture its current behavior (happy path and error handling)
3. These are characterization tests — they document what the code does NOW
4. Run the tests. If any fail, the code doesn't work the way you think — flag that

This gives you a safety net for when you start fixing broken integration points.

## Phase 3: Fix Integration Boundaries

Work through broken boundaries from TODO.md one at a time:

- One boundary per session. Fresh context each time.
- Write the integration test FIRST (TDD)
- Fix the minimum code to make it pass
- Run the FULL test suite after each fix — if something regressed, fix it before moving on
- Commit after each successful fix. Small, atomic commits.

If fixing a boundary reveals a spec problem, STOP. Fix the spec first.

## Phase 4: Adversarial Review

After fixing boundaries, run adversarial review on the changed code. Feed findings back to a fresh session for fixes. Repeat until convergence (adversary hallucinates or returns NO FINDINGS).

## Phase 5: Spec Reconciliation

Compare current codebase against specs. For each requirement, determine:
- **IMPLEMENTED**: code exists and tests verify it
- **PARTIAL**: code exists but incomplete or untested
- **MISSING**: not implemented yet
- **DIVERGED**: code does something different from spec

For DIVERGED items: is the code right or the spec? Flag for human decision — do not change either silently. Never leave specs and code in disagreement.

## Phase 6: Resume Normal Development

You now have specs that match reality, tests on everything that works, fixed integration boundaries, and a TODO.md with remaining work. Resume with TDD.

---

## Common Pitfalls

- **"Let me just rewrite this module."** No. Fix the contract, test it, move on. Rewrite later when stable.
- **"I should refactor first."** No. Characterize with tests first. Then refactor with tests as safety net.
- **"I'll fix all the boundaries at once."** No. One at a time. Commit between each.
- **"The agent says this needs a major architectural change."** Maybe. But not during rescue. Stabilize first.
- **"The specs are so out of date it's not worth reconciling."** Then you don't have specs — you have aspirational fiction. Phase 5 fixes that.
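A minimal sketch of the Phase 0 snapshot described in the skill above — the branch and tag names are illustrative, not prescribed.

```bash
# Freeze the current state before any rescue work begins
git checkout -b rescue/before-snapshot
git add -A
git commit -m "Snapshot: pre-rescue state (known broken)"
git tag rescue-before   # easy to diff against later: git diff rescue-before
```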
+159
agents/skills/context-init/SKILL.md
---
name: context-init
description: Initialize project documentation (architecture, constraints, security, development practices) through a guided conversation. Trigger when setting up a new project or when a project lacks the documentation agents need to work effectively.
---

# Context Init

Initialize project documentation through a guided conversation. Creates the documentation that agents need to understand and work effectively in a project.

## Categories

These are the important categories of information to capture about a project. Not every project needs all of them, and the depth should match the project's complexity.

- **Overview** — what the system does, who it's for, why it exists, how work is tracked
- **Architecture** — components, data flow, key decisions and their rationale
- **Constraints** — invariants, boundaries, non-negotiables
- **Security** — security rules, secret hygiene, dependency vetting, restricted commands
- **Development** — TDD discipline, testing, verification, code conventions

## Process

### Step 1: Understand the Project

Before writing anything, understand what already exists and what's needed. Ask:

- What does this system do? Who is it for? Why does it exist?
- What's the tech stack?
- How complex is the project? (rough sense of scale — solo script vs. multi-service system)
- Is there existing documentation? Where does it live?
- How is work tracked?

Use the answers to judge which categories need dedicated files vs. which can be folded into the README or hub file. A small project might only need a README and a hub file. A larger one might warrant separate files for each category.

### Step 2: Gather Information

Work through each relevant category one at a time. For each, ask targeted questions, draft the content, and confirm before writing. Skip categories the user has no answers for — sparse and accurate beats thorough and speculative.

**Architecture** — ask about:
- Major components or modules, what each owns, interfaces, dependencies
- How data flows between them (walk through a typical request/operation)
- Key architectural decisions worth documenting with rationale

**Constraints** — ask about:
- What must always be true (invariants)
- What's explicitly out of scope (boundaries)
- What decisions are final (non-negotiables)

**Security** — start with the defaults below, then ask about project-specific additions:
- Any requirements beyond the defaults?
- Additional restricted commands or patterns?
- Compliance or regulatory concerns?
- Secret patterns to watch for beyond the standard set?

**Development** — start with the defaults below, then ask about:
- Test framework and how to run tests
- Coverage expectations
- Error handling patterns
- Naming conventions (or "match surrounding code")
- Linter/formatter and how to run them
- Any other practices to enforce

### Step 3: Write Documentation

Write the documentation in whatever structure fits the project. Prefer separate files for substantial categories, but use judgment. Always create a hub file (`CLAUDE.md` or `AGENTS.md`, ask the user) that lists all documentation files and what information each contains. Do not use `@` file references — instead, list each file path with a brief description of its contents so agents know where to look for specific information.

Example hub file structure:

```
# Project Name

## Documentation

- `README.md` — project overview, purpose, and getting started
- `ARCHITECTURE.md` — component breakdown, data flow, key decisions
- `CONSTRAINTS.md` — invariants, boundaries, non-negotiables
- `SECURITY.md` — security rules, secret hygiene, restricted commands
- `DEVELOPMENT.md` — TDD practices, testing, code conventions

## Task Tracking
...
```

Update the README with overview information rather than creating a separate overview file.

## Defaults

These defaults should be included in every project's documentation regardless of structure. They can be trimmed if the user says they're not relevant.

### Security Defaults

**Hard Rules:**
- NEVER write secrets, API keys, tokens, or credentials into any file
- NEVER commit .env files — check .gitignore before every commit
- All user input must be validated and sanitized before use
- All external API calls must use TLS
- No shell commands that pipe untrusted input (no eval, no unquoted variables)
- Dependencies: prefer well-maintained packages with recent releases — flag any dependency with <100 stars or no updates in 6+ months

**Restricted Commands — do NOT run:**
- `rm -rf` on any path outside the project directory
- Any command that modifies system configuration
- Any command that installs system-level packages without asking first
- Any command that accesses network resources not defined in this project
- `curl | sh` or equivalent pipe-to-shell patterns

**Before Every Commit:**
- Run the full test suite
- Run the linter/formatter
- Check `git diff --staged` for any secrets or credentials
- Check that no new dependencies were added without justification

**Dependency Vetting** — when adding a new dependency:
1. State WHY it's needed (what does it do that we can't do in <50 lines?)
2. Show: name, version, weekly downloads, last publish date, maintainer count
3. Run the relevant audit tool (`cargo audit` / `npm audit` / `pip audit`)
4. If <100 stars or no updates in 6 months, FLAG IT and wait for human approval

**Secret Hygiene:**
- NEVER let agents create .env files — create them yourself
- Use .env.example with placeholder values that agents CAN see
- Ensure .gitignore covers: `.env`, `.env.*`, `*.pem`, `*.key`, `*.p12`, `*.pfx`, `secrets/`, `credentials/`

### Development Defaults

**TDD Discipline — all implementation MUST follow strict TDD:**
1. Write failing tests FIRST based on the spec for the current task
2. Confirm tests fail before writing any implementation
3. Write the MINIMUM code to pass each test
4. Run the full test suite after each change
5. Refactor only after all tests pass

Do NOT write implementation and tests simultaneously.
Do NOT write tests that match implementation — tests must match the SPEC.

**Session Hygiene:**
- Start each session by reading the relevant spec and existing tests
- One task per session — if scope creeps, stop and decompose into new tasks
- Before ending a session, summarize what was done and what's left
- When context feels degraded, commit and start fresh

**Verification:**
- Run the full test suite before and after every change
- Run the linter/formatter before committing
- After each feature, run adversarial review on the changed files
- When fixing a bug, write a failing test that reproduces it before fixing

**Testing Requirements:**
- Unit tests for all business logic
- Integration tests for all component boundaries
- Test error paths, not just happy paths
- Test edge cases from the spec's edge case catalog (if present)

## Rules

- One category at a time. Don't overwhelm with all questions at once.
- Draft content and show it before writing. Get explicit confirmation.
- Use the user's words — don't over-formalize their answers.
- Skip what they don't know or don't care about.
- Match documentation depth to project complexity.
-60
agents/skills/dev-philosophy/SKILL.md
---
name: software-dev
description: Use when the agent is performing any direct software development implementation or planning tasks. Provides the principles and values that must be followed.
---

# Software Development Philosophy

This skill defines how I approach software development. Apply these principles whenever working on code.

## Core Principle: Ruthless Simplicity

- Write the minimum code necessary to solve the problem
- Every line must justify its existence
- Minimize maintenance burden — less code means less to understand, review, debug
- Do not over-engineer unless there is a clear, proven need
- Avoid abstractions until they're obviously necessary

## Approach to New Work

1. **Clarify the problem first** — understand exactly what needs to be solved before writing any code
2. **Design before implementing** — think through data structures, functions, and interfaces at a high level
3. **Iterate on the design** — refine until the structure is solid and any unnecessary complexity is removed
4. **Then implement** — only after the above steps are complete

## Testing

- Always prefer automated tests that verify behavior deterministically
- Write code that is testable
- Prefer tests that run against real conditions (real DB like sqlite, end-to-end) over heavy mocking
- Keep tests themselves simple — they are code too and carry maintenance burden
- Tests are part of "done"

## Documentation

- Avoid documentation that duplicates what exists in code
- Point to the source of truth instead of recreating it
- Documentation has maintenance burden — minimize it
- Examples: Don't list routes in docs, link to route definitions; don't copy API schemas, reference the source

## Technology Preferences

- Prefer lightweight tools (sqlite over heavy databases)
- Prefer simple HTML over heavy UI frameworks
- Minimize client-side complexity
- CLIs and JSON APIs are preferred interfaces
- Avoid unnecessary dependencies

## Boundaries

- **Stay focused**: Only implement what is directly requested — no "helpful" extras or unrequested improvements
- **No commits without permission**: Do not commit unless explicitly asked. A reminder that changes are ready to commit is acceptable.
- **No pushing**: Never push to remote without explicit instruction
- **Lint and format**: Ensure code passes linting and formatting before considering work complete

## What "Done" Looks Like

- The problem is solved with minimal code
- Automated tests verify the behavior
- Code is linted and formatted
- Diff has been reviewed (by the user, not the agent)
- Ready to commit, but waiting for explicit go-ahead
+31
agents/skills/integration-boundary-fix/SKILL.md
---
name: integration-boundary-fix
description: Fix a broken integration between two components using TDD — write the integration test first, identify the contract mismatch, fix minimally. Trigger when two components fail to communicate correctly at their boundary.
---

# Integration Boundary Fix

Use this to fix a broken integration between two components. Run in a fresh session per boundary.

## Instructions

We're fixing the integration between [COMPONENT_A] and [COMPONENT_B].

The spec says: [WHAT SHOULD HAPPEN]
Currently: [WHAT ACTUALLY HAPPENS OR DOESN'T]

### Step 1: Write the Integration Test

Write an integration test that exercises this boundary according to the spec. The test should call component A's interface and verify that component B receives/produces the correct result. This test MUST FAIL right now.

### Step 2: Identify the Mismatch

What does A send vs what B expects? Show the specific types, formats, or protocols that disagree.

### Step 3: Fix the Minimum

Fix the MINIMUM code to make the integration test pass. Prefer changing the implementation to match the spec. If the spec is wrong, STOP and flag it — do not change the spec silently.

### Step 4: Verify

Run the full test suite. If anything else broke, fix those regressions before moving on.
+54
agents/skills/security-guardrails/SKILL.md
---
name: security-guardrails
description: Apply security guardrails to a project using agentic development — agent permissions, secret hygiene, dependency vetting, and blast radius limiting. Trigger when setting up a new project for agent-assisted development or when hardening an existing project's agent workflow.
---

# Security Guardrails Setup

Use this to apply security guardrails to a project that uses agentic development.

## Threat Model

When an AI agent has shell access and can read/write your filesystem, you're defending against:

1. **Agent mistakes** — insecure code because the model optimizes for "works" over "works safely"
2. **Prompt injection via codebase** — malicious content in deps, docs, or data that the agent reads and acts on
3. **Secret leakage** — the agent reads .env files, logs secrets, or commits them
4. **Blast radius** — a bad agent action affects more than it should
5. **Supply chain** — the agent adds unvetted dependencies

## Layer 1: Restrict Agent Permissions

Configure your agent tool to allow the dev loop (read, write, test, lint) and deny anything with blast radius beyond the project. Key principles:

- **Deny git push and commit** — you review diffs and commit manually
- **Deny rm -rf** — allow deleting specific files, not nuking directories
- **Deny pipe-to-shell** — no `curl | sh`, no `wget | sh`
- **Deny sudo** — the agent never needs system-level access
- **Allow test/build/lint** — these are the feedback loops the agent needs

Adapt the allow list to your language/toolchain.

## Layer 2: Secret Hygiene

- Ensure .gitignore covers: `.env`, `.env.*`, `*.pem`, `*.key`, `*.p12`, `*.pfx`, `secrets/`, `credentials/`
- NEVER let the agent create .env files — create them yourself
- Use .env.example with placeholders that the agent CAN see
- Install a pre-commit hook for secret detection (scan staged files for patterns like `PRIVATE.KEY`, `BEGIN RSA`, `password\s*=`, `api_key\s*=`, `AWS_SECRET`, `ghp_`, `sk-`, etc.)

## Layer 3: Dependency Vetting

When adding a new dependency:
1. State WHY it's needed
2. Show: name, version, weekly downloads, last publish date, maintainer count
3. Run `cargo audit` / `npm audit` / `pip audit`
4. If <100 stars or no updates in 6 months, FLAG and wait for human approval

## Layer 4: Blast Radius Limiting

- Work on feature branches, never main/master directly
- The agent cannot push — you review and push
- Use `git add -p` to stage changes selectively
- Squash-merge feature branches for atomic reverts
- The agent should NEVER run database migrations without explicit approval
- Destructive migrations (DROP, DELETE, ALTER removing columns) require human review of the SQL before execution
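Layer 2 above calls for a pre-commit hook that scans staged changes for secret-like patterns. A minimal sketch, assuming a plain `.git/hooks/pre-commit` and using only the patterns listed in the skill (they are coarse and will need tuning per project — `sk-` in particular matches innocent text):

```bash
#!/usr/bin/env bash
# .git/hooks/pre-commit — block commits whose staged diff looks like it contains secrets
set -euo pipefail

# Patterns taken from the skill's secret-detection list; extend per project
patterns='PRIVATE.KEY|BEGIN RSA|password\s*=|api_key\s*=|AWS_SECRET|ghp_|sk-'

if git diff --cached -U0 | grep -E -i -q -- "$patterns"; then
  echo "Possible secret detected in staged changes — commit blocked." >&2
  echo "Review 'git diff --cached', then unstage or scrub before committing." >&2
  exit 1
fi
```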
+44
agents/skills/security-review/SKILL.md
---
name: security-review
description: Security audit checking input validation, auth, secrets, dependencies, crypto, data exposure, and resource exhaustion. Trigger after implementing auth/authz, input handling, crypto, dependency changes, or before deployment.
---

# Security Review

Run this in a fresh context after the general adversarial review, or as a standalone pass. Use a different model if possible.

## Instructions

You are a security auditor reviewing code for deployment.
Assume all inputs are hostile. Assume the network is hostile.
Assume dependencies are compromised until proven otherwise.

For the following code, check:

1. **Input validation**: Is ALL external input validated? Are there injection vectors (SQL, command, path, template, header)?
2. **Authentication/Authorization**: Are there paths that skip auth checks? Are tokens validated properly? Are there privilege escalation paths?
3. **Secrets management**: Are credentials, keys, or tokens hardcoded or logged? Are they exposed in error messages?
4. **Dependency risk**: Are there dependencies with known CVEs? Unmaintained packages? Suspicious transitive deps?
5. **Cryptography**: Is crypto used correctly? Weak algorithms, hardcoded IVs, missing MACs, improper key derivation?
6. **Data exposure**: Could error messages, logs, or API responses leak sensitive information?
7. **Resource exhaustion**: Can an attacker cause unbounded memory allocation, CPU usage, or disk writes?
8. **Deserialization**: Is untrusted data deserialized? Are there gadget chains?

## Output Format

For each finding:
- **SEVERITY**: CRITICAL / HIGH / MEDIUM / LOW
- **CWE**: [CWE number if applicable]
- **LOCATION**: exact file and function
- **ATTACK**: how an attacker would exploit this
- **FIX**: specific remediation

No hedging. No "you might consider." Concrete findings only.

## When to Run

1. After implementing any auth/authz logic — every time, no exceptions
2. After implementing any input handling — forms, API endpoints, file parsing
3. After implementing any crypto — even if it's "just calling a library"
4. After adding or updating dependencies
5. Before any deployment — full security pass on the diff since last deploy
+30
agents/skills/spec-gap-analysis/SKILL.md
---
name: spec-gap-analysis
description: Analyze spec/design documents for ambiguities, missing edge cases, implicit assumptions, contradictions, and undefined interfaces. Trigger before implementing anything, when specs exist but haven't been vetted.
---

# Spec Gap Analysis

Run this against spec/design documents before implementing anything.

## Instructions

Read all spec/design documents in [SPEC_PATH].

Produce a gap analysis. For each component or feature, identify:

1. **Ambiguous requirements** — language that could be interpreted multiple ways
2. **Missing edge cases** — boundary conditions, degenerate inputs, failure modes not addressed
3. **Implicit assumptions** — things the spec relies on but doesn't state
4. **Contradictions** — places where different parts of the spec conflict
5. **Undefined interfaces** — inputs, outputs, or error conditions not specified
6. **Missing error handling** — what happens when things go wrong?

## Output Format

Group findings by component. Rate each finding:
- **HIGH** — blocks implementation
- **MEDIUM** — will cause bugs
- **LOW** — cleanup

Do NOT fix anything. Do NOT suggest implementations. Just find the gaps.
-328
agents/skills/task-execution/SKILL.md
---
name: task
description: Start and track work on a task. Use when the user says "lets work on...", "start task", or invokes /task. Creates TASK.md and PLAN.md for requirements and implementation tracking, handles branch setup, creates worktrees, and pushes incremental progress to draft MRs.
allowed-tools:
  - Read(TASK.md)
  - Read(PLAN.md)
  - Read(.worktree.yml)
  - Read(.claude/worktree.yml)
  - Write(TASK.md)
  - Write(PLAN.md)
  - Edit(TASK.md)
  - Edit(PLAN.md)
  - Bash(git:*)
  - Bash(~/.claude/skills/task-execution/worktree-setup.sh)
  - Bash(mkdir -p:*)
  - AskUserQuestion
  - TodoWrite
---

# Task Execution

Manage task lifecycle from start to completion using two files:
- **TASK.md**: Requirements, context, questions, decisions
- **PLAN.md**: Implementation steps with progress tracking

---

## Input

Accept any combination of:
- **Jira ticket** (e.g., `ADE-123`)
- **Task description** (e.g., "add logout button")
- **Both**

---

## Mode Detection

Determine mode based on current directory:

**Single-repo mode**: `git rev-parse --show-toplevel` succeeds
- Working on one repository
- `.tasks/` folder created in repo root
- Worktree source is the repo itself

**Multi-repo mode**: Not inside a git repo
- Working across multiple repositories
- `.tasks/` folder created in current directory
- CLAUDE.md should specify repos location (e.g., `Repos: ./zapier/`)
- Repos are listed in TASK.md during planning

---

## Phase 1: Task Setup

### 1. Detect Mode

```bash
git rev-parse --show-toplevel  # Success = single-repo, failure = multi-repo
```

### 2. Gather Context

- If Jira ticket provided: fetch it for context
- Check for uncommitted changes in current directory
- If uncommitted changes exist: STOP and ask how to handle them

### 3. Create Task Folder

Generate task name: `{ticket-lower}-{short-description}` (e.g., `ade-123-add-logout`)

```bash
mkdir -p .tasks/{task-name}
```

### 4. Create TASK.md

```markdown
# {Task Title}

**Ticket**: {TICKET-NUM or N/A}
**Branch**: {branch-name}

## Goal

{What are we trying to achieve?}

## Requirements

- {From ticket, discussion, or inferred}

## Repos

{For multi-repo: list repos and what changes in each}
{For single-repo: omit this section}

## Constraints

- {Technical constraints, deadlines, scope limits}

## Open Questions

- {Anything unclear that needs resolution}

## Decisions

- {Decisions made during planning/implementation}
```

---

## Phase 2: Planning

Iterate on TASK.md until requirements are clear, then create PLAN.md.

### 1. Understand the Work

- Explore relevant code
- Identify affected areas and dependencies
- Resolve open questions through discussion
- For multi-repo: identify which repos are involved, update `## Repos` section

### 2. Create PLAN.md

```markdown
# Implementation Plan

## Approach

{High-level approach and key decisions}

## Steps

- [ ] Step 1: {description}
- [ ] Step 2: {description}
- [ ] Step 3: {description}

## Testing

- {How we'll verify the implementation}

## Files to Modify

- `path/to/file.py` - {what changes}
```

### 3. Get Approval

For non-trivial tasks, confirm the plan before proceeding.

---

## Phase 3: Implementation

### 1. Create Worktrees

For each repo involved (or the single repo in single-repo mode):

**Find the source repo**:
- Single-repo: `git rev-parse --show-toplevel` from task folder parent
- Multi-repo: Use repos location from CLAUDE.md

**Create worktree**:
```bash
# Fetch latest
git -C {source-repo} fetch origin

# Create worktree with new branch
git -C {source-repo} worktree add .tasks/{task-name}/{repo-name} -b {branch-name} origin/main
```

Branch name: `{ticket-lower}-{short-description}` (same as task folder)

**Run setup**:
```bash
cd .tasks/{task-name}/{repo-name}
~/.claude/skills/task-execution/worktree-setup.sh
```

This copies files and runs setup commands per `.worktree.yml` in the source repo.

### 2. Create Draft MR

Push each branch and create draft merge requests:
```bash
git -C .tasks/{task-name}/{repo-name} push -u origin {branch-name}
# Create draft MR via glab
```

### 3. Track Progress

- Use TodoWrite for real-time tracking
- Update PLAN.md checkboxes as steps complete
- Update TASK.md with decisions and resolved questions

### 4. Commit Incrementally

Make small, focused commits as you complete logical units:
- Commit code changes as they're done
- Push to draft MR regularly for CI feedback
- Update and commit PLAN.md progress periodically

### 5. Follow TDD

1. Write failing test
2. Confirm failure
3. Implement minimum to pass
4. Confirm success
5. Refactor if needed

---

## Phase 4: Completion

### 1. Final Updates

- Mark all PLAN.md steps complete
- Update TASK.md with any final decisions
- Ensure all tests pass

### 2. Ready for Review

- Push final commits
- Mark MR(s) as ready (no longer draft)
- Ask if any follow-up tasks should be created

### 3. Cleanup

After merge, task folder can be removed:
```bash
# Remove worktrees
git worktree remove .tasks/{task-name}/{repo-name}

# Remove task folder
rm -rf .tasks/{task-name}
```

---

## File Maintenance

**TASK.md**: Living document for collaboration
- Add decisions as they're made
- Resolve and remove open questions
- Keep requirements updated if scope changes

**PLAN.md**: Implementation checklist
- Check off steps as completed
- Add steps if scope expands
- Note blockers or changes to approach

**Prune aggressively**: Remove obsolete information. These files should stay useful, not become history logs.

---

## Resuming Work

When returning to an existing task:

1. Read TASK.md and PLAN.md to restore context
2. Check git status in worktrees for uncommitted work
3. Review where we left off in PLAN.md
4. Continue from there

---

## Worktree Setup Script

The script `~/.claude/skills/task-execution/worktree-setup.sh` runs in a new worktree to:

1. Find the main worktree (source repo)
2. Read `.worktree.yml` or `.claude/worktree.yml` from source
3. Copy files listed in `copy:` section
4. Run commands listed in `setup:` section

**Example .worktree.yml**:
```yaml
copy:
  - .env
  - .env.local
  - .envrc
  - .tool-versions

setup:
  - npm install
  - direnv allow
```

---

## Directory Structure Examples

**Single-repo**:
```
some-repo/
├── .tasks/
│   └── ade-123-fix-bug/
│       ├── TASK.md
│       ├── PLAN.md
│       └── some-repo/      # worktree
├── .worktree.yml
└── src/
```

**Multi-repo** (CLAUDE.md specifies `Repos: ./zapier/`):
```
~/projects/zapier/
├── .tasks/
│   └── ade-123-add-logout/
│       ├── TASK.md
│       ├── PLAN.md
│       ├── zapier-web/     # worktree
│       └── zapier-api/     # worktree
└── zapier/
    ├── zapier-web/
    └── zapier-api/
```

---

## Subagent Coordination

When using subagents for implementation:

- Each subagent works in a specific worktree: `.tasks/{task-name}/{repo-name}/`
- Subagents reference shared context: `../TASK.md` and `../PLAN.md`
- Main agent tracks overall progress in PLAN.md
- Each repo gets its own commits and draft MR
-96
agents/skills/task-execution/worktree-setup.sh
··· 1 - #!/bin/bash
2 - #
3 - # worktree-setup.sh
4 - #
5 - # Run from within a new git worktree to copy files and run setup commands
6 - # based on .worktree.yml configuration from the main worktree.
7 - #
8 - # Usage: worktree-setup.sh
9 - #
10 - # Configuration (.worktree.yml or .claude/worktree.yml in main worktree):
11 - #   copy:
12 - #     - .env
13 - #     - .envrc
14 - #   setup:
15 - #     - npm install
16 - #     - direnv allow
17 - 
18 - set -e
19 - 
20 - # Find main worktree
21 - main_git_dir=$(git rev-parse --git-common-dir 2>/dev/null)
22 - if [[ -z "$main_git_dir" ]]; then
23 -   echo "Error: Not in a git repository"
24 -   exit 1
25 - fi
26 - 
27 - main_worktree=$(dirname "$main_git_dir")
28 - current_worktree=$(pwd)
29 - 
30 - # Check if we're actually in a worktree (not the main repo)
31 - if [[ "$main_worktree" == "$current_worktree" ]]; then
32 -   echo "Error: Already in main worktree, nothing to set up"
33 -   exit 1
34 - fi
35 - 
36 - # Find config file
37 - config=""
38 - for path in ".worktree.yml" ".claude/worktree.yml"; do
39 -   if [[ -f "$main_worktree/$path" ]]; then
40 -     config="$main_worktree/$path"
41 -     break
42 -   fi
43 - done
44 - 
45 - if [[ -z "$config" ]]; then
46 -   echo "No .worktree.yml found in main worktree, skipping setup"
47 -   exit 0
48 - fi
49 - 
50 - echo "Using config: $config"
51 - echo "Source: $main_worktree"
52 - echo "Target: $current_worktree"
53 - echo ""
54 - 
55 - # Copy files
56 - echo "Copying files..."
57 - copied=0
58 - skipped=0
59 - while IFS= read -r file; do
60 -   # Skip empty lines
61 -   [[ -z "$file" ]] && continue
62 - 
63 -   src="$main_worktree/$file"
64 -   dest="$current_worktree/$file"
65 - 
66 -   if [[ -f "$src" ]]; then
67 -     # Create parent directory if needed
68 -     mkdir -p "$(dirname "$dest")"
69 -     cp "$src" "$dest"
70 -     echo "  Copied: $file"
71 -     copied=$((copied + 1))   # assignment form: a bare ((copied++)) evaluates to 0 on the first file and aborts under set -e
72 -   else
73 -     echo "  Skipped (not found): $file"
74 -     skipped=$((skipped + 1))
75 -   fi
76 - done < <(yq -r '.copy[]?' "$config" 2>/dev/null)
77 - 
78 - echo "Copied $copied file(s), skipped $skipped"
79 - echo ""
80 - 
81 - # Run setup commands
82 - echo "Running setup commands..."
83 - while IFS= read -r cmd; do
84 -   # Skip empty lines
85 -   [[ -z "$cmd" ]] && continue
86 - 
87 -   echo "  Running: $cmd"
88 -   if eval "$cmd"; then
89 -     echo "  Done"
90 -   else
91 -     echo "  Warning: Command failed (continuing)"
92 -   fi
93 - done < <(yq -r '.setup[]?' "$config" 2>/dev/null)
94 - 
95 - echo ""
96 - echo "Worktree setup complete"
+66
agents/skills/task-workflow/SKILL.md
··· 1 + ---
2 + name: task-workflow
3 + description: Structured process for working on any substantial task — create TASK.md and PLAN.md, clarify requirements, execute step by step, clean up. Trigger for any non-trivial implementation work that benefits from tracked progress and explicit planning.
4 + ---
5 + 
6 + # Task Workflow
7 + 
8 + Structured process for working on any task, from requirements through cleanup.
9 + 
10 + ## Setup
11 + 
12 + Create a task folder at `.tasks/<task-slug>/` with two files:
13 + 
14 + - `TASK.md` — requirements and decisions
15 + - `PLAN.md` — implementation plan and progress
16 + 
17 + ### TASK.md Structure
18 + 
19 + ```
20 + # <Task Name>
21 + 
22 + ## Goal
23 + [What needs to be true when this is done. 2-3 sentences max.]
24 + 
25 + ## Requirements
26 + [Specific, unambiguous requirements. Collaborate with the user to clarify before writing code.]
27 + 
28 + ## Decisions & Questions
29 + [Record decisions made during the task. Track open questions here until resolved.]
30 + ```
31 + 
32 + ### PLAN.md Structure
33 + 
34 + ```
35 + # Plan
36 + 
37 + ## Steps
38 + - [ ] Step 1: [description]
39 +   - Verify: [how to confirm this step is done correctly]
40 + - [ ] Step 2: [description]
41 +   - Verify: [how to confirm]
42 + ...
43 + ```
44 + 
45 + Each step MUST:
46 + - Be small enough to complete and verify independently
47 + - Have explicit "how to verify" information
48 + - Be checked off immediately upon completion — do not let progress markers go stale
49 + 
50 + ## Execution
51 + 
52 + 1. **Clarify first.** Collaborate with the user on TASK.md before writing any code. Don't assume requirements — ask.
53 + 2. **Plan before building.** Fill out PLAN.md with steps. Get user agreement on the approach.
54 + 3. **Work step by step.** Complete one step, update PLAN.md, pause for the user to verify.
55 + 4. **Test before implementing.** When fixing a bug or adding behavior, write a failing test that demonstrates the expected behavior FIRST. Confirm it fails. Then write the minimum code to make it pass.
56 + 5. **Keep TASK.md current.** Update decisions and questions as they're resolved during the task.
57 + 
58 + ## Cleanup & Reflection
59 + 
60 + When the task is complete:
61 + 
62 + 1. **Integrate knowledge back.** Review any decisions, patterns, or conventions that emerged during the task. Update the project's evergreen documentation (README.md, ARCHITECTURE.md, CONSTRAINTS.md, DEVELOPMENT.md, etc.) with anything that should persist beyond this task.
63 + 2. **Update CLAUDE.md references.** If new documentation files were created or existing ones changed significantly, ensure CLAUDE.md still points to the right places.
64 + 3. **Remove the task folder.** Delete `.tasks/<task-slug>/` once everything worth keeping has been integrated into permanent docs.
65 + 
66 + The goal: nothing valuable lives only in a task folder. Task folders are temporary workspaces. The project's documentation is the permanent record.
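One possible way to scaffold the Setup step above from a shell; the task slug is illustrative and the pre-seeded headings mirror the TASK.md and PLAN.md templates:

```bash
# Sketch: create the task workspace with empty, pre-templated files.
task=".tasks/ade-123-add-logout"   # illustrative slug
mkdir -p "$task"
printf '# Add logout\n\n## Goal\n\n## Requirements\n\n## Decisions & Questions\n' > "$task/TASK.md"
printf '# Plan\n\n## Steps\n- [ ] Step 1: ...\n  - Verify: ...\n' > "$task/PLAN.md"
```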
+29
agents/skills/tdd-implementation/SKILL.md
··· 1 + ---
2 + name: tdd-implementation
3 + description: Implement a feature or fix using strict test-driven development — failing tests first, minimum code to pass, then refactor. Trigger when implementing a specific feature or fixing a specific issue.
4 + ---
5 + 
6 + # TDD Implementation
7 + 
8 + Use this when implementing a specific feature or fixing a specific issue. Start a fresh session per task.
9 + 
10 + ## Instructions
11 + 
12 + Task: [DESCRIBE THE TASK — what needs to work, referencing the spec]
13 + 
14 + Follow strict TDD in this exact sequence:
15 + 
16 + 1. Read the relevant spec section: [SPEC_REFERENCE]
17 + 2. Read existing tests in [TEST_PATH] to understand current coverage
18 + 3. Write NEW failing tests that define the expected behavior for this task
19 + 4. Show the test output confirming they fail
20 + 5. Write the MINIMUM implementation to pass each test, one at a time
21 + 6. After all tests pass, run the FULL test suite to check for regressions
22 + 7. Only then, refactor if needed — tests must still pass after refactoring
23 + 
24 + ## Rules
25 + 
26 + - Tests must test BEHAVIOR described in the spec, not implementation details
27 + - Every error path in the spec needs a test
28 + - Do not write any implementation code before the relevant test exists and fails
29 + - If you discover a spec gap during implementation, STOP and flag it — do not guess
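As a concrete illustration of steps 3 through 6 above, assuming a Python project tested with pytest; the test path and test name are placeholders, not part of the skill:

```bash
# Sketch of the red/green loop; substitute your project's test runner.
pytest tests/test_logout.py::test_logout_clears_session   # step 4: confirm the new test fails
# ...write the minimum implementation...
pytest tests/test_logout.py::test_logout_clears_session   # step 5: confirm it now passes
pytest                                                     # step 6: full suite, check for regressions
```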