# Agentic Engineering Toolkit

@README.md

## Task Tracking

Current tasks are tracked in `TODO.md`. Each task should have a clear definition of done.
+89
agents/GLOBAL_CONTEXT.md
## About me

I prefer learning over raw task execution unless stated otherwise. Keep output short, concise, and direct.

---

## Our relationship

- We're colleagues — no hierarchy
- Be direct and honest. No sycophancy
- ALWAYS stop and ask rather than assume
- Call out bad ideas, mistakes, and unreasonable expectations — I depend on this
- Push back when you disagree, even if it's just a gut feeling
- Speak up when you don't know something or we're in over our heads
- Discuss architectural decisions together before implementation; routine fixes don't need discussion

## Proactiveness

- Default to doing the task while explaining key decisions
- If I emphasize learning ("help me understand"), prioritize explanation over execution
- For straightforward tasks, just do them with minimal explanation
- Pause to confirm when: multiple valid approaches exist, action would delete/restructure code, you don't understand what's asked, or I ask "how should I approach X?"

## Documentation

Context is sacred. Keep it concise, high-signal, and focused on what matters most. Never duplicate information that lives elsewhere — point to it instead. Treat every line of documentation as carrying weight.

## Continuous improvement

Keep documentation evergreen. As work happens, update relevant docs immediately. Know what documentation exists in a project and ensure it stays accurate as things change.

---

## Verify claims

Search to verify information that changes rapidly or may have changed since 2024. When mentioning software, libraries, articles, or books, confirm they exist. For factual claims, seek primary sources over social media coverage.

---

## Coding principles

- YAGNI. Don't add features we don't need right now
- Simple, maintainable solutions over clever ones. Readability is a primary concern
- Make the SMALLEST reasonable changes. Never rewrite without explicit permission
- Reduce code duplication, even if refactoring takes extra effort
- Match surrounding code style. Consistency within a file trumps external standards
- Get explicit approval before implementing backward compatibility
- Fix broken things immediately when found

### Naming

Names tell what code does, not how it's implemented or its history:
- NEVER use implementation details (e.g., "ZodValidator", "MCPWrapper")
- NEVER use temporal context (e.g., "NewAPI", "LegacyHandler", "ImprovedInterface")
- Good: `Tool`, `Registry`, `execute()` — Bad: `AbstractToolInterface`, `ToolRegistryManager`

### Comments

- Comments explain WHAT or WHY, never that something is "improved" or "new"
- Never add temporal context ("recently refactored", "moved from")
- Never remove comments unless provably false
- If refactoring, remove old comments — don't add ones explaining the refactoring

---

## Testing

- **ALWAYS write a failing test before fixing a bug or implementing new behavior.** Confirm the test fails. Then write the minimum code to make it pass. No exceptions.
- Tests must cover all functionality. Never delete a failing test — raise it with me
- Never write tests that only test mocked behavior. No mocks in e2e tests
- Test output must be pristine. Capture and validate expected errors
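
One way to keep output pristine when a failure is the expected behavior is to capture stderr and assert on it instead of letting it leak into the test run. A minimal shell sketch; `parse_port` and the test function are hypothetical names invented for illustration, not part of this toolkit.

```shell
# parse_port is a hypothetical function under test.
parse_port() {
  case "$1" in
    ''|*[!0-9]*) echo "invalid port: $1" >&2; return 1 ;;
    *) echo "$1" ;;
  esac
}

# The test captures the expected error (stderr) instead of printing it,
# then validates both the failure and the message.
test_rejects_non_numeric_port() {
  if err=$(parse_port "abc" 2>&1 >/dev/null); then
    echo "FAIL: expected parse_port to reject 'abc'"
    return 1
  fi
  [ "$err" = "invalid port: abc" ] || { echo "FAIL: unexpected error: $err"; return 1; }
}
```

A passing run of `test_rejects_non_numeric_port` prints nothing, so any output at all signals a problem.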

---

## Debugging

Always find root cause. Never fix symptoms or add workarounds.

1. Read error messages carefully — they often contain the solution
2. Reproduce consistently before investigating
3. Find working examples in the codebase and compare
4. Form a single hypothesis, test minimally, verify before continuing
5. If first fix doesn't work, re-analyze rather than adding more fixes

---

## Task execution

Use the `task-workflow` skill for all substantial work. Every task gets a `.tasks/<task-slug>/` folder with TASK.md (requirements) and PLAN.md (steps + progress). See the skill for the full process.
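
The folder layout can be scaffolded with a few lines of shell. This is a rough sketch, not the `task-workflow` skill's own tooling: the `init_task` name and the headings written into each file are assumptions made for illustration.

```shell
# init_task SLUG: creates .tasks/SLUG/ with TASK.md and PLAN.md.
# Hypothetical helper; the task-workflow skill defines the real process.
init_task() {
  slug="$1"
  dir=".tasks/$slug"
  mkdir -p "$dir"
  # Create each file only if it is missing, so existing notes are never clobbered.
  [ -f "$dir/TASK.md" ] || printf '# Task: %s\n\n## Requirements\n' "$slug" > "$dir/TASK.md"
  [ -f "$dir/PLAN.md" ] || printf '# Plan: %s\n\n## Steps\n\n## Progress\n' "$slug" > "$dir/PLAN.md"
  echo "$dir"
}
```

Calling `init_task fix-login` from the project root would print the path of the new folder; running it again leaves existing files untouched.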
+42
agents/README.md
# Agentic Engineering Toolkit

A minimal, portable toolkit of practices, templates, and skills for rigorous AI-assisted development. Built for personal use across any project and any CLI-based agent harness (Claude Code, pi, opencode, etc.).

The goal: have a single, complete setup that can be dropped into any project to enforce disciplined agentic engineering — TDD, adversarial review, fresh contexts, and human-in-the-loop commits — without reinventing the wheel each time.

## Workflow

1. **Init** — Set up a repo with `context-init` (project documentation) and `agent-container` (isolated dev environment)
2. **Work** — Launch the agent container with a task, follow the core loop: spec, failing tests, minimal implementation, adversarial review, commit
3. **Maintain** — Review state of work, clean up, update documentation

## Structure

- `skills/` — Portable markdown instruction sets. Each is a self-contained practice (with a `SKILL.md`) that can be loaded into any agent harness.
- `bin/` — Standalone scripts (e.g., adversarial review via a second model's API).

## The Core Loop

```
  Spec ──→ Failing Tests ──→ Minimal Implementation ──→ All Tests Pass
                 ↑                                            │
                 │                                            ↓
                 │                                   Adversarial Review
                 │                                (fresh context, hostile)
                 │                                            │
                 │        ┌─── hallucinating? ──→ DONE (converged) ───→ Commit
                 │        │
                 └── real findings ──→ Feed back to Builder (fresh session)
```

**Exit condition:** When the adversary starts inventing problems that don't exist in the code, you've converged. Ship it.
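
Part of this exit condition can be checked mechanically. Deciding whether a finding is hallucinated still needs someone to compare it against the code, but a reply of `NO FINDINGS` (the response the review prompts in this toolkit ask for when nothing is wrong) is easy to detect. A minimal sketch, assuming the review output has been captured as a string; `review_converged` is a made-up name for illustration.

```shell
# review_converged TEXT: succeeds only when the reviewer's entire reply is
# "NO FINDINGS", ignoring surrounding whitespace and blank lines.
# Hypothetical helper, not part of the toolkit.
review_converged() {
  reply=$(printf '%s' "$1" | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//')
  [ "$reply" = "NO FINDINGS" ]
}
```

It could be used as `review_converged "$(./adversarial-review.sh src/)" && echo "converged"`. Any other reply still has to be read by a human, since real findings and invented ones look the same to a script.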

## Principles

1. **Specs are truth.** Tests serve specs. Code serves tests.
2. **Tests before code.** No implementation without a failing test.
3. **Fresh context per task.** Long sessions degrade. Commit and restart.
4. **Different model for review.** Cognitive diversity catches blind spots.
5. **Fresh context per review.** No relationship drift. Every roast is the first roast.
6. **Hallucination = exit signal.** When the adversary makes things up, you're done.
7. **You commit manually.** The agent proposes. You dispose.
+1
agents/TODO.md
- Test out the adversarial review skill on this repo, in the container
---
name: adversarial-review
description: Hostile code review that finds logic errors, test gaps, security issues, and placeholder fraud. Trigger after implementation passes all tests, ideally in a fresh context with a different model.
---

# Adversarial Review

Run this in a FRESH context after implementation passes tests. Use a different model from the one that wrote the code if possible.

## Instructions

You are a hostile code reviewer with zero tolerance for slop. You are not here to be helpful or encouraging. You are here to find every flaw.

Review the following code. For every issue you find, provide:
- **SEVERITY**: CRITICAL / HIGH / MEDIUM / LOW
- **LOCATION**: exact file and line/function
- **FLAW**: what's wrong, specifically
- **FIX**: what should be done instead
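
Because every finding carries a fixed `SEVERITY` label, review output can be triaged mechanically. A rough sketch, assuming the reviewer keeps the label and level on one line (with or without the bold markers); `count_severity` is a hypothetical helper, not something this skill provides.

```shell
# count_severity LEVEL: reads review text on stdin and prints how many
# lines report that severity. Hypothetical helper for illustration.
count_severity() {
  # grep -c exits non-zero when the count is 0, so neutralize the status.
  grep -c -E "SEVERITY(\*\*)?:[[:space:]]*$1([^A-Z]|\$)" || true
}
```

For example, `./adversarial-review.sh src/ | count_severity CRITICAL` would give a quick count before reading the findings in full.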

## Categories to Check

1. **Logic errors** — does the code actually do what it claims?
2. **Test quality** — would these tests pass even if the implementation were subtly wrong? Tautological tests? Tests that mock so aggressively they don't test anything real?
3. **Error handling** — generic catches that swallow context? Missing error paths?
4. **Security** — input validation gaps? Injection vectors? Auth assumptions? Secrets in code? Unsafe deserialization? Path traversal? Command injection?
5. **Hidden coupling** — does this code depend on things not in its interface?
6. **Resource management** — missing cleanup? Unclosed handles? Unbounded allocations?
7. **Race conditions** — shared mutable state? Missing synchronization?
8. **Placeholder fraud** — TODO comments, stub implementations, "will implement later" that shipped as-is

## Rules

No preamble. No "overall this looks good." Start with the worst finding.
If you genuinely cannot find real problems, say "NO FINDINGS" and nothing else.
Do not invent problems. Do not nitpick style unless it causes bugs.

## Exit Condition

When the adversary starts inventing problems that don't exist in the code, you've converged. Ship it.
#!/usr/bin/env bash
# adversarial-review.sh — Adversarial code review via a second model using pi
#
# Usage:
#   ./adversarial-review.sh src/auth.rs src/auth_test.rs
#   ./adversarial-review.sh src/api/                    # reviews all files in directory
#   ./adversarial-review.sh --security src/crypto.rs    # security-focused review
#
# Configuration:
#   ADVERSARY_MODEL — model to use (default: google-gemini-cli/gemini-2.5-flash)
#   ADVERSARY_TOOLS — comma-separated tool list (default: read)
#
# Requires: pi (https://pi.dev)
#
# The script always uses a fresh context (--no-session).
# This is intentional — it prevents relationship drift.

set -euo pipefail

# --- Configuration ---
MODEL="${ADVERSARY_MODEL:-google-gemini-cli/gemini-2.5-flash}"
TOOLS="${ADVERSARY_TOOLS:-read}"
MODE="general"

# --- Parse flags ---
FILES=()
for arg in "$@"; do
  case "$arg" in
    --security) MODE="security" ;;
    --help|-h)
      # Print the header comment (lines 2-16), skipping the shebang.
      # -E keeps the pattern portable across GNU and BSD sed.
      sed -n '2,16p' "$0" | sed -E 's/^# ?//'
      exit 0
      ;;
    *) FILES+=("$arg") ;;
  esac
done

if [ ${#FILES[@]} -eq 0 ]; then
  echo "Usage: $0 [--security] <file-or-directory> ..." >&2
  exit 1
fi

# --- Validate targets exist ---
TARGETS=()
for target in "${FILES[@]}"; do
  if [ -e "$target" ]; then
    # Resolve to an absolute path so the reviewer can find the file
    TARGETS+=("$(cd "$(dirname "$target")" && pwd)/$(basename "$target")")
  else
    echo "Warning: $target not found, skipping" >&2
  fi
done

if [ ${#TARGETS[@]} -eq 0 ]; then
  echo "No valid files or directories. Check your paths." >&2
  exit 1
fi

# --- Select system prompt ---
if [ "$MODE" = "security" ]; then
SYSTEM_PROMPT='You are a security auditor. Assume all inputs are hostile. Assume the network is hostile.

Read the provided files and check:
1. Input validation: injection vectors (SQL, command, path, template, header)
2. Auth: paths that skip auth, privilege escalation, token validation
3. Secrets: hardcoded credentials, secrets in logs or error messages
4. Dependencies: known CVEs, unmaintained packages
5. Crypto: weak algorithms, hardcoded IVs, missing MACs
6. Data exposure: sensitive info in errors, logs, API responses
7. Resource exhaustion: unbounded allocation, CPU, disk
8. Deserialization: untrusted data deserialized unsafely

For each finding provide:
- SEVERITY: CRITICAL / HIGH / MEDIUM / LOW
- CWE: number if applicable
- LOCATION: file and function
- ATTACK: how to exploit it
- FIX: specific remediation

No hedging. Concrete findings only. If no real issues, say NO FINDINGS.'
else
SYSTEM_PROMPT='You are a hostile code reviewer with zero tolerance. No preamble. No encouragement.

For every issue provide:
- SEVERITY: CRITICAL / HIGH / MEDIUM / LOW
- LOCATION: file and line/function
- FLAW: what is wrong
- FIX: what to do instead

Check: logic errors, test quality (would tests pass with a subtly wrong implementation?),
error handling, security, hidden coupling, resource management, race conditions,
placeholder fraud (TODOs shipped as implementation).

Start with the worst finding. Do not invent problems. Do not nitpick style unless it causes bugs.
If genuinely no issues, say NO FINDINGS.'
fi

# --- Build the user prompt ---
FILE_LIST=$(printf '%s\n' "${TARGETS[@]}")
USER_PROMPT="Review the following files:

${FILE_LIST}"

# --- Run review ---
echo "Adversarial review via pi (model: $MODEL, tools: $TOOLS)..." >&2

pi -p \
  --no-session \
  --model "$MODEL" \
  --tools "$TOOLS" \
  --system-prompt "$SYSTEM_PROMPT" \
  "$USER_PROMPT"
+217
agents/skills/agent-container/SKILL.md
---
name: agent-container
description: Set up a sandboxed Docker container for running an AI coding agent (pi) against the current project. Trigger when starting a new project that needs an isolated agent environment, or when adding agent infrastructure to an existing project.
---

# Agent Container

Set up a Docker container for running an AI coding agent (pi) against the current project. The container provides a sandboxed environment where the agent can operate freely in yolo mode — auto-approving all actions — without risk to the host system.

## What This Produces

All files go in a `.agent/` directory at the project root:

- `.agent/Dockerfile` — a container image with pi, project dependencies, and tooling
- `.agent/docker-compose.yml` — compose file that volume-mounts the project and runs the agent
- `.agent/secrets` — key=value mapping of env vars to 1Password `op://` references (committed, no actual secrets)
- `.agent/run.sh` — one-command script to resolve secrets, build (if needed), and launch the agent
- `.dockerignore` additions (if needed)

## Process

### Step 1: Analyze the Project

Before writing anything, examine the project to determine:

- **Language(s) and runtimes** — what needs to be installed (e.g., Node, Python, Go, Rust)
- **Package manager** — npm, pnpm, yarn, pip, cargo, etc.
- **System dependencies** — anything the project needs beyond the language runtime (e.g., database clients, native libraries)
- **Test runner** — how tests are run, since the agent will need to run them
- **Linter/formatter** — same reasoning

Look at: `package.json`, `pyproject.toml`, `Cargo.toml`, `go.mod`, `Makefile`, `Dockerfile` (existing), CI config, and the project's development documentation.
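
Much of this analysis starts with checking which manifest files exist. A small sketch of that first pass, using the manifest names listed above; the `detect_stacks` function and its output labels are illustrative assumptions, and a real analysis would still need to read the files, the CI config, and the docs.

```shell
# detect_stacks DIR: prints one label per manifest found in DIR.
# Hypothetical helper; the labels are arbitrary names chosen for this sketch.
detect_stacks() {
  dir="${1:-.}"
  [ -f "$dir/package.json" ]   && echo "node"
  [ -f "$dir/pyproject.toml" ] && echo "python"
  [ -f "$dir/Cargo.toml" ]     && echo "rust"
  [ -f "$dir/go.mod" ]         && echo "go"
  # A missing manifest is not an error, so always report success.
  return 0
}
```

More than one label in the output points to the multi-language case, where a single language-specific base image may not be enough.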

### Step 2: Choose a Base Image

Pick the simplest base image that covers the project's needs:

- Node project → `node:<version>-slim`
- Python project → `python:<version>-slim`
- Multi-language → `ubuntu:latest` or `debian:bookworm-slim` with manual installs
- If the project already has a Dockerfile, use the same base image or a compatible one

Always prefer `-slim` variants to keep the image small.

### Step 3: Write the Dockerfile

Create the `.agent/` directory and write `.agent/Dockerfile`:

```dockerfile
FROM <base-image>

# System dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    curl \
    <project-specific-system-deps> \
    && rm -rf /var/lib/apt/lists/*

# Install Node (if not already in base image) — required for pi
# Use the NodeSource setup or official node image as base
RUN npm install -g @mariozechner/pi-coding-agent

# Project dependencies (adapt the COPY and RUN lines for the project's package manager).
# Dockerfile comments must sit on their own line: a trailing "# ..." after COPY
# would be parsed as extra arguments.
WORKDIR /workspace
COPY package.json package-lock.json ./
RUN npm install

# The rest of the project is volume-mounted at runtime, not copied
```

Key principles:
- **pi is always installed globally via npm** — this is non-negotiable
- **Node must be available** — even if the project isn't a Node project, Node is needed for pi
- **Project deps are installed in the image** for faster startup, but source code is volume-mounted
- **Git must be installed** — the agent uses it heavily
- **Keep layers cacheable** — deps before source code
- **Build context is the project root** (set in compose), so `COPY` paths are relative to the project root, not `.agent/`

### Step 4: Write the Compose File

Write `.agent/docker-compose.yml`:

```yaml
services:
  agent:
    build:
      context: ..
      dockerfile: .agent/Dockerfile
    volumes:
      - ..:/workspace
      - node_modules:/workspace/node_modules  # (adapt: preserve installed deps)
      - ${PI_AGENT_DIR}/skills:/root/.pi/agent/skills
      - ${PI_AGENT_DIR}/auth.json:/root/.pi/agent/auth.json
      - ${PI_AGENT_DIR}/AGENTS.md:/root/.pi/agent/AGENTS.md
      - ${PI_AGENT_DIR}/bin:/root/.pi/agent/bin
    working_dir: /workspace
    stdin_open: true
    tty: true
    command: pi

volumes:
  node_modules:  # (adapt for the project's dependency cache)
```

Key principles:
- **Build context is `..` (the project root)** so `COPY` in the Dockerfile can access project files
- **Volume-mount the project** so changes persist on the host
- **Use a named volume for dependency directories** (node_modules, venv, target, etc.) to avoid overwriting the installed deps from the image build
- **Secrets are passed at runtime by the run script** — never bake them into the image or compose file
- **stdin_open and tty are required** — pi needs an interactive terminal
- **Mount specific pi agent config** so the container picks up skills, auth, AGENTS.md, and bin — but not sessions or other host-only state
- **Default command is `pi`** so the run script drops straight into the agent

### Step 5: Mount Pi Agent Config

The agent inside the container needs access to specific parts of the host's pi agent configuration. Mount only what the agent needs — not the entire `~/.pi/agent/` directory (which contains sessions and other host-only state).

**Mounting:** Add volume mounts in `docker-compose.yml` for each item:

```yaml
volumes:
  - ..:/workspace
  - node_modules:/workspace/node_modules
  - ${PI_AGENT_DIR}/skills:/root/.pi/agent/skills
  - ${PI_AGENT_DIR}/auth.json:/root/.pi/agent/auth.json
  - ${PI_AGENT_DIR}/AGENTS.md:/root/.pi/agent/AGENTS.md
  - ${PI_AGENT_DIR}/bin:/root/.pi/agent/bin
```

**In the run script**, export `PI_AGENT_DIR="${HOME}/.pi/agent"` before invoking compose.

### Step 6: Write the Secrets File

Create `.agent/secrets` — a plain key=value file mapping environment variable names to 1Password secret references:

```
ANTHROPIC_API_KEY=op://Personal/Anthropic API/credential
GITHUB_TOKEN=$GITHUB_TOKEN
```

Two formats are supported:
- `KEY=op://vault/item/field` — fetched from 1Password CLI at runtime
- `KEY=$ENV_VAR` — read from a host environment variable

This file is committed to the repo. It contains no secrets — only pointers to where secrets live. Ask the user which secrets the agent needs and what their references are.

Lines starting with `#` are comments. Blank lines are ignored.
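
The two reference formats can be told apart by prefix alone. A minimal sketch of that dispatch, written separately from the reference `run.sh`; `ref_kind` and `list_refs` are hypothetical names and the printed labels are arbitrary.

```shell
# ref_kind REF: prints which resolver a secrets-file reference needs.
# Hypothetical helper for illustration.
ref_kind() {
  case "$1" in
    op://*) echo "1password" ;;
    \$*)    echo "env" ;;
    *)      echo "unsupported" ;;
  esac
}

# list_refs FILE: prints "KEY kind" for each entry, skipping comments and
# blank lines. The "|| [ -n ... ]" keeps a last line with no trailing newline.
list_refs() {
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in ''|\#*) continue ;; esac
    key=${line%%=*}   # text before the first "="
    ref=${line#*=}    # text after the first "="
    echo "$key $(ref_kind "$ref")"
  done < "$1"
}
```

Running `list_refs .agent/secrets` before launching gives a quick check that every entry uses one of the two supported formats.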

### Step 7: Write the Run Script

Copy the reference implementation from this skill into the project:

    cp <path-to-this-skill>/run.sh .agent/run.sh
    chmod +x .agent/run.sh

The reference `run.sh` (located alongside this SKILL.md) handles skills resolution and secret injection. Adapt it if the project needs additional setup before launching the agent.

Key principles:
- **1Password CLI (`op`) is the primary secret source** — secrets are fetched at runtime via `op read`
- **Falls back to host environment variables** — if `op` isn't installed or the read fails, the script checks for a matching env var on the host
- **Warns on missing secrets** — so you know immediately if something isn't configured
- **Secrets are passed as `-e` flags** to `docker compose run`, so the compose file doesn't need to know about specific secrets
- **Uses `SCRIPT_DIR`** so it works from any working directory
- **`--build` flag** rebuilds the image if the Dockerfile changed, otherwise uses cache (fast)
- **`--rm` flag** cleans up the container after exit
- **Passes `"$@"`** so the user can override the command (e.g., `.agent/run.sh bash` to get a shell)

### Step 8: Update .dockerignore

If `.dockerignore` doesn't already cover these, add:

```
.env
.env.*
node_modules/
```
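
Appending these entries blindly would duplicate them on repeated runs. One way to add only what is missing, sketched with hypothetical `ensure_line` and `ensure_ignored` helpers:

```shell
# ensure_line FILE ENTRY: appends ENTRY to FILE unless an identical line exists.
# Hypothetical helper for illustration.
ensure_line() {
  file="$1"; entry="$2"
  touch "$file"
  # -x matches whole lines; -F treats the entry as a literal string, not a regex.
  if grep -qxF -- "$entry" "$file"; then return 0; fi
  # Terminate an unterminated last line so the new entry starts on its own line.
  if [ -n "$(tail -c 1 "$file")" ]; then echo >> "$file"; fi
  printf '%s\n' "$entry" >> "$file"
}

# ensure_ignored [FILE]: applies the three entries from this step.
ensure_ignored() {
  for entry in .env '.env.*' node_modules/; do
    ensure_line "${1:-.dockerignore}" "$entry"
  done
}
```

The check is a literal whole-line match, so a broader existing pattern that already covers an entry would not be recognized; reviewing the file by hand is still worthwhile.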

### Step 9: Document Usage

Add a section to the project's README or development docs explaining:

```
## Running the AI Agent

Secrets are resolved from `.agent/secrets` via 1Password CLI (`op`).
Make sure you're signed in (`op signin`) before running.

Launch the containerized agent:

    .agent/run.sh

If you don't use 1Password, export the required env vars instead:

    export ANTHROPIC_API_KEY=your-key-here
    .agent/run.sh

The agent (pi) starts in yolo mode inside a sandboxed container with full
access to the project via volume mount. Changes the agent makes are
reflected on your host filesystem.

To get a shell in the container instead:

    .agent/run.sh bash

To rebuild the image manually:

    docker compose -f .agent/docker-compose.yml build
```

## Rules

- Always analyze the project first. Do not write a generic Dockerfile — tailor it to what the project actually needs.
- Never bake secrets or API keys into the image.
- Prefer slim base images.
- All agent container files live in `.agent/` to keep the project root clean and avoid conflicts with existing Docker configuration.
- If the project already has Docker configuration, study it and stay consistent with its patterns.
- Ask the user before writing any files. Show drafts and get confirmation.
+39
agents/skills/agent-container/run.sh
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SECRETS_FILE="$SCRIPT_DIR/secrets"

# Resolve pi agent config directory
export PI_AGENT_DIR="${HOME}/.pi/agent"

# Resolve secrets from .agent/secrets file
# Supported formats:
#   KEY=op://vault/item/field — fetch from 1Password CLI, falling back to a
#                               host environment variable named KEY
#   KEY=$ENV_VAR              — read from host environment variable
ENV_ARGS=()
if [[ -f "$SECRETS_FILE" ]]; then
  # `|| [[ -n "$key" ]]` keeps a final line that has no trailing newline
  while IFS='=' read -r key ref || [[ -n "$key" ]]; do
    [[ -z "$key" || "$key" == \#* ]] && continue

    value=""
    if [[ "$ref" == op://* ]]; then
      # Call op only once; any failure leaves value empty
      if command -v op &>/dev/null; then
        value="$(op read "$ref" 2>/dev/null)" || value=""
      fi
      # Fallback described in SKILL.md: a host env var with the same name
      if [[ -z "$value" ]]; then
        value="${!key:-}"
      fi
    elif [[ "$ref" == \$* ]]; then
      varname="${ref#\$}"
      value="${!varname:-}"
    fi

    if [[ -z "$value" ]]; then
      echo "Warning: could not resolve $key (is op installed and signed in, or the env var exported?)" >&2
    else
      ENV_ARGS+=(-e "$key=$value")
    fi
  done <"$SECRETS_FILE"
fi

# ${ENV_ARGS[@]+...} avoids an "unbound variable" error for an empty array in bash < 4.4
docker compose -f "$SCRIPT_DIR/docker-compose.yml" run --rm --build ${ENV_ARGS[@]+"${ENV_ARGS[@]}"} agent "$@"
+94
agents/skills/codebase-rescue/SKILL.md
---
name: codebase-rescue
description: Stabilize a messy, mid-goal codebase through structured audit, characterization tests, and boundary fixes. Trigger when a project is partially working with drifting architecture or broken integration points.
---

# Codebase Rescue

Use this when a codebase is mid-goal and becoming messy. This skill walks through a structured process to stabilize before resuming development.

## When to Use

You have a codebase that's mid-goal. Some parts work, some don't. The architecture may be drifting. Integration points are broken. The instinct is to keep pushing forward or to rewrite — both are usually wrong. The right move is: stop, map, stabilize, resume with discipline.

---

## Phase 0: Stop the Bleeding

Before anything else:

1. Commit everything as-is on a branch and tag it — this is the "before" snapshot.
2. Write down (in plain English) what the system is supposed to do. Not how — what. 3-5 sentences. This is the north star.
3. Write down what's currently broken, from memory. Don't investigate yet. Just dump what you know.

## Phase 1: Audit

Analyze the codebase structure. For each module/component, determine:
- What it's supposed to do (infer from code, comments, any specs)
- What it actually does right now (working, partial, broken, stub)
- What tests exist and whether they pass
- What its dependencies are

Identify every integration point between components. For each boundary:
- Is the contract (types, expected behavior) defined clearly?
- Do both sides agree on the contract?
- Is there a test that verifies this boundary works?

Produce a prioritized issue list:
- **CRITICAL**: broken and blocks other work
- **HIGH**: will break under real usage
- **MEDIUM**: works but fragile or wrong
- **LOW**: tech debt that can wait

Output this as a TODO.md. Do NOT fix anything yet.

## Phase 2: Stabilize What Works

Before fixing broken things, lock down what currently works. For each working component:

1. Read the code
2. Write tests that capture its current behavior (happy path and error handling)
3. These are characterization tests — they document what the code does NOW
4. Run the tests. If any fail, the code doesn't work the way you think — flag that

This gives you a safety net for when you start fixing broken integration points.
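
For command-line components, a characterization test can be as simple as recording today's output and failing when it changes. A rough sketch under that assumption; the `characterize` helper and the `golden/` directory are hypothetical, and most projects would use their own test framework's snapshot support instead.

```shell
# characterize NAME CMD...: the first run records the command's current output
# and exit status as the baseline; later runs fail if either has changed.
# Hypothetical helper; "golden/" is an arbitrary directory name.
characterize() {
  name="$1"; shift
  mkdir -p golden
  status=0
  actual=$("$@" 2>&1) || status=$?
  snapshot=$(printf '%s\nexit status: %s' "$actual" "$status")
  if [ ! -f "golden/$name.txt" ]; then
    printf '%s\n' "$snapshot" > "golden/$name.txt"
    echo "recorded baseline for $name"
    return 0
  fi
  if [ "$snapshot" != "$(cat "golden/$name.txt")" ]; then
    echo "behavior changed: $name"
    return 1
  fi
}
```

Capturing the exit status alongside the output means error paths are characterized too, not just the happy path. The recorded baseline asserts what the code does now, which may differ from what it should do.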

## Phase 3: Fix Integration Boundaries

Work through broken boundaries from TODO.md one at a time:

- One boundary per session. Fresh context each time.
- Write the integration test FIRST (TDD)
- Fix the minimum code to make it pass
- Run the FULL test suite after each fix — if something regressed, fix it before moving on
- Commit after each successful fix. Small, atomic commits.

If fixing a boundary reveals a spec problem, STOP. Fix the spec first.

## Phase 4: Adversarial Review

After fixing boundaries, run adversarial review on the changed code. Feed findings back to a fresh session for fixes. Repeat until convergence (adversary hallucinates or returns NO FINDINGS).

## Phase 5: Spec Reconciliation

Compare current codebase against specs. For each requirement, determine:
- **IMPLEMENTED**: code exists and tests verify it
- **PARTIAL**: code exists but incomplete or untested
- **MISSING**: not implemented yet
- **DIVERGED**: code does something different from spec

For DIVERGED items: is the code right or the spec? Flag for human decision — do not change either silently. Never leave specs and code in disagreement.

## Phase 6: Resume Normal Development

You now have specs that match reality, tests on everything that works, fixed integration boundaries, and a TODO.md with remaining work. Resume with TDD.

---

## Common Pitfalls

- **"Let me just rewrite this module."** No. Fix the contract, test it, move on. Rewrite later when stable.
- **"I should refactor first."** No. Characterize with tests first. Then refactor with tests as safety net.
- **"I'll fix all the boundaries at once."** No. One at a time. Commit between each.
- **"The agent says this needs a major architectural change."** Maybe. But not during rescue. Stabilize first.
- **"The specs are so out of date it's not worth reconciling."** Then you don't have specs — you have aspirational fiction. Phase 5 fixes that.
+159
agents/skills/context-init/SKILL.md
---
name: context-init
description: Initialize project documentation (architecture, constraints, security, development practices) through a guided conversation. Trigger when setting up a new project or when a project lacks the documentation agents need to work effectively.
---

# Context Init

Initialize project documentation through a guided conversation. Creates the documentation that agents need to understand and work effectively in a project.

## Categories

These are the important categories of information to capture about a project. Not every project needs all of them, and the depth should match the project's complexity.

- **Overview** — what the system does, who it's for, why it exists, how work is tracked
- **Architecture** — components, data flow, key decisions and their rationale
- **Constraints** — invariants, boundaries, non-negotiables
- **Security** — security rules, secret hygiene, dependency vetting, restricted commands
- **Development** — TDD discipline, testing, verification, code conventions

## Process

### Step 1: Understand the Project

Before writing anything, understand what already exists and what's needed. Ask:

- What does this system do? Who is it for? Why does it exist?
- What's the tech stack?
- How complex is the project? (rough sense of scale — solo script vs. multi-service system)
- Is there existing documentation? Where does it live?
- How is work tracked?

Use the answers to judge which categories need dedicated files vs. which can be folded into the README or hub file. A small project might only need a README and a hub file. A larger one might warrant separate files for each category.

### Step 2: Gather Information

Work through each relevant category one at a time. For each, ask targeted questions, draft the content, and confirm before writing. Skip categories the user has no answers for — sparse and accurate beats thorough and speculative.

**Architecture** — ask about:
- Major components or modules, what each owns, interfaces, dependencies
- How data flows between them (walk through a typical request/operation)
- Key architectural decisions worth documenting with rationale

**Constraints** — ask about:
- What must always be true (invariants)
- What's explicitly out of scope (boundaries)
- What decisions are final (non-negotiables)

**Security** — start with the defaults below, then ask about project-specific additions:
- Any requirements beyond the defaults?
- Additional restricted commands or patterns?
- Compliance or regulatory concerns?
- Secret patterns to watch for beyond the standard set?

**Development** — start with the defaults below, then ask about:
- Test framework and how to run tests
- Coverage expectations
- Error handling patterns
- Naming conventions (or "match surrounding code")
- Linter/formatter and how to run them
- Any other practices to enforce

### Step 3: Write Documentation

Write the documentation in whatever structure fits the project. Prefer separate files for substantial categories, but use judgment. Always create a hub file (`CLAUDE.md` or `AGENTS.md`, ask the user) that lists all documentation files and what information each contains. Do not use `@` file references — instead, list each file path with a brief description of its contents so agents know where to look for specific information.

Example hub file structure:

```
# Project Name

## Documentation

- `README.md` — project overview, purpose, and getting started
- `ARCHITECTURE.md` — component breakdown, data flow, key decisions
- `CONSTRAINTS.md` — invariants, boundaries, non-negotiables
- `SECURITY.md` — security rules, secret hygiene, restricted commands
- `DEVELOPMENT.md` — TDD practices, testing, code conventions

## Task Tracking
...
```

Update the README with overview information rather than creating a separate overview file.
8484+8585+## Defaults
8686+8787+These defaults should be included in every project's documentation regardless of structure. They can be trimmed if the user says they're not relevant.
8888+8989+### Security Defaults
9090+9191+**Hard Rules:**
9292+- NEVER write secrets, API keys, tokens, or credentials into any file
9393+- NEVER commit .env files — check .gitignore before every commit
9494+- All user input must be validated and sanitized before use
9595+- All external API calls must use TLS
9696+- No shell commands that pipe untrusted input (no eval, no unquoted variables)
9797+- Dependencies: prefer well-maintained packages with recent releases — flag any dependency with <100 stars or no updates in 6+ months
9898+9999+**Restricted Commands — do NOT run:**
100100+- `rm -rf` on any path outside the project directory
101101+- Any command that modifies system configuration
102102+- Any command that installs system-level packages without asking first
103103+- Any command that accesses network resources not defined in this project
104104+- `curl | sh` or equivalent pipe-to-shell patterns
105105+106106+**Before Every Commit:**
107107+- Run the full test suite
108108+- Run the linter/formatter
109109+- Check `git diff --staged` for any secrets or credentials
110110+- Check that no new dependencies were added without justification
111111+112112+**Dependency Vetting** — when adding a new dependency:
113113+1. State WHY it's needed (what does it do that we can't do in <50 lines?)
114114+2. Show: name, version, weekly downloads, last publish date, maintainer count
115115+3. Run the relevant audit tool (`cargo audit` / `npm audit` / `pip-audit`)
116116+4. If <100 stars or no updates in 6 months, FLAG IT and wait for human approval
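The flagging rule in step 4 can be expressed as a small check. This is a minimal sketch, not part of the toolkit: `needs_human_approval` is a hypothetical name, and it assumes the star count and the months since the last release have already been looked up by hand or from the package registry and are passed in as whole numbers.

```shell
# needs_human_approval STARS MONTHS_SINCE_UPDATE
# Hypothetical helper: returns 0 (flag it and wait for approval) when either
# threshold from the vetting list is hit, and 1 otherwise.
needs_human_approval() {
  stars=$1
  months_since_update=$2
  # Fewer than 100 stars, or no updates in 6 months or more.
  [ "$stars" -lt 100 ] || [ "$months_since_update" -ge 6 ]
}
```

For example, `needs_human_approval 42 1 && echo "FLAG: wait for human approval"` prints the flag because the star count is below the threshold.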
117117+118118+**Secret Hygiene:**
119119+- NEVER let agents create .env files — create them yourself
120120+- Use .env.example with placeholder values that agents CAN see
121121+- Ensure .gitignore covers: `.env`, `.env.*`, `*.pem`, `*.key`, `*.p12`, `*.pfx`, `secrets/`, `credentials/`
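One way to verify the `.gitignore` coverage listed above is a literal line check. This is a sketch under stated assumptions: `check_gitignore` is a hypothetical helper, and it only matches patterns written exactly as shown, so an equivalent entry such as `/secrets/` would be reported as missing.

```shell
# check_gitignore FILE
# Hypothetical helper: prints each required pattern that FILE lacks and
# returns 1 if anything is missing, 0 if every pattern is present.
check_gitignore() {
  file=$1
  missing=0
  # Quoted literals, so the shell does not glob-expand entries like *.pem.
  for pattern in '.env' '.env.*' '*.pem' '*.key' '*.p12' '*.pfx' 'secrets/' 'credentials/'; do
    # -x: match the whole line, -F: treat the pattern as a fixed string.
    if ! grep -qxF -- "$pattern" "$file"; then
      echo "missing: $pattern"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it as `check_gitignore .gitignore` from the project root. A sturdier check would ask git directly with `git check-ignore`, which also honors nested and global ignore files.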
122122+123123+### Development Defaults
124124+125125+**TDD Discipline — all implementation MUST follow strict TDD:**
126126+1. Write failing tests FIRST based on the spec for the current task
127127+2. Confirm tests fail before writing any implementation
128128+3. Write the MINIMUM code to pass each test
129129+4. Run the full test suite after each change
130130+5. Refactor only after all tests pass
131131+132132+Do NOT write implementation and tests simultaneously.
133133+Do NOT write tests that match implementation — tests must match the SPEC.
134134+135135+**Session Hygiene:**
136136+- Start each session by reading the relevant spec and existing tests
137137+- One task per session — if scope creeps, stop and decompose into new tasks
138138+- Before ending a session, summarize what was done and what's left
139139+- When context feels degraded, commit and start fresh
140140+141141+**Verification:**
142142+- Run the full test suite before and after every change
143143+- Run the linter/formatter before committing
144144+- After each feature, run adversarial review on the changed files
145145+- When fixing a bug, write a failing test that reproduces it before fixing
146146+147147+**Testing Requirements:**
148148+- Unit tests for all business logic
149149+- Integration tests for all component boundaries
150150+- Test error paths, not just happy paths
151151+- Test edge cases from the spec's edge case catalog (if present)
152152+153153+## Rules
154154+155155+- One category at a time. Don't overwhelm with all questions at once.
156156+- Draft content and show it before writing. Get explicit confirmation.
157157+- Use the user's words — don't over-formalize their answers.
158158+- Skip what they don't know or don't care about.
159159+- Match documentation depth to project complexity.
-60
agents/skills/dev-philosophy/SKILL.md
···11----
22-name: software-dev
33-description: Use when the agent is performing any direct software development implementation or planning tasks. Provides the principles and values that must be followed.
44----
55-# Software Development Philosophy
66-77-This skill defines how I approach software development. Apply these principles whenever working on code.
88-99-## Core Principle: Ruthless Simplicity
1010-1111-- Write the minimum code necessary to solve the problem
1212-- Every line must justify its existence
1313-- Minimize maintenance burden — less code means less to understand, review, debug
1414-- Do not over-engineer unless there is a clear, proven need
1515-- Avoid abstractions until they're obviously necessary
1616-1717-## Approach to New Work
1818-1919-1. **Clarify the problem first** — understand exactly what needs to be solved before writing any code
2020-2. **Design before implementing** — think through data structures, functions, and interfaces at a high level
2121-3. **Iterate on the design** — refine until the structure is solid and any unnecessary complexity is removed
2222-4. **Then implement** — only after the above steps are complete
2323-2424-## Testing
2525-2626-- Always prefer automated tests that verify behavior deterministically
2727-- Write code that is testable
2828-- Prefer tests that run against real conditions (real DB like sqlite, end-to-end) over heavy mocking
2929-- Keep tests themselves simple — they are code too and carry maintenance burden
3030-- Tests are part of "done"
3131-3232-## Documentation
3333-3434-- Avoid documentation that duplicates what exists in code
3535-- Point to the source of truth instead of recreating it
3636-- Documentation has maintenance burden — minimize it
3737-- Examples: Don't list routes in docs, link to route definitions; don't copy API schemas, reference the source
3838-3939-## Technology Preferences
4040-4141-- Prefer lightweight tools (sqlite over heavy databases)
4242-- Prefer simple HTML over heavy UI frameworks
4343-- Minimize client-side complexity
4444-- CLIs and JSON APIs are preferred interfaces
4545-- Avoid unnecessary dependencies
4646-4747-## Boundaries
4848-4949-- **Stay focused**: Only implement what is directly requested — no "helpful" extras or unrequested improvements
5050-- **No commits without permission**: Do not commit unless explicitly asked. A reminder that changes are ready to commit is acceptable.
5151-- **No pushing**: Never push to remote without explicit instruction
5252-- **Lint and format**: Ensure code passes linting and formatting before considering work complete
5353-5454-## What "Done" Looks Like
5555-5656-- The problem is solved with minimal code
5757-- Automated tests verify the behavior
5858-- Code is linted and formatted
5959-- Diff has been reviewed (by the user, not the agent)
6060-- Ready to commit, but waiting for explicit go-ahead
+31
agents/skills/integration-boundary-fix/SKILL.md
···11+---
22+name: integration-boundary-fix
33+description: Fix a broken integration between two components using TDD — write the integration test first, identify the contract mismatch, fix minimally. Trigger when two components fail to communicate correctly at their boundary.
44+---
55+66+# Integration Boundary Fix
77+88+Use this to fix a broken integration between two components. Run in a fresh session per boundary.
99+1010+## Instructions
1111+1212+We're fixing the integration between [COMPONENT_A] and [COMPONENT_B].
1313+1414+The spec says: [WHAT SHOULD HAPPEN]
1515+Currently: [WHAT ACTUALLY HAPPENS OR DOESN'T]
1616+1717+### Step 1: Write the Integration Test
1818+1919+Write an integration test that exercises this boundary according to the spec. The test should call component A's interface and verify that component B receives/produces the correct result. This test MUST FAIL right now.
2020+2121+### Step 2: Identify the Mismatch
2222+2323+What does A send vs what B expects? Show the specific types, formats, or protocols that disagree.
2424+2525+### Step 3: Fix the Minimum
2626+2727+Fix the MINIMUM code to make the integration test pass. Prefer changing the implementation to match the spec. If the spec is wrong, STOP and flag it — do not change the spec silently.
2828+2929+### Step 4: Verify
3030+3131+Run the full test suite. If anything else broke, fix those regressions before moving on.
+54
agents/skills/security-guardrails/SKILL.md
···11+---
22+name: security-guardrails
33+description: Apply security guardrails to a project using agentic development — agent permissions, secret hygiene, dependency vetting, and blast radius limiting. Trigger when setting up a new project for agent-assisted development or when hardening an existing project's agent workflow.
44+---
55+66+# Security Guardrails Setup
77+88+Use this to apply security guardrails to a project that uses agentic development.
99+1010+## Threat Model
1111+1212+When an AI agent has shell access and can read/write your filesystem, you're defending against:
1313+1414+1. **Agent mistakes** — insecure code because the model optimizes for "works" over "works safely"
1515+2. **Prompt injection via codebase** — malicious content in deps, docs, or data that the agent reads and acts on
1616+3. **Secret leakage** — the agent reads .env files, logs secrets, or commits them
1717+4. **Blast radius** — a bad agent action affects more than it should
1818+5. **Supply chain** — the agent adds unvetted dependencies
1919+2020+## Layer 1: Restrict Agent Permissions
2121+2222+Configure your agent tool to allow the dev loop (read, write, test, lint) and deny anything with blast radius beyond the project. Key principles:
2323+2424+- **Deny git push and commit** — you review diffs and commit manually
2525+- **Deny rm -rf** — allow deleting specific files, not nuking directories
2626+- **Deny pipe-to-shell** — no `curl | sh`, no `wget | sh`
2727+- **Deny sudo** — the agent never needs system-level access
2828+- **Allow test/build/lint** — these are the feedback loops the agent needs
2929+3030+Adapt the allow list to your language/toolchain.
3131+3232+## Layer 2: Secret Hygiene
3333+3434+- Ensure .gitignore covers: `.env`, `.env.*`, `*.pem`, `*.key`, `*.p12`, `*.pfx`, `secrets/`, `credentials/`
3535+- NEVER let the agent create .env files — create them yourself
3636+- Use .env.example with placeholders that the agent CAN see
3737+- Install a pre-commit hook for secret detection (scan staged files for patterns like `PRIVATE.KEY`, `BEGIN RSA`, `password\s*=`, `api_key\s*=`, `AWS_SECRET`, `ghp_`, `sk-`, etc.)
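The secret-detection hook mentioned above could start from something like the following. It is a rough sketch, not a vetted scanner: `scan_for_secrets` is a hypothetical name, the pattern list simply mirrors the examples in the bullet (with `\s` written as the POSIX class `[[:space:]]`), and a dedicated secret-scanning tool will catch far more.

```shell
# Patterns copied from the examples above; extend them per project.
secret_pattern='PRIVATE.KEY|BEGIN RSA|password[[:space:]]*=|api_key[[:space:]]*=|AWS_SECRET|ghp_|sk-'

# scan_for_secrets: hypothetical helper that reads text on stdin.
# Returns 0 if a line looks like it contains a secret, 1 otherwise.
scan_for_secrets() {
  grep -E -q -- "$secret_pattern"
}

# Inside .git/hooks/pre-commit this would run against the staged diff:
#   if git diff --staged | scan_for_secrets; then
#     echo "Possible secret in staged changes; aborting commit" >&2
#     exit 1
#   fi
```

Short patterns such as `sk-` are deliberately broad and will also match ordinary text like `task-workflow`, so expect false positives and tighten the expressions before relying on the hook.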
3838+3939+## Layer 3: Dependency Vetting
4040+4141+When adding a new dependency:
4242+1. State WHY it's needed
4343+2. Show: name, version, weekly downloads, last publish date, maintainer count
4444+3. Run `cargo audit` / `npm audit` / `pip-audit`
4545+4. If <100 stars or no updates in 6 months, FLAG and wait for human approval
4646+4747+## Layer 4: Blast Radius Limiting
4848+4949+- Work on feature branches, never main/master directly
5050+- The agent cannot push — you review and push
5151+- Use `git add -p` to stage changes selectively
5252+- Squash-merge feature branches for atomic reverts
5353+- The agent should NEVER run database migrations without explicit approval
5454+- Destructive migrations (DROP, DELETE, ALTER removing columns) require human review of the SQL before execution
+44
agents/skills/security-review/SKILL.md
···11+---
22+name: security-review
33+description: Security audit checking input validation, auth, secrets, dependencies, crypto, data exposure, and resource exhaustion. Trigger after implementing auth/authz, input handling, crypto, dependency changes, or before deployment.
44+---
55+66+# Security Review
77+88+Run this in a fresh context after the general adversarial review, or as a standalone pass. Use a different model if possible.
99+1010+## Instructions
1111+1212+You are a security auditor reviewing code for deployment.
1313+Assume all inputs are hostile. Assume the network is hostile.
1414+Assume dependencies are compromised until proven otherwise.
1515+1616+For the following code, check:
1717+1818+1. **Input validation**: Is ALL external input validated? Are there injection vectors (SQL, command, path, template, header)?
1919+2. **Authentication/Authorization**: Are there paths that skip auth checks? Are tokens validated properly? Are there privilege escalation paths?
2020+3. **Secrets management**: Are credentials, keys, or tokens hardcoded or logged? Are they exposed in error messages?
2121+4. **Dependency risk**: Are there dependencies with known CVEs? Unmaintained packages? Suspicious transitive deps?
2222+5. **Cryptography**: Is crypto used correctly? Weak algorithms, hardcoded IVs, missing MACs, improper key derivation?
2323+6. **Data exposure**: Could error messages, logs, or API responses leak sensitive information?
2424+7. **Resource exhaustion**: Can an attacker cause unbounded memory allocation, CPU usage, or disk writes?
2525+8. **Deserialization**: Is untrusted data deserialized? Are there gadget chains?
2626+2727+## Output Format
2828+2929+For each finding:
3030+- **SEVERITY**: CRITICAL / HIGH / MEDIUM / LOW
3131+- **CWE**: [CWE number if applicable]
3232+- **LOCATION**: exact file and function
3333+- **ATTACK**: how an attacker would exploit this
3434+- **FIX**: specific remediation
3535+3636+No hedging. No "you might consider." Concrete findings only.
3737+3838+## When to Run
3939+4040+1. After implementing any auth/authz logic — every time, no exceptions
4141+2. After implementing any input handling — forms, API endpoints, file parsing
4242+3. After implementing any crypto — even if it's "just calling a library"
4343+4. After adding or updating dependencies
4444+5. Before any deployment — full security pass on the diff since last deploy
+30
agents/skills/spec-gap-analysis/SKILL.md
···11+---
22+name: spec-gap-analysis
33+description: Analyze spec/design documents for ambiguities, missing edge cases, implicit assumptions, contradictions, and undefined interfaces. Trigger before implementing anything, when specs exist but haven't been vetted.
44+---
55+66+# Spec Gap Analysis
77+88+Run this against spec/design documents before implementing anything.
99+1010+## Instructions
1111+1212+Read all spec/design documents in [SPEC_PATH].
1313+1414+Produce a gap analysis. For each component or feature, identify:
1515+1616+1. **Ambiguous requirements** — language that could be interpreted multiple ways
1717+2. **Missing edge cases** — boundary conditions, degenerate inputs, failure modes not addressed
1818+3. **Implicit assumptions** — things the spec relies on but doesn't state
1919+4. **Contradictions** — places where different parts of the spec conflict
2020+5. **Undefined interfaces** — inputs, outputs, or error conditions not specified
2121+6. **Missing error handling** — what happens when things go wrong?
2222+2323+## Output Format
2424+2525+Group findings by component. Rate each finding:
2626+- **HIGH** — blocks implementation
2727+- **MEDIUM** — will cause bugs
2828+- **LOW** — cleanup
2929+3030+Do NOT fix anything. Do NOT suggest implementations. Just find the gaps.
-328
agents/skills/task-execution/SKILL.md
···11----
22-name: task
33-description: Start and track work on a task. Use when the user says "lets work on...", "start task", or invokes /task. Creates TASK.md and PLAN.md for requirements and implementation tracking, handles branch setup, creates worktrees, and pushes incremental progress to draft MRs.
44-allowed-tools:
55- - Read(TASK.md)
66- - Read(PLAN.md)
77- - Read(.worktree.yml)
88- - Read(.claude/worktree.yml)
99- - Write(TASK.md)
1010- - Write(PLAN.md)
1111- - Edit(TASK.md)
1212- - Edit(PLAN.md)
1313- - Bash(git:*)
1414- - Bash(~/.claude/skills/task-execution/worktree-setup.sh)
1515- - Bash(mkdir -p:*)
1616- - AskUserQuestion
1717- - TodoWrite
1818----
1919-2020-# Task Execution
2121-2222-Manage task lifecycle from start to completion using two files:
2323-- **TASK.md**: Requirements, context, questions, decisions
2424-- **PLAN.md**: Implementation steps with progress tracking
2525-2626----
2727-2828-## Input
2929-3030-Accept any combination of:
3131-- **Jira ticket** (e.g., `ADE-123`)
3232-- **Task description** (e.g., "add logout button")
3333-- **Both**
3434-3535----
3636-3737-## Mode Detection
3838-3939-Determine mode based on current directory:
4040-4141-**Single-repo mode**: `git rev-parse --show-toplevel` succeeds
4242-- Working on one repository
4343-- `.tasks/` folder created in repo root
4444-- Worktree source is the repo itself
4545-4646-**Multi-repo mode**: Not inside a git repo
4747-- Working across multiple repositories
4848-- `.tasks/` folder created in current directory
4949-- CLAUDE.md should specify repos location (e.g., `Repos: ./zapier/`)
5050-- Repos are listed in TASK.md during planning
5151-5252----
5353-5454-## Phase 1: Task Setup
5555-5656-### 1. Detect Mode
5757-5858-```bash
5959-git rev-parse --show-toplevel # Success = single-repo, failure = multi-repo
6060-```
6161-6262-### 2. Gather Context
6363-6464-- If Jira ticket provided: fetch it for context
6565-- Check for uncommitted changes in current directory
6666-- If uncommitted changes exist: STOP and ask how to handle them
6767-6868-### 3. Create Task Folder
6969-7070-Generate task name: `{ticket-lower}-{short-description}` (e.g., `ade-123-add-logout`)
7171-7272-```bash
7373-mkdir -p .tasks/{task-name}
7474-```
7575-7676-### 4. Create TASK.md
7777-7878-```markdown
7979-# {Task Title}
8080-8181-**Ticket**: {TICKET-NUM or N/A}
8282-**Branch**: {branch-name}
8383-8484-## Goal
8585-8686-{What are we trying to achieve?}
8787-8888-## Requirements
8989-9090-- {From ticket, discussion, or inferred}
9191-9292-## Repos
9393-9494-{For multi-repo: list repos and what changes in each}
9595-{For single-repo: omit this section}
9696-9797-## Constraints
9898-9999-- {Technical constraints, deadlines, scope limits}
100100-101101-## Open Questions
102102-103103-- {Anything unclear that needs resolution}
104104-105105-## Decisions
106106-107107-- {Decisions made during planning/implementation}
108108-```
109109-110110----
111111-112112-## Phase 2: Planning
113113-114114-Iterate on TASK.md until requirements are clear, then create PLAN.md.
115115-116116-### 1. Understand the Work
117117-118118-- Explore relevant code
119119-- Identify affected areas and dependencies
120120-- Resolve open questions through discussion
121121-- For multi-repo: identify which repos are involved, update `## Repos` section
122122-123123-### 2. Create PLAN.md
124124-125125-```markdown
126126-# Implementation Plan
127127-128128-## Approach
129129-130130-{High-level approach and key decisions}
131131-132132-## Steps
133133-134134-- [ ] Step 1: {description}
135135-- [ ] Step 2: {description}
136136-- [ ] Step 3: {description}
137137-138138-## Testing
139139-140140-- {How we'll verify the implementation}
141141-142142-## Files to Modify
143143-144144-- `path/to/file.py` - {what changes}
145145-```
146146-147147-### 3. Get Approval
148148-149149-For non-trivial tasks, confirm the plan before proceeding.
150150-151151----
152152-153153-## Phase 3: Implementation
154154-155155-### 1. Create Worktrees
156156-157157-For each repo involved (or the single repo in single-repo mode):
158158-159159-**Find the source repo**:
160160-- Single-repo: `git rev-parse --show-toplevel` from task folder parent
161161-- Multi-repo: Use repos location from CLAUDE.md
162162-163163-**Create worktree**:
164164-```bash
165165-# Fetch latest
166166-git -C {source-repo} fetch origin
167167-168168-# Create worktree with new branch
169169-git -C {source-repo} worktree add .tasks/{task-name}/{repo-name} -b {branch-name} origin/main
170170-```
171171-172172-Branch name: `{ticket-lower}-{short-description}` (same as task folder)
173173-174174-**Run setup**:
175175-```bash
176176-cd .tasks/{task-name}/{repo-name}
177177-~/.claude/skills/task-execution/worktree-setup.sh
178178-```
179179-180180-This copies files and runs setup commands per `.worktree.yml` in the source repo.
181181-182182-### 2. Create Draft MR
183183-184184-Push each branch and create draft merge requests:
185185-```bash
186186-git -C .tasks/{task-name}/{repo-name} push -u origin {branch-name}
187187-# Create draft MR via glab
188188-```
189189-190190-### 3. Track Progress
191191-192192-- Use TodoWrite for real-time tracking
193193-- Update PLAN.md checkboxes as steps complete
194194-- Update TASK.md with decisions and resolved questions
195195-196196-### 4. Commit Incrementally
197197-198198-Make small, focused commits as you complete logical units:
199199-- Commit code changes as they're done
200200-- Push to draft MR regularly for CI feedback
201201-- Update and commit PLAN.md progress periodically
202202-203203-### 5. Follow TDD
204204-205205-1. Write failing test
206206-2. Confirm failure
207207-3. Implement minimum to pass
208208-4. Confirm success
209209-5. Refactor if needed
210210-211211----
212212-213213-## Phase 4: Completion
214214-215215-### 1. Final Updates
216216-217217-- Mark all PLAN.md steps complete
218218-- Update TASK.md with any final decisions
219219-- Ensure all tests pass
220220-221221-### 2. Ready for Review
222222-223223-- Push final commits
224224-- Mark MR(s) as ready (no longer draft)
225225-- Ask if any follow-up tasks should be created
226226-227227-### 3. Cleanup
228228-229229-After merge, task folder can be removed:
230230-```bash
231231-# Remove worktrees
232232-git worktree remove .tasks/{task-name}/{repo-name}
233233-234234-# Remove task folder
235235-rm -rf .tasks/{task-name}
236236-```
237237-238238----
239239-240240-## File Maintenance
241241-242242-**TASK.md**: Living document for collaboration
243243-- Add decisions as they're made
244244-- Resolve and remove open questions
245245-- Keep requirements updated if scope changes
246246-247247-**PLAN.md**: Implementation checklist
248248-- Check off steps as completed
249249-- Add steps if scope expands
250250-- Note blockers or changes to approach
251251-252252-**Prune aggressively**: Remove obsolete information. These files should stay useful, not become history logs.
253253-254254----
255255-256256-## Resuming Work
257257-258258-When returning to an existing task:
259259-260260-1. Read TASK.md and PLAN.md to restore context
261261-2. Check git status in worktrees for uncommitted work
262262-3. Review where we left off in PLAN.md
263263-4. Continue from there
264264-265265----
266266-267267-## Worktree Setup Script
268268-269269-The script `~/.claude/skills/task-execution/worktree-setup.sh` runs in a new worktree to:
270270-271271-1. Find the main worktree (source repo)
272272-2. Read `.worktree.yml` or `.claude/worktree.yml` from source
273273-3. Copy files listed in `copy:` section
274274-4. Run commands listed in `setup:` section
275275-276276-**Example .worktree.yml**:
277277-```yaml
278278-copy:
279279- - .env
280280- - .env.local
281281- - .envrc
282282- - .tool-versions
283283-284284-setup:
285285- - npm install
286286- - direnv allow
287287-```
288288-289289----
290290-291291-## Directory Structure Examples
292292-293293-**Single-repo**:
294294-```
295295-some-repo/
296296-├── .tasks/
297297-│ └── ade-123-fix-bug/
298298-│ ├── TASK.md
299299-│ ├── PLAN.md
300300-│ └── some-repo/ # worktree
301301-├── .worktree.yml
302302-└── src/
303303-```
304304-305305-**Multi-repo** (CLAUDE.md specifies `Repos: ./zapier/`):
306306-```
307307-~/projects/zapier/
308308-├── .tasks/
309309-│ └── ade-123-add-logout/
310310-│ ├── TASK.md
311311-│ ├── PLAN.md
312312-│ ├── zapier-web/ # worktree
313313-│ └── zapier-api/ # worktree
314314-└── zapier/
315315- ├── zapier-web/
316316- └── zapier-api/
317317-```
318318-319319----
320320-321321-## Subagent Coordination
322322-323323-When using subagents for implementation:
324324-325325-- Each subagent works in a specific worktree: `.tasks/{task-name}/{repo-name}/`
326326-- Subagents reference shared context: `../TASK.md` and `../PLAN.md`
327327-- Main agent tracks overall progress in PLAN.md
328328-- Each repo gets its own commits and draft MR
-96
agents/skills/task-execution/worktree-setup.sh
···11-#!/bin/bash
22-#
33-# worktree-setup.sh
44-#
55-# Run from within a new git worktree to copy files and run setup commands
66-# based on .worktree.yml configuration from the main worktree.
77-#
88-# Usage: worktree-setup.sh
99-#
1010-# Configuration (.worktree.yml or .claude/worktree.yml in main worktree):
1111-# copy:
1212-# - .env
1313-# - .envrc
1414-# setup:
1515-# - npm install
1616-# - direnv allow
1717-1818-set -e
1919-2020-# Find main worktree
2121-main_git_dir=$(git rev-parse --git-common-dir 2>/dev/null) || true # without "|| true", set -e exits here and the error message below is never printed
2222-if [[ -z "$main_git_dir" ]]; then
2323- echo "Error: Not in a git repository"
2424- exit 1
2525-fi
2626-2727-main_worktree=$(cd "$(dirname "$main_git_dir")" && pwd) # resolve to an absolute path; --git-common-dir can be relative (e.g. ".git")
2828-current_worktree=$(pwd)
2929-3030-# Check if we're actually in a worktree (not the main repo)
3131-if [[ "$main_worktree" == "$current_worktree" ]]; then
3232- echo "Error: Already in main worktree, nothing to set up"
3333- exit 1
3434-fi
3535-3636-# Find config file
3737-config=""
3838-for path in ".worktree.yml" ".claude/worktree.yml"; do
3939- if [[ -f "$main_worktree/$path" ]]; then
4040- config="$main_worktree/$path"
4141- break
4242- fi
4343-done
4444-4545-if [[ -z "$config" ]]; then
4646- echo "No .worktree.yml found in main worktree, skipping setup"
4747- exit 0
4848-fi
4949-5050-echo "Using config: $config"
5151-echo "Source: $main_worktree"
5252-echo "Target: $current_worktree"
5353-echo ""
5454-5555-# Copy files
5656-echo "Copying files..."
5757-copied=0
5858-skipped=0
5959-while IFS= read -r file; do
6060- # Skip empty lines
6161- [[ -z "$file" ]] && continue
6262-6363- src="$main_worktree/$file"
6464- dest="$current_worktree/$file"
6565-6666- if [[ -f "$src" ]]; then
6767- # Create parent directory if needed
6868- mkdir -p "$(dirname "$dest")"
6969- cp "$src" "$dest"
7070- echo " Copied: $file"
7171- copied=$((copied + 1)) # ((copied++)) returns status 1 when copied is 0, which aborts the script under set -e
7272- else
7373- echo " Skipped (not found): $file"
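7474- skipped=$((skipped + 1)) # ((skipped++)) returns status 1 when skipped is 0, which aborts the script under set -e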
7575- fi
7676-done < <(yq -r '.copy[]?' "$config" 2>/dev/null)
7777-7878-echo "Copied $copied file(s), skipped $skipped"
7979-echo ""
8080-8181-# Run setup commands
8282-echo "Running setup commands..."
8383-while IFS= read -r cmd; do
8484- # Skip empty lines
8585- [[ -z "$cmd" ]] && continue
8686-8787- echo " Running: $cmd"
8888- if eval "$cmd"; then
8989- echo " Done"
9090- else
9191- echo " Warning: Command failed (continuing)"
9292- fi
9393-done < <(yq -r '.setup[]?' "$config" 2>/dev/null)
9494-9595-echo ""
9696-echo "Worktree setup complete"
+66
agents/skills/task-workflow/SKILL.md
···11+---
22+name: task-workflow
33+description: Structured process for working on any substantial task — create TASK.md and PLAN.md, clarify requirements, execute step by step, clean up. Trigger for any non-trivial implementation work that benefits from tracked progress and explicit planning.
44+---
55+66+# Task Workflow
77+88+Structured process for working on any task, from requirements through cleanup.
99+1010+## Setup
1111+1212+Create a task folder at `.tasks/<task-slug>/` with two files:
1313+1414+- `TASK.md` — requirements and decisions
1515+- `PLAN.md` — implementation plan and progress
1616+1717+### TASK.md Structure
1818+1919+```
2020+# <Task Name>
2121+2222+## Goal
2323+[What needs to be true when this is done. 2-3 sentences max.]
2424+2525+## Requirements
2626+[Specific, unambiguous requirements. Collaborate with the user to clarify before writing code.]
2727+2828+## Decisions & Questions
2929+[Record decisions made during the task. Track open questions here until resolved.]
3030+```
3131+3232+### PLAN.md Structure
3333+3434+```
3535+# Plan
3636+3737+## Steps
3838+- [ ] Step 1: [description]
3939+ - Verify: [how to confirm this step is done correctly]
4040+- [ ] Step 2: [description]
4141+ - Verify: [how to confirm]
4242+...
4343+```
4444+4545+Each step MUST be:
4646+- Small enough to complete and verify independently
4747+- Paired with explicit "how to verify" information
4848+- Checked off immediately upon completion — do not let progress markers go stale
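The Setup layout above can be scaffolded with a few lines of shell. This is an illustrative sketch only: `init_task` is a hypothetical helper, the skeleton headings are taken from the two structures shown above, and the bracketed guidance text is left for the user to fill in.

```shell
# init_task SLUG
# Hypothetical helper: creates .tasks/SLUG/ with skeleton TASK.md and PLAN.md.
# Existing files are left untouched, so rerunning it is safe.
init_task() {
  slug=$1
  dir=".tasks/$slug"
  mkdir -p "$dir" || return 1
  if [ ! -e "$dir/TASK.md" ]; then
    printf '# %s\n\n## Goal\n\n## Requirements\n\n## Decisions & Questions\n' "$slug" > "$dir/TASK.md"
  fi
  if [ ! -e "$dir/PLAN.md" ]; then
    printf '# Plan\n\n## Steps\n- [ ] Step 1:\n  - Verify:\n' > "$dir/PLAN.md"
  fi
}
```

Calling `init_task add-logout` from the project root would create `.tasks/add-logout/TASK.md` and `.tasks/add-logout/PLAN.md`, ready for the clarification step.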
4949+5050+## Execution
5151+5252+1. **Clarify first.** Collaborate with the user on TASK.md before writing any code. Don't assume requirements — ask.
5353+2. **Plan before building.** Fill out PLAN.md with steps. Get user agreement on the approach.
5454+3. **Work step by step.** Complete one step, update PLAN.md, pause for the user to verify.
5555+4. **Test before implementing.** When fixing a bug or adding behavior, write a failing test that demonstrates the expected behavior FIRST. Confirm it fails. Then write the minimum code to make it pass.
5656+5. **Keep TASK.md current.** Update decisions and questions as they're resolved during the task.
5757+5858+## Cleanup & Reflection
5959+6060+When the task is complete:
6161+6262+1. **Integrate knowledge back.** Review any decisions, patterns, or conventions that emerged during the task. Update the project's evergreen documentation (README.md, ARCHITECTURE.md, CONSTRAINTS.md, DEVELOPMENT.md, etc.) with anything that should persist beyond this task.
6363+2. **Update CLAUDE.md references.** If new documentation files were created or existing ones changed significantly, ensure CLAUDE.md still points to the right places.
6464+3. **Remove the task folder.** Delete `.tasks/<task-slug>/` once everything worth keeping has been integrated into permanent docs.
6565+6666+The goal: nothing valuable lives only in a task folder. Task folders are temporary workspaces. The project's documentation is the permanent record.
+29
agents/skills/tdd-implementation/SKILL.md
···11+---
22+name: tdd-implementation
33+description: Implement a feature or fix using strict test-driven development — failing tests first, minimum code to pass, then refactor. Trigger when implementing a specific feature or fixing a specific issue.
44+---
55+66+# TDD Implementation
77+88+Use this when implementing a specific feature or fixing a specific issue. Start a fresh session per task.
99+1010+## Instructions
1111+1212+Task: [DESCRIBE THE TASK — what needs to work, referencing the spec]
1313+1414+Follow strict TDD in this exact sequence:
1515+1616+1. Read the relevant spec section: [SPEC_REFERENCE]
1717+2. Read existing tests in [TEST_PATH] to understand current coverage
1818+3. Write NEW failing tests that define the expected behavior for this task
1919+4. Show the test output confirming they fail
2020+5. Write the MINIMUM implementation to pass each test, one at a time
2121+6. After all tests pass, run the FULL test suite to check for regressions
2222+7. Only then, refactor if needed — tests must still pass after refactoring
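Step 4 in the sequence above (confirming the new tests fail) can be made mechanical with a small wrapper. This is a sketch, assuming the project's test command exits non-zero on failure; `confirm_red` is a hypothetical name and the test command passed to it is whatever the project already uses.

```shell
# confirm_red CMD [ARGS...]
# Hypothetical helper: runs the given test command and succeeds only if it FAILS.
# Test output is left visible so the failure reason can be inspected.
confirm_red() {
  if "$@"; then
    echo "tests passed before any implementation exists; they do not exercise new behavior" >&2
    return 1
  fi
  echo "red confirmed: tests fail as expected"
}
```

Usage would look like `confirm_red npm test` or `confirm_red cargo test`. A failure for the wrong reason, such as a syntax error in the test file, also counts as red here, so the output still needs to be read.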
2323+2424+## Rules
2525+2626+- Tests must test BEHAVIOR described in the spec, not implementation details
2727+- Every error path in the spec needs a test
2828+- Do not write any implementation code before the relevant test exists and fails
2929+- If you discover a spec gap during implementation, STOP and flag it — do not guess