···
+# mill v0 Operations & Troubleshooting
+
+Operational conventions for diagnosing stuck runs, cancellations, and stale UI state.
+
+## 1) Source of truth for run state
+
+- **Canonical:** `mill status <runId> --json` and `run.json` under the run directory.
+- **Advisory only:** extension-local mirrors (widget/monitor caches, historical `run.json` snapshots in pi session folders).
+
+When in doubt, always trust canonical mill state.
+
+## 2) Cancellation semantics
+
+`mill cancel <runId>` performs two steps:
+
+1. **Logical cancel**
+   - Appends `run:cancelled` (if needed)
+   - Sets run status to `cancelled`
+2. **Physical cancel**
+   - Reads `worker.pid`
+   - Validates it belongs to `_worker --run-id <runId>`
+   - Sends `SIGTERM` to worker + descendants
+   - After a short grace period, sends `SIGKILL` to survivors
+
+Cancel behavior is idempotent at run-state level.
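The physical-cancel escalation described above can be sketched as two phases. This is a minimal illustration, not mill's actual implementation: `ProcessControl`, `sendTerm`, and `killSurvivors` are hypothetical names, and signal delivery is injected so the logic can be exercised without real processes. PID validation and the grace-period delay are left to the caller.

```typescript
// Hypothetical sketch of TERM-then-KILL escalation; names are illustrative only.
type CancelSignal = "SIGTERM" | "SIGKILL";

interface ProcessControl {
  /** Deliver a signal to one PID (in Node this could wrap process.kill). */
  send(pid: number, signal: CancelSignal): void;
  /** Report whether a PID is still running. */
  isAlive(pid: number): boolean;
}

/** Phase 1: ask the worker and its descendants to exit. */
function sendTerm(pids: readonly number[], ctl: ProcessControl): void {
  for (const pid of pids) {
    ctl.send(pid, "SIGTERM");
  }
}

/**
 * Phase 2, run after the grace period: force-kill whatever is still alive.
 * Returns the PIDs that had to be killed (the "survivors").
 */
function killSurvivors(pids: readonly number[], ctl: ProcessControl): number[] {
  const survivors = pids.filter((pid) => ctl.isAlive(pid));
  for (const pid of survivors) {
    ctl.send(pid, "SIGKILL");
  }
  return survivors;
}
```

A caller would invoke `sendTerm`, wait for the grace period, then invoke `killSurvivors`. Running the second phase again once everything has exited finds no survivors and sends nothing, so repeating the physical step is harmless.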
+
+## 3) On-disk artifacts to inspect
+
+For run `<runId>` in runs dir `<runsDir>`:
+
+- `<runsDir>/<runId>/run.json`
+- `<runsDir>/<runId>/events.ndjson`
+- `<runsDir>/<runId>/result.json`
+- `<runsDir>/<runId>/worker.pid` (best effort)
+- `<runsDir>/<runId>/logs/worker.log`
+- `<runsDir>/<runId>/logs/cancel.log`
+- `<runsDir>/<runId>/sessions/<spawnId>.jsonl` (pi driver transcripts)
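For scripting against this layout, the paths above can be derived from `<runsDir>` and `<runId>`. The helper names below (`runArtifacts`, `sessionTranscript`) and the `RunArtifacts` shape are hypothetical, introduced only for illustration. The sketch assumes `/` as the path separator.

```typescript
// Hypothetical helpers that mirror the artifact layout listed above.
interface RunArtifacts {
  runJson: string;
  events: string;
  result: string;
  workerPid: string; // best effort: may be absent on disk
  workerLog: string;
  cancelLog: string;
}

/** Build the run directory path, tolerating a trailing slash on runsDir. */
function runDirFor(runsDir: string, runId: string): string {
  return `${runsDir.replace(/\/+$/, "")}/${runId}`;
}

/** Paths of the per-run artifacts. */
function runArtifacts(runsDir: string, runId: string): RunArtifacts {
  const runDir = runDirFor(runsDir, runId);
  return {
    runJson: `${runDir}/run.json`,
    events: `${runDir}/events.ndjson`,
    result: `${runDir}/result.json`,
    workerPid: `${runDir}/worker.pid`,
    workerLog: `${runDir}/logs/worker.log`,
    cancelLog: `${runDir}/logs/cancel.log`,
  };
}

/** Path of one spawn's pi driver transcript. */
function sessionTranscript(runsDir: string, runId: string, spawnId: string): string {
  return `${runDirFor(runsDir, runId)}/sessions/${spawnId}.jsonl`;
}
```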
+
+## 4) Session behavior (pi driver)
+
+The pi driver uses explicit per-spawn session files:
+
+- `--session <runDir>/sessions/<spawnId>.jsonl`
+- `sessionRef` in spawn result points to that file path
+
+This keeps transcripts available for post-hoc debugging and parent-orchestrator context recovery.
+
+## 5) Fast triage checklist for "run stuck in running"
+
+1. `mill inspect <runId> --json`
+   - if you only see `spawn:start` and no `spawn:complete`, the child driver call is still in-flight.
+2. Check process liveness using `worker.pid` + OS process list.
+3. `mill cancel <runId> --json`
+4. Read `logs/cancel.log`
+   - verify TERM/KILL steps and survivor count.
+5. Re-check `mill status <runId> --json`
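Step 1 of the checklist pairs each `spawn:start` with a `spawn:complete`. The sketch below applies that check to newline-delimited JSON such as `events.ndjson`. The event shape is an assumption: the `type` and `spawnId` field names are hypothetical and may not match what mill actually writes, so adjust them to the real schema.

```typescript
// Assumed event shape; the field names are hypothetical, not confirmed by mill.
interface MillEvent {
  type: string;
  spawnId?: string;
}

/** Return spawn ids that have a spawn:start but no later spawn:complete. */
function inFlightSpawns(ndjson: string): string[] {
  const pending = new Set<string>();
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines, e.g. a trailing newline
    const parsed = JSON.parse(trimmed) as MillEvent;
    if (parsed.spawnId === undefined) continue; // not a spawn-scoped event
    if (parsed.type === "spawn:start") {
      pending.add(parsed.spawnId);
    } else if (parsed.type === "spawn:complete") {
      pending.delete(parsed.spawnId);
    }
  }
  return Array.from(pending);
}
```

A non-empty result means at least one child driver call is still in-flight, which is the cue to move on to the liveness check in step 2.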
+
+## 6) Stale historical entries in pi monitor
+
+Convention:
+
+- Historical `status: running` entries are reconciled against canonical `mill status` on scan.
+- If canonical status is terminal, scanner rewrites the historical record with reconciled terminal status.
+
+This avoids long-lived "running" ghosts from previous failures.
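The reconciliation rule can be expressed as a small pure function. This is a sketch under assumptions: `HistoricalEntry` and `reconcileEntry` are hypothetical names, and the set of terminal statuses is supplied by the caller because only `cancelled` is named in this document.

```typescript
// Hypothetical record shape for an extension-local historical entry.
interface HistoricalEntry {
  runId: string;
  status: string;
}

/**
 * Reconcile one historical entry against the canonical status.
 * Only stale "running" entries are rewritten, and only when the
 * canonical status is terminal according to the supplied predicate.
 */
function reconcileEntry(
  entry: HistoricalEntry,
  canonicalStatus: string,
  isTerminal: (status: string) => boolean,
): HistoricalEntry {
  if (entry.status !== "running") return entry; // already settled, leave as-is
  if (!isTerminal(canonicalStatus)) return entry; // canonical run is still active
  return { ...entry, status: canonicalStatus };
}
```

For example, `reconcileEntry({ runId: "r1", status: "running" }, "cancelled", (s) => s === "cancelled")` yields a record with `status: "cancelled"`, while an entry whose canonical status is still `running` is returned unchanged.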
···
 ]);
 ```

-Each `mill.spawn()` submits an async mill run (`mill run --json`) and then follows completion via mill APIs (`watch` + `inspect`). Model selection, driver routing, and execution behavior all come from your mill configuration.
+Each `mill.spawn()` submits an async mill run (`mill run --json`) and then follows completion via mill APIs (`wait` + `inspect`). Model selection, driver routing, and execution behavior all come from your mill configuration.

 Runs are **async by default** — the tool returns a `runId` immediately and delivers results via notification when complete.

···

 ## Context flow

-Each subagent receives the parent session path and can use `search_thread` to explore the orchestrator's conversation for context. Results (including each subagent's `sessionPath`) flow back to the orchestrator via `result.text`.
+Each subagent receives the parent session path and can use `search_thread` to explore the orchestrator's conversation for context. Results include each subagent's `sessionPath` (session reference, typically a `.jsonl` path for pi driver) for later inspection and context recovery.
···
   step?: number;
   /** Final assistant text output, auto-populated on completion. */
   text: string;
-  /** Path to the subagent's session .jsonl file. Use search_thread to explore. */
+  /** Subagent session reference (session id or .jsonl path). Use search_thread to explore. */
   sessionPath?: string;
 }