-Source demos/config.tape
-Output demos/tandem-exe-dev.gif
-
-# ============================================================================
-# tandem: distributed jj workspaces across 3 VMs on exe.dev
-#
-# Two AI agents on separate VMs collaborating on code through a shared
-# tandem server. Each agent sees the other's commits instantly.
-# ============================================================================
-
-Sleep 1s
-
-# -- Create 3 VMs on exe.dev ------------------------------------------------
-
-Type "# Create three exe.dev VMs: server + two agents"
-Enter
-Sleep 1s
-
-Type "ssh exe.dev new --name tandem-server"
-Enter
-Wait@30s
-Sleep 2s
-
-Type "ssh exe.dev new --name tandem-agent-a"
-Enter
-Wait@30s
-Sleep 2s
-
-Type "ssh exe.dev new --name tandem-agent-b"
-Enter
-Wait@30s
-Sleep 2s
-
-# -- Copy tandem binary + scripts -------------------------------------------
-
-Type "# Copy tandem binary to all VMs"
-Enter
-Sleep 1s
-
-Type "BIN=target/x86_64-unknown-linux-musl/release/tandem"
-Enter
-Sleep 300ms
-
-Type "scp $BIN tandem-server.exe.xyz:~/tandem"
-Enter
-Wait@60s
-Sleep 1s
-
-Type "scp $BIN tandem-agent-a.exe.xyz:~/tandem"
-Enter
-Wait@60s
-Sleep 1s
-
-Type "scp $BIN tandem-agent-b.exe.xyz:~/tandem"
-Enter
-Wait@60s
-Sleep 2s
-
-# -- Start the tandem server ------------------------------------------------
-
-Type "# Start tandem server"
-Enter
-Sleep 1s
-
-Type "scp demos/scripts/server-start.sh tandem-server.exe.xyz:/tmp/start.sh"
-Enter
-Wait@15s
-Sleep 500ms
-
-Type "ssh tandem-server.exe.xyz bash /tmp/start.sh"
-Enter
-Wait@15s
-Sleep 2s
-
-# -- Set up SSH tunnels ------------------------------------------------------
-
-Type "# SSH tunnels: bridge raw TCP between VMs via localhost"
-Enter
-Sleep 1s
-
-Type "ssh -f -N -L 15555:localhost:5555 tandem-server.exe.xyz"
-Enter
-Wait@10s
-Sleep 1s
-
-Type "ssh -f -N -R 13013:localhost:15555 tandem-agent-a.exe.xyz"
-Enter
-Wait@10s
-Sleep 1s
-
-Type "ssh -f -N -R 13013:localhost:15555 tandem-agent-b.exe.xyz"
-Enter
-Wait@10s
-Sleep 2s
-
-# -- Agent A: write auth module ----------------------------------------------
-
-Type "# --- Agent A: write auth module ---"
-Enter
-Sleep 1s
-
-Type "scp demos/scripts/agent-a.sh tandem-agent-a.exe.xyz:/tmp/setup.sh"
-Enter
-Wait@15s
-Sleep 500ms
-
-Type "ssh tandem-agent-a.exe.xyz bash /tmp/setup.sh"
-Enter
-Wait@30s
-Sleep 2s
-
-Type "# Agent A sees their commit"
-Enter
-Sleep 500ms
-
-Type "ssh tandem-agent-a.exe.xyz 'cd ~/work && ~/tandem --config=fsmonitor.backend=none log'"
-Enter
-Wait@15s
-Sleep 4s
-
-# -- Agent B: see Agent A, then add API routes --------------------------------
-
-Type "# --- Agent B: init workspace, see Agent A's work ---"
-Enter
-Sleep 1s
-
-Type "scp demos/scripts/agent-b.sh tandem-agent-b.exe.xyz:/tmp/setup.sh"
-Enter
-Wait@15s
-Sleep 500ms
-
-Type "ssh tandem-agent-b.exe.xyz bash /tmp/setup.sh"
-Enter
-Wait@30s
-Sleep 2s
-
-Type "# Agent B sees both workspaces in the log"
-Enter
-Sleep 500ms
-
-Type "ssh tandem-agent-b.exe.xyz 'cd ~/work && ~/tandem --config=fsmonitor.backend=none log'"
-Enter
-Wait@15s
-Sleep 4s
-
-Type "# Agent B reads Agent A's auth.rs from the shared store"
-Enter
-Sleep 500ms
-
-Type "ssh tandem-agent-b.exe.xyz 'cd ~/work && ~/tandem --config=fsmonitor.backend=none file show -r @-- src/auth.rs'"
-Enter
-Wait@15s
-Sleep 4s
-
-# -- Server: everything is there --------------------------------------------
-
-Type "# --- Server: all commits from both agents ---"
-Enter
-Sleep 1s
-
-Type "ssh tandem-server.exe.xyz 'cd ~/project && ~/tandem --config=fsmonitor.backend=none log --no-graph --ignore-working-copy'"
-Enter
-Wait@15s
-Sleep 4s
-
-Type "# Server has everything. Ready for: jj git push"
-Enter
-Sleep 3s
-
-# -- Fin ---------------------------------------------------------------------
-
-Type "# Two agents, three VMs, one store. That's tandem."
-Enter
-Sleep 4s
+1
docs/design-docs/index.md
 - [jj-lib integration](./jj-lib-integration.md)
 - [RPC protocol](./rpc-protocol.md)
 - [RPC error model](./rpc-error-model.md)
+- [Server lifecycle](./server-lifecycle.md): `tandem up/down/status/logs`, daemon management
 
 ## Add a new design doc when
 
+211
docs/design-docs/server-lifecycle.md
+# Server Lifecycle (up/down/status/logs)
+
+## Motivation
+
+Users shouldn't need to understand systemd, launchd, or process management to
+run a tandem server. `tandem up` starts it, `tandem down` stops it,
+`tandem status` tells you if it's running. Same model as Tailscale (`tailscale up`)
+and Caddy (`caddy start`).
+
+## API surface
+
+```
+tandem up --repo /srv/project --listen 0.0.0.0:13013   # start daemon, return
+tandem down                                            # stop daemon
+tandem status                                          # health check
+tandem status --json                                   # machine-readable
+tandem logs                                            # stream logs from daemon
+tandem logs --level debug                              # stream at higher verbosity
+```
+
+`tandem serve` remains the foreground mode for systemd/docker/debugging:
+
+```
+tandem serve --repo /srv/project --listen 0.0.0.0:13013
+tandem serve --log-level debug --log-file /var/log/tandem.log --log-format json
+tandem serve --pidfile /var/run/tandem.pid
+```
+
+## Fork model
+
+`tandem up` forks itself as a background process. No separate daemon binary.
+
+1. `tandem up` validates flags (repo exists, port parseable).
+2. Forks `tandem serve --daemon` with same flags. `--daemon` is internal/hidden.
+3. Parent waits for child to signal readiness (control socket exists + health OK).
+4. Parent prints "tandem running, PID <n>" and exits 0.
+5. If child fails to start within timeout (5s default), parent exits 1 with error.
+
+The `--daemon` flag tells `serve` to:
+- Detach from terminal (setsid, close stdin/stdout/stderr).
+- Write PID file to `$XDG_RUNTIME_DIR/tandem/daemon.pid`.
+- Create control socket.
+- Redirect logs to `$XDG_RUNTIME_DIR/tandem/daemon.log` (unless --log-file overrides).
+
+Same pattern as Caddy's `caddy start` → `caddy run --environ`.
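
Review note: steps 3 to 5 of the fork model amount to a bounded polling loop in the parent. The following is a minimal sketch using only the standard library, not the planned implementation. The names `wait_until_ready` and `parent_exit_code` are hypothetical, and the readiness probe is passed in as a closure because the design doc does not pin down how "control socket exists + health OK" is checked.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Polls `is_ready` until it returns true or `timeout` elapses.
/// Returns true if readiness was observed before the deadline.
/// (Hypothetical helper; the probe itself is supplied by the caller.)
pub fn wait_until_ready<F>(mut is_ready: F, timeout: Duration, poll_interval: Duration) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if is_ready() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(poll_interval);
    }
}

/// Maps the readiness result to the parent's exit code:
/// 0 when the child came up (step 4), 1 when it timed out (step 5).
pub fn parent_exit_code(ready: bool) -> i32 {
    if ready {
        0
    } else {
        1
    }
}
```

With the 5s default from step 5, the parent would call something like `wait_until_ready(probe, Duration::from_secs(5), Duration::from_millis(50))`; the 50ms poll interval is an assumption, not a value from the doc.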
+
+### Already running
+
+`tandem up` when a daemon is already running: exit 1 with
+"tandem is already running (PID <n>). Use `tandem down` first."
+
+Detected via control socket liveness check, not just PID file existence.
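
Review note: the difference between "a file exists" and "something is listening" can be made concrete with a connect probe. This is a sketch under assumptions: `DaemonState` and `probe_control_socket` are hypothetical names, and a successful connect is treated as "running" without the follow-up `GET /status` health check that a full liveness check would presumably add.

```rust
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Result of probing the control socket (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum DaemonState {
    /// Something accepted a connection on the control socket.
    Running,
    /// No socket file exists at the path.
    NotRunning,
    /// A socket file exists but nothing is listening, e.g. after a crash.
    Stale,
}

/// Connects to the control socket to decide whether a daemon is alive.
/// A leftover socket file alone is never reported as `Running`.
pub fn probe_control_socket(path: &Path) -> DaemonState {
    match UnixStream::connect(path) {
        Ok(_) => DaemonState::Running,
        Err(_) if path.exists() => DaemonState::Stale,
        Err(_) => DaemonState::NotRunning,
    }
}
```

Under this sketch `tandem up` would refuse to start only on `Running`; what to do with a `Stale` socket (remove it and proceed, or report it) is left open by the design doc.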
+
+## Control socket
+
+Path: `$XDG_RUNTIME_DIR/tandem/control.sock` (Linux) or
+`$TMPDIR/tandem/control.sock` (macOS). Override with `--control-socket <path>`.
+
+Protocol: HTTP/1.1 over Unix domain socket. Reasons:
+
+- Reuse hyper/axum (same stack as the HTTP API feature).
+- Structured request/response with status codes.
+- Easy to curl for debugging: `curl --unix-socket /path/to/control.sock http://localhost/status`
+- No need to invent a framing protocol.
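
Review note: the path rule at the top of this section is a pure function of the platform, two environment variables, and the optional override. The sketch below takes those as arguments so it can be tested without touching the real environment. `Platform`, `default_control_socket`, and `resolve_control_socket` are hypothetical names, and returning `None` when the variable is unset or empty is an assumption; the doc does not specify a fallback.

```rust
use std::path::PathBuf;

/// Platforms the design doc distinguishes (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Platform {
    Linux,
    MacOs,
}

/// Default control socket path: `$XDG_RUNTIME_DIR/tandem/control.sock` on
/// Linux, `$TMPDIR/tandem/control.sock` on macOS. Returns `None` when the
/// relevant variable is unset or empty (assumed behavior).
pub fn default_control_socket(
    platform: Platform,
    xdg_runtime_dir: Option<&str>,
    tmpdir: Option<&str>,
) -> Option<PathBuf> {
    let base = match platform {
        Platform::Linux => xdg_runtime_dir,
        Platform::MacOs => tmpdir,
    };
    let base = base.filter(|dir| !dir.is_empty())?;
    Some(PathBuf::from(base).join("tandem").join("control.sock"))
}

/// Applies the `--control-socket <path>` override before the default.
pub fn resolve_control_socket(
    override_path: Option<&str>,
    platform: Platform,
    xdg_runtime_dir: Option<&str>,
    tmpdir: Option<&str>,
) -> Option<PathBuf> {
    match override_path {
        Some(path) => Some(PathBuf::from(path)),
        None => default_control_socket(platform, xdg_runtime_dir, tmpdir),
    }
}
```

A caller would pass `std::env::var("XDG_RUNTIME_DIR").ok().as_deref()` and the `TMPDIR` equivalent for the two variables.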
+
+### Control endpoints
+
+```
+GET  /status            → { "pid": 1234, "uptime_secs": 3600, "repo": "/srv/project", ... }
+POST /shutdown          → 200 OK, daemon begins graceful shutdown
+GET  /logs?level=debug  → SSE stream of log events (text/event-stream)
+```
+
+The control socket is **local-only** (Unix socket permissions). No auth needed.
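
Review note: because the protocol is plain HTTP/1.1, a client for the two non-streaming endpoints needs very little. The sketch below uses only the standard library to show the wire shape; the design doc proposes hyper/axum, so this is an illustration and not the intended client. `build_request`, `parse_response`, and `control_request` are hypothetical names. It assumes the server honors `Connection: close` and does not use chunked transfer encoding, and it is unsuitable for the streaming `/logs` endpoint.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Builds a bodiless HTTP/1.1 request for the control socket.
pub fn build_request(method: &str, target: &str) -> String {
    // A bodiless POST still states its length explicitly.
    let length = if method == "POST" { "Content-Length: 0\r\n" } else { "" };
    format!("{method} {target} HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n{length}\r\n")
}

/// Splits a raw response into (status code, body).
/// Returns `None` if the text is not an HTTP/1.x response.
pub fn parse_response(raw: &str) -> Option<(u16, &str)> {
    let (head, body) = raw.split_once("\r\n\r\n")?;
    let status_line = head.lines().next()?;
    let mut parts = status_line.split_whitespace();
    if !parts.next()?.starts_with("HTTP/1.") {
        return None;
    }
    let code = parts.next()?.parse::<u16>().ok()?;
    Some((code, body))
}

/// Sends one request and reads until the server closes the connection.
pub fn control_request(socket: &Path, method: &str, target: &str) -> io::Result<String> {
    let mut stream = UnixStream::connect(socket)?;
    stream.write_all(build_request(method, target).as_bytes())?;
    let mut raw = String::new();
    stream.read_to_string(&mut raw)?;
    Ok(raw)
}
```

`tandem down` would then correspond to `control_request(path, "POST", "/shutdown")` followed by a check that `parse_response` yields status 200.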
+
+## Log streaming
+
+`tandem logs` connects to the control socket's `/logs` SSE endpoint.
+
+Key design: the daemon always logs at trace level internally (ring buffer or
+tracing subscriber). `tandem logs --level info` filters server-side before
+streaming. This means you can attach at debug level to a daemon that was
+started with `--log-level info` (the Consul `consul monitor` pattern).
+
+Implementation: tracing subscriber that fans out to:
+1. File/stderr (at configured --log-level).
+2. Zero or more SSE clients (each with independent level filter).
+
+Log format over SSE:
+
+```
+data: {"ts":"2026-02-19T18:00:00Z","level":"info","target":"tandem::server","msg":"client connected","fields":{"addr":"10.0.0.5:44312"}}
+```
+
+`tandem logs` renders these as human-readable lines by default.
+`tandem logs --json` passes the raw JSON through.
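
Review note: the property that matters here is that every sink filters independently, so a client at `debug` sees events the file sink at `info` drops. The sketch below models that with plain data structures instead of a real `tracing` subscriber. `Level`, `Event`, and `FanOut` are hypothetical names, and sinks are represented as in-memory vectors purely for illustration.

```rust
/// Log levels in increasing severity; the derived ordering follows
/// declaration order, so `Trace < Debug < Info < Warn < Error`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

/// A log event as produced internally (always captured, whatever its level).
#[derive(Debug, Clone)]
pub struct Event {
    pub level: Level,
    pub msg: String,
}

/// One destination (file, stderr, or an SSE client) with its own filter.
struct Sink {
    min_level: Level,
    received: Vec<String>,
}

/// Fans each event out to every sink whose filter admits it.
#[derive(Default)]
pub struct FanOut {
    sinks: Vec<Sink>,
}

impl FanOut {
    /// Attaches a sink and returns its id.
    pub fn attach(&mut self, min_level: Level) -> usize {
        self.sinks.push(Sink { min_level, received: Vec::new() });
        self.sinks.len() - 1
    }

    /// Delivers the event to each sink at or below the event's level.
    pub fn publish(&mut self, event: &Event) {
        for sink in &mut self.sinks {
            if event.level >= sink.min_level {
                sink.received.push(event.msg.clone());
            }
        }
    }

    /// Messages a given sink has received so far.
    pub fn received(&self, id: usize) -> &[String] {
        &self.sinks[id].received
    }
}
```

In this model the file sink would be attached at the configured `--log-level`, and each `GET /logs?level=<level>` connection would attach one more sink at the requested level.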
+
+### No daemon running
+
+`tandem logs` when no daemon is running: exit 1 with
+"no tandem daemon running. Start one with `tandem up`."
+
+## Status output
+
+`tandem status` (human-readable):
+
+```
+tandem is running
+  PID:      1234
+  Uptime:   2h 15m
+  Repo:     /srv/project
+  Listen:   0.0.0.0:13013
+  Version:  0.3.0
+```
+
+`tandem status --json`:
+
+```json
+{
+  "running": true,
+  "pid": 1234,
+  "uptime_secs": 8100,
+  "repo": "/srv/project",
+  "listen": "0.0.0.0:13013",
+  "version": "0.3.0",
+  "workspaces": 3
+}
+```
+
+Exit codes: 0 = running, 1 = not running / unreachable.
+
+When not running:
+
+```
+tandem is not running
+```
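
Review note: the human-readable `Uptime: 2h 15m` line and the JSON `uptime_secs: 8100` describe the same value, so the renderer is a small conversion. A minimal sketch follows; `format_uptime` and `status_exit_code` are hypothetical names. Only the `Xh Ym` form appears in the doc, so the day, minute-only, and second-only forms are assumptions.

```rust
/// Renders an uptime in seconds the way the human-readable status shows it.
/// The `Xh Ym` form matches the doc; the other forms are assumed.
pub fn format_uptime(secs: u64) -> String {
    let days = secs / 86_400;
    let hours = (secs % 86_400) / 3_600;
    let minutes = (secs % 3_600) / 60;
    if days > 0 {
        format!("{days}d {hours}h {minutes}m")
    } else if hours > 0 {
        format!("{hours}h {minutes}m")
    } else if minutes > 0 {
        format!("{minutes}m")
    } else {
        format!("{secs}s")
    }
}

/// Exit code for `tandem status`: 0 = running, 1 = not running / unreachable.
pub fn status_exit_code(running: bool) -> i32 {
    if running {
        0
    } else {
        1
    }
}
```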
+
+## Signal handling
+
+- **SIGTERM**: graceful shutdown. Drain in-flight RPCs (5s timeout), close
+  sockets, remove PID file and control socket, exit 0.
+- **SIGINT** (Ctrl+C): same as SIGTERM. Already needed for foreground `tandem serve`.
+- **SIGHUP**: reserved for future config reload. Currently ignored.
+- **Second SIGTERM/SIGINT**: immediate exit.
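
Review note: the four rules above form a two-state machine, which is easy to test separately from the actual signal plumbing (`tokio::signal` per the exec plan). The sketch below models only the decision logic. `Signal`, `Action`, and `ShutdownState` are hypothetical names.

```rust
/// Signals the design doc assigns behavior to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Signal {
    Term,
    Int,
    Hup,
}

/// What the server should do in response to a signal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Ignore,
    BeginGracefulShutdown,
    ExitImmediately,
}

/// Tracks whether a graceful shutdown is already in progress.
#[derive(Debug, Default)]
pub struct ShutdownState {
    draining: bool,
}

impl ShutdownState {
    /// Decides the action for one incoming signal.
    pub fn on_signal(&mut self, signal: Signal) -> Action {
        match signal {
            // SIGHUP is reserved for future config reload.
            Signal::Hup => Action::Ignore,
            // A second SIGTERM/SIGINT while draining exits at once.
            Signal::Term | Signal::Int if self.draining => Action::ExitImmediately,
            // The first SIGTERM/SIGINT starts the drain.
            Signal::Term | Signal::Int => {
                self.draining = true;
                Action::BeginGracefulShutdown
            }
        }
    }
}
```

Note that SIGTERM followed by SIGINT counts as a "second" signal in this sketch, which matches the "Second SIGTERM/SIGINT" wording but is still an interpretation.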
+
+## Relationship to tandem serve
+
+| | `tandem serve` | `tandem up` |
+|---|---|---|
+| Foreground | yes | no |
+| Logs to stderr | yes (default) | no (logs to file) |
+| Control socket | yes | yes |
+| PID file | opt-in (--pidfile) | auto-managed |
+| systemd/docker | yes | not needed |
+| Human operator | debugging | normal use |
+
+Both modes create the control socket. `tandem down` / `tandem status` /
+`tandem logs` work against either mode.
+
+## Flags summary
+
+### tandem serve (existing + new)
+
+```
+--listen <addr>          Cap'n Proto listen address (required)
+--repo <path>            Repository path (required)
+--log-level <level>      trace|debug|info|warn|error (default: info)
+--log-file <path>        Log to file instead of stderr
+--log-format <fmt>       text|json (default: text)
+--pidfile <path>         Write PID file (opt-in)
+--control-socket <path>  Override control socket path
+--daemon                 Internal flag, set by `tandem up`
+```
+
+### tandem up
+
+```
+--listen <addr>      Cap'n Proto listen address (required)
+--repo <path>        Repository path (required)
+--log-level <level>  Daemon log level (default: info)
+--log-file <path>    Daemon log file (default: $XDG_RUNTIME_DIR/tandem/daemon.log)
+```
+
+### tandem down
+
+No flags. Finds daemon via control socket.
+
+### tandem status
+
+```
+--json  Machine-readable output
+```
+
+### tandem logs
+
+```
+--level <level>  Filter level (default: info)
+--json           Raw JSON output
+```
+
+## Open questions
+
+1. **Multiple daemons.** Current design assumes one daemon per user (single
+   control socket path). Should we support named instances for serving multiple
+   repos? Could use `--name <n>` with per-name socket paths. Punt until needed.
+
+2. **Log retention.** How large should the daemon log file grow? Rotation
+   policy? Probably punt to logrotate / the OS for now.
+
+3. **macOS launchd.** Should `tandem up` optionally install a launchd plist
+   for auto-restart? Probably not: keep it simple, add later if needed.
+109
docs/exec-plans/active/server-lifecycle.md
+# Execution Plan: Server Lifecycle
+
+**Design doc:** `docs/design-docs/server-lifecycle.md`
+
+Implements `tandem up/down/status/logs`: daemon management without systemd.
+
+## Slice 10: Signal handling and graceful shutdown
+
+**Goal:** `tandem serve` handles SIGTERM/SIGINT cleanly. Prerequisite for
+everything else, since daemon mode needs reliable shutdown.
+
+**Work:**
+
+- Install signal handler (tokio::signal) in `tandem serve`.
+- On SIGTERM/SIGINT: stop accepting new connections, drain in-flight RPCs
+  (5s timeout), close listeners, exit 0.
+- Second signal: immediate exit.
+- Add `--log-level` and `--log-format` flags to `tandem serve`.
+
+**Acceptance:**
+
+- `tandem serve` + SIGINT exits 0 (not 130).
+- In-flight `getObject` call during shutdown completes (not dropped).
+- `--log-level debug` produces debug output to stderr.
+- Existing slice 1-7 tests still pass.
+
+**Test:** `tests/slice10_graceful_shutdown.rs`
+- Start server, connect client, send SIGTERM, verify clean exit.
+- Start server, begin slow read, send SIGTERM, verify read completes.
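
Review note: "drain in-flight RPCs (5s timeout)" needs some way to count calls that are still running. One common approach, shown here as a standard-library sketch and not as the planned tokio-based implementation, is a shared counter with an RAII guard per call. `InFlight` and `RpcGuard` are hypothetical names, and the 5ms poll interval is an assumption.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Shared count of RPCs currently being served (hypothetical type).
#[derive(Clone, Default)]
pub struct InFlight(Arc<AtomicUsize>);

/// Held for the duration of one RPC; decrements the count when dropped.
pub struct RpcGuard(Arc<AtomicUsize>);

impl InFlight {
    /// Marks the start of an RPC.
    pub fn begin(&self) -> RpcGuard {
        self.0.fetch_add(1, Ordering::SeqCst);
        RpcGuard(Arc::clone(&self.0))
    }

    /// Number of RPCs still running.
    pub fn count(&self) -> usize {
        self.0.load(Ordering::SeqCst)
    }

    /// Waits until no RPC is in flight or `timeout` elapses.
    /// Returns true if everything finished in time.
    pub fn drain(&self, timeout: Duration) -> bool {
        let deadline = Instant::now() + timeout;
        while self.count() > 0 {
            if Instant::now() >= deadline {
                return false;
            }
            thread::sleep(Duration::from_millis(5));
        }
        true
    }
}

impl Drop for RpcGuard {
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::SeqCst);
    }
}
```

The second acceptance test above (slow read completes during shutdown) corresponds to `drain` returning true; a false return is the case where the server gives up after the timeout.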
+
+## Slice 11: Control socket and tandem status
+
+**Goal:** `tandem serve` opens a control socket. `tandem status` queries it.
+
+**Work:**
+
+- Add HTTP-over-Unix-socket listener to `tandem serve` (axum + hyper-unix).
+- Implement `GET /status` on control socket.
+- Socket path: `$XDG_RUNTIME_DIR/tandem/control.sock` (Linux),
+  `$TMPDIR/tandem/control.sock` (macOS). Override with `--control-socket`.
+- Implement `tandem status` command: connect to control socket, print output.
+- Implement `tandem status --json`.
+- Exit code 0 = running, 1 = not running.
+
+**Acceptance:**
+
+- `tandem serve` creates control socket.
+- `tandem status` prints human-readable output while server runs.
+- `tandem status --json` returns valid JSON with pid, uptime, repo, listen fields.
+- `tandem status` exits 1 when no server is running.
+- Control socket is cleaned up on server exit (from slice 10).
+
+**Test:** `tests/slice11_control_socket.rs`
+- Start server, run `tandem status --json`, parse output, verify fields.
+- No server running, run `tandem status`, verify exit code 1.
+
+## Slice 12: tandem up and tandem down
+
+**Goal:** `tandem up` starts a background daemon. `tandem down` stops it.
+
+**Work:**
+
+- Implement `--daemon` internal flag on `tandem serve` (detach, redirect
+  stdio, write PID file).
+- Implement `tandem up`: validate flags, fork `tandem serve --daemon`, wait
+  for control socket readiness, print PID, exit 0.
+- Implement `tandem down`: connect to control socket, `POST /shutdown`,
+  wait for process exit.
+- `tandem up` when already running: exit 1 with message.
+- PID file at `$XDG_RUNTIME_DIR/tandem/daemon.pid`.
+
+**Acceptance:**
+
+- `tandem up --repo ... --listen ...` returns immediately, daemon is running.
+- `tandem status` shows running after `tandem up`.
+- `tandem down` stops daemon, `tandem status` shows not running.
+- `tandem up` twice: second invocation errors with "already running".
+- PID file and control socket cleaned up after `tandem down`.
+
+**Test:** `tests/slice12_up_down.rs`
+- `tandem up`, verify status, connect client, read object, `tandem down`, verify stopped.
+- `tandem up` twice, verify error.
+
+## Slice 13: tandem logs (streaming)
+
+**Goal:** `tandem logs` streams log output from a running daemon.
+
+**Work:**
+
+- Add tracing subscriber that fans out to: file/stderr + SSE clients.
+- Implement `GET /logs?level=<level>` on control socket (SSE stream).
+- Implement `tandem logs` command: connect to SSE endpoint, print lines.
+- `--level` flag on `tandem logs` (default: info).
+- `--json` flag for raw JSON log lines.
+- Client can request higher verbosity than daemon's file log level.
+
+**Acceptance:**
+
+- `tandem logs` prints log lines as events happen.
+- `tandem logs --level debug` shows debug events even if daemon was started
+  with `--log-level info`.
+- `tandem logs --json` outputs one JSON object per line.
+- `tandem logs` exits cleanly when daemon shuts down.
+- `tandem logs` with no daemon: exit 1 with helpful message.
+
+**Test:** `tests/slice13_log_streaming.rs`
+- Start daemon, connect client, trigger activity, verify `tandem logs` output
+  contains expected event.
+- Verify `--level debug` produces more output than `--level warn`.
+3-1
docs/exec-plans/tech-debt-tracker.md
 - Add redaction rules for logs (paths, tokens, secrets)
 - Decide reconnect/backoff defaults for `watchHeads`
 - Verify object write idempotency contract and error codes
-- Clean shutdown for server (Ctrl+C signal handling)
+- Clean shutdown for server (Ctrl+C signal handling); now part of server lifecycle feature, see `docs/exec-plans/active/server-lifecycle.md` slice 10
 - Add distributed smoke-test harness (`sprites.dev` / `exe.dev`) with env-gated CI step
+- Control socket protocol design: finalize HTTP-over-Unix-socket vs alternatives, see `docs/design-docs/server-lifecycle.md`
+- Capnp token auth handshake design: how to validate bearer token during capnp connection setup
 
 ### P3 (performance, not correctness)
 
+8-1
docs/product-specs/core-product.md
 
 ## Out of scope
 
-- authentication and tenant isolation
+- multi-tenant isolation and user/role auth model
 - UI layer
 - policy/workflow automation
+
+## Planned
+
+- **Server lifecycle management**: `tandem up/down/status/logs` for daemon
+  management without systemd. See `docs/design-docs/server-lifecycle.md`.
+- **Token auth**: bearer token on Cap'n Proto port. Single shared secret,
+  no user/role model. Required for servers on public networks.