# AGENTS

Execution guide for building `tandem` from the docs in this repository.

## How to read these docs (quick)

1. Read `ARCHITECTURE.md` for system boundaries.
2. Read `docs/exec-plans/active/slice-roadmap.md` and pick the next slice.
3. Implement the slice test-first, starting from a failing integration test.
4. Record any deferred cleanup in `docs/exec-plans/tech-debt-tracker.md`.
5. When a slice is done, move a completion note into `docs/exec-plans/completed/`.

## Working style

- Implement **one vertical slice at a time**.
- Each slice starts with a **failing Rust integration test**.
- Make the test pass with the smallest correct change.
- Keep behavior aligned with stock `jj` semantics.

## Priority order

1. Slice 1: Single-agent round-trip
2. Slice 2: Two-agent visibility
3. Slice 3: Concurrent convergence
4. Slice 4: Promise pipelining
5. Slice 5: WatchHeads
6. Slice 6: Git round-trip via server-side `jj`
7. Slice 7: End-to-end multi-agent

## Testing policy

- Integration tests are the primary source of truth.
- Local deterministic tests first; cross-machine tests second.
- Use `sprites.dev` / `exe.dev` for distributed smoke tests.
- Keep networked tests opt-in (ignored by default / env-gated).
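
One way to keep networked tests opt-in is an environment gate. The sketch below is only an illustration: the variable name `TANDEM_NETWORK_TESTS` and both helper functions are hypothetical and are not defined anywhere in these docs.

```rust
use std::env;

/// Hypothetical variable name; these docs do not define one.
pub const NETWORK_TESTS_VAR: &str = "TANDEM_NETWORK_TESTS";

/// Interprets a raw env value: only "1" or "true" (case-insensitive) opt in.
pub fn is_opt_in(value: Option<&str>) -> bool {
    match value {
        Some(v) => {
            let v = v.trim();
            v == "1" || v.eq_ignore_ascii_case("true")
        }
        None => false,
    }
}

/// Reads the process environment; unset or non-UTF-8 values mean "skip".
pub fn network_tests_enabled() -> bool {
    is_opt_in(env::var(NETWORK_TESTS_VAR).ok().as_deref())
}
```

A networked test could return early when `network_tests_enabled()` is false, or be marked `#[ignore]` and run explicitly with `cargo test -- --ignored`.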

## Debug policy

Add structured tracing early so we do not sprinkle debug prints later.

Recommended flags:

- `--tandem-debug`
- `--tandem-debug-format pretty|json`
- `--tandem-debug-file <path>`
- `--tandem-debug-filter <filter>`

Minimum events to emit:

- command lifecycle
- RPC lifecycle
- object read/write
- CAS heads success/failure + retries
- watcher subscribe/notify/reconnect
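
As a rough illustration of what a structured event could look like under the two `--tandem-debug-format` values, here is a std-only sketch. The `DebugEvent` shape, its field names, and the rendered layouts are assumptions made for this example; the docs only name the flags and the event categories.

```rust
/// Output formats named by `--tandem-debug-format`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DebugFormat {
    Pretty,
    Json,
}

/// Parses the flag value; anything other than "pretty" or "json" is rejected.
pub fn parse_format(value: &str) -> Option<DebugFormat> {
    match value {
        "pretty" => Some(DebugFormat::Pretty),
        "json" => Some(DebugFormat::Json),
        _ => None,
    }
}

/// Hypothetical event shape: a category from the list above plus a few fields.
pub struct DebugEvent<'a> {
    pub category: &'a str, // e.g. "cas_heads"
    pub name: &'a str,     // e.g. "retry"
    pub attempt: u32,
}

/// Escapes a string for embedding in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders one event as a single line in the requested format.
pub fn render(event: &DebugEvent, format: DebugFormat) -> String {
    match format {
        DebugFormat::Pretty => {
            format!("[{}] {} attempt={}", event.category, event.name, event.attempt)
        }
        DebugFormat::Json => format!(
            "{{\"category\":\"{}\",\"event\":\"{}\",\"attempt\":{}}}",
            json_escape(event.category),
            json_escape(event.name),
            event.attempt
        ),
    }
}
```

A real implementation would more likely sit on top of an existing tracing library; the point here is only that one event type can feed both output formats.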
ARCHITECTURE.md
# ARCHITECTURE

`Tandem` = jj workspaces over the network.

## Shape

Single binary, two modes:

- `tandem serve --listen <addr> --repo <path>`
- `tandem <jj command...>` (client mode)
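
The two modes can be told apart from the first argument. The following is a minimal sketch under that assumption; the `Mode` enum, `parse_mode`, and the error strings are hypothetical, and a real binary would probably use an argument-parsing crate instead.

```rust
/// Hypothetical representation of the two modes of the single binary.
#[derive(Debug, PartialEq, Eq)]
pub enum Mode {
    Serve { listen: String, repo: String },
    Client { jj_args: Vec<String> },
}

/// `args` excludes the program name. Anything that does not start with
/// `serve` is treated as a jj command to run in client mode.
pub fn parse_mode(args: &[String]) -> Result<Mode, String> {
    if args.first().map(String::as_str) != Some("serve") {
        return Ok(Mode::Client { jj_args: args.to_vec() });
    }
    let mut listen = None;
    let mut repo = None;
    let mut it = args[1..].iter();
    // Each serve flag takes exactly one value.
    while let Some(flag) = it.next() {
        let value = it
            .next()
            .ok_or_else(|| format!("missing value for {flag}"))?;
        match flag.as_str() {
            "--listen" => listen = Some(value.clone()),
            "--repo" => repo = Some(value.clone()),
            other => return Err(format!("unknown serve flag: {other}")),
        }
    }
    Ok(Mode::Serve {
        listen: listen.ok_or("missing --listen")?,
        repo: repo.ok_or("missing --repo")?,
    })
}
```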

## Core model

- Server hosts a **normal jj+git colocated repo**.
- Client keeps **working copy local**.
- Client store calls are remote via Cap'n Proto.
- Clients always read heads from server, so no `workspace update-stale` model.

## Responsibilities

### Server

1. Read/write jj backend + op-store objects (commit/tree/file/symlink/copy/operation/view)
2. Coordinate op heads with atomic compare-and-swap
3. Notify watchers on head changes (`watchHeads`)

### Client

Implements jj traits as RPC stubs:

- `Backend`
- `OpStore`
- `OpHeadsStore`

On CAS failure, client retries using jj’s existing merge flow.

## Protocol

Cap'n Proto `Store` service (see `docs/design-docs/rpc-protocol.md` for the canonical schema).

Core capabilities:

- object read/write for backend + op-store data
- op head reads + atomic updates
- operation-prefix resolution
- head watch subscriptions
- optional snapshot/copy-tracking capabilities

No `repoId` in protocol: one server = one repo.

## Git compatibility

No custom git layer in tandem.

Git interop happens on server-hosted repo with stock `jj` commands:

- `jj git fetch`
- `jj git push`

## Dependency graph

- Slice 1 (round-trip)
  - enables Slice 2 (multi-agent)
  - enables Slice 3 (concurrent merge)
  - enables Slice 4 (pipelining)
  - enables Slice 5 (watchHeads)
  - enables Slice 6 (git round-trip)
- Slice 7 integrates slices 1-6

Critical path: **1 → 2 → 6 → 7**.

## Technology choices

- Language: Rust
- Binary: single `tandem`
- RPC: Cap'n Proto (for promise pipelining)
- Server storage: normal jj+git colocated repo
- Serialization: jj-compatible object/op/view bytes

## Non-goals (v0.1)

- auth / ACL / multi-tenant isolation
- workflow automation engines
- web UI / IDE integrations
- client-side caching
docs/README.md
# Docs

Minimal docs structure for the project:

- `../AGENTS.md`: execution/testing/debugging conventions
- `../ARCHITECTURE.md`: system shape and boundaries
- `design-docs/`: durable technical decisions
- `exec-plans/`: active/completed implementation plans
- `product-specs/`: concise product intent and scope

This docs set is now the canonical source of project direction and architecture.
docs/design-docs/core-beliefs.md
# Core Beliefs

1. **Thin server, smart client integration with jj**
   - Server provides storage + head coordination + notifications.
   - Workflow semantics remain jj-native.

2. **Stock jj behavior first**
   - Tandem should feel like jj, not a new VCS.

3. **Remote store, local working copy**
   - Fast local file operations with shared global history.

4. **No stale-head workflow**
   - Always read latest heads from server.

5. **Compatibility over invention**
   - Reuse jj protobuf/object formats where possible.
   - Keep server repo a normal jj+git colocated repo.

6. **Integration tests drive slices**
   - Every major claim in SPEC should be proven by an integration test.
docs/design-docs/index.md
# Design Docs Index

This folder holds stable technical decisions.

## Current docs

- [Core beliefs](./core-beliefs.md)
- [RPC protocol](./rpc-protocol.md)
- [RPC error model](./rpc-error-model.md)

## Add a new design doc when

- a decision affects correctness or compatibility
- a decision changes protocol or storage format
- a decision introduces operational tradeoffs
docs/design-docs/rpc-error-model.md
# RPC Error Model

This document defines Tandem’s error contract between client and server.

## Goals

- Keep failures mappable to `jj-lib` error types.
- Separate transport failures from domain/storage failures.
- Make retries safe and predictable.
- Keep wire semantics stable for integration tests.

## Error classes

### 1) Transport/session errors

Examples:

- connection refused/reset
- timeout
- stream canceled
- server unavailable

Behavior:

- surfaced by RPC runtime (not domain payload)
- usually retriable for reads
- writes may be retried only if idempotency is guaranteed

### 2) Domain/storage errors

Returned by server logic for request-specific failures.

Canonical codes:

- `not_found`
- `invalid_id_length`
- `invalid_data`
- `unsupported`
- `permission_denied` (reserved for future auth)
- `internal`

### 3) Concurrency outcomes (not errors)

- `updateOpHeads(...)->ok=false` is **not** an error.
- It represents normal CAS contention and triggers jj merge/retry flow.

## Error envelope (application-level)

When a method needs structured failures beyond generic exceptions, use:

```text
RpcError {
  code: <canonical code>,
  message: <human-readable>,
  retriable: <bool>,
  details: {
    object_type?: <string>,
    hash?: <hex>,
    op_id?: <hex>,
    expected_len?: <u32>,
    actual_len?: <u32>
  }
}
```

Notes:

- Do not put secrets/tokens in `message` or `details`.
- `message` is for operators; clients should branch on `code`.
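
On the client side, the envelope could be modelled roughly as below. This is a sketch, not a prescribed type: the Rust names, the use of `String` for hex fields, and the `invalid_id_length` constructor are assumptions layered on the envelope and the canonical codes.

```rust
/// The canonical codes, as a closed enum so clients branch on `code`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCode {
    NotFound,
    InvalidIdLength,
    InvalidData,
    Unsupported,
    PermissionDenied, // reserved for future auth
    Internal,
}

impl ErrorCode {
    /// Wire spelling of the canonical code.
    pub fn as_str(self) -> &'static str {
        match self {
            ErrorCode::NotFound => "not_found",
            ErrorCode::InvalidIdLength => "invalid_id_length",
            ErrorCode::InvalidData => "invalid_data",
            ErrorCode::Unsupported => "unsupported",
            ErrorCode::PermissionDenied => "permission_denied",
            ErrorCode::Internal => "internal",
        }
    }

    /// Unknown codes yield `None` so callers can pick their own fallback.
    pub fn parse(s: &str) -> Option<Self> {
        match s {
            "not_found" => Some(ErrorCode::NotFound),
            "invalid_id_length" => Some(ErrorCode::InvalidIdLength),
            "invalid_data" => Some(ErrorCode::InvalidData),
            "unsupported" => Some(ErrorCode::Unsupported),
            "permission_denied" => Some(ErrorCode::PermissionDenied),
            "internal" => Some(ErrorCode::Internal),
            _ => None,
        }
    }
}

/// Optional context; every field may be absent.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ErrorDetails {
    pub object_type: Option<String>,
    pub hash: Option<String>,  // hex
    pub op_id: Option<String>, // hex
    pub expected_len: Option<u32>,
    pub actual_len: Option<u32>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RpcError {
    pub code: ErrorCode,
    pub message: String, // for operators, never for branching
    pub retriable: bool,
    pub details: ErrorDetails,
}

impl RpcError {
    /// Hypothetical convenience constructor for an ID of the wrong length.
    pub fn invalid_id_length(expected: u32, actual: u32) -> Self {
        RpcError {
            code: ErrorCode::InvalidIdLength,
            message: format!("expected {expected}-byte id, got {actual} bytes"),
            retriable: false,
            details: ErrorDetails {
                expected_len: Some(expected),
                actual_len: Some(actual),
                ..ErrorDetails::default()
            },
        }
    }
}
```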

## Mapping to `jj-lib`

### Backend mapping

- `not_found` + object context -> `BackendError::ObjectNotFound`
- `invalid_id_length` -> `BackendError::InvalidHashLength`
- invalid UTF-8 payloads -> `BackendError::InvalidUtf8`
- read failures -> `BackendError::ReadObject` / `ReadFile`
- write failures -> `BackendError::WriteObject`
- unsupported feature -> `BackendError::Unsupported`
- anything else -> `BackendError::Other`

### OpStore mapping

- `not_found` -> `OpStoreError::ObjectNotFound`
- read failures -> `OpStoreError::ReadObject`
- write failures -> `OpStoreError::WriteObject`
- other -> `OpStoreError::Other`

### OpHeadsStore mapping

- get/list failures -> `OpHeadsStoreError::Read`
- update failures (excluding CAS miss) -> `OpHeadsStoreError::Write`
- lock failures -> `OpHeadsStoreError::Lock`

CAS miss path:

- represented by `ok=false` in `updateOpHeads`
- should not be converted to `OpHeadsStoreError`
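
The backend mapping can be pictured as one total function from wire code to error kind. The sketch below deliberately uses a local stand-in enum, `BackendErrorKind`, rather than jj-lib's real `BackendError` (whose variants carry payloads). Treating `internal` as the bucket for generic read/write failures is an assumption of this example, and the `InvalidUtf8` and `ReadFile` cases are left out.

```rust
/// Local stand-in for the variants named above; NOT jj-lib's real enum,
/// whose variants carry payloads (ids, error sources) omitted here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendErrorKind {
    ObjectNotFound,
    InvalidHashLength,
    ReadObject,
    WriteObject,
    Unsupported,
    Other,
}

/// Whether the failing RPC was a read or a write.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Read,
    Write,
}

/// Maps a canonical wire code to a backend error kind.
pub fn map_backend_error(
    code: &str,
    direction: Direction,
    has_object_context: bool,
) -> BackendErrorKind {
    match (code, direction) {
        // `not_found` only becomes ObjectNotFound when we know which object.
        ("not_found", _) if has_object_context => BackendErrorKind::ObjectNotFound,
        ("invalid_id_length", _) => BackendErrorKind::InvalidHashLength,
        ("unsupported", _) => BackendErrorKind::Unsupported,
        // Assumption: `internal` is the bucket for generic read/write failures.
        ("internal", Direction::Read) => BackendErrorKind::ReadObject,
        ("internal", Direction::Write) => BackendErrorKind::WriteObject,
        // Everything else, including unknown codes, falls through to Other.
        _ => BackendErrorKind::Other,
    }
}
```

A CAS miss never reaches a function like this, since it arrives as `ok=false` rather than as an error.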

## Retry policy

### Safe to retry automatically

- pure reads (`getObject`, `getOperation`, `getView`, `getHeads`)
- `watchHeads` subscribe after disconnect

### Retry with care

- writes only if idempotent by content-addressed semantics
- if write acknowledgment is unknown, client may re-issue same write bytes

### Do not blind-retry

- `invalid_data`, `invalid_id_length`, `unsupported`
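
The three tiers can be folded into a single decision function. This is one possible reading, with the gaps filled by assumptions: the `Retry` and `Failure` types are hypothetical, domain errors defer to the envelope's `retriable` flag, and any method not listed above (for example `updateOpHeads`) is conservatively never retried.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Retry {
    Automatic,
    WithCare,
    Never,
}

/// What went wrong: a transport/session failure, or a domain error
/// carrying its canonical code and the envelope's `retriable` flag.
pub enum Failure<'a> {
    Transport,
    Domain { code: &'a str, retriable: bool },
}

/// Decides how a failed call to `method` may be retried.
pub fn retry_decision(method: &str, failure: Failure) -> Retry {
    if let Failure::Domain { code, retriable } = failure {
        // These codes describe a bad request; repeating it cannot help.
        if matches!(code, "invalid_data" | "invalid_id_length" | "unsupported") {
            return Retry::Never;
        }
        // Assumption: other domain errors follow the server's `retriable` hint.
        if !retriable {
            return Retry::Never;
        }
    }
    match method {
        // Pure reads and watcher resubscription.
        "getObject" | "getOperation" | "getView" | "getHeads" | "watchHeads" => Retry::Automatic,
        // Content-addressed writes: re-issuing the same bytes yields the same ID.
        "putObject" | "putOperation" | "putView" => Retry::WithCare,
        // Assumption: anything unlisted is not retried blindly.
        _ => Retry::Never,
    }
}
```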

## Observability requirements

Log on both client and server:

- `rpc.method`
- `rpc.error.code`
- `retriable`
- `attempt`
- `latency_ms`
- object/op identifiers (short hash prefix)

This allows debugging retries/concurrency without ad-hoc logging.
docs/design-docs/rpc-protocol.md
# RPC Protocol (Cap'n Proto)

This document defines Tandem’s wire protocol and storage data model for `jj-lib` compatibility.

Error semantics are defined in `rpc-error-model.md`.

## Goals

- Map cleanly to `jj_lib::backend::Backend`, `OpStore`, and `OpHeadsStore`.
- Preserve jj’s operation/view model and multi-workspace visibility.
- Keep the server authoritative for shared state while clients keep local working copies.
- Support low-latency reads and push-based head updates.

## Repository scope

- One Tandem server serves one repo.
- No `repoId` is sent in requests.
- Run multiple servers for multiple repos.

## Compatibility contract

Clients must call `getRepoInfo()` on connect and verify:

- protocol version compatibility
- jj object/op/view format compatibility
- expected ID lengths and root IDs

If incompatible, client should fail fast with a clear error.
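
A connect-time check along these lines might look as follows. It is a sketch under stated assumptions: only a subset of `RepoInfo` fields is mirrored, `ClientExpectations` is a hypothetical type, and the version rule (same major, server minor at least the client's minimum) is a common convention that this document does not spell out.

```rust
/// Subset of the server's `RepoInfo` needed for the check.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RepoInfo {
    pub protocol_major: u16,
    pub protocol_minor: u16,
    pub commit_id_length: u16,
    pub change_id_length: u16,
    pub root_commit_id: Vec<u8>,
}

/// Hypothetical description of what this client build can talk to.
pub struct ClientExpectations {
    pub protocol_major: u16,
    pub min_protocol_minor: u16,
    pub commit_id_length: u16,
    pub change_id_length: u16,
}

/// Returns a clear, operator-facing message on the first mismatch found.
pub fn check_compat(info: &RepoInfo, want: &ClientExpectations) -> Result<(), String> {
    // Assumed rule: majors must match; server minor may be newer, not older.
    if info.protocol_major != want.protocol_major {
        return Err(format!(
            "protocol major mismatch: server {}, client {}",
            info.protocol_major, want.protocol_major
        ));
    }
    if info.protocol_minor < want.min_protocol_minor {
        return Err(format!(
            "server protocol minor {} is older than required {}",
            info.protocol_minor, want.min_protocol_minor
        ));
    }
    if info.commit_id_length != want.commit_id_length {
        return Err("commit ID length mismatch".to_string());
    }
    if info.change_id_length != want.change_id_length {
        return Err("change ID length mismatch".to_string());
    }
    // The advertised root ID must itself have the advertised length.
    if info.root_commit_id.len() != usize::from(info.commit_id_length) {
        return Err("root commit ID does not match commit ID length".to_string());
    }
    Ok(())
}
```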

## Data model

### Backend object kinds

- `commit`
- `tree`
- `file`
- `symlink`
- `copy`

### Op-store objects

- `operation`
- `view`

### Head state

- Current op-head set is stored server-side.
- Head updates are linearizable via compare-and-swap semantics.
- A monotonic `headsVersion` is incremented on successful head updates.

## Cap'n Proto interface (shape)

```capnp
interface Store {
  getRepoInfo @0 () -> (info :RepoInfo);

  getObject @1 (kind :ObjectKind, id :Data) -> (data :Data);
  putObject @2 (kind :ObjectKind, data :Data) -> (id :Data, normalizedData :Data);

  getOperation @3 (id :Data) -> (data :Data);
  putOperation @4 (data :Data) -> (id :Data);

  getView @5 (id :Data) -> (data :Data);
  putView @6 (data :Data) -> (id :Data);

  resolveOperationIdPrefix @7 (hexPrefix :Text)
    -> (resolution :PrefixResolution, match :Data);

  getHeads @8 () -> (heads :List(Data), version :UInt64);
  updateOpHeads @9 (
    oldIds :List(Data),
    newId :Data,
    expectedVersion :UInt64
  ) -> (ok :Bool, heads :List(Data), version :UInt64);

  watchHeads @10 (watcher :HeadWatcher, afterVersion :UInt64)
    -> (cancel :Cancel);

  getHeadsSnapshot @11 () -> (
    heads :List(Data),
    version :UInt64,
    operations :List(IdBytes),
    views :List(IdBytes)
  );

  # Optional copy-tracking support (capability-gated)
  getRelatedCopies @12 (copyId :Data) -> (copies :List(Data));
}

interface HeadWatcher {
  notify @0 (version :UInt64, heads :List(Data)) -> ();
}

interface Cancel {
  cancel @0 () -> ();
}

struct IdBytes {
  id @0 :Data;
  data @1 :Data;
}

enum ObjectKind {
  commit @0;
  tree @1;
  file @2;
  symlink @3;
  copy @4;
}

enum PrefixResolution {
  noMatch @0;
  singleMatch @1;
  ambiguous @2;
}

struct RepoInfo {
  protocolMajor @0 :UInt16;
  protocolMinor @1 :UInt16;
  jjVersion @2 :Text;

  backendName @3 :Text;
  opStoreName @4 :Text;

  commitIdLength @5 :UInt16;
  changeIdLength @6 :UInt16;

  rootCommitId @7 :Data;
  rootChangeId @8 :Data;
  emptyTreeId @9 :Data;
  rootOperationId @10 :Data;

  capabilities @11 :List(Capability);
}

enum Capability {
  watchHeads @0;
  headsSnapshot @1;
  copyTracking @2;
}
```

## Method semantics

### `putObject`

- Server computes canonical object ID from bytes.
- Response returns canonical ID.
- Writes are idempotent (same object bytes => same ID).
- `normalizedData` allows commit write normalization; for non-commit objects it may equal input bytes.
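
The idempotency property falls out of content addressing, which a toy in-memory store can show. Everything here is illustrative: `ObjectStore` is hypothetical, and `DefaultHasher` is only a 64-bit stand-in for jj's real content hashing, so the IDs it produces are not jj-compatible and it ignores object kinds and normalization.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy content-addressed store keyed by a hash of the object bytes.
#[derive(Default)]
pub struct ObjectStore {
    objects: HashMap<Vec<u8>, Vec<u8>>,
}

impl ObjectStore {
    /// Computes the ID from the bytes and stores them if not already present.
    /// Re-sending the same bytes returns the same ID and changes nothing.
    pub fn put_object(&mut self, data: &[u8]) -> Vec<u8> {
        // Stand-in hash only; a real server would use jj-compatible hashing.
        let mut hasher = DefaultHasher::new();
        data.hash(&mut hasher);
        let id = hasher.finish().to_be_bytes().to_vec();
        self.objects.entry(id.clone()).or_insert_with(|| data.to_vec());
        id
    }

    /// Looks an object up by ID.
    pub fn get_object(&self, id: &[u8]) -> Option<&[u8]> {
        self.objects.get(id).map(Vec::as_slice)
    }

    /// Number of distinct objects stored.
    pub fn object_count(&self) -> usize {
        self.objects.len()
    }
}
```

This is why a client that never saw the acknowledgment for a write can safely send the same bytes again.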

### `putOperation` / `putView`

- Server computes IDs using jj-compatible content hashing.
- IDs and bytes must remain byte-compatible with jj expectations.

### `updateOpHeads`

- Logical behavior: remove `oldIds`, add `newId`.
- `ok=false` means caller must read current heads and retry merge/update flow.
- This operation is the concurrency correctness boundary.
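
A minimal in-memory model of this compare-and-swap is sketched below. It assumes the swap is rejected exactly when `expectedVersion` differs from the current version, which the schema suggests but this section does not state. `HeadState` and `UpdateResult` are hypothetical names, and a real server would also need locking and durable storage.

```rust
use std::collections::BTreeSet;

type OpId = Vec<u8>;

/// Server-side head set plus its monotonic version.
#[derive(Debug, Default)]
pub struct HeadState {
    heads: BTreeSet<OpId>,
    version: u64,
}

/// Mirrors the `(ok, heads, version)` result tuple.
#[derive(Debug, PartialEq, Eq)]
pub struct UpdateResult {
    pub ok: bool,
    pub heads: Vec<OpId>,
    pub version: u64,
}

impl HeadState {
    fn snapshot(&self, ok: bool) -> UpdateResult {
        UpdateResult {
            ok,
            heads: self.heads.iter().cloned().collect(),
            version: self.version,
        }
    }

    /// Removes `old_ids` and adds `new_id`, but only if the caller's view
    /// of the version is current. A stale caller gets `ok = false` together
    /// with the current heads, so it can merge and try again.
    pub fn update_op_heads(
        &mut self,
        old_ids: &[OpId],
        new_id: OpId,
        expected_version: u64,
    ) -> UpdateResult {
        if expected_version != self.version {
            return self.snapshot(false); // CAS miss: an outcome, not an error
        }
        for id in old_ids {
            self.heads.remove(id);
        }
        self.heads.insert(new_id);
        self.version += 1;
        self.snapshot(true)
    }
}
```

With two writers that both read version 0, the first update succeeds and bumps the version to 1, while the second receives `ok = false` plus the first writer's head and can retry against version 1.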

### `watchHeads`

- Notifications are monotonic by `version`.
- Delivery is at-least-once and may coalesce rapid updates.
- On reconnect, client resubscribes with `afterVersion` and/or calls `getHeads()` to catch up.
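
Because delivery is at-least-once and may skip intermediate versions, a client only needs to remember the highest version it has applied. The `HeadTracker` below is a hypothetical sketch of that bookkeeping, not part of the protocol.

```rust
/// Client-side record of the newest head set seen so far.
#[derive(Debug, Default)]
pub struct HeadTracker {
    last_version: u64,
    heads: Vec<Vec<u8>>,
}

impl HeadTracker {
    /// Applies a notification. Returns false for duplicates or stale
    /// deliveries, which at-least-once delivery makes possible.
    pub fn on_notify(&mut self, version: u64, heads: Vec<Vec<u8>>) -> bool {
        if version <= self.last_version {
            return false;
        }
        // Gaps are fine: coalesced updates carry the full current head set.
        self.last_version = version;
        self.heads = heads;
        true
    }

    /// Value to pass as `afterVersion` when resubscribing after a disconnect.
    pub fn resubscribe_after(&self) -> u64 {
        self.last_version
    }

    /// The most recently applied head set.
    pub fn heads(&self) -> &[Vec<u8>] {
        &self.heads
    }
}
```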

### `getHeadsSnapshot`

- Fast path for dependent read chains (`heads -> operations -> views`).
- Returns a consistent snapshot tied to one `version`.

## Mapping to `jj-lib`

### Backend

- `read_*` -> `getObject(kind, id)`
- `write_*` -> `putObject(kind, data)`
- `get_related_copies` -> `getRelatedCopies` (when `copyTracking` capability exists)

### OpStore

- `read_operation` -> `getOperation`
- `write_operation` -> `putOperation`
- `read_view` -> `getView`
- `write_view` -> `putView`
- `resolve_operation_id_prefix` -> `resolveOperationIdPrefix`
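
The three `PrefixResolution` outcomes can be illustrated with a linear scan over known operation IDs. This is a sketch only: the Rust enum here is a local mirror of the schema enum, IDs are assumed to be unique lowercase hex strings, and a real server would likely use an index rather than a scan.

```rust
/// Local mirror of the schema's `PrefixResolution`, carrying the match.
#[derive(Debug, PartialEq, Eq)]
pub enum PrefixResolution {
    NoMatch,
    SingleMatch(String),
    Ambiguous,
}

/// Resolves `hex_prefix` against a list of unique hex operation IDs.
pub fn resolve_prefix(ids: &[&str], hex_prefix: &str) -> PrefixResolution {
    let mut matches = ids.iter().filter(|id| id.starts_with(hex_prefix));
    // Pull at most two candidates: that is enough to detect ambiguity.
    let first = matches.next();
    let second = matches.next();
    match (first, second) {
        (None, _) => PrefixResolution::NoMatch,
        (Some(id), None) => PrefixResolution::SingleMatch((*id).to_string()),
        (Some(_), Some(_)) => PrefixResolution::Ambiguous,
    }
}
```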

### OpHeadsStore

- `get_op_heads` -> `getHeads`
- `update_op_heads` -> `updateOpHeads`
- `lock` -> client-local no-op lock (correctness comes from server-side CAS)

## Operational invariants

- `wc_commit_ids` in views is preserved exactly (workspace visibility model).
- Non-root operations must keep valid parent links.
- Head updates are durable before success responses.
- Object reads/writes must not require any client-side global cache for correctness.
docs/exec-plans/active/slice-roadmap.md
# Active Execution Plan: Slice Roadmap

Canonical vertical-slice execution plan.

## Slice 1: Single-agent round-trip

Goal: one client reads/writes via remote server and persists state.

Acceptance:
- `tandem log/new/describe/diff` work
- restarting client preserves state
- server-side `jj log` matches

## Slice 2: Two-agent visibility

Goal: two workspaces on different machines see each other.

Acceptance:
- agent A and B both see each other's commits and workspaces

## Slice 3: Concurrent convergence

Goal: concurrent writes do not lose data.

Acceptance:
- both (or all) concurrent commits survive after CAS contention

## Slice 4: Promise pipelining

Goal: dependent reads avoid additive RTT cost.

Acceptance:
- latency benchmark proves pipelining behavior under artificial RPC delay

## Slice 5: WatchHeads

Goal: clients receive head updates without polling.

Acceptance:
- callback receives updates quickly
- reconnect path catches up

## Slice 6: Git round-trip

Goal: GitHub <-> server repo <-> clients round-trip via stock `jj git`.

Acceptance:
- fetch and push are successful with expected history/diff

## Slice 7: End-to-end multi-agent

Goal: integrated real-repo workflow.

Acceptance:
- two agents collaborate concurrently and ship via server-side `jj git push`
docs/exec-plans/completed/README.md
# Completed Execution Plans

Move finished slice plans here with:

- date completed
- test file(s) proving completion
- follow-up debt/tasks

# Core Product Spec

## One-liner

`tandem` lets multiple agents/machines use jj workspaces against a shared remote jj store, as if they shared a filesystem.

## Primary users

- AI/code agents collaborating concurrently
- engineers using multiple machines

## Must-have outcomes

- same repo state visible across clients
- no stale workspace workflow
- safe concurrent writes (no lost updates)
- server remains plain jj+git compatible

## Out of scope (v0.1)

- authentication and tenant isolation
- UI layer
- policy/workflow automation
docs/product-specs/index.md
# Product Specs Index

Lean product-facing docs for tandem.

## Current specs

- [Core product](./core-product.md)