# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

`localcode` — a single CLI for managing a fully offline local AI coding environment on macOS Apple Silicon. Uses llama.cpp to serve Qwen 2.5 Coder models via OpenAI-compatible APIs, with switchable terminal coding agents (Aider, OpenCode, Pi).

## Commands

- `npm run dev -- <args>` — Run via tsx (development)
- `npm run build` — Compile TypeScript to `dist/`
- `npx tsc --noEmit` — Type-check without emitting

After `localcode setup`, the `localcode` binary is available in `~/.local/bin/`.

## CLI

```
localcode                        Launch active TUI in current directory
localcode status                 Show current config + server health
localcode start                  Start chat + autocomplete servers
localcode stop                   Stop all servers
localcode models                 List available models
localcode models set-chat <id>   Switch chat model
localcode models set-auto <id>   Switch autocomplete model
localcode tuis                   List available TUIs
localcode tuis set <id>          Switch active TUI
localcode bench                  Benchmark running chat model
localcode bench history          Show past benchmark results
localcode pipe "prompt"          Pipe stdin through the model
localcode setup                  Full install
```
## Architecture

```
src/
  main.ts             — CLI dispatcher (switch on process.argv[2])
  config.ts           — Path/port constants
  log.ts              — log/warn/err with ANSI colors
  util.ts             — Shell exec helpers, file writers
  runtime-config.ts   — Read/write ~/.config/localcode/config.json
  registry/
    models.ts         — ModelDef interface + MODELS array
    tuis.ts           — TuiDef interface + TUIS array
  commands/
    run.ts            — Default action: ensure server, init git, exec TUI
    status.ts         — Show config + server health
    server.ts         — Start/stop llama.cpp servers
    setup.ts          — Full install pipeline
    models.ts         — List/switch models, auto-download + regen scripts
    tuis.ts           — List/switch TUIs, auto-install + regen scripts
    bench.ts          — Benchmark against running llama.cpp server
    pipe.ts           — Pipe stdin through the model
  steps/              — Individual setup phases (preflight, homebrew, llama, etc.)
  templates/
    scripts.ts        — Bash server launcher templates (parameterized by ModelDef)
    aider.ts          — Aider config template
    opencode.ts       — OpenCode config template
    pi.ts             — Pi models.json template
```
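As the tree notes, `main.ts` dispatches on `process.argv[2]`. A minimal sketch of that shape — the handler names and wiring below are illustrative assumptions, not the actual exports:

```typescript
// Hypothetical sketch of the dispatcher pattern in src/main.ts.
// Only the switch-on-process.argv[2] shape is documented; handler names are assumed.
import { runTui } from "./commands/run.js";
import { showStatus } from "./commands/status.js";

const [, , command, ...rest] = process.argv;

switch (command) {
  case undefined:
    await runTui(rest); // default action: launch the active TUI in cwd
    break;
  case "status":
    await showStatus();
    break;
  // ... "start", "stop", "models", "tuis", "bench", "pipe", "setup"
  default:
    console.error(`unknown command: ${command}`);
    process.exit(1);
}
```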
6363+6464+### Key patterns
6565+6666+**Runtime config** (`~/.config/localcode/config.json`): Stores active chatModel, autocompleteModel, and tui IDs. Read by `runtime-config.ts` with defaults fallback.
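A minimal sketch of that defaults-fallback read, using standard Node APIs. The default IDs below are placeholders, not the values hardcoded in `runtime-config.ts`:

```typescript
// Sketch of the read path in runtime-config.ts (field names from the doc above).
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

interface RuntimeConfig {
  chatModel: string;
  autocompleteModel: string;
  tui: string;
}

// Assumed defaults for illustration only.
const DEFAULTS: RuntimeConfig = {
  chatModel: "qwen2.5-coder-32b",
  autocompleteModel: "qwen2.5-coder-1.5b",
  tui: "aider",
};

export function readRuntimeConfig(): RuntimeConfig {
  const path = join(homedir(), ".config", "localcode", "config.json");
  try {
    // Merge so a partially written config still gets sane values.
    return { ...DEFAULTS, ...JSON.parse(readFileSync(path, "utf8")) };
  } catch {
    return DEFAULTS; // missing or unreadable file falls back to defaults
  }
}
```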
**Registries**: `registry/models.ts` and `registry/tuis.ts` define available options as typed arrays. Add new models/TUIs by appending to these arrays, as in the sketch below.
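An illustrative shape for a registry entry. Only the `ModelDef`/`MODELS` names come from the tree above; every field here is an assumption:

```typescript
// Hypothetical ModelDef shape; the real interface in registry/models.ts may differ.
export interface ModelDef {
  id: string;   // referenced by `localcode models set-chat <id>`
  file: string; // GGUF filename under ~/.local/share/llama-models/
  url: string;  // download source
  port: number; // 8080 for chat, 8081 for autocomplete
}

export const MODELS: ModelDef[] = [
  {
    id: "qwen2.5-coder-32b",
    file: "qwen2.5-coder-32b-instruct-q4_k_m.gguf",
    url: "https://huggingface.co/...", // elided; resolved during setup
    port: 8080,
  },
  // Add new models by appending entries here.
];
```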
**Script regeneration**: When the active model or TUI is switched, the launcher scripts in `~/.local/bin/` and all TUI configs are regenerated automatically.

**Template escaping**: Bash templates in `src/templates/scripts.ts` use `const D = "$"` to emit a literal `$` without triggering TS interpolation.
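A tiny example of the escaping trick; the launcher body here is invented for illustration:

```typescript
// How `const D = "$"` keeps bash variables literal inside TS template strings.
const D = "$";

// `${D}{PORT}` emits the literal text `${PORT}` into the generated script,
// while `${model.port}` is interpolated by TypeScript at generation time.
const launcher = (model: { port: number }) => `#!/bin/bash
PORT=${model.port}
echo "serving on port ${D}{PORT}"
`;
```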
**Generated scripts**: Only 3 bash scripts are generated in `~/.local/bin/`: `localcode` (thin wrapper calling `node dist/main.js`), `llama-chat-server`, `llama-complete-server`. All other functionality lives in TypeScript commands.

**Benchmark**: Hits `/v1/chat/completions` with 3 hardcoded prompts, measures wall-clock time + token counts. Results saved to `~/.config/localcode/benchmarks.json`.
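A hedged sketch of a single benchmark probe, assuming the OpenAI-style `usage` block that llama.cpp's server returns; the prompt and result shape are illustrative:

```typescript
// One timed request against the documented chat endpoint on port 8080.
async function benchOnce(prompt: string) {
  const t0 = performance.now();
  const res = await fetch("http://127.0.0.1:8080/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen2.5-coder-32b",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const json = await res.json();
  const seconds = (performance.now() - t0) / 1000;
  const tokens = json.usage?.completion_tokens ?? 0; // OpenAI-style usage block
  return { seconds, tokens, tokPerSec: tokens / seconds };
}
```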
## Key paths on the user's system

- `~/.local/bin/` — `localcode` wrapper + server launcher scripts
- `~/.local/share/llama-models/` — Downloaded GGUF model files
- `~/.config/localcode/config.json` — Active model/TUI selection
- `~/.config/localcode/benchmarks.json` — Benchmark history
- `~/.aider/` — Aider config
- `~/.config/opencode/opencode.json` — OpenCode config
- `~/.pi/agent/models.json` — Pi config
- Chat server port **8080**, autocomplete port **8081**

## Important: after changing TypeScript

The `localcode` wrapper in `~/.local/bin/` calls `node dist/main.js`. After modifying TypeScript source, run `npm run build` to recompile, or the wrapper will run stale code.
README.md
# Local AI Coding Environment

A fully offline, privacy-first AI coding setup for macOS Apple Silicon. Uses **llama.cpp** to run **Qwen 2.5 Coder** models locally, with **Aider** and **OpenCode** as terminal-based coding agents — no API keys, no cloud, no costs.

## Hardware Requirements

- **Mac with Apple Silicon** (M1/M2/M3/M4)
- **32GB RAM** recommended (the 32B model uses ~20GB)
- ~25GB free disk space for models

## What Gets Installed

| Component | Purpose |
|---|---|
| **llama.cpp** | Local model inference with Metal GPU acceleration |
| **Qwen 2.5 Coder 32B** (Q4_K_M) | Main chat/coding model (~20GB) |
| **Qwen 2.5 Coder 1.5B** (Q4_K_M) | Fast autocomplete model (~1.2GB) |
| **Aider** | Terminal coding agent (Claude Code alternative) |
| **jq** | JSON processing for the pipe command |

Both models are served via llama.cpp's built-in OpenAI-compatible API, making them work with any tool that supports the OpenAI API format.
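For example, any HTTP client can hit the chat server directly. A minimal TypeScript sketch (the `model` name is arbitrary here, since a single-model llama.cpp server serves whatever is loaded):

```typescript
// Query the local chat server with plain fetch (Node 18+).
const res = await fetch("http://127.0.0.1:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen2.5-coder-32b",
    messages: [{ role: "user", content: "Write a binary search in Python." }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);
```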
## Installation

```bash
chmod +x setup.sh
./setup.sh
```

The script is idempotent — safe to run multiple times. The first run downloads ~21GB of model weights from HuggingFace.

After installation, restart your shell or run:

```bash
source ~/.zshrc
```

## Commands

### `llama-start`

Starts both llama.cpp servers in the foreground. Press `Ctrl+C` to stop both.

- Chat model (32B): `http://127.0.0.1:8080`
- Autocomplete model (1.5B): `http://127.0.0.1:8081`

### `llama-stop`

Kills all running llama-server processes.

### `ai-code [directory]`

The main coding agent. Auto-starts the chat server if it's not running. Initializes a git repo if one doesn't exist, then launches Aider with full file-editing capabilities.

```bash
cd ~/projects/my-app
ai-code .

# or from anywhere
ai-code ~/projects/my-app
```

Inside Aider you can ask it to edit files, run commands, and refactor code across your project. Changes are auto-committed to git so you can always roll back.

### `ai-ask "question"`

Quick Q&A mode — no file editing, just chat. Useful for coding questions without modifying your project.

```bash
ai-ask "how do I handle errors in rust"
```

### `ai-pipe "prompt"`

Pipe code through the model via stdin. Useful for one-shot transforms in scripts.

```bash
cat main.py | ai-pipe "add type hints"
git diff | ai-pipe "write a commit message"
cat api.go | ai-pipe "find bugs"
```
## Using OpenCode Instead of Aider

[OpenCode](https://opencode.ai) is another terminal coding agent with a polished TUI. It connects to the same llama.cpp backend.

### Install OpenCode

```bash
brew install anomalyco/tap/opencode
```

### Configure

Create `~/.config/opencode/opencode.json`:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "model": "llama-cpp/qwen2.5-coder-32b",
  "provider": {
    "llama-cpp": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama.cpp (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1",
        "apiKey": "not-needed"
      },
      "models": {
        "qwen2.5-coder-32b": {
          "name": "Qwen 2.5 Coder 32B",
          "tools": true
        }
      }
    }
  }
}
```

### Run

```bash
llama-start   # start the server (or ai-code auto-starts it)
opencode      # launch OpenCode in your project directory
```

Use `/models` inside OpenCode to select the Qwen model, and `Tab` to switch between Plan and Build modes.
## Configuration Files

| File | Purpose |
|---|---|
| `~/.aider/aider.conf.yml` | Aider settings (model, git, UI) |
| `~/.aider/.env` | API base URL and key for Aider |
| `~/.config/opencode/opencode.json` | OpenCode provider config |
| `~/.local/share/llama-models/` | Downloaded GGUF model files |
| `~/.local/bin/` | Launcher scripts |

## llama.cpp Server Flags

The chat server launches with these defaults:

| Flag | Value | Purpose |
|---|---|---|
| `--ctx-size` | 16384 | Context window (increase to 32768 if tools misbehave) |
| `--n-gpu-layers` | 99 | Offload all layers to Metal GPU |
| `--flash-attn` | — | Enable flash attention for speed |
| `--mlock` | — | Lock model in RAM to prevent swapping |
| `--threads` | auto | Uses performance core count |
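The real launcher is a generated bash script, but as a sketch of how these documented flags fit together on a `llama-server` invocation (the spawn wrapper itself is illustrative):

```typescript
// Illustrative only: assemble the documented chat-server flags.
import { spawn } from "node:child_process";
import { homedir } from "node:os";

const modelPath = `${homedir()}/.local/share/llama-models/qwen2.5-coder-32b-instruct-q4_k_m.gguf`;

spawn("llama-server", [
  "--model", modelPath,
  "--port", "8080",
  "--ctx-size", "16384",  // bump to 32768 if tool calls misbehave
  "--n-gpu-layers", "99", // offload all layers to the Metal GPU
  "--flash-attn",
  "--mlock",
], { stdio: "inherit" });
```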
## Troubleshooting

**Model loading is slow on first run**: The first inference after starting the server takes 10–30 seconds while the model loads into memory. Subsequent requests are fast.

**Running out of RAM / swapping**: The 32B Q4 model needs ~20GB. Close memory-heavy apps. You can also try the smaller `qwen2.5-coder-14b-instruct-q4_k_m.gguf` instead.

**OpenCode tools not working**: Increase `--ctx-size` to 32768 in the `llama-chat-server` script. Tool-calling needs more context to work reliably.

**Slow generation speed**: Expect ~15–25 tokens/sec on the 32B model with M4. This is normal for a model this size running locally. The 1.5B autocomplete model runs much faster.

**Server won't start**: Check if another process is using port 8080 or 8081 with `lsof -i :8080`. Use `llama-stop` to kill stale processes.

## Performance Expectations

| Model | Speed | Use Case |
|---|---|---|
| Qwen 2.5 Coder 32B | ~15–25 tok/s | Chat, code generation, refactoring |
| Qwen 2.5 Coder 1.5B | ~100+ tok/s | Autocomplete, quick suggestions |

Both models run entirely on-device using Metal acceleration. No network connection required after initial setup.

## Uninstall

```bash
# Remove models (~21GB)
rm -rf ~/.local/share/llama-models

# Remove launcher scripts
rm ~/.local/bin/{ai-code,ai-ask,ai-pipe,llama-start,llama-stop,llama-chat-server,llama-complete-server}

# Remove configs
rm -rf ~/.aider
rm -f ~/.config/opencode/opencode.json

# Remove Ollama auto-start (if set)
launchctl unload ~/Library/LaunchAgents/com.ollama.serve.plist
rm ~/Library/LaunchAgents/com.ollama.serve.plist

# Uninstall packages
pipx uninstall aider-chat
brew uninstall llama.cpp jq
```
gen-scripts.ts
// Regenerate launcher scripts and TUI configs without running full setup.
import { createLauncherScripts } from "./src/steps/scripts.js";
import { writeTuiConfig } from "./src/steps/aider-config.js";

// Rewrite the bash launchers in ~/.local/bin/ for the active models,
// then regenerate the active TUI's config files.
await createLauncherScripts();
await writeTuiConfig();