# klbr — codebase reference for agents

Personal AI agent harness in Rust. Local LLM chat daemon with long-term memory, tool calling, and a ratatui TUI. Self-hosted, no corporate product feel.

---

## crate layout

```
klbr/
  klbr-core/   — agent loop, LLM client, memory, context, tools
  klbr-daemon/ — WebSocket server, bridges agent to clients
  klbr-ipc/    — shared protocol types (ClientMsg, ServerMsg)
  klbr-tui/    — ratatui TUI chat client
```

**binaries**: `klbr-daemon` (start this first), `klbr-tui` (connects to the daemon)

---

## LLM backend

- **llama-server-compatible API** at `http://localhost:1234`
- Chat model: `google/gemma-4-26b-a4b`
- Embedding model: `nomic-embed-text-v1.5` (768 dims)
- Both served from the same endpoint (configured separately in `Config` as `llm_url` / `embed_url`)
- Streaming via SSE (`data: {...}\n`)
- Tool calls use OpenAI function-calling format with streaming delta accumulation
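The SSE framing above amounts to plain line handling. A minimal sketch (the enum and function names are illustrative; the real client would deserialize the JSON payload into delta structs rather than return it raw):

```rust
/// Classify one SSE line from the completion stream.
enum SseLine<'a> {
    Data(&'a str), // JSON payload after "data: "
    Done,          // the "data: [DONE]" terminator
    Skip,          // blank lines, comments, keep-alives
}

fn parse_sse_line(line: &str) -> SseLine<'_> {
    match line.strip_prefix("data: ") {
        Some("[DONE]") => SseLine::Done,
        Some(payload) => SseLine::Data(payload),
        None => SseLine::Skip,
    }
}
```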

---

## klbr-core

### `config.rs`

`Config` struct (loaded from a JSON file via `Config::load()`). Fields:
- `llm_url`, `embed_url`, `llm_model`, `embed_model`
- `watermark_tokens: 32_000` — triggers compaction when context exceeds this
- `compaction_keep: 10` — turns to keep after draining
- `memory_top_k: 3`, `memory_sim_threshold: 0.3` — recall injection params
- `history_window` — how many persisted turns to send on connect
- `compaction_llm_url`, `compaction_model` — optional separate LLM for compaction
- `db_path: "agent.db"`, `embed_dim: 768`
- `anchor: String` — system prompt (includes personality + memory tool instructions)

The anchor tells the agent about its memory tools and tagging conventions. Edit it in `config.rs` when adding or changing tools.
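The field list above can be sketched as a struct; this is a std-only sketch, not the real definition (the real struct presumably derives serde for `Config::load()`, the exact types of the URL/model fields are assumptions, and the `history_window` default is purely illustrative since none is stated above):

```rust
struct Config {
    llm_url: String,
    embed_url: String,
    llm_model: String,
    embed_model: String,
    watermark_tokens: usize,
    compaction_keep: usize,
    memory_top_k: usize,
    memory_sim_threshold: f32,
    history_window: usize,
    compaction_llm_url: Option<String>, // optional separate LLM for compaction
    compaction_model: Option<String>,
    db_path: String,
    embed_dim: usize,
    anchor: String, // system prompt
}

impl Default for Config {
    fn default() -> Self {
        Config {
            llm_url: "http://localhost:1234".into(),
            embed_url: "http://localhost:1234".into(),
            llm_model: "google/gemma-4-26b-a4b".into(),
            embed_model: "nomic-embed-text-v1.5".into(),
            watermark_tokens: 32_000,
            compaction_keep: 10,
            memory_top_k: 3,
            memory_sim_threshold: 0.3,
            history_window: 50, // hypothetical: no default documented
            compaction_llm_url: None,
            compaction_model: None,
            db_path: "agent.db".into(),
            embed_dim: 768,
            anchor: String::new(),
        }
    }
}
```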

### `llm.rs`

**`LlmClient`** (Clone):
- `stream(messages, tools, tok_tx)` — streaming completion; sends `LlmEvent`s over an mpsc channel. Accumulates tool call deltas by index in `HashMap<usize, PartialCall>`, flushes `LlmEvent::ToolCalls` on `[DONE]`.
- `complete(messages)` — non-streaming; used for compaction summaries and reflection. Returns `(String, Usage)`.
- `embed(text)` — returns a `Vec<f32>` embedding.
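The delta-accumulation pattern `stream()` uses can be sketched as follows; the field names on `PartialCall`/`ToolCall` and the helper signatures are assumptions, not the real API:

```rust
use std::collections::HashMap;

#[derive(Default)]
struct PartialCall {
    id: String,
    name: String,
    args: String, // JSON arguments arrive as string fragments
}

struct ToolCall {
    id: String,
    name: String,
    args: String,
}

/// Fold one streamed delta into the per-index accumulator.
fn accumulate(
    partial: &mut HashMap<usize, PartialCall>,
    index: usize,
    id: Option<&str>,
    name: Option<&str>,
    args_fragment: Option<&str>,
) {
    let entry = partial.entry(index).or_default();
    if let Some(id) = id {
        entry.id.push_str(id);
    }
    if let Some(name) = name {
        entry.name.push_str(name);
    }
    if let Some(frag) = args_fragment {
        entry.args.push_str(frag);
    }
}

/// On `[DONE]`, flush the accumulated calls in index order.
fn flush(partial: HashMap<usize, PartialCall>) -> Vec<ToolCall> {
    let mut entries: Vec<_> = partial.into_iter().collect();
    entries.sort_by_key(|(i, _)| *i);
    entries
        .into_iter()
        .map(|(_, p)| ToolCall { id: p.id, name: p.name, args: p.args })
        .collect()
}
```

Keying by index matters because argument JSON is split across many deltas and several calls can stream interleaved.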

**`Message`** struct — OpenAI format:
- `role: String`, `content: Option<String>`
- `tool_calls: Option<Vec<ToolCall>>` — for assistant tool call messages
- `tool_call_id: Option<String>` — for tool result messages
- All optional fields skip serialization when `None` (`#[serde(skip_serializing_if)]`)
- Constructors: `Message::system()`, `::user()`, `::assistant()`, `::with_tool_calls()`, `::tool_result()`

**`LlmEvent`** variants: `Token(String)`, `ThinkToken(String)`, `Usage(Usage)`, `ToolCalls(Vec<ToolCall>)`

### `memory.rs`

SQLite + sqlite-vec. Single DB file (`agent.db`). Two tables:

**`memories`** — episodic memory store:
- `id, content TEXT, pinned INTEGER (0/1), tags TEXT (JSON array), ts INTEGER`
- paired with virtual table **`vec_memories`** (sqlite-vec, cosine distance metric, 768 dims)
- migration-safe: `migrate()` runs `ALTER TABLE ADD COLUMN` (fails silently if the column exists)

**`turns`** — full turn history:
- `id, role TEXT, content TEXT, thinking TEXT, ts INTEGER`

**`MemoryStore`** (Clone, wraps `Arc<Mutex<Connection>>`):
- `store(content, emb, tags)` → `Result<i64>` — insert memory, return id
- `set_pinned(id, bool)` — pin/unpin
- `set_tags(id, tags)` — replace tags
- `pinned_memories()` → `Vec<String>` — for anchor injection at startup
- `recent_unpinned(n)` → `Vec<(i64, String, Vec<String>)>` — for the reflection prompt
- `recall(query_emb, tags, tag_and, limit)` → `Vec<RecallEntry>` — **main search method**:
  - no tags: global ANN via sqlite-vec
  - tags + query: fetch all tag-matched memories WITH embeddings, exact cosine in Rust (never misses a hit due to the ANN cutoff)
  - tags only: delegates to `context_for`
- `context_for(tags, tag_and, limit)` → `Vec<RecallEntry>` — pure SQL tag lookup, newest first
- `log_turn(role, content, thinking)` — append to the turns table
- `recent_turns(n)` → chronological slice (oldest first) for context replay
- `turns_before(before_id, limit)` → for TUI scroll-back paging
- `get_all()` → `Vec<Memory>` — all memories with embeddings (for dump)
- `reset()` — drop and recreate all tables

**`RecallEntry`**: `id, content, tags: Vec<String>, distance: Option<f32>` (`None` = tag-only hit)

**internal helpers** (private):
- `top_k(emb, k)` — ANN via sqlite-vec
- `by_tags(tags, tag_and)` — SQL LIKE on the JSON array, with escape handling
- `tag_matched_with_embeddings(tags, tag_and)` — for the exact-cosine path
- `cosine_distance(a, b)` — returns a value in [0, 2], matching the sqlite-vec convention
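The [0, 2] range follows from defining distance as 1 - cos(a, b). A sketch of the helper (the real private function may handle edge cases differently):

```rust
/// Cosine distance in [0, 2]: 1 - cos(a, b), the sqlite-vec convention.
/// Identical direction -> 0, orthogonal -> 1, opposite -> 2.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        return 1.0; // degenerate input: treat a zero vector as orthogonal
    }
    1.0 - dot / (na * nb)
}
```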

### `context.rs`

In-memory sliding window sent to the LLM on each turn.

**`Context`**:
- `anchor: Vec<Message>` — never evicted (system prompt + pinned memories)
- `turns: Vec<Message>` — rolling conversation
- `total_tokens: usize` — updated from `LlmEvent::Usage`

Key methods:
- `new(anchor, pinned_memories)` — builds the system message, appends a pinned-memories section
- `update_anchor(anchor, pinned)` — rebuilds the system message with the pinned section
- `load_turns(pairs)` — replay `(role, content)` pairs from the DB on startup; skips tool/other roles (ephemeral)
- `inject_recalled_memories(memories)` — ephemeral assistant message `[recalled memory]` with `[id:..] [tags:..]` blocks
- `push_input(content)` — user turn only (persisted history stays clean)
- `push_assistant_tool_calls(calls)` — assistant message with tool_calls, no content
- `push_tool_result(id, content)` — tool-role message
- `drain_oldest(keep)` — removes all but the `keep` most recent turns; walks forward from the cut point to avoid splitting tool call sequences (never cuts mid-tool-call)
- `as_messages()` — anchor + turns concatenated, ready to send to the LLM
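The `drain_oldest` boundary rule can be sketched over roles alone; this is a simplification (real `Message`s carry more fields, and the real cut logic may differ in detail):

```rust
/// Keep the `keep` most recent turns, but never let the kept window start on
/// a tool-result message: that would orphan it from the assistant message
/// that issued the call. Walk the cut point forward to a clean boundary.
fn drain_oldest(turns: &mut Vec<String>, keep: usize) {
    if turns.len() <= keep {
        return;
    }
    let mut cut = turns.len() - keep;
    // Advance past tool results so the window opens after a full sequence.
    while cut < turns.len() && turns[cut] == "tool" {
        cut += 1;
    }
    turns.drain(..cut);
}
```

The cost of the forward walk is keeping slightly fewer than `keep` turns, which is safer than sending the LLM a tool result with no matching tool call.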

### `tools.rs`

**`definitions()`** — full tool list sent to the LLM on every turn:
- `shell(cmd)` — runs via `sh -c`, caps stdout at 20k / stderr at 5k chars
- `read_file(path, start_line?, end_line?)` — caps at 50k bytes
- `write_file(path, content)`
- `remember(content, important?, tags?)` — embeds and stores; pins if `important=true`
- `recall(query, tags?, tag_mode?, limit?)` — semantic search, optional tag filter
- `context_for(tags, tag_mode?, limit?)` — pure tag lookup, default limit 20
- `edit_memory(id?, tags?, pinned?, special?)` — unified edit tool (including the special anchor memory)
- `list_memories()` — shows pinned + 10 recent unpinned, with ids and tags

**`memory_tools()`** — filtered subset for the reflection mini-loop: `remember, recall, context_for, edit_memory, list_memories` (no shell/file tools)

**`execute(call, memory, llm)`** — async dispatch by tool name. Needs `&MemoryStore` and `&LlmClient` for the memory tools.
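The output caps on `shell` can be sketched as a small helper; the helper name and truncation marker are hypothetical, not the real implementation:

```rust
/// Cap a string at `max_chars` characters (20k for stdout, 5k for stderr),
/// appending a marker when anything was dropped. Counts chars, not bytes,
/// so multi-byte UTF-8 is never split.
fn cap_output(s: &str, max_chars: usize) -> String {
    if s.chars().count() <= max_chars {
        return s.to_string();
    }
    let capped: String = s.chars().take(max_chars).collect();
    format!("{capped}\n[truncated]")
}
```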

### `agent.rs`

Main async loop (`run()`). Receives `Interrupt`s from an mpsc channel, sends `AgentEvent`s over broadcast.

**startup**:
1. load pinned memories → `Context::new(anchor, pinned)`
2. replay recent turns from the DB → `ctx.load_turns()`

**interrupt handling**:
- `Reset` → clear context, emit `Status`
- `Compact` → call `compact()` immediately
- `UserMessage` → embed query, recall, `ctx.inject_recalled_memories()`, `ctx.push_input()`, `log_turn()`

**tool loop** (max 20 iterations):
1. spawn `llm.stream()` in a background task
2. collect `LlmEvent`s: accumulate tokens/thinking, capture `ToolCalls`
3. if tool calls: emit `ToolCall` events, `tools::execute()` each, emit `ToolResult`, push to context, loop
4. if plain text: push assistant message, `log_turn()`, emit `Done` + `Metrics`, check watermark
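The shape of that loop can be sketched synchronously; this deliberately drops the streaming, channels, and event emission, and every name here is illustrative rather than the real API:

```rust
enum LlmOutcome {
    ToolCalls(Vec<String>), // tool names to execute
    Text(String),           // final assistant reply
}

/// Simplified tool loop: call the model, run any requested tools, feed the
/// results back, and stop on plain text or after 20 iterations.
fn tool_loop(
    mut step: impl FnMut(&[String]) -> LlmOutcome, // stand-in for llm.stream()
    mut execute: impl FnMut(&str) -> String,       // stand-in for tools::execute()
) -> Option<String> {
    let mut transcript: Vec<String> = Vec::new();
    for _ in 0..20 {
        match step(&transcript) {
            LlmOutcome::ToolCalls(calls) => {
                for call in calls {
                    let result = execute(&call);
                    transcript.push(format!("{call} -> {result}"));
                }
                // loop again with the tool results in context
            }
            LlmOutcome::Text(reply) => return Some(reply),
        }
    }
    None // hit the iteration cap without a final answer
}
```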

**`compact(output)`**:
1. emit `Status("reflecting...")`
2. call `reflect()` — ephemeral mini tool loop (memory tools only, max 6 iterations, separate context)
3. drain the oldest turns from the main context
4. LLM-summarize the drained text → store with tag `["compaction_summary"]`
5. reload pinned memories and call `ctx.update_anchor()`

**`reflect()`**:
- builds a reflection prompt with: an outline of the last 10 turns (truncated), current pinned + recent unpinned memories
- runs a mini stream loop with the `memory_tools()` tool set
- the agent pins/unpins/remembers/tags as it sees fit
- ephemeral — results don't enter the main context

### `interrupt.rs`

`Interrupt` enum: `UserMessage(String)`, `Reset`, `Compact`

`spawn_source(tx, f)` — helper for future external interrupt sources (e.g. Bluesky notifications). Not currently used.

### `lib.rs`

Re-exports modules. Defines:
- `MetricsSnapshot = Arc<RwLock<Option<AgentMetrics>>>`
- `AgentMetrics { turn_count, context_tokens, watermark }`
- `AgentEvent` enum: `Started, Token(String), ThinkToken(String), Done, Status(String), Metrics(AgentMetrics), ToolCall { name, args }, ToolResult { name, content }`

---

## klbr-ipc

`ClientMsg` (TUI → daemon, tagged by a `type` field):
- `Message { source, content }` — chat message
- `FetchHistory { before_id, limit }` — scroll-back paging
- `Compact` — manual compaction trigger
- `Reset` — wipe DB and context
- `DumpMemories { path: Option<String> }` — dump memories as JSON to a file

`ServerMsg` (daemon → TUI):
- `Started, Token { content }, ThinkToken { content }, Done`
- `Status { content }` — status bar text
- `Metrics { turn_count, context_tokens, watermark }`
- `History { turns: Vec<HistoryEntry> }` — sent on connect and on `FetchHistory`
- `ToolCall { name, args }`, `ToolResult { name, content }`

`HistoryEntry { id, timestamp, role, content, reasoning: Option<String> }`

`ws_url()` → `ws://127.0.0.1:8765`

Protocol: one JSON `ClientMsg`/`ServerMsg` per WebSocket text frame.
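As an illustration of the tagged framing, a `ClientMsg::Message` frame on the wire might look like the string below. The exact field order and casing are assumptions, and this sketch does no string escaping; the real code goes through serde:

```rust
/// Build one ClientMsg::Message wire frame. Illustrative only: no escaping
/// of quotes in `source`/`content`, unlike a real serde_json serialization.
fn message_frame(source: &str, content: &str) -> String {
    format!(r#"{{"type":"Message","source":"{source}","content":"{content}"}}"#)
}
```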

---

## klbr-daemon

`main.rs` — wires everything together:
1. `Config::load()`
2. open `MemoryStore`, create `LlmClient`
3. spawn `agent::run()` and `daemon::serve()` concurrently
4. `tokio::select!` on both, propagate errors

`daemon.rs` — `serve()` accepts connections in a loop; each gets its own `handle()` task.

`handle()` per connection:
1. push `History { turns }` immediately (last `history_window` turns from the DB)
2. push current `Metrics` from the snapshot if available
3. `tokio::select!` between:
   - incoming WebSocket `ClientMsg` frames → translate to `Interrupt` or handle directly (`FetchHistory`, `Reset`, `Compact`, `DumpMemories`)
   - `AgentEvent`s from broadcast → translate to `ServerMsg`, send to the client

`send_msg()` — serialize a `ServerMsg` to JSON, send as a WS text frame.

---

## klbr-tui

Ratatui TUI using crossterm + `tui-scrollview`.

**`App`** state:
- `history: Vec<ChatMsg>` — display model
- `scroll: ScrollViewState`, `at_bottom: bool` — scroll tracking
- `input: String`, `cursor: usize`, `cmd_mode: bool` — input box
- `oldest_turn_id`, `history_exhausted`, `loading_history` — scroll-back paging
- `turn_count, context_tokens, watermark, last_tps` — metrics display

**`ChatMsg`** with **`Role`** enum:
- `User` — cyan "you " prefix
- `Assistant { reason: Option<Reason>, step: AssistantStep }` — green "klbr " prefix; `Reason` is a collapsible thinking block; `AssistantStep` tracks `PromptProcessing → Reasoning → Response → Done`
- `System` — dark gray, dimmed
- `Tool { name, args, result: Option<String> }` — yellow `$ name(key=val...)` header + up to 10 lines of result (or "running..." while pending)

**Commands** (typed with a `/` prefix):
- `/clear` (`/c`) — clear display
- `/compact` (`/cp`) — send `ClientMsg::Compact`
- `/reset` — clear display + send `ClientMsg::Reset`
- `/dump [path]` — send `ClientMsg::DumpMemories`
- `/think` (`/t`) — toggle the reasoning block on the last assistant message
- `/help` (`/h`) — show help inline

**Event loop**: `tokio::select!` between crossterm events and socket lines.

Scroll-back: PageUp sends `FetchHistory { before_id: oldest_turn_id, limit: 50 }`; `prepend_turns()` inserts the older turns at the front of the history vec.

Tool result matching: on `ServerMsg::ToolResult`, scan history in reverse for the last `Role::Tool` entry with a matching name and `result: None`, and fill in the result.
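That reverse scan can be sketched over a simplified entry type (the real `Role::Tool` variant carries `args` too; names here are illustrative):

```rust
struct ToolEntry {
    name: String,
    result: Option<String>, // None while the tool call is still running
}

/// Pair an incoming ToolResult with the most recent still-pending Tool entry
/// of the same name. Returns false if no pending entry matched.
fn fill_tool_result(history: &mut [ToolEntry], name: &str, content: &str) -> bool {
    for entry in history.iter_mut().rev() {
        if entry.name == name && entry.result.is_none() {
            entry.result = Some(content.to_string());
            return true;
        }
    }
    false
}
```

Scanning in reverse matters when the same tool was called more than once: the latest pending call is the one the result belongs to.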

Status bar (bottom line): `{tps} ctx {pct}% ({remaining} tok left) (turns: N) {status}`

---

## data flow summary

```
TUI ──ClientMsg──► daemon ──Interrupt──► agent
                                           │
                                       tool loop
                                           │
TUI ◄──ServerMsg── daemon ◄─AgentEvent── agent
```

---

## things not yet implemented

- multiple clients (broadcast works, but history paging is per-connection)
- external interrupt sources (Bluesky, etc.) — `spawn_source` is ready but unused
- auth/encryption for networked transports (the WebSocket is local-only by default)