# klbr — codebase reference for agents
Personal AI agent harness in Rust. Local LLM chat daemon with long-term memory, tool calling, and a ratatui TUI. Self-hosted, no corporate product feel.
## crate layout

```
klbr/
├── klbr-core/    — agent loop, LLM client, memory, context, tools
├── klbr-daemon/  — WebSocket server, bridges agent to clients
├── klbr-ipc/     — shared protocol types (ClientMsg, ServerMsg)
└── klbr-tui/     — ratatui TUI chat client
```

Binaries: `klbr-daemon` (start this first), `klbr-tui` (connects to the daemon).
## LLM backend

- llama-server compatible API at `http://localhost:1234`
- Chat model: `google/gemma-4-26b-a4b`
- Embedding model: `nomic-embed-text-v1.5` (768 dims)
- Both served from the same endpoint (configured separately in `Config` as `llm_url`/`embed_url`)
- Streaming via SSE (`data: {...}\n` lines)
- Tool calls use OpenAI function-calling format with streaming delta accumulation (sketched below)
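Because function-call arguments stream in as JSON fragments keyed by index, the client has to stitch them back together before dispatch. A minimal sketch of that accumulation, with illustrative `PartialCall`/`ToolCallDelta` shapes (the real types live in klbr-core's llm.rs):

```rust
use std::collections::HashMap;

/// Illustrative accumulator state for one in-flight tool call.
#[derive(Default)]
struct PartialCall {
    id: String,
    name: String,
    arguments: String, // argument JSON accumulates as text fragments
}

/// One streaming delta as the OpenAI format delivers it.
struct ToolCallDelta {
    index: usize,
    id: Option<String>,
    name: Option<String>,
    arguments: Option<String>,
}

/// Fold a delta into the per-index map; the map is flushed into complete
/// tool calls when the stream sends [DONE].
fn accumulate(partials: &mut HashMap<usize, PartialCall>, delta: ToolCallDelta) {
    let call = partials.entry(delta.index).or_default();
    if let Some(id) = delta.id {
        call.id = id; // id and name usually arrive only in the first delta
    }
    if let Some(name) = delta.name {
        call.name = name;
    }
    if let Some(args) = delta.arguments {
        call.arguments.push_str(&args); // later deltas extend the JSON text
    }
}
```

Keying by `index` rather than id matters: only the first delta for a call typically carries the id and name, while later deltas carry argument fragments alone.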
## klbr-core

### config.rs

`Config` struct (supports JSON config loading via `Config::load()`). Fields:

- `llm_url`, `embed_url`, `llm_model`, `embed_model`
- `watermark_tokens: 32_000` — triggers compaction when context exceeds this
- `compaction_keep: 10` — turns to keep after draining
- `memory_top_k: 3`, `memory_sim_threshold: 0.3` — recall injection params
- `history_window` — how many persisted turns to send on connect
- `compaction_llm_url`, `compaction_model` — optional separate LLM for compaction
- `db_path: "agent.db"`, `embed_dim: 768`
- `anchor: String` — system prompt (includes personality + memory tool instructions)

The anchor tells the agent about its memory tools and tagging conventions. Edit it in config.rs when adding/changing tools.
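For orientation, a sketch of the struct these fields imply; exact types, optionality, and derives are assumptions, and the values in comments are the defaults quoted above:

```rust
use serde::Deserialize;

/// Sketch of Config: field names follow the doc, types are assumed.
#[derive(Deserialize)]
struct Config {
    llm_url: String,
    embed_url: String,
    llm_model: String,
    embed_model: String,
    watermark_tokens: usize,            // 32_000: compaction trigger
    compaction_keep: usize,             // 10: turns kept after draining
    memory_top_k: usize,                // 3: recall injection param
    memory_sim_threshold: f32,          // 0.3: recall injection param
    history_window: usize,              // persisted turns sent on connect
    compaction_llm_url: Option<String>, // optional separate compaction LLM
    compaction_model: Option<String>,
    db_path: String,                    // "agent.db"
    embed_dim: usize,                   // 768
    anchor: String,                     // system prompt text
}
```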
### llm.rs

`LlmClient` (Clone):

- `stream(messages, tools, tok_tx)` — streaming completion, sends `LlmEvent` over an mpsc channel. Accumulates tool call deltas by index in `HashMap<usize, PartialCall>`, flushes `LlmEvent::ToolCalls` on `[DONE]`.
- `complete(messages)` — non-streaming, used for compaction summaries and reflection. Returns `(String, Usage)`.
- `embed(text)` — returns a `Vec<f32>` embedding.

`Message` struct — OpenAI format:

- `role: String`, `content: Option<String>`
- `tool_calls: Option<Vec<ToolCall>>` — for assistant tool call messages
- `tool_call_id: Option<String>` — for tool result messages
- All optional fields skip serialization when `None` (`#[serde(skip_serializing_if)]`)
- Constructors: `Message::system()`, `::user()`, `::assistant()`, `::with_tool_calls()`, `::tool_result()`

`LlmEvent` variants: `Token(String)`, `ThinkToken(String)`, `Usage(Usage)`, `ToolCalls(Vec<ToolCall>)`
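A sketch of the `Message` shape with the serialization behavior described above; the `ToolCall` layout and the constructor bodies are assumptions:

```rust
use serde::Serialize;

/// Sketch of the OpenAI-format Message described above.
#[derive(Serialize)]
struct Message {
    role: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    content: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    tool_calls: Option<Vec<ToolCall>>,
    #[serde(skip_serializing_if = "Option::is_none")]
    tool_call_id: Option<String>,
}

#[derive(Serialize)]
struct ToolCall {
    id: String,
    name: String,      // the real OpenAI shape nests these under "function"
    arguments: String, // argument JSON as text
}

impl Message {
    fn user(content: impl Into<String>) -> Self {
        Self {
            role: "user".into(),
            content: Some(content.into()),
            tool_calls: None,
            tool_call_id: None,
        }
    }

    /// Tool result: role "tool" plus the id of the call it answers.
    fn tool_result(id: impl Into<String>, content: impl Into<String>) -> Self {
        Self {
            role: "tool".into(),
            content: Some(content.into()),
            tool_calls: None,
            tool_call_id: Some(id.into()),
        }
    }
}
```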
### memory.rs

SQLite + sqlite-vec. Single DB file (`agent.db`). Two tables:

`memories` — episodic memory store:

- `id, content TEXT, pinned INTEGER (0/1), tags TEXT (JSON array), ts INTEGER`
- paired with virtual table `vec_memories` (sqlite-vec, cosine distance metric, 768 dims)
- migration-safe: `migrate()` runs `ALTER TABLE ADD COLUMN` (fails silently if the column exists)

`turns` — full turn history:

- `id, role TEXT, content TEXT, thinking TEXT, ts INTEGER`
`MemoryStore` (Clone, wraps `Arc<Mutex<Connection>>`):

- `store(content, emb, tags)` → `Result<i64>` — insert memory, return id
- `set_pinned(id, bool)` — pin/unpin
- `set_tags(id, tags)` — replace tags
- `pinned_memories()` → `Vec<String>` — for anchor injection at startup
- `recent_unpinned(n)` → `Vec<(i64, String, Vec<String>)>` — for reflection prompt
- `recall(query_emb, tags, tag_and, limit)` → `Vec<RecallEntry>` — main search method:
  - no tags: global ANN via sqlite-vec
  - with tags + query: fetch all tag-matched memories WITH embeddings, exact cosine in Rust (never misses due to ANN cutoff; sketched after the helper list below)
  - with tags only: delegates to `context_for`
- `context_for(tags, tag_and, limit)` → `Vec<RecallEntry>` — pure SQL tag lookup, newest first
- `log_turn(role, content, thinking)` — append to turns table
- `recent_turns(n)` → chronological slice (oldest first) for context replay
- `turns_before(before_id, limit)` → for TUI scroll-back paging
- `get_all()` → `Vec<Memory>` — all memories with embeddings (for dump)
- `reset()` — drop and recreate all tables

`RecallEntry`: `id`, `content`, `tags: Vec<String>`, `distance: Option<f32>` (`None` = tag-only hit)

internal helpers (private):

- `top_k(emb, k)` — ANN via sqlite-vec
- `by_tags(tags, tag_and)` — SQL LIKE on JSON array with escape handling
- `tag_matched_with_embeddings(tags, tag_and)` — for the exact cosine path
- `cosine_distance(a, b)` — returns a value in [0, 2], matching the sqlite-vec convention
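A self-contained sketch of the exact-cosine path: `cosine_distance` follows the sqlite-vec convention (range [0, 2], 0 = identical), and the ranking assumes the tag matches arrive as `(id, content, tags, embedding)` tuples:

```rust
/// Cosine distance in [0, 2], matching the sqlite-vec convention
/// (0 = same direction, 1 = orthogonal, 2 = opposite).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        return 1.0; // degenerate vector: treat as orthogonal
    }
    1.0 - dot / (na * nb)
}

/// The "tags + query" path: score every tag match exactly, so a relevant
/// memory can never be dropped by an ANN cutoff.
fn rank_by_cosine(
    matches: Vec<(i64, String, Vec<String>, Vec<f32>)>, // (id, content, tags, emb)
    query_emb: &[f32],
    limit: usize,
) -> Vec<(i64, String, Vec<String>, f32)> {
    let mut scored: Vec<_> = matches
        .into_iter()
        .map(|(id, content, tags, emb)| (id, content, tags, cosine_distance(&emb, query_emb)))
        .collect();
    scored.sort_by(|a, b| a.3.partial_cmp(&b.3).unwrap_or(std::cmp::Ordering::Equal));
    scored.truncate(limit);
    scored
}
```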
### context.rs
In-memory sliding window sent to the LLM on each turn.
`Context`:

- `anchor: Vec<Message>` — never evicted (system prompt + pinned memories)
- `turns: Vec<Message>` — rolling conversation
- `total_tokens: usize` — updated from `LlmEvent::Usage`
Key methods:

- `new(anchor, pinned_memories)` — builds the system message, appends the pinned memories section
- `update_anchor(anchor, pinned)` — rebuilds the system message with the pinned section
- `load_turns(pairs)` — replay `(role, content)` pairs from DB on startup; skips tool/other roles (ephemeral)
- `inject_recalled_memories(memories)` — ephemeral assistant message `[recalled memory]` with `[id:..] [tags:..]` blocks
- `push_input(content)` — user turn only (persisted history stays clean)
- `push_assistant_tool_calls(calls)` — assistant message with tool_calls, no content
- `push_tool_result(id, content)` — tool role message
- `drain_oldest(keep)` — removes all but the `keep` most recent turns; walks forward from the cut point to avoid splitting tool call sequences (never cuts mid-tool-call; sketched below)
- `as_messages()` — anchor + turns concatenated, ready to send to the LLM
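A sketch of the `drain_oldest` invariant, with a minimal stand-in turn type where tool results carry role `"tool"`:

```rust
/// Minimal stand-in for a context turn; only the role matters here.
struct Turn {
    role: &'static str, // "user" | "assistant" | "tool"
}

/// Sketch of drain_oldest: keep the last `keep` turns, but walk the cut
/// point forward past tool-result messages so a tool reply is never kept
/// without the assistant tool_calls message that produced it.
fn drain_oldest(turns: &mut Vec<Turn>, keep: usize) -> Vec<Turn> {
    if turns.len() <= keep {
        return Vec::new();
    }
    let mut cut = turns.len() - keep;
    while cut < turns.len() && turns[cut].role == "tool" {
        cut += 1; // advance until the cut no longer splits a tool sequence
    }
    turns.drain(..cut).collect()
}
```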
### tools.rs

`definitions()` — full tool list sent to the LLM on every turn:

- `shell(cmd)` — runs via `sh -c`, caps stdout 20k / stderr 5k chars (see the sketch after this list)
- `read_file(path, start_line?, end_line?)` — caps at 50k bytes
- `write_file(path, content)`
- `remember(content, important?, tags?)` — embeds and stores; pins if `important=true`
- `recall(query, tags?, tag_mode?, limit?)` — semantic search, optional tag filter
- `context_for(tags, tag_mode?, limit?)` — pure tag lookup, default limit 20
- `edit_memory(id?, tags?, pinned?, special?)` — unified edit tool (including the special anchor memory)
- `list_memories()` — shows pinned + 10 recent unpinned with ids and tags
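As an example of the output caps, a self-contained sketch of the shell tool's behavior; the 20k/5k limits come from the list above, the combined-output formatting is an assumption:

```rust
use tokio::process::Command;

/// Cap a string at `max` characters (the 20k/5k output caps above).
fn cap(s: &str, max: usize) -> String {
    s.chars().take(max).collect()
}

/// Sketch of the shell tool: run via `sh -c`, cap stdout at 20k chars
/// and stderr at 5k.
async fn shell_tool(cmd: &str) -> String {
    match Command::new("sh").arg("-c").arg(cmd).output().await {
        Ok(out) => {
            let stdout = cap(&String::from_utf8_lossy(&out.stdout), 20_000);
            let stderr = cap(&String::from_utf8_lossy(&out.stderr), 5_000);
            if stderr.is_empty() {
                stdout
            } else {
                format!("{stdout}\n[stderr]\n{stderr}")
            }
        }
        Err(e) => format!("failed to spawn sh: {e}"),
    }
}
```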
`memory_tools()` — filtered subset for the reflection mini-loop: remember, recall, context_for, edit_memory, list_memories (no shell/file tools).

`execute(call, memory, llm)` — async dispatch by tool name. Needs `&MemoryStore` and `&LlmClient` for the memory tools.
### agent.rs

Main async loop (`run()`). Receives `Interrupt`s from an mpsc channel, sends `AgentEvent`s over a broadcast channel.
startup:

- load pinned memories → `Context::new(anchor, pinned)`
- replay recent turns from DB → `ctx.load_turns()`

interrupt handling:

- `Reset` → clear context, emit `Status`
- `Compact` → call `compact()` immediately
- `UserMessage` → embed query, recall, `ctx.inject_recalled_memories()`, `ctx.push_input()`, `log_turn()`
tool loop (max 20 iterations):

- spawn `llm.stream()` in a background task
- collect `LlmEvent`s: accumulate tokens/thinking, capture `ToolCalls` (collection sketched below)
- if tool calls: emit `ToolCall` events, `tools::execute()` each, emit `ToolResult`, push to context, loop
- if plain text: push assistant message, `log_turn()`, emit `Done` + `Metrics`, check watermark
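A sketch of the event-collection step inside the loop, with a stand-in `LlmEvent` (the real enum, including `Usage(Usage)`, is in llm.rs):

```rust
use tokio::sync::mpsc;

/// Stand-in for llm.rs's LlmEvent (Usage simplified to a token count).
enum LlmEvent {
    Token(String),
    ThinkToken(String),
    Usage(u32),
    ToolCalls(Vec<String>),
}

/// Drain one streamed completion: accumulate text and thinking, capture
/// tool calls flushed on [DONE]. The returned shape is an assumption.
async fn collect_stream(mut rx: mpsc::Receiver<LlmEvent>) -> (String, String, Vec<String>) {
    let (mut text, mut thinking, mut calls) = (String::new(), String::new(), Vec::new());
    while let Some(ev) = rx.recv().await {
        match ev {
            LlmEvent::Token(t) => text.push_str(&t),
            LlmEvent::ThinkToken(t) => thinking.push_str(&t),
            LlmEvent::Usage(_total) => { /* feeds Context::total_tokens */ }
            LlmEvent::ToolCalls(c) => calls = c,
        }
    }
    // non-empty `calls` → execute each and loop; otherwise the text is final
    (text, thinking, calls)
}
```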
compact(output):

- emit `Status("reflecting...")`
- call `reflect()` — ephemeral mini tool loop (memory tools only, max 6 iterations, separate context)
- drain the oldest turns from the main context
- LLM-summarize the drained text → store with tag `["compaction_summary"]`
- reload pinned memories and call `ctx.update_anchor_memories()`
reflect():

- builds a reflection prompt with: outline of the last 10 turns (truncated), current pinned + recent unpinned memories
- runs a mini stream loop with `reflection_definitions()` tools
- the agent pins/unpins/remembers/tags as it sees fit
- ephemeral — results don't enter the main context
### interrupt.rs

`Interrupt` enum: `UserMessage(String)`, `Reset`, `Compact`

`spawn_source(tx, f)` — helper for future external interrupt sources (e.g. Bluesky notifications). Not currently used.
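A sketch of what a `spawn_source`-style helper can look like; the real signature in interrupt.rs may differ:

```rust
use tokio::sync::mpsc;

/// The Interrupt enum from above, restated for a self-contained example.
enum Interrupt {
    UserMessage(String),
    Reset,
    Compact,
}

/// Sketch of a spawn_source-style helper: hand the source the agent's
/// interrupt sender and run it in the background.
fn spawn_source<F, Fut>(tx: mpsc::Sender<Interrupt>, f: F)
where
    F: FnOnce(mpsc::Sender<Interrupt>) -> Fut,
    Fut: std::future::Future<Output = ()> + Send + 'static,
{
    tokio::spawn(f(tx));
}
```

A Bluesky poller, for example, would loop on notifications and send `Interrupt::UserMessage` into the same channel the daemon writes to.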
### lib.rs

Re-exports modules. Defines:

- `MetricsSnapshot = Arc<RwLock<Option<AgentMetrics>>>`
- `AgentMetrics { turn_count, context_tokens, watermark }`
- `AgentEvent` enum: `Started`, `Token(String)`, `ThinkToken(String)`, `Done`, `Status(String)`, `Metrics(AgentMetrics)`, `ToolCall { name, args }`, `ToolResult { name, content }`
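The same definitions as a sketch; derives and the `args` type are assumptions (args may well be a `serde_json::Value` in the real code):

```rust
use std::sync::{Arc, RwLock};

/// Sketch of the lib.rs definitions listed above.
#[derive(Clone)]
struct AgentMetrics {
    turn_count: usize,
    context_tokens: usize,
    watermark: usize,
}

type MetricsSnapshot = Arc<RwLock<Option<AgentMetrics>>>;

#[derive(Clone)]
enum AgentEvent {
    Started,
    Token(String),
    ThinkToken(String),
    Done,
    Status(String),
    Metrics(AgentMetrics),
    ToolCall { name: String, args: String },
    ToolResult { name: String, content: String },
}
```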
## klbr-ipc

`ClientMsg` (TUI → daemon, tagged by `type` field):

- `Message { source, content }` — chat message
- `FetchHistory { before_id, limit }` — scroll-back paging
- `Compact` — manual compaction trigger
- `Reset` — wipe DB and context
- `DumpMemories { path: Option<String> }` — dump memories JSON to a file

`ServerMsg` (daemon → TUI):

- `Started`, `Token { content }`, `ThinkToken { content }`, `Done`
- `Status { content }` — status bar text
- `Metrics { turn_count, context_tokens, watermark }`
- `History { turns: Vec<HistoryEntry> }` — sent on connect and on `FetchHistory`
- `ToolCall { name, args }`, `ToolResult { name, content }`

`HistoryEntry { id, timestamp, role, content, reasoning: Option<String> }`

`ws_url()` → `ws://127.0.0.1:8765`

Protocol: one JSON `ClientMsg`/`ServerMsg` per WebSocket text frame.
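A sketch of `ClientMsg` as a serde internally-tagged enum, matching "tagged by `type` field"; the `before_id`/`limit` types are assumptions:

```rust
use serde::{Deserialize, Serialize};

/// Sketch of the ClientMsg wire type described above.
#[derive(Serialize, Deserialize)]
#[serde(tag = "type")]
enum ClientMsg {
    Message { source: String, content: String },
    FetchHistory { before_id: i64, limit: usize },
    Compact,
    Reset,
    DumpMemories { path: Option<String> },
}

// One frame on the wire then looks like:
//   {"type":"Message","source":"tui","content":"hello"}
//   {"type":"Compact"}
```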
## klbr-daemon

main.rs — wires everything together:

- `Config::load()`
- open `MemoryStore`, create `LlmClient`
- spawn `agent::run()` and `daemon::serve()` concurrently
- `tokio::select!` on both, propagate errors

daemon.rs — `serve()` accepts connections in a loop; each gets its own `handle()` task.
`handle()` per-connection:

- push `History { turns }` immediately (last `history_window` turns from DB)
- push current `Metrics` from snapshot if available
- `tokio::select!` between:
  - incoming WebSocket `ClientMsg` frames → translate to `Interrupt` or handle directly (`FetchHistory`, `Reset`, `Compact`, `DumpMemories`)
  - `AgentEvent` from broadcast → translate to `ServerMsg`, send to client
`send_msg()` — serialize a `ServerMsg` to JSON, send as a WS text frame.
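A skeleton of the per-connection select, with `String` standing in for the real frame, `AgentEvent`, and `Interrupt` types:

```rust
use tokio::sync::{broadcast, mpsc};

/// Skeleton of handle()'s select loop as described above.
async fn handle_loop(
    mut ws_incoming: mpsc::Receiver<String>, // decoded ClientMsg frames
    mut events: broadcast::Receiver<String>, // AgentEvent subscription
    interrupts: mpsc::Sender<String>,        // into the agent loop
) {
    loop {
        tokio::select! {
            // client frame → Interrupt (or handled directly: FetchHistory,
            // Reset, Compact, DumpMemories)
            Some(frame) = ws_incoming.recv() => {
                let _ = interrupts.send(frame).await;
            }
            // agent event → ServerMsg pushed to this client
            Ok(ev) = events.recv() => {
                send_msg(ev).await;
            }
            else => break,
        }
    }
}

async fn send_msg(_msg: String) { /* serialize ServerMsg, send WS text frame */ }
```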
## klbr-tui
Ratatui TUI using crossterm + tui-scrollview.
App state:

- `history: Vec<ChatMsg>` — display model
- `scroll: ScrollViewState`, `at_bottom: bool` — scroll tracking
- `input: String`, `cursor: usize`, `cmd_mode: bool` — input box
- `oldest_turn_id`, `history_exhausted`, `loading_history` — scroll-back paging
- `turn_count`, `context_tokens`, `watermark`, `last_tps` — metrics display

`ChatMsg` with `Role` enum:

- `User` — cyan "you " prefix
- `Assistant { reason: Option<Reason>, step: AssistantStep }` — green "klbr " prefix; `Reason` is a collapsible thinking block; `AssistantStep` tracks PromptProcessing → Reasoning → Response → Done
- `System` — dark gray, dimmed
- `Tool { name, args, result: Option<String> }` — yellow `$ name(key=val...)` header + up to 10 lines of result (or "running..." while pending)

Commands (typed with a `/` prefix):

- `/clear` (`/c`) — clear display
- `/compact` (`/cp`) — send `ClientMsg::Compact`
- `/reset` — clear display + send `ClientMsg::Reset`
- `/dump [path]` — send `ClientMsg::DumpMemories`
- `/think` (`/t`) — toggle the reasoning block on the last assistant message
- `/help` (`/h`) — show help inline
Event loop: `tokio::select!` between crossterm events and socket lines.

Scroll-back: PageUp sends `FetchHistory { before_id: oldest_turn_id, limit: 50 }`. `prepend_turns()` inserts the older turns at the front of the history vec.

Tool result matching: on `ServerMsg::ToolResult`, scan history in reverse for the last `Role::Tool { name: matching, result: None }` and fill in the result.
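A sketch of that reverse scan, with a trimmed-down `Role` (the real `Tool` variant also carries the call's args):

```rust
/// Trimmed-down stand-in for the TUI's per-message Role.
enum Role {
    User,
    Assistant,
    System,
    Tool { name: String, result: Option<String> },
}

/// On ServerMsg::ToolResult: walk the display history backwards and fill
/// the most recent pending tool entry with a matching name.
fn fill_tool_result(history: &mut [Role], name: &str, content: String) {
    for msg in history.iter_mut().rev() {
        if let Role::Tool { name: n, result } = msg {
            if n == name && result.is_none() {
                *result = Some(content);
                return;
            }
        }
    }
}
```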
Status bar (bottom line): `{tps} ctx {pct}% ({remaining} tok left) (turns: N) {status}`
## data flow summary

```
TUI ──ClientMsg──► daemon ──Interrupt──► agent
                                           │
                                       tool loop
                                           │
TUI ◄──ServerMsg── daemon ◄─AgentEvent── agent
```
## things not yet implemented

- multiple clients (broadcast works, but history paging is per-connection)
- external interrupt sources (Bluesky, etc.) — `spawn_source` is ready but unused
- auth/encryption for networked transports (the WebSocket is local-only by default)