# klbr — codebase reference for agents
Personal AI agent harness in Rust. Local LLM chat daemon with long-term memory, tool calling, and a ratatui TUI. Self-hosted, no corporate product feel.
## crate layout

```
klbr/
├── klbr-core/    — agent loop, LLM client, memory, context, tools
├── klbr-daemon/  — WebSocket server, bridges agent to clients
├── klbr-ipc/     — shared protocol types (ClientMsg, ServerMsg)
└── klbr-tui/     — ratatui TUI chat client
```

Binaries: `klbr-daemon` (start this first), `klbr-tui` (connects to the daemon).
## LLM backend

- llama-server compatible API at `http://localhost:1234`
- Chat model: `google/gemma-4-26b-a4b`
- Embedding model: `nomic-embed-text-v1.5` (768 dims)
- Both served from the same endpoint (configured separately in `Config` as `llm_url`/`embed_url`)
- Streaming via SSE (`data: {...}\n` lines)
- Tool calls use OpenAI function-calling format with streaming delta accumulation (sketched below)
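Because function-call arguments stream in as JSON fragments keyed by index, the client has to stitch them back together before dispatch. A minimal sketch of that accumulation, with illustrative `PartialCall`/`ToolCallDelta` shapes (the real types live in klbr-core's llm.rs):

```rust
use std::collections::HashMap;

/// Illustrative accumulator state for one in-flight tool call.
#[derive(Default)]
struct PartialCall {
    id: String,
    name: String,
    arguments: String, // argument JSON accumulates as text fragments
}

/// One streaming delta as the OpenAI format delivers it.
struct ToolCallDelta {
    index: usize,
    id: Option<String>,
    name: Option<String>,
    arguments: Option<String>,
}

/// Fold a delta into the per-index map; the map is flushed into complete
/// tool calls when the stream sends [DONE].
fn accumulate(partials: &mut HashMap<usize, PartialCall>, delta: ToolCallDelta) {
    let call = partials.entry(delta.index).or_default();
    if let Some(id) = delta.id {
        call.id = id; // id and name usually arrive only in the first delta
    }
    if let Some(name) = delta.name {
        call.name = name;
    }
    if let Some(args) = delta.arguments {
        call.arguments.push_str(&args); // later deltas extend the JSON text
    }
}
```

Keying by `index` rather than id matters: only the first delta for a call typically carries the id and name, while later deltas carry argument fragments alone.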
## klbr-core

### config.rs

`Config` struct (supports JSON config loading via `Config::load()`). Fields:

- `llm_url`, `embed_url`, `llm_model`, `embed_model`
- `watermark_tokens: 32_000` — triggers compaction when context exceeds this
- `compaction_keep: 10` — turns to keep after draining
- `memory_top_k: 3`, `memory_sim_threshold: 0.3` — recall injection params
- `history_window` — how many persisted turns to send on connect
- `compaction_llm_url`, `compaction_model` — optional separate LLM for compaction
- `db_path: "agent.db"`, `embed_dim: 768`
- `anchor: String` — system prompt (includes personality + memory tool instructions)

The anchor tells the agent about its memory tools and tagging conventions. Edit it in config.rs when adding/changing tools.
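For orientation, a sketch of the struct these fields imply; exact types, optionality, and derives are assumptions, and the values in comments are the defaults quoted above:

```rust
use serde::Deserialize;

/// Sketch of Config: field names follow the doc, types are assumed.
#[derive(Deserialize)]
struct Config {
    llm_url: String,
    embed_url: String,
    llm_model: String,
    embed_model: String,
    watermark_tokens: usize,            // 32_000: compaction trigger
    compaction_keep: usize,             // 10: turns kept after draining
    memory_top_k: usize,                // 3: recall injection param
    memory_sim_threshold: f32,          // 0.3: recall injection param
    history_window: usize,              // persisted turns sent on connect
    compaction_llm_url: Option<String>, // optional separate compaction LLM
    compaction_model: Option<String>,
    db_path: String,                    // "agent.db"
    embed_dim: usize,                   // 768
    anchor: String,                     // system prompt text
}
```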
### llm.rs

`LlmClient` (Clone):

- `stream(messages, tools, tok_tx)` — streaming completion, sends `LlmEvent` over an mpsc channel. Accumulates tool call deltas by index in `HashMap<usize, PartialCall>`, flushes `LlmEvent::ToolCalls` on `[DONE]`.
- `complete(messages)` — non-streaming, used for compaction summaries and reflection. Returns `(String, Usage)`.
- `embed(text)` — returns a `Vec<f32>` embedding.

`Message` struct — OpenAI format:

- `role: String`, `content: Option<String>`
- `tool_calls: Option<Vec<ToolCall>>` — for assistant tool call messages
- `tool_call_id: Option<String>` — for tool result messages
- All optional fields skip serialization when `None` (`#[serde(skip_serializing_if)]`)
- Constructors: `Message::system()`, `::user()`, `::assistant()`, `::with_tool_calls()`, `::tool_result()`

`LlmEvent` variants: `Token(String)`, `ThinkToken(String)`, `Usage(Usage)`, `ToolCalls(Vec<ToolCall>)`
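A sketch of the `Message` shape with the serialization behavior described above; the `ToolCall` layout and the constructor bodies are assumptions:

```rust
use serde::Serialize;

/// Sketch of the OpenAI-format Message described above.
#[derive(Serialize)]
struct Message {
    role: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    content: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    tool_calls: Option<Vec<ToolCall>>,
    #[serde(skip_serializing_if = "Option::is_none")]
    tool_call_id: Option<String>,
}

#[derive(Serialize)]
struct ToolCall {
    id: String,
    name: String,      // the real OpenAI shape nests these under "function"
    arguments: String, // argument JSON as text
}

impl Message {
    fn user(content: impl Into<String>) -> Self {
        Self {
            role: "user".into(),
            content: Some(content.into()),
            tool_calls: None,
            tool_call_id: None,
        }
    }

    /// Tool result: role "tool" plus the id of the call it answers.
    fn tool_result(id: impl Into<String>, content: impl Into<String>) -> Self {
        Self {
            role: "tool".into(),
            content: Some(content.into()),
            tool_calls: None,
            tool_call_id: Some(id.into()),
        }
    }
}
```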
### memory.rs

SQLite + sqlite-vec. Single DB file (`agent.db`). Two tables:

`memories` — episodic memory store:

- `id, content TEXT, pinned INTEGER (0/1), tags TEXT (JSON array), ts INTEGER`
- paired with virtual table `vec_memories` (sqlite-vec, cosine distance metric, 768 dims)
- migration-safe: `migrate()` runs `ALTER TABLE ADD COLUMN` (fails silently if the column exists)

`turns` — full turn history:

- `id, role TEXT, content TEXT, thinking TEXT, ts INTEGER`
`MemoryStore` (Clone, wraps `Arc<Mutex<Connection>>`):

- `store(content, emb, tags)` → `Result<i64>` — insert memory, return id
- `set_pinned(id, bool)` — pin/unpin
- `set_tags(id, tags)` — replace tags
- `pinned_memories()` → `Vec<String>` — for anchor injection at startup
- `recent_unpinned(n)` → `Vec<(i64, String, Vec<String>)>` — for reflection prompt
- `recall(query_emb, tags, tag_and, limit)` → `Vec<RecallEntry>` — main search method:
  - no tags: global ANN via sqlite-vec
  - with tags + query: fetch all tag-matched memories WITH embeddings, exact cosine in Rust (never misses due to ANN cutoff; sketched after the helper list below)
  - with tags only: delegates to `context_for`
- `context_for(tags, tag_and, limit)` → `Vec<RecallEntry>` — pure SQL tag lookup, newest first
- `log_turn(role, content, thinking)` — append to turns table
- `recent_turns(n)` → chronological slice (oldest first) for context replay
- `turns_before(before_id, limit)` → for TUI scroll-back paging
- `get_all()` → `Vec<Memory>` — all memories with embeddings (for dump)
- `reset()` — drop and recreate all tables

`RecallEntry`: `id`, `content`, `tags: Vec<String>`, `distance: Option<f32>` (`None` = tag-only hit)

internal helpers (private):

- `top_k(emb, k)` — ANN via sqlite-vec
- `by_tags(tags, tag_and)` — SQL LIKE on JSON array with escape handling
- `tag_matched_with_embeddings(tags, tag_and)` — for the exact cosine path
- `cosine_distance(a, b)` — returns a value in [0, 2], matching the sqlite-vec convention
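A self-contained sketch of the exact-cosine path: `cosine_distance` follows the sqlite-vec convention (range [0, 2], 0 = identical), and the ranking assumes the tag matches arrive as `(id, content, tags, embedding)` tuples:

```rust
/// Cosine distance in [0, 2], matching the sqlite-vec convention
/// (0 = same direction, 1 = orthogonal, 2 = opposite).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        return 1.0; // degenerate vector: treat as orthogonal
    }
    1.0 - dot / (na * nb)
}

/// The "tags + query" path: score every tag match exactly, so a relevant
/// memory can never be dropped by an ANN cutoff.
fn rank_by_cosine(
    matches: Vec<(i64, String, Vec<String>, Vec<f32>)>, // (id, content, tags, emb)
    query_emb: &[f32],
    limit: usize,
) -> Vec<(i64, String, Vec<String>, f32)> {
    let mut scored: Vec<_> = matches
        .into_iter()
        .map(|(id, content, tags, emb)| (id, content, tags, cosine_distance(&emb, query_emb)))
        .collect();
    scored.sort_by(|a, b| a.3.partial_cmp(&b.3).unwrap_or(std::cmp::Ordering::Equal));
    scored.truncate(limit);
    scored
}
```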
### context.rs
In-memory sliding window sent to the LLM on each turn.
`Context`:

- `anchor: Vec<Message>` — never evicted (system prompt + pinned memories)
- `turns: Vec<Message>` — rolling conversation
- `total_tokens: usize` — updated from `LlmEvent::Usage`
Key methods:

- `new(anchor, pinned_memories)` — builds the system message, appends the pinned memories section
- `update_anchor(anchor, pinned)` — rebuilds the system message with the pinned section
- `load_turns(pairs)` — replay `(role, content)` pairs from DB on startup; skips tool/other roles (ephemeral)
- `inject_recalled_memories(memories)` — ephemeral assistant message `[recalled memory]` with `[id:..] [tags:..]` blocks
- `push_input(content)` — user turn only (persisted history stays clean)
- `push_assistant_tool_calls(calls)` — assistant message with tool_calls, no content
- `push_tool_result(id, content)` — tool role message
- `drain_oldest(keep)` — removes all but the `keep` most recent turns; walks forward from the cut point to avoid splitting tool call sequences (never cuts mid-tool-call; sketched below)
- `as_messages()` — anchor + turns concatenated, ready to send to the LLM
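A sketch of the `drain_oldest` invariant, with a minimal stand-in turn type where tool results carry role `"tool"`:

```rust
/// Minimal stand-in for a context turn; only the role matters here.
struct Turn {
    role: &'static str, // "user" | "assistant" | "tool"
}

/// Sketch of drain_oldest: keep the last `keep` turns, but walk the cut
/// point forward past tool-result messages so a tool reply is never kept
/// without the assistant tool_calls message that produced it.
fn drain_oldest(turns: &mut Vec<Turn>, keep: usize) -> Vec<Turn> {
    if turns.len() <= keep {
        return Vec::new();
    }
    let mut cut = turns.len() - keep;
    while cut < turns.len() && turns[cut].role == "tool" {
        cut += 1; // advance until the cut no longer splits a tool sequence
    }
    turns.drain(..cut).collect()
}
```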
### tools.rs

`definitions()` — full tool list sent to the LLM on every turn:

- `shell(cmd)` — runs via `sh -c`, caps stdout 20k / stderr 5k chars (see the sketch after this list)
- `read_file(path, start_line?, end_line?)` — caps at 50k bytes
- `write_file(path, content)`
- `remember(content, important?, tags?)` — embeds and stores; pins if `important=true`
- `recall(query, tags?, tag_mode?, limit?)` — semantic search, optional tag filter
- `context_for(tags, tag_mode?, limit?)` — pure tag lookup, default limit 20
- `edit_memory(id?, tags?, pinned?, special?)` — unified edit tool (including the special anchor memory)
- `list_memories()` — shows pinned + 10 recent unpinned with ids and tags
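As an example of the output caps, a self-contained sketch of the shell tool's behavior; the 20k/5k limits come from the list above, the combined-output formatting is an assumption:

```rust
use tokio::process::Command;

/// Cap a string at `max` characters (the 20k/5k output caps above).
fn cap(s: &str, max: usize) -> String {
    s.chars().take(max).collect()
}

/// Sketch of the shell tool: run via `sh -c`, cap stdout at 20k chars
/// and stderr at 5k.
async fn shell_tool(cmd: &str) -> String {
    match Command::new("sh").arg("-c").arg(cmd).output().await {
        Ok(out) => {
            let stdout = cap(&String::from_utf8_lossy(&out.stdout), 20_000);
            let stderr = cap(&String::from_utf8_lossy(&out.stderr), 5_000);
            if stderr.is_empty() {
                stdout
            } else {
                format!("{stdout}\n[stderr]\n{stderr}")
            }
        }
        Err(e) => format!("failed to spawn sh: {e}"),
    }
}
```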
`memory_tools()` — filtered subset for the reflection mini-loop: remember, recall, context_for, edit_memory, list_memories (no shell/file tools).

`execute(call, memory, llm)` — async dispatch by tool name. Needs `&MemoryStore` and `&LlmClient` for the memory tools.
### agent.rs

Main async loop (`run()`). Receives `Interrupt`s from an mpsc channel, sends `AgentEvent`s over a broadcast channel.
startup:

- load pinned memories → `Context::new(anchor, pinned)`
- replay recent turns from DB → `ctx.load_turns()`

interrupt handling:

- `Reset` → clear context, emit `Status`
- `Compact` → call `compact()` immediately
- `UserMessage` → embed query, recall, `ctx.inject_recalled_memories()`, `ctx.push_input()`, `log_turn()`
tool loop (max 20 iterations):

- spawn `llm.stream()` in a background task
- collect `LlmEvent`s: accumulate tokens/thinking, capture `ToolCalls` (collection sketched below)
- if tool calls: emit `ToolCall` events, `tools::execute()` each, emit `ToolResult`, push to context, loop
- if plain text: push assistant message, `log_turn()`, emit `Done` + `Metrics`, check watermark
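A sketch of the event-collection step inside the loop, with a stand-in `LlmEvent` (the real enum, including `Usage(Usage)`, is in llm.rs):

```rust
use tokio::sync::mpsc;

/// Stand-in for llm.rs's LlmEvent (Usage simplified to a token count).
enum LlmEvent {
    Token(String),
    ThinkToken(String),
    Usage(u32),
    ToolCalls(Vec<String>),
}

/// Drain one streamed completion: accumulate text and thinking, capture
/// tool calls flushed on [DONE]. The returned shape is an assumption.
async fn collect_stream(mut rx: mpsc::Receiver<LlmEvent>) -> (String, String, Vec<String>) {
    let (mut text, mut thinking, mut calls) = (String::new(), String::new(), Vec::new());
    while let Some(ev) = rx.recv().await {
        match ev {
            LlmEvent::Token(t) => text.push_str(&t),
            LlmEvent::ThinkToken(t) => thinking.push_str(&t),
            LlmEvent::Usage(_total) => { /* feeds Context::total_tokens */ }
            LlmEvent::ToolCalls(c) => calls = c,
        }
    }
    // non-empty `calls` → execute each and loop; otherwise the text is final
    (text, thinking, calls)
}
```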
compact(output):

- emit `Status("reflecting...")`
- call `reflect()` — ephemeral mini tool loop (memory tools only, max 6 iterations, separate context)
- drain the oldest turns from the main context
- LLM-summarize the drained text → store with tag `["compaction_summary"]`
- reload pinned memories and call `ctx.update_anchor_memories()`
reflect():

- builds a reflection prompt with: outline of the last 10 turns (truncated), current pinned + recent unpinned memories
- runs a mini stream loop with `reflection_definitions()` tools
- the agent pins/unpins/remembers/tags as it sees fit
- ephemeral — results don't enter the main context
### interrupt.rs

`Interrupt` enum: `UserMessage(String)`, `Reset`, `Compact`

`spawn_source(tx, f)` — helper for future external interrupt sources (e.g. Bluesky notifications). Not currently used.
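A sketch of what a `spawn_source`-style helper can look like; the real signature in interrupt.rs may differ:

```rust
use tokio::sync::mpsc;

/// The Interrupt enum from above, restated for a self-contained example.
enum Interrupt {
    UserMessage(String),
    Reset,
    Compact,
}

/// Sketch of a spawn_source-style helper: hand the source the agent's
/// interrupt sender and run it in the background.
fn spawn_source<F, Fut>(tx: mpsc::Sender<Interrupt>, f: F)
where
    F: FnOnce(mpsc::Sender<Interrupt>) -> Fut,
    Fut: std::future::Future<Output = ()> + Send + 'static,
{
    tokio::spawn(f(tx));
}
```

A Bluesky poller, for example, would loop on notifications and send `Interrupt::UserMessage` into the same channel the daemon writes to.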
### lib.rs

Re-exports modules. Defines:

- `MetricsSnapshot = Arc<RwLock<Option<AgentMetrics>>>`
- `AgentMetrics { turn_count, context_tokens, watermark }`
- `AgentEvent` enum: `Started`, `Token(String)`, `ThinkToken(String)`, `Done`, `Status(String)`, `Metrics(AgentMetrics)`, `ToolCall { name, args }`, `ToolResult { name, content }`
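The same definitions as a sketch; derives and the `args` type are assumptions (args may well be a `serde_json::Value` in the real code):

```rust
use std::sync::{Arc, RwLock};

/// Sketch of the lib.rs definitions listed above.
#[derive(Clone)]
struct AgentMetrics {
    turn_count: usize,
    context_tokens: usize,
    watermark: usize,
}

type MetricsSnapshot = Arc<RwLock<Option<AgentMetrics>>>;

#[derive(Clone)]
enum AgentEvent {
    Started,
    Token(String),
    ThinkToken(String),
    Done,
    Status(String),
    Metrics(AgentMetrics),
    ToolCall { name: String, args: String },
    ToolResult { name: String, content: String },
}
```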
## klbr-ipc

`ClientMsg` (TUI → daemon, tagged by `type` field):

- `Message { source, content }` — chat message
- `FetchHistory { before_id, limit }` — scroll-back paging
- `Compact` — manual compaction trigger
- `Reset` — wipe DB and context
- `DumpMemories { path: Option<String> }` — dump memories JSON to a file

`ServerMsg` (daemon → TUI):

- `Started`, `Token { content }`, `ThinkToken { content }`, `Done`
- `Status { content }` — status bar text
- `Metrics { turn_count, context_tokens, watermark }`
- `History { turns: Vec<HistoryEntry> }` — sent on connect and on `FetchHistory`
- `ToolCall { name, args }`, `ToolResult { name, content }`

`HistoryEntry { id, timestamp, role, content, reasoning: Option<String> }`

`ws_url()` → `ws://127.0.0.1:8765`

Protocol: one JSON `ClientMsg`/`ServerMsg` per WebSocket text frame.
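A sketch of `ClientMsg` as a serde internally-tagged enum, matching "tagged by `type` field"; the `before_id`/`limit` types are assumptions:

```rust
use serde::{Deserialize, Serialize};

/// Sketch of the ClientMsg wire type described above.
#[derive(Serialize, Deserialize)]
#[serde(tag = "type")]
enum ClientMsg {
    Message { source: String, content: String },
    FetchHistory { before_id: i64, limit: usize },
    Compact,
    Reset,
    DumpMemories { path: Option<String> },
}

// One frame on the wire then looks like:
//   {"type":"Message","source":"tui","content":"hello"}
//   {"type":"Compact"}
```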
## klbr-daemon

main.rs — wires everything together:

- `Config::load()`
- open `MemoryStore`, create `LlmClient`
- spawn `agent::run()` and `daemon::serve()` concurrently
- `tokio::select!` on both, propagate errors

daemon.rs — `serve()` accepts connections in a loop; each gets its own `handle()` task.
`handle()` per-connection:

- push `History { turns }` immediately (last `history_window` turns from DB)
- push current `Metrics` from snapshot if available
- `tokio::select!` between:
  - incoming WebSocket `ClientMsg` frames → translate to `Interrupt` or handle directly (`FetchHistory`, `Reset`, `Compact`, `DumpMemories`)
  - `AgentEvent` from broadcast → translate to `ServerMsg`, send to client
`send_msg()` — serialize a `ServerMsg` to JSON, send as a WS text frame.
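A skeleton of the per-connection select, with `String` standing in for the real frame, `AgentEvent`, and `Interrupt` types:

```rust
use tokio::sync::{broadcast, mpsc};

/// Skeleton of handle()'s select loop as described above.
async fn handle_loop(
    mut ws_incoming: mpsc::Receiver<String>, // decoded ClientMsg frames
    mut events: broadcast::Receiver<String>, // AgentEvent subscription
    interrupts: mpsc::Sender<String>,        // into the agent loop
) {
    loop {
        tokio::select! {
            // client frame → Interrupt (or handled directly: FetchHistory,
            // Reset, Compact, DumpMemories)
            Some(frame) = ws_incoming.recv() => {
                let _ = interrupts.send(frame).await;
            }
            // agent event → ServerMsg pushed to this client
            Ok(ev) = events.recv() => {
                send_msg(ev).await;
            }
            else => break,
        }
    }
}

async fn send_msg(_msg: String) { /* serialize ServerMsg, send WS text frame */ }
```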
## klbr-tui
Ratatui TUI using crossterm + tui-scrollview.
App state:

- `history: Vec<ChatMsg>` — display model
- `scroll: ScrollViewState`, `at_bottom: bool` — scroll tracking
- `input: String`, `cursor: usize`, `cmd_mode: bool` — input box
- `oldest_turn_id`, `history_exhausted`, `loading_history` — scroll-back paging
- `turn_count`, `context_tokens`, `watermark`, `last_tps` — metrics display

`ChatMsg` with `Role` enum:

- `User` — cyan "you " prefix
- `Assistant { reason: Option<Reason>, step: AssistantStep }` — green "klbr " prefix; `Reason` is a collapsible thinking block; `AssistantStep` tracks PromptProcessing → Reasoning → Response → Done
- `System` — dark gray, dimmed
- `Tool { name, args, result: Option<String> }` — yellow `$ name(key=val...)` header + up to 10 lines of result (or "running..." while pending)

Commands (typed with a `/` prefix):

- `/clear` (`/c`) — clear display
- `/compact` (`/cp`) — send `ClientMsg::Compact`
- `/reset` — clear display + send `ClientMsg::Reset`
- `/dump [path]` — send `ClientMsg::DumpMemories`
- `/think` (`/t`) — toggle the reasoning block on the last assistant message
- `/help` (`/h`) — show help inline
Event loop: `tokio::select!` between crossterm events and socket lines.

Scroll-back: PageUp sends `FetchHistory { before_id: oldest_turn_id, limit: 50 }`. `prepend_turns()` inserts the older turns at the front of the history vec.

Tool result matching: on `ServerMsg::ToolResult`, scan history in reverse for the last `Role::Tool { name: matching, result: None }` and fill in the result.
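A sketch of that reverse scan, with a trimmed-down `Role` (the real `Tool` variant also carries the call's args):

```rust
/// Trimmed-down stand-in for the TUI's per-message Role.
enum Role {
    User,
    Assistant,
    System,
    Tool { name: String, result: Option<String> },
}

/// On ServerMsg::ToolResult: walk the display history backwards and fill
/// the most recent pending tool entry with a matching name.
fn fill_tool_result(history: &mut [Role], name: &str, content: String) {
    for msg in history.iter_mut().rev() {
        if let Role::Tool { name: n, result } = msg {
            if n == name && result.is_none() {
                *result = Some(content);
                return;
            }
        }
    }
}
```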
Status bar (bottom line): `{tps} ctx {pct}% ({remaining} tok left) (turns: N) {status}`
## data flow summary

```
TUI ──ClientMsg──► daemon ──Interrupt──► agent
                                           │
                                       tool loop
                                           │
TUI ◄──ServerMsg── daemon ◄─AgentEvent── agent
```
## things not yet implemented

- multiple clients (broadcast works, but history paging is per-connection)
- external interrupt sources (Bluesky, etc.) — `spawn_source` is ready but unused
- auth/encryption for networked transports (the WebSocket is local-only by default)