klbr — codebase reference for agents#

Personal AI agent harness in Rust. Local LLM chat daemon with long-term memory, tool calling, and a ratatui TUI. Self-hosted, no corporate product feel.


crate layout#

klbr/
  klbr-core/    — agent loop, LLM client, memory, context, tools
  klbr-daemon/  — WebSocket server, bridges agent to clients
  klbr-ipc/     — shared protocol types (ClientMsg, ServerMsg)
  klbr-tui/     — ratatui TUI chat client

binaries: klbr-daemon (start this first), klbr-tui (connects to daemon)


LLM backend#

  • llama-server compatible API at http://localhost:1234
  • Chat model: google/gemma-4-26b-a4b
  • Embedding model: nomic-embed-text-v1.5 (768 dims)
  • Both served from the same endpoint (configured separately in Config as llm_url / embed_url)
  • Streaming via SSE (data: {...}\n)
  • Tool calls use OpenAI function-calling format with streaming delta accumulation
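
A minimal sketch of that SSE framing (hypothetical helper, not the real client code in llm.rs):

  // Each SSE frame is one "data: {...}" line; "data: [DONE]" ends the stream.
  fn parse_sse_line(line: &str) -> Option<serde_json::Value> {
      let payload = line.strip_prefix("data: ")?;
      if payload == "[DONE]" {
          return None; // end of stream; flush accumulated tool calls here
      }
      serde_json::from_str(payload).ok()
  }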

klbr-core#

config.rs#

Config struct, loaded from JSON via Config::load(). Fields:

  • llm_url, embed_url, llm_model, embed_model
  • watermark_tokens: 32_000 — triggers compaction when context exceeds this
  • compaction_keep: 10 — turns to keep after draining
  • memory_top_k: 3, memory_sim_threshold: 0.3 — recall injection params
  • history_window — how many persisted turns to send on connect
  • compaction_llm_url, compaction_model — optional separate LLM for compaction
  • db_path: "agent.db", embed_dim: 768
  • anchor: String — system prompt (includes personality + memory tool instructions)

The anchor tells the agent about its memory tools and tagging conventions. Edit it in config.rs when adding/changing tools.
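
A sketch of the shape from the field list above (the derive set and exact field types are assumptions):

  use serde::Deserialize;

  #[derive(Clone, Deserialize)]
  pub struct Config {
      pub llm_url: String,
      pub embed_url: String,
      pub llm_model: String,
      pub embed_model: String,
      pub watermark_tokens: usize,        // 32_000 — compaction trigger
      pub compaction_keep: usize,         // 10 — turns kept after draining
      pub memory_top_k: usize,            // 3
      pub memory_sim_threshold: f32,      // 0.3
      pub history_window: usize,          // persisted turns sent on connect
      pub compaction_llm_url: Option<String>,
      pub compaction_model: Option<String>,
      pub db_path: String,                // "agent.db"
      pub embed_dim: usize,               // 768
      pub anchor: String,                 // system prompt
  }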

llm.rs#

LlmClient (Clone):

  • stream(messages, tools, tok_tx) — streaming completion; sends LlmEvents over an mpsc channel. Accumulates tool call deltas by index in a HashMap<usize, PartialCall>, flushes LlmEvent::ToolCalls on [DONE].
  • complete(messages) — non-streaming, used for compaction summaries and reflection. returns (String, Usage).
  • embed(text) — returns Vec<f32> embedding.

Message struct — OpenAI format:

  • role: String, content: Option<String>
  • tool_calls: Option<Vec<ToolCall>> — for assistant tool call messages
  • tool_call_id: Option<String> — for tool result messages
  • All optional fields skip serialization when None (#[serde(skip_serializing_if)])
  • Constructors: Message::system(), ::user(), ::assistant(), ::with_tool_calls(), ::tool_result()

LlmEvent variants: Token(String), ThinkToken(String), Usage(Usage), ToolCalls(Vec<ToolCall>)
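
A sketch of the tool-call delta accumulation stream() performs (PartialCall's exact fields are an assumption):

  use std::collections::HashMap;

  // OpenAI streams tool calls as fragments keyed by `index`; each fragment
  // is appended into a per-index buffer, and the completed calls are
  // flushed as one LlmEvent::ToolCalls on [DONE].
  #[derive(Default)]
  struct PartialCall {
      id: String,
      name: String,
      arguments: String, // raw JSON text, accumulated fragment by fragment
  }

  fn accumulate(
      partials: &mut HashMap<usize, PartialCall>,
      index: usize,
      id: Option<&str>,
      name: Option<&str>,
      args: Option<&str>,
  ) {
      let p = partials.entry(index).or_default();
      if let Some(id) = id { p.id = id.to_string(); }
      if let Some(n) = name { p.name.push_str(n); }
      if let Some(a) = args { p.arguments.push_str(a); }
  }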

memory.rs#

SQLite + sqlite-vec. Single DB file (agent.db). Two tables:

memories — episodic memory store:

  • id, content TEXT, pinned INTEGER (0/1), tags TEXT (JSON array), ts INTEGER
  • paired with virtual table vec_memories (sqlite-vec, cosine distance metric, 768 dims)
  • migration-safe: migrate() runs ALTER TABLE ADD COLUMN (fails silently if column exists)

turns — full turn history:

  • id, role TEXT, content TEXT, thinking TEXT, ts INTEGER

MemoryStore (Clone, wraps Arc<Mutex<Connection>>):

  • store(content, emb, tags) → Result<i64> — insert memory, return id
  • set_pinned(id, bool) — pin/unpin
  • set_tags(id, tags) — replace tags
  • pinned_memories() → Vec<String> — for anchor injection at startup
  • recent_unpinned(n) → Vec<(i64, String, Vec<String>)> — for reflection prompt
  • recall(query_emb, tags, tag_and, limit) → Vec<RecallEntry> — main search method:
    • no tags: global ANN via sqlite-vec
    • with tags + query: fetch all tag-matched memories WITH embeddings, exact cosine in Rust (never misses due to ANN cutoff)
    • with tags only: delegates to context_for
  • context_for(tags, tag_and, limit) → Vec<RecallEntry> — pure SQL tag lookup, newest first
  • log_turn(role, content, thinking) — append to turns table
  • recent_turns(n) → chronological slice (oldest first) for context replay
  • turns_before(before_id, limit) → for TUI scroll-back paging
  • get_all() → Vec<Memory> — all memories with embeddings (for dump)
  • reset() — drop and recreate all tables

RecallEntry: id, content, tags: Vec<String>, distance: Option<f32> (None = tag-only hit)

internal helpers (private):

  • top_k(emb, k) — ANN via sqlite-vec
  • by_tags(tags, tag_and) — SQL LIKE on JSON array with escape handling
  • tag_matched_with_embeddings(tags, tag_and) — for exact cosine path
  • cosine_distance(a, b) — returns value in [0, 2], matching sqlite-vec convention
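
A sketch of the distance function (the real one lives in memory.rs; this is the standard formula):

  // 1 − cos(a, b): identical vectors → 0, orthogonal → 1, opposite → 2.
  // Same convention sqlite-vec uses, so both recall paths score alike.
  fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
      let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
      let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
      let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
      1.0 - dot / (na * nb)
  }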

context.rs#

In-memory sliding window sent to the LLM on each turn.

Context:

  • anchor: Vec<Message> — never evicted (system prompt + pinned memories)
  • turns: Vec<Message> — rolling conversation
  • total_tokens: usize — updated from LlmEvent::Usage

Key methods:

  • new(anchor, pinned_memories) — builds system message, appends pinned memories section
  • update_anchor(anchor, pinned) — rebuilds system message with pinned section
  • load_turns(pairs) — replay (role, content) pairs from DB on startup; skips tool/other roles (ephemeral)
  • inject_recalled_memories(memories) — ephemeral assistant message [recalled memory] with [id:..] [tags:..] blocks
  • push_input(content) — user turn only (persisted history stays clean)
  • push_assistant_tool_calls(calls) — assistant message with tool_calls, no content
  • push_tool_result(id, content) — tool role message
  • drain_oldest(keep) — removes all but the keep most recent turns; walks forward from the cut point so a tool call sequence is never split (see the sketch after this list)
  • as_messages() — anchor + turns concatenated, ready to send to LLM
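
A sketch of the drain_oldest cut rule (Message representation simplified; whether it returns the drained turns is an assumption):

  // Start the cut at len − keep, then walk forward past tool-role
  // messages so a tool result is never separated from the assistant
  // message that requested it.
  fn drain_oldest(turns: &mut Vec<Message>, keep: usize) -> Vec<Message> {
      let mut cut = turns.len().saturating_sub(keep);
      while cut < turns.len() && turns[cut].role == "tool" {
          cut += 1;
      }
      turns.drain(..cut).collect()
  }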

tools.rs#

definitions() — full tool list sent to LLM on every turn:

  • shell(cmd) — runs via sh -c, caps stdout 20k / stderr 5k chars
  • read_file(path, start_line?, end_line?) — caps at 50k bytes
  • write_file(path, content)
  • remember(content, important?, tags?) — embeds and stores; pins if important=true
  • recall(query, tags?, tag_mode?, limit?) — semantic search, optional tag filter
  • context_for(tags, tag_mode?, limit?) — pure tag lookup, default limit 20
  • edit_memory(id?, tags?, pinned?, special?) — unified edit tool (including special anchor memory)
  • list_memories() — shows pinned + 10 recent unpinned with ids and tags
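
A sketch of one definitions() entry in the OpenAI function format (description wording and schema details are assumptions):

  use serde_json::json;

  fn shell_definition() -> serde_json::Value {
      json!({
          "type": "function",
          "function": {
              "name": "shell",
              "description": "Run a command via sh -c.",
              "parameters": {
                  "type": "object",
                  "properties": {
                      "cmd": { "type": "string" }
                  },
                  "required": ["cmd"]
              }
          }
      })
  }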

memory_tools() — filtered subset for the reflection mini-loop: remember, recall, context_for, edit_memory, list_memories (no shell/file tools)

execute(call, memory, llm) — async dispatch by tool name; needs &MemoryStore and &LlmClient for memory tools.

agent.rs#

Main async loop (run()). Receives Interrupt from mpsc, sends AgentEvent over broadcast.

startup:

  1. load pinned memories → Context::new(anchor, pinned)
  2. replay recent turns from DB → ctx.load_turns()

interrupt handling:

  • Reset → clear context, emit Status
  • Compact → call compact() immediately
  • UserMessage → embed query, recall, ctx.inject_recalled_memories(), ctx.push_input(), log_turn()

tool loop (max 20 iterations):

  1. spawn llm.stream() in background task
  2. collect LlmEvents: accumulate tokens/thinking, capture ToolCalls
  3. if tool calls: emit ToolCall events, tools::execute() each, emit ToolResult, push to context, loop
  4. if plain text: push assistant message, log_turn(), emit Done + Metrics, check watermark
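
A skeleton of that loop under the documented names (push_assistant(), Usage.total_tokens, and execute()'s String return are assumptions):

  for _ in 0..20 {
      // 1. stream in the background; events arrive over the mpsc channel
      let (tok_tx, mut tok_rx) = tokio::sync::mpsc::channel(64);
      let (client, msgs) = (llm.clone(), ctx.as_messages());
      tokio::spawn(async move { client.stream(msgs, tools::definitions(), tok_tx).await });

      // 2. collect events
      let (mut text, mut calls) = (String::new(), Vec::new());
      while let Some(ev) = tok_rx.recv().await {
          match ev {
              LlmEvent::Token(t) => { let _ = events.send(AgentEvent::Token(t.clone())); text.push_str(&t); }
              LlmEvent::ThinkToken(t) => { let _ = events.send(AgentEvent::ThinkToken(t)); }
              LlmEvent::ToolCalls(c) => calls = c,
              LlmEvent::Usage(u) => ctx.total_tokens = u.total_tokens,
          }
      }

      // 4. plain text: record the turn and stop looping
      if calls.is_empty() {
          ctx.push_assistant(&text); // hypothetical method name
          break;
      }

      // 3. tool calls: execute each, feed results back, go around again
      ctx.push_assistant_tool_calls(calls.clone());
      for call in calls {
          let out = tools::execute(&call, &memory, &llm).await;
          ctx.push_tool_result(&call.id, &out);
      }
  }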

compact(output):

  1. emit Status("reflecting...")
  2. call reflect() — ephemeral mini tool loop (memory tools only, max 6 iterations, separate context)
  3. drain oldest turns from main context
  4. LLM-summarize drained text → store with tag ["compaction_summary"]
  5. reload pinned memories and call ctx.update_anchor()
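
A sketch of steps 3–4 with the documented APIs (prompt wording and the turns_to_text() helper are hypothetical):

  let drained = ctx.drain_oldest(cfg.compaction_keep);
  let prompt = vec![
      Message::system("Summarize this conversation segment concisely."),
      Message::user(&turns_to_text(&drained)), // hypothetical helper
  ];
  let (summary, _usage) = llm.complete(prompt).await?;
  let emb = llm.embed(&summary).await?;
  memory.store(&summary, &emb, &["compaction_summary".to_string()])?;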

reflect():

  • builds reflection prompt with: last 10 turn outline (truncated), current pinned + recent unpinned memories
  • runs a mini stream loop with the memory_tools() subset
  • agent pins/unpins/remembers/tags as it sees fit
  • ephemeral — results don't enter main context

interrupt.rs#

Interrupt enum: UserMessage(String), Reset, Compact

spawn_source(tx, f) — helper for future external interrupt sources (e.g. Bluesky notifications). Not currently used.
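
A sketch of its shape (the doc gives only the name and intent, so the signature is an assumption):

  use tokio::sync::mpsc::Sender;

  // Run a source on its own task; the closure gets the Interrupt sender
  // and can emit UserMessage/Compact/Reset whenever its feed fires.
  pub fn spawn_source<F, Fut>(tx: Sender<Interrupt>, f: F)
  where
      F: FnOnce(Sender<Interrupt>) -> Fut,
      Fut: std::future::Future<Output = ()> + Send + 'static,
  {
      tokio::spawn(f(tx));
  }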

lib.rs#

Re-exports modules. Defines:

  • MetricsSnapshot = Arc<RwLock<Option<AgentMetrics>>>
  • AgentMetrics { turn_count, context_tokens, watermark }
  • AgentEvent enum: Started, Token(String), ThinkToken(String), Done, Status(String), Metrics(AgentMetrics), ToolCall { name, args }, ToolResult { name, content }

klbr-ipc#

ClientMsg (TUI → daemon, tagged by type field):

  • Message { source, content } — chat message
  • FetchHistory { before_id, limit } — scroll-back paging
  • Compact — manual compaction trigger
  • Reset — wipe DB and context
  • DumpMemories { path: Option<String> } — dump memories JSON to file

ServerMsg (daemon → TUI):

  • Started, Token { content }, ThinkToken { content }, Done
  • Status { content } — status bar text
  • Metrics { turn_count, context_tokens, watermark }
  • History { turns: Vec<HistoryEntry> } — sent on connect and on FetchHistory
  • ToolCall { name, args }, ToolResult { name, content }

HistoryEntry { id, timestamp, role, content, reasoning: Option<String> }

ws_url() → ws://127.0.0.1:8765

Protocol: one JSON ClientMsg/ServerMsg per WebSocket text frame.
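
A sketch of the tagged enum on the wire (field types beyond the doc are assumptions):

  use serde::{Deserialize, Serialize};

  #[derive(Serialize, Deserialize)]
  #[serde(tag = "type")]
  pub enum ClientMsg {
      Message { source: String, content: String },
      FetchHistory { before_id: i64, limit: usize },
      Compact,
      Reset,
      DumpMemories { path: Option<String> },
  }

A chat message then serializes as {"type":"Message","source":"tui","content":"hi"}.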


klbr-daemon#

main.rs — wires everything together:

  1. Config::load()
  2. open MemoryStore, create LlmClient
  3. spawn agent::run() and daemon::serve() concurrently
  4. tokio::select! on both, propagate errors

daemon.rs — serve() accepts connections in a loop; each connection gets its own handle() task.

handle() per-connection:

  1. push History { turns } immediately (last history_window turns from DB)
  2. push current Metrics from snapshot if available
  3. tokio::select! between:
    • incoming WebSocket ClientMsg frames → translate to Interrupt or handle directly (FetchHistory, Reset, Compact, DumpMemories)
    • AgentEvent from broadcast → translate to ServerMsg, send to client

send_msg() — serialize ServerMsg to JSON, send as WS text frame.
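
A skeleton of handle()'s select (stream types depend on the WebSocket crate; translate() and local_reply() are hypothetical, and the exact routing per message type is an assumption):

  loop {
      tokio::select! {
          Some(frame) = ws_rx.next() => {
              match serde_json::from_str::<ClientMsg>(frame?.to_text()?)? {
                  ClientMsg::Message { content, .. } =>
                      interrupts.send(Interrupt::UserMessage(content)).await?,
                  ClientMsg::Compact => interrupts.send(Interrupt::Compact).await?,
                  ClientMsg::Reset => interrupts.send(Interrupt::Reset).await?,
                  other => local_reply(other, &mut ws_tx).await?, // FetchHistory, DumpMemories
              }
          }
          Ok(ev) = events.recv() => {
              send_msg(&mut ws_tx, translate(ev)).await?; // AgentEvent → ServerMsg
          }
      }
  }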


klbr-tui#

Ratatui TUI using crossterm + tui-scrollview.

App state:

  • history: Vec<ChatMsg> — display model
  • scroll: ScrollViewState, at_bottom: bool — scroll tracking
  • input: String, cursor: usize, cmd_mode: bool — input box
  • oldest_turn_id, history_exhausted, loading_history — scroll-back paging
  • turn_count, context_tokens, watermark, last_tps — metrics display

ChatMsg with Role enum:

  • User — cyan "you " prefix
  • Assistant { reason: Option<Reason>, step: AssistantStep } — green "klbr " prefix; Reason is collapsible thinking block; AssistantStep tracks PromptProcessing → Reasoning → Response → Done
  • System — dark gray, dimmed
  • Tool { name, args, result: Option<String> } — yellow $ name(key=val...) header + up to 10 lines of result (or "running..." while pending)

Commands (typed with / prefix):

  • /clear (/c) — clear display
  • /compact (/cp) — send ClientMsg::Compact
  • /reset — clear display + send ClientMsg::Reset
  • /dump [path] — send ClientMsg::DumpMemories
  • /think (/t) — toggle reasoning block on last assistant message
  • /help (/h) — show help inline

Event loop: tokio::select! between crossterm events and socket lines.

Scroll-back: PageUp sends FetchHistory { before_id: oldest_turn_id, limit: 50 }. prepend_turns() inserts older turns at front of history vec.

Tool result matching: on ServerMsg::ToolResult, scan history in reverse for last Role::Tool { name: matching, result: None } and fill in result.
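
A sketch of that scan (assuming ChatMsg keeps its Role in a role field):

  // Fill the most recent still-pending tool entry with a matching name.
  if let ServerMsg::ToolResult { name, content } = msg {
      if let Some(slot) = app.history.iter_mut().rev().find(|m| {
          matches!(&m.role, Role::Tool { name: n, result: None, .. } if *n == name)
      }) {
          if let Role::Tool { result, .. } = &mut slot.role {
              *result = Some(content);
          }
      }
  }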

Status bar (bottom line): {tps} ctx {pct}% ({remaining} tok left) (turns: N) {status}


data flow summary#

TUI ──ClientMsg──► daemon ──Interrupt──► agent
                                          │
                                     tool loop
                                          │
TUI ◄──ServerMsg── daemon ◄─AgentEvent── agent

things not yet implemented#

  • multiple clients (broadcast works but history paging is per-connection)
  • external interrupt sources (Bluesky, etc.) — spawn_source is ready but unused
  • auth/encryption for networked transports (WebSocket is local-only by default)