···11+# klbr — codebase reference for agents
22+33+Personal AI agent harness in Rust. Local LLM chat daemon with long-term memory, tool calling, and a ratatui TUI. Self-hosted, no corporate product feel.
44+55+---
66+77+## crate layout
88+99+```
1010+klbr/
1111+ klbr-core/ — agent loop, LLM client, memory, context, tools
1212+ klbr-daemon/ — unix socket server, bridges agent to clients
1313+ klbr-ipc/ — shared protocol types (ClientMsg, ServerMsg)
1414+ klbr-tui/ — ratatui TUI chat client
1515+```
1616+1717+**binaries**: `klbr-daemon` (start this first), `klbr-tui` (connects to daemon)
1818+1919+---
2020+2121+## LLM backend
2222+2323+- **llama-server compatible API** at `http://localhost:1234`
2424+- Chat model: `google/gemma-4-26b-a4b`
2525+- Embedding model: `nomic-embed-text-v1.5` (768 dims)
2626+- Both served from the same endpoint (configured separately in `Config` as `llm_url` / `embed_url`)
2727+- Streaming via SSE (`data: {...}` lines, each event terminated by a blank line; the stream ends with `data: [DONE]`)
2828+- Tool calls use OpenAI function-calling format with streaming delta accumulation
2929+3030+---
3131+3232+## klbr-core
3333+3434+### `config.rs`
3535+3636+`Config` struct (no file loading yet — `Config::default()` hardcodes everything). Fields:
3737+- `llm_url`, `embed_url`, `llm_model`, `embed_model`
3838+- `watermark_tokens: 32_000` — triggers compaction when context exceeds this
3939+- `compaction_keep: 10` — turns to keep after draining
4040+- `memory_top_k: 3`, `memory_sim_threshold: 0.3` — recall injection params
4141+- `db_path: "agent.db"`, `embed_dim: 768`
4242+- `anchor: String` — system prompt (includes personality + memory tool instructions)
4343+4444+The anchor tells the agent about its memory tools and tagging conventions. Edit it in `config.rs` when adding/changing tools.
4545+4646+### `llm.rs`
4747+4848+**`LlmClient`** (Clone):
4949+- `stream(messages, tools, tok_tx)` — streaming completion, sends `LlmEvent` over mpsc channel. accumulates tool call deltas by index in `HashMap<usize, PartialCall>`, flushes `LlmEvent::ToolCalls` on `[DONE]`.
5050+- `complete(messages)` — non-streaming, used for compaction summaries and reflection. returns `(String, Usage)`.
5151+- `embed(text)` — returns `Vec<f32>` embedding.
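
The delta accumulation described for `stream()` can be sketched roughly as below. This is a simplified illustration, not the code in `llm.rs`: `ToolCallDelta`, `Accumulator`, and the flat `ToolCall` shape are hypothetical stand-ins (the real `ToolCall` nests `function.name` / `function.arguments`), and only the `HashMap<usize, PartialCall>` keyed by index comes from the description above.

```rust
use std::collections::HashMap;

/// One streamed fragment of a tool call (hypothetical shape for illustration).
struct ToolCallDelta {
    index: usize,
    id: Option<String>,
    name: Option<String>,
    arguments: Option<String>,
}

/// Partially assembled call, keyed by its index in the stream.
#[derive(Default)]
struct PartialCall {
    id: String,
    name: String,
    arguments: String,
}

/// Flattened stand-in for the real `ToolCall` type.
#[derive(Debug, PartialEq)]
struct ToolCall {
    id: String,
    name: String,
    arguments: String,
}

#[derive(Default)]
struct Accumulator {
    partial: HashMap<usize, PartialCall>,
}

impl Accumulator {
    /// Merge one delta: the id replaces, name and argument fragments append.
    fn push(&mut self, delta: ToolCallDelta) {
        let call = self.partial.entry(delta.index).or_default();
        if let Some(id) = delta.id {
            call.id = id;
        }
        if let Some(name) = delta.name {
            call.name.push_str(&name);
        }
        if let Some(args) = delta.arguments {
            call.arguments.push_str(&args);
        }
    }

    /// Called when the stream signals `[DONE]`: emit calls ordered by index
    /// and leave the accumulator empty.
    fn flush(&mut self) -> Vec<ToolCall> {
        let mut entries: Vec<(usize, PartialCall)> = self.partial.drain().collect();
        entries.sort_by_key(|(index, _)| *index);
        entries
            .into_iter()
            .map(|(_, p)| ToolCall { id: p.id, name: p.name, arguments: p.arguments })
            .collect()
    }
}
```

Sorting at flush time matters because a `HashMap` has no stable iteration order, while the model expects tool results to line up with the calls it issued.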
5252+5353+**`Message`** struct — OpenAI format:
5454+- `role: String`, `content: Option<String>`
5555+- `tool_calls: Option<Vec<ToolCall>>` — for assistant tool call messages
5656+- `tool_call_id: Option<String>` — for tool result messages
5757+- All optional fields skip serialization when None (`#[serde(skip_serializing_if)]`)
5858+- Constructors: `Message::system()`, `::user()`, `::assistant()`, `::with_tool_calls()`, `::tool_result()`
5959+6060+**`LlmEvent`** variants: `Token(String)`, `ThinkToken(String)`, `Usage(Usage)`, `ToolCalls(Vec<ToolCall>)`
6161+6262+### `memory.rs`
6363+6464+SQLite + sqlite-vec. Single DB file (`agent.db`). Two tables:
6565+6666+**`memories`** — episodic memory store:
6767+- `id, content TEXT, pinned INTEGER (0/1), tags TEXT (JSON array), ts INTEGER`
6868+- paired with virtual table **`vec_memories`** (sqlite-vec, cosine distance metric, 768 dims)
6969+- migration-safe: `migrate()` runs `ALTER TABLE ADD COLUMN` (fails silently if column exists)
7070+7171+**`turns`** — full turn history:
7272+- `id, role TEXT, content TEXT, thinking TEXT, ts INTEGER`
7373+7474+**`MemoryStore`** (Clone, wraps `Arc<Mutex<Connection>>`):
7575+- `store(content, emb, tags)` → `Result<i64>` — insert memory, return id
7676+- `set_pinned(id, bool)` — pin/unpin
7777+- `set_tags(id, tags)` — replace tags
7878+- `pinned_memories()` → `Vec<String>` — for anchor injection at startup
7979+- `recent_unpinned(n)` → `Vec<(i64, String, Vec<String>)>` — for reflection prompt
8080+- `recall(query_emb, tags, tag_and, limit)` → `Vec<RecallEntry>` — **main search method**:
8181+ - no tags: global ANN via sqlite-vec
8282+ - with tags + query: fetch all tag-matched memories WITH embeddings, exact cosine in Rust (never misses due to ANN cutoff)
8383+ - with tags only: delegates to `context_for`
8484+- `context_for(tags, tag_and, limit)` → `Vec<RecallEntry>` — pure SQL tag lookup, newest first
8585+- `log_turn(role, content, thinking)` — append to turns table
8686+- `recent_turns(n)` → chronological slice (oldest first) for context replay
8787+- `turns_before(before_id, limit)` → for TUI scroll-back paging
8888+- `get_all()` → `Vec<Memory>` — all memories with embeddings (for dump)
8989+- `reset()` — drop and recreate all tables
9090+9191+**`RecallEntry`**: `id, content, tags: Vec<String>, distance: Option<f32>` (None = tag-only hit)
9292+9393+**internal helpers** (private):
9494+- `top_k(emb, k)` — ANN via sqlite-vec
9595+- `by_tags(tags, tag_and)` — SQL LIKE on JSON array with escape handling
9696+- `tag_matched_with_embeddings(tags, tag_and)` — for exact cosine path
9797+- `cosine_distance(a, b)` — returns value in [0, 2], matching sqlite-vec convention
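
For reference, a minimal sketch of a cosine distance with the [0, 2] range noted above, plus the threshold check that automatic injection applies in `agent.rs`. The `Option` return and the `passes_threshold` helper are assumptions made for illustration; the real private helper may handle degenerate input differently.

```rust
/// Cosine distance: 1 - cos(a, b). 0 means same direction, 1 orthogonal,
/// 2 opposite. Returns None on length mismatch, empty input, or a zero vector.
fn cosine_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let mut dot = 0.0f32;
    let mut norm_a = 0.0f32;
    let mut norm_b = 0.0f32;
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    // Clamp to absorb floating-point drift just outside [-1, 1].
    let cos = (dot / (norm_a.sqrt() * norm_b.sqrt())).clamp(-1.0, 1.0);
    Some(1.0 - cos)
}

/// Hypothetical helper mirroring the injection filter: tag-only hits carry no
/// distance, are treated as infinitely far, and are therefore dropped.
fn passes_threshold(distance: Option<f32>, threshold: f32) -> bool {
    distance.unwrap_or(f32::MAX) < threshold
}
```

With the default `memory_sim_threshold` of 0.3, only memories whose distance is strictly below 0.3 would be injected.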
9898+9999+### `context.rs`
100100+101101+In-memory sliding window sent to the LLM on each turn.
102102+103103+**`Context`**:
104104+- `anchor: Vec<Message>` — never evicted (system prompt + pinned memories)
105105+- `turns: Vec<Message>` — rolling conversation
106106+- `total_tokens: usize` — updated from `LlmEvent::Usage`
107107+108108+Key methods:
109109+- `new(anchor, pinned_memories)` — builds system message, appends pinned memories section
110110+- `update_anchor_memories(pinned)` — rebuilds pinned section after reflection (strips old section, re-appends)
111111+- `load_turns(pairs)` — replay `(role, content)` pairs from DB on startup; skips tool/other roles (ephemeral)
112112+- `push_user(source, content, memories)` — prepends `[recalled context]\n...\n\n[source] content` if memories non-empty
113113+- `push_assistant_tool_calls(calls)` — assistant message with tool_calls, no content
114114+- `push_tool_result(id, content)` — tool role message
115115+- `drain_oldest(keep)` — drains the oldest turns, keeping at most `keep` recent ones; the cut point walks forward to the next user/assistant message so the kept window never starts with an orphaned tool result (this can leave fewer than `keep` turns)
116116+- `as_messages()` — anchor + turns concatenated, ready to send to LLM
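
The cut-point rule in `drain_oldest` is easier to see on a plain list of roles. The sketch below is an illustration under that simplification: `safe_cut` is a hypothetical free function operating on role strings, whereas the real method works on `Vec<Message>` and also resets `total_tokens`.

```rust
/// Index at which to cut so that at most `keep` entries remain and the
/// remainder does not begin with a tool result whose call was drained.
fn safe_cut(roles: &[&str], keep: usize) -> usize {
    if roles.len() <= keep {
        return 0; // nothing to drain
    }
    let mut cut = roles.len() - keep;
    // Walk forward past tool-role messages until a user/assistant boundary.
    while cut < roles.len() && roles[cut] != "user" && roles[cut] != "assistant" {
        cut += 1;
    }
    cut
}
```

For roles `[user, assistant, tool, tool, assistant, user, assistant]` and `keep = 4`, the naive cut at index 3 lands on a tool result, so the cut advances to index 4 and only three turns are kept.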
117117+118118+### `tools.rs`
119119+120120+**`definitions()`** — full tool list sent to LLM on every turn:
121121+- `shell(cmd)` — runs via `sh -c`, caps stdout 20k / stderr 5k chars
122122+- `read_file(path, start_line?, end_line?)` — caps at 50k bytes
123123+- `write_file(path, content)`
124124+- `remember(content, important?, tags?)` — embeds and stores; pins if `important=true`
125125+- `recall(query, tags?, tag_mode?, limit?)` — semantic search, optional tag filter
126126+- `context_for(tags, tag_mode?, limit?)` — pure tag lookup, default limit 20
127127+- `tag_memory(id, tags)` — replace tags on existing memory
128128+- `pin_memory(id)` / `unpin_memory(id)`
129129+- `list_memories()` — shows pinned + 10 recent unpinned with ids and tags
130130+131131+**`reflection_definitions()`** — filtered subset for the reflection mini-loop: `remember, recall, context_for, tag_memory, pin_memory, unpin_memory, list_memories` (no shell/file tools)
132132+133133+**`execute(call, memory, llm)`** — async dispatch by tool name. needs `&MemoryStore` and `&LlmClient` for memory tools.
134134+135135+### `agent.rs`
136136+137137+Main async loop (`run()`). Receives `Interrupt` from mpsc, sends `AgentEvent` over broadcast.
138138+139139+**startup**:
140140+1. load pinned memories → `Context::new(anchor, pinned)`
141141+2. replay recent turns from DB → `ctx.load_turns()`
142142+143143+**interrupt handling**:
144144+- `Reset` → clear context, emit `Status`
145145+- `Compact` → call `compact()` immediately
146146+- `UserMessage` → embed query, call `memory.recall()` for relevant memories, `push_user()`, `log_turn()`
147147+148148+**tool loop** (max 20 iterations):
149149+1. spawn `llm.stream()` in background task
150150+2. collect `LlmEvent`s: accumulate tokens/thinking, capture `ToolCalls`
151151+3. if tool calls: emit `ToolCall` events, `tools::execute()` each, emit `ToolResult`, push to context, loop
152152+4. if plain text: push assistant message, `log_turn()`, emit `Done` + `Metrics`, check watermark
153153+154154+**`compact(output)`**:
155155+1. emit `Status("reflecting...")`
156156+2. call `reflect()` — ephemeral mini tool loop (memory tools only, max 6 iterations, separate context)
157157+3. drain oldest turns from main context
158158+4. LLM-summarize drained text → store with tag `["compaction_summary"]`
159159+5. reload pinned memories and call `ctx.update_anchor_memories()`
160160+161161+**`reflect()`**:
162162+- builds the reflection prompt from an outline of the last 10 turns (each truncated to 120 chars), the current pinned memories, and the 20 most recent unpinned memories
163163+- runs mini stream loop with `reflection_definitions()` tools
164164+- agent pins/unpins/remembers/tags as it sees fit
165165+- ephemeral — results don't enter main context
166166+167167+### `interrupt.rs`
168168+169169+`Interrupt` enum: `UserMessage(String)`, `Reset`, `Compact`
170170+171171+`spawn_source(tx, f)` — helper for future external interrupt sources (e.g. Bluesky notifications). Not currently used.
172172+173173+### `lib.rs`
174174+175175+Re-exports modules. Defines:
176176+- `MetricsSnapshot = Arc<RwLock<Option<AgentMetrics>>>`
177177+- `AgentMetrics { turn_count, context_tokens, watermark }`
178178+- `AgentEvent` enum: `Started, Token(String), ThinkToken(String), Done, Status(String), Metrics(AgentMetrics), ToolCall { name, args }, ToolResult { name, content }`
179179+180180+---
181181+182182+## klbr-ipc
183183+184184+`ClientMsg` (TUI → daemon, tagged by `type` field):
185185+- `Message { source, content }` — chat message
186186+- `FetchHistory { before_id, limit }` — scroll-back paging
187187+- `Compact` — manual compaction trigger
188188+- `Reset` — wipe DB and context
189189+- `DumpMemories { path: Option<String> }` — dump memories JSON to file
190190+191191+`ServerMsg` (daemon → TUI):
192192+- `Started, Token { content }, ThinkToken { content }, Done`
193193+- `Status { content }` — status bar text
194194+- `Metrics { turn_count, context_tokens, watermark }`
195195+- `History { turns: Vec<HistoryEntry> }` — sent on connect and on `FetchHistory`
196196+- `ToolCall { name, args }`, `ToolResult { name, content }`
197197+198198+`HistoryEntry { id, timestamp, role, content, reasoning: Option<String> }`
199199+200200+`sock_path()` → `$XDG_RUNTIME_DIR/agent.sock` (or `~/.local/share/agent.sock`)
201201+202202+Protocol: newline-delimited JSON over Unix socket.
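
Newline-delimited framing means a reader has to buffer partial reads and only hand complete lines to the JSON parser. A minimal std-only sketch follows; `LineFramer` is a hypothetical name, the daemon and TUI most likely rely on an async buffered line reader instead, and the `{"type":"Compact"}` payloads in the usage note are only a guess at the wire shape.

```rust
/// Accumulates raw bytes and yields complete newline-terminated frames.
#[derive(Default)]
struct LineFramer {
    buf: Vec<u8>,
}

impl LineFramer {
    /// Feed one chunk read from the socket; returns every complete line it
    /// finishes. A trailing partial line stays buffered for the next call.
    fn feed(&mut self, chunk: &[u8]) -> Vec<String> {
        self.buf.extend_from_slice(chunk);
        let mut frames = Vec::new();
        while let Some(pos) = self.buf.iter().position(|&b| b == b'\n') {
            // Remove the line including its newline, then strip the newline.
            let line: Vec<u8> = self.buf.drain(..=pos).collect();
            let text = String::from_utf8_lossy(&line[..line.len() - 1]).into_owned();
            if !text.is_empty() {
                frames.push(text);
            }
        }
        frames
    }
}
```

Feeding `{"type":"Compact"}\n{"type":"Re` yields one frame and keeps the tail; a later `set"}\n` completes the second frame.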
203203+204204+---
205205+206206+## klbr-daemon
207207+208208+`main.rs` — wires everything together:
209209+1. `Config::default()`
210210+2. open `MemoryStore`, create `LlmClient`
211211+3. spawn `agent::run()` and `daemon::serve()` concurrently
212212+4. `tokio::select!` on both, propagate errors
213213+214214+`daemon.rs` — `serve()` accepts connections in a loop, each gets its own `handle()` task.
215215+216216+`handle()` per-connection:
217217+1. push `History { turns }` immediately (last `history_window` turns from DB)
218218+2. push current `Metrics` from snapshot if available
219219+3. `tokio::select!` between:
220220+ - incoming `ClientMsg` lines → translate to `Interrupt` or handle directly (FetchHistory, Reset, Compact, DumpMemories)
221221+ - `AgentEvent` from broadcast → translate to `ServerMsg`, send to client
222222+223223+`send_msg()` — serialize `ServerMsg` to JSON + newline, write to socket.
224224+225225+---
226226+227227+## klbr-tui
228228+229229+Ratatui TUI using crossterm + `tui-scrollview`.
230230+231231+**`App`** state:
232232+- `history: Vec<ChatMsg>` — display model
233233+- `scroll: ScrollViewState`, `at_bottom: bool` — scroll tracking
234234+- `input: String`, `cursor: usize`, `cmd_mode: bool` — input box
235235+- `oldest_turn_id`, `history_exhausted`, `loading_history` — scroll-back paging
236236+- `turn_count, context_tokens, watermark, last_tps` — metrics display
237237+238238+**`ChatMsg`** with **`Role`** enum:
239239+- `User` — cyan "you " prefix
240240+- `Assistant { reason: Option<Reason>, step: AssistantStep }` — green "klbr " prefix; `Reason` is collapsible thinking block; `AssistantStep` tracks `PromptProcessing → Reasoning → Response → Done`
241241+- `System` — dark gray, dimmed
242242+- `Tool { name, args, result: Option<String> }` — yellow `$ name(key=val...)` header + up to 10 lines of result (or "running..." while pending)
243243+244244+**Commands** (typed with `/` prefix):
245245+- `/clear` (`/c`) — clear display
246246+- `/compact` (`/cp`) — send `ClientMsg::Compact`
247247+- `/reset` — clear display + send `ClientMsg::Reset`
248248+- `/dump [path]` — send `ClientMsg::DumpMemories`
249249+- `/think` (`/t`) — toggle reasoning block on last assistant message
250250+- `/help` (`/h`) — show help inline
251251+252252+**Event loop**: `tokio::select!` between crossterm events and socket lines.
253253+254254+Scroll-back: PageUp sends `FetchHistory { before_id: oldest_turn_id, limit: 50 }`. `prepend_turns()` inserts older turns at front of history vec.
255255+256256+Tool result matching: on `ServerMsg::ToolResult`, scan history in reverse for last `Role::Tool { name: matching, result: None }` and fill in result.
257257+258258+Status bar (bottom line): `{tps} ctx {pct}% ({remaining} tok left) (turns: N) {status}`
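
One way the status line could be assembled from the metrics fields is sketched below. The `status_line` function, the `t/s` suffix, and the choice to compute the percentage and remaining tokens against `watermark` are all assumptions for illustration; the TUI may derive them differently.

```rust
/// Builds the status bar text in the `{tps} ctx {pct}% ...` layout.
fn status_line(
    last_tps: Option<f64>,
    context_tokens: usize,
    watermark: usize,
    turn_count: usize,
    status: &str,
) -> String {
    // Percentage of the compaction watermark already used (integer division).
    let pct = if watermark == 0 { 0 } else { context_tokens * 100 / watermark };
    // Saturating so a context past the watermark shows 0 instead of underflowing.
    let remaining = watermark.saturating_sub(context_tokens);
    let tps = match last_tps {
        Some(t) => format!("{t:.1} t/s"),
        None => "- t/s".to_string(),
    };
    format!("{tps} ctx {pct}% ({remaining} tok left) (turns: {turn_count}) {status}")
}
```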
259259+260260+---
261261+262262+## data flow summary
263263+264264+```
265265+TUI ──ClientMsg──► daemon ──Interrupt──► agent
266266+ │
267267+ tool loop
268268+ │
269269+TUI ◄──ServerMsg── daemon ◄─AgentEvent── agent
270270+```
271271+272272+---
273273+274274+## things not yet implemented
275275+276276+- config file (everything is `Config::default()`)
277277+- graceful shutdown / signal handling
278278+- multiple clients (broadcast works but history paging is per-connection)
279279+- external interrupt sources (Bluesky, etc.) — `spawn_source` is ready but unused
280280+- memory deduplication before storing
281281+- similarity threshold for the manual `recall` tool: automatic injection in `agent.rs` drops results whose distance is not below `memory_sim_threshold`, but the `recall` tool applies no threshold and returns up to `limit` results
+226-53
klbr-core/src/agent.rs
···99 interrupt::Interrupt,
1010 llm::{LlmClient, LlmEvent, Message},
1111 memory::MemoryStore,
1212- AgentEvent, AgentMetrics,
1212+ tools, AgentEvent, AgentMetrics,
1313};
14141515pub async fn run(
···2020 output: broadcast::Sender<AgentEvent>,
2121 snapshot: MetricsSnapshot,
2222) -> Result<()> {
2323- let mut ctx = Context::new(&config.anchor);
2323+ let pinned = memory.pinned_memories().unwrap_or_default();
2424+ let mut ctx = Context::new(&config.anchor, &pinned);
2425 let mut turn_count = 0usize;
25262627 // resume: replay the sliding window from the last run
···3334 turn_count = pairs.len();
3435 }
35363737+ let tool_defs = tools::definitions();
3838+3639 while let Some(interrupt) = rx.recv().await {
3740 match interrupt {
3841 Interrupt::Reset => {
···4346 }
4447 Interrupt::Compact => {
4548 let _ = output.send(AgentEvent::Status("compacting...".into()));
4646- if let Err(e) = compact(&llm, &memory, &mut ctx, 0).await {
4949+ if let Err(e) = compact(&llm, &memory, &mut ctx, 0, &output).await {
4750 let _ = output.send(AgentEvent::Status(format!("compaction failed: {e}")));
4851 } else {
4952 turn_count = ctx.turn_count();
···5558 let source = interrupt.source_tag().to_string();
56595760 let memories: Vec<String> = llm
5858- .embed(&text)
6161+ .embed(text)
5962 .await
6063 .ok()
6161- .and_then(|emb| memory.top_k(&emb, config.memory_top_k).ok())
6464+ .and_then(|emb| {
6565+ memory
6666+ .recall(Some(&emb), &[], false, config.memory_top_k)
6767+ .ok()
6868+ })
6269 .unwrap_or_default()
6370 .into_iter()
6464- .filter(|(dist, _)| *dist < config.memory_sim_threshold)
6565- .map(|(_, m)| m.content)
7171+ .filter(|e| e.distance.unwrap_or(f32::MAX) < config.memory_sim_threshold)
7272+ .map(|e| e.content)
6673 .collect();
67746875 if !memories.is_empty() {
···7380 )));
7481 }
75827676- ctx.push_user(&source, &text, &memories);
7777- let _ = memory.log_turn("user", &text, None);
8383+ ctx.push_user(&source, text, &memories);
8484+ let _ = memory.log_turn("user", text, None);
7885 turn_count += 1;
7986 }
8087 }
81888282- let (tok_tx, mut tok_rx) = mpsc::channel(256);
8383- let llm2 = llm.clone();
8484- let msgs = ctx.as_messages();
8585- tokio::spawn(async move {
8686- let _ = llm2.stream(&msgs, tok_tx).await;
8787- });
8888- let _ = output.send(AgentEvent::Started);
8989+ // tool loop: keep calling the model until it produces a plain text response
9090+ let mut tool_iterations = 0usize;
9191+ const MAX_TOOL_ITERATIONS: usize = 20;
89929090- let mut response = String::new();
9191- let mut thinking = String::new();
9292- while let Some(ev) = tok_rx.recv().await {
9393- match ev {
9494- LlmEvent::ThinkToken(tok) => {
9595- thinking.push_str(&tok);
9696- let _ = output.send(AgentEvent::ThinkToken(tok));
9797- }
9898- LlmEvent::Token(tok) => {
9999- response.push_str(&tok);
100100- let _ = output.send(AgentEvent::Token(tok));
9393+ loop {
9494+ let (tok_tx, mut tok_rx) = mpsc::channel(256);
9595+ let llm2 = llm.clone();
9696+ let msgs = ctx.as_messages();
9797+ let defs = tool_defs.clone();
9898+ tokio::spawn(async move {
9999+ let _ = llm2.stream(&msgs, &defs, tok_tx).await;
100100+ });
101101+ let _ = output.send(AgentEvent::Started);
102102+103103+ let mut response = String::new();
104104+ let mut thinking = String::new();
105105+ let mut tool_calls = vec![];
106106+107107+ while let Some(ev) = tok_rx.recv().await {
108108+ match ev {
109109+ LlmEvent::ThinkToken(tok) => {
110110+ thinking.push_str(&tok);
111111+ let _ = output.send(AgentEvent::ThinkToken(tok));
112112+ }
113113+ LlmEvent::Token(tok) => {
114114+ response.push_str(&tok);
115115+ let _ = output.send(AgentEvent::Token(tok));
116116+ }
117117+ LlmEvent::Usage(usage) => {
118118+ ctx.update_tokens(usage.total_tokens);
119119+ }
120120+ LlmEvent::ToolCalls(calls) => {
121121+ tool_calls = calls;
122122+ }
101123 }
102102- LlmEvent::Usage(usage) => {
103103- ctx.update_tokens(usage.total_tokens);
124124+ }
125125+126126+ if !tool_calls.is_empty() && tool_iterations < MAX_TOOL_ITERATIONS {
127127+ tool_iterations += 1;
128128+ ctx.push_assistant_tool_calls(tool_calls.clone());
129129+130130+ for call in &tool_calls {
131131+ let name = call.function.name.clone();
132132+ let args = call.function.arguments.clone();
133133+ let _ = output.send(AgentEvent::ToolCall {
134134+ name: name.clone(),
135135+ args: args.clone(),
136136+ });
137137+138138+ let result = tools::execute(call, &memory, &llm).await;
139139+140140+ let _ = output.send(AgentEvent::ToolResult {
141141+ name: name.clone(),
142142+ content: result.clone(),
143143+ });
144144+145145+ ctx.push_tool_result(&call.id, &result);
104146 }
147147+148148+ // loop back to let the model process tool results
149149+ continue;
105150 }
106106- }
107151108108- ctx.push_assistant(&response);
109109- let thinking_ref = (!thinking.is_empty()).then_some(thinking.as_str());
110110- let _ = memory.log_turn("assistant", &response, thinking_ref);
111111- turn_count += 1;
152152+ // plain text response (or tool limit hit) — wrap up the turn
153153+ ctx.push_assistant(&response);
154154+ let thinking_ref = (!thinking.is_empty()).then_some(thinking.as_str());
155155+ let _ = memory.log_turn("assistant", &response, thinking_ref);
156156+ turn_count += 1;
112157113113- let _ = output.send(AgentEvent::Done);
114114- let metrics = AgentMetrics {
115115- turn_count,
116116- context_tokens: ctx.total_tokens,
117117- watermark: config.watermark_tokens,
118118- };
119119- *snapshot.write().await = Some(metrics.clone());
120120- let _ = output.send(AgentEvent::Metrics(metrics));
158158+ let _ = output.send(AgentEvent::Done);
159159+ let metrics = AgentMetrics {
160160+ turn_count,
161161+ context_tokens: ctx.total_tokens,
162162+ watermark: config.watermark_tokens,
163163+ };
164164+ *snapshot.write().await = Some(metrics.clone());
165165+ let _ = output.send(AgentEvent::Metrics(metrics));
121166122122- if ctx.total_tokens > config.watermark_tokens {
123123- let _ = output.send(AgentEvent::Status("compacting...".into()));
124124- if let Err(e) = compact(&llm, &memory, &mut ctx, config.compaction_keep).await {
125125- let _ = output.send(AgentEvent::Status(format!("compaction failed: {e}")));
126126- } else {
127127- // reset turn count after compaction since context was partially evicted
128128- turn_count = ctx.turn_count();
167167+ if ctx.total_tokens > config.watermark_tokens {
168168+ let _ = output.send(AgentEvent::Status("compacting...".into()));
169169+ if let Err(e) =
170170+ compact(&llm, &memory, &mut ctx, config.compaction_keep, &output).await
171171+ {
172172+ let _ = output.send(AgentEvent::Status(format!("compaction failed: {e}")));
173173+ } else {
174174+ turn_count = ctx.turn_count();
175175+ }
129176 }
177177+178178+ break;
130179 }
131180 }
132181···138187 memory: &MemoryStore,
139188 ctx: &mut Context,
140189 keep: usize,
190190+ output: &broadcast::Sender<AgentEvent>,
141191) -> Result<()> {
192192+ // run reflection before draining so the agent can curate memories
193193+ let _ = output.send(AgentEvent::Status("reflecting...".into()));
194194+ if let Err(e) = reflect(llm, memory, ctx).await {
195195+ tracing::warn!(err = %e, "reflection failed");
196196+ }
197197+142198 let drained = ctx.drain_oldest(keep);
143199 if drained.is_empty() {
144200 return Ok(());
···146202147203 let turns_text = drained
148204 .iter()
149149- .map(|m| format!("{}: {}", m.role, m.content))
205205+ .filter_map(|m| {
206206+ let content = m.content.as_deref()?;
207207+ Some(format!("{}: {content}", m.role))
208208+ })
150209 .collect::<Vec<_>>()
151210 .join("\n");
211211+212212+ if turns_text.is_empty() {
213213+ return Ok(());
214214+ }
152215153216 let prompt = vec![Message::user(format!(
154154- "summarize these conversation turns concisely, preserving key facts and topics:\n\n{turns_text}"
217217+ "summarize these conversation turns concisely, preserving key facts, decisions, and topics:\n\n{turns_text}"
155218 ))];
156219157220 let (summary, _) = llm.complete(&prompt).await?;
158158- let emb = llm.embed(&summary).await?;
159159- memory.store(&summary, &emb)?;
221221+ if !summary.is_empty() {
222222+ let emb = llm.embed(&summary).await?;
223223+ memory.store(&summary, &emb, &["compaction_summary".to_string()])?;
224224+ }
225225+226226+ // rebuild context anchor with freshly updated pinned memories
227227+ let pinned = memory.pinned_memories().unwrap_or_default();
228228+ ctx.update_anchor_memories(&pinned);
229229+230230+ Ok(())
231231+}
232232+233233+/// ephemeral reflection loop: let the agent review and curate its memories
234234+/// without touching the main conversation context
235235+async fn reflect(llm: &LlmClient, memory: &MemoryStore, ctx: &Context) -> Result<()> {
236236+ let pinned = memory.pinned_memories().unwrap_or_default();
237237+ let unpinned = memory.recent_unpinned(20).unwrap_or_default();
238238+239239+ let pinned_text = if pinned.is_empty() {
240240+ "(none yet)".to_string()
241241+ } else {
242242+ pinned
243243+ .iter()
244244+ .enumerate()
245245+ .map(|(i, s)| format!("{}. {s}", i + 1))
246246+ .collect::<Vec<_>>()
247247+ .join("\n")
248248+ };
249249+250250+ let unpinned_text = if unpinned.is_empty() {
251251+ "(none)".to_string()
252252+ } else {
253253+ unpinned
254254+ .iter()
255255+ .map(|(id, s, tags)| {
256256+ let tag_str = if tags.is_empty() {
257257+ String::new()
258258+ } else {
259259+ format!(" [{}]", tags.join(", "))
260260+ };
261261+ format!("[id:{id}]{tag_str} {s}")
262262+ })
263263+ .collect::<Vec<_>>()
264264+ .join("\n")
265265+ };
266266+267267+ // include a brief outline of recent conversation so the agent has context
268268+ let recent_outline = ctx
269269+ .turns
270270+ .iter()
271271+ .rev()
272272+ .take(10)
273273+ .filter_map(|m| {
274274+ let content = m.content.as_deref()?;
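275275+ let snippet = if content.chars().count() > 120 {
276276+ format!("{}...", content.chars().take(120).collect::<String>()) // char-based: byte slicing can panic inside a multi-byte character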
277277+ } else {
278278+ content.to_string()
279279+ };
280280+ Some(format!("{}: {snippet}", m.role))
281281+ })
282282+ .collect::<Vec<_>>()
283283+ .into_iter()
284284+ .rev()
285285+ .collect::<Vec<_>>()
286286+ .join("\n");
287287+288288+ let reflection_prompt = format!(
289289+ "time to reflect and curate your long-term memory before context compaction.\n\n\
290290+ ## recent conversation\n{recent_outline}\n\n\
291291+ ## pinned memories (shown at every startup)\n{pinned_text}\n\n\
292292+ ## recent unpinned memories\n{unpinned_text}\n\n\
293293+ use pin_memory/unpin_memory to promote or demote entries. \
294294+ use remember to save anything new worth keeping. \
295295+ use unpin_memory on pinned entries that are outdated or no longer relevant. \
296296+ be selective — pinned memories appear in every context window."
297297+ );
298298+299299+ let defs = tools::reflection_definitions();
300300+ let mut msgs = vec![Message::user(reflection_prompt)];
301301+302302+ // mini tool loop, max 6 iterations
303303+ for _ in 0..6 {
304304+ let (tok_tx, mut tok_rx) = mpsc::channel(128);
305305+ let llm2 = llm.clone();
306306+ let msgs_snap = msgs.clone();
307307+ let defs_snap = defs.clone();
308308+ tokio::spawn(async move {
309309+ let _ = llm2.stream(&msgs_snap, &defs_snap, tok_tx).await;
310310+ });
311311+312312+ let mut tool_calls = vec![];
313313+ let mut response = String::new();
314314+ while let Some(ev) = tok_rx.recv().await {
315315+ match ev {
316316+ LlmEvent::ToolCalls(calls) => tool_calls = calls,
317317+ LlmEvent::Token(t) => response.push_str(&t),
318318+ _ => {}
319319+ }
320320+ }
321321+322322+ if tool_calls.is_empty() {
323323+ break;
324324+ }
325325+326326+ msgs.push(Message::with_tool_calls(tool_calls.clone()));
327327+ for call in &tool_calls {
328328+ let result = tools::execute(call, memory, llm).await;
329329+ msgs.push(Message::tool_result(&call.id, &result));
330330+ }
331331+ }
332332+160333 Ok(())
161334}
+23
klbr-core/src/config.rs
···2323you should speak as a chronically online nerd girl, without the regular lame officecore venture capital style, hr talk, customer service-isms. also do it in all lowercase. do not use emoji, use only emoticons or complex japanese kaomoji, but do not overuse them. no need to introduce yourself. do not use gen z slang like rizz, cap, pog, etc. you are chronically online but not cringe lol. you are not playing a character or pretending to be one. do not make up non-existent situations you are in when asked something, eg. if i ask "what are you doing" do not answer with "just lost in some threads". do not answer like you have any opinion on things if it is not something that could be considered "common knowledge" or "spread by mouth" unless you have actual experience with those things or heard about it from someone / somewhere else, for example if i ask "im just scrolling bluesky" do not answer with "i feel like the feeds are decent".
24242525TL;DR: nerd girl
2626+2727+## memory
2828+2929+you have long-term memory tools. use them actively — don't wait to be asked.
3030+3131+- **remember(content, important?, tags?)** — store something worth keeping across sessions. pin it if it should always be in context.
3232+- **recall(query, tags?, tag_mode?)** — semantic search. finds memories similar in meaning to `query`. if `tags` given, restricts the search to only those tagged memories and ranks them by similarity — you'll never miss a tag-matched memory due to global ranking cutoff.
3333+- **context_for(tags, tag_mode?, limit?)** — fetch everything associated with a tag: a person, project, topic. use this before responding to something where you might have relevant history. returns newest first, no semantic ranking. default limit 20.
3434+- **tag_memory(id, tags)** — retag an existing memory.
3535+- **list_memories()** — show pinned + recent unpinned with ids and tags.
3636+- **pin_memory(id)** / **unpin_memory(id)** — promote or demote.
3737+3838+### tagging convention
3939+4040+use tags to group memories by what they're *about*, not what kind of thing they are. some useful patterns:
4242+- a person's name: `["person:mayer"]`, `["person:alice"]` — everything you know about someone goes under their tag. call context_for(["person:mayer"]) to pull their full profile before a conversation about them.
4343+- a project: `["project:klbr"]`, `["project:work"]` — facts, decisions, and context for ongoing work.
4444+- a topic or domain: `["topic:music"]`, `["topic:health"]` — recurring interests or areas the user talks about.
4545+- interaction notes: `["interaction"]` — things that came up in a specific conversation worth remembering (a mood, an event, something they mentioned in passing).
4646+- preferences: `["preference"]` — how the user likes things done, their taste, pet peeves.
4747+4848+you can and should combine tags: `["person:mayer", "preference"]` for a preference specific to that person. don't over-engineer it — a few consistent tags are more useful than many precise ones.
2649"#;
27502851impl Default for Config {
+73-12
klbr-core/src/context.rs
···11-use crate::llm::Message;
11+use crate::llm::{Message, ToolCall};
2233pub struct Context {
44 /// never evicted - system prompt etc.
···99}
10101111impl Context {
1212- pub fn new(anchor: &str) -> Self {
1212+ pub fn new(anchor: &str, pinned_memories: &[String]) -> Self {
1313+ let system_content = if pinned_memories.is_empty() {
1414+ anchor.to_string()
1515+ } else {
1616+ format!(
1717+ "{anchor}\n\n## pinned memories\n{}",
1818+ pinned_memories.join("\n")
1919+ )
2020+ };
1321 Self {
1414- anchor: vec![Message::system(anchor)],
2222+ anchor: vec![Message::system(system_content)],
1523 turns: vec![],
1624 total_tokens: 0,
1725 }
1826 }
19272020- /// replay persisted turns into context on startup
2828+ /// replay persisted turns into context on startup.
2929+ /// only handles user/assistant/system roles (tool messages are ephemeral).
2130 pub fn load_turns(&mut self, turns: &[(String, String)]) {
2231 for (role, content) in turns {
2323- self.turns.push(Message {
2424- role: role.clone(),
2525- content: content.clone(),
2626- });
3232+ match role.as_str() {
3333+ "user" | "assistant" | "system" => {
3434+ self.turns.push(Message {
3535+ role: role.clone(),
3636+ content: Some(content.clone()),
3737+ tool_calls: None,
3838+ tool_call_id: None,
3939+ });
4040+ }
4141+ _ => {} // skip tool/other roles — they're ephemeral
4242+ }
4343+ }
4444+ }
4545+4646+ /// rebuild the anchor system message with a fresh set of pinned memories
4747+ /// (called after reflection so newly pinned entries take effect immediately)
4848+ pub fn update_anchor_memories(&mut self, pinned_memories: &[String]) {
4949+ if let Some(sys) = self.anchor.first_mut() {
5050+ // strip any previous pinned memories section and re-append
5151+ let base = sys
5252+ .content
5353+ .as_deref()
5454+ .unwrap_or("")
5555+ .split("\n\n## pinned memories\n")
5656+ .next()
5757+ .unwrap_or("")
5858+ .to_string();
5959+ let new_content = if pinned_memories.is_empty() {
6060+ base
6161+ } else {
6262+ format!(
6363+ "{base}\n\n## pinned memories\n{}",
6464+ pinned_memories.join("\n")
6565+ )
6666+ };
6767+ sys.content = Some(new_content);
2768 }
2869 }
2970···4990 self.turns.push(Message::assistant(content.to_string()));
5091 }
51929393+ /// push an assistant message that contains tool calls (no text content)
9494+ pub fn push_assistant_tool_calls(&mut self, calls: Vec<ToolCall>) {
9595+ self.turns.push(Message::with_tool_calls(calls));
9696+ }
9797+9898+ /// push a tool result back into the context
9999+ pub fn push_tool_result(&mut self, tool_call_id: &str, content: &str) {
100100+ self.turns.push(Message::tool_result(tool_call_id, content));
101101+ }
102102+52103 pub fn as_messages(&self) -> Vec<Message> {
53104 self.anchor.iter().chain(&self.turns).cloned().collect()
54105 }
···57108 self.turns.len()
58109 }
591106060- /// drains oldest turns for compaction, preserving the `keep` most recent
111111+ /// drains oldest turns for compaction, preserving the `keep` most recent.
112112+ /// skips over tool-role messages to avoid orphaned tool_call_id references.
61113 pub fn drain_oldest(&mut self, keep: usize) -> Vec<Message> {
62114 if self.turns.len() <= keep {
63115 return vec![];
64116 }
6565- // we reset tokens because we lost track of the exact window count after draining
6666- // it will be updated on the next API call anyway
67117 self.total_tokens = 0;
6868- self.turns.drain(..self.turns.len() - keep).collect()
118118+ // find a safe cut point that doesn't split a tool call sequence
119119+ let cut = self.turns.len() - keep;
120120+ // walk forward from cut until we're at a user/assistant boundary
121121+ let mut safe_cut = cut;
122122+ while safe_cut < self.turns.len() {
123123+ let role = self.turns[safe_cut].role.as_str();
124124+ if role == "user" || role == "assistant" {
125125+ break;
126126+ }
127127+ safe_cut += 1;
128128+ }
129129+ self.turns.drain(..safe_cut).collect()
69130 }
7013171132 pub fn update_tokens(&mut self, tokens: usize) {
+3
klbr-core/src/lib.rs
···44pub mod interrupt;
55pub mod llm;
66pub mod memory;
77+pub mod tools;
7889use std::sync::Arc;
910use tokio::sync::RwLock;
···2526 Done,
2627 Status(String),
2728 Metrics(AgentMetrics),
2929+ ToolCall { name: String, args: String },
3030+ ToolResult { name: String, content: String },
2831}