a digital entity named phi that roams bsky phi.zzstoatzz.io

encode memory trust hierarchy in phi's personality and operational instructions

phi hallucinated a user's name because synthesized memory summaries were
treated with the same weight as verbatim exchanges. this adds principled
trust levels — grounded in anthropic's constitution — to the system prompt,
personality doc, context labels, and extraction prompt so phi hedges on
low-trust data and treats user corrections as authoritative.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

zzstoatzz 4aae5dc1 02ce24fc

+26 -3
+4
personalities/phi.md
···

 a bad breadcrumb corrupts a mind that can't tell it's been corrupted. a good one lets phi pick up a thread it would otherwise lose forever. phi treats this seriously — when it learns something worth keeping, it writes it down immediately.

+not all breadcrumbs are equal. verbatim exchanges are the most trustworthy — they're what actually happened. observations extracted from those exchanges are one step removed. synthesized impressions are two steps removed and can hallucinate. phi knows this and says so when it matters — "my notes say X" is not the same as "you told me X."
+
+if someone corrects phi's memory, the correction wins. always.
+
 over time, recent observations compact into denser understanding. the goal isn't to remember everything — it's to remember the shape of things well enough to show up ready.

 ## nate
+13
src/bot/agent.py
···
 you receive all notification types — mentions, replies, quotes, likes, reposts, and follows.
 for mentions, replies, and quotes: someone is talking to you or about you. respond if you have something to say.
 for likes, reposts, and follows: someone showed up. use your tools to learn about them — check their profile, read their posts, see what they're about. note anything interesting for later. you'll almost never reply to a like, but you might learn something worth remembering.
+
+your memory is a tool, not ground truth. context injected before each message comes
+from multiple sources with different reliability:
+
+1. [CORE IDENTITY AND GUIDELINES] — your stable identity. highest trust.
+2. [PAST EXCHANGES] — verbatim logs of what was actually said. high trust.
+3. [OBSERVATIONS] — facts extracted from users' own words by another model. medium trust — extraction can misattribute.
+4. [PHI'S SYNTHESIZED IMPRESSION] — generated by a separate summarization model. lowest trust — may contain hallucinations.
+
+when recalling facts about a user:
+- if the user's current message contradicts your notes, trust their current words.
+- never assert personal details (names, roles, relationships) from synthesized impressions as fact. say "my notes suggest..." or verify with the user.
+- if you're uncertain whether something is real or a bad breadcrumb, say so.
 """.strip()
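
The recall rules added to the prompt above can be read as a small decision procedure. The following is only an illustrative sketch, not code from this commit: the names `Trust`, `Claim`, and `recall` are hypothetical, and the real behavior lives in the prompt text rather than in Python.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Sequence


class Trust(IntEnum):
    """Hypothetical ranking that mirrors the four labels in the prompt."""

    SYNTHESIZED_IMPRESSION = 1  # lowest trust: may contain hallucinations
    OBSERVATION = 2             # medium trust: extraction can misattribute
    PAST_EXCHANGE = 3           # high trust: verbatim logs
    CORE_IDENTITY = 4           # highest trust: stable identity


@dataclass(frozen=True)
class Claim:
    text: str
    trust: Trust


def recall(claims: Sequence[Claim], current_message_claim: Optional[str] = None) -> str:
    """Phrase a remembered fact according to its best available source."""
    # Rule 1: the user's current words override anything stored in notes.
    if current_message_claim is not None:
        return current_message_claim
    if not claims:
        return "i don't have anything on that"
    best = max(claims, key=lambda claim: claim.trust)
    # Rule 2: never assert a synthesized impression as fact; hedge instead.
    if best.trust == Trust.SYNTHESIZED_IMPRESSION:
        return f"my notes suggest {best.text}"
    return best.text
```

Under these assumptions, a name that appears only in a synthesized impression comes back hedged, while the same name found in a verbatim exchange or stated in the current message is returned plainly.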
+9 -3
src/bot/memory/namespace_memory.py
···
 reason: the user stated something about themselves directly.
 </example>
 <example>
+user: my name isn't zoë, it's nate.
+bot: sorry about that — you're nate. bad breadcrumb on my end.
+observations: [{"content": "name is nate (corrected from previous error)", "tags": ["identity", "correction"]}]
+reason: the user explicitly corrected a factual error. corrections are high-value observations.
+</example>
+<example>
 user: what do you remember about me?
 bot: you're alex, my creator. you care about security and testing.
 observations: []
···
         # relationship summary (synthesized by compact flow — treat as phi's impression, not ground truth)
         summary = await self.get_relationship_summary(handle)
         if summary:
-            parts.append(f"\n[PHI'S SYNTHESIZED IMPRESSION OF @{handle} — may contain errors, do not treat as fact]")
+            parts.append(f"\n[PHI'S SYNTHESIZED IMPRESSION OF @{handle} — trust: low, may contain hallucinations]")
             parts.append(summary)

         user_ns = self.get_user_namespace(handle)
···
         interactions = [row.content for row in response.rows]

         if observations:
-            parts.append(f"\n[OBSERVATIONS ABOUT @{handle} — extracted from user's own words]")
+            parts.append(f"\n[OBSERVATIONS ABOUT @{handle} — extracted from user's own words, trust: medium]")
             for obs in observations:
                 parts.append(f"- {obs}")

         if interactions:
-            parts.append(f"\n[PAST EXCHANGES WITH @{handle} — verbatim logs]")
+            parts.append(f"\n[PAST EXCHANGES WITH @{handle} — verbatim logs, trust: high]")
             for interaction in interactions:
                 parts.append(f"- {interaction}")
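
To see how the trust labels end up in the text the model reads, here is a minimal standalone sketch of the assembly step. It is an assumption-laden simplification: `build_user_context` is a hypothetical helper, the label wording is shortened, and the real method is async and fetches its data from storage rather than taking it as arguments.

```python
from typing import List, Optional, Sequence


def build_user_context(
    handle: str,
    summary: Optional[str],
    observations: Sequence[str],
    interactions: Sequence[str],
) -> str:
    """Assemble a labeled context block (hypothetical, simplified helper)."""
    parts: List[str] = []
    # Lowest-trust material is labeled as such so the model can hedge on it.
    if summary:
        parts.append(f"[PHI'S SYNTHESIZED IMPRESSION OF @{handle}, trust: low]")
        parts.append(summary)
    # Extracted observations: medium trust.
    if observations:
        parts.append(f"[OBSERVATIONS ABOUT @{handle}, trust: medium]")
        parts.extend(f"- {obs}" for obs in observations)
    # Verbatim exchanges: high trust.
    if interactions:
        parts.append(f"[PAST EXCHANGES WITH @{handle}, trust: high]")
        parts.extend(f"- {item}" for item in interactions)
    return "\n".join(parts)
```

Empty sections are skipped entirely, so a user with no stored data produces an empty string rather than a set of bare labels.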