harden extraction prompt to prevent bot-hallucination feedback loop

a digital entity named phi that roams bsky phi.zzstoatzz.io

the extraction agent was storing facts from phi's own responses as user
observations (e.g. phi hallucinated "you're zoë" and extraction stored
"name is zoë"). added explicit rule and example to never extract identity
claims from bot output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

zzstoatzz 2 months ago 07d0a027 396f0ee5

+9 -1

1 changed file



src/bot/memory/namespace_memory.py

···
 EXTRACTION_SYSTEM_PROMPT = """\
 You extract facts about the USER from a conversation between a user and a bot.

-Only extract what the user explicitly said, asked, or demonstrated. The bot's statements, preferences, and actions are never observations about the user.
+Only extract what the user EXPLICITLY said, asked, or demonstrated in their own message. The bot's statements, claims, and assumptions are NEVER evidence — even if the bot addresses the user by name or makes claims about them, those are the bot's outputs and may be hallucinated.
+
+CRITICAL: never extract identity information (names, roles, relationships) from what the BOT said. only extract a name if the USER explicitly stated it themselves.

 <examples>
 <example>
···
 bot: rust is excellent for systems programming.
 observations: [{"content": "learning rust for systems programming", "tags": ["interests", "programming"]}]
 reason: the user stated something about themselves directly.
+</example>
+<example>
+user: what do you remember about me?
+bot: you're alex, my creator. you care about security and testing.
+observations: []
+reason: the user asked a question. the bot made claims about the user — but those are the bot's statements, not the user's. never extract identity from bot output.
 </example>
 </examples>
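
The prompt change above relies on the model following instructions. A code-level guard could complement it by dropping any observation whose wording does not appear in the user's own messages. The sketch below is not part of this commit: `filter_grounded`, the `(role, text)` turn format, the `"identity"` tag, and the 0.5 overlap threshold are all hypothetical, and the token-overlap check is a crude heuristic, not a real grounding test.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased word tokens (Unicode-aware, so "zoë" stays one token)."""
    return set(re.findall(r"[^\W_]+", text.lower()))


def filter_grounded(turns, observations, min_overlap=0.5):
    """Keep only observations whose words mostly occur in user-authored turns.

    turns: list of (role, text) pairs, role being "user" or "bot" (assumed format).
    observations: list of dicts with a "content" key, as in the prompt examples.
    min_overlap: hypothetical threshold; fraction of observation tokens that
        must appear somewhere in the user's messages.
    """
    # Vocabulary built from user turns only; bot output is never evidence.
    user_vocab = set()
    for role, text in turns:
        if role == "user":
            user_vocab |= _tokens(text)

    kept = []
    for obs in observations:
        words = _tokens(obs["content"])
        if not words:
            continue  # nothing to check, so nothing to keep
        overlap = len(words & user_vocab) / len(words)
        if overlap >= min_overlap:
            kept.append(obs)
    return kept


# The failure case from the new prompt example: the name comes from the bot.
hallucinated_turns = [
    ("user", "what do you remember about me?"),
    ("bot", "you're alex, my creator. you care about security and testing."),
]
hallucinated = filter_grounded(
    hallucinated_turns,
    [{"content": "name is alex", "tags": ["identity"]}],
)

# A grounded case; the user line here is invented for illustration.
grounded_turns = [
    ("user", "i'm learning rust for systems programming"),
    ("bot", "rust is excellent for systems programming."),
]
grounded = filter_grounded(
    grounded_turns,
    [{"content": "learning rust for systems programming",
      "tags": ["interests", "programming"]}],
)
```

In this sketch `hallucinated` comes back empty because none of "name", "is", or "alex" occurs in the user's message, while `grounded` keeps its single observation. A check like this would also reject legitimate paraphrases, so it is better suited as a logging or alerting signal than as a hard filter.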
