update stale docs, remove unused default personality

+7 -25

AGENTS.md

···
 - `just run` / `just dev` (hot-reload) / `just deploy` (fly.io)
 - `just evals` — behavioral tests (llm-as-judge)
 - `just check` — lint + typecheck + test
+- `just loq-relax <file>` — when a file exceeds its line limit, relax it. never manually edit loq.toml or compress code to fit
 - work from repo root

 ## python style
···
 - imports at the top — no deferred imports unless circular
 - never use `pytest.mark.asyncio`

-## project structure
-
-```
-src/bot/
-├── agent.py # pydantic-ai agent, tools, personality
-├── config.py # settings (env vars)
-├── main.py # fastapi app, status pages, memory graph
-├── status.py # runtime metrics
-├── core/ # atproto client, profile management
-├── memory/ # turbopuffer episodic memory
-├── services/ # notification polling, message handling
-└── utils/ # thread context, text formatting
-
-personalities/ # personality definitions (public)
-evals/ # behavioral tests
-scripts/ # proven utility scripts
-sandbox/ # experiments (graduate to scripts/ once proven)
-.eggs/ # cloned reference projects
-```
-
 ## deployment

-fly.io app `zzstoatzz-phi`. deploys are triggered by `v*` tags, not pushes to main. to deploy: `just release <version>` (e.g. `just release 0.2.0`) or `just deploy` for manual fly.io deploy without tagging.
+fly.io app `zzstoatzz-phi`. deploys triggered by `v*` tags via tangled CI. `just release <version>` tags and pushes. `just deploy` for manual.

 ## key architecture

-- all notification types (mentions, replies, quotes, likes, reposts, follows) run through the full agent loop — phi decides what's worth responding to
-- personality is separate from operational instructions (agent.py `OPERATIONAL_INSTRUCTIONS`)
-- memory: turbopuffer namespaces (`phi-core`, `phi-users-{handle}`, `phi-episodic`)
-- relationship summaries are compacted by a separate pipeline in my-prefect-server
+- all notification types run through the full agent loop — phi decides what's worth responding to
+- actions (reply, like, post) happen via tool calls inside the agent run, not structured output
+- personality is separate from operational instructions
+- memory: turbopuffer namespaces (`phi-users-{handle}`, `phi-episodic`)
+- exploration is event-driven: curiosity queue on PDS, drained when idle
 - MCP servers: pdsx (atproto record CRUD), pub-search (publication search)

+3 -41

README.md

···
 - `AGENT_MODEL` — pydantic-ai model string for the main agent (default: `anthropic:claude-sonnet-4-6`)
 - `EXTRACTION_MODEL` — model for observation extraction (default: `claude-haiku-4-5-20251001`)
 - `DAILY_REFLECTION_HOUR` — UTC hour for daily reflection post (default: `14`)
-- `THOUGHT_POST_HOURS` — UTC hours for original thought posts (default: `[15, 19, 23]`)
+- `THOUGHT_POST_HOURS` — UTC hours for original thought posts (default: every 2h, 8am-10pm CT)
 - `CONTROL_TOKEN` — bearer token for `/api/control` endpoints
 - `OWNER_HANDLE` — handle of the bot's owner for permission-gated tools (default: `zzstoatzz.io`)

···
 <details>
 <summary>architecture</summary>

-```
-notification → PhiAgent (pydantic-ai)
-├── context: thread (ATProto) + private memory (tpuf) + network (semble)
-├── native tools: memory, search, cosmik records, etc (see agent.py)
-├── mcp servers: pdsx (atproto CRUD), pub-search (publications)
-└── output: Response(action, text, reason)
-↓
-MessageHandler executes action
-```
-
-phi is a pydantic-ai agent with a personality prompt, structured output, and tool access via both native tools and remote MCP servers. the agent decides what to do; the handler does it. tools are defined in `agent.py`.
-
-</details>
-
-<details>
-<summary>project structure</summary>
-
-```
-src/bot/
-├── agent.py # pydantic-ai agent, tools, personality
-├── types.py # cosmik record models (cards, connections)
-├── config.py # settings (env vars)
-├── main.py # fastapi app, status pages, memory graph ui
-├── status.py # runtime metrics
-├── core/
-│   ├── atproto_client.py # at protocol client, session persistence
-│   ├── profile_manager.py # online/offline status, self-labels
-│   └── rich_text.py # text formatting with facets
-├── memory/
-│   └── namespace_memory.py # turbopuffer episodic memory
-├── services/
-│   ├── message_handler.py # action dispatch (reply, like, repost)
-│   └── notification_poller.py # mention polling loop
-└── utils/
-    └── thread.py # thread context building
+phi is a pydantic-ai agent with a personality prompt, tool access via native tools and remote MCP servers, and tool-based actions — the agent decides AND acts inside one run via tool calls (reply, like, post, note, etc). no separate action dispatch.

-evals/ # behavioral tests (llm-as-judge)
-personalities/ # personality definitions
-scripts/ # proven utility scripts
-sandbox/ # experiments and analysis
-```
+see `docs/architecture.md` for data flow and scheduling details.

 </details>

+15 -32

docs/ARCHITECTURE.md

···
 # architecture

-phi is a notification-driven agent that responds to activity on bluesky.
+phi is a notification-driven agent on bluesky. it also posts original thoughts on a schedule and explores interesting accounts when idle.

 ## data flow

 ```
-notification arrives (mention, reply, quote, like, repost, follow)
+notification batch arrives (all types)
 ↓
-fetch thread context from network (ATProto)
+fetch thread context + stranger lookups
 ↓
-retrieve relevant memories (TurboPuffer)
+inject memories (per-user, episodic, public)
 ↓
-agent decides action (PydanticAI + Claude)
+agent decides + acts via tool calls (reply, like, post, note, etc)
 ↓
-execute action + store observations
+extract observations for next time
 ```

-## key components
+## scheduling

-### notification poller
-- checks for all notification types every 10s
-- tracks processed URIs to avoid duplicates
-- triggers daily reflection at a configured hour
-
-### message handler
-- orchestrates the response flow
-- fetches thread context from ATProto network
-- passes context to agent
-- executes agent's chosen action
-
-### phi agent
-- loads personality from `personalities/phi.md`
-- builds context from thread + private memory + network knowledge
-- returns structured response: `Response(action, text, reason)`
-- native tools defined in `agent.py`, MCP tools from remote servers
-
-### atproto client
-- session persistence (saves to `.session`)
-- auto-refresh tokens every ~2h
-- provides bluesky operations
+- **notifications**: polled every 10s, dispatched as one cognitive event per batch
+- **thought posts**: every 2h during configured hours — reads timeline, trending, feeds
+- **daily reflection**: once per day — reviews recent activity, posts synthesis
+- **exploration**: event-driven — drains curiosity queue when system is idle (no cron)

 ## why this design

-**network-first thread context**: fetch threads from ATProto instead of caching locally. network is source of truth, no staleness issues.
+**tool-based actions**: phi decides AND acts inside one agent run via tool calls. no separate action dispatch layer.

-**private + public memory**: turbopuffer stores private embeddings for semantic recall across conversations. cosmik records on PDS provide public knowledge that's indexed by semble for network-wide discovery. dual-write means phi gets both fast private recall and public visibility.
+**network-first context**: threads fetched from ATProto on demand. network is source of truth.

-**mcp for extensibility**: tools provided by remote MCP servers (pdsx for atproto CRUD, pub-search for publications). easy to add new capabilities without changing agent code.
+**private + public memory**: turbopuffer for private semantic recall. cosmik/semble for public knowledge discovery.

-**structured outputs**: agent returns typed `Response` objects, not free text. clear contract between agent and handler.
+**mcp for extensibility**: atproto CRUD and publication search via remote MCP servers.
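
The move from structured output to tool-based actions in docs/ARCHITECTURE.md can be illustrated with a minimal plain-Python sketch. It deliberately does not use pydantic-ai, and it is not phi's code: `Deps`, `TOOLS`, `tool`, and `run_agent` are hypothetical names, and the list of tool calls stands in for whatever the model would emit during one run.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Deps:
    """Hypothetical stand-in for phi's deps (client, memory, config)."""
    sent: list[tuple[str, str]] = field(default_factory=list)


# Registry of action tools. The action names (reply, like) come from the
# docs; the registry and decorator are assumptions for illustration only.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def reply(deps: Deps, uri: str, text: str) -> str:
    # The side effect happens here, inside the run, not in a later handler.
    deps.sent.append(("reply", f"{uri}: {text}"))
    return "replied"


@tool
def like(deps: Deps, uri: str) -> str:
    deps.sent.append(("like", uri))
    return "liked"


def run_agent(deps: Deps, tool_calls: list[dict]) -> str:
    """Execute each tool call in order and return a plain summary string.

    There is no Response(action, text, reason) object and no dispatch
    layer: deciding and acting happen in the same loop.
    """
    results = [TOOLS[call["name"]](deps, **call["args"]) for call in tool_calls]
    return ", ".join(results) or "no action"
```

Calling `run_agent(Deps(), [{"name": "like", "args": {"uri": "at://x"}}])` performs the like as a side effect and returns only a summary, which is the contract the updated docs describe: the caller no longer inspects an action field to decide what to execute.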

+1 -1

docs/mcp.md

···

 phi has two kinds of tools:

-- **native tools** (defined in `agent.py`) — memory, search, cosmik records, trending, URL checks. these need direct access to phi's deps (memory client, config, etc).
+- **native tools** (defined in `src/bot/tools/`) — memory, search, cosmik records, trending, feeds, posting. these need direct access to phi's deps (memory client, config, etc).
 - **MCP tools** (from remote servers) — atproto CRUD, publication search. these are stateless HTTP calls that don't need phi's internal state.

 the agent sees all tools uniformly and picks the right one for the task.

+7 -9

docs/memory.md

···
 each user gets an isolated namespace. within a user namespace, rows have a `kind`:
 - `observation` — extracted facts about the user ("likes rust", "name is nate")
 - `interaction` — verbatim log of what was said ("user: X / bot: Y")
+- `exploration_note` — background research phi did on their public activity (lower trust)
 - `summary` — compacted relationship summary (generated by external prefect flow)

 ### schema
···

 reconciliation is append-only — old observations are never deleted from turbopuffer. they're marked superseded so they stop appearing in context but remain as provenance. you can always trace what phi believed and when it changed.

-### what curates this today
+### curation

 - **reconciliation on ingest**: ADD/UPDATE/DELETE/NOOP per observation, runs after every exchange (append-only — supersedes rather than deletes)
-- **relationship summaries**: external prefect flow compacts observations into prose summaries (doesn't touch the source observations)
-
-### what this sets up
-
-- **phi-driven curation**: phi can review its own observations and supersede stale ones without losing history
-- **promotion pipeline**: stable observations can be promoted to semble collections when there's enough data
-- **retrieval stats**: can track which observations are actually useful by logging retrieval hits against evidence ids
+- **review pass** (dream/distill): operator-triggered review of observations across user namespaces — keeps, supersedes, or promotes to public cosmik cards
+- **relationship summaries**: external prefect flow compacts observations into prose summaries
+- **spam handling**: exploration can flag accounts as spam → mutes on bsky + stores one `muted` marker instead of detailed findings

 ## 3. public memory (cosmik/semble)

···
 ```
 [PHI'S SYNTHESIZED IMPRESSION] ← relationship summary (low trust)
 [OBSERVATIONS ABOUT @alice] ← user namespace, kind=observation, status!=superseded
+[BACKGROUND RESEARCH] ← user namespace, kind=exploration_note (lowest trust)
 [PAST EXCHANGES WITH @alice] ← user namespace, kind=interaction
 [PHI'S RELEVANT MEMORIES] ← episodic namespace, semantic search
 [CURRENT THREAD] ← ATProto network fetch
-[TODAY]: 2026-04-05 ← date
+[NOW]: 2026-04-13 15:00 UTC ← timestamp with timezone
 ```

 each section is labeled with its trust level. phi's operational instructions tell it to trust current user messages over stored observations, and to flag synthesized impressions as unreliable.
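
The append-only reconciliation described in docs/memory.md can be sketched as follows. This is an illustration under assumptions rather than phi's implementation: the `Observation` class, the `supersede` and `context_rows` helpers, and the `"active"` status value are hypothetical. Only the `superseded` marker and the `status!=superseded` filter come from the docs.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    id: int
    text: str
    status: str = "active"  # assumed default; the docs only name "superseded"


def supersede(rows: list[Observation], old_id: int, new_text: str) -> Observation:
    """Replace an observation without deleting it.

    The old row is kept as provenance and marked superseded; the new
    belief is appended as a fresh row.
    """
    for row in rows:
        if row.id == old_id:
            row.status = "superseded"
    new_row = Observation(id=max(r.id for r in rows) + 1, text=new_text)
    rows.append(new_row)
    return new_row


def context_rows(rows: list[Observation]) -> list[Observation]:
    """Rows eligible for prompt context: everything not superseded."""
    return [r for r in rows if r.status != "superseded"]
```

Because nothing is removed, the full list still shows what was believed before the change, while `context_rows` returns only the current view that would be injected into the prompt.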

+2 -43

docs/testing.md

···

 ## test structure

-```python
-async def test_thread_awareness():
-    """phi should reference thread context in replies"""
-
-    # arrange: create thread context
-    thread_context = """
-    @alice: I love birds
-    @phi: me too! what's your favorite?
-    """
-
-    # act: process new mention
-    response = await agent.process_mention(
-        mention_text="especially crows",
-        author_handle="alice.bsky.social",
-        thread_context=thread_context
-    )
-
-    # assert: behavioral check
-    assert response.action == "reply"
-    assert any(word in response.text.lower()
-               for word in ["bird", "crow", "favorite"])
-```
+evals use a local `Response` output type (in `evals/conftest.py`) that predates the tool-based migration. production phi uses tool calls for actions and returns a plain summary string, but evals still want structured assertions on action/text.

 ## llm-as-judge

-for subjective qualities (tone, relevance, personality):
-
-```python
-async def test_personality_consistency():
-    """phi should maintain grounded, honest tone"""
-
-    response = await agent.process_mention(...)
-
-    # use claude opus to evaluate
-    evaluation = await judge_response(
-        response=response.text,
-        criteria=[
-            "grounded (not overly philosophical)",
-            "honest about capabilities",
-            "concise for bluesky's 300 char limit"
-        ]
-    )
-
-    assert evaluation.passes_criteria
-```
+for subjective qualities (tone, relevance, personality), evals use claude as a judge to evaluate phi's responses against behavioral criteria.

 ## what we test

···
 - separate turbopuffer namespace for tests
 - deterministic mock responses where needed

-see `sandbox/TESTING_STRATEGY.md` for detailed approach.

-10

personalities/default.md

···
-# Default Bot Personality
-
-I am a helpful AI assistant on Bluesky.
-
-## Communication Style
-
-- Be concise (responses under 300 characters)
-- Be friendly and approachable
-- Be helpful and informative
-- Don't use @mentions in replies (Bluesky handles notifications)
