rename note→remember; exclude run_skill_script; enrich publish-blog skill

+119

SKILL-OR-TOOL.md

···
+ # skill or tool
+
+ how to decide what's a skill vs what stays a tool, and what we're
+ doing about the current sprawl.
+
+ > [history note] this was originally a "what i was thinking" handoff
+ > after a pushback on incomplete sprawl reduction. it got reviewed and
+ > partially corrected; this version captures the agreed-upon state.
+
+ ## the principle
+
+ > **tools enforce, skills suggest.**
+
+ a tool wrapper runs code unconditionally. a skill is documentation —
+ the agent reads it and may follow it, may not. so the test for
+ skill-replaceability is:
+
+ > does the work this tool does require structural enforcement, or is
+ > it just guidance about how to use a general capability well?
+
+ cosmik writes were pure guidance: no consent layer, no owner-gate, no
+ memory pipeline side effects. skill alone was sufficient.
+
+ every other tool we have today has at least one of:
+
+ - **consent enforcement** (mention-allowlist construction + reply-ref
+   logic in the bsky posting tools)
+ - **owner-gating** (`_is_owner` check that depends on this batch's
+   notification context — pdsx can't see that)
+ - **bounded-collection management** (active observations cap-and-archive)
+ - **memory pipeline side effects** (`after_interaction` writes,
+   episodic memory writes after publish)
+ - **non-pdsx-reachable backend** (turbopuffer for private memory,
+   graze API for feeds, external monitoring services)
+
+ removing those tools and giving phi raw pdsx + a skill description
+ trades structural enforcement for documentation-mediated correctness.
+ that's the wrong direction for anything load-bearing.
+
+ ## the naming smell — separate from sprawl
+
+ `note` and `observe` looked semantically duplicated but did totally
+ different things:
+
+ | name | storage | when to use |
+ |---|---|---|
+ | `note` (renamed → `remember`) | turbopuffer `phi-episodic` (private vector) | "save this for future semantic recall — never re-surfaces on its own; queryable via `recall`" |
+ | `observe` | PDS `io.zzstoatzz.phi.observation` (durable, max 5 active, rest archived to turbopuffer) | "put this in my attention pool — surfaces in `[ACTIVE OBSERVATIONS]` next prompt; oldest archives when cap exceeded" |
+
+ **done**: renamed `note` → `remember` so the recall/remember pair (read
+ verb, write verb) is now coherent. `observe` keeps its name — it maps
+ cleanly to the `[ACTIVE OBSERVATIONS]` prompt block.
+
+ ## what was deleted (last round)
+
+ | tool | why removable | replacement |
+ |---|---|---|
+ | `save_url` | no consent, no owner-gate, no side effects, pdsx-covered | `mcp__pdsx__create_record(collection="network.cosmik.card", record={kind: "URL", ...})` via `cosmik-records` skill |
+ | `create_connection` | same | `mcp__pdsx__create_record(collection="network.cosmik.connection", ...)` via skill |
+
+ ## what was corrected mid-review
+
+ `publish_blog_post` was originally going to be the next deletion. on
+ re-read it has real enforcement:
+
+ - pydantic validation via `GreenGaleDocument`
+ - **duplicate-title refusal** (refuses to publish if a doc with the
+   same title already exists)
+ - post-publish `store_episodic_memory` write
+
+ skill-replacing it would lose all three unless the skill teaches phi
+ to do them by hand. **kept as a tool**; instead enriched the
+ `publish-blog` skill body to formalize the before/after procedure
+ (list existing first, remember after) so the skill and the tool
+ reinforce each other.
+
+ `list_blog_posts` is a candidate for deletion (read-only, pdsx covers
+ it via `list_records(collection="app.greengale.document")`) but
+ `publish_blog_post` calls it internally and the `publish-blog` skill
+ recommends it for "what should i write about next." marginal benefit;
+ **leaving for now**.
+
+ ## what was done this round
+
+ 1. **renamed `note` → `remember`.** the recall/remember pair makes the
+    read-vs-write distinction obvious and disambiguates from `observe`.
+ 2. **excluded `run_skill_script` from the SkillsToolset.** every skill
+    we ship is documentation-only (markdown + resource files); leaving
+    the script-execution tool registered was extra capability surface
+    phi never used. one fewer tool, free.
+ 3. **enriched `publish-blog` SKILL.md** with the before/after procedure
+    (list existing → publish via tool → optional remember pointer) and
+    a "why a tool plus a skill" section that names the
+    tools-enforce-skills-suggest split explicitly.
+
+ net: 28 tools (after the cosmik deletion) → 28 tools (rename, not
+ removal) — but the surface is now more honest about what each piece
+ is and the publish-blog skill carries the full procedure.
+
+ ## what stays as tools, in case it comes up again
+
+ | category | tools | why they stay |
+ |---|---|---|
+ | posting / engagement (consent layer) | `reply_to`, `post`, `like_post`, `repost_post` | `_build_allowed_handles` consent enforcement; reply-ref construction; grapheme splitting; memory writes after interaction |
+ | owner-gated (like-as-approval) | `follow_user`, `manage_mentionable`, `manage_labels`, `propose_goal_change`, `create_feed`, `delete_feed` | `_is_owner` check at runtime; can't be enforced from a skill prompt |
+ | bounded attention | `observe`, `drop_observation` | active-pool cap-and-archive logic |
+ | private memory | `remember`, `recall` | turbopuffer is not exposed as an MCP; pdsx can't reach it |
+ | reads against external surfaces | `read_timeline`, `read_feed`, `list_feeds`, `search_posts`, `search_network`, `web_search`, `get_trending`, `pub_search`, `check_relays`, `check_services`, `check_urls`, `changelog` | external services with APIs not exposed by pdsx |
+ | structural publishing | `publish_blog_post`, `list_blog_posts` | duplicate-check refusal; episodic memory write after publish |
+
+ ## open work
+
+ - **`/api/abilities`** + **`/api/skills`** are live; the cockpit can
+   switch from hand-curated `web/src/lib/abilities.ts` to fetching from
+   these endpoints. (already done in the most recent UI commit.)
+ - **module reorg from TOOL-SPRAWL.md items 1–9** — the misplacements
+   noted there are still real (e.g. `follow_user` lives in `feeds.py`,
+   `check_urls` lives in `bluesky.py`). independent of the
+   skill-vs-tool question; can be done as a no-behavior-change pass.
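reviewer sketch of the "bounded attention" row: a minimal cap-and-archive pool, assuming only what the doc states (max 5 active, oldest archives when the cap is exceeded). `ObservationPool`, `MAX_ACTIVE`, and the in-memory archive list are hypothetical stand-ins, not the code behind `observe`.

```python
from collections import deque
from dataclasses import dataclass, field

MAX_ACTIVE = 5  # cap taken from the doc ("max 5 active"); the name is hypothetical


@dataclass
class ObservationPool:
    """hypothetical in-memory stand-in for the active pool plus its archive."""

    active: deque = field(default_factory=deque)
    archived: list = field(default_factory=list)

    def observe(self, text: str) -> None:
        # this runs on every call, so the cap holds whether or not the
        # agent remembers any guidance about it
        self.active.append(text)
        while len(self.active) > MAX_ACTIVE:
            # the oldest entry leaves the active pool and goes to the archive
            self.archived.append(self.active.popleft())


pool = ObservationPool()
for i in range(7):
    pool.observe(f"observation {i}")
```

the cap lives inside `observe` itself, which is the sense in which a tool enforces: a skill could only ask the agent to archive by hand.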

+3 -3

TOOL-SPRAWL.md

···

  | module | tools |
  |---|---|
- | `tools/memory.py` | `recall`, `note` |
+ | `tools/memory.py` | `recall`, `remember` |
  | `tools/posting.py` | `reply_to`, `like_post`, `repost_post` |
  | `tools/search.py` | `search_posts`, `search_network`, `web_search`, `get_trending` |
  | `tools/bluesky.py` | `post`, `get_own_posts`, `check_urls`, `manage_labels`, `manage_mentionable`, `check_services`, `check_relays`, `changelog` |
···

  1. **`post` lives in `bluesky.py` but `reply_to` / `like_post` / `repost_post` live in `posting.py`.** these are the same shape of action — write to bluesky. one of those modules can absorb the other.
  2. **`follow_user` is in `feeds.py`.** following is a graph operation, not a feed operation. it has nothing to do with the graze-feeds cluster (`create_feed` / `list_feeds` / `delete_feed` / `read_feed` / `read_timeline`). it should move.
- 3. **`note`, `save_url`, `create_connection` are all "create a cosmik record"** but split across `memory.py` and `cosmik.py`. they should be one cluster.
- 4. **`memory.py` has just `recall` + `note`.** `note` is half memory and half cosmik write — picking one home would be cleaner.
+ 3. ~~**`note`, `save_url`, `create_connection` are all "create a cosmik record"**~~ — partially resolved: cosmik write tools deleted (replaced by cosmik-records skill via pdsx). `note` was actually a private-memory write to turbopuffer, not a cosmik write — that misclassification was the smell. now renamed `remember` to disambiguate from `observe`.
+ 4. ~~**`memory.py` has just `recall` + `note`.**~~ resolved: `note` renamed to `remember`. the recall/remember pair (read/write) is now coherent.
  5. **`manage_labels` and `manage_mentionable` are in `bluesky.py`** but they're operator-only self-management of phi's identity boundaries — they belong with `goals` / `observations` (other operator-gated identity stuff) or in their own `self.py`.
  6. **`check_urls` is in `bluesky.py`.** it's a generic URL HEAD request — nothing bluesky about it.
  7. **`check_services`, `check_relays`, `changelog` are scattered across `bluesky.py`** but they're a coherent monitoring cluster — distinct from posting.

+1 -1

docs/ARCHITECTURE.md

···

  ## one agent, many entry points

- every entry point ends in the same place: `agent.run()` with a `PhiDeps` carrying whatever context the path needs. tool definitions are the same across paths; the system prompt assembles different dynamic blocks based on what's in `PhiDeps`. the agent decides AND acts inside the run via tool calls — `reply_to`, `like_post`, `post`, `note`, `propose_goal_change`, etc. there's no separate decide-then-dispatch layer.
+ every entry point ends in the same place: `agent.run()` with a `PhiDeps` carrying whatever context the path needs. tool definitions are the same across paths; the system prompt assembles different dynamic blocks based on what's in `PhiDeps`. the agent decides AND acts inside the run via tool calls — `reply_to`, `like_post`, `post`, `remember`, `propose_goal_change`, etc. there's no separate decide-then-dispatch layer.

  what changes per path is the user prompt and the deps shape, not the agent.


+1 -1

docs/memory.md

···

  ## 2. private memory (TurboPuffer)

- **source**: extraction agent + phi's `note` tool · **storage**: TurboPuffer vector DB (OpenAI text-embedding-3-small) · **visibility**: private to phi
+ **source**: extraction agent + phi's `remember` tool · **storage**: TurboPuffer vector DB (OpenAI text-embedding-3-small) · **visibility**: private to phi

  ### namespaces


+22 -4

skills/publish-blog/SKILL.md

···
  ---
  name: publish-blog
- description: Publish a long-form post on greengale.app. Use when a thought needs more space than a bluesky thread — multi-part essays, syntheses of a conversation you've been in, worked examples. For single observations use post; for a URL or note as public memory write a network.cosmik.card via pdsx (the cosmik-records skill has the per-record-type schema details); for private notes to your future self use the note tool.
+ description: Publish a long-form post on greengale.app. Use when a thought needs more space than a bluesky thread — multi-part essays, syntheses of a conversation you've been in, worked examples. For single observations use post; for a URL or note as public memory write a network.cosmik.card via pdsx (the cosmik-records skill has the per-record-type schema details); for private notes to your future self use the remember tool.
  ---

  ## structure that's worked
···

  standard capitalization in long-form — readers expect it. lowercase stays for the accompanying bsky post.

- ## gotchas
+ ## procedure
+
+ before publishing:
+
+ 1. call `list_blog_posts` (or `pub_search(author="phi.zzstoatzz.io", platform="greengale")`) to scan your existing post titles. **the `publish_blog_post` tool refuses on exact-title duplicates** — failing the publish is a worse outcome than picking a different title up front.
+ 2. verify any AT-URI you plan to cite via `pdsx.get_record` first. broken rkeys in blog posts are harder to retract than in tweets.
+
+ publishing:

- - verify any AT-URI you cite via `pdsx.get_record` first. broken rkeys in blog posts are harder to retract than in tweets.
- - tag with specific topic words, not meta-categories (`atproto` ✓, `thoughts` ✗).
+ 3. call `publish_blog_post(title, content, tags)`. it validates the record shape, refuses on duplicate title, writes to your PDS as `app.greengale.document`, and returns the public URL.
+
+ after publishing:
+
+ 4. call `remember(content="published blog: <title> — <url>", tags=["blog", "greengale", ...topic_tags])` to leave a private-memory pointer for future-you. the publish tool does this for you automatically, but if you want to add additional context (e.g. a synthesized takeaway you don't want to lose), use `remember` again.
+
+ ## tags
+
+ specific topic words, not meta-categories (`atproto` ✓, `thoughts` ✗). 3–6 tags is plenty.
+
+ ## why a tool plus a skill
+
+ `publish_blog_post` is structural — it enforces the duplicate-title refusal and writes the post-publish episodic memory. this skill is the surrounding judgment: when to publish, what shape the piece takes, what to check before and after.
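reviewer sketch of the split this skill describes, assuming only the exact-title refusal stated in step 1. `publish`, `pick_title`, and `DuplicateTitleError` are hypothetical names for illustration; the real `publish_blog_post` also validates the record and writes to the PDS, which is left out here.

```python
class DuplicateTitleError(ValueError):
    """raised when a post with the same title already exists (hypothetical)."""


def publish(existing_titles: list[str], title: str) -> list[str]:
    # tool side: an exact-match check that always runs before anything is written
    if title in existing_titles:
        raise DuplicateTitleError(f"a post titled {title!r} already exists")
    return existing_titles + [title]


def pick_title(existing_titles: list[str], candidates: list[str]) -> str:
    # skill side: scan the existing titles first, then choose one that is free
    for candidate in candidates:
        if candidate not in existing_titles:
            return candidate
    raise DuplicateTitleError("every candidate title is already taken")


titles = ["on consent layers"]
chosen = pick_title(titles, ["on consent layers", "tools enforce, skills suggest"])
titles = publish(titles, chosen)
```

if the agent skips `pick_title`, `publish` still refuses the duplicate; the skill step only makes that refusal less likely to be hit.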

+10 -1

src/bot/agent.py

···
          # (skill names + descriptions) is injected automatically by the
          # toolset on pydantic-ai>=1.74. Full SKILL.md bodies are loaded on
          # demand via load_skill.
-         self.skills_toolset = SkillsToolset(directories=[settings.skills_dir])
+         #
+         # exclude_tools=['run_skill_script']: every skill we ship is
+         # documentation-only (markdown bodies + resource files). leaving
+         # the script-execution tool registered is extra capability surface
+         # phi never uses — and would silently expose subprocess execution
+         # if someone added a script to a skill folder by accident.
+         self.skills_toolset = SkillsToolset(
+             directories=[settings.skills_dir],
+             exclude_tools=["run_skill_script"],
+         )
          self.graze_client = GrazeClient(
              handle=settings.bluesky_handle, password=settings.bluesky_password
          )
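reviewer sketch of what name-based exclusion amounts to. this is not how `SkillsToolset` works internally (its implementation is not shown in this diff); `build_toolset` and the two lambda tools are hypothetical and only illustrate the effect on the registered surface.

```python
from typing import Callable

Tool = Callable[[], str]


def build_toolset(tools: dict[str, Tool], exclude_tools: list[str]) -> dict[str, Tool]:
    # drop excluded names before registration, so the agent never sees them
    excluded = set(exclude_tools)
    return {name: fn for name, fn in tools.items() if name not in excluded}


# hypothetical stand-ins for the tools a skills toolset might expose
all_tools = {
    "load_skill": lambda: "skill body",
    "run_skill_script": lambda: "subprocess output",
}

registered = build_toolset(all_tools, exclude_tools=["run_skill_script"])
```

filtering at registration time means a script dropped into a skill folder later has no tool that could run it.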

+15 -9

src/bot/tools/memory.py

···
- """Memory tools — private recall and note-taking."""
+ """Memory tools — private recall (read) and remember (write)."""

  from pydantic_ai import RunContext

···
  def register(agent):
      @agent.tool
      async def recall(ctx: RunContext[PhiDeps], query: str, about: str = "") -> str:
-         """Search your private memory. Use to remember past conversations and what you know about specific people.
+         """Search your private memory. Use to find past conversations and what you know about specific people.
          Pass about="@handle" to search a specific user, or leave empty for general private recall.
-         For public network knowledge, use search_network instead."""
+         For public network knowledge, use search_network instead. The write-side companion is `remember`."""
          if not ctx.deps.memory:
              return "memory not available"

···
          return "\n".join(_format_user_results(results, about))

      @agent.tool
-     async def note(
+     async def remember(
          ctx: RunContext[PhiDeps],
          content: str,
          tags: list[str],
          source_uri: str = "",
      ) -> str:
-         """Leave a note for your future self. Stored privately for fast vector recall.
+         """Save something to your private memory for future semantic recall.

-         Pass source_uri when the note is grounded in a specific post, thread,
-         or card you can cite — it makes the note checkable later. Empty is
-         allowed when the thought is purely your own, but cite when you can.
+         Writes to your private vector store (turbopuffer episodic namespace)
+         — searchable later via `recall`, never surfaces back to you on its
+         own. Distinct from `observe`, which puts something in your bounded
+         active-attention pool that re-surfaces in [ACTIVE OBSERVATIONS].
+
+         Pass source_uri when the memory is grounded in a specific post,
+         thread, or card you can cite — it makes it checkable later. Empty
+         is allowed when the thought is purely your own, but cite when you
+         can.
          """
          if ctx.deps.memory:
              sources = [source_uri] if source_uri else None
              await ctx.deps.memory.store_episodic_memory(
                  content, tags, source="tool", source_uris=sources
              )
-             return f"noted — {content[:100]}"
+             return f"remembered — {content[:100]}"
          return "private memory not available"
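reviewer sketch for exercising the renamed tool's control flow without pydantic-ai or turbopuffer. `FakeMemory` is a hypothetical stand-in that only mirrors the `store_episodic_memory` call seen in the diff, and this `remember` takes the memory object directly instead of a `RunContext`.

```python
import asyncio


class FakeMemory:
    """hypothetical stand-in for the real memory backend; keeps records in a list."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    async def store_episodic_memory(self, content, tags, source, source_uris=None):
        self.records.append(
            {"content": content, "tags": tags, "source": source, "source_uris": source_uris}
        )


async def remember(memory, content: str, tags: list[str], source_uri: str = "") -> str:
    # same branches as the tool body in the diff, minus the RunContext wrapper
    if memory:
        sources = [source_uri] if source_uri else None
        await memory.store_episodic_memory(content, tags, source="tool", source_uris=sources)
        return f"remembered — {content[:100]}"
    return "private memory not available"


memory = FakeMemory()
ok = asyncio.run(remember(memory, "x" * 150, ["blog"]))
missing = asyncio.run(remember(None, "anything", ["blog"]))
```

the two calls cover both branches: a stored write whose confirmation is truncated to 100 characters, and the fallback string when no memory backend is configured.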
