batch notification handling: one cognitive event per poll, action via tools

The unit of work is no longer "a notification"; it's "the set of new
notifications since the last poll." When phi polls and sees activity,
she now gets the entire batch in one agent run, looks at it as a whole,
and decides what (if anything) to do about each item. Multiple replies
in one thread by the same author no longer produce N independent
agent runs and N redundant responses.

The action layer also moved off structured output. Previously the
agent returned `Response(action="reply", text="...", reason="...")`
and the handler interpreted that to call create_post. Now the agent
calls trusted posting tools directly: reply_to / like_post / repost_post
(new) and the existing `post` tool for top-level musings/reflections.
The structured-output layer was load-bearing for mention-consent
allowlists, reply-ref construction, memory writes, status metrics,
and grapheme-aware splitting — all of those concerns moved into the
tool implementations themselves, so nothing was lost.

what changed:
agent.py
- main agent's output_type: Response -> str (just a summary for logs)
- new _format_notifications_block(): renders the batch as a [NEW NOTIFICATIONS]
  block for the system prompt (posts grouped by thread, plus an engagement list)
- new inject_notifications dynamic system prompt
- inject_user_memory rewritten to walk notifications_context and build
  per-author memory blocks (deduped, core memory included once)
- inject_episodic uses notification post texts as the query when present,
  falls back to user prompt for musing/reflection
- inject_author_lookup -> inject_author_lookups (dict of stranger blocks)
- process_mention -> process_notifications(notifications_context, ...)
- process_musing / process_reflection updated to drop Response interpretation
- operational instructions: replaced "indicate via structured output" with
  trusted-posting-tool docs and "treat author chains in one thread as one
  logical message"
- removed Response class entirely
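
As a rough illustration of the thread-grouped rendering described above, here is a minimal sketch. The context shape (`posts`, `engagements`, `thread_root`, `kind`) and the exact output layout are assumptions made for the example, not the real structure in agent.py.

```python
from collections import defaultdict


def format_notifications_block(ctx: dict) -> str:
    """Render a batch as one [NEW NOTIFICATIONS] block, posts grouped by thread."""
    threads = defaultdict(list)
    for post in ctx.get("posts", []):
        # Hypothetical field: thread_root identifies the conversation.
        threads[post["thread_root"]].append(post)

    lines = ["[NEW NOTIFICATIONS]"]
    for root, posts in threads.items():
        lines.append(f"thread {root}:")
        for p in posts:
            # A chain by one author shows up as adjacent lines in one thread.
            lines.append(f"  @{p['author']} ({p['uri']}): {p['text']}")

    engagements = ctx.get("engagements", [])
    if engagements:
        lines.append("engagement:")
        for e in engagements:
            lines.append(f"  @{e['author']} {e['kind']} {e['uri']}")
    return "\n".join(lines)
```

With a two-post chain from one author, both posts land under a single thread heading, which is what lets the agent treat them as one logical message.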
tools/posting.py (new)
- reply_to(uri, text): wraps bot_client.create_post with mention-consent
  allowlist, reply-ref construction, memory writes, status recording.
  refuses URIs not in current notifications_context.
- like_post(uri): wraps bot_client.like_post. context-validated.
- repost_post(uri): wraps bot_client.repost. context-validated.
- (top-level posts still go through the existing `post` tool in bluesky.py)
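
The context validation can be pictured roughly as below. This is a sketch under stated assumptions: `FakeClient` stands in for bot_client (its `create_post` signature here is made up), the context is assumed to hold a `posts` list of entries with a `uri` key, and returning a refusal string rather than raising is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeClient:
    """Hypothetical stand-in for bot_client; records calls instead of posting."""
    sent: list = field(default_factory=list)

    def create_post(self, text: str, reply_to: str) -> None:
        self.sent.append((reply_to, text))


def known_uris(ctx: dict | None) -> set[str]:
    """Collect the post URIs the agent was actually shown in this batch."""
    if not ctx:
        return set()
    return {entry["uri"] for entry in ctx.get("posts", [])}


def reply_to(client: FakeClient, ctx: dict | None, uri: str, text: str) -> str:
    # Refuse anything outside the current batch so the agent cannot
    # reply to a URI it hallucinated or remembered from elsewhere.
    if uri not in known_uris(ctx):
        return f"refused: {uri} is not in the current notifications"
    client.create_post(text, reply_to=uri)
    return f"replied to {uri}"
```

The real tool also handles the allowlist, reply refs, memory writes and status recording; those are left out here to keep the validation step visible.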
tools/_helpers.py (PhiDeps)
- new notifications_context: dict | None, populated by the handler before agent.run
- author_lookup -> author_lookups: dict[handle, str] | None, for batched lookups
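
For orientation, the two fields could look something like this. Only the field names and types come from the list above; modelling PhiDeps as a dataclass, the `None` defaults, and the sample values are assumptions, and the real class carries other fields not shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PhiDeps:
    # Set by the handler before agent.run; None for musing/reflection runs.
    notifications_context: dict | None = None
    # Handle -> pre-rendered lookup block for authors phi has not met.
    author_lookups: dict[str, str] | None = None


# Illustrative values only.
deps = PhiDeps()
deps.notifications_context = {"posts": [], "engagements": []}
deps.author_lookups = {"stranger.example": "profile summary text"}
```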
services/message_handler.py
- replaced _handle_post / _handle_engagement / _handle_follow / handle_notification
  with one handle_batch(notifications) method
- handle_batch: filters rate-limited authors, builds notifications_context with
  per-notification context (via _build_post_entry / _build_engagement_entry /
  _build_follow_entry), eagerly looks up unfamiliar authors deduped by handle,
  calls process_notifications once
- original_thought / daily_reflection updated for str return; posting happens
  via the `post` tool inside the agent run
- explore() unchanged (process_exploration still returns int)
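
The handle_batch flow above can be sketched as a free function. Everything passed in (`is_rate_limited`, `lookup_author`, `known_authors`, `run_agent`) is a hypothetical stand-in for the handler's real collaborators, and the bucketing by notification reason is a simplification of the `_build_*_entry` helpers.

```python
import asyncio

# Assumed mapping from notification reason to context bucket.
POST_REASONS = {"mention", "reply", "quote"}


async def handle_batch(notifications, is_rate_limited, lookup_author,
                       known_authors, run_agent):
    # 1. Drop notifications from rate-limited authors.
    allowed = [n for n in notifications if not is_rate_limited(n["author"])]
    if not allowed:
        return None  # nothing left, so the agent run is skipped

    # 2. Build one context for the whole batch.
    ctx = {"posts": [], "engagements": [], "follows": []}
    for n in allowed:
        if n["reason"] in POST_REASONS:
            ctx["posts"].append(n)
        elif n["reason"] == "follow":
            ctx["follows"].append(n)
        else:
            ctx["engagements"].append(n)

    # 3. Look up each unfamiliar author once, however many posts they sent.
    strangers = {n["author"] for n in allowed} - set(known_authors)
    lookups = {h: await lookup_author(h) for h in sorted(strangers)}

    # 4. One agent run for the batch.
    return await run_agent(ctx, lookups)
```

Passing the collaborators in as arguments keeps the sketch testable without a network client; the real method presumably reaches them through `self`.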
services/notification_poller.py
- _check_notifications dispatches one task per poll for the whole batch,
  not one per notification
- _handle_with_semaphore -> _handle_batch_with_semaphore
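
A minimal sketch of the one-task-per-poll dispatch, assuming an asyncio-based poller. The `Poller` class, its constructor arguments, and the `dispatch` method are hypothetical; only the `_handle_batch_with_semaphore` name comes from the change list.

```python
import asyncio


class Poller:
    def __init__(self, handle_batch, max_concurrent: int = 1):
        self._handle_batch = handle_batch
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._tasks = set()  # keep references so tasks are not garbage-collected

    async def _handle_batch_with_semaphore(self, batch):
        # Bound how many batches are processed at the same time.
        async with self._semaphore:
            await self._handle_batch(batch)

    def dispatch(self, new_notifications: list) -> None:
        """Start one task for the whole batch, not one per notification."""
        if not new_notifications:
            return
        task = asyncio.create_task(
            self._handle_batch_with_semaphore(new_notifications)
        )
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
```

`dispatch` has to be called from inside a running event loop, since `asyncio.create_task` requires one.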
evals/conftest.py
- defines a local Response model for the eval test agents (which still want
  structured output for assertion convenience). production no longer has a
  Response class, so the old import was broken.
tests/test_rate_limiting.py
- updated TestMessageHandlerRateLimiting to assert on handle_batch behavior:
  rate-limited authors get filtered out of the batch, and if nothing remains
  after filtering, the agent run is skipped entirely.

what stays the same:
- extraction agent (result-shaped, ExtractionResult unchanged)
- exploration agent (result-shaped, ExplorationResult unchanged)
- personality file
- memory layer
- MCP toolsets
- mention-consent allowlist logic (now lives inside the trusted tools)

prompted by the duplicate-reply incident: phi got a 2-post chain from
devlog, processed each post as a separate mention notification, and
posted two separate replies. the operator pointed out that the framing of
"one notification = one task" was the bug: the unit of work should be
"the conversation phi is currently looking at," and the tool layer is
the natural place for the agent to decide where and how to act.