fix duplicate scheduled posts: seed schedule from history, fetch more recent posts for dedup

Two related bugs caused phi to post duplicate top-level musings/reflections.

bug 1: restart wipes scheduling state. _last_daily_post and
_last_thought_hours are in-memory only, so every deploy resets them.
After a deploy, phi re-runs today's daily reflection and any scheduled
thought-post hours that already happened, even if those posts were
already made. With multiple deploys per day, phi accumulates
duplicate scheduled posts.

fix 1: at poller startup, fetch phi's recent top-level posts and
seed _last_daily_post / _last_thought_hours from observed history.
Any top-level post made today UTC at or after daily_reflection_hour
marks the daily slot as filled. Any post during a thought_post_hours
hour marks that hour as filled. The heuristic is loose (a non-scheduled
top-level post can also mark a slot as filled), but the worst case is
phi staying quiet when she could have posted, which is the safe
failure mode: silence is fine, double-posting is not.
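
The seeding heuristic can be sketched as a pure function over post timestamps. This is a minimal illustration rather than the code in this commit: `infer_filled_slots` is a hypothetical name, the date is made up, and the config values (`daily_reflection_hour=14`, `thought_post_hours={9, 15}`) are assumptions chosen only to make the example concrete.

```python
from __future__ import annotations

from datetime import date, datetime, timezone


def infer_filled_slots(
    post_times: list[datetime],
    today: date,
    daily_reflection_hour: int,
    thought_post_hours: set[int],
) -> tuple[bool, set[int]]:
    """Infer which of today's schedule slots are already filled.

    Assumes every timestamp is timezone-aware. Returns
    (daily_done, filled_thought_hours).
    """
    daily_done = False
    filled_hours: set[int] = set()
    for ts in post_times:
        ts = ts.astimezone(timezone.utc)  # compare in UTC, like the schedule
        if ts.date() != today:
            continue  # only today's posts can fill today's slots
        if ts.hour >= daily_reflection_hour:
            daily_done = True
        if ts.hour in thought_post_hours:
            filled_hours.add(ts.hour)
    return daily_done, filled_hours


# Hypothetical history: two posts today, one yesterday.
today = date(2025, 1, 1)
posts = [
    datetime(2025, 1, 1, 14, 36, tzinfo=timezone.utc),
    datetime(2025, 1, 1, 14, 3, tzinfo=timezone.utc),
    datetime(2024, 12, 31, 15, 20, tzinfo=timezone.utc),
]
daily_done, hours = infer_filled_slots(
    posts, today, daily_reflection_hour=14, thought_post_hours={9, 15}
)
```

Under these assumed values the daily slot is marked filled, while hour 15 stays open because no post from today landed in it; yesterday's 15:20 post is ignored.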

bug 2: daily_reflection fetched only one recent post (limit=1) for the
dedup check. Phi correctly avoided duplicating her newest post (the
leaf insect reflection) but duplicated an older content-engine post
from 67 minutes earlier because it was outside the dedup window.
Phi's self-summary said "doesn't retread the leaf insect ground":
she checked only the most recent post because that was all she was
given.

fix 2: bump daily_reflection's recent_posts fetch from limit=1 to
limit=10, change process_reflection to accept a list (same shape as
process_musing), and inject every fetched post via recent_activity.
Also bump original_thought from limit=5 to limit=10 for consistency.
Tighten both reflection and musing prompts: "BEFORE posting, scan
ALL of your recent top-level posts ... if any of them already cover
the ground (same topic, same observation, same incident), stay
quiet. rephrasing your own recent post counts as duplicating it."
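
A minimal sketch of how a list of recent posts might be rendered into a single context block for the agent. It follows the shape described above (one bullet per post, each truncated), but `format_recent_posts` is a hypothetical helper, the header text is simplified, and the 300-character default is an assumption mirroring the truncation used in the diff.

```python
from __future__ import annotations


def format_recent_posts(recent_posts: list[str] | None, max_chars: int = 300) -> str:
    """Render recent top-level posts as a bulleted block.

    Returns an empty string when there is nothing to show, so the caller
    can skip the block entirely.
    """
    if not recent_posts:
        return ""
    # One bullet per post, truncated so a long post cannot crowd out the rest.
    bullets = "\n".join(f"- {text[:max_chars]}" for text in recent_posts)
    return "[YOUR RECENT TOP-LEVEL POSTS]:\n" + bullets


block = format_recent_posts(["newest post", "older post"])
```

Passing the whole list rather than a single string is what lets the model compare a draft against older posts as well as the newest one.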

caught while testing the batching refactor: phi made a content-engine
musing at 14:03 and a leaf insect reflection at 14:36, then after a
v0.0.32 deploy at 15:10 ran BOTH the daily reflection and the 15:00
thought post concurrently in the same poll cycle. The musing posted a
new nutella observation; the reflection posted a near-duplicate of the
14:03 content-engine post.

zzstoatzz 2 months ago 9f350be3 848fd08f

+105 -15

3 changed files

+27 -8

src/bot/agent.py

···
         logger.info(f"batch run finished: {summary[:200]}")
         return summary

-    async def process_reflection(self, last_post_text: str | None = None) -> str:
+    async def process_reflection(self, recent_posts: list[str] | None = None) -> str:
         """Generate a daily reflection post from recent memory.

         Side effects (posting) happen via the `post` tool inside the agent run.
         Return value is just a summary string for logging.
+
+        recent_posts is phi's recent top-level posts (most recent first), used
+        by the agent to avoid duplicating themes she's already covered today.
         """
         logger.info("processing daily reflection")

         # Pre-fetch context that doesn't benefit from semantic search against the prompt
-        recent_activity = ""
+        recent_activity_parts: list[str] = []
+
+        # Phi's recent top-level posts — to avoid duplicating themes she's
+        # already covered today. Show as a list so the model can scan for
+        # both the most recent post AND older posts in the same window.
+        if recent_posts:
+            posts_block = "\n".join(f"- {p[:300]}" for p in recent_posts)
+            recent_activity_parts.append(
+                "[YOUR RECENT TOP-LEVEL POSTS — do not repeat any of these themes]:\n"
+                f"{posts_block}"
+            )
+
         if self.memory:
             try:
                 recent_interactions = await self.memory.get_recent_interactions(
···
                             f"- with @{i['handle']}: {i['content'][:150]}"
                         )
                     lines.append("[SAMPLE EXCHANGES]:\n" + "\n".join(exchange_lines))
-                    recent_activity = "\n\n".join(lines)
+                    recent_activity_parts.append("\n\n".join(lines))
                 else:
-                    recent_activity = (
+                    recent_activity_parts.append(
                         "[RECENT ACTIVITY]: no interactions in the last day"
                     )
             except Exception as e:
                 logger.warning(f"failed to get recent interactions for reflection: {e}")
+
+        recent_activity = "\n\n".join(recent_activity_parts)

         service_health = ""
         try:
···
         deps = PhiDeps(
             author_handle="",
             memory=self.memory,
-            last_post_text=last_post_text,
             recent_activity=recent_activity,
             service_health=service_health,
         )
···
             "you have a moment to post a short top-level reflection on your day. "
             "not a thread, not a reply — just something you want to put out there. "
             "use what you know: recent exchanges, things you noticed, or just the fact that you're here. "
-            "if your last post already covers this ground, or you'd just be rehashing the same themes, "
-            "stay quiet — don't post for the sake of posting. "
+            "BEFORE posting, scan ALL of your recent top-level posts in the [YOUR RECENT TOP-LEVEL POSTS] block — "
+            "not just the most recent one. if any of them already cover the ground you were going to cover "
+            "(same topic, same observation, same incident), stay quiet. don't rephrase recent thoughts as if they were new. "
+            "rephrasing your own recent post counts as duplicating it. when in doubt, don't post. "
             "if you do post, use the `post` tool with brief, genuine text — your voice, not a performance."
         )

···
         musing_task = (
             "you have a moment. if something's been on your mind — something you read, "
             "a pattern you noticed, a question that's been sitting with you — share it. "
-            "check your recent posts first. if you'd just be echoing yourself, skip it. "
+            "BEFORE posting, scan ALL of your recent top-level posts in the [YOUR RECENT POSTS] block — "
+            "not just the most recent one. if any of them already cover the ground (same topic, same observation, "
+            "same incident), stay quiet. rephrasing your own recent post counts as duplicating it. "
             "this is your feed; post things you'd want to follow yourself for. "
             "use your tools — search posts, check trending, look things up — if something "
             "sparks your curiosity. but don't force it. if nothing's there, just stay quiet. "

+15 -7

src/bot/services/message_handler.py

···
         with logfire.span("original thought"):
             recent_posts: list[str] = []
             try:
-                feed = await self.client.get_own_posts(limit=5)
+                # Pull 10 recent top-level posts so the musing agent can scan
+                # for duplication across a real history window, not just the
+                # last few posts.
+                feed = await self.client.get_own_posts(limit=10)
                 for item in feed:
                     if hasattr(item.post.record, "text"):
                         recent_posts.append(item.post.record.text)
···
             except Exception as e:
                 logger.warning(f"extraction during reflection failed: {e}")

-            last_post_text = None
+            # Fetch the last 10 top-level posts so the reflection agent can
+            # scan ALL of them for duplication, not just the most recent one.
+            # Earlier this fetched limit=1, which let phi correctly avoid
+            # duplicating her newest post but blindly duplicate older ones.
+            recent_posts: list[str] = []
             try:
-                feed = await self.client.get_own_posts(limit=1)
-                if feed:
-                    last_post_text = feed[0].post.record.text
+                feed = await self.client.get_own_posts(limit=10)
+                for item in feed:
+                    if hasattr(item.post.record, "text"):
+                        recent_posts.append(item.post.record.text)
             except Exception as e:
-                logger.warning(f"failed to fetch last post for reflection: {e}")
+                logger.warning(f"failed to fetch recent posts for reflection: {e}")

             try:
                 summary = await self.agent.process_reflection(
-                    last_post_text=last_post_text
+                    recent_posts=recent_posts or None
                 )
                 logger.info(f"daily reflection: {summary[:200]}")
             except Exception as e:

+63

src/bot/services/notification_poller.py

···
         if self._background_tasks:
             await asyncio.gather(*self._background_tasks, return_exceptions=True)

+    async def _seed_schedule_from_history(self):
+        """Seed scheduling state from phi's recent post history.
+
+        Without this, every restart wipes _last_daily_post and
+        _last_thought_hours, causing phi to re-run today's daily reflection
+        and any thought-post hours that already happened. The fix: at startup,
+        look at phi's recent top-level posts and infer which schedule slots
+        have already been filled today.
+
+        Heuristic (deliberately loose):
+        - any top-level post made today UTC at or after daily_reflection_hour
+          marks the daily reflection slot as already done
+        - any top-level post made today UTC during a thought_post_hours hour
+          marks that hour as already done
+
+        This is approximate — phi makes top-level posts from many contexts
+        besides scheduled reflections (e.g. agent replies that decided to go
+        top-level). But the worst case of being approximate is that phi
+        SKIPS a scheduled post that was actually a reply-shaped post — which
+        is the safe failure mode (silence is fine, double-posting is not).
+        """
+        try:
+            feed = await self.client.get_own_posts(limit=20)
+        except Exception as e:
+            logger.warning(f"failed to seed schedule from history: {e}")
+            return
+
+        today = datetime.now(UTC).date()
+        seeded_daily = False
+        seeded_hours: set[int] = set()
+
+        for item in feed:
+            indexed_at = getattr(item.post, "indexed_at", None)
+            if not indexed_at:
+                continue
+            try:
+                ts = datetime.fromisoformat(indexed_at.replace("Z", "+00:00"))
+            except (ValueError, TypeError):
+                continue
+            if ts.date() != today:
+                continue
+
+            if not seeded_daily and ts.hour >= settings.daily_reflection_hour:
+                self._last_daily_post = ts
+                seeded_daily = True
+
+            if ts.hour in settings.thought_post_hours:
+                seeded_hours.add(ts.hour)
+
+        if seeded_hours:
+            self._last_thought_hours = seeded_hours
+            self._last_thought_date = today
+
+        if seeded_daily or seeded_hours:
+            logger.info(
+                f"seeded schedule from history: "
+                f"daily_done={seeded_daily}, thought_hours={sorted(seeded_hours)}"
+            )
+
     async def _poll_loop(self):
         """Main polling loop."""
         await self.client.authenticate()
+
+        # Restore scheduling state from observed post history so deploys
+        # don't cause duplicate scheduled posts.
+        await self._seed_schedule_from_history()

         while self._running:
             try: