add spam detection to exploration: mute + skip storage for bot farms

+1 -1

loq.toml

···
  [[rules]]
  path = "src/bot/main.py"
- max_lines = 826
+ max_lines = 846

+106

notes/spam-handling-proposal.md

···
+ # problem: exploration stores detailed profiles of spam accounts
+
+ ## what's happening
+
+ phi's exploration pipeline researches unfamiliar accounts that appear in notifications or the For You feed. when it encounters a reply-spammer or content farm, it dutifully stores 5 detailed findings in turbopuffer — same as it would for a genuine person.
+
+ example from today: phi explored `coachchron.com` and stored 5 embeddings about their coaching brand, newsletter, pinned post, and reply-farming patterns. the exploration agent's own summary said "mass reply-farming, 25 replies within ~30 minutes, likely automated." phi correctly identified them as spam and then carefully filed away everything it learned about them.
+
+ this is wasteful in two ways:
+ - **storage cost**: 5 embeddings in turbopuffer for an account phi should never think about again
+ - **recall pollution**: those observations surface in future context when coachchron appears in a notification, burning embedding queries and attention budget on spam
+
+ ## how exploration currently works
+
+ 1. curiosity queue produces a work item (e.g. `explore_handle: coachchron.com`)
+ 2. exploration agent runs with MCP tools — reads profile, posts, publications
+ 3. agent returns `ExplorationResult`: up to 5 `ExplorationFinding`s, up to 2 follow-ups, a summary string
+ 4. `process_exploration()` iterates all findings and stores each one unconditionally — per-user namespace if `target_handle` is set, episodic memory otherwise
+ 5. queue item marked completed
+
+ there is no decision point between "the agent assessed this person" and "we store everything." the agent already reaches the right conclusion ("likely automated") but the pipeline doesn't act on it.
+
+ ## relevant existing infrastructure
+
+ - `client.mute(actor: str) -> bool` — atproto SDK method, suppresses account from notifications and feeds. no unmute-on-restart risk since mute is a server-side record.
+ - `ExplorationResult.summary` — free-text field where the agent already writes assessments like "mass reply-farming." currently used only for logging.
+ - exploration prompt already says "if you find nothing worth noting, return empty findings." the agent doesn't apply this to spam accounts because it isn't told to.
+
+ ## proposed design
+
+ **the exploration agent should decide whether an account is worth remembering.** it already does the research and reaches a conclusion — the pipeline just needs to respect that conclusion.
+
+ three changes:
+
+ ### 1. add `mute_subject: bool` to ExplorationResult
+
+ ```python
+ class ExplorationResult(BaseModel):
+     findings: list[ExplorationFinding] = Field(...)
+     follow_ups: list[dict] = Field(...)
+     summary: str = Field(...)
+     mute_subject: bool = Field(
+         default=False,
+         description="true if the subject is a spammer, bot farm, or content engine "
+         "not worth tracking. findings should be empty when this is true.",
+     )
+ ```
+
+ this is a structured signal from the agent, not a heuristic. the agent has already seen the profile, posts, and patterns — it's making a judgment call with evidence.
+
+ ### 2. update exploration prompt
+
+ add to the exploration system prompt:
+
+ ```
+ if the subject is a spammer, bot farm, or automated content engine — set mute_subject=true
+ and return empty findings. don't store detailed observations about accounts that aren't genuine.
+ the threshold is high: replying a lot is not spam. 25 generic replies in 30 minutes to strangers'
+ threads is.
+ ```
+
+ ### 3. update process_exploration to act on the signal
+
+ in `process_exploration()`, after the agent returns:
+
+ ```python
+ if output.mute_subject and kind == "explore_handle":
+     # mute so they don't appear in notifications/feeds again
+     try:
+         resolved = bot_client.client.resolve_handle(subject)
+         bot_client.client.mute(resolved.did)
+     except Exception as e:
+         logger.warning(f"failed to mute {subject}: {e}")
+
+     # store one line, not five — just enough to know we already dealt with them
+     if self.memory:
+         await self.memory.store_episodic_memory(
+             content=f"muted @{subject} — {output.summary[:150]}",
+             tags=["muted", "spam"],
+             source="exploration",
+         )
+
+     await complete(rkey)
+     return 0  # no findings stored, intentionally
+ ```
+
+ when `mute_subject` is false, the existing flow is unchanged — findings stored as before.
+
+ ## what this gets right
+
+ - **decision lives in the agent**, not in a heuristic. the same model that researches the account decides whether it's worth remembering. this is where the judgment should be — after seeing the evidence.
+ - **uses the platform's social tools**. mute is the correct atproto primitive for "i don't want to hear from this account." it's server-side, survives restarts, and is reversible (unmute exists).
+ - **one embedding instead of five** for spam accounts. enough to know "we already handled this" without detailed recall.
+ - **high threshold is built into the prompt**, not a numeric cutoff. "replying a lot is not spam. 25 generic replies in 30 minutes is."
+
+ ## what could go wrong
+
+ - **false positive mutes**: phi mutes a genuine person who just happened to be noisy. mitigation: the threshold language in the prompt is deliberately conservative, and mute is reversible — the operator can unmute via the control API or directly.
+ - **mute accumulation**: over time phi mutes hundreds of accounts. this is probably fine — mute lists are lightweight on the PDS — but worth monitoring.
+ - **agent doesn't use the field**: the model might never set `mute_subject=true` because it's cautious. this is the safe failure mode — worst case is the status quo (5 findings stored for spammers).
+
+ ## alternatives considered
+
+ - **post-exploration classifier**: a separate model or heuristic that reviews findings and decides whether to keep them. rejected because the exploration agent already has the context — adding a second pass is overhead for a decision that should happen at the source.
+ - **disposition enum (spam/genuine/unclear)**: more structured than a boolean, but the only actionable disposition is "mute." genuine and unclear both result in the same behavior (store findings). a three-way enum would be modeling a distinction without a difference.
+ - **block instead of mute**: block is stronger (prevents the account from seeing phi's posts) and creates a public record. mute is private and sufficient — phi just needs to stop hearing from them, not make a public statement.
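The decision point the proposal adds between the agent's assessment and storage can be looked at in isolation. The following is a minimal sketch, not the project's code: `Finding`, `Result`, and `plan_storage` are hypothetical stand-ins (plain dataclasses instead of the pydantic models), and the marker text is simplified.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """Hypothetical stand-in for ExplorationFinding."""
    content: str


@dataclass
class Result:
    """Hypothetical stand-in for ExplorationResult."""
    findings: list[Finding] = field(default_factory=list)
    summary: str = ""
    mute_subject: bool = False


def plan_storage(result: Result, kind: str, subject: str) -> list[str]:
    """Return the memory entries one exploration would write."""
    if result.mute_subject and kind == "explore_handle":
        # one short marker instead of up to five findings
        return [f"muted @{subject}: {result.summary[:150]}"]
    # unchanged path: every finding is stored
    return [f.content for f in result.findings]
```

Note that the mute branch only applies to `explore_handle` work items; for any other kind the flag is ignored and findings are stored as before.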

+26

src/bot/agent.py

···
          output = result.output
          logger.info(f"exploration result: {output.summary}")
  
+         # handle mute decisions — skip detailed storage, mute the account
+         if output.mute_subject and kind == "explore_handle":
+             logger.info(f"muting @{subject}: {output.mute_reason}")
+             try:
+                 await bot_client.authenticate()
+                 resolved = bot_client.client.resolve_handle(subject)
+                 bot_client.client.mute(resolved.did)
+             except Exception as e:
+                 logger.warning(f"failed to mute {subject}: {e}")
+             # store one user-scoped marker so is_stranger() sees it
+             if self.memory:
+                 reason = output.mute_reason or output.summary[:150]
+                 evidence = (
+                     f" [evidence: {', '.join(output.mute_evidence)}]"
+                     if output.mute_evidence
+                     else ""
+                 )
+                 await self.memory.store_exploration_note(
+                     handle=subject,
+                     content=f"muted — {reason}{evidence}",
+                     tags=["muted", "spam"],
+                     evidence_uris=output.mute_evidence,
+                 )
+             await complete(rkey)
+             return 0
+ 
          total_stored = 0
  
          # store findings
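The marker text built in this hunk is easier to check as a pure function. This is a sketch only: `format_mute_note` is a hypothetical helper that does not exist in the diff; it mirrors the reason fallback and evidence suffix shown above.

```python
def format_mute_note(reason: str, summary: str, evidence: list[str]) -> str:
    """Build the one-line marker stored for a muted account."""
    # fall back to a truncated summary when the agent gave no explicit reason
    why = reason or summary[:150]
    # the evidence suffix is only added when at least one URI was cited
    suffix = f" [evidence: {', '.join(evidence)}]" if evidence else ""
    # \u2014 is the dash used in the diff's marker string
    return f"muted \u2014 {why}{suffix}"
```

One thing this makes visible: the evidence URIs end up both inside `content` and in the separate `evidence_uris` argument, which may or may not be intended.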

+18

src/bot/exploration.py

···
          default="",
          description="brief log-friendly summary of what was explored",
      )
+     mute_subject: bool = Field(
+         default=False,
+         description="true if the subject is a spammer, bot farm, or content engine "
+         "not worth tracking. findings should be empty when this is true.",
+     )
+     mute_reason: str = Field(
+         default="",
+         description="when mute_subject is true, why — e.g. 'reply spammer, "
+         "25 generic replies in 30 minutes to strangers' threads'",
+     )
+     mute_evidence: list[str] = Field(
+         default_factory=list,
+         description="AT-URIs or URLs supporting the mute decision",
+     )
  
  
  EXPLORATION_SYSTEM_PROMPT = """\
···
  - max 5 findings per exploration. quality over quantity.
  - max 2 follow_ups — only if something genuinely interesting branches off.
  - if you find nothing worth noting, return empty findings with a summary explaining why.
+ - if the subject is a spammer, bot farm, or automated content engine: set mute_subject=true,
+   explain in mute_reason, cite evidence in mute_evidence, and return empty findings.
+   the threshold is high: replying a lot is not spam. 25 generic replies in 30 minutes
+   to strangers' threads is.
  
  Tools available:
  - list_records / get_record: read atproto records (profiles, posts)
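The rule "findings should be empty when this is true" lives only in a field description, so nothing on the model itself enforces it. A possible defensive check is sketched below, using a plain dataclass to stay dependency-free; `MuteAwareResult` is a hypothetical name, and in the real pydantic model this would more likely be a validator. The `agent.py` branch already skips findings on the mute path, so this is optional hardening rather than a fix.

```python
from dataclasses import dataclass, field


@dataclass
class MuteAwareResult:
    """Hypothetical stand-in for ExplorationResult with the three new fields."""
    findings: list[str] = field(default_factory=list)
    summary: str = ""
    mute_subject: bool = False
    mute_reason: str = ""
    mute_evidence: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.mute_subject:
            # enforce "findings should be empty when this is true"
            self.findings = []
```

A result built with `mute_subject=True` then carries no findings regardless of what the agent returned, while the reason and evidence are left untouched.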

+20

src/bot/main.py

···
      return {"triggered": True}
  
  
+ @app.post("/api/control/unmute")
+ async def unmute_account(request: Request):
+     """Unmute an account by handle."""
+     if err := _check_control_token(request):
+         return err
+     body = await request.json()
+     handle = body.get("handle", "")
+     if not handle:
+         return JSONResponse({"error": "handle required"}, status_code=400)
+     try:
+         await bot_client.authenticate()
+         resolved = bot_client.client.resolve_handle(handle)
+         bot_client.client.unmute(resolved.did)
+         logger.info(f"unmuted @{handle}")
+         return {"unmuted": handle}
+     except Exception as e:
+         logger.error(f"failed to unmute @{handle}: {e}")
+         return JSONResponse({"error": str(e)}, status_code=500)
+ 
+ 
  @app.get("/status", response_class=HTMLResponse)
  async def status_page():
      """Status page."""
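For an operator reversing a false-positive mute, a request to this endpoint might be built as follows. This is a sketch under assumptions: the diff does not show how `_check_control_token` reads the token, so the `Authorization: Bearer` header is a guess, and `build_unmute_request` and the base URL are hypothetical. Only the path and the `{"handle": ...}` body come from the diff.

```python
import json
import urllib.request


def build_unmute_request(base_url: str, token: str, handle: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the unmute control endpoint."""
    body = json.dumps({"handle": handle}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/control/unmute",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: the real header checked by _check_control_token is not shown
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it. Per the handler, a missing handle yields a 400 and a failed resolve or unmute yields a 500 with the error string.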
