a digital entity named phi that roams bsky phi.zzstoatzz.io

fix: resolve facet link URIs so phi sees full URLs, not truncated display text

bluesky truncates long URLs in post display text (e.g. "example.com/long...")
but stores the actual URI in the facet. phi was only reading post.record.text,
so it never saw the real link — causing it to guess slugs in a loop.

add resolve_facet_links() that splices facet URIs back into the text at
their byte offsets. used in both _handle_post (mention text) and
describe_post (thread context).

reverts all agent.py changes from v0.0.13/v0.0.14 — those were treating
downstream symptoms, not the root cause.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
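
The splice described in the commit message can be sketched in isolation. This is a minimal illustration, not the code from this commit: `splice_links` and the `(byte_start, byte_end, uri)` tuples are hypothetical stand-ins for the facet objects. It shows the two details that matter: offsets index UTF-8 bytes rather than characters, and ranges are applied right-to-left so earlier offsets stay valid.

```python
def splice_links(text: str, links: list[tuple[int, int, str]]) -> str:
    """Replace each (byte_start, byte_end) range of `text` with its URI.

    Hypothetical helper for illustration; `links` stands in for link facets.
    """
    data = text.encode("utf-8")
    # Apply right-to-left so a splice never shifts offsets still to be used.
    for start, end, uri in sorted(links, reverse=True):
        data = data[:start] + uri.encode("utf-8") + data[end:]
    return data.decode("utf-8")


text = "café: example.com/x... and example.com/y..."
raw = text.encode("utf-8")
first = raw.index(b"example.com/x...")
second = raw.index(b"example.com/y...")
links = [
    (first, first + len(b"example.com/x..."), "https://example.com/x-full"),
    (second, second + len(b"example.com/y..."), "https://example.com/y-full"),
]

result = splice_links(text, links)
print(result)

# The character index and the byte index of the first link differ.
print(text.index("example.com"), first)
```

Here "é" occupies two bytes, so the first link starts at character 6 but byte 7; slicing the `str` directly with facet offsets would cut in the wrong place.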

+174 -24
+3 -19
src/bot/agent.py
···
     """Build operational instructions with the current owner handle interpolated."""
     return f"""
 indicate your response action via the structured output — do not use atproto tools to post, like, or repost directly.
-when sharing URLs you found yourself (not ones the user gave you), you can verify them with check_urls. always include https://. never call check_urls more than once per URL — accept the result.
+when sharing URLs, verify them with check_urls first and always include https://.

 you receive all notification types — mentions, replies, quotes, likes, reposts, and follows.
 for mentions, replies, and quotes: someone is talking to you or about you. respond if you have something to say.
···
 only use this during daily reflection, or when someone explicitly asks about infrastructure, services, or uptime of specific apps.
 if something is down, post about it and tag @{settings.owner_handle}.

-IMPORTANT: you have a limited tool call budget per message. do not call the same tool repeatedly with the same or similar arguments — accept the result you get. never paginate through list_records repeatedly. if you need more data than one call returns, work with what you have. if you cannot fulfill a request with the tools you have (e.g. reading a web page), say so instead of looping.
+IMPORTANT: never paginate through list_records repeatedly. if you need more data than one call returns, work with what you have. endless pagination wastes your request budget and produces no response.
 """.strip()


···
             except Exception as e:
                 return f"failed to post: {e}"

-        self._checked_urls: set[str] = set()
-
         @self.agent.tool
         async def check_urls(ctx: RunContext[PhiDeps], urls: list[str]) -> str:
-            """Check whether URLs are reachable (HEAD request only — cannot read page content). If a URL 404s, do not guess alternative slugs — tell the user you can't find it."""
-
-            # if we've already checked many URLs, the model is probably guessing slugs
-            new_urls = [u for u in urls if u not in self._checked_urls]
-            if not new_urls and self._checked_urls:
-                return "already checked these URLs. stop guessing and respond with what you know."
-            if len(self._checked_urls) > 10:
-                return (
-                    f"you've already checked {len(self._checked_urls)} URLs in this conversation. "
-                    "stop guessing slugs — you cannot browse the web. tell the user you "
-                    "couldn't find the exact page and share what you found instead."
-                )
+            """Check whether URLs are reachable. Use this before sharing links to verify they actually work. Accepts full URLs (https://...) or bare domains (example.com/path)."""

             async def _check(client: httpx.AsyncClient, url: str) -> str:
                 if not url.startswith(("http://", "https://")):
                     url = f"https://{url}"
-                self._checked_urls.add(url)
                 try:
                     hostname = urlparse(url).hostname
                     if not hostname:
···
     ) -> Response:
         """Process a mention with structured memory context."""
         logger.info(f"processing mention from @{author_handle}: {mention_text[:80]}")
-        self._checked_urls.clear()

         deps = PhiDeps(
             author_handle=author_handle,
···
     async def process_reflection(self, last_post_text: str | None = None) -> Response:
         """Generate a daily reflection post from recent memory."""
         logger.info("processing daily reflection")
-        self._checked_urls.clear()

         # Pre-fetch context that doesn't benefit from semantic search against the prompt
         recent_activity = ""
+7 -2
src/bot/services/message_handler.py
···
 from bot.agent import PhiAgent
 from bot.core.atproto_client import BotClient
 from bot.status import bot_status
-from bot.utils.thread import build_thread_context, describe_embed, extract_image_urls
+from bot.utils.thread import (
+    build_thread_context,
+    describe_embed,
+    extract_image_urls,
+    resolve_facet_links,
+)

 logger = logging.getLogger("bot.handler")

···
             return

         post = posts.posts[0]
-        mention_text = post.record.text
+        mention_text = resolve_facet_links(post.record)
         author_handle = post.author.handle

         # Include embed content (images, links, quote posts) in the mention
+52 -3
src/bot/utils/thread.py
···
 from collections.abc import Callable


+def resolve_facet_links(record) -> str:
+    """Return post text with truncated link display text replaced by actual URIs from facets.
+
+    Bluesky truncates long URLs in the display text (e.g. "example.com/long-path..."
+    but stores the full URI in the facet. This walks facets right-to-left and splices
+    the real URI back into the text so downstream consumers see the full link.
+    """
+    text = getattr(record, "text", "") or ""
+    facets = getattr(record, "facets", None)
+    if not facets:
+        return text
+
+    # collect link facets with byte ranges
+    link_facets = []
+    for facet in facets:
+        index = getattr(facet, "index", None)
+        features = getattr(facet, "features", None) or []
+        if not index:
+            continue
+        for feature in features:
+            py_type = getattr(feature, "py_type", "")
+            if "link" in py_type:
+                uri = getattr(feature, "uri", "")
+                if uri:
+                    link_facets.append(
+                        (
+                            getattr(index, "byte_start", 0),
+                            getattr(index, "byte_end", 0),
+                            uri,
+                        )
+                    )
+
+    if not link_facets:
+        return text
+
+    # sort by byte_start descending so replacements don't shift earlier offsets
+    link_facets.sort(key=lambda x: x[0], reverse=True)
+
+    encoded = text.encode("utf-8")
+    for start, end, uri in link_facets:
+        encoded = encoded[:start] + uri.encode("utf-8") + encoded[end:]
+
+    return encoded.decode("utf-8")
+
+
 def describe_embed(embed) -> str | None:
     """Extract a human-readable description from a post embed.

···
 def describe_post(post) -> str:
     """Build a full text representation of a post including embeds."""
     handle = post.author.handle
-    text = post.record.text if hasattr(post.record, "text") else ""
+    text = resolve_facet_links(post.record) if hasattr(post.record, "text") else ""

     # Check for embeds on the post view (post.embed) or record (post.record.embed)
     embed_desc = None
···
         embed_desc = describe_embed(post.record.embed)

     if embed_desc:
-        return f"@{handle}: {text}\n {embed_desc}" if text else f"@{handle}: {embed_desc}"
+        return (
+            f"@{handle}: {text}\n {embed_desc}" if text else f"@{handle}: {embed_desc}"
+        )
     return f"@{handle}: {text}" if text else f"@{handle}: [no text]"


···

     # Traverse parent chain (moving up the thread)
     if include_parent and hasattr(thread_node, "parent") and thread_node.parent:
-        traverse_thread(thread_node.parent, visit, include_parent=True, include_replies=False)
+        traverse_thread(
+            thread_node.parent, visit, include_parent=True, include_replies=False
+        )

     # Traverse replies (moving down the thread)
     if include_replies and hasattr(thread_node, "replies") and thread_node.replies:
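
The splice in `resolve_facet_links` trusts that facet byte ranges are in bounds, non-empty, and non-overlapping; a range that cuts through a multi-byte character could make the final `decode` raise. The sketch below is a hypothetical defensive variant, not part of this commit. `splice_links_safely` and its tuple input are assumptions made for illustration; it skips suspicious ranges and decodes leniently.

```python
def splice_links_safely(text: str, links: list[tuple[int, int, str]]) -> str:
    """Splice URIs into `text` by UTF-8 byte range, skipping malformed ranges.

    Hypothetical helper; `links` holds (byte_start, byte_end, uri) tuples.
    """
    data = text.encode("utf-8")
    # Lowest original offset already rewritten; ranges must end at or before it.
    limit = len(data)
    for start, end, uri in sorted(links, reverse=True):
        if not (0 <= start < end <= limit):
            continue  # out of bounds, empty, or overlapping a later range
        data = data[:start] + uri.encode("utf-8") + data[end:]
        limit = start
    # Replace undecodable bytes instead of raising on a mid-character cut.
    return data.decode("utf-8", errors="replace")


ok = splice_links_safely(
    "see link...",
    [(4, 11, "https://example.com/full"), (50, 60, "https://example.com/bogus")],
)
overlap = splice_links_safely("abcdefgh", [(0, 5, "A"), (3, 8, "B")])
print(ok)
print(overlap)
```

Whether silently skipping a bad facet is preferable to failing loudly depends on how much the caller trusts the incoming records.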
+112
tests/test_resolve_facets.py
···
+"""Regression tests for resolve_facet_links — ensures phi sees full URLs from facets."""
+
+from types import SimpleNamespace
+
+from bot.utils.thread import resolve_facet_links
+
+
+def _make_record(text, facets=None):
+    return SimpleNamespace(text=text, facets=facets)
+
+
+def _make_facet(byte_start, byte_end, uri):
+    return SimpleNamespace(
+        index=SimpleNamespace(byte_start=byte_start, byte_end=byte_end),
+        features=[SimpleNamespace(py_type="app.bsky.richtext.facet#link", uri=uri)],
+    )
+
+
+def test_no_facets():
+    record = _make_record("hello world")
+    assert resolve_facet_links(record) == "hello world"
+
+
+def test_none_facets():
+    record = _make_record("hello world", facets=None)
+    assert resolve_facet_links(record) == "hello world"
+
+
+def test_truncated_url_replaced():
+    """The exact bug from trace 019d5004 — bluesky truncated the URL in display text."""
+    text = (
+        "cool. onto something else\n\n"
+        "can you read this: www.letta.com/blog/context...\n\n"
+        "and tell me whether there's anything interesting as far as your M.O.?"
+    )
+    # byte offsets for "www.letta.com/blog/context..." in the text
+    encoded = text.encode("utf-8")
+    start = encoded.index(b"www.letta.com/blog/context...")
+    end = start + len(b"www.letta.com/blog/context...")
+
+    record = _make_record(
+        text,
+        facets=[
+            _make_facet(start, end, "https://www.letta.com/blog/context-constitution")
+        ],
+    )
+
+    result = resolve_facet_links(record)
+    assert "https://www.letta.com/blog/context-constitution" in result
+    assert "context..." not in result
+
+
+def test_multiple_links():
+    text = "check out link1... and link2..."
+    encoded = text.encode("utf-8")
+    s1 = encoded.index(b"link1...")
+    e1 = s1 + len(b"link1...")
+    s2 = encoded.index(b"link2...")
+    e2 = s2 + len(b"link2...")
+
+    record = _make_record(
+        text,
+        facets=[
+            _make_facet(s1, e1, "https://example.com/link1-full"),
+            _make_facet(s2, e2, "https://example.com/link2-full"),
+        ],
+    )
+
+    result = resolve_facet_links(record)
+    assert "https://example.com/link1-full" in result
+    assert "https://example.com/link2-full" in result
+    assert "link1..." not in result
+    assert "link2..." not in result
+
+
+def test_mention_facet_ignored():
+    """Mention facets should not affect the text."""
+    text = "hey @someone check this"
+    record = _make_record(
+        text,
+        facets=[
+            SimpleNamespace(
+                index=SimpleNamespace(byte_start=4, byte_end=12),
+                features=[
+                    SimpleNamespace(
+                        py_type="app.bsky.richtext.facet#mention",
+                        did="did:plc:abc",
+                    )
+                ],
+            )
+        ],
+    )
+    assert resolve_facet_links(record) == text
+
+
+def test_unicode_text_byte_offsets():
+    """Facet byte offsets are in UTF-8 bytes, not characters."""
+    # emoji is 4 bytes in UTF-8
+    text = "\U0001f600 see link..."
+    encoded = text.encode("utf-8")
+    start = encoded.index(b"link...")
+    end = start + len(b"link...")
+
+    record = _make_record(
+        text,
+        facets=[_make_facet(start, end, "https://example.com/full-link")],
+    )
+
+    result = resolve_facet_links(record)
+    assert "https://example.com/full-link" in result
+    assert "link..." not in result
+    assert result.startswith("\U0001f600")