search for standard sites pub-search.waow.tech
search zig blog atproto

refactor: collapse 3 search tools into 1 with mode kwarg

search, search_semantic, search_hybrid → search(mode="keyword"|"semantic"|"hybrid")
mirrors how the backend API actually works (single /search endpoint with mode param).
reduces tool surface for LLM consumers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

+24 -92
+23 -91
mcp/src/pub_search/server.py
@@ -23,13 +23,11 @@
     return """\
 # pub-search MCP
 
-search ATProto publishing platforms: leaflet, pckt, offprint, greengale, whitewind.
+search long-form writing on ATProto: leaflet, pckt, offprint, greengale, whitewind.
 
 ## tools
 
-- `search(query, tag, platform, since, author)` - keyword search with filters
-- `search_semantic(query)` - meaning-based search (natural language queries)
-- `search_hybrid(query)` - combined keyword + semantic with source annotations
+- `search(query, mode, tag, platform, since, author)` - search with mode: keyword, semantic, or hybrid
 - `get_document(uri)` - fetch full content by AT-URI
 - `find_similar(uri)` - find related documents
 - `get_tags()` - available tags
@@ -38,15 +36,15 @@
 
 ## workflow
 
-1. `search("topic")` for keyword search, `search_hybrid("topic")` for best results
+1. `search("topic")` for keyword search, `search("topic", mode="hybrid")` for best results
 2. `get_document(uri)` for full text
 3. `find_similar(uri)` for related content
 
 ## search modes
 
-- **keyword**: fast exact match (~100ms), supports tag/since filters
+- **keyword** (default): fast exact match (~100ms), supports all filters
 - **semantic**: meaning-based (~500ms), good for natural language queries
-- **hybrid**: both combined with rank fusion, `source` field shows how each result was found
+- **hybrid**: both combined via rank fusion, `source` field shows how each result was found
 
 ## result types
 
@@ -68,8 +66,8 @@
 - combine filters: `search("python", tag="tutorial", platform="leaflet")`
 - filter by author: `search("python", author="nate.bsky.social")` or `search("", author="did:plc:xyz")`
 - use `since="2025-01-01"` for recent content
-- `search_semantic("natural language query")` for meaning-based search
-- `search_hybrid("query")` for best of both — results show `source` field
+- `search("natural language query", mode="semantic")` for meaning-based search
+- `search("query", mode="hybrid")` for best of both — results show `source` field
 - `find_similar(uri)` to discover related documents
 - `get_tags()` to discover available tags
 """
@@ -92,6 +90,9 @@
     return []
 
 
+Mode = Literal["keyword", "semantic", "hybrid"]
+
+
 @mcp.tool
 async def search(
     query: str = "",
@@ -99,16 +100,23 @@
     platform: Platform | None = None,
     since: str | None = None,
     author: str | None = None,
+    mode: Mode = "keyword",
     limit: int = 5,
 ) -> list[SearchResult]:
-    """search documents and publications.
+    """search long-form writing across ATProto publishing platforms.
+
+    modes:
+        keyword: fast exact match (~100ms), supports all filters
+        semantic: meaning-based (~500ms), good for natural language queries
+        hybrid: both combined via rank fusion — results include a `source` field
 
     args:
-        query: search query (titles and content)
-        tag: filter by tag
+        query: search query (titles and content). for semantic/hybrid, natural language works well.
+        tag: filter by tag (keyword mode only)
         platform: filter by platform (leaflet, pckt, offprint, greengale, whitewind, other)
-        since: ISO date - only documents created after this date
+        since: ISO date - only documents created after this date (keyword mode only)
         author: filter by author (DID like "did:plc:xyz" or handle like "nate.bsky.social")
+        mode: search mode — keyword, semantic, or hybrid (default: keyword)
         limit: max results (default 5, max 40)
 
     returns:
@@ -128,84 +136,8 @@
         params["since"] = since
     if author:
         params["author"] = author
-
-    async with get_http_client() as client:
-        response = await client.get("/search", params=params)
-        response.raise_for_status()
-        data = response.json()
-
-    results = _extract_results(data)
-    return [SearchResult(**r) for r in results[:limit]]
-
-
-@mcp.tool
-async def search_semantic(
-    query: str,
-    platform: Platform | None = None,
-    author: str | None = None,
-    limit: int = 5,
-) -> list[SearchResult]:
-    """semantic search using vector embeddings.
-
-    finds documents by meaning rather than exact keyword match.
-    good for natural language queries like "essays about loneliness"
-    or oblique descriptions like "guy from south africa with lots of kids".
-
-    args:
-        query: natural language query
-        platform: filter by platform (leaflet, pckt, offprint, greengale, whitewind, other)
-        author: filter by author (DID like "did:plc:xyz" or handle like "nate.bsky.social")
-        limit: max results (default 5, max 40)
-
-    returns:
-        list of results ranked by semantic similarity
-    """
-    params: dict[str, Any] = {"q": query, "mode": "semantic", "format": "v2", "limit": str(limit)}
-    if platform:
-        params["platform"] = platform
-    if author:
-        params["author"] = author
-
-    async with get_http_client() as client:
-        response = await client.get("/search", params=params)
-        response.raise_for_status()
-        data = response.json()
-
-    if isinstance(data, dict) and "error" in data:
-        return []
-
-    results = _extract_results(data)
-    return [SearchResult(**r) for r in results[:limit]]
-
-
-@mcp.tool
-async def search_hybrid(
-    query: str,
-    platform: Platform | None = None,
-    author: str | None = None,
-    limit: int = 5,
-) -> list[SearchResult]:
-    """hybrid search combining keyword and semantic results.
-
-    runs both keyword (exact match) and semantic (meaning-based) search,
-    then merges results using Reciprocal Rank Fusion. documents found by
-    both methods rank highest. results include a `source` field indicating
-    how each result was found: "keyword", "semantic", or "keyword+semantic".
-
-    args:
-        query: search query
-        platform: filter by platform (leaflet, pckt, offprint, greengale, whitewind, other)
-        author: filter by author (DID like "did:plc:xyz" or handle like "nate.bsky.social")
-        limit: max results (default 5, max 40)
-
-    returns:
-        list of results with source annotations, ranked by combined relevance
-    """
-    params: dict[str, Any] = {"q": query, "mode": "hybrid", "format": "v2", "limit": str(limit)}
-    if platform:
-        params["platform"] = platform
-    if author:
-        params["author"] = author
+    if mode != "keyword":
+        params["mode"] = mode
 
     async with get_http_client() as client:
         response = await client.get("/search", params=params)
+1 -1
mcp/tests/test_mcp.py
@@ -147,7 +147,7 @@
         tools = await client.list_tools()
 
         tool_names = {t.name for t in tools}
-        expected = {"search", "search_semantic", "search_hybrid", "get_document", "find_similar", "get_tags", "get_stats", "get_popular"}
+        expected = {"search", "get_document", "find_similar", "get_tags", "get_stats", "get_popular"}
         assert expected == tool_names
 
     async def test_list_prompts(self, client):
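The `search_hybrid` docstring removed in the server.py diff is where this commit spells out what hybrid mode does. It runs keyword and semantic search, merges the two rankings with Reciprocal Rank Fusion, and tags each hit with a `source` of "keyword", "semantic", or "keyword+semantic". The backend that does this is not part of the diff, so what follows is only a sketch of textbook RRF under these assumptions:

- Results are keyed by AT-URI.
- Ranks start at 1.
- The smoothing constant `k=60` is the value commonly used in the RRF literature, not one confirmed for pub-search.
- `rrf_merge` is a hypothetical name.

```python
def rrf_merge(
    keyword_uris: list[str],
    semantic_uris: list[str],
    k: int = 60,  # assumed smoothing constant, not taken from pub-search
) -> list[tuple[str, float, str]]:
    """fuse two ranked URI lists with Reciprocal Rank Fusion (sketch).

    returns (uri, score, source) tuples, best first. source is
    "keyword", "semantic", or "keyword+semantic".
    """
    scores: dict[str, float] = {}
    found_by: dict[str, set[str]] = {}
    for label, uris in (("keyword", keyword_uris), ("semantic", semantic_uris)):
        for rank, uri in enumerate(uris, start=1):
            # each list adds 1 / (k + rank); a URI in both lists gets two terms
            scores[uri] = scores.get(uri, 0.0) + 1.0 / (k + rank)
            found_by.setdefault(uri, set()).add(label)

    merged = []
    for uri, score in scores.items():
        # fixed label order so overlaps always read "keyword+semantic"
        source = "+".join(name for name in ("keyword", "semantic") if name in found_by[uri])
        merged.append((uri, score, source))
    # sorted() is stable even with reverse=True, so ties keep first-seen order
    return sorted(merged, key=lambda item: item[1], reverse=True)
```

A document found by both searches collects two reciprocal-rank terms, so it usually outranks documents found by only one. This is consistent with the docstring's claim that such documents rank highest.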