search for standard sites pub-search.waow.tech
search zig blog atproto

feat(mcp): add platform and since filters to search tool

- search() now accepts platform (leaflet|pckt|offprint|greengale|other)
- search() now accepts since (ISO date) for filtering recent content
- SearchResult includes platform field from API
- trimmed prompts for fewer tokens
- added docs/api.md with full API reference
- added scripts/test_live.py for testing with FastMCP client

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

zzstoatzz 9020e144 ad99da87

+308 -51
+198
docs/api.md
````diff
+# API reference
+
+base URL: `https://leaflet-search-backend.fly.dev`
+
+## endpoints
+
+### search
+
+```
+GET /search?q=<query>&tag=<tag>&platform=<platform>&since=<date>
+```
+
+full-text search across documents and publications.
+
+**parameters:**
+| param | type | required | description |
+|-------|------|----------|-------------|
+| `q` | string | no* | search query (titles and content) |
+| `tag` | string | no | filter by tag (documents only) |
+| `platform` | string | no | filter by platform: `leaflet`, `pckt`, `offprint`, `greengale`, `other` |
+| `since` | string | no | ISO date, filter to documents created after |
+
+*at least one of `q` or `tag` required
+
+**response:**
+```json
+[
+  {
+    "type": "article|looseleaf|publication",
+    "uri": "at://did:plc:.../collection/rkey",
+    "did": "did:plc:...",
+    "title": "document title",
+    "snippet": "...matched text...",
+    "createdAt": "2025-01-15T...",
+    "rkey": "abc123",
+    "basePath": "gyst.leaflet.pub",
+    "platform": "leaflet",
+    "path": "/001"
+  }
+]
+```
+
+**result types:**
+- `article`: document in a publication
+- `looseleaf`: standalone document (no publication)
+- `publication`: the publication itself (only returned for text queries, not tag/platform filters)
+
+**ranking:** hybrid BM25 + recency. text relevance primary, recent docs boosted (~1 point per 30 days).
+
+### similar
+
+```
+GET /similar?uri=<at-uri>
+```
+
+find semantically similar documents using vector similarity (voyage-3-lite embeddings).
+
+**parameters:**
+| param | type | required | description |
+|-------|------|----------|-------------|
+| `uri` | string | yes | AT-URI of source document |
+
+**response:** same format as search (array of results)
+
+### tags
+
+```
+GET /tags
+```
+
+list all tags with document counts, sorted by popularity.
+
+**response:**
+```json
+[
+  {"tag": "programming", "count": 42},
+  {"tag": "rust", "count": 15}
+]
+```
+
+### popular
+
+```
+GET /popular
+```
+
+popular search queries.
+
+**response:**
+```json
+[
+  {"query": "rust async", "count": 12},
+  {"query": "leaflet", "count": 8}
+]
+```
+
+### platforms
+
+```
+GET /platforms
+```
+
+document counts by platform.
+
+**response:**
+```json
+[
+  {"platform": "leaflet", "count": 2500},
+  {"platform": "pckt", "count": 800},
+  {"platform": "greengale", "count": 150},
+  {"platform": "offprint", "count": 50},
+  {"platform": "other", "count": 100}
+]
+```
+
+### stats
+
+```
+GET /stats
+```
+
+index statistics and request timing.
+
+**response:**
+```json
+{
+  "documents": 3500,
+  "publications": 120,
+  "embeddings": 3200,
+  "searches": 5000,
+  "errors": 5,
+  "cache_hits": 1200,
+  "cache_misses": 800,
+  "timing": {
+    "search": {"count": 1000, "avg_ms": 25, "p50_ms": 20, "p95_ms": 50, "p99_ms": 80, "max_ms": 150},
+    "similar": {"count": 200, "avg_ms": 150, "p50_ms": 140, "p95_ms": 200, "p99_ms": 250, "max_ms": 300},
+    "tags": {"count": 500, "avg_ms": 5, "p50_ms": 4, "p95_ms": 10, "p99_ms": 15, "max_ms": 25},
+    "popular": {"count": 300, "avg_ms": 3, "p50_ms": 2, "p95_ms": 5, "p99_ms": 8, "max_ms": 12}
+  }
+}
+```
+
+### activity
+
+```
+GET /activity
+```
+
+hourly activity counts (last 24 hours).
+
+**response:**
+```json
+[12, 8, 5, 3, 2, 1, 0, 0, 1, 5, 15, 25, 30, 28, 22, 18, 20, 25, 30, 35, 28, 20, 15, 10]
+```
+
+### dashboard
+
+```
+GET /api/dashboard
+```
+
+rich dashboard data for analytics UI.
+
+**response:**
+```json
+{
+  "startedAt": 1705000000,
+  "searches": 5000,
+  "publications": 120,
+  "documents": 3500,
+  "platforms": [{"platform": "leaflet", "count": 2500}],
+  "tags": [{"tag": "programming", "count": 42}],
+  "timeline": [{"date": "2025-01-15", "count": 25}],
+  "topPubs": [{"name": "gyst", "basePath": "gyst.leaflet.pub", "count": 150}],
+  "timing": {...}
+}
+```
+
+### health
+
+```
+GET /health
+```
+
+**response:**
+```json
+{"status": "ok"}
+```
+
+## building URLs
+
+documents can be accessed on the web via their `basePath` and `rkey`:
+- articles: `https://{basePath}/{rkey}` or `https://{basePath}{path}` if path is set
+- publications: `https://{basePath}`
+
+examples:
+- `https://gyst.leaflet.pub/3ldasifz7bs2l`
+- `https://greengale.app/3fz.org/001`
````
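The "building URLs" rules in `docs/api.md` fit in a small helper. This is a sketch, not code from the repo: `web_url` is a hypothetical name, it assumes a result is the JSON object shown under `/search`, treats a missing or empty `path` as unset, and applies the article rule to `looseleaf` results too, which the reference does not spell out.

```python
def web_url(result: dict) -> str:
    """Build a browser URL for a /search result (hypothetical helper)."""
    base_path = result["basePath"]
    if result["type"] == "publication":
        # publications live at the root of their basePath
        return f"https://{base_path}"
    path = result.get("path")
    if path:
        # path already carries its leading slash, e.g. "/001"
        return f"https://{base_path}{path}"
    # otherwise fall back to the record key
    return f"https://{base_path}/{result['rkey']}"
```

With `{"type": "article", "basePath": "greengale.app/3fz.org", "path": "/001", ...}` this yields the second documented example, `https://greengale.app/3fz.org/001`.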
+67
mcp/scripts/test_live.py
```diff
+#!/usr/bin/env python3
+"""Test the pub-search MCP server."""
+
+import asyncio
+import sys
+
+from fastmcp import Client
+from fastmcp.client.transports import FastMCPTransport
+
+from pub_search.server import mcp
+
+
+async def main():
+    # use local transport for testing, or live URL if --live flag
+    if "--live" in sys.argv:
+        print("testing against live Horizon server...")
+        client = Client("https://pub-search-by-zzstoatzz.fastmcp.app/mcp")
+    else:
+        print("testing locally with FastMCPTransport...")
+        client = Client(transport=FastMCPTransport(mcp))
+
+    async with client:
+        # list tools
+        print("=== tools ===")
+        tools = await client.list_tools()
+        for t in tools:
+            print(f" {t.name}")
+
+        # test search with new platform filter
+        print("\n=== search(query='zig', platform='leaflet', limit=3) ===")
+        result = await client.call_tool(
+            "search", {"query": "zig", "platform": "leaflet", "limit": 3}
+        )
+        for item in result.content:
+            print(f" {item.text[:200]}...")
+
+        # test search with since filter
+        print("\n=== search(query='python', since='2025-01-01', limit=2) ===")
+        result = await client.call_tool(
+            "search", {"query": "python", "since": "2025-01-01", "limit": 2}
+        )
+        for item in result.content:
+            print(f" {item.text[:200]}...")
+
+        # test get_tags
+        print("\n=== get_tags() ===")
+        result = await client.call_tool("get_tags", {})
+        for item in result.content:
+            print(f" {item.text[:150]}...")
+
+        # test get_stats
+        print("\n=== get_stats() ===")
+        result = await client.call_tool("get_stats", {})
+        for item in result.content:
+            print(f" {item.text}")
+
+        # test get_popular
+        print("\n=== get_popular(limit=3) ===")
+        result = await client.call_tool("get_popular", {"limit": 3})
+        for item in result.content:
+            print(f" {item.text[:100]}...")
+
+        print("\n=== all tests passed ===")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
```
+1
mcp/src/pub_search/_types.py
```diff
···
     createdAt: str = ""
     rkey: str
     basePath: str = ""
+    platform: Literal["leaflet", "pckt", "offprint", "greengale", "other"] = "leaflet"
 
     @computed_field
     @property
```
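Because `platform` is a `Literal` with a default, pydantic rejects any value outside the five platforms and uses `"leaflet"` when the API omits the field. A stdlib-only approximation of that behavior, for illustration: `parse_platform` and `ALLOWED_PLATFORMS` are hypothetical names, and the real model relies on pydantic's own validation rather than anything like this.

```python
from typing import Literal, get_args

Platform = Literal["leaflet", "pckt", "offprint", "greengale", "other"]

# get_args on a Literal returns its permitted values as a tuple
ALLOWED_PLATFORMS: tuple = get_args(Platform)


def parse_platform(raw: dict, default: str = "leaflet") -> str:
    """Mimic the Literal field: default when missing, error when unknown."""
    value = raw.get("platform", default)
    if value not in ALLOWED_PLATFORMS:
        raise ValueError(f"unknown platform {value!r}; expected one of {ALLOWED_PLATFORMS}")
    return value
```

Deriving the allowed set from the `Literal` itself keeps a single source of truth, which is the same reason `server.py` introduces a shared `Platform` alias for the tool signature.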
+38 -50
mcp/src/pub_search/server.py
```diff
 
 from __future__ import annotations
 
-from typing import Any
+from typing import Any, Literal
 
 from fastmcp import FastMCP
 
···
 def usage_guide() -> str:
     """instructions for using pub-search MCP tools."""
     return """\
-# pub-search MCP usage guide
+# pub-search MCP
 
-search documents across ATProto publishing platforms including Leaflet, pckt, and others.
+search ATProto publishing platforms: leaflet, pckt, offprint, greengale.
 
-## core tools
+## tools
 
-- `search(query, tag)` - search documents and publications by text or tag
-- `get_document(uri)` - get the full content of a document by its AT-URI
-- `find_similar(uri)` - find documents similar to a given document
-- `get_tags()` - list all available tags with document counts
-- `get_stats()` - get index statistics (document/publication counts)
-- `get_popular()` - see popular search queries
+- `search(query, tag, platform, since)` - full-text search with filters
+- `get_document(uri)` - fetch full content by AT-URI
+- `find_similar(uri)` - semantic similarity search
+- `get_tags()` - available tags
+- `get_stats()` - index statistics
+- `get_popular()` - popular queries
 
-## workflow for research
+## workflow
 
-1. use `search("your topic")` to find relevant documents
-2. use `get_document(uri)` to retrieve full content of interesting results
-3. use `find_similar(uri)` to discover related content
+1. `search("topic")` or `search("topic", platform="leaflet")`
+2. `get_document(uri)` for full text
+3. `find_similar(uri)` for related content
 
 ## result types
 
-search returns three types of results:
-- **publication**: a collection of articles (like a blog or magazine)
-- **article**: a document that belongs to a publication
-- **looseleaf**: a standalone document not part of a publication
-
-## AT-URIs
+- **article**: document in a publication
+- **looseleaf**: standalone document
+- **publication**: the publication itself
 
-documents are identified by AT-URIs like:
-`at://did:plc:abc123/pub.leaflet.document/xyz789`
-
-browse the web UI at pub-search.waow.tech
+results include a `url` field for web access.
 """
 
 
···
     return """\
 # search tips
 
-## text search
-- searches both document titles and content
-- uses FTS5 full-text search with prefix matching
-- the last word gets prefix matching: "cat dog" matches "cat dogs"
-
-## tag filtering
-- combine text search with tag filter: `search("python", tag="programming")`
-- use `get_tags()` to discover available tags
-- tags are only applied to documents, not publications
-
-## finding related content
-- after finding an interesting document, use `find_similar(uri)`
-- similarity is based on semantic embeddings (voyage-3-lite)
-- great for exploring related topics
-
-## browsing by popularity
-- use `get_popular()` to see what others are searching for
-- can inspire new research directions
+- prefix matching on last word: "cat dog" matches "cat dogs"
+- combine filters: `search("python", tag="tutorial", platform="leaflet")`
+- use `since="2025-01-01"` for recent content
+- `find_similar(uri)` for semantic similarity (voyage-3-lite embeddings)
+- `get_tags()` to discover available tags
 """
 
 
···
 # -----------------------------------------------------------------------------
 
 
+Platform = Literal["leaflet", "pckt", "offprint", "greengale", "other"]
+
+
 @mcp.tool
 async def search(
     query: str = "",
     tag: str | None = None,
+    platform: Platform | None = None,
+    since: str | None = None,
     limit: int = 5,
 ) -> list[SearchResult]:
     """search documents and publications.
 
-    searches the full text of documents (titles and content) and publications.
-    results include a snippet showing where the match was found.
-
     args:
-        query: search query (searches titles and content)
-        tag: optional tag to filter by (only applies to documents)
-        limit: max results to return (default 5, max 40)
+        query: search query (titles and content)
+        tag: filter by tag
+        platform: filter by platform (leaflet, pckt, offprint, greengale, other)
+        since: ISO date - only documents created after this date
+        limit: max results (default 5, max 40)
 
     returns:
-        list of search results with uri, title, snippet, and metadata
+        list of results with uri, title, snippet, platform, and web url
     """
     if not query and not tag:
         return []
···
         params["q"] = query
     if tag:
         params["tag"] = tag
+    if platform:
+        params["platform"] = platform
+    if since:
+        params["since"] = since
 
     async with get_http_client() as client:
         response = await client.get("/search", params=params)
         response.raise_for_status()
         results = response.json()
 
-    # apply client-side limit since API returns up to 40
     return [SearchResult(**r) for r in results[:limit]]
 
 
```
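The parameter handling in the updated `search` tool can be pulled out into a pure function, so the new `platform` and `since` filters are testable without an HTTP client or a running server. A sketch only: `build_search_params` is a hypothetical helper not in the package, and because the lines that create `params` are elided in the diff, it assumes the dict starts empty and that `limit` is applied client-side rather than sent to the API.

```python
from __future__ import annotations

from urllib.parse import urlencode


def build_search_params(
    query: str = "",
    tag: str | None = None,
    platform: str | None = None,
    since: str | None = None,
) -> dict[str, str] | None:
    """Mirror the tool's query-param logic; None means the tool returns []."""
    if not query and not tag:
        # the API requires at least one of q or tag
        return None
    params: dict[str, str] = {}
    if query:
        params["q"] = query
    if tag:
        params["tag"] = tag
    # filters added in this commit are only sent when set
    if platform:
        params["platform"] = platform
    if since:
        params["since"] = since
    return params
```

For example, `urlencode(build_search_params("zig", platform="leaflet", since="2025-01-01"))` gives `q=zig&platform=leaflet&since=2025-01-01`, the query string appended to `/search`.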
+4 -1
mcp/tests/test_mcp.py
```diff
···
             snippet="this is a test...",
             createdAt="2025-01-01T00:00:00Z",
             rkey="123",
-            basePath="/blog",
+            basePath="gyst.leaflet.pub",
+            platform="leaflet",
         )
         assert r.type == "article"
         assert r.uri == "at://did:plc:abc/pub.leaflet.document/123"
         assert r.title == "test article"
+        assert r.platform == "leaflet"
+        assert r.url == "https://gyst.leaflet.pub/123"
 
     def test_search_result_looseleaf(self):
         """SearchResult supports looseleaf type."""
```