a love letter to tangled (android, iOS, and a search API)
---
title: API Service Reference
updated: 2026-03-24
---

Twister is a Go service that indexes Tangled content and serves a search API. It connects to the AT Protocol ecosystem via Tap (firehose consumer), XRPC (direct record lookups), and Constellation (backlink queries), storing indexed data in Turso/libSQL with FTS5 full-text search.

## Architecture

The service is a single Go binary with multiple subcommands, each running a different runtime mode. All modes share the same database and configuration layer.

**Runtime modes:**

| Command       | Purpose                                                             |
| ------------- | ------------------------------------------------------------------- |
| `api` (serve) | HTTP search API server                                              |
| `indexer`     | Consumes Tap firehose events, normalizes and indexes records        |
| `backfill`    | Discovers users from seed files, registers them with Tap            |
| `enrich`      | Backfills missing metadata (repo names, handles, web URLs) via XRPC |
| `reindex`     | Re-syncs all documents into the FTS index                           |
| `healthcheck` | One-shot liveness probe for container orchestration                 |

The `embed-worker` and `reembed` commands exist as stubs for the upcoming semantic search pipeline (Nomic Embed Text v1.5 deployed via Railway template).

All commands accept a `--local` flag that switches to a local SQLite file and text-format logging for development.

## HTTP API

The API server binds to `:8080` by default (configurable via `HTTP_BIND_ADDR`). CORS is open (`*` origin, GET/OPTIONS).

### Search

**`GET /search`** — Main search endpoint. Routes to keyword, semantic, or hybrid based on `mode` parameter.

**`GET /search/keyword`** — Full-text search via FTS5 with BM25 scoring.

Parameters:

- `q` (required) — Query string
- `limit` (1–100, default 20) — Results per page
- `offset` (default 0) — Pagination offset
- `collection` — Filter by AT Protocol collection NSID
- `type` — Filter by record type (repo, issue, pull, profile, string)
- `author` — Filter by handle or DID
- `repo` — Filter by repo name or DID
- `language` — Filter by primary language
- `from`, `to` — Date range (ISO 8601)
- `state` — Filter issues/PRs by state (open, closed, merged)
- `mode` — Search mode (keyword, semantic, hybrid)

Response includes query metadata, total count, and an array of results each containing: ID, collection, record type, title, summary, body snippet (with `<mark>` highlights), score, repo name, author handle, DID, AT-URI, web URL, and timestamps.

**`GET /documents/{id}`** — Fetch a single document by stable ID.

### Health

- **`GET /healthz`** — Liveness probe, always 200
- **`GET /readyz`** — Readiness probe, pings database

### Admin

When `ENABLE_ADMIN_ENDPOINTS=true` with a configured `ADMIN_AUTH_TOKEN`:

- **`POST /admin/reindex`** — Trigger FTS re-sync

### Static Content

The API also serves a search site with live search and API documentation at `/` and `/docs*`, built with Alpine.js (no build step, embedded in `internal/view/`).

## Database

Turso (libSQL) with the following tables:

**documents** — Core search index. Each record gets a stable ID of `did|collection|rkey`. Stores title, body, summary, metadata (repo name, author handle, web URL, language, tags), and timestamps. Soft-deleted via `deleted_at`.

**documents_fts** — FTS5 virtual table for full-text search over title, body, summary, repo name, author handle, and tags. Uses `unicode61` tokenizer with tuned BM25 weights (title weighted highest at 2.5, then author handle at 2.0, summary at 1.5).

**sync_state** — Cursor tracking for the Tap consumer. Stores consumer name, current cursor, high water mark, and last update time. Enables crash-safe resume.

**identity_handles** — DID-to-handle cache. Updated from Tap identity events and XRPC lookups.

**record_state** — Issue and PR state cache (open/closed/merged). Keyed by subject AT-URI.

**document_embeddings** — Vector storage (768-dim F32_BLOB with DiskANN cosine index). Schema ready but not yet populated.

**embedding_jobs** — Async embedding job queue. Schema ready but worker not yet active.

## Indexing Pipeline

The indexer connects to Tap via WebSocket, consuming AT Protocol record events in real-time. For each event:

1. Filter against the configured collection allowlist (supports wildcards like `sh.tangled.*`)
2. Route to the appropriate normalizer based on collection
3. Normalize into a document (extract title, body, summary, metadata)
4. Optionally enrich via XRPC (resolve author handle, repo name, web URL)
5. Upsert into the database (auto-syncs FTS)
6. Advance cursor and acknowledge to Tap

The indexer resumes from its last cursor on restart (no duplicate processing). It logs status every 30 seconds and uses exponential backoff (1s–5s) for transient failures.

## Record Normalizers

Each AT Protocol collection has a dedicated normalizer that extracts searchable content:

| Collection                      | Record Type   | Searchable               | Content                     |
| ------------------------------- | ------------- | ------------------------ | --------------------------- |
| `sh.tangled.repo`               | repo          | Yes (if named)           | Name, description, topics   |
| `sh.tangled.repo.issue`         | issue         | Yes                      | Title, body, repo reference |
| `sh.tangled.repo.pull`          | pull          | Yes                      | Title, body, target branch  |
| `sh.tangled.repo.issue.comment` | issue_comment | Yes (if has body)        | Comment body                |
| `sh.tangled.repo.pull.comment`  | pull_comment  | Yes (if has body)        | Comment body                |
| `sh.tangled.string`             | string        | Yes (if has content)     | Filename, contents          |
| `sh.tangled.actor.profile`      | profile       | Yes (if has description) | Profile description         |
| `sh.tangled.graph.follow`       | follow        | No                       | Graph edge only             |

State records (`sh.tangled.repo.issue.state`, `sh.tangled.repo.pull.status`) update the `record_state` table rather than creating documents.

## XRPC Client

The built-in XRPC client provides typed access to AT Protocol endpoints with caching (1-hour TTL for DID docs and repo names):

- DID resolution via PLC Directory (`did:plc:`) or `.well-known/did.json` (`did:web:`)
- Identity resolution (PDS endpoint + handle from DID document)
- Record fetching (`com.atproto.repo.getRecord`, `com.atproto.repo.listRecords`)
- Repo name resolution from `sh.tangled.repo` records
- Web URL construction for Tangled entities

## Backfill

The backfill command discovers users from a seed file and registers them with Tap for indexing. Discovery fans out via follow graphs and repo collaborators up to a configurable hop depth (default 2). Supports dry-run mode, configurable concurrency and batch sizes, and is idempotent.

## Configuration

All configuration is via environment variables (with `.env` file support):

| Variable                   | Default                 | Purpose                                        |
| -------------------------- | ----------------------- | ---------------------------------------------- |
| `TURSO_DATABASE_URL`       | —                       | Database connection (required)                 |
| `TURSO_AUTH_TOKEN`         | —                       | Auth token (required for remote)               |
| `TAP_URL`                  | —                       | Tap WebSocket URL                              |
| `TAP_AUTH_PASSWORD`        | —                       | Tap admin password                             |
| `INDEXED_COLLECTIONS`      | all                     | Collection allowlist (CSV, supports wildcards) |
| `HTTP_BIND_ADDR`           | `:8080`                 | API server bind address                        |
| `INDEXER_HEALTH_ADDR`      | `:9090`                 | Indexer health probe address                   |
| `LOG_LEVEL`                | info                    | debug/info/warn/error                          |
| `LOG_FORMAT`               | json                    | json or text                                   |
| `ENABLE_ADMIN_ENDPOINTS`   | false                   | Enable admin routes                            |
| `ADMIN_AUTH_TOKEN`         | —                       | Bearer token for admin                         |
| `ENABLE_INGEST_ENRICHMENT` | true                    | XRPC enrichment at ingest time                 |
| `PLC_DIRECTORY_URL`        | `https://plc.directory` | PLC Directory                                  |
| `XRPC_TIMEOUT`             | 15s                     | XRPC HTTP timeout                              |

## Deployment

Deployed on Railway with three services:

- **api** — HTTP server (port 8080, health at `/healthz`)
- **indexer** — Tap consumer (health at `:9090/healthz`)
- **tap** — Tap instance (external dependency)

All services share the same Turso database. The API and indexer are separate deployments of the same binary with different subcommands.