···11-# Twisted Monorepo
11+# Twisted
2233-- `apps/twisted`: Ionic/Vue client
44-- `packages/api`: Go API copied from `~/Projects/TWISTER`
33+Twisted is a monorepo for a Tangled mobile client and the supporting Tap-backed indexing API.
44+55+## Projects
66+77+- `apps/twisted`: Ionic Vue client for browsing Tangled repos, profiles, issues, PRs, and indexed search results
88+- `packages/api`: Go service that consumes Tangled records through Tap, fills gaps in the public Tangled API, and serves search
99+- `docs`: top-level specs and plans, split by project under `docs/app` and `docs/api`
1010+1111+## Architecture
1212+1313+The app still uses Tangled's public knot and PDS APIs for canonical repo and profile data. The API project adds two complementary capabilities:
1414+1515+1. Global search over indexed Tangled content
1616+2. Index-backed summaries for data that is hard to derive from the public API alone, such as followers
1717+1818+That keeps direct browsing honest while giving the client one place to ask for cross-network discovery and graph augmentation.
519620## Development
72188-Use the top-level `justfile` for common tasks:
2222+Use the top-level [`justfile`](justfile) for common workflows:
9231024```bash
1125just dev
1226just build
1327just test
2828+just api-dev
1429```
15301616-The existing client package still works directly from `apps/twisted`.
3131+To enable indexed search in the client, set `VITE_TWISTER_API_BASE_URL` in `apps/twisted/.env`.
3232+3333+## Infrastructure Setup
3434+3535+### Turso
3636+3737+Use one Turso database per environment, for example:
3838+3939+- `twister-dev`
4040+- `twister-prod`
4141+4242+Do not introduce separate app variable names for dev and prod. Always use the same variables:
4343+4444+- `TURSO_DATABASE_URL`
4545+- `TURSO_AUTH_TOKEN`
4646+4747+Only the values change per environment.
4848+4949+Example:
+
+```bash
+# Development
+TURSO_DATABASE_URL=libsql://twister-dev-your-org.turso.io
+TURSO_AUTH_TOKEN=...
+
+# Production
+TURSO_DATABASE_URL=libsql://twister-prod-your-org.turso.io
+TURSO_AUTH_TOKEN=...
+```
6060+6161+### Railway
6262+6363+Create or reuse one Railway project containing:
6464+6565+- existing `tap`
6666+- `api` running `twister api`
6767+- `indexer` running `twister indexer`
6868+6969+Set these shared variables on the Railway services:
7070+7171+- `TURSO_DATABASE_URL`
7272+- `TURSO_AUTH_TOKEN`
7373+- `LOG_LEVEL`
7474+- `LOG_FORMAT`
7575+7676+Set these API-specific variables:
7777+7878+- `HTTP_BIND_ADDR`
7979+- `SEARCH_DEFAULT_LIMIT`
8080+- `SEARCH_MAX_LIMIT`
8181+8282+Set these indexer-specific variables:
8383+8484+- `TAP_URL`
8585+- `TAP_AUTH_PASSWORD`
8686+- `INDEXED_COLLECTIONS`
8787+8888+If you use separate Railway environments for dev and prod, keep the same variable names in both and only swap the Turso values.
8989+9090+### First Bootstrap
9191+9292+For a brand-new environment:
9393+9494+1. Point `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` at the target database.
9595+2. Deploy `api` and `indexer` on Railway.
9696+3. Verify API readiness and indexer health.
9797+4. Run `twister backfill` with your seed file.
9898+5. Treat the environment as search-ready only after historical backfill completes.
9999+100100+## Docs
101101+102102+- Index: [`docs/README.md`](docs/README.md)
···11-# Twisted
22-33-A mobile client for [Tangled](https://tangled.org).
44-55-## Development
11+# Twisted App
6277-Run the mobile apps with Capacitor:
88-99-```bash
1010-pnpm cap run ios
1111-pnpm cap run android
1212-```
1313-1414-Or to test the web version:
1515-1616-```bash
1717-pnpm dev
1818-```
33+Ionic Vue client for the Twisted monorepo.
···11-# Phase 3 — Deferred Search and Activity
22-33-## Goal
44-55-Preserve honest product boundaries before search is implemented as a separate project. Public browsing continues through known AT Protocol handles on Home, while Explore and Activity stay visible as clearly labeled in-progress placeholders.
66-77-## Current Product Shape
88-99-### Home
1010-1111-Home is the temporary public entry point for unauthenticated browsing:
1212-1313-- Enter a known AT Protocol handle
1414-- Open that user's profile directly
1515-- Resolve the handle to DID + PDS via AT Protocol identity
1616-- List that user's public Tangled repos inline and open one directly
1717-1818-This keeps public browsing fully real without implying that global discovery already exists.
1919-2020-### Explore
2121-2222-Explore remains a tab-level placeholder:
2323-2424-- No global repo search
2525-- No global user search
2626-- No curated fallback discovery pretending to be search
2727-- Empty state should explicitly say search is in progress
2828-2929-### Activity
3030-3131-Activity also remains a tab-level placeholder:
3232-3333-- No public timeline yet
3434-- No curated public feed fallback
3535-- Empty state should explicitly say activity is in progress
3636-3737-## Identity and Routing
3838-3939-The Home handle flow continues to use the existing AT Protocol resolution path:
4040-4141-1. Resolve `handle -> DID` via `com.atproto.identity.resolveHandle`
4242-2. Fetch the DID document and extract the PDS endpoint
4343-3. Query the user's PDS for `sh.tangled.repo` records via `com.atproto.repo.listRecords`
4444-4. Route to existing profile and repo detail screens
4545-4646-No backend search index, feed service, or additional dependency is introduced in this phase.
4747-4848-## UI Expectations
4949-5050-- Home shows one handle input plus explicit actions for profile jump and repo browsing
5151-- Home shows loading, invalid-handle, no-repos, and resolved-repo-list states
5252-- Explore shows a static in-progress empty state
5353-- Activity shows a static in-progress empty state
5454-- Profile remains unchanged
5555-5656-## Deferred Work
5757-5858-The following work is intentionally deferred out of this phase:
5959-6060-- Search indexing and ranking
6161-- Search result UI and recent searches
6262-- Trending or suggested discovery sections
6363-- Public activity feed ingestion, pagination, and caching
6464-- Jetstream or appview timeline investigation
6565-6666-These capabilities will be revisited when search and feed work are scheduled independently.
···11-# Phase 6 — Write Features & Backend Adapter
11+# Phase 6 — Write Features & Project Services
2233## Goal
4455-Add authenticated write operations (create issues, comment on PRs/issues, edit profile) and introduce a thin backend (BFF) for operations that don't work well from a pure SPA.
55+Add authenticated write operations (create issues, comment on PRs/issues, edit profile) and extend the Twister project services only where the client should not or cannot do the work directly.
6677-## Why a Backend
77+## Why Project Services
8899Some operations are awkward or unsafe from a browser client:
1010···1212- **Unstable procedures**: Tangled's API may change — a backend adapter isolates the mobile client from churn
1313- **Push notifications**: require server-side registration and delivery
1414- **Personalized feeds**: server-side aggregation is more efficient than client-side filtering
1515-- **Search**: if no public JSON search API exists, the backend can index and serve search results
1515+- **Graph gaps**: follower lists/counts and other cross-network summaries may require index-backed derivation
1616- **Rate limiting**: backend can batch and deduplicate requests
17171818-### BFF Scope
1818+### Service Scope
19192020-Thin adapter — not a full backend. Proxies and transforms Tangled/AT Protocol calls.
2020+Thin service layer — not a replacement for Tangled's public APIs. Use it for cross-network aggregation, search, notifications, and operations the SPA should not own.
21212222| Endpoint | Purpose |
2323| ------------------------------------ | --------------------------------------------------- |
2424| `POST /auth/session` | OAuth token exchange and session management |
2525| `GET /feed/personalized` | Pre-filtered activity feed for the user |
2626-| `GET /search/repos`, `/search/users` | Search proxy/index |
2626+| `GET /search`, `GET /profiles/{did}/summary` | Search and index-backed graph/profile summaries |
2727| `POST /notifications/register` | Push notification device registration |
2828| Passthrough for stable XRPC calls | Avoid duplicating what the client already does well |
2929···56565757## Push Notifications
58585959-- Register device token with BFF
6060-- BFF subscribes to Jetstream for the user's relevant events
5959+- Register device token with project services
6060+- Project services subscribe to Jetstream or indexed events relevant to the user
6161- Deliver via APNs (iOS) / FCM (Android)
6262- Notification types: PR activity on your repos, issue comments, new followers, stars
6363
···4444- "Team" — activity from users I follow
4545- Custom filters: by repo, by user, by event type
46464747-Feeds are stored locally in IndexedDB. If a BFF exists, they can optionally sync server-side for push notification filtering.
4747+Feeds are stored locally in IndexedDB. If project services exist, they can optionally sync server-side for push notification filtering.
48484949## Advanced Features
5050
···1616 <EmptyState
1717 :icon="pulseOutline"
1818 title="Activity is in progress"
1919- message="The public activity feed is being rebuilt separately. This tab stays as a placeholder until that work is ready." />
1919+ message="The public activity feed is still in progress. This tab stays as a placeholder until the indexed feed work is ready." />
2020 </ion-content>
2121 </ion-page>
2222</template>
···4545 </ion-button>
4646 </div>
47474848- <p class="hint-copy">Repo browsing is temporary here until search ships in a separate project.</p>
4848+ <p class="hint-copy">
4949+ Home is still the fastest way to jump to a known handle directly.
5050+ </p>
4951 </section>
50525153 <section v-if="hasAttemptedBrowse" class="results-section">
···11+# Twisted Documentation
22+33+Documentation is organized by project:
44+55+- [`app/`](app/) for the Ionic/Vue client
66+- [`api/`](api/) for the Go Tap/index/search service
77+88+## Quick Links
99+1010+- App spec index: [`app/specs/README.md`](app/specs/README.md)
1111+- App task index: [`app/tasks/phase-6.md`](app/tasks/phase-6.md)
1212+- API spec index: [`api/specs/README.md`](api/specs/README.md)
1313+- API task index: [`api/tasks/README.md`](api/tasks/README.md)
+300
docs/api/specs/05-search.md
···11+---
22+title: "Spec 05 — Search"
33+updated: 2026-03-22
44+---
55+66+Covers all search modes, the public search API contract, scoring, and filtering.
77+88+## 1. Search Modes
99+1010+| Mode | Backing | Available |
1111+| ---------- | ------------------------------------ | --------- |
1212+| `keyword` | Turso Tantivy-backed FTS | MVP |
1313+| `semantic` | Vector similarity (DiskANN index) | Phase 2 |
1414+| `hybrid` | Weighted merge of keyword + semantic | Phase 3 |
1515+1616+## 2. Keyword Search
1717+1818+### Implementation
1919+2020+Uses Turso's `fts_score()` function for BM25 ranking:
+
+```sql
+SELECT
+  d.id, d.title, d.summary, d.repo_name, d.author_handle,
+  d.collection, d.record_type, d.updated_at,
+  fts_score(d.title, d.body, d.summary, d.repo_name, d.author_handle, d.tags_json, ?) AS score
+FROM documents d
+WHERE fts_match(d.title, d.body, d.summary, d.repo_name, d.author_handle, d.tags_json, ?)
+  AND d.deleted_at IS NULL
+ORDER BY score DESC
+LIMIT ? OFFSET ?;
+```
3535+3636+Configured in the FTS index definition:
3737+3838+| Field | Weight | Rationale |
3939+| --------------- | ------ | ------------------------------------ |
4040+| `title` | 3.0 | Highest signal for relevance |
4141+| `repo_name` | 2.5 | Exact repo lookups should rank first |
4242+| `author_handle` | 2.0 | Author search is common |
4343+| `summary` | 1.5 | More focused than body |
4444+| `tags_json` | 1.2 | Topic matching |
4545+| `body` | 1.0 | Baseline |
4646+4747+### Query Features
4848+4949+Tantivy query syntax is exposed to users:
5050+5151+- Boolean: `go AND search`, `rust NOT unsafe`
5252+- Phrase: `"pull request"`
5353+- Prefix: `tang*`
5454+- Field-specific: `title:parser`
5555+5656+### Snippets
5757+5858+Use `fts_highlight()` to generate highlighted snippets:
+
+```sql
+fts_highlight(d.body, '<mark>', '</mark>', ?) AS body_snippet
+```
6363+6464+## 3. Semantic Search
6565+6666+### Query Flow
6767+6868+1. Convert user query text to embedding via the configured provider
6969+2. Query `vector_top_k` for nearest neighbors
7070+3. Join back to `documents` to get metadata
7171+4. Filter out deleted/hidden documents
7272+5. Return results with the distance converted to a relevance score (see Score Normalization)
+
+```sql
+SELECT d.id, d.title, d.summary, d.repo_name, d.author_handle,
+       d.collection, d.record_type, d.updated_at
+FROM vector_top_k('idx_embeddings_vec', vector32(?), ?) AS v
+JOIN document_embeddings e ON e.rowid = v.id
+JOIN documents d ON d.id = e.document_id
+WHERE d.deleted_at IS NULL;
+```
8282+8383+### Score Normalization
8484+8585+Cosine distance ranges from 0 (identical) to 2 (opposite). Normalize to a 0–1 relevance score:
8686+8787+```text
8888+semantic_score = 1.0 - (distance / 2.0)
8989+```
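A minimal sketch of the conversion above, written in Python for brevity (the service itself is Go). The function name `semantic_score` and the clamping step are assumptions added for illustration; the spec only defines the formula.

```python
def semantic_score(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a relevance score in [0, 1]."""
    # Assumption: clamp first so small floating-point overshoots
    # (for example 2.0000001) cannot produce a score outside [0, 1].
    clamped = min(max(distance, 0.0), 2.0)
    return 1.0 - (clamped / 2.0)
```

Under this mapping, identical vectors score 1.0, orthogonal vectors score 0.5, and opposite vectors score 0.0.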
9090+9191+## 4. Hybrid Search
9292+9393+### v1: Weighted Score Blending
9494+9595+```text
9696+hybrid_score = 0.65 * keyword_score_normalized + 0.35 * semantic_score_normalized
9797+```
9898+9999+### Score Normalization for Blending
100100+101101+Keyword (BM25) scores are unbounded. Normalize using min-max within the result set:
102102+103103+```text
104104+keyword_normalized = (score - min_score) / (max_score - min_score)
105105+```
106106+107107+Semantic scores are already bounded after the distance-to-relevance conversion.
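The min-max formula divides by zero when every keyword score in the result set is equal, which includes any single-result set. The sketch below (Python, illustrative only) shows one way to guard that case. Returning 1.0 for tied scores is an assumption, not something the spec defines, and `normalize_keyword_scores` is a hypothetical name.

```python
def normalize_keyword_scores(scores: list[float]) -> list[float]:
    """Min-max normalize BM25 scores within one result set."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: all results matched equally well, so treat each
        # as a full-strength keyword hit instead of dividing by zero.
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]
```

Because the normalization is relative to the current result set, the same document can receive different normalized scores for different queries or page sizes.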
108108+109109+### Merge Strategy
110110+111111+1. Fetch top N keyword results (e.g., N=50)
112112+2. Fetch top N semantic results
113113+3. Merge on `document_id`
114114+4. For documents appearing in both sets, combine scores
115115+5. For documents in only one set, use that score (with 0 for the missing signal)
116116+6. Sort by `hybrid_score` descending
117117+7. Deduplicate
118118+8. Apply limit/offset
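The merge steps above can be sketched as follows. This is a Python illustration under stated assumptions, not the service's Go implementation: both inputs are assumed to be dictionaries mapping `document_id` to an already normalized score in the 0 to 1 range, and the tie-break on `id` is added only to make the ordering deterministic.

```python
def hybrid_merge(keyword, semantic, limit=20, offset=0,
                 w_keyword=0.65, w_semantic=0.35):
    """Blend normalized keyword and semantic scores per document."""
    merged = []
    # Keying on document_id merges and deduplicates in one pass.
    for doc_id in set(keyword) | set(semantic):
        matched_by = []
        if doc_id in keyword:
            matched_by.append("keyword")
        if doc_id in semantic:
            matched_by.append("semantic")
        # A missing signal contributes 0 to the blended score.
        score = (w_keyword * keyword.get(doc_id, 0.0)
                 + w_semantic * semantic.get(doc_id, 0.0))
        merged.append({"id": doc_id, "score": score, "matched_by": matched_by})
    # Highest score first; ties broken by id (an assumption).
    merged.sort(key=lambda r: (-r["score"], r["id"]))
    return merged[offset:offset + limit]
```

With weights of 0.65 and 0.35, a document found only by semantic search can score at most 0.35, so a strong keyword-only match always outranks it.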
119119+120120+### v2: Reciprocal Rank Fusion (future)
121121+122122+If keyword and semantic score scales prove unstable under weighted blending, replace with RRF:
123123+124124+```text
125125+rrf_score = Σ 1 / (k + rank_i)
126126+```
127127+128128+where `k` is a constant (typically 60) and `rank_i` is the document's rank in each result list.
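A small sketch of RRF over ranked lists of document IDs, in Python for illustration. It assumes ranks are 1-based, which the formula above does not state explicitly, and `rrf_merge` is a hypothetical name.

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        # Assumption: the top result in each list has rank 1.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

RRF uses only rank positions, so it needs no normalization of BM25 or distance values.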
129129+130130+## 5. Filtering
131131+132132+All search modes support these filters, applied as SQL WHERE clauses:
133133+134134+| Filter | Parameter | SQL |
135135+| ----------- | ------------ | ------------------------------------------- |
136136+| Collection | `collection` | `d.collection = ?` |
137137+| Author | `author` | `d.author_handle = ?` or `d.did = ?` |
138138+| Repo | `repo` | `d.repo_name = ?` or `d.repo_did = ?` |
139139+| Record type | `type` | `d.record_type = ?` |
140140+| Language | `language` | `d.language = ?` |
141141+| Date range | `from`, `to` | `d.created_at >= ?` and `d.created_at <= ?` |
142142+| State | `state` | Join to `record_state` table |
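One way to turn the filter table into a parameterized WHERE fragment is sketched below in Python. The helper name `build_filters` is hypothetical, and treating any value that starts with `did:` as a DID rather than a handle or name is an assumption; the table only says either column may be used. The `state` filter is omitted because it requires a join.

```python
def build_filters(params: dict[str, str]) -> tuple[str, list[str]]:
    """Return an SQL fragment and its bound arguments for the given filters."""
    simple = {
        "collection": "d.collection = ?",
        "type": "d.record_type = ?",
        "language": "d.language = ?",
        "from": "d.created_at >= ?",
        "to": "d.created_at <= ?",
    }
    clauses: list[str] = []
    args: list[str] = []
    for name, clause in simple.items():
        value = params.get(name)
        if value:
            clauses.append(clause)
            args.append(value)
    # Filters that accept either a DID or a human-readable name.
    for name, did_col, name_col in (
        ("author", "d.did", "d.author_handle"),
        ("repo", "d.repo_did", "d.repo_name"),
    ):
        value = params.get(name)
        if value:
            column = did_col if value.startswith("did:") else name_col
            clauses.append(f"{column} = ?")
            args.append(value)
    return " AND ".join(clauses), args
```

Values are always passed as bound arguments and never interpolated into the SQL text, which keeps user-supplied filters from altering the query.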
143143+144144+## 6. Embedding Eligibility
145145+146146+A document is eligible for embedding if:
147147+148148+- `deleted_at IS NULL`
149149+- `record_type` is one of: `repo`, `issue`, `pull`, `string`, `profile`
150150+- At least one of `title`, `body`, or `summary` is non-empty
151151+- Total text length exceeds a minimum threshold (e.g., 20 characters)
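The eligibility rules can be expressed as a single predicate. The Python sketch below is illustrative: it assumes a document is a dictionary with the field names used in this spec, and it reads "total text length" as the combined length of `title`, `body`, and `summary` after trimming whitespace, which is an interpretation rather than a stated rule.

```python
EMBEDDABLE_TYPES = {"repo", "issue", "pull", "string", "profile"}
MIN_TEXT_LENGTH = 20  # example threshold from the spec

def is_embedding_eligible(doc: dict) -> bool:
    """Check whether a document should be queued for embedding."""
    if doc.get("deleted_at") is not None:
        return False
    if doc.get("record_type") not in EMBEDDABLE_TYPES:
        return False
    texts = [(doc.get(field) or "").strip()
             for field in ("title", "body", "summary")]
    if not any(texts):
        return False
    # "Exceeds" is read as strictly greater than the threshold.
    return sum(len(t) for t in texts) > MIN_TEXT_LENGTH
```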
152152+153153+## 7. API Endpoints
154154+155155+### Health
156156+157157+| Method | Path | Description |
158158+| ------ | ---------- | -------------------------------- |
159159+| GET | `/healthz` | Liveness — process is responsive |
160160+| GET | `/readyz` | Readiness — DB is reachable |
161161+162162+### Search
163163+164164+| Method | Path | Description |
165165+| ------ | ------------------ | ------------------------------------------------ |
166166+| GET | `/search` | Search with configurable mode (default: keyword) |
167167+| GET | `/search/keyword` | Keyword-only search |
168168+| GET | `/search/semantic` | Semantic-only search |
169169+| GET | `/search/hybrid` | Hybrid search |
170170+171171+### Documents
172172+173173+| Method | Path | Description |
174174+| ------ | ----------------- | ----------------------------- |
175175+| GET | `/documents/{id}` | Fetch a single document by ID |
176176+177177+### Admin
178178+179179+| Method | Path | Description |
180180+| ------ | ---------------- | -------------------- |
181181+| POST | `/admin/reindex` | Trigger reindex |
182182+| POST | `/admin/reembed` | Trigger re-embedding |
183183+184184+Admin endpoints are disabled by default. Enable with `ENABLE_ADMIN_ENDPOINTS=true`.
185185+186186+## 8. Query Parameters
187187+188188+| Parameter | Type | Default | Description |
189189+| ------------ | ------ | --------- | -------------------------------------------------------------------- |
190190+| `q` | string | required | Search query |
191191+| `mode` | string | `keyword` | `keyword`, `semantic`, or `hybrid` |
192192+| `limit` | int | 20 | Results per page (max: `SEARCH_MAX_LIMIT`) |
193193+| `offset` | int | 0 | Pagination offset |
194194+| `collection` | string | — | Filter by `sh.tangled.*` collection |
195195+| `type` | string | — | Filter by record type (`repo`, `issue`, `pull`, `string`, `profile`) |
196196+| `author` | string | — | Filter by author handle or DID |
197197+| `repo` | string | — | Filter by repo name or repo DID |
198198+| `language` | string | — | Filter by language |
199199+| `from` | string | — | Created after (ISO 8601) |
200200+| `to` | string | — | Created before (ISO 8601) |
201201+| `state` | string | — | Filter by state (`open`, `closed`, `merged`) |
202202+203203+## 9. Search Response
+
+```json
+{
+  "query": "rust markdown tui",
+  "mode": "hybrid",
+  "total": 142,
+  "limit": 20,
+  "offset": 0,
+  "results": [
+    {
+      "id": "did:plc:abc|sh.tangled.repo|3kb3fge5lm32x",
+      "collection": "sh.tangled.repo",
+      "record_type": "repo",
+      "title": "glow-rs",
+      "body_snippet": "A TUI markdown viewer inspired by <mark>Glow</mark>...",
+      "summary": "Rust TUI markdown viewer",
+      "repo_name": "glow-rs",
+      "author_handle": "desertthunder.dev",
+      "score": 0.842,
+      "matched_by": ["keyword", "semantic"],
+      "created_at": "2026-03-20T10:00:00Z",
+      "updated_at": "2026-03-22T15:03:11Z"
+    }
+  ]
+}
+```
230230+231231+### Result Fields
232232+233233+| Field | Type | Description |
234234+| ------------------ | -------- | ------------------------------------------- |
235235+| `id` | string | Document stable ID |
236236+| `collection` | string | ATProto collection NSID |
237237+| `record_type` | string | Normalized type label |
238238+| `title` | string | Document title |
239239+| `body_snippet` | string | Highlighted body excerpt |
240240+| `summary` | string | Short description |
241241+| `repo_name` | string | Repository name (if applicable) |
242242+| `author_handle` | string | Author handle |
243243+| `did` | string | Author DID when available |
244244+| `at_uri` | string | Canonical AT URI when available |
245245+| `primary_language` | string | Primary language for repo results |
246246+| `stars` | number | Indexed star count for repo results |
247247+| `follower_count` | number | Indexed follower count for profile results |
248248+| `following_count` | number | Indexed following count for profile results |
249249+| `score` | float | Relevance score (0–1) |
250250+| `matched_by` | string[] | Which search modes produced this result |
251251+| `created_at` | string | ISO 8601 creation timestamp |
252252+| `updated_at` | string | ISO 8601 last update timestamp |
253253+254254+## 10. Document Response
255255+256256+`GET /documents/{id}` returns the full document:
+
+```json
+{
+  "id": "did:plc:abc|sh.tangled.repo|3kb3fge5lm32x",
+  "did": "did:plc:abc",
+  "collection": "sh.tangled.repo",
+  "rkey": "3kb3fge5lm32x",
+  "at_uri": "at://did:plc:abc/sh.tangled.repo/3kb3fge5lm32x",
+  "cid": "bafyreig...",
+  "record_type": "repo",
+  "title": "glow-rs",
+  "body": "A TUI markdown viewer inspired by Glow, written in Rust.",
+  "summary": "Rust TUI markdown viewer",
+  "repo_name": "glow-rs",
+  "author_handle": "desertthunder.dev",
+  "tags_json": "[\"rust\", \"tui\", \"markdown\"]",
+  "language": "en",
+  "created_at": "2026-03-20T10:00:00Z",
+  "updated_at": "2026-03-22T15:03:11Z",
+  "indexed_at": "2026-03-22T15:05:00Z",
+  "has_embedding": true
+}
+```
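In the examples, the document `id` looks like `did|collection|rkey`, and `at_uri` is built from the same three parts. The spec does not define that format, so the Python sketch below treats it as an assumption inferred from the sample values. It also shows percent-encoding the ID for the `/documents/{id}` path, since `|` and `:` are not safe to send raw in a URL path segment. Both helper names are hypothetical.

```python
from urllib.parse import quote

def at_uri_from_document_id(document_id: str) -> str:
    """Derive an AT URI from an ID shaped like did|collection|rkey."""
    parts = document_id.split("|")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected document id: {document_id!r}")
    did, collection, rkey = parts
    return f"at://{did}/{collection}/{rkey}"

def document_path(document_id: str) -> str:
    """Build the request path for GET /documents/{id}."""
    # safe="" also escapes ":" and "|", which appear in every ID.
    return "/documents/" + quote(document_id, safe="")
```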
280280+281281+## 11. Error Responses
282282+283283+| Status | Condition |
284284+| ------ | ------------------------------------------------------------------ |
285285+| 400 | Missing `q` parameter, invalid `limit`/`offset`, malformed filters |
286286+| 404 | Document not found |
287287+| 503 | DB unreachable (readiness failure) |
+
+```json
+{ "error": "invalid_parameter", "message": "limit must be between 1 and 100" }
+```
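A sketch of how `limit` and `offset` validation could produce the 400 body shown above. It is written in Python for illustration and assumes query parameters arrive as a dictionary of strings; the default of 20 comes from the Query Parameters table and the maximum of 100 from the example message. `InvalidParameter`, `parse_pagination`, `error_body`, and the two messages other than the `limit` range message are hypothetical.

```python
class InvalidParameter(ValueError):
    """Raised when a query parameter fails validation (maps to HTTP 400)."""

def parse_pagination(params: dict[str, str], default_limit: int = 20,
                     max_limit: int = 100) -> tuple[int, int]:
    """Parse and validate limit/offset, returning (limit, offset)."""
    try:
        limit = int(params.get("limit", default_limit))
        offset = int(params.get("offset", 0))
    except (TypeError, ValueError):
        raise InvalidParameter("limit and offset must be integers") from None
    if not 1 <= limit <= max_limit:
        raise InvalidParameter(f"limit must be between 1 and {max_limit}")
    if offset < 0:
        raise InvalidParameter("offset must be non-negative")
    return limit, offset

def error_body(exc: InvalidParameter) -> dict[str, str]:
    """Shape the JSON error payload for a validation failure."""
    return {"error": "invalid_parameter", "message": str(exc)}
```

In a deployment, `max_limit` would presumably be read from `SEARCH_MAX_LIMIT` so the message always reflects the configured ceiling.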
292292+293293+## 12. API Behavior
294294+295295+- `keyword` returns only lexical matches via `fts_match`/`fts_score`
296296+- `semantic` returns only embedding-backed matches via `vector_top_k`
297297+- `hybrid` merges both result sets and reranks
298298+- All modes exclude documents with `deleted_at IS NOT NULL` by default
299299+- Pagination uses `limit`/`offset` (cursor-based pagination deferred)
300300+- Mobile clients may use `type=repo` and `type=profile` to render repo/profile search directly
+89
docs/api/specs/08-app-integration.md
···11+---
22+title: "Spec 08 — App Integration"
33+updated: 2026-03-23
44+---
55+66+## 1. Purpose
77+88+Define the mobile-facing Twister API surface.
99+1010+The Twisted app should keep using Tangled's public knot and PDS APIs for canonical repo/profile detail. Twister is responsible for:
1111+1212+- cross-network discovery via search
1313+- index-backed summaries for data gaps such as followers
1414+1515+## 2. Client Boundary
1616+1717+The mobile client uses Twister only for:
1818+1919+- Explore search
2020+- index-backed profile summaries
2121+- future feed and notification features
2222+2323+The mobile client does not use Twister for:
2424+2525+- repo tree/blob/detail reads
2626+- direct profile record reads
2727+- issue/PR detail reads
2828+2929+Those remain on Tangled's public APIs.
3030+3131+## 3. Search Contract
3232+3333+`GET /search`
3434+3535+Required query parameters:
3636+3737+- `q`
3838+3939+Optional query parameters:
4040+4141+- `mode=keyword|semantic|hybrid`
4242+- `type=repo|profile`
4343+- `limit`
4444+- `offset`
4545+4646+For mobile clients, repo and profile results should include:
4747+4848+- `did`
4949+- `at_uri`
5050+- `record_type`
5151+- `title`
5252+- `summary`
5353+- `repo_name`
5454+- `author_handle`
5555+- `updated_at`
5656+- `primary_language` for repos when known
5757+- `stars` for repos when known
5858+- `follower_count` and `following_count` for profiles when known
5959+6060+## 4. Profile Summary Contract
6161+6262+`GET /profiles/{did}/summary`
6363+6464+Response:
+
+```json
+{
+  "did": "did:plc:abc123",
+  "handle": "desertthunder.dev",
+  "follower_count": 128,
+  "following_count": 84,
+  "indexed_at": "2026-03-23T10:15:00Z"
+}
+```
7575+7676+This endpoint exists because follower counts and follower lists are derived from indexed graph state, not from a single direct public Tangled API call.
7777+7878+## 5. Failure Handling
7979+8080+If Twister is unavailable:
8181+8282+- the app should keep direct known-handle browsing working
8383+- Explore should show a clear "index unavailable" state
8484+- profile pages should omit index-backed follower counts rather than fail entirely
8585+8686+## 6. Ownership
8787+8888+- Twister owns search ranking, document normalization, and graph summary derivation
8989+- The app owns result presentation, route transitions, and fallback behavior
+40
docs/api/tasks/README.md
···11+---
22+title: "Twister — Task Index"
33+updated: 2026-03-22
44+---
55+66+# Twister Tasks
77+88+Assumes Go, Tap (deployed on Railway), Turso/libSQL, and Railway for deployment.
99+1010+## Delivery Strategy
1111+1212+Build in four phases:
1313+1414+1. **MVP** — ingestion, graph backfill, keyword search, deployment, operational tooling
1515+2. **Semantic Search** — embeddings, vector retrieval
1616+3. **Hybrid Search** — weighted merge of keyword + semantic
1717+4. **Quality Polish** — ranking refinement, advanced filters, analytics
1818+1919+Ship keyword search before embeddings. That gives a testable, inspectable baseline before introducing model behavior.
2020+Within MVP, run graph backfill before calling the environment search-ready for users.
2121+2222+## Phases
2323+2424+| Phase | Title | Document | Status |
2525+| ----- | --------------- | ------------------------------------------ | --------------------------------------------------------------------- |
2626+| 1 | MVP | [phase-1-mvp.md](phase-1-mvp.md) | In progress (M0–M2 complete; backfill scheduled before public launch) |
2727+| 2 | Semantic Search | [phase-2-semantic.md](phase-2-semantic.md) | Not started |
2828+| 3 | Hybrid Search | [phase-3-hybrid.md](phase-3-hybrid.md) | Not started |
2929+| 4 | Quality Polish | [phase-4-quality.md](phase-4-quality.md) | Not started |
3030+3131+## MVP Complete When
3232+3333+- Tap ingests tracked `sh.tangled.*` records
3434+- Documents normalize into a stable store
3535+- Keyword search works publicly
3636+- Index-backed profile summaries can fill public API gaps such as followers
3737+- API and indexer are deployed on Railway
3838+- Restart does not lose sync position
3939+- Reindex exists for repair
4040+- Graph backfill populates initial content from seed users
+68
docs/app/specs/phase-3.md
···11+# Phase 3 — Indexed Search and Honest Discovery
22+33+## Goal
44+55+Introduce global discovery through the Twister project index while preserving honest product boundaries. Home continues to support direct known-handle browsing, Explore becomes index-backed search, and Activity remains a clearly labeled in-progress surface.
66+77+## Current Product Shape
88+99+### Home
1010+1111+Home is the temporary public entry point for unauthenticated browsing:
1212+1313+- Enter a known AT Protocol handle
1414+- Open that user's profile directly
1515+- Resolve the handle to DID + PDS via AT Protocol identity
1616+- List that user's public Tangled repos inline and open one directly
1717+1818+This keeps public browsing fully real while still giving the app a lightweight direct-entry path.
1919+2020+### Explore
2121+2222+Explore becomes the network-level discovery surface:
2323+2424+- Global repo search via the Twister index
2525+- Global profile search via the Twister index
2626+- Empty state should clearly distinguish "index unavailable" from "no results"
2727+- Search results route into the existing profile and repo detail screens
2828+2929+### Activity
3131+3232+Activity remains a tab-level placeholder:
3232+3333+- No public timeline yet
3434+- No curated public feed fallback
3535+- Empty state should explicitly say activity is in progress
3636+3737+## Identity and Routing
3838+3939+The app now uses two read paths:
4040+4141+1. **Direct handle browsing**
4242+ Resolve `handle -> DID` via `com.atproto.identity.resolveHandle`
4343+ Fetch the DID document and extract the PDS endpoint
4444+ Query the user's PDS for `sh.tangled.repo` records via `com.atproto.repo.listRecords`
4545+2. **Indexed discovery**
4646+ Query the Twister API for global search results
4747+ Open the selected profile or repo in the existing screens
4848+ Continue detail fetching from Tangled's public APIs
4949+5050+The Twister API is additive, not authoritative for repo detail. It fills discovery and graph gaps; knots and PDSes remain the source of truth for detail screens.
5151+5252+## UI Expectations
5353+5454+- Home shows one handle input plus explicit actions for profile jump and repo browsing
5555+- Home shows loading, invalid-handle, no-repos, and resolved-repo-list states
5656+- Explore shows a working search form, loading state, index-unavailable state, and no-results state
5757+- Activity shows a static in-progress empty state
5858+- Profile may show index-backed follower/following summaries when available
5959+6060+## Deferred Work
6161+6262+The following work is intentionally deferred out of this phase:
6363+6464+- Trending or suggested discovery sections
6565+- Public activity feed ingestion, pagination, and caching
6666+- Jetstream or appview timeline investigation
6767+6868+These capabilities will be revisited after the baseline search and graph-summary integration is stable.
+1-1
justfile
···4141api-build:
4242 just --justfile packages/api/justfile build
43434444-api-run-api:
4444+api-dev:
4545 just --justfile packages/api/justfile run-api
46464747api-run-indexer:
+1-1
packages/api/README.md
···11# Twister
2233-Tap-based search engine for Tangled.
33+Tap-based indexing and search service for Tangled.
···7788Build a Go-based search service for Tangled content on AT Protocol that:
991010-* ingests Tangled records through **Tap** (already deployed on Railway)
1111-* denormalizes them into internal search documents
1212-* indexes them in **Turso/libSQL**
1313-* exposes a search API with **keyword**, **semantic**, and **hybrid** retrieval modes
1010+- ingests Tangled records through **Tap** (already deployed on Railway)
1111+- denormalizes them into internal search documents
1212+- indexes them in **Turso/libSQL**
1313+- exposes a search API with **keyword**, **semantic**, and **hybrid** retrieval modes
1414+- exposes index-backed summary APIs for data the public Tangled APIs do not answer efficiently, such as followers
14151516## 2. Functional Goals
16171718The system shall:
18191919-* index Tangled-specific ATProto collections under the `sh.tangled.*` namespace
2020-* support initial backfill and continuous incremental sync via Tap
2121-* support lexical retrieval using Turso's Tantivy-backed FTS
2222-* support semantic retrieval using vector embeddings
2323-* support hybrid ranking combining lexical and semantic signals
2424-* expose stable HTTP APIs for search and document lookup
2525-* support deployment on **Railway**
2020+- index Tangled-specific ATProto collections under the `sh.tangled.*` namespace
2121+- support initial backfill and continuous incremental sync via Tap
2222+- support lexical retrieval using Turso's Tantivy-backed FTS
2323+- support semantic retrieval using vector embeddings
2424+- support hybrid ranking combining lexical and semantic signals
2525+- expose stable HTTP APIs for search, document lookup, and graph/profile summaries
2626+- support deployment on **Railway**
26272728## 3. Non-Functional Goals
28292930The system shall prioritize:
30313131-* **correctness of sync** — cursors never advance ahead of committed data
3232-* **operational simplicity** — single binary, subcommand-driven
3333-* **incremental delivery** — keyword search ships before embeddings
3434-* **small deployable services** — process groups, not microservices
3535-* **reindexability** — any document or collection can be re-normalized and re-indexed
3636-* **low coupling** — sync, indexing, and serving are independent concerns
3232+- **correctness of sync** — cursors never advance ahead of committed data
3333+- **operational simplicity** — single binary, subcommand-driven
3434+- **incremental delivery** — keyword search ships before embeddings
3535+- **small deployable services** — process groups, not microservices
3636+- **reindexability** — any document or collection can be re-normalized and re-indexed
3737+- **low coupling** — sync, indexing, and serving are independent concerns
37383839## 4. Out of Scope (v1)
39404040-* code-aware symbol search
4141-* sourcegraph-style structural search
4242-* personalized ranking
4343-* access control beyond public/private visibility flags in indexed records
4444-* full analytics pipeline
4545-* custom ANN infrastructure outside Turso/libSQL
4141+- code-aware symbol search
4242+- sourcegraph-style structural search
4343+- personalized ranking
4444+- access control beyond public/private visibility flags in indexed records
4545+- full analytics pipeline
4646+- custom ANN infrastructure outside Turso/libSQL
46474748## 5. Design Principles
4849···505151522. **The indexer owns denormalization.** Raw ATProto records are never queried directly by the public API.
52535353-3. **Search serves denormalized documents.** Search ranking depends on the document model, not transport.
5454+3. **The public API serves denormalized projections.** Search ranking and graph summaries depend on the indexed document model, not transport.
545555564. **Keyword search is the baseline.** Semantic and hybrid search are layered on top.
565757585. **Embeddings are asynchronous.** Ingestion is never blocked on vector generation unless explicitly configured.
5959+6060+6. **Twister complements public Tangled APIs.** Repo detail stays on knots/PDSes; the index adds discovery and cross-network summaries.
58615962## 6. External Systems
6063···9396 ├─ keyword search (fts_match / fts_score)
9497 ├─ semantic search (vector_top_k)
9598 ├─ hybrid search (weighted merge)
9999+ ├─ profile and graph summaries
96100 └─ document fetch
97101```
9810299103## 8. Runtime Units
100104101101-| Unit | Role | Deployment |
102102-| -------------- | ----------------------------------- | ------------------------------- |
103103-| `api` | HTTP search and document API | Railway service (public) |
104104-| `indexer` | Tap consumer, normalizer, DB writer | Railway service (internal) |
105105-| `embed-worker` | Async embedding generation | Optional Railway service |
106106-| `tap` | ATProto sync | Railway (already deployed) |
105105+| Unit | Role | Deployment |
106106+| -------------- | -------------------------------------------- | -------------------------- |
107107+| `api` | HTTP search, graph summary, and document API | Railway service (public) |
108108+| `indexer` | Tap consumer, normalizer, DB writer | Railway service (internal) |
109109+| `embed-worker` | Async embedding generation | Optional Railway service |
110110+| `tap` | ATProto sync | Railway (already deployed) |
107111108112## 9. Repository Structure
109113
···11---
22title: "Spec 06 — Operations"
33-updated: 2026-03-22
33+updated: 2026-03-23
44---
5566Covers configuration, observability, security, and deployment.
77+88+## 0. Quick Setup
99+1010+Tap is already deployed. For a new environment, the minimum operator work is:
1111+1212+1. Create or choose a Turso database for that environment
1313+2. Generate a Turso auth token for that database
1414+3. Point `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` at that database
1515+4. Create Railway services for `api` and `indexer`
1616+5. Point `TAP_URL` at the existing Tap deployment
1717+6. Run migrations, then start the services
1818+7. Run `twister backfill` before treating the environment as search-ready
1919+2020+No separate `*_DEV` or `*_PROD` variables are required. Each environment keeps using the same variable names and simply points them at the appropriate Turso database.
721822## 1. Configuration
923···8599ENABLE_ADMIN_ENDPOINTS=false
86100```
87101102102+### Environment Selection
103103+104104+Use the same variable names in every environment:
105105+106106+- local development can point `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` at `twister-dev`
107107+- production can point those same variables at `twister-prod`
108108+109109+The application should not care which database it is talking to; only the environment wiring changes.
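
A minimal sketch of what environment-agnostic loading could look like, assuming a hypothetical `TursoConfig` struct and `LoadTursoConfig` helper (neither is confirmed to exist in `packages/api`). The code reads the same two variable names everywhere and has no dev/prod branching.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// TursoConfig holds the only two values that differ between environments.
type TursoConfig struct {
	DatabaseURL string
	AuthToken   string
}

// LoadTursoConfig reads the shared variable names through lookup, which is
// os.LookupEnv in a real process and can be a map-backed stub in tests.
func LoadTursoConfig(lookup func(string) (string, bool)) (TursoConfig, error) {
	url, ok := lookup("TURSO_DATABASE_URL")
	if !ok || url == "" {
		return TursoConfig{}, errors.New("TURSO_DATABASE_URL is not set")
	}
	token, ok := lookup("TURSO_AUTH_TOKEN")
	if !ok || token == "" {
		return TursoConfig{}, errors.New("TURSO_AUTH_TOKEN is not set")
	}
	return TursoConfig{DatabaseURL: url, AuthToken: token}, nil
}

func main() {
	cfg, err := LoadTursoConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	// Never log the token; the URL alone identifies the target database.
	fmt.Println("using database", cfg.DatabaseURL)
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, is a design choice made here so the loader can be exercised without mutating the process environment.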
110110+111111+## 1.5. Turso Setup
112112+113113+### Recommended Databases
114114+115115+Use one Turso database per environment, for example:
116116+117117+- `twister-dev`
118118+- `twister-prod`
119119+120120+Keep the app config identical across environments and swap only these values:
121121+122122+- `TURSO_DATABASE_URL`
123123+- `TURSO_AUTH_TOKEN`
124124+125125+### Basic Flow
126126+127127+Using the Turso dashboard or CLI:
128128+129129+1. Create the database for the target environment
130130+2. Capture its libSQL URL
131131+3. Create an auth token for the service
132132+4. Set `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` in that environment
133133+134134+Example values:
135135+136136+```bash
137137+# Development environment
138138+TURSO_DATABASE_URL=libsql://twister-dev-your-org.turso.io
139139+TURSO_AUTH_TOKEN=...
140140+141141+# Production environment
142142+TURSO_DATABASE_URL=libsql://twister-prod-your-org.turso.io
143143+TURSO_AUTH_TOKEN=...
144144+```
145145+146146+### Practical Rule
147147+148148+Do not introduce `TURSO_DATABASE_URL_DEV`, `TURSO_DATABASE_URL_PROD`, or similar split variables. Railway environments, local shells, and CI should all set the same names with environment-specific values.
149149+150150+## 1.6. Railway Setup
151151+152152+### Project Layout
153153+154154+Create or reuse one Railway project containing:
155155+156156+- existing `tap` service
157157+- `api` service running `twister api`
158158+- `indexer` service running `twister indexer`
159159+160160+### Basic Steps
161161+162162+1. Connect the monorepo to Railway
163163+2. Create the `api` and `indexer` services from the same source repo/Docker image
164164+3. Set shared variables on both services:
165165+ - `TURSO_DATABASE_URL`
166166+ - `TURSO_AUTH_TOKEN`
167167+ - `LOG_LEVEL`
168168+ - `LOG_FORMAT`
169169+4. Set API-specific variables:
170170+ - `HTTP_BIND_ADDR`
171171+ - `SEARCH_DEFAULT_LIMIT`
172172+ - `SEARCH_MAX_LIMIT`
173173+5. Set indexer-specific variables:
174174+ - `TAP_URL`
175175+ - `TAP_AUTH_PASSWORD`
176176+ - `INDEXED_COLLECTIONS`
177177+6. Configure health checks
178178+7. Deploy
179179+8. Run backfill against the environment before public validation
180180+181181+### Dev vs Production on Railway
182182+183183+If you use multiple Railway environments, keep the same service definitions and variable names in each one. Only the values change:
184184+185185+- dev Railway environment -> `TURSO_DATABASE_URL=...twister-dev...`
186186+- prod Railway environment -> `TURSO_DATABASE_URL=...twister-prod...`
187187+188188+This keeps deployment logic simple and avoids conditional application config.
189189+88190## 2. Observability
8919190192### Structured Logging
···261363```
262364263365Railway supports referencing other services' variables with `${{service.VAR}}` syntax, which is useful for linking the indexer to Tap's domain.
366366+367367+#### First-Time Bootstrap Checklist
368368+369369+After the first successful deploy of a new environment:
370370+371371+1. Confirm API readiness on `/readyz`
372372+2. Confirm indexer health and Tap connectivity
373373+3. Run graph backfill with the environment's seed file
374374+4. Wait for Tap historical sync to settle
375375+5. Verify that search returns known historical repos/profiles
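
The first checklist step can be scripted. The sketch below polls `/readyz` until it answers, under two assumptions that the spec excerpt does not confirm: that readiness is signalled by HTTP 200, and that the base URL comes from a hypothetical `TWISTER_API_URL` variable. The helper names `checkReady` and `WaitUntil` are likewise illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"os"
	"time"
)

// ErrNotReady marks a probe that reached the service but was not accepted.
var ErrNotReady = errors.New("service not ready")

// checkReady performs one GET against <baseURL>/readyz and treats any
// status other than 200 as "not ready yet" (an assumption of this sketch).
func checkReady(client *http.Client, baseURL string) error {
	resp, err := client.Get(baseURL + "/readyz")
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("%w: status %d", ErrNotReady, resp.StatusCode)
	}
	return nil
}

// WaitUntil calls check up to attempts times, sleeping delay between tries,
// and returns nil as soon as one call succeeds.
func WaitUntil(check func() error, attempts int, delay time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = check(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	base := os.Getenv("TWISTER_API_URL") // hypothetical variable name
	if base == "" {
		fmt.Println("set TWISTER_API_URL to poll a deployment")
		return
	}
	client := &http.Client{Timeout: 5 * time.Second}
	err := WaitUntil(func() error { return checkReady(client, base) }, 10, 3*time.Second)
	if err != nil {
		fmt.Println("readiness check failed:", err)
		return
	}
	fmt.Println("api is ready")
}
```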
264376265377#### Health Checks
266378
···5566# Twister Technical Specifications
7788-Twister is a Go-based search service for [Tangled](https://tangled.org) content on AT Protocol.
99-It ingests records through [Tap](https://github.com/bluesky-social/indigo/tree/main/cmd/tap), denormalizes them into search documents, indexes them in [Turso/libSQL](https://docs.turso.tech), and exposes keyword, semantic, and hybrid search APIs.
88+Twister is a Go-based index and search service for [Tangled](https://tangled.org) content on AT Protocol.
99+It ingests records through [Tap](https://github.com/bluesky-social/indigo/tree/main/cmd/tap), denormalizes them into search documents and graph summaries, indexes them in [Turso/libSQL](https://docs.turso.tech), and exposes public APIs for search and index-backed data gaps.
10101111## Specifications
1212···1919| 5 | [Search](05-search.md) | Search modes, API contract, scoring, filtering |
2020| 6 | [Operations](06-operations.md) | Configuration, observability, security, deployment |
2121| 7 | [Graph Backfill](07-graph-backfill.md) | Seed-based user discovery and content backfill |
2222+| 8 | [App Integration](08-app-integration.md) | Mobile-facing contracts for search and graph summaries |
-38
packages/api/docs/tasks/README.md
···11----
22-title: "Twister — Task Index"
33-updated: 2026-03-22
44----
55-66-# Twister Tasks
77-88-Assumes Go, Tap (deployed on Railway), Turso/libSQL, and Railway for deployment.
99-1010-## Delivery Strategy
1111-1212-Build in four phases:
1313-1414-1. **MVP** — ingestion, keyword search, deployment, operational tooling, graph backfill
1515-2. **Semantic Search** — embeddings, vector retrieval
1616-3. **Hybrid Search** — weighted merge of keyword + semantic
1717-4. **Quality Polish** — ranking refinement, advanced filters, analytics
1818-1919-Ship keyword search before embeddings. That gives a testable, inspectable baseline before introducing model behavior.
2020-2121-## Phases
2222-2323-| Phase | Title | Document | Status |
2424-| ----- | ----- | -------- | ------ |
2525-| 1 | MVP | [phase-1-mvp.md](phase-1-mvp.md) | In progress (M0–M2 complete) |
2626-| 2 | Semantic Search | [phase-2-semantic.md](phase-2-semantic.md) | Not started |
2727-| 3 | Hybrid Search | [phase-3-hybrid.md](phase-3-hybrid.md) | Not started |
2828-| 4 | Quality Polish | [phase-4-quality.md](phase-4-quality.md) | Not started |
2929-3030-## MVP Complete When
3131-3232-- Tap ingests tracked `sh.tangled.*` records
3333-- Documents normalize into a stable store
3434-- Keyword search works publicly
3535-- API and indexer are deployed on Railway
3636-- Restart does not lose sync position
3737-- Reindex exists for repair
3838-- Graph backfill populates initial content from seed users
···1717- Reindex exists for repair
1818- Graph backfill populates initial content from seed users
19192020----
2121-2220## M0 — Repository Bootstrap ✅
23212422Executable layout, local tooling, and development conventions (completed 2026-03-22).
25232626----
2727-2824## M1 — Database Schema and Store Layer ✅
29253026refs: [specs/03-data-model.md](../specs/03-data-model.md)
31273228Implemented the Turso/libSQL schema and Go store package for document persistence.
3333-3434----
35293630## M2 — Normalization Layer ✅
3731···39334034Translate `sh.tangled.*` records into internal search documents.
41354242----
4343-4436## M3 — Tap Client and Ingestion Loop
45374638refs: [specs/04-data-pipeline.md](../specs/04-data-pipeline.md), [specs/01-architecture.md](../specs/01-architecture.md)
···4840### Goal
49415042Connect the indexer to Tap (on Railway) and process live events into the store.
5151-5252-### Why Now
5353-5454-Tap is the point of truth for synchronized ATProto ingestion. It is already deployed on Railway.
55435644### Deliverables
5745···126114127115The system continuously ingests and persists `sh.tangled.*` records from Tap.
128116129129----
117117+## M4 — Graph Backfill from Seed Users
118118+119119+refs: [specs/07-graph-backfill.md](../specs/07-graph-backfill.md)
120120+121121+### Goal
122122+123123+Bootstrap the index with historical Tangled content by discovering and backfilling users from a curated seed set.
124124+125125+### Deliverables
126126+127127+- `twister backfill` CLI command
128128+- Seed file parser and documented seed-file format
129129+- Graph fan-out discovery (follows and collaborators)
130130+- Tap `/repos/add` integration for discovered users
131131+- Deduplication against already-tracked repos
132132+- Dry-run mode and progress logging
133133+- Basic operator runbook for first bootstrap and repeat runs
134134+135135+### Tasks
136136+137137+- [ ] Implement `backfill` subcommand with flags:
138138+ - `--seeds <file>` — required seed file path
139139+ - `--max-hops <n>` — depth limit for fan-out (default: 2)
140140+ - `--dry-run` — print the discovery plan without mutating Tap
141141+ - `--concurrency <n>` — parallel discovery workers (default: 5)
142142+ - `--batch-size <n>` — DIDs per `/repos/add` request
143143+ - `--batch-delay <duration>` — delay between Tap registration batches
144144+- [ ] Implement seed file parsing:
145145+ - One DID or handle per line
146146+ - `#` comments allowed
147147+ - Blank lines ignored
148148+ - Handles resolved to DIDs before graph expansion
149149+- [ ] Decide and document the initial seed file location for operators:
150150+ - Repository-managed example file for format/reference
151151+ - Deployment-specific runtime file or mounted secret for real runs
152152+- [ ] Implement graph discovery:
153153+ 1. Start from hop-0 seed users
154154+ 2. Fetch `sh.tangled.graph.follow` records and collect subject DIDs
155155+ 3. Fetch repo collaborators by inspecting repos, issues, PRs, and comments
156156+ 4. Enqueue newly discovered DIDs with hop metadata
157157+ 5. Stop expanding beyond `max-hops`
158158+- [ ] Track discovery metadata for logs:
159159+ - source DID
160160+ - hop depth
161161+ - discovery reason (`seed`, `follow`, `collaborator`)
162162+- [ ] Integrate with Tap admin endpoints:
163163+ - `GET /info/:did` to skip already-tracked repos when practical
164164+ - `POST /repos/add` to register new DIDs for backfill
165165+- [ ] Make the command safe to re-run:
166166+ - in-memory visited DID set during crawl
167167+ - tolerate duplicate `/repos/add`
168168+ - rely on index upsert idempotency for re-delivered records
169169+- [ ] Add operator-friendly logging:
170170+ - seed count
171171+ - users discovered per hop
172172+ - already-tracked vs newly-submitted DIDs
173173+ - batch progress
174174+ - final totals
175175+- [ ] Add a short runbook covering:
176176+ - first bootstrap against an empty database
177177+ - repeat run after expanding the seed list
178178+ - dry-run before production mutation
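
The seed-file rules in the task list (one DID or handle per line, `#` comments, blank lines ignored) can be sketched as a small parser. Several details are assumptions of this sketch and not documented behavior: text after an inline `#` is treated as a comment, repeated entries are dropped, and anything starting with `did:` is classified as a DID while everything else is treated as a handle to resolve later. The `Seed` type and function names are illustrative.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Seed is one entry from the seed file. Handles (IsDID == false) still need
// to be resolved to DIDs before graph expansion.
type Seed struct {
	Value string
	IsDID bool
}

// ParseSeeds reads one DID or handle per line, skipping blank lines and
// anything from '#' to the end of the line.
func ParseSeeds(r io.Reader) ([]Seed, error) {
	var seeds []Seed
	seen := make(map[string]bool)
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // drop full-line and inline comments
		}
		line = strings.TrimSpace(line)
		if line == "" || seen[line] {
			continue
		}
		seen[line] = true
		seeds = append(seeds, Seed{Value: line, IsDID: strings.HasPrefix(line, "did:")})
	}
	if err := sc.Err(); err != nil {
		return nil, fmt.Errorf("read seed file: %w", err)
	}
	return seeds, nil
}

// ParseSeedText is a convenience wrapper for in-memory input.
func ParseSeedText(text string) ([]Seed, error) {
	return ParseSeeds(strings.NewReader(text))
}

func main() {
	seeds, err := ParseSeedText("# core team\ndid:plc:abc123\n\nalice.example.com # maintainer\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, s := range seeds {
		fmt.Printf("value=%s isDID=%t\n", s.Value, s.IsDID)
	}
}
```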
179179+180180+### Verification
181181+182182+- [ ] A small seed file of known Tangled users produces a non-empty discovery graph
183183+- [ ] `--max-hops 1` limits discovery to direct neighbors
184184+- [ ] `--dry-run` does not call Tap mutation endpoints
185185+- [ ] Already-tracked DIDs are reported and not re-submitted unnecessarily
186186+- [ ] Re-running the same seeds is effectively idempotent
187187+- [ ] Newly submitted DIDs cause Tap to begin historical backfill
188188+- [ ] Search results become materially richer after bootstrap than they were under live-only ingestion
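
The `--max-hops` check above is easier to reason about with a bounded breadth-first sketch. It is an assumption-laden illustration, not the planned implementation: `NeighborFunc` stands in for the follow and collaborator lookups, `Discovery` and `Discover` are hypothetical names, and the DIDs use the placeholder `did:example` method.

```go
package main

import "fmt"

// NeighborFunc returns the DIDs reachable from did in one hop. In a real
// crawl this would combine follow records and repo collaborators.
type NeighborFunc func(did string) ([]string, error)

// Discovery records how a DID was reached, for operator logs.
type Discovery struct {
	DID    string
	Hop    int
	Source string // DID that led here; empty for seeds
}

// Discover walks the graph breadth-first from the seeds and stops expanding
// once a node sits at maxHops. The visited set keeps each DID to one entry.
func Discover(seeds []string, maxHops int, neighbors NeighborFunc) ([]Discovery, error) {
	visited := make(map[string]bool)
	var queue, out []Discovery
	for _, s := range seeds {
		if !visited[s] {
			visited[s] = true
			queue = append(queue, Discovery{DID: s, Hop: 0})
		}
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		out = append(out, cur)
		if cur.Hop >= maxHops {
			continue // at the depth limit: record it but do not expand
		}
		next, err := neighbors(cur.DID)
		if err != nil {
			return out, fmt.Errorf("neighbors of %s: %w", cur.DID, err)
		}
		for _, n := range next {
			if !visited[n] {
				visited[n] = true
				queue = append(queue, Discovery{DID: n, Hop: cur.Hop + 1, Source: cur.DID})
			}
		}
	}
	return out, nil
}

func main() {
	graph := map[string][]string{
		"did:example:a": {"did:example:b", "did:example:c"},
		"did:example:b": {"did:example:d"},
		"did:example:c": {"did:example:a"},
	}
	lookup := func(did string) ([]string, error) { return graph[did], nil }
	found, err := Discover([]string{"did:example:a"}, 1, lookup)
	if err != nil {
		fmt.Println("discovery error:", err)
		return
	}
	for _, d := range found {
		fmt.Printf("hop=%d did=%s source=%q\n", d.Hop, d.DID, d.Source)
	}
}
```

With `maxHops` set to 1 the example stops at the seed's direct neighbors, so `did:example:d` is never visited.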
189189+190190+### Exit Criteria
191191+192192+Operators can bootstrap an empty environment to a usable historical baseline before public rollout.
130193131131-## M4 — Keyword Search API
194194+## M5 — Keyword Search API
132195133196refs: [specs/05-search.md](../specs/05-search.md)
134197135198### Goal
136199137200Expose a usable public search API backed by Turso's Tantivy-backed FTS.
138138-139139-### Why Now
140140-141141-First real product milestone. Searchable Tangled content without waiting for embeddings.
142201143202### Deliverables
144203···200259201260A user can search Tangled content reliably with keyword search.
202261203203----
204204-205205-## M5 — Railway Deployment
262262+## M6 — Railway Deployment
206263207264refs: [specs/06-operations.md](../specs/06-operations.md)
208265209266### Goal
210267211268Deploy the API and indexer as Railway services alongside Tap.
212212-213213-### Why Now
214214-215215-At this point, the product is useful enough to run continuously.
216269217270### Deliverables
218271···253306254307The system runs as a deployed service with health-checked processes on Railway.
255308256256----
257257-258258-## M6 — Reindex and Repair
309309+## M7 — Reindex and Repair
259310260311refs: [specs/05-search.md](../specs/05-search.md)
261312···263314264315Make the system recoverable and operable with repair tools.
265316266266-### Why Now
267267-268268-Search systems are never perfect on first ingestion. Repair tools are needed before production.
269269-270317### Deliverables
271318272319- `twister reindex` command with scoping options
···304351305352Operators can repair bad indexes without rebuilding everything manually.
306353307307----
308308-309309-## M7 — Observability
354354+## M8 — Observability
310355311356refs: [specs/06-operations.md](../specs/06-operations.md)
312357···349394### Exit Criteria
350395351396The system is maintainable without guesswork.
352352-353353----
354354-355355-## M-New — Graph Backfill from Seed Users
356356-357357-refs: [specs/07-graph-backfill.md](../specs/07-graph-backfill.md)
358358-359359-### Goal
360360-361361-Bootstrap the search index with existing Tangled content by discovering and backfilling users from a seed set.
362362-363363-### Why Now
364364-365365-Before MVP launch, the index needs existing content. Live ingestion only captures new events — backfill populates historical data.
366366-367367-### Deliverables
368368-369369-- `twister backfill` CLI command
370370-- Seed file parser
371371-- Graph fan-out discovery (follows/collaborators)
372372-- Tap `/repos/add` integration for discovered users
373373-- Deduplication against already-indexed users
374374-- Progress logging
375375-376376-### Tasks
377377-378378-- [ ] Implement `backfill` subcommand with flags:
379379- - `--seeds <file>` — path to seed file (one DID or handle per line)
380380- - `--max-hops <n>` — depth limit for fan-out (default: 2)
381381- - `--dry-run` — show discovered users without triggering backfill
382382- - `--concurrency <n>` — parallel discovery workers (default: 5)
383383-- [ ] Implement seed file parser (supports DIDs and handles, comments with `#`)
384384-- [ ] Implement graph fan-out:
385385- 1. For each seed user, resolve DID if handle provided
386386- 2. Fetch `sh.tangled.graph.follow` records for the user
387387- 3. Fetch collaborators from repos owned by the user
388388- 4. Add discovered DIDs to the crawl queue
389389- 5. Repeat up to `max-hops` depth
390390-- [ ] Integrate with Tap `/repos/add` to register discovered DIDs for tracking
391391-- [ ] Deduplicate: skip DIDs already tracked by Tap (check via `/info/:did`)
392392-- [ ] Log progress: seeds processed, users discovered per hop, DIDs submitted to Tap
393393-- [ ] Handle rate limiting and errors gracefully (retry with backoff)
394394-- [ ] Make idempotent: safe to re-run; Tap handles duplicate `/repos/add` calls
395395-396396-### Verification
397397-398398-- [ ] Running with a seed file of 3 known users discovers their followers
399399-- [ ] `--max-hops 1` limits discovery to direct connections only
400400-- [ ] `--dry-run` lists discovered DIDs without calling Tap
401401-- [ ] Already-tracked users are skipped
402402-- [ ] Re-running the same seed file produces no duplicate work
403403-- [ ] Tap begins backfilling records for newly added DIDs
404404-405405-### Exit Criteria
406406-407407-The index contains historical content from the seed user graph, not just new events.
···7788Add embedding generation and vector-based retrieval on top of the keyword baseline.
991010----
1111-1210## M8 — Embedding Pipeline
13111412refs: [specs/03-data-model.md](../specs/03-data-model.md), [specs/05-search.md](../specs/05-search.md)
···1614### Goal
17151816Add asynchronous embedding generation without blocking ingestion.
1919-2020-### Why Now
2121-2222-Only after keyword search is stable should semantic complexity be added.
23172418### Deliverables
2519···70647165Embeddings are produced asynchronously and stored durably.
72667373----
7474-7567## M9 — Semantic Search
76687769refs: [specs/05-search.md](../specs/05-search.md)
···7971### Goal
80728173Expose vector-based semantic retrieval.
8282-8383-### Why Now
8484-8585-Natural next step once embeddings exist. Turso/libSQL has native vector search with `vector_top_k`.
86748775### Deliverables
8876