a love letter to tangled (android, iOS, and a search API)

docs: reorganize and add overview to README

+1466 -633
+91 -5
README.md
··· 1 - # Twisted Monorepo 1 + # Twisted 2 2 3 - - `apps/twisted`: Ionic/Vue client 4 - - `packages/api`: Go API copied from `~/Projects/TWISTER` 3 + Twisted is a monorepo for a Tangled mobile client and the supporting Tap-backed indexing API. 4 + 5 + ## Projects 6 + 7 + - `apps/twisted`: Ionic Vue client for browsing Tangled repos, profiles, issues, PRs, and indexed search results 8 + - `packages/api`: Go service that consumes Tangled records through Tap, fills gaps in the public Tangled API, and serves search 9 + - `docs`: top-level specs and plans, split by project under `docs/app` and `docs/api` 10 + 11 + ## Architecture 12 + 13 + The app still uses Tangled's public knot and PDS APIs for canonical repo and profile data. The API project adds two complementary capabilities: 14 + 15 + 1. Global search over indexed Tangled content 16 + 2. Index-backed summaries for data that is hard to derive from the public API alone, such as followers 17 + 18 + That keeps direct browsing honest while giving the client one place to ask for cross-network discovery and graph augmentation. 5 19 6 20 ## Development 7 21 8 - Use the top-level `justfile` for common tasks: 22 + Use the top-level [`justfile`](justfile) for common workflows: 9 23 10 24 ```bash 11 25 just dev 12 26 just build 13 27 just test 28 + just api-run-api 14 29 ``` 15 30 16 - The existing client package still works directly from `apps/twisted`. 31 + To enable indexed search in the client, set `VITE_TWISTER_API_BASE_URL` in `apps/twisted/.env`. 32 + 33 + ## Infrastructure Setup 34 + 35 + ### Turso 36 + 37 + Use one Turso database per environment, for example: 38 + 39 + - `twister-dev` 40 + - `twister-prod` 41 + 42 + Do not introduce separate app variable names for dev and prod. Always use the same variables: 43 + 44 + - `TURSO_DATABASE_URL` 45 + - `TURSO_AUTH_TOKEN` 46 + 47 + Only the values change per environment. 
48 + 49 + Example: 50 + 51 + ```bash 52 + # Development 53 + TURSO_DATABASE_URL=libsql://twister-dev-your-org.turso.io 54 + TURSO_AUTH_TOKEN=... 55 + 56 + # Production 57 + TURSO_DATABASE_URL=libsql://twister-prod-your-org.turso.io 58 + TURSO_AUTH_TOKEN=... 59 + ``` 60 + 61 + ### Railway 62 + 63 + Create or reuse one Railway project containing: 64 + 65 + - existing `tap` 66 + - `api` running `twister api` 67 + - `indexer` running `twister indexer` 68 + 69 + Set these shared variables on the Railway services: 70 + 71 + - `TURSO_DATABASE_URL` 72 + - `TURSO_AUTH_TOKEN` 73 + - `LOG_LEVEL` 74 + - `LOG_FORMAT` 75 + 76 + Set these API-specific variables: 77 + 78 + - `HTTP_BIND_ADDR` 79 + - `SEARCH_DEFAULT_LIMIT` 80 + - `SEARCH_MAX_LIMIT` 81 + 82 + Set these indexer-specific variables: 83 + 84 + - `TAP_URL` 85 + - `TAP_AUTH_PASSWORD` 86 + - `INDEXED_COLLECTIONS` 87 + 88 + If you use separate Railway environments for dev and prod, keep the same variable names in both and only swap the Turso values. 89 + 90 + ### First Bootstrap 91 + 92 + For a brand-new environment: 93 + 94 + 1. Point `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` at the target database. 95 + 2. Deploy `api` and `indexer` on Railway. 96 + 3. Verify API readiness and indexer health. 97 + 4. Run `twister backfill` with your seed file. 98 + 5. Treat the environment as search-ready only after historical backfill completes. 99 + 100 + ## Docs 101 + 102 + - Index: [`docs/README.md`](docs/README.md)
+1
apps/twisted/.env.example
+ VITE_TWISTER_API_BASE_URL=http://localhost:8080
+2 -17
apps/twisted/README.md
- # Twisted
-
- A mobile client for [Tangled](https://tangled.org).
-
- ## Development
+ # Twisted App

- Run the mobile apps with Capacitor:
-
- ```bash
- pnpm cap run ios
- pnpm cap run android
- ```
-
- Or to test the web version:
-
- ```bash
- pnpm dev
- ```
+ Ionic Vue client for the Twisted monorepo.
+15 -13
apps/twisted/docs/specs/README.md docs/app/specs/README.md
··· 12 12 13 13 ## What Twisted Does 14 14 15 - **Reader and social companion** for Tangled. Focused on browsing, known-handle lookup, and lightweight interactions. 15 + **Reader and social companion** for Tangled. Focused on direct browsing, indexed discovery, and lightweight interactions. 16 16 17 17 - Browse repos, files, READMEs, issues, PRs 18 18 - Jump to profiles and repos from a known AT Protocol handle 19 - - Explore and Activity placeholders until search/feed work lands 19 + - Search indexed repos and profiles through the Twister API 20 + - Use index-backed graph summaries where the public API is incomplete 20 21 - Sign in via AT Protocol OAuth 21 22 - Star repos, follow users, react to content 22 23 - Offline-capable with cached data ··· 40 41 41 42 **Presentation** — Ionic pages, Vue components, composables, Pinia stores. 42 43 **Domain** — Normalized models (`UserSummary`, `RepoDetail`, `ActivityItem`, etc.), action policies, pagination. 43 - **Data** — `@atcute/client` XRPC calls, `@atcute/tangled` type definitions, local cache, optional BFF. 44 + **Data** — `@atcute/client` XRPC calls, `@atcute/tangled` type definitions, local cache, and the Twister API for search/index-backed summaries. 44 45 45 46 Protocol isolation: no Vue component imports `@atcute/*` directly. All API access flows through `src/services/`. 46 47 47 48 ## Tangled API Surface 48 49 49 - Two distinct API hosts: 50 + Three distinct data hosts: 50 51 51 52 | Host | Protocol | Data | 52 53 | ---------------------------------- | ---------------------------------- | ----------------------------------------------------------------- | 53 54 | Knots (`us-west.tangled.sh`, etc.) 
| XRPC at `/xrpc/sh.tangled.*` | Git data: trees, blobs, commits, branches, diffs, tags | 54 55 | User's PDS | XRPC at `/xrpc/com.atproto.repo.*` | AT Protocol records: repos, issues, PRs, stars, follows, profiles | 56 + | Twister API | HTTP JSON | Global search and index-backed graph/profile summaries | 55 57 56 - The appview (`tangled.org`) serves HTML — it's the web UI, not a JSON API. The mobile client talks to knots and PDS servers directly. 58 + The appview (`tangled.org`) serves HTML — it's the web UI, not a JSON API. The mobile client talks to knots and PDS servers directly for canonical detail and uses the Twister API for cross-network discovery. 57 59 58 60 Repo param format: `did:plc:xxx/repoName`. 59 61 ··· 61 63 62 64 | Phase | Focus | Spec | Tasks | 63 65 | ----- | ------------------------------------------------------------------------ | ------------------------------------ | ------------------------------------ | 64 - | 1 | Project shell, tabs, mock data, design system | [specs/phase-1.md](specs/phase-1.md) | [tasks/phase-1.md](tasks/phase-1.md) | 65 - | 2 | Public browsing — repos, files, profiles, issues, PRs | [specs/phase-2.md](specs/phase-2.md) | [tasks/phase-2.md](tasks/phase-2.md) | 66 - | 3 | Deferred search/feed placeholders and Home-first public browsing | [specs/phase-3.md](specs/phase-3.md) | [tasks/phase-3.md](tasks/phase-3.md) | 67 - | 4 | OAuth sign-in, star, follow, react, personalized feed | [specs/phase-4.md](specs/phase-4.md) | [tasks/phase-4.md](tasks/phase-4.md) | 68 - | 5 | Offline persistence, performance, bundle optimization | [specs/phase-5.md](specs/phase-5.md) | [tasks/phase-5.md](tasks/phase-5.md) | 69 - | 6 | Write features (issues, comments, profile edit), BFF, push notifications | [specs/phase-6.md](specs/phase-6.md) | [tasks/phase-6.md](tasks/phase-6.md) | 70 - | 7 | Real-time Jetstream feed, custom feeds, forking, labels, interdiff | [specs/phase-7.md](specs/phase-7.md) | [tasks/phase-7.md](tasks/phase-7.md) | 66 + | 
1 | Project shell, tabs, mock data, design system | [phase-1.md](phase-1.md) | [../tasks/phase-1.md](../tasks/phase-1.md) | 67 + | 2 | Public browsing — repos, files, profiles, issues, PRs | [phase-2.md](phase-2.md) | [../tasks/phase-2.md](../tasks/phase-2.md) | 68 + | 3 | Index-backed search and handle-first public browsing | [phase-3.md](phase-3.md) | [../tasks/phase-3.md](../tasks/phase-3.md) | 69 + | 4 | OAuth sign-in, star, follow, react, personalized feed | [phase-4.md](phase-4.md) | [../tasks/phase-4.md](../tasks/phase-4.md) | 70 + | 5 | Offline persistence, performance, bundle optimization | [phase-5.md](phase-5.md) | [../tasks/phase-5.md](../tasks/phase-5.md) | 71 + | 6 | Write features, project service integration, push notifications | [phase-6.md](phase-6.md) | [../tasks/phase-6.md](../tasks/phase-6.md) | 72 + | 7 | Real-time Jetstream feed, custom feeds, forking, labels, interdiff | [phase-7.md](phase-7.md) | [../tasks/phase-7.md](../tasks/phase-7.md) | 71 73 72 74 ## Key Design Decisions 73 75 74 76 1. **`@atcute` end-to-end** for all AT Protocol interaction — no mixing client stacks. 75 77 2. **Tangled lexicon handling in one module boundary** (`src/services/tangled/`) — don't scatter `sh.tangled.*` awareness across pages. 76 78 3. **Read-first** — the primary product is a fast reader. Social mutations are a controlled second layer. 77 - 4. **Thin BFF when needed** (Phase 6+) for search indexing, personalized feeds, push notifications, and unstable procedure wrapping. 79 + 4. **Use the project API sparingly and intentionally.** Search and index-backed graph gaps belong there; canonical repo detail stays on Tangled's public APIs. 78 80 5. **Mobile-first, not desktop-forge-first** — prioritize readability, direct browsing, and small focused actions before broader discovery surfaces.
apps/twisted/docs/specs/phase-1.md docs/app/specs/phase-1.md
apps/twisted/docs/specs/phase-2.md docs/app/specs/phase-2.md
-66
apps/twisted/docs/specs/phase-3.md
··· 1 - # Phase 3 — Deferred Search and Activity 2 - 3 - ## Goal 4 - 5 - Preserve honest product boundaries before search is implemented as a separate project. Public browsing continues through known AT Protocol handles on Home, while Explore and Activity stay visible as clearly labeled in-progress placeholders. 6 - 7 - ## Current Product Shape 8 - 9 - ### Home 10 - 11 - Home is the temporary public entry point for unauthenticated browsing: 12 - 13 - - Enter a known AT Protocol handle 14 - - Open that user's profile directly 15 - - Resolve the handle to DID + PDS via AT Protocol identity 16 - - List that user's public Tangled repos inline and open one directly 17 - 18 - This keeps public browsing fully real without implying that global discovery already exists. 19 - 20 - ### Explore 21 - 22 - Explore remains a tab-level placeholder: 23 - 24 - - No global repo search 25 - - No global user search 26 - - No curated fallback discovery pretending to be search 27 - - Empty state should explicitly say search is in progress 28 - 29 - ### Activity 30 - 31 - Activity also remains a tab-level placeholder: 32 - 33 - - No public timeline yet 34 - - No curated public feed fallback 35 - - Empty state should explicitly say activity is in progress 36 - 37 - ## Identity and Routing 38 - 39 - The Home handle flow continues to use the existing AT Protocol resolution path: 40 - 41 - 1. Resolve `handle -> DID` via `com.atproto.identity.resolveHandle` 42 - 2. Fetch the DID document and extract the PDS endpoint 43 - 3. Query the user's PDS for `sh.tangled.repo` records via `com.atproto.repo.listRecords` 44 - 4. Route to existing profile and repo detail screens 45 - 46 - No backend search index, feed service, or additional dependency is introduced in this phase. 
47 - 48 - ## UI Expectations 49 - 50 - - Home shows one handle input plus explicit actions for profile jump and repo browsing 51 - - Home shows loading, invalid-handle, no-repos, and resolved-repo-list states 52 - - Explore shows a static in-progress empty state 53 - - Activity shows a static in-progress empty state 54 - - Profile remains unchanged 55 - 56 - ## Deferred Work 57 - 58 - The following work is intentionally deferred out of this phase: 59 - 60 - - Search indexing and ranking 61 - - Search result UI and recent searches 62 - - Trending or suggested discovery sections 63 - - Public activity feed ingestion, pagination, and caching 64 - - Jetstream or appview timeline investigation 65 - 66 - These capabilities will be revisited when search and feed work are scheduled independently.
apps/twisted/docs/specs/phase-4.md docs/app/specs/phase-4.md
apps/twisted/docs/specs/phase-5.md docs/app/specs/phase-5.md
+9 -9
apps/twisted/docs/specs/phase-6.md docs/app/specs/phase-6.md
··· 1 - # Phase 6 — Write Features & Backend Adapter 1 + # Phase 6 — Write Features & Project Services 2 2 3 3 ## Goal 4 4 5 - Add authenticated write operations (create issues, comment on PRs/issues, edit profile) and introduce a thin backend (BFF) for operations that don't work well from a pure SPA. 5 + Add authenticated write operations (create issues, comment on PRs/issues, edit profile) and extend the Twister project services only where the client should not or cannot do the work directly. 6 6 7 - ## Why a Backend 7 + ## Why Project Services 8 8 9 9 Some operations are awkward or unsafe from a browser client: 10 10 ··· 12 12 - **Unstable procedures**: Tangled's API may change — a backend adapter isolates the mobile client from churn 13 13 - **Push notifications**: require server-side registration and delivery 14 14 - **Personalized feeds**: server-side aggregation is more efficient than client-side filtering 15 - - **Search**: if no public JSON search API exists, the backend can index and serve search results 15 + - **Graph gaps**: follower lists/counts and other cross-network summaries may require index-backed derivation 16 16 - **Rate limiting**: backend can batch and deduplicate requests 17 17 18 - ### BFF Scope 18 + ### Service Scope 19 19 20 - Thin adapter — not a full backend. Proxies and transforms Tangled/AT Protocol calls. 20 + Thin service layer — not a replacement for Tangled's public APIs. Use it for cross-network aggregation, search, notifications, and operations the SPA should not own. 
21 21 22 22 | Endpoint | Purpose | 23 23 | ------------------------------------ | --------------------------------------------------- | 24 24 | `POST /auth/session` | OAuth token exchange and session management | 25 25 | `GET /feed/personalized` | Pre-filtered activity feed for the user | 26 - | `GET /search/repos`, `/search/users` | Search proxy/index | 26 + | `GET /search`, `GET /profiles/:did/summary` | Search and index-backed graph/profile summaries | 27 27 | `POST /notifications/register` | Push notification device registration | 28 28 | Passthrough for stable XRPC calls | Avoid duplicating what the client already does well | 29 29 ··· 56 56 57 57 ## Push Notifications 58 58 59 - - Register device token with BFF 60 - - BFF subscribes to Jetstream for the user's relevant events 59 + - Register device token with project services 60 + - Project services subscribe to Jetstream or indexed events relevant to the user 61 61 - Deliver via APNs (iOS) / FCM (Android) 62 62 - Notification types: PR activity on your repos, issue comments, new followers, stars 63 63
+1 -1
apps/twisted/docs/specs/phase-7.md docs/app/specs/phase-7.md
  - "Team" — activity from users I follow
  - Custom filters: by repo, by user, by event type

- Feeds are stored locally in IndexedDB. If a BFF exists, they can optionally sync server-side for push notification filtering.
+ Feeds are stored locally in IndexedDB. If project services exist, they can optionally sync server-side for push notification filtering.

  ## Advanced Features

apps/twisted/docs/tasks/phase-1.md docs/app/tasks/phase-1.md
apps/twisted/docs/tasks/phase-2.md docs/app/tasks/phase-2.md
apps/twisted/docs/tasks/phase-3.md docs/app/tasks/phase-3.md
apps/twisted/docs/tasks/phase-4.md docs/app/tasks/phase-4.md
apps/twisted/docs/tasks/phase-5.md docs/app/tasks/phase-5.md
+17 -18
apps/twisted/docs/tasks/phase-6.md docs/app/tasks/phase-6.md
··· 1 - # Phase 6 Tasks — Write Features & Backend Adapter 1 + # Phase 6 Tasks — Write Features & Project Services 2 2 3 - ## Backend (BFF) Setup 3 + ## Project Services Setup 4 4 5 - - [ ] Choose runtime (Node/Deno/Bun) and framework (Hono/Fastify/Express) 6 - - [ ] Scaffold BFF project with TypeScript 7 - - [ ] Implement health check endpoint 8 - - [ ] Deploy to hosting (Fly.io, Railway, etc.) 5 + - [ ] Decide which write and notification operations belong in `packages/api` versus a separate service 6 + - [ ] Implement health and readiness endpoints for all public client-facing services 9 7 - [ ] Configure CORS for the mobile app's origins 8 + - [ ] Document the mobile-facing service contract in `docs/api` 10 9 11 - ## Backend — Auth Proxy 10 + ## Project Services — Auth Proxy 12 11 13 12 - [ ] Implement OAuth token exchange endpoint (if moving auth server-side) 14 13 - [ ] Implement session endpoint that returns user info 15 - - [ ] Decide: keep client-side OAuth or migrate to BFF-mediated auth 14 + - [ ] Decide: keep client-side OAuth or migrate to service-mediated auth 16 15 17 - ## Backend — Search 16 + ## Project Services — Search and Graph 18 17 19 - - [ ] Implement search indexer: subscribe to Jetstream, index `sh.tangled.repo` and `sh.tangled.actor.profile` records 20 - - [ ] Implement `GET /search/repos?q=` endpoint 21 - - [ ] Implement `GET /search/users?q=` endpoint 22 - - [ ] Wire mobile client's search service to BFF endpoints 18 + - [ ] Implement `GET /search` endpoint for repo/profile discovery 19 + - [ ] Return enough repo/profile metadata for the mobile client to render result cards directly 20 + - [ ] Implement `GET /profiles/:did/summary` for follower/following counts and other graph-derived gaps 21 + - [ ] Wire mobile client's search and profile summary services to these endpoints 23 22 24 - ## Backend — Personalized Feed 23 + ## Project Services — Personalized Feed 25 24 26 25 - [ ] Implement `GET /feed/personalized` — aggregate activity for 
the user's follows and stars 27 26 - [ ] Index relevant events from Jetstream 28 - - [ ] Wire mobile client's feed to BFF endpoint when authenticated 27 + - [ ] Wire mobile client's feed to the project service endpoint when authenticated 29 28 30 29 ## Create Issue 31 30 ··· 69 68 - [ ] Prompt user to re-authorize with expanded scopes 70 69 - [ ] Handle scope upgrade flow gracefully (no data loss) 71 70 72 - ## Push Notifications (if BFF exists) 71 + ## Push Notifications (if services exist) 73 72 74 - - [ ] Implement `POST /notifications/register` on BFF — register device token 73 + - [ ] Implement `POST /notifications/register` — register device token 75 74 - [ ] Configure Capacitor Push Notifications plugin 76 75 - [ ] Register device token on login 77 - - [ ] BFF: subscribe to Jetstream events relevant to user, deliver via APNs/FCM 76 + - [ ] Services: subscribe to events relevant to user, deliver via APNs/FCM 78 77 - [ ] Handle notification tap → deep link to relevant content 79 78 80 79 ## Quality
apps/twisted/docs/tasks/phase-7.md docs/app/tasks/phase-7.md
+12
apps/twisted/src/core/config/project.ts
+ const rawTwisterApiBaseUrl = import.meta.env.VITE_TWISTER_API_BASE_URL?.trim() ?? "";
+
+ export const twisterApiBaseUrl = rawTwisterApiBaseUrl.replace(/\/+$/, "");
+ export const hasTwisterApi = twisterApiBaseUrl.length > 0;
+
+ export function getTwisterApiUrl(path: string): string {
+   if (!hasTwisterApi) {
+     throw new Error("Twister API base URL is not configured.");
+   }
+
+   return new URL(path.replace(/^\/+/, ""), `${twisterApiBaseUrl}/`).toString();
+ }
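The URL handling in `project.ts` can be tried out in isolation. The sketch below is a standalone approximation of that logic, not the file itself: `normalizeBaseUrl` and `joinApiUrl` are hypothetical names, and the base URL is passed in as a parameter instead of being read from `import.meta.env`, so the snippet runs outside Vite.

```typescript
// Hypothetical standalone mirror of the logic in project.ts.
// Trims whitespace and drops trailing slashes so that joining is predictable.
function normalizeBaseUrl(raw: string | undefined): string {
  return (raw?.trim() ?? "").replace(/\/+$/, "");
}

// Joins a path onto the normalized base. Leading slashes are stripped from
// the path and a single slash is appended to the base, so a base with a
// path prefix (for example ".../api") keeps that prefix.
function joinApiUrl(baseUrl: string, path: string): string {
  if (baseUrl.length === 0) {
    throw new Error("Twister API base URL is not configured.");
  }

  return new URL(path.replace(/^\/+/, ""), `${baseUrl}/`).toString();
}

const local = normalizeBaseUrl(" http://localhost:8080// ");
console.log(joinApiUrl(local, "/search")); // http://localhost:8080/search

const prefixed = normalizeBaseUrl("https://example.com/api/");
console.log(joinApiUrl(prefixed, "search?q=vue")); // https://example.com/api/search?q=vue
```

Stripping the leading slash matters when the base carries a path prefix: resolving `/search` against `https://example.com/api/` would otherwise discard `/api`.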
+1 -1
apps/twisted/src/features/activity/ActivityPage.vue
        <EmptyState
          :icon="pulseOutline"
          title="Activity is in progress"
-         message="The public activity feed is being rebuilt separately. This tab stays as a placeholder until that work is ready." />
+         message="The public activity feed is still in progress. This tab stays as a placeholder until the indexed feed work is ready." />
      </ion-content>
    </ion-page>
  </template>
+297 -6
apps/twisted/src/features/explore/ExplorePage.vue
··· 13 13 </ion-toolbar> 14 14 </ion-header> 15 15 16 - <EmptyState 17 - :icon="searchOutline" 18 - title="Search is in progress" 19 - message="Global repo and user search is being built separately. For now, use Home to jump to a known handle and browse that account's repos." /> 16 + <section class="hero"> 17 + <p class="eyebrow">Indexed Search</p> 18 + <h1 class="hero-title">Search the Tangled network through the project index.</h1> 19 + <p class="hero-copy"> 20 + Explore uses the Twister index for global search. Open any result to continue browsing through Tangled's 21 + public repo and profile APIs. 22 + </p> 23 + </section> 24 + 25 + <section class="search-card"> 26 + <label class="field-label" for="search-input">Search query</label> 27 + <ion-input 28 + id="search-input" 29 + v-model="draftQuery" 30 + class="search-input" 31 + autocomplete="off" 32 + autocapitalize="off" 33 + :spellcheck="false" 34 + clear-input 35 + placeholder="Search repos, profiles, issues, and strings" 36 + @keydown.enter="runSearch" /> 37 + 38 + <ion-segment v-model="resultType" class="search-segment"> 39 + <ion-segment-button value="all">All</ion-segment-button> 40 + <ion-segment-button value="repo">Repos</ion-segment-button> 41 + <ion-segment-button value="profile">People</ion-segment-button> 42 + </ion-segment> 43 + 44 + <div class="action-row"> 45 + <ion-button class="primary-action" expand="block" @click="runSearch" :disabled="!canSearch"> 46 + Search 47 + </ion-button> 48 + <ion-button fill="outline" expand="block" @click="clearSearch" :disabled="!hasAnyQuery">Clear</ion-button> 49 + </div> 50 + 51 + <p v-if="hasTwisterApi" class="hint-copy"> 52 + Search results and follower counts come from the project index when available. 53 + </p> 54 + <p v-else class="hint-copy"> 55 + Set <code>VITE_TWISTER_API_BASE_URL</code> to enable global search and index-backed graph summaries. 
56 + </p> 57 + </section> 58 + 59 + <section class="results-section"> 60 + <EmptyState 61 + v-if="!hasTwisterApi" 62 + :icon="searchOutline" 63 + title="Index API not configured" 64 + message="Explore can search globally once the Twister API base URL is configured for this app." /> 65 + 66 + <EmptyState 67 + v-else-if="!hasAttemptedSearch" 68 + :icon="searchOutline" 69 + title="Search repos and people" 70 + message="Run a query against the project index, then open any result to continue browsing with Tangled's public APIs." /> 71 + 72 + <template v-else-if="isLoading"> 73 + <SkeletonLoader v-for="n in 3" :key="`repo-${n}`" variant="card" /> 74 + <SkeletonLoader v-for="n in 2" :key="`user-${n}`" variant="list-item" /> 75 + </template> 76 + 77 + <EmptyState 78 + v-else-if="isError" 79 + :icon="alertCircleOutline" 80 + title="Search failed" 81 + :message="errorMessage" 82 + action-label="Try Again" 83 + @action="runSearch" /> 84 + 85 + <template v-else-if="hasResults"> 86 + <div class="results-header"> 87 + <div> 88 + <p class="results-label">Indexed results</p> 89 + <h2 class="results-title">{{ submittedQuery }}</h2> 90 + </div> 91 + <p class="results-meta">{{ totalLabel }}</p> 92 + </div> 93 + 94 + <template v-if="repos.length"> 95 + <h3 class="section-label">Repos</h3> 96 + <RepoCard 97 + v-for="repo in repos" 98 + :key="repo.atUri" 99 + :repo="repo" 100 + @click="navigateToRepo(repo)" 101 + @owner-click="navigateToUser(repo.ownerHandle)" /> 102 + </template> 103 + 104 + <template v-if="profiles.length"> 105 + <h3 class="section-label">People</h3> 106 + <UserCard 107 + v-for="user in profiles" 108 + :key="user.did || user.handle" 109 + :user="user" 110 + @click="navigateToUser(user.handle)" /> 111 + </template> 112 + </template> 113 + 114 + <EmptyState 115 + v-else 116 + :icon="searchOutline" 117 + title="No indexed matches" 118 + message="Try a different query, or use Home to jump directly to a known handle." 
/> 119 + </section> 20 120 </ion-content> 21 121 </ion-page> 22 122 </template> 23 123 24 124 <script setup lang="ts"> 25 - import { IonPage, IonHeader, IonToolbar, IonTitle, IonContent } from "@ionic/vue"; 26 - import { searchOutline } from "ionicons/icons"; 125 + import { computed, ref } from "vue"; 126 + import { useRouter } from "vue-router"; 127 + import { 128 + IonPage, 129 + IonHeader, 130 + IonToolbar, 131 + IonTitle, 132 + IonContent, 133 + IonInput, 134 + IonButton, 135 + IonSegment, 136 + IonSegmentButton, 137 + } from "@ionic/vue"; 138 + import { alertCircleOutline, searchOutline } from "ionicons/icons"; 27 139 import EmptyState from "@/components/common/EmptyState.vue"; 140 + import RepoCard from "@/components/common/RepoCard.vue"; 141 + import SkeletonLoader from "@/components/common/SkeletonLoader.vue"; 142 + import UserCard from "@/components/common/UserCard.vue"; 143 + import { hasTwisterApi } from "@/core/config/project.js"; 144 + import type { RepoSummary } from "@/domain/models/repo.js"; 145 + import { useProjectSearch } from "@/services/project-api/queries.js"; 146 + 147 + const router = useRouter(); 148 + 149 + const draftQuery = ref(""); 150 + const submittedQuery = ref(""); 151 + const hasAttemptedSearch = ref(false); 152 + const resultType = ref<"all" | "repo" | "profile">("all"); 153 + 154 + const hasAnyQuery = computed(() => draftQuery.value.trim().length > 0 || submittedQuery.value.length > 0); 155 + const canSearch = computed(() => hasTwisterApi && draftQuery.value.trim().length > 0); 156 + 157 + const searchQuery = useProjectSearch(submittedQuery, { 158 + type: resultType, 159 + enabled: computed(() => hasTwisterApi && hasAttemptedSearch.value && submittedQuery.value.length > 0), 160 + }); 161 + 162 + const repos = computed(() => searchQuery.data.value?.repos ?? []); 163 + const profiles = computed(() => searchQuery.data.value?.profiles ?? 
[]); 164 + const hasResults = computed(() => repos.value.length > 0 || profiles.value.length > 0); 165 + const isLoading = computed(() => searchQuery.isPending.value); 166 + const isError = computed(() => searchQuery.isError.value); 167 + const errorMessage = computed(() => { 168 + const err = searchQuery.error.value; 169 + return err instanceof Error ? err.message : "An unexpected error occurred while searching the project index."; 170 + }); 171 + const totalLabel = computed(() => { 172 + const total = searchQuery.data.value?.total ?? repos.value.length + profiles.value.length; 173 + return `${total} indexed result${total === 1 ? "" : "s"}`; 174 + }); 175 + 176 + function runSearch() { 177 + if (!canSearch.value) return; 178 + submittedQuery.value = draftQuery.value.trim(); 179 + hasAttemptedSearch.value = true; 180 + } 181 + 182 + function clearSearch() { 183 + draftQuery.value = ""; 184 + submittedQuery.value = ""; 185 + hasAttemptedSearch.value = false; 186 + } 187 + 188 + function navigateToRepo(repo: RepoSummary) { 189 + router.push(`/tabs/explore/repo/${repo.ownerHandle}/${repo.name}`); 190 + } 191 + 192 + function navigateToUser(handle: string) { 193 + router.push(`/tabs/explore/user/${handle}`); 194 + } 28 195 </script> 196 + 197 + <style scoped> 198 + .hero { 199 + padding: 24px 20px 12px; 200 + } 201 + 202 + .eyebrow { 203 + margin: 0 0 10px; 204 + font-size: 12px; 205 + font-weight: 700; 206 + letter-spacing: 0.08em; 207 + text-transform: uppercase; 208 + color: var(--t-accent); 209 + } 210 + 211 + .hero-title { 212 + margin: 0; 213 + font-size: 28px; 214 + line-height: 1.15; 215 + color: var(--t-text-primary); 216 + } 217 + 218 + .hero-copy { 219 + margin: 12px 0 0; 220 + font-size: 14px; 221 + line-height: 1.6; 222 + color: var(--t-text-secondary); 223 + max-width: 34rem; 224 + } 225 + 226 + .search-card { 227 + margin: 0 16px; 228 + padding: 18px 16px 16px; 229 + border: 1px solid var(--t-border); 230 + border-radius: var(--t-radius-lg); 231 + 
background: linear-gradient(180deg, var(--t-surface-raised), var(--t-surface)); 232 + } 233 + 234 + .field-label { 235 + display: block; 236 + margin-bottom: 8px; 237 + font-size: 13px; 238 + font-weight: 600; 239 + color: var(--t-text-primary); 240 + } 241 + 242 + .search-input { 243 + --background: rgba(255, 255, 255, 0.04); 244 + --border-radius: var(--t-radius-md); 245 + --color: var(--t-text-primary); 246 + --padding-start: 14px; 247 + --padding-end: 14px; 248 + margin-bottom: 12px; 249 + border: 1px solid var(--t-border); 250 + border-radius: var(--t-radius-md); 251 + font-family: var(--t-mono); 252 + } 253 + 254 + .search-segment { 255 + margin-bottom: 12px; 256 + } 257 + 258 + .action-row { 259 + display: grid; 260 + grid-template-columns: repeat(2, minmax(0, 1fr)); 261 + gap: 10px; 262 + } 263 + 264 + .primary-action { 265 + --background: var(--t-accent); 266 + --background-activated: var(--t-accent); 267 + --color: #0d1117; 268 + } 269 + 270 + .hint-copy { 271 + margin: 12px 0 0; 272 + font-size: 12px; 273 + line-height: 1.5; 274 + color: var(--t-text-muted); 275 + } 276 + 277 + .results-section { 278 + padding: 18px 0 24px; 279 + } 280 + 281 + .results-header { 282 + display: flex; 283 + align-items: flex-end; 284 + justify-content: space-between; 285 + gap: 12px; 286 + margin: 0 16px 12px; 287 + } 288 + 289 + .results-label { 290 + margin: 0 0 4px; 291 + font-size: 12px; 292 + font-weight: 700; 293 + letter-spacing: 0.08em; 294 + text-transform: uppercase; 295 + color: var(--t-accent); 296 + } 297 + 298 + .results-title { 299 + margin: 0; 300 + font-size: 20px; 301 + line-height: 1.2; 302 + color: var(--t-text-primary); 303 + } 304 + 305 + .results-meta { 306 + margin: 0; 307 + font-size: 12px; 308 + color: var(--t-text-muted); 309 + } 310 + 311 + .section-label { 312 + margin: 0 16px 8px; 313 + font-size: 12px; 314 + font-weight: 700; 315 + letter-spacing: 0.08em; 316 + text-transform: uppercase; 317 + color: var(--t-text-muted); 318 + } 319 + </style>
+3 -1
apps/twisted/src/features/home/HomePage.vue
          </ion-button>
        </div>

-       <p class="hint-copy">Repo browsing is temporary here until search ships in a separate project.</p>
+       <p class="hint-copy">
+         Home is still the fastest way to jump to a known handle directly.
+       </p>
      </section>

      <section v-if="hasAttemptedBrowse" class="results-section">
+23 -7
apps/twisted/src/features/profile/UserProfilePage.vue
··· 183 183 useUserPullRequests, 184 184 useUserFollowing, 185 185 } from "@/services/tangled/queries.js"; 186 + import { useIndexedProfileSummary } from "@/services/project-api/queries.js"; 186 187 import type { IssueSummary } from "@/domain/models/issue.js"; 187 188 import type { PullRequestSummary } from "@/domain/models/pull-request.js"; 188 189 import type { RepoSummary } from "@/domain/models/repo.js"; ··· 208 209 const issuesQuery = useUserIssues(pds, did, handle, { enabled: hasIdentity }); 209 210 const pullRequestsQuery = useUserPullRequests(pds, did, handle, { enabled: hasIdentity }); 210 211 const followingQuery = useUserFollowing(pds, did, { enabled: hasIdentity }); 212 + const indexedProfileSummaryQuery = useIndexedProfileSummary(did, { enabled: hasIdentity }); 211 213 212 214 const profile = computed(() => profileQuery.data.value); 213 215 const repos = computed(() => reposQuery.data.value ?? []); ··· 215 217 const issues = computed(() => issuesQuery.data.value ?? []); 216 218 const pullRequests = computed(() => pullRequestsQuery.data.value ?? []); 217 219 const following = computed(() => followingQuery.data.value ?? []); 220 + const indexedProfileSummary = computed(() => indexedProfileSummaryQuery.data.value); 218 221 219 222 const pinnedUris = computed(() => (profile.value as { pinnedRepos?: string[] } | undefined)?.pinnedRepos ?? 
[]); 220 223 const pinnedRepos = computed(() => repos.value.filter((r) => pinnedUris.value.includes(r.atUri))); 221 224 const otherRepos = computed(() => repos.value.filter((repo) => !pinnedUris.value.includes(repo.atUri))); 222 - const stats = computed(() => [ 223 - { label: "repos", value: repos.value.length }, 224 - { label: "strings", value: strings.value.length }, 225 - { label: "issues", value: issues.value.length }, 226 - { label: "prs", value: pullRequests.value.length }, 227 - { label: "following", value: following.value.length }, 228 - ]); 225 + const stats = computed(() => { 226 + const values = [ 227 + { label: "repos", value: repos.value.length }, 228 + { label: "strings", value: strings.value.length }, 229 + { label: "issues", value: issues.value.length }, 230 + { label: "prs", value: pullRequests.value.length }, 231 + ]; 232 + 233 + if (indexedProfileSummary.value?.followerCount != null) { 234 + values.splice(1, 0, { label: "followers", value: indexedProfileSummary.value.followerCount }); 235 + } 236 + 237 + values.splice( 238 + indexedProfileSummary.value?.followerCount != null ? 2 : 1, 239 + 0, 240 + { label: "following", value: indexedProfileSummary.value?.followingCount ?? following.value.length }, 241 + ); 242 + 243 + return values; 244 + }); 229 245 230 246 const isLoading = computed(() => identity.isPending.value || profileQuery.isPending.value); 231 247 const isError = computed(() => identity.isError.value || profileQuery.isError.value);
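The two `splice` calls in the new `stats` computed place `followers` (when the index provides it) and `following` directly after `repos`. A possible way to make that ordering easier to read and test is to build the graph entries separately, as in the sketch below. `buildStats`, `LocalCounts`, and `IndexedSummary` are hypothetical names introduced for illustration; the fallback to the locally derived following count mirrors the diff.

```typescript
type Stat = { label: string; value: number };

// Hypothetical input shapes: counts derived from PDS queries, plus the
// optional index-backed summary.
type LocalCounts = { repos: number; strings: number; issues: number; prs: number; following: number };
type IndexedSummary = { followerCount?: number; followingCount?: number };

function buildStats(counts: LocalCounts, summary?: IndexedSummary): Stat[] {
  const graph: Stat[] = [];

  // Followers are only shown when the index supplies a count.
  if (summary?.followerCount != null) {
    graph.push({ label: "followers", value: summary.followerCount });
  }

  // Following prefers the indexed count and falls back to the local list length.
  graph.push({ label: "following", value: summary?.followingCount ?? counts.following });

  return [
    { label: "repos", value: counts.repos },
    ...graph,
    { label: "strings", value: counts.strings },
    { label: "issues", value: counts.issues },
    { label: "prs", value: counts.prs },
  ];
}

const counts = { repos: 4, strings: 1, issues: 2, prs: 3, following: 7 };
console.log(buildStats(counts).map((s) => s.label).join(","));
// repos,following,strings,issues,prs
console.log(buildStats(counts, { followerCount: 12, followingCount: 9 }).map((s) => s.label).join(","));
// repos,followers,following,strings,issues,prs
```

Inside the component this could be wrapped in `computed(() => buildStats(...))`, which keeps the ordering rule in one pure function.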
+31
apps/twisted/src/services/project-api/client.ts
···
+ import { getTwisterApiUrl, hasTwisterApi } from "@/core/config/project.js";
+
+ type ErrorPayload = { error?: string; message?: string };
+
+ export async function fetchProjectApiJson<T>(path: string, init?: RequestInit): Promise<T> {
+   if (!hasTwisterApi) {
+     throw new Error("Twister API base URL is not configured.");
+   }
+
+   const response = await fetch(getTwisterApiUrl(path), {
+     ...init,
+     headers: { Accept: "application/json", ...(init?.headers ?? {}) },
+   });
+
+   if (!response.ok) {
+     const fallbackMessage = `Project API request failed with status ${response.status}.`;
+
+     try {
+       const payload = (await response.json()) as ErrorPayload;
+       throw new Error(payload.message ?? payload.error ?? fallbackMessage);
+     } catch (error) {
+       if (error instanceof Error && error.message !== "Unexpected end of JSON input") {
+         throw error;
+       }
+
+       throw new Error(fallbackMessage, { cause: error });
+     }
+   }
+
+   return (await response.json()) as T;
+ }
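The `catch` branch in this hunk decides whether to rethrow by comparing `error.message` against one specific parse-error string. A non-JSON error body, for example an HTML page from a proxy, raises a parse error with a different message, which would be rethrown as-is instead of being replaced by the fallback message. One possible alternative is to separate parsing from message selection, sketched below as a pure helper. `extractErrorMessage` is a hypothetical name, and the sketch assumes the caller reads the body with `response.text()`; unlike the original, it also treats empty strings as missing.

```typescript
type ErrorPayload = { error?: string; message?: string };

// Picks a human-readable message from an error response body.
// Any body that is not a JSON object falls back to a status-based message.
function extractErrorMessage(status: number, bodyText: string): string {
  const fallback = `Project API request failed with status ${status}.`;

  let payload: unknown;
  try {
    payload = JSON.parse(bodyText);
  } catch {
    // Empty body, HTML, or truncated JSON: nothing usable to report.
    return fallback;
  }

  if (typeof payload !== "object" || payload === null) return fallback;

  const { message, error } = payload as ErrorPayload;
  if (typeof message === "string" && message.length > 0) return message;
  if (typeof error === "string" && error.length > 0) return error;
  return fallback;
}
```

With a helper like this, the failure branch could reduce to `throw new Error(extractErrorMessage(response.status, await response.text()))`, with no string comparison on parser messages.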
+198
apps/twisted/src/services/project-api/queries.ts
···
+ import { useQuery } from "@tanstack/vue-query";
+ import { computed, toValue } from "vue";
+ import type { MaybeRef } from "vue";
+ import { hasTwisterApi } from "@/core/config/project.js";
+ import type { RepoSummary } from "@/domain/models/repo.js";
+ import type { UserSummary } from "@/domain/models/user.js";
+ import { fetchProjectApiJson } from "./client.js";
+
+ const MIN = 60_000;
+
+ export type ProjectSearchMode = "keyword" | "semantic" | "hybrid";
+ export type ProjectSearchType = "all" | "repo" | "profile";
+
+ type ProjectSearchResult = {
+   id: string;
+   did?: string;
+   at_uri?: string;
+   collection: string;
+   record_type: string;
+   title: string;
+   body_snippet?: string;
+   summary?: string;
+   repo_name?: string;
+   author_handle?: string;
+   score?: number;
+   matched_by?: string[];
+   created_at?: string;
+   updated_at?: string;
+   primary_language?: string;
+   stars?: number;
+   follower_count?: number;
+   following_count?: number;
+ };
+
+ type ProjectSearchResponse = {
+   query: string;
+   mode: ProjectSearchMode;
+   total: number;
+   limit: number;
+   offset: number;
+   results: ProjectSearchResult[];
+ };
+
+ type IndexedProfileSummaryResponse = {
+   did: string;
+   handle?: string;
+   follower_count?: number;
+   following_count?: number;
+   indexed_at?: string;
+ };
+
+ export type IndexedProfileSummary = {
+   did: string;
+   handle?: string;
+   followerCount?: number;
+   followingCount?: number;
+   indexedAt?: string;
+ };
+
+ export type ProjectSearchResults = {
+   query: string;
+   mode: ProjectSearchMode;
+   total: number;
+   repos: RepoSummary[];
+   profiles: UserSummary[];
+ };
+
+ function stripHighlight(html?: string): string | undefined {
+   if (!html) return undefined;
+
+   const text = html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
+   return text || undefined;
+ }
+
+ function parseAtUriRkey(atUri?: string, fallback = ""): string {
+   if (!atUri) return fallback;
+
+   const segments = atUri.split("/");
+   return segments[segments.length - 1] || fallback;
+ }
+
+ function toRepoSummary(result: ProjectSearchResult): RepoSummary {
+   const atUri = result.at_uri ?? result.id;
+   const repoName = result.repo_name ?? result.title;
+
+   return {
+     atUri,
+     rkey: parseAtUriRkey(result.at_uri, result.id),
+     ownerDid: result.did ?? "",
+     ownerHandle: result.author_handle ?? "unknown",
+     name: repoName,
+     description: result.summary ?? stripHighlight(result.body_snippet),
+     primaryLanguage: result.primary_language,
+     stars: result.stars,
+     updatedAt: result.updated_at ?? result.created_at,
+     knot: "",
+   };
+ }
+
+ function toUserSummary(result: ProjectSearchResult): UserSummary {
+   const handle = result.author_handle ?? result.title;
+   const displayName = result.title !== handle ? result.title : undefined;
+
+   return {
+     did: result.did ?? "",
+     handle,
+     displayName,
+     bio: result.summary ?? stripHighlight(result.body_snippet),
+     followerCount: result.follower_count,
+     followingCount: result.following_count,
+   };
+ }
+
+ function normalizeProfileSummary(summary: IndexedProfileSummaryResponse): IndexedProfileSummary {
+   return {
+     did: summary.did,
+     handle: summary.handle,
+     followerCount: summary.follower_count,
+     followingCount: summary.following_count,
+     indexedAt: summary.indexed_at,
+   };
+ }
+
+ export function useProjectSearch(
+   query: MaybeRef<string>,
+   options: {
+     type?: MaybeRef<ProjectSearchType>;
+     mode?: MaybeRef<ProjectSearchMode>;
+     limit?: number;
+     enabled?: MaybeRef<boolean>;
+   } = {},
+ ) {
+   const normalizedQuery = computed(() => toValue(query).trim());
+   const normalizedType = computed(() => toValue(options.type) ?? "all");
+   const normalizedMode = computed(() => toValue(options.mode) ?? "keyword");
+   const enabled = computed(
+     () =>
+       hasTwisterApi &&
+       normalizedQuery.value.length > 0 &&
+       (options.enabled === undefined || !!toValue(options.enabled)),
+   );
+
+   return useQuery({
+     queryKey: computed(() => ["projectSearch", normalizedQuery.value, normalizedType.value, normalizedMode.value]),
+     queryFn: async (): Promise<ProjectSearchResults> => {
+       const params = new URLSearchParams({
+         q: normalizedQuery.value,
+         mode: normalizedMode.value,
+         limit: String(options.limit ?? 20),
+       });
+
+       if (normalizedType.value !== "all") {
+         params.set("type", normalizedType.value);
+       }
+
+       const response = await fetchProjectApiJson<ProjectSearchResponse>(`/search?${params.toString()}`);
+       const repos = response.results.filter((result) => result.record_type === "repo").map(toRepoSummary);
+       const profiles = response.results.filter((result) => result.record_type === "profile").map(toUserSummary);
+
+       return {
+         query: response.query,
+         mode: response.mode,
+         total: response.total,
+         repos,
+         profiles,
+       };
+     },
+     enabled,
+     staleTime: 2 * MIN,
+     gcTime: 10 * MIN,
+   });
+ }
+
+ export function useIndexedProfileSummary(
+   did: MaybeRef<string>,
+   options: { enabled?: MaybeRef<boolean> } = {},
+ ) {
+   const normalizedDid = computed(() => toValue(did).trim());
+   const enabled = computed(
+     () =>
+       hasTwisterApi &&
+       normalizedDid.value.length > 0 &&
+       (options.enabled === undefined || !!toValue(options.enabled)),
+   );
+
+   return useQuery({
+     queryKey: computed(() => ["indexedProfileSummary", normalizedDid.value]),
+     queryFn: async () => {
+       const response = await fetchProjectApiJson<IndexedProfileSummaryResponse>(
+         `/profiles/${encodeURIComponent(normalizedDid.value)}/summary`,
+       );
+       return normalizeProfileSummary(response);
+     },
+     enabled,
+     staleTime: 10 * MIN,
+     gcTime: 30 * MIN,
+   });
+ }
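The two string helpers in this file are easiest to judge against concrete inputs. The sketch below copies `stripHighlight` and `parseAtUriRkey` from the hunk unchanged and runs them on the sample values used in the search spec; the `sample*` constants are illustrative names only.

```typescript
// Copied from queries.ts: removes highlight markup and collapses whitespace.
function stripHighlight(html?: string): string | undefined {
  if (!html) return undefined;

  const text = html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
  return text || undefined;
}

// Copied from queries.ts: returns the last path segment of an AT URI.
function parseAtUriRkey(atUri?: string, fallback = ""): string {
  if (!atUri) return fallback;

  const segments = atUri.split("/");
  return segments[segments.length - 1] || fallback;
}

const sampleSnippet = "A TUI markdown viewer inspired by <mark>Glow</mark>...";
const sampleAtUri = "at://did:plc:abc/sh.tangled.repo/3kb3fge5lm32x";
const sampleId = "did:plc:abc|sh.tangled.repo|3kb3fge5lm32x";

// Tags become spaces, so a space appears before the trailing punctuation.
const plainSnippet = stripHighlight(sampleSnippet); // "A TUI markdown viewer inspired by Glow ..."

// With an AT URI present, the record key is the final segment.
const rkeyFromUri = parseAtUriRkey(sampleAtUri, sampleId); // "3kb3fge5lm32x"

// Without one, toRepoSummary's fallback makes the rkey the whole document id.
const rkeyFromFallback = parseAtUriRkey(undefined, sampleId); // "did:plc:abc|sh.tangled.repo|3kb3fge5lm32x"
```

Two behaviors stand out: replacing tags with a space leaves `Glow ...` rather than `Glow...`, and a result without `at_uri` yields an `rkey` that is not a record key at all. Whether either matters depends on how the consuming screens use these fields.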
+8
apps/twisted/src/vite-env.d.ts
···
  /// <reference types="@atcute/bluesky" />
  /// <reference types="@atcute/tangled" />

+ interface ImportMetaEnv {
+   readonly VITE_TWISTER_API_BASE_URL?: string;
+ }
+
+ interface ImportMeta {
+   readonly env: ImportMetaEnv;
+ }
+
  declare module "markdown-it" {
    type MarkdownIt = {
      render(content: string): string;
+13
docs/README.md
···
+ # Twisted Documentation
+
+ Documentation is organized by project:
+
+ - [`app/`](app/) for the Ionic/Vue client
+ - [`api/`](api/) for the Go Tap/index/search service
+
+ ## Quick Links
+
+ - App spec index: [`app/specs/README.md`](app/specs/README.md)
+ - App task index: [`app/tasks/phase-6.md`](app/tasks/phase-6.md)
+ - API spec index: [`api/specs/README.md`](api/specs/README.md)
+ - API task index: [`api/tasks/README.md`](api/tasks/README.md)
+300
docs/api/specs/05-search.md
···
+ ---
+ title: "Spec 05 — Search"
+ updated: 2026-03-22
+ ---
+
+ Covers all search modes, the public search API contract, scoring, and filtering.
+
+ ## 1. Search Modes
+
+ | Mode | Backing | Available |
+ | ---------- | ------------------------------------ | --------- |
+ | `keyword` | Turso Tantivy-backed FTS | MVP |
+ | `semantic` | Vector similarity (DiskANN index) | Phase 2 |
+ | `hybrid` | Weighted merge of keyword + semantic | Phase 3 |
+
+ ## 2. Keyword Search
+
+ ### Implementation
+
+ Uses Turso's `fts_score()` function for BM25 ranking:
+
+ ```sql
+ SELECT
+   d.id, d.title, d.summary, d.repo_name, d.author_handle,
+   d.collection, d.record_type, d.updated_at,
+   fts_score(d.title, d.body, d.summary, d.repo_name, d.author_handle, d.tags_json, ?) AS score
+ FROM documents d
+ WHERE fts_match(d.title, d.body, d.summary, d.repo_name, d.author_handle, d.tags_json, ?)
+   AND d.deleted_at IS NULL
+ ORDER BY score DESC
+ LIMIT ? OFFSET ?;
+ ```
+
+ ### Field Weights
+
+ Configured in the FTS index definition:
+
+ | Field | Weight | Rationale |
+ | --------------- | ------ | ------------------------------------ |
+ | `title` | 3.0 | Highest signal for relevance |
+ | `repo_name` | 2.5 | Exact repo lookups should rank first |
+ | `author_handle` | 2.0 | Author search is common |
+ | `summary` | 1.5 | More focused than body |
+ | `tags_json` | 1.2 | Topic matching |
+ | `body` | 1.0 | Baseline |
+
+ ### Query Features
+
+ Tantivy query syntax is exposed to users:
+
+ - Boolean: `go AND search`, `rust NOT unsafe`
+ - Phrase: `"pull request"`
+ - Prefix: `tang*`
+ - Field-specific: `title:parser`
+
+ ### Snippets
+
+ Use `fts_highlight()` to generate highlighted snippets:
+
+ ```sql
+ fts_highlight(d.body, '<mark>', '</mark>', ?) AS body_snippet
+ ```
+
+ ## 3. Semantic Search
+
+ ### Query Flow
+
+ 1. Convert user query text to embedding via the configured provider
+ 2. Query `vector_top_k` for nearest neighbors
+ 3. Join back to `documents` to get metadata
+ 4. Filter out deleted/hidden documents
+ 5. Return results with distance as score
+
+ ```sql
+ SELECT d.id, d.title, d.summary, d.repo_name, d.author_handle,
+        d.collection, d.record_type, d.updated_at
+ FROM vector_top_k('idx_embeddings_vec', vector32(?), ?) AS v
+ JOIN document_embeddings e ON e.rowid = v.id
+ JOIN documents d ON d.id = e.document_id
+ WHERE d.deleted_at IS NULL;
+ ```
+
+ ### Score Normalization
+
+ Cosine distance ranges from 0 (identical) to 2 (opposite). Normalize to a 0–1 relevance score:
+
+ ```text
+ semantic_score = 1.0 - (distance / 2.0)
+ ```
+
+ ## 4. Hybrid Search
+
+ ### v1: Weighted Score Blending
+
+ ```text
+ hybrid_score = 0.65 * keyword_score_normalized + 0.35 * semantic_score_normalized
+ ```
+
+ ### Score Normalization for Blending
+
+ Keyword (BM25) scores are unbounded. Normalize using min-max within the result set:
+
+ ```text
+ keyword_normalized = (score - min_score) / (max_score - min_score)
+ ```
+
+ Semantic scores are already bounded after the distance-to-relevance conversion.
+
+ ### Merge Strategy
+
+ 1. Fetch top N keyword results (e.g., N=50)
+ 2. Fetch top N semantic results
+ 3. Merge on `document_id`
+ 4. For documents appearing in both sets, combine scores
+ 5. For documents in only one set, use that score (with 0 for the missing signal)
+ 6. Sort by `hybrid_score` descending
+ 7. Deduplicate
+ 8. Apply limit/offset
+
+ ### v2: Reciprocal Rank Fusion (future)
+
+ If keyword and semantic score scales prove unstable under weighted blending, replace with RRF:
+
+ ```text
+ rrf_score = Σ 1 / (k + rank_i)
+ ```
+
+ where `k` is a constant (typically 60) and `rank_i` is the document's rank in each result list.
+
+ ## 5. Filtering
+
+ All search modes support these filters, applied as SQL WHERE clauses:
+
+ | Filter | Parameter | SQL |
+ | ----------- | ------------ | ------------------------------------------- |
+ | Collection | `collection` | `d.collection = ?` |
+ | Author | `author` | `d.author_handle = ?` or `d.did = ?` |
+ | Repo | `repo` | `d.repo_name = ?` or `d.repo_did = ?` |
+ | Record type | `type` | `d.record_type = ?` |
+ | Language | `language` | `d.language = ?` |
+ | Date range | `from`, `to` | `d.created_at >= ?` and `d.created_at <= ?` |
+ | State | `state` | Join to `record_state` table |
+
+ ## 6. Embedding Eligibility
+
+ A document is eligible for embedding if:
+
+ - `deleted_at IS NULL`
+ - `record_type` is one of: `repo`, `issue`, `pull`, `string`, `profile`
+ - At least one of `title`, `body`, or `summary` is non-empty
+ - Total text length exceeds a minimum threshold (e.g., 20 characters)
+
+ ## 7. API Endpoints
+
+ ### Health
+
+ | Method | Path | Description |
+ | ------ | ---------- | -------------------------------- |
+ | GET | `/healthz` | Liveness — process is responsive |
+ | GET | `/readyz` | Readiness — DB is reachable |
+
+ ### Search
+
+ | Method | Path | Description |
+ | ------ | ------------------ | ------------------------------------------------ |
+ | GET | `/search` | Search with configurable mode (default: keyword) |
+ | GET | `/search/keyword` | Keyword-only search |
+ | GET | `/search/semantic` | Semantic-only search |
+ | GET | `/search/hybrid` | Hybrid search |
+
+ ### Documents
+
+ | Method | Path | Description |
+ | ------ | ----------------- | ----------------------------- |
+ | GET | `/documents/{id}` | Fetch a single document by ID |
+
+ ### Admin
+
+ | Method | Path | Description |
+ | ------ | ---------------- | -------------------- |
+ | POST | `/admin/reindex` | Trigger reindex |
+ | POST | `/admin/reembed` | Trigger re-embedding |
+
+ Admin endpoints are disabled by default. Enable with `ENABLE_ADMIN_ENDPOINTS=true`.
+
+ ## 8. Query Parameters
+
+ | Parameter | Type | Default | Description |
+ | ------------ | ------ | --------- | -------------------------------------------------------------------- |
+ | `q` | string | required | Search query |
+ | `mode` | string | `keyword` | `keyword`, `semantic`, or `hybrid` |
+ | `limit` | int | 20 | Results per page (max: `SEARCH_MAX_LIMIT`) |
+ | `offset` | int | 0 | Pagination offset |
+ | `collection` | string | — | Filter by `sh.tangled.*` collection |
+ | `type` | string | — | Filter by record type (`repo`, `issue`, `pull`, `string`, `profile`) |
+ | `author` | string | — | Filter by author handle or DID |
+ | `repo` | string | — | Filter by repo name or repo DID |
+ | `language` | string | — | Filter by language |
+ | `from` | string | — | Created after (ISO 8601) |
+ | `to` | string | — | Created before (ISO 8601) |
+ | `state` | string | — | Filter by state (`open`, `closed`, `merged`) |
+
+ ## 9. Search Response
+
+ ```json
+ {
+   "query": "rust markdown tui",
+   "mode": "hybrid",
+   "total": 142,
+   "limit": 20,
+   "offset": 0,
+   "results": [
+     {
+       "id": "did:plc:abc|sh.tangled.repo|3kb3fge5lm32x",
+       "collection": "sh.tangled.repo",
+       "record_type": "repo",
+       "title": "glow-rs",
+       "body_snippet": "A TUI markdown viewer inspired by <mark>Glow</mark>...",
+       "summary": "Rust TUI markdown viewer",
+       "repo_name": "glow-rs",
+       "author_handle": "desertthunder.dev",
+       "score": 0.842,
+       "matched_by": ["keyword", "semantic"],
+       "created_at": "2026-03-20T10:00:00Z",
+       "updated_at": "2026-03-22T15:03:11Z"
+     }
+   ]
+ }
+ ```
+
+ ### Result Fields
+
+ | Field | Type | Description |
+ | ------------------ | -------- | ------------------------------------------- |
+ | `id` | string | Document stable ID |
+ | `collection` | string | ATProto collection NSID |
+ | `record_type` | string | Normalized type label |
+ | `title` | string | Document title |
+ | `body_snippet` | string | Highlighted body excerpt |
+ | `summary` | string | Short description |
+ | `repo_name` | string | Repository name (if applicable) |
+ | `author_handle` | string | Author handle |
+ | `did` | string | Author DID when available |
+ | `at_uri` | string | Canonical AT URI when available |
+ | `primary_language` | string | Primary language for repo results |
+ | `stars` | number | Indexed star count for repo results |
+ | `follower_count` | number | Indexed follower count for profile results |
+ | `following_count` | number | Indexed following count for profile results |
+ | `score` | float | Relevance score (0–1) |
+ | `matched_by` | string[] | Which search modes produced this result |
+ | `created_at` | string | ISO 8601 creation timestamp |
+ | `updated_at` | string | ISO 8601 last update timestamp |
+
+ ## 10. Document Response
+
+ `GET /documents/{id}` returns the full document:
+
+ ```json
+ {
+   "id": "did:plc:abc|sh.tangled.repo|3kb3fge5lm32x",
+   "did": "did:plc:abc",
+   "collection": "sh.tangled.repo",
+   "rkey": "3kb3fge5lm32x",
+   "at_uri": "at://did:plc:abc/sh.tangled.repo/3kb3fge5lm32x",
+   "cid": "bafyreig...",
+   "record_type": "repo",
+   "title": "glow-rs",
+   "body": "A TUI markdown viewer inspired by Glow, written in Rust.",
+   "summary": "Rust TUI markdown viewer",
+   "repo_name": "glow-rs",
+   "author_handle": "desertthunder.dev",
+   "tags_json": "[\"rust\", \"tui\", \"markdown\"]",
+   "language": "en",
+   "created_at": "2026-03-20T10:00:00Z",
+   "updated_at": "2026-03-22T15:03:11Z",
+   "indexed_at": "2026-03-22T15:05:00Z",
+   "has_embedding": true
+ }
+ ```
+
+ ## 11. Error Responses
+
+ | Status | Condition |
+ | ------ | ------------------------------------------------------------------ |
+ | 400 | Missing `q` parameter, invalid `limit`/`offset`, malformed filters |
+ | 404 | Document not found |
+ | 503 | DB unreachable (readiness failure) |
+
+ ```json
+ { "error": "invalid_parameter", "message": "limit must be between 1 and 100" }
+ ```
+
+ ## 12. API Behavior
+
+ - `keyword` returns only lexical matches via `fts_match`/`fts_score`
+ - `semantic` returns only embedding-backed matches via `vector_top_k`
+ - `hybrid` merges both result sets and reranks
+ - All modes exclude documents with `deleted_at IS NOT NULL` by default
+ - Pagination uses `limit`/`offset` (cursor-based pagination deferred)
+ - Mobile clients may use `type=repo` and `type=profile` to render repo/profile search directly
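The hybrid merge in section 4 of this spec combines three formulas: min-max normalization of keyword scores, the distance-to-relevance conversion, and the 0.65/0.35 blend. The sketch below works through them on in-memory result lists. The service itself is written in Go, so this TypeScript version is illustrative only; `blendHybrid` and the hit types are hypothetical names, and the handling of a keyword set whose scores are all equal is an assumption, since the spec does not say what to do when `max_score` equals `min_score`.

```typescript
type SearchSignal = "keyword" | "semantic";
type KeywordHit = { documentId: string; score: number }; // raw, unbounded BM25 score
type SemanticHit = { documentId: string; distance: number }; // cosine distance in [0, 2]
type HybridHit = { documentId: string; score: number; matchedBy: SearchSignal[] };

const KEYWORD_WEIGHT = 0.65;
const SEMANTIC_WEIGHT = 0.35;

function blendHybrid(keyword: KeywordHit[], semantic: SemanticHit[]): HybridHit[] {
  const rawScores = keyword.map((hit) => hit.score);
  const min = Math.min(...rawScores);
  const range = Math.max(...rawScores) - min;

  // Keyed by document id, so merging also deduplicates.
  const merged = new Map<string, { keyword: number; semantic: number; matchedBy: SearchSignal[] }>();

  for (const hit of keyword) {
    // Assumption: if every keyword score is identical, treat each as 1.0.
    const normalized = range > 0 ? (hit.score - min) / range : 1;
    merged.set(hit.documentId, { keyword: normalized, semantic: 0, matchedBy: ["keyword"] });
  }

  for (const hit of semantic) {
    const relevance = 1 - hit.distance / 2; // semantic_score from section 3
    const entry = merged.get(hit.documentId);
    if (entry) {
      entry.semantic = relevance;
      if (!entry.matchedBy.includes("semantic")) entry.matchedBy.push("semantic");
    } else {
      // Missing keyword signal contributes 0, per merge step 5.
      merged.set(hit.documentId, { keyword: 0, semantic: relevance, matchedBy: ["semantic"] });
    }
  }

  return Array.from(merged.entries())
    .map(([documentId, entry]) => ({
      documentId,
      score: KEYWORD_WEIGHT * entry.keyword + SEMANTIC_WEIGHT * entry.semantic,
      matchedBy: entry.matchedBy,
    }))
    .sort((a, b) => b.score - a.score);
}
```

One consequence of min-max normalization shows up in the sketch: the lowest-ranked keyword hit normalizes to 0, the same contribution as a document that did not match the keyword query at all. That is one of the scale issues the RRF variant described in the spec would sidestep, since it uses ranks rather than scores.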
+89
docs/api/specs/08-app-integration.md
···
+ ---
+ title: "Spec 08 — App Integration"
+ updated: 2026-03-23
+ ---
+
+ ## 1. Purpose
+
+ Define the mobile-facing Twister API surface.
+
+ The Twisted app should keep using Tangled's public knot and PDS APIs for canonical repo/profile detail. Twister is responsible for:
+
+ - cross-network discovery via search
+ - index-backed summaries for data gaps such as followers
+
+ ## 2. Client Boundary
+
+ The mobile client uses Twister only for:
+
+ - Explore search
+ - index-backed profile summaries
+ - future feed and notification features
+
+ The mobile client does not use Twister for:
+
+ - repo tree/blob/detail reads
+ - direct profile record reads
+ - issue/PR detail reads
+
+ Those remain on Tangled's public APIs.
+
+ ## 3. Search Contract
+
+ `GET /search`
+
+ Required query parameters:
+
+ - `q`
+
+ Optional query parameters:
+
+ - `mode=keyword|semantic|hybrid`
+ - `type=repo|profile`
+ - `limit`
+ - `offset`
+
+ For mobile clients, repo and profile results should include:
+
+ - `did`
+ - `at_uri`
+ - `record_type`
+ - `title`
+ - `summary`
+ - `repo_name`
+ - `author_handle`
+ - `updated_at`
+ - `primary_language` for repos when known
+ - `stars` for repos when known
+ - `follower_count` and `following_count` for profiles when known
+
+ ## 4. Profile Summary Contract
+
+ `GET /profiles/{did}/summary`
+
+ Response:
+
+ ```json
+ {
+   "did": "did:plc:abc123",
+   "handle": "desertthunder.dev",
+   "follower_count": 128,
+   "following_count": 84,
+   "indexed_at": "2026-03-23T10:15:00Z"
+ }
+ ```
+
+ This endpoint exists because follower counts and follower lists are derived from indexed graph state, not from a single direct public Tangled API call.
+
+ ## 5. Failure Handling
+
+ If Twister is unavailable:
+
+ - the app should keep direct known-handle browsing working
+ - Explore should show a clear "index unavailable" state
+ - profile pages should omit index-backed follower counts rather than fail entirely
+
+ ## 6. Ownership
+
+ - Twister owns search ranking, document normalization, and graph summary derivation
+ - The app owns result presentation, route transitions, and fallback behavior
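The profile summary contract in section 4 is consumed by the client through a type cast, so a malformed response would only surface later as a wrong value on screen. A runtime guard is one way to turn that into the "omit rather than fail" behavior section 5 asks for. The sketch below is not part of the commit: `isProfileSummaryResponse` and `isOptional` are hypothetical helpers, and the guard assumes unknown fields are absent from the JSON rather than sent as `null`.

```typescript
// Mirrors IndexedProfileSummaryResponse from the client's queries.ts.
type IndexedProfileSummaryResponse = {
  did: string;
  handle?: string;
  follower_count?: number;
  following_count?: number;
  indexed_at?: string;
};

// True when the field is absent or has the expected primitive type.
function isOptional(value: unknown, type: "string" | "number"): boolean {
  return value === undefined || typeof value === type;
}

function isProfileSummaryResponse(value: unknown): value is IndexedProfileSummaryResponse {
  if (typeof value !== "object" || value === null) return false;

  const record = value as Record<string, unknown>;
  return (
    typeof record.did === "string" &&
    record.did.length > 0 &&
    isOptional(record.handle, "string") &&
    isOptional(record.follower_count, "number") &&
    isOptional(record.following_count, "number") &&
    isOptional(record.indexed_at, "string")
  );
}
```

A caller could treat a failed guard the same way as an unreachable index, leaving the follower stat off the profile page instead of rendering a value of the wrong type.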
+40
docs/api/tasks/README.md
···
+ ---
+ title: "Twister — Task Index"
+ updated: 2026-03-22
+ ---
+
+ # Twister Tasks
+
+ Assumes Go, Tap (deployed on Railway), Turso/libSQL, and Railway for deployment.
+
+ ## Delivery Strategy
+
+ Build in four phases:
+
+ 1. **MVP** — ingestion, graph backfill, keyword search, deployment, operational tooling
+ 2. **Semantic Search** — embeddings, vector retrieval
+ 3. **Hybrid Search** — weighted merge of keyword + semantic
+ 4. **Quality Polish** — ranking refinement, advanced filters, analytics
+
+ Ship keyword search before embeddings. That gives a testable, inspectable baseline before introducing model behavior.
+ Within MVP, run graph backfill before calling the environment search-ready for users.
+
+ ## Phases
+
+ | Phase | Title | Document | Status |
+ | ----- | --------------- | ------------------------------------------ | --------------------------------------------------------------------- |
+ | 1 | MVP | [phase-1-mvp.md](phase-1-mvp.md) | In progress (M0–M2 complete; backfill scheduled before public launch) |
+ | 2 | Semantic Search | [phase-2-semantic.md](phase-2-semantic.md) | Not started |
+ | 3 | Hybrid Search | [phase-3-hybrid.md](phase-3-hybrid.md) | Not started |
+ | 4 | Quality Polish | [phase-4-quality.md](phase-4-quality.md) | Not started |
+
+ ## MVP Complete When
+
+ - Tap ingests tracked `sh.tangled.*` records
+ - Documents normalize into a stable store
+ - Keyword search works publicly
+ - Index-backed profile summaries can fill public API gaps such as followers
+ - API and indexer are deployed on Railway
+ - Restart does not lose sync position
+ - Reindex exists for repair
+ - Graph backfill populates initial content from seed users
+68
docs/app/specs/phase-3.md
···
+ # Phase 3 — Indexed Search and Honest Discovery
+
+ ## Goal
+
+ Introduce global discovery through the Twister project index while preserving honest product boundaries. Home continues to support direct known-handle browsing, Explore becomes index-backed search, and Activity remains a clearly labeled in-progress surface.
+
+ ## Current Product Shape
+
+ ### Home
+
+ Home is the temporary public entry point for unauthenticated browsing:
+
+ - Enter a known AT Protocol handle
+ - Open that user's profile directly
+ - Resolve the handle to DID + PDS via AT Protocol identity
+ - List that user's public Tangled repos inline and open one directly
+
+ This keeps public browsing fully real while still giving the app a lightweight direct-entry path.
+
+ ### Explore
+
+ Explore becomes the network-level discovery surface:
+
+ - Global repo search via the Twister index
+ - Global profile search via the Twister index
+ - Empty state should clearly distinguish "index unavailable" from "no results"
+ - Search results route into the existing profile and repo detail screens
+
+ ### Activity
+
+ Activity also remains a tab-level placeholder:
+
+ - No public timeline yet
+ - No curated public feed fallback
+ - Empty state should explicitly say activity is in progress
+
+ ## Identity and Routing
+
+ The app now uses two read paths:
+
+ 1. **Direct handle browsing**
+    Resolve `handle -> DID` via `com.atproto.identity.resolveHandle`
+    Fetch the DID document and extract the PDS endpoint
+    Query the user's PDS for `sh.tangled.repo` records via `com.atproto.repo.listRecords`
+ 2. **Indexed discovery**
+    Query the Twister API for global search results
+    Open the selected profile or repo in the existing screens
+    Continue detail fetching from Tangled's public APIs
+
+ The Twister API is additive, not authoritative for repo detail. It fills discovery and graph gaps; knots and PDSes remain the source of truth for detail screens.
+
+ ## UI Expectations
+
+ - Home shows one handle input plus explicit actions for profile jump and repo browsing
+ - Home shows loading, invalid-handle, no-repos, and resolved-repo-list states
+ - Explore shows a working search form, loading state, index-unavailable state, and no-results state
+ - Activity shows a static in-progress empty state
+ - Profile may show index-backed follower/following summaries when available
+
+ ## Deferred Work
+
+ The following work is intentionally deferred out of this phase:
+
+ - Trending or suggested discovery sections
+ - Public activity feed ingestion, pagination, and caching
+ - Jetstream or appview timeline investigation
+
+ These capabilities will be revisited after the baseline search and graph-summary integration is stable.
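The Explore requirements in this spec name four visible states (search form, loading, index unavailable, no results) and ask that "index unavailable" never be confused with "no results". A small state resolver makes the precedence between them explicit. This is a sketch under assumptions: `resolveExploreState`, the `ExploreInput` shape, and the extra `idle` state for an empty query are hypothetical, and the flags are assumed to come from the search query's pending and error status.

```typescript
type ExploreState = "idle" | "loading" | "index-unavailable" | "no-results" | "results";

type ExploreInput = {
  query: string; // raw text from the search form
  apiConfigured: boolean; // whether a Twister base URL is set
  isPending: boolean; // request in flight
  isError: boolean; // request failed
  total: number; // total hits reported by the API
};

function resolveExploreState(input: ExploreInput): ExploreState {
  // Without a configured index, search can never succeed.
  if (!input.apiConfigured) return "index-unavailable";

  // Assumption: an empty query shows the bare form, not a "no results" message.
  if (input.query.trim().length === 0) return "idle";

  // A failed request is reported as unavailable, never as an empty result set.
  if (input.isError) return "index-unavailable";
  if (input.isPending) return "loading";

  return input.total > 0 ? "results" : "no-results";
}
```

Checking the error flag before the result count is the part that keeps the two empty states apart: a failed request usually also has zero results, so the order of the checks decides which message the user sees.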
+1 -1
justfile
···
  api-build:
      just --justfile packages/api/justfile build

- api-run-api:
+ api-dev:
      just --justfile packages/api/justfile run-api

  api-run-indexer:
+1 -1
packages/api/README.md
···
  # Twister

- Tap-based search engine for Tangled.
+ Tap-based indexing and search service for Tangled.
+34 -30
packages/api/docs/specs/01-architecture.md → docs/api/specs/01-architecture.md
··· 7 7 8 8 Build a Go-based search service for Tangled content on AT Protocol that: 9 9 10 - * ingests Tangled records through **Tap** (already deployed on Railway) 11 - * denormalizes them into internal search documents 12 - * indexes them in **Turso/libSQL** 13 - * exposes a search API with **keyword**, **semantic**, and **hybrid** retrieval modes 10 + - ingests Tangled records through **Tap** (already deployed on Railway) 11 + - denormalizes them into internal search documents 12 + - indexes them in **Turso/libSQL** 13 + - exposes a search API with **keyword**, **semantic**, and **hybrid** retrieval modes 14 + - exposes index-backed summary APIs for data the public Tangled APIs do not answer efficiently, such as followers 14 15 15 16 ## 2. Functional Goals 16 17 17 18 The system shall: 18 19 19 - * index Tangled-specific ATProto collections under the `sh.tangled.*` namespace 20 - * support initial backfill and continuous incremental sync via Tap 21 - * support lexical retrieval using Turso's Tantivy-backed FTS 22 - * support semantic retrieval using vector embeddings 23 - * support hybrid ranking combining lexical and semantic signals 24 - * expose stable HTTP APIs for search and document lookup 25 - * support deployment on **Railway** 20 + - index Tangled-specific ATProto collections under the `sh.tangled.*` namespace 21 + - support initial backfill and continuous incremental sync via Tap 22 + - support lexical retrieval using Turso's Tantivy-backed FTS 23 + - support semantic retrieval using vector embeddings 24 + - support hybrid ranking combining lexical and semantic signals 25 + - expose stable HTTP APIs for search, document lookup, and graph/profile summaries 26 + - support deployment on **Railway** 26 27 27 28 ## 3. 
Non-Functional Goals 28 29 29 30 The system shall prioritize: 30 31 31 - * **correctness of sync** — cursors never advance ahead of committed data 32 - * **operational simplicity** — single binary, subcommand-driven 33 - * **incremental delivery** — keyword search ships before embeddings 34 - * **small deployable services** — process groups, not microservices 35 - * **reindexability** — any document or collection can be re-normalized and re-indexed 36 - * **low coupling** — sync, indexing, and serving are independent concerns 32 + - **correctness of sync** — cursors never advance ahead of committed data 33 + - **operational simplicity** — single binary, subcommand-driven 34 + - **incremental delivery** — keyword search ships before embeddings 35 + - **small deployable services** — process groups, not microservices 36 + - **reindexability** — any document or collection can be re-normalized and re-indexed 37 + - **low coupling** — sync, indexing, and serving are independent concerns 37 38 38 39 ## 4. Out of Scope (v1) 39 40 40 - * code-aware symbol search 41 - * sourcegraph-style structural search 42 - * personalized ranking 43 - * access control beyond public/private visibility flags in indexed records 44 - * full analytics pipeline 45 - * custom ANN infrastructure outside Turso/libSQL 41 + - code-aware symbol search 42 + - sourcegraph-style structural search 43 + - personalized ranking 44 + - access control beyond public/private visibility flags in indexed records 45 + - full analytics pipeline 46 + - custom ANN infrastructure outside Turso/libSQL 46 47 47 48 ## 5. Design Principles 48 49 ··· 50 51 51 52 2. **The indexer owns denormalization.** Raw ATProto records are never queried directly by the public API. 52 53 53 - 3. **Search serves denormalized documents.** Search ranking depends on the document model, not transport. 54 + 3. 
**The public API serves denormalized projections.** Search ranking and graph summaries depend on the indexed document model, not transport. 54 55 55 56 4. **Keyword search is the baseline.** Semantic and hybrid search are layered on top. 56 57 57 58 5. **Embeddings are asynchronous.** Ingestion is never blocked on vector generation unless explicitly configured. 59 + 60 + 6. **Twister complements public Tangled APIs.** Repo detail stays on knots/PDSes; the index adds discovery and cross-network summaries. 58 61 59 62 ## 6. External Systems 60 63 ··· 93 96 ├─ keyword search (fts_match / fts_score) 94 97 ├─ semantic search (vector_top_k) 95 98 ├─ hybrid search (weighted merge) 99 + ├─ profile and graph summaries 96 100 └─ document fetch 97 101 ``` 98 102 99 103 ## 8. Runtime Units 100 104 101 - | Unit | Role | Deployment | 102 - | -------------- | ----------------------------------- | ------------------------------- | 103 - | `api` | HTTP search and document API | Railway service (public) | 104 - | `indexer` | Tap consumer, normalizer, DB writer | Railway service (internal) | 105 - | `embed-worker` | Async embedding generation | Optional Railway service | 106 - | `tap` | ATProto sync | Railway (already deployed) | 105 + | Unit | Role | Deployment | 106 + | -------------- | -------------------------------------------- | -------------------------- | 107 + | `api` | HTTP search, graph summary, and document API | Railway service (public) | 108 + | `indexer` | Tap consumer, normalizer, DB writer | Railway service (internal) | 109 + | `embed-worker` | Async embedding generation | Optional Railway service | 110 + | `tap` | ATProto sync | Railway (already deployed) | 107 111 108 112 ## 9. Repository Structure 109 113
packages/api/docs/specs/02-tangled-lexicons.md docs/api/specs/02-tangled-lexicons.md
packages/api/docs/specs/03-data-model.md docs/api/specs/03-data-model.md
packages/api/docs/specs/04-data-pipeline.md docs/api/specs/04-data-pipeline.md
-296
packages/api/docs/specs/05-search.md
··· 1 - --- 2 - title: "Spec 05 — Search" 3 - updated: 2026-03-22 4 - --- 5 - 6 - Covers all search modes, the API contract, scoring, and filtering. 7 - 8 - ## 1. Search Modes 9 - 10 - | Mode | Backing | Available | 11 - |------|---------|-----------| 12 - | `keyword` | Turso Tantivy-backed FTS | MVP | 13 - | `semantic` | Vector similarity (DiskANN index) | Phase 2 | 14 - | `hybrid` | Weighted merge of keyword + semantic | Phase 3 | 15 - 16 - ## 2. Keyword Search 17 - 18 - ### Implementation 19 - 20 - Uses Turso's `fts_score()` function for BM25 ranking: 21 - 22 - ```sql 23 - SELECT 24 - d.id, d.title, d.summary, d.repo_name, d.author_handle, 25 - d.collection, d.record_type, d.updated_at, 26 - fts_score(d.title, d.body, d.summary, d.repo_name, d.author_handle, d.tags_json, ?) AS score 27 - FROM documents d 28 - WHERE fts_match(d.title, d.body, d.summary, d.repo_name, d.author_handle, d.tags_json, ?) 29 - AND d.deleted_at IS NULL 30 - ORDER BY score DESC 31 - LIMIT ? OFFSET ?; 32 - ``` 33 - 34 - ### Field Weights 35 - 36 - Configured in the FTS index definition: 37 - 38 - | Field | Weight | Rationale | 39 - |-------|--------|-----------| 40 - | `title` | 3.0 | Highest signal for relevance | 41 - | `repo_name` | 2.5 | Exact repo lookups should rank first | 42 - | `author_handle` | 2.0 | Author search is common | 43 - | `summary` | 1.5 | More focused than body | 44 - | `tags_json` | 1.2 | Topic matching | 45 - | `body` | 1.0 | Baseline | 46 - 47 - ### Query Features 48 - 49 - Tantivy query syntax is exposed to users: 50 - 51 - - Boolean: `go AND search`, `rust NOT unsafe` 52 - - Phrase: `"pull request"` 53 - - Prefix: `tang*` 54 - - Field-specific: `title:parser` 55 - 56 - ### Snippets 57 - 58 - Use `fts_highlight()` to generate highlighted snippets: 59 - 60 - ```sql 61 - fts_highlight(d.body, '<mark>', '</mark>', ?) AS body_snippet 62 - ``` 63 - 64 - ## 3. Semantic Search 65 - 66 - ### Query Flow 67 - 68 - 1. 
Convert user query text to embedding via the configured provider 69 - 2. Query `vector_top_k` for nearest neighbors 70 - 3. Join back to `documents` to get metadata 71 - 4. Filter out deleted/hidden documents 72 - 5. Return results with distance as score 73 - 74 - ```sql 75 - SELECT d.id, d.title, d.summary, d.repo_name, d.author_handle, 76 - d.collection, d.record_type, d.updated_at 77 - FROM vector_top_k('idx_embeddings_vec', vector32(?), ?) AS v 78 - JOIN document_embeddings e ON e.rowid = v.id 79 - JOIN documents d ON d.id = e.document_id 80 - WHERE d.deleted_at IS NULL; 81 - ``` 82 - 83 - ### Score Normalization 84 - 85 - Cosine distance ranges from 0 (identical) to 2 (opposite). Normalize to a 0–1 relevance score: 86 - 87 - ``` 88 - semantic_score = 1.0 - (distance / 2.0) 89 - ``` 90 - 91 - ## 4. Hybrid Search 92 - 93 - ### v1: Weighted Score Blending 94 - 95 - ``` 96 - hybrid_score = 0.65 * keyword_score_normalized + 0.35 * semantic_score_normalized 97 - ``` 98 - 99 - ### Score Normalization for Blending 100 - 101 - Keyword (BM25) scores are unbounded. Normalize using min-max within the result set: 102 - 103 - ``` 104 - keyword_normalized = (score - min_score) / (max_score - min_score) 105 - ``` 106 - 107 - Semantic scores are already bounded after the distance-to-relevance conversion. 108 - 109 - ### Merge Strategy 110 - 111 - 1. Fetch top N keyword results (e.g., N=50) 112 - 2. Fetch top N semantic results 113 - 3. Merge on `document_id` 114 - 4. For documents appearing in both sets, combine scores 115 - 5. For documents in only one set, use that score (with 0 for the missing signal) 116 - 6. Sort by `hybrid_score` descending 117 - 7. Deduplicate 118 - 8. 
Apply limit/offset 119 - 120 - ### v2: Reciprocal Rank Fusion (future) 121 - 122 - If keyword and semantic score scales prove unstable under weighted blending, replace with RRF: 123 - 124 - ``` 125 - rrf_score = Σ 1 / (k + rank_i) 126 - ``` 127 - 128 - where `k` is a constant (typically 60) and `rank_i` is the document's rank in each result list. 129 - 130 - ## 5. Filtering 131 - 132 - All search modes support these filters, applied as SQL WHERE clauses: 133 - 134 - | Filter | Parameter | SQL | 135 - |--------|-----------|-----| 136 - | Collection | `collection` | `d.collection = ?` | 137 - | Author | `author` | `d.author_handle = ?` or `d.did = ?` | 138 - | Repo | `repo` | `d.repo_name = ?` or `d.repo_did = ?` | 139 - | Record type | `type` | `d.record_type = ?` | 140 - | Language | `language` | `d.language = ?` | 141 - | Date range | `from`, `to` | `d.created_at >= ?` and `d.created_at <= ?` | 142 - | State | `state` | Join to `record_state` table | 143 - 144 - ## 6. Embedding Eligibility 145 - 146 - A document is eligible for embedding if: 147 - 148 - - `deleted_at IS NULL` 149 - - `record_type` is one of: `repo`, `issue`, `pull`, `string`, `profile` 150 - - At least one of `title`, `body`, or `summary` is non-empty 151 - - Total text length exceeds a minimum threshold (e.g., 20 characters) 152 - 153 - ## 7. 
API Endpoints 154 - 155 - ### Health 156 - 157 - | Method | Path | Description | 158 - | ------ | ---------- | -------------------------------- | 159 - | GET | `/healthz` | Liveness — process is responsive | 160 - | GET | `/readyz` | Readiness — DB is reachable | 161 - 162 - ### Search 163 - 164 - | Method | Path | Description | 165 - | ------ | ------------------ | ------------------------------------------------ | 166 - | GET | `/search` | Search with configurable mode (default: keyword) | 167 - | GET | `/search/keyword` | Keyword-only search | 168 - | GET | `/search/semantic` | Semantic-only search | 169 - | GET | `/search/hybrid` | Hybrid search | 170 - 171 - ### Documents 172 - 173 - | Method | Path | Description | 174 - | ------ | ----------------- | ----------------------------- | 175 - | GET | `/documents/{id}` | Fetch a single document by ID | 176 - 177 - ### Admin 178 - 179 - | Method | Path | Description | 180 - | ------ | ---------------- | -------------------- | 181 - | POST | `/admin/reindex` | Trigger reindex | 182 - | POST | `/admin/reembed` | Trigger re-embedding | 183 - 184 - Admin endpoints are disabled by default. Enable with `ENABLE_ADMIN_ENDPOINTS=true`. 185 - 186 - ## 8. 
Query Parameters 187 - 188 - | Parameter | Type | Default | Description | 189 - | ------------ | ------ | --------- | -------------------------------------------------------------------- | 190 - | `q` | string | required | Search query | 191 - | `mode` | string | `keyword` | `keyword`, `semantic`, or `hybrid` | 192 - | `limit` | int | 20 | Results per page (max: `SEARCH_MAX_LIMIT`) | 193 - | `offset` | int | 0 | Pagination offset | 194 - | `collection` | string | — | Filter by `sh.tangled.*` collection | 195 - | `type` | string | — | Filter by record type (`repo`, `issue`, `pull`, `string`, `profile`) | 196 - | `author` | string | — | Filter by author handle or DID | 197 - | `repo` | string | — | Filter by repo name or repo DID | 198 - | `language` | string | — | Filter by language | 199 - | `from` | string | — | Created after (ISO 8601) | 200 - | `to` | string | — | Created before (ISO 8601) | 201 - | `state` | string | — | Filter by state (`open`, `closed`, `merged`) | 202 - 203 - ## 9. 
Search Response 204 - 205 - ```json 206 - { 207 - "query": "rust markdown tui", 208 - "mode": "hybrid", 209 - "total": 142, 210 - "limit": 20, 211 - "offset": 0, 212 - "results": [ 213 - { 214 - "id": "did:plc:abc|sh.tangled.repo|3kb3fge5lm32x", 215 - "collection": "sh.tangled.repo", 216 - "record_type": "repo", 217 - "title": "glow-rs", 218 - "body_snippet": "A TUI markdown viewer inspired by <mark>Glow</mark>...", 219 - "summary": "Rust TUI markdown viewer", 220 - "repo_name": "glow-rs", 221 - "author_handle": "desertthunder.dev", 222 - "score": 0.842, 223 - "matched_by": ["keyword", "semantic"], 224 - "created_at": "2026-03-20T10:00:00Z", 225 - "updated_at": "2026-03-22T15:03:11Z" 226 - } 227 - ] 228 - } 229 - ``` 230 - 231 - ### Result Fields 232 - 233 - | Field | Type | Description | 234 - | --------------- | -------- | --------------------------------------- | 235 - | `id` | string | Document stable ID | 236 - | `collection` | string | ATProto collection NSID | 237 - | `record_type` | string | Normalized type label | 238 - | `title` | string | Document title | 239 - | `body_snippet` | string | Highlighted body excerpt | 240 - | `summary` | string | Short description | 241 - | `repo_name` | string | Repository name (if applicable) | 242 - | `author_handle` | string | Author handle | 243 - | `score` | float | Relevance score (0–1) | 244 - | `matched_by` | string[] | Which search modes produced this result | 245 - | `created_at` | string | ISO 8601 creation timestamp | 246 - | `updated_at` | string | ISO 8601 last update timestamp | 247 - 248 - ## 10. 
Document Response 249 - 250 - `GET /documents/{id}` returns the full document: 251 - 252 - ```json 253 - { 254 - "id": "did:plc:abc|sh.tangled.repo|3kb3fge5lm32x", 255 - "did": "did:plc:abc", 256 - "collection": "sh.tangled.repo", 257 - "rkey": "3kb3fge5lm32x", 258 - "at_uri": "at://did:plc:abc/sh.tangled.repo/3kb3fge5lm32x", 259 - "cid": "bafyreig...", 260 - "record_type": "repo", 261 - "title": "glow-rs", 262 - "body": "A TUI markdown viewer inspired by Glow, written in Rust.", 263 - "summary": "Rust TUI markdown viewer", 264 - "repo_name": "glow-rs", 265 - "author_handle": "desertthunder.dev", 266 - "tags_json": "[\"rust\", \"tui\", \"markdown\"]", 267 - "language": "en", 268 - "created_at": "2026-03-20T10:00:00Z", 269 - "updated_at": "2026-03-22T15:03:11Z", 270 - "indexed_at": "2026-03-22T15:05:00Z", 271 - "has_embedding": true 272 - } 273 - ``` 274 - 275 - ## 11. Error Responses 276 - 277 - | Status | Condition | 278 - | ------ | ------------------------------------------------------------------ | 279 - | 400 | Missing `q` parameter, invalid `limit`/`offset`, malformed filters | 280 - | 404 | Document not found | 281 - | 503 | DB unreachable (readiness failure) | 282 - 283 - ```json 284 - { 285 - "error": "invalid_parameter", 286 - "message": "limit must be between 1 and 100" 287 - } 288 - ``` 289 - 290 - ## 12. API Behavior 291 - 292 - - `keyword` returns only lexical matches via `fts_match`/`fts_score` 293 - - `semantic` returns only embedding-backed matches via `vector_top_k` 294 - - `hybrid` merges both result sets and reranks 295 - - All modes exclude documents with `deleted_at IS NOT NULL` by default 296 - - Pagination uses `limit`/`offset` (cursor-based pagination deferred)
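The scoring rules in the removed search spec (distance-to-relevance conversion, min-max normalization, and the 0.65/0.35 blend) can be sketched in Go as follows. This is only an illustration under stated assumptions, not the project's implementation: the `Hit` type and the function names are hypothetical, and the behavior when every keyword score is identical (where the spec's min-max formula would divide by zero) is a choice made for this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is a hypothetical container for one document's per-mode scores.
type Hit struct {
	DocumentID string
	Keyword    float64 // min-max normalized BM25 score, 0 when the document had no keyword match
	Semantic   float64 // relevance derived from cosine distance, 0 when no semantic match
	Hybrid     float64
}

// SemanticScore maps a cosine distance in [0, 2] to a relevance in [0, 1].
func SemanticScore(distance float64) float64 {
	return 1.0 - distance/2.0
}

// MinMaxNormalize rescales raw keyword scores into [0, 1] within one result set.
// Assumption: when all scores are equal, every document gets 1.0.
func MinMaxNormalize(raw map[string]float64) map[string]float64 {
	out := make(map[string]float64, len(raw))
	first := true
	var lo, hi float64
	for _, s := range raw {
		if first {
			lo, hi = s, s
			first = false
			continue
		}
		if s < lo {
			lo = s
		}
		if s > hi {
			hi = s
		}
	}
	for id, s := range raw {
		if hi == lo {
			out[id] = 1.0
		} else {
			out[id] = (s - lo) / (hi - lo)
		}
	}
	return out
}

// Blend merges both result sets on document ID, uses 0 for a missing signal,
// and sorts by hybrid score descending (ties broken by ID for stable output).
func Blend(keywordRaw, semanticDistance map[string]float64) []Hit {
	merged := make(map[string]*Hit)
	for id, s := range MinMaxNormalize(keywordRaw) {
		merged[id] = &Hit{DocumentID: id, Keyword: s}
	}
	for id, d := range semanticDistance {
		h, ok := merged[id]
		if !ok {
			h = &Hit{DocumentID: id}
			merged[id] = h
		}
		h.Semantic = SemanticScore(d)
	}
	hits := make([]Hit, 0, len(merged))
	for _, h := range merged {
		h.Hybrid = 0.65*h.Keyword + 0.35*h.Semantic
		hits = append(hits, *h)
	}
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].Hybrid != hits[j].Hybrid {
			return hits[i].Hybrid > hits[j].Hybrid
		}
		return hits[i].DocumentID < hits[j].DocumentID
	})
	return hits
}

func main() {
	hits := Blend(
		map[string]float64{"a": 12.0, "b": 4.0}, // raw BM25 scores
		map[string]float64{"b": 0.2, "c": 1.0},  // cosine distances
	)
	for _, h := range hits {
		fmt.Printf("%s keyword=%.3f semantic=%.3f hybrid=%.3f\n", h.DocumentID, h.Keyword, h.Semantic, h.Hybrid)
	}
}
```

Limit and offset would be applied to the sorted slice afterwards. Because the merge is keyed by document ID, deduplication falls out of the map rather than needing a separate pass.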
+113 -1
packages/api/docs/specs/06-operations.md docs/api/specs/06-operations.md
··· 1 1 --- 2 2 title: "Spec 06 — Operations" 3 - updated: 2026-03-22 3 + updated: 2026-03-23 4 4 --- 5 5 6 6 Covers configuration, observability, security, and deployment. 7 + 8 + ## 0. Quick Setup 9 + 10 + Tap is already deployed. For a new environment, the minimum operator work is: 11 + 12 + 1. Create or choose a Turso database for that environment 13 + 2. Generate a Turso auth token for that database 14 + 3. Point `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` at that database 15 + 4. Create Railway services for `api` and `indexer` 16 + 5. Point `TAP_URL` at the existing Tap deployment 17 + 6. Run migrations/start the services 18 + 7. Run `twister backfill` before treating the environment as search-ready 19 + 20 + No separate `*_DEV` or `*_PROD` variables are required. Each environment keeps using the same variable names and simply points them at the appropriate Turso database. 7 21 8 22 ## 1. Configuration 9 23 ··· 85 99 ENABLE_ADMIN_ENDPOINTS=false 86 100 ``` 87 101 102 + ### Environment Selection 103 + 104 + Use the same variable names in every environment: 105 + 106 + - local development can point `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` at `twister-dev` 107 + - production can point those same variables at `twister-prod` 108 + 109 + The application should not care which database it is talking to; only the environment wiring changes. 110 + 111 + ## 1.5. Turso Setup 112 + 113 + ### Recommended Databases 114 + 115 + Use one Turso database per environment, for example: 116 + 117 + - `twister-dev` 118 + - `twister-prod` 119 + 120 + Keep the app config identical across environments and swap only these values: 121 + 122 + - `TURSO_DATABASE_URL` 123 + - `TURSO_AUTH_TOKEN` 124 + 125 + ### Basic Flow 126 + 127 + Using the Turso dashboard or CLI: 128 + 129 + 1. Create the database for the target environment 130 + 2. Capture its libSQL URL 131 + 3. Create an auth token for the service 132 + 4. 
Set `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` in that environment 133 + 134 + Example values: 135 + 136 + ```bash 137 + # Development environment 138 + TURSO_DATABASE_URL=libsql://twister-dev-your-org.turso.io 139 + TURSO_AUTH_TOKEN=... 140 + 141 + # Production environment 142 + TURSO_DATABASE_URL=libsql://twister-prod-your-org.turso.io 143 + TURSO_AUTH_TOKEN=... 144 + ``` 145 + 146 + ### Practical Rule 147 + 148 + Do not introduce `TURSO_DATABASE_URL_DEV`, `TURSO_DATABASE_URL_PROD`, or similar split variables. Railway environments, local shells, and CI should all set the same names with environment-specific values. 149 + 150 + ## 1.6. Railway Setup 151 + 152 + ### Project Layout 153 + 154 + Create or reuse one Railway project containing: 155 + 156 + - existing `tap` service 157 + - `api` service running `twister api` 158 + - `indexer` service running `twister indexer` 159 + 160 + ### Basic Steps 161 + 162 + 1. Connect the monorepo to Railway 163 + 2. Create the `api` and `indexer` services from the same source repo/Docker image 164 + 3. Set shared variables on both services: 165 + - `TURSO_DATABASE_URL` 166 + - `TURSO_AUTH_TOKEN` 167 + - `LOG_LEVEL` 168 + - `LOG_FORMAT` 169 + 4. Set API-specific variables: 170 + - `HTTP_BIND_ADDR` 171 + - `SEARCH_DEFAULT_LIMIT` 172 + - `SEARCH_MAX_LIMIT` 173 + 5. Set indexer-specific variables: 174 + - `TAP_URL` 175 + - `TAP_AUTH_PASSWORD` 176 + - `INDEXED_COLLECTIONS` 177 + 6. Configure health checks 178 + 7. Deploy 179 + 8. Run backfill against the environment before public validation 180 + 181 + ### Dev vs Production on Railway 182 + 183 + If you use multiple Railway environments, keep the same service definitions and variable names in each one. Only the values change: 184 + 185 + - dev Railway environment -> `TURSO_DATABASE_URL=...twister-dev...` 186 + - prod Railway environment -> `TURSO_DATABASE_URL=...twister-prod...` 187 + 188 + This keeps deployment logic simple and avoids conditional application config. 
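The "same names, different values" rule can be sketched as a small Go loader. This is a hedged illustration, not the actual Twister config code: `TursoConfig`, `LoadTursoConfig`, and the injectable `getenv` parameter are hypothetical, and treating the auth token as mandatory is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// TursoConfig holds the two values that differ between environments.
type TursoConfig struct {
	DatabaseURL string
	AuthToken   string
}

// LoadTursoConfig reads the same variable names in every environment.
// There is no *_DEV or *_PROD branch: the environment decides the values.
// getenv is a hypothetical seam for tests; pass os.Getenv in real use.
func LoadTursoConfig(getenv func(string) string) (TursoConfig, error) {
	cfg := TursoConfig{
		DatabaseURL: getenv("TURSO_DATABASE_URL"),
		AuthToken:   getenv("TURSO_AUTH_TOKEN"),
	}
	if cfg.DatabaseURL == "" {
		return TursoConfig{}, errors.New("TURSO_DATABASE_URL is required")
	}
	// Assumption for this sketch: a token is always required.
	if cfg.AuthToken == "" {
		return TursoConfig{}, errors.New("TURSO_AUTH_TOKEN is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadTursoConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Println("using database", cfg.DatabaseURL)
}
```

With this shape, a dev shell, CI, and each Railway environment run the same binary, and only the injected values decide whether it talks to `twister-dev` or `twister-prod`.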
189 + 88 190 ## 2. Observability 89 191 90 192 ### Structured Logging ··· 261 363 ``` 262 364 263 365 Railway supports referencing other services' variables with `${{service.VAR}}` syntax, which is useful for linking the indexer to Tap's domain. 366 + 367 + #### First-Time Bootstrap Checklist 368 + 369 + After the first successful deploy of a new environment: 370 + 371 + 1. Confirm API readiness on `/readyz` 372 + 2. Confirm indexer health and Tap connectivity 373 + 3. Run graph backfill with the environment's seed file 374 + 4. Wait for Tap historical sync to settle 375 + 5. Verify that search returns known historical repos/profiles 264 376 265 377 #### Health Checks 266 378
+15 -13
packages/api/docs/specs/07-graph-backfill.md docs/api/specs/07-graph-backfill.md
···
21 21 ```
22 22
23 23 Format:
24 +
24 25 - One entry per line
25 26 - Lines starting with `#` are comments
26 27 - Blank lines are ignored
···
49 50 ### Crawl Queue
50 51
51 52 Discovered DIDs are added to a queue, deduplicated by DID. Each entry tracks:
53 +
52 54 - DID
53 55 - Discovery hop (distance from seed)
54 56 - Source (which seed/user led to discovery)
···
92 94
93 95 ### Flags
94 96
95 - | Flag | Default | Description |
96 - |------|---------|-------------|
97 - | `--seeds` | required | Path to seed file |
98 - | `--max-hops` | `2` | Max fan-out depth from seed users |
99 - | `--dry-run` | `false` | List discovered users without submitting to Tap |
100 - | `--concurrency` | `5` | Parallel discovery workers |
101 - | `--batch-size` | `10` | DIDs per `/repos/add` call |
102 - | `--batch-delay` | `1s` | Delay between batches |
97 + | Flag | Default | Description |
98 + | --------------- | -------- | ----------------------------------------------- |
99 + | `--seeds` | required | Path to seed file |
100 + | `--max-hops` | `2` | Max fan-out depth from seed users |
101 + | `--dry-run` | `false` | List discovered users without submitting to Tap |
102 + | `--concurrency` | `5` | Parallel discovery workers |
103 + | `--batch-size` | `10` | DIDs per `/repos/add` call |
104 + | `--batch-delay` | `1s` | Delay between batches |
103 105
104 106 ### Output
105 107
···
127 129
128 130 ## 8. Configuration
129 131
130 - | Variable | Default | Description |
131 - |----------|---------|-------------|
132 - | `TAP_URL` | (existing) | Tap base URL for API calls |
133 - | `TAP_AUTH_PASSWORD` | (existing) | Tap admin auth |
132 + | Variable | Default | Description |
133 + | -------------------- | ---------- | ----------------------------- |
134 + | `TAP_URL` | (existing) | Tap base URL for API calls |
135 + | `TAP_AUTH_PASSWORD` | (existing) | Tap admin auth |
134 136 | `TURSO_DATABASE_URL` | (existing) | For checking existing records |
135 - | `TURSO_AUTH_TOKEN` | (existing) | DB auth |
137 + | `TURSO_AUTH_TOKEN` | (existing) | DB auth |
136 138
137 139 No new environment variables are needed — backfill reuses existing Tap and DB configuration.
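The seed-file format described in this spec (one entry per line, `#` comments, blank lines ignored) is simple enough to sketch in Go. The function names `ParseSeeds` and `ParseSeedText` are hypothetical, and dropping repeated entries at parse time is an assumption of this sketch; the spec only states that the crawl queue is deduplicated by DID.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// ParseSeeds reads one DID or handle per line.
// Blank lines and lines starting with '#' are skipped.
// Assumption: repeated entries are dropped here, keeping first-seen order.
func ParseSeeds(r io.Reader) ([]string, error) {
	var seeds []string
	seen := make(map[string]bool)
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if seen[line] {
			continue
		}
		seen[line] = true
		seeds = append(seeds, line)
	}
	if err := sc.Err(); err != nil {
		return nil, fmt.Errorf("read seeds: %w", err)
	}
	return seeds, nil
}

// ParseSeedText is a convenience wrapper for in-memory input.
func ParseSeedText(text string) ([]string, error) {
	return ParseSeeds(strings.NewReader(text))
}

func main() {
	seeds, err := ParseSeedText("# core team\ndid:plc:abc\n\ndesertthunder.dev\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range seeds {
		fmt.Println(s)
	}
}
```

Handles returned by the parser would still need to be resolved to DIDs before graph expansion; that step depends on network calls and is left out here.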
+3 -2
packages/api/docs/specs/README.md docs/api/specs/README.md
···
5 5
6 6 # Twister Technical Specifications
7 7
8 - Twister is a Go-based search service for [Tangled](https://tangled.org) content on AT Protocol.
9 - It ingests records through [Tap](https://github.com/bluesky-social/indigo/tree/main/cmd/tap), denormalizes them into search documents, indexes them in [Turso/libSQL](https://docs.turso.tech), and exposes keyword, semantic, and hybrid search APIs.
8 + Twister is a Go-based index and search service for [Tangled](https://tangled.org) content on AT Protocol.
9 + It ingests records through [Tap](https://github.com/bluesky-social/indigo/tree/main/cmd/tap), denormalizes them into search documents and graph summaries, indexes them in [Turso/libSQL](https://docs.turso.tech), and exposes public APIs for search and index-backed data gaps.
10 10
11 11 ## Specifications
12 12
···
19 19 | 5 | [Search](05-search.md) | Search modes, API contract, scoring, filtering |
20 20 | 6 | [Operations](06-operations.md) | Configuration, observability, security, deployment |
21 21 | 7 | [Graph Backfill](07-graph-backfill.md) | Seed-based user discovery and content backfill |
22 + | 8 | [App Integration](08-app-integration.md) | Mobile-facing contracts for search and graph summaries |
-38
packages/api/docs/tasks/README.md
··· 1 - --- 2 - title: "Twister — Task Index" 3 - updated: 2026-03-22 4 - --- 5 - 6 - # Twister Tasks 7 - 8 - Assumes Go, Tap (deployed on Railway), Turso/libSQL, and Railway for deployment. 9 - 10 - ## Delivery Strategy 11 - 12 - Build in four phases: 13 - 14 - 1. **MVP** — ingestion, keyword search, deployment, operational tooling, graph backfill 15 - 2. **Semantic Search** — embeddings, vector retrieval 16 - 3. **Hybrid Search** — weighted merge of keyword + semantic 17 - 4. **Quality Polish** — ranking refinement, advanced filters, analytics 18 - 19 - Ship keyword search before embeddings. That gives a testable, inspectable baseline before introducing model behavior. 20 - 21 - ## Phases 22 - 23 - | Phase | Title | Document | Status | 24 - | ----- | ----- | -------- | ------ | 25 - | 1 | MVP | [phase-1-mvp.md](phase-1-mvp.md) | In progress (M0–M2 complete) | 26 - | 2 | Semantic Search | [phase-2-semantic.md](phase-2-semantic.md) | Not started | 27 - | 3 | Hybrid Search | [phase-3-hybrid.md](phase-3-hybrid.md) | Not started | 28 - | 4 | Quality Polish | [phase-4-quality.md](phase-4-quality.md) | Not started | 29 - 30 - ## MVP Complete When 31 - 32 - - Tap ingests tracked `sh.tangled.*` records 33 - - Documents normalize into a stable store 34 - - Keyword search works publicly 35 - - API and indexer are deployed on Railway 36 - - Restart does not lose sync position 37 - - Reindex exists for repair 38 - - Graph backfill populates initial content from seed users
+80 -91
packages/api/docs/tasks/phase-1-mvp.md docs/api/tasks/phase-1-mvp.md
··· 17 17 - Reindex exists for repair 18 18 - Graph backfill populates initial content from seed users 19 19 20 - --- 21 - 22 20 ## M0 — Repository Bootstrap ✅ 23 21 24 22 Executable layout, local tooling, and development conventions (completed 2026-03-22). 25 23 26 - --- 27 - 28 24 ## M1 — Database Schema and Store Layer ✅ 29 25 30 26 refs: [specs/03-data-model.md](../specs/03-data-model.md) 31 27 32 28 Implemented the Turso/libSQL schema and Go store package for document persistence. 33 - 34 - --- 35 29 36 30 ## M2 — Normalization Layer ✅ 37 31 ··· 39 33 40 34 Translate `sh.tangled.*` records into internal search documents. 41 35 42 - --- 43 - 44 36 ## M3 — Tap Client and Ingestion Loop 45 37 46 38 refs: [specs/04-data-pipeline.md](../specs/04-data-pipeline.md), [specs/01-architecture.md](../specs/01-architecture.md) ··· 48 40 ### Goal 49 41 50 42 Connect the indexer to Tap (on Railway) and process live events into the store. 51 - 52 - ### Why Now 53 - 54 - Tap is the point of truth for synchronized ATProto ingestion. It is already deployed on Railway. 55 43 56 44 ### Deliverables 57 45 ··· 126 114 127 115 The system continuously ingests and persists `sh.tangled.*` records from Tap. 128 116 129 - --- 117 + ## M4 — Graph Backfill from Seed Users 118 + 119 + refs: [specs/07-graph-backfill.md](../specs/07-graph-backfill.md) 120 + 121 + ### Goal 122 + 123 + Bootstrap the index with historical Tangled content by discovering and backfilling users from a curated seed set. 
124 + 125 + ### Deliverables 126 + 127 + - `twister backfill` CLI command 128 + - Seed file parser and documented seed-file format 129 + - Graph fan-out discovery (follows and collaborators) 130 + - Tap `/repos/add` integration for discovered users 131 + - Deduplication against already-tracked repos 132 + - Dry-run mode and progress logging 133 + - Basic operator runbook for first bootstrap and repeat runs 134 + 135 + ### Tasks 136 + 137 + - [ ] Implement `backfill` subcommand with flags: 138 + - `--seeds <file>` — required seed file path 139 + - `--max-hops <n>` — depth limit for fan-out (default: 2) 140 + - `--dry-run` — print the discovery plan without mutating Tap 141 + - `--concurrency <n>` — parallel discovery workers (default: 5) 142 + - `--batch-size <n>` — DIDs per `/repos/add` request 143 + - `--batch-delay <duration>` — delay between Tap registration batches 144 + - [ ] Implement seed file parsing: 145 + - One DID or handle per line 146 + - `#` comments allowed 147 + - Blank lines ignored 148 + - Handles resolved to DIDs before graph expansion 149 + - [ ] Decide and document the initial seed file location for operators: 150 + - Repository-managed example file for format/reference 151 + - Deployment-specific runtime file or mounted secret for real runs 152 + - [ ] Implement graph discovery: 153 + 1. Start from hop-0 seed users 154 + 2. Fetch `sh.tangled.graph.follow` records and collect subject DIDs 155 + 3. Fetch repo collaborators by inspecting repos, issues, PRs, and comments 156 + 4. Enqueue newly discovered DIDs with hop metadata 157 + 5. 
Stop expanding beyond `max-hops` 158 + - [ ] Track discovery metadata for logs: 159 + - source DID 160 + - hop depth 161 + - discovery reason (`seed`, `follow`, `collaborator`) 162 + - [ ] Integrate with Tap admin endpoints: 163 + - `GET /info/:did` to skip already-tracked repos when practical 164 + - `POST /repos/add` to register new DIDs for backfill 165 + - [ ] Make the command safe to re-run: 166 + - in-memory visited DID set during crawl 167 + - tolerate duplicate `/repos/add` 168 + - rely on index upsert idempotency for re-delivered records 169 + - [ ] Add operator-friendly logging: 170 + - seed count 171 + - users discovered per hop 172 + - already-tracked vs newly-submitted DIDs 173 + - batch progress 174 + - final totals 175 + - [ ] Add a short runbook covering: 176 + - first bootstrap against an empty database 177 + - repeat run after expanding the seed list 178 + - dry-run before production mutation 179 + 180 + ### Verification 181 + 182 + - [ ] A small seed file of known Tangled users produces a non-empty discovery graph 183 + - [ ] `--max-hops 1` limits discovery to direct neighbors 184 + - [ ] `--dry-run` does not call Tap mutation endpoints 185 + - [ ] Already-tracked DIDs are reported and not re-submitted unnecessarily 186 + - [ ] Re-running the same seeds is effectively idempotent 187 + - [ ] Newly submitted DIDs cause Tap to begin historical backfill 188 + - [ ] Search results become materially richer after bootstrap than they were under live-only ingestion 189 + 190 + ### Exit Criteria 191 + 192 + Operators can bootstrap an empty environment to a usable historical baseline before public rollout. 130 193 131 - ## M4 — Keyword Search API 194 + ## M5 — Keyword Search API 132 195 133 196 refs: [specs/05-search.md](../specs/05-search.md) 134 197 135 198 ### Goal 136 199 137 200 Expose a usable public search API backed by Turso's Tantivy-backed FTS. 138 - 139 - ### Why Now 140 - 141 - First real product milestone. 
Searchable Tangled content without waiting for embeddings. 142 201 143 202 ### Deliverables 144 203 ··· 200 259 201 260 A user can search Tangled content reliably with keyword search. 202 261 203 - --- 204 - 205 - ## M5 — Railway Deployment 262 + ## M6 — Railway Deployment 206 263 207 264 refs: [specs/06-operations.md](../specs/06-operations.md) 208 265 209 266 ### Goal 210 267 211 268 Deploy the API and indexer as Railway services alongside Tap. 212 - 213 - ### Why Now 214 - 215 - At this point, the product is useful enough to run continuously. 216 269 217 270 ### Deliverables 218 271 ··· 253 306 254 307 The system runs as a deployed service with health-checked processes on Railway. 255 308 256 - --- 257 - 258 - ## M6 — Reindex and Repair 309 + ## M7 — Reindex and Repair 259 310 260 311 refs: [specs/05-search.md](../specs/05-search.md) 261 312 ··· 263 314 264 315 Make the system recoverable and operable with repair tools. 265 316 266 - ### Why Now 267 - 268 - Search systems are never perfect on first ingestion. Repair tools are needed before production. 269 - 270 317 ### Deliverables 271 318 272 319 - `twister reindex` command with scoping options ··· 304 351 305 352 Operators can repair bad indexes without rebuilding everything manually. 306 353 307 - --- 308 - 309 - ## M7 — Observability 354 + ## M8 — Observability 310 355 311 356 refs: [specs/06-operations.md](../specs/06-operations.md) 312 357 ··· 349 394 ### Exit Criteria 350 395 351 396 The system is maintainable without guesswork. 352 - 353 - --- 354 - 355 - ## M-New — Graph Backfill from Seed Users 356 - 357 - refs: [specs/07-graph-backfill.md](../specs/07-graph-backfill.md) 358 - 359 - ### Goal 360 - 361 - Bootstrap the search index with existing Tangled content by discovering and backfilling users from a seed set. 362 - 363 - ### Why Now 364 - 365 - Before MVP launch, the index needs existing content. Live ingestion only captures new events — backfill populates historical data. 
366 - 367 - ### Deliverables 368 - 369 - - `twister backfill` CLI command 370 - - Seed file parser 371 - - Graph fan-out discovery (follows/collaborators) 372 - - Tap `/repos/add` integration for discovered users 373 - - Deduplication against already-indexed users 374 - - Progress logging 375 - 376 - ### Tasks 377 - 378 - - [ ] Implement `backfill` subcommand with flags: 379 - - `--seeds <file>` — path to seed file (one DID or handle per line) 380 - - `--max-hops <n>` — depth limit for fan-out (default: 2) 381 - - `--dry-run` — show discovered users without triggering backfill 382 - - `--concurrency <n>` — parallel discovery workers (default: 5) 383 - - [ ] Implement seed file parser (supports DIDs and handles, comments with `#`) 384 - - [ ] Implement graph fan-out: 385 - 1. For each seed user, resolve DID if handle provided 386 - 2. Fetch `sh.tangled.graph.follow` records for the user 387 - 3. Fetch collaborators from repos owned by the user 388 - 4. Add discovered DIDs to the crawl queue 389 - 5. 
Repeat up to `max-hops` depth 390 - - [ ] Integrate with Tap `/repos/add` to register discovered DIDs for tracking 391 - - [ ] Deduplicate: skip DIDs already tracked by Tap (check via `/info/:did`) 392 - - [ ] Log progress: seeds processed, users discovered per hop, DIDs submitted to Tap 393 - - [ ] Handle rate limiting and errors gracefully (retry with backoff) 394 - - [ ] Make idempotent: safe to re-run; Tap handles duplicate `/repos/add` calls 395 - 396 - ### Verification 397 - 398 - - [ ] Running with a seed file of 3 known users discovers their followers 399 - - [ ] `--max-hops 1` limits discovery to direct connections only 400 - - [ ] `--dry-run` lists discovered DIDs without calling Tap 401 - - [ ] Already-tracked users are skipped 402 - - [ ] Re-running the same seed file produces no duplicate work 403 - - [ ] Tap begins backfilling records for newly added DIDs 404 - 405 - ### Exit Criteria 406 - 407 - The index contains historical content from the seed user graph, not just new events.
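The fan-out described in the backfill tasks is a breadth-first walk bounded by `--max-hops`. The Go sketch below illustrates only that traversal; `Discovery`, `NeighborFunc`, and `Discover` are hypothetical names, and the neighbor lookup stands in for the real work of fetching `sh.tangled.graph.follow` records and repo collaborators.

```go
package main

import "fmt"

// Discovery records how a DID was reached during the crawl.
type Discovery struct {
	DID    string
	Hop    int    // distance from the nearest seed
	Source string // DID that led to this one; empty for seeds
}

// NeighborFunc is a hypothetical stand-in for fetching follows and
// collaborators for a DID from the network.
type NeighborFunc func(did string) []string

// Discover walks outward from the seeds breadth-first. DIDs at maxHops are
// reported but not expanded further, and each DID is visited at most once.
func Discover(seeds []string, maxHops int, neighbors NeighborFunc) []Discovery {
	visited := make(map[string]bool)
	var queue, out []Discovery
	for _, s := range seeds {
		if !visited[s] {
			visited[s] = true
			queue = append(queue, Discovery{DID: s, Hop: 0})
		}
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		out = append(out, cur)
		if cur.Hop >= maxHops {
			continue // depth limit reached: do not expand this DID
		}
		for _, n := range neighbors(cur.DID) {
			if visited[n] {
				continue
			}
			visited[n] = true
			queue = append(queue, Discovery{DID: n, Hop: cur.Hop + 1, Source: cur.DID})
		}
	}
	return out
}

func main() {
	// In-memory graph used only for illustration.
	graph := map[string][]string{
		"did:plc:a": {"did:plc:b"},
		"did:plc:b": {"did:plc:c"},
		"did:plc:c": {"did:plc:d"},
	}
	lookup := func(did string) []string { return graph[did] }
	for _, d := range Discover([]string{"did:plc:a"}, 2, lookup) {
		fmt.Printf("hop=%d did=%s source=%s\n", d.Hop, d.DID, d.Source)
	}
}
```

The in-memory visited set is what makes a single run free of duplicate work. Skipping DIDs that Tap already tracks would be a separate check against Tap's admin endpoints before submission.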
-12
packages/api/docs/tasks/phase-2-semantic.md docs/api/tasks/phase-2-semantic.md
···
7 7
8 8 Add embedding generation and vector-based retrieval on top of the keyword baseline.
9 9
10 - ---
11 -
12 10 ## M8 — Embedding Pipeline
13 11
14 12 refs: [specs/03-data-model.md](../specs/03-data-model.md), [specs/05-search.md](../specs/05-search.md)
···
16 14 ### Goal
17 15
18 16 Add asynchronous embedding generation without blocking ingestion.
19 -
20 - ### Why Now
21 -
22 - Only after keyword search is stable should semantic complexity be added.
23 17
24 18 ### Deliverables
25 19
···
70 64
71 65 Embeddings are produced asynchronously and stored durably.
72 66
73 - ---
74 -
75 67 ## M9 — Semantic Search
76 68
77 69 refs: [specs/05-search.md](../specs/05-search.md)
···
79 71 ### Goal
80 72
81 73 Expose vector-based semantic retrieval.
82 -
83 - ### Why Now
84 -
85 - Natural next step once embeddings exist. Turso/libSQL has native vector search with `vector_top_k`.
86 74
87 75 ### Deliverables
88 76
-2
packages/api/docs/tasks/phase-3-hybrid.md docs/api/tasks/phase-3-hybrid.md
···
7 7
8 8 Merge lexical and semantic search into the default high-quality retrieval mode.
9 9
10 - ---
11 -
12 10 ## M10 — Hybrid Search
13 11
14 12 refs: [specs/05-search.md](../specs/05-search.md)
-2
packages/api/docs/tasks/phase-4-quality.md docs/api/tasks/phase-4-quality.md
···
7 7
8 8 Improve search quality without changing the core architecture.
9 9
10 - ---
11 -
12 10 ## M11 — Ranking and Quality Polish
13 11
14 12 refs: [specs/05-search.md](../specs/05-search.md)