a love letter to tangled (android, iOS, and a search API)

docs: search site and data layer extensions

+382 -49
+20 -3
docs/api/specs/01-architecture.md
··· 64 64 - **AT Protocol network** — source of all Tangled content 65 65 - **Tap** — filtered event delivery from the AT Protocol firehose (deployed on Railway) 66 66 - **Turso/libSQL** — relational storage, Tantivy-backed FTS, and native vector search 67 - - **Embedding provider** — generates vectors for semantic search 68 - - **Railway** — deployment platform for Twister services and Tap 67 + - **Ollama** — local embedding model server (nomic-embed-text or EmbeddingGemma); deployed as a Railway sidecar service 68 + - **Railway** — deployment platform for Twister services, Tap, and Ollama 69 69 70 70 ## 7. Architecture Summary 71 71 ··· 106 106 | -------------- | -------------------------------------------- | -------------------------- | 107 107 | `api` | HTTP search, graph summary, and document API | Railway service (public) | 108 108 | `indexer` | Tap consumer, normalizer, DB writer | Railway service (internal) | 109 - | `embed-worker` | Async embedding generation | Optional Railway service | 109 + | `embed-worker` | Async embedding generation via Ollama | Optional Railway service | 110 + | `ollama` | Local embedding model server | Railway service (internal) | 110 111 | `tap` | ATProto sync | Railway (already deployed) | 111 112 112 113 ## 9. Repository Structure ··· 141 142 ``` 142 143 143 144 ## 11. Technology Choices 145 + 146 + ### Embedding: Ollama (self-hosted) 147 + 148 + Embeddings are generated locally via Ollama rather than an external API service. This eliminates per-token costs, external service dependencies, and data egress concerns. 
149 + 150 + **Recommended models (in order of preference):** 151 + 152 + | Model | Parameters | Dimensions | Quantized Size | Notes | 153 + |-------|-----------|------------|----------------|-------| 154 + | nomic-embed-text-v1.5 | 137M | 768 (Matryoshka: 64–768) | ~262 MB (F16) | 8192 context, battle-tested, Railway template exists | 155 + | EmbeddingGemma | 308M | 768 | <200 MB (quantized) | Best-in-class MTEB for size, released Sept 2025 | 156 + | all-minilm | 23M | 384 | ~46 MB | Budget option, lower quality | 157 + 158 + **Go integration:** Use the official Ollama Go client (`github.com/ollama/ollama/api`) with the `Embed()` method. The embed-worker calls Ollama over Railway's internal network (`ollama.railway.internal:11434`). 159 + 160 + **Railway deployment:** Ollama runs as a separate Railway service (~1–2 GB RAM, 1–2 vCPU, ~$10–30/mo). The nomic-embed Railway template provides a proven starting point. No cold starts on always-on services; model loads in 2–10 seconds on first request after deploy. 144 161 145 162 ### Language: Go 146 163
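The "Go integration" note in the `01-architecture.md` diff above names the official Ollama Go client. As a dependency-free illustration of the same call, the sketch below posts a batch of texts to Ollama's `/api/embed` REST endpoint with the standard library only. The names `ollamaClient`, `newOllamaClient`, and `newFakeOllama` are hypothetical, the 30-second timeout is taken from the Phase 2 task list, and the fake server exists only so the example runs without a real Ollama instance.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// embedRequest and embedResponse mirror the JSON bodies of POST /api/embed.
type embedRequest struct {
	Model string   `json:"model"`
	Input []string `json:"input"`
}

type embedResponse struct {
	Embeddings [][]float32 `json:"embeddings"`
}

// ollamaClient is a hypothetical minimal client; the spec itself recommends
// the official github.com/ollama/ollama/api package instead.
type ollamaClient struct {
	baseURL string
	model   string
	hc      *http.Client
}

func newOllamaClient(baseURL, model string) *ollamaClient {
	return &ollamaClient{baseURL: baseURL, model: model, hc: &http.Client{Timeout: 30 * time.Second}}
}

// embed sends one batch of texts and returns one vector per input text.
func (c *ollamaClient) embed(texts []string) ([][]float32, error) {
	body, err := json.Marshal(embedRequest{Model: c.model, Input: texts})
	if err != nil {
		return nil, err
	}
	resp, err := c.hc.Post(c.baseURL+"/api/embed", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("ollama unreachable: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ollama embed: unexpected status %d", resp.StatusCode)
	}
	var out embedResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	if len(out.Embeddings) != len(texts) {
		return nil, fmt.Errorf("ollama embed: got %d vectors for %d inputs", len(out.Embeddings), len(texts))
	}
	return out.Embeddings, nil
}

// newFakeOllama is a local stand-in that answers every input with a 3-dim vector.
func newFakeOllama() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var in embedRequest
		if r.URL.Path != "/api/embed" || json.NewDecoder(r.Body).Decode(&in) != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		out := embedResponse{}
		for range in.Input {
			out.Embeddings = append(out.Embeddings, []float32{0.1, 0.2, 0.3})
		}
		json.NewEncoder(w).Encode(out)
	}))
}

func main() {
	srv := newFakeOllama()
	defer srv.Close()

	// In production the base URL would come from OLLAMA_URL.
	client := newOllamaClient(srv.URL, "nomic-embed-text")
	vecs, err := client.embed([]string{"hello", "world"})
	if err != nil {
		panic(err)
	}
	fmt.Println("vectors:", len(vecs), "dimensions:", len(vecs[0]))
}
```

Checking that the number of returned vectors matches the number of inputs is a cheap guard before anything is written to `document_embeddings`.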
+32 -1
docs/api/specs/03-data-model.md
··· 93 93 ) WITH (weights='title=3.0,repo_name=2.5,author_handle=2.0,summary=1.5,tags_json=1.2,body=1.0'); 94 94 ``` 95 95 96 + ### FTS Maintenance 97 + 98 + Turso's Tantivy-backed FTS uses `NoMergePolicy` — segment count grows with writes and is never automatically compacted. This increases query fan-out over time. 99 + 100 + **Required maintenance:** Run `OPTIMIZE INDEX idx_documents_fts;` periodically (e.g., daily cron or after bulk backfill). This merges segments and reclaims space. 101 + 102 + **Known limitations:** 103 + - No read-your-writes within a transaction — FTS queries see a pre-commit snapshot 104 + - No snippet function (use `fts_highlight()` for highlighting) 105 + - FTS is experimental in Turso; requires the `fts` feature flag 106 + 96 107 ## 4. Embeddings Table 97 108 98 109 ```sql ··· 108 119 ); 109 120 ``` 110 121 111 - The vector dimension (768) is configurable by model. Changing models requires a new column or table migration. 122 + The vector dimension (768) matches nomic-embed-text-v1.5 and EmbeddingGemma defaults. Changing models may require a new column or table migration if the dimension changes. 
123 + 124 + ### Vector Index Tuning 125 + 126 + The DiskANN index accepts tuning parameters at creation time: 127 + 128 + ```sql 129 + CREATE INDEX idx_embeddings_vec ON document_embeddings( 130 + libsql_vector_idx(embedding, 'metric=cosine', 'max_neighbors=50', 'search_l=200') 131 + ); 132 + ``` 133 + 134 + | Parameter | Default | Description | 135 + |-----------|---------|-------------| 136 + | `max_neighbors` | 3*sqrt(D) | Graph connectivity; higher = better recall, more storage | 137 + | `search_l` | 200 | Neighbors visited during search; higher = better recall, slower | 138 + | `insert_l` | 70 | Neighbors visited during insert | 139 + | `alpha` | 1.2 | Graph sparsity factor | 140 + | `compress_neighbors` | — | Quantize neighbor vectors for storage savings | 141 + 142 + Start with defaults and tune after measuring recall on representative queries. 112 143 113 144 ## 5. Sync State Table 114 145
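The tuning advice in the `03-data-model.md` diff above ends with "tune after measuring recall on representative queries". One way to make that measurable is recall@k: compare the IDs returned by the DiskANN index with the IDs from an exact (brute-force) nearest-neighbor scan for the same query. The helper below is a sketch of that calculation only; `recallAtK` and the sample IDs are hypothetical, and how the exact list is obtained is left to the reader.

```go
package main

import "fmt"

// recallAtK returns the fraction of the top-k exact neighbors that also
// appear in the top-k approximate results. IDs are assumed to be unique
// within each list.
func recallAtK(exact, approx []string, k int) float64 {
	if k > len(exact) {
		k = len(exact)
	}
	if k <= 0 {
		return 0
	}
	want := make(map[string]struct{}, k)
	for _, id := range exact[:k] {
		want[id] = struct{}{}
	}
	limit := k
	if limit > len(approx) {
		limit = len(approx)
	}
	hits := 0
	for _, id := range approx[:limit] {
		if _, ok := want[id]; ok {
			hits++
			delete(want, id) // count each exact neighbor at most once
		}
	}
	return float64(hits) / float64(k)
}

func main() {
	exact := []string{"doc1", "doc2", "doc3", "doc4"}  // from a brute-force scan
	approx := []string{"doc1", "doc3", "doc9", "doc2"} // from the vector index
	fmt.Printf("recall@4 = %.2f\n", recallAtK(exact, approx, 4))
}
```

Averaging this value over a fixed set of queries before and after changing `max_neighbors` or `search_l` gives a like-for-like comparison of index settings.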
+1
docs/api/specs/04-data-pipeline.md
··· 324 324 - Jobs are retried with exponential backoff up to a max attempt count 325 325 - After max attempts, the job enters `dead` state 326 326 - The embed-worker exposes failed job count as a metric 327 + - If Ollama is unreachable (sidecar down), all pending jobs pause until connectivity is restored 327 328 328 329 ### DB Failures 329 330
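The embedding failure rules in the `04-data-pipeline.md` diff above (exponential backoff, a max attempt count, then the `dead` state) can be sketched as a single pure function. The concrete values here (5 attempts, 5-second base delay, 1-minute cap) and the names `nextRetry`, `baseDelay`, `maxDelay`, and `maxAttempts` are assumptions for illustration; the spec does not state them.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed defaults; the spec only says "exponential backoff up to a max attempt count".
const (
	baseDelay   = 5 * time.Second
	maxDelay    = time.Minute
	maxAttempts = 5
)

// nextRetry is called after a failed attempt. attempts counts the attempts
// made so far, including the one that just failed. It reports either the
// delay before the next attempt or that the job should be marked dead.
func nextRetry(attempts, limit int, base, ceiling time.Duration) (delay time.Duration, dead bool) {
	if attempts >= limit {
		return 0, true
	}
	delay = base
	for i := 1; i < attempts; i++ {
		delay *= 2
		if delay >= ceiling {
			return ceiling, false
		}
	}
	return delay, false
}

func main() {
	for attempts := 1; attempts <= maxAttempts; attempts++ {
		delay, dead := nextRetry(attempts, maxAttempts, baseDelay, maxDelay)
		if dead {
			fmt.Printf("attempt %d failed: mark job dead\n", attempts)
			continue
		}
		fmt.Printf("attempt %d failed: retry in %s\n", attempts, delay)
	}
}
```

Keeping the schedule in a pure function makes it easy to test without sleeping; the worker would persist the resulting retry time alongside `attempts` and `last_error`.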
+7 -1
docs/api/specs/05-search.md
··· 61 61 fts_highlight(d.body, '<mark>', '</mark>', ?) AS body_snippet 62 62 ``` 63 63 64 + ### FTS Operational Notes 65 + 66 + - **Segment merging:** Turso FTS uses Tantivy's `NoMergePolicy`. Run `OPTIMIZE INDEX idx_documents_fts;` after bulk writes (backfill) and periodically in production to keep query performance stable. 67 + - **Read-your-writes:** FTS queries within the same transaction see a pre-commit snapshot. If a document is written and immediately searched in the same transaction, FTS will not find it. The indexer and API are separate processes, so this is not a concern in normal operation. 68 + - **Feature flag:** Turso FTS requires the `fts` feature flag to be enabled on the database. 69 + 64 70 ## 3. Semantic Search 65 71 66 72 ### Query Flow 67 73 68 - 1. Convert user query text to embedding via the configured provider 74 + 1. Convert user query text to embedding via Ollama (self-hosted) 69 75 2. Query `vector_top_k` for nearest neighbors 70 76 3. Join back to `documents` to get metadata 71 77 4. Filter out deleted/hidden documents
+15 -18
docs/api/specs/06-operations.md
··· 41 41 | `SEARCH_MAX_LIMIT` | `100` | Maximum results per page | 42 42 | `SEARCH_DEFAULT_MODE` | `keyword` | Default search mode | 43 43 44 - ### Embedding 44 + ### Embedding (Ollama — self-hosted) 45 45 46 - | Variable | Default | Description | 47 - | ---------------------- | ------- | ---------------------------------------------------- | 48 - | `EMBEDDING_PROVIDER` | — | Provider name (e.g., `openai`, `ollama`, `voyageai`) | 49 - | `EMBEDDING_MODEL` | — | Model name (e.g., `text-embedding-3-small`) | 50 - | `EMBEDDING_API_KEY` | — | Provider API key | 51 - | `EMBEDDING_API_URL` | — | Provider base URL (for self-hosted) | 52 - | `EMBEDDING_DIM` | `768` | Vector dimensionality | 53 - | `EMBEDDING_BATCH_SIZE` | `32` | Batch size for embed-worker | 46 + | Variable | Default | Description | 47 + | ---------------------- | ------------------------------------------ | ---------------------------------------------- | 48 + | `OLLAMA_URL` | `http://ollama.railway.internal:11434` | Ollama server URL | 49 + | `EMBEDDING_MODEL` | `nomic-embed-text` | Ollama model name | 50 + | `EMBEDDING_DIM` | `768` | Vector dimensionality (must match model) | 51 + | `EMBEDDING_BATCH_SIZE` | `32` | Documents per embedding batch | 54 52 55 53 ### Hybrid Search 56 54 ··· 87 85 SEARCH_DEFAULT_LIMIT=20 88 86 SEARCH_MAX_LIMIT=100 89 87 90 - # Embedding (Phase 2) 91 - # EMBEDDING_PROVIDER=openai 92 - # EMBEDDING_MODEL=text-embedding-3-small 93 - # EMBEDDING_API_KEY=sk-... 
88 + # Embedding — Ollama (Phase 2) 89 + # OLLAMA_URL=http://ollama.railway.internal:11434 90 + # EMBEDDING_MODEL=nomic-embed-text 94 91 # EMBEDDING_DIM=768 95 92 96 93 # Server ··· 285 282 | ------------------- | --------------------------------- | 286 283 | `TURSO_AUTH_TOKEN` | Turso database authentication | 287 284 | `TAP_AUTH_PASSWORD` | Tap admin API authentication | 288 - | `EMBEDDING_API_KEY` | Embedding provider authentication | 285 + | `OLLAMA_URL` | Ollama sidecar connection (no secret if internal networking) | 289 286 | `ADMIN_AUTH_TOKEN` | Admin endpoint authentication | 290 287 291 288 ### Admin Endpoints ··· 331 328 | api | `twister api` | `GET /healthz` | yes | 332 329 | indexer | `twister indexer` | `GET :9090/health` | no | 333 330 | embed-worker | `twister embed-worker` | `GET :9091/health` | no | 331 + | ollama | (Railway template) | `GET /api/tags` | no | 334 332 335 333 All services share the same Docker image. Railway uses the start command to select the subcommand. 336 334 ··· 356 354 TAP_AUTH_PASSWORD=... 357 355 INDEXED_COLLECTIONS=sh.tangled.repo,sh.tangled.repo.issue,sh.tangled.repo.pull,sh.tangled.string,sh.tangled.actor.profile 358 356 359 - # Embed-worker (Phase 2) 360 - # EMBEDDING_PROVIDER=openai 361 - # EMBEDDING_MODEL=text-embedding-3-small 362 - # EMBEDDING_API_KEY=sk-... 357 + # Embed-worker + Ollama (Phase 2) 358 + # OLLAMA_URL=http://ollama.railway.internal:11434 359 + # EMBEDDING_MODEL=nomic-embed-text 363 360 ``` 364 361 365 362 Railway supports referencing other services' variables with `${{service.VAR}}` syntax, which is useful for linking the indexer to Tap's domain.
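The embedding variables in the `06-operations.md` diff above all have defaults, so a loader only needs to override what is set and reject values that cannot work. The sketch below shows one way to do that; `embedConfig`, `loadEmbedConfig`, and `positiveInt` are hypothetical names, and passing the lookup function as a parameter is a choice made here so the loader can be exercised without touching the process environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// embedConfig holds the embed-worker settings from the operations spec.
type embedConfig struct {
	OllamaURL string
	Model     string
	Dim       int
	BatchSize int
}

// loadEmbedConfig starts from the documented defaults and applies any
// overrides returned by getenv (os.Getenv in production).
func loadEmbedConfig(getenv func(string) string) (embedConfig, error) {
	cfg := embedConfig{
		OllamaURL: "http://ollama.railway.internal:11434",
		Model:     "nomic-embed-text",
		Dim:       768,
		BatchSize: 32,
	}
	if v := getenv("OLLAMA_URL"); v != "" {
		cfg.OllamaURL = v
	}
	if v := getenv("EMBEDDING_MODEL"); v != "" {
		cfg.Model = v
	}
	var err error
	if cfg.Dim, err = positiveInt(getenv, "EMBEDDING_DIM", cfg.Dim); err != nil {
		return embedConfig{}, err
	}
	if cfg.BatchSize, err = positiveInt(getenv, "EMBEDDING_BATCH_SIZE", cfg.BatchSize); err != nil {
		return embedConfig{}, err
	}
	return cfg, nil
}

// positiveInt parses an optional integer variable, falling back to def when unset.
func positiveInt(getenv func(string) string, key string, def int) (int, error) {
	v := getenv(key)
	if v == "" {
		return def, nil
	}
	n, err := strconv.Atoi(v)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("%s must be a positive integer, got %q", key, v)
	}
	return n, nil
}

func main() {
	cfg, err := loadEmbedConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Because `EMBEDDING_DIM` "must match model", a worker built on this could additionally compare the length of the first vector returned by Ollama against `cfg.Dim` at startup and refuse to run on a mismatch.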
+166
docs/api/specs/09-search-site.md
··· 1 + --- 2 + title: "Spec 09 — Search Site" 3 + updated: 2026-03-23 4 + --- 5 + 6 + A minimal static site that serves as both the public Twister API documentation and a live search showcase. Dark mode only, no framework or build step. 7 + 8 + ## 1. Purpose 9 + 10 + - Give developers a browsable reference for the Twister search API 11 + - Give anyone a way to try search against live indexed Tangled content 12 + - Provide a shareable public URL before the mobile app ships 13 + 14 + ## 2. Scope 15 + 16 + In scope: 17 + 18 + - Static HTML/CSS/JS (Alpine.js, no bundler) 19 + - API reference pages generated from the spec docs 20 + - Live search input wired to `GET /search` 21 + - Result rendering with type-aware cards (repo, issue, PR, profile, string) 22 + - Filter controls for collection, type, author, language, state 23 + - Pagination 24 + - Responsive layout (mobile-friendly, single breakpoint) 25 + 26 + Out of scope: 27 + 28 + - Auth, OAuth, or any write operations 29 + - Semantic or hybrid mode toggle (keyword only for MVP) 30 + - Server-side rendering or static-site generator 31 + - Analytics or telemetry 32 + 33 + ## 3. Pages 34 + 35 + | Route | Content | 36 + | ----------------- | --------------------------------------------------------------------------- | 37 + | `/` | Search input + results (the homepage is the search page) | 38 + | `/docs` | API overview: base URL, auth (none for public), rate limits, response shape | 39 + | `/docs/search` | `GET /search` — parameters, filters, response contract, examples | 40 + | `/docs/documents` | `GET /documents/{id}` — request/response, examples | 41 + | `/docs/health` | `GET /healthz`, `GET /readyz` — purpose and expected responses | 42 + 43 + ## 4. Search Page Behavior 44 + 45 + 1. Text input with a submit button. No debounce search-as-you-type for MVP. 46 + 2. On submit, fetch `GET {API_BASE}/search?q={query}&limit=20` (plus any active filters). 47 + 3. Render results as a vertical list of cards. 48 + 4. 
Each card shows: `record_type` badge, `title`, `body_snippet` (with `<mark>` highlights preserved), `author_handle`, `repo_name` (when present), `updated_at` relative time. 49 + 5. Clicking a result opens the canonical Tangled URL (`https://tangled.org/{handle}/{repo}` for repos, etc.) in a new tab. 50 + 6. "Load more" button appends the next page (`offset += limit`). 51 + 7. Empty state: "No results" message. 52 + 8. Error state: inline message if the API is unreachable. 53 + 9. Filter bar above results: dropdowns/inputs for `type`, `language`, `author`. Filters are query params so URLs are shareable. 54 + 55 + ## 5. API Docs Pages 56 + 57 + Hand-written HTML mirroring the contracts in spec 05 (search) and spec 08 (app integration). Each page includes: 58 + 59 + - Endpoint signature (method, path) 60 + - Parameter table (name, type, default, description) 61 + - Example request (curl) 62 + - Example response (JSON block with syntax highlighting via `<pre><code>`) 63 + 64 + No generated docs tooling. The pages are static and updated manually when the API changes. 65 + 66 + ## 6. Styling 67 + 68 + Minimal CSS, no utility framework. 69 + 70 + ### Tokens 71 + 72 + ```css 73 + :root { 74 + --bg: #0e0e0e; 75 + --surface: #1a1a1a; 76 + --border: #2a2a2a; 77 + --text: #e0e0e0; 78 + --text-dim: #888; 79 + --accent: #7aa2f7; 80 + --mark-bg: #7aa2f733; 81 + --mono: "Google Sans Mono", monospace; 82 + --sans: "Google Sans", sans-serif; 83 + --radius: 6px; 84 + } 85 + ``` 86 + 87 + ### Rules 88 + 89 + - Dark theming. 90 + - `Google Sans` for body text. `Google Sans Mono` for code, JSON, and badges. 91 + - Fonts loaded via Google Fonts `<link>`. System fallbacks: `sans-serif`, `monospace`. 92 + - Max content width: `720px`, centered. 93 + - Cards: `var(--surface)` background, `var(--border)` border, `var(--radius)` corners. 94 + - `<mark>` tags in snippets styled with `var(--mark-bg)` background and `var(--accent)` text. 
95 + - Code blocks: `var(--surface)` background, horizontal scroll, no wrapping. 96 + - Links: `var(--accent)`, no underline, underline on hover. 97 + - Inputs and buttons: `var(--surface)` background, `var(--border)` border, `var(--text)` text. 98 + - One breakpoint at `640px` for mobile: full-width cards, stacked filter bar. 99 + 100 + ## 7. Package Design 101 + 102 + The site lives in `internal/view/` as a self-contained Go package. It owns the templates, static assets, and HTTP handlers. The `api` package mounts `view.Handler()` into its router — nothing else leaks out. 103 + 104 + ### Exports 105 + 106 + The package exposes a single constructor: 107 + 108 + ```go 109 + // Handler returns an http.Handler that serves the site pages and static assets. 110 + func Handler() http.Handler 111 + ``` 112 + 113 + The `api` package calls `view.Handler()` and mounts it as a fallback after API routes. 114 + 115 + ### Package Structure 116 + 117 + ```text 118 + internal/view/ 119 + view.go # Handler(), route setup, embed directives 120 + templates/ 121 + layout.html # Shared shell (head, nav, footer) 122 + index.html # Search page 123 + docs/ 124 + index.html # API overview 125 + search.html # GET /search docs 126 + documents.html # GET /documents/{id} docs 127 + health.html # Health endpoints docs 128 + static/ 129 + style.css # All styles, single file 130 + search.js # Search fetch, render, pagination, filters 131 + ``` 132 + 133 + ### Embedding 134 + 135 + `view.go` uses `//go:embed` to bundle `templates/` and `static/`. Templates are parsed once at init. Static assets are served under `/static/` via `http.FileServer`. 
136 + 137 + ### Routing 138 + 139 + `view.Handler()` returns a mux that handles: 140 + 141 + | Pattern | Handler | 142 + | --- | --- | 143 + | `GET /` | Render `index.html` | 144 + | `GET /docs` | Render `docs/index.html` | 145 + | `GET /docs/search` | Render `docs/search.html` | 146 + | `GET /docs/documents` | Render `docs/documents.html` | 147 + | `GET /docs/health` | Render `docs/health.html` | 148 + | `GET /static/*` | Serve embedded CSS/JS files | 149 + 150 + ## 9. Configuration 151 + 152 + Since the site is served by the same origin as the API, search requests use relative paths (`/search?q=...`). No `API_BASE` config needed — the browser's origin is the API. 153 + 154 + ## 10. Local Development 155 + 156 + Run `twister api` locally. The site is served at `http://localhost:8080/` alongside the API endpoints. No separate dev server or file server required. 157 + 158 + The API docs pages render without any indexed data. The search page needs a running indexer and populated database to return results. 159 + 160 + ## 11. Constraints 161 + 162 + - No dependencies besides Alpine via CDN. 163 + - Total site weight target: under 50 KB excluding fonts. 164 + - Works in modern browsers (last 2 versions of Chrome, Firefox, Safari). 165 + - All fetch calls include error handling for network failures and non-200 responses. 166 + - No CORS concerns — the site and API share an origin.
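The package design in the `09-search-site.md` diff above (templates parsed once, static assets under `/static/`, one handler mounted as a fallback) can be sketched as follows. This is an approximation under stated assumptions: an in-memory `fstest.MapFS` stands in for the `//go:embed` file system so the example runs without files on disk, `Handler` takes that file system as an argument and returns an error (the spec's signature is `Handler() http.Handler`), only two of the pages are wired up, and the `get` helper is hypothetical.

```go
package main

import (
	"fmt"
	"html/template"
	"io/fs"
	"log"
	"net/http"
	"net/http/httptest"
	"testing/fstest"
)

// siteFS stands in for the embedded templates/ and static/ directories.
// In the real package this would be an embed.FS filled by //go:embed.
var siteFS fs.FS = fstest.MapFS{
	"templates/index.html":      {Data: []byte(`<h1>Search</h1>`)},
	"templates/docs/index.html": {Data: []byte(`<h1>API overview</h1>`)},
	"static/style.css":          {Data: []byte(`:root { --bg: #0e0e0e; }`)},
}

// Handler parses every page template once and serves pages plus /static/.
func Handler(site fs.FS) (http.Handler, error) {
	pages := map[string]string{
		"/":     "templates/index.html",
		"/docs": "templates/docs/index.html",
	}
	parsed := make(map[string]*template.Template, len(pages))
	for route, file := range pages {
		t, err := template.ParseFS(site, file)
		if err != nil {
			return nil, fmt.Errorf("parse %s: %w", file, err)
		}
		parsed[route] = t
	}
	static, err := fs.Sub(site, "static")
	if err != nil {
		return nil, err
	}

	mux := http.NewServeMux()
	mux.Handle("/static/", http.StripPrefix("/static/", http.FileServer(http.FS(static))))
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		t, ok := parsed[r.URL.Path]
		if !ok || r.Method != http.MethodGet {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		if err := t.Execute(w, nil); err != nil {
			log.Printf("render %s: %v", r.URL.Path, err)
		}
	})
	return mux, nil
}

// get performs an in-process GET and returns the status code and body.
func get(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	h, err := Handler(siteFS)
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range []string{"/", "/docs", "/static/style.css", "/missing"} {
		code, _ := get(h, p)
		fmt.Println(p, code)
	}
}
```

The catch-all `/` pattern with an explicit path lookup keeps unknown paths returning 404 rather than silently rendering the search page, which matters when this handler is mounted as a fallback behind the API routes.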
+11 -10
docs/api/specs/README.md
··· 10 10 11 11 ## Specifications 12 12 13 - | # | Document | Description | 14 - |---|----------|-------------| 15 - | 1 | [Architecture](01-architecture.md) | Purpose, goals, design principles, system context, tech choices | 16 - | 2 | [Tangled Lexicons](02-tangled-lexicons.md) | `sh.tangled.*` record schemas and fields | 17 - | 3 | [Data Model](03-data-model.md) | Database schema, search documents, sync state | 18 - | 4 | [Data Pipeline](04-data-pipeline.md) | Tap integration, normalization, failure handling | 19 - | 5 | [Search](05-search.md) | Search modes, API contract, scoring, filtering | 20 - | 6 | [Operations](06-operations.md) | Configuration, observability, security, deployment | 21 - | 7 | [Graph Backfill](07-graph-backfill.md) | Seed-based user discovery and content backfill | 22 - | 8 | [App Integration](08-app-integration.md) | Mobile-facing contracts for search and graph summaries | 13 + | # | Document | Description | 14 + | --- | ------------------------------------------ | --------------------------------------------------------------- | 15 + | 1 | [Architecture](01-architecture.md) | Purpose, goals, design principles, system context, tech choices | 16 + | 2 | [Tangled Lexicons](02-tangled-lexicons.md) | `sh.tangled.*` record schemas and fields | 17 + | 3 | [Data Model](03-data-model.md) | Database schema, search documents, sync state | 18 + | 4 | [Data Pipeline](04-data-pipeline.md) | Tap integration, normalization, failure handling | 19 + | 5 | [Search](05-search.md) | Search modes, API contract, scoring, filtering | 20 + | 6 | [Operations](06-operations.md) | Configuration, observability, security, deployment | 21 + | 7 | [Graph Backfill](07-graph-backfill.md) | Seed-based user discovery and content backfill | 22 + | 8 | [App Integration](08-app-integration.md) | Mobile-facing contracts for search and graph summaries | 23 + | 9 | [Search Site](09-search-site.md) | Static site for API docs and live search |
+1
docs/api/tasks/README.md
··· 38 38 - Restart does not lose sync position 39 39 - Reindex exists for repair 40 40 - Graph backfill populates initial content from seed users 41 + - A static search site with API docs is publicly accessible
+59 -1
docs/api/tasks/phase-1-mvp.md
··· 16 16 - Restart does not lose sync position 17 17 - Reindex exists for repair 18 18 - Graph backfill populates initial content from seed users 19 + - A static search site with API docs is publicly accessible 19 20 20 21 ## M0 — Repository Bootstrap ✅ 21 22 ··· 261 262 262 263 A user can search Tangled content reliably with keyword search. 263 264 265 + ## M5a — Search Site 266 + 267 + refs: [specs/09-search-site.md](../specs/09-search-site.md) 268 + 269 + ### Goal 270 + 271 + Ship a static site that doubles as public API documentation and a live search demo. Alpine.js via CDN for reactivity, no build step. 272 + 273 + ### Deliverables 274 + 275 + - `internal/view/` package exporting `Handler() http.Handler` 276 + - Embedded templates (`templates/`) and static assets (`static/`) via `//go:embed` 277 + - Search page (`/`) wired to `GET /search` with result cards, filters, and pagination 278 + - API docs pages (`/docs/*`) covering search, documents, and health endpoints 279 + - Dark-mode-only styling with Google Sans fonts and minimal CSS tokens 280 + 281 + ### Tasks 282 + 283 + - [ ] Create `internal/view/` package with `view.go`, `templates/`, and `static/` directories 284 + - [ ] Implement `Handler()` that returns an `http.Handler` with routes for all pages and `/static/*` 285 + - [ ] Embed templates and static assets via `//go:embed`; parse templates once at init 286 + - [ ] Use a shared `layout.html` template for the shell (head, nav, footer) 287 + - [ ] Mount `view.Handler()` in the `api` package router as a fallback after API routes 288 + - [ ] Build search page: 289 + - Text input + submit 290 + - Fetch `GET /search` with relative path (same origin) 291 + - Render result cards with type badge, title, snippet (preserve `<mark>`), author, repo, relative time 292 + - "Load more" pagination via offset 293 + - Filter bar: type, language, author (reflected in URL query params) 294 + - Empty and error states 295 + - [ ] Build API docs pages: 296 + - `/docs` — 
overview (base URL, response shape, no auth) 297 + - `/docs/search` — `GET /search` params, filters, example curl, example response 298 + - `/docs/documents` — `GET /documents/{id}` request/response 299 + - `/docs/health` — `GET /healthz`, `GET /readyz` 300 + - [ ] Implement `style.css` with design tokens (`--bg`, `--surface`, `--border`, `--accent`, etc.) 301 + - [ ] Load Google Sans and Google Sans Mono via Google Fonts `<link>` 302 + - [ ] Result card links open canonical Tangled URLs in new tab 303 + - [ ] Verify total site weight under 50 KB (excluding fonts and Alpine CDN) 304 + 305 + ### Verification 306 + 307 + - [ ] `twister api` serves the search page at `http://localhost:8080/` 308 + - [ ] API endpoints (`/search`, `/healthz`, etc.) still work alongside the site 309 + - [ ] Searching a known repo name shows it in results 310 + - [ ] Filter by type restricts results to that type 311 + - [ ] "Load more" appends next page of results 312 + - [ ] API docs pages render correct endpoint signatures, parameter tables, and example JSON 313 + - [ ] Site works on mobile viewport (stacked layout at 640px) 314 + - [ ] Site works with API unavailable (error state shown, no crash) 315 + - [ ] All pages share consistent styling and navigation 316 + 317 + ### Exit Criteria 318 + 319 + A user can search Tangled content and read API docs from a public URL without installing anything. 320 + 264 321 ## M6 — Railway Deployment 265 322 266 323 refs: [specs/06-operations.md](../specs/06-operations.md) ··· 336 393 2. For each document, re-run normalization from stored fields (or re-fetch if source available) 337 394 3. Update FTS-relevant fields 338 395 4. Upsert back to store 339 - 5. Log progress (N/total, errors) 396 + 5. Run `OPTIMIZE INDEX idx_documents_fts` after bulk reindex to merge Tantivy segments 397 + 6. 
Log progress (N/total, errors) 340 398 - [ ] Implement `POST /admin/reindex` endpoint (behind `ENABLE_ADMIN_ENDPOINTS` + `ADMIN_AUTH_TOKEN`) 341 399 - [ ] Add error summary output on completion 342 400 - [ ] Exit non-zero on unrecoverable failures
+70 -15
docs/api/tasks/phase-2-semantic.md
··· 1 1 --- 2 2 title: "Phase 2 — Semantic Search" 3 - updated: 2026-03-22 3 + updated: 2026-03-23 4 4 --- 5 5 6 6 # Phase 2 — Semantic Search 7 7 8 - Add embedding generation and vector-based retrieval on top of the keyword baseline. 8 + Add embedding generation and vector-based retrieval on top of the keyword baseline, using self-hosted Ollama for embeddings instead of external API services. 9 9 10 - ## M8 — Embedding Pipeline 10 + ## M8 — Ollama Sidecar and Embedding Pipeline 11 11 12 - refs: [specs/03-data-model.md](../specs/03-data-model.md), [specs/05-search.md](../specs/05-search.md) 12 + refs: [specs/01-architecture.md](../specs/01-architecture.md), [specs/03-data-model.md](../specs/03-data-model.md), [specs/05-search.md](../specs/05-search.md) 13 13 14 14 ### Goal 15 15 16 - Add asynchronous embedding generation without blocking ingestion. 16 + Deploy Ollama as a Railway sidecar and add asynchronous embedding generation without blocking ingestion. 17 17 18 18 ### Deliverables 19 19 20 + - Ollama Railway service running nomic-embed-text-v1.5 (or EmbeddingGemma) 20 21 - `embedding_jobs` table operational (schema from M1) 21 22 - `embed-worker` subcommand 22 - - Embedding provider abstraction (OpenAI, Voyage, Ollama) 23 + - Ollama-backed embedding provider (with interface for future alternatives) 23 24 - Retry and dead-letter behavior 24 25 - `twister reembed` command 25 26 26 27 ### Tasks 27 28 29 + - [ ] Deploy Ollama on Railway: 30 + - Use the nomic-embed Railway template as a starting point 31 + - Configure as internal service (no public URL) 32 + - Pre-pull `nomic-embed-text` model on startup 33 + - Health check: `GET /api/tags` on port 11434 34 + - Resource budget: 1–2 GB RAM, 1–2 vCPU 28 35 - [ ] Define embedding provider interface: 29 36 30 37 ```go ··· 35 42 } 36 43 ``` 37 44 38 - - [ ] Implement OpenAI provider (or preferred provider) 45 + - [ ] Implement Ollama provider using the official Go client: 46 + 47 + ```go 48 + import 
"github.com/ollama/ollama/api" 49 + 50 + // OllamaProvider calls Ollama's /api/embed endpoint 51 + // over Railway internal networking (ollama.railway.internal:11434) 52 + type OllamaProvider struct { 53 + client *api.Client 54 + model string // "nomic-embed-text" 55 + dim int // 768 56 + } 57 + ``` 58 + 59 + - Configure via `OLLAMA_URL` env var (default: `http://ollama.railway.internal:11434`) 60 + - Support batch embedding (Ollama accepts multiple inputs per request) 61 + - Timeout per request (default: 30s) 62 + - Connection health check on startup 39 63 - [ ] Implement embedding input text composition (see spec 04-data-pipeline.md, section 5): 40 64 `title\nrepo_name\nauthor_handle\ntags\nsummary\nbody` 41 65 - [ ] Add job enqueueing: on document upsert, insert `embedding_jobs` row with `status=pending` 42 66 - [ ] Implement `embed-worker` loop: 43 - 1. Poll for `pending` jobs (batch by `EMBEDDING_BATCH_SIZE`) 67 + 1. Poll for `pending` jobs (batch by `EMBEDDING_BATCH_SIZE`, default: 32) 44 68 2. Compose input text per document 45 - 3. Call embedding provider 69 + 3. Call Ollama provider 46 70 4. Store vectors in `document_embeddings` with `vector32(?)` 47 71 5. Mark job `completed` 48 72 6. On failure: increment `attempts`, set `last_error`, backoff 49 73 7. 
After max attempts: mark `dead` 50 - - [ ] Create DiskANN vector index: `CREATE INDEX idx_embeddings_vec ON document_embeddings(libsql_vector_idx(embedding, 'metric=cosine'))` 74 + - [ ] Create DiskANN vector index (see spec 03 for tuning params): 75 + ```sql 76 + CREATE INDEX idx_embeddings_vec ON document_embeddings( 77 + libsql_vector_idx(embedding, 'metric=cosine') 78 + ); 79 + ``` 51 80 - [ ] Implement `reembed` command (re-generate all embeddings, useful for model migration) 52 81 - [ ] Skip deleted documents in embedding pipeline 53 82 - [ ] Add health check endpoint for embed-worker (port 9091) 83 + - [ ] Add Ollama connectivity check to embed-worker readiness probe 84 + 85 + ### Model Selection Notes 86 + 87 + **nomic-embed-text-v1.5** is the default recommendation: 88 + - 137M parameters, 768-dimension vectors 89 + - Matryoshka support (can truncate to 64/128/256/512 dims for storage tradeoff) 90 + - 8192 token context window 91 + - ~262 MB at F16 quantization, ~500 MB RAM at runtime 92 + - Battle-tested with llama.cpp/Ollama, Railway template exists 93 + 94 + **EmbeddingGemma** is the quality alternative: 95 + - 308M parameters, 768-dimension vectors 96 + - Best MTEB scores for models under 500M parameters 97 + - <200 MB quantized, similar RAM footprint 98 + - Released Sept 2025, less deployment track record 99 + 100 + **all-minilm** is the budget fallback: 101 + - 23M parameters, 384-dimension vectors (requires schema change) 102 + - ~46 MB model, minimal resources 103 + - Suitable for testing or cost-constrained environments 54 104 55 105 ### Verification 56 106 107 + - [ ] Ollama service starts on Railway and responds to health checks 57 108 - [ ] Creating a new searchable document enqueues an embedding job 58 109 - [ ] Worker processes the job and stores a vector in `document_embeddings` 59 110 - [ ] Failed embedding calls retry with bounded attempts 60 - - [ ] Keyword search still works when embed-worker is down 111 + - [ ] Keyword search still 
works when embed-worker or Ollama is down 61 112 - [ ] `reembed` regenerates embeddings for all eligible documents 113 + - [ ] Ollama connectivity failure is surfaced in embed-worker health check 62 114 63 115 ### Exit Criteria 64 116 65 - Embeddings are produced asynchronously and stored durably. 117 + Embeddings are produced asynchronously via self-hosted Ollama and stored durably in Turso. 66 118 67 119 ## M9 — Semantic Search 68 120 ··· 75 127 ### Deliverables 76 128 77 129 - `GET /search/semantic` endpoint 78 - - Query-time embedding (convert query text → vector) 130 + - Query-time embedding (convert query text → vector via Ollama) 79 131 - Vector similarity search via `vector_top_k` 80 132 - Response parity with keyword search 81 133 82 134 ### Tasks 83 135 84 - - [ ] Implement query embedding: call embedding provider with user's query text 136 + - [ ] Implement query embedding: call Ollama provider with user's query text 137 + - [ ] Cache query embeddings for identical queries within a short TTL (optional, reduces Ollama load) 85 138 - [ ] Implement semantic search repository: 86 139 87 140 ```sql ··· 98 151 - [ ] Add timeout and cost controls (limit vector search to reasonable K) 99 152 - [ ] Wire `/search/semantic` handler 100 153 - [ ] Return `matched_by: ["semantic"]` in results 154 + - [ ] Graceful degradation: if Ollama is unreachable, return 503 for semantic search while keyword search remains available 101 155 102 156 ### Verification 103 157 ··· 106 160 - [ ] Semantic search returns the same JSON schema as keyword search 107 161 - [ ] Latency is acceptable under small test load 108 162 - [ ] Filters work correctly with semantic results 163 + - [ ] Semantic search degrades gracefully when Ollama is down 109 164 110 165 ### Exit Criteria 111 166 112 - The API supports true semantic search over Tangled documents. 167 + The API supports true semantic search over Tangled documents, powered entirely by self-hosted infrastructure.