---
title: Phase 7 Spec
updated: 2026-04-09
---

## Semantic Search for Saved & Liked Posts

On-device vector search over the user's saved and liked posts.
Posts are embedded at save/like time using an on-device text embedding model, stored in ObjectBox with HNSW indexing, and queried via natural-language input.
The entire pipeline runs locally -- no data leaves the device.

### Why ObjectBox + TFLite (not MediaPipe)

**ObjectBox** (`objectbox` ^5.3.1) is the only Flutter-native vector DB with production-grade HNSW support.
It provides `@HnswIndex` annotations, `nearestNeighborsF32` queries, and composable filters -- exactly what's needed.

**TFLite via `tflite_flutter`** (^0.12.1) is the embedding runtime.
MediaPipe's Flutter package (`mediapipe_text` 0.0.1) requires the Flutter master channel and the experimental `--enable-experiment=native-assets` flag, making it unsuitable for production.
`tflite_flutter` is stable, runs on both iOS and Android, and can load the same TFLite models MediaPipe would use internally.

**Embedding model:** MiniLM-L6-v2 (all-MiniLM-L6-v2), quantized to INT8.
384-dimensional output, ~25 MB model file, ~15ms inference on mid-range devices.
Widely deployed, well-understood, Apache 2.0 licensed. Bundled as a Flutter asset.

> Alternative considered: EmbeddingGemma (768D, ~200 MB). Better quality but 8x the model size -- too large for a bundled mobile asset.
> MiniLM's 384D is sufficient for post-length text and keeps the app install size reasonable.

### Data Flow

```text
Post saved/liked
  → Extract searchable text (post text + alt text from images + link card title/description)
  → Run TFLite inference in background Isolate → Float32List[384]
  → Store in ObjectBox (EmbeddedPost entity with HNSW-indexed vector)

User searches
  → Embed query string via same model → Float32List[384]
  → ObjectBox nearestNeighborsF32(queryVector, maxResults)
  → Map results back to cached/saved posts → display
```

### ObjectBox Entity Model

ObjectBox runs as a **secondary data store** alongside Drift. It stores only embedding vectors and the metadata needed to join back to Drift's `SavedPosts`/cached posts.
Drift remains the source of truth for post content.

```dart
import 'package:objectbox/objectbox.dart';

@Entity()
class EmbeddedPost {
  // Non-nullable fields need a constructor so the class compiles.
  EmbeddedPost({
    this.id = 0,
    required this.postUri,
    required this.accountDid,
    required this.source,
    required this.indexedText,
    this.embedding,
    required this.embeddedAt,
  });

  @Id()
  int id;

  /// AT URI of the post (e.g. at://did:plc:xxx/app.bsky.feed.post/yyy)
  @Unique()
  String postUri;

  /// Account DID that saved/liked this post
  String accountDid;

  /// 'saved' or 'liked'
  String source;

  /// Concatenated searchable text at embedding time
  String indexedText;

  /// 384-dimensional embedding vector
  @HnswIndex(dimensions: 384, distanceType: VectorDistanceType.cosine)
  @Property(type: PropertyType.floatVector)
  List<double>? embedding;

  /// When the embedding was generated (for staleness checks)
  @Property(type: PropertyType.dateNano)
  DateTime embeddedAt;
}
```

### Embedding Service

`EmbeddingService` wraps the TFLite interpreter, running in a long-lived background `Isolate` to avoid UI jank.

**Initialization:**

1. App startup → spawn isolate
2. Isolate loads TFLite model from assets (`assets/models/minilm_l6_v2_int8.tflite`)
3. Load tokenizer vocabulary (`assets/models/vocab.txt`) -- WordPiece tokenizer, max 256 tokens
4. Isolate listens on `ReceivePort` for embed requests
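
The handshake in the steps above could be sketched as follows. This is a minimal illustration, not the planned implementation: `EmbedRequest`, `embeddingWorker`, and `EmbeddingClient` are hypothetical names, and the worker returns a zero vector where the real service would run the TFLite interpreter.

```dart
import 'dart:isolate';
import 'dart:typed_data';

/// Hypothetical request message: the text to embed plus a reply port.
class EmbedRequest {
  EmbedRequest(this.text, this.replyTo);
  final String text;
  final SendPort replyTo;
}

/// Isolate entry point. A real worker would load the model and vocabulary
/// here; this sketch answers every request with a 384-element zero vector.
void embeddingWorker(SendPort handshake) {
  final requests = ReceivePort();
  // Hand the request port back to the spawning isolate.
  handshake.send(requests.sendPort);
  requests.listen((message) {
    if (message is EmbedRequest) {
      message.replyTo.send(Float32List(384)); // placeholder for inference
    }
  });
}

/// Main-isolate handle for the long-lived worker.
class EmbeddingClient {
  EmbeddingClient._(this._isolate, this._requests);
  final Isolate _isolate;
  final SendPort _requests;

  /// Spawns the worker and waits for it to publish its request port.
  static Future<EmbeddingClient> spawn() async {
    final handshake = ReceivePort();
    final isolate = await Isolate.spawn(embeddingWorker, handshake.sendPort);
    final requests = await handshake.first as SendPort;
    return EmbeddingClient._(isolate, requests);
  }

  /// Sends [text] to the worker and awaits a single reply.
  Future<Float32List> embed(String text) async {
    final reply = ReceivePort();
    _requests.send(EmbedRequest(text, reply.sendPort));
    return await reply.first as Float32List;
  }

  /// Stops the worker isolate.
  void dispose() => _isolate.kill();
}
```

A caller would `await EmbeddingClient.spawn()` once at startup and reuse the instance. Using one reply port per request keeps the sketch simple; a request-ID map over a single port is an alternative if allocation cost matters.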

**Embedding a post:**

1. Concatenate the non-empty parts with single spaces: `post.text`, each image alt text, `linkCard?.title`, `linkCard?.description` (null or empty parts are skipped so the string never contains a literal "null")
2. Tokenize (WordPiece, pad/truncate to 256 tokens)
3. Run interpreter: input `[1, 256]` int32 tensor → output `[1, 384]` float32 tensor (this assumes the exported model takes token IDs only and applies pooling internally)
4. L2-normalize the output vector
5. Return `Float32List` to caller via `SendPort`
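
Steps 1, 2 (padding only), and 4 are plain data manipulation and can be sketched without the model. The function names below are hypothetical, and the pad ID of 0 is an assumption based on the usual BERT-style vocabulary layout; special tokens such as `[CLS]`/`[SEP]` are left out.

```dart
import 'dart:math' as math;
import 'dart:typed_data';

/// Joins the non-empty text parts of a post with single spaces.
/// Null or blank parts (e.g. a post without a link card) are skipped.
String buildSearchableText({
  required String text,
  List<String> altTexts = const [],
  String? linkTitle,
  String? linkDescription,
}) {
  final parts = <String?>[text, ...altTexts, linkTitle, linkDescription];
  return parts
      .whereType<String>()
      .map((part) => part.trim())
      .where((part) => part.isNotEmpty)
      .join(' ');
}

/// Pads with [padId] or truncates so the result has exactly [length] IDs.
/// padId = 0 is an assumption; check it against the bundled vocab.txt.
List<int> padOrTruncate(List<int> ids, {int length = 256, int padId = 0}) {
  if (ids.length >= length) return ids.sublist(0, length);
  return [...ids, ...List.filled(length - ids.length, padId)];
}

/// Scales [vector] to unit L2 norm; an all-zero vector is returned unchanged.
Float32List l2Normalize(Float32List vector) {
  var sumOfSquares = 0.0;
  for (final v in vector) {
    sumOfSquares += v * v;
  }
  if (sumOfSquares == 0) return vector;
  final norm = math.sqrt(sumOfSquares);
  return Float32List.fromList([for (final v in vector) v / norm]);
}
```

Skipping null parts up front avoids both the compile error from `String + String?` and stray separators in the indexed text.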

**Error handling:** If the model fails to load (corrupt asset, unsupported device), semantic search is disabled rather than crashing the app. The `EmbeddingService.isAvailable` flag gates all UI entry points.

### Indexing Strategy

**On save/like (incremental):** When a post is saved or liked, immediately queue it for embedding. The `EmbeddingService` isolate processes the queue serially. This keeps indexing latency invisible to the user -- most posts embed in <20ms.

**Backfill (first launch or re-index):** On first enable or after clearing the index, batch-embed all existing saved/liked posts. Process in chunks of 50 with `Future.delayed(Duration.zero)` yielding between chunks to avoid hogging the isolate. Show progress in settings UI ("Indexing: 142/300 posts...").
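
A rough sketch of the chunked backfill loop, assuming a caller-supplied `indexPost` callback and progress callback (both hypothetical; the real `SemanticIndexer` API may differ):

```dart
import 'dart:math' as math;

/// Splits [items] into consecutive chunks of at most [size] elements.
List<List<T>> chunked<T>(List<T> items, int size) {
  if (size <= 0) throw ArgumentError.value(size, 'size', 'must be positive');
  return [
    for (var start = 0; start < items.length; start += size)
      items.sublist(start, math.min(start + size, items.length)),
  ];
}

/// Indexes [postUris] chunk by chunk, reporting (completed, total) after each
/// chunk and yielding to the event loop before starting the next one.
Future<void> backfill(
  List<String> postUris, {
  required Future<void> Function(String postUri) indexPost,
  required void Function(int completed, int total) onProgress,
  int chunkSize = 50,
}) async {
  var completed = 0;
  for (final chunk in chunked(postUris, chunkSize)) {
    for (final uri in chunk) {
      await indexPost(uri);
    }
    completed += chunk.length;
    onProgress(completed, postUris.length);
    // Let other queued events (e.g. a live save/like) run between chunks.
    await Future<void>.delayed(Duration.zero);
  }
}
```

The `onProgress` values map directly onto the "Indexing: 142/300 posts..." label.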

**Staleness:** Posts are immutable on ATProto, so embeddings never go stale. If a post is un-saved or un-liked, remove its `EmbeddedPost` entry.

**Account isolation:** `EmbeddedPost.accountDid` scopes all queries. On account switch, ObjectBox queries filter by the active account's DID.

### Search UX

**Entry point:** New "Search" tab in the existing saved posts screen, giving two tabs: "All Saved" (existing list) and "Search" (vector search).

**Search tab layout:**

- Text field with hint "Search your saved posts..."
- Debounce: 500ms after typing stops
- Results: list of post cards (reuse existing `PostCard` widget), ordered by cosine similarity
- Each result shows a relevance badge (percentage, derived from `1 - cosineDistance` and clamped to 0-100%, since cosine distance can exceed 1)
- Empty state when no query entered: "Search your saved and liked posts by meaning, not just keywords"
- No results state: "No similar posts found"
- Max results: 20 (configurable in settings)
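
The relevance badge conversion could look like the sketch below. It assumes the score returned by the vector query is a cosine distance from 0.0 (same direction) to 2.0 (opposite direction); `relevancePercent` is a hypothetical helper name.

```dart
/// Converts a cosine distance into a 0-100 relevance percentage.
/// Distances above 1.0 (vectors pointing away from each other) map to 0.
int relevancePercent(double cosineDistance) {
  final similarity = (1.0 - cosineDistance).clamp(0.0, 1.0);
  return (similarity * 100).round();
}
```

Clamping keeps the badge from showing a negative percentage for unrelated posts.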

**Scope toggle:** Chip row above results: "Saved" / "Liked" / "Both" (default: Both). Implemented as an ObjectBox query condition combined with the vector nearest-neighbor query.

### Liked Posts Integration

Liked posts are not currently persisted locally. To include them in semantic search:

**New Drift table:**

```dart
import 'package:drift/drift.dart';

@DataClassName('LikedPostEntry')
class LikedPosts extends Table {
  // Drift column builders must end with an extra call: `integer()...()`.
  IntColumn get id => integer().autoIncrement()();
  TextColumn get accountDid => text()();
  TextColumn get postUri => text()();
  TextColumn get postJson => text()();
  DateTimeColumn get likedAt => dateTime().withDefault(currentDateAndTime)();

  @override
  List<String> get customConstraints => ['UNIQUE (account_did, post_uri)'];
}
```

**Sync strategy:** Periodic background sync of `bluesky.feed.getActorLikes(actor:, limit:, cursor:)`. Runs on app foreground (if >5 minutes since last sync) and on manual pull-to-refresh. Fetches newest likes until it hits an already-known URI, then stops. Caps at 1000 stored likes per account (evicts oldest on overflow).
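
The stop-at-known-URI pagination can be sketched independently of the network layer. Here `LikesPage` and `FetchLikesPage` are hypothetical stand-ins for whatever wraps the `getActorLikes` call, and pages are assumed to arrive newest-first.

```dart
/// One page of likes: post URIs newest-first plus the cursor for the next page.
class LikesPage {
  const LikesPage(this.postUris, this.cursor);
  final List<String> postUris;
  final String? cursor;
}

/// Hypothetical page fetcher standing in for the real API call.
typedef FetchLikesPage = Future<LikesPage> Function(String? cursor);

/// Returns the URIs in [page] that come before the first already-known URI.
List<String> newUntilKnown(List<String> page, Set<String> knownUris) =>
    page.takeWhile((uri) => !knownUris.contains(uri)).toList();

/// Collects new likes newest-first, stopping at a known URI, the last page,
/// or once [cap] entries have been collected.
Future<List<String>> collectNewLikes(
  FetchLikesPage fetchPage,
  Set<String> knownUris, {
  int cap = 1000,
}) async {
  final collected = <String>[];
  String? cursor;
  do {
    final page = await fetchPage(cursor);
    final fresh = newUntilKnown(page.postUris, knownUris);
    collected.addAll(fresh);
    // Fewer fresh URIs than the page holds means a known URI was reached.
    final hitKnown = fresh.length < page.postUris.length;
    if (hitKnown || collected.length >= cap) break;
    cursor = page.cursor;
  } while (cursor != null);
  return collected.take(cap).toList();
}
```

Eviction of the oldest stored likes would run after the new entries are upserted and is not shown here.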

This is a **Drift migration** (schema version 15).

### Settings

Under "Search" section in settings:

- **Semantic Search** toggle (default: off) -- enables/disables the feature, triggers backfill on first enable
- **Search scope** -- "Saved only" / "Liked only" / "Both" (default: Both)
- **Index status** -- shows count of indexed posts, "Re-index" button
- **Max results** -- slider, 10-50, default 20

### Package Dependencies

| Package                  | Version | Purpose                                        |
| ------------------------ | ------- | ---------------------------------------------- |
| `objectbox`              | ^5.3.1  | Vector storage + HNSW nearest-neighbor queries |
| `objectbox_flutter_libs` | ^5.3.1  | Platform-specific ObjectBox native libraries   |
| `tflite_flutter`         | ^0.12.1 | On-device TFLite model inference               |

Build tooling: `objectbox_generator` (build_runner) for code generation.

### ObjectBox Integration Notes

ObjectBox requires its own initialization separate from Drift:

```dart
// appDocDir is a path string, e.g. (await getApplicationDocumentsDirectory()).path
final store = await openStore(directory: join(appDocDir, 'objectbox'));
```

This runs once at app startup (after Drift init). The `Store` instance is provided via the service locator / `RepositoryProvider` tree alongside the existing Drift database.

ObjectBox's generated `objectbox-model.json` and `objectbox.g.dart` must be committed. Run `dart run build_runner build` after entity changes.

### Performance Budget

| Operation                      | Target | Notes                                  |
| ------------------------------ | ------ | -------------------------------------- |
| Model load (cold)              | <500ms | One-time on app start                  |
| Single post embedding          | <20ms  | MiniLM INT8 on mid-range device        |
| Batch embed 100 posts          | <3s    | In background isolate                  |
| Vector query (1000 vectors)    | <5ms   | ObjectBox HNSW                         |
| Vector query (10000 vectors)   | <15ms  | ObjectBox HNSW                         |
| Model asset size               | ~25 MB | INT8 quantized MiniLM-L6-v2            |
| ObjectBox storage (1000 posts) | ~2 MB  | 384 floats x 4 bytes x 1000 + metadata |
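
The storage row can be checked with a quick calculation. The helper below (a hypothetical name) covers only the raw vector payload: 1000 x 384 x 4 = 1,536,000 bytes, roughly 1.5 MB, which leaves about 0.5 MB of the ~2 MB budget for metadata such as `indexedText`.

```dart
/// Raw bytes for [count] float32 vectors of [dimensions] values each
/// (4 bytes per float). Metadata and index overhead are not included.
int rawVectorBytes(int count, int dimensions) => count * dimensions * 4;
```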

### Limitations & Future Work

- **Text-only embeddings.** Image content is captured only via alt text and link card metadata.
  A future phase could add image embeddings (MobileNet V3 + separate HNSW index), but that doubles model size and complexity.
- **No cross-account search.** Each account's embeddings are isolated. A "search all accounts" mode could be added later.
- **No re-ranking.** Results are pure cosine similarity. A future improvement could apply BM25 re-ranking on the top-K results for hybrid search.
  (Highest priority future update)
- **Liked posts sync is incremental, not complete.** The 1000-like cap means very old likes won't be searchable.
  This is a pragmatic trade-off for storage.
docs/tasks/phase-7.md
---
title: Phase 7 Task Breakdown
updated: 2026-04-09
---

# Phase 7 Milestones

## M26 - Semantic Search for Saved & Liked Posts

### Core

#### ObjectBox Setup

- [ ] Add `objectbox`, `objectbox_flutter_libs` to `pubspec.yaml`; add `objectbox_generator` to dev deps
- [ ] `EmbeddedPost` entity - `postUri` (unique), `accountDid`, `source` (saved/liked), `indexedText`, `embedding` (384D float vector, HNSW cosine index), `embeddedAt`
- [ ] Run `build_runner` to generate `objectbox.g.dart` and `objectbox-model.json`
- [ ] `ObjectBoxStore` singleton - `openStore()` at app startup (after Drift init), expose via `RepositoryProvider`
- [ ] `EmbeddingRepository` - CRUD operations on `EmbeddedPost`: `upsert`, `deleteByUri`, `queryByAccount`, `countByAccount`

#### TFLite Embedding Service

- [ ] Add `tflite_flutter` to `pubspec.yaml`
- [ ] Bundle `minilm_l6_v2_int8.tflite` and `vocab.txt` as Flutter assets
- [ ] `WordPieceTokenizer` - load vocab, tokenize text, pad/truncate to 256 tokens, return `List<int>`
- [ ] `EmbeddingService` - long-lived background `Isolate` with `ReceivePort`/`SendPort` message passing
- [ ] `EmbeddingService.initialize()` - spawn isolate, load TFLite model + tokenizer in isolate
- [ ] `EmbeddingService.embed(String text)` - send text to isolate, receive `Float32List[384]`, L2-normalize
- [ ] `EmbeddingService.isAvailable` - flag gating UI entry points, false if model fails to load
- [ ] `EmbeddingService.dispose()` - close isolate and interpreter
- [ ] `PostTextExtractor` - concatenate post text + image alt texts + link card title/description into a single searchable string
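
As a starting point for the `WordPieceTokenizer` task, the core greedy longest-match-first split might look like the sketch below. It is deliberately simplified and makes several assumptions: text is lowercased and split on whitespace only (no punctuation splitting), `[CLS]`/`[SEP]` are not added, and the function names are hypothetical.

```dart
/// Greedy longest-match-first WordPiece split of a single word.
/// Continuation pieces carry the '##' prefix; returns ['[UNK]'] when some
/// remaining part of the word has no match in [vocab].
List<String> wordPiece(String word, Set<String> vocab) {
  final pieces = <String>[];
  var start = 0;
  while (start < word.length) {
    var end = word.length;
    String? match;
    // Shrink the candidate from the right until it is in the vocabulary.
    while (end > start) {
      final candidate = (start == 0 ? '' : '##') + word.substring(start, end);
      if (vocab.contains(candidate)) {
        match = candidate;
        break;
      }
      end--;
    }
    if (match == null) return ['[UNK]'];
    pieces.add(match);
    start = end;
  }
  return pieces;
}

/// Lowercases, splits on whitespace, and maps pieces to vocabulary IDs.
/// Unknown pieces map to the '[UNK]' ID (0 if the vocabulary lacks it).
List<int> tokenize(String text, Map<String, int> vocab) {
  final tokens = vocab.keys.toSet();
  final unkId = vocab['[UNK]'] ?? 0;
  return [
    for (final word in text.toLowerCase().split(RegExp(r'\s+')))
      if (word.isNotEmpty)
        for (final piece in wordPiece(word, tokens)) vocab[piece] ?? unkId,
  ];
}
```

The output of `tokenize` would then go through the pad/truncate step before being fed to the interpreter.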

#### Liked Posts Sync

- [ ] `LikedPosts` Drift table - `id`, `accountDid`, `postUri`, `postJson`, `likedAt`; unique constraint on `(account_did, post_uri)`
- [ ] Drift migration v15 - add `liked_posts` table
- [ ] `LikedPostsRepository` - `syncLikes(accountDid)`: call `bluesky.feed.getActorLikes(actor:, limit:100, cursor:)`, paginate until hitting known URI or 1000 cap, upsert new entries
- [ ] `LikedPostsRepository.getLikedPosts(accountDid, {limit, offset})` - paginated query
- [ ] `LikedPostsRepository.removeLike(accountDid, postUri)` - delete entry
- [ ] Eviction: drop oldest entries when count exceeds 1000 per account

#### Indexing Pipeline

- [ ] `SemanticIndexer` - orchestrates embedding + storage for new posts
- [ ] `indexPost(postUri, postJson, accountDid, source)` - extract text, embed, upsert `EmbeddedPost`
- [ ] `removePost(postUri)` - delete `EmbeddedPost` entry
- [ ] `backfill(accountDid)` - batch-embed all un-indexed saved + liked posts, chunks of 50, yield between chunks
- [ ] `backfillProgress` stream - emits `(int completed, int total)` for UI progress display
- [ ] Hook into `SavedPostsRepository.savePost()` - queue new save for indexing
- [ ] Hook into `LikedPostsRepository.syncLikes()` - queue newly synced likes for indexing
- [ ] Hook into unsave/unlike - remove from `EmbeddedPost`

#### Vector Search

- [ ] `SemanticSearchRepository` - depends on `EmbeddingService`, `EmbeddingRepository`
- [ ] `search(query, accountDid, {source, maxResults})` - embed query, run `nearestNeighborsF32`, filter by `accountDid` and optional `source`, return `List<SemanticSearchResult>`
- [ ] `SemanticSearchResult` model - `postUri`, `score` (cosine similarity as percentage), `source` (saved/liked)
- [ ] Join results back to Drift `SavedPosts`/`LikedPosts` to hydrate full post JSON for display

### Cubit

- [ ] `SemanticSearchCubit` - `search(query)` with 500ms debounce, `setScope(source)`, `clearResults()`
- [ ] `SemanticSearchState` - `status` (initial/searching/loaded/error/unavailable), `results`, `query`, `scope` (saved/liked/both)
- [ ] `LikedPostsSyncCubit` - `sync()` triggers like sync, exposes sync progress
- [ ] `SemanticIndexCubit` - exposes `backfillProgress`, `indexedCount`, `reindex()` action

### UI

#### Semantic Search Tab

- [ ] Saved posts screen - add "Search" tab alongside existing "All Saved" tab
- [ ] Search text field with hint "Search your saved posts..."
- [ ] Scope toggle chips: "Saved" / "Liked" / "Both" (default: Both)
- [ ] Results list - reuse `PostCard`, ordered by similarity score
- [ ] Relevance badge on each result (percentage)
- [ ] Empty state (no query): "Search your saved and liked posts by meaning, not just keywords"
- [ ] No results state: "No similar posts found"
- [ ] Unavailable state: shown when `EmbeddingService.isAvailable` is false, with explanation

#### Settings

- [ ] Settings screen - new "Search" section
- [ ] "Semantic Search" toggle (default: off) - enables feature, triggers backfill on first enable
- [ ] "Search scope" dropdown - Saved only / Liked only / Both
- [ ] "Index status" tile - shows indexed post count, "Re-index" button
- [ ] "Max results" slider - 10 to 50, default 20
- [ ] Backfill progress indicator - "Indexing: 142/300 posts..." shown during backfill

### Tests

- [ ] Unit tests: `WordPieceTokenizer` - tokenization, padding, truncation, edge cases (empty string, very long text)
- [ ] Unit tests: `EmbeddingService` - initialization, embed returns correct dimensions, L2 normalization, dispose cleanup
- [ ] Unit tests: `PostTextExtractor` - text concatenation from various post shapes (text-only, images with alt, link cards, combinations)
- [ ] Unit tests: `EmbeddingRepository` - upsert, delete, query by account, count
- [ ] Unit tests: `LikedPostsRepository` - sync pagination, dedup on known URI, 1000-cap eviction
- [ ] Unit tests: `SemanticIndexer` - index/remove/backfill, progress stream, integration with save/like hooks
- [ ] Unit tests: `SemanticSearchRepository` - search returns scored results, scope filtering, account isolation
- [ ] Unit tests: `SemanticSearchCubit` - debounce, state transitions, scope changes
- [ ] Unit tests: `SemanticIndexCubit` - backfill progress, reindex trigger
- [ ] Widget tests: search tab renders, query produces results, scope chips filter, relevance badges display, empty/no-results/unavailable states
- [ ] Widget tests: settings section renders, toggle enables/disables, progress indicator during backfill, re-index button triggers reindex
- [ ] Integration test: save a post → verify it appears in semantic search results for a relevant query