···11-# Search & Embeddings
11+# Search
2233-Local full-text + semantic search over the authenticated user's saved and liked posts.
33+Search has two scopes:
4455-## Data Pipeline
55+1. **Network search**: server-side search via Bluesky APIs — no local indexing. Always available.
66+2. **Local search**: full-text + semantic search over the **authenticated user's own** liked and bookmarked/saved posts, stored locally in SQLite.
6777-1. **Sync**: on login and periodically, fetch user's likes (`app.bsky.feed.getActorLikes`) and bookmarks. Paginate fully, store in SQLite.
88-2. **Index FTS**: insert post text into SQLite FTS5 virtual table for keyword search.
99-3. **Embed**: run post text through `fastembed` with `nomic-embed-text-v1.5` (768-dim). Store vectors in `sqlite-vec` virtual table.
1010-4. **Incremental**: track cursor/last-seen; only process new posts on subsequent syncs.
88+Local semantic search (embeddings) is **opt-out**: enabled by default, but can be disabled in settings. When disabled, only local keyword (FTS) search is available and the embedding model is not downloaded.
99+1010+## Network Search (not indexed)
1111+1212+Server-side Bluesky search APIs. These are thin wrappers — no local storage or indexing.
1313+1414+### `app.bsky.feed.searchPosts`
1515+1616+Search all public posts.
1717+1818+| Parameter | Type | Required | Notes |
1919+| --------- | -------- | -------- | -------------------------------------------- |
2020+| `q` | string | yes | Query string. Supports `from:handle` syntax. |
2121+| `sort` | string | no | `top` or `latest` (default) |
2222+| `since` | string | no | ISO 8601 datetime, inclusive |
2323+| `until` | string | no | ISO 8601 datetime, exclusive |
2424+| `author` | string | no | Filter by DID or handle |
2525+| `lang` | string | no | Language code (e.g., `en`) |
2626+| `tag` | string[] | no | Hashtag filter (without `#`), repeatable |
2727+| `limit` | integer | no | 1–100, default 25 |
2828+| `cursor` | string | no | Pagination cursor from previous response |
2929+3030+Returns `{ cursor?, hitsTotal?, posts: PostView[] }`. With auth the response includes `viewer` state.
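
As a rough illustration of how the parameter table above might map onto a request, here is a minimal sketch. `SearchPostsParams` and `to_query_pairs` are hypothetical names, not part of the spec or of any Bluesky SDK; only a subset of the parameters is covered, and URL encoding is assumed to be handled by whatever HTTP client is used.

```rust
/// Hypothetical parameter bag for `app.bsky.feed.searchPosts`.
/// Field names mirror the parameter table; this is not an SDK type.
#[derive(Default)]
struct SearchPostsParams {
    q: String,
    sort: Option<String>,
    author: Option<String>,
    tags: Vec<String>,
    limit: Option<u32>,
    cursor: Option<String>,
}

impl SearchPostsParams {
    /// Flatten into key/value pairs; URL encoding is left to the HTTP client.
    fn to_query_pairs(&self) -> Vec<(&'static str, String)> {
        let mut pairs = vec![("q", self.q.clone())];
        if let Some(sort) = &self.sort {
            pairs.push(("sort", sort.clone()));
        }
        if let Some(author) = &self.author {
            pairs.push(("author", author.clone()));
        }
        // `tag` is repeatable: emit one pair per tag, without a leading '#'.
        for tag in &self.tags {
            pairs.push(("tag", tag.trim_start_matches('#').to_string()));
        }
        // Keep `limit` inside the documented 1..=100 range.
        if let Some(limit) = self.limit {
            pairs.push(("limit", limit.clamp(1, 100).to_string()));
        }
        if let Some(cursor) = &self.cursor {
            pairs.push(("cursor", cursor.clone()));
        }
        pairs
    }
}
```

Clamping `limit` on the client is a design choice made for this sketch; rejecting out-of-range values with an error would be equally reasonable.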
3131+3232+### `app.bsky.actor.searchActors`
3333+3434+Search user profiles.
3535+3636+| Parameter | Type | Required | Notes |
3737+| --------- | ------- | -------- | ----------------- |
3838+| `q` | string | yes | Query string |
3939+| `limit` | integer | no | 1–100, default 25 |
4040+| `cursor` | string | no | Pagination cursor |
4141+4242+Returns `{ cursor?, actors: ProfileView[] }`.
4343+4444+### `app.bsky.actor.searchActorsTypeahead`
4545+4646+Lightweight actor search for autocomplete (already used in login flow).
4747+4848+| Parameter | Type | Required | Notes |
4949+| --------- | ------- | -------- | ----------------- |
5050+| `q` | string | yes | Query string |
5151+| `limit` | integer | no | 1–100, default 10 |
5252+5353+Returns `{ actors: ProfileViewBasic[] }`. No pagination.
5454+5555+### `app.bsky.graph.searchStarterPacks`
5656+5757+Search starter packs.
5858+5959+| Parameter | Type | Required | Notes |
6060+| --------- | ------- | -------- | ----------------- |
6161+| `q` | string | yes | Query string |
6262+| `limit` | integer | no | 1–100, default 25 |
6363+| `cursor` | string | no | Pagination cursor |
6464+6565+Returns `{ cursor?, starterPacks: StarterPackViewBasic[] }`.
6666+6767+## Local Data Pipeline
6868+6969+1. **Sync**: on login and periodically, fetch the authenticated user's own likes (`app.bsky.feed.getActorLikes`) and bookmarks. Paginate using the API cursor, store posts in SQLite.
7070+2. **Cursor persistence**: store the last-seen API cursor per `(did, source)` in the `sync_state` table. On subsequent syncs, resume from the stored cursor so we only fetch new posts — never re-fetch the full history.
7171+3. **Index FTS**: insert post text into SQLite FTS5 virtual table for keyword search (always active).
7272+4. **Embed** _(opt-out)_: run post text through `fastembed` with `nomic-embed-text-v1.5` (768-dim). Store vectors in `sqlite-vec` virtual table. Skipped when embeddings are disabled.
7373+5. **Reindex**: a manual "Reindex" action clears all embeddings from `posts_vec` and re-embeds every post. Useful after model updates or if the index becomes corrupted.
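
The cursor-persistence step above could look roughly like the sketch below. It assumes the API cursor can be resumed the way the spec describes. `SyncState` is an in-memory stand-in for the `sync_state` table, and `Page` and the `fetch_page` callback are hypothetical placeholders for the real HTTP call.

```rust
use std::collections::HashMap;

/// One page from a paginated endpoint: post URIs plus an optional next cursor.
struct Page {
    uris: Vec<String>,
    cursor: Option<String>,
}

/// In-memory stand-in for the `sync_state` table, keyed by `(did, source)`.
#[derive(Default)]
struct SyncState {
    cursors: HashMap<(String, String), String>,
}

/// Fetch pages starting from the stored cursor until the API stops returning one.
/// `fetch_page` is a hypothetical callback wrapping the real request.
fn sync_posts<F>(state: &mut SyncState, did: &str, source: &str, mut fetch_page: F) -> Vec<String>
where
    F: FnMut(Option<&str>) -> Page,
{
    let key = (did.to_string(), source.to_string());
    let mut cursor = state.cursors.get(&key).cloned();
    let mut fetched = Vec::new();
    loop {
        let page = fetch_page(cursor.as_deref());
        fetched.extend(page.uris);
        match page.cursor {
            Some(next) => {
                // Persist after every page so an interrupted sync can resume.
                state.cursors.insert(key.clone(), next.clone());
                cursor = Some(next);
            }
            // No further pages: keep the last stored cursor for the next run.
            None => break,
        }
    }
    fetched
}
```

With this shape the next run starts from the last stored cursor, so the final page may be read again. Upserting into `posts` should make that repeat harmless.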
11741275## SQLite Schema
13761477```sql
1515--- Post storage
7878+-- Post storage (authenticated user's liked/bookmarked posts)
1679CREATE TABLE posts (
1780 uri TEXT PRIMARY KEY,
1881 cid TEXT NOT NULL,
···2285 created_at TEXT,
2386 indexed_at TEXT DEFAULT CURRENT_TIMESTAMP,
2487 json_record TEXT, -- full record JSON
2525- source TEXT NOT NULL -- 'like', 'bookmark', 'own'
8888+ source TEXT NOT NULL -- 'like', 'bookmark'
2689);
27902828--- Full-text search
9191+-- Sync cursor tracking (avoids re-fetching on every sync)
9292+CREATE TABLE sync_state (
9393+ did TEXT NOT NULL,
9494+ source TEXT NOT NULL, -- 'like', 'bookmark'
9595+ cursor TEXT, -- last API cursor returned
9696+ last_synced_at TEXT,
9797+ PRIMARY KEY (did, source)
9898+);
9999+100100+-- Full-text search (always active)
29101CREATE VIRTUAL TABLE posts_fts USING fts5(text, uri UNINDEXED, content=posts, content_rowid=rowid);
301023131--- Vector embeddings
103103+-- Vector embeddings (opt-out — only populated when embeddings enabled)
32104CREATE VIRTUAL TABLE posts_vec USING vec0(
33105 uri TEXT PRIMARY KEY,
34106 embedding float[768]
···3710938110## Search Modes
391114040-| Mode | How |
4141-| -------- | ----------------------------------------------------------------------------------------- |
4242-| Keyword | `SELECT * FROM posts_fts WHERE posts_fts MATCH ?` |
4343-| Semantic | Embed query → `SELECT * FROM posts_vec WHERE embedding MATCH ? ORDER BY distance LIMIT k` |
4444-| Hybrid | Run both, merge results by reciprocal rank fusion |
112112+| Mode | Scope | How |
113113+| -------- | ------ | ----------------------------------------------------------------------------------------- |
114114+| Network | Remote | Server-side via Bluesky APIs (posts, actors, starter packs) — not indexed locally |
115115+| Keyword | Local | `SELECT * FROM posts_fts WHERE posts_fts MATCH ?` |
116116+| Semantic | Local | Embed query → `SELECT * FROM posts_vec WHERE embedding MATCH ? ORDER BY distance LIMIT k` |
117117+| Hybrid | Local | Run keyword + semantic, merge results by reciprocal rank fusion |
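
For the hybrid row, a minimal sketch of reciprocal rank fusion over two ranked URI lists. The function name and the smoothing constant `k` are assumptions; the spec does not fix a value, and 60 is only a commonly used default.

```rust
use std::collections::HashMap;

/// Merge ranked lists of post URIs with reciprocal rank fusion:
/// score(uri) = sum over lists of 1 / (k + rank), with rank starting at 1.
fn reciprocal_rank_fusion(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for list in lists {
        for (i, uri) in list.iter().enumerate() {
            *scores.entry(*uri).or_insert(0.0) += 1.0 / (k + (i as f64) + 1.0);
        }
    }
    let mut merged: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(uri, score)| (uri.to_string(), score))
        .collect();
    // Highest score first; ties are broken by URI so the order is deterministic.
    merged.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    merged
}
```

A post that appears in both the keyword and semantic lists accumulates two terms, so it tends to outrank posts found by only one mode.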
4511846119## Embedding Details
4712048121- Model: `nomic-embed-text-v1.5` via `fastembed` (ONNX runtime, no GPU required)
49122- Dimensions: 768 (or 256 with Matryoshka truncation for speed)
50123- Batch embedding on sync; single embedding on search query
5151-- Model downloaded on first use, cached in Tauri app data dir
124124+- Model downloaded on first use, cached in Tauri app data dir (skipped entirely when embeddings disabled)
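
If the 256-dimension option is used, Matryoshka truncation amounts to keeping a prefix of the vector and rescaling it. The sketch below shows only that generic step; `truncate_embedding` is a hypothetical helper, and the model card may prescribe additional preprocessing that should be checked before relying on it.

```rust
/// Keep the first `dims` components and rescale the result to unit L2 norm.
fn truncate_embedding(full: &[f32], dims: usize) -> Vec<f32> {
    let mut v: Vec<f32> = full.iter().take(dims).copied().collect();
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    // Leave an all-zero vector untouched instead of dividing by zero.
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}
```

Stored vectors and query vectors would need the same truncation, and the `posts_vec` column would have to be declared as `float[256]` instead of `float[768]`.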
5212553126## Tauri Commands
5412755128```rs
129129+// Network search (not indexed — direct API calls)
130130+search_posts_network(query: String, sort: Option<String>, limit: Option<u32>, cursor: Option<String>) -> NetworkSearchResult
131131+search_actors(query: String, limit: Option<u32>, cursor: Option<String>) -> ActorSearchResult
132132+search_starter_packs(query: String, limit: Option<u32>, cursor: Option<String>) -> StarterPackSearchResult
133133+// Note: searchActorsTypeahead already exists in auth module
134134+135135+// Local search (user's own likes/bookmarks)
56136search_posts(query: String, mode: "keyword"|"semantic"|"hybrid", limit: u32) -> Vec<PostResult>
5757-sync_liked_posts(did: String) -> SyncStatus
137137+sync_posts(did: String, source: "like"|"bookmark") -> SyncStatus // resumes from stored cursor
58138get_sync_status(did: String) -> SyncStatus
139139+reindex_embeddings() -> () // clears & re-embeds all posts
140140+set_embeddings_enabled(enabled: bool) -> () // opt-out toggle
59141```
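
The `mode` parameter above is written as a string union, which is not valid Rust. One way it might be modeled is sketched below. `SearchMode` and `effective_mode` are hypothetical names. The hybrid fallback to keyword-only follows the task list, while returning an error for semantic search with embeddings disabled is an assumption.

```rust
use std::str::FromStr;

/// Local search modes accepted by `search_posts`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SearchMode {
    Keyword,
    Semantic,
    Hybrid,
}

impl FromStr for SearchMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "keyword" => Ok(Self::Keyword),
            "semantic" => Ok(Self::Semantic),
            "hybrid" => Ok(Self::Hybrid),
            other => Err(format!("unknown search mode: {}", other)),
        }
    }
}

/// Resolve the mode that will actually run, given the embeddings opt-out setting.
fn effective_mode(requested: SearchMode, embeddings_enabled: bool) -> Result<SearchMode, String> {
    match (requested, embeddings_enabled) {
        // Assumption: semantic search without embeddings is reported as an error.
        (SearchMode::Semantic, false) => {
            Err("semantic search requires embeddings to be enabled".to_string())
        }
        // Hybrid degrades to keyword-only when embeddings are disabled.
        (SearchMode::Hybrid, false) => Ok(SearchMode::Keyword),
        (mode, _) => Ok(mode),
    }
}
```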
6014261143## Keyboard Shortcuts
621446363-| Key | Action |
6464-| -------- | ----------------------------------------------- |
6565-| `/` | Focus search bar from anywhere |
6666-| `Tab` | Cycle search mode (keyword → semantic → hybrid) |
6767-| `Escape` | Clear search / close results |
145145+| Key | Action |
146146+| -------- | --------------------------------------------------------- |
147147+| `/` | Focus search bar from anywhere |
148148+| `Tab` | Cycle search mode (network → keyword → semantic → hybrid) |
149149+| `Escape` | Clear search / close results |
6815069151## UX Polish
70152
+166
docs/specs/v0.2.md
···11+---
22+title: Beyond MVP 1 (v0.2.0)
33+updated: 2026-03-29
44+---
55+66+The most useful "social power toolz" additions are the ones that answer:
77+88+- **How is this account perceived or acted on by others?**
99+- **What network-side artifacts affect me?**
1010+- **What context am I missing before I follow, reply, or trust this account?**
1111+1212+## Social Diagnostics/Tools
1313+1414+Tabbed panel with four buckets:
1515+1616+1. **Reputation/exposure** - lists, labels, starter packs
1717+2. **Safety/boundaries** - blocking, blocked-by, moderation-related visibility
1818+3. **Context/provenance** - profile history, handle/DID/PDS changes, post references
1919+4. **Power-user protocol inspection** - raw records, backlinks, record graph, PDS explorer
2020+2121+## Feature Mapping
2222+2323+| Feature | Problem it Solves | Infra |
2424+| ----------------------------------- | ------------------------------------------------------- | ------------------------------------------------------------ |
2525+| Lists I’m on/this account is on | Why do people react strongly to this account? | Constellation over `app.bsky.graph.listitem` & |
2626+| | | hydrate list via Bluesky APIs |
2727+| Blocked by | Why can’t I interact with some accounts? | ClearSky-like indexing/graph analysis; |
2828+| | What is my exposure? | derived from public block relations where visible |
2929+| Blocking | Who has this account blocked? | Public block records/graph index |
3030+| | What kinds of accounts do they block? | |
3131+| Labels on account | Is there moderation metadata on this actor? | Bluesky moderation/label views where exposed by AppView APIs |
3232+| Starter packs containing account | How are people discovering this account? | Graph relationship indexing |
3333+| | | list/starter-pack hydration |
3434+| Profile/identity history | Did this account recently change handle/name/PDS? | Historical index snapshots |
3535+| OnPosts/posts involving account | Where does this account show up in discourse? | Link/backlink index over posts referencing DID/URI |
3636+| PDS/DID/repo provenance | What PDS is this on, and what repo identity is this? | Existing PDSls-style explorer |
3737+| | | Slingshot-style identity resolution |
3838+| List risk summary | Is this account heavily listed? In what categories? | Derived from list memberships |
3939+| | | list metadata |
4040+| Moderation visibility summary | Before I follow/reply, is there any moderation context? | Labels, lists, blocks, starter packs |
4141+| Network relationship diff over time | What changed about this account recently? | Historical snapshots and graph diffs |
4242+4343+### 1. Lists
4444+4545+Why it matters:
4646+4747+- It explains hidden context around an account.
4848+- It helps users understand reputation, curation, and network placement.
4949+- It is legible to normal users.
5050+5151+Implementation note:
5252+5353+- Query list memberships via Constellation-style link indexing on `app.bsky.graph.listitem`.
5454+- Hydrate the parent list URI into title/owner/details using Bluesky list endpoints.
5555+ Bluesky documents public list hydration via `app.bsky.graph.getList`, and ClearSky visibly exposes Lists as a core account tab.
5656+5757+### 2. Labels
5858+5959+Lightweight but prominent.
6060+6161+Why it matters:
6262+6363+- It affects moderation and user expectations.
6464+- It can explain hidden visibility changes or warning states.
6565+6666+### 3. Blocked-by visibility
6767+6868+Needs careful UX.
6969+7070+Why it matters:
7171+7272+- It helps users understand social or moderation boundaries.
7373+- It can reduce confusion when interactions fail.
7474+7575+Risk:
7676+7777+- It can become voyeuristic or inflammatory if over-emphasized.
7878+7979+### 4. Starter packs
8080+8181+This is underrated and very useful.
8282+8383+Why it matters:
8484+8585+- It tells users how an account is being promoted or grouped.
8686+- It is often more benign and informative than list/blocking data.
8787+8888+Best UI:
8989+9090+- compact cards
9191+- title, creator, description
9292+- why you’re in this pack if derivable
9393+9494+### 5. History
9595+9696+Why it matters:
9797+9898+- Handle changes, PDS changes, and identity churn can be meaningful.
9999+- It helps debug impersonation, migration, or account evolution.
100100+101101+Best UI:
102102+103103+- timeline with discrete events
104104+- profile snapshot diff
105105+- copy raw record action for power users
106106+107107+## UFOs (Lexicons & NSIDs)
108108+109109+| User-facing feature | What user problem it solves |
110110+| ------------------------------------- | ----------------------------------------------------------- |
111111+| Collection/lexicon explorer | What apps and schemas exist beyond core Bluesky? |
112112+| Collection activity charts | Is this feature/app active or growing? |
113113+| Sample records for NSID | What does this collection actually store? |
114114+| Related lexicons/adjacent collections | What other record types go with this app? |
115115+| App footprint for an account | What non-Bluesky collections does this actor appear to use? |
116116+| Trending/unusual collection activity | What new or unusual things are happening on the network? |
117117+118118+### Apps & Collections tab on a profile
119119+120120+Given a profile, infer or inspect which collections that actor has records in, then show:
121121+122122+- core Bluesky collections used
123123+- third-party app collections used
124124+- rare / unusual app footprints
125125+- open samples for those NSIDs
126126+127127+### NSID explorer
128128+129129+A user can search an NSID prefix like `sh.tangled`, `app.bsky`, or your own lexicon namespace and see:
130130+131131+- active collections
132132+- sample records
133133+- recent activity shape
134134+- related collections
135135+136136+This is directly aligned with UFOs’ public purpose.
137137+138138+### "What is this thing?" panel for unknown records
139139+140140+When your client encounters a collection it does not recognize, UFOs can provide:
141141+142142+- sample schema shape from real records
143143+- nearby lexicons
144144+- collection activity hints
145145+- rough understanding of whether it is niche, abandoned, or actively used
146146+147147+That makes your client much better as an ATProto-native explorer.
148148+149149+### Ecosystem radar / discovery dashboard
150150+151151+Examples:
152152+153153+- fastest-growing collections this week
154154+- newly seen lexicon prefixes
155155+- unusual spikes in specific collections
156156+- recently updated app collections
157157+158158+UFOs can serve recent records of any NSID it has seen.
159159+160160+## Phased Breakdown
161161+162162+1. Phase 1 - lists, labels, blocked-by, starter packs (via Constellation + Bluesky APIs)
163163+2. Phase 2 - profile history, apps & collections tab (via UFOs)
164164+3. Phase 3 - NSID explorer and collection insights (via UFOs)
165165+4. Phase 4 - network relationship diffs and provenance graphs (via Constellation)
166166+5. Phase 5 - discovery dashboard and ecosystem insights (via UFOs)
+54-18
docs/tasks/06-search.md
···2233Spec: [search.md](../specs/search.md)
4455-## Steps
55+## Tasks
66+77+### Backend
88+99+#### Network Search
61077-- [ ] Create `src-tauri/src/search.rs`
1111+- [ ] Create:
1212+ - `src-tauri/src/search.rs` for business logic
1313+ - `src-tauri/src/commands/search.rs`
1414+- [ ] Implement network search commands (not indexed - direct API calls):
1515+ - `search_posts_network(query, sort?, limit?, cursor?)` → `app.bsky.feed.searchPosts`
1616+ - `search_actors(query, limit?, cursor?)` → `app.bsky.actor.searchActors`
1717+ - `search_starter_packs(query, limit?, cursor?)` → `app.bsky.graph.searchStarterPacks`
1818+ - Note: `searchActorsTypeahead` already exists in auth module
1919+ - Always available - no local setup required
2020+2121+#### Local Data Pipeline (Base)
2222+2323+- [ ] Add `sync_state` table to migrations (stores cursor per `(did, source)`)
824- [ ] Implement `sync_posts(did: String, source: "like"|"bookmark")`:
99- - Paginate `app.bsky.feed.getActorLikes` (or bookmarks)
2525+ - Resume from stored cursor in `sync_state` (never re-fetch full history)
2626+ - Paginate `app.bsky.feed.getActorLikes` (or bookmarks) for the **authenticated user's own** likes/saves
1027 - Upsert into `posts` table
1111- - Insert text into `posts_fts`
1212- - Track sync cursor in `sync_state` table
1313-- [ ] Implement `embed_pending_posts()`:
2828+ - FTS index is maintained automatically via triggers
2929+ - Persist the new cursor back to `sync_state`
3030+3131+#### Embeddings
3232+3333+- [ ] Implement `embed_pending_posts()` *(opt-out - skip when embeddings disabled)*:
1434 - Query posts without embeddings
1535 - Batch through `fastembed` TextEmbedding model (`nomic-embed-text-v1.5`)
1636 - Insert into `posts_vec` via `zerocopy::AsBytes`
3737+- [ ] Implement `reindex_embeddings()`:
3838+ - Clear all rows from `posts_vec`
3939+ - Re-embed every post in `posts` table
4040+ - Triggered manually by user (reindex button in UI)
4141+- [ ] Implement `set_embeddings_enabled(enabled: bool)`:
4242+ - Persist preference; when disabled, skip model download + embedding on sync
4343+ - Keyword search remains fully functional regardless
4444+4545+#### Search Result Context
4646+1747- [ ] Implement `search_posts(query, mode, limit)`:
1818- - `keyword`: FTS5 MATCH query
1919- - `semantic`: embed query string → vec similarity search
2020- - `hybrid`: run both, merge via reciprocal rank fusion
2121-- [ ] `get_sync_status(did)` → last sync time, post counts
2222-- [ ] Model management: download `nomic-embed-text-v1.5` ONNX on first use to `app_data_dir/models/`
4848+ - `keyword`: FTS5 MATCH query (always available)
4949+ - `semantic`: embed query string → vec similarity search (requires embeddings enabled)
5050+ - `hybrid`: run both, merge via reciprocal rank fusion (falls back to keyword-only if embeddings disabled)
5151+- [ ] `get_sync_status(did)` → last sync time, post counts, cursor state
5252+- [ ] Model management: download `nomic-embed-text-v1.5` ONNX on first use to `<app_data_dir>/models/` (skipped when embeddings disabled)
2353- [ ] Background sync: trigger after login, then every 15 min
2424-- [ ] **Frontend**: search bar (`/` to focus) with mode selector, `Motion` sliding indicator underline
2525-- [ ] **Frontend**: search results with staggered `Motion` fade-in, highlighted keyword matches
2626-- [ ] **Frontend**: sync status indicator with animated progress bar, `Presence` fade-out on complete
2727-- [ ] **Frontend**: model download progress bar (percentage + ETA) on first launch
2828- - Splash/Preflight route should explain what the point of this is
2929-- [ ] **Frontend**: empty state illustration when no posts synced yet
3030-- [ ] **Frontend**: `Tab` cycles search mode, `Escape` clears
5454+5555+### Frontend
5656+5757+- [ ] search bar (`/` or `CTRL/CMD + F` to focus) with mode selector (network / keyword / semantic / hybrid), `Motion` sliding indicator underline
5858+- [ ] search results with staggered `Motion` fade-in, highlighted keyword matches
5959+- [ ] sync status indicator with animated progress bar, `Presence` fade-out on complete
6060+- [ ] reindex button: triggers `reindex_embeddings()`, shown in search settings or sync status area
6161+- [ ] embeddings opt-out toggle in settings (disables semantic search, skips model download)
6262+- [ ] model download progress bar (percentage + ETA) on first launch
6363+ - Enabled by default (opt-out)
6464+ - Splash/Preflight route should explain what semantic search provides
6565+- [ ] empty state illustration when no posts synced yet
6666+- [ ] `Tab` cycles search mode (network → keyword → semantic → hybrid), `Escape` clears
···11+CREATE TABLE IF NOT EXISTS sync_state (
22+ did TEXT NOT NULL,
33+ source TEXT NOT NULL,
44+ cursor TEXT,
55+ last_synced_at TEXT,
66+ PRIMARY KEY (did, source)
77+);