fuzzy find my records ken.waow.tech
embeddings pds search

slim README to match sibling project style, dedupe about/disclosure

README: cut from 54 lines to 43 (+23 -34). dropped the verbose
how-it-works walkthrough, data-propagation section, sharing
explanation, and layout tree. what remains: one-liner hook, lexicon
link, stack diagram, dev/deploy commands, pointer to notes/. matches
the pollz/typeahead pattern.

disclosure popover: the description paragraph was restating what the
about modal already says. trimmed to one line of state context ("on
your PDS. ken reloads it next sign-in." / "in memory only. save to
keep it across sessions.") — the about modal is the single source
for the full explanation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

+25 -38
+23 -34
README.md
···
 # ken

-fuzzy find any record in your [atproto](https://atproto.com) repo. semantic search over a [PDS](https://atproto.com/guides/data-repos), with the vector pack written back to your own PDS as a record you can inspect or delete.
+semantic search over your [atproto](https://atproto.com) repo. sign in, ken builds a vector index of your records, search by meaning.

-running at **[ken.waow.tech](https://ken.waow.tech)**.
+**[ken.waow.tech](https://ken.waow.tech)**

-## how it works
+## lexicons

-1. sign in with your handle. oauth goes to your PDS.
-2. backend fetches your whole repo in one call via [`com.atproto.sync.getRepo`](https://atproto.com/specs/sync#getrepo), parses the CAR locally via [zat](https://tangled.org/zat.dev/zat)
-3. records whose collections have no semantic text (likes, follows, reposts, blocks, listitems, gates, etc.) are dropped before embedding. large repos also get a 2-year time cutoff so pipeline memory stays bounded; the pack-meta line in the UI shows exactly what was cut
-4. each surviving record is embedded with [bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) running through [llama.cpp](https://github.com/ggerganov/llama.cpp), 16 records per batch
-5. optional: click save, and the resulting vector pack is written back to your PDS as a `tech.waow.ken.pack` record plus a few vector blobs. ken's server keeps nothing past the current session — the pack lives on your PDS, and ken just reloads it on your next sign-in. click delete and ken tombstones the record on your repo
-6. subsequent sign-ins reuse vectors by `(uri, cid)` — only new or changed records get re-embedded
+- [`tech.waow.ken.pack`](lexicons/tech/waow/ken/pack.json) — saved vector index (record + blobs on your PDS)

-search is in-memory cosine similarity across whatever the backend currently has cached for you. partial search works from the moment the first batch finishes, so the UI never blocks waiting on a full index.
+## stack

-### data propagation
+```
+sync.getRepo → CAR walk + filter → llama.cpp embed → in-memory search
+
+(opt-in) save pack to user's PDS
+```

-writing a record to a public PDS is a broadcast: the PDS emits a firehose event, and any relay or downstream consumer subscribed to your PDS can ingest the record and the blobs it references. this is how atproto is designed to work, and ken participates in it like every other app that writes records. your pack is not uniquely exposed — it propagates the same way your posts do — but "saved on my PDS" is not the same as "only on my PDS." if you want to minimize your network surface, don't click save; an unsaved pack lives only in ken's in-memory cache and disappears when you sign out or the server restarts.
+- **backend**: [zig](https://ziglang.org) 0.16, [zat](https://tangled.org/zat.dev/zat) (AT Protocol), [llama.cpp](https://github.com/ggerganov/llama.cpp) ([bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5))
+- **infra**: [fly.io](https://fly.io) (performance-2x, 4 GB)

-## sharing
+## develop

-a signed-in user can share a specific search via `https://ken.waow.tech/?handle=X&q=Y`. the backend's `GET /` injects per-query OpenGraph tags so link unfurlers render a real preview. a visitor loads the target's saved pack via the same public-read path anyone else could take, and runs the query — no auth needed.
+```sh
+cd backend
+zig build
+OAUTH_CLIENT_SECRET_KEY=... MODEL_PATH=models/bge-small.gguf ./zig-out/bin/embed-on-pds
+```

-the records being searched were already publicly readable from the PDS, so sharing a query doesn't expose any individual record that wasn't exposed before. what it does add is a new way to *find* things: semantic search across every record you've indexed is a different discoverability surface than e.g. scrolling a profile. if you have records that are technically public but you'd rather not see surfaced by meaning, don't save the pack.
-
-## layout
+## deploy

-```
-backend/ zig http server + llama.cpp wrapper + indexer
-src/ main source
-llama-include/ llama.h headers
-llama-bin/ linux x86_64 .so files (docker build)
-llama-bin-macos/ arm64 dylibs (local dev)
-models/ bge-small.gguf
-fly.toml production config
-Dockerfile multi-stage build for fly
-lexicons/ atproto lexicon specs
-tech/waow/ken/pack.json
+```sh
+just deploy
 ```

-## running locally
+## notes

-```bash
-cd backend
-zig build
-OAUTH_CLIENT_SECRET_KEY=... MODEL_PATH=models/bge-small.gguf ./zig-out/bin/embed-on-pds
-```
+- [large-repo embedding pipeline](notes/large-repo-embedding.md) — filtering, time cutoff, memory budget

 ## license
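The how-it-works text removed from the README names two mechanisms that are easy to lose once the walkthrough is gone: vectors are reused by `(uri, cid)`, and search is in-memory cosine similarity. A minimal sketch of both follows, in JavaScript for readability. The real backend is Zig and its internals are not part of this diff, so every name here (`keyOf`, `planEmbedding`, `cosine`, `search`) and the `uri|cid` key format are assumptions made for illustration.

```javascript
// Hypothetical sketch; not the ken backend (which is written in Zig).

// Cache key for one version of a record: an unchanged record keeps the
// same uri and the same cid. The "|" separator is an assumption.
const keyOf = (rec) => `${rec.uri}|${rec.cid}`;

// Split records into vectors that can be reused from a saved pack and
// records that are new or changed and still need embedding.
function planEmbedding(records, savedVectors) {
  const reused = new Map();
  const toEmbed = [];
  for (const rec of records) {
    const k = keyOf(rec);
    if (savedVectors.has(k)) reused.set(k, savedVectors.get(k));
    else toEmbed.push(rec);
  }
  return { reused, toEmbed };
}

// Cosine similarity between two equal-length numeric vectors.
// Returns 0 when either vector has zero length.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

// Score every indexed vector against the query and keep the best k.
// Works on a partial index too: it ranks whatever is present.
function search(queryVec, index, k = 10) {
  return [...index.entries()]
    .map(([key, vec]) => ({ key, score: cosine(queryVec, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Because `search` only iterates over the entries currently in the map, it can run as soon as the first batch of vectors lands, which is the behavior the removed text describes as partial search.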
+2 -4
backend/src/assets/main.js
···

 if (j.persisted) {
   packStateEl.textContent = "saved";
-  packMenuDescEl.textContent =
-    "a vector index of your records lives as a record on your PDS. ken reloads it on sign-in so you don't have to re-embed.";
+  packMenuDescEl.textContent = "on your PDS. ken reloads it next sign-in.";
   packViewLink.classList.remove("hidden");
   packViewLink.href = `https://pdsls.dev/${j.persisted_uri}`;
   packDeleteBtn.classList.remove("hidden");
   packSaveBtn.classList.add("hidden");
 } else {
   packStateEl.textContent = "not saved";
-  packMenuDescEl.textContent =
-    "the index lives in ken's memory only. save it to your PDS to keep it across sessions.";
+  packMenuDescEl.textContent = "in memory only. save to keep it across sessions.";
   packViewLink.classList.add("hidden");
   packDeleteBtn.classList.add("hidden");
   packSaveBtn.classList.remove("hidden");
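
The two branches in this hunk set the same elements to opposite states, with the one-line descriptions as the only text that differs. One way to keep those strings in a single place is to derive the whole popover state from the status payload with a pure function and apply it to the DOM afterwards. This is a sketch only: `packMenuView` and its return shape are hypothetical and not part of main.js; the field names `persisted` and `persisted_uri` and the strings are taken from the hunk.

```javascript
// Hypothetical helper, not in main.js: maps the status payload `j`
// to the popover text and the visibility of each control.
function packMenuView(j) {
  const saved = Boolean(j.persisted);
  return {
    state: saved ? "saved" : "not saved",
    desc: saved
      ? "on your PDS. ken reloads it next sign-in."
      : "in memory only. save to keep it across sessions.",
    // Link to the saved record is only meaningful when a pack is persisted.
    viewHref: saved ? `https://pdsls.dev/${j.persisted_uri}` : null,
    showView: saved,
    showDelete: saved,
    showSave: !saved,
  };
}
```

A caller would then toggle the `hidden` class from the `show*` flags, for example `packSaveBtn.classList.toggle("hidden", !view.showSave)`, which keeps the DOM updates in one short block and makes the text testable without a browser.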