fuzzy find my records ken.waow.tech
embeddings pds search

tighten README: fix propagation claims, document filtering

- "nothing lives anywhere else" was wrong — writing to a public PDS
is a broadcast. replaced with a data-propagation section that says
so plainly and gives the actual opt-out (don't click save).
- "nothing new is exposed" was technically defensible but misleading.
semantic search is a new discoverability surface even when the
underlying records were already public. the sharing section now
says both halves.
- added the collection filter + auto time cutoff to the how-it-works
list since they're now a material part of the pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
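The collection filter and time cutoff that this commit adds to the how-it-works list can be sketched in Python. This is a minimal illustration, not ken's implementation. Several details are assumptions: the `Record` shape, the exact set of dropped NSIDs, the hypothetical `LARGE_REPO_THRESHOLD`, approximating "2 years" as 730 days, and applying the size check after the collection filter. The README confirms only that no-text collections are dropped first, that large repos get a 2-year cutoff, and that the UI reports what was cut.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Sequence, Tuple

# Collections assumed to carry no semantic text. These are real Bluesky
# lexicon NSIDs, but ken's actual drop list is an assumption here.
NO_TEXT_COLLECTIONS = frozenset({
    "app.bsky.feed.like",
    "app.bsky.feed.repost",
    "app.bsky.graph.follow",
    "app.bsky.graph.block",
    "app.bsky.graph.listitem",
    "app.bsky.feed.threadgate",
    "app.bsky.feed.postgate",
})

# Hypothetical: the README says "large repos" but gives no number.
LARGE_REPO_THRESHOLD = 50_000
TIME_CUTOFF = timedelta(days=2 * 365)  # "2-year time cutoff", approximated as 730 days


@dataclass
class Record:
    uri: str
    cid: str
    collection: str
    created_at: datetime


@dataclass
class FilterReport:
    """What was cut, in the spirit of the UI's pack-meta line."""
    dropped_by_collection: int
    dropped_by_cutoff: int
    cutoff: Optional[datetime]


def filter_records(
    records: Sequence[Record],
    now: Optional[datetime] = None,
    large_repo_threshold: int = LARGE_REPO_THRESHOLD,
) -> Tuple[List[Record], FilterReport]:
    now = now or datetime.now(timezone.utc)

    # Step 1: drop collections with nothing worth embedding.
    kept = [r for r in records if r.collection not in NO_TEXT_COLLECTIONS]
    dropped_collection = len(records) - len(kept)

    # Step 2 (assumed ordering): only if what remains is still large,
    # keep records from the last two years to bound pipeline memory.
    cutoff: Optional[datetime] = None
    dropped_cutoff = 0
    if len(kept) > large_repo_threshold:
        cutoff = now - TIME_CUTOFF
        recent = [r for r in kept if r.created_at >= cutoff]
        dropped_cutoff = len(kept) - len(recent)
        kept = recent

    return kept, FilterReport(dropped_collection, dropped_cutoff, cutoff)
```

Checking size after the collection filter means the cutoff only triggers when the records that would actually be embedded are numerous; a repo that is mostly likes and follows would not lose old posts. Whether ken orders the checks this way is not stated.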

+11 -4
README.md
@@ -8,15 +8,22 @@
 
 1. sign in with your handle. oauth goes to your PDS.
 2. backend fetches your whole repo in one call via [`com.atproto.sync.getRepo`](https://atproto.com/specs/sync#getrepo), parses the CAR locally via [zat](https://tangled.org/zat.dev/zat)
-3. each record is embedded with [bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) running through [llama.cpp](https://github.com/ggerganov/llama.cpp), 16 records per batch
-4. optional: click save, and the resulting vector pack is written back to your PDS as a `tech.waow.ken.pack` record + blobs. nothing lives anywhere else. click delete and it's gone.
-5. subsequent sign-ins reuse vectors by `(uri, cid)` — only new or changed records get re-embedded
+3. records whose collections have no semantic text (likes, follows, reposts, blocks, listitems, gates, etc.) are dropped before embedding. large repos also get a 2-year time cutoff so pipeline memory stays bounded; the pack-meta line in the UI shows exactly what was cut
+4. each surviving record is embedded with [bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) running through [llama.cpp](https://github.com/ggerganov/llama.cpp), 16 records per batch
+5. optional: click save, and the resulting vector pack is written back to your PDS as a `tech.waow.ken.pack` record plus a few vector blobs. ken's server keeps nothing past the current session — the pack lives on your PDS, and ken just reloads it on your next sign-in. click delete and ken tombstones the record on your repo
+6. subsequent sign-ins reuse vectors by `(uri, cid)` — only new or changed records get re-embedded
 
 search is in-memory cosine similarity across whatever the backend currently has cached for you. partial search works from the moment the first batch finishes, so the UI never blocks waiting on a full index.
 
+### data propagation
+
+writing a record to a public PDS is a broadcast: the PDS emits a firehose event, and any relay or downstream consumer subscribed to your PDS can ingest the record and the blobs it references. this is how atproto is designed to work, and ken participates in it like every other app that writes records. your pack is not uniquely exposed — it propagates the same way your posts do — but "saved on my PDS" is not the same as "only on my PDS." if you want to minimize your network surface, don't click save; an unsaved pack lives only in ken's in-memory cache and disappears when you sign out or the server restarts.
+
 ## sharing
 
-a signed-in user can share a specific search via `https://ken.waow.tech/?handle=X&q=Y`. the backend's `GET /` injects per-query OpenGraph tags so link unfurlers render a real preview. anyone visiting a share URL loads the target's saved pack publicly (via the same PDS read path anyone else could take) and runs the query — no auth needed for readers, and nothing new is exposed because the records were already public on the PDS.
+a signed-in user can share a specific search via `https://ken.waow.tech/?handle=X&q=Y`. the backend's `GET /` injects per-query OpenGraph tags so link unfurlers render a real preview. a visitor loads the target's saved pack via the same public-read path anyone else could take, and runs the query — no auth needed.
+
+the records being searched were already publicly readable from the PDS, so sharing a query doesn't expose any individual record that wasn't exposed before. what it does add is a new way to *find* things: semantic search across every record you've indexed is a different discoverability surface than e.g. scrolling a profile. if you have records that are technically public but you'd rather not see surfaced by meaning, don't save the pack.
 
 ## layout
 
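The README's `(uri, cid)` reuse rule, its 16-record batches, and partial in-memory cosine search fit together in one small sketch. `VectorIndex`, its `index`/`search` methods, and the `embed` callback (a stand-in for bge-small-en-v1.5 via llama.cpp) are hypothetical names, and keying restored pack vectors by `(uri, cid)` is an assumption; the real backend is not part of this commit.

```python
import math
from typing import Callable, Dict, Iterator, List, Optional, Sequence, Tuple

Key = Tuple[str, str]  # (uri, cid)
Vector = List[float]
EmbedFn = Callable[[List[str]], List[Vector]]

BATCH_SIZE = 16  # "16 records per batch"


def normalize(v: Sequence[float]) -> Vector:
    """Scale to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


class VectorIndex:
    """Hypothetical in-memory index keyed by (uri, cid); not ken's actual code."""

    def __init__(self, cached: Optional[Dict[Key, Vector]] = None):
        # e.g. vectors restored from a previously saved pack (assumed keying)
        self.vectors: Dict[Key, Vector] = dict(cached or {})

    def index(self, records: Sequence[Tuple[str, str, str]], embed: EmbedFn) -> Iterator[int]:
        """Embed only new or changed (uri, cid, text) records.

        This is a generator that yields progress after each batch, so a caller
        can run search() between batches. Nothing happens until it is iterated.
        """
        live = {(uri, cid) for uri, cid, _ in records}
        # An edited record gets a new cid, so its old (uri, cid) entry is stale;
        # deleted records disappear from `live` entirely.
        self.vectors = {k: v for k, v in self.vectors.items() if k in live}
        todo = [r for r in records if (r[0], r[1]) not in self.vectors]
        for start in range(0, len(todo), BATCH_SIZE):
            batch = todo[start:start + BATCH_SIZE]
            vecs = embed([text for _, _, text in batch])
            for (uri, cid, _), vec in zip(batch, vecs):
                self.vectors[(uri, cid)] = normalize(vec)
            yield start + len(batch)

    def search(self, query_vec: Sequence[float], k: int = 5) -> List[Tuple[float, str]]:
        """Brute-force cosine similarity over whatever is indexed so far."""
        q = normalize(query_vec)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), uri)
            for (uri, _), vec in self.vectors.items()
        ]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:k]
```

Making `index` a generator is one way to get "partial search works from the moment the first batch finishes": each yield is a point where `search` already sees every vector embedded so far. Brute-force scoring touches every stored vector per query, which seems a reasonable trade at single-repo scale, though the README does not say how ken scores.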