GET /xrpc/app.bsky.actor.searchActorsTypeahead typeahead.waow.tech


add created_at + associated fields, extractProfileFields, read-mode batches, edge caching

- add created_at (ISO 8601) and associated (JSON) columns to actors table
- extract shared extractProfileFields() for 4 callsites (enrichment, cron, backfill, admin)
- cleanAssociated() strips zero/false fields to match bsky's compact typeahead shape
- search response now returns full profileViewBasic surface minus viewer
- db.batch() supports read/write mode param (was hardcoded to "write")
- stats handler uses read-mode batch (3-6s → 50-250ms warm)
- stats handler uses CF edge cache (60s TTL) with Server-Timing headers
- search handler cache.put moved to ctx.waitUntil (non-blocking)
- add PLC export streaming backfill script for bulk created_at population
- update docs, architecture, README for new fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
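the Server-Timing bullet above can be made concrete; a minimal sketch of building such a header value (the `serverTiming` helper is illustrative, not code from this commit):

```typescript
// build a Server-Timing header value from named phase durations (ms),
// like the query/process/render/total segments the stats handler emits
function serverTiming(phases: Record<string, number>): string {
  return Object.entries(phases)
    .map(([name, ms]) => `${name};dur=${ms.toFixed(0)}`)
    .join(", ");
}

// serverTiming({ query: 42.4, render: 3.1 }) → "query;dur=42, render;dur=3"
```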

+396 -79
+1 -1
README.md
··· 1 1 # typeahead 2 2 3 - community actor search for [atproto](https://atproto.com). aims to be a drop-in replacement for `app.bsky.actor.searchActorsTypeahead` — same endpoint path and query params, returns core fields (`did`, `handle`, `displayName`, `avatar`, `labels`) but not the full `profileViewBasic` surface (no `associated`, `viewer`, `createdAt`). 3 + community actor search for [atproto](https://atproto.com). aims to be a drop-in replacement for `app.bsky.actor.searchActorsTypeahead` — same endpoint path and query params, returns the full `profileViewBasic` surface minus `viewer` (`did`, `handle`, `displayName`, `avatar`, `associated`, `labels`, `createdAt`). 4 4 5 5 **live:** https://typeahead.waow.tech 6 6
+19 -11
docs/architecture.md
···
33 33   - **phase 1 — identity**: slingshot resolves DID → handle + PDS endpoint.
34 34     100 DIDs/run, 20 concurrent. attempt tracking (`identity_checked_at`)
35 35     backs off failures for 1 hour.
36    -  - **phase 2 — profile + labels**: `getProfiles` batch call (25 DIDs/call,
37    -    75/run) fetches avatar, display_name, and labels in one shot. also
38    -    recomputes `hidden` from full label data. attempt tracking
39    -    (`profile_checked_at`) backs off failures for 1 hour. actors not
40    -    returned by getProfiles are marked to avoid retrying.
   36 +  - **phase 2 — profile + labels + metadata**: `getProfiles` batch call
   37 +    (25 DIDs/call, 75/run) fetches avatar, display_name, labels, createdAt,
   38 +    and associated in one shot. also recomputes `hidden` from full label
   39 +    data. attempt tracking (`profile_checked_at`) backs off failures for
   40 +    1 hour. actors not returned by getProfiles are marked to avoid retrying.
41 41   - lease-coordinated via KV (`enrich_lock`, 30s TTL) — no stampede from
42 42     overlapping ingest batches.
43 43   - converges to zero work as gaps fill (gap-driven queries return fewer
···
79 79   - identity phase queries actors with `handle = ''` and
80 80     `identity_checked_at < now - 1hr`
81 81   - profile phase queries actors with `handle != ''` and
82    -    (`avatar_url = ''` OR `labels = '[]'`), `profile_checked_at < now - 1hr`
   82 +    (`avatar_url = ''` OR `labels = '[]'` OR `created_at = ''`),
   83 +    `profile_checked_at < now - 1hr`
83 84   - as actors get enriched, these queries return fewer rows
84 85   - system quiesces when all actors are resolved or backed off
85 86
86 87  AppView (public.api.bsky.app) is used for both enrichment (phase 2
87 88  getProfiles) and moderation label checks (hourly cron). phase 2 gets
88    -  avatar, displayName, labels, and hidden in one batch call.
   89 +  avatar, displayName, labels, createdAt, associated, and hidden in one
   90 +  batch call.
89 91
90 92  ### read path
91 93
92 94  search query -> cache API (hit?) -> FTS5 prefix match -> reconstruct avatar
93    -  URLs from DID + CID -> return `{did, handle, displayName?, avatar?, labels}`
   95 +  URLs from DID + CID -> return `{did, handle, displayName?, avatar?, associated?, labels, createdAt?}`
94 96
95 97  ## storage
96 98
···
104 106     labels TEXT       -- 2 bytes ('[]') for ~92% of actors;
105 107                       -- ~256 bytes per label for the ~8% that have them
106 108                       -- stored as raw JSON array matching bsky's API shape
    109 +   created_at TEXT   -- ~24 bytes (ISO 8601 timestamp from getProfiles)
    110 +   associated TEXT   -- 2 bytes ('{}') until enriched; ~100 bytes when populated
    111 +                     -- stored as JSON matching bsky's profileAssociated shape
107 112     updated_at INTEGER           -- 8 bytes
108 113     pds TEXT                     -- ~40 bytes (PDS endpoint URL)
109 114     identity_checked_at INTEGER  -- 8 bytes (last slingshot attempt)
110 115     profile_checked_at INTEGER   -- 8 bytes (last PDS profile attempt)
111 116
112    -  plus FTS5 index overhead. roughly ~320 bytes/row for unlabeled actors,
113    -  ~580 bytes/row for labeled ones (~8% of index). at 1.33M actors this
114    -  adds ~30MB over the baseline.
    117 +  plus FTS5 index overhead. roughly ~350 bytes/row for unlabeled actors,
    118 +  ~610 bytes/row for labeled ones (~8% of index). `created_at` and
    119 +  `associated` populate progressively as the enrichment pipeline and
    120 +  moderation cron cycle through actors — no separate backfill needed.
115 121
116 122  storage is [Turso](https://turso.tech) (hosted libSQL). previously used
117 123  Cloudflare D1 but migrated to Turso to avoid D1's 10GB hard limit and
···
154 160    populates labels, handle, display_name, avatar_url via getProfiles API.
155 161    also recomputes hidden from full label data, fixing actors incorrectly
156 162    hidden by stale !no-unauthenticated logic.
    163 +  - `scripts/add-created-associated.sql` — one-shot migration adding created_at
    164 +    and associated columns to actors table
157 165  - `scripts/add-actor-deltas.sql` — one-shot migration adding actor_deltas
158 166    table for 5-min granularity delta tracking
159 167  - `scripts/migrate-avatar-cid.sql` — one-shot migration from full avatar
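a quick back-of-envelope check of the per-row figures in the storage section (numbers from the doc; the helper and derived total are illustrative):

```typescript
// weighted average row size: ~92% unlabeled rows at ~350 B, ~8% labeled at ~610 B
function avgRowBytes(unlabeledB: number, labeledB: number, labeledShare: number): number {
  return unlabeledB * (1 - labeledShare) + labeledB * labeledShare;
}

const avg = avgRowBytes(350, 610, 0.08);   // ≈ 371 B/row on average
const totalGB = (avg * 1_330_000) / 1e9;   // ≈ 0.49 GB at 1.33M actors
```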
+2
schema.sql
··· 6 6 updated_at INTEGER NOT NULL DEFAULT (unixepoch()), 7 7 hidden INTEGER NOT NULL DEFAULT 0, 8 8 labels TEXT NOT NULL DEFAULT '[]', 9 + created_at TEXT DEFAULT '', 10 + associated TEXT DEFAULT '{}', 9 11 pds TEXT DEFAULT '', 10 12 identity_checked_at INTEGER DEFAULT 0, 11 13 profile_checked_at INTEGER DEFAULT 0
+5
scripts/add-created-associated.sql
··· 1 + -- one-shot migration: add created_at and associated columns to actors table 2 + -- run against Turso before deploying the updated worker 3 + 4 + ALTER TABLE actors ADD COLUMN created_at TEXT DEFAULT ''; 5 + ALTER TABLE actors ADD COLUMN associated TEXT DEFAULT '{}';
+237
scripts/backfill-created-at.py
···
#!/usr/bin/env -S PYTHONUNBUFFERED=1 uv run --script --quiet
# /// script
# requires-python = ">=3.12"
# dependencies = []
# ///
"""
bulk backfill created_at by streaming the PLC directory export.

loads all DIDs missing created_at from Turso into a set, then streams
the PLC export (JSONL, chronological), matching creation operations
against our set. batches updates to Turso as matches accumulate.

usage:
    TURSO_URL=... TURSO_AUTH_TOKEN=... ./scripts/backfill-created-at.py
    TURSO_URL=... TURSO_AUTH_TOKEN=... ./scripts/backfill-created-at.py --dry-run
"""

import argparse
import json
import os
import sys
import time
import urllib.error
import urllib.request

PLC_EXPORT_URL = "https://plc.directory/export"
PLC_PAGE_SIZE = 1000
TURSO_BATCH_SIZE = 200
FLUSH_THRESHOLD = 500  # flush to Turso every N matches

DIM = "\033[2m"
RESET = "\033[0m"


# --- turso helpers ---

def get_turso_url() -> str:
    url = os.environ.get("TURSO_URL", "")
    if not url:
        print("error: TURSO_URL not set", file=sys.stderr)
        sys.exit(1)
    return url.replace("libsql://", "https://")


def get_turso_token() -> str:
    token = os.environ.get("TURSO_AUTH_TOKEN", "")
    if not token:
        print("error: TURSO_AUTH_TOKEN not set", file=sys.stderr)
        sys.exit(1)
    return token


def turso_query_all(sql: str, turso_url: str, turso_token: str) -> list[dict]:
    """paginated query to fetch all rows."""
    all_rows = []
    offset = 0
    page = 10000
    while True:
        body = json.dumps({
            "requests": [
                {"type": "execute", "stmt": {
                    "sql": f"{sql} LIMIT {page} OFFSET {offset}",
                    "args": [],
                }},
                {"type": "close"},
            ]
        }).encode()
        req = urllib.request.Request(
            f"{turso_url}/v3/pipeline",
            data=body,
            headers={
                "Authorization": f"Bearer {turso_token}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(req, timeout=60) as resp:
            result = json.loads(resp.read())
        res = result["results"][0]
        if res.get("type") == "error":
            print(f"  turso error: {res['error']['message']}", file=sys.stderr)
            break
        cols = [c["name"] for c in res["response"]["result"]["cols"]]
        rows = []
        for row in res["response"]["result"]["rows"]:
            rows.append({c: (v["value"] if v["type"] != "null" else None) for c, v in zip(cols, row)})
        all_rows.extend(rows)
        if len(rows) < page:
            break
        offset += page
        sys.stdout.write(f"\r  loading DIDs... {len(all_rows):,}")
        sys.stdout.flush()
    return all_rows


def turso_batch(stmts: list[dict], turso_url: str, turso_token: str) -> bool:
    requests = [{"type": "execute", "stmt": s} for s in stmts]
    requests.append({"type": "close"})
    body = json.dumps({"requests": requests}).encode()
    req = urllib.request.Request(
        f"{turso_url}/v3/pipeline",
        data=body,
        headers={
            "Authorization": f"Bearer {turso_token}",
            "Content-Type": "application/json",
        },
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            result = json.loads(resp.read())
        for r in result.get("results", []):
            if r.get("type") == "error":
                print(f"  turso error: {r.get('error', {}).get('message', 'unknown')}")
                return False
        return True
    except urllib.error.HTTPError as e:
        err_body = e.read().decode()[:300]
        print(f"  turso HTTP {e.code}: {err_body}", file=sys.stderr)
        return False
    except Exception as e:
        print(f"  turso request failed: {e}", file=sys.stderr)
        return False


# --- PLC export streaming ---

def stream_plc_export(after: str = "") -> list[dict]:
    """fetch one page from PLC export. returns list of ops."""
    url = f"{PLC_EXPORT_URL}?count={PLC_PAGE_SIZE}"
    if after:
        url += f"&after={after}"
    req = urllib.request.Request(url, headers={"User-Agent": "typeahead-backfill/1.0"})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            lines = resp.read().decode().strip().split("\n")
        return [json.loads(line) for line in lines if line]
    except urllib.error.HTTPError as e:
        if e.code == 429:
            print("\n  PLC rate limited — pausing 10s")
            time.sleep(10)
            return stream_plc_export(after)
        raise
    except Exception as e:
        print(f"\n  PLC fetch error: {e} — retrying in 5s")
        time.sleep(5)
        return stream_plc_export(after)


# --- main ---

def main():
    parser = argparse.ArgumentParser(description="bulk backfill created_at from PLC directory export")
    parser.add_argument("--dry-run", action="store_true", help="stream + match but don't write")
    args = parser.parse_args()

    turso_url = get_turso_url()
    turso_token = get_turso_token()

    # step 1: load all DIDs missing created_at
    print("loading DIDs missing created_at from Turso...")
    rows = turso_query_all(
        "SELECT did FROM actors WHERE length(created_at) = 0 ORDER BY rowid ASC",
        turso_url, turso_token,
    )
    wanted = {r["did"] for r in rows}
    print(f"\n  {len(wanted):,} DIDs to backfill")

    if not wanted:
        print("nothing to do.")
        return

    # step 2: stream PLC export, match creations
    matched = 0
    scanned = 0
    pending: list[dict] = []  # buffered Turso statements
    after = ""
    t0 = time.time()

    print(f"streaming PLC export... {'(DRY RUN)' if args.dry_run else ''}")

    while wanted:
        ops = stream_plc_export(after)
        if not ops:
            break

        for op in ops:
            scanned += 1
            did = op.get("did", "")
            created_at = op.get("createdAt", "")

            # only care about creation ops for DIDs we're looking for
            if did not in wanted:
                continue

            # first op for a DID is its creation (prev=null)
            prev = op.get("operation", {}).get("prev")
            if prev is not None:
                continue

            wanted.discard(did)
            matched += 1

            pending.append({
                "sql": "UPDATE actors SET created_at = ?1 WHERE did = ?2",
                "args": [
                    {"type": "text", "value": created_at},
                    {"type": "text", "value": did},
                ],
            })

            # flush when buffer is full
            if len(pending) >= FLUSH_THRESHOLD and not args.dry_run:
                for i in range(0, len(pending), TURSO_BATCH_SIZE):
                    turso_batch(pending[i : i + TURSO_BATCH_SIZE], turso_url, turso_token)
                pending.clear()

        after = ops[-1]["createdAt"]
        elapsed = time.time() - t0
        rate = scanned / elapsed if elapsed > 0 else 0
        remaining = len(wanted)
        tag = "dry" if args.dry_run else "live"
        sys.stdout.write(
            f"\r  [{tag}] scanned={scanned:,} matched={matched:,} "
            f"remaining={remaining:,} "
            f"{DIM}{rate:.0f} ops/s cursor={after}{RESET}  "
        )
        sys.stdout.flush()

    # flush remaining
    if pending and not args.dry_run:
        for i in range(0, len(pending), TURSO_BATCH_SIZE):
            turso_batch(pending[i : i + TURSO_BATCH_SIZE], turso_url, turso_token)

    elapsed = time.time() - t0
    print(f"\n\ndone in {elapsed:.0f}s. matched={matched:,}, missed={len(wanted):,}, scanned={scanned:,} ops")


if __name__ == "__main__":
    main()
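the script's core match rule (a DID's creation op is the one with `prev = null`) expressed as a TypeScript predicate, for cross-reference with the worker codebase — the types are assumptions inferred from the script, not a published PLC client API:

```typescript
// minimal shape of a PLC export entry, as consumed by the backfill script
interface PlcExportOp {
  did: string;
  createdAt: string;
  operation: { prev: string | null };
}

// true when this entry is the creation operation for a DID we still want:
// the first op in a DID's chain has prev = null
function isWantedCreation(op: PlcExportOp, wanted: Set<string>): boolean {
  return wanted.has(op.did) && op.operation.prev === null;
}
```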
+12 -14
src/backfill.ts
··· 1 1 import type { TursoDB } from "./db"; 2 2 import type { Env } from "./types"; 3 3 import { BSKY_TYPEAHEAD_URL } from "./types"; 4 - import { shouldHide } from "./moderation"; 5 - import { extractAvatarCid } from "./utils"; 4 + import { extractProfileFields } from "./utils"; 6 5 7 6 // --- backfill: remove this block once at parity with Bluesky --- 8 7 ··· 30 29 31 30 // upsert all — fills in missing actors AND enriches existing ones 32 31 // (e.g. actors ingested via Jetstream that lack avatar/displayName) 33 - const stmts = actors.map((a) => 34 - db.prepare( 35 - `INSERT INTO actors (did, handle, display_name, avatar_url, hidden, labels, updated_at) 36 - VALUES (?1, ?2, ?3, ?4, ?5, ?6, unixepoch()) 32 + const stmts = actors.map((a) => { 33 + const f = extractProfileFields(a); 34 + return db.prepare( 35 + `INSERT INTO actors (did, handle, display_name, avatar_url, hidden, labels, created_at, associated, updated_at) 36 + VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, unixepoch()) 37 37 ON CONFLICT(did) DO UPDATE SET 38 38 handle = COALESCE(NULLIF(?2, ''), actors.handle), 39 39 display_name = COALESCE(NULLIF(?3, ''), actors.display_name), 40 40 avatar_url = COALESCE(NULLIF(?4, ''), actors.avatar_url), 41 41 hidden = ?5, 42 42 labels = ?6, 43 + created_at = COALESCE(NULLIF(?7, ''), actors.created_at), 44 + associated = COALESCE(NULLIF(?8, '{}'), actors.associated), 43 45 updated_at = unixepoch()` 44 46 ).bind( 45 - a.did, 46 - a.handle || '', 47 - a.displayName || '', 48 - extractAvatarCid(a.avatar || ''), 49 - shouldHide(a.labels) ? 1 : 0, 50 - JSON.stringify(a.labels || []) 51 - ) 52 - ); 47 + a.did, f.handle, f.displayName, f.avatarCid, 48 + f.hidden, f.labels, f.createdAt, f.associated 49 + ); 50 + }); 53 51 54 52 await db.batch(stmts); 55 53
+7 -6
src/cron.ts
··· 1 1 import type { Stmt, TursoDB } from "./db"; 2 2 import type { Env } from "./types"; 3 3 import { BSKY_GET_PROFILES_URL } from "./types"; 4 - import { shouldHide } from "./moderation"; 5 - import { extractAvatarCid } from "./utils"; 4 + import { extractProfileFields } from "./utils"; 6 5 7 6 /** refresh moderation labels, walking the full index over multiple cron runs */ 8 7 export async function refreshModeration(db: TursoDB, env: Env): Promise<void> { ··· 47 46 48 47 const stmts: Stmt[] = []; 49 48 for (const p of profiles) { 50 - const hide = shouldHide(p.labels) ? 1 : 0; 51 - const avatarCid = extractAvatarCid(p.avatar || ''); 49 + const f = extractProfileFields(p); 52 50 stmts.push( 53 51 db.prepare( 54 52 `UPDATE actors SET hidden = ?1, 55 53 handle = COALESCE(NULLIF(?3, ''), handle), 56 54 display_name = COALESCE(NULLIF(?4, ''), display_name), 57 55 avatar_url = COALESCE(NULLIF(?5, ''), avatar_url), 58 - labels = ?6 56 + labels = ?6, 57 + created_at = COALESCE(NULLIF(?7, ''), created_at), 58 + associated = COALESCE(NULLIF(?8, '{}'), associated) 59 59 WHERE did = ?2` 60 - ).bind(hide, p.did, p.handle || '', p.displayName || '', avatarCid, JSON.stringify(p.labels || [])) 60 + ).bind(f.hidden, p.did, f.handle, f.displayName, f.avatarCid, f.labels, 61 + f.createdAt, f.associated) 61 62 ); 62 63 } 63 64 if (stmts.length > 0) {
+3 -3
src/db.ts
··· 10 10 11 11 export interface TursoDB { 12 12 prepare(sql: string): Stmt; 13 - batch(stmts: Stmt[]): Promise<{ results: unknown[]; meta: { changes: number } }[]>; 13 + batch(stmts: Stmt[], mode?: "write" | "read"): Promise<{ results: unknown[]; meta: { changes: number } }[]>; 14 14 } 15 15 16 16 export function tursoDb(client: Client): TursoDB { ··· 36 36 }; 37 37 return s; 38 38 }, 39 - async batch(stmts) { 39 + async batch(stmts, mode = "write") { 40 40 const results = await client.batch( 41 41 stmts.map((s) => ({ sql: (s as any)._sql as string, args: (s as any)._args() as any[] })), 42 - "write", 42 + mode as any, 43 43 ); 44 44 return results.map((r) => ({ 45 45 results: r.rows as unknown[],
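the new optional `mode` parameter keeps every existing callsite on "write"; a toy sketch of the same default-parameter dispatch (the fake client below is illustrative only, not the libsql client API):

```typescript
type BatchMode = "write" | "read";

// fake libsql-style client that records the mode each batch was issued with
function makeFakeClient() {
  const modes: BatchMode[] = [];
  return {
    modes,
    async batch(_stmts: unknown[], mode: BatchMode) { modes.push(mode); return []; },
  };
}

// mirrors the db.ts change: mode is optional, defaulting to "write",
// so callers that never pass it behave exactly as before
function wrapDb(client: ReturnType<typeof makeFakeClient>) {
  return {
    batch: (stmts: unknown[], mode: BatchMode = "write") => client.batch(stmts, mode),
  };
}
```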
+9 -8
src/enrichment.ts
··· 1 1 import type { Stmt, TursoDB } from "./db"; 2 2 import type { Env, SlingshotResponse } from "./types"; 3 3 import { SLINGSHOT_URL, BSKY_GET_PROFILES_URL } from "./types"; 4 - import { shouldHide } from "./moderation"; 5 - import { extractAvatarCid } from "./utils"; 4 + import { extractProfileFields } from "./utils"; 6 5 7 6 /** record an actor-count snapshot for the current hour (idempotent) */ 8 7 export async function recordSnapshot(db: TursoDB): Promise<void> { ··· 86 85 const { results: profileRows } = await db.prepare( 87 86 `SELECT did FROM actors 88 87 WHERE handle != '' 89 - AND (avatar_url = '' OR labels = '[]') 88 + AND (avatar_url = '' OR labels = '[]' OR created_at = '') 90 89 AND profile_checked_at < unixepoch() - 3600 91 90 ORDER BY profile_checked_at ASC LIMIT 75` 92 91 ).all<{ did: string }>(); ··· 112 111 const returned = new Set<string>(); 113 112 for (const p of profiles) { 114 113 returned.add(p.did); 115 - const avatarCid = extractAvatarCid(p.avatar || ''); 116 - const hide = shouldHide(p.labels) ? 1 : 0; 114 + const f = extractProfileFields(p); 117 115 stmts.push( 118 116 db.prepare( 119 117 `UPDATE actors SET ··· 121 119 display_name = COALESCE(NULLIF(?3, ''), display_name), 122 120 avatar_url = COALESCE(NULLIF(?4, ''), avatar_url), 123 121 labels = ?5, hidden = ?6, 122 + created_at = COALESCE(NULLIF(?7, ''), created_at), 123 + associated = COALESCE(NULLIF(?8, '{}'), associated), 124 124 profile_checked_at = unixepoch() 125 125 WHERE did = ?1` 126 126 ).bind( 127 - p.did, p.handle || '', p.displayName || '', 128 - avatarCid, JSON.stringify(p.labels || []), hide 127 + p.did, f.handle, f.displayName, 128 + f.avatarCid, f.labels, f.hidden, 129 + f.createdAt, f.associated 129 130 ) 130 131 ); 131 - if (avatarCid) enriched++; 132 + if (f.avatarCid) enriched++; 132 133 } 133 134 // mark actors not returned by getProfiles so we back off 134 135 for (const r of batch) {
+10 -16
src/handlers/admin.ts
···
1 1  import type { TursoDB } from "../db";
2 2  import type { Env, SlingshotResponse } from "../types";
3 3  import { SLINGSHOT_URL } from "../types";
4    -  import { clientIP, json, extractAvatarCid } from "../utils";
5    -  import { shouldHide } from "../moderation";
   4 +  import { clientIP, json, extractProfileFields } from "../utils";
6 5  import { recordActorDelta } from "../metrics";
7 6
8 7  export async function handleDelete(
···
89 88
90 89    const identity: SlingshotResponse = await res.json();
91 90
92    -    // fetch profile from public API for display name + avatar + labels
93    -    let displayName = "";
94    -    let avatarCid = "";
95    -    let hidden = false;
96    -    let labelsJson = "[]";
   91 +    // fetch profile from public API for display name + avatar + labels + metadata
   92 +    let f = { displayName: '', avatarCid: '', hidden: 0, labels: '[]', createdAt: '', associated: '{}' };
97 93    try {
98 94      const profileRes = await fetch(
99 95        `https://public.api.bsky.app/xrpc/app.bsky.actor.getProfile?actor=${encodeURIComponent(identity.did)}`
100 96      );
101 97      if (profileRes.ok) {
102    -        const profile: any = await profileRes.json();
103    -        displayName = profile.displayName || "";
104    -        avatarCid = extractAvatarCid(profile.avatar || "");
105    -        hidden = shouldHide(profile.labels);
106    -        labelsJson = JSON.stringify(profile.labels || []);
   98 +        f = extractProfileFields(await profileRes.json());
107 99      }
108 100    } catch {
109 101      // profile enrichment is best-effort
110 102    }
111 103
112 104    await db.prepare(
113    -      `INSERT INTO actors (did, handle, display_name, avatar_url, hidden, labels, updated_at)
114    -       VALUES (?1, ?2, ?3, ?4, ?5, ?6, unixepoch())
   105 +      `INSERT INTO actors (did, handle, display_name, avatar_url, hidden, labels, created_at, associated, updated_at)
   106 +       VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, unixepoch())
115 107       ON CONFLICT(did) DO UPDATE SET
116 108         handle = ?2,
117 109         display_name = COALESCE(NULLIF(?3, ''), actors.display_name),
118 110         avatar_url = COALESCE(NULLIF(?4, ''), actors.avatar_url),
119 111         hidden = ?5,
120 112         labels = ?6,
    113 +       created_at = COALESCE(NULLIF(?7, ''), actors.created_at),
    114 +       associated = COALESCE(NULLIF(?8, '{}'), actors.associated),
121 115         updated_at = unixepoch()`
122 116    )
123    -      .bind(identity.did, identity.handle, displayName, avatarCid, hidden ? 1 : 0, labelsJson)
   117 +      .bind(identity.did, identity.handle, f.displayName, f.avatarCid, f.hidden, f.labels, f.createdAt, f.associated)
124 118      .run();
125 119
126 120    return json({
127 121      handle: identity.handle,
128 122      did: identity.did,
129    -      ...(hidden ? { hidden: true, reason: "hidden by moderation" } : { hidden: false }),
   123 +      ...(f.hidden ? { hidden: true, reason: "hidden by moderation" } : { hidden: false }),
130 124    });
131 125  }
+7 -5
src/handlers/search.ts
··· 43 43 44 44 const ftsQuery = `"${term}"*`; 45 45 const { results } = await db.prepare( 46 - `SELECT a.did, a.handle, a.display_name, a.avatar_url, a.labels 46 + `SELECT a.did, a.handle, a.display_name, a.avatar_url, a.labels, a.created_at, a.associated 47 47 FROM actors_fts 48 48 JOIN actors a ON a.rowid = actors_fts.rowid 49 49 WHERE actors_fts MATCH ?1 AND a.handle != '' AND a.hidden = 0 ··· 59 59 handle: r.handle, 60 60 ...(r.display_name ? { displayName: r.display_name } : {}), 61 61 ...(r.avatar_url ? { avatar: avatarUrl(r.did, r.avatar_url) } : {}), 62 + ...(r.associated && r.associated !== '{}' ? { associated: JSON.parse(r.associated) } : {}), 62 63 labels: JSON.parse(r.labels || '[]'), 64 + ...(r.created_at ? { createdAt: r.created_at } : {}), 63 65 })); 64 66 65 67 // --- backfill: remove this block once at parity with Bluesky --- ··· 76 78 77 79 const response = json({ actors }); 78 80 79 - // cache for 60 seconds 80 - const cacheable = new Response(response.body, response); 81 + // cache for 60 seconds — non-blocking so it doesn't delay the response 82 + const cacheable = new Response(response.clone().body, response); 81 83 cacheable.headers.set("Cache-Control", "public, max-age=60"); 82 - await cache.put(cacheKey, cacheable.clone()); 84 + ctx.waitUntil(cache.put(cacheKey, cacheable)); 83 85 84 - return cacheable; 86 + return response; 85 87 }
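the conditional spreads in the search handler mean empty fields are omitted rather than serialized as null; the same shaping as a standalone function (row shape assumed from the SELECT above; `avatar` is left out since it needs the URL-reconstruction helper):

```typescript
interface ActorRow {
  did: string;
  handle: string;
  display_name: string;
  labels: string;      // raw JSON array
  created_at: string;  // ISO 8601 or ''
  associated: string;  // raw JSON object, '{}' when empty
}

// include associated only when non-empty, createdAt only when present;
// labels always (possibly an empty array)
function toActorView(r: ActorRow): Record<string, unknown> {
  return {
    did: r.did,
    handle: r.handle,
    ...(r.display_name ? { displayName: r.display_name } : {}),
    ...(r.associated && r.associated !== "{}" ? { associated: JSON.parse(r.associated) } : {}),
    labels: JSON.parse(r.labels || "[]"),
    ...(r.created_at ? { createdAt: r.created_at } : {}),
  };
}
```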
+37 -5
src/handlers/stats.ts
···
2 2  import { html } from "../utils";
3 3  import { statsPage, type SnapshotPoint } from "../pages/stats";
4 4
5    -  export async function handleStats(db: TursoDB): Promise<Response> {
   5 +  const EDGE_CACHE_TTL = 60; // seconds — stats are backward-looking, 1 min staleness is invisible
   6 +
   7 +  export async function handleStats(request: Request, db: TursoDB, ctx: ExecutionContext): Promise<Response> {
   8 +    // serve from CF edge cache when available (avoids Turso cold-start penalty)
   9 +    const cache = caches.default;
   10 +    const cacheKey = new Request(new URL("/stats", request.url).href);
   11 +    const cached = await cache.match(cacheKey);
   12 +    if (cached) return cached;
   13 +
   14 +    const t0 = performance.now();
   15 +
6 16    const [metricsRes, snapshotRes, trafficRes, deltasRes] =
7 17      await db.batch([
8 18        db.prepare(
···
15 25        "SELECT domain, hits FROM traffic_sources ORDER BY hits DESC LIMIT 10"
16 26      ),
17 27      db.prepare(
18    -        "SELECT bucket, actors_delta, handles_delta, avatars_delta FROM actor_deltas ORDER BY bucket ASC LIMIT 2016"
   28 +        "SELECT bucket, actors_delta, handles_delta, avatars_delta FROM actor_deltas ORDER BY bucket DESC LIMIT 288"
19 29      ),
20    -    ]);
   30 +    ], "read");
   31 +
   32 +    const tQuery = performance.now();
   33 +
21 34    const rows = (metricsRes.results ?? []) as {
22 35      hour: number;
23 36      searches: number;
···
25 38    }[];
26 39    const dbSnapshots = (snapshotRes.results ?? []) as { hour: number; total: number; with_handles: number; with_avatars: number; hidden: number }[];
27 40    const trafficSources = (trafficRes.results ?? []) as { domain: string; hits: number }[];
28    -    const deltas = (deltasRes.results ?? []) as { bucket: number; actors_delta: number; handles_delta: number; avatars_delta: number }[];
   41 +    const deltas = ((deltasRes.results ?? []) as { bucket: number; actors_delta: number; handles_delta: number; avatars_delta: number }[]).reverse();
29 42
30 43    // build snapshot points with timestamps
31 44    const snapshots: SnapshotPoint[] = dbSnapshots.map((s) => ({
···
67 80    const handlePct = total > 0 ? ((withHandles / total) * 100).toFixed(1) : "0";
68 81    const avatarPct = total > 0 ? ((withAvatars / total) * 100).toFixed(1) : "0";
69 82
70    -    return html(statsPage({ total, hiddenCount, rows, totalSearches, avgLatency, handlePct, avatarPct, snapshots, trafficSources }));
   83 +    const tProcess = performance.now();
   84 +
   85 +    const body = statsPage({ total, hiddenCount, rows, totalSearches, avgLatency, handlePct, avatarPct, snapshots, trafficSources });
   86 +
   87 +    const tRender = performance.now();
   88 +
   89 +    const queryMs = (tQuery - t0).toFixed(0);
   90 +    const processMs = (tProcess - tQuery).toFixed(0);
   91 +    const renderMs = (tRender - tProcess).toFixed(0);
   92 +    const totalMs2 = (tRender - t0).toFixed(0);
   93 +
   94 +    const response = html(body, {
   95 +      "Server-Timing": `query;dur=${queryMs}, process;dur=${processMs}, render;dur=${renderMs}, total;dur=${totalMs2}`,
   96 +      "Cache-Control": `public, max-age=${EDGE_CACHE_TTL}`,
   97 +    });
   98 +
   99 +    // populate edge cache — use waitUntil so it completes after response is sent
   100 +    ctx.waitUntil(cache.put(cacheKey, response.clone()));
   101 +
   102 +    return response;
71 103  }
+1 -1
src/index.ts
··· 43 43 const db = createDb(env); 44 44 45 45 if (pathname === "/stats" && request.method === "GET") { 46 - return handleStats(db); 46 + return handleStats(request, db, ctx); 47 47 } 48 48 49 49 if (pathname === "/request-indexing" && request.method === "POST") {
+6 -7
src/pages/docs.ts
··· 133 133 <tr><td><code>handle</code></td><td class="yes">✓</td><td class="yes">✓</td></tr> 134 134 <tr><td><code>displayName</code></td><td class="yes">✓</td><td class="yes">✓</td></tr> 135 135 <tr><td><code>avatar</code></td><td class="yes">✓</td><td class="yes">✓</td></tr> 136 - <tr><td><code>associated</code></td><td class="yes">✓</td><td class="no">—</td></tr> 136 + <tr><td><code>associated</code></td><td class="yes">✓</td><td class="yes">✓</td></tr> 137 137 <tr><td><code>labels</code></td><td class="yes">✓</td><td class="yes">✓</td></tr> 138 - <tr><td><code>createdAt</code></td><td class="yes">✓</td><td class="no">—</td></tr> 138 + <tr><td><code>createdAt</code></td><td class="yes">✓</td><td class="yes">✓</td></tr> 139 139 <tr><td><code>viewer</code></td><td class="yes">✓</td><td class="no">—</td></tr> 140 140 </table> 141 141 142 142 <div class="callout warn"> 143 - <strong>if you depend on <code>associated</code>, <code>viewer</code>, 144 - or <code>createdAt</code></strong> — this API doesn't return them. we return 145 - did + handle + displayName + avatar + labels, which covers most typeahead UIs. 146 - if you need the full <code>profileViewBasic</code> surface, you'll need to stick with 147 - the bluesky API or fetch those fields separately. 143 + <strong>if you depend on <code>viewer</code></strong> — this API doesn't return it 144 + (it requires authentication). we return did + handle + displayName + avatar + 145 + associated + labels + createdAt, which covers the full <code>profileViewBasic</code> 146 + surface minus <code>viewer</code>. 148 147 </div> 149 148 150 149 <h2>operational notes</h2>
+2
src/types.ts
··· 13 13 display_name: string; 14 14 avatar_url: string; 15 15 labels: string; 16 + created_at: string; 17 + associated: string; 16 18 } 17 19 18 20 export interface IngestEvent {
+38 -2
src/utils.ts
···
1 1  import { CORS_HEADERS } from "./types";
   2 +  import { shouldHide } from "./moderation";
2 3
3 4  export function clientIP(request: Request): string {
4 5    return request.headers.get("CF-Connecting-IP") || "unknown";
···
22 23    return match?.[1] ?? '';
23 24  }
24 25
   26 +  /** strip zero/false fields from associated object to match bsky's compact typeahead shape */
   27 +  export function cleanAssociated(assoc: any): Record<string, unknown> {
   28 +    if (!assoc || typeof assoc !== 'object') return {};
   29 +    const clean: Record<string, unknown> = {};
   30 +    for (const [k, v] of Object.entries(assoc)) {
   31 +      if (v === 0 || v === false || v === null || v === undefined) continue;
   32 +      clean[k] = v;
   33 +    }
   34 +    return clean;
   35 +  }
   36 +
   37 +  /** extract the fields we store from a bsky profile response (getProfiles/getProfile/typeahead) */
   38 +  export interface ProfileFields {
   39 +    handle: string;
   40 +    displayName: string;
   41 +    avatarCid: string;
   42 +    labels: string;
   43 +    hidden: number;
   44 +    createdAt: string;
   45 +    associated: string;
   46 +  }
   47 +
   48 +  export function extractProfileFields(profile: any): ProfileFields {
   49 +    return {
   50 +      handle: profile.handle || '',
   51 +      displayName: profile.displayName || '',
   52 +      avatarCid: extractAvatarCid(profile.avatar || ''),
   53 +      labels: JSON.stringify(profile.labels || []),
   54 +      hidden: shouldHide(profile.labels) ? 1 : 0,
   55 +      createdAt: profile.createdAt || '',
   56 +      associated: JSON.stringify(cleanAssociated(profile.associated)),
   57 +    };
   58 +  }
   59 +
25 60  /** strip anything that could break FTS5 syntax, preserving unicode letters/digits */
26 61  export function sanitize(q: string): string {
27 62    return q.replace(/[^\p{L}\p{N}\s.-]/gu, "").trim();
28 63  }
29 64
30    -  export function html(body: string, status = 200): Response {
   65 +  export function html(body: string, extra?: Record<string, string> | number, status = 200): Response {
   66 +    if (typeof extra === "number") { status = extra; extra = undefined; }
31 67    return new Response(body, {
32 68      status,
33    -      headers: { "Content-Type": "text/html; charset=utf-8", ...CORS_HEADERS },
   69 +      headers: { "Content-Type": "text/html; charset=utf-8", ...CORS_HEADERS, ...extra },
34 70    });
35 71  }