docs: atproto relay/indexer design for Phase 7 Tier 3 discovery

+143

1 changed file


docs/atproto-relay-indexer-design.md

# AT Protocol Relay/Indexer for Peek Feature Discovery

Design sketch for the network-wide feature search service (Phase 7, Tier 3 discovery).
This is a **separate repo**, not built in this worktree; it is captured here for review.

## Problem

Tiers 1 and 2 (direct AT URI resolution, publisher browsing) require knowing a publisher's
handle or DID in advance. Discovery at large (for example, "find features tagged 'productivity'")
needs a server that subscribes to the AT Protocol firehose and indexes feature release
records as they are created.

## Architecture

This follows the same pattern as Bluesky feed generators and labelers. A standalone service:

1. Subscribes to the firehose via `com.atproto.sync.subscribeRepos`
2. Filters for `space.peek.feature.release` records
3. Indexes them into a queryable store (Postgres or SQLite)
4. Exposes a search/list HTTP API consumed by Peek's browse UI

### Components

```
atproto-relay-indexer/
  src/
    ingester.ts   - firehose subscriber, filters + persists records
    indexer.ts    - indexing logic: dedup by featureId+version, resolve handles
    db.ts         - database schema + queries (Postgres for production, SQLite for dev)
    api.ts        - HTTP API server (Express or Hono)
    resolver.ts   - DID/handle resolution cache (plc.directory + well-known)
  schema.sql      - feature_releases table, full-text search index
  Dockerfile
  fly.toml        - fly.io deployment config (cheap, matches bsky ecosystem)
```

### Data Model

```sql
CREATE TABLE feature_releases (
  id               SERIAL PRIMARY KEY,
  at_uri           TEXT UNIQUE NOT NULL,   -- at://did:plc:.../space.peek.feature.release/tid
  cid              TEXT NOT NULL,
  feature_id       TEXT NOT NULL,          -- stable identifier from record
  publisher_did    TEXT NOT NULL,
  publisher_handle TEXT,                   -- cached, refreshed periodically
  name             TEXT NOT NULL,
  version          TEXT NOT NULL,
  description      TEXT,
  categories       TEXT[],                 -- JSON array in SQLite
  capabilities     TEXT[],
  is_latest        BOOLEAN DEFAULT false,  -- true for highest semver per (publisher_did, feature_id)
  created_at       TIMESTAMPTZ NOT NULL,
  indexed_at       TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX ON feature_releases (feature_id);
CREATE INDEX ON feature_releases (publisher_did);
CREATE INDEX ON feature_releases (is_latest) WHERE is_latest = true;

-- Full-text search
CREATE INDEX ON feature_releases USING gin(to_tsvector('english', name || ' ' || COALESCE(description, '')));
```

### API Surface

All endpoints are unauthenticated (public records only).

```
GET /features
  ?q=           - full-text search (name + description)
  ?category=    - filter by category
  ?capability=  - filter by capability name
  ?cursor=      - pagination cursor (indexed_at-based)
  ?limit=       - default 20, max 100
  → { features: FeatureRecord[], cursor?: string }

GET /features/:did/:featureId
  → latest release for this (publisher, featureId) pair
  → { feature: FeatureRecord }

GET /publishers/:did
  → all latest releases for a publisher DID
  → { publisher: { did, handle }, features: FeatureRecord[] }

GET /health
  → { ok: true, lag: number }   (lag = firehose lag in seconds)
```

### Peek Client Integration

The browse UI already has the publisher-search flow. Adding indexer search requires:

1. A new "Search All" mode alongside "By Publisher" in browse.html
2. A new `INDEXER_URL` constant (e.g. `https://peek-features.fly.dev`), hardcoded
   initially and configurable via features-manager settings later
3. A new IPC handler `tile:features:search` that fetches from the indexer URL
   (gated on the existing `features.browse` capability, with a `network` domain added to
   the features-manager manifest for the indexer host)

The client should degrade gracefully if the indexer is unreachable: fall back to
the publisher-search-only UI with a status message.

### Curated Directory Integration

The indexer can serve the curated list more robustly than the current hardcoded DID
approach: add a `/featured` endpoint that returns a curated set maintained editorially.
This replaces the `CURATED_PUBLISHER_DID` constant in browse.js with a proper API call.

## Hosting

- fly.io: same region as bsky.network relay nodes, low-cost, easy Docker deploy
- Single instance initially (read replicas if traffic warrants)
- Firehose reconnect with cursor persistence (survive restarts without a replay gap)

## Security Considerations

- Only index `space.peek.feature.release` records; ignore all others
- Validate that `featureId` matches the manifest blob's `id` field before indexing
  (this requires downloading and parsing the manifest blob; do it asynchronously in a worker)
- Rate-limit the API by IP
- No auth required for reads (all data is public on the firehose anyway)
- CID validation on indexed records (verify blob references are well-formed)

## Open Questions

1. **NSID namespace**: `space.peek.feature.*` requires controlling the `peek.space` domain
   for AT Protocol lexicon validation. Confirm domain ownership before publishing lexicons.
2. **Blob storage**: The indexer doesn't need to store blobs (Peek fetches directly from the PDS).
   Thumbnails/icons for the browse UI would be nice, but they need a CDN or blob proxy.
3. **Abuse**: Spammy or malicious feature records will appear in search. Need a flagging
   mechanism (simplest: a blocklist of DIDs; proper: labeler integration).
4. **Backfill**: On first deploy, need to backfill existing records via `com.atproto.repo.listRecords`
   across all known PDSes, or wait for organic growth. `listRecords` is scoped to a single repo,
   so backfill first has to enumerate repos (for example with `com.atproto.sync.listRepos` on each PDS).

## Timeline

This is a separate repo. Suggested order:
1. Stand up the ingester + SQLite DB (local dev, no fly.io yet)
2. Add the `/features?q=` search endpoint
3. Wire Peek browse UI to use it (behind a feature flag / settings toggle)
4. Deploy to fly.io
5. Add curated `/featured` endpoint, replace hardcoded DID in browse.js
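
As a starting point for step 1, the ingester's filter rule and the `is_latest` rule from the data model could be sketched as below. This is an illustration under assumptions, not the planned implementation: `RepoOp`, `Release`, `releaseUri`, `compareVersions`, and `latestByFeature` are hypothetical names, the op shape is a simplified view of what a firehose commit carries (an action plus a `collection/rkey` path), and version comparison only looks at `MAJOR.MINOR.PATCH`, ignoring pre-release suffixes.

```typescript
// Hypothetical, simplified shapes for illustration; not taken from an atproto SDK.
const RELEASE_COLLECTION = "space.peek.feature.release";

interface RepoOp {
  action: "create" | "update" | "delete"; // deletes should remove the indexed row (not shown)
  path: string; // "<collection>/<rkey>"
}

interface Release {
  atUri: string;
  publisherDid: string;
  featureId: string;
  version: string;
}

/** Return the AT URI if the op touches a release record, otherwise null. */
function releaseUri(did: string, op: RepoOp): string | null {
  const slash = op.path.indexOf("/");
  if (slash < 0) return null;
  const collection = op.path.slice(0, slash);
  const rkey = op.path.slice(slash + 1);
  if (collection !== RELEASE_COLLECTION || rkey.length === 0) return null;
  return `at://${did}/${collection}/${rkey}`;
}

/** Parse "MAJOR.MINOR.PATCH"; anything after the patch number is ignored in this sketch. */
function parseVersion(v: string): [number, number, number] | null {
  const m = /^(\d+)\.(\d+)\.(\d+)/.exec(v.trim());
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

/** Negative if a < b, positive if a > b, 0 if equal. Unparseable versions sort lowest. */
function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  if (!pa || !pb) return (pa ? 1 : 0) - (pb ? 1 : 0);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

/** Key for one (publisher_did, feature_id) pair; DIDs cannot contain spaces. */
function featureKey(publisherDid: string, featureId: string): string {
  return `${publisherDid} ${featureId}`;
}

/** Map each (publisher_did, feature_id) key to the at_uri that should have is_latest = true. */
function latestByFeature(releases: Release[]): Map<string, string> {
  const best = new Map<string, Release>();
  for (const r of releases) {
    const key = featureKey(r.publisherDid, r.featureId);
    const current = best.get(key);
    if (!current || compareVersions(r.version, current.version) > 0) {
      best.set(key, r);
    }
  }
  const result = new Map<string, string>();
  best.forEach((r, key) => result.set(key, r.atUri));
  return result;
}
```

Comparing versions numerically rather than as strings matters here: a plain string sort would rank `1.9.3` above `1.10.0` and leave `is_latest` on the wrong row.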
