···11+# FTS5
22+33+full-text search in sqlite. powerful but has sharp edges.
44+55+## ambiguous columns in JOINs silently fail
66+77+when joining an FTS5 virtual table with a regular table, **always qualify column names**. the FTS5 table here exposes the same column names as the table it indexes, so unqualified names are ambiguous. sqlite normally rejects such a query with an `ambiguous column name` error, and if your code catches and ignores that error, the query appears to silently return nothing.
88+99+```sql
1010+-- WRONG — did, handle, display_name exist in both tables
1111+-- silently fails or returns wrong data
1212+SELECT did, handle, display_name, avatar_url
1313+FROM actors_fts
1414+JOIN actors ON actors.did = actors_fts.did
1515+WHERE actors_fts MATCH ?
1616+1717+-- RIGHT — qualify every column
1818+SELECT actors.did, actors.handle, actors.display_name, actors.avatar_url
1919+FROM actors_fts
2020+JOIN actors ON actors.did = actors_fts.did
2121+WHERE actors_fts MATCH ?
2222+```
2323+2424+this bug is especially insidious because: (1) the FTS table has data (you can verify with `WHERE did = ?`), (2) the MATCH query parses fine, (3) queries that don't touch the FTS table (like `LIKE` prefix) still work, so it looks like only certain searches are broken. the actual cause is the JOIN failing on column resolution.
2525+2626+discovered in typeahead ingester — display_name search appeared broken for days. handle-prefix search worked fine (doesn't use FTS), but FTS-based display_name search returned empty. root cause: unqualified column names in the FTS JOIN.
2727+2828+## sanitize user input for MATCH queries
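2929+3030+FTS5 MATCH has its own query syntax: `"`, `*`, `+`, `(`, `)`, `^` are operators, and user input containing them can cause query errors. strip everything except letters, digits, whitespace, `.` and `-` before building MATCH queries. the surviving `.` and `-` are not valid in bare FTS5 tokens, so always wrap the sanitized term in double quotes when building the query.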
3131+3232+```js
3333+// js/ts
3434+function sanitize(q) {
3535+ return q.replace(/[^\p{L}\p{N}\s.-]/gu, "").trim();
3636+}
3737+```
3838+3939+then build the MATCH query:
4040+```
4141+"sanitized term"* -- phrase prefix search
4242+```
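a minimal sketch of how the two steps might be combined. `buildMatchQuery` is a hypothetical helper name, not part of any library. it repeats the `sanitize` function so the block is self-contained, and returns `null` when nothing searchable survives so the caller can skip the query instead of sending an empty phrase.

```javascript
// Same sanitizer as above, repeated so the sketch is self-contained.
function sanitize(q) {
  return q.replace(/[^\p{L}\p{N}\s.-]/gu, "").trim();
}

// Hypothetical helper: turn raw user input into an FTS5 phrase-prefix query.
function buildMatchQuery(raw) {
  // Collapse runs of whitespace so the phrase has single spaces between tokens.
  const term = sanitize(raw).replace(/\s+/g, " ");
  if (term === "") return null; // nothing left to search for
  // Double quotes make the term a phrase; the trailing * makes it a prefix match.
  return `"${term}"*`;
}
```

the returned string is still a value: pass it as the bound parameter for `MATCH ?` rather than splicing it into the SQL text.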
4343+4444+## UNINDEXED columns
4545+4646+columns marked `UNINDEXED` are stored in the FTS table but not indexed for MATCH. useful for carrying IDs through without bloating the index:
4747+4848+```sql
4949+CREATE VIRTUAL TABLE actors_fts USING fts5(
5050+ did UNINDEXED, -- stored but not searchable via MATCH
5151+ handle, -- indexed
5252+ display_name, -- indexed
5353+ tokenize='unicode61 remove_diacritics 2'
5454+);
5555+```
5656+5757+you can still `SELECT did FROM actors_fts WHERE did = ?` (exact lookup), but `MATCH` won't search the `did` column.
5858+5959+## ALTER TABLE RENAME on FTS5
6060+6161+FTS5 virtual tables use shadow tables (`_content`, `_data`, `_idx`, `_docsize`, `_config`). `ALTER TABLE RENAME` renames the virtual table, and the shadow tables are supposed to be renamed along with it. in practice this mostly works, but to avoid any risk, create the FTS table with its final name instead of renaming it.
6262+6363+```sql
6464+-- safer: create with final name, populate from already-swapped source
6565+ALTER TABLE actors_stage RENAME TO actors; -- swap regular table first
6666+CREATE VIRTUAL TABLE actors_fts USING fts5(...); -- create FTS with final name
6767+INSERT INTO actors_fts (did, handle, display_name)
6868+ SELECT did, handle, display_name FROM actors WHERE handle != '';
6969+```
+43
languages/ziglang/0.15/database.md
···267267}
268268```
269269270270+### dynamic filters: bind values, not strings
271271+272272+dynamic query shape is sometimes unavoidable for API filters (`key.any_`, `level.ge_`, optional timestamp bounds, etc.). keep the dynamic part limited to SQL fragments chosen by the program, and bind every request-derived value.
273273+274274+the trap is building a list of "bindings" and then interpolating them into SQL later. even if you escape single quotes, the code now has two SQL paths: normal tuple binding for fixed queries and manual string construction for dynamic queries. it is harder to audit, and every new filter repeats the escape discipline.
275275+276276+prefer a tiny dynamic binding abstraction:
277277+278278+```zig
279279+pub const BoundValue = union(enum) {
280280+ text: []const u8,
281281+ int: i64,
282282+ null,
283283+};
284284+285285+pub fn queryBound(db: *Backend, sql: []const u8, args: []const BoundValue) !Rows {
286286+ const stmt = try sqlite_conn.prepare(sql);
287287+ errdefer stmt.deinit();
288288+289289+ for (args, 0..) |arg, i| switch (arg) {
290290+ .text => |v| try stmt.bindValue(v, i),
291291+ .int => |v| try stmt.bindValue(v, i),
292292+ .null => try stmt.bindValue(null, i),
293293+ };
294294+295295+ return Rows{ .backend = .{ .sqlite = .{ .stmt = stmt, .err = null } } };
296296+}
297297+```
298298+299299+then build filter SQL with placeholders:
300300+301301+```zig
302302+try where.appendSlice(alloc, " AND flow_run_id = ?");
303303+try bindings.append(alloc, .{ .text = flow_run_id });
304304+305305+try where.appendSlice(alloc, " AND level >= ?");
306306+try bindings.append(alloc, .{ .int = level });
307307+```
308308+309309+this keeps the "one obvious way" invariant: values are values, not SQL text. dynamic column names, sort clauses, and operators should still come from fixed enums or code paths, never directly from request strings.
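the same invariant, sketched in JavaScript for illustration only (the notes above are about Zig). the `logs` table, the filter names, and `buildLogQuery` are all hypothetical. the point is that request values only ever land in `bindings`, and a request-supplied sort key can only select from a fixed allow-list.

```javascript
// Fixed allow-list: request strings select a column, they never become SQL.
const SORT_COLUMNS = new Map([
  ["timestamp", "timestamp"],
  ["level", "level"],
]);

// Hypothetical filter builder for an assumed `logs` table.
function buildLogQuery(filters) {
  const where = [];
  const bindings = [];

  if (filters.flowRunId !== undefined) {
    where.push("flow_run_id = ?"); // SQL fragment chosen by the program
    bindings.push(filters.flowRunId); // request value stays a value
  }
  if (filters.minLevel !== undefined) {
    where.push("level >= ?");
    bindings.push(filters.minLevel);
  }

  // Unknown sort keys fall back to a default instead of reaching the SQL text.
  const sortColumn = SORT_COLUMNS.get(filters.sort) ?? "timestamp";
  const whereSql = where.length > 0 ? ` WHERE ${where.join(" AND ")}` : "";
  return {
    sql: `SELECT * FROM logs${whereSql} ORDER BY ${sortColumn}`,
    bindings,
  };
}
```

a hostile `sort` value such as `"level; DROP TABLE logs"` is simply not in the map, so it is ignored rather than escaped.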
310310+311311+see: [prefect-server dynamic query binding](https://tangled.sh/@zzstoatzz.io/prefect-server/tree/main/src/db/backend.zig)
312312+270313### connection pooling
271314272315pg.zig has built-in pooling:
+46-8
languages/ziglang/0.15/io.md
···40404141if (result.status != .ok) return error.FetchFailed;
42424343-const response = aw.toArrayList().items; // the response body
4343+const response = aw.written(); // borrow the response body
4444```
45454646the allocating writer grows as needed to hold whatever the server sends back.
4747+4848+**WARNING: `toArrayList()` transfers ownership** — after calling it, `deinit()`
4949+frees nothing (it resets the internal buffer to empty). this is a silent memory
5050+leak when used with `defer deinit()`. use `written()` instead to borrow the data
5151+while `deinit()` retains ownership and frees properly. this bug has bitten us
5252+twice: once in zlay (fixed in zat v0.2.14, commit `819dffe`) and again in the
5353+typeahead ingester. both times it caused OOM on long-running processes — ~80KB
5454+leaked per HTTP call, exhausting 256MB in ~25 minutes.
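as a rough sanity check on those numbers, here is the arithmetic spelled out. this assumes the whole 256MB budget is consumed by the leak (no baseline usage), so the implied request rate is only an estimate derived from the figures above, not a measured value.

```javascript
// Figures quoted in the warning above.
const leakPerCallKb = 80;
const memoryLimitKb = 256 * 1024;
const minutesToOom = 25;

// Number of leaking calls needed to fill the budget.
const callsToOom = memoryLimitKb / leakPerCallKb;

// Request rate implied by reaching that count in the quoted time.
const impliedCallsPerSecond = callsToOom / (minutesToOom * 60);
```

that works out to roughly 3,300 calls, or a little over two HTTP calls per second, which is a plausible rate for a long-running ingester.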
47554856see: [find-bufo/bot/src/main.zig#L196](https://tangled.sh/@zzstoatzz.io/find-bufo/tree/main/bot/src/main.zig#L196)
4957···78867987the high-level apis handle this for you. `http.Server`'s `request.respond()` flushes internally. `http.Client` flushes when the request completes. you only need manual flushes when working with raw streams or tls directly.
80888181-## gzip decompression bug (0.15.x only)
8989+## gzip decompression — force identity on the low-level API
9090+9191+two separate issues at different layers. both want the same workaround.
9292+9393+### 0.15.x panic
9494+9595+http.Client panics when decompressing certain gzip responses on x86_64-linux. the deflate decompressor sets up a Writer with `unreachableRebase` but can hit a code path that calls `rebase` when the buffer fills. fixed in 0.16.
9696+9797+### 0.16 low-level API does not auto-decompress
82988383-http.Client panics when decompressing certain gzip responses on x86_64-linux. the deflate decompressor sets up a Writer with `unreachableRebase` but can hit a code path that calls `rebase` when the buffer fills.
9999+`client.fetch(...)` handles `Content-Encoding` transparently. `client.request(...) + response.reader(&.{}) + streamRemaining` does **not** — you get the raw gzip bytes and any downstream parser chokes. symptom: `parseFromSlice` returns `SyntaxError`, response body starts with `1f 8b`.
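the `1f 8b` symptom is cheap to test for before handing a body to a parser. a minimal sketch in JavaScript (for illustration only; `looksLikeGzip` is a hypothetical helper, not a Zig or stdlib API):

```javascript
// Returns true when the buffer starts with the gzip magic bytes 1f 8b.
function looksLikeGzip(bytes) {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}
```

a check like this turns a confusing `SyntaxError` from the JSON parser into an explicit "body is still compressed" diagnostic.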
841008585-**workaround:**
101101+### workaround (works for both)
102102+103103+use the **typed** `headers.accept_encoding` slot — not `extra_headers`:
104104+86105```zig
8787-_ = try client.fetch(.{
8888- .location = .{ .url = url },
8989- .response_writer = &aw.writer,
106106+// WRONG: extra_headers is additive, zig still sends its default
107107+// "accept-encoding: gzip, deflate, zstd" alongside yours,
108108+// server happily picks gzip
109109+var req = try client.request(.POST, uri, .{
110110+ .extra_headers = &.{
111111+ .{ .name = "Accept-Encoding", .value = "identity" },
112112+ },
113113+});
114114+115115+// RIGHT: typed slot replaces the client's default
116116+var req = try client.request(.POST, uri, .{
90117 .headers = .{ .accept_encoding = .{ .override = "identity" } },
91118});
92119```
931209494-fixed in 0.16. see: [zat/xrpc.zig](https://tangled.sh/zzstoatzz.io/zat/tree/main/src/internal/xrpc.zig#L88)
121121+verified empirically against a real PDS (2026-04-09):
122122+123123+| mode | wire accept-encoding | response content-encoding | result |
124124+|------|---------------------|---------------------------|--------|
125125+| `extra_headers` alone | `gzip, deflate, zstd\r\naccept-encoding: identity` | gzip | SyntaxError |
126126+| `headers.accept_encoding = .override("identity")` | `identity` | (none) | parses ok |
127127+128128+### alternative: use fetch()
129129+130130+if you don't need to hand-build the request (no DPoP, no streaming body), `client.fetch()` auto-decompresses and you don't have to think about any of this. use it when you can.
131131+132132+see: [zat/xrpc.zig](https://tangled.sh/zzstoatzz.io/zat/tree/main/src/internal/xrpc.zig#L32), [embed-on-pds/backend/src/oauth.zig pdsAuthedRequest](https://tangled.sh/zzstoatzz.io/embed-on-pds)
+20-1
languages/ziglang/0.16/io/README.md
···3535- mutexes, futexes, events, and conditions (`Io.Mutex`, `Io.Condition`, `Io.Event`)
3636- memory mapped files
37373838+## timers and retry loops
3939+4040+Retry loops should sleep through the same `std.Io` value that performed the
4141+network request:
4242+4343+```zig
4444+try io.sleep(std.Io.Duration.fromMilliseconds(delay_ms), .awake);
4545+```
4646+4747+This makes the delay cancellation-aware and keeps the code backend-agnostic.
4848+Use `.awake` for monotonic elapsed retry delays; use `.real` only when the
4949+deadline is tied to wall-clock time. If a protocol exposes an absolute Unix
5050+timestamp, convert it deliberately instead of mixing wall-clock and elapsed
5151+duration logic.
5252+5353+`std.Io` also provides randomness. Retry jitter can use `std.Random.IoSource`
5454+so the policy stays on the caller's I/O backend instead of reaching for a
5555+separate global source.
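
The `delay_ms` value itself has to come from some retry policy. A minimal sketch of one common choice, exponential backoff with full jitter, written in JavaScript for brevity. The policy and the name `retryDelayMs` are assumptions for illustration; the notes above do not prescribe a specific policy. The random source is injected, mirroring the idea of taking randomness from the caller's I/O backend.

```javascript
// Exponential backoff with full jitter.
// attempt: 0-based retry count; random: function returning a value in [0, 1).
function retryDelayMs(attempt, baseMs, capMs, random = Math.random) {
  // Grow the ceiling exponentially, but never beyond the cap.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // Pick a uniformly random delay below the ceiling.
  return Math.floor(random() * ceiling);
}
```

Injecting `random` keeps the policy deterministic under test and avoids a hidden global source.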
5656+3857## backends
39584059- `Io.Threaded` — thread-based, always available, production default
···4463 - BSD: `Io.Kqueue`
4564 - unsupported platforms: `void`
4665 - uses userspace stack switching (fibers/green threads)
4747- - currently experimental — known performance issues to diagnose
6666+ - currently experimental — known bugs (see [patterns.md](./patterns.md#evented-production-experience))
4867- WASM: fiber-based backends can't work (no stack switching). stackless coroutines planned as future compiler feature.
49685069backend selection:
+95-1
languages/ziglang/0.16/io/patterns.md
···2626}
2727```
28282929-production: use `Io.Threaded` until Evented is stable. the code is identical — just swap the init.
2929+production: `Io.Evented` has known bugs as of `0.16.0-dev.3059` — see [below](#evented-production-experience). the code is identical between backends — just swap the init.
30303131## Threaded InitOptions
3232···160160```
161161162162`net.Stream` no longer has direct `read`/`writeAll`. use `Stream.Reader`/`Stream.Writer`.
163163+164164+## Evented production experience
165165+166166+field notes from running an AT Protocol relay (~2,800 PDS connections) on
167167+`Io.Evented` with `0.16.0-dev.3059`, kernel 6.8.0-101-generic.
168168+169169+### fiber contextSwitch GPF under ReleaseSafe
170170+171171+`Io.Evented` fibers crash immediately under ReleaseSafe on x86_64. the GPF is
172172+in `std.Io.fiber.contextSwitch` — the inline asm that saves/restores
173173+rsp/rbp/rip. the optimizer under ReleaseSafe arranges the code differently than
174174+ReleaseFast, causing the restored instruction pointer to fault.
175175+176176+```
177177+General protection exception (no address available)
178178+lib/std/Io/fiber.zig:30 in contextSwitch
179179+lib/std/Io/Uring.zig:1142 in mainIdle
180180+```
181181+182182+**consequence**: Evented currently requires ReleaseFast, which strips all safety
183183+checks. any bounds error, null dereference, or use-after-free becomes silent
184184+memory corruption instead of a clean panic with stack trace.
185185+186186+**status**: zig stdlib bug. no workaround other than ReleaseFast. a minimal
187187+repro (fiber that returns without yielding) triggers it on the first context
188188+switch.
189189+190190+### cross-backend bridging (Evented fibers ↔ Threaded workers)
191191+192192+the Io interface is backend-agnostic, but **you cannot mix execution contexts**.
193193+Evented fibers cannot safely lock a Threaded mutex — the scheduler accesses
194194+thread-local state that doesn't exist in the fiber context.
195195+196196+**pattern**: bridge with a lock-free MPSC queue using atomics:
197197+198198+```
199199+[Evented fibers] --atomics→ [ring buffer] --wake→ [Threaded worker pool]
200200+```
201201+202202+Evented subscriber fibers enqueue work items via atomic CAS. a bounded set of
203203+Threaded workers dequeue and execute (e.g., postgres queries). no mutex
204204+crossing between backends.
205205+206206+this is the "DbRequestQueue" pattern — decouples the hot networking path
207207+(Evented) from blocking I/O (database) that can't run in fibers.
208208+209209+### safety checks matter more under Evented
210210+211211+under Threaded/ReleaseSafe, a bounds error panics with a stack trace pointing
212212+to the exact line. under Evented/ReleaseFast (forced by the GPF bug), the same
213213+error silently corrupts memory and manifests as a SIGSEGV minutes or hours
214214+later with no useful diagnostic.
215215+216216+example: a websocket library assumed `\r\n` always arrives in a single TCP
217217+read. when TCP splits mid-CRLF, `line_start` advances past `pos` and the
218218+next `buf[line_start..pos]` slice has start > end. under ReleaseSafe this
219219+is an immediate panic:
220220+221221+```
222222+thread 543 panic: start index 1370 is larger than end index 1369
223223+websocket.zig/src/client/client.zig:766
224224+```
225225+226226+under ReleaseFast: silent corruption → SIGSEGV every 30-90 min across ~2,800
227227+connections. took switching back to Threaded/ReleaseSafe to get the stack trace
228228+that identified the real bug.
229229+230230+**lesson**: when forced into ReleaseFast by the fiber GPF, you lose the single
231231+most valuable debugging tool zig provides. any bug that would be trivially
232232+caught by bounds checking becomes a production mystery.
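
the underlying parsing mistake is language-independent. a minimal sketch of the defensive pattern in JavaScript: buffer incoming chunks and only consume a line once the full `\r\n` has arrived. `CrlfLineBuffer` is a hypothetical name, and this is not the websocket library's actual fix.

```javascript
// Accumulates chunks and yields only complete CRLF-terminated lines.
class CrlfLineBuffer {
  constructor() {
    this.pending = ""; // bytes received but not yet part of a complete line
  }

  // Feed one chunk; returns the lines completed by it (without the CRLF).
  push(chunk) {
    this.pending += chunk;
    const lines = [];
    let idx;
    // A lone "\r" at the end of a chunk never matches, so it simply waits
    // in `pending` until the "\n" arrives in a later chunk.
    while ((idx = this.pending.indexOf("\r\n")) !== -1) {
      lines.push(this.pending.slice(0, idx));
      this.pending = this.pending.slice(idx + 2);
    }
    return lines;
  }
}
```

because the terminator is only ever matched as a whole, a split between `\r` and `\n` cannot advance the line start past the read position.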
233233+234234+### thread count: Evented vs Threaded
235235+236236+| backend | OS threads | subscriber tasks | RSS |
237237+|---------|-----------|-----------------|-----|
238238+| Threaded (ReleaseSafe) | ~2,830 | ~2,830 | ~1.9 GiB |
239239+| Evented (ReleaseFast) | ~47 | ~2,830 | ~1.2 GiB |
240240+241241+Evented runs the same ~2,800 subscriber tasks on ~47 OS threads (bounded
242242+worker pool + io_uring event loop). RSS is lower partly due to fewer thread
243243+stacks and partly due to ReleaseFast stripping safety metadata.
244244+245245+### uring networking patch
246246+247247+`Io.Uring` ships with networking functions stubbed out as `*Unavailable`
248248+(return `error.NetworkDown`). to use Evented for real networking, you need to
249249+patch `Uring.zig` to implement `netListenIp`, `netAccept`, `netConnectIp`,
250250+`netSend`, `netRead`, `netWrite` using io_uring opcodes (ACCEPT, CONNECT,
251251+SENDMSG, READV, etc.).
252252+253253+note: `bind` and `listen` use sync syscalls because `IORING_OP_BIND` /
254254+`IORING_OP_LISTEN` require kernel 6.11+. DNS resolution (`netLookup`) is
255255+also not patched — subscribers resolve hostnames through a Threaded `pool_io`
256256+fallback.
+12-2
languages/ziglang/0.16/io/synchronization.md
···31313232replaces `std.Thread.Mutex` from 0.15.
33333434-### cross-context usage
3434+### cross-context usage — CRITICAL CONSTRAINT
3535+3636+`Io.Mutex` is futex-based and works from any context **within the same Io runtime**. but it **CANNOT be shared across different Io types** (Threaded vs Evented):
35373636-`Io.Mutex` is futex-based and works from both `std.Thread` workers and Io tasks. if you have a data structure accessed from both explicit threads (e.g., CPU worker pool) and Io tasks (e.g., subscriber fibers), `Io.Mutex` is the correct choice — it integrates with the scheduler in both contexts.
3838+- **Threaded futex on Evented fiber** → blocks the entire Uring OS thread. that thread's io_uring instance can't process CQEs → **deadlock**. other fibers on that thread (including the main fiber doing CA bundle loading, accept loops, etc.) are permanently stuck.
3939+- **Evented futex on plain `std.Thread`** → `Thread.current()` is a threadlocal only set on Uring-managed threads. on a plain thread it's null. in ReleaseFast, `self.?` on null silently gives NULL pointer → **SIGSEGV** at struct field offsets (0x28, 0x30, 0x38 in our case — `ready_queue`, `free_queue`, `io_uring` fields of the Thread struct).
4040+4141+**rule**: all callers of a given `Io.Mutex` must pass the **same Io instance** (or at least the same Io type). if you have a data structure accessed from both a Threaded worker pool and Evented fibers, you must either:
4242+1. run the shared structure entirely on one Io type (e.g., all on pool_io/Threaded)
4343+2. use raw atomics (no Io.Mutex) for the cross-boundary synchronization
4444+3. use an MPSC queue with atomic CAS (no futex involvement)
4545+4646+this was the root cause of the zlay relay SIGSEGV (frame worker threads on pool_io/Threaded calling `Io.Mutex.lockUncancelable` with Evented io on the Resyncer) and the subsequent deadlock (first fix attempt mixed Threaded futex with Evented fiber). see zlay commits 6674812, 439c678.
37473848## Io.Condition
3949
···11+# bluesky notifications
22+33+notes from building `noti`, a bluesky notification manager that compresses unread notifications into a small set of feed cards instead of rendering the raw notification list.
44+55+this is not a general bluesky client writeup. it is a record of the parts of the notifications API and product behavior that turned out to matter in practice.
66+77+## what the API seems to want you to do
88+99+bluesky notifications are not designed around arbitrary per-notification state mutation.
1010+1111+- unread state is a cursor
1212+- activity subscriptions are readable and writable
1313+- actor mutes are readable and writable
1414+- thread mute is writable, but not meaningfully enumerable
1515+1616+that split matters a lot for product design.
1717+1818+## the core read path
1919+2020+the useful starting point is:
2121+2222+- `app.bsky.notification.listNotifications`
2323+- `app.bsky.notification.getUnreadCount`
2424+2525+for a useful notification manager, the raw notification object is not enough by itself. the fields that ended up mattering most were:
2626+2727+- `reason`
2828+- `reasonSubject`
2929+- actor identity
3030+- post text
3131+- whether the record is a reply
3232+- the reply root / parent relationship
3333+3434+in `noti`, the most useful normalized relationship signals were:
3535+3636+- `reason_subject`
3737+- `reply_root_uri`
3838+- `reply_parent_uri`
3939+- `thread_key` (an alias for reply root)
4040+4141+this was enough to do useful grouping such as:
4242+4343+- likes on the same post
4444+- replies inside the same thread
4545+- mentions plus follow-on subscribed posts inside one discussion
4646+4747+## what worked for grouping
4848+4949+the best relationship signals were not fancy.
5050+5151+1. same `reasonSubject`
5252+2. same `reply_root_uri`
5353+3. same actor in a short time window
5454+4. same obvious canonical URL
5555+5656+that was enough to get from “annotated bluesky notification list” to “a few situation reports.”
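
a minimal sketch of grouping on the first two signals. the notification shape (`reason_subject`, `reply_root_uri`, `actor_did`) is an assumed normalized form, `actor_did` in particular is a hypothetical field name, and preferring the reply root over the subject is an assumption of this sketch. the time-window and canonical-URL signals are left out.

```javascript
// Pick a grouping key from the strongest relationship signal available.
function groupKey(n) {
  if (n.reply_root_uri) return `thread:${n.reply_root_uri}`;
  if (n.reason_subject) return `subject:${n.reason_subject}`;
  return `actor:${n.actor_did}`; // fallback: group by who triggered it
}

// Bucket notifications by key, preserving first-seen order of groups.
function groupNotifications(notifications) {
  const groups = new Map();
  for (const n of notifications) {
    const key = groupKey(n);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(n);
  }
  return [...groups].map(([key, items]) => ({ key, items }));
}
```

each resulting group is a candidate card; whether to render it concretely or compress it further depends on the unread volume.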
5757+5858+the main lesson: if the unread volume is low, be concrete. if the unread volume is high, aggressively compress.
5959+6060+for small unread counts, over-abstraction is worse than the native bluesky UI.
6161+6262+## actions: what is actually supportable
6363+6464+the mistake is to let the model invent actions. the action set should be the finite set of mutations the API actually supports and that the app can verify afterward.
6565+6666+the safe actions discovered in `noti` were:
6767+6868+- mark all read
6969+- mute account
7070+- unsubscribe from posts and replies for a subscribed actor
7171+7272+these map cleanly to real bluesky state:
7373+7474+- `app.bsky.notification.updateSeen`
7575+- `app.bsky.graph.muteActor`
7676+- `app.bsky.notification.listActivitySubscriptions`
7777+- `app.bsky.notification.putActivitySubscription`
7878+7979+the important omission was:
8080+8181+- `app.bsky.graph.muteThread`
8282+8383+the mutation exists, but there does not appear to be a corresponding `getMutedThreads` / `listMutedThreads` style endpoint. without a real read path, the UI cannot reliably know whether the action is still available or already applied. that made it a bad fit for a demo that is supposed to manage notifications honestly.
8484+8585+## updateSeen is a cursor, not a subset mutation
8686+8787+this was the biggest product constraint.
8888+8989+`app.bsky.notification.updateSeen` advances the seen cursor. it does not let you mark an arbitrary subset of notifications read.
9090+9191+consequences:
9292+9393+- “mark all read” is honest
9494+- card-level dismiss is not, unless the entire product becomes a sequential “work through the pile” flow
9595+- local-only dismiss state is possible, but it creates a second unread model that diverges from bluesky
9696+9797+for `noti`, the correct move was to keep bluesky as the source of truth and avoid inventing a shadow unread ledger.
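
a simplified model of why subset dismissal does not fit. the field names `indexedAt` and `seenAt` are assumed to mirror the lexicon (check it before relying on them), and `unreadAfter` is a hypothetical helper.

```javascript
// Simplified model: a notification is unread when it was indexed
// after the seen cursor.
function unreadAfter(notifications, seenAt) {
  const cursor = Date.parse(seenAt);
  return notifications.filter((n) => Date.parse(n.indexedAt) > cursor);
}
```

to "dismiss" only the middle of three notifications you would have to advance the cursor to its timestamp, which also marks everything older as read. there is no cursor value that selects just one item.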
9898+9999+## grouped links need to be honest
100100+101101+one subtle edge case:
102102+103103+- if a grouped card spans multiple posts
104104+- and there is no genuinely shared canonical destination
105105+- the card should not show a singular `open post` or `open thread` link
106106+107107+the safe rule is:
108108+109109+- only render a top-right link if all backing notifications converge on one shared target
110110+- otherwise leave the card unlinked
111111+112112+this matters a lot for “2 likes on your posts” style cards. a plural card with a singular CTA is misleading.
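
the rule can be sketched as a small pure function. `target_uri` is a hypothetical field standing for whatever destination each backing notification resolves to.

```javascript
// Returns the shared destination if every notification points at the same
// target, otherwise null (meaning: render the card without a link).
function sharedTarget(notifications) {
  if (notifications.length === 0) return null;
  const first = notifications[0].target_uri;
  if (!first) return null;
  return notifications.every((n) => n.target_uri === first) ? first : null;
}
```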
113113+114114+## activity subscriptions are more useful than they first appear
115115+116116+subscribed-post notifications made it clear that “mute thread” is often the wrong action anyway.
117117+118118+if a notification exists because the user subscribed to an account’s posts and replies, the better action is usually:
119119+120120+- unsubscribe from posts and replies
121121+122122+that is both more legible and more stateful, because the subscription state is actually readable from bluesky.
123123+124124+## oauth lessons
125125+126126+the most important oauth lesson was not protocol mechanics, but scope discipline.
127127+128128+1. `atproto` by itself is effectively sign-in / account identification, not the full permission story.
129129+2. `transition:generic` is a blunt tool. it can be useful for fast prototyping, but it is not the right long-term shape for a narrowly scoped app.
130130+3. granular scopes and permission sets are the right direction when the app only needs a small set of capabilities.
131131+4. progressive scope upgrade is real: sometimes the right product flow is “start narrow, ask for more later.”
132132+133133+the useful references here were:
134134+135135+- the atproto scopes guide
136136+- the permission sets guide
137137+- the “requesting scopes progressively” docs PR merged into `atproto-website`
138138+- real applications like `plyr.fm`, `pds.ls`, and `blento`
139139+140140+the practical lesson for `noti`: oauth is feasible, but it changes the app from a single-user polling tool into a multi-tenant session-backed product. that is a valid extension, but it is not “just auth.”
141141+142142+## what we learned from the failed sqlite/oauth detour
143143+144144+sqlite itself was not the problem.
145145+146146+sqlite would be fine for:
147147+148148+- preferences
149149+- sessions
150150+- lightweight recurrence memory
151151+152152+but for the demo, sqlite only paid for itself if it was carrying genuinely useful product state. once oauth and recurrence windows were cut, json-on-disk preferences were simpler and more honest.
153153+154154+the broader lesson: do not add substrate just because it seems like the grown-up architecture. add it when it clearly improves the user experience you are demoing.
155155+156156+## current design heuristics for notification products
157157+158158+- keep bluesky as the source of truth for unread state
159159+- compress notifications into a few cards, but do not lose the underlying relationship signals
160160+- prefer actions that have a readable post-mutation state
161161+- be more concrete at low volume and more abstract at high volume
162162+- never let card copy imply a link or action the app cannot defend
163163+164164+## references
165165+166166+- atproto auth and scopes docs: https://atproto.com/guides/scopes
167167+- atproto permission sets docs: https://atproto.com/guides/permission-sets
168168+- merged docs PR on progressive scope upgrades: https://github.com/bluesky-social/atproto-website/pull/618
169169+- `noti` exploration history in `noti-bak`
+2
protocols/atproto/auth.md
···107107- applications can't lock in users by controlling auth
108108- the same identity works across all atmospheric applications
109109- granular scopes enable minimal-permission applications
110110+111111+for more operational notes on scope choice, permission sets, progressive scope upgrades, and when `transition:generic` is or is not appropriate, see [`./oauth/README.md`](./oauth/README.md).
+17
protocols/atproto/firehose.md
···138138139139from [follower-weight/tap_consumer.py](https://github.com/zzstoatzz/follower-weight)
140140141141+## required fields in #commit frames
142142+143143+per the `subscribeRepos` lexicon, `#commit` frames have several required
144144+fields. notable gotcha:
145145+146146+**`tooBig`** (boolean, required, deprecated): always `false` in Sync 1.1.
147147+both indigo (Go) and rsky (Rust) always serialize it. passthrough relays
148148+that do generic CBOR resequencing (decode → modify seq → re-encode) can
149149+drop it if the upstream PDS omits it. strict consumers like
150150+[hydrant](https://github.com/ptrott/hydrant) will reject frames with
151151+missing `tooBig` since it's in the `required` array.
152152+153153+**`since`** (string, nullable): must be a valid TID or null. some PDSes
154154+send `since: ""` (empty string) which passes through both indigo and zlay.
155155+strict parsers break on the empty string — they expect a 13-char TID or
156156+null. see [indigo#1357](https://github.com/bluesky-social/indigo/issues/1357).
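
a minimal sketch of the two checks a strict consumer might apply, in JavaScript. `checkCommitFrame` is a hypothetical helper operating on an already-decoded frame object, and the TID pattern reflects my understanding of the TID syntax (13 base32-sortable characters); verify it against the spec before relying on it.

```javascript
// 13-character base32-sortable TID.
const TID_RE = /^[234567abcdefghij][234567abcdefghijklmnopqrstuvwxyz]{12}$/;

// Returns a list of problems found in a decoded #commit frame.
function checkCommitFrame(frame) {
  const problems = [];
  // tooBig is deprecated but still listed as required.
  if (typeof frame.tooBig !== "boolean") {
    problems.push("tooBig missing or not boolean");
  }
  // since may be null; when a value is present it must be a valid TID.
  // (Whether the key may be absent entirely is not checked here.)
  if (frame.since != null &&
      !(typeof frame.since === "string" && TID_RE.test(frame.since))) {
    problems.push("since must be a TID or null");
  }
  return problems;
}
```

a passthrough relay could run a check like this after re-encoding to catch dropped `tooBig` fields or an upstream `since: ""` before strict consumers do.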
157157+141158## cursor management
142159143160firehose supports resumption via cursor (sequence number):
+15-2
protocols/atproto/lexicons.md
···45454646## record keys
47474848-- **tid**: timestamp-based ID. for records where users have many (tracks, likes, posts).
4949-- **literal:self**: singleton. for records where users have one (profile).
4848+the `"key"` field in a record lexicon definition declares what kind of rkey is valid. four types:
4949+5050+| type | meaning | example use |
5151+|------|---------|-------------|
5252+| `tid` | timestamp-based ID, auto-generated | posts, likes, follows — anything users create many of |
5353+| `literal:<value>` | exactly one record per user with this fixed key | `app.bsky.actor.profile` uses `literal:self` |
5454+| `any` | any valid string — enables semantic keys | domain names, integers, application-specific IDs |
5555+| `nsid` | must be a valid NSID | less common, used for meta-records keyed by schema |
50565157```json
5258"key": "tid" // generates 3jui7akfj2k2a
5359"key": "literal:self" // always "self"
6060+"key": "any" // caller picks the rkey
5461```
6262+6363+rkey constraints: 1-512 chars, `A-Za-z0-9` plus `.` `-` `_` `:` `~`. the values `.` and `..` are reserved. case-sensitive but lowercase recommended.
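
a minimal sketch of those constraints as a validator. `isValidRkey` is a hypothetical helper name, and it checks only the syntax rules listed above.

```javascript
// Allowed characters, 1 to 512 of them.
const RKEY_RE = /^[A-Za-z0-9._:~-]{1,512}$/;

// Syntax-only check: length, character set, and the two reserved values.
function isValidRkey(rkey) {
  if (rkey === "." || rkey === "..") return false;
  return RKEY_RE.test(rkey);
}
```

passing this check says nothing about who chose the key or why, so it is not an authorization signal.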
6464+6565+**sidecar pattern**: use the same TID across different collections to link related records. bluesky does this with posts and threadgates — same rkey in `app.bsky.feed.post` and `app.bsky.feed.threadgate` signals they belong together.
6666+6767+**important**: rkeys are user-controlled. never trust them for authorization or assume they follow conventions — hostile accounts can set arbitrary values.
55685669## knownValues
5770
+7
protocols/atproto/oauth/README.md
···11+# oauth
22+33+more detailed notes on AT Protocol OAuth beyond the broad overview in [`../auth.md`](../auth.md).
44+55+## contents
66+77+- [scopes](./scopes.md) - choosing scopes, permission sets, progressive scope upgrades, and when `transition:generic` is the wrong tool
+135
protocols/atproto/oauth/scopes.md
···11+# scopes
22+33+notes on choosing AT Protocol OAuth scopes for real applications.
44+55+this is the more operational companion to [`../auth.md`](../auth.md).
66+77+## the basic distinction
88+99+there are three different levels people tend to mix together:
1010+1111+1. broad protocol scopes like `atproto`
1212+2. granular scopes like `repo:...` or `rpc:...`
1313+3. permission sets exposed as `include:...`
1414+1515+if you do not keep these separate, it becomes very easy to request the wrong thing or to over-scope a small app.
1616+1717+## what `atproto` means in practice
1818+1919+`atproto` is the basic protocol-level sign-in / authorization scope.
2020+2121+it is often the floor, not the whole answer.
2222+2323+for many applications, `atproto` by itself is not the useful permission story. the useful permission story comes from the additional granular scopes or permission sets layered on top.
2424+2525+## granular scopes
2626+2727+granular scopes are the right tool when you know the exact capabilities the app needs.
2828+2929+examples:
3030+3131+- `repo:com.example.collection`
3232+- `blob?accept=image/*`
3333+- `rpc:com.example.doThing?aud=did:web:service.example`
3434+3535+the benefit is obvious: the permission request matches the real behavior of the app.
3636+3737+the downside is also obvious: raw scope strings get noisy fast.
3838+3939+## permission sets
4040+4141+permission sets are the human-facing layer.
4242+4343+they let an app expose a coherent product permission like:
4444+4545+- “Create Bluesky Posts”
4646+- “Full music library access”
4747+4848+instead of making the user reason directly about many low-level scopes.
4949+5050+the technical shape is an `include:...` scope that expands to a published set of underlying permissions.
5151+5252+this is the right direction when:
5353+5454+- the app has a stable product surface
5555+- the app needs multiple underlying scopes
5656+- you want consent screens to be legible
5757+5858+## progressive scope upgrades
5959+6060+this is the most important design lesson from practice.
6161+6262+the right answer to “which scopes should I request?” may change over time for a single user.
6363+6464+the pattern, step by step:
6565+6666+1. start narrow
6767+2. let the user ask for a feature that needs more power
6868+3. send them back through oauth with an expanded scope string
6969+4. replace the old session on callback
7070+7171+this is much better than:
7272+7373+- requesting everything up front
7474+- or freezing the scope set forever
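
a minimal sketch of step 3, computing the expanded scope string. `expandScopes` is a hypothetical helper, and the scope values in the usage are the placeholder examples from this page, not a recommended set. it relies only on OAuth scope strings being space-separated.

```javascript
// Merge newly needed scopes into the currently granted scope string.
function expandScopes(currentScope, neededScopes) {
  const current = currentScope.split(/\s+/).filter(Boolean);
  const merged = [...current];
  for (const s of neededScopes) {
    if (!merged.includes(s)) merged.push(s); // keep order, skip duplicates
  }
  return {
    scope: merged.join(" "),
    // Only send the user back through oauth if something was actually added.
    needsReauth: merged.length > current.length,
  };
}
```

when `needsReauth` is false the existing session already covers the feature, so no extra consent round-trip is needed.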
7575+7676+practical examples:
7777+7878+- `pds.ls`
7979+- `plyr.fm`
8080+8181+this pattern is now documented in the atproto docs PR that added a “requesting scopes progressively” section to the scopes guide:
8282+8383+- https://github.com/bluesky-social/atproto-website/pull/618
8484+8585+## when `transition:generic` is appropriate
8686+8787+`transition:generic` is not the ideal long-term answer for a narrowly scoped app.
8888+8989+it is useful when:
9090+9191+- you are prototyping quickly
9292+- you are porting an older app that effectively assumed app-password-style power
9393+- you need a temporary bridge while figuring out the exact granular scope set
9494+9595+it is a bad fit when:
9696+9797+- the app has a small, knowable capability surface
9898+- you want the consent screen to reflect the actual product
9999+- you intend to keep the scope model stable and explainable
100100+101101+in short:
102102+103103+- good bridge
104104+- bad destination
105105+106106+## how this showed up in `noti`
107107+108108+`noti` was a useful failure case.
109109+110110+the app only needed a narrow slice of functionality, but oauth experimentation went sideways because the scope story was not anchored tightly enough to a known-good pattern.
111111+112112+the lesson was not “oauth is impossible.”
113113+114114+the lesson was:
115115+116116+- do not guess about scope syntax
117117+- copy a working granular-scope pattern from a real app
118118+- keep the permission request legible
119119+- if you need to move fast, use `transition:generic` only as a temporary exploration tool
120120+121121+## practical heuristics
122122+123123+- if the app is tiny and exploratory, start narrow or use app passwords
124124+- if the app is productized and multi-capability, define permission sets
125125+- if the app is growing, design for progressive upgrades from day one
126126+- do not request broad power just because it is easier to implement
127127+128128+## references
129129+130130+- AT Protocol scopes guide: https://atproto.com/guides/scopes
131131+- AT Protocol permission sets guide: https://atproto.com/guides/permission-sets
132132+- progressive upgrades docs PR: https://github.com/bluesky-social/atproto-website/pull/618
133133+- `plyr.fm`
134134+- `pds.ls`
135135+- `blento`
+29-6
protocols/atproto/sync-verification.md
···43434444| field | notes |
4545|---|---|
4646-| `since` | rev of the *preceding* commit (chain link) |
4747-| `prevData` | MST root CID of the preceding commit (chain link) |
4646+| `since` | rev of the *preceding* commit (chain link). TID syntax or null — empty string is invalid (see [indigo#1357](https://github.com/bluesky-social/indigo/issues/1357)) |
4747+| `prevData` | MST root CID of the preceding commit. spec says "semi-optional" but "effectively required for MST inversion" |
4848| `blocks` | CAR slice — only changed blocks, max 2 MB |
4949| `ops` | up to 200 record operations |
5050+| `tooBig` | **deprecated** — producers must set `false`, consumers should ignore |
5151+| `blobs` | **deprecated** — producers must set `[]`, consumers should ignore |
50525153each `repoOp` has:
5254···8587| op touches unchanged subtree not in CAR | stub error (partial tree) |
8688| high-S signature malleability | low-S check rejects it |
87899090+## size limits
9191+9292+- `blocks` field: 2 MB hard limit
9393+- individual record blocks: 1 MB hard limit
9494+- ops per commit: 200 max
9595+- overall WebSocket frame: **5 MB hard limit** (inclusive of all encoding overhead)
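
a rough sketch of checking these limits before doing any expensive verification. the `commit` shape is a hypothetical pre-measured summary, not the output of a real decoder, and treating "MB" as 10^6 bytes is an assumption (check the spec for the exact byte counts).

```js
// Assumption: "MB" is read as 10^6 bytes here.
const MB = 1_000_000;
const LIMITS = {
  frameBytes: 5 * MB,  // whole WebSocket frame, including encoding overhead
  blocksBytes: 2 * MB, // `blocks` CAR slice
  recordBytes: 1 * MB, // any single record block
  maxOps: 200,         // ops per commit
};

// `commit` is a hypothetical summary:
// { frameBytes, blocksBytes, opCount, recordBlockBytes: number[] }
function checkCommitLimits(commit, limits = LIMITS) {
  const problems = [];
  if (commit.frameBytes > limits.frameBytes) problems.push("frame too large");
  if (commit.blocksBytes > limits.blocksBytes) problems.push("blocks too large");
  if (commit.opCount > limits.maxOps) problems.push("too many ops");
  if (commit.recordBlockBytes.some((n) => n > limits.recordBytes)) {
    problems.push("record block too large");
  }
  return problems; // empty array means every limit is respected
}
```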
9696+8897## chain break → resync
89989090-when the chain breaks (mismatched `since`/`prevData`, or a `#sync` event):
9999+the spec defines a state machine for per-repo sync status: `desynchronized → in-progress → synchronized`. when the chain breaks (mismatched `since`/`prevData`, or a `#sync` event):
91100921011. mark the repo as `desynchronized`
9393-2. queue incoming events for this DID (don't drop them)
102102+2. queue incoming events for this DID (don't drop them — process after resync)
941033. fetch the full CAR — **from the upstream relay first** (not the PDS) to avoid thundering herd
9595-4. verify and reconcile state
9696-5. replay queued events
104104+4. walk the CAR tree, diff against local record state (create/update/delete as needed)
105105+5. replay queued events, update status to `synchronized`
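
the per-repo status tracking and queueing can be sketched as a small state holder. all names here are hypothetical; fetching the CAR and diffing against local state are left to the caller, and the sketch does not filter queued events that the fetched CAR already covers.

```js
// Hypothetical per-repo tracker for the three statuses named above.
class RepoSyncState {
  constructor() {
    this.status = "synchronized";
    this.queued = [];
  }

  // Step 1: the chain broke (mismatched since/prevData, or a #sync event).
  chainBroken() {
    this.status = "desynchronized";
  }

  // Step 2: returns true if the caller should apply the event now.
  // While not synchronized, the event is held for replay instead of dropped.
  offer(event) {
    if (this.status === "synchronized") return true;
    this.queued.push(event);
    return false;
  }

  // Call just before fetching the full CAR (step 3).
  beginResync() {
    if (this.status !== "desynchronized") throw new Error("no resync needed");
    this.status = "in-progress";
  }

  // Call if the fetch or reconcile failed, so a later attempt can retry.
  resyncFailed() {
    this.status = "desynchronized";
  }

  // Call after the CAR is reconciled (step 4). Returns the queued events
  // in arrival order for replay (step 5) and marks the repo synchronized.
  finishResync() {
    if (this.status !== "in-progress") throw new Error("resync not started");
    const replay = this.queued;
    this.queued = [];
    this.status = "synchronized";
    return replay;
  }
}
```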
9710698107from the spec:
99108···166175zat has the cryptographic and structural verification. zlay (march 2026) runs chain continuity detection in observation mode — logging breaks and counting them via prometheus, not yet enforcing. `verifyCommitDiff` is wired but behind a config flag; production uses `verifyCommitCar` (signature-only). see [inductive-proof/relay-integration.md](./inductive-proof/relay-integration.md) for details.
167176168177lightrail has the operational scheduling and recovery. collectiondir trusts the upstream relay entirely.
178178+179179+## bsky.network relay rollout (may 2025)
180180+181181+new relay implementation shipped as `relay` (replacing `bigsky`). key changes:
182182+- non-archival — no longer mirrors full repo data
183183+- `#sync` message type supported; `#handle`/`#migration`/`#tombstone` fully removed
184184+- MST inversion validation (behind "lenient mode" flag)
185185+- `listHosts` endpoint; `getRepo` redirects to PDS
186186+187187+two new instances: `relay1.us-west.bsky.network` and `relay1.us-east.bsky.network` (seq starts at 20B to distinguish from bsky.network's ~8.4B). both support `listReposByCollection` via collectiondir sidecar.
188188+189189+rollout plan: lenient mode → point bsky.network at new relay → ratchet to strict per-host (new hosts strict immediately, legacy hosts get time to upgrade).
190190+191191+source: [relay updates blog post](https://atproto.com/blog/relay-updates-sync-v1-1)
169192170193## see also
171194
+76
protocols/atproto/xrpc.md
···11+# XRPC
22+33+XRPC is the HTTP RPC layer used by AT Protocol services. Endpoints live under
44+`/xrpc/{nsid}` and are described by lexicons:
55+66+- query endpoints are `GET` with URL query parameters
77+- procedure endpoints are `POST` with a JSON or binary body
88+- success and failure are both HTTP responses; the body format depends on the
99+ lexicon and endpoint
1010+- endpoint paths are top-level `/xrpc/...` paths, not nested below another
1111+ prefix
1212+1313+## errors
1414+1515+Unsuccessful XRPC responses should be JSON objects with the standard error
1616+envelope:
1717+1818+```json
1919+{"error":"RateLimitExceeded","message":"slow down"}
2020+```
2121+2222+`error` is required by the spec and is the lexicon-defined error name;
2323+`message` is optional and intended for humans. Clients still need to tolerate
2424+non-JSON bodies because load balancers and reverse proxies often produce generic
2525+HTML or text failures.
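
A minimal sketch of tolerant parsing, assuming the caller already has the HTTP status and the raw body text from whatever HTTP client it uses. `parseXrpcError` is a hypothetical name; the raw body is always kept so nothing is lost when the envelope is missing.

```js
// Parse an unsuccessful XRPC response without assuming the body is JSON.
function parseXrpcError(status, bodyText) {
  let parsed = null;
  try {
    parsed = JSON.parse(bodyText);
  } catch {
    // Not JSON: likely an HTML or text page from a proxy or load balancer.
  }
  const isEnvelope =
    parsed !== null &&
    typeof parsed === "object" &&
    !Array.isArray(parsed) &&
    typeof parsed.error === "string";
  return {
    status,
    error: isEnvelope ? parsed.error : null,
    message: isEnvelope && typeof parsed.message === "string" ? parsed.message : null,
    body: bodyText, // raw body preserved for diagnostics
  };
}
```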
2626+2727+SDKs should preserve both layers:
2828+2929+- HTTP status, because retry and auth handling are status-driven
3030+- ATProto error name and message, because callers use these for product logic
3131+- response body, because some services include endpoint-specific diagnostics
3232+- rate-limit headers, because retry scheduling should not require reparsing raw
3333+ HTTP headers in every caller
3434+3535+In Zig, this fits better as a response/result value than as a bare error set.
3636+Error sets are good for transport failures and programmer-facing failure modes,
3737+but they cannot carry owned payloads like status, message, body, and headers.
3838+For zat, the low-level XRPC calls can keep returning raw `Response`, while
3939+checked helpers return a tagged union:
4040+4141+```zig
4242+union(enum) {
4343+ ok: Response,
4444+ err: XrpcError,
4545+}
4646+```
4747+4848+That keeps low-level transport behavior available for callers that want to make
4949+their own policy decisions.
5050+5151+## retries
5252+5353+A conservative retry policy should treat only the usual transient HTTP statuses
5454+as retryable:
5555+5656+- `429 Too Many Requests`
5757+- `500 Internal Server Error`
5858+- `502 Bad Gateway`
5959+- `503 Service Unavailable`
6060+- `504 Gateway Timeout`
6161+6262+Procedure bodies need to be replayable before a client retries them. A `[]const
6363+u8` body is replayable; a streaming reader is not unless the caller explicitly
6464+buffers or rewinds it.
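
The same two conditions can be sketched outside Zig. In this illustration strings and byte arrays stand in for `[]const u8`, and anything else (for example a stream) is treated as non-replayable; `shouldRetry` and `isReplayable` are hypothetical names.

```js
// Statuses from the list above.
const RETRYABLE_STATUSES = new Set([429, 500, 502, 503, 504]);

// A body can be resent only if it is fully in memory (or absent).
function isReplayable(body) {
  return (
    body === undefined ||
    body === null ||
    typeof body === "string" ||
    body instanceof Uint8Array ||
    body instanceof ArrayBuffer
  );
}

// Retry only when the status is transient and the body can be sent again.
function shouldRetry(status, body) {
  return RETRYABLE_STATUSES.has(status) && isReplayable(body);
}
```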
6565+6666+Use `Retry-After` when present. `RateLimit-Reset` is useful too, but it is
6767+usually an absolute Unix timestamp, so using it for sleep requires comparing it
6868+with the caller's real clock and then sleeping for the remaining duration on a
6969+monotonic clock. The XRPC spec recommends limited retries with randomized
7070+exponential backoff; cap all server-directed sleeps so a bad header cannot park
7171+the client forever.
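
A sketch of that delay policy under stated assumptions: `headers` is a plain object with lowercase keys, `RateLimit-Reset` is an absolute Unix timestamp in seconds, and the default base and cap values are arbitrary. `retryDelayMs` is a hypothetical helper, and the fallback uses the "full jitter" variant of exponential backoff.

```js
// Parse a header value as a finite number, or return null.
function toNumber(value) {
  if (typeof value !== "string" || value.trim() === "") return null;
  const n = Number(value);
  return Number.isFinite(n) ? n : null;
}

// Returns how long to wait before the next attempt, in milliseconds.
function retryDelayMs(headers, attempt, opts = {}) {
  const { baseMs = 500, capMs = 60_000, nowMs = Date.now(), random = Math.random } = opts;
  // Every server-directed delay is clamped to [0, capMs].
  const clamp = (ms) => Math.min(Math.max(ms, 0), capMs);

  // Retry-After is either delta-seconds or an HTTP-date.
  const retryAfter = headers["retry-after"];
  const retryAfterSeconds = toNumber(retryAfter);
  if (retryAfterSeconds !== null) return clamp(retryAfterSeconds * 1000);
  if (typeof retryAfter === "string") {
    const dateMs = Date.parse(retryAfter);
    if (!Number.isNaN(dateMs)) return clamp(dateMs - nowMs);
  }

  // Assumption: RateLimit-Reset is an absolute Unix timestamp in seconds.
  const resetSeconds = toNumber(headers["ratelimit-reset"]);
  if (resetSeconds !== null) return clamp(resetSeconds * 1000 - nowMs);

  // No server hint: randomized exponential backoff, capped.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Passing `nowMs` and `random` as options keeps the function deterministic in tests; the caller still decides how many attempts to make.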
7272+7373+## sources
7474+7575+- [HTTP API (XRPC)](https://atproto.com/specs/xrpc)
7676+- [Lexicon](https://atproto.com/specs/lexicon)