about things

wip: atproto (xrpc, oauth, bsky notifications, sync), zig io, sqlite fts5

+737 -20
+69
databases/sqlite/fts5.md
···
+# FTS5
+
+full-text search in sqlite. powerful but has sharp edges.
+
+## ambiguous columns in JOINs silently fail
+
+when joining an FTS5 virtual table with a regular table, **always qualify column names**. FTS5 tables expose the same column names as the content they index, so unqualified names are ambiguous. sqlite may resolve them unpredictably or fail with an `ambiguous column name` error, and if you catch or ignore the error, the query silently returns nothing.
+
+```sql
+-- WRONG: did, handle, display_name exist in both tables
+-- silently fails or returns wrong data
+SELECT did, handle, display_name, avatar_url
+FROM actors_fts
+JOIN actors ON actors.did = actors_fts.did
+WHERE actors_fts MATCH ?;
+
+-- RIGHT: qualify every column
+SELECT actors.did, actors.handle, actors.display_name, actors.avatar_url
+FROM actors_fts
+JOIN actors ON actors.did = actors_fts.did
+WHERE actors_fts MATCH ?;
+```
+
+this bug is especially insidious because: (1) the FTS table has data (you can verify with `WHERE did = ?`), (2) the MATCH query parses fine, (3) queries that don't touch the FTS table (like `LIKE` prefix) still work, so it looks like only certain searches are broken. the actual cause is the JOIN failing on column resolution.
+
+discovered in the typeahead ingester: display_name search appeared broken for days. handle-prefix search worked fine (doesn't use FTS), but FTS-based display_name search returned empty. root cause: unqualified column names in the FTS JOIN.
+
+## sanitize user input for MATCH queries
+
+FTS5 MATCH has its own query syntax: `"`, `*`, `+`, `(`, `)`, `^` are operators. user input containing these can cause query errors. strip everything except letters, digits, whitespace, `.` and `-` before building MATCH queries.
+
+```js
+// js/ts
+function sanitize(q) {
+  return q.replace(/[^\p{L}\p{N}\s.-]/gu, "").trim();
+}
+```
+
+then build the MATCH query:
+```
+"sanitized term"*   -- phrase prefix search
+```
+
+## UNINDEXED columns
+
+columns marked `UNINDEXED` are stored in the FTS table but not indexed for MATCH. useful for carrying IDs through without bloating the index:
+
+```sql
+CREATE VIRTUAL TABLE actors_fts USING fts5(
+    did UNINDEXED,    -- stored but not searchable via MATCH
+    handle,           -- indexed
+    display_name,     -- indexed
+    tokenize='unicode61 remove_diacritics 2'
+);
+```
+
+you can still `SELECT did FROM actors_fts WHERE did = ?` (exact lookup), but `MATCH` won't search the `did` column.
+
+## ALTER TABLE RENAME on FTS5
+
+FTS5 virtual tables use shadow tables (`_content`, `_data`, `_idx`, `_docsize`, `_config`). `ALTER TABLE RENAME` renames the virtual table, and the shadow tables should follow. in practice this mostly works, but to avoid any risk, create the FTS table with its final name instead of renaming.
+
+```sql
+-- safer: create with final name, populate from already-swapped source
+ALTER TABLE actors_stage RENAME TO actors;       -- swap regular table first
+CREATE VIRTUAL TABLE actors_fts USING fts5(...); -- create FTS with final name
+INSERT INTO actors_fts (did, handle, display_name)
+    SELECT did, handle, display_name FROM actors WHERE handle != '';
+```
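The sanitize-then-quote step from the MATCH section can be sketched end to end as below. This is a minimal sketch, not the ingester's actual code: `buildMatchQuery` is a hypothetical helper name, and returning `null` for input with nothing searchable left is an assumption about how a caller would want to skip the FTS query.

```javascript
// Strip FTS5 operator characters, keeping letters, digits, whitespace, "." and "-".
function sanitize(q) {
  return q.replace(/[^\p{L}\p{N}\s.-]/gu, "").trim();
}

// Hypothetical helper: build a phrase-prefix MATCH expression from raw user input.
// Returns null when no letter or digit survives sanitizing, so the caller can
// skip the FTS query instead of sending an empty phrase.
function buildMatchQuery(userInput) {
  const term = sanitize(userInput).replace(/\s+/g, " ");
  if (!/[\p{L}\p{N}]/u.test(term)) return null;
  return `"${term}"*`;
}
```

The resulting string is still passed as a bound parameter (`WHERE actors_fts MATCH ?`), never interpolated into the SQL text.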
+43
languages/ziglang/0.15/database.md
···
 }
 ```
 
+### dynamic filters: bind values, not strings
+
+dynamic query shape is sometimes unavoidable for API filters (`key.any_`, `level.ge_`, optional timestamp bounds, etc.). keep the dynamic part limited to SQL fragments chosen by the program, and bind every request-derived value.
+
+the trap is building a list of "bindings" and then interpolating them into SQL later. even if you escape single quotes, the code now has two SQL paths: normal tuple binding for fixed queries and manual string construction for dynamic queries. it is harder to audit, and every new filter repeats the escape discipline.
+
+prefer a tiny dynamic binding abstraction:
+
+```zig
+pub const BoundValue = union(enum) {
+    text: []const u8,
+    int: i64,
+    null,
+};
+
+pub fn queryBound(db: *Backend, sql: []const u8, args: []const BoundValue) !Rows {
+    const stmt = try sqlite_conn.prepare(sql);
+    errdefer stmt.deinit();
+
+    for (args, 0..) |arg, i| switch (arg) {
+        .text => |v| try stmt.bindValue(v, i),
+        .int => |v| try stmt.bindValue(v, i),
+        .null => try stmt.bindValue(null, i),
+    };
+
+    return Rows{ .backend = .{ .sqlite = .{ .stmt = stmt, .err = null } } };
+}
+```
+
+then build filter SQL with placeholders:
+
+```zig
+try where.appendSlice(alloc, " AND flow_run_id = ?");
+try bindings.append(alloc, .{ .text = flow_run_id });
+
+try where.appendSlice(alloc, " AND level >= ?");
+try bindings.append(alloc, .{ .int = level });
+```
+
+this keeps the "one obvious way" invariant: values are values, not SQL text. dynamic column names, sort clauses, and operators should still come from fixed enums or code paths, never directly from request strings.
+
+see: [prefect-server dynamic query binding](https://tangled.sh/@zzstoatzz.io/prefect-server/tree/main/src/db/backend.zig)
+
 ### connection pooling
 
 pg.zig has built-in pooling:
+46 -8
languages/ziglang/0.15/io.md
···
 
     if (result.status != .ok) return error.FetchFailed;
 
-    const response = aw.toArrayList().items; // the response body
+    const response = aw.written(); // borrow the response body
 ```
 
 the allocating writer grows as needed to hold whatever the server sends back.
+
+**WARNING: `toArrayList()` transfers ownership.** after calling it, `deinit()`
+frees nothing (it resets the internal buffer to empty). this is a silent memory
+leak when used with `defer deinit()`. use `written()` instead to borrow the data
+while `deinit()` retains ownership and frees properly. this bug has bitten us
+twice: once in zlay (fixed in zat v0.2.14, commit `819dffe`) and again in the
+typeahead ingester. both times it caused OOM on long-running processes: ~80KB
+leaked per HTTP call, exhausting 256MB in ~25 minutes.
 
 see: [find-bufo/bot/src/main.zig#L196](https://tangled.sh/@zzstoatzz.io/find-bufo/tree/main/bot/src/main.zig#L196)
 
···
 
 the high-level apis handle this for you. `http.Server`'s `request.respond()` flushes internally. `http.Client` flushes when the request completes. you only need manual flushes when working with raw streams or tls directly.
 
-## gzip decompression bug (0.15.x only)
+## gzip decompression: force identity on the low-level API
+
+two separate issues at different layers. both want the same workaround.
+
+### 0.15.x panic
+
+http.Client panics when decompressing certain gzip responses on x86_64-linux. the deflate decompressor sets up a Writer with `unreachableRebase` but can hit a code path that calls `rebase` when the buffer fills. fixed in 0.16.
+
+### 0.16 low-level API does not auto-decompress
 
-http.Client panics when decompressing certain gzip responses on x86_64-linux. the deflate decompressor sets up a Writer with `unreachableRebase` but can hit a code path that calls `rebase` when the buffer fills.
+`client.fetch(...)` handles `Content-Encoding` transparently. `client.request(...) + response.reader(&.{}) + streamRemaining` does **not**: you get the raw gzip bytes and any downstream parser chokes. symptom: `parseFromSlice` returns `SyntaxError`, response body starts with `1f 8b`.
 
-**workaround:**
+### workaround (works for both)
+
+use the **typed** `headers.accept_encoding` slot, not `extra_headers`:
+
 ```zig
-_ = try client.fetch(.{
-    .location = .{ .url = url },
-    .response_writer = &aw.writer,
+// WRONG: extra_headers is additive, zig still sends its default
+// "accept-encoding: gzip, deflate, zstd" alongside yours,
+// server happily picks gzip
+var req = try client.request(.POST, uri, .{
+    .extra_headers = &.{
+        .{ .name = "Accept-Encoding", .value = "identity" },
+    },
+});
+
+// RIGHT: typed slot replaces the client's default
+var req = try client.request(.POST, uri, .{
     .headers = .{ .accept_encoding = .{ .override = "identity" } },
 });
 ```
 
-fixed in 0.16. see: [zat/xrpc.zig](https://tangled.sh/zzstoatzz.io/zat/tree/main/src/internal/xrpc.zig#L88)
+verified empirically against a real PDS (2026-04-09):
+
+| mode | wire accept-encoding | response content-encoding | result |
+|------|---------------------|---------------------------|--------|
+| `extra_headers` alone | `gzip, deflate, zstd\r\naccept-encoding: identity` | gzip | SyntaxError |
+| `headers.accept_encoding = .{ .override = "identity" }` | `identity` | (none) | parses ok |
+
+### alternative: use fetch()
+
+if you don't need to hand-build the request (no DPoP, no streaming body), `client.fetch()` auto-decompresses and you don't have to think about any of this. use it when you can.
+
+see: [zat/xrpc.zig](https://tangled.sh/zzstoatzz.io/zat/tree/main/src/internal/xrpc.zig#L32), [embed-on-pds/backend/src/oauth.zig pdsAuthedRequest](https://tangled.sh/zzstoatzz.io/embed-on-pds)
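The `1f 8b` symptom described in the gzip section is cheap to check before handing a body to a JSON parser. The sketch below is language-neutral logic written in JavaScript rather than Zig; `looksGzipped` and `explainBody` are hypothetical helper names, and the error wording is only illustrative.

```javascript
// gzip streams start with the two magic bytes 0x1f 0x8b.
function looksGzipped(bytes) {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}

// Hypothetical diagnostic: turn a confusing parser SyntaxError into a clearer
// message when the body was never decompressed.
function explainBody(bytes) {
  if (looksGzipped(bytes)) {
    return "body is still gzip-compressed; request identity encoding or decompress first";
  }
  return "body is not gzip; safe to hand to the parser";
}
```

A check like this belongs in debug logging or an assertion; the actual fix is still the typed `accept_encoding` override or `fetch()`.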
+20 -1
languages/ziglang/0.16/io/README.md
···
 - mutexes, futexes, events, and conditions (`Io.Mutex`, `Io.Condition`, `Io.Event`)
 - memory mapped files
 
+## timers and retry loops
+
+Retry loops should sleep through the same `std.Io` value that performed the
+network request:
+
+```zig
+try io.sleep(std.Io.Duration.fromMilliseconds(delay_ms), .awake);
+```
+
+This makes the delay cancellation-aware and keeps the code backend-agnostic.
+Use `.awake` for monotonic elapsed retry delays; use `.real` only when the
+deadline is tied to wall-clock time. If a protocol exposes an absolute Unix
+timestamp, convert it deliberately instead of mixing wall-clock and elapsed
+duration logic.
+
+`std.Io` also provides randomness. Retry jitter can use `std.Random.IoSource`
+so the policy stays on the caller's I/O backend instead of reaching for a
+separate global source.
+
 ## backends
 
 - `Io.Threaded`: thread-based, always available, production default
···
   - BSD: `Io.Kqueue`
   - unsupported platforms: `void`
   - uses userspace stack switching (fibers/green threads)
-  - currently experimental, known performance issues to diagnose
+  - currently experimental, known bugs (see [patterns.md](./patterns.md#evented-production-experience))
 - WASM: fiber-based backends can't work (no stack switching). stackless coroutines planned as future compiler feature.
 
 backend selection:
+95 -1
languages/ziglang/0.16/io/patterns.md
···
 }
 ```
 
-production: use `Io.Threaded` until Evented is stable. the code is identical, just swap the init.
+production: `Io.Evented` has known bugs as of `0.16.0-dev.3059` (see [below](#evented-production-experience)). the code is identical between backends, just swap the init.
 
 ## Threaded InitOptions
 
···
 ```
 
 `net.Stream` no longer has direct `read`/`writeAll`. use `Stream.Reader`/`Stream.Writer`.
+
+## Evented production experience
+
+field notes from running an AT Protocol relay (~2,800 PDS connections) on
+`Io.Evented` with `0.16.0-dev.3059`, kernel 6.8.0-101-generic.
+
+### fiber contextSwitch GPF under ReleaseSafe
+
+`Io.Evented` fibers crash immediately under ReleaseSafe on x86_64. the GPF is
+in `std.Io.fiber.contextSwitch`, the inline asm that saves/restores
+rsp/rbp/rip. the optimizer under ReleaseSafe arranges the code differently than
+ReleaseFast, causing the restored instruction pointer to fault.
+
+```
+General protection exception (no address available)
+lib/std/Io/fiber.zig:30 in contextSwitch
+lib/std/Io/Uring.zig:1142 in mainIdle
+```
+
+**consequence**: Evented currently requires ReleaseFast, which strips all safety
+checks. any bounds error, null dereference, or use-after-free becomes silent
+memory corruption instead of a clean panic with stack trace.
+
+**status**: zig stdlib bug. no workaround other than ReleaseFast. a minimal
+repro (fiber that returns without yielding) triggers it on the first context
+switch.
+
+### cross-backend bridging (Evented fibers ↔ Threaded workers)
+
+the Io interface is backend-agnostic, but **you cannot mix execution contexts**.
+Evented fibers cannot safely lock a Threaded mutex: the scheduler accesses
+thread-local state that doesn't exist in the fiber context.
+
+**pattern**: bridge with a lock-free MPSC queue using atomics:
+
+```
+[Evented fibers] --atomics→ [ring buffer] --wake→ [Threaded worker pool]
+```
+
+Evented subscriber fibers enqueue work items via atomic CAS. a bounded set of
+Threaded workers dequeue and execute (e.g., postgres queries). no mutex
+crossing between backends.
+
+this is the "DbRequestQueue" pattern: it decouples the hot networking path
+(Evented) from blocking I/O (database) that can't run in fibers.
+
+### safety checks matter more under Evented
+
+under Threaded/ReleaseSafe, a bounds error panics with a stack trace pointing
+to the exact line. under Evented/ReleaseFast (forced by the GPF bug), the same
+error silently corrupts memory and manifests as a SIGSEGV minutes or hours
+later with no useful diagnostic.
+
+example: a websocket library assumed `\r\n` always arrives in a single TCP
+read. when TCP splits mid-CRLF, `line_start` advances past `pos` and the
+next `buf[line_start..pos]` slice has start > end. under ReleaseSafe this
+is an immediate panic:
+
+```
+thread 543 panic: start index 1370 is larger than end index 1369
+websocket.zig/src/client/client.zig:766
+```
+
+under ReleaseFast: silent corruption → SIGSEGV every 30-90 min across ~2,800
+connections. took switching back to Threaded/ReleaseSafe to get the stack trace
+that identified the real bug.
+
+**lesson**: when forced into ReleaseFast by the fiber GPF, you lose the single
+most valuable debugging tool zig provides. any bug that would be trivially
+caught by bounds checking becomes a production mystery.
+
+### thread count: Evented vs Threaded
+
+| backend | OS threads | subscriber tasks | RSS |
+|---------|-----------|-----------------|-----|
+| Threaded (ReleaseSafe) | ~2,830 | ~2,830 | ~1.9 GiB |
+| Evented (ReleaseFast) | ~47 | ~2,830 | ~1.2 GiB |
+
+Evented runs the same ~2,800 subscriber tasks on ~47 OS threads (bounded
+worker pool + io_uring event loop). RSS is lower partly due to fewer thread
+stacks and partly due to ReleaseFast stripping safety metadata.
+
+### uring networking patch
+
+`Io.Uring` ships with networking functions stubbed out as `*Unavailable`
+(return `error.NetworkDown`). to use Evented for real networking, you need to
+patch `Uring.zig` to implement `netListenIp`, `netAccept`, `netConnectIp`,
+`netSend`, `netRead`, `netWrite` using io_uring opcodes (ACCEPT, CONNECT,
+SENDMSG, READV, etc.).
+
+note: `bind` and `listen` use sync syscalls because `IORING_OP_BIND` /
+`IORING_OP_LISTEN` require kernel 6.11+. DNS resolution (`netLookup`) is
+also not patched; subscribers resolve hostnames through a Threaded `pool_io`
+fallback.
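The split-CRLF bug from the safety-checks section comes down to consuming bytes before a full `\r\n` has arrived. The sketch below shows the safe shape in JavaScript rather than Zig, under simplifying assumptions: it buffers strings instead of bytes, and `LineBuffer` is a hypothetical name, not part of the websocket library mentioned above.

```javascript
// Incremental CRLF line splitter: a line is only emitted once its full "\r\n"
// terminator is buffered, so a TCP read that ends between "\r" and "\n" is safe.
class LineBuffer {
  constructor() {
    this.pending = ""; // bytes received but not yet terminated by "\r\n"
  }

  // Feed one chunk (one TCP read); returns the complete lines now available.
  push(chunk) {
    this.pending += chunk;
    const lines = [];
    let start = 0;
    for (;;) {
      const end = this.pending.indexOf("\r\n", start);
      if (end === -1) break; // no full terminator yet: keep the tail for next read
      lines.push(this.pending.slice(start, end));
      start = end + 2;
    }
    this.pending = this.pending.slice(start);
    return lines;
  }
}
```

The start index never advances past data that has actually been matched, which is the invariant the `start index 1370 is larger than end index 1369` panic was reporting as violated.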
+12 -2
languages/ziglang/0.16/io/synchronization.md
···
 
 replaces `std.Thread.Mutex` from 0.15.
 
-### cross-context usage
+### cross-context usage: CRITICAL CONSTRAINT
+
+`Io.Mutex` is futex-based and works from any context **within the same Io runtime**. but it **CANNOT be shared across different Io types** (Threaded vs Evented):
 
-`Io.Mutex` is futex-based and works from both `std.Thread` workers and Io tasks. if you have a data structure accessed from both explicit threads (e.g., CPU worker pool) and Io tasks (e.g., subscriber fibers), `Io.Mutex` is the correct choice; it integrates with the scheduler in both contexts.
+- **Threaded futex on Evented fiber** → blocks the entire Uring OS thread. that thread's io_uring instance can't process CQEs → **deadlock**. other fibers on that thread (including the main fiber doing CA bundle loading, accept loops, etc.) are permanently stuck.
+- **Evented futex on plain `std.Thread`** → `Thread.current()` is a threadlocal only set on Uring-managed threads. on a plain thread it's null. in ReleaseFast, `self.?` on null silently gives NULL pointer → **SIGSEGV** at struct field offsets (0x28, 0x30, 0x38 in our case: `ready_queue`, `free_queue`, `io_uring` fields of the Thread struct).
+
+**rule**: all callers of a given `Io.Mutex` must pass the **same Io instance** (or at least the same Io type). if you have a data structure accessed from both a Threaded worker pool and Evented fibers, you must either:
+1. run the shared structure entirely on one Io type (e.g., all on pool_io/Threaded)
+2. use raw atomics (no Io.Mutex) for the cross-boundary synchronization
+3. use an MPSC queue with atomic CAS (no futex involvement)
+
+this was the root cause of the zlay relay SIGSEGV (frame worker threads on pool_io/Threaded calling `Io.Mutex.lockUncancelable` with Evented io on the Resyncer) and the subsequent deadlock (first fix attempt mixed Threaded futex with Evented fiber). see zlay commits 6674812, 439c678.
 
 ## Io.Condition
 
+2
protocols/atproto/README.md
···
 - [identity](./identity.md) - DIDs, handles, resolution
 - [data](./data.md) - repos, records, collections, references
 - [lexicons](./lexicons.md) - schema language, namespaces
+- [xrpc](./xrpc.md) - request/response shape, errors, retries, rate limits
 - [firehose](./firehose.md) - event streaming, jetstream
 - [auth](./auth.md) - OAuth, scopes, permission sets
+- [oauth](./oauth/README.md) - operational OAuth notes, especially scopes, permission sets, and progressive scope upgrades
 - [labels](./labels.md) - moderation, signed assertions
 - [appviews](./appviews.md) - building appviews, XRPC, frameworks (quickslice, hatk), backfill, link previews
 - [sync-verification](./sync-verification.md) - inductive proof chains, MST inversion, sync 1.1
+169
protocols/atproto/applications/bluesky/notifications.md
···
+# bluesky notifications
+
+notes from building `noti`, a bluesky notification manager that compresses unread notifications into a small set of feed cards instead of rendering the raw notification list.
+
+this is not a general bluesky client writeup. it is a record of the parts of the notifications API and product behavior that turned out to matter in practice.
+
+## what the API seems to want you to do
+
+bluesky notifications are not designed around arbitrary per-notification state mutation.
+
+- unread state is a cursor
+- activity subscriptions are readable and writable
+- actor mutes are readable and writable
+- thread mute is writable, but not meaningfully enumerable
+
+that split matters a lot for product design.
+
+## the core read path
+
+the useful starting point is:
+
+- `app.bsky.notification.listNotifications`
+- `app.bsky.notification.getUnreadCount`
+
+for a useful notification manager, the raw notification object is not enough by itself. the fields that ended up mattering most were:
+
+- `reason`
+- `reasonSubject`
+- actor identity
+- post text
+- whether the record is a reply
+- the reply root / parent relationship
+
+in `noti`, the most useful normalized relationship signals were:
+
+- `reason_subject`
+- `reply_root_uri`
+- `reply_parent_uri`
+- `thread_key` (an alias for reply root)
+
+this was enough to do useful grouping such as:
+
+- likes on the same post
+- replies inside the same thread
+- mentions plus follow-on subscribed posts inside one discussion
+
+## what worked for grouping
+
+the best relationship signals were not fancy.
+
+1. same `reasonSubject`
+2. same `reply_root_uri`
+3. same actor in a short time window
+4. same obvious canonical URL
+
+that was enough to get from “annotated bluesky notification list” to “a few situation reports.”
+
+the main lesson: if the unread volume is low, be concrete. if the unread volume is high, aggressively compress.
+
+for small unread counts, over-abstraction is worse than the native bluesky UI.
+
+## actions: what is actually supportable
+
+the mistake is to let the model invent actions. the action set should be the finite set of mutations the API actually supports and that the app can verify afterward.
+
+the safe actions discovered in `noti` were:
+
+- mark all read
+- mute account
+- unsubscribe from posts and replies for a subscribed actor
+
+these map cleanly to real bluesky state:
+
+- `app.bsky.notification.updateSeen`
+- `app.bsky.graph.muteActor`
+- `app.bsky.notification.listActivitySubscriptions`
+- `app.bsky.notification.putActivitySubscription`
+
+the important omission was:
+
+- `app.bsky.graph.muteThread`
+
+the mutation exists, but there does not appear to be a corresponding `getMutedThreads` / `listMutedThreads` style endpoint. without a real read path, the UI cannot reliably know whether the action is still available or already applied. that made it a bad fit for a demo that is supposed to manage notifications honestly.
+
+## updateSeen is a cursor, not a subset mutation
+
+this was the biggest product constraint.
+
+`app.bsky.notification.updateSeen` advances the seen cursor. it does not let you mark an arbitrary subset of notifications read.
+
+consequences:
+
+- “mark all read” is honest
+- card-level dismiss is not, unless the entire product becomes a sequential “work through the pile” flow
+- local-only dismiss state is possible, but it creates a second unread model that diverges from bluesky
+
+for `noti`, the correct move was to keep bluesky as the source of truth and avoid inventing a shadow unread ledger.
+
+## grouped links need to be honest
+
+one subtle edge case:
+
+- if a grouped card spans multiple posts
+- and there is no genuinely shared canonical destination
+- the card should not show a singular `open post` or `open thread` link
+
+the safe rule is:
+
+- only render a top-right link if all backing notifications converge on one shared target
+- otherwise leave the card unlinked
+
+this matters a lot for “2 likes on your posts” style cards. a plural card with a singular CTA is misleading.
+
+## activity subscriptions are more useful than they first appear
+
+subscribed-post notifications made it clear that “mute thread” is often the wrong action anyway.
+
+if a notification exists because the user subscribed to an account’s posts and replies, the better action is usually:
+
+- unsubscribe from posts and replies
+
+that is both more legible and more stateful, because the subscription state is actually readable from bluesky.
+
+## oauth lessons
+
+the most important oauth lesson was not protocol mechanics, but scope discipline.
+
+1. `atproto` by itself is effectively sign-in / account identification, not the full permission story.
+2. `transition:generic` is a blunt tool. it can be useful for fast prototyping, but it is not the right long-term shape for a narrowly scoped app.
+3. granular scopes and permission sets are the right direction when the app only needs a small set of capabilities.
+4. progressive scope upgrade is real: sometimes the right product flow is “start narrow, ask for more later.”
+
+the useful references here were:
+
+- the atproto scopes guide
+- the permission sets guide
+- the “requesting scopes progressively” docs PR merged into `atproto-website`
+- real applications like `plyr.fm`, `pds.ls`, and `blento`
+
+the practical lesson for `noti`: oauth is feasible, but it changes the app from a single-user polling tool into a multi-tenant session-backed product. that is a valid extension, but it is not “just auth.”
+
+## what we learned from the failed sqlite/oauth detour
+
+sqlite itself was not the problem.
+
+sqlite would be fine for:
+
+- preferences
+- sessions
+- lightweight recurrence memory
+
+but for the demo, sqlite only paid for itself if it was carrying genuinely useful product state. once oauth and recurrence windows were cut, json-on-disk preferences were simpler and more honest.
+
+the broader lesson: do not add substrate just because it seems like the grown-up architecture. add it when it clearly improves the user experience you are demoing.
+
+## current design heuristics for notification products
+
+- keep bluesky as the source of truth for unread state
+- compress notifications into a few cards, but do not lose the underlying relationship signals
+- prefer actions that have a readable post-mutation state
+- be more concrete at low volume and more abstract at high volume
+- never let card copy imply a link or action the app cannot defend
+
+## references
+
+- atproto auth and scopes docs: https://atproto.com/guides/scopes
+- atproto permission sets docs: https://atproto.com/guides/permission-sets
+- merged docs PR on progressive scope upgrades: https://github.com/bluesky-social/atproto-website/pull/618
+- `noti` exploration history in `noti-bak`
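The grouping signals and the "honest link" rule from the notifications notes can be sketched together as below. This is a hedged illustration, not `noti`'s actual code: it assumes notifications have already been normalized to the `reason_subject` and `reply_root_uri` fields named above, the function names are hypothetical, and preferring the reply root over the reason subject is one possible ordering.

```javascript
// The destination a notification points at: the thread root for replies,
// otherwise the subject post (e.g. the liked post). null when neither exists.
function targetOf(n) {
  return n.reply_root_uri ?? n.reason_subject ?? null;
}

// Group notifications that converge on the same target; notifications with
// no target each get their own single-item group keyed by their own uri.
function groupByTarget(notifications) {
  const groups = new Map();
  for (const n of notifications) {
    const key = targetOf(n) ?? `solo:${n.uri}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(n);
  }
  return groups;
}

// Link rule: a card gets a link only if every backing notification shares
// one target. Otherwise return null and leave the card unlinked.
function cardLink(cardNotifications) {
  const targets = new Set(cardNotifications.map(targetOf));
  if (targets.size !== 1) return null;
  const [only] = targets;
  return only;
}
```

A "2 likes on your posts" card built from two different posts gets `null` from `cardLink`, which is the case where a singular `open post` link would be misleading.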
+2
protocols/atproto/auth.md
···
 - applications can't lock in users by controlling auth
 - the same identity works across all atmospheric applications
 - granular scopes enable minimal-permission applications
+
+for more operational notes on scope choice, permission sets, progressive scope upgrades, and when `transition:generic` is or is not appropriate, see [`./oauth/README.md`](./oauth/README.md).
+17
protocols/atproto/firehose.md
···
 
 from [follower-weight/tap_consumer.py](https://github.com/zzstoatzz/follower-weight)
 
+## required fields in #commit frames
+
+per the `subscribeRepos` lexicon, `#commit` frames have several required
+fields. notable gotcha:
+
+**`tooBig`** (boolean, required, deprecated): always `false` in Sync 1.1.
+both indigo (Go) and rsky (Rust) always serialize it. passthrough relays
+that do generic CBOR resequencing (decode → modify seq → re-encode) can
+drop it if the upstream PDS omits it. strict consumers like
+[hydrant](https://github.com/ptrott/hydrant) will reject frames with
+missing `tooBig` since it's in the `required` array.
+
+**`since`** (string, nullable): must be a valid TID or null. some PDSes
+send `since: ""` (empty string) which passes through both indigo and zlay.
+strict parsers break on the empty string because they expect a 13-char TID or
+null. see [indigo#1357](https://github.com/bluesky-social/indigo/issues/1357).
+
 ## cursor management
 
 firehose supports resumption via cursor (sequence number):
+15 -2
protocols/atproto/lexicons.md
···
 
 ## record keys
 
-- **tid**: timestamp-based ID. for records where users have many (tracks, likes, posts).
-- **literal:self**: singleton. for records where users have one (profile).
+the `"key"` field in a record lexicon definition declares what kind of rkey is valid. four types:
+
+| type | meaning | example use |
+|------|---------|-------------|
+| `tid` | timestamp-based ID, auto-generated | posts, likes, follows (anything users create many of) |
+| `literal:<value>` | exactly one record per user with this fixed key | `app.bsky.actor.profile` uses `literal:self` |
+| `any` | any valid string, enables semantic keys | domain names, integers, application-specific IDs |
+| `nsid` | must be a valid NSID | less common, used for meta-records keyed by schema |
 
 ```json
 "key": "tid"           // generates 3jui7akfj2k2a
 "key": "literal:self"  // always "self"
+"key": "any"           // caller picks the rkey
 ```
+
+rkey constraints: 1-512 chars, `A-Za-z0-9` plus `.` `-` `_` `:` `~`. the values `.` and `..` are reserved. case-sensitive but lowercase recommended.
+
+**sidecar pattern**: use the same TID across different collections to link related records. bluesky does this with posts and threadgates: the same rkey in `app.bsky.feed.post` and `app.bsky.feed.threadgate` signals they belong together.
+
+**important**: rkeys are user-controlled. never trust them for authorization or assume they follow conventions, because hostile accounts can set arbitrary values.
 
 ## knownValues
 
+7
protocols/atproto/oauth/README.md
···
+# oauth
+
+more detailed notes on AT Protocol OAuth beyond the broad overview in [`../auth.md`](../auth.md).
+
+## contents
+
+- [scopes](./scopes.md) - choosing scopes, permission sets, progressive scope upgrades, and when `transition:generic` is the wrong tool
+135
protocols/atproto/oauth/scopes.md
··· 1 + # scopes 2 + 3 + notes on choosing AT Protocol OAuth scopes for real applications. 4 + 5 + this is the more operational companion to [`../auth.md`](../auth.md). 6 + 7 + ## the basic distinction 8 + 9 + there are three different levels people tend to mix together: 10 + 11 + 1. broad protocol scopes like `atproto` 12 + 2. granular scopes like `repo:...` or `rpc:...` 13 + 3. permission sets exposed as `include:...` 14 + 15 + if you do not keep these separate, it becomes very easy to request the wrong thing or to over-scope a small app. 16 + 17 + ## what `atproto` means in practice 18 + 19 + `atproto` is the basic protocol-level sign-in / authorization scope. 20 + 21 + it is often the floor, not the whole answer. 22 + 23 + for many applications, `atproto` by itself is not the useful permission story. the useful permission story comes from the additional granular scopes or permission sets layered on top. 24 + 25 + ## granular scopes 26 + 27 + granular scopes are the right tool when you know the exact capabilities the app needs. 28 + 29 + examples: 30 + 31 + - `repo:com.example.collection` 32 + - `blob?accept=image/*` 33 + - `rpc:com.example.doThing?aud=did:web:service.example` 34 + 35 + the benefit is obvious: the permission request matches the real behavior of the app. 36 + 37 + the downside is also obvious: raw scope strings get noisy fast. 38 + 39 + ## permission sets 40 + 41 + permission sets are the human-facing layer. 42 + 43 + they let an app expose a coherent product permission like: 44 + 45 + - “Create Bluesky Posts” 46 + - “Full music library access” 47 + 48 + instead of making the user reason directly about many low-level scopes. 49 + 50 + the technical shape is an `include:...` scope that expands to a published set of underlying permissions. 

this is the right direction when:

- the app has a stable product surface
- the app needs multiple underlying scopes
- you want consent screens to be legible

## progressive scope upgrades

this is the most important design lesson from practice.

the right answer to “which scopes should I request?” may change over time for a single user.

the pattern:

1. start narrow
2. let the user ask for a feature that needs more power
3. send them back through oauth with an expanded scope string
4. replace the old session on callback

this is much better than:

- requesting everything up front
- or freezing the scope set forever

practical examples:

- `pds.ls`
- `plyr.fm`

this pattern is now documented in the atproto docs PR that added a “requesting scopes progressively” section to the scopes guide:

- https://github.com/bluesky-social/atproto-website/pull/618

## when `transition:generic` is appropriate

`transition:generic` is not the ideal long-term answer for a narrowly scoped app.

it is useful when:

- you are prototyping quickly
- you are porting an older app that effectively assumed app-password-style power
- you need a temporary bridge while figuring out the exact granular scope set

it is a bad fit when:

- the app has a small, knowable capability surface
- you want the consent screen to reflect the actual product
- you intend to keep the scope model stable and explainable

in short:

- good bridge
- bad destination

## how this showed up in `noti`

`noti` was a useful failure case.

the app only needed a narrow slice of functionality, but oauth experimentation went sideways because the scope story was not anchored tightly enough to a known-good pattern.


the lesson was not “oauth is impossible.”

the lesson was:

- do not guess about scope syntax
- copy a working granular-scope pattern from a real app
- keep the permission request legible
- if you need to move fast, use `transition:generic` only as a temporary exploration tool

## practical heuristics

- if the app is tiny and exploratory, start narrow or use app passwords
- if the app is productized and multi-capability, define permission sets
- if the app is growing, design for progressive upgrades from day one
- do not request broad power just because it is easier to implement

## references

- AT Protocol scopes guide: https://atproto.com/guides/scopes
- AT Protocol permission sets guide: https://atproto.com/guides/permission-sets
- progressive upgrades docs PR: https://github.com/bluesky-social/atproto-website/pull/618
- `plyr.fm`
- `pds.ls`
- `blento`
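
## sketch: detecting when an upgrade is needed

step 2 of the progressive upgrade pattern (a feature needs more power than the session holds) can be sketched as a check over the space-separated scope string. this is a minimal illustration, not code from `pds.ls` or `plyr.fm`: `hasScope` and `needsUpgrade` are hypothetical names, and exact token matching is a simplifying assumption that does not expand `include:...` permission sets or reason about scope parameters.

```zig
const std = @import("std");

/// True if the space-separated `granted` scope string contains `scope`
/// as an exact token. (Assumption: no expansion of `include:...` sets.)
fn hasScope(granted: []const u8, scope: []const u8) bool {
    var it = std.mem.tokenizeScalar(u8, granted, ' ');
    while (it.next()) |tok| {
        if (std.mem.eql(u8, tok, scope)) return true;
    }
    return false;
}

/// True if a feature requires any scope the current session does not hold,
/// i.e. the user should be sent back through oauth with an expanded scope string.
fn needsUpgrade(granted: []const u8, required: []const []const u8) bool {
    for (required) |scope| {
        if (!hasScope(granted, scope)) return true;
    }
    return false;
}
```

a session granted `atproto repo:com.example.collection` would not need an upgrade for a feature requiring only `repo:com.example.collection`, but would for one that also requires `blob?accept=image/*`.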
+29 -6
protocols/atproto/sync-verification.md
···

  | field | notes |
  |---|---|
- | `since` | rev of the *preceding* commit (chain link) |
- | `prevData` | MST root CID of the preceding commit (chain link) |
+ | `since` | rev of the *preceding* commit (chain link). TID syntax or null; empty string is invalid (see [indigo#1357](https://github.com/bluesky-social/indigo/issues/1357)) |
+ | `prevData` | MST root CID of the preceding commit. spec says "semi-optional" but "effectively required for MST inversion" |
  | `blocks` | CAR slice: only changed blocks, max 2 MB |
  | `ops` | up to 200 record operations |
+ | `tooBig` | **deprecated**: producers must set `false`, consumers should ignore |
+ | `blobs` | **deprecated**: producers must set `[]`, consumers should ignore |

  each `repoOp` has:

···
  | op touches unchanged subtree not in CAR | stub error (partial tree) |
  | high-S signature malleability | low-S check rejects it |

+ ## size limits
+
+ - `blocks` field: 2 MB hard limit
+ - individual record blocks: 1 MB hard limit
+ - ops per commit: 200 max
+ - overall WebSocket frame: **5 MB hard limit** (inclusive of all encoding overhead)
+
  ## chain break → resync

- when the chain breaks (mismatched `since`/`prevData`, or a `#sync` event):
+ the spec defines a state machine for per-repo sync status: `desynchronized → in-progress → synchronized`. when the chain breaks (mismatched `since`/`prevData`, or a `#sync` event):

  1. mark the repo as `desynchronized`
- 2. queue incoming events for this DID (don't drop them)
+ 2. queue incoming events for this DID (don't drop them; process after resync)
  3. fetch the full CAR, **from the upstream relay first** (not the PDS) to avoid thundering herd
- 4. verify and reconcile state
- 5. replay queued events
+ 4. walk the CAR tree, diff against local record state (create/update/delete as needed)
+ 5. replay queued events, update status to `synchronized`

  from the spec:

···
  zat has the cryptographic and structural verification. zlay (march 2026) runs chain continuity detection in observation mode: logging breaks and counting them via prometheus, not yet enforcing. `verifyCommitDiff` is wired but behind a config flag; production uses `verifyCommitCar` (signature-only). see [inductive-proof/relay-integration.md](./inductive-proof/relay-integration.md) for details.

  lightrail has the operational scheduling and recovery. collectiondir trusts the upstream relay entirely.
+
+ ## bsky.network relay rollout (may 2025)
+
+ new relay implementation shipped as `relay` (replacing `bigsky`). key changes:
+ - non-archival: no longer mirrors full repo data
+ - `#sync` message type supported; `#handle`/`#migration`/`#tombstone` fully removed
+ - MST inversion validation (behind "lenient mode" flag)
+ - `listHosts` endpoint; `getRepo` redirects to PDS
+
+ two new instances: `relay1.us-west.bsky.network` and `relay1.us-east.bsky.network` (seq starts at 20B to distinguish from bsky.network's ~8.4B). both support `listReposByCollection` via collectiondir sidecar.
+
+ rollout plan: lenient mode → point bsky.network at new relay → ratchet to strict per-host (new hosts strict immediately, legacy hosts get time to upgrade).
+
+ source: [relay updates blog post](https://atproto.com/blog/relay-updates-sync-v1-1)

  ## see also

+76
protocols/atproto/xrpc.md
# XRPC

XRPC is the HTTP RPC layer used by AT Protocol services. Endpoints live under
`/xrpc/{nsid}` and are described by lexicons:

- query endpoints are `GET` with URL query parameters
- procedure endpoints are `POST` with a JSON or binary body
- success and failure are both HTTP responses; the body format depends on the
  lexicon and endpoint
- endpoint paths are top-level `/xrpc/...` paths, not nested below another
  prefix

## errors

Unsuccessful XRPC responses should be JSON objects with the standard error
envelope:

```json
{"error":"RateLimitExceeded","message":"slow down"}
```

`error` is required by the spec and is the lexicon-defined error name;
`message` is optional and intended for humans. Clients still need to tolerate
non-JSON bodies because load balancers and reverse proxies often produce generic
HTML or text failures.

SDKs should preserve both layers:

- HTTP status, because retry and auth handling are status-driven
- ATProto error name and message, because callers use these for product logic
- response body, because some services include endpoint-specific diagnostics
- rate-limit headers, because retry scheduling should not require reparsing raw
  HTTP headers in every caller

In Zig, this fits better as a response/result value than as a bare error set.
Error sets are good for transport failures and programmer-facing failure modes,
but they cannot carry owned payloads like status, message, body, and headers.
For zat, the low-level XRPC calls can keep returning raw `Response`, while
checked helpers return a tagged union:

```zig
union(enum) {
    ok: Response,
    err: XrpcError,
}
```

That keeps low-level transport behavior available for callers that want to make
their own policy decisions.

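
As a rough illustration of the checked-helper idea, the sketch below classifies
a raw response and tolerates non-JSON error bodies. It is not zat's actual API:
`Response`, `XrpcError`, `Checked`, `Envelope`, and `check` are hypothetical
stand-ins, rate-limit headers are left out for brevity, and the code assumes
the `std.json.parseFromSliceLeaky` signature found in Zig 0.11 through 0.14.

```zig
const std = @import("std");

// Hypothetical stand-ins for illustration; zat's real types may differ.
const Response = struct {
    status: u16,
    body: []const u8,
};

const XrpcError = struct {
    status: u16,
    // `error` from the envelope; null when the body was not the JSON envelope.
    name: ?[]const u8 = null,
    message: ?[]const u8 = null,
    // Raw body, kept for endpoint-specific diagnostics.
    body: []const u8,
};

const Checked = union(enum) {
    ok: Response,
    err: XrpcError,
};

// Shape of the standard error envelope. `error` is a Zig keyword, hence @"error".
const Envelope = struct {
    @"error": []const u8,
    message: ?[]const u8 = null,
};

/// Classify a raw response. `arena` should be an arena allocator: the leaky
/// parse never frees individually, and parsed strings may point into `resp.body`.
fn check(arena: std.mem.Allocator, resp: Response) Checked {
    if (resp.status >= 200 and resp.status < 300) return .{ .ok = resp };

    var err: XrpcError = .{ .status = resp.status, .body = resp.body };
    if (std.json.parseFromSliceLeaky(Envelope, arena, resp.body, .{
        .ignore_unknown_fields = true,
    })) |env| {
        err.name = env.@"error";
        err.message = env.message;
    } else |_| {
        // Non-JSON body (proxy HTML, plain text): keep status and raw body only.
    }
    return .{ .err = err };
}
```

A caller can then `switch` on the result and still reach the HTTP status and
raw body in the `.err` arm, even when a proxy replaced the envelope with HTML.
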

## retries

A conservative retry policy should treat only the usual transient HTTP statuses
as retryable:

- `429 Too Many Requests`
- `500 Internal Server Error`
- `502 Bad Gateway`
- `503 Service Unavailable`
- `504 Gateway Timeout`

Procedure bodies need to be replayable before a client retries them. A
`[]const u8` body is replayable; a streaming reader is not unless the caller
explicitly buffers or rewinds it.

Use `Retry-After` when present. `RateLimit-Reset` is useful too, but it is
usually an absolute Unix timestamp, so using it for sleep requires comparing it
with the caller's real clock and then sleeping for the remaining duration on a
monotonic clock. The XRPC spec recommends limited retries with randomized
exponential backoff; cap all server-directed sleeps so a bad header cannot park
the client forever.

## sources

- [HTTP API (XRPC)](https://atproto.com/specs/xrpc)
- [Lexicon](https://atproto.com/specs/lexicon)
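
## retry sketch

The rules in the retries section can be sketched as three small pure functions.
This is an illustration under stated assumptions, not zat's implementation:
`Policy`, `isRetryable`, `secondsUntilReset`, and `delayMs` are hypothetical
names, the default base and cap are arbitrary, and the caller is assumed to
supply its own random `entropy` value and to do the actual sleeping.

```zig
const std = @import("std");

/// The transient statuses listed in the retries section; all others are final.
fn isRetryable(status: u16) bool {
    return switch (status) {
        429, 500, 502, 503, 504 => true,
        else => false,
    };
}

// Hypothetical policy knobs; the defaults are illustrative, not from the spec.
const Policy = struct {
    base_ms: u64 = 500,
    cap_ms: u64 = 30_000,
};

/// Seconds left until an absolute `RateLimit-Reset` Unix timestamp,
/// or 0 if the reset time has already passed.
fn secondsUntilReset(reset_unix_s: i64, now_unix_s: i64) u64 {
    if (reset_unix_s <= now_unix_s) return 0;
    return @intCast(reset_unix_s -| now_unix_s);
}

/// Delay before retry number `attempt` (0-based), in milliseconds.
/// `server_wait_s` comes from `Retry-After` or `secondsUntilReset`;
/// `entropy` is any caller-supplied random u64 used for jitter.
fn delayMs(policy: Policy, attempt: u8, server_wait_s: ?u64, entropy: u64) u64 {
    if (server_wait_s) |s| {
        // Server-directed sleep, capped so a bad header cannot park the client.
        return @min(s *| 1000, policy.cap_ms);
    }
    // Exponential ceiling base * 2^attempt, saturating and capped.
    const shift: u6 = @intCast(@min(attempt, 32));
    const ceiling = @min(policy.base_ms *| (@as(u64, 1) << shift), policy.cap_ms);
    // Full jitter: a value in [0, ceiling].
    return entropy % (ceiling +| 1);
}
```

Keeping these functions free of clocks and random sources makes the policy easy
to test; the caller decides how many attempts to allow and sleeps for the
returned duration on a monotonic clock.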