···11+# PubSub State Machine
22+33+Status: design, pre-implementation. Drafted 2026-04-23 in response to three
44+distinct manifestations of the same "lost publish" bug class during Phases
55+E/F of the tile-lifecycle FSM rollout. Layers on top of
66+[tile-lifecycle-fsm.md](tile-lifecycle-fsm.md) — that doc governs *when a
77+tile is alive*; this doc governs *what messages can flow between tiles (and
88+the core) in each state*.
99+1010+Goal: make the class of bug "a publish was dropped somewhere in the fabric"
1111+**impossible**. If a message is dropped, the state machine identifies
1212+exactly which transition was invalid. Runtime enforcement + bisectable
1313+tests derive from the machine.
1414+1515+## Non-goals
1616+1717+- Re-implementing the tile lifecycle FSM (that lives in tile-fsm.ts /
1818+ tile-lifecycle.ts). This doc references it.
1919+- Changing the renderer-visible API surface (`api.publish`, `api.subscribe`,
2020+ `api.commands.register`). Semantics tighten; shape stays.
2121+- Solving cross-machine / cross-process pubsub beyond the existing
2222+ main-process hub. The fabric remains one hub.
2323+2424+## Participants
2525+2626+Every pubsub participant is in exactly one of these roles. Each role has a
2727+fixed identity (source address) and a fixed set of legal capabilities.
2828+2929+| Role | Identity | Notes |
3030+|---|---|---|
3131+| **Core** | `peek://app/...` | bgWindow (`app/background.html` → `app/index.js` runs cmd registry / page / hud) + cmd panel UI window (`app/cmd/panel.html`). Both carry `trustedBuiltin`; pubsub doesn't distinguish them. First-class publisher AND subscriber; every publish must reach bgWindow. |
3232+| **Tile** | `peek://{tileId}/{entry}` | One per tile BrowserWindow. Capability-gated via tile token. |
3333+| **System** | `peek://system/` | Main-process code publishing on its own behalf (e.g., `tag:item-added` from datastore IPC handlers, `window:focus-changed` from app events). Not a window, not a renderer. In-process call to `publish()`; no token, no IPC hop. |
3434+| **Webview guest** | `peek://{feature}/...` webContents inside a `<webview>` | Hosted inside tile windows. Receives broadcasts via the same pubsub forwarding path. |
3535+3636+### The token is the capability handle
3737+3838+A tile's **token is the capability handle** for all pubsub and command
3939+operations. The token is minted on `loading → ready` and revoked on
4040+`→ unloading` / `→ crashed` by `tile-lifecycle.ts`. This makes the token
4141+the atomic, race-free signal of "tile is in `ready` or `visible`":
4242+4343+- If `validateToken(t)` returns a grant, the tile is alive.
4444+- If it returns null, the tile is not alive — publish/subscribe rejected.
4545+4646+**Consequence**: the pubsub enforcement path does **not** read lifecycle
4747+state. It validates the token and consults the grant. Token validity and
4848+lifecycle state are kept in sync by `tile-lifecycle.ts` as the sole
4949+owner of both. This removes a hot-path cross-module dependency between
5050+`pubsub.ts` and `tile-lifecycle.ts`, eliminates the transition-race
5151+question ("was the publish checked before or after the state change?"),
5252+and keeps the pubsub FSM testable as a pure function (see §Module
5353+layout).
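
A minimal sketch of the token-as-capability idea, assuming an in-memory registry. `TokenRegistry`, `mint`, `revoke`, and the `TileGrant` shape are hypothetical names for illustration; only `validateToken` returning a grant or null comes from the text above.

```typescript
// Hypothetical sketch: not the actual tile-lifecycle.ts API.
type TileGrant = { tileId: string; capabilities: ReadonlySet<string> };

class TokenRegistry {
  private grants = new Map<string, TileGrant>();
  private counter = 0;

  // Called by the lifecycle owner on `loading → ready`.
  mint(tileId: string, capabilities: string[]): string {
    // A real token would be unguessable; a counter keeps the sketch deterministic.
    const token = `tok-${tileId}-${++this.counter}`;
    this.grants.set(token, { tileId, capabilities: new Set(capabilities) });
    return token;
  }

  // Called by the lifecycle owner on `→ unloading` / `→ crashed`.
  revoke(token: string): void {
    this.grants.delete(token);
  }

  // The only question the pubsub path asks: grant (alive) or null (not alive).
  validateToken(token: string): TileGrant | null {
    return this.grants.get(token) ?? null;
  }
}

const registry = new TokenRegistry();
const tileToken = registry.mint("tags", ["commands"]);
const whileReady = registry.validateToken(tileToken);
registry.revoke(tileToken);
const afterRevoke = registry.validateToken(tileToken);
```

Because the registry is the single owner of both minting and revocation, a caller never needs to ask which lifecycle state the tile is in.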
5454+5555+**Deleted concepts**:
5656+- `peek://{id}/lazy-stub` pseudo-source — no longer a distinct participant.
5757+ Lazy declaration is a *state* (`registered` per the lifecycle FSM), not a
5858+ fake source address. The load-on-dispatch hook reads lifecycle state
5959+ directly (it lives on the lifecycle side of the boundary).
6060+- V1 extension-host iframes — gone as of 2026-04-21.
6161+- Legacy `IPC_CHANNELS.SUBSCRIBE` / `PUBLISH` paths — folded into tile
6262+ pubsub. Core uses the same capability-gated path as every tile, with a
6363+ `trustedBuiltin` grant.
6464+6565+## Transport layers
6666+6767+All paths converge at `publish()` in `pubsub.ts`. There is exactly one hub.
6868+6969+```
7070+   Core/Tile renderer
7171+        │
7272+        ▼  api.publish / api.subscribe  (from tile-preload.cts)
7373+        │
7474+   ipcMain.on('tile:pubsub:publish')   ──► capability-gated validate
7575+   ipcMain.on('tile:pubsub:subscribe')                 │
7676+                                                       ▼
7777+                                            publish() in pubsub.ts
7878+                                                       │
7979+                        ┌──────────────────────────────┼──────────────────────────────┐
8080+                        ▼                              ▼                              ▼
8181+            in-proc callbacks               pubsubBroadcaster                   (hook)
8282+            (topics Map)                    ├─ core window (bgWindow)           pre-publish
8383+            - subscribers registered        ├─ tile windows                     can skip/defer
8484+              via tile:pubsub:subscribe     └─ webview guests
8585+```
8686+8787+**Binding rules**:
8888+- No component calls `webContents.send('pubsub:...')` directly. The
8989+ broadcaster is the only sender of pubsub IPC frames.
9090+- The broadcaster's delivery targets are drawn from **bgWindow + all tile
9191+ windows + qualifying webview guests**, filtered by subscription as
9292+ described below. Omitting bgWindow is a bug (historical; see `project_bgwindow_is_core.md`).
9393+- Main-process code that wants to publish as System uses
9494+ `publish(SYSTEM_ADDRESS, ..., topic, data)` directly — no IPC hop.
9595+9696+**Broadcaster fan-out — subscriber-indexed delivery**
9797+9898+The broadcaster must **not** blanket-iterate every renderer on every
9999+publish. It consults the per-topic subscriber set maintained by
100100+`tile:pubsub:subscribe`: a window that never subscribed to topic T
101101+never receives frames for T. This reduces fan-out from O(N windows) to
102102+O(S subscribers of T), which is typically much smaller.
103103+104104+This is the single biggest perf lever in the fabric — at high widget
105105+counts (50+ tile windows, multiple webview guests each), blanket
106106+iteration at 60Hz (e.g., for `page:scroll`-style topics) will saturate
107107+IPC. Subscriber-indexed delivery keeps it bounded by actual interest.
108108+109109+For topics with genuinely many subscribers (e.g., `tag:item-added`
110110+during bulk sync), the fan-out is still proportional to how many
111111+renderers actually care — which is the minimum unavoidable cost.
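
Subscriber-indexed delivery can be sketched as a per-topic set of window ids. This is an illustration under assumptions: `SubscriberIndex`, numeric window ids, and exact-match topics are hypothetical, and wildcard subscriptions are out of scope.

```typescript
// Hypothetical sketch of a per-topic subscriber index.
class SubscriberIndex {
  private byTopic = new Map<string, Set<number>>();

  subscribe(topic: string, windowId: number): void {
    let ids = this.byTopic.get(topic);
    if (!ids) {
      ids = new Set<number>();
      this.byTopic.set(topic, ids);
    }
    ids.add(windowId);
  }

  // Lifecycle cleanup: drop every subscription held by one window.
  unsubscribeAll(windowId: number): void {
    this.byTopic.forEach((ids) => {
      ids.delete(windowId);
    });
  }

  // Fan-out visits only windows that subscribed to `topic`: O(S), not O(N).
  targets(topic: string): number[] {
    const out: number[] = [];
    const ids = this.byTopic.get(topic);
    if (ids) ids.forEach((id) => out.push(id));
    return out;
  }
}

const subscriberIndex = new SubscriberIndex();
subscriberIndex.subscribe("page:scroll", 1);
subscriberIndex.subscribe("tag:item-added", 1);
subscriberIndex.subscribe("tag:item-added", 2);
const scrollTargets = subscriberIndex.targets("page:scroll");
const tagTargets = subscriberIndex.targets("tag:item-added");
subscriberIndex.unsubscribeAll(1);
const tagTargetsAfterCleanup = subscriberIndex.targets("tag:item-added");
```

A topic nobody subscribed to yields an empty target list, so a high-frequency publish with no listeners costs one map lookup.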
112112+113113+## Topic taxonomy
114114+115115+Topics live in flat namespaces separated by `:`. Every topic belongs to one
116116+of these classes. The class determines who may publish, who may subscribe,
117117+and what capability gating applies.
118118+119119+| Class | Pattern | Publishers | Subscribers | Gate |
120120+|---|---|---|---|---|
121121+| **Command dispatch** | `cmd:execute:{name}` | Core (panel + chains), other tiles (chained execution) | Owning tile's handler (via `api.commands.register`) | Capability: tile must have `commands`. Load-on-dispatch hook may defer. |
122122+| **Command result** | `cmd:execute:{name}:result:{uuid}` | Owning tile | Dispatcher that set `expectResult:true` | Capability: same `commands` grant (infra carve-out). |
123123+| **Command registry** | `cmd:register`, `cmd:register-batch`, `cmd:unregister` | Tiles with valid token (= state ≥ `ready`) | Core (cmd registry) | Dynamic registrations only. `cmd:register-batch` is the bulk variant (used by noun expansion). *Declared* commands come from manifest cache (not pubsub) per tile-lifecycle-fsm.md §Invariants. |
124124+| **Noun registry** | `noun:register-batch`, `noun:unregister` | Tiles with valid token | Core (noun registry) | Same as cmd:register. |
125125+| **Noun dispatch** | `noun:browse:{name}`, `noun:query:{name}`, `noun:open:{name}` | Core (panel) | Noun handler tile | Capability: `commands` (nouns auto-generate commands). |
126126+| **Lifecycle observer** | `tile:state-changed` (mirror of lifecycle transitions) | `tile-lifecycle.ts` (System) | Observers (drift detector, HUD widgets) with `lifecycle-observer` capability | Read-only observer topic. Publishers must not publish this directly. `lifecycle-observer` is a new capability declared in the tile manifest (typical grantees: drift, HUD widgets that display tile state). |
127127+| **Domain events** | `tag:*`, `item:*`, `sync:*`, `editor:*`, `entities:*`, `page:*`, `window:*`, etc. | System (datastore IPC handlers) + tiles whose manifest declares ownership of the namespace (e.g., tags feature owns `tag:*`) | Any tile whose capability grant allows subscription to the namespace | Capability allowlist on BOTH publish and subscribe sides. Namespace ownership is a manifest field: `owns: ['tag']`. |
128128+| **Feature-scoped** | `{feature}:{verb}` (e.g., `websearch:engine-request`) | That feature's tile(s) | That feature's tile(s) | Capability allowlist. Cross-feature publish is a violation — see §Cross-feature rule below. |
129129+| **Settings** | `topic:core:prefs`, `settings:changed:{feature}`, `settings:navigate` | Core (settings UI) | Any tile whose capability grant allows subscription to the topic | Publishers are Core/System; subscribers pass the normal capability allowlist through the gate. |
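
The class patterns in the table can be sketched as a pure classifier. This is a hedged illustration: `classifyTopic`, the `TopicClass` labels, and the `DOMAIN_NAMESPACES` list are hypothetical, the namespace list is partial (the table's list ends in "etc."), and command or noun names are assumed not to contain `:`.

```typescript
// Hypothetical sketch: labels and helper names are illustrative.
type TopicClass =
  | "command-dispatch" | "command-result" | "command-registry"
  | "noun-registry" | "noun-dispatch" | "lifecycle-observer"
  | "settings" | "domain-event" | "feature-scoped";

// Assumed, partial list of domain-event namespaces.
const DOMAIN_NAMESPACES = new Set(["tag", "item", "sync", "editor", "entities", "page", "window"]);

function classifyTopic(topic: string): TopicClass {
  // Result topics share the cmd:execute: prefix, so they are checked first.
  if (/^cmd:execute:[^:]+:result:[^:]+$/.test(topic)) return "command-result";
  if (/^cmd:execute:[^:]+$/.test(topic)) return "command-dispatch";
  if (topic === "cmd:register" || topic === "cmd:register-batch" || topic === "cmd:unregister") {
    return "command-registry";
  }
  if (topic === "noun:register-batch" || topic === "noun:unregister") return "noun-registry";
  if (/^noun:(browse|query|open):[^:]+$/.test(topic)) return "noun-dispatch";
  if (topic === "tile:state-changed") return "lifecycle-observer";
  if (topic === "topic:core:prefs" || topic === "settings:navigate" ||
      /^settings:changed:[^:]+$/.test(topic)) {
    return "settings";
  }
  // Everything else is split by its leading namespace segment.
  const ns = topic.split(":")[0] ?? "";
  return DOMAIN_NAMESPACES.has(ns) ? "domain-event" : "feature-scoped";
}
```

In this sketch an unrecognized topic in a reserved namespace (for example `cmd:something-else`) falls through to `feature-scoped`; a stricter classifier would probably reject it instead.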
130130+131131+### Cross-feature rule
132132+133133+A feature tile may **not** publish to another feature's topic namespace. If
134134+A needs data from B, it calls B's registered command (`api.commands.call`
135135+→ `cmd:execute:{b-command}` round-trip), which goes through the full
136136+dispatch state machine. This rule exists because ad-hoc cross-feature
137137+topics (see tasks.md: websearch bg↔home round-trip) are the most common
138138+source of fragile pubsub.
139139+140140+### Topics that are NOT pubsub: private lifecycle IPC
142142+`tile:ready` and `tile:shutdown` are **not** pubsub topics. They are carried on
143143+private `tile:lifecycle:*` IPC channels between a tile's preload and `tile-lifecycle.ts`:
144144+145145+- `tile:lifecycle:ready` — preload → main, sent once after capability
146146+ token validated. Causes lifecycle transition `loading → ready`.
147147+- `tile:lifecycle:shutdown` — main → preload, sent during unload grace
148148+ window.
149149+150150+**Why private**: `tile:ready` is the signal that *admits a tile as a
151151+publisher*. If it flowed through pubsub itself, the tile would be
152152+publishing before it has been admitted — a bootstrap circularity that
153153+either requires a carve-out or introduces a race. Private IPC removes
154154+the circularity entirely.
155155+156156+Observers that need to react to lifecycle transitions (drift detector,
157157+HUD widgets) subscribe to the mirrored `tile:state-changed` pubsub
158158+topic, which `tile-lifecycle.ts` publishes as System after a transition
159159+lands. One direction, after the fact.
160160+161161+### Deleted topics
162162+163163+- `cmd:request-registers` — race workaround where Core on boot asks all
164164+ tiles to replay their cached `cmd:register` payloads. Exists only because
165165+ subscribers could land after publishers. With the FSM's invariant that
166166+ Core's subscribers are live before any tile reaches `ready`, this topic
167167+ has no job. **Delete.**
168168+- `ext:ready`, `ext:all-loaded` — v1 lifecycle handshake. Replaced entirely
169169+ by the private `tile:lifecycle:ready` signal + cache-backed declared
170170+ commands. **Delete.**
171171+172172+## Authorization rules (not state lookups)
173173+174174+The pubsub enforcement layer does not inspect lifecycle state. It runs
175175+one check: `validateToken(t) → grant | null`, then consults the grant
176176+to answer "is this (publisher, topic, op) allowed?"
177177+178178+Because `tile-lifecycle.ts` mints the token on `loading → ready` and
179179+revokes it on `→ unloading` / `→ crashed`, this single check implicitly
180180+enforces the full state-based policy. Rules:
181181+182182+| Publisher | Operation | Check |
183183+|---|---|---|
184184+| Core (`trustedBuiltin` grant) | any publish, any subscribe | always allow |
185185+| System (in-process main) | any publish | always allow; never subscribes |
186186+| Tile with valid token | publish to topic T | grant includes `publish` for T's class (subject to per-class rules below) |
187187+| Tile with valid token | subscribe to topic T | grant includes `subscribe` for T's class |
188188+| Tile with revoked/missing token | any op | reject — logged as `tile:drift` telemetry, never silently dropped |
189189+190190+Per-class rules (the capability-grant shape implements these):
191191+192192+| Topic class | Who may publish | Who may subscribe |
193193+|---|---|---|
194194+| `cmd:execute:{name}` | Any with `commands` grant (dispatcher) | Only the declared owner of `{name}` |
195195+| `cmd:execute:{name}:result:{uuid}` | Only the owner of `{name}` | Only the dispatcher that set `resultTopic` (private-by-uuid) |
196196+| `cmd:register` / `cmd:register-batch` / `cmd:unregister` | Any with `commands` grant | Core only |
197197+| `noun:register-batch` / `noun:unregister` | Any with `commands` grant | Core only |
198198+| `noun:browse:{n}` / `noun:query:{n}` / `noun:open:{n}` | Core (panel) | Owner of noun `{n}` |
199199+| `tile:state-changed` | System only (`tile-lifecycle.ts`) | Any with `lifecycle-observer` grant |
200200+| Domain event `{ns}:{verb}` | System OR tile with `{ns}` ownership | Any tile whose capability allowlist matches |
201201+| Feature-scoped `{feature}:{verb}` | Only the `{feature}` tile(s) | Only the `{feature}` tile(s) (cross-feature = violation) |
202202+203203+**Lifecycle state is not consulted at the enforcement point.** If the
204204+token is valid, the tile is in `ready` or `visible` by definition. The
205205+two FSMs couple through the token's lifetime, not through shared state
206206+reads — this is the pure-function boundary that makes the pubsub FSM
207207+testable in isolation.
208208+209209+`tile:lifecycle:ready` and `tile:lifecycle:shutdown` are private IPC
210210+(§Topics that are NOT pubsub), so they never hit this table.
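
A partial sketch of the authorization matrix as a pure function, in the spirit of the `allow(...)` signature named in §Module layout. The `AuthRequest` and `AuthGrant` shapes and the violation strings are assumptions, and only the Core, System, registry, and `tile:state-changed` rows are modeled.

```typescript
// Hypothetical shapes: the real grant format is not specified here.
type Role = "core" | "system" | "tile";
type Op = "publish" | "subscribe";
type AuthGrant = { tileId: string; capabilities: string[] };
type Decision = "allow" | { violation: string };
type AuthRequest = { role: Role; grant: AuthGrant | null; topic: string; op: Op };

const REGISTRY_TOPICS = [
  "cmd:register", "cmd:register-batch", "cmd:unregister",
  "noun:register-batch", "noun:unregister",
];

function allow(req: AuthRequest): Decision {
  if (req.role === "core") return "allow"; // trustedBuiltin: any publish, any subscribe
  if (req.role === "system") {
    return req.op === "publish" ? "allow" : { violation: "system-never-subscribes" };
  }
  const grant = req.grant;
  // A null grant means the token was revoked or never minted.
  if (grant === null) return { violation: "no-valid-token" };
  const has = (cap: string): boolean => grant.capabilities.indexOf(cap) !== -1;

  if (req.topic === "tile:state-changed") {
    if (req.op === "publish") return { violation: "system-only-publisher" };
    return has("lifecycle-observer") ? "allow" : { violation: "missing-lifecycle-observer" };
  }
  if (REGISTRY_TOPICS.indexOf(req.topic) !== -1) {
    if (req.op === "subscribe") return { violation: "core-only-subscriber" };
    return has("commands") ? "allow" : { violation: "missing-commands" };
  }
  // Other classes (cmd:execute, noun dispatch, domain, feature-scoped) are omitted.
  return { violation: "unclassified-in-sketch" };
}
```

Note that nothing in the function reads lifecycle state: the only lifecycle-derived input is whether a grant exists.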
211211+212212+### The IPC chokepoint
213213+214214+All `tile:*` IPC frames pass through a single main-process gate
215215+(`tile-ipc-gate.ts`, see §Module layout). The gate runs a fixed
216216+sequence before any handler executes:
217217+218218+1. **Channel allowlisted?** `registerTileIpc(channel, handler, descriptor)`
219219+ is the only way to attach a `tile:*` listener. An unregistered
220220+ channel receives a default handler that logs `tile:drift` and drops.
221221+2. **Sender identity verified?** `event.sender` (the `WebContents` that
222222+ sent the frame) must match the `WebContents` that owns the
223223+ `payload.token`. This closes the "forged token" hole: a tile
224224+ compromised via XSS cannot act with another tile's leaked token,
225225+ because the sending `WebContents` would not match the token's owner.
226226+3. **Payload schema valid?** Each channel descriptor declares its
227227+ expected shape. Malformed frames log `tile:drift` and drop.
228228+4. **Token valid & grant consistent with channel?** The channel
229229+ descriptor names the capabilities required; absent ones → reject.
230230+5. **State-at-receive matches channel's expected transition window?**
231231+ E.g., `tile:lifecycle:ready` may only arrive while the tile is in
232232+ `loading`; arrival in any other state → reject + `tile:drift`.
233233+6. **Sender role allowlisted for this channel?** E.g.,
234234+ `tile:lifecycle:ready` must come from a tile's own preload, never
235235+ from Core or System.
236236+237237+Only after all six pass does the handler run. Rejections are never
238238+silent; every drop emits `tile:drift` with a structured reason so CI
239239+can fail on regressions.
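
The six-step sequence can be sketched as an ordered chain of checks that returns the first failure reason. Every type and name here (`GateFrame`, `ChannelDescriptor`, `GateDeps`, `runGate`, the `pubsub` capability, numeric sender ids) is hypothetical; only the ordering of the steps is taken from the list above.

```typescript
// Hypothetical shapes throughout; only the step order follows the doc.
type GateFrame = { channel: string; senderId: number; senderRole: string; payload: unknown };
type ChannelDescriptor = {
  isValidPayload: (payload: unknown) => boolean;
  requiredCapabilities: string[];
  allowedStates: string[];
  allowedRoles: string[];
};
type GateGrant = { tileId: string; ownerSenderId: number; capabilities: string[] };
type GateDeps = {
  descriptors: Map<string, ChannelDescriptor>;
  validateToken: (token: string) => GateGrant | null;
  stateOf: (tileId: string) => string;
};
type GateResult = { ok: true } | { ok: false; reason: string };

function readToken(payload: unknown): string | null {
  if (typeof payload !== "object" || payload === null) return null;
  const candidate = (payload as { token?: unknown }).token;
  return typeof candidate === "string" ? candidate : null;
}

function runGate(frame: GateFrame, deps: GateDeps): GateResult {
  const reject = (reason: string): GateResult => ({ ok: false, reason });
  // 1. Channel allowlisted?
  const descriptor = deps.descriptors.get(frame.channel);
  if (!descriptor) return reject("unregistered-channel");
  // 2. Sender identity: an unknown token has no owner, so it also fails here.
  const token = readToken(frame.payload);
  const grant = token === null ? null : deps.validateToken(token);
  if (!grant || grant.ownerSenderId !== frame.senderId) return reject("sender-mismatch");
  // 3. Payload schema valid?
  if (!descriptor.isValidPayload(frame.payload)) return reject("bad-payload");
  // 4. Grant consistent with the channel's required capabilities?
  for (const cap of descriptor.requiredCapabilities) {
    if (grant.capabilities.indexOf(cap) === -1) return reject("missing-capability:" + cap);
  }
  // 5. State-at-receive inside the channel's expected window?
  if (descriptor.allowedStates.indexOf(deps.stateOf(grant.tileId)) === -1) return reject("wrong-state");
  // 6. Sender role allowlisted for this channel?
  if (descriptor.allowedRoles.indexOf(frame.senderRole) === -1) return reject("wrong-role");
  return { ok: true };
}

const descriptors = new Map<string, ChannelDescriptor>();
descriptors.set("tile:pubsub:publish", {
  isValidPayload: (p) => readToken(p) !== null,
  requiredCapabilities: ["pubsub"],
  allowedStates: ["ready", "visible"],
  allowedRoles: ["tile"],
});
const gateDeps: GateDeps = {
  descriptors,
  validateToken: (t) =>
    t === "tok-1" ? { tileId: "tags", ownerSenderId: 7, capabilities: ["pubsub"] } : null,
  stateOf: () => "ready",
};
const goodFrame: GateFrame = {
  channel: "tile:pubsub:publish", senderId: 7, senderRole: "tile",
  payload: { token: "tok-1", topic: "tag:item-added" },
};
const passed = runGate(goodFrame, gateDeps);
const forged = runGate({ ...goodFrame, senderId: 9 }, gateDeps);
const unknownChannel = runGate({ ...goodFrame, channel: "tile:made-up" }, gateDeps);
```

In a real gate each rejection reason would feed the `tile:drift` telemetry instead of being returned to the caller.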
240240+241241+**Performance budget**: schema validation (step 3) uses hand-rolled
242242+shape checks (`typeof x.foo === 'string' && Array.isArray(x.bar)`),
243243+not a general-purpose schema library (Zod / Ajv). Total gate cost
244244+stays under ~5μs per frame — negligible next to Electron's intrinsic
245245+IPC serialization + cross-process handoff (~10-50μs).
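
A hand-rolled shape check of the kind described above might look like the following. The `PublishFrame` field names (`token`, `topic`, `data`) are assumptions made for the example, not the documented payload shape.

```typescript
// Hypothetical payload shape for a publish frame.
type PublishFrame = { token: string; topic: string; data: unknown };

// A type guard built from typeof checks only: no schema library, no allocation.
function isPublishFrame(x: unknown): x is PublishFrame {
  if (typeof x !== "object" || x === null) return false;
  const rec = x as Record<string, unknown>;
  const token = rec["token"];
  const topic = rec["topic"];
  return typeof token === "string"
    && typeof topic === "string"
    && topic.length > 0
    && "data" in rec;
}
```

The guard doubles as a TypeScript type predicate, so a handler that runs after it can read `frame.topic` without casting.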
246246+247247+This chokepoint is the main-process analog of the authorization-rules
248248+table above — it guarantees that the rules can't be bypassed by a
249249+handler that forgets to validate, and it adds the sender-frame check
250250+that the pure FSM alone can't express.
251251+252252+## Command dispatch & result — one path
253253+254254+**Rule**: a command result returns to its dispatcher by publishing to the
255255+result topic through the same capability-gated pubsub path that every
256256+other publish uses. Specifically:
257257+258258+- Dispatcher publishes `cmd:execute:{name}` with `{expectResult: true,
259259+ resultTopic: 'cmd:execute:{name}:result:{uuid}'}`.
260260+- Handler in the owning tile publishes to that result topic when it
261261+ completes.
262262+- Both publishes go through `tile:pubsub:publish` → capability allowlist
263263+ → `publish()`.
264264+265265+The allowlist's `cmd:execute:*` infra carve-out for tiles holding the
266266+`commands` grant (tile-ipc.ts, landed in commit `f32063db` on
267267+2026-04-23) is what makes result-topic publishes legal without requiring
268268+every tile to explicitly declare every possible UUID-suffixed topic in
269269+its manifest.
270270+271271+**Legacy side-channel to remove**: tile-preload currently also sends a
272272+parallel `ipcRenderer.send('tile:command:result', ...)` frame
273273+(tile-preload.cts:411) that hits an unrestricted main-process publish
274274+handler (tile-ipc.ts `ipcMain.on('tile:command:result', ...)`). This
275275+duplicate path predates the allowlist carve-out and bypasses the gate
276276+entirely. It is redundant with the pubsub publish above and must be
277277+deleted — one auth path, one code path, one place to reason about who
278278+can emit a command result.
279279+280280+```
281281+   Dispatcher                                Owning tile
282282+   (Core/panel or another tile)        (capability: commands)
283283+        │                                         │
284284+        │  publish('cmd:execute:X',               │
285285+        │    { ..., expectResult:true,            │
286286+        │      resultTopic:'cmd:execute:X:        │
287287+        │                   result:<uuid>' })     │
288288+        │ ───────────────────────────────────────►│
289289+        │                                         │
290290+        │                        [load-on-dispatch hook
291291+        │                         ensures state≥ready]
292292+        │                                         │
293293+        │                                  [handler runs]
294294+        │                                         │
295295+        │                     publish('cmd:execute:X:
296296+        │                              result:<uuid>',
297297+        │                              { data } | { error })
298298+        │ ◄───────────────────────────────────────│
299299+        │                                         │
300300+   resolve proxy promise
301301+```
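
The round trip in the diagram can be sketched against an in-memory hub. This is an illustration only: `MiniHub`, `dispatchCommand`, and the counter used in place of a UUID are hypothetical, delivery is synchronous, and the capability gate is left out.

```typescript
// Hypothetical in-memory hub; the capability gate is omitted from this sketch.
type TopicHandler = (data: unknown) => void;

class MiniHub {
  private topics = new Map<string, TopicHandler[]>();

  subscribe(topic: string, handler: TopicHandler): () => void {
    const list = this.topics.get(topic) ?? [];
    list.push(handler);
    this.topics.set(topic, list);
    return () => {
      const current = this.topics.get(topic) ?? [];
      this.topics.set(topic, current.filter((h) => h !== handler));
    };
  }

  publish(topic: string, data: unknown): void {
    // Copy first so a handler may unsubscribe during delivery.
    (this.topics.get(topic) ?? []).slice().forEach((h) => h(data));
  }
}

type ExecuteMsg = { args: unknown; expectResult: true; resultTopic: string };
type CommandOutcome = { data: unknown } | { error: string };

let dispatchCounter = 0;

function dispatchCommand(
  hub: MiniHub, name: string, args: unknown, onResult: (o: CommandOutcome) => void,
): void {
  // A real dispatcher would use a UUID; a counter keeps the sketch deterministic.
  const resultTopic = `cmd:execute:${name}:result:${++dispatchCounter}`;
  const off = hub.subscribe(resultTopic, (d) => {
    off(); // result topics are one-shot
    onResult(d as CommandOutcome);
  });
  const msg: ExecuteMsg = { args, expectResult: true, resultTopic };
  hub.publish(`cmd:execute:${name}`, msg);
}

// Owning tile: handles the command and publishes to the result topic it was given.
const hub = new MiniHub();
hub.subscribe("cmd:execute:echo", (d) => {
  const msg = d as ExecuteMsg;
  hub.publish(msg.resultTopic, { data: msg.args });
});

const outcomes: CommandOutcome[] = [];
dispatchCommand(hub, "echo", "hi", (o) => outcomes.push(o));
```

Both the execute publish and the result publish go through the same `publish` call, which is the point of the single-path rule.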
302302+303303+**Concrete removals**:
304304+- `ipcRenderer.send('tile:command:result', ...)` call in
305305+ `api.commands.register` handler (tile-preload.cts:411). Keep the
306306+ `tile:pubsub:publish` call at line 401.
307307+- `ipcMain.on('tile:command:result', ...)` handler in tile-ipc.ts.
308308+309309+## Scope semantics — delete
310310+311311+The `scope` argument (`SYSTEM=1 | SELF=2 | GLOBAL=3`) is vestigial.
312312+313313+Survey results (2026-04-23):
314314+- `api.scopes.GLOBAL` — ~100 sites across `app/**`.
315315+- `api.scopes.SYSTEM` — 4 sites (3 in `app/index.js`, 1 in
316316+ `app/settings/settings.js`).
317317+- `api.scopes.SELF` — **zero sites**.
318318+319319+Preload's `publishImpl` defaults missing scope to `SELF`, but since no
320320+caller passes SELF and SELF delivery is restricted to same-pseudo-host
321321+subscribers (which for a one-window core = same-window), the SELF code
322322+path is unreachable.
323323+324324+**Plan** (single commit, no backcompat grace period):
325325+- Delete the `scopes` constant and the `scopeCheck()` function in
326326+ `pubsub.ts`. All `publish()` calls route to both in-proc callbacks
327327+ (filtered by topic match only) and the broadcaster.
328328+- Migrate the 4 `api.scopes.SYSTEM` sites to regular publishes on
329329+ system-privileged topics (e.g., `topic:core:prefs`). Privilege moves
330330+ from the scope argument onto the topic — the topic's capability
331331+ allowlist controls who may publish and who may subscribe.
332332+- Remove the `scope` parameter from `api.publish` and `api.subscribe`.
333333+ Migrate all ~100 call sites in `app/**` in the same commit. No
334334+ backcompat shim, no no-op tolerance.
335335+336336+## Subscribe-before-publish invariant
337337+338338+This is the missing rule that explains every "lost publish at boot" bug.
339339+340340+**Statement**: For any topic T and any publisher P, at the moment P
341341+publishes T, every subscriber S that will ever receive T during this boot
342342+must already be registered.
343343+344344+**Enforcement**:
345345+- **Core subscribers land first.** `app/index.js` subscribes to
346346+ `cmd:register`, `cmd:register-batch`, `noun:register-batch`, and all
347347+ core-managed domain topics during its synchronous init, BEFORE any tile
348348+ is transitioned to `loading`. The main process waits for bgWindow's
349349+ `tile:lifecycle:ready` (private IPC) before any `registered → loading`
350350+ transition for other tiles.
351351+- **Tile publishes are gated by token existence.** A tile in `loading`
352352+ has no token yet (the token is minted on `loading → ready`). Any
353353+ publish attempt from a `loading` tile fails the chokepoint's token
354354+ check and is rejected + logged as `tile:drift`. No capability check
355355+ is needed — there's no token to consult.
356356+- **No replay needed.** Because subscribers exist before any publisher
357357+ fires, `cmd:request-registers` and the `registeredPayloads` cache in
358358+ tile-preload.cts become unnecessary. **Delete both.**
359359+360360+This is the single rule that kills the class of bug "only hello-world's
361361+commands visible in the cmd panel."
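
One way to picture the "Core subscribers land first" rule is a gate that holds every `registered → loading` request until Core has signalled ready. This is a sketch under assumptions: `BootGate`, its method names, and the choice to queue requests rather than refuse them are hypothetical.

```typescript
// Hypothetical sketch: tile loads are deferred until Core's ready signal.
class BootGate {
  private coreReady = false;
  private pending: string[] = [];
  private startLoading: (tileId: string) => void;

  constructor(startLoading: (tileId: string) => void) {
    this.startLoading = startLoading;
  }

  // Request a `registered → loading` transition for a tile.
  requestLoad(tileId: string): void {
    if (this.coreReady) this.startLoading(tileId);
    else this.pending.push(tileId);
  }

  // Called once bgWindow's private lifecycle-ready signal arrives.
  markCoreReady(): void {
    this.coreReady = true;
    const queued = this.pending;
    this.pending = [];
    queued.forEach((tileId) => this.startLoading(tileId));
  }
}

const loadOrder: string[] = [];
const bootGate = new BootGate((tileId) => loadOrder.push(tileId));
bootGate.requestLoad("tags");
const loadedBeforeCore = loadOrder.length;
bootGate.markCoreReady();
bootGate.requestLoad("hud");
```

Since no tile can reach `loading` before Core is ready, no tile can reach `ready` (and so hold a token and publish) before Core's subscribers exist.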
362362+363363+## Load-on-dispatch — timeout UX
364364+365365+When Core dispatches `cmd:execute:X` to a tile in `registered`, the
366366+pre-publish hook forces a `registered → loading → ready` transition before
367367+delivering. `LAZY_LOAD_TIMEOUT_MS` (currently 10s) bounds the wait.
369369+**On timeout** (today the pending promise rejects with an error that the
370370+cmd panel surfaces as a spinner hang; the target behavior is below):
371371+- Dispatcher promise rejects with a structured error.
372372+- Core publishes `notification:show` with `{ type: 'error', title: 'Tile
373373+ didn't load', body: 'Command "{name}" couldn't run because its tile
374374+ failed to load within 10 seconds.' }`.
375375+- Cmd panel drops back to IDLE — no lingering spinner.
376376+377377+**On render-process-gone during load** (transition `loading → crashed`):
378378+- Same notification, different body (`crashed while loading`).
379379+- Dispatcher rejects immediately — no 10s wait.
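
The two failure paths can be sketched as a pure mapping from load outcome to what the dispatcher and Core do next. `planForLoadOutcome`, the `DispatchPlan` shape, and the rejection reason strings are hypothetical; the exact wording of the crash body is also an assumption, since the text above only gives the phrase `crashed while loading`.

```typescript
// Hypothetical sketch: maps a load outcome to the dispatcher-visible plan.
type LoadOutcome = "ready" | "timeout" | "crashed";
type ErrorNotice = { type: "error"; title: string; body: string };
type DispatchPlan = { deliver: boolean; rejectReason: string | null; notice: ErrorNotice | null };

const LAZY_LOAD_TIMEOUT_MS = 10000;

function planForLoadOutcome(outcome: LoadOutcome, commandName: string): DispatchPlan {
  if (outcome === "ready") {
    return { deliver: true, rejectReason: null, notice: null };
  }
  const seconds = LAZY_LOAD_TIMEOUT_MS / 1000;
  const why = outcome === "timeout"
    ? `failed to load within ${seconds} seconds`
    : "crashed while loading"; // assumed wording for the crash body
  return {
    deliver: false,
    rejectReason: outcome === "timeout" ? "load-timeout" : "load-crashed",
    notice: {
      type: "error",
      title: "Tile didn't load",
      body: `Command "${commandName}" couldn't run because its tile ${why}.`,
    },
  };
}
```

Keeping the mapping pure means the timer and the `render-process-gone` handler can both call it and share one notification format.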
380380+381381+## Bypass detection
382382+383383+Everything routed through the FSM is enforced at the gate; rejections
384384+emit `tile:drift` telemetry (see §Authorization rules). The only way
385385+for a message to escape the FSM is for code to call a lower-level
386386+Electron API directly, sidestepping the gate entirely. The FSM cannot
387387+prevent this — anyone writing main-process code can do anything —
388388+so we need complementary bypass detection.
389389+390390+Two bypass categories:
391391+392392+- **Direct-send bypass** — `webContents.send('pubsub:...')` called
393393+ outside the broadcaster. Mitigation: lint rule against
394394+ `webContents.send` with a `pubsub:` prefix, plus a dev-mode wrap of
395395+ `webContents.send` that throws on the same pattern.
396396+- **Off-path window creation** — `new BrowserWindow()` with a
397397+ `peek://{tileId}/...` URL outside the tile launcher. Mitigation: lint
398398+ rule + dev-mode assertion in the FSM that every BrowserWindow with a
399399+ tile URL has a corresponding `registered → loading` transition.
400400+ (Inherited from tile-lifecycle-fsm.md §Drift detectors.)
401401+402402+Both are dev/CI-only assertions. Production has nothing to check —
403403+correct code can't trip them, and incorrect code is caught at review
404404+or in dev.
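
The dev-mode wrap for the direct-send bypass could be sketched as follows. `SenderLike` stands in for Electron's `WebContents` so the example runs without Electron, and `guardPubsubSend` is a hypothetical helper; handing the original `send` back to the broadcaster is one possible way to exempt it, not a documented design.

```typescript
// Hypothetical stand-in for the part of WebContents this check cares about.
interface SenderLike {
  send(channel: string, ...args: unknown[]): void;
}

// Replaces target.send with a guarded version and returns the original,
// which only the broadcaster is meant to hold.
function guardPubsubSend(target: SenderLike): SenderLike["send"] {
  const original = target.send.bind(target);
  target.send = (channel: string, ...args: unknown[]): void => {
    if (channel.indexOf("pubsub:") === 0) {
      throw new Error(`direct pubsub send bypasses the broadcaster: ${channel}`);
    }
    original(channel, ...args);
  };
  return original;
}

const sentChannels: string[] = [];
const fakeContents: SenderLike = {
  send: (channel) => {
    sentChannels.push(channel);
  },
};
const broadcasterSend = guardPubsubSend(fakeContents);

fakeContents.send("window:focus", 1); // non-pubsub channels pass through
let bypassThrew = false;
try {
  fakeContents.send("pubsub:tag:item-added", {}); // off-path send is caught
} catch (_err) {
  bypassThrew = true;
}
broadcasterSend("pubsub:tag:item-added", {}); // broadcaster path still works
```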
405405+406406+What is explicitly **not** a drift detector:
407407+- Gate rejections (enforcement, already covered).
408408+- Startup races (if they occur, the FSM design is wrong — fix the design,
409409+ don't paper over it with a detector).
410410+- Unrouted publishes (legitimate for domain events with no listeners).
411411+412412+**Drift emission rate limit**: `tile:drift` publishes are themselves
413413+rate-limited to one event per `(tileId, reason)` tuple per second, with
414414+a dropped-count aggregator. A buggy or malicious tile spamming rejected
415415+frames cannot amplify into a drift-publish storm that saturates the
416416+broadcaster.
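
The rate limit with a dropped-count aggregator can be sketched like this. `DriftLimiter`, the `DriftEvent` shape, and passing the clock in as a parameter are assumptions made to keep the example deterministic.

```typescript
// Hypothetical sketch: at most one drift event per (tileId, reason) per interval.
type DriftEvent = { tileId: string; reason: string; droppedSinceLast: number };

class DriftLimiter {
  private lastEmit = new Map<string, number>();
  private dropped = new Map<string, number>();
  private intervalMs: number;

  constructor(intervalMs: number = 1000) {
    this.intervalMs = intervalMs;
  }

  // Returns the event to publish, or null when the tuple is still rate-limited.
  record(tileId: string, reason: string, nowMs: number): DriftEvent | null {
    const key = JSON.stringify([tileId, reason]);
    const last = this.lastEmit.get(key);
    if (last !== undefined && nowMs - last < this.intervalMs) {
      this.dropped.set(key, (this.dropped.get(key) ?? 0) + 1);
      return null;
    }
    // Report how many events were suppressed since the previous emission.
    const droppedSinceLast = this.dropped.get(key) ?? 0;
    this.dropped.set(key, 0);
    this.lastEmit.set(key, nowMs);
    return { tileId, reason, droppedSinceLast };
  }
}

const limiter = new DriftLimiter(1000);
const firstDrift = limiter.record("tags", "invalid-token", 0);
const secondDrift = limiter.record("tags", "invalid-token", 100);
const thirdDrift = limiter.record("tags", "invalid-token", 200);
const otherReason = limiter.record("tags", "bad-payload", 200);
const afterWindow = limiter.record("tags", "invalid-token", 1000);
```

The suppressed count rides along on the next emitted event, so a storm of rejections stays visible in telemetry without producing a storm of publishes.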
417417+418418+## Module layout
419419+420420+Dependencies are one-way: `tile-lifecycle` → `pubsub`. `pubsub` never
421421+imports from `tile-lifecycle`. This is what makes the pubsub FSM
422422+testable in isolation and eliminates the race between state lookup and
423423+state transition.
424424+425425+- **`backend/electron/pubsub-fsm.ts`** (pure, new) — the authorization
426426+ matrix as a pure function: `allow({role, grant, topic, op}) →
427427+ 'allow' | {violation: reason}`. No Electron / Node imports. Topic
428428+ classification lives here. Testable without spinning up main.
429429+- **`backend/electron/tile-fsm.ts`** (pure, exists) — lifecycle
430430+ transition table. Unchanged.
431431+- **`backend/electron/pubsub.ts`** — the hub: delivery, broadcaster
432432+ wiring, pre-publish hooks. Imports `pubsub-fsm.ts` only. Exposes
433433+ `unsubscribeAllByPrefix(tileId)` for lifecycle cleanup.
434434+- **`backend/electron/tile-ipc-gate.ts`** (new) — the single IPC
435435+ chokepoint (see §The IPC chokepoint). Exposes `registerTileIpc(channel,
436436+ handler, descriptor)`. Runs channel allowlist check, sender-frame
437437+ cross-check against token owner, payload schema validation, token
438438+ validation, state-at-receive assertion, sender-role allowlist.
439439+ Every drop emits `tile:drift`. No `tile:*` IPC handler is attached
440440+ except through this gate.
441441+- **`backend/electron/tile-ipc.ts`** — individual channel handlers,
442442+ registered via `registerTileIpc`. On `tile:pubsub:publish` /
443443+ `tile:pubsub:subscribe`: the gate has already validated the frame, so
444444+ the handler just calls `pubsubFsm.allow(...)` for per-topic
445445+ authorization → publish or reject+drift. This is where the
446446+ lifecycle↔pubsub boundary sits.
447447+- **`backend/electron/tile-lifecycle.ts`** — state store + token
448448+ lifecycle. On `loading → ready`: mint token. On `→ unloading` /
449449+ `→ crashed`: revoke token, call `pubsub.unsubscribeAllByPrefix()`.
450450+ Publishes `tile:state-changed` for observers. Imports from
451451+ `pubsub.ts` (one-way); never imported by it.
452452+- **`backend/electron/tile-drift.ts`** (new) — bypass detectors (the
453453+ two from §Bypass detection) + dev-mode wrap of `webContents.send`.
454454+ Owns the `tile:drift` topic used by the gate's rejection
455455+ telemetry.
456456+- **`backend/electron/tile-lazy.ts`** — load-on-dispatch pre-publish
457457+ hook. Already lives on the lifecycle side (it triggers
458458+ `registered → loading`). Continues to register as a pre-publish hook
459459+ on `pubsub.ts`; the hook body calls into lifecycle.
460460+461461+Test boundaries:
462462+- `pubsub-fsm.ts` — pure unit tests over the authorization matrix.
463463+- `tile-fsm.ts` — pure unit tests over the transition table (exists).
464464+- `tile-ipc.ts` — integration tests with real tokens + real grants.
465465+- Lifecycle cleanup — integration test that `→ crashed` clears the
466466+ tile's subscriptions.
467467+468468+## Phased implementation
469469+470470+1. **Phase 1 — Fix bgWindow broadcast + rename broadcaster.** Add
471471+ bgWindow to the broadcaster iteration. Rename
472472+ `extensionBroadcaster` → `pubsubBroadcaster` (the "extension" label
473473+ is a stale v1 term — Peek now reserves "extension" for bundled
474474+ Chromium extensions; features/tiles are the current vocabulary).
475475+ This alone unblocks the "only hello-world visible" and "v2 result
476476+ doesn't reach subscribers" bugs. No semantic changes.
477477+ **Test**: tag/untag/widget-update smoke tests go green.
478478+2. **Phase 2 — Sender-frame cross-check (security hardening).**
479479+ Independently shippable. In every `tile:*` handler, verify
480480+ `event.sender` matches the `WebContents` that owns
481481+ `payload.token`. Closes the "tile with XSS forges another tile's
482482+ token" hole. Does not require the full IPC chokepoint (Phase 8) but
483483+ prepares for it — this check is worth landing early because it's
484484+ security, not cleanup.
485485+ **Test**: new unit test that simulates a mismatched sender + valid
486486+ token → rejected with `tile:drift`.
487487+3. **Phase 3 — Collapse to one command-result path.** Delete the
488488+ `tile:command:result` IPC (preload send site + main-process
489489+ handler). All command results flow through capability-gated pubsub
490490+ publish of the result topic. **Test**: every test exercising command
491491+ results stays green.
492492+4. **Phase 4 — Private lifecycle IPC + subscribe-before-publish.**
493493+ Split `tile:ready` / `tile:shutdown` off the pubsub bus onto private
494494+ `tile:lifecycle:*` IPC channels handled directly in
495495+ `tile-lifecycle.ts`. `app/index.js` wires all core subscribers
496496+ during synchronous init; main process gates first tile
497497+ `registered → loading` on bgWindow's private lifecycle-ready IPC.
498498+ Add `tile:state-changed` as the public observer mirror. **Test**:
499499+ unit test that simulates tile emitting `cmd:register` at `t=0` boot
500500+ — subscriber in core gets it. No pubsub-level `tile:ready` topic
501501+ remains.
502502+5. **Phase 5 — Delete replay machinery.** Remove `cmd:request-registers`
503503+ topic + `registeredPayloads` cache + `ensureCmdRequestRegistersListener`
504504+ in tile-preload.cts. **Test**: cmd registry has full contents after
505505+ cold boot without replay.
506506+6. **Phase 6 — Delete scope.** Remove `scopes` constant from `pubsub.ts`
507507+ + `api.scopes` surface in tile-preload. `api.publish`/`api.subscribe`
508508+ lose their scope parameter. Migrate all ~100 call sites in `app/**`
509509+ + the 4 SYSTEM sites in the same commit; no backcompat shim.
510510+ **Test**: existing pubsub tests green; grep for `api.scopes` returns zero.
511511+7. **Phase 7 — Timeout UX.** Wire notification publish on
512512+ `LAZY_LOAD_TIMEOUT_MS` expiry and on `loading→crashed` during
513513+ load-on-dispatch. **Test**: dispatcher to a tile whose preload throws
514514+ → user-visible notification within 10s.
515515+8. **Phase 8 — IPC chokepoint + bypass detectors.** Introduce
516516+ `tile-ipc-gate.ts` with `registerTileIpc(channel, handler,
517517+ descriptor)`. Migrate every existing `tile:*` `ipcMain.on` through
518518+ the gate. Gate runs the six-step validation sequence (§The IPC
519519+ chokepoint). Add lint rules + dev-mode wraps for the two bypass
520520+ categories (direct `webContents.send('pubsub:...')`, off-path
521521+ `new BrowserWindow` for tile URLs). Wire gate rejections to the
522522+ `tile:drift` topic with structured reasons so CI can fail on any
523523+ drift event.
524524+ **Test**: regression suite stays green; a deliberately-seeded
525525+ bypass in a test fixture trips the detector; every `tile:*`
526526+ channel has at least one passing + one rejecting gate test.
527527+528528+Each phase is independently shippable and each has a narrow failure mode.
529529+530530+## Test plan
532532+- **Pure FSM unit tests**: role × grant × topic × operation matrix; each
533533+ legal op returns `allow`, each illegal op returns a `violation`.
534534+- **Integration**: cold boot → cmd panel has all declared commands before
535535+ any tile window exists (manifest-cache path, already scoped by
536536+ tile-lifecycle-fsm.md).
537537+- **Integration**: cold boot → dispatch cmd to `registered` tile, assert
538538+ load triggers, handler runs, result topic reaches dispatcher. No
539539+ spinner-hang.
540540+- **Integration**: dispatch to tile whose renderer throws during init →
541541+ notification appears, panel returns to IDLE within 10s.
542542+- **Regression**: the 2026-04-23 repro
543543+ `tests/desktop/cmd-execute-twice.spec.ts` still passes. Add a companion
544544+ that fires 5x in a row from the panel UI (per tasks.md item).
545545+- **Regression**: full desktop suite stays green after each phase.
546546+547547+## References
548548+549549+- [tile-lifecycle-fsm.md](tile-lifecycle-fsm.md) — tile state machine;
550550+ this doc layers on top.
551551+- `backend/electron/pubsub.ts` — the hub.
552552+- `backend/electron/pubsub-fsm.ts` (new, Phase 8) — pure authorization
553553+ matrix.
554554+- `backend/electron/tile-ipc.ts` — individual `tile:*` channel handlers.
555555+- `backend/electron/tile-ipc-gate.ts` (new, Phase 8) — single IPC
556556+ chokepoint.
557557+- `backend/electron/tile-drift.ts` (new, Phase 8) — bypass detectors +
558558+ `tile:drift` topic owner.
559559+- `backend/electron/tile-preload.cts` — renderer-side publish/subscribe
560560+ API + command registration.
561561+- `backend/electron/main.ts:216` — broadcaster registration site
562562+ (`setExtensionBroadcaster` → renamed to `setPubsubBroadcaster` in
563563+ Phase 1; this is the bug site for bgWindow exclusion).
564564+- `backend/electron/tile-lazy.ts` — pre-publish hook for
565565+ load-on-dispatch.
566566+- `app/background.html`, `app/index.js`, `app/cmd/background.js` — core
567567+ renderer entry points + subscribers.
568568+- `docs/tasks.md` Current-Priority section — bugs this machine makes
569569+ impossible.
+12
docs/tasks.md
···8899---
10101111+## Current priority (drop everything)
1212+1313+- [ ] **Message-passing / pubsub state machine.** Cmd dispatch + pubsub + tile lifecycle have too many intersecting paths; Phase E/F surfaced three different manifestations of the same class of bug (see below). Write a formal state machine describing every legal transition: who can publish what topic from what source in what tile-state, who can subscribe to what, how results return, how lazy tiles mount. Runtime enforcement + bisectable tests derive from the machine. Goal: the "lost publish somewhere in the fabric" class of bug becomes impossible — if a message is dropped, the machine tells us exactly which transition was invalid. Explicit user directive 2026-04-23: "we're going to drop everything and write a state machine for the message passing and pubsub, so never have to worry about this again." Pre-requisite for tasks below.
1414+1515+- [ ] **Root-cause: only hello-world commands visible in cmd panel.** After Phases E/F, only hello-world's commands (`hello`, "hello world trace") appear in the cmd panel. Every other lazy tile's commands — tag, untag, kagi, google, ddg, lists, peeks, slides, and dozens more — are missing. Agent hypothesis: Phase E's `registerLazyTile()` publishes `cmd:register-batch` at boot before the cmd panel's core/background subscription lands. Do NOT band-aid; fix this only after the state machine is in place so the fix comes from the right abstraction. Manual test case: `yarn start`, open the cmd panel, type a few characters. Expected: many matches. Actual: only hello-world's commands match.
1616+1717+- [ ] **Consolidate command-result paths into one.** Two paths exist today: (A) `tile:command:result` IPC from tile-preload's `api.commands.register` wrapper → main-process unrestricted `publish()`; (B) handlers that manually `tile:pubsub:publish` to `cmd:execute:X:result:...`. This duality is why the Phase E bug hid for so long — pre-Phase-E tests only exercised path (A), which bypassed the capability allowlist; path (B) was silently rejected until the 2026-04-23 fix (`tolwnovr f32063db`). Collapse to one (probably A). Pre-req: state machine task above.
1818+1919+- [ ] **UI-level tests for cmd-panel repeat invocation.** The 2026-04-23 repro (`tests/desktop/cmd-execute-twice.spec.ts`) fires on the pubsub bus directly and passes 3/3. Manual testing caught a 3rd-invocation stall that the bus test misses — the difference must be panel UI state (`state.executionTimeout`, `urlSearchTimer`, subscription lifetime in `ensureCmdRequestRegistersListener`) leaking across repeats. Add Playwright tests that drive Cmd+K → type → Enter 3-5x in a row and assert panel closes + no spinner.
2020+2121+---
2222+1123## Tile architecture cleanup (post-conversion)
12241325- [ ] **Merge websearch's separate background tile into the home window.** Manifest has two tile entries: `background.html` (lazy:true) for settings/engine state, `home.html` for the UI. They communicate via pubsub round-trips (`websearch:engine-request` → `websearch:engines-list`) across `peek://ext/websearch/background.html` vs `peek://websearch/home.html`. Cross-window pubsub within one feature is fragile (see 2026-04-20 session — broadcaster echo-prevention bug + cluster 3 regression risk). The round-trip also blocks 3 websearch tests from passing. With the single-file tiles model shipping (`resident: true`), websearch can collapse to one tile whose home window owns both UI and engine state directly. No IPC needed. Applies also to any feature tile with a bg + window pair that pubsubs between itself — audit other candidates.