Merge pull request 'docs: ADRs, security model, and sprint triage' (#49) from docs/adrs-security-triage into main · scottlanoue.com/atmosphere-office@68f9868

+41

PRODUCT.md

···

- Browser extension or bookmarklet that verifies the encryption is real

**Why this matters:** Trust-but-verify. Users should not have to take our word for it. The architecture should be auditable by anyone with dev tools open.

---

## 10. Recommended Next Sprint

Based on impact analysis across 102 open issues, here are the **top 10 issues to tackle next**, ranked by a combination of user impact, crash severity, and feature-unlock potential.

### Tier 1: Fix What's Broken (Ship-Blocking)

| # | Issue | Priority | Rationale |
|---|-------|----------|-----------|
| **#144** | Bug: pasteAtSelection uses undefined `parsedRows` | Critical | **Tab-crashing bug.** Pasting clipboard content into sheets throws a ReferenceError. This is a basic operation that every user will hit. One-line fix. |
| **#145** | Bug: circular formula references cause infinite recursion | Critical | **Tab-crashing bug.** Two cells referencing each other freeze the browser. Unlike #144, users may not hit this immediately, but when they do there is no recovery. Fix: add a visited-cells set to `evaluateFormula()`. |

### Tier 2: High-Impact Quick Wins

| # | Issue | Priority | Rationale |
|---|-------|----------|-----------|
| **#146** | Wire up virtual scrolling | High | **The module exists with full test coverage** but is not imported. Wiring it up is mostly plumbing -- import `calculateVisibleRange`, modify `renderGrid()` to only render the visible range, add a spacer element. Unlocks larger sheets without lag. |
| **#149** | Wire up context menu actions | High | **Embarrassingly broken.** Right-click menu items (Insert Row, Delete Column, etc.) display but do nothing. Users discover this immediately and lose trust. Wiring the actions is straightforward -- the implementations for row/column append already exist via toolbar buttons. |
| **#21** | Toolbar cleanup | High | **Long-standing debt.** Collapsible overflow, grouped dropdowns, and responsive collapse are partially implemented but need finishing. Affects every user's first impression. |

### Tier 3: Strategic Features

| # | Issue | Priority | Rationale |
|---|-------|----------|-----------|
| **#113** | Row and column insert/delete | High | **Most-requested missing spreadsheet operation.** Users cannot insert a row in the middle of their data -- they can only append rows at the end. This is the most basic spreadsheet operation after cell editing. |
| **#52** | Command palette (Cmd+K) | High | **Keystone UX feature** that touches everything: navigate to documents, run actions, search content, switch themes. Every modern productivity tool has this. The slash command infrastructure shows this pattern already works. |
| **#91** | Array formula support and spill | High | **Unlocks Wave 2 entirely.** FILTER(), SORT(), UNIQUE(), SEQUENCE() all depend on array spill behavior. Without this, the dynamic array functions cannot be implemented. This is the critical dependency for the modern function library. |
| **#102** | CSV and TSV export | High | **Closes the most obvious import/export gap.** Users can import .xlsx but cannot export anything from sheets except by copy-pasting. CSV export is a quick win (iterate cells, join with commas, trigger download). |
| **#112** | Cell reference color coding | High | **Transforms formula comprehension.** When editing `=SUM(A1:A10) + B5`, each reference gets a unique color in the formula bar and the corresponding cells glow with matching colors on the grid. This is the single most impactful formula UX feature and the range highlighting infrastructure (#95) is already built. |

### Suggested Sprint Plan

**Week 1 (Bugs + Quick Wins):** #144, #145, #146, #149 -- fix the two crashers, wire up virtual scrolling, wire up context menu actions. All are low-risk, high-confidence changes.

**Week 2 (Core Spreadsheet):** #113 (row/column insert), #102 (CSV export) -- these are the two most impactful missing spreadsheet basics.

**Week 3 (Formula Power):** #112 (color-coded refs), #91 (array spill) -- transform the formula editing experience and unlock the next wave of functions.

**Week 4 (Platform):** #52 (command palette), #21 (toolbar cleanup) -- polish the interaction layer that every user touches.
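
The fix suggested for #145 (a visited-cells set) can be sketched roughly as follows. This is a minimal illustration, not the project's `evaluateFormula()`: the function name `evaluateCell`, the `#CIRCULAR!` marker, and the toy formula grammar (`=` followed by numbers or cell references joined with `+`) are all assumptions made for the example.

```javascript
// Hypothetical, simplified model: a formula is "=" followed by terms joined with "+".
// Each term is either a number or a cell reference such as "A1".
const CIRCULAR = '#CIRCULAR!';

function evaluateCell(cellId, cells, visiting = new Set()) {
  // A cell that is already on the current evaluation path means a cycle.
  if (visiting.has(cellId)) return CIRCULAR;

  const raw = cells[cellId];
  if (raw === undefined || raw === '') return 0;
  if (typeof raw === 'number') return raw;
  if (!raw.startsWith('=')) return Number(raw);

  visiting.add(cellId);
  let total = 0;
  for (const term of raw.slice(1).split('+')) {
    const token = term.trim();
    const value = /^[A-Z]+[0-9]+$/.test(token)
      ? evaluateCell(token, cells, visiting)
      : Number(token);
    if (value === CIRCULAR) {
      visiting.delete(cellId);
      return CIRCULAR; // propagate the error instead of recursing forever
    }
    total += value;
  }
  // Leaving the path: the same cell may legitimately be referenced again elsewhere.
  visiting.delete(cellId);
  return total;
}
```

Removing the cell from the set on the way out matters: without it, a cell referenced twice by the same formula (a "diamond" dependency) would be misreported as circular.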

+191

docs/SECURITY.md

# Security Model

This document describes the threat model, encryption implementation, and security properties of Tools.

## Overview

Tools is an end-to-end encrypted (E2EE) collaborative office suite. All document content is encrypted in the browser before it reaches the server. The server stores and relays encrypted data that it cannot decrypt. The encryption key is part of the URL fragment and is never transmitted over the network.

## Threat Model

### What We Protect Against

| Threat | Protection |
|--------|-----------|
| **Server compromise** (database theft, admin access) | All stored content is AES-256-GCM encrypted. The server has no keys. |
| **Network interception** (MITM on WebSocket/HTTP) | All transmitted document content is E2E encrypted, so a passive observer sees only ciphertext even without transport encryption. Transport encryption (Tailscale WireGuard or HTTPS) provides defense-in-depth and stops an active attacker from tampering with the JavaScript delivered to the browser. |
| **Server operator curiosity** | Zero-knowledge architecture. The operator can read metadata such as document IDs and timestamps but not content. |
| **Legal compulsion** (subpoena, government request) | The server operator cannot produce plaintext because they do not have the keys. They can only produce encrypted blobs. |
| **Ciphertext tampering** | AES-GCM is an authenticated encryption scheme. Any modification to ciphertext or IV causes decryption to fail entirely rather than producing garbled output. |

### What We Do NOT Protect Against

| Threat | Status | Notes |
|--------|--------|-------|
| **Compromised client device** (malware, physical access) | Not protected | If an attacker has access to the browser, they can read decrypted content, keys from localStorage, and URLs from history. |
| **Share link leakage** | Partially mitigated | Anyone with the share URL has the key. Link expiry (1h-30d) limits the window. Key rotation (#136) is planned but not yet implemented. |
| **Browser extension attacks** | Not protected | Malicious extensions can read page content and localStorage. |
| **Supply chain attacks** (compromised npm dependency) | Not protected | A compromised dependency could exfiltrate keys or plaintext. Mitigation: minimal dependency count (20 production deps), dependency auditing. |
| **Side-channel attacks** (timing, memory analysis) | Not protected | Web Crypto API implementations may be vulnerable to side-channel attacks. This is a browser-level concern, not application-level. |
| **Traffic analysis** (metadata) | Partially mitigated | An observer can see that a user is accessing a document (by document ID), when they access it, and approximately how much they are editing (by message size/frequency). They cannot see what the content is. |

## Encryption Implementation

### Algorithm

- **Cipher**: AES-256-GCM (Galois/Counter Mode)
- **Key length**: 256 bits
- **IV length**: 96 bits (12 bytes), randomly generated per encryption
- **Authentication tag**: 128 bits (built into GCM)
- **Implementation**: Web Crypto API (`crypto.subtle`)

### Key Generation

```
crypto.subtle.generateKey(
  { name: 'AES-GCM', length: 256 },
  true, // extractable (needed for URL export)
  ['encrypt', 'decrypt']
)
```

`generateKey()` draws on the browser's cryptographically secure random number generator (the same source that backs `crypto.getRandomValues`), which is seeded from the operating system's CSPRNG.
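
As a rough sketch of how a generated key could be turned into the 43-character URL-safe string that ends up in the fragment, and read back again: the helper names (`toBase64Url`, `fromBase64Url`, `generateKeyForUrl`, `importKeyFromUrl`) are illustrative assumptions, not functions from the codebase. The `cryptoApi` line only exists so the snippet also runs under Node; in a browser it resolves to the global `crypto`.

```javascript
// In a browser this is the global `crypto`; the fallback lets the sketch run in Node.
const cryptoApi = globalThis.crypto ?? require('node:crypto').webcrypto;

// Standard base64 with "+" -> "-", "/" -> "_", and padding stripped.
function toBase64Url(bytes) {
  return btoa(String.fromCharCode(...bytes))
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, '');
}

function fromBase64Url(text) {
  const base64 = text.replace(/-/g, '+').replace(/_/g, '/');
  const padded = base64 + '='.repeat((4 - (base64.length % 4)) % 4);
  return Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
}

// Generate an extractable AES-256-GCM key and return it as a URL-safe string.
async function generateKeyForUrl() {
  const key = await cryptoApi.subtle.generateKey(
    { name: 'AES-GCM', length: 256 },
    true,
    ['encrypt', 'decrypt']
  );
  const raw = new Uint8Array(await cryptoApi.subtle.exportKey('raw', key));
  return toBase64Url(raw);
}

// Rebuild a CryptoKey from the string found in the URL fragment.
async function importKeyFromUrl(keyBase64) {
  return cryptoApi.subtle.importKey(
    'raw',
    fromBase64Url(keyBase64),
    { name: 'AES-GCM' },
    true,
    ['encrypt', 'decrypt']
  );
}
```

A 32-byte key encodes to 44 base64 characters including one `=` of padding, which is why the stripped form is 43 characters long.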
### Key Encoding

Keys are exported as raw bytes, then encoded as URL-safe base64:
- Standard base64 with `+` replaced by `-`, `/` replaced by `_`, padding stripped
- A 256-bit key produces a 43-character base64url string

### Ciphertext Format

Every encrypted payload has the format:

```
[IV: 12 bytes] [ciphertext + GCM auth tag: variable length]
```

The IV is prepended to the ciphertext in a single contiguous byte array. The GCM authentication tag is appended by the Web Crypto API as part of the ciphertext.

### What Gets Encrypted

| Data | Encryption | Storage |
|------|-----------|---------|
| **Document content** (Yjs state) | AES-256-GCM, key from URL fragment | Server: `documents.snapshot` BLOB column |
| **Document names** | AES-256-GCM, key from URL fragment, base64-encoded | Server: `documents.name_encrypted` TEXT column |
| **WebSocket sync messages** | AES-256-GCM, key from URL fragment | Transit only (not stored) |
| **Awareness messages** (cursors, presence) | AES-256-GCM, key from URL fragment | Transit only (not stored) |
| **Version history snapshots** | AES-256-GCM, key from URL fragment | Server: `versions.snapshot` BLOB column |

### What Is NOT Encrypted

| Data | Reason |
|------|--------|
| Document ID (UUID) | Needed for routing and database lookup |
| Document type (doc/sheet) | Needed for serving correct editor page |
| Share mode (edit/view) | Needed for server-side access control |
| Link expiry timestamp | Needed for server-side expiry enforcement |
| Created/updated timestamps | Needed for sort/display on landing page |
| WebSocket control messages (peer-count, peer-joined, peer-left) | Contain no sensitive data; only peer counts |

## Key Lifecycle

### Creation

1. User clicks "New Document" or "New Spreadsheet" on the landing page.
2. Browser calls `crypto.subtle.generateKey()` to create a fresh AES-256-GCM key.
3. Browser exports the key to URL-safe base64 and stores it in `localStorage` keyed by document ID.
4. Browser navigates to the editor URL with the key in the fragment: `/docs/{id}#{key}`.

### Sharing

1. User opens the Share dialog.
2. The share URL is constructed as `https://host/docs/{id}#{key}`.
3. The URL is copied to clipboard or sent to a collaborator.
4. When the collaborator opens the URL, their browser extracts the key from the fragment and imports it.
5. The collaborator's browser stores the key in their `localStorage` for future visits.

### Storage

- **Browser**: `localStorage` under the key `tools-keys` as a JSON object `{ documentId: keyBase64, ... }`.
- **URL**: Fragment portion of the URL (not sent to server).
- **Server**: Never. The server has no mechanism to receive, store, or retrieve encryption keys.

### Rotation (Planned, Not Yet Implemented)

Key rotation (#136) will work as follows:
1. Client generates a new AES-256-GCM key.
2. Client decrypts the current snapshot with the old key.
3. Client re-encrypts the snapshot with the new key.
4. Client PUTs the new encrypted snapshot to the server.
5. Client updates `localStorage` with the new key.
6. Old share URLs become invalid (decryption with the old key fails on the new ciphertext).
7. New share URLs are generated with the new key.

## Server Trust Model

The server is a **zero-knowledge relay**. It performs three functions:

1. **REST API**: CRUD operations on document metadata and encrypted blobs. The server reads/writes encrypted data without understanding it.
2. **WebSocket relay**: Forwards encrypted binary messages between peers in the same room (identified by document ID). Messages are opaque binary blobs to the server.
3. **Snapshot storage**: Stores encrypted Yjs document state for persistence and new-client bootstrap.

### Server Code Audit Points

The server code is in `server.js` (322 lines). Key audit points:

- **Lines 286-289**: WebSocket message relay. `peer.send(data, { binary: isBinary })` -- the server forwards received data verbatim. No parsing, no decryption, no modification.
- **Lines 130-133**: Snapshot PUT. `stmts.putSnapshot.run(req.body, req.params.id)` -- the server stores the request body as a raw BLOB. It does not inspect the content.
- **Lines 136-148**: Snapshot GET. The server reads the BLOB and sends it as `application/octet-stream`. It checks expiry (timestamp comparison) but does not read or modify the encrypted content.
- **Lines 100-107**: Document creation. The server receives an encrypted name and document type. It does not decrypt the name.

### What a Compromised Server Can Do

- **Read metadata**: document IDs, types, timestamps, share modes, expiry dates.
- **Count peers**: the `rooms` Map tracks how many WebSocket connections exist per document.
- **Deny service**: refuse to relay messages, delete encrypted snapshots, return errors.
- **Serve malicious JavaScript**: if the attacker controls the server, they could modify the frontend code to exfiltrate keys. This is the primary risk of a server compromise -- it does not break encryption retroactively, but it could capture keys going forward.

### What a Compromised Server Cannot Do

- **Decrypt any stored content**: no keys exist on the server.
- **Forge document content**: GCM authentication tag verification would fail on the client.
- **Read real-time edits**: WebSocket messages are encrypted.
- **Recover historical content**: version history snapshots are encrypted with the same key.

## IV (Initialization Vector) Handling

- Each `encrypt()` call generates a fresh random 96-bit IV via `crypto.getRandomValues()`.
- The IV is prepended to the ciphertext (not stored separately).
- GCM requires that an IV never be reused with the same key. With 96-bit random IVs and a single key per document, the birthday bound is approximately 2^48 (about 2.8 × 10^14) encryptions before a collision becomes likely; NIST SP 800-38D recommends a more conservative cap of 2^32 (about 4.3 billion) encryptions per key when IVs are random. Every snapshot save, sync message, and awareness message counts as one encryption, so a document would need billions of encrypted messages to approach even the conservative limit.
- If an IV were reused with the same key, GCM's security degrades: an attacker could learn the XOR of the two plaintexts, recover the authentication key, and forge messages. Random IV generation makes a repeat extremely unlikely within these limits.

## Known Limitations

1. **No forward secrecy.** If a key is compromised in the future, all historical ciphertext encrypted with that key is decryptable. Forward secrecy would require per-session key exchange (like TLS), which is complex to implement with E2EE document storage.

2. **Single key per document.** All collaborators share the same symmetric key. There is no per-user key or key hierarchy. Revoking one collaborator's access requires rotating the key for everyone.

3. **No key derivation from password.** Keys are random, not derived from user-chosen passwords. This provides strong entropy but means key loss = data loss. If a user loses their URL and clears localStorage, the document is unrecoverable.

4. **localStorage key storage.** Keys in localStorage are accessible to any JavaScript running on the same origin. An XSS vulnerability could exfiltrate all stored keys. Mitigation: strict Content-Security-Policy, no inline scripts in editor pages, no user-generated HTML execution.

5. **Server-served JavaScript.** The encryption code is served by the same server that stores encrypted data. A compromised server could serve modified JavaScript that exfiltrates keys. Mitigation: Subresource Integrity (SRI) hashes, reproducible builds, and client-side verification (#137) are planned countermeasures. Self-hosting eliminates this trust dependency.

## Future Improvements

- **Key rotation** (#136): Generate new key, re-encrypt document, invalidate old share URLs.
- **Verifiable encryption** (#137): Client-side panel to audit encryption in real-time.
- **Subresource Integrity**: Hash all JavaScript files and verify integrity on load.
- **Password-protected shares**: Layer a password-derived key on top of the document key for additional protection on share links.
- **Access audit log** (#124): Encrypted log of who accessed the document and when.
- **Self-hosted deployments** (#138): Docker image for users who want to eliminate server trust entirely.

## Responsible Disclosure

If you discover a security vulnerability in Tools, please report it privately:

- **Email**: security@lobster-hake.ts.net
- **Response time**: We aim to acknowledge reports within 48 hours and provide a fix within 7 days for critical issues.
- **Scope**: Vulnerabilities in the encryption implementation, key handling, server-side data exposure, XSS, or authentication bypass.
- **Out of scope**: Denial of service, social engineering, vulnerabilities in upstream dependencies (report those to the upstream project).
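
The IV collision estimate in the IV Handling section can be checked with a few lines of arithmetic. This is a rough sketch using the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n random b-bit IVs under one key; the function name `ivCollisionProbability` is illustrative and not part of the codebase.

```javascript
// Approximate probability that at least two of `encryptions` random IVs collide.
// Uses the birthday approximation; Math.expm1 keeps precision for tiny probabilities.
function ivCollisionProbability(encryptions, ivBits = 96) {
  const space = 2 ** ivBits;
  const exponent = (encryptions * (encryptions - 1)) / (2 * space);
  return -Math.expm1(-exponent);
}

// Around 2^48 encryptions a collision becomes likely (roughly 39%).
const atBirthdayBound = ivCollisionProbability(2 ** 48);

// Around 2^32 encryptions (about 4.3 billion) the probability is on the order of 1e-10.
const atFourBillion = ivCollisionProbability(2 ** 32);
```

Because every encrypted message under the same document key counts toward `encryptions`, the relevant number is total messages sent and saved, not just snapshot saves.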

+41

docs/adr/001-vanilla-js-no-framework.md

# ADR 001: Vanilla JS, No Frontend Framework

## Status

Accepted

## Context

When starting Tools, we needed to choose a frontend architecture for two distinct editors (rich text documents and spreadsheets) plus a landing page. The common choice in 2024+ is React, Vue, or Svelte. We needed to evaluate whether a framework would provide net benefit given our specific constraints:

1. The document editor uses TipTap (a ProseMirror wrapper) which manages its own DOM and state. ProseMirror already has a reactive document model, transaction system, and plugin architecture. Wrapping it in React would mean managing two competing reactive systems.

2. The spreadsheet editor is a `<table>` grid where rendering performance is critical. A virtual DOM diffing layer adds overhead to the hot path (cell updates during selection drag, formula recalculation). Direct DOM manipulation gives predictable, measurable performance.

3. The landing page is a straightforward document list with search, sort, folders, and trash. No complex component trees or deeply nested state.

4. We value minimal dependencies, small bundle size, and long-term stability (no framework version churn).

## Decision

Build the entire frontend in vanilla JavaScript with no framework. Use TipTap's extension API for the docs editor. Use direct DOM manipulation (innerHTML for bulk renders, targeted classList/textContent updates for incremental refreshes) for the sheets grid. Extract all business logic into pure, DOM-free modules for testability.

## Consequences

**Positive:**
- Zero framework overhead: no virtual DOM diffing, no reactivity system, no component lifecycle management.
- TipTap integration is native -- no React adapter (`@tiptap/react`) or Vue adapter needed. One less abstraction layer.
- Bundle size stays small. The frontend JS is dominated by TipTap/ProseMirror and Yjs, not framework runtime.
- Pure logic modules (`formulas.js`, `filter.js`, `suggesting.js`, etc.) are framework-agnostic. They could be reused in any framework later if needed.
- No framework migration risk. React 17->18->19, Vue 2->3, Svelte 4->5 transitions are non-issues.

**Negative:**
- No component model. UI elements like toolbar dropdowns, modals, and context menus are built with raw DOM creation. This leads to long `main.js` files with imperative event wiring.
- No built-in state management. Toolbar button active states, modal visibility, and sidebar toggles are tracked with ad-hoc variables. A framework would provide cleaner patterns here.
- No ecosystem. No off-the-shelf component library, no form validation library, no routing library. Everything is hand-rolled.
- Steeper onboarding for developers who expect a framework. The codebase requires understanding of raw DOM APIs rather than framework conventions.

**Mitigations:**
- The pure logic module pattern compensates for the lack of a component model. Business logic is testable and reusable; only the thin DOM wiring layer is imperative.
- TipTap's extension system provides a component-like abstraction for the docs editor (extensions with lifecycle hooks, commands, and keyboard shortcuts).
- The single-page-per-surface architecture (landing, docs, sheets are separate HTML entry points via Vite's multi-page mode) avoids the need for client-side routing.
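
To make the "pure logic module plus thin DOM wiring" split from this ADR concrete, here is a small sketch. The names `escapeHtml` and `buildRowsHtml` and the `data-row`/`data-col` attributes are hypothetical, chosen for the example rather than taken from the codebase; the DOM lines are shown as comments because they only run in a browser.

```javascript
// Pure, DOM-free logic: turns a 2D array of cell values into an HTML string.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function buildRowsHtml(rows) {
  return rows
    .map((cells, r) => {
      const tds = cells
        .map((value, c) => `<td data-row="${r}" data-col="${c}">${escapeHtml(value)}</td>`)
        .join('');
      return `<tr>${tds}</tr>`;
    })
    .join('');
}

// Thin DOM wiring (browser only), kept separate from the logic above:
//   tbody.innerHTML = buildRowsHtml(rows);            // bulk render
//   cellEl.textContent = newValue;                    // targeted incremental update
//   cellEl.classList.toggle('selected', isSelected);  // targeted incremental update
```

Because `buildRowsHtml` never touches the DOM, it can be unit-tested in Node with plain string assertions, which is the testability benefit the decision describes.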

+51

docs/adr/002-e2ee-key-in-fragment.md

# ADR 002: E2EE Key in URL Fragment

## Status

Accepted

## Context

Tools is end-to-end encrypted. Every document has a unique AES-256-GCM encryption key. We needed a mechanism to:

1. Allow the browser to access the key for encrypt/decrypt operations.
2. Share the key with collaborators via a simple URL.
3. Ensure the server never receives or stores the key.
4. Work without user accounts, passwords, or key exchange protocols.

Options considered:

- **Key in URL query parameter** (`?key=abc`): Query parameters ARE sent to the server in HTTP requests. Disqualified.
- **Key in URL fragment** (`#abc`): Fragments are NOT sent to the server per RFC 3986. The browser retains the fragment locally.
- **Key derived from password** (PBKDF2/scrypt): Requires a password-entry step, complicates sharing (must share password out-of-band), and adds key derivation latency.
- **Key exchange via WebSocket**: Would work but requires online peers. New collaborators arriving after the original sharer goes offline would be stuck.
- **Server-side key escrow**: Fundamentally incompatible with zero-knowledge architecture.

## Decision

Store the AES-256-GCM encryption key in the URL fragment (`#base64urlKey`). The key is generated client-side via `crypto.subtle.generateKey()`, exported as raw bytes via `crypto.subtle.exportKey('raw', key)`, encoded as URL-safe base64, and appended to the document URL as `https://tools.example.com/docs/{docId}#{keyBase64}`.

When a client loads the page, it reads `location.hash`, imports the key via `crypto.subtle.importKey()`, and uses it for all subsequent encrypt/decrypt operations.

The key is also stored in `localStorage` keyed by document ID, so returning users do not need the full URL after first visit.

## Consequences

**Positive:**
- The server never receives the key. The URL fragment is excluded from HTTP requests by specification (RFC 3986 Section 3.5). This is not a convention -- it is a protocol-level guarantee.
- Sharing is a single link. No password exchange, no invitation flow, no account creation. Copy the URL, send it to a collaborator, done.
- Zero-knowledge is verifiable. Anyone can open browser dev tools, inspect network traffic, and confirm the fragment is absent from requests.
- Works offline. The key is in localStorage; no server round-trip is needed to decrypt cached content.
- No key management server. No HSM, no KMS, no key database. The key lives in the URL and the browser.

**Negative:**
- **Anyone with the URL has full access.** There is no authentication layer. If a share link leaks, anyone who obtains it can decrypt the document. Mitigation: link expiry (1h/1d/7d/30d) and planned key rotation (#136).
- **Key is visible in browser history, bookmarks, and URL bar.** A person with physical access to the device can see the key. Mitigation: browsers do not send fragments in Referer headers, and private/incognito mode avoids history.
- **No access revocation without key rotation.** Once a key is shared, the recipient can save it. Revoking access requires generating a new key, re-encrypting the document, and distributing new URLs. This is planned (#136) but not yet implemented.
- **URL-safe base64 encoding.** Standard base64 uses `+` and `/` which are problematic in URLs. We use URL-safe base64 (`-` and `_`, no padding) to avoid encoding issues.
- **Fragment length limits.** A 256-bit key exports to 43 characters of base64, well within any browser's URL length limit (typically 2000+ characters). No practical concern.

**Security Properties:**
- The encryption key is a 256-bit random value generated by `crypto.subtle.generateKey()`, which draws on the browser's CSPRNG. It has 2^256 possible values -- brute force is computationally infeasible.
- AES-256-GCM provides authenticated encryption. If a single bit of ciphertext or the IV is corrupted (accidental or malicious), decryption fails entirely rather than producing garbled plaintext.
- Each encryption operation uses a fresh random 96-bit IV, prepended to the ciphertext. Reuse of (key, IV) pairs would compromise GCM's security, but with 96-bit random IVs the collision probability is negligible for any practical document lifecycle.
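
The security properties above (a fresh 96-bit IV prepended to each payload, and decryption failing outright on any tampering) can be sketched with the Web Crypto API as follows. The function names `encryptPayload` and `decryptPayload` are assumptions for illustration and may not match the project's own helpers; the `cryptoApi` line only exists so the sketch also runs under Node.

```javascript
// In a browser this is the global `crypto`; the fallback lets the sketch run in Node.
const cryptoApi = globalThis.crypto ?? require('node:crypto').webcrypto;
const IV_BYTES = 12; // 96-bit IV

// Returns [IV: 12 bytes][ciphertext + 16-byte GCM tag] as one Uint8Array.
async function encryptPayload(key, plaintextBytes) {
  const iv = cryptoApi.getRandomValues(new Uint8Array(IV_BYTES));
  const sealed = new Uint8Array(
    await cryptoApi.subtle.encrypt({ name: 'AES-GCM', iv }, key, plaintextBytes)
  );
  const payload = new Uint8Array(IV_BYTES + sealed.length);
  payload.set(iv, 0);
  payload.set(sealed, IV_BYTES);
  return payload;
}

// Splits the IV back off and decrypts; rejects if the IV, ciphertext, or tag was altered.
async function decryptPayload(key, payload) {
  const iv = payload.slice(0, IV_BYTES);
  const sealed = payload.slice(IV_BYTES);
  return new Uint8Array(
    await cryptoApi.subtle.decrypt({ name: 'AES-GCM', iv }, key, sealed)
  );
}
```

With the default 128-bit tag, a payload is always 28 bytes longer than its plaintext (12 bytes of IV plus 16 bytes of tag), which is one reason message sizes still leak approximate edit sizes to an observer.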

+55

docs/adr/003-yjs-crdt-collaboration.md

# ADR 003: Yjs CRDT for Collaboration

## Status

Accepted

## Context

Tools needs real-time collaborative editing where multiple users can edit the same document simultaneously with their changes merging correctly. We needed a collaboration engine that:

1. Works with an encrypted relay server (server cannot read or merge content).
2. Handles concurrent edits without a central authority resolving conflicts.
3. Integrates with TipTap/ProseMirror (docs) and a custom data structure (sheets).
4. Provides awareness (cursor positions, user presence).
5. Supports offline editing with eventual consistency.
6. Has efficient binary encoding for persistence and transmission.

Options considered:

- **Operational Transformation (OT)** via ShareDB, Firepad, or custom implementation. OT requires a central server to apply transforms in a canonical order. This is incompatible with our E2EE architecture -- the server cannot apply transforms to data it cannot read.
- **Yjs** (CRDT). Conflict-free Replicated Data Types merge automatically without a central authority. Each client applies remote updates locally. The server only relays opaque binary messages.
- **Automerge** (CRDT). Similar to Yjs but with a different internal representation. At the time of evaluation, Automerge had less mature ProseMirror integration and larger encoded state sizes.
- **Custom CRDT**. Building a CRDT from scratch for both rich text and spreadsheet data structures would be a multi-month project with subtle correctness risks.

## Decision

Use Yjs as the CRDT library for all collaborative data structures:

- **Documents**: Yjs XML Fragment via `@tiptap/extension-collaboration`, which maps ProseMirror's document model to Yjs types.
- **Spreadsheets**: Yjs Maps and Arrays. `ydoc.getMap('sheets')` contains per-sheet Y.Maps with `cells` (Y.Map of cell data), `colWidths`, `freezeRows`, `cfRules`, `validations`, `merges`, `notes`, etc.
- **Awareness**: Yjs Awareness protocol for cursor positions, user names, and colors.
- **Persistence**: `Y.encodeStateAsUpdate(doc)` produces a compact binary encoding that is encrypted and stored on the server.
- **Sync**: Custom `EncryptedProvider` (`src/lib/provider.js`) wraps Yjs sync protocol messages in AES-256-GCM encryption before WebSocket transmission.

## Consequences

**Positive:**
- No server-side conflict resolution. The server is a dumb relay. It forwards encrypted binary messages between peers without reading, modifying, or ordering them. This is exactly what E2EE requires.
- Automatic merge. Two users editing different parts of a document (or even the same paragraph) will see their changes merge correctly without manual conflict resolution.
- Offline support is inherent. Yjs documents can diverge arbitrarily while offline. When clients reconnect, they exchange state vectors and missing updates to converge. No special offline mode needed.
- Rich type system. Y.Map, Y.Array, Y.Text, and Y.XmlFragment cover both the ProseMirror document model and the spreadsheet grid data model.
- Undo/redo via `Y.UndoManager` -- integrates with the shared document so undo only affects local changes, not remote collaborators' changes.
- Binary encoding. Yjs updates are compact binary (varint-encoded). A typical edit is 10-50 bytes, much smaller than JSON-based sync protocols.
- Mature ecosystem. `y-prosemirror` (which TipTap's collaboration extension builds on) is battle-tested. The awareness protocol is well-specified.

**Negative:**
- CRDT overhead. Yjs maintains tombstones (deleted content markers) that grow over time. For long-lived documents with heavy editing, the document size can grow beyond the visible content size. Mitigation: periodically re-encode the full state with `Y.encodeStateAsUpdate()`, which merges the accumulated incremental updates into a single compact update, alongside Yjs's built-in garbage collection of deleted content.
- Last-writer-wins for Y.Map values. When two users simultaneously edit the same spreadsheet cell, the last write wins (no merge). This is acceptable for cell-level granularity -- a cell is an atomic unit.
- Learning curve. Yjs's shared types (Y.Map, Y.Array, Y.Doc) have different APIs from plain JavaScript objects. Developers must learn to use `.get()`, `.set()`, `.observe()`, and transactions (`ydoc.transact()`).
- No built-in encrypted provider. We had to build `EncryptedProvider` from scratch (`src/lib/provider.js`) to wrap the sync protocol in E2EE. The standard `y-websocket` provider sends plaintext. Our provider handles: encrypted sync step 1/2, encrypted updates, encrypted awareness, snapshot persistence, and reconnection.

**Trade-offs vs OT:**
- OT guarantees intent preservation for text operations (insert at position N means exactly position N after transforms). CRDTs guarantee eventual consistency but may reorder concurrent inserts differently. In practice, this is indistinguishable for users because both produce a valid merged document.
- OT can be more space-efficient (no tombstones). CRDTs are more network-efficient (no server round-trip for transform application).
- OT requires a central server. CRDTs do not. For E2EE, this is the deciding factor.
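
To illustrate why a relay that cannot read or order messages is still enough, here is a deliberately simplified last-writer-wins map. It is a stand-in written for this explanation, not Yjs and not how Yjs resolves conflicts internally (Yjs applies its own ordering rules); the class name `LwwMap` and the (clock, clientId) tie-break are assumptions of the sketch.

```javascript
// Simplified stand-in for a CRDT map; NOT Yjs's actual algorithm.
// Each write carries a logical clock and the writer's id; ties break on clientId.
function wins(a, b) {
  return a.clock > b.clock || (a.clock === b.clock && a.clientId > b.clientId);
}

class LwwMap {
  constructor(clientId) {
    this.clientId = clientId;
    this.clock = 0;
    this.entries = new Map();
  }

  // Local write: returns the update that would be encrypted and relayed to peers.
  set(key, value) {
    this.clock += 1;
    const update = { key, value, clock: this.clock, clientId: this.clientId };
    this.apply(update);
    return update;
  }

  // Remote (or local) update: safe to apply in any order, any number of times.
  apply(update) {
    this.clock = Math.max(this.clock, update.clock);
    const current = this.entries.get(update.key);
    if (!current || wins(update, current)) this.entries.set(update.key, update);
  }

  get(key) {
    return this.entries.get(key)?.value;
  }
}
```

Two clients that write the same cell concurrently and then exchange updates end up with the same value regardless of delivery order, because every client applies the same deterministic `wins` rule locally. That is the property that lets the server stay a blind relay.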

+77

docs/adr/004-sqlite-encrypted-storage.md

··· 1 + # ADR 004: SQLite with Encrypted Blob Storage 2 + 3 + ## Status 4 + 5 + Accepted 6 + 7 + ## Context 8 + 9 + The Tools server needs to persist: 10 + 11 + 1. **Document metadata**: ID, type (doc/sheet), encrypted name, share mode, expiry, timestamps. 12 + 2. **Document snapshots**: The full Yjs document state, encrypted as a binary blob. 13 + 3. **Version history**: Up to 50 encrypted snapshots per document with metadata. 14 + 15 + We needed a storage solution that: 16 + 17 + - Requires zero configuration (no database server process, no connection strings). 18 + - Handles binary blobs efficiently (encrypted snapshots can be 100KB-10MB). 19 + - Supports concurrent reads during writes (multiple clients may load snapshots while one saves). 20 + - Is trivially backable-up (copy a file). 21 + - Runs in a single Docker container alongside the Node.js server. 22 + 23 + Options considered: 24 + 25 + - **PostgreSQL**: Full RDBMS. Excellent for complex queries and large datasets, but requires a separate server process, connection management, and configuration. Overkill for our simple data model (two tables, no joins, no complex queries). 26 + - **SQLite**: Embedded, single-file, zero-configuration. WAL mode provides concurrent reads during writes. Native support for BLOB columns. The `better-sqlite3` npm package provides synchronous, high-performance bindings. 27 + - **Filesystem (flat files)**: Store each document as a file. Simple but lacks atomic metadata updates, efficient listing/querying, and concurrent access guarantees. 28 + - **LevelDB/RocksDB**: Key-value stores with good write performance. No schema, no SQL, harder to inspect and debug than SQLite. 29 + 30 + ## Decision 31 + 32 + Use SQLite with WAL (Write-Ahead Logging) mode via the `better-sqlite3` npm package. Store encrypted blobs directly in BLOB columns. 
The schema:

```sql
CREATE TABLE documents (
  id TEXT PRIMARY KEY,
  type TEXT NOT NULL CHECK(type IN ('doc','sheet')),
  name_encrypted TEXT,       -- AES-256-GCM ciphertext, base64-encoded
  snapshot BLOB,             -- Encrypted Yjs state, raw binary
  share_mode TEXT DEFAULT 'edit',
  expires_at TEXT,
  created_at TEXT DEFAULT (datetime('now')),
  updated_at TEXT DEFAULT (datetime('now'))
);

CREATE TABLE versions (
  id TEXT PRIMARY KEY,
  document_id TEXT NOT NULL,
  snapshot BLOB NOT NULL,    -- Encrypted Yjs state, raw binary
  created_at TEXT DEFAULT (datetime('now')),
  metadata TEXT              -- JSON: { author, wordCount } (encrypted by client)
);
```

The server uses prepared statements for all operations, with a FIFO pruning policy (max 50 versions per document, oldest deleted first).

## Consequences

**Positive:**
- Zero configuration. No database server to install, configure, or manage. The SQLite file is created automatically on first run.
- Single-file backup. `cp tools.db tools.db.bak` is a complete backup when the server is stopped or the WAL has been checkpointed; while the server is running, recent writes may still sit in `tools.db-wal`. No pg_dump, no mysqldump, no multi-file coordination.
- Docker-friendly. The database file lives in a volume-mounted `/data` directory. No sidecar container, no network dependency.
- WAL mode concurrency. Multiple clients can read snapshots simultaneously while another client writes. WAL mode is enabled with a single pragma: `db.pragma('journal_mode = WAL')`.
- Prepared statements. All queries are prepared once at startup via `db.prepare()` and take user input only as bound parameters. This avoids SQL injection and provides consistent performance.
- Synchronous API. `better-sqlite3` uses synchronous calls (not callbacks or promises). This simplifies the Express request handlers -- a request reads from the database and responds in a single synchronous flow. For our workload (small number of concurrent requests, fast queries), this tends to perform better than asynchronous database drivers.
- Schema migrations are handled inline. On startup, the server checks for missing columns (`share_mode`, `expires_at`) and adds them with `ALTER TABLE`. This handles upgrades gracefully.

**Negative:**
- Single-writer concurrency. SQLite allows only one writer at a time (even with WAL). If two clients simultaneously save snapshots, one blocks briefly. For our scale (small teams, saves debounced to every 500ms+), this is not a bottleneck.
- No horizontal scaling. A SQLite database file cannot be safely shared across machines, and multiple server processes on one machine would still contend for the single write lock. If Tools ever needs to scale beyond a single server, the storage layer would need to be replaced with PostgreSQL or a similar networked database. This is a deliberate trade-off: self-hosting simplicity today over theoretical future scale.
- BLOB size limits. SQLite's default maximum BLOB size is 1,000,000,000 bytes (about 1GB); the compile-time `SQLITE_MAX_LENGTH` setting can raise it to roughly 2GB. Encrypted Yjs snapshots are typically 1KB-10MB, so this is no practical concern, but extremely large documents (100MB+ of content) would start moving toward the limit.
- No full-text search. SQLite has FTS5 for full-text search, but our content is encrypted -- the server cannot index it. Full-text search must be done client-side (#55).

**Security Properties:**
- The server stores only encrypted data. The `snapshot` BLOB and `name_encrypted` TEXT columns contain AES-256-GCM ciphertext. Even with full database access (SQL injection, backup theft, disk forensics), the content is unrecoverable without the document-specific encryption key.
- The `metadata` column in `versions` is an encrypted JSON string (encrypted by the client before sending). The server cannot read version metadata.
- The only plaintext metadata the server stores: document ID, type (doc/sheet), share_mode, expires_at, and timestamps. These are necessary for the REST API to function but reveal no content.
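The security properties above rest on the client encrypting each snapshot before it reaches the `snapshot` BLOB column. The following is a minimal sketch of that idea using Node's built-in `crypto` module. The function names and the envelope layout (IV, then auth tag, then ciphertext) are assumptions made for illustration; the ADR does not specify the actual client format, and the real client may use a different crypto API.

```javascript
const crypto = require("node:crypto");

// Assumed envelope layout: 12-byte IV || 16-byte auth tag || ciphertext.
// This layout is hypothetical; the project's real format is not given here.
function encryptSnapshot(key, plaintext) {
  const iv = crypto.randomBytes(12); // fresh nonce for every encryption
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]);
}

// Reverse of encryptSnapshot; throws if the key is wrong or the blob was altered.
function decryptSnapshot(key, blob) {
  const iv = blob.subarray(0, 12);
  const tag = blob.subarray(12, 28);
  const ciphertext = blob.subarray(28);
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

// The document key stays on the client; only `stored` would be sent to the server.
const documentKey = crypto.randomBytes(32);
const snapshot = Buffer.from("stand-in for an encoded Yjs document state");
const stored = encryptSnapshot(documentKey, snapshot);
const restored = decryptSnapshot(documentKey, stored);
```

Because GCM is authenticated, a blob that has been tampered with in the database, or decrypted with the wrong key, fails in `decipher.final()` rather than yielding corrupted content.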
