Phase 1: Quick cleanup of impl.ml

- Fix duplicate `Buffer.clear` call (lines 251-252 cleared `code_buff` twice)
- Remove unused `complete` function (only a "Not implemented" stub)
- Remove unused `split_primitives` function (left over from the removed `compile_js`)

These are non-breaking changes that reduce dead code in impl.ml.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

+1020 -14
+52
.devcontainer/devcontainer.json
{
  "name": "Claude Code OCaml Sandbox",
  "image": "ghcr.io/avsm/claude-ocaml-devcontainer:main",
  "runArgs": [
    "--cap-add=NET_ADMIN",
    "--cap-add=NET_RAW"
  ],
  "customizations": {
    "vscode": {
      "extensions": [
        "anthropic.claude-code",
        "dbaeumer.vscode-eslint",
        "esbenp.prettier-vscode",
        "eamodio.gitlens",
        "ocamllabs.ocaml-platform"
      ],
      "settings": {
        "editor.formatOnSave": true,
        "editor.defaultFormatter": "esbenp.prettier-vscode",
        "editor.codeActionsOnSave": {
          "source.fixAll.eslint": "explicit"
        },
        "terminal.integrated.defaultProfile.linux": "zsh",
        "terminal.integrated.profiles.linux": {
          "bash": {
            "path": "bash",
            "icon": "terminal-bash"
          },
          "zsh": {
            "path": "zsh"
          }
        }
      }
    }
  },
  "remoteUser": "node",
  "mounts": [
    "source=claude-code-bashhistory-${devcontainerId},target=/commandhistory,type=volume",
    "source=${localEnv:HOME}/.claude,target=/home/node/.claude,type=bind",
    "source=${localEnv:HOME}/.ssh,target=/home/node/.ssh,type=bind,readonly",
    "source=${localEnv:HOME}/.gitconfig,target=/home/node/.gitconfig,type=bind,readonly"
  ],
  "containerEnv": {
    "NODE_OPTIONS": "--max-old-space-size=4096",
    "CLAUDE_CONFIG_DIR": "/home/node/.claude",
    "POWERLEVEL9K_DISABLE_GITSTATUS": "true"
  },
  "workspaceMount": "source=${localWorkspaceFolder},target=/workspace,type=bind,consistency=delegated",
  "workspaceFolder": "/workspace",
  "postCreateCommand": "sudo /usr/local/bin/init-firewall.sh",
  "waitFor": "postStartCommand"
}
+14 -1
CLAUDE.md
- `example/`: Example applications demonstrating worker usage
- `bin/`: Command-line tools, notably `jtw` for OPAM package handling

The system uses RPC (via `rpclib`) for communication between the client and worker, with support for both browser WebWorkers and Unix sockets for testing.

## Technical Q&A Log

When the user asks technical questions about the codebase, tools, or dependencies (especially js_of_ocaml, dune, findlib, etc.), Claude should:

1. **Answer the question** with technical accuracy
2. **Record the Q&A** in `docs/technical-qa.md` with:
   - The question asked
   - The answer provided
   - Verification steps taken (code inspection, testing, documentation lookup)
   - Date of the entry

This creates institutional knowledge that persists across sessions.
+339
docs/architecture.md
# js_top_worker Architecture

This document describes the current architecture of js_top_worker and the planned changes.

## Overview

js_top_worker is an OCaml toplevel (REPL) designed to run in a Web Worker or remote process. It enables interactive OCaml execution in browsers for:

- Jupyter-style notebooks
- Interactive documentation
- Educational tools (lecture slides, tutorials)
- Library documentation with live examples

## System Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                             Browser                             │
│  ┌──────────────────┐         ┌──────────────────────────────┐  │
│  │     Frontend     │         │          Web Worker          │  │
│  │                  │         │                              │  │
│  │  ┌────────────┐  │   RPC   │  ┌────────────────────────┐  │  │
│  │  │   Client   │◄─┼────────►│  │         Server         │  │  │
│  │  │ (Lwt/Fut)  │  │  JSON   │  │      (worker.ml)       │  │  │
│  │  └────────────┘  │         │  └──────────┬─────────────┘  │  │
│  │                  │         │             │                │  │
│  │                  │         │  ┌──────────▼─────────────┐  │  │
│  │                  │         │  │     Implementation     │  │  │
│  │                  │         │  │       (impl.ml)        │  │  │
│  │                  │         │  │  - Execute phrases     │  │  │
│  │                  │         │  │  - Type checking       │  │  │
│  │                  │         │  │  - Code completion     │  │  │
│  │                  │         │  └──────────┬─────────────┘  │  │
│  │                  │         │             │                │  │
│  │                  │         │  ┌──────────▼─────────────┐  │  │
│  │                  │         │  │  js_of_ocaml-toplevel  │  │  │
│  │                  │         │  │       + Merlin         │  │  │
│  │                  │         │  └────────────────────────┘  │  │
│  └──────────────────┘         └──────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```

## Package Structure

| Package | Purpose | Key Files |
|---------|---------|-----------|
| `js_top_worker` | Core toplevel implementation | `lib/impl.ml`, `lib/ocamltop.ml` |
| `js_top_worker-web` | Web Worker implementation | `lib/worker.ml`, `lib/findlibish.ml` |
| `js_top_worker-rpc` | RPC type definitions | `idl/toplevel_api.ml` |
| `js_top_worker-client` | Lwt-based client | `idl/js_top_worker_client.ml` |
| `js_top_worker-client_fut` | Fut-based client | `idl/js_top_worker_client_fut.ml` |
| `js_top_worker-unix` | Unix socket backend (testing) | - |
| `js_top_worker-bin` | CLI tools (`jtw`) | `bin/jtw.ml` |

## Current Communication Layer

### RPC Protocol

Uses [ocaml-rpc](https://github.com/mirage/ocaml-rpc) with JSON-RPC 2.0:

```
Client                          Server (Worker)
  │                                    │
  │ ──── JSON-RPC request ────────►    │
  │   {method: "exec",                 │
  │    params: ["let x = 1"],          │
  │    id: 1}                          │
  │                                    │
  │ ◄─── JSON-RPC response ────────    │
  │   {result: {...},                  │
  │    id: 1}                          │
  │                                    │
```

### RPC Operations

| Method | Parameters | Returns | Description |
|--------|------------|---------|-------------|
| `init` | `init_config` | `unit` | Initialize toplevel |
| `setup` | `unit` | `exec_result` | Start toplevel |
| `exec` | `string` | `exec_result` | Execute OCaml phrase |
| `typecheck` | `string` | `exec_result` | Type check without execution |
| `complete_prefix` | `id, deps, source, position` | `completions` | Autocomplete |
| `query_errors` | `id, deps, source` | `error list` | Get compilation errors |
| `type_enclosing` | `id, deps, source, position` | `typed_enclosings` | Type at position |

### Type Definitions

Key types from `idl/toplevel_api.ml`:

```ocaml
type exec_result = {
  stdout : string option;
  stderr : string option;
  sharp_ppf : string option;     (* # directive output *)
  caml_ppf : string option;      (* Regular output *)
  highlight : highlight option;  (* Error location *)
  mime_vals : mime_val list;     (* Rich output *)
}

type mime_val = {
  mime_type : string;   (* e.g., "text/html" *)
  encoding : encoding;  (* Noencoding | Base64 *)
  data : string;
}

type init_config = {
  findlib_requires : string list;  (* Packages to preload *)
  stdlib_dcs : string option;      (* Dynamic CMIs URL *)
  execute : bool;                  (* Allow execution? *)
}
```

## Core Implementation

### Module Structure (`lib/impl.ml`)

```ocaml
module type S = sig
  type findlib_t
  val capture : (unit -> 'a) -> unit -> captured * 'a
  val sync_get : string -> string option
  val async_get : string -> (string, [`Msg of string]) result Lwt.t
  val import_scripts : string list -> unit
  val get_stdlib_dcs : string -> dynamic_cmis list
  val findlib_init : string -> findlib_t Lwt.t
  val require : bool -> findlib_t -> string list -> dynamic_cmis list
  val path : string
end

module Make (S : S) : sig
  val init : init_config -> unit Lwt.t
  val setup : unit -> exec_result Lwt.t
  val exec : string -> exec_result Lwt.t
  val typecheck : string -> exec_result Lwt.t
  (* ... *)
end
```

### Execution Flow

```
exec(phrase)
    │
    ▼
capture stdout/stderr
    │
    ▼
parse phrase (Ocamltop.parse_toplevel)
    │
    ▼
execute (Toploop.execute_phrase)
    │
    ▼
collect MIME outputs
    │
    ▼
return exec_result
```

### Cell Dependency System

Cells can depend on previous cells via module wrapping:

```ocaml
(* Cell "c1" defines: *)
let x = 1

(* Internally becomes module Cell__c1 *)

(* Cell "c2" with deps=["c1"]: *)
let y = x + 1

(* Prepended with: open Cell__c1 *)
```

The `mangle_toplevel` function handles this transformation.
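At the level of source text, the rewrite described above can be sketched as follows. This is a minimal illustration that assumes each dependency simply becomes an `open Cell__<id>` line prepended to the cell's source; `mangle` and `cell_module` are hypothetical names, not the actual `mangle_toplevel` in `lib/impl.ml`. A real version would likely also need to shift positions reported back from Merlin by the number of added lines.

```ocaml
(* Hypothetical sketch of the cell-dependency rewrite; not the real
   [mangle_toplevel]. *)

(* Module name that a cell's definitions are assumed to live under. *)
let cell_module id = "Cell__" ^ id

(* Prepend one [open Cell__<dep>] line per dependency, in the order given,
   so later dependencies shadow earlier ones, as with ordinary [open]s. *)
let mangle ~deps source =
  let opens =
    List.map (fun dep -> Printf.sprintf "open %s\n" (cell_module dep)) deps
  in
  String.concat "" opens ^ source

let () = print_string (mangle ~deps:[ "c1" ] "let y = x + 1")
```

Working on plain strings keeps the idea visible. The cost is that every added line offsets error and completion locations in the mangled source.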

## Library Loading

### findlibish.ml

Custom findlib-like implementation for WebWorker context:

```
                ┌─────────────────┐
                │  findlib_index  │  (list of META URLs)
                └────────┬────────┘
                         │
          ┌──────────────┼──────────────┐
          ▼              ▼              ▼
     ┌─────────┐    ┌─────────┐    ┌─────────┐
     │  META   │    │  META   │    │  META   │
     │ (pkg A) │    │ (pkg B) │    │ (pkg C) │
     └────┬────┘    └────┬────┘    └────┬────┘
          │              │              │
          ▼              ▼              ▼
     ┌───────────────────────────────────────┐
     │         Dependency Resolution         │
     └───────────────────┬───────────────────┘
                         │
            ┌────────────┼────────────┐
            ▼            ▼            ▼
      ┌──────────┐ ┌──────────┐ ┌──────────┐
      │ .cma.js  │ │ .cma.js  │ │ .cma.js  │
      │ (import) │ │ (import) │ │ (import) │
      └──────────┘ └──────────┘ └──────────┘
```

### Package Loading Process

1. Fetch `findlib_index` (list of META file URLs)
2. Parse each META file with `Fl_metascanner`
3. Build dependency graph
4. On `#require`:
   - Resolve dependencies
   - Fetch `dynamic_cmis.json` for each package
   - Load `.cma.js` via `import_scripts`

### Preloaded Packages

These are compiled into the worker and not loaded dynamically:

- `compiler-libs.common`, `compiler-libs.toplevel`
- `merlin-lib.*`
- `js_of_ocaml-compiler`, `js_of_ocaml-toplevel`
- `findlib`, `findlib.top`

## Merlin Integration

Code intelligence features use Merlin:

| Feature | Merlin Query | Implementation |
|---------|--------------|----------------|
| Completion | `Query_protocol.Complete_prefix` | `complete_prefix` |
| Type info | `Query_protocol.Type_enclosing` | `type_enclosing` |
| Errors | `Query_protocol.Errors` | `query_errors` |

Queries run through `Mpipeline` with source "mangled" to include cell dependencies.

## Planned Architecture Changes

### Phase 1: Communication Redesign

Replace JSON-RPC with CBOR-based bidirectional channel:

```
Current:                          Planned:
┌─────────┐    JSON-RPC           ┌─────────┐      CBOR
│ Client  │◄──────────────►       │ Client  │◄──────────────►
│         │ request/response      │         │ bidirectional
└─────────┘                       └─────────┘

Message types:
- Request/Response (like RPC)
- Push (server → client)
- Widget events (bidirectional)
```

### Phase 2: Environment Isolation

Multiple isolated execution contexts:

```
┌──────────────────────────────────────────┐
│                Web Worker                │
│                                          │
│   ┌─────────────┐   ┌─────────────┐      │
│   │  Env "a"    │   │  Env "b"    │      │
│   │             │   │             │      │
│   │  Cell 1     │   │  Cell 1     │      │
│   │  Cell 2     │   │  Cell 2     │      │
│   │  (isolated) │   │  (isolated) │      │
│   └─────────────┘   └─────────────┘      │
│                                          │
│   Shared: stdlib, preloaded packages     │
└──────────────────────────────────────────┘
```

### Phase 3: Rich Output & Widgets

MIME-typed output with bidirectional widget communication:

```ocaml
(* User code *)
let chart = Chart.bar [1; 2; 3; 4] in
Display.show chart

(* Generates *)
{
  mime_type = "application/vnd.widget+json";
  data = {widget_id = "w1"; state = ...}
}

(* Frontend renders widget, sends events back *)
Widget_event {widget_id = "w1"; event = Click {x; y}}
```

## File Reference

### Core Files

| File | Lines | Purpose |
|------|-------|---------|
| `lib/impl.ml` | 985 | Main implementation (execute, typecheck, etc.) |
| `lib/worker.ml` | 100 | WebWorker server setup |
| `lib/findlibish.ml` | 221 | Package loading |
| `idl/toplevel_api.ml` | 315 | RPC type definitions |
| `idl/js_top_worker_client.ml` | 126 | Lwt client |

### Build Outputs

| File | Description |
|------|-------------|
| `worker.bc.js` | Compiled Web Worker |
| `*.cma.js` | JavaScript-compiled OCaml libraries |
| `dynamic_cmis.json` | CMI metadata for each package |

## Dependencies

### Runtime

- `js_of_ocaml` >= 3.11.0
- `js_of_ocaml-toplevel`
- `js_of_ocaml-compiler`
- `rpclib`, `rpclib-lwt`
- `merlin-lib`
- `compiler-libs`
- `brr` >= 0.0.4

### Planned Additions

- `cbort` - CBOR codec (tangled.org)
- `zarith_stubs_js` - JS stubs for zarith
- `bytesrw` - Streaming I/O

---

*Last updated: 2026-01-20*
+568
docs/investigation-report.md
# js_top_worker Investigation Report

This document captures research findings for the communication layer redesign.

## Phase 0.1: Wire Format Research

### Goal

Find a suitable serialization format for bidirectional typed messaging between frontend (browser) and backend (WebWorker/remote).

### Requirements

- Binary format preferred (compact, fast)
- Type-safe OCaml codec (define once, use for both encode/decode)
- js_of_ocaml compatible
- Support for structured data (records, variants, arrays, maps)

### Options Evaluated

| Library | Format | js_of_ocaml | Notes |
|---------|--------|-------------|-------|
| ocaml-rpc (current) | JSON-RPC | Yes | Request-response only, no push |
| jsont | JSON | Yes (via brr) | Type-safe combinators, JSON only |
| msgpck | MessagePack | Likely (pure OCaml) | Less active |
| cbor | CBOR | Likely (pure OCaml) | Basic API |
| **cbort** | CBOR | Yes (via zarith_stubs_js) | Type-safe combinators, RFC 8949 |

### Recommendation: cbort

The [cbort](https://tangled.org/@anil.recoil.org/ocaml-cbort.git) library by Anil Madhavapeddy is the best choice:

1. **Type-safe combinators** following the jsont pattern - define codecs once, use bidirectionally
2. **CBOR format** (RFC 8949) - compact binary, smaller than JSON, widely supported
3. **js_of_ocaml compatible** via zarith_stubs_js for arbitrary-precision integers
4. **Built on bytesrw** for efficient streaming I/O
5. **Path-aware error messages** for debugging decode failures

#### Example Codec Definition

```ocaml
open Cbort

type person = { name : string; age : int }

let person_codec =
  let open Obj in
  let* name = mem "name" (fun p -> p.name) string in
  let* age = mem "age" (fun p -> p.age) int in
  return { name; age }
  |> finish

(* Encode to CBOR bytes *)
let encoded = encode_string person_codec { name = "Alice"; age = 30 }

(* Decode from CBOR bytes *)
let decoded = decode_string person_codec encoded
```

#### Dependencies

- `bytesrw >= 0.2` - Pure OCaml streaming I/O
- `zarith >= 1.12` - Arbitrary precision integers (uses zarith_stubs_js for JS)
- `crowbar` - Fuzz testing (dev only)

#### Installation

Currently available from tangled.org:
```
git clone https://tangled.org/@anil.recoil.org/ocaml-cbort.git
```

Will need pin-depends in dune-project until published to opam.

### Jupyter Protocol Reference

For comparison, Jupyter uses:
- **JSON** for message content
- **ZeroMQ** for transport (multipart messages)
- **MIME types** for rich output (text/plain, text/html, image/png, etc.)

Key Jupyter message types:
- `execute_request` / `execute_reply` - Code execution
- `stream` - stdout/stderr output
- `display_data` - MIME-typed rich output
- `comm_open` / `comm_msg` - Bidirectional widget communication

Our design will follow similar patterns but use CBOR instead of JSON.

---

## Phase 0.2: Findlib Investigation

### Goal

Understand what real `findlib.top` does and whether to integrate it or improve `findlibish`.

### Current Implementation: findlibish

The project has a custom `findlibish.ml` (221 lines) that:

1. Parses META files using `Fl_metascanner`
2. Builds a library dependency graph
3. Resolves `#require` requests
4. Loads `.cma.js` archives via `import_scripts`
5. Fetches `dynamic_cmis.json` for type information

Key differences from real findlib:
- No `topfind` file mechanism
- No `#list`, `#camlp4o`, etc. directives
- Hardcoded list of "preloaded" packages (compiler-libs, merlin, etc.)
- URL-based fetching instead of filesystem access

### Real Findlib Behavior (from source analysis)

Studied [ocamlfind source](https://github.com/ocaml/ocamlfind) - specifically `src/findlib/topfind.ml.in`.

#### Directive Registration

Findlib registers directives by adding to `Toploop.directive_table`:

```ocaml
Hashtbl.add
  Toploop.directive_table
  "require"
  (Toploop.Directive_string
     (fun s -> protect load_deeply (Fl_split.in_words s)))
```

#### Package Loading (`load` function)

The `load` function performs these steps:
1. Get package directory via `Findlib.package_directory pkg`
2. Add directory to search path via `Topdirs.dir_directory d`
3. Get `archive` property from META file
4. Load archives via `Topdirs.dir_load Format.std_formatter archive`
5. Handle PPX properties (if defined)
6. Record package as loaded via `Findlib.record_package`

#### Deep Loading (`load_deeply` function)

```ocaml
let load_deeply pkglist =
  (* Get the sorted list of ancestors *)
  let eff_pkglist =
    Findlib.package_deep_ancestors !predicates pkglist in
  (* Check for error properties *)
  List.iter (fun pkg ->
    try let error = Findlib.package_property !predicates pkg "error" in
      failwith ("Error from package `" ^ pkg ^ "': " ^ error)
    with Not_found -> ()) eff_pkglist ;
  (* Load the packages in turn: *)
  load eff_pkglist
```

#### Key Mechanisms

| Findlib | findlibish | Notes |
|---------|------------|-------|
| `Topdirs.dir_load` | `import_scripts` | Native .cma vs .cma.js |
| `Topdirs.dir_directory` | N/A | Search path management |
| `Findlib.package_directory` | URL-based | Filesystem vs HTTP |
| Predicate system | Hardcoded | `["byte"; "toploop"]` etc. |
| `Findlib.record_package` | `loaded` mutable field | Track loaded packages |

### Recommendation

**Keep findlibish but improve it**. The architectures are fundamentally different:

1. **Findlib**: Native bytecode loading, filesystem access, Toploop integration
2. **findlibish**: JavaScript module loading, URL fetching, WebWorker context

Key improvements to make:
1. Add `.mli` file documenting the API
2. Support `#list` directive for discoverability
3. Better error messages when packages not found
4. Add test to verify `preloaded` list matches build (see below)
5. Add predicate support for conditional archives

#### Preloaded List Synchronization

The `preloaded` list in `findlibish.ml` must match packages linked into the
worker via dune. Currently this is manually maintained and can drift.

**Solution**: Add a test that verifies consistency:
- Query actually-linked packages (via `Findlib.recorded_packages()` or similar)
- Compare against `preloaded` list
- Fail with clear message if they differ

This catches drift without adding build-time complexity. The current list also
has duplicates (`js_of_ocaml-ppx`, `findlib`) that should be cleaned up.

---

## Phase 0.3: Environment Model Research

### Goal

Understand how to support multiple isolated execution environments (like mdx `x-ocaml` blocks).

### Current State

The project already has cell ID support:
- `opt_id` parameter on API calls
- `Cell__<id>` modules for cell outputs
- `failed_cells` tracking for dependency management
- `mangle_toplevel` adds `open Cell__<dep>` for dependencies

### MDX Implementation (from source analysis)

Studied [mdx source](https://github.com/realworldocaml/mdx) - specifically `lib/top/mdx_top.ml`.

MDX implements environment isolation by capturing and restoring Toploop state:

```ocaml
(* Environment storage: name -> (type_env, binding_names, runtime_values) *)
let envs = Hashtbl.create 8

(* Extract user-defined bindings from environment summary *)
let env_deps env =
  let names = save_summary [] (Env.summary env) in
  let objs = List.map Toploop.getvalue names in
  (env, names, objs)

(* Restore environment state *)
let load_env env names objs =
  Toploop.toplevel_env := env;
  List.iter2 Toploop.setvalue names objs

(* Execute code in a named environment *)
let in_env e f =
  let env_name = Mdx.Ocaml_env.name e in
  let env, names, objs =
    try Hashtbl.find envs env_name
    with Not_found -> env_deps !default_env
  in
  load_env env names objs;
  let res = f () in
  (* Save updated state *)
  Hashtbl.replace envs env_name (env_deps !Toploop.toplevel_env);
  res
```

#### Key Toploop State Components

| Component | Access Method | Description |
|-----------|---------------|-------------|
| Type environment | `Toploop.toplevel_env` | Type bindings, modules |
| Runtime values | `Toploop.getvalue`/`setvalue` | Actual OCaml values |
| Environment summary | `Env.summary` | List of binding operations |

#### MDX's Strategy

1. **Shared base**: All environments start from `default_env` (initial Toploop state)
2. **Capture on exit**: After execution, save `(env, names, objs)` tuple
3. **Restore on entry**: Before execution, restore the saved state
4. **Hashtable storage**: Environments keyed by string name

### Implications for js_top_worker

The MDX approach works because it runs in a native OCaml process with mutable global state. For WebWorker:

1. **Same approach possible**: We have Toploop in js_of_ocaml-toplevel
2. **Memory concern**: Each environment stores captured values - could grow large
3. **No true fork**: Can't fork WebWorker, must use save/restore pattern
4. **Cell IDs vs Environments**: Current cell system is different - cells can depend on each other, environments are isolated

### x-ocaml Implementation (better than mdx)

Studied [x-ocaml](https://github.com/art-w/x-ocaml) by @art-w - cleaner approach.

#### Value Capture with Env.diff

```ocaml
module Value_env = struct
  type t = Obj.t String_map.t

  let capture t idents =
    List.fold_left (fun t ident ->
      let name = Translmod.toplevel_name ident in
      let v = Topeval.getvalue name in
      String_map.add name v t
    ) t idents

  let restore t =
    String_map.iter (fun name v -> Topeval.setvalue name v) t
end
```

Key insight: Uses `Env.diff previous_env current_env` to get only NEW bindings,
rather than walking the full environment summary like mdx does.
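The difference between the two strategies can be shown with a toy model in plain OCaml, with no compiler-libs involved. The toplevel's bindings are modelled as a string map from names to `int` values (rather than `Obj.t`), `diff` stands in for `Env.diff`, and `capture` for `Value_env.capture`; all names are illustrative assumptions. Because the model is keyed by plain names, it ignores shadowing (a redefined `x` is not reported as new), which a diff over real identifiers would need to handle.

```ocaml
module String_map = Map.Make (String)

(* Stand-in for [Env.diff previous current]: names bound in [current]
   that were not bound in [previous], in increasing name order. *)
let diff ~previous ~current =
  String_map.fold
    (fun name _ acc ->
      if String_map.mem name previous then acc else name :: acc)
    current []
  |> List.rev

(* Stand-in for [Value_env.capture]: extend the previous snapshot with
   the values of the new bindings only. *)
let capture snapshot current names =
  List.fold_left
    (fun snap name -> String_map.add name (String_map.find name current) snap)
    snapshot names

(* One cell ran and added [y] and [z] on top of an existing [x]. *)
let previous = String_map.(empty |> add "x" 1)
let current = String_map.(previous |> add "y" 2 |> add "z" 3)
let fresh = diff ~previous ~current
let snapshot = capture previous current fresh

let () =
  Printf.printf "new bindings: %s; snapshot size: %d\n"
    (String.concat ", " fresh)
    (String_map.cardinal snapshot)
```

Only the two new names are looked up and copied. A summary walk in the mdx style would revisit `x` on every capture.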

#### Stack-based Environment Management

```ocaml
module Environment = struct
  let environments = ref [] (* stack of (id, typing_env, value_env) *)

  let reset id =
    (* Walk stack until we find id, restore that state *)
    environments := go id !environments

  let capture id =
    let idents = Env.diff previous_env !Toploop.toplevel_env in
    let values = Value_env.capture previous_values idents in
    environments := (id, !Toploop.toplevel_env, values) :: !environments
end
```

Benefits:
- Can backtrack to any previous checkpoint
- Only captures incremental changes (memory efficient)
- Simple integer IDs

#### PPX Integration

```ocaml
(* Capture all registered PPX rewriters *)
let ppx_rewriters = ref []

let () =
  Ast_mapper.register_function :=
    fun _ f -> ppx_rewriters := f :: !ppx_rewriters

(* Apply during phrase preprocessing *)
let preprocess_phrase phrase =
  match phrase with
  | Ptop_def str -> Ptop_def (preprocess_structure str)
  | Ptop_dir _ as x -> x
```

ppxlib bridge (`ppxlib_register.ml`):
```ocaml
let () = Ast_mapper.register "ppxlib" mapper
```

### Recommended Design

Adopt x-ocaml's core patterns, adapted for js_top_worker's purpose as a
reusable backend library:

**From x-ocaml (adopt directly)**:
1. **Incremental capture** via `Env.diff` - replaces current cell wrapping
2. **PPX via `Ast_mapper.register_function`** override
3. **ppxlib bridge** for modern PPX ecosystem

**Adapted for js_top_worker**:
1. **Named environments** instead of pure stack (multiple notebooks can coexist)
2. **MIME output API** generalizing x-ocaml's `output_html`
3. **cbort protocol** instead of Marshal (type-safe, browser-friendly)

**API sketch**:
```ocaml
type env_id = string

(* Environment management *)
val create_env : ?base:env_id -> env_id -> unit
val checkpoint : env_id -> unit  (* capture current state *)
val reset : env_id -> unit       (* restore to last checkpoint *)
val destroy_env : env_id -> unit

(* Execution *)
val exec : env:env_id -> string -> exec_result

(* MIME output (callable from user code) *)
val display : ?mime_type:string -> string -> unit
```

This gives us x-ocaml's simplicity while supporting:
- Multiple concurrent environments (different notebooks)
- Checkpoint/reset within an environment (cell re-execution)
- Rich output beyond just HTML

---

## Phase 0.4: Existing Art Review

### Projects Analyzed

| Project | URL | Architecture |
|---------|-----|--------------|
| ocaml-jupyter | https://github.com/akabe/ocaml-jupyter | Native OCaml + ZeroMQ |
| js_of_ocaml toplevel | https://ocsigen.org/js_of_ocaml | Browser + js_of_ocaml |
| sketch.sh | https://github.com/Sketch-sh/sketch-sh | Browser + WebWorker |
| utop | https://github.com/ocaml-community/utop | Native OCaml + terminal |

### ocaml-jupyter

**Architecture**: Native OCaml kernel communicating via ZeroMQ (Jupyter protocol v5.2).

**Key components**:
- `jupyter` - Core protocol implementation
- `jupyter.notebook` - Rich output API (HTML, markdown, images, LaTeX)
- `jupyter.comm` - Bidirectional widget communication

**Rich output**: Programmatic generation via `jupyter.notebook` library:
```ocaml
(* Example from jupyter.notebook *)
Jupyter_notebook.display "text/html" "<b>Hello</b>"
```

**Code completion**: Merlin integration, reads `.merlin` files.
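The `display` entry in the js_top_worker API sketch (`?mime_type:string -> string -> unit`) could follow the same idea as `Jupyter_notebook.display`: calls made while a cell runs are appended to a buffer, and that buffer is returned with the cell's result. The following is a minimal sketch, not the planned implementation. It assumes a simplified `mime_val` without the `encoding` field and `"text/plain"` as the default MIME type; `pending` and `with_outputs` are hypothetical helpers.

```ocaml
(* Simplified stand-in for [mime_val]; the real type also has [encoding]. *)
type mime_val = { mime_type : string; data : string }

(* Outputs collected during the current execution, newest first. *)
let pending : mime_val list ref = ref []

(* Callable from user code; the "text/plain" default is an assumption. *)
let display ?(mime_type = "text/plain") data =
  pending := { mime_type; data } :: !pending

(* Run one cell's code and return its result with its outputs in call order. *)
let with_outputs (run : unit -> 'a) : 'a * mime_val list =
  pending := [];
  let result = run () in
  let outputs = List.rev !pending in
  pending := [];
  (result, outputs)

let (), outputs =
  with_outputs (fun () ->
      display ~mime_type:"text/html" "<b>Hello</b>";
      display "done")

let () =
  List.iter (fun m -> Printf.printf "%s: %s\n" m.mime_type m.data) outputs
```

`with_outputs` clears the buffer when it starts. As a result, outputs left behind by a cell that raised are not attributed to the next cell.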

**Takeaway**: Good reference for MIME output API and comm protocol design.

### js_of_ocaml Toplevel

**Architecture**: OCaml bytecode compiled to JavaScript, runs in browser.

**Build flags**:
```bash
js_of_ocaml --toplevel --linkall +weak.js +toplevel.js +dynlink.js
```

**Library loading**: Two approaches:
1. Compile libraries into toplevel directly
2. Load dynamically via `--extern-fs` pseudo-filesystem

**Takeaway**: Foundation of our project. We already use js_of_ocaml-toplevel.

### Sketch.sh

**Architecture**: Browser-based notebook using js_of_ocaml toplevel in WebWorker.

**Key insight**: "rtop-evaluator loads refmt & js_of_ocaml compiler as a web worker"

**Features**:
- Multiple OCaml versions (4.06.1, 4.13.1, 5.3.0)
- Reason syntax support via refmt
- Notebook-style cells with inline evaluation
- OCaml 5 effects support (continuation-based in JS)

**Limitations**:
- No BuckleScript modules (Js module)
- Belt library support added later

**Takeaway**: Similar architecture to js_top_worker. Good reference for multi-version support.

### utop

**Architecture**: Enhanced native OCaml toplevel with:
- Line editing (lambda-term)
- History
- Context-sensitive completion
- Colors

**Features relevant to us**:
- `UTop.set_create_implicits` - Auto-generate module interfaces
- Merlin integration for completion
- PPX rewriter support

**Takeaway**: Reference for toplevel UX features (completion, error formatting).
452 + 453 + ### Comparison Summary 454 + 455 + | Feature | ocaml-jupyter | sketch.sh | js_top_worker | 456 + |---------|---------------|-----------|---------------| 457 + | Runtime | Native | Browser/Worker | Browser/Worker | 458 + | Protocol | Jupyter/ZMQ | Custom | RPC (current) | 459 + | Rich output | MIME via API | Limited | MIME (planned) | 460 + | Widgets | jupyter.comm | No | Planned | 461 + | Multi-env | No | No | Planned | 462 + | Completion | Merlin | Basic | Merlin | 463 + 464 + ### Key Lessons 465 + 466 + 1. **MIME output**: jupyter.notebook provides good API pattern 467 + 2. **Widget comm**: jupyter.comm shows bidirectional messaging 468 + 3. **WebWorker**: sketch.sh validates our architecture choice 469 + 4. **Environment isolation**: None of these support it - opportunity for differentiation 470 + 471 + --- 472 + 473 + ## Open Questions 474 + 475 + 1. **Widget state persistence**: How long should widget state live? Per-session? Per-environment? 476 + 477 + 2. **Streaming output**: Should stdout/stderr be pushed incrementally or batched? 478 + 479 + 3. **PPX scope**: When a PPX is installed, should it apply to: 480 + - All environments? 481 + - Just the current environment? 482 + - Configurable? 483 + 484 + 4. **Error recovery**: If a cell fails, how do dependent cells behave? 
   - Current: tracked in `failed_cells` set
   - Desired: TBD

---

## Summary of Findings

### Wire Format Decision: cbort

Use [cbort](https://tangled.org/@anil.recoil.org/ocaml-cbort.git) for CBOR-based typed messaging:
- Type-safe combinators (jsont-style)
- Binary format (compact, fast)
- js_of_ocaml-compatible via zarith_stubs_js

### Findlib Decision: Keep findlibish

The current `findlibish.ml` is appropriate for the WebWorker context:
- URL-based package loading (not filesystem)
- JavaScript module loading via `import_scripts`
- Add an `.mli` file and improve error handling
- Add a test to verify that the preloaded list matches the build

### Environment Model Decision: x-ocaml-style capture/restore

Adopt [x-ocaml](https://github.com/art-w/x-ocaml)'s approach:
- **`Env.diff`** for incremental capture (only new bindings)
- **`Topeval.getvalue`/`setvalue`** for runtime values
- **Named environments** (adapting x-ocaml's integer stack)
- **PPX via `Ast_mapper.register_function`** override

This replaces the current cell module wrapping approach with something simpler
and more powerful (supports checkpoint/reset, not just forward execution).

### Key Differentiators

Features that set js_top_worker apart:
1. **Multiple named environments**: Not supported by competitors
2. **CBOR wire format**: More efficient than JSON/Marshal
3. **Bidirectional widgets**: Like Jupyter, but in the browser
4. **PPX support**: Via x-ocaml's pattern + ppxlib bridge
5. **Reusable backend**: Library for others to build on

---

## Next Steps

### Immediate (Phase 1)

1. **Add cbort dependency**: Pin-depends in dune-project
2. **Define message types**: Simple ADT like x-ocaml, encoded with cbort
   ```ocaml
   type request =
     | Setup
     | Eval of { env : string; code : string }
     | Merlin of { env : string; action : Merlin_protocol.action }
     | Checkpoint of { env : string }
     | Reset of { env : string }

   type response =
     | Setup_complete
     | Output of { env : string; loc : int; data : output list }
     | Eval_complete of { env : string; result : exec_result }
     | Merlin_response of Merlin_protocol.answer
   ```
3. **Replace RPC with simple message handling**: Like x-ocaml's pattern match
4. ~~**Remove compile_js**: Delete unused method~~ ✓ Done

### Short-term (Phase 2)

5. **Environment isolation**: x-ocaml's `Env.diff` + `Topeval.getvalue/setvalue`
6. **PPX support**: `Ast_mapper.register_function` override + ppxlib bridge
7. **Add .mli files**: `impl.mli`, `findlibish.mli`
8. **CI setup**: GitHub Actions for OCaml 5.2+
9. **Preloaded list test**: Verify sync with build

### Medium-term (Phase 3)

10. **MIME output API**: Generalize x-ocaml's `output_html` pattern
11. **Widget protocol**: Bidirectional comm for interactive widgets
12. **OCamlformat integration**: Auto-format like x-ocaml

---

*Last updated: 2026-01-20*
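To make the Phase 1 message handling and the named-environment checkpoint/reset model concrete, here is a minimal, self-contained OCaml sketch. It is a toy model under stated assumptions, not the planned implementation. Evaluation and Merlin are left out. Each environment only records the code strings it has evaluated, standing in for the `Env.diff` and `Topeval.getvalue`/`setvalue` snapshot the design describes. The response constructors (`Eval_complete { history }`, `Checkpointed`, `Reset_done`, `Failed`) and the helpers `env_state`, `lookup` and `handle` are hypothetical. The sketch also assumes that `Setup` clears every environment and that `Reset` rolls back to the most recent checkpoint without popping it.

```ocaml
(* Toy model of message dispatch over named environments (hypothetical names).
   A string list of evaluated phrases stands in for a real environment snapshot. *)
module SMap = Map.Make (String)

type request =
  | Setup
  | Eval of { env : string; code : string }
  | Checkpoint of { env : string }
  | Reset of { env : string }

type response =
  | Setup_complete
  | Eval_complete of { env : string; history : string list }
  | Checkpointed of { env : string; depth : int }
  | Reset_done of { env : string; history : string list }
  | Failed of string

(* [current] is newest-first; [saved] is a stack of checkpoints. *)
type env_state = { current : string list; saved : string list list }

let empty_env = { current = []; saved = [] }

(* Unknown environment names start out empty. *)
let lookup env envs = Option.value (SMap.find_opt env envs) ~default:empty_env

let handle envs = function
  | Setup -> (SMap.empty, Setup_complete)
  | Eval { env; code } ->
      let st = lookup env envs in
      let st = { st with current = code :: st.current } in
      (SMap.add env st envs, Eval_complete { env; history = List.rev st.current })
  | Checkpoint { env } ->
      let st = lookup env envs in
      let st = { st with saved = st.current :: st.saved } in
      (SMap.add env st envs, Checkpointed { env; depth = List.length st.saved })
  | Reset { env } -> (
      let st = lookup env envs in
      match st.saved with
      | [] -> (envs, Failed ("no checkpoint for " ^ env))
      | snapshot :: _ ->
          (* Roll back to the latest checkpoint; it stays on the stack. *)
          ( SMap.add env { st with current = snapshot } envs,
            Reset_done { env; history = List.rev snapshot } ))
```

Because `SMap` is a persistent map, each checkpoint shares structure with the previous state, so isolating environment `"a"` from environment `"b"` costs nothing extra. A real backend would store the typing-environment diff and captured values in place of the string list.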
+47
docs/technical-qa.md
# Technical Q&A Log

This file records technical questions and answers about the codebase, along with the verification steps taken to ensure accuracy.

---

## 2026-01-20: What does `--include-runtime` do in js_of_ocaml?

**Question**: What does the `--include-runtime` argument actually do when compiling with js_of_ocaml?

**Answer**: The `--include-runtime` flag embeds library-specific JS stubs (from the library's `runtime.js` files) into the compiled output. It does NOT include the full js_of_ocaml runtime.

When used with `--toplevel`, it:
1. Takes the library's `runtime.js` stubs (e.g., `+base/runtime.js`)
2. Embeds them in the compiled `.js` file
3. Registers them on `jsoo_runtime` via `Object.assign()`

This allows separate compilation, where each library's `.cma.js` file carries its own stubs rather than requiring all stubs to be bundled into the main toplevel.

**Verification Steps**:

1. **File size comparison**: Compiled `base.cma.js` with and without `--include-runtime`
   - With: 629KB
   - Without: 626KB
   - Difference: ~3KB (just the stubs, not the full runtime)

2. **Searched for runtime functions**:
   ```bash
   grep -c "function caml_call_gen" base.cma.js
   # Result: 0 (no definitions); the name is referenced 215 times

   grep -c "function caml_register_global" base.cma.js
   # Result: 0 (no definitions); the name is referenced 146 times
   ```
   This confirms the core runtime is NOT included.

3. **Found stub registration pattern**:
   ```javascript
   Object.assign(a.jsoo_runtime, {Base_am_testing: m, Base_hash_stubs: n, ...})
   ```
   This shows how stubs are registered on the global `jsoo_runtime` object.

4. **Runtime test**: The Node.js test in `test/node/` successfully loads `base` and uses functions that depend on JS stubs (hash functions), confirming that the stubs work correctly when embedded this way.
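One caveat when repeating step 2: `grep -c` prints the number of *matching lines*, not the number of matches, and minified output can pack many references onto a single line. The OCaml sketch below counts occurrences instead and separates definitions from references for a given runtime function name. The helper names are hypothetical and not part of the repository. The sketch also assumes that a runtime function is defined only in the form `function NAME`, which may not hold for every js_of_ocaml build mode.

```ocaml
(* Count non-overlapping occurrences of [sub] in [s]. *)
let count_occurrences ~sub s =
  let n = String.length sub and len = String.length s in
  if n = 0 then invalid_arg "count_occurrences: empty pattern";
  let rec go i acc =
    if i > len - n then acc
    else if String.sub s i n = sub then go (i + n) (acc + 1)
    else go (i + 1) acc
  in
  go 0 0

(* Assumption: a definition looks like "function NAME"; every other
   occurrence of NAME is treated as a reference. *)
let definitions_and_references ~name js =
  let defs = count_occurrences ~sub:("function " ^ name) js in
  let total = count_occurrences ~sub:name js in
  (defs, total - defs)
```

Reading the compiled file with `In_channel.with_open_bin "base.cma.js" In_channel.input_all` (OCaml 4.14+) and passing the result as `js` gives a per-name `(definitions, references)` pair. Under the assumption above, a result of `(0, n)` with `n > 0` is what "runtime not included" looks like.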

**Related**: js_of_ocaml PR #1509 added support for this feature in toplevel mode.

---
-13
lib/impl.ml
···
248 248     in
249 249     fun phrase ->
250 250       Buffer.clear code_buff;
251     -     Buffer.clear code_buff;
252 251       Buffer.clear res_buff;
253 252       Buffer.clear stderr_buff;
254 253       Buffer.clear stdout_buff;
···
449 448             Lwt.return
450 449               (Error (Toplevel_api_gen.InternalError (Printexc.to_string e))))
451 450
452     - let complete _phrase = failwith "Not implemented"
453     -
454 451   let typecheck_phrase :
455 452       string ->
456 453       (Toplevel_api_gen.exec_result, Toplevel_api_gen.err) IdlM.T.resultb =
···
508 505         highlight = !highlighted;
509 506         mime_vals = [];
510 507       }
511     -
512     - let split_primitives p =
513     -   let len = String.length p in
514     -   let rec split beg cur =
515     -     if cur >= len then []
516     -     else if Char.equal p.[cur] '\000' then
517     -       String.sub p beg (cur - beg) :: split (cur + 1) (cur + 1)
518     -     else split beg (cur + 1)
519     -   in
520     -   Array.of_list (split 0 0)
521 508
522 509   let handle_toplevel stripped =
523 510     if String.length stripped < 2 || stripped.[0] <> '#' || stripped.[1] <> ' '
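The first hunk removes a repeated `Buffer.clear code_buff`. The second call was a no-op, because clearing an already empty buffer changes nothing, so the fix improves readability rather than behavior. One hypothetical way to make this kind of slip harder is to keep the per-phrase buffers in a single list and clear them in one place. The buffer names mirror the hunk above, but the list and `reset_phrase_buffers` are illustrative and are not code from `lib/impl.ml`:

```ocaml
(* Hypothetical sketch, not the code in lib/impl.ml: the per-phrase
   buffers live in one list, so each is cleared exactly once per phrase. *)
let code_buff = Buffer.create 256
let res_buff = Buffer.create 256
let stderr_buff = Buffer.create 256
let stdout_buff = Buffer.create 256

let per_phrase_buffers = [ code_buff; res_buff; stderr_buff; stdout_buff ]

(* Called at the start of each phrase in place of four separate clears. *)
let reset_phrase_buffers () = List.iter Buffer.clear per_phrase_buffers
```

The trade-off is that a buffer added later must also be added to the list, or it will silently not be reset between phrases.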