Update optimization plan to focus on type-safe improvements only · anil.recoil.org/unpac-myspace-ocaml@4e9610e

+102 -103

1 changed file

OPTIMIZATION_PLAN.md

````diff
@@ -2,48 +2,41 @@
 
 Based on analysis of C QuickJS optimization techniques and the current OCaml implementation.
 
+**Principle: All optimizations must be type-safe and idiomatic OCaml.**
+
 ## Executive Summary
 
 The OCaml implementation is ~640x faster for recursive function calls but ~500-1100x slower for array-intensive operations. The main optimization opportunities are:
 
-1. **Value Representation** - Current ADT is clean but slow
-2. **Object Property Access** - No shapes, uses Hashtbl + list
-3. **Array Implementation** - No fast path, heavy allocations
-4. **Bytecode Dispatch** - Pattern matching vs computed goto
-5. **String Handling** - No interning, no rope concatenation
+1. **Object Property Access** - No shapes, uses Hashtbl + list
+2. **Array Implementation** - No fast path, heavy allocations
+3. **String Handling** - No interning, no rope concatenation
+4. **Data Structure Choice** - Suboptimal for hot paths
 
 ---
 
 ## Phase 1: Quick Wins (Low Effort, High Impact)
 
-### 1.1 Unboxed Integer Fast Path
+### 1.1 Inline Annotations for Hot Paths
 
-**Problem**: Every `Int` value is a heap-allocated variant.
+**Problem**: Function call overhead in tight loops.
 
-**Current**:
+**Solution**: Add `[@inline]` annotations to hot functions:
 ```ocaml
-type value =
-  | Int of int32
-  | Float of float
-  | ...
-```
-
-**Solution**: Use OCaml's unboxed types where possible:
-```ocaml
-(* For hot paths in arithmetic *)
-let[@inline] fast_add a b =
+let[@inline] binary_add a b =
   match a, b with
-  | Int ai, Int bi ->
-    let sum = Int32.add ai bi in
-    (* Check overflow *)
-    Int sum
+  | Int ai, Int bi -> Int (Int32.add ai bi)
+  | Float af, Float bf -> Float (af +. bf)
+  | Int ai, Float bf -> Float (Int32.to_float ai +. bf)
+  | Float af, Int bi -> Float (af +. Int32.to_float bi)
+  | String _, _ | _, String _ -> String (to_string a ^ to_string b)
   | _ -> slow_add a b
 ```
 
-**Impact**: 2-3x faster arithmetic loops
+**Impact**: 1.5-2x faster arithmetic loops
 **Effort**: Low
 
-### 1.2 Array Fast Path
+### 1.2 Array Fast Path with Direct Storage
 
 **Problem**: Array access goes through pattern matching and ref dereferencing.
 
@@ -54,18 +47,18 @@
   if idx >= 0 && idx < Array.length !arr then !arr.(idx)
 ```
 
-**Solution**: Add fast_array flag and inline check:
+**Solution**: Add fast_array flag with bounds-checked direct access:
 ```ocaml
 type js_object = {
   ...
-  mutable fast_array : bool; (* True if array access is fast *)
+  mutable fast_array : bool; (* True if pure array *)
   mutable array_values : value array; (* Direct array storage *)
   mutable array_count : int; (* Actual element count *)
 }
 
 let[@inline] get_array_el_fast obj idx =
   if obj.fast_array && idx >= 0 && idx < obj.array_count then
-    Array.unsafe_get obj.array_values idx
+    obj.array_values.(idx) (* Safe bounds-checked access *)
   else
     get_array_el_slow obj idx
 ```
@@ -73,24 +66,40 @@
 **Impact**: 10-50x faster array access
 **Effort**: Medium
 
-### 1.3 Local Variable Access Optimization
+### 1.3 Pre-sized Array Allocation
 
-**Problem**: Local access goes through array bounds check.
+**Problem**: Arrays grow one element at a time, causing many reallocations.
 
 **Current**:
 ```ocaml
-let get_local frame idx =
-  if idx < Array.length frame.locals then frame.locals.(idx)
-  else Undefined
+let arr = ref (Array.of_list elements)
+(* Each push reallocates *)
 ```
 
-**Solution**: Use unsafe access for verified indices:
+**Solution**: Exponential growth strategy:
 ```ocaml
-let[@inline] get_local frame idx =
-  Array.unsafe_get frame.locals idx (* Compiler verifies bounds *)
+type array_data = {
+  mutable values : value array;
+  mutable length : int; (* Logical length *)
+  mutable capacity : int; (* Physical capacity *)
+}
+
+let ensure_capacity arr needed =
+  if needed > arr.capacity then begin
+    let new_cap = max needed (arr.capacity * 3 / 2 + 8) in
+    let new_values = Array.make new_cap Undefined in
+    Array.blit arr.values 0 new_values 0 arr.length;
+    arr.values <- new_values;
+    arr.capacity <- new_cap
+  end
+
+let push arr value =
+  ensure_capacity arr (arr.length + 1);
+  arr.values.(arr.length) <- value;
+  arr.length <- arr.length + 1
 ```
 
-**Impact**: 1.5x faster function execution
+**Impact**: 5-20x faster array building
 **Effort**: Low
 
 ---
@@ -246,38 +255,9 @@
 
 ---
 
-## Phase 3: Major Architectural Changes
-
-### 3.1 NaN Boxing (Optional)
-
-**Problem**: Every value is heap-allocated OCaml variant.
-
-**Current**: Each value is a tagged pointer to variant data.
-
-**Solution**: Pack values into 64-bit integers using NaN boxing:
-```ocaml
-(* WARNING: This is unsafe and loses type safety *)
-type value = int64 (* Raw 64-bit value *)
+## Phase 3: Architectural Improvements
 
-(* Tags in upper 16 bits of NaN *)
-let tag_int = 0x0000_0000_0001_0000L
-let tag_bool = 0x0000_0000_0002_0000L
-let tag_null = 0x0000_0000_0003_0000L
-let tag_undef = 0x0000_0000_0004_0000L
-let tag_object = 0x0000_0000_0005_0000L
-let tag_string = 0x0000_0000_0006_0000L
-
-let[@inline] get_tag v = Int64.logand v 0xFFFF_0000_0000_0000L
-let[@inline] is_int v = get_tag v = tag_int
-let[@inline] get_int v = Int64.to_int32 v
-let[@inline] make_int i = Int64.logor tag_int (Int64.of_int32 i)
-```
-
-**Impact**: 2-5x faster overall, reduced memory
-**Effort**: Very High (rewrites entire value system)
-**Risk**: Loses OCaml's type safety
-
-### 3.2 Bytecode Dispatch Optimization
+### 3.1 Bytecode Dispatch Optimization
 
 **Problem**: Pattern matching compiles to jump table, not direct dispatch.
 
@@ -314,7 +294,7 @@
 **Impact**: 1.3-2x faster bytecode execution
 **Effort**: Medium
 
-### 3.3 Inline Caching for Property Access
+### 3.2 Inline Caching for Property Access
 
 **Problem**: Every property access does full lookup.
 
@@ -327,17 +307,17 @@
 
 let get_property_cached obj atom cache =
   match cache.cached_shape with
-  | Some shape when shape == obj.shape ->
-    (* Cache hit: direct access *)
-    Array.unsafe_get obj.prop_values cache.cached_offset
+  | Some shape when shape == obj.shape && cache.cached_offset < Array.length obj.prop_values ->
+    (* Cache hit: direct access with bounds check *)
+    obj.prop_values.(cache.cached_offset)
   | _ ->
     (* Cache miss: full lookup and update cache *)
     match Hashtbl.find_opt obj.shape.prop_hash atom with
-    | Some offset ->
+    | Some offset when offset < Array.length obj.prop_values ->
       cache.cached_shape <- Some obj.shape;
       cache.cached_offset <- offset;
-      Array.unsafe_get obj.prop_values offset
-    | None ->
+      obj.prop_values.(offset)
+    | _ ->
       (* Prototype chain lookup *)
       get_property_slow obj atom
 ```
@@ -429,34 +409,34 @@
 ## Prioritized Implementation Order
 
 ### Sprint 1: Quick Wins (1-2 weeks)
-1. [ ] Unsafe array access for verified bounds
-2. [ ] Inline hot arithmetic operations
-3. [ ] Fast path for small integers (0-7)
+1. [ ] Add `[@inline]` annotations to hot arithmetic operations
+2. [ ] Implement pre-sized array allocation with capacity tracking
+3. [ ] Add fast path checks for common value types
 4. [ ] Cache array length in typed arrays
 
 ### Sprint 2: Array Optimization (2-3 weeks)
 1. [ ] Add fast_array flag to objects
-2. [ ] Implement direct array storage
+2. [ ] Implement direct array storage with capacity/length separation
 3. [ ] Exponential growth strategy (3/2 factor)
-4. [ ] Fast typed array access
+4. [ ] Optimize map/filter/reduce to avoid intermediate allocations
 
 ### Sprint 3: String Optimization (2-3 weeks)
-1. [ ] Implement atom table
+1. [ ] Implement atom table module
 2. [ ] Convert property keys to atoms
-3. [ ] Implement string rope
+3. [ ] Implement string rope data structure
 4. [ ] Optimize string concatenation in loops
 
 ### Sprint 4: Object Optimization (3-4 weeks)
-1. [ ] Design shape data structure
-2. [ ] Implement shape transitions
-3. [ ] Convert object storage to shape-based
-4. [ ] Add inline caching for property access
+1. [ ] Design shape data structure with property metadata
+2. [ ] Implement shape transitions for property additions
+3. [ ] Convert object storage to shape-based dense arrays
+4. [ ] Add inline caching for repeated property access
 
 ### Sprint 5: Execution Optimization (2-3 weeks)
-1. [ ] Implement handler array dispatch
-2. [ ] Pre-allocate frame pool
-3. [ ] Optimize local variable access
-4. [ ] Add fast path for common opcodes
+1. [ ] Implement handler array dispatch for bytecode
+2. [ ] Pre-allocate frame pool to reduce GC pressure
+3. [ ] Optimize local variable access patterns
+4. [ ] Add specialized opcodes for common operations
 
 ---
 
@@ -476,28 +456,47 @@
 
 | Optimization | Impact | Effort | Risk | Priority |
 |--------------|--------|--------|------|----------|
-| Unsafe array access | Low | Low | Low | P0 |
+| Inline annotations | Low | Low | None | P0 |
+| Pre-sized arrays | Medium | Low | None | P0 |
 | Fast array flag | High | Medium | Low | P0 |
 | Atom table | Medium | Medium | Low | P1 |
 | String rope | High | Medium | Low | P1 |
 | Shape system | Very High | High | Medium | P1 |
+| Bytecode dispatch | Medium | Medium | Low | P2 |
 | Inline caching | High | High | Medium | P2 |
-| NaN boxing | Very High | Very High | High | P3 |
-| Bytecode dispatch | Medium | Medium | Low | P2 |
+| Frame pooling | Medium | Medium | Low | P2 |
+
+**Note**: All optimizations are type-safe. We avoid unsafe operations like `Obj.magic`, `Array.unsafe_get`, or manual memory manipulation.
 
 ---
 
 ## Appendix: C QuickJS Key Optimizations
 
-### From quickjs.c analysis:
+### Techniques We Adapt (Type-Safe)
+
+| C Technique | OCaml Adaptation |
+|-------------|------------------|
+| **Shapes** (lines 909-924) | Shape module with property metadata |
+| **Fast Arrays** (line 943) | fast_array flag + capacity tracking |
+| **String Interning** (lines 243-249) | Atom module with hashtable |
+| **String Ropes** (lines 535-544) | Rope variant type with auto-flattening |
+| **Compact Properties** (lines 882-907) | Dense value arrays indexed by shape |
+| **Specialized Math** (lines 1001-1032) | Inline functions with type-specific paths |
+
+### Techniques We Skip (Unsafe in OCaml)
+
+| C Technique | Why We Skip |
+|-------------|-------------|
+| **NaN Boxing** (lines 144-213) | Loses OCaml type safety, requires Obj.magic |
+| **Direct Dispatch** (line 53) | Computed goto not available in OCaml |
+| **Branch Hints** (lines 36-45) | OCaml has no `likely()`/`unlikely()` |
+| **Manual Memory** | OCaml GC handles allocation/deallocation |
+
+### Alternative OCaml Approaches
 
-1. **NaN Boxing** (lines 144-213): Values packed into 64-bit using IEEE 754 NaN encoding
-2. **Shapes** (lines 909-924): Shared property descriptors with hash table
-3. **Fast Arrays** (line 943): `fast_array` flag for O(1) indexed access
-4. **Inline Property Lookup** (lines 5699-5741): Hash chain walking inlined
-5. **String Interning** (lines 243-249): Global atom table
-6. **String Ropes** (lines 535-544): Lazy concatenation with depth limiting
-7. **Direct Dispatch** (line 53): Computed goto for bytecode
-8. **Branch Hints** (lines 36-45): `likely()`/`unlikely()` annotations
-9. **Compact Properties** (lines 882-907): Bitfield packing for flags
-10. **Specialized Math** (lines 1001-1032): Fast paths for `f_f`, `f_f_f` functions
+| C Technique | OCaml Alternative |
+|-------------|-------------------|
+| Computed goto | Handler array with function dispatch |
+| Reference counting | OCaml GC (automatic, no overhead) |
+| Inline caching | Mutable cache fields in shape lookups |
+| Small integer tags | OCaml already optimizes small ints |
````
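
Section 1.1's `binary_add` depends on an interpreter `value` type, `to_string`, and `slow_add`, none of which the diff shows. The sketch below fills them in with hypothetical, simplified definitions so that the fast path compiles on its own. It departs from the plan's version in one deliberate way. The `Int`/`Int` case checks for signed 32-bit overflow and promotes the result to `Float`, because JavaScript numbers do not wrap the way `Int32.add` does.

```ocaml
(* Hypothetical minimal value type; the interpreter's real type has more cases. *)
type value =
  | Undefined
  | Bool of bool
  | Int of int32
  | Float of float
  | String of string

let to_string = function
  | Undefined -> "undefined"
  | Bool b -> string_of_bool b
  | Int i -> Int32.to_string i
  | Float f -> Printf.sprintf "%g" f
  | String s -> s

(* Hypothetical, simplified ToNumber coercion used only by the slow path. *)
let to_number = function
  | Undefined -> Float.nan
  | Bool b -> if b then 1.0 else 0.0
  | Int i -> Int32.to_float i
  | Float f -> f
  | String s -> Option.value (float_of_string_opt s) ~default:Float.nan

let slow_add a b = Float (to_number a +. to_number b)

let[@inline] binary_add a b =
  match a, b with
  | Int ai, Int bi ->
    let r = Int32.add ai bi in
    (* Signed overflow iff both operands differ in sign from the result;
       JS numbers do not wrap, so promote to Float in that case. *)
    if Int32.compare (Int32.logand (Int32.logxor ai r) (Int32.logxor bi r)) 0l < 0
    then Float (Int32.to_float ai +. Int32.to_float bi)
    else Int r
  | Float af, Float bf -> Float (af +. bf)
  | Int ai, Float bf -> Float (Int32.to_float ai +. bf)
  | Float af, Int bi -> Float (af +. Int32.to_float bi)
  | String _, _ | _, String _ -> String (to_string a ^ to_string b)
  | _ -> slow_add a b
```

`[@inline]` is a request to the compiler. How much inlining actually happens depends on the compiler configuration, for example whether flambda is enabled, so the 1.5-2x estimate should be confirmed with a benchmark.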
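
The fast-array flag from section 1.2 only pays off if there is a rule for when an object leaves fast mode, and the plan does not state that rule. The sketch below assumes one. A write that would leave a hole past `array_count` copies the dense elements into a generic string-keyed table and clears `fast_array`. The names `props`, `demote`, and `set_array_el` are hypothetical. Reads keep the plan's `get_array_el_fast` shape, with bounds-checked `.( )` access.

```ocaml
type value = Undefined | Int of int

(* Hypothetical object layout reduced to the fields this sketch needs. *)
type js_object = {
  mutable fast_array : bool;           (* true while elements are dense 0..count-1 *)
  mutable array_values : value array;  (* direct element storage *)
  mutable array_count : int;           (* actual element count *)
  props : (string, value) Hashtbl.t;   (* generic fallback storage *)
}

let create () =
  { fast_array = true; array_values = [||]; array_count = 0;
    props = Hashtbl.create 8 }

(* Slow path: elements stored as ordinary string-keyed properties. *)
let get_array_el_slow obj idx =
  Option.value (Hashtbl.find_opt obj.props (string_of_int idx)) ~default:Undefined

let[@inline] get_array_el_fast obj idx =
  if obj.fast_array && idx >= 0 && idx < obj.array_count then
    obj.array_values.(idx)  (* safe access; OCaml still does its own bounds check *)
  else get_array_el_slow obj idx

(* Hypothetical policy: copy dense elements into the table and leave fast mode. *)
let demote obj =
  for i = 0 to obj.array_count - 1 do
    Hashtbl.replace obj.props (string_of_int i) obj.array_values.(i)
  done;
  obj.fast_array <- false;
  obj.array_values <- [||];
  obj.array_count <- 0

let set_array_el obj idx v =
  if obj.fast_array && idx >= 0 && idx <= obj.array_count then begin
    if idx = obj.array_count then begin
      (* Append: grow storage with the same 3/2 + 8 rule as section 1.3. *)
      if idx = Array.length obj.array_values then begin
        let bigger = Array.make (idx * 3 / 2 + 8) Undefined in
        Array.blit obj.array_values 0 bigger 0 idx;
        obj.array_values <- bigger
      end;
      obj.array_count <- idx + 1
    end;
    obj.array_values.(idx) <- v
  end else begin
    (* A write past the end would leave a hole: fall back to generic storage. *)
    if obj.fast_array && idx > obj.array_count then demote obj;
    Hashtbl.replace obj.props (string_of_int idx) v
  end
```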
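
Section 1.3's `ensure_capacity`/`push` pair runs as written once a `value` type exists. The sketch below keeps the plan's code and adds a hypothetical `resize_count` counter, used only to show the amortized behaviour of the `3/2 + 8` growth rule.

```ocaml
type value = Undefined | Int of int

type array_data = {
  mutable values : value array;
  mutable length : int;    (* logical length *)
  mutable capacity : int;  (* physical capacity *)
}

(* Instrumentation for this sketch only: counts reallocations. *)
let resize_count = ref 0

let create () = { values = [||]; length = 0; capacity = 0 }

let ensure_capacity arr needed =
  if needed > arr.capacity then begin
    let new_cap = max needed (arr.capacity * 3 / 2 + 8) in
    let new_values = Array.make new_cap Undefined in
    Array.blit arr.values 0 new_values 0 arr.length;
    arr.values <- new_values;
    arr.capacity <- new_cap;
    incr resize_count
  end

let push arr value =
  ensure_capacity arr (arr.length + 1);
  arr.values.(arr.length) <- value;
  arr.length <- arr.length + 1

(* Reads are bounded by the logical length, not the physical capacity. *)
let get arr i = if i >= 0 && i < arr.length then arr.values.(i) else Undefined
```

Pushing 1,000 elements performs 11 reallocations, through capacities 8, 20, 38, 65, 105, 165, 255, 390, 593, 897 and 1353, instead of one reallocation per push.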
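
The inline cache in section 3.2 only hits across objects if objects built the same way share one physically equal shape. That is the job of the Sprint 4 "shape transitions" item. The sketch below is a minimal version under assumed definitions:

- `shape`, `obj`, `add_property`, `new_cache`, and the `cache_hits` counter are hypothetical.
- Plain strings stand in for atoms.
- `get_property_slow` returns `Undefined` instead of walking a prototype chain.

`get_property_cached` follows the plan's version.

```ocaml
type value = Undefined | Int of int

(* Hypothetical shape: shared name -> slot offset map, plus memoized
   transitions to child shapes. *)
type shape = {
  prop_hash : (string, int) Hashtbl.t;
  transitions : (string, shape) Hashtbl.t;
}

type obj = {
  mutable shape : shape;
  mutable prop_values : value array;  (* dense storage indexed by shape offsets *)
}

(* One cache per property-access site; a site always looks up the same name. *)
type cache = {
  mutable cached_shape : shape option;
  mutable cached_offset : int;
}

let root_shape = { prop_hash = Hashtbl.create 1; transitions = Hashtbl.create 8 }
let new_object () = { shape = root_shape; prop_values = [||] }
let new_cache () = { cached_shape = None; cached_offset = 0 }

(* Shape transition: objects that add the same names in the same order
   end up sharing one physically equal shape. *)
let add_property o name v =
  match Hashtbl.find_opt o.shape.prop_hash name with
  | Some off -> o.prop_values.(off) <- v
  | None ->
    let child =
      match Hashtbl.find_opt o.shape.transitions name with
      | Some s -> s
      | None ->
        let h = Hashtbl.copy o.shape.prop_hash in
        Hashtbl.add h name (Hashtbl.length o.shape.prop_hash);
        let s = { prop_hash = h; transitions = Hashtbl.create 4 } in
        Hashtbl.add o.shape.transitions name s;
        s
    in
    o.shape <- child;
    o.prop_values <- Array.append o.prop_values [| v |]  (* O(n); fine for a sketch *)

(* Hypothetical: this sketch has no prototype chain. *)
let get_property_slow _obj _atom = Undefined

let cache_hits = ref 0  (* instrumentation only *)

let get_property_cached obj atom cache =
  match cache.cached_shape with
  | Some shape when shape == obj.shape && cache.cached_offset < Array.length obj.prop_values ->
    incr cache_hits;
    obj.prop_values.(cache.cached_offset)
  | _ ->
    match Hashtbl.find_opt obj.shape.prop_hash atom with
    | Some offset when offset < Array.length obj.prop_values ->
      cache.cached_shape <- Some obj.shape;
      cache.cached_offset <- offset;
      obj.prop_values.(offset)
    | _ -> get_property_slow obj atom
```

A cache does not record which name it was filled for. Each access site therefore needs its own cache, bound to a single property name. Sharing one cache between two names would return the wrong slot.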
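
Section 3.1 and the "Alternative OCaml Approaches" table both name handler-array dispatch. The sketch below applies it to a toy stack machine in which each opcode indexes an array of closures. The opcode numbers, the `vm` record, and the three-operation instruction set are all hypothetical.

```ocaml
(* Hypothetical VM state for the sketch. *)
type vm = {
  code : int array;       (* opcodes with inline operands *)
  stack : int array;      (* operand stack *)
  mutable sp : int;       (* next free stack slot *)
  mutable pc : int;       (* index of the next opcode *)
  mutable running : bool;
}

(* Opcode numbers must match positions in [handlers]. *)
let op_push = 0
let op_add = 1
let op_mul = 2
let op_halt = 3

let push vm v = vm.stack.(vm.sp) <- v; vm.sp <- vm.sp + 1
let pop vm = vm.sp <- vm.sp - 1; vm.stack.(vm.sp)

(* One closure per opcode, indexed by opcode number. *)
let handlers : (vm -> unit) array = [|
  (fun vm -> let k = vm.code.(vm.pc) in vm.pc <- vm.pc + 1; push vm k);  (* PUSH k *)
  (fun vm -> let b = pop vm in let a = pop vm in push vm (a + b));       (* ADD *)
  (fun vm -> let b = pop vm in let a = pop vm in push vm (a * b));       (* MUL *)
  (fun vm -> vm.running <- false);                                        (* HALT *)
|]

let run code =
  let vm = { code; stack = Array.make 64 0; sp = 0; pc = 0; running = true } in
  while vm.running do
    let op = vm.code.(vm.pc) in
    vm.pc <- vm.pc + 1;
    handlers.(op) vm
  done;
  pop vm
```

OCaml already compiles a `match` over dense integer or constant-constructor cases to a jump table. The handler array adds an indirect closure call on top of that. Whether this approach beats a plain `match` is therefore something to measure rather than assume.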
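
The Sprint 3 atom table, listed in the appendix as "Atom module with hashtable", could look roughly like the sketch below. It uses a global hashtable from string to a dense integer id, plus a growable array for the reverse mapping. Declaring `t` as `private int` keeps atoms cheap to compare while preventing arbitrary integers from posing as atoms. The module name `Atom` comes from the appendix; the signature shown is an assumption.

```ocaml
module Atom : sig
  type t = private int
  val intern : string -> t
  val to_string : t -> string
  val equal : t -> t -> bool
end = struct
  type t = int

  let table : (string, int) Hashtbl.t = Hashtbl.create 256
  let names : string array ref = ref (Array.make 256 "")  (* id -> string *)
  let count = ref 0

  (* Return the existing id for [s], or allocate the next dense id. *)
  let intern s =
    match Hashtbl.find_opt table s with
    | Some id -> id
    | None ->
      let id = !count in
      if id = Array.length !names then begin
        let bigger = Array.make (2 * id) "" in
        Array.blit !names 0 bigger 0 id;
        names := bigger
      end;
      !names.(id) <- s;
      Hashtbl.add table s id;
      incr count;
      id

  let to_string id = !names.(id)
  let equal = Int.equal
end
```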
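
The appendix describes string ropes as a "Rope variant type with auto-flattening". The sketch below implements one reading of that description:

- Concatenations whose result is short are copied eagerly.
- Longer results become a `Concat` node.
- A rope whose depth would exceed a hypothetical `max_depth` is flattened back into a single `Leaf`, which keeps traversal depth bounded.

The thresholds `short_leaf` and `max_depth` are illustrative and are not taken from QuickJS.

```ocaml
type rope =
  | Leaf of string
  | Concat of { left : rope; right : rope; len : int; depth : int }

let length = function Leaf s -> String.length s | Concat c -> c.len
let depth = function Leaf _ -> 0 | Concat c -> c.depth

(* Copy every leaf, left to right, into one string. *)
let flatten r =
  let buf = Buffer.create (length r) in
  let rec go = function
    | Leaf s -> Buffer.add_string buf s
    | Concat c -> go c.left; go c.right
  in
  go r;
  Buffer.contents buf

let short_leaf = 16  (* hypothetical: results this short are copied eagerly *)
let max_depth = 32   (* hypothetical: deeper ropes are flattened *)

let concat a b =
  match a, b with
  | Leaf "", r | r, Leaf "" -> r
  | _ ->
    let len = length a + length b in
    if len <= short_leaf then Leaf (flatten a ^ flatten b)
    else
      let d = 1 + max (depth a) (depth b) in
      let r = Concat { left = a; right = b; len; depth = d } in
      if d > max_depth then Leaf (flatten r) else r
```

Flattening at the depth limit is the simplest policy. It bounds recursion, but repeated appends still copy the whole string once every `max_depth` steps. If that cost shows up in profiles, a balanced rope is an alternative.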
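
Sprint 5's frame pool can be sketched as a fixed stack of preallocated frames, because calls nest and frames are released in LIFO order. The `frame` and `pool` records and the `acquire`/`release` names are hypothetical. When the pool is exhausted, the sketch falls back to an ordinary allocation, and `release` ignores frames that did not come from the pool.

```ocaml
type value = Undefined | Int of int

(* Hypothetical frame record: only locals and a program counter. *)
type frame = {
  mutable locals : value array;
  mutable pc : int;
}

(* Calls nest, so frames are released in LIFO order and the pool is a stack. *)
type pool = {
  frames : frame array;
  mutable next : int;  (* index of the next free pooled frame *)
}

let fresh_frame n = { locals = Array.make n Undefined; pc = 0 }

let create_pool size max_locals =
  { frames = Array.init size (fun _ -> fresh_frame max_locals); next = 0 }

let acquire pool nlocals =
  if pool.next < Array.length pool.frames then begin
    let f = pool.frames.(pool.next) in
    pool.next <- pool.next + 1;
    (* Reuse the locals array unless this call needs more slots. *)
    if Array.length f.locals < nlocals then f.locals <- Array.make nlocals Undefined;
    f.pc <- 0;
    f
  end else
    fresh_frame nlocals  (* pool exhausted: leave this frame to the GC *)

let release pool f =
  (* Only the most recently acquired pooled frame goes back. Clearing it keeps
     the previous call's values from staying reachable through the pool. *)
  if pool.next > 0 && pool.frames.(pool.next - 1) == f then begin
    Array.fill f.locals 0 (Array.length f.locals) Undefined;
    pool.next <- pool.next - 1
  end
```

Benchmark this before adopting it. OCaml's minor-heap allocation is already cheap, and writes into long-lived pooled frames go through the write barrier.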
