Monorepo for Aesthetic.Computer aesthetic.computer

# KidLisp Performance Analysis & Optimization Report

## Executive Summary

Analysis of the `$cow` piece rendering pipeline reveals several performance bottlenecks in the alpha compositing and embedded layer system. The piece composites two animated layers (`$39i` and `$r2f`) at 120fps with complex timing expressions, creating significant computational overhead.

## Piece Analysis: $cow

### Structure
```
📁 $cow (composite layer)
 ├─ 📄 $39i (background effects layer)
 └─ 📄 $r2f (foreground zoom layer)
```

### Source Code Breakdown

**Main Compositor ($cow)**:
```kidlisp
($39i 0 0 w h 128)  ; Background layer at 50% opacity
($r2f 0 0 w h 128)  ; Foreground layer at 50% opacity
(contrast 1.5)      ; Expensive post-processing effect
```

**Background Layer ($39i)**: 11 operations/frame
- Complex timing: 5 different timer expressions (0.1s, 1.5s, 1s, 2s, 0.3s)
- Heavy operations: `flood`, `scroll`, `zoom`, `blur`, `contrast`, `spin`
- Random point generation: `(repeat 30 point)`

**Foreground Layer ($r2f)**: 11 operations/frame
- High-frequency zoom: `(0.1s (zoom (? 1.89 1 1.1 1.2)))` every 100ms
- Continuous effects: `scroll`, `spin`, `blur`
- Multiple flood fills: `(repeat 2 (flood ? ?))`

## Performance Bottlenecks

### 1. Alpha Compositing Pipeline

**Current Implementation** (`graph.mjs` lines 1558-1610):
```javascript
function blend(dst, src, si, di, alphaIn = 1) {
  // Branch A: destination pixel is not fully opaque (expensive float path)
  if (dst[di + 3] < 255 && src[si + 3] > 0) {
    const alphaSrc = (src[si + 3] * alphaIn) / 255;
    const alphaDst = dst[di + 3] / 255;
    const combinedAlpha = alphaSrc + (1.0 - alphaSrc) * alphaDst;
    // Per-channel floating-point math (3 divisions, 6 multiplications)
    for (let offset = 0; offset < 3; offset++) {
      dst[di + offset] = (src[si + offset] * alphaSrc +
        dst[di + offset] * (1.0 - alphaSrc) * alphaDst) /
        (combinedAlpha + epsilon);
    }
  } else {
    // Branch B: destination is opaque or source is fully transparent (faster integer math)
    const alpha = src[si + 3] * alphaIn + 1;
    const invAlpha = 256 - alpha;
    dst[di] = (alpha * src[si + 0] + invAlpha * dst[di + 0]) >> 8;
    // ... similar for G, B channels
  }
}
```

**Performance Issues**:
- **Floating-point overhead**: Branch A performs a floating-point division and several multiplications per channel
- **Branch prediction**: Switching between the transparent and opaque paths from pixel to pixel can cause mispredictions and CPU pipeline stalls
- **Memory access pattern**: Non-sequential pixel access in compositing loops

### 2. Layer Rendering Frequency

**Current Execution Pattern**:
- `$cow` renders at 120fps (8.33ms budget per frame)
- Each embedded layer renders independently
- `$39i`: 5 timer expressions firing at different intervals
- `$r2f`: High-frequency zoom (10 times per second)
- Double alpha compositing: Each layer → main buffer → final output

**Frame Budget Breakdown** (estimated):
```
Per-frame operations (120fps = 8.33ms budget):
├─ $39i rendering:     ~2.5ms  (timer evaluation + effects)
├─ $r2f rendering:     ~2.0ms  (zoom calculations + blending)
├─ Alpha compositing:  ~2.5ms  (pixel-by-pixel blending)
├─ Contrast effect:    ~1.0ms  (post-processing)
└─ Overhead:           ~0.33ms (timing, evaluation)
Total: ~8.33ms (100% budget utilization)
```

### 3. Timing System Overhead

**Recent Fix Applied**: Removed `setTimeout`-based timing in favor of frame-based counting:
```javascript
// OLD (expensive):
setTimeout(() => { /* execute timing expression */ }, delay);

// NEW (efficient):
if (this.frameCount - lastExecution >= targetFrames) {
  // execute immediately in frame context
}
```

**Remaining Issues**:
- Timer expressions are evaluated every frame even when not firing
- Context switching between embedded layers
- Redundant timing key generation: `${head}-${cacheId}-${JSON.stringify(args)}`

## Optimization Opportunities

### 1. Alpha Compositing Optimizations

**A. SIMD-style Bulk Operations**
```javascript
// Instead of calling blend() once per pixel, blend a run of pixels in one tight loop
function fastBlendChunk(dst, src, dstIdx, srcIdx, count, alpha) {
  // `alpha` is in the range 0..1; convert it to 8-bit fixed point once per chunk
  const alpha256 = (alpha * 256) | 0;
  const invAlpha = 256 - alpha256;

  // Each iteration handles one RGBA pixel (4 bytes) using integer math only
  for (let i = 0; i < count * 4; i += 4) {
    dst[dstIdx + i] = ((alpha256 * src[srcIdx + i] + invAlpha * dst[dstIdx + i]) >> 8);
    dst[dstIdx + i + 1] = ((alpha256 * src[srcIdx + i + 1] + invAlpha * dst[dstIdx + i + 1]) >> 8);
    dst[dstIdx + i + 2] = ((alpha256 * src[srcIdx + i + 2] + invAlpha * dst[dstIdx + i + 2]) >> 8);
    // Skip alpha channel for opaque blending
  }
}
```

**B. Pre-multiplied Alpha**
```javascript
// Store layers in pre-multiplied format to avoid multiplying by source alpha on every blend
function convertToPremultiplied(pixels) {
  for (let i = 0; i < pixels.length; i += 4) {
    const alpha = pixels[i + 3] / 255;
    pixels[i] *= alpha;     // R
    pixels[i + 1] *= alpha; // G
    pixels[i + 2] *= alpha; // B
  }
}
```

**Estimated Gain**: 40-60% faster compositing

### 2. Layer Rendering Optimizations

**A. Dirty Rectangle Tracking**
```javascript
// Hypothetical helper (not shown in the original report): bounding-box union of two boxes
function expandBox(a, b) {
  const x = Math.min(a.x, b.x);
  const y = Math.min(a.y, b.y);
  const right = Math.max(a.x + a.w, b.x + b.w);
  const bottom = Math.max(a.y + a.h, b.y + b.h);
  return { x, y, w: right - x, h: bottom - y };
}

class LayerBuffer {
  constructor(width, height) {
    this.pixels = new Uint8Array(width * height * 4);
    this.dirtyBox = null; // Track changed regions
  }

  markDirty(x, y, w, h) {
    if (!this.dirtyBox) {
      this.dirtyBox = { x, y, w, h };
    } else {
      // Expand dirty box to include new region
      this.dirtyBox = expandBox(this.dirtyBox, { x, y, w, h });
    }
  }
}
```

**B. Layer Caching Strategy**
```javascript
// Cache layer results when no animations are active
const layerCache = new Map();

function renderLayerWithCaching(layerId, hasActiveTimers) {
  if (!hasActiveTimers && layerCache.has(layerId)) {
    return layerCache.get(layerId);
  }

  const result = renderLayer(layerId);
  if (!hasActiveTimers) {
    layerCache.set(layerId, result);
  }
  return result;
}
```

**Estimated Gain**: 30-50% reduction in redundant layer rendering

### 3. Timing System Optimizations

**A. Timer Batching**
```javascript
// Group timers by interval for batch processing (frame counts assume 120fps)
const timerBatches = {
  '0.1s': [], // 12-frame intervals
  '1s': [],   // 120-frame intervals
  '1.5s': [], // 180-frame intervals
};

function processTimerBatch(interval, frameCount) {
  if (frameCount % getFramesForInterval(interval) === 0) {
    timerBatches[interval].forEach(timer => timer.execute());
  }
}
```

**B. Lazy Timer Key Generation**
```javascript
// Cache timer keys to avoid JSON.stringify overhead.
// Caveat: the cache is keyed on the `args` object alone, so this only works if
// `args` is an object (e.g. an array) that is reused across frames and always
// paired with the same `head` and `cacheId`.
const timerKeyCache = new WeakMap();

function getTimerKey(head, cacheId, args) {
  if (!timerKeyCache.has(args)) {
    timerKeyCache.set(args, `${head}-${cacheId}-${JSON.stringify(args)}`);
  }
  return timerKeyCache.get(args);
}
```

**Estimated Gain**: 15-25% reduction in timing overhead

## Recommended Implementation Plan

### Phase 1: Critical Path Optimization (High Impact)
1. **Implement integer-only alpha compositing** for opaque blending cases
2. **Add SIMD-style bulk pixel operations** for large layer composites
3. **Implement dirty rectangle tracking** for incremental updates

### Phase 2: Memory & Caching (Medium Impact)
4. **Add layer result caching** for static content
5. **Implement pre-multiplied alpha storage** format
6. **Optimize timer key generation** with caching

### Phase 3: Advanced Optimizations (Lower Impact)
7. **WebGL compositing pipeline** for complex effects
8. **Worker thread layer rendering** for parallel processing
9. **Adaptive quality scaling** based on performance metrics

## Measurement & Validation

### Performance Metrics to Track
```javascript
// Add to graph.mjs for performance monitoring
const perfMetrics = {
  blendTime: 0,
  layerRenderTime: 0,
  timingOverhead: 0,
  frameDrops: 0
};

function measureBlendPerformance(fn) {
  const start = performance.now();
  fn();
  perfMetrics.blendTime += performance.now() - start;
}
```

### Expected Performance Gains
- **Alpha Compositing**: 40-60% faster → ~1.5ms savings per frame
- **Layer Rendering**: 30-50% reduction → ~1.0ms savings per frame
- **Timing Overhead**: 15-25% reduction → ~0.2ms savings per frame
- **Total Improvement**: ~2.7ms per frame (about 32% of the 8.33ms frame budget)

If these estimates hold, the savings would leave headroom for more complex effects and improve frame stability at 120fps.

## Current Status

**Completed**: Removed setTimeout-based timing (major architectural fix)
🔄 **In Progress**: Performance measurement infrastructure
📋 **Next**: Implement integer-only alpha compositing optimization

---
*Last Updated: September 6, 2025*
*Analysis Target: $cow embedded layer composition*
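
**Note on Timer Batching**: The sketch in section 3A (Timer Batching) calls `getFramesForInterval`, which the report does not define. Below is a minimal version, assuming intervals are written as strings like `0.1s` and that the piece runs at a fixed 120fps, as the comments in that sketch imply. `ASSUMED_FPS` and `shouldFire` are hypothetical names used for illustration and are not part of `graph.mjs`.

```javascript
// Hypothetical helpers; not taken from graph.mjs. Assumes a fixed frame rate.
const ASSUMED_FPS = 120;

// Convert an interval string such as "0.1s" or "1.5s" into a whole frame count.
function getFramesForInterval(interval, fps = ASSUMED_FPS) {
  const match = /^(\d+(?:\.\d+)?)s$/.exec(interval);
  if (!match) {
    throw new Error(`Unsupported interval: ${interval}`);
  }
  // Round to the nearest frame; never return 0, which would break the modulo check.
  return Math.max(1, Math.round(parseFloat(match[1]) * fps));
}

// Does a timer with this interval fire on the given frame?
function shouldFire(interval, frameCount, fps = ASSUMED_FPS) {
  return frameCount % getFramesForInterval(interval, fps) === 0;
}

console.log(getFramesForInterval("0.1s")); // 12
console.log(getFramesForInterval("1.5s")); // 180
console.log(shouldFire("0.3s", 72));       // true (0.3s is 36 frames at 120fps)
```

Because the result is rounded to a whole frame, an interval that is not a multiple of the frame duration will fire slightly early or late relative to wall-clock time.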