Monorepo for Aesthetic.Computer aesthetic.computer

# GPU Acceleration Plan for KidLisp Effects

## Overview

Analysis of `$cow` and the current CPU/GPU hybrid architecture, with the goal of accelerating `flood`, `contrast`, and embedded layer compositing so that complex KidLisp pieces run faster.

## Current `$cow` Source

```lisp
($39i 0 0 w h 128)
($r2f 0 0 w h 128)
(contrast 1.5)
```

This piece embeds two other KidLisp pieces (`$39i` and `$r2f`) as fullscreen layers at roughly 50% alpha (128 of 255), then applies a contrast adjustment. The likely performance bottlenecks, pending profiling, are:

1. **Embedded layer rendering** - Each frame runs 2 full child interpreters
2. **Layer compositing** - Alpha blending 2 fullscreen layers onto the main buffer
3. **Contrast adjustment** - Per-pixel LUT-based processing on CPU

## Current Architecture

### CPU Effects (`graph.mjs`)

| Effect | Implementation | Performance |
|--------|----------------|-------------|
| `flood` | Stack-based flood fill with visited array | O(n) pixels, high memory churn |
| `contrast` | Pre-computed LUT (256 entries), per-pixel loop | Fast but sequential |
| `brightness` | Pre-computed LUT, per-pixel loop | Fast but sequential |
| `blur` | Separable Gaussian, 2-pass convolution | GPU path available |
| `spin` | Polar coordinate transform | GPU path available |
| `zoom` | Inverse transform sampling | GPU path available |
| `scroll` | Wrapped coordinate offset | GPU path available |

### GPU Effects (`gpu-effects.mjs`)

Already implemented with WebGL2:

- `spin` - Polar rotation shader (pixel-perfect match to CPU)
- `zoom` - Inverse transform with wrapping
- `scroll` - Coordinate offset with wrapping
- `contrast` - Fragment shader adjustment (in composite shader)
- `brightness` - Fragment shader adjustment
- `blur` - Separable Gaussian (horizontal + vertical passes)
- `sharpen` - Unsharp mask filter

### Embedded Layers (`kidlisp.mjs`)

Current flow:

1. `embed` creates a persistent `EmbeddedLayer` object
2. Each frame, the child KidLisp interpreter runs in an isolated buffer
3. The buffer is `paste`d to the main screen with alpha blending
4. `bake` creates persistent background layers

## Proposed GPU Acceleration

### Phase 1: GPU Flood Fill (High Impact)

The current CPU flood fill is a major bottleneck for pieces that use `flood` heavily.

**Approach**: Jump Flooding Algorithm (JFA) on GPU

```glsl
#version 300 es
// Jump Flooding Algorithm - O(log n) passes for flood fill
// Pass 1: Initialize seed pixels
// Pass 2-N: Propagate nearest seed with halving step sizes
// (The #version directive has to be the first line of a GLSL ES 3.00 shader.)
precision highp float;

uniform sampler2D u_seeds;   // Current seed map (RGB = position, A = distance)
uniform sampler2D u_source;  // Original image for color matching
uniform vec2 u_resolution;
uniform int u_stepSize;      // Jump distance (starts at max, halves each pass)
uniform vec4 u_targetColor;  // Color to match for boundary

out vec4 fragColor;

void main() {
  ivec2 coord = ivec2(gl_FragCoord.xy);
  vec4 best = texelFetch(u_seeds, coord, 0);

  // Only pixels that match the target color can be filled. This tests the
  // current pixel only; a jump can still skip over a boundary, so strict
  // connectivity needs an additional check.
  bool fillable = texelFetch(u_source, coord, 0) == u_targetColor;

  // Check 8 neighbors at current step size
  for (int dy = -1; dy <= 1; dy++) {
    for (int dx = -1; dx <= 1; dx++) {
      if (dx == 0 && dy == 0) continue;

      ivec2 neighbor = coord + ivec2(dx, dy) * u_stepSize;
      if (neighbor.x < 0 || neighbor.y < 0 ||
          neighbor.x >= int(u_resolution.x) || neighbor.y >= int(u_resolution.y)) continue;

      vec4 neighborSeed = texelFetch(u_seeds, neighbor, 0);
      if (fillable && neighborSeed.a < best.a) {
        best = neighborSeed;
      }
    }
  }

  fragColor = best;
}
```

**Performance**: O(log₂(max(width, height))) full-screen passes, each parallel across pixels, vs. a sequential O(n) pixel traversal on the CPU

### Phase 2: GPU Layer Compositing (High Impact for $cow)

- Current: CPU `paste` with alpha blending per pixel
- Proposed: Batch all embedded layers into a single GPU composite pass

```glsl
#version 300 es
precision highp float;

uniform sampler2D u_background;
uniform sampler2D u_layer0;
uniform sampler2D u_layer1;
// ... up to 8 layers

uniform vec4 u_layerBounds[8];  // x, y, w, h for each layer
uniform float u_layerAlpha[8];  // Per-layer alpha in the 0-255 range
uniform int u_layerCount;

out vec4 fragColor;

void main() {
  ivec2 coord = ivec2(gl_FragCoord.xy);
  vec4 color = texelFetch(u_background, coord, 0);

  // Composite each layer in order
  for (int i = 0; i < 8; i++) {
    if (i >= u_layerCount) break;

    vec4 bounds = u_layerBounds[i];
    if (float(coord.x) >= bounds.x && float(coord.x) < bounds.x + bounds.z &&
        float(coord.y) >= bounds.y && float(coord.y) < bounds.y + bounds.w) {

      ivec2 layerCoord = coord - ivec2(bounds.xy);
      // Default to transparent so a layer without a sampler branch is a no-op
      vec4 layerColor = vec4(0.0);

      // Sample from appropriate layer texture
      if (i == 0) layerColor = texelFetch(u_layer0, layerCoord, 0);
      else if (i == 1) layerColor = texelFetch(u_layer1, layerCoord, 0);
      // ... etc

      // Alpha blend
      float alpha = layerColor.a * u_layerAlpha[i] / 255.0;
      color = mix(color, layerColor, alpha);
    }
  }

  fragColor = color;
}
```

**Benefits**:

- Single GPU draw call for all layers
- No CPU-GPU round trips per layer
- Parallel alpha blending

### Phase 3: GPU Contrast/Brightness Pipeline

Contrast and brightness are already partially implemented in `COMPOSITE_FRAGMENT_SHADER`. Extend them to be usable standalone:

```javascript
// In gpu-effects.mjs
export function gpuContrast(pixels, width, height, level, mask = null) {
  // Returning false tells the caller to use the CPU implementation instead
  if (!initialized || !gl) return false;

  ensureResources(width, height);
  uploadPixels(pixels, width, height);

  gl.useProgram(compositeProgram);
  // Keep zoom, scroll, and brightness neutral so only contrast is applied
  setUniform('u_zoomScale', 1.0);
  setUniform('u_scrollOffset', [0, 0]);
  setUniform('u_contrast', level);
  setUniform('u_brightness', 0);
  setBounds(mask || { x: 0, y: 0, width, height });

  renderAndReadback(pixels, width, height);
  return true;
}
```

### Phase 4: Batched Effect Pipeline

For pieces like `$cow` that chain multiple effects, batch them into a single GPU pipeline:

```javascript
// New API: Batched effect execution
export function gpuEffectBatch(pixels, width, height, effects) {
  // effects = [
  //   { type: 'layer', texture: layer0, bounds: {...}, alpha: 128 },
  //   { type: 'layer', texture: layer1, bounds: {...}, alpha: 128 },
  //   { type: 'contrast', level: 1.5 },
  // ]

  // Single upload, multiple shader passes, single readback
  ensureResources(width, height);
  uploadPixels(pixels, width, height);

  for (const effect of effects) {
    switch (effect.type) {
      case 'layer':
        applyLayerComposite(effect);
        break;
      case 'contrast':
        applyContrast(effect.level);
        break;
      // ... etc
    }
    // Ping-pong between framebuffers
    swapBuffers();
  }

  readbackPixels(pixels, width, height);
  return true;
}
```

## Implementation Priority

| Phase | Effect | Impact | Complexity | Est. Time |
|-------|--------|--------|------------|-----------|
| 1 | GPU Flood Fill (JFA) | High | Medium | 2-3 days |
| 2 | GPU Layer Compositing | High | Medium | 2 days |
| 3 | Standalone GPU Contrast | Medium | Low | 0.5 day |
| 4 | Batched Effect Pipeline | High | High | 3-4 days |

## Current GPU Hooks in graph.mjs

```javascript
// Existing GPU-first pattern with CPU fallback (blur example)
function blur(strength = 1, quality = "medium") {
  // 🚀 TRY GPU BLUR FIRST
  if (gpuSpinEnabled && gpuSpinAvailable && gpuSpinModule?.gpuBlur) {
    const success = gpuSpinModule.gpuBlur(pixels, width, height, strength, mask);
    if (success) {
      blurAccumulator = 0.0;
      return;
    }
  }

  // CPU FALLBACK
  // ... existing CPU implementation
}
```

This pattern should be extended for:

- `flood()` → `gpuSpinModule.gpuFlood()`
- `contrast()` → `gpuSpinModule.gpuContrast()`

## Memory Considerations

- Flood fill JFA requires 2 textures for ping-pong
- Layer compositing needs one texture per layer (up to 8)
- All use the existing `gl` context from `gpu-effects.mjs`
- Readback buffer is already allocated (`readbackBuffer`)

## Testing Strategy

1. **Visual parity**: Compare GPU vs CPU output pixel-by-pixel
2. **Performance benchmarks**:
   - `$cow` FPS before/after
   - Isolated `flood` on a 1920x1080 canvas
   - 4-layer composite vs 4 sequential `paste` calls
3. **Edge cases**:
   - Flood fill at boundaries
   - Layers with partial transparency
   - Order of chained effects

## Next Steps

1. Profile `$cow` to identify actual bottleneck percentages
2. Implement `gpuFlood` using JFA
3. Add GPU layer compositing to the `embed` system
4. Create a batched effect API for complex pieces
5. Add performance metrics to compare CPU vs GPU paths
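
As a possible starting point for the visual parity check in the testing strategy, here is a minimal sketch of a pixel-by-pixel comparison. The helper name `diffPixels`, its `tolerance` parameter, and the shape of the returned report are hypothetical and not part of `graph.mjs` or `gpu-effects.mjs`; the sketch assumes both paths produce RGBA byte buffers of the same dimensions.

```javascript
// Hypothetical helper (not part of the existing modules): compares two RGBA
// byte buffers and counts the pixels that differ by more than `tolerance`
// in any channel.
function diffPixels(cpuPixels, gpuPixels, width, height, tolerance = 0) {
  const expectedLength = width * height * 4;
  if (cpuPixels.length !== expectedLength || gpuPixels.length !== expectedLength) {
    throw new Error("Buffers must both hold width * height RGBA pixels");
  }

  let mismatches = 0;
  let maxDelta = 0;
  let first = null; // { x, y } of the first mismatching pixel, if any

  for (let i = 0; i < expectedLength; i += 4) {
    // Largest per-channel difference for this pixel
    let delta = 0;
    for (let c = 0; c < 4; c++) {
      delta = Math.max(delta, Math.abs(cpuPixels[i + c] - gpuPixels[i + c]));
    }
    if (delta > maxDelta) maxDelta = delta;
    if (delta > tolerance) {
      mismatches++;
      if (first === null) {
        const p = i / 4;
        first = { x: p % width, y: Math.floor(p / width) };
      }
    }
  }

  return { mismatches, maxDelta, first };
}

// Usage: a 2x2 image where one pixel differs by 3 in the red channel.
const cpu = new Uint8ClampedArray(2 * 2 * 4).fill(100);
const gpu = new Uint8ClampedArray(cpu); // copy of the CPU result
gpu[12] = 103; // byte 12 is the red channel of pixel 3, i.e. (x=1, y=1)

const exact = diffPixels(cpu, gpu, 2, 2);    // strict comparison
const loose = diffPixels(cpu, gpu, 2, 2, 4); // allow up to 4 levels of drift
console.log(exact, loose);
```

Reporting `maxDelta` alongside the mismatch count makes it easier to tell small rounding drift apart from a genuinely different result. Whether any nonzero tolerance is acceptable for a given effect is something the parity tests would need to establish.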