# GPU Acceleration Plan for KidLisp Effects

## Overview

This plan analyzes the `$cow` piece and the current CPU/GPU hybrid architecture. It proposes ways to accelerate `flood`, `contrast`, and embedded-layer compositing so that complex KidLisp pieces render faster.

## Current `$cow` Source

```lisp
($39i 0 0 w h 128)
($r2f 0 0 w h 128)
(contrast 1.5)
```

This piece embeds two other KidLisp pieces (`$39i` and `$r2f`) as fullscreen layers at roughly 50% alpha (128 of 255), then applies a contrast adjustment. The likely performance bottlenecks are:

1. **Embedded layer rendering** - Each frame runs two full child interpreters
2. **Layer compositing** - Alpha blending two fullscreen layers onto the main buffer
3. **Contrast adjustment** - Per-pixel LUT-based processing on the CPU

## Current Architecture

### CPU Effects (`graph.mjs`)

| Effect | Implementation | Performance / GPU status |
|--------|---------------|-------------|
| `flood` | Stack-based flood fill with visited array | O(n) pixels, high memory churn |
| `contrast` | Pre-computed LUT (256 entries), per-pixel loop | Fast but sequential |
| `brightness` | Pre-computed LUT, per-pixel loop | Fast but sequential |
| `blur` | Separable Gaussian, 2-pass convolution | GPU fallback available |
| `spin` | Polar coordinate transform | GPU fallback available |
| `zoom` | Inverse transform sampling | GPU fallback available |
| `scroll` | Wrapped coordinate offset | GPU fallback available |

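The `contrast` row above describes a 256-entry lookup table applied per pixel. Below is a minimal sketch of that approach. It assumes a linear contrast curve around mid-gray, `(v - 128) * level + 128`, which may not match the exact formula in `graph.mjs`. `buildContrastLUT` and `applyContrast` are hypothetical names.

```javascript
// Hypothetical CPU contrast via a 256-entry lookup table.
// Assumption: contrast scales each channel linearly around mid-gray (128);
// the exact formula used in graph.mjs may differ.
function buildContrastLUT(level) {
  const lut = new Uint8ClampedArray(256);
  for (let v = 0; v < 256; v++) {
    // Uint8ClampedArray rounds and clamps to 0..255 on assignment.
    lut[v] = (v - 128) * level + 128;
  }
  return lut;
}

// Apply the LUT to the RGB channels of an RGBA buffer in place; alpha is untouched.
function applyContrast(pixels, level) {
  const lut = buildContrastLUT(level);
  for (let i = 0; i < pixels.length; i += 4) {
    pixels[i] = lut[pixels[i]];
    pixels[i + 1] = lut[pixels[i + 1]];
    pixels[i + 2] = lut[pixels[i + 2]];
  }
  return pixels;
}
```

The LUT depends only on `level`, so it is built once per call. After that, each pixel costs three table lookups. The per-pixel loop is the part that stays sequential on the CPU, and a fragment shader would parallelize it.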
### GPU Effects (`gpu-effects.mjs`)

Already implemented with WebGL2:
- ✅ `spin` - Polar rotation shader (pixel-perfect match to CPU)
- ✅ `zoom` - Inverse transform with wrapping
- ✅ `scroll` - Coordinate offset with wrapping
- ✅ `contrast` - Fragment shader adjustment (in composite shader)
- ✅ `brightness` - Fragment shader adjustment
- ✅ `blur` - Separable Gaussian (horizontal + vertical passes)
- ✅ `sharpen` - Unsharp mask filter

### Embedded Layers (`kidlisp.mjs`)

Current flow:
1. `embed` creates a persistent `EmbeddedLayer` object
2. Each frame, the child KidLisp interpreter runs in an isolated buffer
3. That buffer is `paste`d onto the main screen with alpha blending
4. `bake` creates persistent background layers

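In step 3, the CPU cost of `$cow` scales with screen size because every pixel of every layer is blended each frame. Below is a CPU reference for that blend. It assumes `paste` performs a straight linear mix weighted by the layer pixel's alpha times the layer alpha (0-255). `pasteBlend` is a hypothetical name, and the real `paste` may blend differently. This is the same formula a GPU `mix(color, layerColor, alpha)` computes, so it also works as a parity reference.

```javascript
// Hypothetical CPU reference for pasting a layer with a 0-255 layer alpha.
// Assumption: straight mix, dst = dst + (src - dst) * a,
// where a = (src pixel alpha / 255) * (layerAlpha / 255).
// dst and src are RGBA Uint8ClampedArrays (row-major).
function pasteBlend(dst, dstWidth, src, srcWidth, srcHeight, x, y, layerAlpha) {
  const dstHeight = dst.length / (4 * dstWidth);
  for (let sy = 0; sy < srcHeight; sy++) {
    const dy = y + sy;
    if (dy < 0 || dy >= dstHeight) continue; // clip rows outside dst
    for (let sx = 0; sx < srcWidth; sx++) {
      const dx = x + sx;
      if (dx < 0 || dx >= dstWidth) continue; // clip columns outside dst
      const s = (sy * srcWidth + sx) * 4;
      const d = (dy * dstWidth + dx) * 4;
      const a = (src[s + 3] / 255) * (layerAlpha / 255);
      for (let c = 0; c < 4; c++) {
        // Uint8ClampedArray rounds the blended value on assignment.
        dst[d + c] = dst[d + c] + (src[s + c] - dst[d + c]) * a;
      }
    }
  }
  return dst;
}
```

For `$cow`, each frame would call this twice over the full screen with `layerAlpha = 128`, once per embedded piece.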
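## Proposed GPU Acceleration

### Phase 1: GPU Flood Fill (High Impact)

The current CPU flood fill is a major bottleneck for pieces that use `flood` heavily.

**Approach**: Jump Flooding Algorithm (JFA) on the GPU. JFA was designed for Voronoi diagrams and distance transforms. Adapted to flood fill, each pixel that matches the target color adopts the nearest seed found among its neighbors, using a step size that halves every pass.
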
```glsl
#version 300 es
// Jump Flooding Algorithm: one propagation pass, run O(log n) times for flood fill.
// Pass 1 (separate init shader, not shown): write the seed pixel(s).
// Passes 2..N: propagate the nearest seed with halving step sizes.
precision highp float;

uniform sampler2D u_seeds;   // Seed map: RG = seed pixel position, A = 1.0 once assigned (float texture, e.g. RGBA32F)
uniform sampler2D u_source;  // Original image for color matching
uniform vec2 u_resolution;
uniform int u_stepSize;      // Jump distance (starts at max, halves each pass)
uniform vec4 u_targetColor;  // Color being filled, normalized to [0, 1]

out vec4 fragColor;

void main() {
  ivec2 coord = ivec2(gl_FragCoord.xy);
  vec4 best = texelFetch(u_seeds, coord, 0);

  // Only pixels that match the target color may be filled
  // (half-step tolerance absorbs 8-bit normalization rounding).
  vec4 sourceColor = texelFetch(u_source, coord, 0);
  if (any(greaterThan(abs(sourceColor - u_targetColor), vec4(0.5 / 255.0)))) {
    fragColor = best;
    return;
  }

  // Distance from this pixel to its current seed, if it has one.
  float bestDist = best.a > 0.0 ? distance(vec2(coord), best.xy) : 1e20;

  // Check 8 neighbors at the current step size
  for (int dy = -1; dy <= 1; dy++) {
    for (int dx = -1; dx <= 1; dx++) {
      if (dx == 0 && dy == 0) continue;

      ivec2 neighbor = coord + ivec2(dx, dy) * u_stepSize;
      if (neighbor.x < 0 || neighbor.y < 0 ||
          neighbor.x >= int(u_resolution.x) || neighbor.y >= int(u_resolution.y)) continue;

      vec4 neighborSeed = texelFetch(u_seeds, neighbor, 0);
      if (neighborSeed.a == 0.0) continue; // neighbor not reached yet

      // Caveat: only this pixel's color is checked, not the pixels the jump
      // skips over, so a seed can leak across boundaries thinner than the step.
      float d = distance(vec2(coord), neighborSeed.xy);
      if (d < bestDist) {
        bestDist = d;
        best = neighborSeed;
      }
    }
  }

  fragColor = best;
}
```
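
**Performance**: ⌈log₂(max(width, height))⌉ full-screen passes (11 for 1920×1080). Each pass processes every pixel in parallel, for O(n log n) total work, compared with a sequential O(n) CPU traversal. Because jumps can skip over thin boundaries, JFA only approximates a connected fill. An exact result requires restricting every pass to step 1 and iterating until no pixel changes. In the worst case, that takes as many passes as the longest path through the region.
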
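JFA behaves differently from a stack-based fill, so a CPU simulation of the passes is useful both as a parity reference and for seeing where the two diverge. The sketch below is a hypothetical reference, not repository code. It models the seed map as one seed index per pixel instead of an RG position texture, uses a color id per pixel instead of RGBA, and double-buffers each pass the way the GPU ping-pongs textures. `jfaSteps` produces the halving step schedule, which for 1920×1080 runs from 1024 down to 1.

```javascript
// Hypothetical CPU reference for JFA flood-fill passes (not repository code).
// source: one color id per pixel (row-major); the fill starts at (seedX, seedY).

// Halving step schedule: largest power of two below max(width, height), down to 1.
function jfaSteps(width, height) {
  const size = Math.max(width, height);
  let step = 1;
  while (step * 2 < size) step *= 2;
  const steps = [];
  for (; step >= 1; step = Math.floor(step / 2)) steps.push(step);
  return steps;
}

function dist2(ax, ay, bx, by) {
  return (ax - bx) ** 2 + (ay - by) ** 2;
}

function jfaFill(source, width, height, seedX, seedY) {
  const start = seedY * width + seedX;
  const target = source[start];
  // seed[i] = pixel index of the seed assigned to pixel i, or -1 if none yet.
  let seed = new Int32Array(width * height).fill(-1);
  seed[start] = start;

  for (const step of jfaSteps(width, height)) {
    const next = seed.slice(); // write buffer for this pass (ping-pong)
    for (let y = 0; y < height; y++) {
      for (let x = 0; x < width; x++) {
        const i = y * width + x;
        if (source[i] !== target) continue; // non-matching pixels never fill
        let best = seed[i];
        let bestDist = best >= 0 ? dist2(x, y, best % width, Math.floor(best / width)) : Infinity;
        for (let dy = -1; dy <= 1; dy++) {
          for (let dx = -1; dx <= 1; dx++) {
            if (dx === 0 && dy === 0) continue;
            const nx = x + dx * step;
            const ny = y + dy * step;
            if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
            const s = seed[ny * width + nx]; // read only the previous pass
            if (s < 0) continue;
            const d = dist2(x, y, s % width, Math.floor(s / width));
            if (d < bestDist) {
              bestDist = d;
              best = s;
            }
          }
        }
        next[i] = best;
      }
    }
    seed = next;
  }
  // A pixel is filled if any seed reached it.
  return Array.from(seed, (s) => s >= 0);
}
```

Consider an 8×1 row `[0, 0, 1, 0, 0, 0, 0, 0]` seeded at x = 0. A true flood fill stops at the wall at x = 2. `jfaFill`, however, also marks x = 3 through 7, because the first step of 4 jumps straight over the wall. Limiting every pass to step 1 and iterating until nothing changes avoids this, at the cost of the logarithmic pass count.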
### Phase 2: GPU Layer Compositing (High Impact for `$cow`)

- **Current**: CPU `paste` with per-pixel alpha blending, one layer at a time
- **Proposed**: Batch all embedded layers into a single GPU composite pass

```glsl
#version 300 es
precision highp float;

uniform sampler2D u_background;
uniform sampler2D u_layer0;
uniform sampler2D u_layer1;
// ... up to 8 layers

uniform vec4 u_layerBounds[8]; // x, y, w, h for each layer
uniform float u_layerAlpha[8];  // 0-255, same scale as the KidLisp alpha argument
uniform int u_layerCount;

out vec4 fragColor;

void main() {
  ivec2 coord = ivec2(gl_FragCoord.xy);
  vec4 color = texelFetch(u_background, coord, 0);

  // Composite each layer in order
  for (int i = 0; i < 8; i++) {
    if (i >= u_layerCount) break;

    vec4 bounds = u_layerBounds[i];
    if (float(coord.x) >= bounds.x && float(coord.x) < bounds.x + bounds.z &&
        float(coord.y) >= bounds.y && float(coord.y) < bounds.y + bounds.w) {

      ivec2 layerCoord = coord - ivec2(bounds.xy);
      vec4 layerColor = vec4(0.0);

      // Sampler arrays can only be indexed with constant expressions in
      // GLSL ES 3.00, so select the layer texture with a branch chain.
      if (i == 0) layerColor = texelFetch(u_layer0, layerCoord, 0);
      else if (i == 1) layerColor = texelFetch(u_layer1, layerCoord, 0);
      // ... etc

      // Alpha blend
      float alpha = layerColor.a * u_layerAlpha[i] / 255.0;
      color = mix(color, layerColor, alpha);
    }
  }

  fragColor = color;
}
```

**Benefits**:
- Single GPU draw call for all layers
- No per-layer CPU-GPU round trip: each layer texture is uploaded once and the composite is read back once
- Alpha blending runs in parallel across pixels

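On the JavaScript side, the composite shader's array uniforms must be filled from the current list of embedded layers each frame. Below is a minimal sketch. It assumes each layer descriptor has hypothetical `x`, `y`, `width`, `height`, and `alpha` (0-255) fields. `packLayerUniforms` and `uploadLayerUniforms` are illustrative names, while `getUniformLocation`, `uniform4fv`, `uniform1fv`, and `uniform1i` are standard WebGL2 calls. The sketch leaves out binding each layer texture to its own texture unit and pointing `u_layer0`, `u_layer1`, ... at those units.

```javascript
// Hypothetical helpers for the composite shader's layer uniforms.
const MAX_LAYERS = 8; // matches u_layerBounds[8] / u_layerAlpha[8]

function packLayerUniforms(layers) {
  if (layers.length > MAX_LAYERS) {
    throw new RangeError(`at most ${MAX_LAYERS} layers per composite pass`);
  }
  const bounds = new Float32Array(MAX_LAYERS * 4); // one vec4 (x, y, w, h) per layer
  const alpha = new Float32Array(MAX_LAYERS);
  layers.forEach((layer, i) => {
    bounds.set([layer.x, layer.y, layer.width, layer.height], i * 4);
    alpha[i] = layer.alpha; // 0-255; the shader divides by 255
  });
  return { bounds, alpha, count: layers.length };
}

// Assumes `program` is the linked composite program and is already bound
// with gl.useProgram(program).
function uploadLayerUniforms(gl, program, packed) {
  gl.uniform4fv(gl.getUniformLocation(program, "u_layerBounds"), packed.bounds);
  gl.uniform1fv(gl.getUniformLocation(program, "u_layerAlpha"), packed.alpha);
  gl.uniform1i(gl.getUniformLocation(program, "u_layerCount"), packed.count);
}
```

Packing into fixed-size typed arrays lets each array uniform be set in one call, regardless of how many layers are active. Unused slots stay zero, and `u_layerCount` keeps the shader from reading them.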
### Phase 3: GPU Contrast/Brightness Pipeline

Contrast is already partially implemented in `COMPOSITE_FRAGMENT_SHADER`. Extend it so it can also be called standalone:

```javascript
// In gpu-effects.mjs
// Relies on module state (initialized, gl, compositeProgram) and helpers
// (ensureResources, uploadPixels, setUniform, setBounds, renderAndReadback).
export function gpuContrast(pixels, width, height, level, mask = null) {
  // Returning false tells the caller to use the CPU path.
  if (!initialized || !gl) return false;

  ensureResources(width, height);
  uploadPixels(pixels, width, height);

  gl.useProgram(compositeProgram);
  // Neutral zoom, scroll, and brightness so only contrast changes the image.
  setUniform('u_zoomScale', 1.0);
  setUniform('u_scrollOffset', [0, 0]);
  setUniform('u_contrast', level);
  setUniform('u_brightness', 0);
  setBounds(mask || { x: 0, y: 0, width, height });

  renderAndReadback(pixels, width, height);
  return true;
}
```

### Phase 4: Batched Effect Pipeline

For pieces like `$cow` that chain multiple effects, batch them into a single GPU pipeline:

```javascript
// New API: batched effect execution
export function gpuEffectBatch(pixels, width, height, effects) {
  // effects = [
  //   { type: 'layer', texture: layer0, bounds: {...}, alpha: 128 },
  //   { type: 'layer', texture: layer1, bounds: {...}, alpha: 128 },
  //   { type: 'contrast', level: 1.5 },
  // ]
  if (!initialized || !gl) return false; // caller falls back to CPU

  // Single upload, multiple shader passes, single readback
  ensureResources(width, height);
  uploadPixels(pixels, width, height);

  for (const effect of effects) {
    switch (effect.type) {
      case 'layer':
        applyLayerComposite(effect);
        break;
      case 'contrast':
        applyContrast(effect.level);
        break;
      // ... etc
    }
    // Ping-pong between framebuffers: this pass's output feeds the next pass
    swapBuffers();
  }

  readbackPixels(pixels, width, height);
  return true;
}
```

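The ping-pong step in `gpuEffectBatch` is what lets a chain of effects run with one upload and one readback. The CPU model below illustrates the idea, using two typed arrays in place of the two framebuffer textures. `PingPong`, `runBatch`, and `mapRGB` are hypothetical names, not part of `gpu-effects.mjs`. The model also makes the "chained effects order" concern from the testing plan concrete: swapping two passes changes the result.

```javascript
// CPU model of ping-pong rendering: each pass reads `front` and writes `back`,
// then the buffers swap, so N passes need only two buffers.
class PingPong {
  constructor(pixels) {
    this.front = Uint8ClampedArray.from(pixels); // stands in for the upload
    this.back = new Uint8ClampedArray(pixels.length);
  }
  apply(effect) {
    effect(this.front, this.back); // read front, write back
    [this.front, this.back] = [this.back, this.front]; // swapBuffers()
  }
  readback(out) {
    out.set(this.front); // stands in for the single readback
    return out;
  }
}

// pixels: RGBA Uint8ClampedArray, updated in place like the GPU path.
function runBatch(pixels, effects) {
  const pp = new PingPong(pixels);
  for (const effect of effects) pp.apply(effect);
  return pp.readback(pixels);
}

// Example pass factory: apply fn to R, G, B; alpha passes through unchanged.
const mapRGB = (fn) => (src, dst) => {
  for (let i = 0; i < src.length; i++) {
    dst[i] = i % 4 === 3 ? src[i] : fn(src[i]);
  }
};
```

Starting from the pixel `[10, 20, 30, 255]`, inverting and then adding 10 gives `[255, 245, 235, 255]` (the red channel clamps). Adding 10 and then inverting gives `[235, 225, 215, 255]`.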
## Implementation Priority

| Phase | Effect | Impact | Complexity | Est. Time |
|-------|--------|--------|------------|-----------|
| 1 | GPU Flood Fill (JFA) | High | Medium | 2-3 days |
| 2 | GPU Layer Compositing | High | Medium | 2 days |
| 3 | Standalone GPU Contrast | Medium | Low | 0.5 day |
| 4 | Batched Effect Pipeline | High | High | 3-4 days |

## Current GPU Hooks in `graph.mjs`

```javascript
// Existing GPU fallback pattern (blur example).
// pixels, width, height, and mask are graph.mjs module state.
function blur(strength = 1, quality = "medium") {
  // 🚀 TRY GPU BLUR FIRST
  if (gpuSpinEnabled && gpuSpinAvailable && gpuSpinModule?.gpuBlur) {
    const success = gpuSpinModule.gpuBlur(pixels, width, height, strength, mask);
    if (success) {
      blurAccumulator = 0.0;
      return;
    }
  }

  // CPU FALLBACK
  // ... existing CPU implementation
}
```

The same pattern should be extended to:
- `flood()` → `gpuSpinModule.gpuFlood()`
- `contrast()` → `gpuSpinModule.gpuContrast()`

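To avoid repeating the blur pattern in every effect, a small wrapper can try a GPU function and fall back to the CPU path when that function is missing, returns a falsy value, or throws. This is a sketch under assumptions. `withGpuFallback` is a hypothetical helper, and `gpuContrast` is one of the planned functions listed above, not an existing one. Treating a throw as a fallback also covers failures such as a lost WebGL context.

```javascript
// Wrap a GPU implementation and a CPU implementation that take the same
// arguments. getGpuFn is a thunk so the GPU module can load (or fail) lazily.
// Returns which path ran, which is handy for CPU-vs-GPU metrics.
function withGpuFallback(getGpuFn, cpuFn) {
  return (...args) => {
    const gpuFn = getGpuFn();
    if (typeof gpuFn === "function") {
      try {
        // Like gpuBlur, a GPU function signals success with a truthy result.
        if (gpuFn(...args)) return "gpu";
      } catch {
        // Fall through to the CPU path on any GPU error.
      }
    }
    cpuFn(...args);
    return "cpu";
  };
}

// Usage sketch (hypothetical module state):
let gpuSpinModule = null; // assigned once gpu-effects.mjs has loaded
const contrastWithFallback = withGpuFallback(
  () => gpuSpinModule?.gpuContrast,
  (pixels, width, height, level) => {
    // The existing CPU LUT path would run here.
  },
);
```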
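## Memory Considerations

- JFA flood fill needs two seed-map textures for ping-pong, plus the source image texture. Seed maps that store pixel positions need a float format (e.g. `RGBA32F`), because 8-bit channels cannot hold coordinates above 255
- Layer compositing needs one texture per layer (up to 8), plus the background
- All use the existing `gl` context from `gpu-effects.mjs`
- The readback buffer is already allocated (`readbackBuffer`)
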
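Rough numbers for these textures come from width × height × bytes per pixel. The sketch below makes several assumptions: `RGBA8` (4 bytes per pixel) for image and layer textures, `RGBA32F` (16 bytes per pixel) for seed maps that store positions, fullscreen layer textures as in `$cow`, and no mipmaps. These format choices and the helper names are assumptions, not measurements. Rendering into `RGBA32F` in WebGL2 also requires the `EXT_color_buffer_float` extension.

```javascript
// Rough texture-memory arithmetic (assumed formats; no mipmaps).
const BYTES_PER_PIXEL = { RGBA8: 4, RGBA32F: 16 };

function textureBytes(width, height, format) {
  if (!Object.prototype.hasOwnProperty.call(BYTES_PER_PIXEL, format)) {
    throw new Error(`unknown format: ${format}`);
  }
  return width * height * BYTES_PER_PIXEL[format];
}

// JFA: two float seed maps for ping-pong plus the RGBA8 source image.
// Compositing: the background plus one fullscreen texture per layer.
function planMemory(width, height, layerCount) {
  const jfa = 2 * textureBytes(width, height, "RGBA32F") + textureBytes(width, height, "RGBA8");
  const composite = (1 + layerCount) * textureBytes(width, height, "RGBA8");
  return { jfa, composite };
}
```

At 1920×1080 this gives about 8.3 MB per `RGBA8` texture and about 74.6 MB for the JFA set (two float seed maps plus the source). The background plus two layers for `$cow` comes to about 24.9 MB.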
## Testing Strategy

1. **Visual parity**: Compare GPU vs CPU output pixel-by-pixel
2. **Performance benchmarks**:
   - `$cow` FPS before/after
   - Isolated `flood` on a 1920x1080 canvas
   - 4-layer composite vs 4 sequential `paste` calls
3. **Edge cases**:
   - Flood fill at boundaries
   - Layers with partial transparency
   - Chained effects order

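For the visual-parity check, reporting how many pixels differ, by how much, and where the first difference occurs makes failures easier to debug than a single pass/fail result. The sketch below assumes both paths produce same-sized RGBA buffers. `compareBuffers` is a hypothetical name. The tolerance is a judgment call: 0 for effects expected to be pixel-perfect, such as `spin`, and 1 or more where shader float math and CPU integer math may round differently.

```javascript
// Compare CPU and GPU RGBA buffers; a pixel mismatches if any channel
// differs by more than `tolerance`.
function compareBuffers(cpu, gpu, width, tolerance = 1) {
  if (cpu.length !== gpu.length) throw new Error("buffer size mismatch");
  let mismatched = 0;
  let maxDiff = 0;
  let firstMismatch = null;
  for (let i = 0; i < cpu.length; i += 4) {
    let pixelDiff = 0;
    for (let c = 0; c < 4; c++) {
      pixelDiff = Math.max(pixelDiff, Math.abs(cpu[i + c] - gpu[i + c]));
    }
    maxDiff = Math.max(maxDiff, pixelDiff);
    if (pixelDiff > tolerance) {
      mismatched++;
      if (firstMismatch === null) {
        const p = i / 4; // pixel index
        firstMismatch = { x: p % width, y: Math.floor(p / width) };
      }
    }
  }
  return { mismatched, maxDiff, firstMismatch, pass: mismatched === 0 };
}
```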
## Next Steps

1. Profile `$cow` to measure how much frame time each bottleneck actually takes
2. Implement `gpuFlood` using JFA
3. Add GPU layer compositing to the `embed` system
4. Create a batched effect API for complex pieces
5. Add performance metrics to compare CPU vs GPU paths