# KidLisp Performance Analysis & Optimization Report

## Executive Summary

Analysis of the `$cow` piece's rendering pipeline reveals several performance bottlenecks in the alpha compositing and embedded layer systems. The piece composites two animated layers (`$39i` and `$r2f`) at 120fps with complex timing expressions, creating significant computational overhead.

## Piece Analysis: $cow

### Structure
```
📁 $cow (composite layer)
 ├─ 📄 $39i (background effects layer)
 └─ 📄 $r2f (foreground zoom layer)
```

### Source Code Breakdown

**Main Compositor ($cow)**:
```kidlisp
($39i 0 0 w h 128) ; Background layer at 50% opacity
($r2f 0 0 w h 128) ; Foreground layer at 50% opacity
(contrast 1.5) ; Expensive post-processing effect
```

**Background Layer ($39i)**: 11 operations/frame
- Complex timing: 5 different timer expressions (0.1s, 1.5s, 1s, 2s, 0.3s)
- Heavy operations: `flood`, `scroll`, `zoom`, `blur`, `contrast`, `spin`
- Random point generation: `(repeat 30 point)`

**Foreground Layer ($r2f)**: 11 operations/frame
- High-frequency zoom: `(0.1s (zoom (? 1.89 1 1.1 1.2)))` every 100ms
- Continuous effects: `scroll`, `spin`, `blur`
- Multiple flood fills: `(repeat 2 (flood ? ?))`

## Performance Bottlenecks

### 1. Alpha Compositing Pipeline

**Current Implementation** (`graph.mjs` lines 1558-1610):
```javascript
function blend(dst, src, si, di, alphaIn = 1) {
  // Branch A: Transparent pixel compositing (expensive)
  if (dst[di + 3] < 255 && src[si + 3] > 0) {
    const alphaSrc = (src[si + 3] * alphaIn) / 255;
    const alphaDst = dst[di + 3] / 255;
    const combinedAlpha = alphaSrc + (1.0 - alphaSrc) * alphaDst;
    // Per-channel floating-point math: 3 divisions and 9 multiplications
    // across R, G, B, on top of the 2 divisions and 2 multiplications above.
    // `epsilon` is a small constant defined outside this excerpt.
    for (let offset = 0; offset < 3; offset++) {
      dst[di + offset] = (src[si + offset] * alphaSrc +
        dst[di + offset] * (1.0 - alphaSrc) * alphaDst) /
        (combinedAlpha + epsilon);
    }
  } else {
    // Branch B: Opaque pixel compositing (faster integer math)
    // When alphaIn < 1, `alpha` is fractional, so these multiplications are
    // floating-point until `>> 8` truncates the result.
    const alpha = src[si + 3] * alphaIn + 1;
    const invAlpha = 256 - alpha;
    dst[di] = (alpha * src[si + 0] + invAlpha * dst[di + 0]) >> 8;
    // ... similar for G, B channels
  }
}
```
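
**Performance Issues**:
- **Floating-point overhead**: Branch A performs 5 divisions and 11 multiplications per pixel in floating point, versus multiply-and-shift arithmetic in Branch B
- **Branch prediction**: When pixels with transparent and opaque destinations are interleaved, the Branch A/B test is hard to predict, and mispredictions stall the CPU pipeline
- **Memory access pattern**: Non-sequential pixel access in the compositing loops reduces cache efficiency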
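
Whether the branch actually hurts depends on how often each path runs. As a rough diagnostic, the hypothetical `countBlendBranches` helper below (not part of `graph.mjs`) applies the same test as `blend()` to two equally sized, index-aligned RGBA buffers and reports how many pixels would take each branch.

```javascript
// Hypothetical diagnostic: counts which blend() branch each pixel would take,
// using the same test as the excerpt: dst alpha < 255 and src alpha > 0.
// Assumes dst and src are index-aligned RGBA buffers of equal length.
function countBlendBranches(dst, src) {
  let translucent = 0; // Branch A: floating-point path
  let opaque = 0; // Branch B: multiply-and-shift path
  for (let i = 0; i < dst.length; i += 4) {
    if (dst[i + 3] < 255 && src[i + 3] > 0) translucent++;
    else opaque++;
  }
  return { translucent, opaque };
}

// Two pixels: the first has a fully transparent destination.
const dst = new Uint8ClampedArray([0, 0, 0, 0, 10, 20, 30, 255]);
const src = new Uint8ClampedArray([255, 0, 0, 128, 0, 255, 0, 128]);
console.log(countBlendBranches(dst, src)); // { translucent: 1, opaque: 1 }
```

If one count dominates, the branch is predicted well and the cost is mostly the floating-point math itself; the misprediction concern applies when the two kinds of pixel are interleaved.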

### 2. Layer Rendering Frequency

**Current Execution Pattern**:
- `$cow` renders at 120fps (8.33ms budget per frame)
- Each embedded layer renders independently
- `$39i`: 5 timer expressions firing at different intervals
- `$r2f`: High-frequency zoom (10 times per second)
- Double alpha compositing: each layer → main buffer → final output

**Frame Budget Breakdown** (estimated):
```
Per-frame operations (120fps = 8.33ms budget):
├─ $39i rendering: ~2.5ms (timer evaluation + effects)
├─ $r2f rendering: ~2.0ms (zoom calculations + blending)
├─ Alpha compositing: ~2.5ms (pixel-by-pixel blending)
├─ Contrast effect: ~1.0ms (post-processing)
└─ Overhead: ~0.33ms (timing, evaluation)
Total: ~8.33ms (100% budget utilization)
```
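
The breakdown can be checked as plain arithmetic. This sketch only restates the estimates above (the object keys are illustrative names); it does not measure anything.

```javascript
// Frame budget at the target frame rate, and the sum of the estimated costs.
const FPS = 120;
const budgetMs = 1000 / FPS; // 8.333... ms per frame

// Estimated per-frame costs in ms, copied from the breakdown above.
const estimates = {
  layer39i: 2.5,
  layerR2f: 2.0,
  compositing: 2.5,
  contrast: 1.0,
  overhead: 0.33,
};

const totalMs = Object.values(estimates).reduce((sum, ms) => sum + ms, 0);
const utilization = totalMs / budgetMs;

console.log(budgetMs.toFixed(2)); // 8.33
console.log(totalMs.toFixed(2)); // 8.33
console.log((utilization * 100).toFixed(0) + "%"); // 100%
```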

### 3. Timing System Overhead

**Recent Fix Applied**: Removed `setTimeout`-based timing in favor of frame-based counting:
```javascript
// OLD (expensive):
setTimeout(() => { /* execute timing expression */ }, delay);

// NEW (efficient):
if (this.frameCount - lastExecution >= targetFrames) {
  // execute immediately in frame context
}
```
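
A self-contained version of the frame-counting check, assuming a fixed 120fps target; `FrameTimer` and its fields are hypothetical names, not the evaluator's actual structure.

```javascript
// Hypothetical frame-based timer mirroring the
// `frameCount - lastExecution >= targetFrames` check above.
class FrameTimer {
  constructor(seconds, fps = 120) {
    // e.g. 0.1s at 120fps -> 12 frames; never less than one frame
    this.targetFrames = Math.max(1, Math.round(seconds * fps));
    this.lastExecution = -Infinity; // fire on the first frame
  }

  // Returns true when the timed expression should run on this frame
  tick(frameCount) {
    if (frameCount - this.lastExecution >= this.targetFrames) {
      this.lastExecution = frameCount;
      return true;
    }
    return false;
  }
}

// Over one second at 120fps, a 0.1s timer fires on frames 0, 12, ..., 108
const zoomTimer = new FrameTimer(0.1);
let fired = 0;
for (let frame = 0; frame < 120; frame++) {
  if (zoomTimer.tick(frame)) fired++;
}
console.log(fired); // 10
```

Because time is counted in frames, a 0.1s timer fires every 12 frames regardless of wall-clock time: if rendering falls below 120fps, timed expressions slow down with it rather than firing between frames.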

**Remaining Issues**:
- Timer expressions are evaluated every frame, even when they are not firing
- Context switching between embedded layers
- Redundant timing key generation: `${head}-${cacheId}-${JSON.stringify(args)}`

## Optimization Opportunities

### 1. Alpha Compositing Optimizations

**A. SIMD-style Bulk Operations**
```javascript
// Instead of calling blend() per pixel, process a run of pixels in one loop.
// Uses a single layer-wide alpha (0..1); per-pixel source alpha is ignored.
function fastBlendChunk(dst, src, dstIdx, srcIdx, count, alpha) {
  // One pixel (4 bytes) per iteration: scalar code, not true SIMD, but a
  // sequential loop over `count` pixels with no per-pixel branch.
  const alpha256 = (alpha * 256) | 0;
  const invAlpha = 256 - alpha256;

  for (let i = 0; i < count * 4; i += 4) {
    dst[dstIdx + i] = ((alpha256 * src[srcIdx + i] + invAlpha * dst[dstIdx + i]) >> 8);
    dst[dstIdx + i + 1] = ((alpha256 * src[srcIdx + i + 1] + invAlpha * dst[dstIdx + i + 1]) >> 8);
    dst[dstIdx + i + 2] = ((alpha256 * src[srcIdx + i + 2] + invAlpha * dst[dstIdx + i + 2]) >> 8);
    // Skip alpha channel for opaque blending
  }
}
```
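
`fastBlendChunk` scales every pixel by one layer-wide alpha, so it only matches `blend()` when the source layer is fully opaque. Below is a sketch of an integer-only variant that also respects each source pixel's alpha. It assumes non-premultiplied RGBA buffers, a layer alpha on the same 0-255 scale as the `128` passed to `$39i` and `$r2f`, and, like Branch B, leaves destination alpha unchanged; `mul255` and `blendChunkInt` are hypothetical names.

```javascript
// Rounded (x * y) / 255 for 0..255 inputs using only integer operations.
function mul255(x, y) {
  const t = x * y + 128;
  return (t + (t >> 8)) >> 8;
}

// Hypothetical integer-only blend for non-premultiplied RGBA buffers.
// Each source pixel's alpha is scaled by a layer-wide alpha (0-255), so a
// transparent source pixel leaves the destination unchanged.
function blendChunkInt(dst, src, dstIdx, srcIdx, count, layerAlpha) {
  for (let p = 0; p < count; p++) {
    const s = srcIdx + p * 4;
    const d = dstIdx + p * 4;
    const a255 = mul255(src[s + 3], layerAlpha); // effective alpha, 0..255
    const a = a255 + (a255 >> 7); // map 0..255 onto 0..256 so 255 -> 256
    const inv = 256 - a;
    dst[d] = (a * src[s] + inv * dst[d]) >> 8;
    dst[d + 1] = (a * src[s + 1] + inv * dst[d + 1]) >> 8;
    dst[d + 2] = (a * src[s + 2] + inv * dst[d + 2]) >> 8;
    // Destination alpha is left as-is, as in Branch B.
  }
}

// Opaque white over black at layer alpha 128 (about 50%).
const out = new Uint8ClampedArray([0, 0, 0, 255]);
blendChunkInt(out, new Uint8ClampedArray([255, 255, 255, 255]), 0, 0, 1, 128);
console.log(Array.from(out)); // [ 128, 128, 128, 255 ]
```

The `a255 + (a255 >> 7)` step maps 255 to 256, so a fully opaque source replaces the destination exactly instead of losing one step of brightness to the `>> 8`.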

**B. Pre-multiplied Alpha**
```javascript
// Store layers in pre-multiplied format so blending no longer has to
// multiply each source color by its own alpha at composite time.
// With a Uint8ClampedArray, fractional results are rounded on assignment
// (a Uint8Array would truncate them instead).
function convertToPremultiplied(pixels) {
  for (let i = 0; i < pixels.length; i += 4) {
    const alpha = pixels[i + 3] / 255;
    pixels[i] *= alpha; // R
    pixels[i + 1] *= alpha; // G
    pixels[i + 2] *= alpha; // B
  }
}
```
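
With layers in premultiplied form, the Porter-Duff "source over" operator becomes `out = src + dst * (1 - srcA)` for every channel including alpha. There is no division and no separate path for transparent destinations, which also removes the Branch A/Branch B split. A minimal integer sketch, assuming both buffers hold premultiplied RGBA with alpha on a 0-255 scale; `mul255` and `blendPremultiplied` are hypothetical names:

```javascript
// Rounded (x * y) / 255 for 0..255 inputs using only integer operations.
function mul255(x, y) {
  const t = x * y + 128;
  return (t + (t >> 8)) >> 8;
}

// Porter-Duff "source over" for premultiplied RGBA: out = src + dst * (1 - srcA).
// Same formula for all four channels; no branch on destination alpha.
function blendPremultiplied(dst, src, count) {
  for (let i = 0; i < count * 4; i += 4) {
    const inv = 255 - src[i + 3];
    dst[i] = src[i] + mul255(dst[i], inv);
    dst[i + 1] = src[i + 1] + mul255(dst[i + 1], inv);
    dst[i + 2] = src[i + 2] + mul255(dst[i + 2], inv);
    dst[i + 3] = src[i + 3] + mul255(dst[i + 3], inv);
  }
}

// Half-transparent red (premultiplied: 128, 0, 0, 128) over opaque blue.
const canvas = new Uint8ClampedArray([0, 0, 255, 255]);
blendPremultiplied(canvas, new Uint8ClampedArray([128, 0, 0, 128]), 1);
console.log(Array.from(canvas)); // [ 128, 0, 127, 255 ]
```

If the final buffer must be non-premultiplied (as canvas `ImageData` is), one conversion back, which divides by alpha, is still needed per frame.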

**Estimated Gain**: 40-60% faster compositing

### 2. Layer Rendering Optimizations

**A. Dirty Rectangle Tracking**
```javascript
class LayerBuffer {
  constructor(width, height) {
    this.pixels = new Uint8Array(width * height * 4);
    this.dirtyBox = null; // Track changed regions
  }

  markDirty(x, y, w, h) {
    if (!this.dirtyBox) {
      this.dirtyBox = { x, y, w, h };
    } else {
      // Expand dirty box to include new region
      this.dirtyBox = expandBox(this.dirtyBox, { x, y, w, h });
    }
  }
}

// Smallest box that contains both a and b
function expandBox(a, b) {
  const x = Math.min(a.x, b.x);
  const y = Math.min(a.y, b.y);
  const right = Math.max(a.x + a.w, b.x + b.w);
  const bottom = Math.max(a.y + a.h, b.y + b.h);
  return { x, y, w: right - x, h: bottom - y };
}
```
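
A dirty box only pays off if compositing reads it. The hypothetical `compositeDirtyRegion` helper below blends just the rows and columns inside a box with a constant alpha, in the style of `fastBlendChunk`; it assumes both buffers are `width`-pixel-wide RGBA and that the box is already clipped to the buffer.

```javascript
// Hypothetical helper: blends only the pixels inside `box` (x, y, w, h in
// pixels) from `src` onto `dst` with a constant alpha (0-256), like
// fastBlendChunk. Both buffers are `width` pixels wide, RGBA, and the box is
// assumed to be clipped to the buffer already.
function compositeDirtyRegion(dst, src, width, box, alpha256) {
  const inv = 256 - alpha256;
  for (let row = box.y; row < box.y + box.h; row++) {
    const start = (row * width + box.x) * 4; // first byte of this row's span
    const end = start + box.w * 4;
    for (let i = start; i < end; i += 4) {
      dst[i] = (alpha256 * src[i] + inv * dst[i]) >> 8;
      dst[i + 1] = (alpha256 * src[i + 1] + inv * dst[i + 1]) >> 8;
      dst[i + 2] = (alpha256 * src[i + 2] + inv * dst[i + 2]) >> 8;
    }
  }
}

// 4x4 black destination, white source, dirty box covering the 2x2 center
const width = 4;
const height = 4;
const dst = new Uint8ClampedArray(width * height * 4);
const src = new Uint8ClampedArray(width * height * 4).fill(255);
compositeDirtyRegion(dst, src, width, { x: 1, y: 1, w: 2, h: 2 }, 256);
console.log(dst[(1 * width + 1) * 4], dst[0]); // 255 0
```

For `$cow` specifically, whole-buffer transforms such as `scroll`, `zoom`, `spin`, and `blur` would presumably mark the entire layer dirty on the frames they run, so the savings depend on how often those fire.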
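
**B. Layer Caching Strategy**
```javascript
// Cache layer results while no animations are active.
// renderLayer(layerId) is assumed to render a layer and return its pixels.
const layerCache = new Map();

function renderLayerWithCaching(layerId, hasActiveTimers) {
  if (!hasActiveTimers && layerCache.has(layerId)) {
    return layerCache.get(layerId);
  }

  const result = renderLayer(layerId);
  if (hasActiveTimers) {
    // Drop any earlier static frame so it is not served once the layer settles
    layerCache.delete(layerId);
  } else {
    layerCache.set(layerId, result);
  }
  return result;
}
```

`$r2f` applies `scroll`, `spin`, and `blur` continuously, so its output changes every frame even between timer firings. For layers like this, `hasActiveTimers` must also account for per-frame effects, or the cache will serve stale frames.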

**Estimated Gain**: 30-50% reduction in redundant layer rendering

### 3. Timing System Optimizations

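**A. Timer Batching**
```javascript
// Group timers by interval for batch processing (frame counts assume 120fps)
const FPS = 120;
const timerBatches = {
  '0.1s': [], // 12-frame intervals
  '0.3s': [], // 36-frame intervals
  '1s': [], // 120-frame intervals
  '1.5s': [], // 180-frame intervals
  '2s': [], // 240-frame intervals
};

// Convert an interval label such as '1.5s' into a whole number of frames
function getFramesForInterval(interval) {
  return Math.max(1, Math.round(parseFloat(interval) * FPS));
}

function processTimerBatch(interval, frameCount) {
  if (frameCount % getFramesForInterval(interval) === 0) {
    timerBatches[interval].forEach(timer => timer.execute());
  }
}
```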

**B. Lazy Timer Key Generation**
```javascript
// Cache the JSON form of each args array to avoid repeated JSON.stringify calls.
// Only the JSON is cached: the same args array may appear under another head or cacheId.
// Assumes args arrays are not mutated after their first use.
const argsJsonCache = new WeakMap();

function getTimerKey(head, cacheId, args) {
  let json = argsJsonCache.get(args);
  if (json === undefined) {
    json = JSON.stringify(args);
    argsJsonCache.set(args, json);
  }
  return `${head}-${cacheId}-${json}`;
}
```

**Estimated Gain**: 15-25% reduction in timing overhead

## Recommended Implementation Plan

### Phase 1: Critical Path Optimization (High Impact)
1. **Implement integer-only alpha compositing** for opaque blending cases
2. **Add SIMD-style bulk pixel operations** for large layer composites
3. **Implement dirty rectangle tracking** for incremental updates

### Phase 2: Memory & Caching (Medium Impact)
4. **Add layer result caching** for static content
5. **Implement pre-multiplied alpha storage** format
6. **Optimize timer key generation** with caching

### Phase 3: Advanced Optimizations (Lower Impact)
7. **WebGL compositing pipeline** for complex effects
8. **Worker thread layer rendering** for parallel processing
9. **Adaptive quality scaling** based on performance metrics
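
One possible shape for adaptive quality scaling, sketched under assumptions: keep an exponential moving average of frame time, lower a resolution scale when the average exceeds the 120fps budget, and raise it again only once the average drops below 75% of the budget, so the scale does not oscillate. `AdaptiveQuality`, its thresholds, and its step size are all hypothetical choices.

```javascript
// Hypothetical adaptive quality controller: tracks an exponential moving
// average (EMA) of frame time and nudges a resolution scale down when frames
// run over budget, and back up when there is clear headroom.
class AdaptiveQuality {
  constructor({ fps = 120, minScale = 0.5, step = 0.05, smoothing = 0.1 } = {}) {
    this.budgetMs = 1000 / fps;
    this.minScale = minScale;
    this.step = step;
    this.smoothing = smoothing; // EMA weight given to the newest sample
    this.avgMs = this.budgetMs;
    this.scale = 1; // 1 = full resolution
  }

  // Feed the measured duration of the last frame; returns the scale to use next.
  update(frameMs) {
    this.avgMs += this.smoothing * (frameMs - this.avgMs);
    if (this.avgMs > this.budgetMs) {
      this.scale = Math.max(this.minScale, this.scale - this.step);
    } else if (this.avgMs < 0.75 * this.budgetMs) {
      this.scale = Math.min(1, this.scale + this.step);
    }
    return this.scale;
  }
}

// Sustained 12ms frames at a 120fps target push the scale down to its floor.
const quality = new AdaptiveQuality();
for (let i = 0; i < 60; i++) quality.update(12);
console.log(quality.scale); // 0.5
```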

## Measurement & Validation

### Performance Metrics to Track
```javascript
// Add to graph.mjs for performance monitoring
const perfMetrics = {
  blendTime: 0,
  layerRenderTime: 0,
  timingOverhead: 0,
  frameDrops: 0
};

function measureBlendPerformance(fn) {
  const start = performance.now();
  fn();
  perfMetrics.blendTime += performance.now() - start;
}
```
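
`perfMetrics.frameDrops` needs a definition of a dropped frame. One hypothetical approach: compare consecutive frame timestamps (for example, those passed to `requestAnimationFrame` callbacks) against the 120fps period and count the missing periods. This assumes the display actually refreshes at the target rate; on a 60Hz display it would count one drop per frame.

```javascript
// Hypothetical frame-drop counter: given consecutive frame timestamps (ms),
// counts how many target-rate frame periods were skipped between each pair.
function countDroppedFrames(timestamps, fps = 120) {
  const frameMs = 1000 / fps;
  let dropped = 0;
  for (let i = 1; i < timestamps.length; i++) {
    const elapsed = timestamps[i] - timestamps[i - 1];
    // About two frame periods between timestamps means one frame was missed.
    dropped += Math.max(0, Math.round(elapsed / frameMs) - 1);
  }
  return dropped;
}

// Frames at 120fps with one 25ms stall (three frame periods).
const stamps = [0, 8.33, 16.67, 41.67, 50.0];
console.log(countDroppedFrames(stamps)); // 2
```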

### Expected Performance Gains
These are projections based on the estimated frame budget above, not measurements.

- **Alpha Compositing**: 40-60% faster → ~1.5ms savings per frame
- **Layer Rendering**: 30-50% reduction → ~1.0ms savings per frame
- **Timing Overhead**: 15-25% reduction → ~0.2ms savings per frame
- **Total Improvement**: ~2.7ms per frame (about 32% of the 8.33ms frame budget)

If these estimates hold, the savings would provide significant headroom for more complex effects and better frame stability at 120fps.

## Current Status

- ✅ **Completed**: Removed setTimeout-based timing (major architectural fix)
- 🔄 **In Progress**: Performance measurement infrastructure
- 📋 **Next**: Implement integer-only alpha compositing optimization

---
*Last Updated: September 6, 2025*
*Analysis Target: $cow embedded layer composition*