# Oven Architecture Report

- Generated: 2026-02-13
- Server: oven.aesthetic.computer (137.184.237.166)
- Uptime: 44 days (OS), ~12 min since last oven restart
## 1. Machine Specs
| Resource | Value |
|---|---|
| CPU | 2 vCPUs (Intel, DO-Regular) |
| RAM | 1.97 GB total, ~635 MB used, ~1.3 GB available |
| Swap | None configured |
| Disk | 58 GB, 6.7 GB used (12%) |
| OS | Ubuntu 24.04.3 LTS (kernel 6.8.0-90) |
| Node | v20.20.0 |
| Chrome | 143.0.7499.40 (headless, Puppeteer-managed) |
| ffmpeg | 6.1.1 (system package, WebP + H.264 support) |
### Current Memory Breakdown (at rest with 1 active grab)
- Node (server.mjs): ~175 MB (8.6% of RAM)
- Chrome main process: ~202 MB (10%)
- Chrome GPU process: ~157 MB (7.7%)
- Chrome network service: ~125 MB (6.2%)
- Chrome renderer(s): ~65-100 MB each (3-5%)
- Caddy: ~32 MB
- Total Chrome footprint: ~600-700 MB
- Peak memory observed in logs: 1.4 GB (during heavy grab batches)
Verdict: With 2 GB total and no swap, the machine is memory-constrained. A single Chrome instance + Node already consumes ~850 MB at rest. During heavy workloads, peak memory hits 1.4 GB, leaving very little headroom. This is the primary bottleneck.
## 2. Architecture Overview

```
Internet → Caddy (port 443/80, gzip, TLS) → Express (port 3002) → Puppeteer (Chrome)
                                                                → ffmpeg (WebP/MP4)
                                                                → terser (JS minification)
                                                                → DO Spaces (S3 storage)
                                                                → MongoDB (metadata)
```
### Process Model

- Single Node process (`server.mjs`) — no clustering, no workers
- Single Chrome browser — shared instance, reused across all grab/icon/preview requests
- Serial grab queue — `grabRunning` boolean, one grab at a time, 100ms delay between jobs
- systemd manages the oven service with `Restart=always`, `RestartSec=10` (a unit of this shape is sketched below)
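For orientation, a unit with those settings would look roughly like the following. This is an illustrative sketch, not the deployed unit file; the Node path and description are assumptions, while `/opt/oven` and `server.mjs` come from this report:

```ini
# /etc/systemd/system/oven.service (illustrative sketch, not the deployed unit)
[Unit]
Description=Oven (grab/bake/bundle server)
After=network-online.target

[Service]
# WorkingDirectory and the entry point are from this report; the node path is assumed
WorkingDirectory=/opt/oven
ExecStart=/usr/bin/node server.mjs
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```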
### Key Modules

| Module | Purpose | Size |
|---|---|---|
| `server.mjs` | Express routes + dashboard HTML | 104 KB |
| `grabber.mjs` | Screenshot/WebP/icon capture via Puppeteer | 127 KB |
| `baker.mjs` | Tape (MP4) baking pipeline | 24 KB |
| `bundler.mjs` | KidLisp/JS piece HTML bundle generation | 44 KB |
## 3. API Endpoints (41 routes)

### Core Operations

| Endpoint | Method | Purpose |
|---|---|---|
| `/` | GET | Dashboard (real-time WebSocket updates) |
| `/health` | GET | Health check |
| `/status` | GET | Server status + recent bakes |
| `/grab-status` | GET | Active grabs + queue state |
### Tape Baking (MP4)

| Endpoint | Method | Purpose |
|---|---|---|
| `/bake` | POST | Start tape bake (WebP frames → MP4) |
| `/bake-complete` | POST | Callback when bake finishes |
| `/bake-status` | POST | Check bake progress |
### Screenshots & WebP Captures (Grabber)

| Endpoint | Method | Purpose |
|---|---|---|
| `/grab` | POST | Trigger grab (screenshot/animation) |
| `/grab/:format/:width/:height/:piece` | GET | Direct grab with params |
| `/grab-ipfs` | POST | Grab + IPFS upload |
| `/grab-cleanup` | POST | Clean stale grabs |
| `/grab-clear` | POST | Clear all active grabs |
| `/icon/:size/:piece.png` | GET | Piece icon (cached → DO Spaces) |
| `/icon/:size/:piece.webp` | GET | Piece icon as WebP |
| `/preview/:size/:piece.png` | GET | Piece preview screenshot |
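The parameterized GET route makes manual testing straightforward. In the example below the format, dimensions, and piece name are hypothetical values chosen for illustration, not documented defaults:

```sh
# Direct grab via the GET route (webp/512/512/wand are example values)
curl -o wand.webp https://oven.aesthetic.computer/grab/webp/512/512/wand

# Check queue state while grabs are pending
curl https://oven.aesthetic.computer/grab-status
```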
### OG Images

| Endpoint | Method | Purpose |
|---|---|---|
| `/kidlisp-og.png` | GET | KidLisp OG image (for social sharing) |
| `/kidlisp-og` | GET | KidLisp OG page (HTML) |
| `/kidlisp-og/status` | GET | OG cache status |
| `/kidlisp-og/preview` | GET | OG preview page |
| `/notepat-og.png` | GET | Notepat OG image |
| `/kidlisp-backdrop.webp` | GET | KidLisp backdrop animation |
| `/kidlisp-backdrop` | GET | KidLisp backdrop page |
### App Screenshots

| Endpoint | Method | Purpose |
|---|---|---|
| `/app-screenshots` | GET | App screenshot dashboard |
| `/app-screenshots/:preset/:piece.png` | GET | Screenshot by preset |
| `/app-screenshots/download/:piece` | GET | Download all presets as ZIP |
| `/api/app-screenshots/:piece` | GET | JSON metadata for screenshots |
### Bundle (HTML offline bundles)

| Endpoint | Method | Purpose |
|---|---|---|
| `/bundle-html` | GET | Generate HTML bundle (SSE streaming) |
| `/bundle-prewarm` | POST | Prewarm bundle cache |
| `/bundle-status` | GET | Bundle cache status |
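Since `/bundle-html` streams progress over SSE, it can be watched from a terminal. How the target piece is specified isn't captured in the route table, so the query parameter below is purely hypothetical:

```sh
# -N disables curl's output buffering so SSE events print as they arrive;
# ?piece=... is a hypothetical parameter, not confirmed by the route list
curl -N "https://oven.aesthetic.computer/bundle-html?piece=wand"
```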
### Misc

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/frozen` | GET | List frozen pieces |
| `/api/frozen/:piece` | DELETE | Unfreeze a piece |
| `/keeps/latest` | GET | Latest keep thumbnail |
| `/keeps/latest/:piece` | GET | Latest keep for specific piece |
| `/keeps/all` | GET | All latest IPFS uploads |
## 4. Current Issues

### 4.1 Terser Not Found (FIXED in latest deploy)

The error log shows 92 minification failures with `Cannot find package 'terser'`. This was from a previous deploy where `npm install` wasn't run after terser was added to `package.json`. The latest deploy (today) resolved this — bundler is working and prewarm succeeds.
### 4.2 Repeated Service Crashes

The systemd journal shows 25 instances of `Main process exited, code=exited, status=1/FAILURE`. These are likely from:

- Deploys that didn't run `npm install` before restarting
- OOM situations (no swap, peak memory hit 1.4 GB on a 2 GB machine)
- Chrome connection drops during heavy workloads
### 4.3 Serial Grab Queue (Primary Performance Bottleneck)

The grabber processes one grab at a time using a simple boolean lock:

```js
let grabRunning = false; // Only one grab runs at a time
```

Currently there are 19 items in the queue (1 capturing, 18 queued). Each grab takes roughly 30-40 seconds (load page + wait for ready signal + capture 16 frames + ffmpeg encode + upload to Spaces). That means the current queue will take ~10-13 minutes to clear.
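In outline, the pattern amounts to the following. This is a minimal sketch of a boolean-gated queue, not the actual `grabber.mjs` source; `runGrab` is a hypothetical helper:

```js
// Sketch: a serial queue gated by one boolean.
const queue = [];
let grabRunning = false;

async function processGrabQueue() {
  if (grabRunning || queue.length === 0) return;
  grabRunning = true;
  const job = queue.shift();
  try {
    await runGrab(job); // load page, wait for ready, capture frames, encode, upload
  } finally {
    grabRunning = false;
    setTimeout(processGrabQueue, 100); // the 100ms delay between jobs
  }
}
```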
### 4.4 No Swap Space
With 2 GB RAM and Chrome eating 600-700 MB at rest, there's no safety net. If a grab hits a memory-heavy piece (or multiple Chrome renderer processes spawn), the OOM killer can terminate the process.
### 4.5 Low File Descriptor Limit

`ulimit -n` is 1024 (default). Chrome alone can use hundreds of FDs. Under heavy load this could cause `EMFILE` errors.
### 4.6 Stale PM2 Process
There's a PM2 daemon running (PM2 v6.0.14) from before the systemd migration. It's consuming 17 MB of RAM doing nothing.
## 5. Recommendations for Faster Parallel WebP Recording

### Priority 1: Upgrade the Droplet (Immediate Impact)
| Current | Recommended | Cost |
|---|---|---|
| 2 vCPU / 2 GB | 4 vCPU / 8 GB | ~$48/mo (vs ~$18/mo now) |
With 8 GB RAM you can comfortably run 3-4 concurrent Chrome tabs for parallel captures. 4 vCPUs means ffmpeg encoding can happen in parallel without blocking grabs.
### Priority 2: Add Swap (Quick Win, Free)

```sh
fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab
```

This prevents OOM kills during peak usage. Even slow swap is better than crashing.
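Once enabled, `swapon --show` and `free -h` should report the 2 GB swapfile. Optionally lower swappiness (e.g. `sysctl vm.swappiness=10`) so the kernel treats the swapfile as an OOM safety net rather than swapping eagerly.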
### Priority 3: Parallel Grab Workers (Architecture Change)

Replace the serial `grabRunning` boolean with a concurrency pool:

```
Current:  [Queue] → [Single Worker] → [Upload]

Proposed: [Queue] → [Worker 1] → [Upload]
                  → [Worker 2] → [Upload]
                  → [Worker 3] → [Upload]
```
Implementation approach (see the sketch after this list):

- Replace the single shared browser with a browser page pool — launch N pages (tabs) in the same Chrome instance
- Replace `grabRunning` boolean with a semaphore/counter: `let grabsRunning = 0; const MAX_CONCURRENT_GRABS = 3;`
- Each worker gets its own page from the pool, captures frames, encodes, uploads, then returns the page
- Chrome tabs share memory more efficiently than separate browser instances (~65 MB per tab vs ~300+ MB per browser)

Key changes in `grabber.mjs`:

- `processGrabQueue()` — loop while `grabsRunning < MAX_CONCURRENT_GRABS && queue.length > 0`
- Page pool: pre-create N pages at startup, hand them out via `acquirePage()`/`releasePage()`
- ffmpeg calls already happen in child processes, so they parallelize naturally
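A minimal sketch of that pool, assuming Puppeteer's standard `browser.newPage()` API. The names `acquirePage`, `releasePage`, and `runGrab` are illustrative, not existing `grabber.mjs` functions, and `queue` is the grab queue from the sketch above:

```js
// Sketch: page pool + counting semaphore (illustrative, not the current code).
const MAX_CONCURRENT_GRABS = 3;
const idlePages = []; // free Chrome tabs
const waiters = [];   // resolvers waiting for a tab to free up
let grabsRunning = 0;

async function initPagePool(browser) {
  for (let i = 0; i < MAX_CONCURRENT_GRABS; i++) {
    idlePages.push(await browser.newPage());
  }
}

function acquirePage() {
  if (idlePages.length > 0) return Promise.resolve(idlePages.pop());
  return new Promise((resolve) => waiters.push(resolve)); // wait for a release
}

function releasePage(page) {
  const next = waiters.shift();
  next ? next(page) : idlePages.push(page);
}

async function processGrabQueue() {
  while (grabsRunning < MAX_CONCURRENT_GRABS && queue.length > 0) {
    const job = queue.shift();
    grabsRunning++;
    const page = await acquirePage();
    runGrab(page, job) // capture frames → ffmpeg encode → upload
      .catch((err) => console.error("grab failed:", err))
      .finally(() => {
        releasePage(page);
        grabsRunning--;
        processGrabQueue(); // pull in the next queued job, if any
      });
  }
}
```

One caveat worth handling: if a renderer crashes mid-capture, the dead page should be closed and replaced with a fresh `browser.newPage()` rather than returned to the pool.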
Expected improvement with 3 concurrent workers on a 4 vCPU / 8 GB droplet:
- Current: 19 queued items × ~35s each = ~11 minutes
- Parallel: 19 items / 3 workers × ~35s = ~3.7 minutes (3x speedup)
### Priority 4: Optimize Individual Grab Speed

- Reduce `acPieceReady` timeout from 30s to 10s — pieces that don't signal ready in 10s probably won't at 30s either
- Skip Google Analytics in capture mode — add `?noanalytics=true` param or block GA URLs in Chrome's request interception (eliminates `ERR_ABORTED` noise in logs); see the sketch after this list
- Pre-render frame capture — instead of 16 sequential `page.screenshot()` calls with delays, consider a client-side approach where the piece renders frames to an offscreen canvas and bundles them
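Blocking GA at the network layer is a few lines with Puppeteer's request-interception API. The hostnames matched below are assumptions about what the analytics requests look like, not patterns taken from the oven codebase:

```js
// Drop analytics requests before they hit the network during captures.
const BLOCKED_HOSTS = ["google-analytics.com", "googletagmanager.com"];

await page.setRequestInterception(true);
page.on("request", (request) => {
  const blocked = BLOCKED_HOSTS.some((host) => request.url().includes(host));
  blocked ? request.abort() : request.continue();
});
```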
### Priority 5: Separate Concerns (Long-term)
The oven server handles too many responsibilities in a single process:
- Screenshot/WebP capture (CPU + memory intensive)
- OG image generation (CPU intensive)
- Bundle HTML generation (CPU intensive during minification)
- Tape baking (CPU intensive)
- Dashboard serving
- Icon/preview caching
Consider splitting into:
- API gateway (Express, lightweight) — routes, dashboard, status
- Capture workers (Chrome + ffmpeg) — the heavy lifting, can be scaled independently
- Bundle worker — terser minification, isolated from capture workload
This could be done with Node worker threads, separate processes, or even separate droplets behind a load balancer.
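For the in-process variant, Node's built-in `node:worker_threads` is enough to move terser minification off the main event loop so it can't stall grab handling. A minimal sketch; the file names and the idea of passing raw source as `workerData` are assumptions, not the oven's current design:

```js
// main.mjs: hand a minification job to a worker thread (sketch)
import { Worker } from "node:worker_threads";

function minifyInWorker(source) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(new URL("./minify-worker.mjs", import.meta.url), {
      workerData: source, // the JS source to minify
    });
    worker.once("message", resolve); // minified code comes back as a message
    worker.once("error", reject);
  });
}
```

```js
// minify-worker.mjs: runs terser off the main thread
import { parentPort, workerData } from "node:worker_threads";
import { minify } from "terser";

const result = await minify(workerData);
parentPort.postMessage(result.code);
```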
### Quick Wins (Do Now)

- Kill stale PM2: `pm2 kill` — frees 17 MB
- Add swap: 2 GB swapfile — prevents OOM crashes
- Increase file limits: add `LimitNOFILE=65536` to oven.service (see the drop-in below)
- Clean up logs: `journalctl --vacuum-time=7d`
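The file-limit change can live in a drop-in so future deploys don't overwrite it; a sketch, assuming the unit is named `oven.service`:

```sh
# Create a drop-in override rather than editing the unit in place
mkdir -p /etc/systemd/system/oven.service.d
printf '[Service]\nLimitNOFILE=65536\n' > /etc/systemd/system/oven.service.d/limits.conf
systemctl daemon-reload && systemctl restart oven
```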
## 6. Storage & CDN

| Storage | Bucket / Host | Content |
|---|---|---|
| DO Spaces (art) | `art-aesthetic-computer` | Source ZIPs, grab WebPs, icons |
| DO Spaces (blobs) | `at-blobs-aesthetic-computer` | Processed tapes (MP4), thumbnails |
| CDN | `art-aesthetic-computer.sfo3.cdn.digitaloceanspaces.com` | Public CDN for grabs/icons |
| CDN | `at-blobs.aesthetic.computer` | Public CDN for tapes |

- ac-source on oven: 640 files in `/opt/oven/ac-source/`
- Total oven directory: 168 MB (including node_modules)
## 7. Bundle Cache Status

- Cache state: Warm (189 core files minified)
- Git version: `64512591a`
- ac-source synced: 640 files
- Pre-push hook: Installed (`.git/hooks/pre-push` → `sync-source.sh`)
- Prewarm: Triggered on every `deploy.sh` restart
## 8. Summary
The oven is a capable but resource-constrained single-process server trying to do everything at once on a 2 vCPU / 2 GB droplet. The serial grab queue is the biggest performance bottleneck — with 18+ items queued, individual WebP recordings wait 10+ minutes.
Fastest path to improvement:
- Add 2 GB swap (5 min, prevents crashes)
- Upgrade to 4 vCPU / 8 GB ($30/mo more)
- Implement parallel grab workers (code change in `grabber.mjs`)
- Expected result: 3-4x faster WebP recording throughput