# Oven Architecture Report

- Generated: 2026-02-13
- Server: oven.aesthetic.computer (137.184.237.166)
- Uptime: 44 days (OS), ~12 min since last oven restart
## 1. Machine Specs
| Resource | Value |
|---|---|
| CPU | 2 vCPUs (Intel, DO-Regular) |
| RAM | 1.97 GB total, ~635 MB used, ~1.3 GB available |
| Swap | None configured |
| Disk | 58 GB, 6.7 GB used (12%) |
| OS | Ubuntu 24.04.3 LTS (kernel 6.8.0-90) |
| Node | v20.20.0 |
| Chrome | 143.0.7499.40 (headless, Puppeteer-managed) |
| ffmpeg | 6.1.1 (system package, WebP + H.264 support) |
### Current Memory Breakdown (at rest with 1 active grab)
- Node (server.mjs): ~175 MB (8.6% of RAM)
- Chrome main process: ~202 MB (10%)
- Chrome GPU process: ~157 MB (7.7%)
- Chrome network service: ~125 MB (6.2%)
- Chrome renderer(s): ~65-100 MB each (3-5%)
- Caddy: ~32 MB
- Total Chrome footprint: ~600-700 MB
- Peak memory observed in logs: 1.4 GB (during heavy grab batches)
Verdict: With 2 GB total and no swap, the machine is memory-constrained. A single Chrome instance + Node already consumes ~850 MB at rest. During heavy workloads, peak memory hits 1.4 GB, leaving very little headroom. This is the primary bottleneck.
## 2. Architecture Overview

```
Internet → Caddy (port 443/80, gzip, TLS) → Express (port 3002) → Puppeteer (Chrome)
                                                                → ffmpeg (WebP/MP4)
                                                                → terser (JS minification)
                                                                → DO Spaces (S3 storage)
                                                                → MongoDB (metadata)
```
### Process Model

- Single Node process (`server.mjs`) — no clustering, no workers
- Single Chrome browser — shared instance, reused across all grab/icon/preview requests
- Serial grab queue — `grabRunning` boolean, one grab at a time, 100ms delay between jobs
- systemd manages the oven service with `Restart=always`, `RestartSec=10` (a unit of this shape is sketched below)
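For orientation, a unit with those settings would look roughly like the following. This is an illustrative sketch, not the deployed unit file; the Node path and description are assumptions, while `/opt/oven` and `server.mjs` come from this report:

```ini
# /etc/systemd/system/oven.service (illustrative sketch, not the deployed unit)
[Unit]
Description=Oven (grab/bake/bundle server)
After=network-online.target

[Service]
# WorkingDirectory and the entry point are from this report; the node path is assumed
WorkingDirectory=/opt/oven
ExecStart=/usr/bin/node server.mjs
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```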
### Key Modules

| Module | Purpose | Size |
|---|---|---|
| `server.mjs` | Express routes + dashboard HTML | 104 KB |
| `grabber.mjs` | Screenshot/WebP/icon capture via Puppeteer | 127 KB |
| `baker.mjs` | Tape (MP4) baking pipeline | 24 KB |
| `bundler.mjs` | KidLisp/JS piece HTML bundle generation | 44 KB |
## 3. API Endpoints (41 routes)

### Core Operations

| Endpoint | Method | Purpose |
|---|---|---|
| `/` | GET | Dashboard (real-time WebSocket updates) |
| `/health` | GET | Health check |
| `/status` | GET | Server status + recent bakes |
| `/grab-status` | GET | Active grabs + queue state |
### Tape Baking (MP4)

| Endpoint | Method | Purpose |
|---|---|---|
| `/bake` | POST | Start tape bake (WebP frames → MP4) |
| `/bake-complete` | POST | Callback when bake finishes |
| `/bake-status` | POST | Check bake progress |
### Screenshots & WebP Captures (Grabber)

| Endpoint | Method | Purpose |
|---|---|---|
| `/grab` | POST | Trigger grab (screenshot/animation) |
| `/grab/:format/:width/:height/:piece` | GET | Direct grab with params |
| `/grab-ipfs` | POST | Grab + IPFS upload |
| `/grab-cleanup` | POST | Clean stale grabs |
| `/grab-clear` | POST | Clear all active grabs |
| `/icon/:size/:piece.png` | GET | Piece icon (cached → DO Spaces) |
| `/icon/:size/:piece.webp` | GET | Piece icon as WebP |
| `/preview/:size/:piece.png` | GET | Piece preview screenshot |
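The parameterized GET route makes manual testing straightforward. In the example below the format, dimensions, and piece name are hypothetical values chosen for illustration, not documented defaults:

```sh
# Direct grab via the GET route (webp/512/512/wand are example values)
curl -o wand.webp https://oven.aesthetic.computer/grab/webp/512/512/wand

# Check queue state while grabs are pending
curl https://oven.aesthetic.computer/grab-status
```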
### OG Images

| Endpoint | Method | Purpose |
|---|---|---|
| `/kidlisp-og.png` | GET | KidLisp OG image (for social sharing) |
| `/kidlisp-og` | GET | KidLisp OG page (HTML) |
| `/kidlisp-og/status` | GET | OG cache status |
| `/kidlisp-og/preview` | GET | OG preview page |
| `/notepat-og.png` | GET | Notepat OG image |
| `/kidlisp-backdrop.webp` | GET | KidLisp backdrop animation |
| `/kidlisp-backdrop` | GET | KidLisp backdrop page |
### App Screenshots

| Endpoint | Method | Purpose |
|---|---|---|
| `/app-screenshots` | GET | App screenshot dashboard |
| `/app-screenshots/:preset/:piece.png` | GET | Screenshot by preset |
| `/app-screenshots/download/:piece` | GET | Download all presets as ZIP |
| `/api/app-screenshots/:piece` | GET | JSON metadata for screenshots |
### Bundle (HTML offline bundles)

| Endpoint | Method | Purpose |
|---|---|---|
| `/bundle-html` | GET | Generate HTML bundle (SSE streaming) |
| `/bundle-prewarm` | POST | Prewarm bundle cache |
| `/bundle-status` | GET | Bundle cache status |
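Since `/bundle-html` streams progress over SSE, it can be watched from a terminal. How the target piece is specified isn't captured in the route table, so the query parameter below is purely hypothetical:

```sh
# -N disables curl's output buffering so SSE events print as they arrive;
# ?piece=... is a hypothetical parameter, not confirmed by the route list
curl -N "https://oven.aesthetic.computer/bundle-html?piece=wand"
```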
### Misc

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/frozen` | GET | List frozen pieces |
| `/api/frozen/:piece` | DELETE | Unfreeze a piece |
| `/keeps/latest` | GET | Latest keep thumbnail |
| `/keeps/latest/:piece` | GET | Latest keep for specific piece |
| `/keeps/all` | GET | All latest IPFS uploads |
## 4. Current Issues

### 4.1 Terser Not Found (FIXED in latest deploy)

The error log shows 92 minification failures with `Cannot find package 'terser'`. This was from a previous deploy where `npm install` wasn't run after terser was added to `package.json`. The latest deploy (today) resolved this — bundler is working and prewarm succeeds.
### 4.2 Repeated Service Crashes

The systemd journal shows 25 instances of `Main process exited, code=exited, status=1/FAILURE`. These are likely from:

- Deploys that didn't run `npm install` before restarting
- OOM situations (no swap, peak memory hit 1.4 GB on a 2 GB machine)
- Chrome connection drops during heavy workloads
### 4.3 Serial Grab Queue (Primary Performance Bottleneck)

The grabber processes one grab at a time using a simple boolean lock:

```js
let grabRunning = false; // Only one grab runs at a time
```

Currently there are 19 items in the queue (1 capturing, 18 queued). Each grab takes roughly 30-40 seconds (load page + wait for ready signal + capture 16 frames + ffmpeg encode + upload to Spaces). That means the current queue will take ~10-13 minutes to clear.
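In outline, the pattern amounts to the following. This is a minimal sketch of a boolean-gated queue, not the actual `grabber.mjs` source; `runGrab` is a hypothetical helper:

```js
// Sketch: a serial queue gated by one boolean.
const queue = [];
let grabRunning = false;

async function processGrabQueue() {
  if (grabRunning || queue.length === 0) return;
  grabRunning = true;
  const job = queue.shift();
  try {
    await runGrab(job); // load page, wait for ready, capture frames, encode, upload
  } finally {
    grabRunning = false;
    setTimeout(processGrabQueue, 100); // the 100ms delay between jobs
  }
}
```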
### 4.4 No Swap Space
With 2 GB RAM and Chrome eating 600-700 MB at rest, there's no safety net. If a grab hits a memory-heavy piece (or multiple Chrome renderer processes spawn), the OOM killer can terminate the process.
### 4.5 Low File Descriptor Limit

`ulimit -n` is 1024 (default). Chrome alone can use hundreds of FDs. Under heavy load this could cause `EMFILE` errors.
### 4.6 Stale PM2 Process
There's a PM2 daemon running (PM2 v6.0.14) from before the systemd migration. It's consuming 17 MB of RAM doing nothing.
## 5. Recommendations for Faster Parallel WebP Recording

### Priority 1: Upgrade the Droplet (Immediate Impact)
| Current | Recommended | Cost |
|---|---|---|
| 2 vCPU / 2 GB | 4 vCPU / 8 GB | ~$48/mo (vs ~$18/mo now) |
With 8 GB RAM you can comfortably run 3-4 concurrent Chrome tabs for parallel captures. 4 vCPUs means ffmpeg encoding can happen in parallel without blocking grabs.
### Priority 2: Add Swap (Quick Win, Free)

```sh
fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab
```

This prevents OOM kills during peak usage. Even slow swap is better than crashing.
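Once enabled, `swapon --show` and `free -h` should report the 2 GB swapfile. Optionally lower swappiness (e.g. `sysctl vm.swappiness=10`) so the kernel treats the swapfile as an OOM safety net rather than swapping eagerly.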
### Priority 3: Parallel Grab Workers (Architecture Change)

Replace the serial `grabRunning` boolean with a concurrency pool:

```
Current:  [Queue] → [Single Worker] → [Upload]

Proposed: [Queue] → [Worker 1] → [Upload]
                  → [Worker 2] → [Upload]
                  → [Worker 3] → [Upload]
```
Implementation approach (see the sketch after this list):

- Replace the single shared browser with a browser page pool — launch N pages (tabs) in the same Chrome instance
- Replace `grabRunning` boolean with a semaphore/counter: `let grabsRunning = 0; const MAX_CONCURRENT_GRABS = 3;`
- Each worker gets its own page from the pool, captures frames, encodes, uploads, then returns the page
- Chrome tabs share memory more efficiently than separate browser instances (~65 MB per tab vs ~300+ MB per browser)

Key changes in `grabber.mjs`:

- `processGrabQueue()` — loop while `grabsRunning < MAX_CONCURRENT_GRABS && queue.length > 0`
- Page pool: pre-create N pages at startup, hand them out via `acquirePage()`/`releasePage()`
- ffmpeg calls already happen in child processes, so they parallelize naturally
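A minimal sketch of that pool, assuming Puppeteer's standard `browser.newPage()` API. The names `acquirePage`, `releasePage`, and `runGrab` are illustrative, not existing `grabber.mjs` functions, and `queue` is the grab queue from the sketch above:

```js
// Sketch: page pool + counting semaphore (illustrative, not the current code).
const MAX_CONCURRENT_GRABS = 3;
const idlePages = []; // free Chrome tabs
const waiters = [];   // resolvers waiting for a tab to free up
let grabsRunning = 0;

async function initPagePool(browser) {
  for (let i = 0; i < MAX_CONCURRENT_GRABS; i++) {
    idlePages.push(await browser.newPage());
  }
}

function acquirePage() {
  if (idlePages.length > 0) return Promise.resolve(idlePages.pop());
  return new Promise((resolve) => waiters.push(resolve)); // wait for a release
}

function releasePage(page) {
  const next = waiters.shift();
  next ? next(page) : idlePages.push(page);
}

async function processGrabQueue() {
  while (grabsRunning < MAX_CONCURRENT_GRABS && queue.length > 0) {
    const job = queue.shift();
    grabsRunning++;
    const page = await acquirePage();
    runGrab(page, job) // capture frames → ffmpeg encode → upload
      .catch((err) => console.error("grab failed:", err))
      .finally(() => {
        releasePage(page);
        grabsRunning--;
        processGrabQueue(); // pull in the next queued job, if any
      });
  }
}
```

One caveat worth handling: if a renderer crashes mid-capture, the dead page should be closed and replaced with a fresh `browser.newPage()` rather than returned to the pool.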
Expected improvement with 3 concurrent workers on a 4 vCPU / 8 GB droplet:
- Current: 19 queued items × ~35s each = ~11 minutes
- Parallel: 19 items / 3 workers × ~35s = ~3.7 minutes (3x speedup)
### Priority 4: Optimize Individual Grab Speed

- Reduce `acPieceReady` timeout from 30s to 10s — pieces that don't signal ready in 10s probably won't at 30s either
- Skip Google Analytics in capture mode — add `?noanalytics=true` param or block GA URLs in Chrome's request interception (eliminates `ERR_ABORTED` noise in logs); see the sketch after this list
- Pre-render frame capture — instead of 16 sequential `page.screenshot()` calls with delays, consider a client-side approach where the piece renders frames to an offscreen canvas and bundles them
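Blocking GA at the network layer is a few lines with Puppeteer's request-interception API. The hostnames matched below are assumptions about what the analytics requests look like, not patterns taken from the oven codebase:

```js
// Drop analytics requests before they hit the network during captures.
const BLOCKED_HOSTS = ["google-analytics.com", "googletagmanager.com"];

await page.setRequestInterception(true);
page.on("request", (request) => {
  const blocked = BLOCKED_HOSTS.some((host) => request.url().includes(host));
  blocked ? request.abort() : request.continue();
});
```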
### Priority 5: Separate Concerns (Long-term)
The oven server handles too many responsibilities in a single process:
- Screenshot/WebP capture (CPU + memory intensive)
- OG image generation (CPU intensive)
- Bundle HTML generation (CPU intensive during minification)
- Tape baking (CPU intensive)
- Dashboard serving
- Icon/preview caching
Consider splitting into:
- API gateway (Express, lightweight) — routes, dashboard, status
- Capture workers (Chrome + ffmpeg) — the heavy lifting, can be scaled independently
- Bundle worker — terser minification, isolated from capture workload
This could be done with Node worker threads, separate processes, or even separate droplets behind a load balancer.
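For the in-process variant, Node's built-in `node:worker_threads` is enough to move terser minification off the main event loop so it can't stall grab handling. A minimal sketch; the file names and the idea of passing raw source as `workerData` are assumptions, not the oven's current design:

```js
// main.mjs: hand a minification job to a worker thread (sketch)
import { Worker } from "node:worker_threads";

function minifyInWorker(source) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(new URL("./minify-worker.mjs", import.meta.url), {
      workerData: source, // the JS source to minify
    });
    worker.once("message", resolve); // minified code comes back as a message
    worker.once("error", reject);
  });
}
```

```js
// minify-worker.mjs: runs terser off the main thread
import { parentPort, workerData } from "node:worker_threads";
import { minify } from "terser";

const result = await minify(workerData);
parentPort.postMessage(result.code);
```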
### Quick Wins (Do Now)

- Kill stale PM2: `pm2 kill` — frees 17 MB
- Add swap: 2 GB swapfile — prevents OOM crashes
- Increase file limits: add `LimitNOFILE=65536` to oven.service (see the drop-in below)
- Clean up logs: `journalctl --vacuum-time=7d`
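The file-limit change can live in a drop-in so future deploys don't overwrite it; a sketch, assuming the unit is named `oven.service`:

```sh
# Create a drop-in override rather than editing the unit in place
mkdir -p /etc/systemd/system/oven.service.d
printf '[Service]\nLimitNOFILE=65536\n' > /etc/systemd/system/oven.service.d/limits.conf
systemctl daemon-reload && systemctl restart oven
```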
## 6. Storage & CDN

| Storage | Bucket / Host | Content |
|---|---|---|
| DO Spaces (art) | `art-aesthetic-computer` | Source ZIPs, grab WebPs, icons |
| DO Spaces (blobs) | `at-blobs-aesthetic-computer` | Processed tapes (MP4), thumbnails |
| CDN | `art-aesthetic-computer.sfo3.cdn.digitaloceanspaces.com` | Public CDN for grabs/icons |
| CDN | `at-blobs.aesthetic.computer` | Public CDN for tapes |

- ac-source on oven: 640 files in `/opt/oven/ac-source/`
- Total oven directory: 168 MB (including node_modules)
## 7. Bundle Cache Status

- Cache state: Warm (189 core files minified)
- Git version: `64512591a`
- ac-source synced: 640 files
- Pre-push hook: Installed (`.git/hooks/pre-push` → `sync-source.sh`)
- Prewarm: Triggered on every `deploy.sh` restart
## 8. Summary
The oven is a capable but resource-constrained single-process server trying to do everything at once on a 2 vCPU / 2 GB droplet. The serial grab queue is the biggest performance bottleneck — with 18+ items queued, individual WebP recordings wait 10+ minutes.
Fastest path to improvement:
- Add 2 GB swap (5 min, prevents crashes)
- Upgrade to 4 vCPU / 8 GB ($30/mo more)
- Implement parallel grab workers (code change in `grabber.mjs`)
- Expected result: 3-4x faster WebP recording throughput