Monorepo for Aesthetic.Computer aesthetic.computer
lacma-2026: VO + subtitle pipeline — tools/vo-pipeline.mjs

New reusable tool at tools/vo-pipeline.mjs that takes a timestamped
narration script + a source video and produces a narrated, captioned
version of the video with both baked in.

Pipeline:
1. Parse script (markdown with [M:SS] timestamps + # meta headers)
2. For each segment: macOS `say -v Daniel` generates AIFF, ffmpeg
converts to 48kHz/stereo WAV, probes duration
3. Build a full-length VO track by adelay-ing each chunk and amix-ing
4. Generate SRT cues from script timestamps
5. ffmpeg renders final: source video + (source audio ducked -18dB)
mixed with VO + libass subtitles filter burns SRT into the image
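
Steps 3 and 5 come down to two ffmpeg filter graphs. The sketch below rebuilds both strings as pure functions so they can be inspected without running ffmpeg. The function names `buildVoFilter` and `duckFilter` are hypothetical; the graph shapes follow the diff to tools/vo-pipeline.mjs.

```javascript
// Hypothetical helper for step 3: delay each TTS chunk to its cue time
// (adelay takes one delay per channel, in milliseconds), then sum with amix.
function buildVoFilter(chunks) {
  if (chunks.length === 0) return "anullsrc=r=48000:cl=stereo[vo]";
  const delays = chunks.map((c, i) => {
    const ms = Math.round(c.at * 1000);
    return `[${i}:a]adelay=${ms}|${ms}[a${i}]`;
  });
  const labels = chunks.map((_, i) => `[a${i}]`).join("");
  return `${delays.join(";")};${labels}amix=inputs=${chunks.length}:normalize=0[vo]`;
}

// Hypothetical helper for step 5: convert the --duck value (dB) to a linear
// gain for the volume filter, then mix the ducked source audio with the VO.
function duckFilter(duckDb) {
  const gain = Math.pow(10, duckDb / 20).toFixed(3);
  return `[0:a]volume=${gain}[bg];[bg][1:a]amix=inputs=2:duration=first:normalize=0[aout]`;
}

console.log(buildVoFilter([{ at: 0 }, { at: 4 }]));
// [0:a]adelay=0|0[a0];[1:a]adelay=4000|4000[a1];[a0][a1]amix=inputs=2:normalize=0[vo]
console.log(duckFilter(-18));
// [0:a]volume=0.126[bg];[bg][1:a]amix=inputs=2:duration=first:normalize=0[aout]
```

At -18 dB the source audio is scaled to roughly 0.126 of its original amplitude.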

Requires ffmpeg-full (libass + libfreetype + drawtext filters).
macOS `say` handles TTS locally — no API keys, no cloud calls.

Usage:
node tools/vo-pipeline.mjs script.md --video in.mp4 --out out.mp4
node tools/vo-pipeline.mjs script.md --video in.mp4 --caption-only
node tools/vo-pipeline.mjs script.md --video in.mp4 --narrate-only

Flags:
--voice Daniel macOS say voice (default Daniel / en_GB)
--rate 170 speech rate in wpm
--duck -18 source-audio ducking under VO in dB
--keep-scratch keep /tmp intermediate files for debugging
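
Beyond the flags listed above, the tool's header comment documents a `--style` override, but the `parseArgs` option table in the diff does not register it. Because `parseArgs` is strict by default, passing `--style` would be rejected as an unknown option. A minimal sketch of one way to register it, assuming the rest of the table stays as committed (the `parseCli` wrapper is hypothetical and exists only so the example can run on a fixed argv):

```javascript
import { parseArgs } from "node:util";

// Hypothetical wrapper around the committed option table, plus a `style` entry.
function parseCli(argv) {
  return parseArgs({
    args: argv,
    options: {
      video: { type: "string" },
      out: { type: "string" },
      voice: { type: "string", default: "Daniel" },
      rate: { type: "string", default: "170" },
      duck: { type: "string", default: "-18" },
      style: { type: "string" }, // assumption: not present in the committed code
      "caption-only": { type: "boolean", default: false },
      "narrate-only": { type: "boolean", default: false },
      "keep-scratch": { type: "boolean", default: false },
      help: { type: "boolean", default: false, short: "h" },
    },
    allowPositionals: true,
  });
}

const { values, positionals } = parseCli([
  "script.md", "--video", "in.mp4", "--style", "FontSize=30",
]);
console.log(positionals[0], values.style, values.rate);
```

With the entry in place, `values.style` carries the override and the existing `args.style ?? "FontName=..."` fallback in the tool would start to take effect.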

First output:
grants/lacma-2026/demo-narration.md — 6-cue script for the 23s demo
→ ac-native-demo-narrated.{mp4,webm} on the CDN

Landing page: the demo section now has a two-tab toggle above the
video frame — "ambient loop" (the original, muted autoplay) and
"narrated + captioned" (the new pipeline output, unmutes on swap).
Tabs hot-swap the <video> sources so the URL stays the same.

Also updated the Video row in § Application at a Glance: both cuts
linked, with the pipeline tool cited as a reusable artifact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

+356 -18
+11
grants/lacma-2026/demo-narration.md
+ # title: AC Native — 23-second demo narration
+ # voice: Daniel
+ # rate: 170
+
+ [0:00] Aesthetic Computer boots from a USB stick straight into art software.
+ [0:04] No desktop, no app store, no browser — just a greeting, and an instrument.
+ [0:08] This is notepat, the default piece.
+ [0:11] Thirty-two voices of polyphony, synthesized at 192 kilohertz through ALSA.
+ [0:15] The background color shifts with every chord you play —
+ [0:18] a visual echo of the Viennese waltz you hear.
+ [0:21] --silence--
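
To make the script format concrete, here is a minimal sketch of how a file like this could be split into metadata and timed segments. The `parseScript` helper is hypothetical and simplified (it skips continuation lines and stage directions); only the `[M:SS]` timestamps, the `# key: value` headers, and the `--silence--` marker are taken from the commit.

```javascript
// Hypothetical, simplified parser for the narration script format.
function parseScript(raw) {
  const meta = {};
  const segments = [];
  for (const line of raw.split(/\r?\n/)) {
    if (!line.trim()) continue;
    // "# voice: Daniel" style headers
    const fm = line.match(/^#\s*(title|voice|rate):\s*(.+)$/i);
    if (fm) {
      meta[fm[1].toLowerCase()] = fm[2].trim();
      continue;
    }
    // "[M:SS] text" or "[M:SS.mmm] text"
    const ts = line.match(/^\s*\[(\d+):(\d+(?:\.\d+)?)\]\s*(.*)$/);
    if (!ts) continue;
    const body = ts[3].trim();
    segments.push({
      t: Number(ts[1]) * 60 + Number(ts[2]), // seconds from start
      text: body,
      silent: body === "" || body === "--silence--",
    });
  }
  return { meta, segments };
}

const demo = [
  "# voice: Daniel",
  "# rate: 170",
  "",
  "[0:08] This is notepat, the default piece.",
  "[0:21] --silence--",
].join("\n");

const { meta, segments } = parseScript(demo);
console.log(meta.voice, segments.map(s => `${s.t}s silent=${s.silent}`));
```

The closing `--silence--` cue produces a segment with a timestamp but no speech, which is how the last spoken caption gets an end time.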
+70 -18
system/public/lacma-2026/index.html
···
      height: 100%;
      object-fit: cover;
    }
+   .demo-tabs {
+     display: flex;
+     gap: 0.3em;
+     margin-bottom: 0.5em;
+     font-family: 'Berkeley Mono Variable', monospace;
+     font-size: 0.78em;
+     letter-spacing: 0.06em;
+   }
+   .demo-tab {
+     padding: 0.3em 0.8em;
+     background: var(--box-bg);
+     border: 1px solid var(--box-border);
+     border-radius: 3px;
+     color: var(--dim);
+     cursor: pointer;
+     transition: color 0.15s, border-color 0.15s;
+     user-select: none;
+   }
+   .demo-tab:hover { color: var(--text); border-color: var(--pink); }
+   .demo-tab.active {
+     color: var(--pink);
+     border-color: var(--pink);
+     background: transparent;
+   }
+   .demo-tab .hint {
+     color: var(--dim);
+     margin-left: 0.4em;
+     font-size: 0.85em;
+   }
    .demo-caption {
      display: flex;
      justify-content: space-between;
···
      letter-spacing: 0.06em;
      flex-wrap: wrap;
    }
-   .demo-caption .sequence {
-     color: var(--dim);
-   }
-   .demo-caption .sequence .step {
-     color: var(--text);
-   }
-   .demo-caption .sequence .arrow {
-     color: var(--pink);
-     margin: 0 0.4em;
-   }
+   .demo-caption .sequence { color: var(--dim); }
+   .demo-caption .sequence .step { color: var(--text); }
+   .demo-caption .sequence .arrow { color: var(--pink); margin: 0 0.4em; }
    .demo-caption .byline {
-     color: var(--pink);
-     font-family: 'YWFT Processing', monospace;
-     font-size: 1.4em;
-     letter-spacing: 0;
-     text-transform: none;
+     color: var(--dim);
+     font-style: italic;
    }

    /* ── APPLICATION AT A GLANCE (form mirror) ─────── */
···

  <!-- ── DEMO VIDEO ─────────────────────────── -->
  <section class="demo" id="demo">
+   <div class="demo-tabs">
+     <div class="demo-tab active" data-src="ac-native-demo" data-poster="ac-native-demo-poster">ambient loop<span class="hint">23 s · muted</span></div>
+     <div class="demo-tab" data-src="ac-native-demo-narrated" data-poster="ac-native-demo-narrated-poster">narrated + captioned<span class="hint">23 s · click unmute</span></div>
+   </div>
    <div class="demo-frame">
-     <video
+     <video id="demo-video"
        poster="https://assets.aesthetic.computer/lacma-2026/ac-native-demo-poster.jpg"
        autoplay muted loop playsinline controls preload="metadata">
        <source src="https://assets.aesthetic.computer/lacma-2026/ac-native-demo.webm" type="video/webm">
···
      <div class="sequence">
        <span class="step">boot</span><span class="arrow">→</span><span class="step">prompt</span><span class="arrow">→</span><span class="step">notepat</span><span class="arrow">→</span><span class="step">waltz</span><span class="arrow">→</span><span class="step">back</span><span class="arrow">→</span><span class="step">shutdown</span>
      </div>
-     <div class="byline">captured live · no editing tricks</div>
+     <div class="byline">captured live · no editing tricks · shutdown animation in progress</div>
    </div>
+   <script>
+     (() => {
+       const tabs = document.querySelectorAll(".demo-tab");
+       const video = document.getElementById("demo-video");
+       const ASSET_BASE = "https://assets.aesthetic.computer/lacma-2026";
+       tabs.forEach(tab => tab.addEventListener("click", () => {
+         if (tab.classList.contains("active")) return;
+         tabs.forEach(t => t.classList.remove("active"));
+         tab.classList.add("active");
+         const base = tab.dataset.src;
+         const poster = tab.dataset.poster;
+         const wasMuted = video.muted;
+         video.pause();
+         video.innerHTML = `
+           <source src="${ASSET_BASE}/${base}.webm" type="video/webm">
+           <source src="${ASSET_BASE}/${base}.mp4" type="video/mp4">`;
+         video.poster = `${ASSET_BASE}/${poster}.jpg`;
+         // If swapping to narrated version, unmute by default so VO plays.
+         video.muted = base.includes("narrated") ? false : true;
+         video.load();
+         video.play().catch(() => {
+           // Autoplay blocked (browsers block autoplay w/ sound).
+           // Restore muted state and let user click play.
+           video.muted = wasMuted;
+           video.load();
+         });
+       }));
+     })();
+   </script>
  </section>

  <!-- ── APPLICATION AT A GLANCE (form mirror) ─────── -->
···
      </dd>

      <dt>Video<span class="cap">under 5 min · optional · ours: 23 s</span></dt>
-     <dd>Looping 23-second demo captured natively from AC Native hardware: <em>boot animation → notepat → Viennese waltz</em>. <a class="jump" href="#demo">Watch above ↑</a> · <a class="jump" href="https://assets.aesthetic.computer/lacma-2026/ac-native-demo.mp4">MP4</a> · <a class="jump" href="https://assets.aesthetic.computer/lacma-2026/ac-native-demo.webm">WebM</a>. A longer narrated cut with prompt-transitions and a shutdown "bye @jeffrey" animation (still being written in C for the native binary) is the next capture. Shot-by-shot script: <a class="jump" href="https://github.com/whistlegraph/aesthetic-computer/blob/main/grants/lacma-2026/video-script.md">video-script.md</a>.</dd>
+     <dd>Two cuts of the same 23-second demo, captured natively from AC Native hardware. An <em>ambient loop</em> (muted, plays silently on page load) and a <em>narrated + captioned</em> version (voiceover + baked-in subtitles) produced by a reusable pipeline at <a class="jump" href="https://github.com/whistlegraph/aesthetic-computer/blob/main/tools/vo-pipeline.mjs">tools/vo-pipeline.mjs</a>. <a class="jump" href="#demo">Watch above ↑</a>. A longer cut with prompt-transitions and a shutdown "bye @jeffrey" animation (still being written in C for the native binary) is the next capture. Narration script: <a class="jump" href="https://github.com/whistlegraph/aesthetic-computer/blob/main/grants/lacma-2026/demo-narration.md">demo-narration.md</a>.</dd>

      <dt>Lineage check<span class="cap">past Lab cohort</span></dt>
      <dd>Two recent Lab recipients co-teach UCLA's Social Software course where Jeffrey is Author in Residence: <strong>Casey Reas</strong> (2023, <em>METAVASARELY and An Empty Room</em>) and <strong>Lauren Lee McCarthy</strong> (2022, <em>Auto</em>). <a class="jump" href="#this-lab">Lineage below ↓</a> · <a class="jump" href="https://github.com/whistlegraph/aesthetic-computer/blob/main/grants/lacma-2026/art-tech-lab-recipients.md">All 45 recipients</a></dd>
+275
tools/vo-pipeline.mjs
+ #!/usr/bin/env node
+ // vo-pipeline.mjs — voiceover + baked-in-subtitle pipeline
+ //
+ // Reads a script file (YAML-ish or a small JSON) of timestamped narration
+ // segments, generates TTS via macOS `say`, assembles an SRT, mixes the VO
+ // onto the source video's audio with automatic ducking, and burns the
+ // subtitles into the image using libass via ffmpeg.
+ //
+ // Usage:
+ //   node tools/vo-pipeline.mjs <script.md> --video <input.mp4> --out <output.mp4>
+ //
+ // Optional flags:
+ //   --voice <name>       macOS `say` voice (default: Daniel — en_GB)
+ //   --rate <wpm>         Speech rate (default: 170)
+ //   --duck <db>          How much to attenuate source audio under VO (default: -18)
+ //   --style <srt-style>  libass style override (ASS section)
+ //   --caption-only       Skip TTS — only burn subtitles
+ //   --narrate-only       Skip subtitle burn — only mix VO audio
+ //   --keep-scratch       Leave intermediate wavs/srts in /tmp for debugging
+ //
+ // Script format (markdown with timestamps):
+ //
+ //   # title: AC Native Demo
+ //   # voice: Daniel
+ //   # rate: 170
+ //
+ //   [0:00] What if your computer were a musical instrument?
+ //   [0:03] Not had a music app — but was an instrument.
+ //   [0:06] This is Aesthetic Computer, booting from USB.
+ //   [0:12] (notepat)
+ //   [0:12] notepat. Every key is a pitch. Every chord is a polyphonic voice
+ //          through ALSA at 192 kilohertz.
+ //   [0:20] --silence--
+ //
+ // Lines beginning with ( ) are stage directions and are included as
+ // subtitle text in parentheses but not spoken. Lines with --silence-- or
+ // empty body suppress both. Timestamp format is [M:SS] or [M:SS.mmm].
+
+ import { execFileSync, execSync } from "node:child_process";
+ import { readFileSync, writeFileSync, mkdirSync, existsSync, rmSync } from "node:fs";
+ import { join, basename, dirname, resolve } from "node:path";
+ import { parseArgs } from "node:util";
+ import { tmpdir } from "node:os";
+
+ // ─── ARG PARSING ─────────────────────────────────────────────────────────
+ const { values: args, positionals } = parseArgs({
+   options: {
+     video: { type: "string" },
+     out: { type: "string" },
+     voice: { type: "string", default: "Daniel" },
+     rate: { type: "string", default: "170" },
+     duck: { type: "string", default: "-18" },
+     "caption-only": { type: "boolean", default: false },
+     "narrate-only": { type: "boolean", default: false },
+     "keep-scratch": { type: "boolean", default: false },
+     help: { type: "boolean", default: false, short: "h" },
+   },
+   allowPositionals: true,
+ });
+
+ if (args.help || positionals.length === 0) {
+   console.log(readFileSync(new URL(import.meta.url)).toString().split("\n")
+     .filter(l => l.startsWith("//")).slice(0, 40).join("\n").replace(/^\/\/\s?/gm, ""));
+   process.exit(0);
+ }
+
+ const SCRIPT_PATH = resolve(positionals[0]);
+ const VIDEO_PATH = args.video ? resolve(args.video) : null;
+ const OUT_PATH = args.out ? resolve(args.out)
+   : VIDEO_PATH?.replace(/(\.[^.]+)$/, "-narrated$1")
+   ?? "out-narrated.mp4";
+
+ if (!VIDEO_PATH) {
+   console.error("× --video is required");
+   process.exit(2);
+ }
+ if (!existsSync(SCRIPT_PATH)) {
+   console.error(`× script not found: ${SCRIPT_PATH}`);
+   process.exit(2);
+ }
+ if (!existsSync(VIDEO_PATH)) {
+   console.error(`× video not found: ${VIDEO_PATH}`);
+   process.exit(2);
+ }
+
+ const SCRATCH = join(tmpdir(), `vo-pipeline-${Date.now()}`);
+ mkdirSync(SCRATCH, { recursive: true });
+
+ // ─── SCRIPT PARSING ──────────────────────────────────────────────────────
+ function parseTime(t) {
+   // "M:SS" or "M:SS.mmm" → seconds (float)
+   const [m, rest] = t.split(":");
+   return Number(m) * 60 + Number(rest);
+ }
+ function fmtSrtTime(sec) {
+   // "HH:MM:SS,mmm"
+   const ms = Math.floor((sec % 1) * 1000);
+   const s = Math.floor(sec);
+   const hh = String(Math.floor(s / 3600)).padStart(2, "0");
+   const mm = String(Math.floor((s % 3600) / 60)).padStart(2, "0");
+   const ss = String(s % 60).padStart(2, "0");
+   return `${hh}:${mm}:${ss},${String(ms).padStart(3, "0")}`;
+ }
+
+ const scriptRaw = readFileSync(SCRIPT_PATH, "utf8");
+ const meta = { title: basename(SCRIPT_PATH), voice: args.voice, rate: Number(args.rate) };
+ const segments = [];
+
+ for (const rawLine of scriptRaw.split("\n")) {
+   const line = rawLine.replace(/\r$/, "");
+   if (!line.trim()) continue;
+
+   // Front-matter comments
+   const fmMatch = line.match(/^#\s*(title|voice|rate):\s*(.+)$/i);
+   if (fmMatch) {
+     const key = fmMatch[1].toLowerCase();
+     meta[key] = key === "rate" ? Number(fmMatch[2]) : fmMatch[2].trim();
+     continue;
+   }
+   if (line.startsWith("#")) continue; // comment
+
+   const tsMatch = line.match(/^\s*\[(\d+:\d+(?:\.\d+)?)\]\s*(.*)$/);
+   if (!tsMatch) {
+     // Continuation line → append to last segment
+     if (segments.length > 0 && line.trim().length > 0 && !line.startsWith("//")) {
+       segments[segments.length - 1].text += " " + line.trim();
+     }
+     continue;
+   }
+   const t = parseTime(tsMatch[1]);
+   const body = tsMatch[2].trim();
+   const silent = body === "" || body === "--silence--";
+   const stageOnly = body.startsWith("(") && body.endsWith(")");
+   segments.push({ t, text: body, silent, stageOnly });
+ }
+
+ // Close segments by giving each a "next timestamp" hint for SRT end times.
+ for (let i = 0; i < segments.length; i++) {
+   segments[i].nextT = segments[i + 1]?.t ?? segments[i].t + 3;
+ }
+
+ console.log(`→ parsed ${segments.length} segments, voice=${meta.voice}, rate=${meta.rate} wpm`);
+
+ // ─── STEP 1: GENERATE TTS (macOS `say`) ──────────────────────────────────
+ const voPath = join(SCRATCH, "vo.wav");
+ const chunkWavs = [];
+
+ if (!args["caption-only"]) {
+   for (let i = 0; i < segments.length; i++) {
+     const seg = segments[i];
+     if (seg.silent || seg.stageOnly) continue;
+
+     const aiffPath = join(SCRATCH, `seg-${String(i).padStart(3, "0")}.aiff`);
+     const wavPath = join(SCRATCH, `seg-${String(i).padStart(3, "0")}.wav`);
+     try {
+       execFileSync("say",
+         ["-v", meta.voice, "-r", String(meta.rate), "-o", aiffPath, seg.text],
+         { stdio: "inherit" }
+       );
+       execFileSync("ffmpeg", ["-y", "-i", aiffPath, "-ar", "48000", "-ac", "2", wavPath],
+         { stdio: ["ignore", "ignore", "ignore"] });
+       const duration = Number(execSync(
+         `ffprobe -v error -show_entries format=duration -of default=nokey=1:noprint_wrappers=1 "${wavPath}"`
+       ).toString().trim());
+       seg.ttsDuration = duration;
+       seg.ttsPath = wavPath;
+       chunkWavs.push({ at: seg.t, path: wavPath, duration });
+     } catch (e) {
+       console.error(`× TTS failed for segment ${i}: ${e.message}`);
+       process.exit(3);
+     }
+   }
+
+   // Probe the video duration
+   const vidDuration = Number(execSync(
+     `ffprobe -v error -show_entries format=duration -of default=nokey=1:noprint_wrappers=1 "${VIDEO_PATH}"`
+   ).toString().trim());
+
+   // Build a single VO track the length of the video: start with silence,
+   // pad each chunk to its timestamp, overlay.
+   // Easier: render a VO track with adelay on each, amix them.
+   const inputs = chunkWavs.flatMap(c => ["-i", c.path]);
+   const filters = chunkWavs.map((c, i) =>
+     `[${i}:a]adelay=${Math.round(c.at * 1000)}|${Math.round(c.at * 1000)}[a${i}]`
+   ).join(";");
+   const amix = chunkWavs.length > 0
+     ? `${chunkWavs.map((_, i) => `[a${i}]`).join("")}amix=inputs=${chunkWavs.length}:normalize=0[vo]`
+     : `anullsrc=r=48000:cl=stereo[vo]`;
+   const duration = Math.ceil(vidDuration + 1);
+
+   if (chunkWavs.length > 0) {
+     execFileSync("ffmpeg", [
+       "-y",
+       ...inputs,
+       "-filter_complex", `${filters};${amix}`,
+       "-map", "[vo]",
+       "-t", String(duration),
+       "-ar", "48000",
+       "-ac", "2",
+       voPath
+     ], { stdio: ["ignore", "ignore", "inherit"] });
+     console.log(`→ VO track built: ${voPath}`);
+   }
+ }
+
+ // ─── STEP 2: GENERATE SRT ────────────────────────────────────────────────
+ const srtPath = join(SCRATCH, "captions.srt");
+ {
+   const lines = [];
+   let n = 1;
+   for (const seg of segments) {
+     if (seg.silent) continue;
+     const text = seg.stageOnly ? seg.text : seg.text; // keep parenthetical stage dirs visible
+     // End timestamp: either next segment start, or this + tts duration, or +3s
+     const end = seg.nextT ?? (seg.t + (seg.ttsDuration ?? 3));
+     lines.push(
+       `${n++}`,
+       `${fmtSrtTime(seg.t)} --> ${fmtSrtTime(Math.max(seg.t + 0.5, end - 0.1))}`,
+       text,
+       ""
+     );
+   }
+   writeFileSync(srtPath, lines.join("\n"));
+   console.log(`→ SRT written: ${srtPath} (${n - 1} cues)`);
+ }
+
+ // ─── STEP 3: MIX + BURN ──────────────────────────────────────────────────
+ // - If we have a VO, amix it with the source audio (ducked).
+ // - If subtitle burn is requested, pipe through the subtitles filter.
+
+ const args3 = ["-y", "-i", VIDEO_PATH];
+ if (!args["caption-only"] && existsSync(voPath)) args3.push("-i", voPath);
+
+ // Video filter: burn subtitles if not narrate-only
+ const vfChain = [];
+ if (!args["narrate-only"]) {
+   // Inline ASS styling (sepia-ish to match AC palette)
+   const style = args.style ?? "FontName=Helvetica Neue,FontSize=26," +
+     "PrimaryColour=&H00e4d8bc,OutlineColour=&H001c1812,BackColour=&H801c1812," +
+     "BorderStyle=3,Outline=1,Shadow=0,Alignment=2,MarginV=42";
+   vfChain.push(`subtitles='${srtPath.replace(/'/g, "\\'")}':force_style='${style}'`);
+ }
+ if (vfChain.length) args3.push("-vf", vfChain.join(","));
+
+ // Audio mix
+ if (!args["caption-only"] && existsSync(voPath)) {
+   const duckDb = Number(args.duck);
+   const af = `[0:a]volume=${Math.pow(10, duckDb/20).toFixed(3)}[bg];` +
+     `[bg][1:a]amix=inputs=2:duration=first:normalize=0[aout]`;
+   args3.push(
+     "-filter_complex", af,
+     "-map", "0:v", "-map", "[aout]"
+   );
+ } else {
+   args3.push("-map", "0:v", "-map", "0:a?");
+ }
+
+ args3.push(
+   "-c:v", "libx264", "-preset", "medium", "-crf", "20", "-pix_fmt", "yuv420p",
+   "-movflags", "+faststart",
+   "-c:a", "aac", "-b:a", "160k",
+   OUT_PATH
+ );
+
+ console.log(`→ encoding ${OUT_PATH}`);
+ execFileSync("ffmpeg", args3, { stdio: "inherit" });
+
+ // ─── CLEANUP ─────────────────────────────────────────────────────────────
+ if (!args["keep-scratch"]) {
+   rmSync(SCRATCH, { recursive: true, force: true });
+ } else {
+   console.log(`→ scratch kept: ${SCRATCH}`);
+ }
+
+ console.log(`✓ done: ${OUT_PATH}`);
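
One detail in `fmtSrtTime` worth a note: it takes the fractional part of a float and floors it, so an end time such as `4 - 0.1` can come out one millisecond low (`03,899` rather than `03,900`) because of binary floating-point rounding. That is harmless for captions, but a variant that rounds to whole milliseconds first avoids it. The sketch below is an alternative, not the committed code, and the name `fmtSrtTimeMs` is hypothetical.

```javascript
// Hypothetical alternative to fmtSrtTime: do all arithmetic on an integer
// millisecond count so float error cannot shave off the last millisecond.
function fmtSrtTimeMs(sec) {
  const total = Math.round(sec * 1000); // whole milliseconds
  const pad = (n, width) => String(n).padStart(width, "0");
  const hh = Math.floor(total / 3600000);
  const mm = Math.floor((total % 3600000) / 60000);
  const ss = Math.floor((total % 60000) / 1000);
  const ms = total % 1000;
  return `${pad(hh, 2)}:${pad(mm, 2)}:${pad(ss, 2)},${pad(ms, 3)}`;
}

console.log(fmtSrtTimeMs(4 - 0.1));  // 00:00:03,900
console.log(fmtSrtTimeMs(3661.25));  // 01:01:01,250
```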