# Recap

Generates narrated, captioned video recaps of monorepo activity for a chosen
audience (currently fia, jas's girlfriend; trivially extendable to others).
The audio is the source of truth — whisper word-level timestamps drive slide
durations, so visuals stay in sync with what the voice is actually saying.
The default voice is jeffrey-pvc (the same Professional Voice Clone used in
the `say` piece and the LACMA grant pitch video), called via `/api/say` on
production.
## Pipeline

```
audience/<name>.mjs (narration + segment markers + slide HTML/queries + voice + transcriptFixes
                     + optional per-slide `metaphor` for jeffrey-photos)
  │
  ▼ bin/tts.mjs
out/recap.mp3 (jeffrey-pvc TTS via /api/say)
  │
  ▼ bin/transcribe.mjs (whisper-cli, models/ggml-base.en.bin)
out/words.json ([{text, fromMs, toMs}, ...])
  │
  ▼ bin/align.mjs (matches audience.segments[].marker)
out/segments.json ([{name, startSec, endSec, durationSec}, ...])
  │
  ▼ bin/jeffrey-photos.mjs (optional; OpenAI gpt-image-2 + platter SHOOT+SELFIE refs)
out/jeffrey-photos/<seg>.png (cached per segment; --force regenerates; failures are soft)
  │
  ▼ bin/scout.mjs (resolves per-slide content queries; pdftoppm for PDFs)
out/assets.json (slide-name → {queryName: dataUrl|commits|paths})
  │
  ▼ bin/slides.mjs (puppeteer + ywft-processing + purple-pals + scouted assets)
out/slides/*.png (1080×1920 PNG per segment)
out/concat.txt (ffmpeg concat demuxer w/ real durations)
out/duration.txt (total seconds, including trailing silence)
  │
  ▼ bin/subtitles.mjs (chunks words.json, applies transcriptFixes, renders pill PNGs)
out/subs/*.png (1080×220 transparent subtitle pill per chunk)
out/subs.json ([{file, startSec, endSec, text}, ...])
  │
  ▼ bin/build-filter.mjs (emits filter graph: showwaves + drawbox + per-sub overlay)
  ▼ bin/compose.fish
out/recap.mp4 (1080×1920, h264 + aac, faststart, baked subs)
```
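For orientation, a hedged sketch of what `bin/build-filter.mjs` emits; the stage names above are real, but every number, label, and file name below is illustrative (the real graph is derived from `out/segments.json`, `out/subs.json`, and `out/duration.txt`):

```js
// Hypothetical sketch of the emitted filter graph; not the actual build-filter.mjs.
const total = 183.4; // total seconds, as written to out/duration.txt
const subs = [{ file: "out/subs/000.png", startSec: 0, endSec: 2.35 }]; // from out/subs.json

const parts = [
  // progress bar: drawbox whose width grows with the timestamp t
  `[0:v]drawbox=x=0:y=ih-12:w='iw*t/${total}':h=12:color=white@0.9:t=fill[bar]`,
  // waveform strip rendered from the narration audio
  `[1:a]showwaves=s=1080x120:mode=line:colors=white[waves]`,
  `[bar][waves]overlay=0:H-140[v0]`,
  // one movie source + one timed overlay per subtitle pill
  ...subs.flatMap((s, i) => [
    `movie=${s.file}[s${i}]`,
    `[v${i}][s${i}]overlay=(W-w)/2:1540:enable='between(t,${s.startSec},${s.endSec})'[v${i + 1}]`,
  ]),
];
console.log(parts.join(";\n"));
```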
Run end-to-end:

```
./pipeline.fish fia              # fresh tts + everything
./pipeline.fish fia --skip-tts   # reuse existing out/recap.mp3 (re-align/re-render)
```

First run only — download the whisper model (~141 MB):

```
curl -L -o models/ggml-base.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
```
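`bin/transcribe.mjs` wraps `whisper-cli`; a plausible standalone invocation, assuming the flags of recent whisper-cpp builds (`-ml 1` caps segments at one word, `-oj` writes JSON, `-of` sets the output basename):

```
# convert to 16 kHz mono WAV first if your whisper-cli build doesn't decode mp3
ffmpeg -i out/recap.mp3 -ar 16000 -ac 1 out/recap.wav
whisper-cli -m models/ggml-base.en.bin -f out/recap.wav -ml 1 -oj -of out/words
```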
## Architecture decisions

- Audio is the source of truth. Slide durations come from whisper word
  timestamps, not from hand-tuned guesses. Re-recording the audio (e.g. a
  re-edit of the narration) automatically retimes the visuals.
- Markers are anchor phrases, not paraphrases. Each `audience.segments[]`
  entry has a `marker` field that must appear in the narration verbatim
  (modulo whisper transcription quirks — the match is case-insensitive and
  punctuation-stripped). `align.mjs` fails loud if any marker is missing;
  see the sketch after this list.
- End card sits in trailing silence. The last segment uses a synthetic
  `__END__` marker; the audio is padded with `apad` so the silent end card
  has time to breathe without truncating the narration.
- Slides are HTML rendered by Chrome. Reuses the `oven/` puppeteer install
  to avoid taking on a new dep. ywft-processing-bold/regular fonts are
  inlined as base64; `unicode-range: U+0020-007E` constrains the AC font to
  ASCII so Chrome falls back to system fonts for ñ, 中文, 日本語, ·, ×, etc.
- Progress bar is `drawbox` with `w='iw*t/$TOTAL'`. This ffmpeg build lacks
  `drawtext` and `subtitles`, so visible captions live in the slide PNGs;
  only the bar (no text) is composited at runtime.
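A minimal sketch of that marker matching, assuming `out/words.json` entries shaped like `{text, fromMs, toMs}`; the names are hypothetical, and the synthetic `__END__` marker is presumably special-cased in the real `bin/align.mjs`:

```js
// Hypothetical sketch of align.mjs-style matching; not the actual implementation.
const norm = (s) => s.toLowerCase().replace(/[^a-z0-9\s]/g, "").trim();

// words: [{ text, fromMs, toMs }, ...] from out/words.json
function findMarker(words, marker) {
  const target = norm(marker).split(/\s+/).join(" ");
  const n = target.split(" ").length;
  for (let i = 0; i + n <= words.length; i++) {
    const window = words.slice(i, i + n);
    // case-insensitive, punctuation-stripped comparison over an n-word window
    if (window.map((w) => norm(w.text)).join(" ") === target) {
      return window[0].fromMs / 1000; // startSec for this segment
    }
  }
  throw new Error(`marker not found in transcript: "${marker}"`); // fail loud
}
```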
## Content queries (scout)

Slide bodies can be functions of resolved query results. `scout.mjs` runs
every query declared on a slide and writes data URLs / commit lists / file
paths into `out/assets.json`. The slide function then receives those values
and produces HTML.

Three query shapes are supported (the glob shape covers both images and PDFs):

| Shape | Result |
|---|---|
| `{ glob: "<path>" }` | base64 data URL of the first matching image (PNG/JPG/WebP/SVG) |
| `{ glob: "<path>.pdf", pdfPage: 1, pdfWidth: 600 }` | base64 data URL of one PDF page rendered via pdftoppm |
| `{ commits: "<POSIX extended regex>", since: "48 hours ago", limit: 5 }` | `[{hash, subject}, ...]` from `git log -E --grep` |
| `{ files: "<glob>", sinceHours: 48, limit: 60 }` | matching paths newer than N hours, sorted newest first |

Globs are repo-relative or absolute. PDF rendering uses 150 DPI by default;
`pdfWidth` scales the longer side. Commit grep is POSIX extended (`|`
alternation works without escaping). Failed queries log a warning and skip
the value; the slide function should defensively handle missing results
(e.g. `${(commits || []).map(...)}`).
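A hedged sketch of how a `{ glob }` query can resolve to a data URL, mirroring the documented behavior; Node 22's experimental `fs.globSync` stands in for whatever globbing `scout.mjs` actually uses:

```js
// Hypothetical sketch; mirrors the documented behavior, not the actual scout.mjs code.
import { globSync, readFileSync } from "node:fs"; // globSync: Node 22+
import { extname } from "node:path";

const MIME = {
  ".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg",
  ".webp": "image/webp", ".svg": "image/svg+xml",
};

function resolveGlobQuery(pattern) {
  const [file] = globSync(pattern); // first match wins
  if (!file) {
    console.warn(`scout: no match for ${pattern}`); // failed queries skip the value
    return undefined;
  }
  const mime = MIME[extname(file).toLowerCase()] ?? "application/octet-stream";
  return `data:${mime};base64,${readFileSync(file).toString("base64")}`;
}
```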
Example slide entry in an audience config:

```js
"02_notepat": {
  queries: {
    icon: { glob: "ac-electron/build/icon.png" },
    paper: {
      glob: "system/public/papers.aesthetic.computer/notepat-26-arxiv-cards.pdf",
      pdfPage: 1,
      pdfWidth: 600,
    },
    commits: { commits: "^notepat|^build-notepat", since: "48 hours ago", limit: 5 },
  },
  body: ({ icon, paper, commits }) => `
    <div class="frame">
      <img class="brand-icon" src="${icon}" />
      <img class="paper-thumb" src="${paper}" />
      ${(commits || []).map(c => `<div>${c.hash} ${c.subject}</div>`).join("")}
    </div>`,
},
```
A slide entry can also still be a plain HTML string when no scouting is
needed (see `01_title`, `03_arena`, etc. in `audience/fia.mjs`).
## Subtitle transcript fixes

Whisper renders dictionary-style — `notepat` becomes `Notepad`, `baktok`
becomes `Backtalk`, `menubar` becomes `menu bar`. Fix per-audience without
re-running whisper:

```js
transcriptFixes: {
  "Notepad": "notepat",
  "Backtalk": "baktok",
  "menu bar": "menubar",
},
```

Match is case-insensitive and applied to each subtitle chunk's joined text
(so multi-word fixes like `"laid on Linux": "late on Linux"` work).
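A minimal sketch of that substitution, assuming each chunk's words are joined before matching (the function name is hypothetical; the real code is in `bin/subtitles.mjs`):

```js
// Hypothetical sketch of transcriptFixes application.
const escapeRe = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function applyFixes(text, fixes = {}) {
  for (const [wrong, right] of Object.entries(fixes)) {
    // "gi" makes the match case-insensitive across the chunk's joined text
    text = text.replace(new RegExp(escapeRe(wrong), "gi"), right);
  }
  return text;
}

applyFixes("Notepad got a menu bar", { "Notepad": "notepat", "menu bar": "menubar" });
// → "notepat got a menubar"
```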
## Adding a new audience

Drop `audience/<name>.mjs` exporting `audience` (and a `PALETTE` if you want
to deviate from fia's). Required shape:

```js
export const audience = {
  name: "<name>",
  handle: "<optional handle for the corner bug>",
  voice: { provider: "jeffrey", voice: "neutral:0" },
  narration: "<verbatim text POSTed to /api/say>",
  segments: [
    { name: "01_title", marker: "<phrase from narration>" },
    { name: "02_topic1", marker: "<phrase from narration>" },
    // ...
    { name: "10_end", marker: "__END__", trailingSilenceSec: 3 },
  ],
  slides: { "01_title": "<html body>", /* ...one per segment */ },
};
```

Then `./pipeline.fish <name>`.
## Files

| File | Role |
|---|---|
| `audience/fia.mjs` | narration, markers, slide HTML/queries, palette, fixes |
| `audience/general.mjs` | 48-hour public-facing recap (HTML slides, no jeffrey-photos) |
| `audience/jeffrey-24h.mjs` | 24-hour jeffrey-as-protagonist recap (full-bleed photos) |
| `bin/tts.mjs` | POST narration → /api/say → out/recap.mp3 |
| `bin/transcribe.mjs` | whisper-cli → out/words.json |
| `bin/align.mjs` | match markers in transcript → out/segments.json |
| `bin/jeffrey-photos.mjs` | gpt-image-2 + platter refs → out/jeffrey-photos/<seg>.png |
| `bin/scout.mjs` | resolve per-slide content queries → out/assets.json |
| `bin/slides.mjs` | puppeteer-render slide PNGs (consume assets) + concat.txt |
| `bin/subtitles.mjs` | chunk words into pills (apply transcriptFixes) → subs.json |
| `bin/build-filter.mjs` | emit ffmpeg filter graph for compose (one overlay per sub) |
| `bin/compose.fish` | ffmpeg compose final mp4 |
| `pipeline.fish` | runs all seven stages |
| `models/ggml-base.en.bin` | whisper model (gitignored, downloaded on first run) |
| `out/` | all generated artifacts (gitignored) |
## Dependencies

- `whisper-cli` (homebrew `whisper-cpp`)
- `ffmpeg` with `libx264`, `aac`, `showwaves`, `drawbox`, `apad`, `movie`, `overlay` (homebrew default)
- `pdftoppm` (homebrew `poppler`) for PDF → PNG in scout
- `node` (uses `oven/node_modules/puppeteer` to avoid extra installs)
- Google Chrome at `/Applications/Google Chrome.app` (puppeteer driver)
- Network access to `aesthetic.computer/api/say` (jeffrey-pvc TTS)
- `OPENAI_API_KEY` (for `jeffrey-photos.mjs`; read from env or `aesthetic-computer-vault/.devcontainer/envs/devcontainer.env`); only required for audiences that declare per-slide `metaphor` prompts
## Jeffrey photos (optional per-audience)

An audience can opt into full-bleed gpt-image-2 photos by adding a `metaphor`
field to each content slide and a `queries.photo: { glob: ... }` that points
to `recap/out/jeffrey-photos/<segment>.png`. `bin/jeffrey-photos.mjs` reads
the metaphor strings, calls `images.edit` with gpt-image-2 and the platter
SHOOT_REFS + SELFIE_REFS for identity grounding, and writes one PNG per
segment.
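A hedged sketch of the generation call, assuming the `openai` npm package; the model, size, and quality values come from this doc, while the reference paths and output segment are placeholders:

```js
// Hypothetical sketch; the real logic (refs, caching, --force/--only) lives in bin/jeffrey-photos.mjs.
import OpenAI from "openai";
import fs from "node:fs";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const result = await openai.images.edit({
  model: "gpt-image-2",
  // identity-grounding reference images (paths hypothetical)
  image: ["shoot-ref.png", "selfie-ref.png"].map((p) => fs.createReadStream(p)),
  prompt: "per-slide metaphor string goes here",
  size: "1024x1536",
  quality: "high",
});

fs.writeFileSync(
  "out/jeffrey-photos/04_platter.png",
  Buffer.from(result.data[0].b64_json, "base64"),
);
```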
```
# regen all photos for an audience
node bin/jeffrey-photos.mjs jeffrey-24h --force

# regen one segment only
node bin/jeffrey-photos.mjs jeffrey-24h --only 04_platter --force
```
Cost is roughly $0.30 per high-quality 1024×1536 generation, so about $2-4
per full recap. Failures are soft — slides fall back to a dark `${PALETTE.bg}`
placeholder when the photo glob matches nothing, so the pipeline still
produces a runnable mp4.
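That fallback can live in the slide body itself; a minimal sketch of the pattern, assuming a `photo` query name as described above:

```js
// If the photo glob matched nothing, `photo` is undefined and the slide
// renders a plain PALETTE.bg background instead of breaking the build.
body: ({ photo }) => `
  <div class="frame" style="background:${photo ? "#000" : PALETTE.bg}">
    ${photo ? `<img class="full-bleed" src="${photo}" />` : ""}
  </div>`,
```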