# Recap

Generates narrated, captioned video recaps of monorepo activity for a chosen
audience (currently `fia`, jas's girlfriend; trivially extendable to others).
The audio is the source of truth — whisper word-level timestamps drive slide
durations, so visuals stay in sync with what the voice is actually saying.

The default voice is `jeffrey-pvc` (the same Professional Voice Clone used in
the `say` piece and the LACMA grant pitch video), called via `/api/say` on
production.

## Pipeline

```
audience/<name>.mjs   (narration + segment markers + slide HTML/queries + voice + transcriptFixes
                       + optional per-slide `metaphor` for jeffrey-photos)
   │
   ▼ bin/tts.mjs
out/recap.mp3   (jeffrey-pvc TTS via /api/say)
   │
   ▼ bin/transcribe.mjs   (whisper-cli, models/ggml-base.en.bin)
out/words.json   ([{text, fromMs, toMs}, ...])
   │
   ▼ bin/align.mjs   (matches audience.segments[].marker)
out/segments.json   ([{name, startSec, endSec, durationSec}, ...])
   │
   ▼ bin/jeffrey-photos.mjs   (optional; OpenAI gpt-image-2 + platter SHOOT+SELFIE refs)
out/jeffrey-photos/<seg>.png   (cached per segment; --force regenerates; failures are soft)
   │
   ▼ bin/scout.mjs   (resolves per-slide content queries; pdftoppm for PDFs)
out/assets.json   (slide-name → {queryName: dataUrl|commits|paths})
   │
   ▼ bin/slides.mjs   (puppeteer + ywft-processing + purple-pals + scouted assets)
out/slides/*.png   (1080×1920 PNG per segment)
out/concat.txt   (ffmpeg concat demuxer w/ real durations)
out/duration.txt   (total seconds, including trailing silence)
   │
   ▼ bin/subtitles.mjs   (chunks words.json, applies transcriptFixes, renders pill PNGs)
out/subs/*.png   (1080×220 transparent subtitle pill per chunk)
out/subs.json   ([{file, startSec, endSec, text}, ...])
   │
   ▼ bin/build-filter.mjs   (emits filter graph: showwaves + drawbox + per-sub overlay)
   ▼ bin/compose.fish
out/recap.mp4   (1080×1920, h264 + aac, faststart, baked subs)
```

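The `out/concat.txt` that `slides.mjs` emits follows ffmpeg's concat demuxer format: one `file` line per slide followed by its aligned `duration` in seconds. A sketch of the shape (filenames and durations here are illustrative, not actual output):

```
file 'slides/01_title.png'
duration 4.83
file 'slides/02_notepat.png'
duration 7.12
file 'slides/10_end.png'
duration 3.00
file 'slides/10_end.png'
```

The concat demuxer ignores the `duration` attached to the final entry, so repeating the last file once is the usual workaround to keep the end card on screen for its full duration.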
Run end-to-end:

```fish
./pipeline.fish fia            # fresh tts + everything
./pipeline.fish fia --skip-tts # reuse existing out/recap.mp3 (re-align/re-render)
```

First run only — download the whisper model (~141 MB):

```fish
curl -L -o models/ggml-base.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
```

## Architecture decisions

- **Audio is the source of truth.** Slide durations come from whisper word
  timestamps, not from hand-tuned guesses. Re-recording the audio (e.g. a
  re-edit of the narration) automatically retimes the visuals.
- **Markers are anchor phrases**, not paraphrases. Each `audience.segments[]`
  entry has a `marker` field that must appear in the narration verbatim (modulo
  whisper transcription quirks — matching is case-insensitive and
  punctuation-stripped). `align.mjs` fails loud if any marker is missing.
- **End card sits in trailing silence.** The last segment uses a synthetic
  `__END__` marker; the audio is padded with `apad` so the silent end card
  has time to breathe without truncating the narration.
- **Slides are HTML rendered by Chrome.** Reuses the `oven/` puppeteer install
  to avoid taking on a new dep. ywft-processing-bold/regular fonts are
  inlined as base64; `unicode-range: U+0020-007E` constrains the AC font to
  ASCII so Chrome falls back to system fonts for `ñ`, `中文`, `日本語`,
  `·`, `×`, etc.
- **Progress bar is `drawbox` with `w='iw*t/$TOTAL'`.** This ffmpeg build
  lacks `drawtext` and `subtitles`, so visible captions live in the slide
  PNGs; only the bar (no text) is composited at runtime.

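The marker-matching rule can be sketched like this (illustrative only — the function names and shapes below are assumptions, not `align.mjs` internals): both marker and transcript words are lowercased and punctuation-stripped, then the marker phrase is slid across the word list.

```javascript
// Normalize the way marker matching is described above:
// lowercase, strip punctuation, collapse whitespace.
const normalize = (s) =>
  s.toLowerCase().replace(/[^a-z0-9\s]/g, "").replace(/\s+/g, " ").trim();

// words: [{text, fromMs, toMs}, ...] as in out/words.json.
// Returns the marker's start time in seconds, or null when the phrase
// never appears (the real align.mjs fails loud in that case).
function findMarkerStart(words, marker) {
  const target = normalize(marker).split(" ");
  for (let i = 0; i + target.length <= words.length; i++) {
    const window = words
      .slice(i, i + target.length)
      .map((w) => normalize(w.text));
    if (window.join(" ") === target.join(" ")) return words[i].fromMs / 1000;
  }
  return null;
}
```

Under this rule a marker like `notepat` still matches whisper emitting `Note-Pat!`, since both normalize to `notepat`.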
## Content queries (scout)

Slide bodies can be **functions** of resolved query results. `scout.mjs` runs
every query declared on a slide and writes data URLs / commit lists / file
paths into `out/assets.json`. The slide function then receives those values
and produces HTML.

The following query shapes are supported:

| Shape                                                                 | Result                                                         |
| --------------------------------------------------------------------- | -------------------------------------------------------------- |
| `{ glob: "<path>" }`                                                   | base64 data URL of the first matching image (PNG/JPG/WebP/SVG) |
| `{ glob: "<path>.pdf", pdfPage: 1, pdfWidth: 600 }`                    | base64 data URL of one PDF page rendered via pdftoppm          |
| `{ commits: "<git -E grep regex>", since: "48 hours ago", limit: 5 }`  | `[{hash, subject}, ...]` from `git log --grep -E`              |
| `{ files: "<glob>", sinceHours: 48, limit: 60 }`                       | matching paths newer than N hours, sorted newest first         |

Globs are repo-relative or absolute. PDF rendering uses 150 DPI by default;
`pdfWidth` scales the longer side. Commit grep is POSIX extended (`|`
alternation works without escaping). Failed queries log a warning and skip
the value; the slide function should defensively handle missing results
(e.g. `${(commits || []).map(...)}`).

Example slide entry in an audience config:

```js
"02_notepat": {
  queries: {
    icon: { glob: "ac-electron/build/icon.png" },
    paper: { glob: "system/public/papers.aesthetic.computer/notepat-26-arxiv-cards.pdf",
             pdfPage: 1, pdfWidth: 600 },
    commits: { commits: "^notepat|^build-notepat", since: "48 hours ago", limit: 5 },
  },
  body: ({ icon, paper, commits }) => `
    <div class="frame">
      <img class="brand-icon" src="${icon}" />
      <img class="paper-thumb" src="${paper}" />
      ${(commits || []).map(c => `<div>${c.hash} ${c.subject}</div>`).join("")}
    </div>`,
},
```

A slide entry can also still be a plain HTML string when no scouting is
needed (see `01_title`, `03_arena`, etc. in `audience/fia.mjs`).

## Subtitle transcript fixes

Whisper normalizes coined words to dictionary spellings — `notepat` becomes
`Notepad`, `baktok` becomes `Backtalk`, `menubar` becomes `menu bar`. Fix these
per audience without re-running whisper:

```js
transcriptFixes: {
  "Notepad": "notepat",
  "Backtalk": "baktok",
  "menu bar": "menubar",
}
```

Matching is case-insensitive and applied to each subtitle chunk's joined text
(so multi-word fixes like `"laid on Linux": "late on Linux"` work).

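A minimal sketch of that substitution (a hypothetical helper — the real `subtitles.mjs` may differ): each fix key is regex-escaped and applied as a case-insensitive global replace over the chunk's joined text.

```javascript
// Escape regex metacharacters so fix keys are matched literally.
const escapeRegex = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Apply every transcriptFixes entry to one subtitle chunk's joined text.
// Case-insensitive, so "BACKTALK" and "Backtalk" both become "baktok".
function applyFixes(text, fixes) {
  for (const [wrong, right] of Object.entries(fixes)) {
    text = text.replace(new RegExp(escapeRegex(wrong), "gi"), right);
  }
  return text;
}
```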
## Adding a new audience

Drop `audience/<name>.mjs` exporting `audience` (and a `PALETTE` if you want
to deviate from fia's). Required shape:

```js
export const audience = {
  name: "<name>",
  handle: "<optional handle for the corner bug>",
  voice: { provider: "jeffrey", voice: "neutral:0" },
  narration: "<verbatim text POSTed to /api/say>",
  segments: [
    { name: "01_title", marker: "<phrase from narration>" },
    { name: "02_topic1", marker: "<phrase from narration>" },
    // ...
    { name: "10_end", marker: "__END__", trailingSilenceSec: 3 },
  ],
  slides: { "01_title": "<html body>", /* ...one per segment */ },
};
```

Then `./pipeline.fish <name>`.

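A quick pre-flight check along these lines can catch config mistakes before a full pipeline run (a hypothetical helper, not part of the repo): every segment needs a non-empty marker for `align.mjs` to anchor on, and a slide entry with the same name.

```javascript
// Hypothetical sanity check for an audience config: collect one error
// string per missing marker or missing slide, empty array when valid.
function validateAudience(audience) {
  const errors = [];
  for (const seg of audience.segments) {
    if (!seg.marker) errors.push(`${seg.name}: empty marker`);
    if (!(seg.name in audience.slides)) errors.push(`${seg.name}: missing slide`);
  }
  return errors;
}
```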
## Files

| File                       | Role                                                          |
| -------------------------- | ------------------------------------------------------------- |
| `audience/fia.mjs`         | narration, markers, slide HTML/queries, palette, fixes        |
| `audience/general.mjs`     | 48-hour public-facing recap (HTML slides, no jeffrey-photos)  |
| `audience/jeffrey-24h.mjs` | 24-hour jeffrey-as-protagonist recap (full-bleed photos)      |
| `bin/tts.mjs`              | POST narration → `/api/say` → `out/recap.mp3`                 |
| `bin/transcribe.mjs`       | `whisper-cli` → `out/words.json`                              |
| `bin/align.mjs`            | match markers in transcript → `out/segments.json`             |
| `bin/jeffrey-photos.mjs`   | gpt-image-2 + platter refs → `out/jeffrey-photos/<seg>.png`   |
| `bin/scout.mjs`            | resolve per-slide content queries → `out/assets.json`         |
| `bin/slides.mjs`           | puppeteer-render slide PNGs (consume assets) + `concat.txt`   |
| `bin/subtitles.mjs`        | chunk words into pills (apply transcriptFixes) → `subs.json`  |
| `bin/build-filter.mjs`     | emit ffmpeg filter graph for compose (one overlay per sub)    |
| `bin/compose.fish`         | ffmpeg compose final mp4                                      |
| `pipeline.fish`            | runs all seven stages                                         |
| `models/ggml-base.en.bin`  | whisper model (gitignored, downloaded on first run)           |
| `out/`                     | all generated artifacts (gitignored)                          |

## Dependencies

- `whisper-cli` (homebrew `whisper-cpp`)
- `ffmpeg` with `libx264`, `aac`, `showwaves`, `drawbox`, `apad`, `movie`, `overlay` (homebrew default)
- `pdftoppm` (homebrew `poppler`) for PDF → PNG in scout
- `node` (uses `oven/node_modules/puppeteer` to avoid extra installs)
- Google Chrome at `/Applications/Google Chrome.app` (puppeteer driver)
- Network access to `aesthetic.computer/api/say` (jeffrey-pvc TTS)
- `OPENAI_API_KEY` (for `jeffrey-photos.mjs`; read from env or
  `aesthetic-computer-vault/.devcontainer/envs/devcontainer.env`); only required
  for audiences that declare per-slide `metaphor` prompts

## Jeffrey photos (optional per-audience)

An audience can opt into full-bleed gpt-image-2 photos by adding a `metaphor`
field to each content slide and a `queries.photo: { glob: ... }` that points to
`recap/out/jeffrey-photos/<segment>.png`. `bin/jeffrey-photos.mjs` reads the
metaphor strings, calls `images.edit` with `gpt-image-2` and the platter
SHOOT_REFS + SELFIE_REFS for identity grounding, and writes one PNG per segment.

```fish
# regen all photos for an audience
node bin/jeffrey-photos.mjs jeffrey-24h --force

# regen one segment only
node bin/jeffrey-photos.mjs jeffrey-24h --only 04_platter --force
```

Cost is ~$0.30 per high-quality 1024×1536 generation, so ~$2-4 per full recap.
Failures are soft — slides fall back to a dark `${PALETTE.bg}` placeholder when
the photo glob matches nothing, so the pipeline still produces a runnable mp4.