# Recap

Generates narrated, captioned video recaps of monorepo activity for a chosen
audience (currently `fia`, jas's girlfriend; trivially extendable to others).
The audio is the source of truth — whisper word-level timestamps drive slide
durations, so visuals stay in sync with what the voice is actually saying.

The default voice is `jeffrey-pvc` (the same Professional Voice Clone used in
the `say` piece and the LACMA grant pitch video), called via `/api/say` on
production.

## Pipeline

```
audience/<name>.mjs           (narration + segment markers + slide HTML/queries + voice
                               + transcriptFixes + optional per-slide `metaphor` for jeffrey-photos)

  ▼ bin/tts.mjs
out/recap.mp3                 (jeffrey-pvc TTS via /api/say)

  ▼ bin/transcribe.mjs        (whisper-cli, models/ggml-base.en.bin)
out/words.json                ([{text, fromMs, toMs}, ...])

  ▼ bin/align.mjs             (matches audience.segments[].marker)
out/segments.json             ([{name, startSec, endSec, durationSec}, ...])

  ▼ bin/jeffrey-photos.mjs    (optional; OpenAI gpt-image-2 + platter SHOOT+SELFIE refs)
out/jeffrey-photos/<seg>.png  (cached per segment; --force regenerates; failures are soft)

  ▼ bin/scout.mjs             (resolves per-slide content queries; pdftoppm for PDFs)
out/assets.json               (slide-name → {queryName: dataUrl|commits|paths})

  ▼ bin/slides.mjs            (puppeteer + ywft-processing + purple-pals + scouted assets)
out/slides/*.png              (1080×1920 PNG per segment)
out/concat.txt                (ffmpeg concat demuxer w/ real durations)
out/duration.txt              (total seconds, including trailing silence)

  ▼ bin/subtitles.mjs         (chunks words.json, applies transcriptFixes, renders pill PNGs)
out/subs/*.png                (1080×220 transparent subtitle pill per chunk)
out/subs.json                 ([{file, startSec, endSec, text}, ...])

  ▼ bin/build-filter.mjs      (emits filter graph: showwaves + drawbox + per-sub overlay)
  ▼ bin/compose.fish
out/recap.mp4                 (1080×1920, h264 + aac, faststart, baked subs)
```

Run end-to-end:

```fish
./pipeline.fish fia            # fresh tts + everything
./pipeline.fish fia --skip-tts # reuse existing out/recap.mp3 (re-align/re-render)
```

First run only — download the whisper model (~141 MB):

```fish
curl -L -o models/ggml-base.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
```

## Architecture decisions

- **Audio is the source of truth.** Slide durations come from whisper word
  timestamps, not from hand-tuned guesses. Re-recording the audio (e.g. a
  re-edit of the narration) automatically retimes the visuals.
- **Markers are anchor phrases**, not paraphrases. Each `audience.segments[]`
  entry has a `marker` field that must appear in the narration verbatim (modulo
  whisper transcription quirks — the match is case-insensitive and
  punctuation-stripped). `align.mjs` fails loud if any marker is missing.
- **End card sits in trailing silence.** The last segment uses a synthetic
  `__END__` marker; the audio is padded with `apad` so the silent end card
  has time to breathe without truncating the narration.
- **Slides are HTML rendered by Chrome.** Reuses the `oven/` puppeteer install
  to avoid taking on a new dep. ywft-processing-bold/regular fonts are
  inlined as base64; `unicode-range: U+0020-007E` constrains the AC font to
  ASCII so Chrome falls back to system fonts for `ñ`, `中文`, `日本語`,
  `·`, `×`, etc.
- **Progress bar is `drawbox` with `w='iw*t/$TOTAL'`.** This ffmpeg build
  lacks `drawtext` and `subtitles`, so visible captions live in the slide
  PNGs; only the bar (no text) is composited at runtime.

## Content queries (scout)

Slide bodies can be **functions** of resolved query results. `scout.mjs` runs
every query declared on a slide and writes data URLs / commit lists / file
paths into `out/assets.json`. The slide function then receives those values
and produces HTML.
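For orientation, here is a hypothetical `out/assets.json` for the `02_notepat` slide used as the example in this section; the hash, subject, and truncated data URLs are illustrative, not from a real run:

```json
{
  "02_notepat": {
    "icon": "data:image/png;base64,iVBORw0KGgo...",
    "paper": "data:image/png;base64,iVBORw0KGgo...",
    "commits": [
      { "hash": "a1b2c3d", "subject": "notepat: tune key layout" }
    ]
  }
}
```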
Four query shapes are supported:

| Shape | Result |
| --------------------------------------------------------------------- | -------------------------------------------------------------- |
| `{ glob: "<path>" }` | base64 data URL of the first matching image (PNG/JPG/WebP/SVG) |
| `{ glob: "<path>.pdf", pdfPage: 1, pdfWidth: 600 }` | base64 data URL of one PDF page rendered via pdftoppm |
| `{ commits: "<regex for git log --grep -E>", since: "48 hours ago", limit: 5 }` | `[{hash, subject}, ...]` from `git log --grep -E` |
| `{ files: "<glob>", sinceHours: 48, limit: 60 }` | matching paths newer than N hours, sorted newest first |

Globs are repo-relative or absolute. PDF rendering uses 150 DPI by default;
`pdfWidth` scales the longer side. Commit grep is POSIX extended (`|`
alternation works without escaping). Failed queries log a warning and skip
the value; the slide function should defensively handle missing results
(e.g. `${(commits || []).map(...)}`).

Example slide entry in an audience config:

```js
"02_notepat": {
  queries: {
    icon: { glob: "ac-electron/build/icon.png" },
    paper: { glob: "system/public/papers.aesthetic.computer/notepat-26-arxiv-cards.pdf",
             pdfPage: 1, pdfWidth: 600 },
    commits: { commits: "^notepat|^build-notepat", since: "48 hours ago", limit: 5 },
  },
  body: ({ icon, paper, commits }) => `
    <div class="frame">
      <img class="brand-icon" src="${icon}" />
      <img class="paper-thumb" src="${paper}" />
      ${(commits || []).map(c => `<div>${c.hash} ${c.subject}</div>`).join("")}
    </div>`,
},
```

A slide entry can also still be a plain HTML string when no scouting is
needed (see `01_title`, `03_arena`, etc. in `audience/fia.mjs`).

## Subtitle transcript fixes

Whisper renders dictionary-style — `notepat` becomes `Notepad`, `baktok`
becomes `Backtalk`, `menubar` becomes `menu bar`.
Fix per-audience without re-running whisper:

```js
transcriptFixes: {
  "Notepad": "notepat",
  "Backtalk": "baktok",
  "menu bar": "menubar",
}
```

The match is case-insensitive and applied to each subtitle chunk's joined text
(so multi-word fixes like `"laid on Linux": "late on Linux"` work).

## Adding a new audience

Drop in `audience/<name>.mjs` exporting `audience` (and a `PALETTE` if you want
to deviate from fia's). Required shape:

```js
export const audience = {
  name: "<name>",
  handle: "<optional handle for the corner bug>",
  voice: { provider: "jeffrey", voice: "neutral:0" },
  narration: "<verbatim text POSTed to /api/say>",
  segments: [
    { name: "01_title", marker: "<phrase from narration>" },
    { name: "02_topic1", marker: "<phrase from narration>" },
    // ...
    { name: "10_end", marker: "__END__", trailingSilenceSec: 3 },
  ],
  slides: { "01_title": "<html body>", /* ...one per segment */ },
};
```

Then run `./pipeline.fish <name>`.
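When the pipeline runs, `align.mjs` has to locate each segment's `marker` inside the whisper transcript. A minimal sketch of the matching the architecture notes describe (case-insensitive, punctuation-stripped, fail-loud); the function names are illustrative, and the real script additionally maps the match back to word timestamps to produce `out/segments.json`:

```javascript
// Normalize text the way marker matching implies: lowercase,
// punctuation stripped, whitespace collapsed.
function normalize(s) {
  return s
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Locate a marker phrase in the joined transcript words.
// A missing marker throws, matching the fail-loud behavior.
function findMarker(words, marker) {
  const transcript = normalize(words.map((w) => w.text).join(" "));
  const idx = transcript.indexOf(normalize(marker));
  if (idx === -1) throw new Error(`marker not found: ${marker}`);
  return idx;
}
```

This is why markers must be verbatim phrases rather than paraphrases: normalization only absorbs case and punctuation quirks, not reworded text.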
## Files

| File | Role |
| -------------------------- | ------------------------------------------------------------- |
| `audience/fia.mjs` | narration, markers, slide HTML/queries, palette, fixes |
| `audience/general.mjs` | 48-hour public-facing recap (HTML slides, no jeffrey-photos) |
| `audience/jeffrey-24h.mjs` | 24-hour jeffrey-as-protagonist recap (full-bleed photos) |
| `bin/tts.mjs` | POST narration → `/api/say` → `out/recap.mp3` |
| `bin/transcribe.mjs` | `whisper-cli` → `out/words.json` |
| `bin/align.mjs` | match markers in transcript → `out/segments.json` |
| `bin/jeffrey-photos.mjs` | gpt-image-2 + platter refs → `out/jeffrey-photos/<seg>.png` |
| `bin/scout.mjs` | resolve per-slide content queries → `out/assets.json` |
| `bin/slides.mjs` | puppeteer-render slide PNGs (consume assets) + `concat.txt` |
| `bin/subtitles.mjs` | chunk words into pills (apply transcriptFixes) → `subs.json` |
| `bin/build-filter.mjs` | emit ffmpeg filter graph for compose (one overlay per sub) |
| `bin/compose.fish` | ffmpeg compose final mp4 |
| `pipeline.fish` | runs all seven stages |
| `models/ggml-base.en.bin` | whisper model (gitignored, downloaded on first run) |
| `out/` | all generated artifacts (gitignored) |

## Dependencies

- `whisper-cli` (homebrew `whisper-cpp`)
- `ffmpeg` with `libx264`, `aac`, `showwaves`, `drawbox`, `apad`, `movie`, `overlay` (homebrew default)
- `pdftoppm` (homebrew `poppler`) for PDF → PNG in scout
- `node` (uses `oven/node_modules/puppeteer` to avoid extra installs)
- Google Chrome at `/Applications/Google Chrome.app` (puppeteer driver)
- Network access to `aesthetic.computer/api/say` (jeffrey-pvc TTS)
- `OPENAI_API_KEY` (for `jeffrey-photos.mjs`; read from env or
  `aesthetic-computer-vault/.devcontainer/envs/devcontainer.env`); only required
  for audiences that declare per-slide `metaphor` prompts.
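Since this ffmpeg build lacks `drawtext`, the progress bar is the only element drawn at compose time. An illustrative fragment of the `drawbox` piece of the filter graph, echoing the `w='iw*t/$TOTAL'` expression from the architecture notes; the y-position, height, color, and the stand-in total duration of `142.5` seconds (normally read from `out/duration.txt`) are guesses, and the real graph is emitted by `bin/build-filter.mjs`:

```
drawbox=x=0:y=ih-16:w='iw*t/142.5':h=16:color=white@0.8:t=fill
```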
## Jeffrey photos (optional per-audience)

An audience can opt into full-bleed gpt-image-2 photos by adding a `metaphor`
field to each content slide and a `queries.photo: { glob: ... }` that points to
`recap/out/jeffrey-photos/<segment>.png`. `bin/jeffrey-photos.mjs` reads the
metaphor strings, calls `images.edit` with `gpt-image-2` and the platter
SHOOT_REFS + SELFIE_REFS for identity grounding, and writes one PNG per segment.

```fish
# regen all photos for an audience
node bin/jeffrey-photos.mjs jeffrey-24h --force

# regen one segment only
node bin/jeffrey-photos.mjs jeffrey-24h --only 04_platter --force
```

Cost is ~$0.30 per high-quality 1024×1536 generation; ~$2-4 per full recap.
Failures are soft — slides fall back to a dark `${PALETTE.bg}` placeholder when
the photo glob matches nothing, so the pipeline still produces a runnable mp4.
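That soft fallback amounts to a null-guard in the slide body. A hypothetical photo slide following the pattern; the `bleed` class name and the `PALETTE` value are made up for the example:

```javascript
const PALETTE = { bg: "#111111" }; // illustrative palette value

// Photo slide with the soft fallback: full-bleed image when the
// jeffrey-photos glob resolved, dark placeholder otherwise.
const slide = {
  queries: { photo: { glob: "recap/out/jeffrey-photos/04_platter.png" } },
  body: ({ photo }) =>
    photo
      ? `<img class="bleed" src="${photo}" />`
      : `<div class="bleed" style="background:${PALETTE.bg}"></div>`,
};
```

Because `scout.mjs` skips failed queries rather than aborting, `photo` is simply `undefined` when no PNG exists, and the slide still renders.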