# Recap

Generates narrated, captioned video recaps of monorepo activity for a chosen audience (currently fia, jas's girlfriend; trivially extendable to others). The audio is the source of truth — whisper word-level timestamps drive slide durations, so visuals stay in sync with what the voice is actually saying.

The default voice is jeffrey-pvc (the same Professional Voice Clone used in the say piece and the LACMA grant pitch video), called via /api/say in production.
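
For orientation, here is a minimal sketch of what the tts stage does, assuming /api/say accepts a JSON body with the narration text and voice settings and responds with mp3 bytes — bin/tts.mjs is the actual reference:

```js
// Hypothetical sketch of the tts stage — bin/tts.mjs is authoritative.
// The request shape ({ text, ...voice }) is an assumption, not the real API contract.
import { writeFile } from "node:fs/promises";

const { audience } = await import(`./audience/${process.argv[2]}.mjs`);

const res = await fetch("https://aesthetic.computer/api/say", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text: audience.narration, ...audience.voice }),
});
if (!res.ok) throw new Error(`tts failed: ${res.status}`);

await writeFile("out/recap.mp3", Buffer.from(await res.arrayBuffer()));
```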

## Pipeline

```
audience/<name>.mjs            (narration + segment markers + slide HTML/queries + voice + transcriptFixes
                                + optional per-slide metaphor for jeffrey-photos)
       │
       ▼  bin/tts.mjs
out/recap.mp3                  (jeffrey-pvc TTS via /api/say)
       │
       ▼  bin/transcribe.mjs   (whisper-cli, models/ggml-base.en.bin)
out/words.json                 ([{text, fromMs, toMs}, ...])
       │
       ▼  bin/align.mjs        (matches audience.segments[].marker)
out/segments.json              ([{name, startSec, endSec, durationSec}, ...])
       │
       ▼  bin/jeffrey-photos.mjs   (optional; OpenAI gpt-image-2 + platter SHOOT+SELFIE refs)
out/jeffrey-photos/<seg>.png   (cached per segment; --force regenerates; failures are soft)
       │
       ▼  bin/scout.mjs        (resolves per-slide content queries; pdftoppm for PDFs)
out/assets.json                (slide-name → {queryName: dataUrl|commits|paths})
       │
       ▼  bin/slides.mjs       (puppeteer + ywft-processing + purple-pals + scouted assets)
out/slides/*.png               (1080×1920 PNG per segment)
out/concat.txt                 (ffmpeg concat demuxer w/ real durations)
out/duration.txt               (total seconds, including trailing silence)
       │
       ▼  bin/subtitles.mjs    (chunks words.json, applies transcriptFixes, renders pill PNGs)
out/subs/*.png                 (1080×220 transparent subtitle pill per chunk)
out/subs.json                  ([{file, startSec, endSec, text}, ...])
       │
       ▼  bin/build-filter.mjs (emits filter graph: showwaves + drawbox + per-sub overlay)
       ▼  bin/compose.fish
out/recap.mp4                  (1080×1920, h264 + aac, faststart, baked subs)
```
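
The last two stages are the least obvious, so here is a hypothetical sketch of assembling a filter graph in the shape build-filter.mjs emits (showwaves + drawbox + one overlay per subtitle pill). Stream labels, pixel offsets, and sizes below are illustrative assumptions, not the script's actual values:

```js
// Hypothetical filter-graph assembly — bin/build-filter.mjs is the reference.
// Assumed inputs: 0 = out/recap.mp3 (audio), 1 = slideshow video from concat.txt.
import { readFile } from "node:fs/promises";

const subs = JSON.parse(await readFile("out/subs.json", "utf8"));     // [{file, startSec, endSec}]
const total = parseFloat(await readFile("out/duration.txt", "utf8")); // total seconds

const parts = [
  // waveform strip drawn from the narration audio
  "[0:a]showwaves=s=1080x160:mode=line[waves]",
  // paste waves near the bottom, then draw the progress bar:
  // drawbox width grows linearly with time t over the full duration
  `[1:v][waves]overlay=0:1760[base0];[base0]drawbox=x=0:y=0:w='iw*t/${total}':h=12:color=white:t=fill[base1]`,
];

// one movie source + time-gated overlay per subtitle pill
let prev = "base1";
subs.forEach((s, i) => {
  const out = i === subs.length - 1 ? "v" : `base${i + 2}`;
  parts.push(
    `movie=${s.file}[sub${i}]`,
    `[${prev}][sub${i}]overlay=0:1560:enable='between(t,${s.startSec},${s.endSec})'[${out}]`
  );
  prev = out;
});

process.stdout.write(parts.join(";"));
```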

Run end-to-end:

```fish
./pipeline.fish fia            # fresh tts + everything
./pipeline.fish fia --skip-tts # reuse existing out/recap.mp3 (re-align/re-render)
```

First run only — download the whisper model (~141 MB):

```fish
curl -L -o models/ggml-base.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
```

## Architecture decisions

- Audio is the source of truth. Slide durations come from whisper word timestamps, not from hand-tuned guesses. Re-recording the audio (e.g. a re-edit of the narration) automatically retimes the visuals.
- Markers are anchor phrases, not paraphrases. Each audience.segments[] entry has a marker field that must appear in the narration verbatim (modulo whisper transcription quirks — the match is case-insensitive and punctuation-stripped; see the sketch after this list). align.mjs fails loud if any marker is missing.
- End card sits in trailing silence. The last segment uses a synthetic __END__ marker; the audio is padded with apad so the silent end card has time to breathe without truncating the narration.
- Slides are HTML rendered by Chrome. Reuses the oven/ puppeteer install to avoid taking on a new dep. ywft-processing-bold/regular fonts are inlined as base64; unicode-range: U+0020-007E constrains the AC font to ASCII so Chrome falls back to system fonts for ñ, 中文, 日本語, ·, ×, etc.
- Progress bar is drawbox with w='iw*t/$TOTAL'. This ffmpeg build lacks drawtext and subtitles, so visible captions live in the slide PNGs; only the bar (no text) is composited at runtime.
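
A minimal sketch of the marker-matching rule described above (case-insensitive, punctuation-stripped, fail-loud); bin/align.mjs is the reference implementation and may differ in detail:

```js
// Hypothetical marker matching over the whisper word stream — bin/align.mjs is
// the reference. words is out/words.json: [{text, fromMs, toMs}, ...].
const normalize = (s) => s.toLowerCase().replace(/[^a-z0-9\s]/g, "").trim();

function findMarkerStart(words, marker) {
  const target = normalize(marker).split(/\s+/);
  for (let i = 0; i <= words.length - target.length; i++) {
    const hit = target.every((w, j) => normalize(words[i + j].text) === w);
    if (hit) return words[i].fromMs / 1000; // segment startSec = anchor phrase start
  }
  throw new Error(`marker not found in transcript: "${marker}"`); // fail loud
}
```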

## Content queries (scout)

Slide bodies can be functions of resolved query results. scout.mjs runs every query declared on a slide and writes data URLs / commit lists / file paths into out/assets.json. The slide function then receives those values and produces HTML.

Three query shapes are supported:

| Shape | Result |
| --- | --- |
| `{ glob: "<path>" }` | base64 data URL of the first matching image (PNG/JPG/WebP/SVG) |
| `{ glob: "<path>.pdf", pdfPage: 1, pdfWidth: 600 }` | base64 data URL of one PDF page rendered via pdftoppm |
| `{ commits: "<git -E grep regex>", since: "48 hours ago", limit: 5 }` | `[{hash, subject}, ...]` from `git log --grep -E` |
| `{ files: "<glob>", sinceHours: 48, limit: 60 }` | matching paths newer than N hours, sorted newest first |

Globs are repo-relative or absolute. PDF rendering uses 150 DPI by default; pdfWidth scales the longer side. Commit grep is POSIX extended (| alternation works without escaping). Failed queries log a warning and skip the value; the slide function should defensively handle missing results (e.g. ${(commits || []).map(...)}).
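As a sketch of the simplest shape, resolving `{ glob }` to a data URL might look like the following (assumes the glob npm package; bin/scout.mjs is the reference and handles the other shapes too):

```js
// Hypothetical resolver for the { glob } query shape — bin/scout.mjs is the reference.
import { readFile } from "node:fs/promises";
import { glob } from "glob"; // assumption: the glob npm package is available

const MIME = { png: "image/png", jpg: "image/jpeg", jpeg: "image/jpeg",
               webp: "image/webp", svg: "image/svg+xml" };

async function resolveGlobQuery(pattern) {
  const [first] = await glob(pattern, { absolute: true });
  if (!first) return undefined; // failed query: warn upstream, skip the value
  const ext = first.split(".").pop().toLowerCase();
  const bytes = await readFile(first);
  return `data:${MIME[ext] || "application/octet-stream"};base64,${bytes.toString("base64")}`;
}
```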

Example slide entry in an audience config:

"02_notepat": {
  queries: {
    icon: { glob: "ac-electron/build/icon.png" },
    paper: { glob: "system/public/papers.aesthetic.computer/notepat-26-arxiv-cards.pdf",
             pdfPage: 1, pdfWidth: 600 },
    commits: { commits: "^notepat|^build-notepat", since: "48 hours ago", limit: 5 },
  },
  body: ({ icon, paper, commits }) => `
    <div class="frame">
      <img class="brand-icon" src="${icon}" />
      <img class="paper-thumb" src="${paper}" />
      ${(commits || []).map(c => `<div>${c.hash} ${c.subject}</div>`).join("")}
    </div>`,
},

A slide entry can also still be a plain HTML string when no scouting is needed (see 01_title, 03_arena, etc. in audience/fia.mjs).

## Subtitle transcript fixes

Whisper normalizes toward dictionary words — notepat becomes Notepad, baktok becomes Backtalk, menubar becomes menu bar. Fix these per audience without re-running whisper:

```js
transcriptFixes: {
  "Notepad": "notepat",
  "Backtalk": "baktok",
  "menu bar": "menubar",
}
```

Matching is case-insensitive and applied to each subtitle chunk's joined text (so multi-word fixes like `"laid on Linux": "late on Linux"` work).
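
A minimal sketch of how that matching could be implemented — bin/subtitles.mjs is the reference:

```js
// Hypothetical application of transcriptFixes to one chunk's joined text —
// bin/subtitles.mjs is the reference. Keys are whisper's output, values the fix.
const escapeRe = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function applyFixes(text, fixes = {}) {
  for (const [wrong, right] of Object.entries(fixes)) {
    text = text.replace(new RegExp(escapeRe(wrong), "gi"), right); // case-insensitive
  }
  return text;
}

applyFixes("Open the Notepad from the menu bar",
           { "Notepad": "notepat", "menu bar": "menubar" });
// => "Open the notepat from the menubar"
```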

## Adding a new audience

Drop audience/<name>.mjs exporting audience (and a PALETTE if you want to deviate from fia's). Required shape:

```js
export const audience = {
  name: "<name>",
  handle: "<optional handle for the corner bug>",
  voice: { provider: "jeffrey", voice: "neutral:0" },
  narration: "<verbatim text POSTed to /api/say>",
  segments: [
    { name: "01_title",  marker: "<phrase from narration>" },
    { name: "02_topic1", marker: "<phrase from narration>" },
    // ...
    { name: "10_end",    marker: "__END__", trailingSilenceSec: 3 },
  ],
  slides: { "01_title": "<html body>", /* ...one per segment */ },
};
```

Then run `./pipeline.fish <name>`.

## Files

| File | Role |
| --- | --- |
| `audience/fia.mjs` | narration, markers, slide HTML/queries, palette, fixes |
| `audience/general.mjs` | 48-hour public-facing recap (HTML slides, no jeffrey-photos) |
| `audience/jeffrey-24h.mjs` | 24-hour jeffrey-as-protagonist recap (full-bleed photos) |
| `bin/tts.mjs` | POST narration → /api/say → out/recap.mp3 |
| `bin/transcribe.mjs` | whisper-cli → out/words.json |
| `bin/align.mjs` | match markers in transcript → out/segments.json |
| `bin/jeffrey-photos.mjs` | gpt-image-2 + platter refs → out/jeffrey-photos/<seg>.png |
| `bin/scout.mjs` | resolve per-slide content queries → out/assets.json |
| `bin/slides.mjs` | puppeteer-render slide PNGs (consume assets) + concat.txt |
| `bin/subtitles.mjs` | chunk words into pills (apply transcriptFixes) → subs.json |
| `bin/build-filter.mjs` | emit ffmpeg filter graph for compose (one overlay per sub) |
| `bin/compose.fish` | ffmpeg compose final mp4 |
| `pipeline.fish` | runs all seven stages |
| `models/ggml-base.en.bin` | whisper model (gitignored, downloaded on first run) |
| `out/` | all generated artifacts (gitignored) |

## Dependencies

- whisper-cli (homebrew whisper-cpp)
- ffmpeg with libx264, aac, showwaves, drawbox, apad, movie, overlay (homebrew default)
- pdftoppm (homebrew poppler) for PDF → PNG in scout
- node (uses oven/node_modules/puppeteer to avoid extra installs)
- Google Chrome at /Applications/Google Chrome.app (puppeteer driver)
- Network access to aesthetic.computer/api/say (jeffrey-pvc TTS)
- OPENAI_API_KEY (for jeffrey-photos.mjs; read from env or aesthetic-computer-vault/.devcontainer/envs/devcontainer.env); only required for audiences that declare per-slide metaphor prompts

## Jeffrey photos (optional per-audience)

An audience can opt into full-bleed gpt-image-2 photos by adding a `metaphor` field to each content slide and a `queries.photo: { glob: ... }` that points to `recap/out/jeffrey-photos/<segment>.png`. bin/jeffrey-photos.mjs reads the metaphor strings, calls images.edit with gpt-image-2 and the platter SHOOT_REFS + SELFIE_REFS for identity grounding, and writes one PNG per segment.
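
A hypothetical sketch of one generation, assuming the OpenAI Node SDK's images.edit with the reference images passed as files; ref paths, prompt framing, and options below are illustrative, not the script's actual values:

```js
// Hypothetical sketch of one photo generation — bin/jeffrey-photos.mjs is the
// reference; ref paths and prompt framing here are assumptions.
import fs from "node:fs";
import OpenAI, { toFile } from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function generatePhoto(segment, metaphor, refPaths) {
  // SHOOT + SELFIE reference images ground jeffrey's identity
  const refs = await Promise.all(
    refPaths.map((p) => toFile(fs.createReadStream(p), null, { type: "image/png" }))
  );
  const result = await openai.images.edit({
    model: "gpt-image-2",
    image: refs,
    prompt: metaphor, // per-slide metaphor string from the audience config
    size: "1024x1536",
  });
  fs.writeFileSync(
    `out/jeffrey-photos/${segment}.png`,
    Buffer.from(result.data[0].b64_json, "base64")
  );
}
```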

```fish
# regen all photos for an audience
node bin/jeffrey-photos.mjs jeffrey-24h --force

# regen one segment only
node bin/jeffrey-photos.mjs jeffrey-24h --only 04_platter --force
```

Cost is ~$0.30 per high-quality 1024×1536 generation, so ~$2–4 per full recap. Failures are soft — slides fall back to a dark `${PALETTE.bg}` placeholder when the photo glob matches nothing, so the pipeline still produces a runnable mp4.