macos: scripted input + per-frame PNG dump + WAV audio tap
Three mechanisms the demo pipeline needs to turn a live session into a
synced video:
- AC_INJECT_SEQUENCE="key,ms|key,ms|...": timeline of keypresses.
Each entry's ms is the delay after the previous event (delays
accumulate down the timeline), so
"n,2300|o,120|t,120|e,120|p,120|a,120|t,120|enter,400" types
'notepat<enter>' starting 2.3 s after launch with 120 ms between
chars. The main loop fires paired keyboard:down + keyboard:up events
so notepat's key-release handlers run naturally (parser sketch after
this list).
- AC_FRAME_DUMP_DIR=<dir>: after each render_frame(), upscale the
framebuffer by the display density factor (nearest-neighbor) and
write frame_%05d.png. Pairs with ffmpeg for a clean
PNG-sequence-to-video pipeline (dump sketch after this list).
- AC_WAV_OUT=<path>: tap the CoreAudio render callback and append
every frame's interleaved stereo float32 samples to a
WAVE_FORMAT_IEEE_FLOAT @ 48 kHz file. Crucially, the tap state lives
at module scope (g_wav_file / g_wav_samples), not inside struct
Audio, so piece jumps (which destroy and recreate the per-piece
audio engine) don't break the recording. audio_wav_stop() patches
the RIFF and data chunk sizes before close (tap sketch after this
list). The SDL3 backend gets a stub (the demo runs AUDIO=core
exclusively).
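A minimal sketch of the AC_INJECT_SEQUENCE parse + per-frame pump.
inject_key_down()/inject_key_up() are stand-ins for the real
keyboard:down / keyboard:up dispatch, and the entry limit and buffer
sizes are illustrative:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct { char key[16]; int delay_ms; } InjectEntry;

    static InjectEntry g_seq[64];
    static int g_seq_len = 0;

    /* Stand-ins for the real event dispatch. */
    static void inject_key_down(const char *k) { printf("keyboard:down %s\n", k); }
    static void inject_key_up(const char *k)   { printf("keyboard:up   %s\n", k); }

    /* Parse "key,ms|key,ms|..." into (key, delay-after-prior-event) entries. */
    static void inject_parse(const char *spec)
    {
        char buf[1024];
        snprintf(buf, sizeof buf, "%s", spec);
        char *save = NULL;
        for (char *tok = strtok_r(buf, "|", &save);
             tok && g_seq_len < 64;
             tok = strtok_r(NULL, "|", &save)) {
            char *comma = strchr(tok, ',');
            if (!comma) continue;
            *comma = '\0';
            snprintf(g_seq[g_seq_len].key, sizeof g_seq[g_seq_len].key, "%s", tok);
            g_seq[g_seq_len].delay_ms = atoi(comma + 1);
            g_seq_len++;
        }
    }

    /* Called once per main-loop iteration; fires a paired down+up once an
     * entry's relative delay has elapsed. */
    static void inject_pump(int frame_dt_ms)
    {
        static int pos = 0, waited = 0;
        if (pos >= g_seq_len) return;
        waited += frame_dt_ms;
        while (pos < g_seq_len && waited >= g_seq[pos].delay_ms) {
            waited -= g_seq[pos].delay_ms;
            inject_key_down(g_seq[pos].key);
            inject_key_up(g_seq[pos].key);
            pos++;
        }
    }

    int main(void)
    {
        const char *spec = getenv("AC_INJECT_SEQUENCE");
        inject_parse(spec ? spec : "n,2300|o,120|t,120|enter,400");
        for (int t = 0; t < 4000; t += 16)   /* simulate ~60 fps for 4 s */
            inject_pump(16);
        return 0;
    }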
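A sketch of the frame-dump path, assuming an RGBA8 framebuffer and
stb_image_write for the PNG encode (the real writer may differ);
frame_dump() and its parameters are illustrative names:

    #define STB_IMAGE_WRITE_IMPLEMENTATION
    #include "stb_image_write.h"
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    static int g_frame_no = 0;

    /* Upscale the w*h RGBA8 framebuffer by `density` with nearest-neighbor
     * sampling and write it as <dir>/frame_%05d.png. */
    static void frame_dump(const char *dir, const uint32_t *fb,
                           int w, int h, int density)
    {
        int ow = w * density, oh = h * density;
        uint32_t *out = malloc((size_t)ow * oh * sizeof *out);
        if (!out) return;

        for (int y = 0; y < oh; y++) {
            const uint32_t *src = fb + (size_t)(y / density) * w;
            uint32_t *dst = out + (size_t)y * ow;
            for (int x = 0; x < ow; x++)
                dst[x] = src[x / density];   /* replicate the source pixel */
        }

        char path[1024];
        snprintf(path, sizeof path, "%s/frame_%05d.png", dir, g_frame_no++);
        stbi_write_png(path, ow, oh, 4, out, ow * 4);
        free(out);
    }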
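A sketch of the module-scope WAV tap (WAVE_FORMAT_IEEE_FLOAT, 48 kHz
stereo, little-endian host). g_wav_file, g_wav_samples, and
audio_wav_stop() are the names from this change; audio_wav_start()
and audio_wav_append() are illustrative names for the open and
per-render-callback append steps:

    #include <stdint.h>
    #include <stdio.h>

    static FILE    *g_wav_file    = NULL;   /* survives piece jumps */
    static uint64_t g_wav_samples = 0;      /* interleaved float32 samples written */

    static void wav_u16(uint16_t v) { fwrite(&v, 2, 1, g_wav_file); }
    static void wav_u32(uint32_t v) { fwrite(&v, 4, 1, g_wav_file); }

    void audio_wav_start(const char *path)
    {
        g_wav_file = fopen(path, "wb");
        if (!g_wav_file) return;
        fwrite("RIFF", 1, 4, g_wav_file);
        wav_u32(0);                     /* RIFF size, patched on stop */
        fwrite("WAVE", 1, 4, g_wav_file);
        fwrite("fmt ", 1, 4, g_wav_file);
        wav_u32(16);                    /* fmt chunk size */
        wav_u16(3);                     /* WAVE_FORMAT_IEEE_FLOAT */
        wav_u16(2);                     /* channels: stereo */
        wav_u32(48000);                 /* sample rate */
        wav_u32(48000 * 2 * 4);         /* byte rate */
        wav_u16(2 * 4);                 /* block align */
        wav_u16(32);                    /* bits per sample */
        fwrite("data", 1, 4, g_wav_file);
        wav_u32(0);                     /* data size, patched on stop */
    }

    /* Called from the CoreAudio render callback with interleaved stereo. */
    void audio_wav_append(const float *interleaved, uint32_t frames)
    {
        if (!g_wav_file) return;
        fwrite(interleaved, sizeof(float), (size_t)frames * 2, g_wav_file);
        g_wav_samples += (uint64_t)frames * 2;
    }

    void audio_wav_stop(void)
    {
        if (!g_wav_file) return;
        uint32_t data_size = (uint32_t)(g_wav_samples * sizeof(float));
        fseek(g_wav_file, 4, SEEK_SET);
        wav_u32(36 + data_size);        /* RIFF chunk size */
        fseek(g_wav_file, 40, SEEK_SET);
        wav_u32(data_size);             /* data chunk size */
        fclose(g_wav_file);
        g_wav_file = NULL;
    }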
Verified end-to-end: boot-anim → prompt → type 'notepat<enter>' →
jump into notepat → play 'c d e f' → 5 s of synced 1280x800 h264
video plus the stereo float32 audio encoded to aac, muxed cleanly
into an mkv.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>