Buffer HTML output to avoid per-chunk channel mutex overhead

Format.formatter_of_out_channel writes to the channel every time the
formatter flushes its internal buffer. Each write takes the OCaml
multicore channel mutex (trylock + unlock), which callgrind showed at
~3% self-time on html-generate.

Accumulate the output in a Buffer instead and write the whole file in
one call at the end, so the mutex is taken once per page instead of
once per chunk.
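
A minimal sketch of the pattern, for illustration only: write_page and
its render argument are hypothetical names, not the functions touched
by this change, and the initial buffer size is an arbitrary guess.

```ocaml
(* Hypothetical helper: render a page into memory, then write it out
   with a single channel operation. *)
let write_page (path : string) (render : Format.formatter -> unit) : unit =
  let buf = Buffer.create 65536 in
  (* The formatter targets the Buffer, so its flushes never touch a channel. *)
  let ppf = Format.formatter_of_buffer buf in
  render ppf;
  (* Push any output still pending in the formatter into the Buffer. *)
  Format.pp_print_flush ppf ();
  let oc = open_out_bin path in
  Fun.protect
    ~finally:(fun () -> close_out_noerr oc)
    (* One output call for the whole page. *)
    (fun () -> Buffer.output_buffer oc buf)
```

The trade-off is that the whole page is held in memory until it is
written, which is assumed to be acceptable for HTML pages of ordinary
size.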

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>