Merge branch 'perf-investigation-2'
odoc performance investigation: 14 commits that reduce total allocation
by 40% and wall time by 14% on odoc_driver core (OxCaml switch).
Key optimisations:
- Flatten hidden PPX-monomorphization includes at load time
- Memoize doc comment parsing + semantic analysis
- O(n) shadow detection (was O(n^2))
- Hash-first identifier compare (eliminates most compare_val calls in Map lookups)
- Cache mode printing (avoid Format.asprintf per arg)
- segment_to_string: direct string concat instead of Format.asprintf
- Buffered HTML output (avoid channel mutex per chunk)
- Stream Renderer.page children via Seq.t (bounded peak memory)
Details in commits a1802364..a92f00fb.
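
Illustration of the hash-first identifier compare (a minimal sketch, not
the code from these commits; the `ident` type, `make`, and `IdentMap` are
hypothetical names chosen for the example). The idea is to cache a hash in
each identifier and compare the hashes first, so the more expensive
comparison only runs when the hashes are equal:

```ocaml
(* Hypothetical identifier type: the hash is computed once, at construction. *)
type ident = { hash : int; name : string }

let make name = { hash = Hashtbl.hash name; name }

(* Compare the cached hashes first. Fall back to the full comparison
   only when the hashes are equal. *)
let compare a b =
  let c = Int.compare a.hash b.hash in
  if c <> 0 then c else String.compare a.name b.name

(* A Map keyed on identifiers picks up the cheaper comparison. *)
module IdentMap = Map.Make (struct
  type t = ident
  let compare = compare
end)
```

Note that this orders keys by hash rather than by name, so map iteration
order is no longer alphabetical; it only suits code that does not depend on
that order.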