Odoc perf: flatten hidden includes, memoize doc parsing, skip trivial link docs
Three layered optimizations for odoc compile/link performance on
ppx_template-heavy packages (base, core):
1. Loader: flatten includes whose expansion items all have __ names
(ppx_template monomorphization duplicates) into the enclosing
signature, eliminating nested Include nodes that caused 10K+
redundant traversals during compile/link.
2. Loader: memoize Odoc_parser.parse_comment by raw text string.
Container_intf has 155K doc comments but only 33 unique texts
(99.98% cache hit rate), saving ~3.4s of parser time per compile.
3. Link: short-circuit comment_docs when the doc AST contains no
references, headings, or modules to resolve. This avoids rebuilding
155K doc ASTs word-by-word via List.map.
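For illustration, items 2 and 3 can be sketched on a toy AST. This is a
minimal sketch, not the code in this commit: the types and the names
parse, parse_memo, needs_resolution and link_docs are hypothetical, and
the stand-in parser does not model the real Odoc_parser.parse_comment
signature or the real comment_docs traversal.

```ocaml
(* Toy doc AST (assumption); the real odoc comment AST is much richer. *)
type inline = Word of string | Reference of string
type block = Paragraph of inline list | Heading of string
type doc = block list

(* Hypothetical stand-in for the parser. It counts calls so that
   cache hits are observable. *)
let parse_calls = ref 0

let parse (text : string) : doc =
  incr parse_calls;
  let to_inline w =
    if String.length w > 2 && w.[0] = '{' && w.[1] = '!' then Reference w
    else Word w
  in
  [ Paragraph (List.map to_inline (String.split_on_char ' ' text)) ]

(* Item 2: memoize parsing, keyed by the raw comment text. *)
let cache : (string, doc) Hashtbl.t = Hashtbl.create 64

let parse_memo (text : string) : doc =
  match Hashtbl.find_opt cache text with
  | Some doc -> doc
  | None ->
    let doc = parse text in
    Hashtbl.add cache text doc;
    doc

(* Item 3: a cheap scan that decides whether linking has any work to do. *)
let needs_resolution (doc : doc) : bool =
  List.exists
    (function
      | Heading _ -> true
      | Paragraph inlines ->
        List.exists (function Reference _ -> true | Word _ -> false) inlines)
    doc

(* Placeholder resolution step: marks references as resolved. *)
let resolve_inline = function
  | Reference r -> Reference ("resolved:" ^ r)
  | Word _ as w -> w

let resolve_block = function
  | Paragraph inlines -> Paragraph (List.map resolve_inline inlines)
  | Heading _ as h -> h

(* Return the input unchanged (physically the same value) when nothing
   needs resolving, instead of rebuilding it with List.map. *)
let link_docs (doc : doc) : doc =
  if needs_resolution doc then List.map resolve_block doc else doc
```

In this sketch, repeated texts share one parsed value, and a doc with no
references or headings passes through link_docs without any allocation.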
Also adds instrumentation (gated by ODOC_GC_STATS=1):
- Per-subprocess Gc.quick_stat reporting via stderr
- Per-phase include_ call counting with per-location breakdown
- Doc parse timing and cache hit stats
- Per-item timing in the cmt loader
- Driver: top-10-by-allocation report per phase with include counts
- Driver: track all subprocesses (including silent/dependency ones)
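As a rough illustration of the env-gated Gc reporting, here is a minimal
sketch using only the standard library. The helper names
(gc_stats_enabled, allocated_bytes, report_gc_stats) and the output
format are assumptions for this example, not the ones added by this
commit.

```ocaml
(* True only when ODOC_GC_STATS is set to exactly "1". *)
let gc_stats_enabled () =
  match Sys.getenv_opt "ODOC_GC_STATS" with
  | Some "1" -> true
  | _ -> false

(* Total bytes allocated so far: minor + major - promoted words,
   converted from words to bytes. *)
let allocated_bytes (s : Gc.stat) : float =
  (s.Gc.minor_words +. s.Gc.major_words -. s.Gc.promoted_words)
  *. float_of_int (Sys.word_size / 8)

(* Print a one-line summary to stderr; does nothing unless enabled. *)
let report_gc_stats ~phase =
  if gc_stats_enabled () then begin
    let s = Gc.quick_stat () in
    Printf.eprintf
      "[gc] %s: allocated=%.0f bytes minor_collections=%d major_collections=%d\n%!"
      phase (allocated_bytes s) s.Gc.minor_collections s.Gc.major_collections
  end

(* Report once when the subprocess exits. *)
let () = at_exit (fun () -> report_gc_stats ~phase:"example")
```

Gc.quick_stat is used rather than Gc.stat because it does not walk the
heap, so the report itself stays cheap.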
Results on odoc_driver core (vs better-website baseline):
- Compile: 128 GB → 94 GB (-27%)
- Link: 73 GB → 56 GB (-23%)
- Wall time: 549s → 499s (-9%)
- HTML-gen: +28% allocation (known regression: items bypass the
internal_value fast-skip due to the ValueName.Std tag; fix deferred
to a follow-up)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>