Add CANONICALIZATION-PLAN.md: architecture plan for canonicalization v2

Synthesizes three inputs:
- CANONICALIZATION.md (internal deep-dive, 10 shortcomings, 8 research directions)
- CANONICALIZATION-REVIEW.md (Codex automated code review, normalizer bug, acronym loss)
- Research advisor feedback (extraction/resolution split, CONTEXT type, hierarchy,
  sacred vs negotiable invariants, priority reordering)

Key architectural decisions:
1. Split canonicalization into two phases: Extraction (deterministic, per-clause)
   and Resolution (versioned, global, graph-level)
2. Add CONTEXT as 5th canonical type (solves coverage + prose extraction)
3. Sentence-level extraction replacing line-level
4. Scoring rubric replacing binary regex classification
5. Typed edges (constrains, defines, refines, invariant_of) replacing untyped links
6. Hierarchy from heading structure
7. canon_anchor for soft identity (survives rephrasing)
8. LLM-as-normalizer (not extractor) as default
9. Resolution-D-rate as separate health metric

4-sprint roadmap (8 weeks) with task breakdown, risk register,
measurement targets, and 6 decisions requiring team sign-off.
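
As a rough illustration of decision 5 (typed edges), here is a minimal Python sketch. Only the four edge type names come from the plan. The `Edge` class, its `source`/`target` fields, and the `parse_edge` helper are hypothetical names chosen for this example and may not match the real data model.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeType(Enum):
    """The four edge types listed in decision 5."""
    CONSTRAINS = "constrains"
    DEFINES = "defines"
    REFINES = "refines"
    INVARIANT_OF = "invariant_of"


@dataclass(frozen=True)
class Edge:
    # Hypothetical shape: node identifiers are plain strings here.
    source: str
    target: str
    type: EdgeType


def parse_edge(source: str, target: str, label: str) -> Edge:
    """Build a typed edge; an unknown label raises ValueError
    instead of silently becoming an untyped link."""
    return Edge(source, target, EdgeType(label))
```

One possible benefit of an enum over free-form strings is that a misspelled or unsupported edge label fails at construction time rather than surfacing later during graph-level resolution.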