LLM-enhanced canonicalization & classification + E2E success criteria tests
Phase 2: Real LLM Integration
- Added canonicalizer-llm.ts: LLM-enhanced canonical node extraction
with structured JSON prompts, batch processing, and graceful fallback
to rule-based extraction when the LLM is unavailable or a call fails
- Added classifier-llm.ts: LLM-enhanced D-class resolution that
escalates uncertain changes to Claude/GPT for semantic classification,
reducing the D-rate in the trust loop
- Wired LLM-enhanced canonicalization into CLI bootstrap and canonicalize
commands (auto-detects provider from ANTHROPIC_API_KEY/OPENAI_API_KEY)
- Added llm_resolved field to ChangeClassification model
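The provider auto-detection and graceful-fallback behavior above can be sketched roughly as follows. This is an illustrative sketch only; `detectProvider`, `canonicalize`, and `CanonicalNode` are hypothetical names, not the actual exports of canonicalizer-llm.ts:

```typescript
// Hypothetical sketch: pick a provider from env keys, fall back to
// rule-based extraction when no provider is set or the LLM call throws.
type Provider = "anthropic" | "openai" | null;

function detectProvider(env: Record<string, string | undefined>): Provider {
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  if (env.OPENAI_API_KEY) return "openai";
  return null; // no key configured: rule-based path only
}

interface CanonicalNode {
  id: string;
  text: string;
}

async function canonicalize(
  source: string,
  llmExtract: (s: string) => Promise<CanonicalNode[]>,
  ruleExtract: (s: string) => CanonicalNode[],
  provider: Provider,
): Promise<CanonicalNode[]> {
  if (provider === null) return ruleExtract(source);
  try {
    return await llmExtract(source); // LLM-enhanced path
  } catch {
    return ruleExtract(source); // graceful degradation on failure
  }
}
```

The key design point is that the rule-based extractor is always a valid result, so an LLM outage degrades quality rather than availability.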
Phase 1: E2E Integration Tests (PRD §19 Success Criteria)
- §19.1: Delete generated code → full regen succeeds
- §19.2: Clause change invalidates only dependent IU subtree
- §19.3: Boundary linter catches undeclared coupling
- §19.4: Drift detection blocks unlabeled edits
- §19.5: D-rate within acceptable bounds
- §19.6: Shadow pipeline upgrade produces classified diff
- §19.7: Compaction preserves ancestry
- §19.8: Freeq bots perform ingest/canon/plan/regen/status safely
- Multi-spec project lifecycle tests
- Evidence & cascade pipeline E2E
- Full provenance traceability: spec line → clause → canon → IU → file
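As one example of what these criteria exercise, the §19.4 drift check can be sketched as a content-hash comparison: a generated file whose hash no longer matches the recorded one, and whose edit carries no drift label, is blocked. All names here (`GeneratedFile`, `checkDrift`, `driftLabel`) are illustrative assumptions, not the project's actual API:

```typescript
import { createHash } from "node:crypto";

// Hypothetical model of a generated file tracked by the pipeline.
interface GeneratedFile {
  path: string;
  content: string;
  recordedHash: string; // hash captured at generation time
  driftLabel?: string;  // present only on explicitly labeled edits
}

const sha256 = (s: string) =>
  createHash("sha256").update(s).digest("hex");

// "ok" for unmodified files, "labeled" for declared drift,
// "blocked" for unlabeled edits (the §19.4 failure mode).
function checkDrift(file: GeneratedFile): "ok" | "labeled" | "blocked" {
  if (sha256(file.content) === file.recordedHash) return "ok";
  return file.driftLabel ? "labeled" : "blocked";
}
```

The E2E test then only needs to mutate a generated file with and without a label and assert the pipeline's verdict in each case.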
Added test fixtures: spec-gateway.md, spec-notifications.md
233 tests passing across 28 test files (was 201 across 25)