Reference implementation for the Phoenix Architecture. Work in progress. aicoding.leaflet.pub/
ai coding crazy

feat: implement Fowler gap analysis — evaluations, pace layers, conceptual mass, negative knowledge, replacement audit

Five gaps identified from Chad Fowler's 'The Phoenix Architecture':

1. Evaluation vs. Implementation Test Separation
- Evaluation model (durable behavioral assertions at IU boundaries)
- EvaluationStore with coverage analysis and gap detection
- Evaluations bind to boundary_contract, domain_rule, invariant, failure_mode
- Survive regeneration; implementation tests don't
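A minimal sketch of how coverage analysis over evaluations might work, assuming each evaluation lists the canonical requirement IDs it exercises. The `Evaluation` shape and `summarizeCoverage` are hypothetical names for illustration; the repository's actual `EvaluationStore` is not shown on this page.

```typescript
// Hypothetical types: only the four binding names come from the commit message.
type EvaluationBinding = 'boundary_contract' | 'domain_rule' | 'invariant' | 'failure_mode';

interface Evaluation {
  id: string;
  iu_id: string;
  binding: EvaluationBinding;
  canon_ids: string[]; // canonical requirements this evaluation exercises (assumed field)
  assertion: string;   // behavioral statement made at the IU boundary
}

interface CoverageSummary {
  covered: string[];
  uncovered: string[];
  ratio: number; // covered / total, 0 when the IU has no canon IDs
}

// Compare the canon IDs an IU was derived from against those touched by evaluations.
function summarizeCoverage(sourceCanonIds: string[], evaluations: Evaluation[]): CoverageSummary {
  const touched = new Set<string>();
  for (const evaluation of evaluations) {
    for (const canonId of evaluation.canon_ids) {
      touched.add(canonId);
    }
  }
  const covered = sourceCanonIds.filter(id => touched.has(id));
  const uncovered = sourceCanonIds.filter(id => !touched.has(id));
  const ratio = sourceCanonIds.length === 0 ? 0 : covered.length / sourceCanonIds.length;
  return { covered, uncovered, ratio };
}
```

Because the summary depends only on canon IDs and boundary-level assertions, it would stay meaningful after the implementation behind the boundary is regenerated.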

2. Conservation Layers & Pace Layers
- PaceLayer type: surface → service → domain → foundation
- Layer crossing detection (slow-depends-on-fast = violation)
- Pace-appropriate regeneration cadence enforcement
- Conservation flag for surfaces where external trust accumulates
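The layer-crossing rule (a slow layer depending on a fast one is a violation) can be sketched as follows. This is an illustration under assumptions: `LayeredUnit`, `PACE_RANK`, and `findLayerCrossings` are hypothetical names, and the real pace-layer model in the repository may differ.

```typescript
type PaceLayer = 'surface' | 'service' | 'domain' | 'foundation';

// Higher rank means a slower-changing layer (surface is fastest, foundation slowest).
const PACE_RANK: Record<PaceLayer, number> = {
  surface: 0,
  service: 1,
  domain: 2,
  foundation: 3,
};

interface LayeredUnit {
  id: string;
  layer: PaceLayer;
  dependsOn: string[]; // ids of other units
}

interface CrossingViolation {
  from: string;
  to: string;
  message: string;
}

// Flag every dependency edge where a slower unit depends on a faster one.
function findLayerCrossings(units: LayeredUnit[]): CrossingViolation[] {
  const byId = new Map<string, LayeredUnit>();
  for (const unit of units) byId.set(unit.id, unit);

  const violations: CrossingViolation[] = [];
  for (const unit of units) {
    for (const depId of unit.dependsOn) {
      const dep = byId.get(depId);
      if (dep === undefined) continue; // unknown dependency: not judged here
      if (PACE_RANK[unit.layer] > PACE_RANK[dep.layer]) {
        violations.push({
          from: unit.id,
          to: dep.id,
          message: `${unit.layer} unit "${unit.id}" depends on faster ${dep.layer} unit "${dep.id}"`,
        });
      }
    }
  }
  return violations;
}
```

A surface unit depending on a service unit passes this check; a foundation unit depending on a surface unit is reported.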

3. Conceptual Mass Budget
- Mass = contract concepts + dependencies + side channels + canon nodes
- Interaction potential: n*(n-1)/2 combinatorial burden
- Ratchet rule: mass cannot grow without justification
- Thresholds: healthy(7), warning(12), danger(20)
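As a worked illustration of the arithmetic above, the sketch below computes the pairwise interaction count, a threshold band, and a simple ratchet check. The function names and the `elevated` band label are assumptions made for this example; the repository's `conceptual-mass` module is imported by `src/audit.ts` but its body is not shown on this page.

```typescript
// Thresholds as listed in the commit message: healthy(7), warning(12), danger(20).
const MASS_LIMITS = { healthy: 7, warning: 12, danger: 20 };

type MassBand = 'healthy' | 'elevated' | 'warning' | 'danger';

// n concepts can interact in n*(n-1)/2 distinct pairs.
function pairwiseInteractions(mass: number): number {
  return (mass * (mass - 1)) / 2;
}

// Band boundaries use strict "greater than", mirroring the comparisons in src/audit.ts.
function massBand(mass: number): MassBand {
  if (mass > MASS_LIMITS.danger) return 'danger';
  if (mass > MASS_LIMITS.warning) return 'warning';
  if (mass > MASS_LIMITS.healthy) return 'elevated';
  return 'healthy';
}

// Ratchet: growth relative to the previous cycle is a violation unless justified.
function ratchetViolated(current: number, previous?: number, justification?: string): boolean {
  if (previous === undefined) return false; // first cycle: nothing to compare against
  if (justification !== undefined && justification.trim().length > 0) return false;
  return current > previous;
}
```

At the three thresholds the pair counts are 21, 66, and 190, which is why a modest increase in mass produces a much larger increase in what a reader has to hold in mind.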

4. Replacement Audit (phoenix audit)
- 7-dimension assessment: boundary clarity, evaluation coverage, blast radius, deletion safety, pace layer, conceptual mass, negative knowledge
- Readiness gradient: opaque → observable → evaluable → regenerable
- Weighted composite scoring with concrete blockers/recommendations
- New CLI command with formatted output

5. Negative Knowledge (immune memory)
- Records failed generations, rejected approaches, incident constraints
- NegativeKnowledgeStore with active/stale lifecycle
- Consulted during audit; surfaced in recommendations
- Preserved across compaction

Also includes:
- Letter to Chad Fowler re: implementation insights and book gaps
- Gap-filling plan document
- 36 new tests (341 total, all passing)
- Full public API exports in index.ts

+1826
+109
LETTER-TO-FOWLER.md
+# Letter to Chad Fowler
+
+**Re: The Phoenix Architecture — Notes from an Implementation Team**
+
+Chad,
+
+We've been building a system called Phoenix VCS — a regenerative version control system that compiles intent to architecture. We read your draft of *The Phoenix Architecture* after having independently arrived at many of the same conclusions, and the convergence is striking enough that we wanted to share what we've learned from actually building the machinery, and where your writing left gaps that our implementation experience might help fill.
+
+## Where You Were Right and We Can Prove It
+
+**The compilation model is correct.** Our pipeline is: Spec → Clauses → Canonical Requirement Graph → Implementation Units → Generated Code → Evidence → Policy Decision. This maps almost exactly to your four-stage pipeline (Intent → Architectural Compilation → Generation → Evaluation). The intermediate representation metaphor isn't just illustrative — it's literally how the system works. We content-address every node in the graph and track provenance edges between every transformation. When a spec line changes, we can trace exactly which canonical requirements shift, which IUs are invalidated, and which evidence needs to be re-gathered. Selective invalidation is real and it works.
+
+**Evaluations as the durable artifact is the single most important insight in your book.** We built an evidence and policy engine with risk-tiered enforcement (low: typecheck+lint, medium: unit tests required, high: unit+property tests+threat notes, critical: human signoff). But reading your distinction between evaluations and implementation tests was a gut-check moment. Our own test suite — 305 tests, all passing — is almost entirely implementation-coupled. We test that `classifyChange` returns the right `ChangeClass` enum given specific internal data structures. We don't test that "changing a spec line about authentication invalidates only the auth subtree and nothing else." The former dies when we regenerate our own code. The latter would survive. We're eating our own cooking and the recipe has a hole in it.
+
+**The deletion test is the right diagnostic.** Our PRD's first success criterion is: "Delete generated code → full regen succeeds." We have a test for this. But your deeper point — that the *obstacles* to deletion reveal the real architectural debt — is something we only partially internalized. We test deletion of generated output. We don't test deletion of our own pipeline components, which would reveal the coupling we can't see.
+
+**Pace layers explain a design tension we couldn't name.** We built a bootstrap state machine (COLD → WARMING → STEADY_STATE) and suppress D-rate alarms during cold boot. We built risk tiers for IUs. We built shadow pipelines for safe canonicalization upgrades. These are all pace-layer mechanisms, but we designed them ad hoc. Your framework — Surface/Service/Domain/Foundation with explicit dependency-weight classification — would have saved us several wrong turns.
+
+## Where Your Writing Has Gaps Our Implementation Reveals
+
+### 1. The Cold Start Problem Is Harder Than You Acknowledge
+
+Your book assumes intent specifications and evaluation suites exist before regeneration begins. In practice, they don't. The hardest engineering problem in Phoenix isn't regeneration — it's *bootstrapping*. When a team writes their first spec, there is no canonical graph to hash against, no warm context, no baseline for classification. Our system explicitly models this with a two-pass semantic hashing strategy:
+
+- **Pass 1 (Cold):** Compute clause hashes using only local context. Classifier operates conservatively. System marked BOOTSTRAP_COLD.
+- **Pass 2 (Warm):** Re-hash using extracted canonical graph context. Re-classify. System transitions to BOOTSTRAP_WARMING.
+
+We also had to build a D-rate trust loop (target ≤5%, acceptable ≤10%, alarm >15%) to track how often the classifier says "I don't know." During cold start, this rate is high by design. Your book would benefit from a chapter on bootstrapping: how do you go from zero evaluations to a trustworthy evaluation surface? The migration chapter (Chapter 21) touches this but treats it as a legacy-system concern. It's equally a greenfield concern.
+
+### 2. Canonicalization Is a Missing Layer in Your Model
+
+Your pipeline is Intent → Architecture → Code → Evaluation. Ours has a critical intermediate step you don't discuss: **canonicalization** — the process of extracting structured, typed, deduplicated requirements from natural-language spec text.
+
+A spec might say: "Users must authenticate via OAuth2. Authentication tokens expire after 1 hour. Expired tokens must be rejected with a 401 response." That's three sentences. But canonicalization reveals: one Requirement (OAuth2 auth), one Constraint (1-hour expiry), one Invariant (expired → 401), and dependency edges between them.
+
+This is where the real compilation happens — not from intent to code, but from intent to *canonical requirement graph*. Without this step, your "architectural compilation" is a hand-wave. We built two versions:
+
+- **v1:** Heuristic extraction using sentence segmentation, term-reference analysis, and pattern matching.
+- **v2:** LLM-enhanced extraction with self-consistency (medoid selection across multiple generations) and an eval harness with gold-standard fixtures.
+
+The canonicalization layer is where semantic change detection lives. It's where you answer "did this spec edit actually change a requirement, or just rephrase one?" (our A/B/C/D classification). Your book's discussion of behavioral equivalence across regeneration would be strengthened by acknowledging that *determining equivalence* is itself a hard, non-trivial computation that needs its own pipeline.
+
+### 3. The Boundary Validator Needs Teeth
+
+You write extensively about boundaries, but your treatment is largely diagnostic ("ask whether the boundary holds"). We built an architectural linter that enforces boundaries mechanically:
+
+```yaml
+dependencies:
+  code:
+    allowed_ius: [AuthIU]
+    forbidden_ius: [InternalAdminIU]
+    forbidden_packages: [fs, child_process]
+  side_channels:
+    databases: [users_db]
+    external_apis: [oauth_provider]
+```
+
+Post-generation, we extract the actual dependency graph, validate it against the declared boundary policy, and emit diagnostics with configurable severity (error vs. warning). Side-channel dependencies (databases, queues, caches, config, external APIs, files) create graph edges for invalidation.
+
+Your book would benefit from being more prescriptive here. "Clean boundaries" is advice. A boundary policy schema with mechanical enforcement is architecture. The distinction matters because, as you correctly note, generated code couples things that shouldn't be coupled — not out of malice but because coupling is the shortest path to a correct result. You need machinery that catches this, not just principles that warn against it.
+
+### 4. Shadow Pipelines Deserve More Than a Mention
+
+Your discussion of rollout controls (canary, traffic splitting, comparison) is good but brief. We found that shadow pipelines for the *canonicalization layer itself* are essential. When you upgrade your extraction model, prompt pack, or classification rules, you need to run old and new pipelines in parallel and diff the outputs:
+
+- `node_change_pct` ≤3%: SAFE
+- `node_change_pct` ≤25%, no orphan nodes: COMPACTION EVENT
+- Orphan nodes or excessive churn: REJECT
+
+This is meta-regeneration — regenerating the machinery that does the regeneration. Your book discusses upgrading implementations but doesn't discuss upgrading the extraction and compilation toolchain itself, which is where the most dangerous drift can occur.
+
+### 5. `phoenix status` Is the Entire Product
+
+You write: "If `phoenix status` is trusted, Phoenix becomes the coordination substrate. If status is noisy or wrong, the system dies." We arrived at this conclusion independently, and it deserves more emphasis in your book.
+
+Every diagnostic in our system is structured:
+
+```
+severity: error|warning|info
+category: boundary|d-rate|drift|canon|evidence
+subject: <IU or spec reference>
+message: <human-readable explanation>
+recommended_actions: [<concrete steps>]
+```
+
+The Trust Dashboard is the UX. Not the generation. Not the canonicalization. The dashboard. Because the moment an engineer looks at `phoenix status` and sees noise they can't act on, they stop trusting the system, and a system nobody trusts is a system nobody uses.
+
+Your Chapter 7 (Gradient of Trust) is excellent theory. It would be stronger with a section on *how trust is surfaced* — the UX of trust. A trust gradient that exists in the architecture but isn't visible in the developer's daily experience doesn't function as a design tool.
+
+## What We're Building Next (Informed by Your Book)
+
+1. **Separating evaluations from implementation tests** — making the durable behavioral truth surface a first-class, independently versioned artifact.
+2. **Conservation layers as explicit metadata** — tagging IUs and boundaries with pace-layer classification that drives different regeneration policies.
+3. **Queryable provenance** — moving from "provenance edges exist" to "the system can answer: why does this IU exist in this form?"
+4. **Conceptual mass budgets** — measuring and ratcheting cognitive burden per IU across regeneration cycles.
+5. **A `phoenix audit` command** — the replacement audit from your Chapter 4 as a concrete CLI tool.
+6. **Negative knowledge preservation** — recording what was tried and failed in the provenance graph.
+
+## A Question for You
+
+Your book is careful to say that Phoenix Architecture applies partially to safety-critical systems and may not be viable in organizations with rigid change-management taxonomies. But you don't address the inverse question: **what happens when the canonicalization and evaluation toolchain itself needs to be trusted?**
+
+We're building a system that determines what changed, what's affected, and what needs to be re-verified. If that determination is wrong — if a spec change is classified as "trivial formatting" when it's actually a "contextual semantic shift" — the entire trust model collapses silently. Who watches the watchmen?
+
+Our answer so far is the D-rate trust loop and shadow pipelines. But we think this deserves treatment as a first-class architectural concern — the **meta-trust problem** — in any serious book on regenerative systems.
+
+We'd welcome the conversation.
+
+— The Phoenix VCS Team
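The shadow-pipeline thresholds listed in section 4 of the letter can be expressed as a small decision function. This is a sketch, not the project's implementation: `classifyShadowDiff` and its parameters are hypothetical names, and it assumes that any orphan node forces a rejection even when churn is low, a case the letter does not spell out.

```typescript
type ShadowVerdict = 'SAFE' | 'COMPACTION_EVENT' | 'REJECT';

// nodeChangePct: percentage (0-100) of canonical nodes that differ between old and new pipelines.
// orphanNodes: count of nodes in the new graph with no provenance back to the spec.
function classifyShadowDiff(nodeChangePct: number, orphanNodes: number): ShadowVerdict {
  if (orphanNodes > 0) return 'REJECT';              // assumption: orphans always reject
  if (nodeChangePct <= 3) return 'SAFE';             // letter: <=3% is safe
  if (nodeChangePct <= 25) return 'COMPACTION_EVENT'; // letter: <=25% with no orphans
  return 'REJECT';                                   // excessive churn
}
```

Checking orphans first keeps the three outcomes mutually exclusive, so every pipeline upgrade maps to exactly one verdict.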
+77
PLAN-FOWLER-GAPS.md
+# Plan: Fill Gaps from The Phoenix Architecture
+
+Based on reading Chad Fowler's book and comparing against our implementation, these are the gaps worth filling — things we haven't built that are architecturally significant.
+
+## Gap 1: Evaluation vs. Implementation Test Separation
+
+**Book insight:** Evaluations bind to behavior at boundaries. Implementation tests bind to code internals. Only evaluations survive regeneration. "Would this assertion still be meaningful if the entire implementation were replaced tomorrow?"
+
+**Our gap:** All 305 tests are implementation-coupled. No separation between durable behavioral evaluations and disposable implementation scaffolding.
+
+**Fix:**
+- Add `evaluations/` directory as first-class, independently versioned behavioral truth surface
+- Create evaluation model types (behavioral assertions at IU boundaries)
+- Add `EvaluationStore` for persistence across regeneration cycles
+- Evaluations reference IU contracts and boundary behaviors, never internal function signatures
+- `phoenix status` reports evaluation coverage gaps
+- Add CLI: `phoenix eval` to run evaluations, `phoenix eval:coverage` to report gaps
+
+## Gap 2: Conservation Layers as First-Class Concept
+
+**Book insight:** Any surface where external trust accumulates (UI, public APIs, event schemas) should be tagged as a conservation layer with a slower regeneration cadence.
+
+**Our gap:** IUs have risk tiers but no pace-layer classification. No concept of conservation surfaces.
+
+**Fix:**
+- Add `pace_layer` field to IU model: surface | service | domain | foundation
+- Add `conservation` boolean flag — marks surfaces where external parties depend on stability
+- Boundary validator enforces that conservation-layer IUs cannot be regenerated without explicit approval
+- `phoenix status` surfaces pace-layer violations (fast-layer changes touching slow-layer boundaries)
+
+## Gap 3: Conceptual Mass Budget
+
+**Book insight:** Conceptual mass compounds combinatorially. Each concept interacts with existing concepts. Treat it as a budget with a cap, not a backlog that grows freely.
+
+**Our gap:** No measurement of cognitive burden per IU. No ratchet preventing mass growth.
+
+**Fix:**
+- Define conceptual mass metric per IU: count of distinct concepts (types, contracts, dependencies, side channels)
+- Track mass across regeneration cycles in manifest
+- Ratchet rule: mass cannot grow across two consecutive regeneration cycles without explicit justification
+- `phoenix status` warns when mass exceeds threshold or grows without justification
+
+## Gap 4: Replacement Audit (`phoenix audit`)
+
+**Book insight:** "Pick a component and ask: could I replace this implementation entirely and have its dependents not notice?" The obstacles reveal identity debt.
+
+**Our gap:** We have the deletion test in e2e but no CLI command that runs the replacement audit as a diagnostic.
+
+**Fix:**
+- Add `phoenix audit` CLI command
+- For each IU: assess boundary clarity, evaluation coverage, blast radius, deletion safety
+- Score each IU on a readiness gradient: opaque → observable → evaluable → regenerable
+- Output a structured audit report with specific blockers and recommended actions
+
+## Gap 5: Negative Knowledge in Provenance
+
+**Book insight:** "What failed matters as much as what succeeded, and it disappears first." Failed generation attempts, rejected approaches, incident-driven constraints should be preserved.
+
+**Our gap:** Provenance edges record what happened. They don't record what was tried and rejected, or why constraints exist.
+
+**Fix:**
+- Add `NegativeKnowledge` type: records failed attempts, rejected approaches, incident references
+- Attach to canonical nodes and IUs as provenance annotations
+- Preserved across compaction (like approvals and signatures)
+- `phoenix status` surfaces when regeneration is attempted without consulting negative knowledge
+
+## Implementation Order
+
+1. **Gap 1** (Evaluations) — foundational, everything else builds on it
+2. **Gap 2** (Conservation/Pace Layers) — extends IU model
+3. **Gap 3** (Conceptual Mass) — extends manifest tracking
+4. **Gap 4** (Audit command) — uses all of the above
+5. **Gap 5** (Negative Knowledge) — extends provenance
+
+## Estimated Scope
+
+Each gap is ~100-300 lines of model + logic + tests. Total: ~800-1500 lines new code.
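For Gap 5 in the plan, a negative-knowledge record and the "regeneration attempted without consulting it" check might look roughly like this. The `subject_id` field and the `failed_generation` and `incident_constraint` kinds appear in `src/audit.ts`; everything else here (`rejected_approach`, `status`, `summary`, `unconsultedRecords`) is an assumption made for illustration.

```typescript
type NegativeKnowledgeKind = 'failed_generation' | 'rejected_approach' | 'incident_constraint';

interface NegativeKnowledgeRecord {
  id: string;
  subject_id: string;          // IU or canonical node the record is attached to
  kind: NegativeKnowledgeKind;
  summary: string;             // what was tried, and why it failed or was rejected
  status: 'active' | 'stale';  // assumed encoding of the active/stale lifecycle
}

// Return the active records for a subject that the caller has not acknowledged.
// A non-empty result is what a status command could surface as a warning.
function unconsultedRecords(
  records: NegativeKnowledgeRecord[],
  subjectId: string,
  consultedIds: string[],
): NegativeKnowledgeRecord[] {
  return records.filter(record =>
    record.subject_id === subjectId &&
    record.status === 'active' &&
    consultedIds.indexOf(record.id) === -1
  );
}
```

Stale records are skipped rather than deleted, which matches the requirement that negative knowledge survive compaction.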
+445
src/audit.ts
··· 1 + /** 2 + * Replacement Audit — the diagnostic from Chapter 4 of The Phoenix Architecture. 3 + * 4 + * "Pick a component and ask: could I replace this implementation entirely 5 + * and have its dependents not notice?" 6 + * 7 + * Assesses each IU on: 8 + * 1. Boundary clarity — are contracts explicit and complete? 9 + * 2. Evaluation coverage — can a replacement be verified? 10 + * 3. Blast radius — how many dependents break if replacement goes wrong? 11 + * 4. Deletion safety — can it be removed without uncontrolled failure? 12 + * 5. Pace layer appropriateness — is regeneration cadence correct? 13 + * 6. Conceptual mass — is cognitive burden within budget? 14 + * 7. Negative knowledge — are past failures consulted? 15 + */ 16 + 17 + import type { ImplementationUnit } from './models/iu.js'; 18 + import type { EvaluationCoverage } from './models/evaluation.js'; 19 + import type { ConceptualMassReport } from './models/conceptual-mass.js'; 20 + import type { PaceLayerMetadata } from './models/pace-layer.js'; 21 + import type { NegativeKnowledge } from './models/negative-knowledge.js'; 22 + import { 23 + computeConceptualMass, 24 + interactionPotential, 25 + checkRatchet, 26 + MASS_THRESHOLDS, 27 + } from './models/conceptual-mass.js'; 28 + 29 + /** 30 + * Readiness gradient — from The Phoenix Architecture Chapter 21. 
31 + * 32 + * opaque → behavior unknown, deeply coupled 33 + * observable → behavior documented, boundaries traced 34 + * evaluable → evaluations capture real behavior 35 + * regenerable → safe to delete and replace 36 + */ 37 + export type ReadinessLevel = 'opaque' | 'observable' | 'evaluable' | 'regenerable'; 38 + 39 + export interface AuditResult { 40 + iu_id: string; 41 + iu_name: string; 42 + readiness: ReadinessLevel; 43 + score: number; // 0-100 44 + boundary_clarity: AuditDimension; 45 + evaluation_coverage: AuditDimension; 46 + blast_radius: AuditDimension; 47 + deletion_safety: AuditDimension; 48 + pace_layer: AuditDimension; 49 + conceptual_mass: AuditDimension; 50 + negative_knowledge: AuditDimension; 51 + blockers: AuditBlocker[]; 52 + recommendations: string[]; 53 + } 54 + 55 + export interface AuditDimension { 56 + name: string; 57 + score: number; // 0-100 58 + status: 'good' | 'warning' | 'critical'; 59 + detail: string; 60 + } 61 + 62 + export interface AuditBlocker { 63 + category: 'boundary' | 'evaluation' | 'coupling' | 'mass' | 'pace' | 'negative_knowledge'; 64 + severity: 'error' | 'warning'; 65 + message: string; 66 + recommended_action: string; 67 + } 68 + 69 + export interface AuditInput { 70 + iu: ImplementationUnit; 71 + allIUs: ImplementationUnit[]; 72 + evalCoverage: EvaluationCoverage; 73 + paceLayer?: PaceLayerMetadata; 74 + negativeKnowledge: NegativeKnowledge[]; 75 + previousMass?: number; 76 + } 77 + 78 + /** 79 + * Run the replacement audit on a single IU. 80 + */ 81 + export function auditIU(input: AuditInput): AuditResult { 82 + const { iu, allIUs, evalCoverage, paceLayer, negativeKnowledge, previousMass } = input; 83 + const blockers: AuditBlocker[] = []; 84 + const recommendations: string[] = []; 85 + 86 + // 1. Boundary clarity 87 + const boundaryClarity = assessBoundaryClarity(iu, blockers); 88 + 89 + // 2. Evaluation coverage 90 + const evalDimension = assessEvaluationCoverage(iu, evalCoverage, blockers); 91 + 92 + // 3. 
Blast radius 93 + const blastRadius = assessBlastRadius(iu, allIUs, blockers); 94 + 95 + // 4. Deletion safety (composite of boundary + eval + blast radius) 96 + const deletionSafety = assessDeletionSafety(boundaryClarity, evalDimension, blastRadius); 97 + 98 + // 5. Pace layer 99 + const paceDimension = assessPaceLayer(iu, paceLayer, blockers); 100 + 101 + // 6. Conceptual mass 102 + const massDimension = assessConceptualMass(iu, previousMass, blockers); 103 + 104 + // 7. Negative knowledge 105 + const nkDimension = assessNegativeKnowledge(iu, negativeKnowledge, blockers, recommendations); 106 + 107 + // Composite score (weighted) 108 + const score = Math.round( 109 + boundaryClarity.score * 0.20 + 110 + evalDimension.score * 0.25 + 111 + blastRadius.score * 0.15 + 112 + deletionSafety.score * 0.15 + 113 + paceDimension.score * 0.10 + 114 + massDimension.score * 0.10 + 115 + nkDimension.score * 0.05 116 + ); 117 + 118 + // Readiness level 119 + const readiness = scoreToReadiness(score, blockers); 120 + 121 + // Generate recommendations 122 + if (evalCoverage.gaps.length > 0) { 123 + recommendations.push(`Address ${evalCoverage.gaps.length} evaluation gap(s) before regenerating`); 124 + } 125 + if (boundaryClarity.score < 50) { 126 + recommendations.push('Define explicit boundary contracts before attempting regeneration'); 127 + } 128 + if (blastRadius.score < 50) { 129 + recommendations.push('Reduce blast radius by introducing interface boundaries with dependents'); 130 + } 131 + 132 + return { 133 + iu_id: iu.iu_id, 134 + iu_name: iu.name, 135 + readiness, 136 + score, 137 + boundary_clarity: boundaryClarity, 138 + evaluation_coverage: evalDimension, 139 + blast_radius: blastRadius, 140 + deletion_safety: deletionSafety, 141 + pace_layer: paceDimension, 142 + conceptual_mass: massDimension, 143 + negative_knowledge: nkDimension, 144 + blockers, 145 + recommendations, 146 + }; 147 + } 148 + 149 + /** 150 + * Audit all IUs in the system. 
151 + */ 152 + export function auditAll( 153 + ius: ImplementationUnit[], 154 + evalCoverages: Map<string, EvaluationCoverage>, 155 + paceLayers: Map<string, PaceLayerMetadata>, 156 + negativeKnowledge: NegativeKnowledge[], 157 + previousMasses: Map<string, number>, 158 + ): AuditResult[] { 159 + return ius.map(iu => auditIU({ 160 + iu, 161 + allIUs: ius, 162 + evalCoverage: evalCoverages.get(iu.iu_id) ?? emptyEvalCoverage(iu), 163 + paceLayer: paceLayers.get(iu.iu_id), 164 + negativeKnowledge: negativeKnowledge.filter(nk => nk.subject_id === iu.iu_id), 165 + previousMass: previousMasses.get(iu.iu_id), 166 + })); 167 + } 168 + 169 + // ─── Dimension Assessors ───────────────────────────────────────────────────── 170 + 171 + function assessBoundaryClarity(iu: ImplementationUnit, blockers: AuditBlocker[]): AuditDimension { 172 + let score = 0; 173 + const bp = iu.boundary_policy; 174 + const contract = iu.contract; 175 + 176 + // Contract completeness 177 + if (contract.description.length > 0) score += 15; 178 + if (contract.inputs.length > 0) score += 20; 179 + if (contract.outputs.length > 0) score += 20; 180 + if (contract.invariants.length > 0) score += 15; 181 + 182 + // Boundary policy declared 183 + const hasAllowedIUs = bp.code.allowed_ius.length > 0; 184 + const hasForbiddenIUs = bp.code.forbidden_ius.length > 0 || bp.code.forbidden_packages.length > 0; 185 + if (hasAllowedIUs || hasForbiddenIUs) score += 15; 186 + 187 + // Side channels declared 188 + const sideChannels = Object.values(bp.side_channels).flat(); 189 + if (sideChannels.length > 0) score += 15; 190 + 191 + if (score < 40) { 192 + blockers.push({ 193 + category: 'boundary', 194 + severity: 'error', 195 + message: `${iu.name} has weak boundary definition (score: ${score}/100)`, 196 + recommended_action: 'Define explicit inputs, outputs, invariants, and boundary policy', 197 + }); 198 + } 199 + 200 + return { 201 + name: 'Boundary Clarity', 202 + score: Math.min(score, 100), 203 + status: score 
>= 70 ? 'good' : score >= 40 ? 'warning' : 'critical', 204 + detail: `Contract: ${contract.inputs.length} inputs, ${contract.outputs.length} outputs, ${contract.invariants.length} invariants`, 205 + }; 206 + } 207 + 208 + function assessEvaluationCoverage( 209 + iu: ImplementationUnit, 210 + coverage: EvaluationCoverage, 211 + blockers: AuditBlocker[], 212 + ): AuditDimension { 213 + let score = Math.round(coverage.coverage_ratio * 60); 214 + 215 + // Bonus for diversity of evaluation bindings 216 + const bindingCount = Object.values(coverage.by_binding).filter(v => v > 0).length; 217 + score += bindingCount * 8; 218 + 219 + // Penalty for gaps 220 + score -= coverage.gaps.length * 5; 221 + score = Math.max(0, Math.min(100, score)); 222 + 223 + if (coverage.total_evaluations === 0) { 224 + blockers.push({ 225 + category: 'evaluation', 226 + severity: 'error', 227 + message: `${iu.name} has no behavioral evaluations`, 228 + recommended_action: 'Write evaluations at the IU boundary before regenerating', 229 + }); 230 + } else if (coverage.gaps.length > 2) { 231 + blockers.push({ 232 + category: 'evaluation', 233 + severity: 'warning', 234 + message: `${iu.name} has ${coverage.gaps.length} evaluation gaps`, 235 + recommended_action: 'Address evaluation gaps to improve regeneration safety', 236 + }); 237 + } 238 + 239 + return { 240 + name: 'Evaluation Coverage', 241 + score, 242 + status: score >= 70 ? 'good' : score >= 40 ? 
'warning' : 'critical', 243 + detail: `${coverage.total_evaluations} evaluations, ${Math.round(coverage.coverage_ratio * 100)}% canon coverage, ${coverage.gaps.length} gaps`, 244 + }; 245 + } 246 + 247 + function assessBlastRadius( 248 + iu: ImplementationUnit, 249 + allIUs: ImplementationUnit[], 250 + blockers: AuditBlocker[], 251 + ): AuditDimension { 252 + // Count how many other IUs depend on this one 253 + const dependentCount = allIUs.filter(other => 254 + other.iu_id !== iu.iu_id && other.dependencies.includes(iu.iu_id) 255 + ).length; 256 + 257 + // Invert: fewer dependents = higher score 258 + const maxDeps = Math.max(allIUs.length - 1, 1); 259 + const score = Math.round((1 - dependentCount / maxDeps) * 100); 260 + 261 + if (dependentCount > 3) { 262 + blockers.push({ 263 + category: 'coupling', 264 + severity: 'warning', 265 + message: `${iu.name} has ${dependentCount} dependents — wide blast radius`, 266 + recommended_action: 'Consider introducing interface boundaries to reduce coupling', 267 + }); 268 + } 269 + 270 + return { 271 + name: 'Blast Radius', 272 + score, 273 + status: score >= 70 ? 'good' : score >= 40 ? 'warning' : 'critical', 274 + detail: `${dependentCount} dependent IU(s)`, 275 + }; 276 + } 277 + 278 + function assessDeletionSafety( 279 + boundary: AuditDimension, 280 + evaluation: AuditDimension, 281 + blastRadius: AuditDimension, 282 + ): AuditDimension { 283 + // Deletion safety is the minimum of the three foundations 284 + const score = Math.min(boundary.score, evaluation.score, blastRadius.score); 285 + 286 + return { 287 + name: 'Deletion Safety', 288 + score, 289 + status: score >= 70 ? 'good' : score >= 40 ? 
'warning' : 'critical', 290 + detail: `Min of boundary (${boundary.score}), eval (${evaluation.score}), blast (${blastRadius.score})`, 291 + }; 292 + } 293 + 294 + function assessPaceLayer( 295 + iu: ImplementationUnit, 296 + paceLayer: PaceLayerMetadata | undefined, 297 + blockers: AuditBlocker[], 298 + ): AuditDimension { 299 + if (!paceLayer) { 300 + blockers.push({ 301 + category: 'pace', 302 + severity: 'warning', 303 + message: `${iu.name} has no pace layer classification`, 304 + recommended_action: 'Classify IU into a pace layer: surface, service, domain, or foundation', 305 + }); 306 + return { 307 + name: 'Pace Layer', 308 + score: 50, 309 + status: 'warning', 310 + detail: 'No pace layer classification', 311 + }; 312 + } 313 + 314 + let score = 70; // Classified is already good 315 + if (paceLayer.classification_rationale !== 'Default classification — needs review') { 316 + score += 15; // Reviewed classification 317 + } 318 + if (paceLayer.conservation) { 319 + score += 15; // Conservation is explicitly declared 320 + } 321 + 322 + return { 323 + name: 'Pace Layer', 324 + score: Math.min(score, 100), 325 + status: score >= 70 ? 'good' : score >= 40 ? 'warning' : 'critical', 326 + detail: `${paceLayer.pace_layer} layer, ${paceLayer.conservation ? 
'conservation' : 'non-conservation'}, weight: ${paceLayer.dependency_weight}`, 327 + }; 328 + } 329 + 330 + function assessConceptualMass( 331 + iu: ImplementationUnit, 332 + previousMass: number | undefined, 333 + blockers: AuditBlocker[], 334 + ): AuditDimension { 335 + const sideChannelCount = Object.values(iu.boundary_policy.side_channels).flat().length; 336 + 337 + const mass = computeConceptualMass({ 338 + contract_inputs: iu.contract.inputs.length, 339 + contract_outputs: iu.contract.outputs.length, 340 + contract_invariants: iu.contract.invariants.length, 341 + dependency_count: iu.dependencies.length, 342 + side_channel_count: sideChannelCount, 343 + canon_node_count: iu.source_canon_ids.length, 344 + file_count: iu.output_files.length, 345 + }); 346 + 347 + const ip = interactionPotential(mass); 348 + const ratchetViolation = checkRatchet(mass, previousMass); 349 + 350 + // Score: lower mass = higher score 351 + let score = 100; 352 + if (mass > MASS_THRESHOLDS.danger) score = 20; 353 + else if (mass > MASS_THRESHOLDS.warning) score = 50; 354 + else if (mass > MASS_THRESHOLDS.healthy) score = 70; 355 + 356 + if (ratchetViolation) { 357 + score -= 20; 358 + blockers.push({ 359 + category: 'mass', 360 + severity: 'warning', 361 + message: `${iu.name} conceptual mass grew from ${previousMass} to ${mass} (ratchet violation)`, 362 + recommended_action: 'Compact: reduce concepts, merge redundant abstractions, or split the IU', 363 + }); 364 + } 365 + 366 + if (mass > MASS_THRESHOLDS.danger) { 367 + blockers.push({ 368 + category: 'mass', 369 + severity: 'error', 370 + message: `${iu.name} has conceptual mass ${mass} (>${MASS_THRESHOLDS.danger}): exceeds working memory`, 371 + recommended_action: 'This IU is too complex for one person to reason about safely. Split it.', 372 + }); 373 + } 374 + 375 + return { 376 + name: 'Conceptual Mass', 377 + score: Math.max(0, score), 378 + status: score >= 70 ? 'good' : score >= 40 ? 
'warning' : 'critical', 379 + detail: `Mass: ${mass}, interactions: ${ip}${ratchetViolation ? ' ⚠ RATCHET VIOLATION' : ''}${previousMass !== undefined ? ` (prev: ${previousMass})` : ''}`, 380 + }; 381 + } 382 + 383 + function assessNegativeKnowledge( 384 + iu: ImplementationUnit, 385 + nk: NegativeKnowledge[], 386 + blockers: AuditBlocker[], 387 + recommendations: string[], 388 + ): AuditDimension { 389 + // Having negative knowledge is good — it means lessons are captured 390 + const score = nk.length > 0 ? 80 : 60; 391 + 392 + if (nk.length > 0) { 393 + const constraints = nk.filter(n => n.kind === 'incident_constraint'); 394 + if (constraints.length > 0) { 395 + recommendations.push( 396 + `Consult ${constraints.length} incident constraint(s) before regenerating ${iu.name}` 397 + ); 398 + } 399 + const failedGens = nk.filter(n => n.kind === 'failed_generation'); 400 + if (failedGens.length > 0) { 401 + recommendations.push( 402 + `${failedGens.length} prior generation attempt(s) failed for ${iu.name} — review before retrying` 403 + ); 404 + } 405 + } 406 + 407 + return { 408 + name: 'Negative Knowledge', 409 + score, 410 + status: score >= 70 ? 'good' : 'warning', 411 + detail: nk.length > 0 412 + ? 
`${nk.length} record(s): ${nk.map(n => n.kind).join(', ')}` 413 + : 'No negative knowledge recorded', 414 + }; 415 + } 416 + 417 + // ─── Helpers ───────────────────────────────────────────────────────────────── 418 + 419 + function scoreToReadiness(score: number, blockers: AuditBlocker[]): ReadinessLevel { 420 + const hasErrors = blockers.some(b => b.severity === 'error'); 421 + if (hasErrors || score < 30) return 'opaque'; 422 + if (score < 50) return 'observable'; 423 + if (score < 75) return 'evaluable'; 424 + return 'regenerable'; 425 + } 426 + 427 + function emptyEvalCoverage(iu: ImplementationUnit): EvaluationCoverage { 428 + return { 429 + iu_id: iu.iu_id, 430 + iu_name: iu.name, 431 + total_evaluations: 0, 432 + by_binding: { domain_rule: 0, boundary_contract: 0, constraint: 0, invariant: 0, failure_mode: 0 }, 433 + by_origin: { specified: 0, characterization: 0, incident: 0, audit: 0 }, 434 + canon_ids_covered: [], 435 + canon_ids_uncovered: iu.source_canon_ids, 436 + coverage_ratio: 0, 437 + conservation_count: 0, 438 + gaps: [{ 439 + category: 'missing_boundary', 440 + subject: iu.iu_id, 441 + message: `No evaluations exist for ${iu.name}`, 442 + recommended_action: 'Write behavioral evaluations before attempting regeneration', 443 + }], 444 + }; 445 + }
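The readiness gradient in `scoreToReadiness` can be exercised in isolation. The sketch below is a minimal, self-contained copy of that helper, assuming local stand-ins for the `ReadinessLevel` and `AuditBlocker` types (the real definitions live in `src/audit.ts` and may carry more fields). It illustrates that a single `error` blocker forces `opaque` regardless of the numeric score.

```typescript
// Local stand-ins for types exported by src/audit.ts (assumed shapes).
type ReadinessLevel = 'opaque' | 'observable' | 'evaluable' | 'regenerable';

interface AuditBlocker {
  category: string;
  severity: 'error' | 'warning';
  message: string;
  recommended_action: string;
}

// Same banding as the helper in the diff: <30 opaque, <50 observable,
// <75 evaluable, otherwise regenerable; any error blocker means opaque.
function scoreToReadiness(score: number, blockers: AuditBlocker[]): ReadinessLevel {
  const hasErrors = blockers.some(b => b.severity === 'error');
  if (hasErrors || score < 30) return 'opaque';
  if (score < 50) return 'observable';
  if (score < 75) return 'evaluable';
  return 'regenerable';
}

// Hypothetical blocker, shaped like the one pushed by assessConceptualMass.
const massError: AuditBlocker = {
  category: 'mass',
  severity: 'error',
  message: 'TestModule has conceptual mass 21 (>20): exceeds working memory',
  recommended_action: 'Split it.',
};

console.log(scoreToReadiness(82, []));          // regenerable
console.log(scoreToReadiness(82, [massError])); // opaque
console.log(scoreToReadiness(60, []));          // evaluable
```

Warning-severity blockers do not change the level on their own; they only matter through whatever score penalty the dimension applied.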
+129
src/cli.ts
··· 56 56 // LLM 57 57 import { resolveProvider, describeAvailability } from './llm/resolve.js'; 58 58 59 + // Audit & Fowler gaps 60 + import { auditIU, auditAll } from './audit.js'; 61 + import type { AuditResult, ReadinessLevel } from './audit.js'; 62 + import { EvaluationStore } from './store/evaluation-store.js'; 63 + import { NegativeKnowledgeStore } from './store/negative-knowledge-store.js'; 64 + import type { PaceLayerMetadata } from './models/pace-layer.js'; 65 + 59 66 // Models 60 67 import type { Clause } from './models/clause.js'; 61 68 import { DiffType } from './models/clause.js'; ··· 1249 1256 await new Promise(() => {}); 1250 1257 } 1251 1258 1259 + // ─── Replacement Audit (Fowler Ch. 4) ──────────────────────────────────────── 1260 + 1261 + function cmdAudit(args: string[]): void { 1262 + const { phoenixDir } = requirePhoenixRoot(); 1263 + const ius = loadIUs(phoenixDir); 1264 + const evalStore = new EvaluationStore(phoenixDir); 1265 + const nkStore = new NegativeKnowledgeStore(phoenixDir); 1266 + 1267 + if (ius.length === 0) { 1268 + console.log(yellow('⚠ No Implementation Units found. Run `phoenix plan` first.')); 1269 + return; 1270 + } 1271 + 1272 + // Build coverage map 1273 + const evalCoverages = new Map<string, any>(); 1274 + for (const iu of ius) { 1275 + evalCoverages.set(iu.iu_id, evalStore.coverage(iu)); 1276 + } 1277 + 1278 + // Load pace layers (from iu metadata or defaults) 1279 + const paceLayers = new Map<string, PaceLayerMetadata>(); 1280 + // TODO: load from .phoenix/pace-layers.json when populated 1281 + 1282 + const nk = nkStore.getActive(); 1283 + const previousMasses = new Map<string, number>(); 1284 + // TODO: load from previous manifest cycle 1285 + 1286 + // Filter by --iu if specified 1287 + const iuArg = args.find(a => a.startsWith('--iu=')); 1288 + const targetIUs = iuArg 1289 + ? 
ius.filter(iu => iu.iu_id === iuArg.slice(5) || iu.name === iuArg.slice(5)) 1290 + : ius; 1291 + 1292 + const results = auditAll(targetIUs, evalCoverages, paceLayers, nk, previousMasses); 1293 + 1294 + console.log(); 1295 + console.log(bold('🔥 Phoenix Replacement Audit')); 1296 + console.log(dim(' "Could I replace this implementation entirely and have its dependents not notice?"')); 1297 + console.log(); 1298 + 1299 + // Summary counts 1300 + const readinessCounts: Record<ReadinessLevel, number> = { 1301 + regenerable: 0, evaluable: 0, observable: 0, opaque: 0, 1302 + }; 1303 + for (const r of results) readinessCounts[r.readiness]++; 1304 + 1305 + console.log( 1306 + ` ${green(`● ${readinessCounts.regenerable} regenerable`)} ` + 1307 + `${blue(`◐ ${readinessCounts.evaluable} evaluable`)} ` + 1308 + `${yellow(`○ ${readinessCounts.observable} observable`)} ` + 1309 + `${red(`◌ ${readinessCounts.opaque} opaque`)}` 1310 + ); 1311 + console.log(); 1312 + 1313 + // Per-IU details 1314 + for (const result of results) { 1315 + const readinessIcon = readinessToIcon(result.readiness); 1316 + const scoreColor = result.score >= 75 ? green : result.score >= 50 ? yellow : red; 1317 + 1318 + console.log(` ${readinessIcon} ${bold(result.iu_name)} ${dim(`(${result.iu_id})`)} — ${scoreColor(`${result.score}/100`)} ${dim(result.readiness)}`); 1319 + 1320 + // Dimension summary 1321 + const dims = [ 1322 + result.boundary_clarity, 1323 + result.evaluation_coverage, 1324 + result.blast_radius, 1325 + result.deletion_safety, 1326 + result.pace_layer, 1327 + result.conceptual_mass, 1328 + result.negative_knowledge, 1329 + ]; 1330 + for (const d of dims) { 1331 + const icon = d.status === 'good' ? green('✓') : d.status === 'warning' ? 
yellow('⚠') : red('✖'); 1332 + console.log(` ${icon} ${dim(d.name + ':')} ${d.detail}`); 1333 + } 1334 + 1335 + // Blockers 1336 + if (result.blockers.length > 0) { 1337 + console.log(` ${red('Blockers:')}`); 1338 + for (const b of result.blockers) { 1339 + const sev = b.severity === 'error' ? red('✖') : yellow('⚠'); 1340 + console.log(` ${sev} ${b.message}`); 1341 + console.log(` ${dim('→ ' + b.recommended_action)}`); 1342 + } 1343 + } 1344 + 1345 + // Recommendations 1346 + if (result.recommendations.length > 0) { 1347 + console.log(` ${cyan('Recommendations:')}`); 1348 + for (const r of result.recommendations) { 1349 + console.log(` ${dim('→')} ${r}`); 1350 + } 1351 + } 1352 + 1353 + console.log(); 1354 + } 1355 + 1356 + // Overall verdict 1357 + const totalScore = results.length > 0 1358 + ? Math.round(results.reduce((sum, r) => sum + r.score, 0) / results.length) 1359 + : 0; 1360 + const totalBlockers = results.reduce((sum, r) => sum + r.blockers.length, 0); 1361 + 1362 + console.log(dim(' ─────────────────────────────────────────')); 1363 + console.log(` ${bold('Overall:')} ${totalScore}/100 avg score, ${totalBlockers} blocker(s)`); 1364 + console.log(` ${dim('Trust > cleverness.')}`); 1365 + console.log(); 1366 + } 1367 + 1368 + function readinessToIcon(readiness: ReadinessLevel): string { 1369 + switch (readiness) { 1370 + case 'regenerable': return green('●'); 1371 + case 'evaluable': return blue('◐'); 1372 + case 'observable': return yellow('○'); 1373 + case 'opaque': return red('◌'); 1374 + } 1375 + } 1376 + 1252 1377 function cmdVersion(): void { 1253 1378 console.log(`Phoenix VCS v${VERSION}`); 1254 1379 } ··· 1284 1409 ${cyan('drift')} Check generated files for drift 1285 1410 ${cyan('evaluate')} [--iu=<id>] Evaluate evidence against policy 1286 1411 ${cyan('cascade')} Show cascade failure effects 1412 + ${cyan('audit')} [--iu=<id>] Replacement audit — readiness per IU 1287 1413 1288 1414 ${bold('Inspection:')} 1289 1415 ${cyan('inspect')} [--port=N] 
Interactive pipeline visualisation (opens browser) ··· 1347 1473 break; 1348 1474 case 'cascade': 1349 1475 cmdCascade(); 1476 + break; 1477 + case 'audit': 1478 + cmdAudit(commandArgs); 1350 1479 break; 1351 1480 case 'inspect': 1352 1481 await cmdInspect(commandArgs);
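The `--iu=` filter in `cmdAudit` matches on either the IU id or its name. A minimal sketch of that selection logic, pulled out into a hypothetical helper `filterByIuArg` (not part of the diff) so it can be run without the rest of the CLI:

```typescript
// Minimal shape needed for filtering; the real ImplementationUnit has more fields.
interface IuLike {
  iu_id: string;
  name: string;
}

// Hypothetical helper mirroring the inline filter in cmdAudit.
function filterByIuArg<T extends IuLike>(ius: T[], args: string[]): T[] {
  const prefix = '--iu=';
  const iuArg = args.find(a => a.startsWith(prefix));
  if (!iuArg) return ius; // no filter given: audit everything
  const wanted = iuArg.slice(prefix.length);
  return ius.filter(iu => iu.iu_id === wanted || iu.name === wanted);
}

const sampleIus: IuLike[] = [
  { iu_id: 'iu-1', name: 'TestModule' },
  { iu_id: 'iu-core', name: 'Core' },
];

console.log(filterByIuArg(sampleIus, ['--iu=Core']).length);    // 1
console.log(filterByIuArg(sampleIus, []).length);               // 2
console.log(filterByIuArg(sampleIus, ['--iu=missing']).length); // 0
```

Note that an unknown id yields an empty list, which the command as written reports as an overall score of 0 rather than as an error.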
+24
src/index.ts
···
 export { resolveProvider, describeAvailability } from './llm/resolve.js';
 export { buildPrompt, SYSTEM_PROMPT } from './llm/prompt.js';

+// Evaluations (durable behavioral truth surface; survives regeneration)
+export type {
+  Evaluation, EvaluationBinding, EvaluationOrigin,
+  EvaluationCoverage, EvaluationGap,
+} from './models/evaluation.js';
+
+// Pace Layers & Conservation
+export type { PaceLayer, PaceLayerMetadata, PaceLayerViolation } from './models/pace-layer.js';
+export { defaultPaceLayerMetadata, inferPaceLayer, isPaceAppropriate, detectLayerCrossing } from './models/pace-layer.js';
+
+// Conceptual Mass
+export type { ConceptualMassReport } from './models/conceptual-mass.js';
+export { computeConceptualMass, interactionPotential, checkRatchet, MASS_THRESHOLDS } from './models/conceptual-mass.js';
+
+// Negative Knowledge (the system's immune memory)
+export type { NegativeKnowledge, NegativeKnowledgeKind } from './models/negative-knowledge.js';
+export { hasRelevantNegativeKnowledge } from './models/negative-knowledge.js';
+
+// Replacement Audit
+export type { AuditResult, AuditDimension, AuditBlocker, ReadinessLevel } from './audit.js';
+export { auditIU, auditAll } from './audit.js';
+
 // Stores
 export { ContentStore } from './store/content-store.js';
 export { SpecStore } from './store/spec-store.js';
 export { CanonicalStore } from './store/canonical-store.js';
 export { EvidenceStore } from './store/evidence-store.js';
+export { EvaluationStore } from './store/evaluation-store.js';
+export { NegativeKnowledgeStore } from './store/negative-knowledge-store.js';
+85
src/models/conceptual-mass.ts
···
+/**
+ * Conceptual Mass model: cognitive burden measurement.
+ *
+ * "Volume is cheap. Cognitive load compounds."
+ * Conceptual mass is the total cognitive burden a system imposes:
+ * distinct concepts, interdependencies, and hidden behaviors that
+ * a person must hold in mind to work safely.
+ *
+ * (See: Fowler, The Phoenix Architecture, Chapter 10)
+ */
+
+export interface ConceptualMassReport {
+  iu_id: string;
+  iu_name: string;
+  /** Count of contract concepts (inputs + outputs + invariants) */
+  contract_concepts: number;
+  /** Count of dependencies (other IUs) */
+  dependency_count: number;
+  /** Count of side-channel dependencies */
+  side_channel_count: number;
+  /** Count of canonical nodes mapped to this IU */
+  canon_node_count: number;
+  /** Count of output files */
+  file_count: number;
+  /** Total conceptual mass score */
+  mass: number;
+  /** Pairwise interaction potential: mass * (mass - 1) / 2 */
+  interaction_potential: number;
+  /** Previous mass (from last regen cycle), if available */
+  previous_mass?: number;
+  /** Delta from previous cycle */
+  mass_delta?: number;
+  /** Whether this violates the ratchet rule */
+  ratchet_violation: boolean;
+}
+
+/**
+ * Compute conceptual mass for an IU.
+ * Mass = contract_concepts + dependency_count + side_channel_count + canon_node_count
+ * (file_count is accepted for reporting but does not contribute to mass.)
+ * This is a proxy for "how many distinct concepts must someone hold in mind
+ * to change this safely?"
+ */
+export function computeConceptualMass(params: {
+  contract_inputs: number;
+  contract_outputs: number;
+  contract_invariants: number;
+  dependency_count: number;
+  side_channel_count: number;
+  canon_node_count: number;
+  file_count: number;
+}): number {
+  const contractConcepts = params.contract_inputs + params.contract_outputs + params.contract_invariants;
+  return contractConcepts + params.dependency_count + params.side_channel_count + params.canon_node_count;
+}
+
+/**
+ * Compute pairwise interaction potential.
+ * n concepts → n*(n-1)/2 potential interactions.
+ */
+export function interactionPotential(mass: number): number {
+  return mass > 1 ? (mass * (mass - 1)) / 2 : 0;
+}
+
+/**
+ * Check the ratchet rule: mass must not grow relative to the previous
+ * regeneration cycle without explicit justification.
+ */
+export function checkRatchet(currentMass: number, previousMass: number | undefined): boolean {
+  if (previousMass === undefined) return false; // no previous data, no violation
+  return currentMass > previousMass;
+}
+
+/**
+ * Default mass budget thresholds.
+ * Based on working memory limits (~4-7 chunks).
+ */
+export const MASS_THRESHOLDS = {
+  /** Ideal: one person can hold it all */
+  healthy: 7,
+  /** Caution: approaching cognitive limit */
+  warning: 12,
+  /** Danger: exceeds working memory */
+  danger: 20,
+} as const;
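To make the mass budget concrete, here is a worked example using the same fixture shape as `tests/unit/audit.test.ts` (one input, one output, one invariant, two canon nodes). It is a self-contained sketch: the three model functions are re-declared locally, and `massScore` is a hypothetical helper that reproduces only the banding and ratchet penalty from `assessConceptualMass`, not the blocker reporting.

```typescript
const MASS_THRESHOLDS = { healthy: 7, warning: 12, danger: 20 } as const;

interface MassInputs {
  contract_inputs: number;
  contract_outputs: number;
  contract_invariants: number;
  dependency_count: number;
  side_channel_count: number;
  canon_node_count: number;
}

function computeConceptualMass(p: MassInputs): number {
  return p.contract_inputs + p.contract_outputs + p.contract_invariants
    + p.dependency_count + p.side_channel_count + p.canon_node_count;
}

function interactionPotential(mass: number): number {
  return mass > 1 ? (mass * (mass - 1)) / 2 : 0;
}

function checkRatchet(currentMass: number, previousMass: number | undefined): boolean {
  if (previousMass === undefined) return false;
  return currentMass > previousMass;
}

// Hypothetical helper: score banding plus the 20-point ratchet penalty.
function massScore(mass: number, previousMass?: number): number {
  let score = 100;
  if (mass > MASS_THRESHOLDS.danger) score = 20;
  else if (mass > MASS_THRESHOLDS.warning) score = 50;
  else if (mass > MASS_THRESHOLDS.healthy) score = 70;
  if (checkRatchet(mass, previousMass)) score -= 20;
  return Math.max(0, score);
}

const fixtureMass = computeConceptualMass({
  contract_inputs: 1, contract_outputs: 1, contract_invariants: 1,
  dependency_count: 0, side_channel_count: 0, canon_node_count: 2,
});

console.log(fixtureMass);                       // 5
console.log(interactionPotential(fixtureMass)); // 10
console.log(massScore(fixtureMass));            // 100 (within the healthy budget)
console.log(massScore(fixtureMass, 2));         // 80  (grew from 2 to 5: ratchet penalty)
console.log(massScore(21));                     // 20  (above the danger threshold)
```

The interaction term grows quadratically, so doubling mass from 5 to 10 raises the pairwise count from 10 to 45; that is the argument for splitting an IU rather than tolerating a slow creep.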
+87
src/models/evaluation.ts
··· 1 + /** 2 + * Evaluation model — durable behavioral truth surface. 3 + * 4 + * Evaluations bind to behavior at IU boundaries, not to implementation internals. 5 + * They survive regeneration. The separating question: "Would this assertion still 6 + * be meaningful if the entire implementation were replaced tomorrow?" 7 + * 8 + * Distinct from implementation tests, which die with the code they describe. 9 + * (See: Fowler, The Phoenix Architecture, Chapter 5) 10 + */ 11 + 12 + /** What the evaluation asserts about */ 13 + export type EvaluationBinding = 14 + | 'domain_rule' // business logic invariant 15 + | 'boundary_contract' // input/output shape at IU boundary 16 + | 'constraint' // latency, throughput, error rate 17 + | 'invariant' // property that holds across all states 18 + | 'failure_mode'; // behavior under error conditions 19 + 20 + /** How confident we are the evaluation captures real behavior */ 21 + export type EvaluationOrigin = 22 + | 'specified' // derived from spec/intent 23 + | 'characterization' // captured from existing implementation (legacy) 24 + | 'incident' // added after a production incident 25 + | 'audit'; // added during evaluation audit 26 + 27 + export interface Evaluation { 28 + /** Unique ID, content-addressed */ 29 + eval_id: string; 30 + /** Human-readable name */ 31 + name: string; 32 + /** Which IU boundary this evaluates */ 33 + iu_id: string; 34 + /** What this evaluation binds to */ 35 + binding: EvaluationBinding; 36 + /** How this evaluation was created */ 37 + origin: EvaluationOrigin; 38 + /** Behavioral assertion in human-readable form */ 39 + assertion: string; 40 + /** 41 + * Given/When/Then specification: 42 + * - given: preconditions 43 + * - when: action at the boundary 44 + * - then: expected observable outcome 45 + */ 46 + given: string; 47 + when: string; 48 + then: string; 49 + /** Canonical node IDs this evaluation covers */ 50 + canon_ids: string[]; 51 + /** Whether this is a conservation-layer evaluation 
(surface stability) */ 52 + conservation: boolean; 53 + /** Provenance: why this evaluation exists */ 54 + rationale?: string; 55 + /** Link to incident/decision that motivated this */ 56 + provenance_ref?: string; 57 + /** Created timestamp */ 58 + created_at: string; 59 + /** Last verified timestamp */ 60 + last_verified_at?: string; 61 + /** Status of last verification */ 62 + last_status?: 'pass' | 'fail' | 'untested'; 63 + } 64 + 65 + /** 66 + * Evaluation coverage report for an IU 67 + */ 68 + export interface EvaluationCoverage { 69 + iu_id: string; 70 + iu_name: string; 71 + total_evaluations: number; 72 + by_binding: Record<EvaluationBinding, number>; 73 + by_origin: Record<EvaluationOrigin, number>; 74 + canon_ids_covered: string[]; 75 + canon_ids_uncovered: string[]; 76 + coverage_ratio: number; 77 + conservation_count: number; 78 + /** Gap analysis */ 79 + gaps: EvaluationGap[]; 80 + } 81 + 82 + export interface EvaluationGap { 83 + category: 'missing_boundary' | 'missing_invariant' | 'missing_failure_mode' | 'untested' | 'stale'; 84 + subject: string; 85 + message: string; 86 + recommended_action: string; 87 + }
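For illustration, a single evaluation record might look like the sketch below. The types are re-declared locally so the block stands alone; every value in `exampleEval` (ids, names, timestamps) is hypothetical, and `renderGivenWhenThen` is an illustrative helper, not part of the diff. The point is that the assertion names only boundary-observable behavior, so it would still be meaningful if the implementation were replaced.

```typescript
type EvaluationBinding =
  | 'domain_rule' | 'boundary_contract' | 'constraint' | 'invariant' | 'failure_mode';
type EvaluationOrigin = 'specified' | 'characterization' | 'incident' | 'audit';

interface Evaluation {
  eval_id: string;
  name: string;
  iu_id: string;
  binding: EvaluationBinding;
  origin: EvaluationOrigin;
  assertion: string;
  given: string;
  when: string;
  then: string;
  canon_ids: string[];
  conservation: boolean;
  rationale?: string;
  provenance_ref?: string;
  created_at: string;
  last_verified_at?: string;
  last_status?: 'pass' | 'fail' | 'untested';
}

// Hypothetical record: all identifiers and text are made up for this example.
const exampleEval: Evaluation = {
  eval_id: 'eval-refund-idempotent',
  name: 'Refund is idempotent',
  iu_id: 'iu-billing',
  binding: 'invariant',
  origin: 'incident',
  assertion: 'Submitting the same refund request twice issues exactly one refund',
  given: 'an order with a completed payment',
  when: 'the same refund request is submitted twice',
  then: 'exactly one refund is recorded',
  canon_ids: ['c-refund-1'],
  conservation: true,
  created_at: '2024-01-01T00:00:00.000Z',
  last_status: 'untested',
};

// Illustrative helper: render the Given/When/Then triple as one sentence.
function renderGivenWhenThen(e: Evaluation): string {
  return `Given ${e.given}, when ${e.when}, then ${e.then}.`;
}

console.log(renderGivenWhenThen(exampleEval));
```

Nothing in the record mentions classes, tables, or function names, which is what separates it from an implementation test.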
+54
src/models/negative-knowledge.ts
···
+/**
+ * Negative Knowledge model: what was tried and failed.
+ *
+ * "What failed matters as much as what succeeded, and it disappears first."
+ * Failed generation attempts, rejected approaches, incident-driven constraints.
+ * Preserved across compaction. The system's immune memory.
+ *
+ * (See: Fowler, The Phoenix Architecture, Chapter 14)
+ */
+
+export type NegativeKnowledgeKind =
+  | 'failed_generation'    // generation attempt that didn't pass evaluations
+  | 'rejected_approach'    // architectural approach tried and abandoned
+  | 'incident_constraint'  // constraint added after a production incident
+  | 'deprecated_behavior'  // behavior intentionally removed with reason
+  | 'known_limitation';    // known issue accepted with rationale
+
+export interface NegativeKnowledge {
+  /** Unique ID */
+  nk_id: string;
+  /** What kind of negative knowledge */
+  kind: NegativeKnowledgeKind;
+  /** Which IU or canonical node this applies to */
+  subject_id: string;
+  /** Subject type */
+  subject_type: 'iu' | 'canonical_node' | 'system';
+  /** Human-readable description of what was tried */
+  what_was_tried: string;
+  /** Why it failed or was rejected */
+  why_it_failed: string;
+  /** What constraint or lesson this implies for future regeneration */
+  constraint_for_future: string;
+  /** Reference to incident, post-mortem, or decision record */
+  reference?: string;
+  /** When this knowledge was recorded */
+  recorded_at: string;
+  /** Who recorded it */
+  recorded_by?: string;
+  /** Is this still relevant? (can be marked stale) */
+  active: boolean;
+}
+
+/**
+ * Return the active records for a subject; consult them before regenerating it.
+ */
+export function hasRelevantNegativeKnowledge(
+  records: NegativeKnowledge[],
+  subjectId: string,
+): NegativeKnowledge[] {
+  return records.filter(nk =>
+    nk.active &&
+    nk.subject_id === subjectId
+  );
+}
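A short usage sketch for `hasRelevantNegativeKnowledge`, with the model re-declared locally so it runs on its own. The first sample record reuses the incident from `tests/unit/audit.test.ts`; the other two records and all timestamps are hypothetical. Despite the `has` prefix, the function returns the matching records, so a caller can pull `constraint_for_future` straight out of the result.

```typescript
type NegativeKnowledgeKind =
  | 'failed_generation' | 'rejected_approach' | 'incident_constraint'
  | 'deprecated_behavior' | 'known_limitation';

interface NegativeKnowledge {
  nk_id: string;
  kind: NegativeKnowledgeKind;
  subject_id: string;
  subject_type: 'iu' | 'canonical_node' | 'system';
  what_was_tried: string;
  why_it_failed: string;
  constraint_for_future: string;
  recorded_at: string;
  active: boolean;
}

function hasRelevantNegativeKnowledge(
  records: NegativeKnowledge[],
  subjectId: string,
): NegativeKnowledge[] {
  return records.filter(nk => nk.active && nk.subject_id === subjectId);
}

const sampleRecords: NegativeKnowledge[] = [
  {
    nk_id: 'nk-1', kind: 'incident_constraint', subject_id: 'iu-1', subject_type: 'iu',
    what_was_tried: 'Async auth flow',
    why_it_failed: 'Race condition on token refresh',
    constraint_for_future: 'Auth must be synchronous',
    recorded_at: '2024-01-01T00:00:00.000Z', active: true,
  },
  {
    // Hypothetical stale record: same subject, but no longer active.
    nk_id: 'nk-2', kind: 'rejected_approach', subject_id: 'iu-1', subject_type: 'iu',
    what_was_tried: 'In-memory session cache',
    why_it_failed: 'Lost sessions on restart',
    constraint_for_future: 'Sessions must be persisted',
    recorded_at: '2023-06-01T00:00:00.000Z', active: false,
  },
  {
    // Hypothetical record for a different IU.
    nk_id: 'nk-3', kind: 'failed_generation', subject_id: 'iu-2', subject_type: 'iu',
    what_was_tried: 'Single-pass generation',
    why_it_failed: 'Failed boundary evaluations',
    constraint_for_future: 'Generate in two passes',
    recorded_at: '2024-02-01T00:00:00.000Z', active: true,
  },
];

const relevant = hasRelevantNegativeKnowledge(sampleRecords, 'iu-1');
const constraints = relevant.map(r => r.constraint_for_future);
console.log(constraints); // [ 'Auth must be synchronous' ]
```

Stale records are filtered out here but remain in the store, which is consistent with the "preserved across compaction" intent.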
+122
src/models/pace-layer.ts
··· 1 + /** 2 + * Pace Layer & Conservation Layer model. 3 + * 4 + * Different parts of a system change at different speeds. 5 + * A layer's rate of change is a function of its blast radius. 6 + * Conservation layers are surfaces where external trust accumulates. 7 + * 8 + * (See: Fowler, The Phoenix Architecture, Chapters 6 & 15) 9 + */ 10 + 11 + /** 12 + * Pace layer classification — slowest to fastest. 13 + * 14 + * Foundation: correctness is load-bearing (billing, fulfillment). Changes yearly. 15 + * Domain: event schemas, domain models. Changes quarterly. 16 + * Service: API shapes, integration contracts. Changes monthly. 17 + * Surface: UI, banners, display logic. Changes days-to-weeks. 18 + */ 19 + export type PaceLayer = 'foundation' | 'domain' | 'service' | 'surface'; 20 + 21 + /** 22 + * Extended IU metadata for pace-layer-aware regeneration. 23 + */ 24 + export interface PaceLayerMetadata { 25 + /** Which pace layer this IU occupies */ 26 + pace_layer: PaceLayer; 27 + /** Is this a conservation layer? 
(external trust depends on stability) */ 28 + conservation: boolean; 29 + /** Why this classification was chosen */ 30 + classification_rationale: string; 31 + /** How many other IUs depend on this one's interface */ 32 + dependency_weight: number; 33 + /** Expected change cadence */ 34 + expected_change_cadence: 'daily' | 'weekly' | 'monthly' | 'quarterly' | 'yearly'; 35 + /** Last classification review date */ 36 + last_reviewed: string; 37 + } 38 + 39 + /** 40 + * Pace layer violation diagnostic 41 + */ 42 + export interface PaceLayerViolation { 43 + iu_id: string; 44 + iu_name: string; 45 + current_layer: PaceLayer; 46 + violation_type: 'regen_too_fast' | 'dependency_crosses_layer' | 'conservation_unprotected'; 47 + message: string; 48 + recommended_action: string; 49 + } 50 + 51 + /** 52 + * Default pace layer metadata for an IU 53 + */ 54 + export function defaultPaceLayerMetadata(): PaceLayerMetadata { 55 + return { 56 + pace_layer: 'service', 57 + conservation: false, 58 + classification_rationale: 'Default classification — needs review', 59 + dependency_weight: 0, 60 + expected_change_cadence: 'monthly', 61 + last_reviewed: new Date().toISOString(), 62 + }; 63 + } 64 + 65 + /** 66 + * Infer pace layer from dependency weight heuristic. 67 + * High dependency weight → slower layer. 68 + */ 69 + export function inferPaceLayer(dependencyWeight: number, hasExternalDependents: boolean): PaceLayer { 70 + if (hasExternalDependents || dependencyWeight >= 5) return 'foundation'; 71 + if (dependencyWeight >= 3) return 'domain'; 72 + if (dependencyWeight >= 1) return 'service'; 73 + return 'surface'; 74 + } 75 + 76 + /** 77 + * Check if a regeneration speed is appropriate for a pace layer. 
78 + */ 79 + export function isPaceAppropriate( 80 + layer: PaceLayer, 81 + daysSinceLastRegen: number, 82 + ): boolean { 83 + const minimums: Record<PaceLayer, number> = { 84 + surface: 1, // can regen daily 85 + service: 7, // at most weekly 86 + domain: 30, // at most monthly 87 + foundation: 90, // at most quarterly 88 + }; 89 + return daysSinceLastRegen >= minimums[layer]; 90 + } 91 + 92 + /** 93 + * Layer ordering for comparison (lower = slower = more stable) 94 + */ 95 + const LAYER_ORDER: Record<PaceLayer, number> = { 96 + foundation: 0, 97 + domain: 1, 98 + service: 2, 99 + surface: 3, 100 + }; 101 + 102 + /** 103 + * Check if a dependency crosses pace layers in the wrong direction 104 + * (fast layer depending on slow layer is fine; slow layer depending on fast layer is a violation) 105 + */ 106 + export function detectLayerCrossing( 107 + sourceLayer: PaceLayer, 108 + targetLayer: PaceLayer, 109 + ): PaceLayerViolation | null { 110 + if (LAYER_ORDER[sourceLayer] < LAYER_ORDER[targetLayer]) { 111 + return { 112 + iu_id: '', 113 + iu_name: '', 114 + current_layer: sourceLayer, 115 + violation_type: 'dependency_crosses_layer', 116 + message: `Slow layer (${sourceLayer}) depends on fast layer (${targetLayer}). ` + 117 + `This couples slow-changing logic to fast-changing implementation.`, 118 + recommended_action: `Introduce an interface boundary between the ${sourceLayer} and ${targetLayer} layers.`, 119 + }; 120 + } 121 + return null; 122 + }
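The three pace-layer rules (inference from dependency weight, minimum days between regenerations, and crossing direction) are easiest to check with a few concrete calls. The sketch below re-declares the logic locally; `isLayerViolation` is a simplified, hypothetical stand-in for `detectLayerCrossing` that returns a boolean instead of a `PaceLayerViolation` object.

```typescript
type PaceLayer = 'foundation' | 'domain' | 'service' | 'surface';

// Lower number = slower = more stable, as in the diff.
const LAYER_ORDER: Record<PaceLayer, number> = {
  foundation: 0, domain: 1, service: 2, surface: 3,
};

// Minimum days that must elapse between regenerations, per layer.
const MIN_DAYS_BETWEEN_REGENS: Record<PaceLayer, number> = {
  surface: 1, service: 7, domain: 30, foundation: 90,
};

function inferPaceLayer(dependencyWeight: number, hasExternalDependents: boolean): PaceLayer {
  if (hasExternalDependents || dependencyWeight >= 5) return 'foundation';
  if (dependencyWeight >= 3) return 'domain';
  if (dependencyWeight >= 1) return 'service';
  return 'surface';
}

function isPaceAppropriate(layer: PaceLayer, daysSinceLastRegen: number): boolean {
  return daysSinceLastRegen >= MIN_DAYS_BETWEEN_REGENS[layer];
}

// Simplified stand-in: true when a slower layer depends on a faster one.
function isLayerViolation(sourceLayer: PaceLayer, targetLayer: PaceLayer): boolean {
  return LAYER_ORDER[sourceLayer] < LAYER_ORDER[targetLayer];
}

console.log(inferPaceLayer(0, false));                 // surface
console.log(inferPaceLayer(4, false));                 // domain
console.log(inferPaceLayer(0, true));                  // foundation
console.log(isPaceAppropriate('domain', 10));          // false (needs at least 30 days)
console.log(isLayerViolation('foundation', 'surface')); // true  (slow depends on fast)
console.log(isLayerViolation('surface', 'foundation')); // false (fast depends on slow)
```

One consequence worth noticing: any IU with external dependents is classified as `foundation` even with a dependency weight of zero, so it inherits the 90-day minimum.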
+172
src/store/evaluation-store.ts
··· 1 + /** 2 + * Evaluation Store — persists durable behavioral evaluations. 3 + * 4 + * Evaluations are versioned independently of implementation. 5 + * They are the system's constitution, not the implementation's unit tests. 6 + */ 7 + 8 + import { writeFileSync, readFileSync, existsSync, mkdirSync } from 'node:fs'; 9 + import { join } from 'node:path'; 10 + import type { Evaluation, EvaluationCoverage, EvaluationGap } from '../models/evaluation.js'; 11 + import type { ImplementationUnit } from '../models/iu.js'; 12 + 13 + interface EvalIndex { 14 + evaluations: Evaluation[]; 15 + } 16 + 17 + export class EvaluationStore { 18 + private indexPath: string; 19 + 20 + constructor(phoenixRoot: string) { 21 + const dir = join(phoenixRoot, 'evaluations'); 22 + mkdirSync(dir, { recursive: true }); 23 + this.indexPath = join(dir, 'evaluations.json'); 24 + } 25 + 26 + private load(): EvalIndex { 27 + if (!existsSync(this.indexPath)) return { evaluations: [] }; 28 + return JSON.parse(readFileSync(this.indexPath, 'utf8')); 29 + } 30 + 31 + private save(index: EvalIndex): void { 32 + writeFileSync(this.indexPath, JSON.stringify(index, null, 2), 'utf8'); 33 + } 34 + 35 + add(evaluation: Evaluation): void { 36 + const index = this.load(); 37 + // Replace if same ID exists 38 + const existing = index.evaluations.findIndex(e => e.eval_id === evaluation.eval_id); 39 + if (existing >= 0) { 40 + index.evaluations[existing] = evaluation; 41 + } else { 42 + index.evaluations.push(evaluation); 43 + } 44 + this.save(index); 45 + } 46 + 47 + addMany(evaluations: Evaluation[]): void { 48 + const index = this.load(); 49 + for (const evaluation of evaluations) { 50 + const existing = index.evaluations.findIndex(e => e.eval_id === evaluation.eval_id); 51 + if (existing >= 0) { 52 + index.evaluations[existing] = evaluation; 53 + } else { 54 + index.evaluations.push(evaluation); 55 + } 56 + } 57 + this.save(index); 58 + } 59 + 60 + getByIU(iuId: string): Evaluation[] { 61 + return 
this.load().evaluations.filter(e => e.iu_id === iuId); 62 + } 63 + 64 + getAll(): Evaluation[] { 65 + return this.load().evaluations; 66 + } 67 + 68 + getConservation(): Evaluation[] { 69 + return this.load().evaluations.filter(e => e.conservation); 70 + } 71 + 72 + remove(evalId: string): boolean { 73 + const index = this.load(); 74 + const before = index.evaluations.length; 75 + index.evaluations = index.evaluations.filter(e => e.eval_id !== evalId); 76 + if (index.evaluations.length < before) { 77 + this.save(index); 78 + return true; 79 + } 80 + return false; 81 + } 82 + 83 + /** 84 + * Compute evaluation coverage for an IU. 85 + */ 86 + coverage(iu: ImplementationUnit): EvaluationCoverage { 87 + const evals = this.getByIU(iu.iu_id); 88 + const gaps: EvaluationGap[] = []; 89 + 90 + // Count by binding 91 + const byBinding: Record<string, number> = { 92 + domain_rule: 0, boundary_contract: 0, constraint: 0, invariant: 0, failure_mode: 0, 93 + }; 94 + const byOrigin: Record<string, number> = { 95 + specified: 0, characterization: 0, incident: 0, audit: 0, 96 + }; 97 + const coveredCanonIds = new Set<string>(); 98 + 99 + for (const e of evals) { 100 + byBinding[e.binding] = (byBinding[e.binding] ?? 0) + 1; 101 + byOrigin[e.origin] = (byOrigin[e.origin] ?? 
0) + 1; 102 + for (const cid of e.canon_ids) coveredCanonIds.add(cid); 103 + } 104 + 105 + // Check for missing coverage 106 + const uncoveredCanonIds = iu.source_canon_ids.filter(id => !coveredCanonIds.has(id)); 107 + 108 + if (byBinding.boundary_contract === 0) { 109 + gaps.push({ 110 + category: 'missing_boundary', 111 + subject: iu.iu_id, 112 + message: `No boundary contract evaluations for ${iu.name}`, 113 + recommended_action: 'Write evaluations asserting input/output behavior at the IU boundary', 114 + }); 115 + } 116 + if (byBinding.failure_mode === 0) { 117 + gaps.push({ 118 + category: 'missing_failure_mode', 119 + subject: iu.iu_id, 120 + message: `No failure mode evaluations for ${iu.name}`, 121 + recommended_action: 'Write evaluations asserting behavior under error conditions', 122 + }); 123 + } 124 + if (byBinding.invariant === 0 && iu.contract.invariants.length > 0) { 125 + gaps.push({ 126 + category: 'missing_invariant', 127 + subject: iu.iu_id, 128 + message: `${iu.contract.invariants.length} contract invariants but no invariant evaluations`, 129 + recommended_action: 'Write evaluations for each declared invariant', 130 + }); 131 + } 132 + for (const e of evals) { 133 + if (e.last_status === undefined || e.last_status === 'untested') { 134 + gaps.push({ 135 + category: 'untested', 136 + subject: e.eval_id, 137 + message: `Evaluation "${e.name}" has never been verified`, 138 + recommended_action: 'Run evaluation suite against current implementation', 139 + }); 140 + } 141 + } 142 + // Check for stale evaluations (>90 days since last verification) 143 + const ninetyDaysAgo = new Date(Date.now() - 90 * 24 * 60 * 60 * 1000).toISOString(); 144 + for (const e of evals) { 145 + if (e.last_verified_at && e.last_verified_at < ninetyDaysAgo) { 146 + gaps.push({ 147 + category: 'stale', 148 + subject: e.eval_id, 149 + message: `Evaluation "${e.name}" last verified >90 days ago`, 150 + recommended_action: 'Re-verify evaluation against current implementation', 
151 + }); 152 + } 153 + } 154 + 155 + const coverageRatio = iu.source_canon_ids.length > 0 156 + ? coveredCanonIds.size / iu.source_canon_ids.length 157 + : evals.length > 0 ? 1 : 0; 158 + 159 + return { 160 + iu_id: iu.iu_id, 161 + iu_name: iu.name, 162 + total_evaluations: evals.length, 163 + by_binding: byBinding as Record<any, number>, 164 + by_origin: byOrigin as Record<any, number>, 165 + canon_ids_covered: [...coveredCanonIds], 166 + canon_ids_uncovered: uncoveredCanonIds, 167 + coverage_ratio: coverageRatio, 168 + conservation_count: evals.filter(e => e.conservation).length, 169 + gaps, 170 + }; 171 + } 172 + }
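One detail of `coverage()` is worth a worked example. As written, `coverage_ratio` divides the size of the covered-ID set by the number of the IU's canon IDs, so an evaluation that lists a canon ID belonging to another IU still counts toward the numerator. The sketch below contrasts that calculation with one possible tightening that counts only the IU's own IDs. Both function names are hypothetical, and the bounded variant is a suggestion rather than something the diff implements.

```typescript
// Mirrors the ratio computed in EvaluationStore.coverage():
// every canon ID mentioned by any evaluation counts as covered.
function coverageRatioAsWritten(sourceCanonIds: string[], evalCanonIds: string[][]): number {
  const covered = new Set<string>();
  for (const ids of evalCanonIds) {
    for (const id of ids) covered.add(id);
  }
  if (sourceCanonIds.length === 0) return evalCanonIds.length > 0 ? 1 : 0;
  return covered.size / sourceCanonIds.length;
}

// Possible tightening (assumption, not in the diff): count only IDs
// that actually belong to the IU, so the ratio stays within [0, 1].
function coverageRatioBounded(sourceCanonIds: string[], evalCanonIds: string[][]): number {
  const covered = new Set<string>();
  for (const ids of evalCanonIds) {
    for (const id of ids) covered.add(id);
  }
  if (sourceCanonIds.length === 0) return evalCanonIds.length > 0 ? 1 : 0;
  const hits = sourceCanonIds.filter(id => covered.has(id)).length;
  return hits / sourceCanonIds.length;
}

// The IU owns c1 and c2; two evaluations cover c1, and one also lists a foreign c9.
const iuCanonIds = ['c1', 'c2'];
const perEvalCanonIds = [['c1'], ['c1', 'c9']];

console.log(coverageRatioAsWritten(iuCanonIds, perEvalCanonIds)); // 1
console.log(coverageRatioBounded(iuCanonIds, perEvalCanonIds));   // 0.5
```

In the first case the ratio reports full coverage while `canon_ids_uncovered` would still contain `c2`, so the two fields can disagree; the bounded form keeps them consistent.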
+65
src/store/negative-knowledge-store.ts
··· 1 + /** 2 + * Negative Knowledge Store — persists what was tried and failed. 3 + * Preserved across compaction. The system's immune memory. 4 + */ 5 + 6 + import { writeFileSync, readFileSync, existsSync, mkdirSync } from 'node:fs'; 7 + import { join } from 'node:path'; 8 + import type { NegativeKnowledge } from '../models/negative-knowledge.js'; 9 + 10 + interface NKIndex { 11 + records: NegativeKnowledge[]; 12 + } 13 + 14 + export class NegativeKnowledgeStore { 15 + private indexPath: string; 16 + 17 + constructor(phoenixRoot: string) { 18 + const dir = join(phoenixRoot, 'provenance'); 19 + mkdirSync(dir, { recursive: true }); 20 + this.indexPath = join(dir, 'negative-knowledge.json'); 21 + } 22 + 23 + private load(): NKIndex { 24 + if (!existsSync(this.indexPath)) return { records: [] }; 25 + return JSON.parse(readFileSync(this.indexPath, 'utf8')); 26 + } 27 + 28 + private save(index: NKIndex): void { 29 + writeFileSync(this.indexPath, JSON.stringify(index, null, 2), 'utf8'); 30 + } 31 + 32 + add(record: NegativeKnowledge): void { 33 + const index = this.load(); 34 + const existing = index.records.findIndex(r => r.nk_id === record.nk_id); 35 + if (existing >= 0) { 36 + index.records[existing] = record; 37 + } else { 38 + index.records.push(record); 39 + } 40 + this.save(index); 41 + } 42 + 43 + getBySubject(subjectId: string): NegativeKnowledge[] { 44 + return this.load().records.filter(r => r.subject_id === subjectId && r.active); 45 + } 46 + 47 + getAll(): NegativeKnowledge[] { 48 + return this.load().records; 49 + } 50 + 51 + getActive(): NegativeKnowledge[] { 52 + return this.load().records.filter(r => r.active); 53 + } 54 + 55 + markStale(nkId: string): boolean { 56 + const index = this.load(); 57 + const record = index.records.find(r => r.nk_id === nkId); 58 + if (record) { 59 + record.active = false; 60 + this.save(index); 61 + return true; 62 + } 63 + return false; 64 + } 65 + }
+146
tests/unit/audit.test.ts
+ import { describe, it, expect } from 'vitest';
+ import { auditIU, auditAll } from '../../src/audit.js';
+ import type { ImplementationUnit } from '../../src/models/iu.js';
+ import type { EvaluationCoverage } from '../../src/models/evaluation.js';
+ import { defaultBoundaryPolicy, defaultEnforcement } from '../../src/models/iu.js';
+
+ function makeIU(overrides: Partial<ImplementationUnit> = {}): ImplementationUnit {
+   return {
+     iu_id: 'iu-1',
+     kind: 'module',
+     name: 'TestModule',
+     risk_tier: 'medium',
+     contract: {
+       description: 'Test module',
+       inputs: ['input1'],
+       outputs: ['output1'],
+       invariants: ['must be consistent'],
+     },
+     source_canon_ids: ['c1', 'c2'],
+     dependencies: [],
+     boundary_policy: defaultBoundaryPolicy(),
+     enforcement: defaultEnforcement(),
+     evidence_policy: { required: ['unit_tests'] },
+     output_files: ['test.ts'],
+     ...overrides,
+   };
+ }
+
+ function makeCoverage(overrides: Partial<EvaluationCoverage> = {}): EvaluationCoverage {
+   return {
+     iu_id: 'iu-1',
+     iu_name: 'TestModule',
+     total_evaluations: 0,
+     by_binding: { domain_rule: 0, boundary_contract: 0, constraint: 0, invariant: 0, failure_mode: 0 },
+     by_origin: { specified: 0, characterization: 0, incident: 0, audit: 0 },
+     canon_ids_covered: [],
+     canon_ids_uncovered: ['c1', 'c2'],
+     coverage_ratio: 0,
+     conservation_count: 0,
+     gaps: [],
+     ...overrides,
+   };
+ }
+
+ describe('Replacement Audit', () => {
+   it('marks IU with no evaluations and weak boundaries as opaque', () => {
+     const iu = makeIU({
+       contract: { description: '', inputs: [], outputs: [], invariants: [] },
+     });
+     const result = auditIU({
+       iu,
+       allIUs: [iu],
+       evalCoverage: makeCoverage(),
+       negativeKnowledge: [],
+     });
+     expect(result.readiness).toBe('opaque');
+     expect(result.blockers.length).toBeGreaterThan(0);
+   });
+
+   it('marks well-defined IU with full evaluations as evaluable or regenerable', () => {
+     const iu = makeIU();
+     const cov = makeCoverage({
+       total_evaluations: 5,
+       by_binding: { domain_rule: 1, boundary_contract: 2, constraint: 0, invariant: 1, failure_mode: 1 },
+       by_origin: { specified: 3, characterization: 1, incident: 1, audit: 0 },
+       canon_ids_covered: ['c1', 'c2'],
+       canon_ids_uncovered: [],
+       coverage_ratio: 1.0,
+       gaps: [],
+     });
+     const result = auditIU({
+       iu,
+       allIUs: [iu],
+       evalCoverage: cov,
+       negativeKnowledge: [],
+     });
+     expect(['evaluable', 'regenerable']).toContain(result.readiness);
+     expect(result.score).toBeGreaterThan(50);
+   });
+
+   it('penalizes wide blast radius', () => {
+     const target = makeIU({ iu_id: 'iu-core', name: 'Core' });
+     const dep1 = makeIU({ iu_id: 'iu-a', name: 'A', dependencies: ['iu-core'] });
+     const dep2 = makeIU({ iu_id: 'iu-b', name: 'B', dependencies: ['iu-core'] });
+     const dep3 = makeIU({ iu_id: 'iu-c', name: 'C', dependencies: ['iu-core'] });
+     const dep4 = makeIU({ iu_id: 'iu-d', name: 'D', dependencies: ['iu-core'] });
+
+     const result = auditIU({
+       iu: target,
+       allIUs: [target, dep1, dep2, dep3, dep4],
+       evalCoverage: makeCoverage({ iu_id: 'iu-core' }),
+       negativeKnowledge: [],
+     });
+     expect(result.blast_radius.score).toBeLessThan(50);
+     expect(result.blockers.some(b => b.category === 'coupling')).toBe(true);
+   });
+
+   it('flags ratchet violation when mass grows', () => {
+     const iu = makeIU();
+     const result = auditIU({
+       iu,
+       allIUs: [iu],
+       evalCoverage: makeCoverage(),
+       negativeKnowledge: [],
+       previousMass: 2, // was 2, now it's more (inputs + outputs + invariants + canon nodes = 5)
+     });
+     expect(result.conceptual_mass.detail).toContain('RATCHET');
+     expect(result.blockers.some(b => b.category === 'mass')).toBe(true);
+   });
+
+   it('incorporates negative knowledge in recommendations', () => {
+     const iu = makeIU();
+     const result = auditIU({
+       iu,
+       allIUs: [iu],
+       evalCoverage: makeCoverage(),
+       negativeKnowledge: [{
+         nk_id: 'nk-1',
+         kind: 'incident_constraint',
+         subject_id: 'iu-1',
+         subject_type: 'iu',
+         what_was_tried: 'Async auth flow',
+         why_it_failed: 'Race condition on token refresh',
+         constraint_for_future: 'Auth must be synchronous',
+         recorded_at: new Date().toISOString(),
+         active: true,
+       }],
+     });
+     expect(result.recommendations.some(r => r.includes('incident constraint'))).toBe(true);
+   });
+
+   it('audits all IUs at once', () => {
+     const ius = [makeIU({ iu_id: 'a', name: 'A' }), makeIU({ iu_id: 'b', name: 'B' })];
+     const results = auditAll(
+       ius,
+       new Map([
+         ['a', makeCoverage({ iu_id: 'a' })],
+         ['b', makeCoverage({ iu_id: 'b' })],
+       ]),
+       new Map(),
+       [],
+       new Map(),
+     );
+     expect(results).toHaveLength(2);
+   });
+ });
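The blast-radius test above builds four IUs that all list `iu-core` in their `dependencies`, so the audit presumably counts how many other units depend on the target. The diff does not show `src/audit.ts`, so the following is only a minimal sketch of that lookup: `IULike` and `directDependents` are hypothetical names, and whether the real audit counts direct or transitive dependents (and how it turns the count into a score) is not visible here.

```typescript
// Hypothetical sketch: only the two fields needed here; the real ImplementationUnit has more.
interface IULike {
  iu_id: string;
  dependencies: string[];
}

// Returns every IU (other than the target) that lists the target in its dependencies.
function directDependents(targetId: string, allIUs: IULike[]): IULike[] {
  return allIUs.filter(
    iu => iu.iu_id !== targetId && iu.dependencies.indexOf(targetId) !== -1,
  );
}
```

With the fixture from the test (one core unit and four units depending on it), this returns four dependents, which is the situation the test expects to be reported as a `coupling` blocker.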
+47
tests/unit/conceptual-mass.test.ts
+ import { describe, it, expect } from 'vitest';
+ import {
+   computeConceptualMass,
+   interactionPotential,
+   checkRatchet,
+   MASS_THRESHOLDS,
+ } from '../../src/models/conceptual-mass.js';
+
+ describe('Conceptual Mass', () => {
+   it('computes mass as sum of concept counts', () => {
+     const mass = computeConceptualMass({
+       contract_inputs: 2,
+       contract_outputs: 1,
+       contract_invariants: 1,
+       dependency_count: 2,
+       side_channel_count: 1,
+       canon_node_count: 3,
+       file_count: 2,
+     });
+     // 2+1+1 (contract) + 2 (deps) + 1 (side) + 3 (canon) = 10
+     expect(mass).toBe(10);
+   });
+
+   it('computes interaction potential as n*(n-1)/2', () => {
+     expect(interactionPotential(0)).toBe(0);
+     expect(interactionPotential(1)).toBe(0);
+     expect(interactionPotential(2)).toBe(1);
+     expect(interactionPotential(5)).toBe(10);
+     expect(interactionPotential(10)).toBe(45);
+   });
+
+   it('detects ratchet violation when mass grows', () => {
+     expect(checkRatchet(10, 8)).toBe(true);
+     expect(checkRatchet(10, 10)).toBe(false);
+     expect(checkRatchet(8, 10)).toBe(false);
+   });
+
+   it('no violation when no previous data', () => {
+     expect(checkRatchet(10, undefined)).toBe(false);
+   });
+
+   it('has sensible thresholds', () => {
+     expect(MASS_THRESHOLDS.healthy).toBe(7);
+     expect(MASS_THRESHOLDS.warning).toBe(12);
+     expect(MASS_THRESHOLDS.danger).toBe(20);
+   });
+ });
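The assertions above pin down the arithmetic closely enough to sketch one implementation that would satisfy them. This is not the contents of `src/models/conceptual-mass.ts`, which the diff does not show; the `MassInputs` interface name is an assumption, and leaving `file_count` out of the sum is inferred from the test comment (`2+1+1 + 2 + 1 + 3 = 10`).

```typescript
// Hypothetical input shape; field names are taken from the test fixture.
interface MassInputs {
  contract_inputs: number;
  contract_outputs: number;
  contract_invariants: number;
  dependency_count: number;
  side_channel_count: number;
  canon_node_count: number;
  file_count: number; // accepted but not counted, per the test's arithmetic
}

// Threshold values as asserted by the test.
const MASS_THRESHOLDS = { healthy: 7, warning: 12, danger: 20 };

// Mass = contract concepts + dependencies + side channels + canon nodes.
function computeConceptualMass(m: MassInputs): number {
  return (
    m.contract_inputs + m.contract_outputs + m.contract_invariants +
    m.dependency_count + m.side_channel_count + m.canon_node_count
  );
}

// Pairwise interactions among n concepts: n*(n-1)/2.
// The guard avoids returning -0 for n = 0, which a strict toBe(0) check would reject.
function interactionPotential(n: number): number {
  return n < 2 ? 0 : (n * (n - 1)) / 2;
}

// Ratchet rule: a violation only when a previous mass exists and the current mass exceeds it.
function checkRatchet(current: number, previous?: number): boolean {
  return previous !== undefined && current > previous;
}
```

Note how quickly the interaction potential outpaces the mass itself: at the `warning` threshold of 12 concepts there are already 66 possible pairings.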
+118
tests/unit/evaluation.test.ts
+ import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+ import { mkdtempSync, rmSync } from 'node:fs';
+ import { join } from 'node:path';
+ import { tmpdir } from 'node:os';
+ import { EvaluationStore } from '../../src/store/evaluation-store.js';
+ import type { Evaluation } from '../../src/models/evaluation.js';
+ import type { ImplementationUnit } from '../../src/models/iu.js';
+ import { defaultBoundaryPolicy, defaultEnforcement } from '../../src/models/iu.js';
+
+ function makeIU(overrides: Partial<ImplementationUnit> = {}): ImplementationUnit {
+   return {
+     iu_id: 'iu-auth',
+     kind: 'module',
+     name: 'AuthModule',
+     risk_tier: 'high',
+     contract: {
+       description: 'Handles authentication',
+       inputs: ['credentials'],
+       outputs: ['token', 'error'],
+       invariants: ['expired tokens must be rejected'],
+     },
+     source_canon_ids: ['canon-1', 'canon-2', 'canon-3'],
+     dependencies: [],
+     boundary_policy: defaultBoundaryPolicy(),
+     enforcement: defaultEnforcement(),
+     evidence_policy: { required: ['unit_tests'] },
+     output_files: ['auth.ts'],
+     ...overrides,
+   };
+ }
+
+ function makeEval(overrides: Partial<Evaluation> = {}): Evaluation {
+   return {
+     eval_id: 'eval-1',
+     name: 'Auth rejects expired tokens',
+     iu_id: 'iu-auth',
+     binding: 'boundary_contract',
+     origin: 'specified',
+     assertion: 'Expired tokens receive 401',
+     given: 'A token that expired 1 minute ago',
+     when: 'The token is presented for authentication',
+     then: 'The system returns a 401 Unauthorized response',
+     canon_ids: ['canon-1'],
+     conservation: false,
+     created_at: new Date().toISOString(),
+     ...overrides,
+   };
+ }
+
+ describe('EvaluationStore', () => {
+   let dir: string;
+   let store: EvaluationStore;
+
+   beforeEach(() => {
+     dir = mkdtempSync(join(tmpdir(), 'phoenix-eval-'));
+     store = new EvaluationStore(dir);
+   });
+
+   afterEach(() => rmSync(dir, { recursive: true, force: true }));
+
+   it('starts empty', () => {
+     expect(store.getAll()).toEqual([]);
+   });
+
+   it('adds and retrieves evaluations', () => {
+     const ev = makeEval();
+     store.add(ev);
+     expect(store.getAll()).toHaveLength(1);
+     expect(store.getByIU('iu-auth')).toHaveLength(1);
+     expect(store.getByIU('iu-other')).toHaveLength(0);
+   });
+
+   it('replaces on duplicate eval_id', () => {
+     store.add(makeEval({ eval_id: 'eval-1', name: 'v1' }));
+     store.add(makeEval({ eval_id: 'eval-1', name: 'v2' }));
+     expect(store.getAll()).toHaveLength(1);
+     expect(store.getAll()[0].name).toBe('v2');
+   });
+
+   it('removes evaluations', () => {
+     store.add(makeEval());
+     expect(store.remove('eval-1')).toBe(true);
+     expect(store.getAll()).toHaveLength(0);
+     expect(store.remove('nonexistent')).toBe(false);
+   });
+
+   it('filters conservation evaluations', () => {
+     store.add(makeEval({ eval_id: 'e1', conservation: true }));
+     store.add(makeEval({ eval_id: 'e2', conservation: false }));
+     expect(store.getConservation()).toHaveLength(1);
+   });
+
+   it('computes coverage for an IU', () => {
+     const iu = makeIU();
+     store.add(makeEval({ eval_id: 'e1', binding: 'boundary_contract', canon_ids: ['canon-1'] }));
+     store.add(makeEval({ eval_id: 'e2', binding: 'failure_mode', canon_ids: ['canon-2'] }));
+
+     const cov = store.coverage(iu);
+     expect(cov.total_evaluations).toBe(2);
+     expect(cov.canon_ids_covered).toContain('canon-1');
+     expect(cov.canon_ids_covered).toContain('canon-2');
+     expect(cov.canon_ids_uncovered).toEqual(['canon-3']);
+     expect(cov.coverage_ratio).toBeCloseTo(2 / 3);
+     expect(cov.by_binding.boundary_contract).toBe(1);
+     expect(cov.by_binding.failure_mode).toBe(1);
+   });
+
+   it('identifies coverage gaps', () => {
+     const iu = makeIU();
+     // No evaluations at all
+     const cov = store.coverage(iu);
+     expect(cov.gaps.length).toBeGreaterThan(0);
+     const categories = cov.gaps.map(g => g.category);
+     expect(categories).toContain('missing_boundary');
+     expect(categories).toContain('missing_failure_mode');
+     expect(categories).toContain('missing_invariant');
+   });
+ });
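The coverage test above implies that an IU's `source_canon_ids` are split into covered and uncovered sets based on the `canon_ids` of its evaluations, with `coverage_ratio` as the covered fraction. As a rough sketch of just that part (not the actual `EvaluationStore.coverage`, which also fills in `by_binding`, `by_origin`, and `gaps`), assuming hypothetical names `EvalLike`, `CanonCoverage`, and `canonCoverage`, and assuming a ratio of 0 when the IU has no canon IDs:

```typescript
// Hypothetical minimal shapes; the real Evaluation model carries many more fields.
interface EvalLike {
  iu_id: string;
  canon_ids: string[];
}

interface CanonCoverage {
  canon_ids_covered: string[];
  canon_ids_uncovered: string[];
  coverage_ratio: number;
}

function canonCoverage(iuId: string, sourceCanonIds: string[], evals: EvalLike[]): CanonCoverage {
  // Collect every canon ID referenced by evaluations bound to this IU.
  const touched: string[] = [];
  for (const ev of evals) {
    if (ev.iu_id !== iuId) continue;
    for (const id of ev.canon_ids) {
      if (touched.indexOf(id) === -1) touched.push(id);
    }
  }
  const covered = sourceCanonIds.filter(id => touched.indexOf(id) !== -1);
  const uncovered = sourceCanonIds.filter(id => touched.indexOf(id) === -1);
  // Assumption: an IU with no canon IDs reports 0 rather than dividing by zero.
  const ratio = sourceCanonIds.length === 0 ? 0 : covered.length / sourceCanonIds.length;
  return { canon_ids_covered: covered, canon_ids_uncovered: uncovered, coverage_ratio: ratio };
}
```

For the fixture in the test (three canon IDs, evaluations touching `canon-1` and `canon-2`), this yields `canon-3` as the only uncovered ID and a ratio of 2/3.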
+77
tests/unit/negative-knowledge.test.ts
+ import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+ import { mkdtempSync, rmSync } from 'node:fs';
+ import { join } from 'node:path';
+ import { tmpdir } from 'node:os';
+ import { NegativeKnowledgeStore } from '../../src/store/negative-knowledge-store.js';
+ import { hasRelevantNegativeKnowledge } from '../../src/models/negative-knowledge.js';
+ import type { NegativeKnowledge } from '../../src/models/negative-knowledge.js';
+
+ function makeNK(overrides: Partial<NegativeKnowledge> = {}): NegativeKnowledge {
+   return {
+     nk_id: 'nk-1',
+     kind: 'failed_generation',
+     subject_id: 'iu-auth',
+     subject_type: 'iu',
+     what_was_tried: 'Async token refresh',
+     why_it_failed: 'Race condition caused token reuse',
+     constraint_for_future: 'Token refresh must be synchronous',
+     recorded_at: new Date().toISOString(),
+     active: true,
+     ...overrides,
+   };
+ }
+
+ describe('NegativeKnowledgeStore', () => {
+   let dir: string;
+   let store: NegativeKnowledgeStore;
+
+   beforeEach(() => {
+     dir = mkdtempSync(join(tmpdir(), 'phoenix-nk-'));
+     store = new NegativeKnowledgeStore(dir);
+   });
+
+   afterEach(() => rmSync(dir, { recursive: true, force: true }));
+
+   it('starts empty', () => {
+     expect(store.getAll()).toEqual([]);
+   });
+
+   it('adds and retrieves records', () => {
+     store.add(makeNK());
+     expect(store.getAll()).toHaveLength(1);
+     expect(store.getBySubject('iu-auth')).toHaveLength(1);
+     expect(store.getBySubject('iu-other')).toHaveLength(0);
+   });
+
+   it('replaces on duplicate nk_id', () => {
+     store.add(makeNK({ nk_id: 'nk-1', what_was_tried: 'v1' }));
+     store.add(makeNK({ nk_id: 'nk-1', what_was_tried: 'v2' }));
+     expect(store.getAll()).toHaveLength(1);
+     expect(store.getAll()[0].what_was_tried).toBe('v2');
+   });
+
+   it('marks records stale', () => {
+     store.add(makeNK());
+     expect(store.markStale('nk-1')).toBe(true);
+     expect(store.getActive()).toHaveLength(0);
+     expect(store.getAll()).toHaveLength(1); // still in store, just inactive
+   });
+
+   it('getBySubject only returns active records', () => {
+     store.add(makeNK({ nk_id: 'nk-1', active: false }));
+     expect(store.getBySubject('iu-auth')).toHaveLength(0);
+   });
+ });
+
+ describe('hasRelevantNegativeKnowledge', () => {
+   it('returns matching active records', () => {
+     const records = [
+       makeNK({ nk_id: 'nk-1', subject_id: 'iu-auth' }),
+       makeNK({ nk_id: 'nk-2', subject_id: 'iu-other' }),
+       makeNK({ nk_id: 'nk-3', subject_id: 'iu-auth', active: false }),
+     ];
+     const relevant = hasRelevantNegativeKnowledge(records, 'iu-auth');
+     expect(relevant).toHaveLength(1);
+     expect(relevant[0].nk_id).toBe('nk-1');
+   });
+ });
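The last test describes `hasRelevantNegativeKnowledge` as a filter that keeps records which are both active and attached to the given subject. A minimal sketch consistent with that behavior follows; the record fields are copied from the `makeNK` fixture, but typing `kind` and `subject_type` as plain `string` is an assumption, since the real model may well use narrower unions.

```typescript
// Field names follow the test fixture; the string types for kind/subject_type are assumptions.
interface NegativeKnowledge {
  nk_id: string;
  kind: string;
  subject_id: string;
  subject_type: string;
  what_was_tried: string;
  why_it_failed: string;
  constraint_for_future: string;
  recorded_at: string;
  active: boolean;
}

// Keep only records that are still active and refer to the given subject.
function hasRelevantNegativeKnowledge(
  records: NegativeKnowledge[],
  subjectId: string,
): NegativeKnowledge[] {
  return records.filter(r => r.active && r.subject_id === subjectId);
}
```

Stale records stay in the store (as the `markStale` test shows) but drop out of this filter, so they no longer influence audits or recommendations.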
+69
tests/unit/pace-layer.test.ts
+ import { describe, it, expect } from 'vitest';
+ import {
+   inferPaceLayer,
+   isPaceAppropriate,
+   detectLayerCrossing,
+   defaultPaceLayerMetadata,
+ } from '../../src/models/pace-layer.js';
+
+ describe('Pace Layers', () => {
+   it('infers foundation for high dependency weight', () => {
+     expect(inferPaceLayer(5, false)).toBe('foundation');
+     expect(inferPaceLayer(3, true)).toBe('foundation');
+   });
+
+   it('infers domain for moderate dependency weight', () => {
+     expect(inferPaceLayer(3, false)).toBe('domain');
+     expect(inferPaceLayer(4, false)).toBe('domain');
+   });
+
+   it('infers service for low dependency weight', () => {
+     expect(inferPaceLayer(1, false)).toBe('service');
+     expect(inferPaceLayer(2, false)).toBe('service');
+   });
+
+   it('infers surface for zero dependency weight', () => {
+     expect(inferPaceLayer(0, false)).toBe('surface');
+   });
+
+   it('allows daily regen for surface layer', () => {
+     expect(isPaceAppropriate('surface', 1)).toBe(true);
+     expect(isPaceAppropriate('surface', 0)).toBe(false);
+   });
+
+   it('requires weekly minimum for service layer', () => {
+     expect(isPaceAppropriate('service', 7)).toBe(true);
+     expect(isPaceAppropriate('service', 3)).toBe(false);
+   });
+
+   it('requires monthly minimum for domain layer', () => {
+     expect(isPaceAppropriate('domain', 30)).toBe(true);
+     expect(isPaceAppropriate('domain', 15)).toBe(false);
+   });
+
+   it('requires quarterly minimum for foundation layer', () => {
+     expect(isPaceAppropriate('foundation', 90)).toBe(true);
+     expect(isPaceAppropriate('foundation', 60)).toBe(false);
+   });
+
+   it('detects slow-depends-on-fast violation', () => {
+     const v = detectLayerCrossing('foundation', 'surface');
+     expect(v).not.toBeNull();
+     expect(v!.violation_type).toBe('dependency_crosses_layer');
+   });
+
+   it('allows fast-depends-on-slow (normal)', () => {
+     expect(detectLayerCrossing('surface', 'foundation')).toBeNull();
+     expect(detectLayerCrossing('service', 'domain')).toBeNull();
+   });
+
+   it('allows same-layer dependencies', () => {
+     expect(detectLayerCrossing('domain', 'domain')).toBeNull();
+   });
+
+   it('provides sensible defaults', () => {
+     const d = defaultPaceLayerMetadata();
+     expect(d.pace_layer).toBe('service');
+     expect(d.conservation).toBe(false);
+   });
+ });
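Taken together, these tests suggest a simple ordering of layers from fast to slow, a minimum regeneration interval per layer, and a rule that a slower layer must not depend on a faster one. The sketch below is one reading that passes the assertions above; it is not the code in `src/models/pace-layer.ts`. In particular, the meaning of the boolean passed to `inferPaceLayer` is not visible in the diff (here it is named `elevated` and only promotes a weight of 3 or more to `foundation`), and the `from`/`to` fields on the violation object are hypothetical.

```typescript
type PaceLayer = 'surface' | 'service' | 'domain' | 'foundation';

// Ordered fastest-changing to slowest-changing.
const LAYER_ORDER: PaceLayer[] = ['surface', 'service', 'domain', 'foundation'];

// Minimum days between regenerations, as implied by the tests (daily, weekly, monthly, quarterly).
const MIN_CADENCE_DAYS: Record<PaceLayer, number> = {
  surface: 1,
  service: 7,
  domain: 30,
  foundation: 90,
};

// Assumption: `elevated` is some flag that promotes a moderately depended-on unit to foundation.
function inferPaceLayer(dependencyWeight: number, elevated: boolean): PaceLayer {
  if (dependencyWeight >= 5) return 'foundation';
  if (dependencyWeight >= 3) return elevated ? 'foundation' : 'domain';
  if (dependencyWeight >= 1) return 'service';
  return 'surface';
}

// A regeneration is pace-appropriate if enough days have passed for that layer.
function isPaceAppropriate(layer: PaceLayer, daysSinceLastRegen: number): boolean {
  return daysSinceLastRegen >= MIN_CADENCE_DAYS[layer];
}

interface LayerViolation {
  violation_type: 'dependency_crosses_layer';
  from: PaceLayer; // hypothetical field: the depending (slower) layer
  to: PaceLayer;   // hypothetical field: the depended-on (faster) layer
}

// `from` depends on `to`; a violation occurs only when `from` is slower than `to`.
function detectLayerCrossing(from: PaceLayer, to: PaceLayer): LayerViolation | null {
  const slowDependsOnFast = LAYER_ORDER.indexOf(from) > LAYER_ORDER.indexOf(to);
  return slowDependsOnFast ? { violation_type: 'dependency_crosses_layer', from, to } : null;
}
```

The asymmetry is the point of the design: a surface unit may lean on a foundation unit because the foundation rarely changes, but a foundation unit that leans on a surface unit inherits the surface's churn.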