docs(coding-standards): add L1-L9 layer-hygiene guidelines + lint

+8 -1

Makefile

···
 # all runs to one path and pytest wipes it on startup, destroying concurrent state.
 export TMPDIR := /var/tmp
 
-.PHONY: install uninstall test test-apps test-app test-only test-integration test-integration-only test-all format format-check ci clean clean-install coverage watch versions update update-prices pre-commit skills dev all sail sandbox sandbox-stop install-pinchtab verify-browser update-browser-baselines review verify-api update-api-baselines install-service uninstall-service service-logs gate-agents-rename
+.PHONY: install uninstall test test-apps test-app test-only test-integration test-integration-only test-all format format-check ci clean clean-install coverage watch versions update update-prices pre-commit skills dev all sail sandbox sandbox-stop install-pinchtab verify-browser update-browser-baselines review verify-api update-api-baselines install-service uninstall-service service-logs gate-agents-rename check-layer-hygiene
 
 # Default target - install package in editable mode
 all: install
···
 	@echo "=== Running rename gate ==="
 	@$(MAKE) gate-agents-rename
 	@echo ""
+	@echo "=== Running layer-hygiene check ==="
+	@$(MAKE) check-layer-hygiene
+	@echo ""
 	@echo "=== Running mypy ==="
 	@$(MYPY) . || true
 	@echo ""
···
 # Rename guard for the agents -> talents transition
 gate-agents-rename: .installed
 	$(VENV_BIN)/python scripts/gate_agents_rename.py
+
+# Low-bar layer-hygiene check (see docs/coding-standards.md § Layer Hygiene)
+check-layer-hygiene: .installed
+	$(VENV_BIN)/python scripts/check_layer_hygiene.py

+71

docs/coding-standards.md

···
 - **Package Manager**: [uv](https://docs.astral.sh/uv/) — lock file (`uv.lock`) is committed, `make install` syncs from it
 - **Installation**: `make install` (creates isolated `.venv/` and syncs deps from the lock file for repo-local development)
 - **Updating**: `make update` upgrades all deps to latest and regenerates the lock file
+
+## Layer Hygiene
+
+These invariants keep read paths pure, concentrate domain writes in one place per domain, and stop infrastructure modules (indexer, importer, scheduler, search, graph) from silently mutating cross-cutting state. They were derived from a codebase-wide audit of layer violations in April 2026 — see the motivating record at `vpe/workspace/solstone-layer-violations-audit.md` in the sol pbc internal extro repo (14 violations inventoried, remediation plan in flight).
+
+The low-bar grep enforcement is `scripts/check_layer_hygiene.py`, wired into `make ci`. Known audit-flagged files are allowlisted with audit-reference TODOs; the allowlist shrinks as remediation bundles ship.
+
+### L1 — Layer boundaries are load-bearing
+
+Each module family has a declared responsibility. Infrastructure modules (indexer, importer, scheduler, search, graph, stats) may write **only their own output artifacts**. They may not create, modify, or delete domain state (entities, facets, observations, activities, events, chronicle day content). If an infrastructure module needs to trigger a domain mutation, it emits a callosum event or invokes a `sol call <domain> <verb>` subprocess — never writes domain state directly.
+
+### L2 — Domain write ownership
+
+Each domain has exactly **one** write-owning module. No other module may call `atomic_write`, `json.dump`, `open("w")`, `Path.write_text`, `unlink`, `rmtree`, etc. on that domain's on-disk state.
+
+| Domain | Write-owning module(s) |
+|--------|------------------------|
+| Entities (`entities/*/entity.json`, `entities/*/*.npz`) | `think/entities/saving.py` + `apps/entities/call.py` |
+| Facets (`facets/*/facet.json`, `facets/*/relationships/`) | `think/facets.py` + `apps/facets/*` (if/when created) |
+| Observations (`observations.jsonl`) | `think/entities/observations.py` |
+| Activities (`facets/*/activities/*.jsonl`) | `think/activities.py` |
+| Facet events (`facets/*/events/*.jsonl`) | `think/hooks.py::write_events_jsonl`, called only via declared hook contract |
+| Chronicle day content (`chronicle/YYYYMMDD/**`) | The capturing module (observer, importer) per its declared outputs |
+| Index (SQLite, `indexer/*`) | `think/indexer/*` |
+
+### L3 — Naming is a contract
+
+Function and CLI-subcommand verbs signal read vs. write intent.
+
+**Read verbs** (functions and CLI subcommands): `load_*`, `get_*`, `read_*`, `scan_*`, `list_*`, `show_*`, `find_*`, `match_*`, `resolve_*`, `query_*`, `lookup_*`, `status_*`, `check_*`, `validate_*`, `discover_*`, `format_*`, `render_*`, `extract_*`, `parse_*`, `view_*`, `inspect_*`, `info_*`, `describe_*`, `search_*`.
+
+A read-verb function must not mutate on-disk state. No exceptions for caches. No exceptions for "create on miss."
+
+If a function needs create-on-miss semantics, split it:
+
+```python
+entity = load_entity(eid) or create_entity(eid, ...)
+```
+
+This makes the write visible at every call site.
+
+**Write verbs** are the ones allowed to write — choose the right one: `save_`, `create_`, `add_`, `insert_`, `append_`, `attach_`, `delete_`, `remove_`, `update_`, `rename_`, `move_`, `promote_`, `merge_`, `seed_`, `consolidate_`, `bootstrap_`, `backfill_`, `dispatch_`, `record_`, `ingest_`, `import_`, `rebuild_`.
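
Reviewer sketch, not part of the change: the L3 read/write split above can be made concrete with a small, self-contained example. The `root` parameter, the `entity.json` payload fields, and the `demo_split` helper are assumptions made for illustration; the repository's real `load_entity` and `create_entity` signatures are not shown in this diff.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def load_entity(root: Path, eid: str) -> dict | None:
    """Read verb: return the stored entity, or None on a miss. Never writes."""
    path = root / "entities" / eid / "entity.json"
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def create_entity(root: Path, eid: str, name: str) -> dict:
    """Write verb: the only function in this sketch that touches disk state."""
    path = root / "entities" / eid / "entity.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    entity = {"id": eid, "name": name}  # hypothetical payload shape
    path.write_text(json.dumps(entity), encoding="utf-8")
    return entity


def demo_split() -> tuple[bool, dict, dict | None]:
    """Exercise the split against a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        missing = load_entity(root, "ada")
        # A miss must leave the tree untouched: no directory was created.
        read_was_pure = missing is None and not (root / "entities").exists()
        # The write is spelled out at the call site, as L3 asks.
        entity = load_entity(root, "ada") or create_entity(root, "ada", "Ada")
        return read_was_pure, entity, load_entity(root, "ada")
```

Calling `demo_split()` shows that a miss in the read path creates nothing, and that the entity only exists after the explicit `create_entity` call.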
+
+### L4 — CLI read-verbs are read-only
+
+CLI subcommands with read verbs (list, show, status, get, search, find, check, validate, discover, inspect, info, describe, read, view) must not write to journal domain state under any flag combination. If a command needs a write path, split it into two commands — a read-verb reader and a write-verb writer.
+
+### L5 — Write-verb defaults
+
+CLI subcommands with write verbs default to safe.
+
+- Preferred: no default mutation; an explicit `--commit` (or `--apply`) flag is required to perform the write.
+- Acceptable alternative: `--dry-run` defaulting to `False` *only if* the subcommand name is unambiguously a write verb AND the command's primary user journey is the write (e.g., `sol call entities create`).
+
+"Bootstrap", "backfill", and "resolve-names" are not unambiguous — default them to dry-run.
+
+### L6 — Indexers never mutate source data
+
+An indexer's job is to build indexes from source-of-truth data. Indexers may not mutate the source data they read. Re-running `sol indexer --rescan` on an unchanged journal must be a no-op for domain state.
+
+### L7 — Importers only write to imports/
+
+Importers write source material to `imports/` and the raw-content areas of `chronicle/`. They may not create or modify entities, facets, observations, or other cross-cutting state. If an importer needs to create an entity for deduplication, it calls a domain-owned `seed_entity()` function in `think/entities/` that surfaces the write explicitly.
+
+### L8 — Hooks have declared outputs
+
+Post-processing hooks (`think/hooks.py`, `talent/*.py` hook functions) declare every path they will write in their frontmatter. The hook runner validates that all actual writes match the declaration. Writes outside the declared set fail loudly — raise at runtime; assert in tests.
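
Reviewer sketch, not part of the change: one way the L8 "declared outputs" check could look. This diff does not show how the hook runner records actual writes or how declarations are parsed from frontmatter, so the glob-style declarations, `UndeclaredWriteError`, and both function names below are hypothetical.

```python
from fnmatch import fnmatchcase


class UndeclaredWriteError(RuntimeError):
    """Raised when a hook writes a path it did not declare (hypothetical name)."""


def undeclared_writes(declared: list[str], actual: list[str]) -> list[str]:
    """Return the written paths that match none of the declared glob patterns."""
    return [
        path
        for path in actual
        if not any(fnmatchcase(path, pattern) for pattern in declared)
    ]


def validate_hook_writes(hook_name: str, declared: list[str], actual: list[str]) -> None:
    """Fail loudly if any actual write falls outside the declared set."""
    extra = undeclared_writes(declared, actual)
    if extra:
        raise UndeclaredWriteError(f"{hook_name} wrote undeclared paths: {extra}")
```

Note that `fnmatchcase` lets `*` match across `/`, so a production check would probably want stricter, segment-aware matching; the sketch only shows the fail-loudly shape.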
+
+### L9 — Event handlers are idempotent
+
+Any function that handles a callosum event, a scheduled tick, or a supervisor-started automation is idempotent w.r.t. on-disk state. Append-only history records dedupe by a natural key (usually `(day, segment)` or `(day, segment, ts)`). Before adding a write to an event handler, ask: "what happens if this event fires twice?"
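
Reviewer sketch, not part of the change: the L9 rule of deduplicating append-only records by a natural key, shown for a JSONL file keyed on `(day, segment)`. The function names, the record fields other than `day` and `segment`, and the file name are assumptions; the sketch is also not safe under concurrent writers.

```python
import json
import tempfile
from pathlib import Path


def append_record_once(path: Path, record: dict) -> bool:
    """Append `record` unless a row with the same (day, segment) already exists.

    Returns True when a line was written and False when the call was a no-op.
    """
    key = (record["day"], record["segment"])
    if path.is_file():
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                seen = json.loads(line)
                if (seen.get("day"), seen.get("segment")) == key:
                    return False  # the event already landed; firing twice changes nothing
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return True


def demo_fire_twice() -> tuple[bool, bool, int]:
    """Deliver the same event twice and count the stored rows."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "history.jsonl"
        event = {"day": "20260417", "segment": "093000", "note": "hypothetical"}
        first = append_record_once(path, event)
        second = append_record_once(path, event)
        rows = len(path.read_text(encoding="utf-8").splitlines())
        return first, second, rows
```

The second delivery returns `False` and leaves a single row on disk, which answers the "what happens if this event fires twice?" question for this handler.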

+263

scripts/check_layer_hygiene.py

···
+#!/usr/bin/env python3
+# SPDX-License-Identifier: AGPL-3.0-only
+# Copyright (c) 2026 sol pbc
+
+"""Layer-hygiene lint.
+
+Low-bar static check for the invariants in ``docs/coding-standards.md`` §
+"Layer Hygiene" (L1, L2, L3, L6, L7). Warns when code inside infrastructure
+modules (``think/indexer/``, ``think/importers/``, ``think/search/``,
+``think/graph/``) or inside a read-verb CLI handler (a function in
+``apps/*/call.py`` whose name contains a read verb such as ``load``, ``show``,
+``check``, ``validate``, ``find``, ``list``, ``scan``, ``get``) performs a
+direct write (``atomic_write``, ``json.dump``, ``.write_text``,
+``open(..., "w")``, ``unlink``, ``rmtree``) against a path under
+``journal/entities/``, ``journal/facets/``, or ``journal/observations``.
+
+By design this is a grep-level check with known false-positive surface. Known
+audit-tracked violations are allowlisted below with a TODO and an audit
+reference. An allowlist entry is expected to disappear once its bundle ships —
+see ``vpe/workspace/solstone-layer-violations-audit.md`` in the sol pbc
+internal extro repo for the canonical list (V1-V14).
+
+Exit codes:
+    0 — no un-tracked violations
+    1 — new violations found outside the allowlist
+"""
+
+from __future__ import annotations
+
+import ast
+import re
+import subprocess
+import sys
+from pathlib import Path
+
+ROOT = Path(__file__).resolve().parent.parent
+
+# Module families scrutinized as "infrastructure" per L1/L6/L7.
+INFRASTRUCTURE_SCOPES: tuple[str, ...] = (
+    "think/indexer",
+    "think/importers",
+    "think/search",
+    "think/graph",
+)
+
+# Direct-write operations. Indirect writes via helper methods (e.g.
+# ``checklist.save()``) are out of scope by design — the audit notes that
+# indirect writes are not reachable by grep.
+WRITE_PATTERNS: tuple[tuple[re.Pattern[str], str], ...] = (
+    (re.compile(r"\batomic_write\s*\("), "atomic_write"),
+    (re.compile(r"\bjson\.dump\s*\("), "json.dump"),
+    (re.compile(r"\.write_text\s*\("), ".write_text"),
+    (re.compile(r"""\bopen\s*\([^)]*["']w[+b]?["']"""), 'open(..., "w")'),
+    (re.compile(r"\bos\.unlink\s*\("), "os.unlink"),
+    (re.compile(r"\.unlink\s*\(\s*(?:missing_ok|\))"), ".unlink()"),
+    (re.compile(r"\b(?:shutil\.)?rmtree\s*\("), "rmtree"),
+)
+
+# Strings / identifiers that indicate the write target sits under one of the
+# protected domains. The window-based proximity check below uses these to
+# decide whether a flagged write is on a domain path.
+TARGET_PATH_PATTERNS: tuple[re.Pattern[str], ...] = (
+    re.compile(r"journal/entities\b"),
+    re.compile(r"journal/facets\b"),
+    re.compile(r"journal/observations"),
+    re.compile(r'["\']entities["\']'),
+    re.compile(r'["\']facets["\']'),
+    re.compile(r'["\']observations'),
+    re.compile(
+        r"\b(?:entity|facet|observation|observations?)_(?:path|dir|file|json)\b"
+    ),
+)
+
+# Read verbs per docs/coding-standards.md § L3. Matched against any
+# underscore-split segment of the function name, so ``keys_validate`` and
+# ``check_nudges`` both trip the rule.
+READ_VERBS: frozenset[str] = frozenset(
+    {
+        "load",
+        "get",
+        "read",
+        "scan",
+        "list",
+        "show",
+        "find",
+        "match",
+        "resolve",
+        "query",
+        "lookup",
+        "status",
+        "check",
+        "validate",
+        "discover",
+        "format",
+        "render",
+        "extract",
+        "parse",
+        "view",
+        "inspect",
+        "info",
+        "describe",
+        "search",
+    }
+)
+
+# Known violations from the solstone layer-violations audit (2026-04-17).
+# Each entry silences the lint for an entire file until the underlying
+# violation is fixed. Remove the entry when its bundle ships.
+#
+# Audit ref: vpe/workspace/solstone-layer-violations-audit.md (extro repo).
+ALLOWLIST: dict[str, str] = {
+    # TODO(V1): consolidate_segment_entities() stealth-writes entities from
+    # the indexer. Remove after Bundle A (entity-write ownership) lands.
+    "think/indexer/journal.py": "V1",
+    # TODO(V2): seed_entities() creates entities from importer shared code.
+    # Indirect writes go through save_journal_entity(), so the direct-write
+    # grep does not flag the file today. Keep the entry so the file is
+    # named alongside V1 as a known audit target; remove after Bundle A.
+    "think/importers/shared.py": "V2",
+    # TODO(import-resolve-facet): apps/import/call.py's `resolve-facet`
+    # command uses a read-verb name ("resolve_*" per L3) but writes to
+    # journal/facets and unlinks staged files. Not in the audit's V1-V14,
+    # but surfaced by this lint on first run. Needs CPO/VPE disposition:
+    # rename to a write verb (e.g. `apply-staged-facet` + `skip-staged-facet`)
+    # or accept as a V13-class dual-mode verb.
+    "apps/import/call.py": "import-resolve-facet",
+}
+
+CONTEXT_WINDOW = 8  # lines above and below each write to search for paths
+
+
+def tracked_python_files() -> list[Path]:
+    result = subprocess.run(
+        ["git", "ls-files", "*.py"],
+        cwd=ROOT,
+        check=True,
+        capture_output=True,
+        text=True,
+    )
+    return [Path(line) for line in result.stdout.splitlines() if line]
+
+
+def in_infrastructure_scope(rel: Path) -> bool:
+    path_str = rel.as_posix()
+    return any(path_str.startswith(scope + "/") for scope in INFRASTRUCTURE_SCOPES)
+
+
+def is_call_py(rel: Path) -> bool:
+    parts = rel.parts
+    return len(parts) >= 3 and parts[0] == "apps" and parts[-1] == "call.py"
+
+
+def has_target_path_nearby(lines: list[str], idx: int) -> bool:
+    start = max(0, idx - CONTEXT_WINDOW)
+    end = min(len(lines), idx + CONTEXT_WINDOW + 1)
+    window = "\n".join(lines[start:end])
+    return any(p.search(window) for p in TARGET_PATH_PATTERNS)
+
+
+def scan_lines(lines: list[str]) -> list[tuple[int, str]]:
+    findings: list[tuple[int, str]] = []
+    for idx, line in enumerate(lines):
+        for pat, label in WRITE_PATTERNS:
+            if pat.search(line) and has_target_path_nearby(lines, idx):
+                findings.append((idx + 1, label))
+                break
+    return findings
+
+
+def has_read_verb(name: str) -> bool:
+    base = name.lstrip("_")
+    return any(part in READ_VERBS for part in base.split("_") if part)
+
+
+def check_call_py(rel: Path, source: str) -> list[tuple[int, str, str]]:
+    """Flag writes inside read-verb function bodies.
+
+    Returns a list of ``(line_no, write_label, function_name)`` tuples.
+    """
+    try:
+        tree = ast.parse(source, filename=str(rel))
+    except SyntaxError:
+        return []
+
+    findings: list[tuple[int, str, str]] = []
+    src_lines = source.splitlines()
+
+    for node in ast.walk(tree):
+        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
+            continue
+        if not has_read_verb(node.name):
+            continue
+        start = node.lineno - 1
+        end = (node.end_lineno or node.lineno) - 1
+        body_lines = src_lines[start : end + 1]
+        sub_findings = scan_lines(body_lines)
+        for local_line, label in sub_findings:
+            findings.append((start + local_line, label, node.name))
+    return findings
+
+
+def main() -> int:
+    new: list[str] = []
+    tracked: list[str] = []
+
+    for rel in sorted(tracked_python_files()):
+        abs_path = ROOT / rel
+        if not abs_path.is_file():
+            continue
+        try:
+            source = abs_path.read_text(encoding="utf-8")
+        except UnicodeDecodeError:
+            continue
+
+        rel_str = rel.as_posix()
+        issues: list[str] = []
+
+        if in_infrastructure_scope(rel):
+            for line_no, label in scan_lines(source.splitlines()):
+                issues.append(
+                    f"{rel_str}:{line_no}: {label} "
+                    f"on journal-domain path (infrastructure scope)"
+                )
+
+        if is_call_py(rel):
+            for line_no, label, func_name in check_call_py(rel, source):
+                issues.append(
+                    f"{rel_str}:{line_no}: {label} in read-verb handler '{func_name}()'"
+                )
+
+        if not issues:
+            continue
+
+        audit_ref = ALLOWLIST.get(rel_str)
+        for issue in issues:
+            if audit_ref:
+                tracked.append(f"{issue} [tracked: {audit_ref}]")
+            else:
+                new.append(issue)
+
+    if tracked:
+        print("layer-hygiene: known violations (tracked, expected to disappear):")
+        for line in tracked:
+            print(f"  {line}")
+        print()
+
+    if new:
+        print("layer-hygiene: NEW violations:", file=sys.stderr)
+        for line in new:
+            print(f"  {line}", file=sys.stderr)
+        print(file=sys.stderr)
+        print(
+            "See docs/coding-standards.md § Layer Hygiene (L1/L2/L3/L6/L7).",
+            file=sys.stderr,
+        )
+        return 1
+
+    print("layer-hygiene: pass")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
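
Reviewer sketch, not part of the change: the segment-matching rule in `has_read_verb` is easy to misread as a substring or prefix match, so the snippet below reproduces `READ_VERBS` and the function from the script and runs them on a few names. `keys_validate` and `check_nudges` come from the script's own comment; the other sample names are hypothetical.

```python
READ_VERBS: frozenset[str] = frozenset(
    {
        "load", "get", "read", "scan", "list", "show", "find", "match",
        "resolve", "query", "lookup", "status", "check", "validate",
        "discover", "format", "render", "extract", "parse", "view",
        "inspect", "info", "describe", "search",
    }
)


def has_read_verb(name: str) -> bool:
    # Leading underscores are dropped, then every underscore-separated
    # segment is compared against READ_VERBS as a whole word.
    base = name.lstrip("_")
    return any(part in READ_VERBS for part in base.split("_") if part)


SAMPLES = {
    "keys_validate": True,         # verb in a trailing segment still counts
    "check_nudges": True,
    "_load_config": True,          # private helpers are not exempt
    "get_or_create_entity": True,  # a mixed name is treated as a read verb
    "save_entity": False,
    "checklist_save": False,       # "checklist" is not the segment "check"
}

for sample, expected in SAMPLES.items():
    assert has_read_verb(sample) is expected, sample
```

Two consequences follow: a create-on-miss name such as `get_or_create_entity` is held to the read-only rule, and a write-verb function is only exempt if none of its segments is a read verb.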
