# Policy Engine Design Research: Declarative, Deterministic, Transport-Agnostic Replication Policy for AT Protocol

## Executive Summary

This document maps the design space for a policy engine that governs AT Protocol account replication in P2PDS. The engine must be declarative (policies are data, not code), deterministic (same inputs yield same conclusions everywhere), transport-agnostic (outcomes matter, not mechanisms), and account-centric (DIDs are the primitive, not raw blocks).

The research covers policy language options, set reconciliation primitives, compliance verification, group agreement, failure handling, resource accounting, prior art, and architecture sketches. It concludes with a recommended starting point: a minimal JSON-based policy document published as an atproto record, evaluated locally by each node, using the existing RASL verification layer for compliance checking.

---

## 1. Policy Language / Representation

### The Design Space

The core question: how do you express rules like "node X must hold all blocks for DID Y" or "99.9% uptime required" as evaluable data?

#### Option A: JSON/CBOR Constraint Documents

**Description:** Policies are structured JSON or CBOR objects with a fixed schema, interpreted by a purpose-built evaluator. The simplest possible approach.

```json
{
  "$type": "org.p2pds.policy",
  "version": 1,
  "type": "mutual-aid",
  "members": [
    { "did": "did:plc:alice", "node": "did:plc:alice-node" },
    { "did": "did:plc:bob", "node": "did:plc:bob-node" },
    { "did": "did:plc:carol", "node": "did:plc:carol-node" }
  ],
  "rules": {
    "replication": {
      "strategy": "full",
      "minCopies": 2,
      "subjects": ["did:plc:alice", "did:plc:bob", "did:plc:carol"]
    },
    "verification": {
      "interval": "30m",
      "method": "rasl-sampling",
      "sampleSize": 50
    },
    "sync": {
      "maxLag": "5m"
    }
  }
}
```

**Pros:**
- Trivially serializable as atproto records (JSON maps to atproto lexicon records, CBOR is native to atproto repos)
- No new runtime dependency -- evaluated by application code
- Version-controlled, diff-friendly, human-readable
- Deterministic by construction: a fixed schema with enumerated strategies leaves no room for nondeterminism
- Smallest possible implementation surface

**Cons:**
- Limited expressiveness -- every new rule type requires schema changes and new evaluator code
- No composition: cannot combine policies from different sources without custom merge logic
- No formal semantics -- the evaluator *is* the semantics, which risks divergence across implementations

**Fit assessment:** Best starting point. Covers mutual aid and basic SLA scenarios. Expressiveness limits are a feature at this stage -- they prevent over-engineering.

#### Option B: Datalog-Style Logic Programs

**Description:** Policies expressed as Datalog rules over facts derived from system state. Datalog is a restricted subset of Prolog that always terminates and has well-defined semantics (least fixed-point).

```datalog
% Facts (derived from system state)
member(alice). member(bob). member(carol).
home_node(alice_node, alice). home_node(bob_node, bob). home_node(carol_node, carol).
node_holds(alice_node, alice). node_holds(alice_node, bob).
node_holds(bob_node, alice). node_holds(bob_node, carol).

% Policy rule (#count is an aggregate extension to core Datalog)
compliant(DID) :-
    member(DID),
    #count { N : node_holds(N, DID), not home_node(N, DID) } >= 2.

violation(DID) :- member(DID), \+ compliant(DID).
```

**Pros:**
- Formally well-defined: evaluation always terminates, results are deterministic given the same facts
- Composable: rules from different sources can be combined (union of rule sets)
- Well-studied: decades of database and logic programming research
- Can express recursive relationships (e.g., transitive trust chains)

**Cons:**
- Requires embedding a Datalog engine (e.g., a JS implementation or WASM module)
- Serializing Datalog programs as atproto records is awkward (text blob or custom AST format)
- Higher barrier for policy authors -- most people cannot read Datalog
- Numerical constraints (latency < 500ms, uptime > 99.9%) require extensions to standard Datalog (aggregates, arithmetic)

**Fit assessment:** Excellent formal properties, but premature for v1. Good target for v2 if the JSON approach proves too limiting.

#### Option C: Rego (Open Policy Agent)

**Description:** Rego is OPA's purpose-built policy language. It is declarative, supports partial evaluation, and has been widely adopted for Kubernetes, CI/CD, and API gateway policies.

```rego
package p2pds.replication

default compliant = false

compliant {
    input.copies >= data.policy.minCopies
    input.sync_lag_seconds <= data.policy.maxLagSeconds
    input.verification.passed == true
}
```

**Pros:**
- Designed specifically for policy evaluation
- Deterministic: given the same input document and data, evaluation produces the same result
- Rich ecosystem: tooling, testing frameworks, playground
- Handles complex nested data well (JSON-native)
- Can be compiled to WASM for portable, sandboxed execution

**Cons:**
- Heavy dependency: OPA is a Go binary; WASM builds add ~5MB
- Rego is its own language with a learning curve
- Not atproto-native: policies would need to be stored as opaque text blobs in records
- OPA's bundle distribution model assumes a central control plane, which conflicts with P2P architecture

**Fit assessment:** OPA is well-proven but brings significant complexity. The WASM compilation path is interesting for ensuring all nodes run the exact same evaluator. Consider for v2/v3 if policy complexity demands it.

#### Option D: CUE Language

**Description:** CUE is a constraint-based configuration language where types, values, and constraints exist on a single continuum. Combining CUE values is associative, commutative, and idempotent -- ideal for merging policies from multiple sources.

```cue
policy: {
  version: 1
  type: "mutual-aid"
  replication: {
    minCopies: >=2 & <=5
    strategy: "full" | "partial"
    subjects: [...string]
  }
  sync: maxLag: <=300 // seconds
}
```

**Pros:**
- Merge is mathematically well-defined: `a & b` always produces the same result regardless of order
- Constraints are first-class: `>=2 & <=5` is a type, not a runtime check
- Deterministic by construction (lattice-based evaluation)
- Good fit for the "multiple parties each add constraints" scenario

**Cons:**
- Go-only implementation; no mature JS/TS runtime
- Not widely adopted outside Google-adjacent projects
- Serialization as atproto records would require a custom mapping
- The lattice model can produce confusing error messages when constraints conflict

**Fit assessment:** Theoretically elegant for multi-party policy composition. Impractical today due to runtime availability. Worth watching.
#### Option E: CEL (Common Expression Language)

**Description:** CEL is Google's expression language used in Kubernetes ValidatingAdmissionPolicy. It evaluates deterministically, is designed to be embedded, and has a well-defined type system.

```cel
// Is this node compliant?
node.copies >= policy.minCopies &&
  node.syncLag <= policy.maxLag &&
  node.verification.lastPassed
```

**Pros:**
- Designed for embedding in other systems
- Deterministic, terminating (no loops, no side effects)
- Small, well-defined: no surprise behaviors
- JS/TS implementations exist (cel-js)
- Good fit for "evaluate a boolean expression against structured input"

**Cons:**
- Expression language only -- no rule composition or inference
- Less expressive than Datalog or Rego for complex policies
- Still requires a runtime dependency

**Fit assessment:** Good middle ground between raw JSON and a full policy language. Could be used to express compliance predicates within a JSON policy document: the policy defines thresholds, and a CEL expression defines how to combine them.

#### Option F: AWS IAM / Zanzibar-Style Relationship Model

**Description:** Instead of general-purpose rules, model policy as relationships between entities (nodes, DIDs, groups) with permissions derived from relationship traversal.

```json
{
  "definition": {
    "group": {
      "relations": { "member": { "this": {} } },
      "permissions": {
        "replicate": { "computedUserset": { "relation": "member" } }
      }
    }
  }
}
```

**Pros:**
- Proven at scale (Google Zanzibar handles trillions of relationships)
- Natural fit for "who can do what to whose data"
- Efficient: relationship checks are O(depth of graph), typically O(1)-O(3)

**Cons:**
- Designed for authorization ("can X do Y to Z?"), not obligation ("X must do Y for Z")
- Doesn't naturally express SLA metrics, latency bounds, or storage quotas
- Would need significant adaptation for replication policy

**Fit assessment:** Wrong abstraction for the core problem. Replication policy is about obligations and compliance, not permissions. However, relationship-based modeling could complement the policy engine for "who is allowed to join this group" type decisions.

### Recommendation

**Start with Option A (JSON constraint documents)** stored as atproto records. The schema is the policy language. Keep the door open for Option E (CEL expressions) as inline predicates within JSON documents for v2. Option B (Datalog) is the long-term target if policy complexity grows significantly.

The key insight: **the policy language does not need to be Turing-complete, and it should not be.** Determinism and termination are easier to guarantee when expressiveness is bounded. A fixed JSON schema with enumerated rule types is a deliberately limited language, and that limitation is a strength.
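For a taste of that v2 direction, a policy could carry a CEL predicate alongside its fixed fields. A minimal sketch, assuming a hypothetical `predicate` field and `{ node, policy }` bindings that are not part of any current schema:

```typescript
// Hypothetical v2 policy fragment: fixed JSON fields plus an inline CEL
// predicate. The evaluator would bind observed node state and the policy
// itself into the expression's environment before evaluating it.
const complianceRule = {
  compliance: {
    // CEL expression over assumed bindings { node, policy }
    predicate:
      "node.copies >= policy.minCopies && node.syncLagSeconds <= 300 && node.verification.lastPassed",
  },
};
```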
---

## 2. Set Reconciliation as Policy Primitive

### The Core Insight

The fundamental assertion in replication policy is set membership: "node X holds all blocks for DID Y." This is a set containment claim:

```
blocks(node_X, did_Y) ⊇ blocks(source_pds, did_Y)
```

If we can efficiently verify this set relationship, we can verify policy compliance. Set reconciliation protocols exist precisely for this purpose.

### Existing Set Reconciliation Approaches

#### Negentropy (Range-Based Set Reconciliation)

**How it works:** Both parties sort their items by (timestamp, ID). The initiator partitions its set into ranges and computes a fingerprint (incremental XOR hash) for each range. The responder compares fingerprints and recursively splits ranges where fingerprints differ until individual missing items are identified.

**Communication complexity:** O(d * log(n)), where d is the number of differences and n is the total set size. When sets are nearly identical (the common case for synced repos), this is very efficient.

**Relevance to P2PDS:** Negentropy was designed for Nostr relay synchronization -- a remarkably similar problem (syncing sets of content-addressed events between nodes). The key difference: Nostr events have natural timestamps, while atproto blocks in an MST don't. However, atproto commits have revisions (TIDs), which provide a total ordering.

**Integration approach:** Rather than reconciling raw block CIDs, reconcile at the commit level. Each node maintains a set of (rev, commit_cid) pairs for each DID. Range-based reconciliation identifies which commits are missing, and then the node fetches the missing commit's blocks.

#### Invertible Bloom Lookup Tables (IBLTs)

**How it works:** An IBLT is a probabilistic data structure that supports insertion, deletion, lookup, and -- crucially -- *listing* of elements. Two parties each construct an IBLT of their set, XOR them together, and the resulting IBLT can be "peeled" to recover the symmetric difference.

**Communication complexity:** O(d) where d is the number of differences, but requires knowing d in advance to size the table correctly. If d is underestimated, decoding fails.

**Relevance to P2PDS:** IBLTs are optimal when the difference size is known or bounded. For repos that sync frequently, the number of new commits since the last sync is typically small and predictable.

#### CertainSync / Rateless IBLTs (2024-2025)

**How it works:** Extends IBLTs to be "rateless" -- the encoder produces an infinite stream of coded symbols, and the decoder can succeed as soon as enough symbols arrive to cover the actual difference. No need to estimate d in advance.

**Communication complexity:** Near-optimal -- approaches the information-theoretic lower bound.

**Relevance to P2PDS:** Eliminates the main drawback of IBLTs (requiring pre-knowledge of difference size). Still very new, and implementations are limited.

#### Minisketch (BCH-Based)

**How it works:** Uses BCH error-correcting codes to encode set sketches. Two parties exchange sketches; the difference between sketches can be decoded to recover the symmetric difference.

**Communication complexity:** O(d) with a constant factor determined by the sketch capacity.

**Relevance to P2PDS:** Used by Bitcoin's Erlay protocol for transaction relay. Highly efficient for small differences. Limited to elements that can be represented as fixed-size integers (requires hashing CIDs to a fixed size).
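To make the commit-level integration approach above concrete, here is a minimal, single-process sketch of range-based splitting over (rev, cid) pairs. The shapes, the toy 64-bit hash, and the in-memory `remote` array are all illustrative; a real protocol would XOR SHA-256 digests and exchange fingerprints over the network, one round trip per split.

```typescript
// Illustrative range-based reconciliation in the spirit of Negentropy.
// Both arrays are assumed sorted by rev (atproto TIDs sort lexicographically).
type CommitEntry = { rev: string; cid: string };

// Incremental range fingerprint: XOR of per-entry hashes. The toy string
// hash below stands in for a real cryptographic digest.
function fingerprint(entries: CommitEntry[]): bigint {
  const MASK = (1n << 64n) - 1n;
  let acc = 0n;
  for (const e of entries) {
    let h = 0n;
    for (const ch of e.rev + e.cid) h = (h * 131n + BigInt(ch.charCodeAt(0))) & MASK;
    acc ^= h;
  }
  return acc;
}

// Returns remote entries missing locally (one direction of the sync).
function missingFromLocal(local: CommitEntry[], remote: CommitEntry[]): CommitEntry[] {
  if (fingerprint(local) === fingerprint(remote)) return []; // ranges agree
  if (remote.length <= 4) {
    // Small range: fall back to item-by-item comparison
    const have = new Set(local.map((e) => e.cid));
    return remote.filter((e) => !have.has(e.cid));
  }
  // Fingerprints differ: split on the remote median rev and recurse
  const mid = Math.floor(remote.length / 2);
  const pivot = remote[mid].rev;
  return [
    ...missingFromLocal(local.filter((e) => e.rev < pivot), remote.slice(0, mid)),
    ...missingFromLocal(local.filter((e) => e.rev >= pivot), remote.slice(mid)),
  ];
}
```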
### Set Algebra for Policy

Replication policies can be expressed as set algebra:

- **Full replication:** `blocks(node_A, did_X) = blocks(authoritative_source, did_X)` for all members
- **Minimum copies:** `|{ node : blocks(node, did_X) ⊇ blocks(source, did_X) }| >= minCopies` for each subject DID
- **Partial replication:** `blocks(node_A, did_X) ⊇ selected_collections(did_X)` (only certain record collections)

The existing codebase already tracks block CIDs per DID in the `replication_blocks` table (see `/Users/dietrich/misc/p2pds/src/replication/sync-storage.ts`). This is the foundation for set-based verification.
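As a sketch of the minimum-copies rule in executable form (the shapes are illustrative; root CID equality stands in for containment, per the Merkle-proof argument in the next subsection):

```typescript
// holdings: which root CID each node currently serves for each subject DID.
type Holdings = Map<string /* nodeDid */, Map<string /* subjectDid */, string /* rootCid */>>;

// Count nodes whose served root matches the authoritative root. Matching
// roots imply matching MSTs, which imply full set containment.
function countCopies(holdings: Holdings, subjectDid: string, authoritativeRoot: string): number {
  let copies = 0;
  for (const perSubject of holdings.values()) {
    if (perSubject.get(subjectDid) === authoritativeRoot) copies++;
  }
  return copies;
}

function meetsMinCopies(
  holdings: Holdings,
  subjectDid: string,
  authoritativeRoot: string,
  minCopies: number,
): boolean {
  return countCopies(holdings, subjectDid, authoritativeRoot) >= minCopies;
}
```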
### Proving Set Containment Without Full Transfer

Several approaches exist for proving `A ⊇ B` without transferring either full set:

1. **Fingerprint comparison:** Compute a fingerprint (hash) of the set. If fingerprints match, the sets are equal with high probability. Fast, but only proves equality, not containment.

2. **Merkle proof:** The atproto MST (Merkle Search Tree) already provides this. The repo root CID is a commitment to the entire record set. If node X has root CID R for DID Y, and R matches the authoritative root, X has all the data. This is exactly what the existing Layer 0 verification does.

3. **Bloom filter challenge:** The verifier constructs a Bloom filter of their set and sends it. The prover checks membership of random elements. False positives make this a probabilistic proof.

4. **Random sampling (current approach):** Randomly select k block CIDs and check if the node can serve them. This is what the existing RASL verification layer does (see `/Users/dietrich/misc/p2pds/src/replication/verification.ts`).

### Recommendation

**Use commit-level Negentropy for sync, MST root comparison for equality proof, and RASL sampling for ongoing verification.** This layered approach matches the existing architecture:

- Layer 0 (commit root) = MST root CID comparison = set equality proof
- Layer 1 (RASL sampling) = probabilistic set containment proof
- Layer 2 (future: Negentropy) = efficient difference computation for sync

The policy engine doesn't need to know *how* reconciliation happens. It checks the *outcome*: "does node X's root CID for DID Y match the authoritative root CID?" This is the transport-agnostic principle in action.

---

## 3. Compliance Verification

### The Verification Spectrum

From weakest to strongest:

1. **Self-reporting:** Node claims "I have all blocks for DID X." (Trust-based, trivially gameable)
2. **Peer-reporting:** Other nodes report on a node's behavior. (Social trust, Sybil-vulnerable)
3. **Spot-check sampling:** Random block challenges via RASL. (Existing implementation)
4. **Full set verification:** Compare complete block sets via reconciliation. (Expensive but definitive)
5. **Cryptographic proof-of-storage:** Node proves it holds data without transferring it. (Filecoin-style, heavyweight)

### What P2PDS Already Has

The existing codebase implements a layered verification system (see `/Users/dietrich/misc/p2pds/src/replication/verification.ts`):

- **Layer 0 (Commit Root):** Fetch the root CID via the RASL endpoint. A 200 with correct bytes proves the remote serves the current repo head. This is a set equality check (if roots match, MSTs match, records match).
- **Layer 1 (RASL Sampling):** Fetch random blocks via HTTP, compare with the local copy. Content-addressed retrieval is unforgeable: correct bytes = peer has the data.
- **Layer 2 (Bitswap):** Stub -- future IPFS network verification.
- **Layer 3 (MST Path Proof):** Stub -- future sync.getRecord + CAR verification.

This is already a solid foundation. The policy engine needs to:
1. Define *what* verification is required (which layers, how often, what sample sizes)
2. Record verification results
3. Evaluate compliance based on results over time

### Challenge-Response Protocols

For stronger guarantees without the full Filecoin treatment:

**Lightweight Proof-of-Data-Possession (PDP):**
Filecoin recently introduced PDP for "hot storage" verification. It uses standard SHA-256 hashing -- no specialized hardware required. For a 1 GiB dataset with 256 KiB chunks, each proof requires only 12 hashes (~384 bytes) and 12 hash computations: a Merkle path over 1 GiB / 256 KiB = 4,096 chunks is log2(4096) = 12 hashes deep.

**Adaptation for P2PDS:**
The atproto MST already provides Merkle proofs. To challenge a node:
1. Pick a random record path (e.g., `app.bsky.feed.post/3k2abc...`)
2. Request the MST path proof via `com.atproto.sync.getRecord`
3. Verify the proof against the known root CID

This is essentially Layer 3 of the existing verification architecture. The MST *is* the Merkle tree, and atproto already defines the proof protocol. No new cryptography needed.
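A sketch of that challenge as code. The XRPC call is the standard `com.atproto.sync.getRecord` endpoint; the `verifyCarProof` helper is an assumption standing in for CAR parsing and hash-chain checking that does not exist in the codebase yet:

```typescript
// Assumed helper: parses the returned CAR, checks the commit matches the
// known root, and verifies each MST node hashes to its parent's reference.
declare function verifyCarProof(
  car: Uint8Array,
  knownRootCid: string,
  collection: string,
  rkey: string,
): Promise<boolean>;

async function challengeRecord(
  nodeUrl: string,
  did: string,
  collection: string,
  rkey: string,
  knownRootCid: string,
): Promise<boolean> {
  const params = new URLSearchParams({ did, collection, rkey });
  const res = await fetch(`${nodeUrl}/xrpc/com.atproto.sync.getRecord?${params}`);
  if (!res.ok) return false; // node cannot produce the proof
  const car = new Uint8Array(await res.arrayBuffer());
  return verifyCarProof(car, knownRootCid, collection, rkey);
}
```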
### Verification Scheduling

The policy should define:
- `verificationInterval`: How often to run verification (e.g., every 30 minutes)
- `verificationSampleSize`: How many blocks/records to check per run
- `verificationLayers`: Which layers are required (e.g., [0, 1] for basic, [0, 1, 3] for strong)
- `complianceWindow`: How many consecutive failures before non-compliance (e.g., 3)
- `gracePeriod`: Time after joining before verification starts (e.g., 1 hour)

### Recommendation

**Extend the existing layered verification to be policy-driven.** The verification infrastructure exists. What's missing is:
1. Policy-defined verification parameters (currently hardcoded in `DEFAULT_VERIFICATION_CONFIG`)
2. Compliance history tracking (currently only the last verification result is tracked)
3. Compliance evaluation logic (pass/fail over a time window)

---

## 4. Group Agreement / Policy Distribution

### The Challenge

Policy is not useful unless all parties agree on it. In a P2P system, there's no central authority to impose policy. Nodes must:
1. Discover policies that apply to them
2. Agree to be bound by a policy
3. Detect when policy changes
4. Handle the transition period during policy updates

### Policy as Atproto Records

The most natural approach for P2PDS: publish policies as atproto records in a new lexicon namespace.

```
Lexicon: org.p2pds.policy
Record: {
  $type: "org.p2pds.policy",
  version: 1,
  name: "mutual-aid-group-1",
  description: "Three-node mutual aid cluster",
  rules: { ... },
  members: [ ... ],
  createdAt: "2025-01-01T00:00:00Z",
  updatedAt: "2025-01-01T00:00:00Z"
}
```

**Policy Lifecycle:**
1. **Creation:** A node publishes a policy record to its repo
2. **Discovery:** Other nodes discover it via record listing (similar to how `PeerDiscovery` works now with `org.p2pds.manifest`)
3. **Acknowledgment:** Each member publishes an `org.p2pds.policy.ack` record referencing the policy
4. **Activation:** Policy becomes active when all members have acknowledged
5. **Update:** Publisher creates a new version; members must re-acknowledge
6. **Revocation:** Publisher deletes the record or sets status to "revoked"

```
Lexicon: org.p2pds.policy.ack
Record: {
  $type: "org.p2pds.policy.ack",
  policy: "at://did:plc:alice/org.p2pds.policy/mutual-aid-1",
  policyVersion: 1,
  accepted: true,
  acceptedAt: "2025-01-01T00:01:00Z"
}
```

### Multi-Party Agreement

**Simple approach (sufficient for v1):** One node is the policy "author." It publishes the policy. Each other member acknowledges it. The policy is active when all listed members have acknowledged (a sketch of this activation check follows below).

**More decentralized approach (v2):** Policy proposals are published as records. Members vote by publishing ack records. A quorum threshold (e.g., 2/3 of members) activates the policy. This mirrors DAO governance without a blockchain.

**Signature-based approach (v3):** The policy document includes a field for member signatures (using atproto signing keys). A policy is valid only when it contains valid signatures from all/quorum members. This is a multi-sig scheme, similar to how DAOs use multi-sig wallets.
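The v1 activation rule (and the v2 quorum variant) reduces to a small check over ack records. A sketch using the record shapes above; the `quorum` parameter is 1.0 for v1's all-members rule:

```typescript
interface PolicyAck {
  policy: string;        // AT-URI of the policy record
  policyVersion: number;
  accepted: boolean;
}

function isPolicyActive(
  policyUri: string,
  version: number,
  memberDids: string[],
  acksByMember: Map<string, PolicyAck>, // latest ack per member DID
  quorum = 1.0,                         // v1: all members; v2: e.g. 2/3
): boolean {
  const accepted = memberDids.filter((did) => {
    const ack = acksByMember.get(did);
    return !!ack && ack.accepted && ack.policy === policyUri && ack.policyVersion === version;
  });
  return accepted.length >= Math.ceil(quorum * memberDids.length);
}
```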
### Policy Versioning and Migration

When policy changes, nodes need a migration path:

1. **Grace period:** New policy published with an `effectiveAt` timestamp in the future. Nodes have until then to comply.
2. **Compatibility window:** Both old and new policy are valid during the transition. Nodes evaluate compliance against the more lenient of the two.
3. **Breaking changes:** If the new policy adds members or increases minCopies, the grace period must be long enough for sync to complete.

### Discovery Mechanisms

How does a node learn about policies that involve it?

1. **Direct configuration:** Node operator configures policy URIs in `.env` or a config file (simplest, similar to the current `REPLICATE_DIDS`)
2. **Record scanning:** Node periodically checks other nodes' repos for `org.p2pds.policy` records that mention its DID
3. **PubSub notification:** Policy changes announced via IPFS gossipsub topics
4. **DID document service endpoint:** Add a `#p2pds_policy` service entry to DID documents pointing to the policy

### Recommendation

**Start with direct configuration + policy-as-atproto-records.** The operator configures a list of policy AT-URIs. The node fetches and evaluates them. Publishing and acknowledgment create an auditable record in each participant's repo.

---

## 5. Failure Handling and Incentives

### Failure Modes

1. **Temporary downtime:** Node is offline for minutes to hours (network issues, restarts)
2. **Extended outage:** Node is offline for days (hardware failure, migration)
3. **Sync lag:** Node is online but behind on syncing (slow network, overloaded)
4. **Verification failure:** Node claims to hold data but fails spot checks (data corruption, incomplete sync)
5. **Abandonment:** Node permanently disappears
6. **Byzantine behavior:** Node deliberately serves incorrect data

### Response Spectrum

#### For Mutual Aid

The goal is resilience, not punishment. Graduated responses:

```
Level 0 - OK:        All checks passing
Level 1 - Degraded:  1-2 consecutive verification failures
         Action: Log warning, increase verification frequency
Level 2 - At Risk:   3+ consecutive failures OR >1hr sync lag
         Action: Alert group members, other nodes begin
                 pre-emptive replication of at-risk data
Level 3 - Failed:    >24hr offline OR persistent verification failures
         Action: Remove from compliance count, redistribute
                 replication obligations to remaining nodes
Level 4 - Abandoned: >7 days with no heartbeat
         Action: Remove from group membership, recalculate
                 replication targets
```

**Rebalancing:** When a node fails, the remaining nodes must absorb its obligations. If the policy requires minCopies=2 and one of three nodes fails, the remaining two nodes each now hold all data (they already did -- they just become the only holders). The policy now shows "at risk" because there's no redundancy margin.

#### For SaaS / SLA

The goal is measurable compliance for reputation or payment. Stricter thresholds:

| Metric | Threshold | Measurement |
|--------|-----------|-------------|
| Uptime | 99.9% | Heartbeat checks |
| Sync latency | < 5 min | Time from commit to local sync |
| Block serve latency | < 500 ms | RASL response time |
| Verification pass | 100% | Layer 0 + Layer 1 must pass |

Compliance is calculated over a rolling window (e.g., 30 days). A node earns reputation/payment proportional to its compliance score.

### Distinguishing Temporary vs. Permanent Failure

This is fundamentally a timeout problem. The classic approach:

1. **Heartbeat:** Nodes periodically announce presence (e.g., by updating their `org.p2pds.peer` record's `createdAt` timestamp, or via IPFS pubsub).
2. **Exponential backoff categorization:**
   - < 5 min: Probably transient (network blip)
   - 5 min - 1 hour: Likely restart or deployment
   - 1 hour - 24 hours: Possible hardware issue
   - > 24 hours: Likely abandoned
3. **Self-reporting on recovery:** When a node comes back, it announces its return and catches up on sync. The policy engine should have a "recovery" state that gives the node time to re-sync before enforcing full compliance.

### Incentive Mechanisms

For mutual aid, the incentive is reciprocal: "I hold your data because you hold mine." This is BitTorrent's tit-for-tat insight applied to storage. Key difference: unlike BitTorrent (where choking is instantaneous), storage obligations persist over time. You can't "unchoke" someone's data.

**Trust score:** Each node maintains a score for each peer based on:
- Verification pass rate over time
- Sync lag average
- Uptime percentage
- Duration of membership

This score is locally computed (not broadcast) and can inform local decisions like "should I replicate data for this node if it's not part of my policy?"

### Recommendation

**Implement graduated failure handling with configurable thresholds in the policy document.** Start simple:
- `offlineGracePeriod`: How long before a node is marked non-compliant (e.g., "1h")
- `recoveryGracePeriod`: How long a returning node has to catch up (e.g., "30m")
- `maxConsecutiveFailures`: Verification failures before non-compliance (e.g., 3)

The policy document defines thresholds; the local evaluator applies them to observed state.
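The graduated levels above translate directly into a classification function. A minimal sketch with illustrative field names; in practice the thresholds would come from the policy document rather than constants:

```typescript
type FailureLevel = 0 | 1 | 2 | 3 | 4; // OK .. Abandoned

interface PeerObservation {
  consecutiveVerificationFailures: number;
  syncLagMs: number;
  offlineForMs: number; // 0 if currently online
}

const HOUR = 3_600_000;
const DAY = 24 * HOUR;

function classify(obs: PeerObservation): FailureLevel {
  if (obs.offlineForMs > 7 * DAY) return 4; // Abandoned: no heartbeat for a week
  if (obs.offlineForMs > DAY) return 3;     // Failed (persistent-failure detection omitted)
  if (obs.consecutiveVerificationFailures >= 3 || obs.syncLagMs > HOUR) return 2; // At risk
  if (obs.consecutiveVerificationFailures >= 1) return 1; // Degraded
  return 0;                                 // OK
}
```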
---

## 6. Resource Accounting

### What to Track

| Resource | Unit | Source | Verifiable? |
|----------|------|--------|-------------|
| Storage used | Bytes per DID | Local measurement | Self-reported, verifiable via block count |
| Block count | Count per DID | Local DB query | Verifiable via set reconciliation |
| Sync latency | Seconds | Time between commit and local sync | Self-reported |
| Verification pass rate | Percentage | Verification history | Recorded by verifier |
| Uptime | Percentage | Heartbeat history | Observed by peers |
| Bandwidth consumed | Bytes in/out | Local measurement | Self-reported only |

### The Self-Reporting Problem

In a decentralized system, most metrics are self-reported. A dishonest node can claim 100% uptime, zero latency, and perfect verification scores.

**Mitigations:**

1. **Cross-verification:** Node A's claim about serving DID X is verified by Node B fetching blocks from Node A via RASL. The existing verification layer already does this.

2. **Statistical sampling:** Don't verify everything -- verify enough to make cheating statistically improbable. If a node is missing even 5% of its blocks, 50 uniform random samples expose it with probability 1 - 0.95^50 ≈ 92%; at 10% missing, detection probability exceeds 99%. (This is the existing RASL sampling approach.)

3. **Peer-observed metrics:** Uptime and latency can be measured by peers. If three nodes all agree that Node D has been offline for 2 hours, that's more credible than Node D's self-report.

4. **Commit-reveal for sync timing:** To prevent a node from claiming fast sync while actually syncing lazily:
   - When a commit occurs, the authoritative PDS publishes a commitment (hash of commit CID + timestamp)
   - Replicating nodes must publish their own receipt within the sync window
   - The receipt includes the commit CID, proving they actually synced

### Resource Accounting in the Policy

```json
{
  "accounting": {
    "trackStorage": true,
    "trackSyncLatency": true,
    "trackUptime": true,
    "reportingInterval": "1h",
    "retentionPeriod": "30d"
  }
}
```

Resource accounting data could be stored in a local SQLite table (extending the existing `replication_state` schema) and optionally published as atproto records for auditability.

### Recommendation

**Track storage and verification locally in SQLite. Measure uptime and sync latency via peer observation. Defer bandwidth accounting to v2.** The existing `SyncStorage` class already tracks per-DID sync state, block counts, and verification timestamps. Extending it with a time-series table for compliance history is straightforward.
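One way that time-series table could look, sketched as a migration string (table and column names are assumptions, not the current schema):

```typescript
// Hypothetical compliance-history table alongside the existing
// replication_state schema. One row per verification run per subject.
const CREATE_COMPLIANCE_HISTORY = `
  CREATE TABLE IF NOT EXISTS compliance_history (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    policy_uri    TEXT    NOT NULL,
    subject_did   TEXT    NOT NULL,
    checked_at    INTEGER NOT NULL,  -- unix ms
    layer         INTEGER NOT NULL,  -- verification layer (0, 1, 3, ...)
    passed        INTEGER NOT NULL,  -- 0 or 1
    sync_lag_ms   INTEGER,
    block_count   INTEGER,
    storage_bytes INTEGER
  );
  CREATE INDEX IF NOT EXISTS idx_compliance_subject
    ON compliance_history (policy_uri, subject_did, checked_at);
`;
```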
---

## 7. Existing Systems / Prior Art

### Filecoin: The Heavyweight Approach

**Relevant patterns:**
- *Deal-making protocol:* Storage provider and client agree on terms (price, duration, replication factor) before storage begins. Analogous to our policy agreement.
- *Sector sealing:* Data is encoded uniquely per-provider to prevent Sybil attacks (claiming multiple copies from one physical copy). P2PDS doesn't need this -- content-addressing already handles deduplication.
- *WindowPoSt:* Periodic proof that data is still stored, checked every 24-hour window. Analogous to our periodic verification.
- *Proof of Data Possession (PDP):* Lightweight SHA-256 based verification for hot storage. Most applicable to P2PDS.

**What to take:** The deal-making lifecycle (propose, accept, active, expired, faulted). The periodic windowed verification model. The graduated fault handling (fault -> recovery period -> penalty/termination).

**What to skip:** The heavyweight cryptography (SNARKs, VDFs), the blockchain settlement layer, sector sealing. These solve problems (preventing Sybil storage, trustless payment) that P2PDS handles differently (social trust + content-addressed verification).

### Ceramic / ComposeDB: Decentralized Data with Sync

**Relevant patterns:**
- *Streams:* Append-only logs of events, similar to atproto commit sequences
- *StreamTypes:* Define validation rules for streams (who can write, what schema). Analogous to our policy types.
- *Event-based sync:* Nodes cooperate to distribute events to all interested consumers
- *Historical sync:* Nodes can sync data from before they joined the network

**What to take:** The concept of "interest" -- nodes declare which data they're interested in, and the network routes accordingly. This maps to our manifest records.

**What to skip:** Ceramic's consensus layer (RAFT-based) and blockchain anchoring. Our content-addressed verification provides similar guarantees without consensus.

### Kubernetes: Declarative Desired-State Reconciliation

**Relevant patterns:**
- *Spec vs. Status:* Every resource has a `spec` (desired state) and `status` (observed state). Controllers continuously reconcile status toward spec.
- *Reconciliation loop:* Observe -> Diff -> Act, repeated. Idempotent.
- *Conditions:* Status includes machine-readable conditions (Ready, Progressing, Degraded) with timestamps and reasons.
- *ValidatingAdmissionPolicy:* CEL expressions evaluate constraints declaratively, in-process, without external webhooks.

**What to take:** The spec/status pattern is directly applicable. Policy = spec. Observed compliance = status. The reconciliation loop is our sync + verify cycle. Conditions give us a model for graduated compliance states.

```
Policy Spec (desired state):
  - Node A holds DID X, DID Y
  - Verification every 30m
  - minCopies = 2

Compliance Status (observed state):
  - Node A: DID X synced (rev abc), DID Y synced (rev def)
  - Last verification: 10m ago, passed
  - Conditions:
    - type: Compliant, status: True, reason: AllChecksPassing
    - type: InSync, status: True, reason: SyncedWithin5m
```

**This is the most directly applicable pattern in the entire research.** Kubernetes solved declarative desired-state reconciliation at scale. The same pattern works for replication policy.

### Open Policy Agent (OPA): General-Purpose Policy Engine

**Relevant patterns:**
- *Decision as data:* Policy evaluation produces a JSON decision document, not a side effect
- *Bundle distribution:* Policies packaged as bundles, distributed to evaluators
- *Partial evaluation:* Compile policy partially when some inputs are known, evaluate the rest at runtime

**What to take:** The separation of "policy definition" from "policy evaluation" from "policy enforcement." In P2PDS terms: define policy (atproto records), evaluate compliance (local engine), enforce (sync/rebalance actions).

**What to skip:** The centralized bundle server model. In P2PDS, policies are discovered via atproto records, not pushed from a central authority.
### BitTorrent: Tit-for-Tat Incentives

**Relevant patterns:**
- *Choking algorithm:* Upload to peers that upload to you. Reciprocity drives cooperation.
- *Optimistic unchoking:* Periodically try new peers to discover better partners.
- *Rarest-first strategy:* Prioritize replicating the rarest pieces to maximize network resilience.

**What to take:** The principle that reciprocity is a sufficient incentive for cooperation in voluntary networks. In mutual aid, "I host your data because you host mine" is the storage equivalent of tit-for-tat.

**Adaptation:** In BitTorrent, choking is instantaneous (stop uploading). In P2PDS, the equivalent would be "stop syncing new commits for a non-reciprocating peer" -- but you can't delete already-stored data without violating the policy. This asymmetry means tit-for-tat works differently for storage vs. bandwidth.

### Hypercore / Dat: Sparse Replication

**Relevant patterns:**
- *Sparse mode:* Only download blocks that are explicitly requested. The Want/Have protocol.
- *Selective sync:* Use a `.datdownload` file to specify which files to sync.
- *Signed Merkle tree:* Each entry in the feed is authenticated by the feed author.

**What to take:** The concept of selective replication -- not every node needs every record. A policy could specify that Node A replicates `app.bsky.feed.post` records but not `app.bsky.feed.like` records. This is "collection-level partial replication."

**Relevance to policy:** Enables more flexible policies: "hold all posts but not all likes" is a valid replication strategy that reduces storage costs while preserving the most important data.

### Nostr: Relay Policy and Negentropy Sync

**Relevant patterns:**
- *NIP-11 (Relay Information Document):* Relays publish their policies (what events they accept, retention periods, limitations) as a machine-readable JSON document.
- *NIP-77 (Negentropy Syncing):* Efficient set reconciliation between relays using the Negentropy protocol.
- *Event filtering:* Relays apply policies to decide which events to store and serve.

**What to take:** NIP-11 is remarkably close to what P2PDS needs -- a machine-readable policy document published by each node. The Negentropy integration shows that range-based set reconciliation works well for syncing content-addressed data between nodes.

**Direct applicability:** Nostr's relay-relay sync via Negentropy is the closest existing analog to P2PDS node-node repo sync. The main difference: Nostr events are independent, while atproto repos have a Merkle tree structure that provides stronger integrity guarantees.

---

## 8. Architecture Sketch

### Core Abstractions

```
+-----------+     +-----------+     +------------+     +------------+
|  Policy   |---->| Obligation|---->|Verification|---->| Compliance |
| (desired  |     | (what each|     | (checking  |     | (did they  |
|  state)   |     |  node must|     |  that work |     |  do it?)   |
|           |     |  do)      |     |  was done) |     |            |
+-----------+     +-----------+     +------------+     +------------+
      |                 |                  |                  |
      v                 v                  v                  v
  atproto           local DB       RASL/IPFS/sync        local DB +
  records          (obligation     (verification          atproto
                    schedule)      infrastructure)         records
```

**Policy:** The declarative document defining desired state. Published as atproto records. Immutable per version.

**Obligation:** The derived per-node work items. "Node A must sync DID X, DID Y, DID Z." Computed from policy by the local evaluator. Stored in local DB.
**Verification:** The process of checking that obligations are met. Uses the existing layered verification system. Transport-agnostic.

**Compliance:** The result of evaluating verification history against policy requirements. "Node A is compliant / non-compliant / degraded." Stored locally and optionally published.

### The Evaluation Loop

Directly inspired by Kubernetes controllers:

```
        +-------------------+
        |                   |
+-------+ OBSERVE           |
|       | - Fetch policies  |
|       | - Check sync state|
|       | - Run verification|
|       +--------+----------+
|                |
|                v
|       +--------+----------+
|       |                   |
|       | EVALUATE          |
|       | - Derive obligations
|       | - Compare desired vs actual
|       | - Determine compliance
|       +--------+----------+
|                |
|                v
|       +--------+----------+
|       |                   |
+<------+ ACT               |
        | - Trigger sync    |
        | - Update status   |
        | - Notify/alert    |
        | - Rebalance       |
        +-------------------+
```

This loop runs on a timer (e.g., every 5 minutes) and also in response to events (new commit observed, peer heartbeat missed, verification completed). A minimal sketch of the timer path appears at the end of this section.

### Where State Lives

| State | Storage | Why |
|-------|---------|-----|
| Policy documents | Atproto records (`org.p2pds.policy`) | Auditable, discoverable, signed by author |
| Policy acknowledgments | Atproto records (`org.p2pds.policy.ack`) | Auditable proof of agreement |
| Derived obligations | Local SQLite | Ephemeral, recomputable from policy |
| Sync state | Local SQLite (existing `replication_state`) | Operational state, changes frequently |
| Verification results | Local SQLite (new table) | Time-series data, queried for compliance |
| Compliance status | Local SQLite + optionally atproto records | Local for evaluation, published for transparency |
| Peer heartbeats | Local SQLite (observed) + atproto records (published) | Both self-reported and peer-observed |

### Integration with Existing Codebase

The policy engine slots in between the existing `ReplicationManager` and the sync/verification infrastructure:

```
Current:
  Config (REPLICATE_DIDS) -> ReplicationManager -> syncAll() -> verify()

With Policy Engine:
  PolicyEngine -> evaluatePolicies() -> derive obligations
                        |
                        v
  ReplicationManager -> syncObligations() -> verify() -> reportCompliance()
                        ^
                        |
  PolicyDiscovery -> fetch org.p2pds.policy records from peers
```

The `REPLICATE_DIDS` config becomes a fallback / bootstrap mechanism. Once policies are discovered, they take precedence.
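The promised sketch of the loop's timer path. The `PolicyEngineLike` surface is an assumption about the eventual interface, not existing code:

```typescript
interface PolicyEngineLike {
  fetchPolicies(): Promise<void>;      // OBSERVE: discover/refresh policies
  deriveObligations(): Promise<void>;  // EVALUATE: compute desired state
  evaluateCompliance(): Promise<void>; // EVALUATE: desired vs. actual
  act(): Promise<void>;                // ACT: sync, update status, alert
}

async function runReconcileLoop(engine: PolicyEngineLike, intervalMs = 5 * 60_000) {
  // Event-driven triggers (new commit, missed heartbeat) would invoke the
  // same sequence; the timer is just the fallback cadence.
  for (;;) {
    await engine.fetchPolicies();
    await engine.deriveObligations();
    await engine.evaluateCompliance();
    await engine.act();
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```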
### Key Interfaces

```typescript
/** A policy document (deserialized from atproto record) */
interface Policy {
  version: number;
  type: "mutual-aid" | "sla" | "custom";
  members: PolicyMember[];
  rules: PolicyRules;
  effectiveAt: string;
  expiresAt?: string;
}

interface PolicyMember {
  did: string;     // The DID being served
  nodeId: string;  // The node responsible (could be same DID or different)
}

interface PolicyRules {
  replication: {
    strategy: "full" | "partial";
    minCopies: number;
    subjects: string[];      // DIDs to replicate
    collections?: string[];  // Optional: only these collections
  };
  verification: {
    interval: string;        // Duration like "30m"
    layers: number[];        // Which verification layers [0, 1, 3]
    sampleSize: number;
  };
  sync: {
    maxLag: string;          // Duration like "5m"
  };
  compliance: {
    offlineGracePeriod: string;
    recoveryGracePeriod: string;
    maxConsecutiveFailures: number;
  };
}

/** The obligation derived for a specific node from a policy */
interface Obligation {
  policyUri: string;             // AT-URI of the policy record
  nodeId: string;                // This node's DID
  subjectDid: string;            // DID to replicate
  strategy: "full" | "partial";
  collections?: string[];
  verificationInterval: number;  // ms
  maxSyncLag: number;            // ms
}

/** Compliance status for a node within a policy */
interface ComplianceStatus {
  policyUri: string;
  nodeId: string;
  status: "compliant" | "degraded" | "non-compliant" | "unknown";
  obligations: ObligationStatus[];
  lastEvaluated: string;
}

interface ObligationStatus {
  subjectDid: string;
  synced: boolean;
  lastSyncRev: string | null;
  syncLag: number | null;        // ms
  verificationPassed: boolean;
  consecutiveFailures: number;
  lastVerified: string | null;
}
```

### Minimum Viable Architecture

For v1, the simplest implementation that's still useful:

1. **Policy document:** JSON object matching the schema above, stored as an `org.p2pds.policy` record
2. **Policy evaluator:** A function that takes a Policy + current SyncState[] and returns ComplianceStatus
3. **Policy-aware ReplicationManager:** Instead of iterating `REPLICATE_DIDS`, iterates derived obligations
4. **Compliance reporting:** Log-level reporting + updated manifest records

No new dependencies. No new languages. Just structured JSON and TypeScript evaluation logic.
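Obligation derivation falls out of these interfaces almost mechanically. A sketch with a simplified duration parser; the exclusion of the node's own DID mirrors the mutual-aid scenario in section 10:

```typescript
// Parse durations like "30m" or "5m" into milliseconds (simplified).
function parseDuration(s: string): number {
  const m = /^(\d+)([smh])$/.exec(s);
  if (!m) throw new Error(`unsupported duration: ${s}`);
  const n = Number(m[1]);
  return m[2] === "s" ? n * 1_000 : m[2] === "m" ? n * 60_000 : n * 3_600_000;
}

function deriveObligations(policy: Policy, policyUri: string, selfDid: string): Obligation[] {
  return policy.rules.replication.subjects
    .filter((did) => did !== selfDid) // a node's own repo is not a replica
    .map((subjectDid) => ({
      policyUri,
      nodeId: selfDid,
      subjectDid,
      strategy: policy.rules.replication.strategy,
      collections: policy.rules.replication.collections,
      verificationInterval: parseDuration(policy.rules.verification.interval),
      maxSyncLag: parseDuration(policy.rules.sync.maxLag),
    }));
}
```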
---

## 9. Open Questions

1. **Policy authority:** Who gets to create policies? Anyone who lists your DID? Only you? Only DIDs you've explicitly authorized? This intersects with consent/authorization, which is a separate design problem.

2. **Policy conflicts:** What if two policies require different replication strategies for the same DID? Precedence rules? Union of requirements? Error?

3. **DID-to-node mapping:** The CLAUDE.md notes this as an open problem. A single DID can have multiple nodes. A single node can serve multiple DIDs. The policy needs to address which nodes are obligated, not just which DIDs.

4. **Storage quotas:** How do you limit the total storage a node commits to? If 100 policies each require 1GB, the node needs 100GB. Who enforces limits?

5. **Collection-level policies:** Can a policy specify "replicate app.bsky.feed.post but not app.bsky.feed.like"? This requires MST-aware partial replication, which the current sync (full repo CAR fetch) doesn't support.

6. **Cross-policy verification:** If a node participates in multiple policies, should verification be shared (verify once, count for all policies) or independent (each policy verifies separately)?

7. **Privacy:** Publishing policies as atproto records makes group membership public. Is this always desirable? Some groups might want private policies.

8. **Bootstrapping:** How does a new node catch up on existing policy state? It needs to discover policies, acknowledge them, sync all required data, and pass verification before being counted as compliant.

9. **Clock skew:** Compliance evaluation depends on timestamps (sync lag, verification intervals). How much clock skew is tolerable between nodes?

10. **Incentive alignment for verification:** The verifier and the verified are both group members. What prevents collusion (node A "verifies" node B without actually checking)?

---

## 10. Recommended Starting Point

### The Simplest Useful Thing: Mutual Aid with Full Replication

**Scenario:** Three nodes (Alice, Bob, Carol) each run a P2PDS instance. Each node holds complete replicas of all three members' repos. The policy is: "every member's data is held by at least 2 other members."

**Implementation plan:**

#### Step 1: Define the Policy Schema

Create a Lexicon definition for `org.p2pds.policy` with the minimal fields:

```json
{
  "$type": "org.p2pds.policy",
  "version": 1,
  "type": "mutual-aid",
  "name": "my-cluster",
  "members": [
    { "did": "did:plc:alice" },
    { "did": "did:plc:bob" },
    { "did": "did:plc:carol" }
  ],
  "rules": {
    "replication": {
      "strategy": "full",
      "minCopies": 2
    },
    "verification": {
      "interval": "30m",
      "sampleSize": 50
    },
    "sync": {
      "maxLag": "5m"
    }
  }
}
```

#### Step 2: Publish and Discover

- On startup, each node publishes its policy record to its own repo
- Each node fetches policy records from all configured peer DIDs (extending the existing `PeerDiscovery`)
- Derive obligations: "I must replicate all member DIDs except my own"

#### Step 3: Evaluate Compliance

Add a `PolicyEvaluator` class:

```typescript
class PolicyEvaluator {
  evaluate(policy: Policy, syncStates: SyncState[]): ComplianceStatus {
    // For each subject DID in the policy:
    //   1. Check if we've synced recently (syncLag < maxLag)
    //   2. Check if verification passed
    //   3. Determine compliance status
    // Return aggregate compliance
  }
}
```

This replaces the hardcoded `REPLICATE_DIDS` config with policy-driven obligation derivation.
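Filled in against a simplified view of sync state, the evaluator might look like the following. The `SyncStateView` shape is an assumption (the real `SyncState` lives in the existing codebase), and `parseDuration` is reused from the obligation sketch above:

```typescript
// Assumed minimal view of the codebase's per-DID sync state.
interface SyncStateView {
  did: string;
  lastSyncRev: string | null;
  lastSyncedAt: number | null; // unix ms
  lastVerificationPassed: boolean;
  consecutiveFailures: number;
  lastVerifiedAt: string | null;
}

function evaluatePolicy(
  policy: Policy,
  policyUri: string,
  selfDid: string,
  states: Map<string, SyncStateView>,
  now = Date.now(),
): ComplianceStatus {
  const maxLag = parseDuration(policy.rules.sync.maxLag);
  const maxFailures = policy.rules.compliance.maxConsecutiveFailures;
  const obligations: ObligationStatus[] = policy.rules.replication.subjects
    .filter((did) => did !== selfDid)
    .map((subjectDid) => {
      const s = states.get(subjectDid);
      const syncLag = s && s.lastSyncedAt != null ? now - s.lastSyncedAt : null;
      return {
        subjectDid,
        synced: syncLag != null && syncLag <= maxLag, // synced within maxLag
        lastSyncRev: s?.lastSyncRev ?? null,
        syncLag,
        verificationPassed: s?.lastVerificationPassed ?? false,
        consecutiveFailures: s?.consecutiveFailures ?? 0,
        lastVerified: s?.lastVerifiedAt ?? null,
      };
    });
  const allOk = obligations.every((o) => o.synced && o.verificationPassed);
  const anyFailed = obligations.some((o) => o.consecutiveFailures >= maxFailures);
  return {
    policyUri,
    nodeId: selfDid,
    status: anyFailed ? "non-compliant" : allOk ? "compliant" : "degraded",
    obligations,
    lastEvaluated: new Date(now).toISOString(),
  };
}
```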
#### Step 4: Report Compliance

- Update manifest records with compliance status
- Log compliance changes
- (Future: publish compliance attestations as atproto records)

#### What This Gets You

- **Declarative:** Policy is a JSON document, not imperative code
- **Deterministic:** Given the same policy + sync states, any node computes the same compliance result
- **Transport-agnostic:** Policy checks sync state and verification results, not how blocks were transferred
- **Account-centric:** Policy lists DIDs, not block CIDs or IPFS hashes
- **Publishable:** Policy lives in atproto repos, auditable by anyone
- **Minimal:** No new dependencies, no new languages, ~200-300 lines of TypeScript

#### What It Doesn't Handle (Yet)

- Multi-party policy negotiation (v2: add acknowledgment records)
- SLA metrics and reputation (v2: add time-series tracking)
- Partial/collection-level replication (v2: needs MST-aware sync)
- Failure rebalancing (v2: automated obligation redistribution)
- CEL/Datalog predicates (v3: if the JSON schema proves too limiting)

This starting point establishes the core abstraction (policy -> obligation -> verification -> compliance) and the integration pattern (policy-driven ReplicationManager). Everything else is additive.