# Policy Engine Design Research: Declarative, Deterministic, Transport-Agnostic Replication Policy for AT Protocol

## Executive Summary

This document maps the design space for a policy engine that governs AT Protocol account replication in P2PDS. The engine must be declarative (policies are data, not code), deterministic (same inputs yield same conclusions everywhere), transport-agnostic (outcomes matter, not mechanisms), and account-centric (DIDs are the primitive, not raw blocks).

The research covers policy language options, set reconciliation primitives, compliance verification, group agreement, failure handling, resource accounting, prior art, and architecture sketches. It concludes with a recommended starting point: a minimal JSON-based policy document published as an atproto record, evaluated locally by each node, using the existing RASL verification layer for compliance checking.

---

## 1. Policy Language / Representation

### The Design Space

The core question: how do you express rules like "node X must hold all blocks for DID Y" or "99.9% uptime required" as evaluable data?

#### Option A: JSON/CBOR Constraint Documents

**Description:** Policies are structured JSON or CBOR objects with a fixed schema, interpreted by a purpose-built evaluator. The simplest possible approach.

```json
{
  "$type": "org.p2pds.policy",
  "version": 1,
  "type": "mutual-aid",
  "members": [
    { "did": "did:plc:alice", "node": "did:plc:alice-node" },
    { "did": "did:plc:bob", "node": "did:plc:bob-node" },
    { "did": "did:plc:carol", "node": "did:plc:carol-node" }
  ],
  "rules": {
    "replication": {
      "strategy": "full",
      "minCopies": 2,
      "subjects": ["did:plc:alice", "did:plc:bob", "did:plc:carol"]
    },
    "verification": {
      "interval": "30m",
      "method": "rasl-sampling",
      "sampleSize": 50
    },
    "sync": {
      "maxLag": "5m"
    }
  }
}
```

**Pros:**
- Trivially serializable as atproto records (JSON maps to atproto lexicon records, CBOR is native to atproto repos)
- No new runtime dependency -- evaluated by application code
- Version-controlled, diff-friendly, human-readable
- Deterministic by construction: a fixed schema with enumerated strategies leaves no room for nondeterminism
- Smallest possible implementation surface

**Cons:**
- Limited expressiveness -- every new rule type requires schema changes and new evaluator code
- No composition: cannot combine policies from different sources without custom merge logic
- No formal semantics -- the evaluator *is* the semantics, which risks divergence across implementations

**Fit assessment:** Best starting point. Covers mutual aid and basic SLA scenarios. Expressiveness limits are a feature at this stage -- they prevent over-engineering.

#### Option B: Datalog-Style Logic Programs

**Description:** Policies expressed as Datalog rules over facts derived from system state. Datalog is a restricted subset of Prolog that always terminates and has well-defined semantics (least fixed-point).

```datalog
% Facts (derived from system state)
member(alice). member(bob). member(carol).
home_node(alice, alice_node). home_node(bob, bob_node). home_node(carol, carol_node).
node_holds(alice_node, alice). node_holds(alice_node, bob).
node_holds(bob_node, alice). node_holds(bob_node, carol).

% Policy rule (the aggregate uses ASP/clingo-style syntax)
compliant(DID) :-
    member(DID),
    home_node(DID, Home),
    #count { N : node_holds(N, DID), N != Home } >= 2.

violation(DID) :- member(DID), not compliant(DID).
```

**Pros:**
- Formally well-defined: evaluation always terminates, results are deterministic given the same facts
- Composable: rules from different sources can be combined (union of rule sets)
- Well-studied: decades of database and logic programming research
- Can express recursive relationships (e.g., transitive trust chains)

**Cons:**
- Requires embedding a Datalog engine (e.g., a JS implementation or WASM module)
- Serializing Datalog programs as atproto records is awkward (text blob or custom AST format)
- Higher barrier for policy authors -- most people cannot read Datalog
- Numerical constraints (latency < 500ms, uptime > 99.9%) require extensions to standard Datalog (aggregates, arithmetic)

**Fit assessment:** Excellent formal properties, but premature for v1. Good target for v2 if the JSON approach proves too limiting.

#### Option C: Rego (Open Policy Agent)

**Description:** Rego is OPA's purpose-built policy language. It is declarative, supports partial evaluation, and has been widely adopted for Kubernetes, CI/CD, and API gateway policies.

```rego
package p2pds.replication

default compliant = false

compliant {
    input.copies >= data.policy.minCopies
    input.sync_lag_seconds <= data.policy.maxLagSeconds
    input.verification.passed == true
}
```

**Pros:**
- Designed specifically for policy evaluation
- Deterministic: given the same input document and data, evaluation produces the same result
- Rich ecosystem: tooling, testing frameworks, playground
- Handles complex nested data well (JSON-native)
- Can be compiled to WASM for portable, sandboxed execution

**Cons:**
- Heavy dependency: OPA is a Go binary; WASM builds add ~5MB
- Rego is its own language with a learning curve
- Not atproto-native: policies would need to be stored as opaque text blobs in records
- OPA's bundle distribution model assumes a central control plane, which conflicts with P2P architecture

**Fit assessment:** OPA is well-proven but brings significant complexity. The WASM compilation path is interesting for ensuring all nodes run the exact same evaluator. Consider for v2/v3 if policy complexity demands it.

#### Option D: CUE Language

**Description:** CUE is a constraint-based configuration language where types, values, and constraints exist on a single continuum. Combining CUE values is associative, commutative, and idempotent -- ideal for merging policies from multiple sources.

```cue
policy: {
  version: 1
  type: "mutual-aid"
  replication: {
    minCopies: >=2 & <=5
    strategy: "full" | "partial"
    subjects: [...string]
  }
  sync: maxLag: <=300 // seconds
}
```

**Pros:**
- Merge is mathematically well-defined: `a & b` always produces the same result regardless of order
- Constraints are first-class: `>=2 & <=5` is a type, not a runtime check
- Deterministic by construction (lattice-based evaluation)
- Good fit for "multiple parties each add constraints" scenario

**Cons:**
- Go-only implementation; no mature JS/TS runtime
- Not widely adopted outside Google-adjacent projects
- Serialization as atproto records would require a custom mapping
- The lattice model can produce confusing error messages when constraints conflict

**Fit assessment:** Theoretically elegant for multi-party policy composition. Impractical today due to runtime availability. Worth watching.

#### Option E: CEL (Common Expression Language)

**Description:** CEL is Google's expression language used in Kubernetes ValidatingAdmissionPolicy. It evaluates deterministically, is designed to be embedded, and has a well-defined type system.

```cel
// Is this node compliant?
node.copies >= policy.minCopies &&
  node.syncLag <= policy.maxLag &&
  node.verification.lastPassed
```

**Pros:**
- Designed for embedding in other systems
- Deterministic, terminating (no loops, no side effects)
- Small, well-defined: no surprise behaviors
- JS/TS implementations exist (cel-js)
- Good fit for "evaluate a boolean expression against structured input"

**Cons:**
- Expression language only -- no rule composition or inference
- Less expressive than Datalog or Rego for complex policies
- Still requires a runtime dependency

**Fit assessment:** Good middle ground between raw JSON and full policy language. Could be used to express compliance predicates within a JSON policy document: the policy defines thresholds, and a CEL expression defines how to combine them.

#### Option F: AWS IAM / Zanzibar-Style Relationship Model

**Description:** Instead of general-purpose rules, model policy as relationships between entities (nodes, DIDs, groups) with permissions derived from relationship traversal.

```json
{
  "definition": {
    "group": {
      "relations": { "member": { "this": {} } },
      "permissions": {
        "replicate": { "computedUserset": { "relation": "member" } }
      }
    }
  }
}
```

**Pros:**
- Proven at scale (Google Zanzibar handles trillions of relationships)
- Natural fit for "who can do what to whose data"
- Efficient: relationship checks cost O(graph depth), typically only a few hops

**Cons:**
- Designed for authorization ("can X do Y to Z?"), not obligation ("X must do Y for Z")
- Doesn't naturally express SLA metrics, latency bounds, or storage quotas
- Would need significant adaptation for replication policy

**Fit assessment:** Wrong abstraction for the core problem. Replication policy is about obligations and compliance, not permissions. However, relationship-based modeling could complement the policy engine for "who is allowed to join this group" type decisions.

### Recommendation

**Start with Option A (JSON constraint documents)** stored as atproto records. The schema is the policy language. Keep the door open for Option E (CEL expressions) as inline predicates within JSON documents for v2. Option B (Datalog) is the long-term target if policy complexity grows significantly.

The key insight: **the policy language does not need to be Turing-complete, and it should not be.** Determinism and termination are easier to guarantee when expressiveness is bounded. A fixed JSON schema with enumerated rule types is a deliberately limited language, and that limitation is a strength.

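To make the CEL option concrete: a v2 policy could keep thresholds as plain JSON fields and carry a single CEL predicate, as an opaque string, that defines how observed status combines into a compliance decision. The field names below are illustrative, not a committed schema.

```json
{
  "$type": "org.p2pds.policy",
  "version": 2,
  "rules": {
    "replication": { "minCopies": 2 },
    "sync": { "maxLagSeconds": 300 }
  },
  "compliancePredicate": "status.copies >= rules.replication.minCopies && status.syncLagSeconds <= rules.sync.maxLagSeconds"
}
```
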
---

## 2. Set Reconciliation as Policy Primitive

### The Core Insight

The fundamental assertion in replication policy is set membership: "node X holds all blocks for DID Y." This is a set containment claim:

```
blocks(node_X, did_Y) ⊇ blocks(source_pds, did_Y)
```

If we can efficiently verify this set relationship, we can verify policy compliance. Set reconciliation protocols exist precisely for this purpose.

### Existing Set Reconciliation Approaches

#### Negentropy (Range-Based Set Reconciliation)

**How it works:** Both parties sort their items by (timestamp, ID). The initiator partitions its set into ranges and computes a fingerprint (incremental XOR hash) for each range. The responder compares fingerprints and recursively splits ranges where fingerprints differ until individual missing items are identified.

**Communication complexity:** O(d * log(n)) where d is the number of differences and n is the total set size. When sets are nearly identical (common case for synced repos), this is very efficient.

**Relevance to P2PDS:** Negentropy was designed for Nostr relay synchronization -- a remarkably similar problem (syncing sets of content-addressed events between nodes). The key difference: Nostr events have natural timestamps, while atproto blocks in an MST don't. However, atproto commits have revisions (TIDs), which provide a total ordering.

**Integration approach:** Rather than reconciling raw block CIDs, reconcile at the commit level. Each node maintains a set of (rev, commit_cid) pairs for each DID. Range-based reconciliation identifies which commits are missing, and then the node fetches the missing commit's blocks.

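As a minimal sketch of what the data structure could look like: commits tracked as (rev, commit CID) pairs, with a range fingerprint computed as an XOR of per-item hashes. This mirrors Negentropy's incremental-hash idea but is not its wire format; the shapes are illustrative.

```typescript
import { createHash } from "node:crypto";

/** One commit in a DID's history. TIDs sort lexicographically,
 *  so rev ranges are well-defined. */
interface CommitEntry {
  rev: string;       // TID revision
  commitCid: string; // CID of the signed commit
}

/** XOR-combine SHA-256 hashes of all entries in [fromRev, toRev).
 *  Equal fingerprints mean the ranges almost certainly match;
 *  a mismatch triggers a recursive range split. */
function rangeFingerprint(
  entries: CommitEntry[],
  fromRev: string,
  toRev: string
): Buffer {
  const acc = Buffer.alloc(32);
  for (const e of entries) {
    if (e.rev < fromRev || e.rev >= toRev) continue;
    const h = createHash("sha256").update(`${e.rev}:${e.commitCid}`).digest();
    for (let i = 0; i < 32; i++) acc[i] ^= h[i];
  }
  return acc;
}
```
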
#### Invertible Bloom Lookup Tables (IBLTs)

**How it works:** An IBLT is a probabilistic data structure that supports insertion, deletion, lookup, and -- crucially -- *listing* of elements. Two parties each construct an IBLT of their set, XOR them together, and the resulting IBLT can be "peeled" to recover the symmetric difference.

**Communication complexity:** O(d) where d is the number of differences, but requires knowing d in advance to size the table correctly. If d is underestimated, decoding fails.

**Relevance to P2PDS:** IBLTs are optimal when the difference size is known or bounded. For repos that sync frequently, the number of new commits since last sync is typically small and predictable.

#### CertainSync / Rateless IBLTs (2024-2025)

**How it works:** Extends IBLTs to be "rateless" -- the encoder produces an infinite stream of coded symbols, and the decoder can succeed as soon as enough symbols arrive to cover the actual difference. No need to estimate d in advance.

**Communication complexity:** Near-optimal -- approaches the information-theoretic lower bound.

**Relevance to P2PDS:** Eliminates the main drawback of IBLTs (requiring pre-knowledge of difference size). Still very new and implementations are limited.

#### Minisketch (BCH-Based)

**How it works:** Uses BCH error-correcting codes to encode set sketches. Two parties exchange sketches; the difference between sketches can be decoded to recover the symmetric difference.

**Communication complexity:** O(d) with a constant factor determined by the sketch capacity.

**Relevance to P2PDS:** Used by Bitcoin's Erlay protocol for transaction relay. Highly efficient for small differences. Limited to elements that can be represented as fixed-size integers (requires hashing CIDs to a fixed size).

### Set Algebra for Policy

Replication policies can be expressed as set algebra:

- **Full replication:** `blocks(node_A, did_X) = blocks(authoritative_source, did_X)` for all members
- **Minimum copies:** `|{ node : blocks(node, did_X) ⊇ blocks(source, did_X) }| >= minCopies` for each subject DID
- **Partial replication:** `blocks(node_A, did_X) ⊇ selected_collections(did_X)` (only certain record collections)

The existing codebase already tracks block CIDs per DID in the `replication_blocks` table (see `src/replication/sync-storage.ts`). This is the foundation for set-based verification.

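Because the MST root commits to the full record set, the minimum-copies rule reduces to counting nodes whose observed root CID matches the authoritative one. A sketch (the data shapes are illustrative):

```typescript
/** Observed roots: node DID -> (subject DID -> repo root CID). */
type RootObservations = Map<string, Map<string, string>>;

/** Count nodes holding a full copy of a subject's repo. A matching
 *  root CID is a set-equality proof, which implies containment. */
function copiesHolding(
  obs: RootObservations,
  subjectDid: string,
  authoritativeRoot: string
): number {
  let copies = 0;
  for (const roots of obs.values()) {
    if (roots.get(subjectDid) === authoritativeRoot) copies++;
  }
  return copies;
}

// Minimum-copies rule: compliant iff
//   copiesHolding(obs, did, root) >= minCopies
```
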
### Proving Set Containment Without Full Transfer

Several approaches for proving `A ⊇ B` without transferring either full set:

1. **Fingerprint comparison:** Compute a fingerprint (hash) of the set. If fingerprints match, sets are equal with high probability. Fast but only proves equality, not containment.

2. **Merkle proof:** The atproto MST (Merkle Search Tree) already provides this. The repo root CID is a commitment to the entire record set. If node X has root CID R for did Y, and R matches the authoritative root, X has all the data. This is exactly what the existing Layer 0 verification does.

3. **Bloom filter challenge:** The prover sends a Bloom filter of its set; the verifier checks random elements of the authoritative set for membership in the filter. False positives make this a probabilistic proof.

4. **Random sampling (current approach):** Randomly select k block CIDs and check if the node can serve them. This is what the existing RASL verification layer does (see `src/replication/verification.ts`).

### Recommendation

**Use commit-level Negentropy for sync, MST root comparison for equality proof, and RASL sampling for ongoing verification.** This layered approach matches the existing architecture:

- Layer 0 (commit root) = MST root CID comparison = set equality proof
- Layer 1 (RASL sampling) = probabilistic set containment proof
- Layer 2 (future: Negentropy) = efficient difference computation for sync

The policy engine doesn't need to know *how* reconciliation happens. It checks the *outcome*: "does node X's root CID for DID Y match the authoritative root CID?" This is the transport-agnostic principle in action.

---

## 3. Compliance Verification

### The Verification Spectrum

From weakest to strongest:

1. **Self-reporting:** Node claims "I have all blocks for DID X." (Trust-based, trivially gameable)
2. **Peer-reporting:** Other nodes report on a node's behavior. (Social trust, Sybil-vulnerable)
3. **Spot-check sampling:** Random block challenges via RASL. (Existing implementation)
4. **Full set verification:** Compare complete block sets via reconciliation. (Expensive but definitive)
5. **Cryptographic proof-of-storage:** Node proves it holds data without transferring it. (Filecoin-style, heavyweight)

### What P2PDS Already Has

The existing codebase implements a layered verification system (see `src/replication/verification.ts`):

- **Layer 0 (Commit Root):** Fetch the root CID via RASL endpoint. A 200 with correct bytes proves the remote serves the current repo head. This is a set equality check (if roots match, MSTs match, records match).
- **Layer 1 (RASL Sampling):** Fetch random blocks via HTTP, compare with local copy. Content-addressed retrieval is unforgeable: correct bytes = peer has the data.
- **Layer 2 (Bitswap):** Stub -- future IPFS network verification.
- **Layer 3 (MST Path Proof):** Stub -- future sync.getRecord + CAR verification.

This is already a solid foundation. The policy engine needs to:
1. Define *what* verification is required (which layers, how often, what sample sizes)
2. Record verification results
3. Evaluate compliance based on results over time

### Challenge-Response Protocols

For stronger guarantees without the full Filecoin treatment:

**Lightweight Proof-of-Data-Possession (PDP):**
Filecoin recently introduced PDP for "hot storage" verification. It uses standard SHA-256 hashing -- no specialized hardware required. For a 1 GiB dataset with 256 KiB chunks, each proof requires only 12 hashes (~384 bytes) and 12 hash computations.

**Adaptation for P2PDS:**
The atproto MST already provides Merkle proofs. To challenge a node:
1. Pick a random record path (e.g., `app.bsky.feed.post/3k2abc...`)
2. Request the MST path proof via `com.atproto.sync.getRecord`
3. Verify the proof against the known root CID

This is essentially Layer 3 of the existing verification architecture. The MST *is* the Merkle tree, and atproto already defines the proof protocol. No new cryptography needed.

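A sketch of such a challenge over plain HTTP, assuming the peer exposes the standard `com.atproto.sync.getRecord` XRPC endpoint. The `host` parameter and the `verifyCarProof` helper are hypothetical; real proof checking would walk the returned CAR from the root CID down the MST to the record, verifying every block hash.

```typescript
/** Challenge a peer to prove it holds a record under a known root. */
async function challengeRecord(
  host: string,        // e.g. "https://peer.example.com" (illustrative)
  did: string,
  collection: string,
  rkey: string,
  expectedRoot: string // root CID we already trust for this DID
): Promise<boolean> {
  const url =
    `${host}/xrpc/com.atproto.sync.getRecord` +
    `?did=${encodeURIComponent(did)}&collection=${collection}&rkey=${rkey}`;
  const res = await fetch(url);
  if (!res.ok) return false; // no proof produced => challenge failed
  const car = new Uint8Array(await res.arrayBuffer());
  return verifyCarProof(car, expectedRoot, collection, rkey);
}

// Hypothetical helper: walks the CAR's Merkle path and checks hashes.
declare function verifyCarProof(
  car: Uint8Array,
  root: string,
  collection: string,
  rkey: string
): boolean;
```
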
### Verification Scheduling

The policy should define the following parameters (an illustrative fragment follows the list):
- `verificationInterval`: How often to run verification (e.g., every 30 minutes)
- `verificationSampleSize`: How many blocks/records to check per run
- `verificationLayers`: Which layers are required (e.g., [0, 1] for basic, [0, 1, 3] for strong)
- `complianceWindow`: How many consecutive failures before non-compliance (e.g., 3)
- `gracePeriod`: Time after joining before verification starts (e.g., 1 hour)

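As a policy-document fragment, the same parameters might look like this (field names are illustrative):

```json
{
  "verification": {
    "interval": "30m",
    "sampleSize": 50,
    "layers": [0, 1],
    "complianceWindow": 3,
    "gracePeriod": "1h"
  }
}
```
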
### Recommendation

**Extend the existing layered verification to be policy-driven.** The verification infrastructure exists. What's missing is:
1. Policy-defined verification parameters (currently hardcoded in `DEFAULT_VERIFICATION_CONFIG`)
2. Compliance history tracking (currently only tracks last verification result)
3. Compliance evaluation logic (pass/fail over a time window)

---

## 4. Group Agreement / Policy Distribution

### The Challenge

Policy is not useful unless all parties agree on it. In a P2P system, there's no central authority to impose policy. Nodes must:
1. Discover policies that apply to them
2. Agree to be bound by a policy
3. Detect when policy changes
4. Handle the transition period during policy updates

### Policy as Atproto Records

The most natural approach for P2PDS: publish policies as atproto records in a new lexicon namespace.

```
Lexicon: org.p2pds.policy
Record: {
  $type: "org.p2pds.policy",
  version: 1,
  name: "mutual-aid-group-1",
  description: "Three-node mutual aid cluster",
  rules: { ... },
  members: [ ... ],
  createdAt: "2025-01-01T00:00:00Z",
  updatedAt: "2025-01-01T00:00:00Z"
}
```

**Policy Lifecycle:**
1. **Creation:** A node publishes a policy record to its repo
2. **Discovery:** Other nodes discover it via record listing (similar to how `PeerDiscovery` works now with `org.p2pds.manifest`)
3. **Acknowledgment:** Each member publishes an `org.p2pds.policy.ack` record referencing the policy
4. **Activation:** Policy becomes active when all members have acknowledged
5. **Update:** Publisher creates new version; members must re-acknowledge
6. **Revocation:** Publisher deletes the record or sets status to "revoked"

```
Lexicon: org.p2pds.policy.ack
Record: {
  $type: "org.p2pds.policy.ack",
  policy: "at://did:plc:alice/org.p2pds.policy/mutual-aid-1",
  policyVersion: 1,
  accepted: true,
  acceptedAt: "2025-01-01T00:01:00Z"
}
```

### Multi-Party Agreement

**Simple approach (sufficient for v1):** One node is the policy "author." It publishes the policy. Each other member acknowledges it. The policy is active when all listed members have acknowledged.

**More decentralized approach (v2):** Policy proposals are published as records. Members vote by publishing ack records. A quorum threshold (e.g., 2/3 of members) activates the policy. This mirrors DAO governance without a blockchain.

**Signature-based approach (v3):** The policy document includes a field for member signatures (using atproto signing keys). A policy is valid only when it contains valid signatures from all/quorum members. This is a multi-sig scheme, similar to how DAOs use multi-sig wallets.

### Policy Versioning and Migration

When policy changes, nodes need a migration path:

1. **Grace period:** New policy published with `effectiveAt` timestamp in the future. Nodes have until then to comply.
2. **Compatibility window:** Both old and new policy are valid during transition. Nodes evaluate compliance against the more lenient of the two.
3. **Breaking changes:** If new policy adds members or increases minCopies, the grace period must be long enough for sync to complete.

### Discovery Mechanisms

How does a node learn about policies that involve it?

1. **Direct configuration:** Node operator configures policy URIs in `.env` or config file (simplest, similar to current `REPLICATE_DIDS`)
2. **Record scanning:** Node periodically checks other nodes' repos for `org.p2pds.policy` records that mention its DID
3. **PubSub notification:** Policy changes announced via IPFS gossipsub topics
4. **DID document service endpoint:** Add a `#p2pds_policy` service entry to DID documents pointing to the policy

### Recommendation

**Start with direct configuration + policy-as-atproto-records.** The operator configures a list of policy AT-URIs. The node fetches and evaluates them. Publishing and acknowledgment create an auditable record in each participant's repo.

---

## 5. Failure Handling and Incentives

### Failure Modes

1. **Temporary downtime:** Node is offline for minutes to hours (network issues, restarts)
2. **Extended outage:** Node is offline for days (hardware failure, migration)
3. **Sync lag:** Node is online but behind on syncing (slow network, overloaded)
4. **Verification failure:** Node claims to hold data but fails spot checks (data corruption, incomplete sync)
5. **Abandonment:** Node permanently disappears
6. **Byzantine behavior:** Node deliberately serves incorrect data

### Response Spectrum

#### For Mutual Aid

The goal is resilience, not punishment. Graduated responses:

```
Level 0 - OK:        All checks passing
Level 1 - Degraded:  1-2 consecutive verification failures
    Action: Log warning, increase verification frequency
Level 2 - At Risk:   3+ consecutive failures OR >1hr sync lag
    Action: Alert group members, other nodes begin
            pre-emptive replication of at-risk data
Level 3 - Failed:    >24hr offline OR persistent verification failures
    Action: Remove from compliance count, redistribute
            replication obligations to remaining nodes
Level 4 - Abandoned: >7 days with no heartbeat
    Action: Remove from group membership, recalculate
            replication targets
```

**Rebalancing:** When a node fails, the remaining nodes must absorb its obligations. If the policy requires minCopies=2 and one of three nodes fails, the remaining two nodes each now hold all data (they already did -- they just become the only holders). The policy now shows "at risk" because there's no redundancy margin.

#### For SaaS / SLA

The goal is measurable compliance for reputation or payment. Stricter thresholds:

| Metric | Threshold | Measurement |
|--------|-----------|-------------|
| Uptime | 99.9% | Heartbeat checks |
| Sync latency | < 5 min | Time from commit to local sync |
| Block serve latency | < 500 ms | RASL response time |
| Verification pass | 100% | Layer 0 + Layer 1 must pass |

Compliance is calculated over a rolling window (e.g., 30 days). A node earns reputation/payment proportional to its compliance score.

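A minimal sketch of rolling-window scoring over a local history of check results (the `CheckResult` shape is illustrative; real SLA scoring would weight uptime, latency, and verification separately per the table above):

```typescript
interface CheckResult {
  at: number;      // epoch ms when the check ran
  passed: boolean;
}

/** Fraction of checks passed within the rolling window. */
function complianceScore(
  history: CheckResult[],
  windowMs: number,
  now = Date.now()
): number {
  const recent = history.filter((r) => now - r.at <= windowMs);
  if (recent.length === 0) return 0; // no evidence => treat as non-compliant
  return recent.filter((r) => r.passed).length / recent.length;
}

// e.g. 30-day window: complianceScore(history, 30 * 24 * 3600 * 1000) >= 0.999
```
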
### Distinguishing Temporary vs. Permanent Failure

This is fundamentally a timeout problem. The classic approach:

1. **Heartbeat:** Nodes periodically announce presence (e.g., by updating their `org.p2pds.peer` record's `createdAt` timestamp, or via IPFS pubsub).
2. **Exponential backoff categorization:**
   - < 5 min: Probably transient (network blip)
   - 5 min - 1 hour: Likely restart or deployment
   - 1 hour - 24 hours: Possible hardware issue
   - > 24 hours: Likely abandoned
3. **Self-reporting on recovery:** When a node comes back, it announces its return and catches up on sync. The policy engine should have a "recovery" state that gives the node time to re-sync before enforcing full compliance.

### Incentive Mechanisms

For mutual aid, the incentive is reciprocal: "I hold your data because you hold mine." This is BitTorrent's tit-for-tat insight applied to storage. Key difference: unlike BitTorrent, where choking takes effect instantly, storage obligations persist over time -- you cannot "choke" a peer's data without deleting it.

**Trust score:** Each node maintains a score for each peer based on:
- Verification pass rate over time
- Sync lag average
- Uptime percentage
- Duration of membership

This score is locally computed (not broadcast) and can inform local decisions like "should I replicate data for this node if it's not part of my policy?"

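One way the inputs could combine locally; the weights and the `PeerStats` shape are illustrative, not a specified formula:

```typescript
interface PeerStats {
  verificationPassRate: number; // 0..1
  avgSyncLagMs: number;
  uptime: number;               // 0..1
  membershipDays: number;
}

/** Weighted local trust score in [0, 1]. Never broadcast. */
function trustScore(s: PeerStats, maxLagMs: number): number {
  const lagScore = Math.max(0, 1 - s.avgSyncLagMs / maxLagMs);
  const tenure = Math.min(1, s.membershipDays / 90); // saturates around 3 months
  return (
    0.4 * s.verificationPassRate +
    0.2 * lagScore +
    0.3 * s.uptime +
    0.1 * tenure
  );
}
```
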
### Recommendation

**Implement graduated failure handling with configurable thresholds in the policy document.** Start simple:
- `offlineGracePeriod`: How long before a node is marked non-compliant (e.g., "1h")
- `recoveryGracePeriod`: How long a returning node has to catch up (e.g., "30m")
- `maxConsecutiveFailures`: Verification failures before non-compliance (e.g., 3)

The policy document defines thresholds; the local evaluator applies them to observed state.

---

## 6. Resource Accounting

### What to Track

| Resource | Unit | Source | Verifiable? |
|----------|------|--------|-------------|
| Storage used | Bytes per DID | Local measurement | Self-reported, verifiable via block count |
| Block count | Count per DID | Local DB query | Verifiable via set reconciliation |
| Sync latency | Seconds | Time between commit and local sync | Self-reported |
| Verification pass rate | Percentage | Verification history | Recorded by verifier |
| Uptime | Percentage | Heartbeat history | Observed by peers |
| Bandwidth consumed | Bytes in/out | Local measurement | Self-reported only |

### The Self-Reporting Problem

In a decentralized system, most metrics are self-reported. A dishonest node can claim 100% uptime, zero latency, and perfect verification scores.

**Mitigations:**

1. **Cross-verification:** Node A's claim about serving DID X is verified by Node B fetching blocks from Node A via RASL. The existing verification layer already does this.

2. **Statistical sampling:** Don't verify everything -- verify enough to make cheating statistically improbable. If you sample 50 random blocks and all are correct, a node holding only 90% of the blocks would have passed with probability 0.9^50 ≈ 0.5%; larger samples tighten the bound further. (This is the existing RASL sampling approach.)

3. **Peer-observed metrics:** Uptime and latency can be measured by peers. If three nodes all agree that Node D has been offline for 2 hours, that's more credible than Node D's self-report.

4. **Commit-reveal for sync timing:** To prevent a node from claiming fast sync while actually syncing lazily (a receipt record is sketched after this list):
   - When a commit occurs, the authoritative PDS publishes a commitment (hash of commit CID + timestamp)
   - Replicating nodes must publish their own receipt within the sync window
   - The receipt includes the commit CID, proving they actually synced

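A receipt could itself be an atproto record; the `org.p2pds.syncReceipt` lexicon name and fields below are hypothetical, and the CID/rev values are placeholders:

```json
{
  "$type": "org.p2pds.syncReceipt",
  "subject": "did:plc:alice",
  "commitCid": "bafyrei...",
  "commitRev": "3lc...",
  "receivedAt": "2025-01-01T00:00:42Z"
}
```
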
### Resource Accounting in the Policy

```json
{
  "accounting": {
    "trackStorage": true,
    "trackSyncLatency": true,
    "trackUptime": true,
    "reportingInterval": "1h",
    "retentionPeriod": "30d"
  }
}
```

Resource accounting data could be stored in a local SQLite table (extending the existing `replication_state` schema) and optionally published as atproto records for auditability.

### Recommendation

**Track storage and verification locally in SQLite. Measure uptime and sync latency via peer observation. Defer bandwidth accounting to v2.** The existing `SyncStorage` class already tracks per-DID sync state, block counts, and verification timestamps. Extending it with a time-series table for compliance history is straightforward.

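Such a table might look like the following sketch (table and column names are illustrative, not the existing schema):

```sql
-- One row per verification run per subject DID.
CREATE TABLE IF NOT EXISTS compliance_history (
  id          INTEGER PRIMARY KEY AUTOINCREMENT,
  did         TEXT    NOT NULL,  -- subject DID that was verified
  layer       INTEGER NOT NULL,  -- verification layer (0, 1, 3, ...)
  passed      INTEGER NOT NULL,  -- 1 = pass, 0 = fail
  sample_size INTEGER,           -- blocks checked (Layer 1 only)
  checked_at  INTEGER NOT NULL   -- epoch ms
);

CREATE INDEX IF NOT EXISTS idx_compliance_did_time
  ON compliance_history (did, checked_at);
```
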
---

## 7. Existing Systems / Prior Art

### Filecoin: The Heavyweight Approach

**Relevant patterns:**
- *Deal-making protocol:* Storage provider and client agree on terms (price, duration, replication factor) before storage begins. Analogous to our policy agreement.
- *Sector sealing:* Data is encoded uniquely per-provider to prevent Sybil attacks (claiming multiple copies from one physical copy). P2PDS doesn't need this -- content-addressing already handles deduplication.
- *WindowPoSt:* Periodic proof that data is still stored, checked every 24-hour window. Analogous to our periodic verification.
- *Proof of Data Possession (PDP):* Lightweight SHA-256 based verification for hot storage. Most applicable to P2PDS.

**What to take:** The deal-making lifecycle (propose, accept, active, expired, faulted). The periodic windowed verification model. The graduated fault handling (fault -> recovery period -> penalty/termination).

**What to skip:** The heavyweight cryptography (SNARKs, VDFs), the blockchain settlement layer, sector sealing. These solve problems (preventing Sybil storage, trustless payment) that P2PDS handles differently (social trust + content-addressed verification).

### Ceramic / ComposeDB: Decentralized Data with Sync

**Relevant patterns:**
- *Streams:* Append-only logs of events, similar to atproto commit sequences
- *StreamTypes:* Define validation rules for streams (who can write, what schema). Analogous to our policy types.
- *Event-based sync:* Nodes cooperate to distribute events to all interested consumers
- *Historical sync:* Nodes can sync data from before they joined the network

**What to take:** The concept of "interest" -- nodes declare which data they're interested in, and the network routes accordingly. This maps to our manifest records.

**What to skip:** Ceramic's consensus layer (RAFT-based) and blockchain anchoring. Our content-addressed verification provides similar guarantees without consensus.

### Kubernetes: Declarative Desired-State Reconciliation

**Relevant patterns:**
- *Spec vs. Status:* Every resource has a `spec` (desired state) and `status` (observed state). Controllers continuously reconcile status toward spec.
- *Reconciliation loop:* Observe -> Diff -> Act, repeated. Idempotent.
- *Conditions:* Status includes machine-readable conditions (Ready, Progressing, Degraded) with timestamps and reasons.
- *ValidatingAdmissionPolicy:* CEL expressions evaluate constraints declaratively, in-process, without external webhooks.

**What to take:** The spec/status pattern is directly applicable. Policy = spec. Observed compliance = status. The reconciliation loop is our sync + verify cycle. Conditions give us a model for graduated compliance states.

```
Policy Spec (desired state):
  - Node A holds DID X, DID Y
  - Verification every 30m
  - minCopies = 2

Compliance Status (observed state):
  - Node A: DID X synced (rev abc), DID Y synced (rev def)
  - Last verification: 10m ago, passed
  - Conditions:
    - type: Compliant, status: True, reason: AllChecksPassing
    - type: InSync, status: True, reason: SyncedWithin5m
```

**This is the most directly applicable pattern in the entire research.** Kubernetes solved declarative desired-state reconciliation at scale. The same pattern works for replication policy.

### Open Policy Agent (OPA): General-Purpose Policy Engine

**Relevant patterns:**
- *Decision as data:* Policy evaluation produces a JSON decision document, not a side effect
- *Bundle distribution:* Policies packaged as bundles, distributed to evaluators
- *Partial evaluation:* Compile policy partially when some inputs are known, evaluate the rest at runtime

**What to take:** The separation of "policy definition" from "policy evaluation" from "policy enforcement." In P2PDS terms: define policy (atproto records), evaluate compliance (local engine), enforce (sync/rebalance actions).

**What to skip:** The centralized bundle server model. In P2PDS, policies are discovered via atproto records, not pushed from a central authority.

### BitTorrent: Tit-for-Tat Incentives

**Relevant patterns:**
- *Choking algorithm:* Upload to peers that upload to you. Reciprocity drives cooperation.
- *Optimistic unchoking:* Periodically try new peers to discover better partners.
- *Rarest-first strategy:* Prioritize replicating the rarest pieces to maximize network resilience.

**What to take:** The principle that reciprocity is a sufficient incentive for cooperation in voluntary networks. In mutual aid, "I host your data because you host mine" is the storage equivalent of tit-for-tat.

**Adaptation:** In BitTorrent, choking is instantaneous (stop uploading). In P2PDS, the equivalent would be "stop syncing new commits for a non-reciprocating peer" -- but you can't delete already-stored data without violating the policy. This asymmetry means tit-for-tat works differently for storage vs. bandwidth.

### Hypercore / Dat: Sparse Replication

**Relevant patterns:**
- *Sparse mode:* Only download blocks that are explicitly requested. The Want/Have protocol.
- *Selective sync:* Use a `.datdownload` file to specify which files to sync.
- *Signed Merkle tree:* Each entry in the feed is authenticated by the feed author.

**What to take:** The concept of selective replication -- not every node needs every record. A policy could specify that Node A replicates `app.bsky.feed.post` records but not `app.bsky.feed.like` records. This is "collection-level partial replication."

**Relevance to policy:** Enables more flexible policies: "hold all posts but not all likes" is a valid replication strategy that reduces storage costs while preserving the most important data.

### Nostr: Relay Policy and Negentropy Sync

**Relevant patterns:**
- *NIP-11 (Relay Information Document):* Relays publish their policies (what events they accept, retention periods, limitations) as a machine-readable JSON document.
- *NIP-77 (Negentropy Syncing):* Efficient set reconciliation between relays using the Negentropy protocol.
- *Event filtering:* Relays apply policies to decide which events to store and serve.

**What to take:** NIP-11 is remarkably close to what P2PDS needs -- a machine-readable policy document published by each node. The Negentropy integration shows that range-based set reconciliation works well for syncing content-addressed data between nodes.

**Direct applicability:** Nostr's relay-relay sync via Negentropy is the closest existing analog to P2PDS node-node repo sync. The main difference: Nostr events are independent, while atproto repos have a Merkle tree structure that provides stronger integrity guarantees.

---

## 8. Architecture Sketch

### Core Abstractions

```
+-----------+     +-----------+     +------------+     +------------+
|  Policy   |---->| Obligation|---->|Verification|---->| Compliance |
| (desired  |     | (what each|     | (checking  |     | (did they  |
|  state)   |     |  node must|     |  that work |     |  do it?)   |
|           |     |  do)      |     |  was done) |     |            |
+-----------+     +-----------+     +------------+     +------------+
      |                 |                 |                  |
      v                 v                 v                  v
   atproto           local DB       RASL/IPFS/sync       local DB +
   records         (obligation      (verification         atproto
                    schedule)       infrastructure)        records
```

**Policy:** The declarative document defining desired state. Published as atproto records. Immutable per version.

**Obligation:** The derived per-node work items. "Node A must sync DID X, DID Y, DID Z." Computed from policy by the local evaluator. Stored in local DB.

**Verification:** The process of checking that obligations are met. Uses the existing layered verification system. Transport-agnostic.

**Compliance:** The result of evaluating verification history against policy requirements. "Node A is compliant / non-compliant / degraded." Stored locally and optionally published.

### The Evaluation Loop

Directly inspired by Kubernetes controllers:

```
          +-------------------+
          |                   |
+---------+      OBSERVE      |
|         | - Fetch policies  |
|         | - Check sync state|
|         | - Run verification|
|         +--------+----------+
|                  |
|                  v
|         +--------+----------+
|         |                   |
|         |     EVALUATE      |
|         | - Derive obligations
|         | - Compare desired vs actual
|         | - Determine compliance
|         +--------+----------+
|                  |
|                  v
|         +--------+----------+
|         |                   |
+<--------+       ACT         |
          | - Trigger sync    |
          | - Update status   |
          | - Notify/alert    |
          | - Rebalance       |
          +-------------------+
```

This loop runs on a timer (e.g., every 5 minutes) and also in response to events (new commit observed, peer heartbeat missed, verification completed).

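A sketch of one pass of the loop, with collaborators expressed as an interface rather than the real classes. `Policy` and `ComplianceStatus` are as defined under Key Interfaces below; `SyncState` stands in for the codebase's existing sync-state type (assumed here).

```typescript
interface LoopDeps {
  fetchPolicies(): Promise<Policy[]>;
  readSyncStates(): Promise<SyncState[]>;
  evaluate(policy: Policy, states: SyncState[]): ComplianceStatus;
  triggerSync(did: string): Promise<void>;
  publishStatus(statuses: ComplianceStatus[]): Promise<void>;
}

/** One observe -> evaluate -> act pass; run on a timer and on events. */
async function reconcileOnce(deps: LoopDeps): Promise<void> {
  // OBSERVE
  const policies = await deps.fetchPolicies();
  const states = await deps.readSyncStates();
  // EVALUATE
  const statuses = policies.map((p) => deps.evaluate(p, states));
  // ACT
  for (const status of statuses) {
    for (const ob of status.obligations) {
      if (!ob.synced) await deps.triggerSync(ob.subjectDid);
    }
  }
  await deps.publishStatus(statuses);
}
```
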
### Where State Lives

| State | Storage | Why |
|-------|---------|-----|
| Policy documents | Atproto records (`org.p2pds.policy`) | Auditable, discoverable, signed by author |
| Policy acknowledgments | Atproto records (`org.p2pds.policy.ack`) | Auditable proof of agreement |
| Derived obligations | Local SQLite | Ephemeral, recomputable from policy |
| Sync state | Local SQLite (existing `replication_state`) | Operational state, changes frequently |
| Verification results | Local SQLite (new table) | Time-series data, queried for compliance |
| Compliance status | Local SQLite + optionally atproto records | Local for evaluation, published for transparency |
| Peer heartbeats | Local SQLite (observed) + atproto records (published) | Both self-reported and peer-observed |

### Integration with Existing Codebase

The policy engine slots in between the existing `ReplicationManager` and the sync/verification infrastructure:

```
Current:
  Config (REPLICATE_DIDS) -> ReplicationManager -> syncAll() -> verify()

With Policy Engine:
  PolicyEngine -> evaluatePolicies() -> derive obligations
                                              |
                                              v
  ReplicationManager -> syncObligations() -> verify() -> reportCompliance()
                                              ^
                                              |
  PolicyDiscovery -> fetch org.p2pds.policy records from peers
```

The `REPLICATE_DIDS` config becomes a fallback / bootstrap mechanism. Once policies are discovered, they take precedence.

### Key Interfaces

```typescript
/** A policy document (deserialized from atproto record) */
interface Policy {
  version: number;
  type: "mutual-aid" | "sla" | "custom";
  members: PolicyMember[];
  rules: PolicyRules;
  effectiveAt: string;
  expiresAt?: string;
}

interface PolicyMember {
  did: string;    // The DID being served
  nodeId: string; // The node responsible (could be same DID or different)
}

interface PolicyRules {
  replication: {
    strategy: "full" | "partial";
    minCopies: number;
    subjects: string[];     // DIDs to replicate
    collections?: string[]; // Optional: only these collections
  };
  verification: {
    interval: string; // Duration like "30m"
    layers: number[]; // Which verification layers [0, 1, 3]
    sampleSize: number;
  };
  sync: {
    maxLag: string; // Duration like "5m"
  };
  compliance: {
    offlineGracePeriod: string;
    recoveryGracePeriod: string;
    maxConsecutiveFailures: number;
  };
}

/** The obligation derived for a specific node from a policy */
interface Obligation {
  policyUri: string;            // AT-URI of the policy record
  nodeId: string;               // This node's DID
  subjectDid: string;           // DID to replicate
  strategy: "full" | "partial";
  collections?: string[];
  verificationInterval: number; // ms
  maxSyncLag: number;           // ms
}

/** Compliance status for a node within a policy */
interface ComplianceStatus {
  policyUri: string;
  nodeId: string;
  status: "compliant" | "degraded" | "non-compliant" | "unknown";
  obligations: ObligationStatus[];
  lastEvaluated: string;
}

interface ObligationStatus {
  subjectDid: string;
  synced: boolean;
  lastSyncRev: string | null;
  syncLag: number | null; // ms
  verificationPassed: boolean;
  consecutiveFailures: number;
  lastVerified: string | null;
}
```

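To illustrate the Policy -> Obligation step against these interfaces, a sketch; the `parseDuration` helper is hypothetical, and the self-exclusion rule is just one reading of the DID-to-node mapping question (open question 3 below):

```typescript
function deriveObligations(
  policy: Policy,
  policyUri: string,
  selfNodeId: string
): Obligation[] {
  // A policy imposes no obligations on nodes that are not members.
  if (!policy.members.some((m) => m.nodeId === selfNodeId)) return [];
  return policy.rules.replication.subjects
    .filter((subjectDid) => subjectDid !== selfNodeId) // self-hosting is not replication
    .map((subjectDid) => ({
      policyUri,
      nodeId: selfNodeId,
      subjectDid,
      strategy: policy.rules.replication.strategy,
      collections: policy.rules.replication.collections,
      verificationInterval: parseDuration(policy.rules.verification.interval),
      maxSyncLag: parseDuration(policy.rules.sync.maxLag),
    }));
}

// Hypothetical helper: "30m" -> 1_800_000 ms.
declare function parseDuration(duration: string): number;
```
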
### Minimum Viable Architecture

For v1, the simplest implementation that's still useful:

1. **Policy document:** JSON object matching the schema above, stored as an `org.p2pds.policy` record
2. **Policy evaluator:** A function that takes a Policy + current SyncState[] and returns ComplianceStatus
3. **Policy-aware ReplicationManager:** Instead of iterating `REPLICATE_DIDS`, iterates derived obligations
4. **Compliance reporting:** Log-level reporting + updated manifest records

No new dependencies. No new languages. Just structured JSON and TypeScript evaluation logic.

---

## 9. Open Questions

1. **Policy authority:** Who gets to create policies? Anyone who lists your DID? Only you? Only DIDs you've explicitly authorized? This intersects with consent/authorization, which is a separate design problem.

2. **Policy conflicts:** What if two policies require different replication strategies for the same DID? Precedence rules? Union of requirements? Error?

3. **DID-to-node mapping:** The CLAUDE.md notes this as an open problem. A single DID can have multiple nodes. A single node can serve multiple DIDs. The policy needs to address which nodes are obligated, not just which DIDs.

4. **Storage quotas:** How do you limit the total storage a node commits to? If 100 policies each require 1GB, the node needs 100GB. Who enforces limits?

5. **Collection-level policies:** Can a policy specify "replicate app.bsky.feed.post but not app.bsky.feed.like"? This requires MST-aware partial replication, which the current sync (full repo CAR fetch) doesn't support.

6. **Cross-policy verification:** If a node participates in multiple policies, should verification be shared (verify once, count for all policies) or independent (each policy verifies separately)?

7. **Privacy:** Publishing policies as atproto records makes group membership public. Is this always desirable? Some groups might want private policies.

8. **Bootstrapping:** How does a new node catch up on existing policy state? It needs to discover policies, acknowledge them, sync all required data, and pass verification before being counted as compliant.

9. **Clock skew:** Compliance evaluation depends on timestamps (sync lag, verification intervals). How much clock skew is tolerable between nodes?

10. **Incentive alignment for verification:** The verifier and the verified are both group members. What prevents collusion (node A "verifies" node B without actually checking)?

---

## 10. Recommended Starting Point

### The Simplest Useful Thing: Mutual Aid with Full Replication

**Scenario:** Three nodes (Alice, Bob, Carol) each run a P2PDS instance. Each node holds complete replicas of all three members' repos. The policy is: "every member's data is held by at least 2 other members."

**Implementation plan:**

#### Step 1: Define the Policy Schema

Create a Lexicon definition for `org.p2pds.policy` with the minimal fields:

```json
{
  "$type": "org.p2pds.policy",
  "version": 1,
  "type": "mutual-aid",
  "name": "my-cluster",
  "members": [
    { "did": "did:plc:alice" },
    { "did": "did:plc:bob" },
    { "did": "did:plc:carol" }
  ],
  "rules": {
    "replication": {
      "strategy": "full",
      "minCopies": 2
    },
    "verification": {
      "interval": "30m",
      "sampleSize": 50
    },
    "sync": {
      "maxLag": "5m"
    }
  }
}
```

#### Step 2: Publish and Discover

- On startup, each node publishes its policy record to its own repo
- Each node fetches policy records from all configured peer DIDs (extending existing `PeerDiscovery`)
- Derive obligations: "I must replicate all member DIDs except my own"

#### Step 3: Evaluate Compliance

Add a `PolicyEvaluator` class:

```typescript
class PolicyEvaluator {
  evaluate(policy: Policy, syncStates: SyncState[]): ComplianceStatus {
    // For each subject DID in the policy:
    //   1. Check if we've synced recently (syncLag < maxLag)
    //   2. Check if verification passed
    //   3. Determine compliance status
    // Return aggregate compliance
  }
}
```

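A fuller sketch of what `evaluate` could look like, using the section 8 interfaces. The `SyncState` shape and the `parseDuration` helper are assumed for illustration, and subjects are read from the policy (or pre-derived from its member list):

```typescript
interface SyncState {
  did: string;
  lastSyncAt: number | null; // epoch ms of last successful sync
  lastVerificationPassed: boolean;
  consecutiveFailures: number;
}

declare function parseDuration(duration: string): number; // hypothetical "5m" -> 300_000

function evaluatePolicy(
  policy: Policy,
  policyUri: string,
  nodeId: string,
  states: SyncState[],
  now = Date.now()
): ComplianceStatus {
  const maxLag = parseDuration(policy.rules.sync.maxLag);
  const maxFailures = policy.rules.compliance.maxConsecutiveFailures;
  const obligations = policy.rules.replication.subjects
    .filter((did) => did !== nodeId)
    .map((subjectDid): ObligationStatus => {
      const s = states.find((st) => st.did === subjectDid);
      const lastSync = s?.lastSyncAt ?? null;
      const syncLag = lastSync === null ? null : now - lastSync;
      return {
        subjectDid,
        synced: syncLag !== null && syncLag <= maxLag,
        lastSyncRev: null, // not tracked in this sketch
        syncLag,
        verificationPassed: s?.lastVerificationPassed ?? false,
        consecutiveFailures: s?.consecutiveFailures ?? 0,
        lastVerified: null, // not tracked in this sketch
      };
    });
  const failed = obligations.some((o) => o.consecutiveFailures >= maxFailures);
  const degraded = obligations.some((o) => !o.synced || !o.verificationPassed);
  return {
    policyUri,
    nodeId,
    status: failed ? "non-compliant" : degraded ? "degraded" : "compliant",
    obligations,
    lastEvaluated: new Date(now).toISOString(),
  };
}
```
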
This replaces the hardcoded `REPLICATE_DIDS` config with policy-driven obligation derivation.

#### Step 4: Report Compliance

- Update manifest records with compliance status
- Log compliance changes
- (Future: publish compliance attestations as atproto records)

#### What This Gets You

- **Declarative:** Policy is a JSON document, not imperative code
- **Deterministic:** Given the same policy + sync states, any node computes the same compliance result
- **Transport-agnostic:** Policy checks sync state and verification results, not how blocks were transferred
- **Account-centric:** Policy lists DIDs, not block CIDs or IPFS hashes
- **Publishable:** Policy lives in atproto repos, auditable by anyone
- **Minimal:** No new dependencies, no new languages, ~200-300 lines of TypeScript

#### What It Doesn't Handle (Yet)

- Multi-party policy negotiation (v2: add acknowledgment records)
- SLA metrics and reputation (v2: add time-series tracking)
- Partial/collection-level replication (v2: needs MST-aware sync)
- Failure rebalancing (v2: automated obligation redistribution)
- CEL/Datalog predicates (v3: if JSON schema proves too limiting)

This starting point establishes the core abstraction (policy -> obligation -> verification -> compliance) and the integration pattern (policy-driven ReplicationManager). Everything else is additive.
959This starting point establishes the core abstraction (policy -> obligation -> verification -> compliance) and the integration pattern (policy-driven ReplicationManager). Everything else is additive.