···11+# CAR Format Implementation Notes
22+33+The **Content Addressable aRchives (CAR)** format is used to store content-addressable objects (IPLD blocks) as a sequence of bytes.
44+It is the standard format for repository export (`sync.getRepo`) and block transfer (`sync.getBlocks`) in the AT Protocol.
55+66+## 1. Format Overview (v1)
77+88+A CAR file consists of a **Header** followed by a sequence of **Data** sections.
99+1010+```text
1111+|--------- Header --------| |--------------- Data Section 1 ---------------| |--------------- Data Section 2 ---------------| ...
1212+[ varint | DAG-CBOR block ] [ varint | CID bytes | Block Data bytes ] [ varint | CID bytes | Block Data bytes ] ...
1313+```
1414+1515+### LEB128 Varints
1616+1717+All length prefixes in CAR are encoded as **unsigned LEB128 (UVarint)** integers.
1818+1919+- Used to prefix the Header block.
2020+- Used to prefix each Data section.
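
A minimal sketch of unsigned LEB128 encoding and decoding, to make the length-prefix rule concrete. The function names `encode_uvarint` and `decode_uvarint` are illustrative and are not taken from PDSharp.

```python
def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        n >>= 7
    out.append(n)  # final byte, continuation bit clear
    return bytes(out)


def decode_uvarint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode one uvarint starting at offset; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated uvarint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


print(encode_uvarint(300).hex())  # ac02
```

Values below 128 fit in a single byte, so the prefix for a typical small block costs one or two bytes.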
2121+2222+## 2. Header
2323+2424+The header is a single DAG-CBOR encoded block describing the archive.
2525+2626+**Encoding:**
2727+2828+1. Construct the CBOR map: `{ "version": 1, "roots": [<cid>, ...] }`.
2929+2. Encode as DAG-CBOR bytes.
3030+3. Prefix with the length of those bytes (as UVarint).
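
The three encoding steps can be sketched as follows. This is a hand-rolled encoder that covers only the CBOR shapes the header needs, not a general DAG-CBOR library; all function names are hypothetical. It assumes roots are passed as raw CID bytes and are written as tag-42 links, with map keys in DAG-CBOR order (`roots` before `version`).

```python
def encode_uvarint(n: int) -> bytes:
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def cbor_head(major: int, n: int) -> bytes:
    """CBOR initial byte(s) for a major type and argument, shortest form."""
    if n < 24:
        return bytes([major << 5 | n])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([major << 5 | info]) + n.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def cbor_text(s: str) -> bytes:
    raw = s.encode("utf-8")
    return cbor_head(3, len(raw)) + raw


def cbor_cid_link(cid: bytes) -> bytes:
    payload = b"\x00" + cid  # identity multibase prefix + raw CID
    return cbor_head(6, 42) + cbor_head(2, len(payload)) + payload


def encode_car_header(roots: list[bytes]) -> bytes:
    body = (
        cbor_head(5, 2)  # map with two entries
        + cbor_text("roots")
        + cbor_head(4, len(roots))
        + b"".join(cbor_cid_link(cid) for cid in roots)
        + cbor_text("version")
        + cbor_head(0, 1)
    )
    return encode_uvarint(len(body)) + body
```

With a single 36-byte CIDv1 root, the header body is 58 bytes, so the output begins with the bytes `3a a2 65` followed by the ASCII text `roots`.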
3131+3232+## 3. Data Sections
3333+3434+Following the header, the file contains a concatenated sequence of data sections. Each section represents one IPLD block.
3535+3636+```text
3737+[ Section Length (UVarint) ] [ CID (raw bytes) ] [ Binary Data ]
3838+```
3939+4040+- **Section Length**: The total length of the *CID bytes* + *Binary Data*.
4141+- **CID**: The raw binary representation of the block's CID (usually CIDv1 + DAG-CBOR + SHA2-256).
4242+- **Binary Data**: The actual content of the block.
4343+4444+The Section Length *includes* the length of the CID.
4545+4646+Some other framing formats use a length prefix that covers only the payload, so a reader here must subtract the CID length from the Section Length to get the size of the block data.
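
A sketch of writing and reading data sections, showing that the length prefix covers the CID and the block data together. It assumes CIDv1 with a SHA2-256 multihash; `make_cid_v1`, `write_section`, and `read_sections` are hypothetical names used only for illustration.

```python
import hashlib


def encode_uvarint(n: int) -> bytes:
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def decode_uvarint(buf: bytes, offset: int) -> tuple[int, int]:
    value = shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def make_cid_v1(data: bytes, codec: int = 0x71) -> bytes:
    """CIDv1 bytes: version 1, codec (0x71 = dag-cbor), sha2-256 multihash."""
    return bytes([0x01, codec, 0x12, 0x20]) + hashlib.sha256(data).digest()


def write_section(cid: bytes, data: bytes) -> bytes:
    # The prefix counts CID bytes plus data bytes, not the data alone.
    return encode_uvarint(len(cid) + len(data)) + cid + data


def cid_length(buf: bytes, offset: int) -> int:
    """Length of the CIDv1 that starts at offset."""
    start = offset
    version, offset = decode_uvarint(buf, offset)
    if version != 1:
        raise ValueError("this sketch only handles CIDv1")
    _codec, offset = decode_uvarint(buf, offset)
    _hash_code, offset = decode_uvarint(buf, offset)
    digest_len, offset = decode_uvarint(buf, offset)
    return offset + digest_len - start


def read_sections(buf: bytes, offset: int = 0):
    """Yield (cid, data) pairs from the bytes that follow the header."""
    while offset < len(buf):
        section_len, offset = decode_uvarint(buf, offset)
        end = offset + section_len
        if end > len(buf):
            raise ValueError("truncated section")
        n = cid_length(buf, offset)
        yield buf[offset:offset + n], buf[offset + n:end]
        offset = end
```

The reader finds the end of the CID by parsing its own varint fields, then treats everything up to the section end as block data.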
4747+4848+## 4. References
4949+5050+- [IPLD CARv1 Specification](https://ipld.io/specs/transport/car/carv1/)
+46
PDSharp.Docs/cbor.md
···11+# DAG-CBOR Implementation Notes
22+33+DAG-CBOR is the canonical data serialization format for the AT Protocol.
44+It is a strict subset of CBOR (RFC 8949) with specific rules for determinism and linking.
55+66+## 1. Canonicalization Rules
77+88+To ensure consistent Content IDs (CIDs) for the same data, specific canonicalization rules must be followed during encoding.
99+1010+### Map Key Sorting
1111+1212+Maps must be sorted by keys. The sorting order is **NOT** standard lexicographical order.
1313+1414+1. **Length**: Shorter keys come first.
1515+2. **Bytes**: keys of the same length are sorted lexicographically by their UTF-8 byte representation.
1616+1717+**Example:**
1818+1919+- `"a"` (len 1) comes before `"aa"` (len 2).
2020+- `"b"` (len 1) comes before `"aa"` (len 2).
2121+- `"a"` comes before `"b"`.
2222+2323+### Integer Encoding
2424+2525+Integers must be encoded using the smallest possible representation.
2626+2727+`System.Formats.Cbor` (in Strict mode) generally handles this, but care must be taken to treat `int`, `int64`, and `uint64` consistently.
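
To illustrate what "smallest possible representation" means at the byte level, here is a sketch of CBOR integer encoding written from the rules in RFC 8949. It is independent of `System.Formats.Cbor`, and `encode_int` is a hypothetical name.

```python
def encode_int(n: int) -> bytes:
    """Encode an integer using the shortest CBOR form."""
    if n >= 0:
        major, value = 0, n        # major type 0: unsigned integer
    else:
        major, value = 1, -1 - n   # major type 1: negative integer
    if value < 24:
        return bytes([major << 5 | value])  # value fits in the initial byte
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([major << 5 | info]) + value.to_bytes(size, "big")
    raise ValueError("integer does not fit in 64 bits")


print(encode_int(23).hex())   # 17
print(encode_int(24).hex())   # 1818
print(encode_int(256).hex())  # 190100
print(encode_int(-1).hex())   # 20
```

The same value must always produce the same bytes whatever the source type, which is why mixing `int`, `int64`, and `uint64` needs care.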
2828+2929+## 2. Content Addressing (CIDs)
3030+3131+Links to other nodes (CIDs) are encoded using **CBOR Tag 42**.
3232+3333+### Format
3434+3535+1. **Tag**: `42` (Major type 6, value 42).
3636+2. **Payload**: A byte string containing:
3737+ - The `0x00` byte (Multibase identity prefix, required by IPLD specs for binary CID inclusion).
3838+ - The raw bytes of the CID.
3939+4040+## 3. Known Gotchas
4141+4242+- **Float vs Int**:
4343+ The AT Protocol data model does not allow floating-point numbers; numeric fields must be integers.
4444+ F# types must be matched carefully to avoid encoding `2.0` instead of `2`.
4545+- **String Encoding**:
4646+ Must be UTF-8. Indefinite length strings are prohibited in DAG-CBOR.
+69
PDSharp.Docs/mst.md
···11+# Merkle Search Tree (MST) Implementation Notes
22+33+The Merkle Search Tree (MST) is a probabilistic, balanced search tree used by the AT Protocol to store repository records.
44+55+## Overview
66+77+MSTs combine properties of B-trees and Merkle trees to ensure:
88+99+1. **Determinism**: The tree structure is determined by the keys (and their hashes), not insertion order.
1010+2. **Verifiability**: Every node is content-addressed (CID), allowing the entire state to be verified via a single root hash.
1111+3. **Efficiency**: Efficient key-value lookups and delta-based sync (subtrees that haven't changed share the same CIDs).
1212+1313+## Core Concepts
1414+1515+### Layering (Probabilistic Balance)
1616+1717+MSTs do not use rotation for balancing. Instead, they assign every key a "layer" based on its hash.
1818+1919+- **Formula**:
2020+ `Layer(key) = floor(countLeadingZeroBits(SHA256(key)) / 2)`, where zeros are counted bit by bit from the start of the binary digest.
2121+- **Fanout**:
2222+ The divisor `2` implies a fanout of roughly 4 (2 bits per layer increment).
2323+- Keys with higher layers appear higher in the tree, splitting the range of keys below them.
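
The layer formula can be sketched as follows, assuming keys are hashed as UTF-8 bytes and zeros are counted in the binary digest. The names `leading_zero_bits` and `mst_layer` are illustrative.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count zero bits before the first 1 bit of the digest."""
    count = 0
    for byte in digest:
        if byte == 0:
            count += 8
            continue
        count += 8 - byte.bit_length()  # zeros above the highest set bit
        break
    return count


def mst_layer(key: str) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return leading_zero_bits(digest) // 2  # two bits per layer, rounded down


print(leading_zero_bits(bytes([0x00, 0x10])))  # 11, which gives layer 5
```

Because each additional layer needs two more leading zero bits, about one key in four reaches layer 1 or higher, one in sixteen reaches layer 2 or higher, and so on.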
2424+2525+### Data Structure (`MstNode`)
2626+2727+An MST node consists of:
2929+3030+- **Left Child (`l`)**: Used to traverse to keys lexicographically smaller than the first entry in this node.
3030+- **Entries (`e`)**: A sorted list of entries. Each entry contains:
3131+ - **Prefix Length (`p`)**: Length of the prefix shared with the *previous* key in the node. The first entry in a node stores its full key with `p = 0`.
3232+ - **Key Suffix (`k`)**: The remaining bytes of the key.
3333+ - **Value (`v`)**: The CID of the record data.
3434+ - **Tree (`t`)**: (Optional) CID of the subtree containing keys between this entry and the next.
3434+3535+**Serialization**: In the atproto repository spec, the node is serialized as a DAG-CBOR map with the keys `l` and `e`: `{ "l": <cid or null>, "e": [e1, e2, ...] }`.
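
A sketch of how the `p` and `k` fields relate to full keys, assuming the first entry in a node uses `p = 0`. It works on plain byte strings and ignores `v` and `t`; the function names are hypothetical.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def compress_keys(sorted_keys: list[bytes]) -> list[tuple[int, bytes]]:
    """Turn full keys into (p, k) pairs relative to the previous key."""
    entries = []
    previous = b""
    for key in sorted_keys:
        p = common_prefix_len(previous, key)
        entries.append((p, key[p:]))
        previous = key
    return entries


def expand_keys(entries: list[tuple[int, bytes]]) -> list[bytes]:
    """Rebuild full keys from (p, k) pairs."""
    keys = []
    previous = b""
    for p, suffix in entries:
        key = previous[:p] + suffix
        keys.append(key)
        previous = key
    return keys


keys = [b"app.bsky.feed.post/aaa", b"app.bsky.feed.post/aab"]
print(compress_keys(keys))  # [(0, b'app.bsky.feed.post/aaa'), (21, b'b')]
```

Since `p` depends on the preceding entry, inserting or removing an entry changes the `p` and `k` of the entry that follows it.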
3737+3838+## Algorithms
3939+4040+### Insertion (`Put`)
4141+4242+Insertion relies on the "Layer" property:
4343+4444+1. Calculate `Layer(newKey)`.
4545+2. Traverse the tree from the root.
4646+3. **Split Condition**: If `Layer(newKey)` is **greater** than the layer of the current node, the new key belongs *above* this node.
4747+ - The current node is **split** into two children (Left and Right) based on the `newKey`.
4848+ - The `newKey` becomes a new node pointing to these two children.
4949+4. **Recurse**: If `Layer(newKey)` is **less** than the current node's layer, find the correct child subtree (based on key comparison) and recurse.
5050+5. **Same Layer**: If `Layer(newKey)` equals the current node's layer:
5151+ - Insert the key at its sorted position in the entries list.
5252+ - If a child subtree already occupies that position, it covers keys on both sides of `newKey`, so it must be split around `newKey`: smaller keys stay in the subtree to the left of the new entry, and larger keys move into the new entry's `t` subtree.
5353+5454+### Deletion
5555+5656+1. Locate the key.
5757+2. Remove the entry.
5858+3. **Merge**:
5959+ The removed key separated two subtrees: the one immediately to its left (the previous entry's `t`, or the node's `l` if it was the first entry) and its own `t` subtree. Removing it requires merging these two adjacent subtrees into a single valid MST subtree, so that the result matches the tree that would have been built without the key.
6060+6161+### Determinism & Prefix Compression
6262+6363+- **Canonical Order**: Keys must always be sorted.
6464+- **Prefix Compression**:
6565+ Crucial for space saving.
6666+ The prefix length `p` is calculated relative to the *immediately preceding key* in the node.
6767+- **Issues**:
6868+ Insertion order *should not* matter (commutativity).
6969+ However, implementations must be careful with `Split` and `Merge` operations to ensure exactly the same node boundaries are created regardless of history.
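
One way to check history independence is to compare an incremental implementation against a tree built directly from the full key set. The sketch below is such a reference construction, not the `Put` algorithm itself. It uses in-memory tuples `(left, [(key, subtree), ...])` instead of CID-linked nodes, and all names are hypothetical. A range whose keys all sit more than one layer down gets an intermediate node with no entries and only a left pointer.

```python
import hashlib
import random


def mst_layer(key: bytes) -> int:
    digest = int.from_bytes(hashlib.sha256(key).digest(), "big")
    return (256 - digest.bit_length()) // 2  # leading zero bits, halved


def build(keys, layer, layer_of):
    """Build the node at `layer` from sorted keys whose layers are <= layer."""
    if not keys:
        return None
    left, entries, pending = None, [], []

    def attach(subtree):
        # A subtree hangs off the entry before it, or off `left` if none.
        nonlocal left
        if entries:
            key, _ = entries[-1]
            entries[-1] = (key, subtree)
        else:
            left = subtree

    for key in keys:
        if layer_of(key) == layer:
            attach(build(pending, layer - 1, layer_of))
            entries.append((key, None))
            pending = []
        else:
            pending.append(key)  # belongs to a lower layer
    attach(build(pending, layer - 1, layer_of))
    return (left, entries)


def build_tree(keys, layer_of=mst_layer):
    ordered = sorted(set(keys))
    if not ordered:
        return None
    top = max(layer_of(k) for k in ordered)
    return build(ordered, top, layer_of)


keys = [f"app.bsky.feed.post/{i}".encode() for i in range(200)]
shuffled = keys[:]
random.Random(0).shuffle(shuffled)
print(build_tree(keys) == build_tree(shuffled))  # True
```

A test suite could insert the same keys in several random orders through the real `Put` path and assert that the resulting node boundaries match this reference shape.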
+72-35
README.md
···11+<!-- markdownlint-disable MD033 -->
12# PDSharp
2333-> A Personal Data Server (PDS) for the AT Protocol, written in F# with Giraffe.
44+A Personal Data Server (PDS) for the AT Protocol, written in F# with Giraffe.
4556## Goal
67···89910## Requirements
10111111-- .NET 9.0 SDK
1212-- [Just](https://github.com/casey/just) (optional, for potential future task running)
1212+.NET 9.0 SDK
13131414## Getting Started
1515···34343535The server will start at `http://localhost:5000`.
36363737+## Configuration
3838+3939+The application uses `appsettings.json` and supports Environment Variable overrides.
4040+4141+| Key | Env Var | Default | Description |
4242+| ----------- | ------------------- | ----------------------- | ------------------------- |
4343+| `DidHost` | `PDSHARP_DidHost` | `did:web:localhost` | The DID of the PDS itself |
4444+| `PublicUrl` | `PDSHARP_PublicUrl` | `http://localhost:5000` | Publicly reachable URL |
4545+4646+Example `appsettings.json`:
4747+4848+```json
4949+{
5050+ "PublicUrl": "http://localhost:5000",
5151+ "DidHost": "did:web:localhost"
5252+}
5353+```
5454+3755## API Testing
38563939-### Server Info
5757+<details>
5858+<summary>Server Info</summary>
40594160```bash
4261curl http://localhost:5000/xrpc/com.atproto.server.describeServer
4362```
44636464+</details>
6565+4566### Record Operations
46674747-**Create a record:**
6868+<details>
6969+<summary>Create a record</summary>
48704971```bash
5072curl -X POST http://localhost:5000/xrpc/com.atproto.repo.createRecord \
···5274 -d '{"repo":"did:web:test","collection":"app.bsky.feed.post","record":{"text":"Hello, ATProto!"}}'
5375```
54765555-**Get a record** (use the rkey from createRecord response):
7777+</details>
7878+7979+<details>
8080+<summary>Get a record</summary>
56815782```bash
5883curl "http://localhost:5000/xrpc/com.atproto.repo.getRecord?repo=did:web:test&collection=app.bsky.feed.post&rkey=<RKEY>"
5984```
60856161-**Put a record** (upsert with explicit rkey):
8686+</details>
8787+8888+<details>
8989+<summary>Put a record</summary>
62906391```bash
6492curl -X POST http://localhost:5000/xrpc/com.atproto.repo.putRecord \
···6694 -d '{"repo":"did:web:test","collection":"app.bsky.feed.post","rkey":"my-post","record":{"text":"Updated!"}}'
6795```
68969797+</details>
9898+6999### Sync & CAR Export
701007171-**Get entire repository as CAR:**
101101+<details>
102102+<summary>Get entire repository as CAR</summary>
7210373104```bash
74105curl "http://localhost:5000/xrpc/com.atproto.sync.getRepo?did=did:web:test" -o repo.car
75106```
761077777-**Get specific blocks** (comma-separated CIDs):
108108+</details>
109109+110110+<details>
111111+<summary>Get specific blocks</summary>
7811279113```bash
80114curl "http://localhost:5000/xrpc/com.atproto.sync.getBlocks?did=did:web:test&cids=<CID1>,<CID2>" -o blocks.car
81115```
821168383-**Get a blob by CID:**
117117+</details>
118118+119119+<details>
120120+<summary>Get a blob by CID</summary>
8412185122```bash
86123curl "http://localhost:5000/xrpc/com.atproto.sync.getBlob?did=did:web:test&cid=<BLOB_CID>"
87124```
88125126126+</details>
127127+89128### Firehose (WebSocket)
9012991130Subscribe to real-time commit events using [websocat](https://github.com/vi/websocat):
921319393-```bash
9494-# Install websocat (macOS)
9595-brew install websocat
132132+<details>
133133+<summary>Open a WebSocket connection</summary>
961349797-# Connect to firehose
135135+```bash
98136websocat ws://localhost:5000/xrpc/com.atproto.sync.subscribeRepos
99137```
100138139139+</details>
140140+141141+<br />
101142Then create/update records in another terminal to see CBOR-encoded commit events stream in real-time.
102143103103-**With cursor for resumption:**
144144+<br />
145145+146146+<details>
147147+<summary>Open a WebSocket connection with cursor for resumption</summary>
104148105149```bash
106150websocat "ws://localhost:5000/xrpc/com.atproto.sync.subscribeRepos?cursor=5"
107151```
108152109109-## Configuration
110110-111111-The application uses `appsettings.json` and supports Environment Variable overrides.
112112-113113-| Key | Env Var | Default | Description |
114114-| ----------- | ------------------- | ----------------------- | ------------------------- |
115115-| `DidHost` | `PDSHARP_DidHost` | `did:web:localhost` | The DID of the PDS itself |
116116-| `PublicUrl` | `PDSHARP_PublicUrl` | `http://localhost:5000` | Publicly reachable URL |
117117-118118-Example `appsettings.json`:
119119-120120-```json
121121-{
122122- "PublicUrl": "http://localhost:5000",
123123- "DidHost": "did:web:localhost"
124124-}
125125-```
153153+</details>
126154127155## Architecture
128156129129-### App (Giraffe)
157157+<details>
158158+<summary>App (Giraffe)</summary>
130159131160- `XrpcRouter`: `/xrpc/<NSID>` routing
132161- `Auth`: Session management (JWTs)
133162- `RepoApi`: Write/Read records (`putRecord`, `getRecord`)
134163- `ServerApi`: Server meta (`describeServer`)
135164136136-### Core (Pure F#)
165165+</details>
166166+167167+<details>
168168+<summary>Core (Pure F#)</summary>
137169138170- `DidResolver`: Identity resolution
139171- `RepoEngine`: MST, DAG-CBOR, CIDs, Blocks
140172- `Models`: Data types for XRPC/Database
141173142142-### Infra
174174+</details>
175175+176176+<details>
177177+<summary>Infra</summary>
143178144179- SQLite/Postgres for persistence
145180- S3/Disk for blob storage
181181+182182+</details>
+27-35
roadmap.txt
···5959- [x] Conformance testing: diff CIDs/CARs/signatures vs reference PDS
6060DoD: Same inputs → same outputs for repo/sync surfaces
6161--------------------------------------------------------------------------------
6262-Milestone J: Persistence + Backups
6262+Milestone J: Storage Backend Configuration
6363--------------------------------------------------------------------------------
6464-Deliverables:
6565- - BackupOps module in Core (scheduler unit / cron / scripts, plus Litestream config)
6666-Backups (SQLite)
6767- [ ] Set PDS_SQLITE_DISABLE_WAL_AUTO_CHECKPOINT=true (Litestream-friendly)
6868- [ ] Run a scheduled backup/replication job that:
6969- - finds recently updated DBs
7070- - backs up /pds/actors/* and PDS-wide DBs
7171- - runs on SIGTERM during deploys (avoid missing last writes)
7272-Backups (Blobs)
7373- [ ] Configurable Options (app settings):
7474- (A) Disk blobs: include /pds/blocks in backups
7575- (B) S3-compatible blobstore: rely on object-store durability
7676-Guardrails
7777- [ ] Uptime check: https://<pds>/xrpc/_health
7878- [ ] Alert if “latest backup” is older than N minutes.
7979- [ ] Alert on disk pressure for /pds.
6464+- [ ] Configure SQLite WAL mode (PDS_SQLITE_DISABLE_WAL_AUTO_CHECKPOINT=true)
6565+- [ ] Implement S3-compatible blobstore adapter (optional via config)
6666+- [ ] Configure disk-based vs S3-based blob storage selection
6767+DoD: PDS runs with S3 blobs (if configured) and SQLite passes Litestream checks
6868+--------------------------------------------------------------------------------
6969+Milestone K: Backup Automation + Guardrails
7070+--------------------------------------------------------------------------------
7171+- [ ] Implement BackupOps module (scheduler/cron logic)
7272+- [ ] Automated backup jobs:
7373+ - [ ] Databases (Litestream or raw copy) + /pds/actors backup
7474+ - [ ] Local disk blobs (if applicable)
7575+- [ ] Guardrails & Monitoring:
7676+ - [ ] Uptime check endpoint: /xrpc/_health with JSON status
7777+ - [ ] Alerts: "Latest backup" too old, Disk pressure > 90%
7878+ - [ ] Log retention policies
8079DoD:
8181- - You can restore onto a fresh host and pass the P3 verification checklist.
8282- - Backups run automatically and are observable (“last successful backup”).
8383- - Backup set is explicitly documented (DBs + blobs decision).
8080+ - Backups run automatically and report status
8181+ - Health checks indicate system state
8282+ - Restore drill: Restore backups onto a fresh host passes verification
8383+ - Backup set is explicitly documented
8484================================================================================
8585PHASE 2: DEPLOYMENT (Self-Host)
8686================================================================================
8787-Milestone J: Topology + Domain Planning
8787+Milestone L: Topology + Domain Planning
8888--------------------------------------------------------------------------------
8989- Choose PDS hostname (pds.example.com) vs handle domain (example.com)
9090- Obtain domain, DNS access, VPS with static IP, reverse proxy
9191DoD: Clear plan for PDS location, handle, and DID resolution
9292--------------------------------------------------------------------------------
9393-Milestone K: DNS + TLS + Reverse Proxy
9393+Milestone M: DNS + TLS + Reverse Proxy
9494--------------------------------------------------------------------------------
9595- DNS A/AAAA records for PDS hostname
9696- TLS certs (ACME) via Caddy
9797DoD: https://<pds-hostname> responds with valid cert
9898--------------------------------------------------------------------------------
9999-Milestone L: Deploy PDSharp
9999+Milestone N: Deploy PDSharp
100100--------------------------------------------------------------------------------
101101- Deploy built PDS with persistence (SQLite/Postgres + blob storage)
102102- Verify /xrpc/com.atproto.server.describeServer
103103DoD: describeServer returns capabilities payload
104104--------------------------------------------------------------------------------
105105-Milestone M: Account Creation
105105+Milestone O: Account Creation
106106--------------------------------------------------------------------------------
107107- Create account using admin tooling
108108- Verify authentication: createSession
109109DoD: Obtain session and perform authenticated write
110110--------------------------------------------------------------------------------
111111-Milestone N: Smoke Test Repo + Blobs
111111+Milestone P: Smoke Test Repo + Blobs
112112--------------------------------------------------------------------------------
113113- Write record via putRecord
114114- Upload blob, verify retrieval via sync.getBlob
115115DoD: Posts appear in clients, media loads reliably
116116--------------------------------------------------------------------------------
117117-Milestone O: Account Migration
117117+Milestone Q: Account Migration
118118--------------------------------------------------------------------------------
119119- Export/import from bsky.social
120120- Update DID service endpoint
121121- Verify handle/DID resolution
122122DoD: Handle unchanged, DID points to your PDS
123123--------------------------------------------------------------------------------
124124-Milestone P: Reliability
125125---------------------------------------------------------------------------------
126126-- Backups: repo storage + database + blobs
127127-- Restore drill on fresh instance
128128-- Monitoring: uptime checks for describeServer + getBlob
129129-DoD: Restore from backup passes smoke tests
130130---------------------------------------------------------------------------------
131131-Milestone Q: Updates + Security
124124+Milestone R: Updates + Security
132125--------------------------------------------------------------------------------
133126- Update cadence with rollback plan
134127- Rate limits and access controls at proxy
135135-- Log retention and disk growth alerts
136128DoD: Update smoothly, maintain stable federation
137129================================================================================
138130QUICK CHECKLIST