···11+# pds-yrs Architecture
22+33+## Overview
44+55+pds-yrs stores [Yrs](https://docs.rs/yrs) CRDT documents on an ATProto PDS (Personal Data Server). Each file is tracked as a Yrs Doc, enabling conflict-free collaborative editing across multiple devices and writers — no git involved.
66+77+## Key Concepts
88+99+### Device-Aware Model
1010+1111+Each device gets its own **rkey** (auto-generated as `{project}-{random8}`), stored in `.yrs/pdsyrs_config.json`. The `YrsRepo.name` field is the shared **project name**, not the rkey. This means two devices editing the same project create separate records that won't overwrite each other.
1212+1313+On first initialization (`pds-yrs save --project my-wiki`):
1414+1. A new device rkey is generated (e.g. `my-wiki-a3f7k2m9`)
1515+2. `listRecords` discovers any existing repos for the same project
1616+3. Our rkey is added as a **collaborator** to each existing repo's record
1717+4. Those existing rkeys become our initial collaborators
1818+1919+Subsequent saves skip `listRecords` entirely — the rkey is read from local config.
2020+2121+### YrsRepo
2222+2323+A **YrsRepo** (`net.commoninternet.yrsrepo`) is a single ATProto record that holds one device/writer's CRDT state. It contains:
2424+2525+- A `name` — project name (shared across all devices for the same project)
2626+- A `files` map of `FileEntry` structs (one per tracked file)
2727+- An `updatedAt` timestamp
2828+- A `collaborators` list of `Collaborator` references (rkey + optional PDS URL)
2929+3030+A YrsRepo is **not** a complete picture of a collaborative project. To materialize the full state, you merge YrsRepos from all collaborators.
3131+3232+### Collaborators
3333+3434+Each `Collaborator` has:
3535+- **`rkey`** — the rkey of another device's repo record
3636+- **`pds`** — optional PDS URL for cross-PDS collaboration (omitted when on the same PDS)
3737+3838+Collaborators are registered automatically on first initialization and used by merge to discover peer rkeys without calling `listRecords`. Merge falls back to `listRecords` if no collaborators are found (e.g. when merging without a local config).
3939+4040+### FileEntry
4141+4242+Each file in a YrsRepo has a `FileEntry` with:
4343+4444+- **`content`** — Plain text of the current file contents. Always kept up-to-date so any client can read the data without Yrs decoding (the portability escape hatch — see Export).
4545+- **`snapshotBlob`** — BlobRef pointing to the full Yrs state (`encode_state_as_update_v1`).
4646+- **`stateVector`** — Base64-encoded Yrs state vector (reflects the state after snapshot + all updates applied).
4747+- **`updates`** — Ordered list of `PackRef`s, each pointing to an incremental Yrs update within a pack blob. These accumulate between compactions.
4848+- **`updatesCount`** — How many incremental saves have occurred since the last compaction. When any file in the repo reaches the compaction threshold (10), the entire repo is compacted.
4949+- **`kind`** — `text` (Yrs CRDT merge) or `binary` (raw blob, no CRDT).
5050+- **`packRef`** — Optional reference into a pack blob for the snapshot data (see below).
5151+5252+### Manifest
5353+5454+The manifest is a special `FileEntry` stored at the key `pdsyrs_manifest`. It uses a Yrs Map (not Text) to track which files exist and their kinds. The manifest is itself CRDT-merged, so concurrent adds from different collaborators converge — "set wins over delete" semantics mean that if one writer deletes a file while another edits it, the edit wins.
5555+5656+All internal keys use the `pdsyrs_` prefix to avoid collisions with user files.
5757+5858+### Local State (`.yrs/`)
5959+6060+The `.yrs/` directory (analogous to `.git/`) caches Yrs Doc snapshots, state vectors, and snapshot CIDs between sync cycles. This enables incremental sync — only diffs since the last cycle need to be exchanged.
6161+6262+Key files:
6363+- **`pdsyrs_config.json`** — device config: PDS URL, handle, DID, project name, and device rkey
6464+- **`pdsyrs_manifest.yrs`** / **`pdsyrs_manifest.sv`** — cached manifest CRDT state
6565+6666+## Pack Blobs
6767+6868+ATProto limits the number of blob uploads per request. To work around this, pds-yrs bundles all file blobs from a single save into one **pack blob**.
6969+7070+### Format
7171+7272+```
7373+[4 bytes: index length (u32 LE)]
7474+[index: JSON array of PackEntry]
7575+[blob data: concatenated file data]
7676+```
7777+7878+Each `PackEntry` records a file's `path`, `offset`, `length`, and `data_type` (snapshot, update, or binary). After uploading the pack blob, each FileEntry gets a `PackRef` pointing into it — the blob CID plus offset/length to extract that file's data.
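The layout above can be sketched in a few lines of std-only Rust. This is an illustrative sketch, not the crate's `parse_pack`: the function names `split_pack`, `extract`, and `build_pack` are hypothetical, the index is treated as opaque JSON bytes, and offsets are assumed to be relative to the start of the data region (which matches how `load.rs` slices `blob_data`).

```rust
/// Split an uncompressed pack into (index JSON bytes, data region).
/// Layout: [u32 LE index length][index JSON][concatenated blob data].
fn split_pack(pack: &[u8]) -> Result<(&[u8], &[u8]), String> {
    if pack.len() < 4 {
        return Err("pack shorter than 4-byte header".to_string());
    }
    let index_len = u32::from_le_bytes([pack[0], pack[1], pack[2], pack[3]]) as usize;
    let index_end = 4usize
        .checked_add(index_len)
        .ok_or_else(|| "index length overflow".to_string())?;
    if index_end > pack.len() {
        return Err(format!(
            "index length {} exceeds pack size {}",
            index_len,
            pack.len()
        ));
    }
    Ok((&pack[4..index_end], &pack[index_end..]))
}

/// Slice one entry out of the data region using a PackRef-style offset/length.
/// Returns an error instead of panicking when the range is out of bounds.
fn extract(data: &[u8], offset: usize, length: usize) -> Result<&[u8], String> {
    let end = offset
        .checked_add(length)
        .ok_or_else(|| "offset + length overflow".to_string())?;
    data.get(offset..end)
        .ok_or_else(|| format!("range {}..{} out of bounds in {} bytes", offset, end, data.len()))
}

/// Hypothetical helper for the example: assemble a pack from index and data.
fn build_pack(index_json: &[u8], data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + index_json.len() + data.len());
    out.extend_from_slice(&(index_json.len() as u32).to_le_bytes());
    out.extend_from_slice(index_json);
    out.extend_from_slice(data);
    out
}
```

With a pack whose data region is `helloworld`, `extract(data, 5, 5)` yields `world`, and a range past the end of the data region is reported as an error.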
7979+8080+### Compression
8181+8282+Packs are gzip-compressed before upload when beneficial. On load, `parse_pack_auto` detects gzip and decompresses transparently.
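A minimal sketch of the two decisions described above, assuming detection relies on the standard gzip magic bytes (`0x1f 0x8b`, per RFC 1952) and that "when beneficial" means "when the compressed form is smaller". Both function names are hypothetical; the crate's own `is_gzip` and `parse_pack_auto` may differ in detail.

```rust
/// Gzip streams start with the two magic bytes 0x1f 0x8b.
fn looks_like_gzip(data: &[u8]) -> bool {
    data.len() >= 2 && data[0] == 0x1f && data[1] == 0x8b
}

/// Assumed policy: keep the compressed payload only if it is strictly smaller.
fn pick_payload(raw: Vec<u8>, compressed: Vec<u8>) -> Vec<u8> {
    if compressed.len() < raw.len() {
        compressed
    } else {
        raw
    }
}
```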
8383+8484+### Chunking
8585+8686+ATProto has a per-blob size limit (~50MB). Packs larger than `CHUNK_SIZE` are split into chunks, each uploaded as a separate blob. The `PackRef.chunks` field holds the ordered list of chunk BlobRefs. On load, chunks are reassembled before parsing the pack.
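Splitting and reassembly can be sketched as follows. The names `split_into_chunks` and `join_chunks` are hypothetical stand-ins for the crate's `chunk_data` and `reassemble_chunks`, whose exact signatures are not shown here; the chunk size is passed in rather than read from `CHUNK_SIZE`.

```rust
/// Split a pack into fixed-size chunks; the last chunk may be shorter.
fn split_into_chunks(data: &[u8], chunk_size: usize) -> Vec<Vec<u8>> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    data.chunks(chunk_size).map(|c| c.to_vec()).collect()
}

/// Concatenate chunks in order to recover the original pack bytes.
fn join_chunks(chunks: &[Vec<u8>]) -> Vec<u8> {
    chunks.concat()
}
```

Order matters: the chunks must be joined in the same order they were produced, which is why `PackRef.chunks` is an ordered list.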
8787+8888+### Pack Cache
8989+9090+The loader and merger maintain a `pack_cache` keyed by CID. After compaction, all files share a single pack blob — one download serves all files. Between compactions, the repo may reference multiple pack blobs (the snapshot pack + update packs from subsequent saves). Each is downloaded once and cached. Since packs are immutable, previously cached packs remain valid across sync cycles.
9191+9292+The merge path uses the same pack cache pattern, reducing blob downloads from O(files × sites) to O(pack_blobs × sites). For a typical repo with one pack blob per site, merging 200 files × 2 sites requires ~4 blob downloads instead of ~400.
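The cache pattern can be illustrated with a std-only sketch. The `download` closure is a hypothetical stand-in for the real blob fetch, and `get_pack` is not a function from the crate; the point is only that each CID is fetched at most once no matter how many files reference it.

```rust
use std::collections::HashMap;

/// Return the pack bytes for `cid`, downloading only on a cache miss.
/// `downloads` counts how many real fetches happened.
fn get_pack<'a, F>(
    cache: &'a mut HashMap<String, Vec<u8>>,
    cid: &str,
    downloads: &mut usize,
    mut download: F,
) -> Result<&'a [u8], String>
where
    F: FnMut(&str) -> Result<Vec<u8>, String>,
{
    if !cache.contains_key(cid) {
        let bytes = download(cid)?;
        *downloads += 1;
        cache.insert(cid.to_string(), bytes);
    }
    // The entry is guaranteed to exist here: it was either cached or just inserted.
    Ok(cache
        .get(cid)
        .map(|v| v.as_slice())
        .expect("pack inserted above"))
}
```

Calling `get_pack` 200 times with the same CID performs one download; a second site's pack adds one more.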
9393+9494+## Operations
9595+9696+### Save (incremental)
9797+9898+1. Resolve device rkey from `.yrs/pdsyrs_config.json` (or generate on first init).
9999+2. On first init: discover peers via `listRecords`, register as collaborator.
100100+3. Collect local files (with optional include/exclude glob filters).
101101+4. Fetch the existing YrsRepo record from PDS (if any).
102102+5. For each file:
103103+ - **Unchanged**: reuse the existing FileEntry as-is (no upload).
104104+ - **Changed text, below compaction threshold**: reconstruct the Yrs Doc from the existing snapshot + accumulated updates, apply a character-level diff (`similar` crate, Patience algorithm), then compute `encode_diff_v1(old_state_vector)` — only the new ops, not the full state. The diff goes into the pending pack as a `PackDataType::Update`. The FileEntry keeps its existing `packRef` (snapshot stays in the old pack) and a new `PackRef` is appended to `updates`.
105105+ - **New text file**: full snapshot (no existing state to diff against).
106106+ - **Binary**: store raw bytes; change detection uses FNV-1a hash.
107107+6. Detect deletions: files listed in the manifest but missing from disk are removed from the manifest.
108108+7. Bundle all pending diffs/snapshots into a single compressed pack, upload it, wire up `PackRef`s.
109109+8. Preserve collaborators from existing record (merge with any new ones on first init).
110110+9. Write the YrsRepo record to PDS via `putRecord` (with `swapCommit` for optimistic concurrency).
111111+112112+Over successive saves, unchanged files keep pointing to their original pack blob. Only the changed files get new update entries in the new pack.
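For the binary change-detection step, a sketch of FNV-1a is shown here. The 64-bit variant is an assumption (the text does not say which width is used), and `binary_changed` is a hypothetical helper that compares against a previously stored hash.

```rust
/// 64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
fn fnv1a_64(data: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// A binary file counts as changed when no hash was stored or the hash differs.
fn binary_changed(previous_hash: Option<u64>, current: &[u8]) -> bool {
    previous_hash != Some(fnv1a_64(current))
}
```

FNV-1a is not cryptographic, so it is suited to cheap change detection rather than integrity checks; content addressing is still done by the blob CID.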
113113+114114+#### Compaction
115115+116116+When any file's `updatesCount` reaches the threshold (default 10), the entire repo is compacted: every file gets a fresh full snapshot in a single new pack blob. All `updates` lists are cleared, `updatesCount` resets to 0, and all old pack blobs become unreferenced (eligible for PDS garbage collection).
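The per-repo trigger can be sketched as below. The `FileEntry` here is a reduced stand-in with only the two fields that matter for the decision (the `updates` list holds plain strings instead of `PackRef`s), and both function names are hypothetical.

```rust
/// Default threshold stated in the text above.
const COMPACTION_THRESHOLD: u32 = 10;

/// Reduced stand-in for the real FileEntry.
struct FileEntry {
    updates_count: u32,
    updates: Vec<String>, // stand-in for Vec<PackRef>
}

/// Compaction is per-repo: one file at the threshold compacts everything.
fn needs_compaction(files: &[FileEntry]) -> bool {
    files.iter().any(|f| f.updates_count >= COMPACTION_THRESHOLD)
}

/// After fresh snapshots are written, every entry starts from a clean slate.
fn reset_after_compaction(files: &mut [FileEntry]) {
    for f in files {
        f.updates.clear();
        f.updates_count = 0;
    }
}
```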
117117+118118+### Load
119119+120120+1. Resolve device rkey from `.yrs/pdsyrs_config.json`.
121121+2. Fetch the YrsRepo record.
122122+3. If a manifest exists, only load files listed in it.
123123+4. For each text file:
124124+ - Extract the snapshot data from its `packRef` pack blob (using the pack cache).
125125+ - Reconstruct the Yrs Doc from the snapshot.
126126+ - Apply each update in `updates` order — each may be in a different pack blob (all cached by CID).
127127+ - Materialize text and write to disk.
128128+5. For binary files: extract raw bytes from the pack and write to disk.
129129+130130+### Merge
131131+132132+Merging combines YrsRepos from multiple devices/collaborators:
133133+134134+1. Resolve rkeys: if own rkey is known, read collaborators from our record. Otherwise fall back to `listRecords`.
135135+2. Fetch all YrsRepo records by rkey.
136136+3. CRDT-merge the manifest Maps (concurrent adds converge, set-wins-over-delete).
137137+4. For each file in the merged manifest (using pack cache to avoid redundant downloads):
138138+ - **Text**: CRDT-merge all Yrs Docs. Each doc's state is diffed against the merged state vector and applied — the result is conflict-free by construction.
139139+ - **Binary**: compare CIDs. If all repos share the same CID, no conflict. If CIDs differ, create conflict files named `file.reponame.ext`.
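The `file.reponame.ext` naming for binary conflicts might look like the sketch below. This is an assumption about the exact rule: `conflict_name` is a hypothetical function, paths are assumed to use `/` separators, and files without an extension simply get the repo name appended.

```rust
/// Insert the repo name between the file stem and its extension.
fn conflict_name(rel_path: &str, repo_name: &str) -> String {
    // Separate the directory part (if any) from the file name.
    let (dir, file) = match rel_path.rsplit_once('/') {
        Some((d, f)) => (Some(d), f),
        None => (None, rel_path),
    };
    // Dotfiles such as ".env" have an empty stem and are treated as extension-less.
    let renamed = match file.rsplit_once('.') {
        Some((stem, ext)) if !stem.is_empty() => format!("{stem}.{repo_name}.{ext}"),
        _ => format!("{file}.{repo_name}"),
    };
    match dir {
        Some(d) => format!("{d}/{renamed}"),
        None => renamed,
    }
}
```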
140140+141141+### Sync
142142+143143+A polling loop that alternates load and save cycles, using the PDS as a relay when peers can't connect directly (WebRTC fallback). Auto-generates device rkey on first init. Configurable poll interval and optional periodic materialization.
144144+145145+### Export
146146+147147+The data portability escape hatch. Reads only the `content` field from each FileEntry — no Yrs decoding or blob downloads needed. Works even without the Yrs library.
148148+149149+### List
150150+151151+Lists all repos stored on PDS, showing project name, rkey, last update time, file count, and collaborator count. Can be filtered by project name.
152152+153153+## CLI
154154+155155+| Command | Key flags | Description |
156156+|---------|-----------|-------------|
157157+| `save` | `--project`, `--dir` | Save directory to PDS (auto-generates device rkey) |
158158+| `load` | `--project`, `--output` | Load from PDS using device rkey from local config |
159159+| `merge` | `--project`, `--output`, `--dir` (optional) | Merge all devices for a project |
160160+| `sync` | `--project`, `--dir`, `--interval` | Polling sync loop |
161161+| `export` | `--project`, `--output` | Plain-text export |
162162+| `list` | `--project` (optional) | List repos on PDS |
163163+164164+All commands use `--project` (the shared project name) instead of raw rkeys. Device rkeys are managed automatically.
165165+166166+## Module Map
167167+168168+| Module | Responsibility |
169169+|--------|---------------|
170170+| `types.rs` | `YrsRepo`, `FileEntry`, `Collaborator`, `PackRef`, `BlobRef` structs |
171171+| `yrs_pds.rs` | Yrs Doc ↔ FileEntry conversion, manifest Map helpers, base64, file kind detection |
172172+| `pack.rs` | Pack blob creation, parsing, compression, chunking |
173173+| `save.rs` | Directory → PDS (with diffing, compaction, pack upload, collaborator merge) |
174174+| `load.rs` | PDS → directory (with pack cache) |
175175+| `merge.rs` | Multi-repo CRDT merge (with pack cache) |
176176+| `sync.rs` | Polling sync loop |
177177+| `export.rs` | Plain-text export (no Yrs needed) |
178178+| `local_state.rs` | `.yrs/` state directory, config, device rkey generation |
179179+| `pds_client.rs` | ATProto HTTP client (auth, records, blobs, listRecords) |
180180+| `main.rs` | CLI (`save`, `load`, `merge`, `sync`, `export`, `list` subcommands) |
+2-2
plans/improvements-v1.md
···2233## Context
4455-pds-yrs is a Rust crate that syncs Yrs CRDT documents to/from AT Protocol PDS. The benchmark work proved pure CRDT merge is both the fastest and most correct approach (2ms for 10 files, 81ms for 200 files with guaranteed conflicts, zero conflicts ever). Now we need to harden pds-yrs for real-world use.
55+pds-yrs is a Rust crate that syncs Yrs CRDT documents to/from ATProto PDS. The benchmark work proved pure CRDT merge is both the fastest and most correct approach (2ms for 10 files, 81ms for 200 files with guaranteed conflicts, zero conflicts ever). Now we need to harden pds-yrs for real-world use.
6677Current limitations: text-only files, no binary support, no rate limit handling, no blob batching, no compression, no token refresh, no chunking for large files, no file deletion support, and batch-only (no real-time sync).
88···146146- FileEntry's `updates_blob` becomes actively used (not just for compaction)
147147- Add `last_synced_at: Option<String>` timestamp per file
148148149149-**Optimization — AT Protocol Event Stream**:
149149+**Optimization — ATProto Event Stream**:
150150- Instead of polling, subscribe to the PDS firehose for record changes
151151- `com.atproto.sync.subscribeRepos` WebSocket — get notified when the SiteRecord changes
152152- Reduces latency from poll-interval to near-real-time (~100-500ms)
+1-1
plans/improvements-v3-crdt-manifest-final.md
···2233## Context
4455-pds-yrs syncs Yrs CRDT documents to/from AT Protocol PDS. Benchmark proved pure CRDT merge is fastest and most correct. Now hardening for real-world use.
55+pds-yrs syncs Yrs CRDT documents to/from ATProto PDS. Benchmark proved pure CRDT merge is fastest and most correct. Now hardening for real-world use.
6677Key design decisions:
88- **Manifest**: Yrs Map (keys = file paths, values = file kind). CRDT operations at key granularity — no character interleaving.
+156
plans/incremental-packs.md
···11+# Plan: Incremental Pack Saves
22+33+## Problem
44+55+Currently, every save re-uploads **all changed files as full snapshots** in a single pack blob. Even if only one file out of 200 changed, that file's entire Yrs state is re-encoded and uploaded. The "incremental" path in the code (`updatesCount < COMPACTION_THRESHOLD`) still produces a full snapshot — it just preserves CRDT history by reconstructing the doc before diffing. Previous pack blobs become orphaned since no FileEntry references them anymore.
66+77+This is wasteful for the common case: a user edits one file, saves, edits another file, saves, etc. Each save should only upload the diff for changed files.
88+99+## Design
1010+1111+### Core idea
1212+1313+Each save uploads a **small pack containing only Yrs diffs** for changed files. Unchanged files keep their existing `pack_ref` pointing to the previous pack blob. Over time, a YrsRepo accumulates references to multiple pack blobs. Periodically, a **compaction** pass merges everything into a single pack with fresh full snapshots.
1414+1515+### Current data model (what we have)
1616+1717+```rust
1818+pub struct FileEntry {
1919+ pub content: String, // plain text (always current)
2020+ pub snapshot_blob: BlobRef, // full Yrs state blob
2121+ pub state_vector: String, // base64 state vector
2222+ pub updates_blob: Option<BlobRef>, // unused in practice
2323+ pub updates_count: u32, // tracks edits, but always resets
2424+ pub snapshot_at: String,
2525+ pub kind: FileKind,
2626+ pub pack_ref: Option<PackRef>, // single pack ref
2727+ pub conflict_source: Option<String>,
2828+}
2929+```
3030+3131+### New data model
3232+3333+```rust
3434+pub struct FileEntry {
3535+ pub content: String,
3636+ pub snapshot_blob: BlobRef, // points to pack containing last full snapshot
3737+ pub state_vector: String, // current state vector (after all updates applied)
3838+ pub updates: Vec<PackRef>, // ordered list of incremental update packs since snapshot
3939+ pub updates_count: u32, // total number of incremental saves since last compaction
4040+ pub snapshot_at: String,
4141+ pub kind: FileKind,
4242+ pub pack_ref: Option<PackRef>, // where to find the snapshot within its pack blob
4343+ pub conflict_source: Option<String>,
4444+}
4545+```
4646+4747+Key change: `updates_blob: Option<BlobRef>` becomes `updates: Vec<PackRef>`. Each incremental save appends a new PackRef. The snapshot stays put until compaction.
4848+4949+### Save flow (new)
5050+5151+1. Collect local files, fetch existing YrsRepo record.
5252+2. For each file:
5353+ - **Unchanged**: reuse existing FileEntry as-is (no upload).
5454+ - **Changed text**:
5555+ - Reconstruct the Yrs Doc from the existing snapshot + all accumulated updates.
5656+ - Apply the local text diff to get the new Doc state.
5757+ - Compute `encode_diff_v1(existing_state_vector)` — this is just the new ops, not the full state.
5858+ - Add the diff bytes to the pending pack.
5959+ - Update the FileEntry: append a new PackRef to `updates`, increment `updates_count`, update `state_vector` and `content`.
6060+ - Keep `snapshot_blob` and `pack_ref` pointing to the existing snapshot pack.
6161+ - **New text file**: full snapshot (no existing state to diff against). Goes into pending pack.
6262+ - **Changed binary**: full blob in pending pack (no CRDT diffing for binary).
6363+ - **Deleted**: remove from manifest.
6464+3. Bundle all pending diffs/blobs into a single new pack blob, upload it.
6565+4. Wire up PackRefs for each changed file's new update entry.
6666+5. Write the YrsRepo record via `putRecord`.
6767+6868+### Load flow (updated)
6969+7070+1. Fetch YrsRepo record.
7171+2. For each file:
7272+ - Download the snapshot pack (if not cached), extract snapshot data via `pack_ref`.
7373+ - Reconstruct Doc from snapshot.
7474+ - For each entry in `updates`: download that pack (if not cached), extract the update data, apply it to the Doc.
7575+ - Materialize text and write to disk.
7676+3. Pack cache keyed by CID avoids redundant downloads — multiple files may share the same pack blob, and previous packs stay referenced across saves.
7777+7878+### Compaction
7979+8080+When `updates_count >= COMPACTION_THRESHOLD` for **any** file, the entire YrsRepo is compacted:
8181+8282+1. For every file: reconstruct full Doc (snapshot + all updates), encode a fresh full snapshot.
8383+2. Bundle all fresh snapshots into a single new pack blob.
8484+3. Reset all FileEntries: `pack_ref` points to the new pack, `updates` cleared, `updates_count` reset to 0.
8585+4. Upload the compacted pack and write the YrsRepo record.
8686+8787+After compaction, old pack blobs are no longer referenced and will be garbage-collected by the PDS.
8888+8989+Compaction is **per-repo, not per-file**. When any file crosses the threshold, everything gets compacted together. This keeps the logic simple and ensures pack blob count stays bounded.
9090+9191+### Why per-repo compaction (not per-file)
9292+9393+Per-file compaction would mean some files get new snapshots while others keep referencing old packs. The number of distinct pack blobs referenced by a YrsRepo would grow without bound — each old pack stays alive because at least one file still references it. Per-repo compaction produces exactly one pack blob, and all old packs become unreferenced simultaneously.
9494+9595+## Files to modify
9696+9797+### `src/types.rs`
9898+9999+- Replace `updates_blob: Option<BlobRef>` with `updates: Vec<PackRef>`
100100+- Add `#[serde(default, skip_serializing_if = "Vec::is_empty")]` on `updates`
101101+- Remove the `updates_blob` serde attributes
102102+103103+### `src/save.rs`
104104+105105+- **Incremental path** (changed file, below threshold):
106106+ - Reconstruct Doc from snapshot + existing updates (not just snapshot)
107107+ - Compute diff via `encode_diff_v1` instead of full `encode_snapshot`
108108+ - Use `PackDataType::Update` for the pending blob
109109+ - Preserve existing `pack_ref` and `snapshot_blob`; append new entry to `updates`
110110+- **Compaction path** (any file crosses threshold, or first save):
111111+ - Encode fresh full snapshots for ALL files
112112+ - Clear `updates` on all FileEntries
113113+ - Reset `updates_count` to 0
114114+- **Wire-up after pack upload**: differentiate between snapshot entries and update entries when assigning PackRefs
115115+116116+### `src/load.rs`
117117+118118+- After extracting snapshot data, iterate over `entry.updates` and apply each one
119119+- Each update may be in a different pack blob — use pack cache
120120+121121+### `src/yrs_pds.rs`
122122+123123+- `file_entry_to_doc()`: apply all entries from `updates` (not just a single `updates_blob`)
124124+- `get_file_blob_data()`: no longer needs to handle `updates_blob` separately — caller handles the updates list
125125+126126+### `src/merge.rs`
127127+128128+- `merge_text_file()` calls `file_entry_to_doc()` which handles updates — no direct changes needed
129129+- Verify that after merge, the output FileEntries have clean state (compacted)
130130+131131+### `src/export.rs`
132132+133133+- No changes needed — export reads `content` field only
134134+135135+### `src/local_state.rs`
136136+137137+- No changes needed — local state tracks full Doc snapshots independently
138138+139139+## Verification
140140+141141+### Unit tests
142142+143143+1. **Incremental save produces diff, not snapshot**: save a file, modify it, save again — verify the second save's pending blob is `PackDataType::Update` and is smaller than a full snapshot
144144+2. **Multiple incremental saves accumulate updates**: save 3 times with edits — verify FileEntry has `updates.len() == 2` (first save is snapshot, next two are updates)
145145+3. **Compaction resets everything**: push past threshold — verify `updates` is empty and `updates_count` is 0 after compaction
146146+4. **Load with accumulated updates**: create a FileEntry with snapshot + 3 update PackRefs — verify the loaded Doc has all edits applied in order
147147+5. **Round-trip**: save → edit → save → edit → save → load — verify final content matches
148148+149149+### Integration tests (with mock PDS)
150150+151151+6. **Incremental pack is small**: save 10 files, edit 1, save again — verify the second pack blob is much smaller than the first
152152+7. **Pack cache hit**: load a repo with 10 files sharing one pack — verify only one blob download
153153+8. **Compaction triggers correctly**: save 51 times with single-file edits — verify compaction happens on save 51
154154+### Property tests
155155+156156+9. **Content equivalence**: for any sequence of edits, the materialized content after incremental saves must equal the content after a single full-snapshot save
+8-8
src/export.rs
···11-//! Export site content from PDS as plain text files.
11+//! Export repo content from PDS as plain text files.
22//!
33//! This is the data portability escape hatch — reads the `content` field
44//! from each FileEntry without requiring Yrs decoding.
···66use std::path::Path;
7788use crate::pds_client::PdsClient;
99-use crate::types::{SiteRecord, COLLECTION};
99+use crate::types::{YrsRepo, COLLECTION};
10101111-/// Export a site from PDS to plain text files.
1111+/// Export a repo from PDS to plain text files.
1212///
1313/// Reads only the `content` field from each FileEntry — no Yrs
1414/// decoding or blob downloads needed. This works even if the Yrs
···2020 output_dir: &Path,
2121 verbose: bool,
2222) -> Result<usize, String> {
2323- // Fetch site record
2323+ // Fetch repo record
2424 let record = client
2525 .get_record(did, COLLECTION, rkey)
2626 .await?
2727- .ok_or_else(|| format!("site record not found: {}", rkey))?;
2727+ .ok_or_else(|| format!("repo record not found: {}", rkey))?;
28282929- let site: SiteRecord =
3030- serde_json::from_value(record.value).map_err(|e| format!("parse SiteRecord: {}", e))?;
2929+ let repo: YrsRepo =
3030+ serde_json::from_value(record.value).map_err(|e| format!("parse YrsRepo: {}", e))?;
31313232 let mut files_exported = 0;
33333434- for (rel_path, entry) in &site.files {
3434+ for (rel_path, entry) in &repo.files {
3535 if verbose {
3636 eprintln!("pds-yrs: export {}", rel_path);
3737 }
+3-3
src/lib.rs
···11-//! pds-yrs: Sync Yrs CRDT documents via AT Protocol PDS.
11+//! pds-yrs: Sync Yrs CRDT documents via ATProto PDS.
22//!
33//! No git involved — files are stored as Yrs Doc state on the PDS,
44//! with plain text content alongside for portability.
···1717pub use export::export;
1818pub use load::load;
1919pub use local_state::LocalState;
2020-pub use merge::merge_sites;
2020+pub use merge::{merge_project, merge_repos};
2121pub use pack::{
2222 chunk_data, compress, create_compressed_pack, create_pack, decompress, extract_entry, is_gzip,
2323 is_precompressed_extension, parse_pack, parse_pack_auto, reassemble_chunks, PackBlob,
···2626pub use pds_client::PdsClient;
2727pub use save::{save, save_filtered};
2828pub use sync::{sync_loop, SyncConfig, SyncCycleResult};
2929-pub use types::{FileKind, COLLECTION, MANIFEST_KEY};
2929+pub use types::{Collaborator, FileKind, COLLECTION, MANIFEST_KEY};
+62-16
src/load.rs
···11-//! Load a site from PDS into a local directory.
11+//! Load a repo from PDS into a local directory.
2233use std::collections::HashMap;
44use std::path::Path;
5566use crate::pds_client::PdsClient;
77-use crate::types::{FileKind, LoadResult, SiteRecord, COLLECTION, MANIFEST_KEY};
77+use crate::types::{FileKind, LoadResult, YrsRepo, COLLECTION, MANIFEST_KEY};
88use crate::yrs_pds;
991010-/// Load a site from PDS into a directory.
1010+/// Load a repo from PDS into a directory.
1111///
1212/// If a manifest exists, only loads files listed in the manifest.
1313/// Supports both text (Yrs CRDT) and binary files.
···1919 output_dir: &Path,
2020 verbose: bool,
2121) -> Result<LoadResult, String> {
2222- // Fetch site record
2222+ // Fetch repo record
2323 let record = client
2424 .get_record(did, COLLECTION, rkey)
2525 .await?
2626- .ok_or_else(|| format!("site record not found: {}", rkey))?;
2626+ .ok_or_else(|| format!("repo record not found: {}", rkey))?;
27272828- let site: SiteRecord =
2929- serde_json::from_value(record.value).map_err(|e| format!("parse SiteRecord: {}", e))?;
2828+ let repo: YrsRepo =
2929+ serde_json::from_value(record.value).map_err(|e| format!("parse YrsRepo: {}", e))?;
30303131 // Determine which files to load
3232- let file_list = if let Some(manifest_entry) = site.files.get(MANIFEST_KEY) {
3232+ let file_list = if let Some(manifest_entry) = repo.files.get(MANIFEST_KEY) {
3333 let manifest_doc = yrs_pds::file_entry_to_doc(manifest_entry, client, did).await?;
3434 let entries = yrs_pds::manifest_entries(&manifest_doc);
3535 if verbose {
···4646 // Cache for pack blobs (keyed by CID) to avoid re-downloading
4747 let mut pack_cache: HashMap<String, Vec<u8>> = HashMap::new();
48484949- for (rel_path, entry) in &site.files {
4949+ for (rel_path, entry) in &repo.files {
5050 if rel_path == MANIFEST_KEY {
5151 continue;
5252 }
···9595 )
9696 .await?;
9797 let doc = yrs_pds::doc_from_snapshot(&snapshot_data)?;
9898- if let Some(ref updates_blob) = entry.updates_blob {
9999- let updates_data = client.get_blob(did, updates_blob.cid()).await?;
100100- blobs_downloaded += 1;
101101- yrs_pds::apply_update(&doc, &updates_data)?;
9898+ // Apply incremental updates from pack refs
9999+ for update_ref in &entry.updates {
100100+ let update_data = get_pack_ref_data_cached(
101101+ update_ref,
102102+ client,
103103+ did,
104104+ &mut pack_cache,
105105+ &mut blobs_downloaded,
106106+ )
107107+ .await?;
108108+ yrs_pds::apply_update(&doc, &update_data)?;
102109 }
103110 let content = yrs_pds::materialize(&doc);
104111 std::fs::write(&output_path, &content)
···107114 // Direct blob download (no pack ref)
108115 let doc = yrs_pds::file_entry_to_doc(entry, client, did).await?;
109116 blobs_downloaded += 1;
110110- if entry.updates_blob.is_some() {
111111- blobs_downloaded += 1;
112112- }
117117+ blobs_downloaded += entry.updates.len();
113118 let content = yrs_pds::materialize(&doc);
114119 std::fs::write(&output_path, &content)
115120 .map_err(|e| format!("write {:?}: {}", output_path, e))?;
···131136 files_loaded,
132137 blobs_downloaded,
133138 })
139139+}
140140+141141+/// Extract data from a PackRef, using pack cache when available.
142142+async fn get_pack_ref_data_cached(
143143+ pack_ref: &crate::types::PackRef,
144144+ client: &PdsClient,
145145+ did: &str,
146146+ pack_cache: &mut HashMap<String, Vec<u8>>,
147147+ blobs_downloaded: &mut usize,
148148+) -> Result<Vec<u8>, String> {
149149+ let cid = pack_ref.blob.cid().to_string();
150150+151151+ if !pack_cache.contains_key(&cid) {
152152+ let data = if let Some(ref chunks) = pack_ref.chunks {
153153+ let mut chunk_data = Vec::new();
154154+ for chunk_ref in chunks {
155155+ let chunk = client.get_blob(did, chunk_ref.cid()).await?;
156156+ *blobs_downloaded += 1;
157157+ chunk_data.push(chunk);
158158+ }
159159+ crate::pack::reassemble_chunks(&chunk_data)
160160+ } else {
161161+ let d = client.get_blob(did, &cid).await?;
162162+ *blobs_downloaded += 1;
163163+ d
164164+ };
165165+ pack_cache.insert(cid.clone(), data);
166166+ }
167167+168168+ let pack_data = pack_cache.get(&cid).unwrap();
169169+ let (_, blob_data) = crate::pack::parse_pack_auto(pack_data)?;
170170+171171+ let start = pack_ref.offset as usize;
172172+ let end = start + pack_ref.length as usize;
173173+ if end > blob_data.len() {
174174+ return Err(format!(
175175+ "pack_ref out of bounds: {}..{} in {} bytes",
176176+ start, end, blob_data.len()
177177+ ));
178178+ }
179179+ Ok(blob_data[start..end].to_vec())
134180}
135181136182/// Get blob data for a file entry, using pack cache when available.
+85-13
src/local_state.rs
···11-//! Local state management for `.pds-yrs/` directory.
11+//! Local state management for `.yrs/` directory.
22//!
33//! Persists Yrs Doc state and state vectors between save/load cycles,
44//! enabling incremental sync (only exchange diffs since last sync).
5566use std::path::{Path, PathBuf};
7788+use rand::Rng;
89use yrs::Doc;
9101011use crate::yrs_pds;
11121213/// Local state directory name (like `.git/`).
1313-const STATE_DIR: &str = ".pds-yrs";
1414+const STATE_DIR: &str = ".yrs";
14151515-/// Manages the `.pds-yrs/` local state directory.
1616+/// Manages the `.yrs/` local state directory.
1617pub struct LocalState {
1718 state_dir: PathBuf,
1819}
19202020-/// Configuration stored in `.pds-yrs/_config.json`.
2121+/// Configuration stored in `.yrs/pdsyrs_config.json`.
2122#[derive(Debug, serde::Serialize, serde::Deserialize)]
2223pub struct LocalConfig {
2324 pub pds_url: String,
2425 pub handle: String,
2525- pub site_rkey: String,
2626 pub did: String,
2727+ pub project: String,
2828+ pub repo_rkey: String,
2929+ /// Cached access token (avoids re-login on every invocation).
3030+ #[serde(default, skip_serializing_if = "Option::is_none")]
3131+ pub access_token: Option<String>,
3232+ /// Cached refresh token (used to get a new access token when expired).
3333+ #[serde(default, skip_serializing_if = "Option::is_none")]
3434+ pub refresh_token: Option<String>,
2735}
28362937impl LocalState {
···44524553 /// Save config.
4654 pub fn save_config(&self, config: &LocalConfig) -> Result<(), String> {
4747- let path = self.state_dir.join("_config.json");
5555+ let path = self.state_dir.join("pdsyrs_config.json");
4856 let json =
4957 serde_json::to_string_pretty(config).map_err(|e| format!("serialize config: {}", e))?;
5058 std::fs::write(&path, json).map_err(|e| format!("write config: {}", e))
···52605361 /// Load config.
5462 pub fn load_config(&self) -> Result<Option<LocalConfig>, String> {
5555- let path = self.state_dir.join("_config.json");
6363+ let path = self.state_dir.join("pdsyrs_config.json");
5664 if !path.exists() {
5765 return Ok(None);
5866 }
···6270 Ok(Some(config))
6371 }
64727373+ /// Save tokens to config (updates existing config in place).
7474+ pub fn save_tokens(
7575+ &self,
7676+ access_token: &str,
7777+ refresh_token: &str,
7878+ ) -> Result<(), String> {
7979+ if let Some(mut config) = self.load_config()? {
8080+ config.access_token = Some(access_token.to_string());
8181+ config.refresh_token = Some(refresh_token.to_string());
8282+ self.save_config(&config)?;
8383+ }
8484+ Ok(())
8585+ }
8686+6587 /// Save a Yrs Doc's full state for a file path.
6688 pub fn save_doc_state(&self, rel_path: &str, doc: &Doc) -> Result<(), String> {
6789 let yrs_path = self.doc_path(rel_path);
···107129108130 /// Save the manifest Doc state.
109131 pub fn save_manifest(&self, doc: &Doc) -> Result<(), String> {
110110- self.save_doc_state("_manifest", doc)
132132+ self.save_doc_state("pdsyrs_manifest", doc)
111133 }
112134113135 /// Load the manifest Doc.
114136 pub fn load_manifest(&self) -> Result<Option<Doc>, String> {
115115- let yrs_path = self.doc_path("_manifest");
137137+ let yrs_path = self.doc_path("pdsyrs_manifest");
116138 if !yrs_path.exists() {
117139 return Ok(None);
118140 }
···136158 Ok(())
137159 }
138160139139- /// List all file paths that have local state (excluding _manifest and _config).
161161+ /// Ensure a device rkey exists for a project.
162162+ ///
163163+ /// If `pdsyrs_config.json` exists and the project matches, returns `(rkey, false)`.
164164+ /// Otherwise generates `{project}-{random8}`, saves config, and returns `(rkey, true)`.
165165+ /// The boolean indicates whether this is a newly created rkey (first initialization).
166166+ pub fn ensure_device_rkey(
167167+ &self,
168168+ project: &str,
169169+ pds_url: &str,
170170+ handle: &str,
171171+ did: &str,
172172+ ) -> Result<(String, bool), String> {
173173+ if let Some(config) = self.load_config()? {
174174+ if config.project == project {
175175+ return Ok((config.repo_rkey, false));
176176+ }
177177+ }
178178+ let suffix = generate_random_suffix();
179179+ let rkey = format!("{}-{}", project, suffix);
180180+ let config = LocalConfig {
181181+ pds_url: pds_url.to_string(),
182182+ handle: handle.to_string(),
183183+ did: did.to_string(),
184184+ project: project.to_string(),
185185+ repo_rkey: rkey.clone(),
186186+ access_token: None,
187187+ refresh_token: None,
188188+ };
189189+ self.save_config(&config)?;
190190+ Ok((rkey, true))
191191+ }
192192+193193+ /// List all file paths that have local state (excluding internal pdsyrs_ files).
140194 pub fn list_tracked_files(&self) -> Result<Vec<String>, String> {
141195 let mut files = Vec::new();
142196 self.collect_tracked(&self.state_dir, &self.state_dir, &mut files)?;
···197251 // Strip .yrs extension
198252 let rel = rel.trim_end_matches(".yrs").to_string();
199253 // Skip internal files
200200- if !rel.starts_with('_') {
254254+ if !rel.starts_with("pdsyrs_") {
201255 files.push(rel);
202256 }
203257 }
···205259 }
206260 Ok(())
207261 }
262262+}
263263+264264+/// Generate 8 random lowercase alphanumeric characters for device rkey suffix.
265265+fn generate_random_suffix() -> String {
266266+ let mut rng = rand::thread_rng();
267267+ let chars: Vec<char> = (0..8)
268268+ .map(|_| {
269269+ let idx = rng.gen_range(0..36);
270270+ if idx < 10 {
271271+ (b'0' + idx) as char
272272+ } else {
273273+ (b'a' + idx - 10) as char
274274+ }
275275+ })
276276+ .collect();
277277+ chars.into_iter().collect()
208278}
209279210280#[cfg(test)]
···256326 let config = LocalConfig {
257327 pds_url: "https://pds.example.com".to_string(),
258328 handle: "user.example.com".to_string(),
259259- site_rkey: "my-site".to_string(),
260329 did: "did:plc:abc123".to_string(),
330330+ project: "my-site".to_string(),
331331+ repo_rkey: "my-site-a1b2c3d4".to_string(),
261332 };
262333 state.save_config(&config).unwrap();
263334264335 let loaded = state.load_config().unwrap().unwrap();
265336 assert_eq!(loaded.pds_url, "https://pds.example.com");
266266- assert_eq!(loaded.site_rkey, "my-site");
337337+ assert_eq!(loaded.project, "my-site");
338338+ assert_eq!(loaded.repo_rkey, "my-site-a1b2c3d4");
267339 }
268340269341 #[test]
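The suffix logic in `generate_random_suffix` maps each random index in `0..36` to one lowercase alphanumeric character, and the device rkey is `{project}-{suffix}`. The sketch below isolates that mapping so it can be checked without an RNG. `suffix_char` and `device_rkey` are hypothetical names used only for illustration and are not part of the diff; the real code draws its indices from `rand`.

```rust
/// Map an index in 0..36 to a lowercase alphanumeric character:
/// 0..=9 become '0'..='9', 10..=35 become 'a'..='z'.
/// (Hypothetical helper mirroring the closure in `generate_random_suffix`.)
fn suffix_char(idx: u8) -> char {
    assert!(idx < 36, "index must be in 0..36");
    if idx < 10 {
        (b'0' + idx) as char
    } else {
        (b'a' + idx - 10) as char
    }
}

/// Build a device rkey of the form `{project}-{suffix}` from explicit indices.
/// (Hypothetical helper: the code in the diff picks the indices at random.)
fn device_rkey(project: &str, indices: [u8; 8]) -> String {
    // Collect the characters straight into a String, with no intermediate Vec<char>.
    let suffix: String = indices.iter().map(|&i| suffix_char(i)).collect();
    format!("{}-{}", project, suffix)
}
```

Collecting directly into a `String` is one possible simplification of the `Vec<char>` round trip in the diff; behavior is otherwise the same.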
+306-45
src/main.rs
···44#[derive(Parser)]
55#[command(
66 name = "pds-yrs",
77- about = "Sync Yrs CRDT documents via AT Protocol PDS"
77+ about = "Sync Yrs CRDT documents via ATProto PDS"
88)]
99struct Cli {
1010 #[command(subcommand)]
···1818 /// Directory to save
1919 #[arg(long)]
2020 dir: String,
2121- /// AT Protocol handle
2121+ /// ATProto handle
2222 #[arg(long)]
2323 handle: String,
2424- /// Site name (used as rkey)
2424+ /// Project name (shared across devices)
2525 #[arg(long)]
2626- site: String,
2626+ project: String,
2727 /// Password for authentication
2828 #[arg(long)]
2929 password: String,
···4040 #[arg(long)]
4141 verbose: bool,
4242 },
4343- /// Load a site from PDS into a local directory
4343+ /// Load a repo from PDS into a local directory
4444 Load {
4545- /// AT Protocol handle
4545+ /// ATProto handle
4646 #[arg(long)]
4747 handle: String,
4848- /// Site name (used as rkey)
4848+ /// Project name (resolves device rkey from local config)
4949 #[arg(long)]
5050- site: String,
5050+ project: String,
5151 /// Output directory
5252 #[arg(long)]
5353 output: String,
···6161 #[arg(long)]
6262 verbose: bool,
6363 },
6464- /// Merge sites from multiple collaborators
6464+ /// Merge repos from all devices for a project
6565 Merge {
6666- /// Comma-separated site rkeys to merge
6666+ /// Project name (uses collaborators from own record, or falls back to listRecords)
6767 #[arg(long)]
6868- sites: String,
6969- /// AT Protocol handle
6868+ project: String,
6969+ /// ATProto handle
7070 #[arg(long)]
7171 handle: String,
7272 /// Output directory
···7878 /// PDS URL
7979 #[arg(long)]
8080 pds: Option<String>,
8181+ /// Working directory with .yrs/ config (to resolve own rkey for collaborator lookup)
8282+ #[arg(long)]
8383+ dir: Option<String>,
8184 /// Show progress
8285 #[arg(long)]
8386 verbose: bool,
···8790 /// Directory to sync
8891 #[arg(long)]
8992 dir: String,
9090- /// AT Protocol handle
9393+ /// ATProto handle
9194 #[arg(long)]
9295 handle: String,
9393- /// Site name (used as rkey)
9696+ /// Project name (shared across devices)
9497 #[arg(long)]
9595- site: String,
9898+ project: String,
9699 /// Password for authentication
97100 #[arg(long)]
98101 password: String,
···112115 #[arg(long)]
113116 verbose: bool,
114117 },
115115- /// Export site content as plain text (no Yrs decoding needed)
118118+ /// Export repo content as plain text (no Yrs decoding needed)
116119 Export {
117117- /// AT Protocol handle
120120+ /// ATProto handle
118121 #[arg(long)]
119122 handle: String,
120120- /// Site name (used as rkey)
123123+ /// Project name (resolves device rkey from local config)
121124 #[arg(long)]
122122- site: String,
125125+ project: String,
123126 /// Output directory
124127 #[arg(long)]
125128 output: String,
···133136 #[arg(long)]
134137 verbose: bool,
135138 },
139139+ /// List repos stored on PDS
140140+ List {
141141+ /// ATProto handle
142142+ #[arg(long)]
143143+ handle: String,
144144+ /// Filter by project name (optional, lists all if omitted)
145145+ #[arg(long)]
146146+ project: Option<String>,
147147+ /// Password for authentication
148148+ #[arg(long)]
149149+ password: String,
150150+ /// PDS URL
151151+ #[arg(long)]
152152+ pds: Option<String>,
153153+ },
136154}
137155138156#[tokio::main]
···143161 Command::Save {
144162 dir,
145163 handle,
146146- site,
164164+ project,
147165 password,
148166 pds,
149167 include,
···153171 run_save(
154172 &dir,
155173 &handle,
156156- &site,
174174+ &project,
157175 &password,
158176 pds.as_deref(),
159177 &include,
···164182 }
165183 Command::Load {
166184 handle,
167167- site,
185185+ project,
168186 output,
169187 password,
170188 pds,
171189 verbose,
172172- } => run_load(&handle, &site, &output, &password, pds.as_deref(), verbose).await,
190190+ } => run_load(&handle, &project, &output, &password, pds.as_deref(), verbose).await,
173191 Command::Merge {
174174- sites,
192192+ project,
175193 handle,
176194 output,
177195 password,
178196 pds,
197197+ dir,
179198 verbose,
180180- } => run_merge(&sites, &handle, &output, &password, pds.as_deref(), verbose).await,
199199+ } => run_merge(&project, &handle, &output, &password, pds.as_deref(), dir.as_deref(), verbose).await,
181200 Command::Sync {
182201 dir,
183202 handle,
184184- site,
203203+ project,
185204 password,
186205 pds,
187206 interval,
···192211 run_sync(
193212 &dir,
194213 &handle,
195195- &site,
214214+ &project,
196215 &password,
197216 pds.as_deref(),
198217 interval,
···204223 }
205224 Command::Export {
206225 handle,
207207- site,
226226+ project,
208227 output,
209228 password,
210229 pds,
211230 verbose,
212212- } => run_export(&handle, &site, &output, &password, pds.as_deref(), verbose).await,
231231+ } => run_export(&handle, &project, &output, &password, pds.as_deref(), verbose).await,
232232+ Command::List {
233233+ handle,
234234+ project,
235235+ password,
236236+ pds,
237237+ } => run_list(&handle, project.as_deref(), &password, pds.as_deref()).await,
213238 };
214239215240 if let Err(e) = result {
···229254 Ok((client, session.did))
230255}
231256257257+/// Login with token caching: try cached tokens from .yrs/ config first,
258258+/// fall back to password login, then save tokens for next time.
259259+async fn login_cached(
260260+ dir: &std::path::Path,
261261+ handle: &str,
262262+ password: &str,
263263+ pds_url: Option<&str>,
264264+) -> Result<(pds_yrs::PdsClient, String), String> {
265265+ let url = pds_url.unwrap_or("https://bluesky-pds.t1cc.commoninternet.net");
266266+ let local_state = pds_yrs::LocalState::open(dir)?;
267267+268268+ // Try cached tokens from config
269269+ if let Some(config) = local_state.load_config()? {
270270+ if let (Some(ref access), Some(ref refresh)) =
271271+ (&config.access_token, &config.refresh_token)
272272+ {
273273+ let mut client = pds_yrs::PdsClient::new(url);
274274+ client.set_tokens(access.clone(), refresh.clone());
275275+276276+ // Validate with a lightweight call (getRecord for non-existent record is fine)
277277+ // If it fails, the token is expired — try refresh, then fall back to login
278278+ match client.refresh_session().await {
279279+ Ok(()) => {
280280+ // Save refreshed tokens
281281+ if let (Some(at), Some(rt)) = (client.access_token(), client.refresh_token()) {
282282+ let _ = local_state.save_tokens(at, rt);
283283+ }
284284+ return Ok((client, config.did));
285285+ }
286286+ Err(_) => {
287287+ // Refresh failed, fall through to password login
288288+ }
289289+ }
290290+ }
291291+ }
292292+293293+ // Fall back to password login
294294+ let mut client = pds_yrs::PdsClient::new(url);
295295+ let session = client.login(handle, password).await?;
296296+297297+ // Cache tokens
298298+ if let (Some(at), Some(rt)) = (client.access_token(), client.refresh_token()) {
299299+ let _ = local_state.save_tokens(at, rt);
300300+ }
301301+302302+ Ok((client, session.did))
303303+}
304304+232305async fn run_save(
233306 dir: &str,
234307 handle: &str,
235235- site: &str,
308308+ project: &str,
236309 password: &str,
237310 pds_url: Option<&str>,
238311 include: &[String],
239312 exclude: &[String],
240313 verbose: bool,
241314) -> Result<(), String> {
242242- let (client, did) = login(handle, password, pds_url).await?;
315315+ let url = pds_url.unwrap_or("https://bluesky-pds.t1cc.commoninternet.net");
316316+ let dir_path = std::path::Path::new(dir);
317317+ let (client, did) = login_cached(dir_path, handle, password, Some(url)).await?;
318318+ let local_state = pds_yrs::LocalState::open(dir_path)?;
319319+ let (rkey, is_new) = local_state.ensure_device_rkey(project, url, handle, &did)?;
320320+ if verbose {
321321+ eprintln!(
322322+ "pds-yrs: project='{}', device rkey='{}' ({})",
323323+ project,
324324+ rkey,
325325+ if is_new { "new" } else { "existing" }
326326+ );
327327+ }
328328+329329+ // On first initialization, discover peers and register as collaborator
330330+ let initial_collaborators = if is_new {
331331+ discover_and_register(&client, &did, project, &rkey, verbose).await?
332332+ } else {
333333+ None
334334+ };
335335+243336 let inc = if include.is_empty() {
244337 None
245338 } else {
···251344 Some(exclude)
252345 };
253346 let result = pds_yrs::save_filtered(
254254- std::path::Path::new(dir),
347347+ dir_path,
255348 &client,
256349 &did,
257257- site,
350350+ &rkey,
351351+ project,
352352+ initial_collaborators.as_deref(),
258353 inc,
259354 exc,
260355 verbose,
···267362 Ok(())
268363}
269364365365+/// On first initialization, discover existing repos for the same project,
366366+/// add our rkey as a collaborator to each, and return them as our initial collaborators.
367367+async fn discover_and_register(
368368+ client: &pds_yrs::PdsClient,
369369+ did: &str,
370370+ project: &str,
371371+ our_rkey: &str,
372372+ verbose: bool,
373373+) -> Result<Option<Vec<pds_yrs::types::Collaborator>>, String> {
374374+ let records = client
375375+ .list_all_records(did, pds_yrs::COLLECTION)
376376+ .await?;
377377+378378+ let mut peer_collaborators: Vec<pds_yrs::types::Collaborator> = Vec::new();
379379+380380+ for entry in &records {
381381+ let peer_rkey = match entry.uri.rsplit('/').next() {
382382+ Some(r) => r.to_string(),
383383+ None => continue,
384384+ };
385385+ if peer_rkey == our_rkey {
386386+ continue;
387387+ }
388388+ let repo: pds_yrs::types::YrsRepo = match serde_json::from_value(entry.value.clone()) {
389389+ Ok(r) => r,
390390+ Err(_) => continue,
391391+ };
392392+ if repo.name != project {
393393+ continue;
394394+ }
395395+396396+ // Add our rkey as collaborator to this peer's record
397397+ if !repo.collaborators.iter().any(|c| c.rkey == our_rkey) {
398398+ let mut updated = repo.clone();
399399+ updated.collaborators.push(pds_yrs::types::Collaborator {
400400+ rkey: our_rkey.to_string(),
401401+ pds: None,
402402+ });
403403+ let updated_json = serde_json::to_value(&updated)
404404+ .map_err(|e| format!("serialize updated peer: {}", e))?;
405405+ client
406406+ .put_record(did, pds_yrs::COLLECTION, &peer_rkey, updated_json, entry.cid.clone())
407407+ .await?;
408408+ if verbose {
409409+ eprintln!(
410410+ "pds-yrs: registered as collaborator on peer '{}'",
411411+ peer_rkey
412412+ );
413413+ }
414414+ }
415415+416416+ // Track this peer as our collaborator
417417+ peer_collaborators.push(pds_yrs::types::Collaborator {
418418+ rkey: peer_rkey,
419419+ pds: None,
420420+ });
421421+ }
422422+423423+ if peer_collaborators.is_empty() {
424424+ Ok(None)
425425+ } else {
426426+ if verbose {
427427+ let names: Vec<&str> = peer_collaborators.iter().map(|c| c.rkey.as_str()).collect();
428428+ eprintln!("pds-yrs: discovered {} peer(s): {}", names.len(), names.join(", "));
429429+ }
430430+ Ok(Some(peer_collaborators))
431431+ }
432432+}
433433+270434async fn run_load(
271435 handle: &str,
272272- site: &str,
436436+ project: &str,
273437 output: &str,
274438 password: &str,
275439 pds_url: Option<&str>,
276440 verbose: bool,
277441) -> Result<(), String> {
278442 let (client, did) = login(handle, password, pds_url).await?;
279279- let result = pds_yrs::load(&client, &did, site, std::path::Path::new(output), verbose).await?;
443443+ let output_path = std::path::Path::new(output);
444444+ let local_state = pds_yrs::LocalState::open(output_path)?;
445445+ let rkey = match local_state.load_config()? {
446446+ Some(config) if config.project == project => config.repo_rkey,
447447+ _ => return Err(format!(
448448+ "no device rkey found for project '{}' — run 'save' first to initialize",
449449+ project
450450+ )),
451451+ };
452452+ if verbose {
453453+ eprintln!("pds-yrs: loading project='{}', rkey='{}'", project, rkey);
454454+ }
455455+ let result = pds_yrs::load(&client, &did, &rkey, output_path, verbose).await?;
280456 eprintln!(
281457 "pds-yrs: loaded {} file(s), {} blob(s) downloaded",
282458 result.files_loaded, result.blobs_downloaded
···285461}
286462287463async fn run_merge(
288288- sites: &str,
464464+ project: &str,
289465 handle: &str,
290466 output: &str,
291467 password: &str,
292468 pds_url: Option<&str>,
469469+ dir: Option<&str>,
293470 verbose: bool,
294471) -> Result<(), String> {
295472 let (client, did) = login(handle, password, pds_url).await?;
296296- let rkeys: Vec<&str> = sites.split(',').collect();
297297- pds_yrs::merge_sites(&client, &did, &rkeys, std::path::Path::new(output), verbose).await?;
473473+ // Try to resolve own rkey from local config for collaborator-based merge
474474+ let own_rkey = if let Some(d) = dir {
475475+ let local_state = pds_yrs::LocalState::open(std::path::Path::new(d))?;
476476+ local_state
477477+ .load_config()?
478478+ .filter(|c| c.project == project)
479479+ .map(|c| c.repo_rkey)
480480+ } else {
481481+ None
482482+ };
483483+ pds_yrs::merge_project(
484484+ &client,
485485+ &did,
486486+ project,
487487+ own_rkey.as_deref(),
488488+ std::path::Path::new(output),
489489+ verbose,
490490+ )
491491+ .await?;
298492 eprintln!("pds-yrs: merge complete");
299493 Ok(())
300494}
···302496async fn run_sync(
303497 dir: &str,
304498 handle: &str,
305305- site: &str,
499499+ project: &str,
306500 password: &str,
307501 pds_url: Option<&str>,
308502 interval: u64,
···310504 exclude: &[String],
311505 verbose: bool,
312506) -> Result<(), String> {
313313- let (client, did) = login(handle, password, pds_url).await?;
507507+ let url = pds_url.unwrap_or("https://bluesky-pds.t1cc.commoninternet.net");
508508+ let dir_path = std::path::Path::new(dir);
509509+ let (client, did) = login_cached(dir_path, handle, password, Some(url)).await?;
510510+ let local_state = pds_yrs::LocalState::open(dir_path)?;
511511+ let (rkey, is_new) = local_state.ensure_device_rkey(project, url, handle, &did)?;
512512+ if is_new {
513513+ discover_and_register(&client, &did, project, &rkey, verbose).await?;
514514+ }
314515 let config = pds_yrs::SyncConfig {
315516 dir: dir.to_string(),
316517 interval: std::time::Duration::from_secs(interval),
···321522 verbose,
322523 };
323524 eprintln!(
324324- "pds-yrs: starting sync (interval={}s, Ctrl+C to stop)",
325325- interval
525525+ "pds-yrs: starting sync project='{}', rkey='{}' (interval={}s, Ctrl+C to stop)",
526526+ project, rkey, interval
326527 );
327327- pds_yrs::sync_loop(&client, &did, site, &config).await?;
528528+ pds_yrs::sync_loop(&client, &did, &rkey, project, &config).await?;
328529 Ok(())
329530}
330531331532async fn run_export(
332533 handle: &str,
333333- site: &str,
534534+ project: &str,
334535 output: &str,
335536 password: &str,
336537 pds_url: Option<&str>,
337538 verbose: bool,
338539) -> Result<(), String> {
339540 let (client, did) = login(handle, password, pds_url).await?;
340340- let count = pds_yrs::export(&client, &did, site, std::path::Path::new(output), verbose).await?;
541541+ let output_path = std::path::Path::new(output);
542542+ let local_state = pds_yrs::LocalState::open(output_path)?;
543543+ let rkey = match local_state.load_config()? {
544544+ Some(config) if config.project == project => config.repo_rkey,
545545+ _ => return Err(format!(
546546+ "no device rkey found for project '{}' — run 'save' first to initialize",
547547+ project
548548+ )),
549549+ };
550550+ if verbose {
551551+ eprintln!("pds-yrs: exporting project='{}', rkey='{}'", project, rkey);
552552+ }
553553+ let count = pds_yrs::export(&client, &did, &rkey, output_path, verbose).await?;
341554 eprintln!("pds-yrs: exported {} file(s)", count);
342555 Ok(())
343556}
557557+558558+async fn run_list(
559559+ handle: &str,
560560+ project: Option<&str>,
561561+ password: &str,
562562+ pds_url: Option<&str>,
563563+) -> Result<(), String> {
564564+ let (client, did) = login(handle, password, pds_url).await?;
565565+ let records = client
566566+ .list_all_records(&did, pds_yrs::COLLECTION)
567567+ .await?;
568568+569569+ let mut entries: Vec<(String, String, String, usize, usize)> = Vec::new();
570570+ for entry in &records {
571571+ if let Ok(repo) = serde_json::from_value::<pds_yrs::types::YrsRepo>(entry.value.clone()) {
572572+ if let Some(filter) = project {
573573+ if repo.name != filter {
574574+ continue;
575575+ }
576576+ }
577577+ let rkey = entry
578578+ .uri
579579+ .rsplit('/')
580580+ .next()
581581+ .unwrap_or("?")
582582+ .to_string();
583583+ let file_count = repo
584584+ .files
585585+ .keys()
586586+ .filter(|k| !k.starts_with("pdsyrs_"))
587587+ .count();
588588+ let collab_count = repo.collaborators.len();
589589+ entries.push((repo.name, rkey, repo.updated_at, file_count, collab_count));
590590+ }
591591+ }
592592+593593+ if entries.is_empty() {
594594+ eprintln!("pds-yrs: no repos found");
595595+ return Ok(());
596596+ }
597597+598598+ println!("{:<20} {:<30} {:<25} {:<6} {}", "PROJECT", "RKEY", "UPDATED", "FILES", "COLLABS");
599599+ for (name, rkey, updated, files, collabs) in &entries {
600600+ println!("{:<20} {:<30} {:<25} {:<6} {}", name, rkey, updated, files, collabs);
601601+ }
602602+603603+ Ok(())
604604+}
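Both `discover_and_register` and `run_list` take the rkey from the last path segment of a record URI and compare `YrsRepo.name` against the project. The sketch below reproduces that filtering with plain strings, as a rough illustration only. `Listed`, `rkey_from_uri`, and `peer_rkeys` are hypothetical stand-ins, not types or functions from the crate, and the URIs in the usage are made up. Note that `str::rsplit` always yields at least one item, so the `None => continue` arm in `discover_and_register` is never reached.

```rust
/// Hypothetical, simplified stand-in for a listed record: only the fields
/// needed for peer discovery (the real entries carry a full JSON value).
struct Listed {
    uri: String,
    project: String,
}

/// Return the last path segment of a record URI, which is the rkey.
fn rkey_from_uri(uri: &str) -> &str {
    // `rsplit` yields at least one item even for an empty string,
    // so the fallback is never taken in practice.
    uri.rsplit('/').next().unwrap_or(uri)
}

/// Collect the rkeys of all records for `project`, excluding our own rkey.
fn peer_rkeys(records: &[Listed], project: &str, our_rkey: &str) -> Vec<String> {
    records
        .iter()
        .filter(|r| r.project == project)
        .map(|r| rkey_from_uri(&r.uri).to_string())
        .filter(|rkey| rkey.as_str() != our_rkey)
        .collect()
}
```

Given records for two devices of `my-wiki` and one record of another project, `peer_rkeys` returns only the other device's rkey.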
+216-74
src/merge.rs
···11-//! Merge multiple collaborators' sites via CRDT.
11+//! Merge multiple collaborators' repos via CRDT.
2233use std::collections::HashMap;
44use std::path::Path;
···66use yrs::updates::decoder::Decode;
77use yrs::{Doc, ReadTxn, Transact};
8899+use crate::pack;
910use crate::pds_client::PdsClient;
1010-use crate::types::{FileKind, SiteRecord, COLLECTION, MANIFEST_KEY};
1111+use crate::types::{FileEntry, FileKind, PackRef, YrsRepo, COLLECTION, MANIFEST_KEY};
1112use crate::yrs_pds;
12131313-/// Merge sites from multiple rkeys into an output directory.
1414+/// Merge all repos for a project.
1515+///
1616+/// If `own_rkey` is provided, uses the collaborators field from that record
1717+/// to discover peer rkeys (no `listRecords` needed). Otherwise falls back
1818+/// to `listRecords` to discover all repos for the project.
1919+pub async fn merge_project(
2020+ client: &PdsClient,
2121+ did: &str,
2222+ project_name: &str,
2323+ own_rkey: Option<&str>,
2424+ output_dir: &Path,
2525+ verbose: bool,
2626+) -> Result<(), String> {
2727+ let mut rkeys: Vec<String> = Vec::new();
2828+2929+ // Try collaborators from own record first
3030+ if let Some(rkey) = own_rkey {
3131+ if let Some(record) = client.get_record(did, COLLECTION, rkey).await? {
3232+ if let Ok(repo) = serde_json::from_value::<YrsRepo>(record.value) {
3333+ if repo.name == project_name {
3434+ rkeys.push(rkey.to_string());
3535+ for collab in &repo.collaborators {
3636+ if !rkeys.iter().any(|r| r == &collab.rkey) {
3737+ rkeys.push(collab.rkey.clone());
3838+ }
3939+ }
4040+ }
4141+ }
4242+ }
4343+ }
4545+ // Fall back to listRecords if own record could not be used (not given, missing, unparseable, or for a different project)
4646+ if rkeys.is_empty() {
4747+ if verbose {
4848+ eprintln!("pds-yrs: no collaborators found, discovering via listRecords");
4949+ }
5050+ let records = client.list_all_records(did, COLLECTION).await?;
5151+ for entry in &records {
5252+ if let Ok(repo) = serde_json::from_value::<YrsRepo>(entry.value.clone()) {
5353+ if repo.name == project_name {
5454+ if let Some(rkey) = entry.uri.rsplit('/').next() {
5555+ rkeys.push(rkey.to_string());
5656+ }
5757+ }
5858+ }
5959+ }
6060+ }
6161+6262+ if rkeys.is_empty() {
6363+ return Err(format!("no repos found for project: {}", project_name));
6464+ }
6565+ if verbose {
6666+ eprintln!(
6767+ "pds-yrs: merging {} repo(s) for project '{}': {}",
6868+ rkeys.len(),
6969+ project_name,
7070+ rkeys.join(", ")
7171+ );
7272+ }
7373+ let rkey_refs: Vec<&str> = rkeys.iter().map(|s| s.as_str()).collect();
7474+ merge_repos(client, did, &rkey_refs, output_dir, verbose).await
7575+}
7676+7777+/// Merge repos from multiple rkeys into an output directory.
1478///
1579/// For text files: CRDT merge all Yrs Docs (conflict-free).
1680/// For binary files: detect conflicts via CID comparison, create
1781/// `file.creator1.ext` + `file.creator2.ext` when CIDs differ.
1882/// Manifest Maps are CRDT-merged with "set wins over delete" semantics, so an edit wins over a concurrent delete.
1919-pub async fn merge_sites(
8383+pub async fn merge_repos(
2084 client: &PdsClient,
2185 did: &str,
2286 rkeys: &[&str],
2387 output_dir: &Path,
2488 verbose: bool,
2589) -> Result<(), String> {
2626- // Fetch all site records
2727- let mut sites: Vec<(String, SiteRecord)> = Vec::new();
9090+ // Fetch all repo records
9191+ let mut repos: Vec<(String, YrsRepo)> = Vec::new();
2892 for rkey in rkeys {
2993 let record = client
3094 .get_record(did, COLLECTION, rkey)
3195 .await?
3232- .ok_or_else(|| format!("site record not found: {}", rkey))?;
3333- let site: SiteRecord = serde_json::from_value(record.value)
3434- .map_err(|e| format!("parse SiteRecord for {}: {}", rkey, e))?;
3535- sites.push((rkey.to_string(), site));
9696+ .ok_or_else(|| format!("repo record not found: {}", rkey))?;
9797+ let repo: YrsRepo = serde_json::from_value(record.value)
9898+ .map_err(|e| format!("parse YrsRepo for {}: {}", rkey, e))?;
9999+ repos.push((rkey.to_string(), repo));
36100 }
37101102102+ // Pack cache: keyed by CID, avoids redundant blob downloads.
103103+ // All files in a repo typically share 1-2 pack blobs, so this reduces
104104+ // O(files × repos) blob downloads to O(pack_blobs × repos).
105105+ let mut pack_cache: HashMap<String, Vec<u8>> = HashMap::new();
106106+38107 // CRDT-merge manifests
3939- let merged_manifest = merge_manifests(&sites, client, did).await?;
108108+ let merged_manifest = merge_manifests(&repos, client, did, &mut pack_cache).await?;
40109 let manifest_entries = yrs_pds::manifest_entries(&merged_manifest);
4111042111 if verbose {
···46115 );
47116 }
481174949- // For each file in the merged manifest, merge across sites
118118+ // For each file in the merged manifest, merge across repos
50119 for (rel_path, kind) in &manifest_entries {
5151- // Collect which sites have this file
5252- let mut site_indices: Vec<usize> = Vec::new();
5353- for (i, (_, site)) in sites.iter().enumerate() {
5454- if site.files.contains_key(rel_path) {
5555- site_indices.push(i);
120120+ // Collect which repos have this file
121121+ let mut repo_indices: Vec<usize> = Vec::new();
122122+ for (i, (_, repo)) in repos.iter().enumerate() {
123123+ if repo.files.contains_key(rel_path) {
124124+ repo_indices.push(i);
56125 }
57126 }
581275959- if site_indices.is_empty() {
128128+ if repo_indices.is_empty() {
60129 continue;
61130 }
6213163132 if verbose {
64133 eprintln!(
6565- "pds-yrs: merging {} ({}, from {} site(s))",
134134+ "pds-yrs: merging {} ({}, from {} repo(s))",
66135 rel_path,
67136 match kind {
68137 FileKind::Text => "text",
69138 FileKind::Binary => "binary",
70139 },
7171- site_indices.len()
140140+ repo_indices.len()
72141 );
73142 }
74143···8014981150 match kind {
82151 FileKind::Text => {
8383- let content = merge_text_file(rel_path, &site_indices, &sites, client, did).await?;
152152+ let content =
153153+ merge_text_file(rel_path, &repo_indices, &repos, client, did, &mut pack_cache)
154154+ .await?;
84155 std::fs::write(&output_path, &content)
85156 .map_err(|e| format!("write {:?}: {}", output_path, e))?;
86157 }
87158 FileKind::Binary => {
88159 merge_binary_file(
89160 rel_path,
9090- &site_indices,
9191- &sites,
161161+ &repo_indices,
162162+ &repos,
92163 rkeys,
93164 client,
94165 did,
···106177 Ok(())
107178}
108179109109-/// CRDT-merge manifest Maps from all sites.
180180+/// CRDT-merge manifest Maps from all repos.
110181async fn merge_manifests(
111111- sites: &[(String, SiteRecord)],
182182+ repos: &[(String, YrsRepo)],
112183 client: &PdsClient,
113184 did: &str,
185185+ pack_cache: &mut HashMap<String, Vec<u8>>,
114186) -> Result<Doc, String> {
115187 let mut manifest_docs: Vec<Doc> = Vec::new();
116188117117- for (_rkey, site) in sites {
118118- let doc = if let Some(manifest_entry) = site.files.get(MANIFEST_KEY) {
119119- yrs_pds::file_entry_to_doc(manifest_entry, client, did).await?
189189+ for (_rkey, repo) in repos {
190190+ let doc = if let Some(manifest_entry) = repo.files.get(MANIFEST_KEY) {
191191+ file_entry_to_doc_cached(manifest_entry, client, did, pack_cache).await?
120192 } else {
121121- // Legacy site — create manifest from its files
193193+ // Legacy repo — create manifest from its files
122194 let doc = yrs_pds::new_manifest_doc();
123123- for (path, entry) in &site.files {
195195+ for (path, entry) in &repo.files {
124196 if path != MANIFEST_KEY {
125197 yrs_pds::manifest_insert(&doc, path, &entry.kind);
126198 }
···147219 Ok(manifest_docs.into_iter().next().unwrap())
148220}
149221150150-/// CRDT-merge a text file across multiple sites.
222222+/// CRDT-merge a text file across multiple repos.
151223async fn merge_text_file(
152224 rel_path: &str,
153153- site_indices: &[usize],
154154- sites: &[(String, SiteRecord)],
225225+ repo_indices: &[usize],
226226+ repos: &[(String, YrsRepo)],
155227 client: &PdsClient,
156228 did: &str,
229229+ pack_cache: &mut HashMap<String, Vec<u8>>,
157230) -> Result<String, String> {
158158- if site_indices.len() == 1 {
159159- let entry = &sites[site_indices[0]].1.files[rel_path];
160160- let doc = yrs_pds::file_entry_to_doc(entry, client, did).await?;
231231+ if repo_indices.len() == 1 {
232232+ let entry = &repos[repo_indices[0]].1.files[rel_path];
233233+ let doc = file_entry_to_doc_cached(entry, client, did, pack_cache).await?;
161234 return Ok(yrs_pds::materialize(&doc));
162235 }
163236164237 let mut docs: Vec<Doc> = Vec::new();
165165- for &idx in site_indices {
166166- let entry = &sites[idx].1.files[rel_path];
167167- let doc = yrs_pds::file_entry_to_doc(entry, client, did).await?;
238238+ for &idx in repo_indices {
239239+ let entry = &repos[idx].1.files[rel_path];
240240+ let doc = file_entry_to_doc_cached(entry, client, did, pack_cache).await?;
168241 docs.push(doc);
169242 }
170243···184257/// Handle binary file merge — detect conflicts via CID comparison.
185258async fn merge_binary_file(
186259 rel_path: &str,
187187- site_indices: &[usize],
188188- sites: &[(String, SiteRecord)],
260260+ repo_indices: &[usize],
261261+ repos: &[(String, YrsRepo)],
189262 rkeys: &[&str],
190263 client: &PdsClient,
191264 did: &str,
192265 output_dir: &Path,
193266) -> Result<(), String> {
194194- if site_indices.len() == 1 {
195195- // Only one site has this binary file — just download it
196196- let entry = &sites[site_indices[0]].1.files[rel_path];
267267+ if repo_indices.len() == 1 {
268268+ // Only one repo has this binary file — just download it
269269+ let entry = &repos[repo_indices[0]].1.files[rel_path];
197270 let data = client.get_blob(did, entry.snapshot_blob.cid()).await?;
198271 let output_path = output_dir.join(rel_path);
199272 std::fs::write(&output_path, &data)
···201274 return Ok(());
202275 }
203276204204- // Collect CIDs from all sites
205205- let mut cid_site: HashMap<String, Vec<usize>> = HashMap::new();
206206- for &idx in site_indices {
207207- let entry = &sites[idx].1.files[rel_path];
208208- cid_site
277277+ // Collect CIDs from all repos
278278+ let mut cid_repo: HashMap<String, Vec<usize>> = HashMap::new();
279279+ for &idx in repo_indices {
280280+ let entry = &repos[idx].1.files[rel_path];
281281+ cid_repo
209282 .entry(entry.snapshot_blob.cid().to_string())
210283 .or_default()
211284 .push(idx);
212285 }
213286214214- if cid_site.len() == 1 {
215215- // All sites have the same CID — no conflict
216216- let entry = &sites[site_indices[0]].1.files[rel_path];
287287+ if cid_repo.len() == 1 {
288288+ // All repos have the same CID — no conflict
289289+ let entry = &repos[repo_indices[0]].1.files[rel_path];
217290 let data = client.get_blob(did, entry.snapshot_blob.cid()).await?;
218291 let output_path = output_dir.join(rel_path);
219292 std::fs::write(&output_path, &data)
···225298 let ext = path.extension().and_then(|s| s.to_str()).unwrap_or("");
226299 let parent = path.parent().unwrap_or(std::path::Path::new(""));
227300228228- for (cid, indices) in &cid_site {
229229- let site_name = &rkeys[indices[0]];
301301+ for (cid, indices) in &cid_repo {
302302+ let repo_name = &rkeys[indices[0]];
230303 let conflict_name = if ext.is_empty() {
231231- format!("{}.{}", stem, site_name)
304304+ format!("{}.{}", stem, repo_name)
232305 } else {
233233- format!("{}.{}.{}", stem, site_name, ext)
306306+ format!("{}.{}.{}", stem, repo_name, ext)
234307 };
235308 let conflict_path = output_dir.join(parent).join(&conflict_name);
236309 if let Some(p) = conflict_path.parent() {
···245318 Ok(())
246319}
247320248248-/// Generate a conflict filename: stem.site_name.ext
249249-pub fn conflict_filename(rel_path: &str, site_name: &str) -> String {
321321+/// Reconstruct a Yrs Doc from a FileEntry, using pack cache to avoid redundant downloads.
322322+async fn file_entry_to_doc_cached(
323323+ entry: &FileEntry,
324324+ client: &PdsClient,
325325+ did: &str,
326326+ pack_cache: &mut HashMap<String, Vec<u8>>,
327327+) -> Result<Doc, String> {
328328+ let snapshot_data = get_blob_data_cached(entry, client, did, pack_cache).await?;
329329+ let doc = yrs_pds::doc_from_snapshot(&snapshot_data)?;
330330+331331+ for update_ref in &entry.updates {
332332+ let update_data =
333333+ get_pack_ref_data_cached(update_ref, client, did, pack_cache).await?;
334334+ yrs_pds::apply_update(&doc, &update_data)?;
335335+ }
336336+337337+ Ok(doc)
338338+}
339339+340340+/// Get blob data for a file entry's snapshot, using pack cache.
341341+async fn get_blob_data_cached(
342342+ entry: &FileEntry,
343343+ client: &PdsClient,
344344+ did: &str,
345345+ pack_cache: &mut HashMap<String, Vec<u8>>,
346346+) -> Result<Vec<u8>, String> {
347347+ if let Some(ref pack_ref) = entry.pack_ref {
348348+ get_pack_ref_data_cached(pack_ref, client, did, pack_cache).await
349349+ } else {
350350+ client.get_blob(did, entry.snapshot_blob.cid()).await
351351+ }
352352+}
353353+354354+/// Extract data from a PackRef, using pack cache to avoid redundant downloads.
355355+async fn get_pack_ref_data_cached(
356356+ pack_ref: &PackRef,
357357+ client: &PdsClient,
358358+ did: &str,
359359+ pack_cache: &mut HashMap<String, Vec<u8>>,
360360+) -> Result<Vec<u8>, String> {
361361+ let cid = pack_ref.blob.cid().to_string();
362362+363363+ if !pack_cache.contains_key(&cid) {
364364+ let data = if let Some(ref chunks) = pack_ref.chunks {
365365+ let mut chunk_data = Vec::new();
366366+ for chunk_ref in chunks {
367367+ chunk_data.push(client.get_blob(did, chunk_ref.cid()).await?);
368368+ }
369369+ pack::reassemble_chunks(&chunk_data)
370370+ } else {
371371+ client.get_blob(did, &cid).await?
372372+ };
373373+ pack_cache.insert(cid.clone(), data);
374374+ }
375375+376376+ let pack_data = pack_cache.get(&cid).unwrap();
377377+ let (_, blob_data) = pack::parse_pack_auto(pack_data)?;
378378+379379+ let start = pack_ref.offset as usize;
380380+ let end = start.saturating_add(pack_ref.length as usize);
381381+ if end > blob_data.len() {
382382+ return Err(format!(
383383+ "pack_ref out of bounds: {}..{} in {} bytes",
384384+ start, end, blob_data.len()
385385+ ));
386386+ }
387387+ Ok(blob_data[start..end].to_vec())
388388+}
389389+390390+/// Generate a conflict filename: stem.repo_name.ext
391391+pub fn conflict_filename(rel_path: &str, repo_name: &str) -> String {
250392 let path = std::path::Path::new(rel_path);
251393 let stem = path.file_stem().and_then(|s| s.to_str()).unwrap_or("file");
252394 let ext = path.extension().and_then(|s| s.to_str()).unwrap_or("");
253395 let parent = path.parent().unwrap_or(std::path::Path::new(""));
254396255397 let conflict_name = if ext.is_empty() {
256256- format!("{}.{}", stem, site_name)
398398+ format!("{}.{}", stem, repo_name)
257399 } else {
258258- format!("{}.{}.{}", stem, site_name, ext)
400400+ format!("{}.{}.{}", stem, repo_name, ext)
259401 };
260402261403 parent.join(&conflict_name).to_string_lossy().to_string()
···309451310452 #[test]
311453 fn binary_merge_no_conflict_same_cid() {
312312- // simulate: two sites have same binary CID → no conflict
454454+ // simulate: two repos have same binary CID → no conflict
313455 let mut cid_site: HashMap<String, Vec<usize>> = HashMap::new();
314456 let cid = "bafysame".to_string();
315457 cid_site.entry(cid.clone()).or_default().push(0);
···321463322464 #[test]
323465 fn binary_merge_conflict_different_cids() {
324324- // simulate: two sites modify same binary differently → conflict files
325325- let mut cid_site: HashMap<String, Vec<usize>> = HashMap::new();
326326- cid_site.entry("bafyabc".to_string()).or_default().push(0);
327327- cid_site.entry("bafydef".to_string()).or_default().push(1);
466466+ // simulate: two repos modify same binary differently → conflict files
467467+ let mut cid_repo: HashMap<String, Vec<usize>> = HashMap::new();
468468+ cid_repo.entry("bafyabc".to_string()).or_default().push(0);
469469+ cid_repo.entry("bafydef".to_string()).or_default().push(1);
328470329329- assert_eq!(cid_site.len(), 2);
471471+ assert_eq!(cid_repo.len(), 2);
330472331473 // verify conflict filenames
332474 let rkeys = ["site-a", "site-b"];
333475 let mut conflict_files = Vec::new();
334334- for (_, indices) in &cid_site {
335335- let site_name = rkeys[indices[0]];
336336- conflict_files.push(conflict_filename("logo.png", site_name));
476476+ for (_, indices) in &cid_repo {
477477+ let repo_name = rkeys[indices[0]];
478478+ conflict_files.push(conflict_filename("logo.png", repo_name));
337479 }
338480 conflict_files.sort();
339481 assert!(conflict_files.contains(&"logo.site-a.png".to_string()));
···345487 use yrs::updates::decoder::Decode;
346488 use yrs::{Text, Transact};
347489348348- // Simulate two sites editing the same text file
490490+ // Simulate two repos editing the same text file
349491 let base_doc = crate::yrs_pds::doc_from_text("Hello world");
350492 let base_snapshot = crate::yrs_pds::encode_snapshot(&base_doc);
351493352352- // Site A: adds " from Alice" at end
494494+ // Repo A: adds " from Alice" at end
353495 let doc_a = crate::yrs_pds::doc_from_snapshot(&base_snapshot).unwrap();
354496 {
355497 let text = doc_a.get_or_insert_text("content");
···357499 text.insert(&mut txn, 11, " from Alice");
358500 }
359501360360- // Site B: adds "Dear " at beginning
502502+ // Repo B: adds "Dear " at beginning
361503 let doc_b = crate::yrs_pds::doc_from_snapshot(&base_snapshot).unwrap();
362504 {
363505 let text = doc_b.get_or_insert_text("content");
···373515374516 let merged = crate::yrs_pds::materialize(&doc_a);
375517 // CRDT should include both edits
376376- assert!(merged.contains("Dear "), "should have site B's prefix");
377377- assert!(merged.contains("from Alice"), "should have site A's suffix");
518518+ assert!(merged.contains("Dear "), "should have repo B's prefix");
519519+ assert!(merged.contains("from Alice"), "should have repo A's suffix");
378520 }
379521}
+1-1
src/pack.rs
···145145 data.len() >= 2 && data[0] == 0x1f && data[1] == 0x8b
146146}
147147148148-/// Maximum blob size before chunking (40MB — AT Protocol limit is ~50MB but leave margin).
148148+/// Maximum blob size before chunking (40MB — ATProto limit is ~50MB but leave margin).
149149pub const CHUNK_SIZE: usize = 40 * 1024 * 1024;
150150151151/// Split data into chunks of at most CHUNK_SIZE bytes.
+90-1
src/pds_client.rs
···11-//! PDS client for AT Protocol XRPC calls.
11+//! PDS client for ATProto XRPC calls.
22//!
33//! Simplified port from git-remote-pds — Bearer auth only.
44···6666 size: u64,
6767}
68686969+/// Response from `com.atproto.repo.listRecords`.
7070+#[derive(Debug, Deserialize)]
7171+pub struct ListRecordsResponse {
7272+ pub records: Vec<ListRecordEntry>,
7373+ pub cursor: Option<String>,
7474+}
7575+7676+/// A single record entry from `listRecords`.
7777+#[derive(Debug, Deserialize)]
7878+pub struct ListRecordEntry {
7979+ pub uri: String,
8080+ pub cid: Option<String>,
8181+ pub value: serde_json::Value,
8282+}
8383+6984impl PdsClient {
7085 pub fn new(base_url: impl Into<String>) -> Self {
7186 Self {
···79948095 pub fn base_url(&self) -> &str {
8196 &self.base_url
9797+ }
9898+9999+ /// Set pre-existing tokens (e.g. from cached config).
100100+ pub fn set_tokens(&mut self, access_token: String, refresh_token: String) {
101101+ self.auth_token = Some(access_token);
102102+ self.refresh_token = Some(refresh_token);
103103+ self.token_expiry =
104104+ Some(std::time::Instant::now() + std::time::Duration::from_secs(TOKEN_TTL_SECS));
105105+ }
106106+107107+ /// Get the current access token (if authenticated).
108108+ pub fn access_token(&self) -> Option<&str> {
109109+ self.auth_token.as_deref()
110110+ }
111111+112112+ /// Get the current refresh token (if authenticated).
113113+ pub fn refresh_token(&self) -> Option<&str> {
114114+ self.refresh_token.as_deref()
82115 }
8311684117 pub async fn login(
···327360 }
328361329362 Ok(())
363363+ }
364364+365365+ /// List records in a collection (single page).
366366+ pub async fn list_records(
367367+ &self,
368368+ did: &str,
369369+ collection: &str,
370370+ limit: u32,
371371+ cursor: Option<&str>,
372372+ ) -> Result<ListRecordsResponse, String> {
373373+ let mut url = format!(
374374+ "{}/xrpc/com.atproto.repo.listRecords?repo={}&collection={}&limit={}",
375375+ self.base_url, did, collection, limit
376376+ );
377377+ if let Some(c) = cursor {
378378+ url.push_str(&format!("&cursor={}", c));
379379+ }
380380+381381+ let resp = self
382382+ .http
383383+ .get(&url)
384384+ .send()
385385+ .await
386386+ .map_err(|e| format!("list_records request failed: {}", e))?;
387387+388388+ if !resp.status().is_success() {
389389+ let status = resp.status();
390390+ let text = resp.text().await.unwrap_or_default();
391391+ return Err(format!("list_records failed ({}): {}", status, text));
392392+ }
393393+394394+ resp.json()
395395+ .await
396396+ .map_err(|e| format!("parse list_records response: {}", e))
397397+ }
398398+399399+ /// List all records in a collection (paginated, collects all pages).
400400+ pub async fn list_all_records(
401401+ &self,
402402+ did: &str,
403403+ collection: &str,
404404+ ) -> Result<Vec<ListRecordEntry>, String> {
405405+ let mut all = Vec::new();
406406+ let mut cursor: Option<String> = None;
407407+ loop {
408408+ let resp = self
409409+ .list_records(did, collection, 100, cursor.as_deref())
410410+ .await?;
411411+ let has_more = resp.cursor.is_some();
412412+ cursor = resp.cursor;
413413+ all.extend(resp.records);
414414+ if !has_more {
415415+ break;
416416+ }
417417+ }
418418+ Ok(all)
330419 }
331420332421 /// Delete a record with explicit DID.
+191-42
src/save.rs
···11-//! Save a directory of files to PDS as a SiteRecord with Yrs CRDT state.
11+//! Save a directory of files to PDS as a YrsRepo record with Yrs CRDT state.
2233use std::collections::HashMap;
44use std::path::Path;
···66use crate::pack::{self, PackDataType};
77use crate::pds_client::PdsClient;
88use crate::types::{
99- BlobRef, FileEntry, FileKind, PackRef, SaveResult, SiteRecord, COLLECTION, MANIFEST_KEY,
99+ BlobRef, Collaborator, FileEntry, FileKind, PackRef, SaveResult, YrsRepo, COLLECTION,
1010+ MANIFEST_KEY,
1011};
1112use crate::yrs_pds;
12131313-/// Compaction threshold: create new snapshot when updates exceed this count.
1414-const COMPACTION_THRESHOLD: u32 = 50;
1414+/// Compaction threshold: when a save would bring any changed text file's updates_count to this value,
1515+/// the entire repo is compacted (all files get fresh snapshots in one pack).
1616+const COMPACTION_THRESHOLD: u32 = 10;
15171618/// Pending blob to be packed into a single upload.
1719struct PendingBlob {
···2022 data_type: PackDataType,
2123}
22242525+/// Whether a pending blob is an incremental update (appended to updates list)
2626+/// or a snapshot (replaces pack_ref).
2727+#[derive(Clone, PartialEq)]
2828+enum PendingKind {
2929+ /// Full snapshot — will become the new pack_ref.
3030+ Snapshot,
3131+ /// Incremental update — will be appended to updates list.
3232+ Update,
3333+}
3434+2335/// Save a directory to PDS.
2436///
2537/// Maintains a CRDT manifest (Yrs Map) tracking all files. Supports both
···3042 client: &PdsClient,
3143 did: &str,
3244 rkey: &str,
4545+ project_name: &str,
4646+ new_collaborators: Option<&[Collaborator]>,
3347 verbose: bool,
3448) -> Result<SaveResult, String> {
3535- save_filtered(dir, client, did, rkey, None, None, verbose).await
4949+ save_filtered(
5050+ dir,
5151+ client,
5252+ did,
5353+ rkey,
5454+ project_name,
5555+ new_collaborators,
5656+ None,
5757+ None,
5858+ verbose,
5959+ )
6060+ .await
3661}
37623863/// Save a directory to PDS with optional include/exclude glob filters.
6464+///
6565+/// `new_collaborators` is passed on first initialization to seed the collaborator list.
6666+/// Collaborators already in the existing record are always preserved; new ones are appended only if their rkey is not yet listed.
3967pub async fn save_filtered(
4068 dir: &Path,
4169 client: &PdsClient,
4270 did: &str,
4371 rkey: &str,
7272+ project_name: &str,
7373+ new_collaborators: Option<&[Collaborator]>,
4474 include: Option<&[String]>,
4575 exclude: Option<&[String]>,
4676 verbose: bool,
···53835484 // Fetch existing record if present
5585 let existing = client.get_record(did, COLLECTION, rkey).await?;
5656- let existing_site: Option<SiteRecord> = existing
8686+ let existing_repo: Option<YrsRepo> = existing
5787 .as_ref()
5888 .and_then(|r| serde_json::from_value(r.value.clone()).ok());
5989 let swap_cid = existing.as_ref().and_then(|r| r.cid.clone());
60906191 // Reconstruct or create manifest
6262- let manifest_doc = if let Some(ref site) = existing_site {
6363- if let Some(manifest_entry) = site.files.get(MANIFEST_KEY) {
9292+ let manifest_doc = if let Some(ref repo) = existing_repo {
9393+ if let Some(manifest_entry) = repo.files.get(MANIFEST_KEY) {
6494 yrs_pds::file_entry_to_doc(manifest_entry, client, did).await?
6595 } else {
6696 let doc = yrs_pds::new_manifest_doc();
6767- for (path, entry) in &site.files {
9797+ for (path, entry) in &repo.files {
6898 if path != MANIFEST_KEY {
6999 yrs_pds::manifest_insert(&doc, path, &entry.kind);
70100 }
···7710778108 let mut file_entries: HashMap<String, FileEntry> = HashMap::new();
79109 let mut pending_blobs: Vec<PendingBlob> = Vec::new();
110110+ // Track whether each pending blob is a snapshot or incremental update
111111+ let mut pending_kinds: HashMap<String, PendingKind> = HashMap::new();
80112 let mut files_uploaded = 0;
81113 let mut files_skipped = 0;
114114+ let mut needs_compaction = false;
8211583116 // Track which local files exist (for deletion detection)
84117 let local_paths: std::collections::HashSet<String> =
···99132 data: file_data.clone(),
100133 data_type: PackDataType::Binary,
101134 });
102102- // Placeholder entry — will be updated with pack_ref after upload
135135+ pending_kinds.insert(rel_path.clone(), PendingKind::Snapshot);
103136 file_entries.insert(rel_path.clone(), placeholder_binary_entry(&hash));
104137 files_uploaded += 1;
105138 continue;
···107140 };
108141109142 // Check if file changed since last save
110110- if let Some(ref site) = existing_site {
111111- if let Some(existing_entry) = site.files.get(rel_path) {
143143+ if let Some(ref repo) = existing_repo {
144144+ if let Some(existing_entry) = repo.files.get(rel_path) {
112145 if existing_entry.content == content {
113146 file_entries.insert(rel_path.clone(), existing_entry.clone());
114147 files_skipped += 1;
···121154 // Changed — re-assert in manifest
122155 yrs_pds::manifest_insert(&manifest_doc, rel_path, &FileKind::Text);
123156124124- // Incremental update path
125125- if existing_entry.updates_count < COMPACTION_THRESHOLD {
157157+ // Incremental update path: compute diff only
158158+ if existing_entry.pack_ref.is_some()
159159+ && existing_entry.updates_count + 1 < COMPACTION_THRESHOLD
160160+ {
126161 if let Ok(doc) =
127162 reconstruct_and_diff(existing_entry, content, client, did).await
128163 {
129129- let snapshot = yrs_pds::encode_snapshot(&doc);
164164+ // Compute only the new ops (diff from previous state vector)
165165+ let old_sv_bytes = yrs_pds::base64_decode(&existing_entry.state_vector)?;
166166+ let diff = yrs_pds::encode_diff(&doc, &old_sv_bytes)?;
130167 let sv = yrs_pds::encode_state_vector(&doc);
131168 let materialized = yrs_pds::materialize(&doc);
132169 pending_blobs.push(PendingBlob {
133170 path: rel_path.clone(),
134134- data: snapshot,
135135- data_type: PackDataType::Snapshot,
171171+ data: diff,
172172+ data_type: PackDataType::Update,
136173 });
174174+ pending_kinds.insert(rel_path.clone(), PendingKind::Update);
175175+ // Keep existing snapshot/pack_ref, will append update
137176 file_entries.insert(
138177 rel_path.clone(),
139178 FileEntry {
140179 content: materialized,
141141- snapshot_blob: placeholder_blob_ref(),
180180+ snapshot_blob: existing_entry.snapshot_blob.clone(),
142181 state_vector: yrs_pds::base64_encode(&sv),
143143- updates_blob: None,
182182+ updates: existing_entry.updates.clone(),
144183 updates_count: existing_entry.updates_count + 1,
145145- snapshot_at: chrono::Utc::now().to_rfc3339(),
184184+ snapshot_at: existing_entry.snapshot_at.clone(),
146185 kind: FileKind::Text,
147147- pack_ref: None,
186186+ pack_ref: existing_entry.pack_ref.clone(),
148187 conflict_source: None,
149188 },
150189 );
···156195 }
157196 }
158197198198+ // Check if this triggers compaction
199199+ if existing_entry.updates_count + 1 >= COMPACTION_THRESHOLD {
200200+ needs_compaction = true;
201201+ }
202202+159203 if verbose {
160160- eprintln!("pds-yrs: full snapshot (compaction) {}", rel_path);
204204+ eprintln!("pds-yrs: full snapshot {}", rel_path);
161205 }
162206 } else {
163207 yrs_pds::manifest_insert(&manifest_doc, rel_path, &FileKind::Text);
···175219 data: snapshot,
176220 data_type: PackDataType::Snapshot,
177221 });
222222+ pending_kinds.insert(rel_path.clone(), PendingKind::Snapshot);
178223 file_entries.insert(
179224 rel_path.clone(),
180225 FileEntry {
181226 content: content.to_string(),
182227 snapshot_blob: placeholder_blob_ref(),
183228 state_vector: yrs_pds::base64_encode(&sv),
184184- updates_blob: None,
229229+ updates: vec![],
185230 updates_count: 0,
186231 snapshot_at: chrono::Utc::now().to_rfc3339(),
187232 kind: FileKind::Text,
···195240 }
196241 }
197242 FileKind::Binary => {
198198- if let Some(ref site) = existing_site {
199199- if let Some(existing_entry) = site.files.get(rel_path) {
243243+ if let Some(ref repo) = existing_repo {
244244+ if let Some(existing_entry) = repo.files.get(rel_path) {
200245 if existing_entry.kind == FileKind::Binary {
201246 let hash = hex_hash(file_data);
202247 if existing_entry.content == hash {
···222267 data: file_data.clone(),
223268 data_type: PackDataType::Binary,
224269 });
270270+ pending_kinds.insert(rel_path.clone(), PendingKind::Snapshot);
225271 file_entries.insert(rel_path.clone(), placeholder_binary_entry(&hash));
226272 files_uploaded += 1;
227273 if verbose {
···231277 }
232278 }
233279280280+ // Compaction: if any file crossed the threshold, re-snapshot ALL files
281281+ if needs_compaction {
282282+ if verbose {
283283+ eprintln!("pds-yrs: compaction triggered — re-snapshotting all files");
284284+ }
285285+ pending_blobs.clear();
286286+ pending_kinds.clear();
287287+ file_entries.clear();
288288+ files_uploaded = 0;
289289+ files_skipped = 0;
290290+291291+ for (rel_path, file_data, kind) in &local_files {
292292+ match kind {
293293+ FileKind::Text => {
294294+ let content = match std::str::from_utf8(file_data) {
295295+ Ok(s) => s,
296296+ Err(_) => {
297297+ let hash = hex_hash(file_data);
298298+ pending_blobs.push(PendingBlob {
299299+ path: rel_path.clone(),
300300+ data: file_data.clone(),
301301+ data_type: PackDataType::Binary,
302302+ });
303303+ pending_kinds.insert(rel_path.clone(), PendingKind::Snapshot);
304304+ file_entries.insert(rel_path.clone(), placeholder_binary_entry(&hash));
305305+ files_uploaded += 1;
306306+ continue;
307307+ }
308308+ };
309309+ // Reconstruct full doc if we have existing state, else create fresh
310310+ let doc = if let Some(ref repo) = existing_repo {
311311+ if let Some(existing_entry) = repo.files.get(rel_path) {
312312+ if let Ok(d) = reconstruct_and_diff(existing_entry, content, client, did).await {
313313+ d
314314+ } else {
315315+ yrs_pds::doc_from_text(content)
316316+ }
317317+ } else {
318318+ yrs_pds::doc_from_text(content)
319319+ }
320320+ } else {
321321+ yrs_pds::doc_from_text(content)
322322+ };
323323+ let snapshot = yrs_pds::encode_snapshot(&doc);
324324+ let sv = yrs_pds::encode_state_vector(&doc);
325325+ let materialized = yrs_pds::materialize(&doc);
326326+ pending_blobs.push(PendingBlob {
327327+ path: rel_path.clone(),
328328+ data: snapshot,
329329+ data_type: PackDataType::Snapshot,
330330+ });
331331+ pending_kinds.insert(rel_path.clone(), PendingKind::Snapshot);
332332+ file_entries.insert(
333333+ rel_path.clone(),
334334+ FileEntry {
335335+ content: materialized,
336336+ snapshot_blob: placeholder_blob_ref(),
337337+ state_vector: yrs_pds::base64_encode(&sv),
338338+ updates: vec![],
339339+ updates_count: 0,
340340+ snapshot_at: chrono::Utc::now().to_rfc3339(),
341341+ kind: FileKind::Text,
342342+ pack_ref: None,
343343+ conflict_source: None,
344344+ },
345345+ );
346346+ files_uploaded += 1;
347347+ }
348348+ FileKind::Binary => {
349349+ let hash = hex_hash(file_data);
350350+ pending_blobs.push(PendingBlob {
351351+ path: rel_path.clone(),
352352+ data: file_data.clone(),
353353+ data_type: PackDataType::Binary,
354354+ });
355355+ pending_kinds.insert(rel_path.clone(), PendingKind::Snapshot);
356356+ file_entries.insert(rel_path.clone(), placeholder_binary_entry(&hash));
357357+ files_uploaded += 1;
358358+ }
359359+ }
360360+ }
361361+ }
362362+234363 // Detect deletions: files in manifest but not on disk
235364 let manifest_entries = yrs_pds::manifest_entries(&manifest_doc);
236365 for path in manifest_entries.keys() {
···250379 data: manifest_snapshot,
251380 data_type: PackDataType::Snapshot,
252381 });
382382+ pending_kinds.insert(MANIFEST_KEY.to_string(), PendingKind::Snapshot);
253383254254- // Upload all blobs as a single pack (or individually if nothing to pack)
384384+ // Upload all blobs as a single pack
255385 let total_bytes;
256386 if pending_blobs.is_empty() {
257387 total_bytes = 0;
258258- // Still need manifest entry
259388 let manifest_entry = yrs_pds::doc_to_file_entry(&manifest_doc, client, did).await?;
260389 file_entries.insert(MANIFEST_KEY.to_string(), manifest_entry);
261390 } else {
···268397 let is_compressed = pack::is_gzip(&pack_blob.data);
269398 total_bytes = pack_blob.data.len() as u64;
270399271271- // Upload pack blob — chunk if larger than AT Protocol limit
400400+ // Upload pack blob — chunk if larger than ATProto limit
272401 let (blob_ref, chunk_refs) = if pack_blob.data.len() > pack::CHUNK_SIZE {
273402 let chunks = pack::chunk_data(&pack_blob.data);
274403 let mut refs = Vec::new();
···284413 }
285414 refs.push(r);
286415 }
287287- // Use first chunk's ref as the primary blob ref
288416 let primary = refs[0].clone();
289417 (primary, Some(refs))
290418 } else {
···316444 };
317445318446 if entry.path == MANIFEST_KEY {
319319- // Manifest entry
320447 let manifest_content = yrs_pds::materialize_manifest_content(&manifest_doc);
321448 file_entries.insert(
322449 MANIFEST_KEY.to_string(),
···324451 content: manifest_content,
325452 snapshot_blob: blob_ref.clone(),
326453 state_vector: yrs_pds::base64_encode(&manifest_sv),
327327- updates_blob: None,
454454+ updates: vec![],
328455 updates_count: 0,
329456 snapshot_at: chrono::Utc::now().to_rfc3339(),
330457 kind: FileKind::Text,
···333460 },
334461 );
335462 } else if let Some(fe) = file_entries.get_mut(&entry.path) {
336336- // Update placeholder blob ref with actual pack ref
337337- fe.snapshot_blob = blob_ref.clone();
338338- fe.pack_ref = Some(pack_ref);
463463+ let kind = pending_kinds.get(&entry.path).cloned().unwrap_or(PendingKind::Snapshot);
464464+ match kind {
465465+ PendingKind::Snapshot => {
466466+ // Full snapshot — replace pack_ref, clear updates
467467+ fe.snapshot_blob = blob_ref.clone();
468468+ fe.pack_ref = Some(pack_ref);
469469+ fe.updates.clear();
470470+ }
471471+ PendingKind::Update => {
472472+ // Incremental update — append to updates list
473473+ fe.updates.push(pack_ref);
474474+ }
475475+ }
339476 }
340477 }
341478 }
342479343343- // Build SiteRecord
480480+ // Build YrsRepo — preserve existing collaborators, merge with any new ones
481481+ let mut collaborators = existing_repo
482482+ .as_ref()
483483+ .map(|r| r.collaborators.clone())
484484+ .unwrap_or_default();
485485+ if let Some(new_collabs) = new_collaborators {
486486+ for c in new_collabs {
487487+ if !collaborators.iter().any(|existing| existing.rkey == c.rkey) {
488488+ collaborators.push(c.clone());
489489+ }
490490+ }
491491+ }
344492 let now = chrono::Utc::now().to_rfc3339();
345345- let record = SiteRecord {
346346- name: rkey.to_string(),
493493+ let record = YrsRepo {
494494+ name: project_name.to_string(),
347495 files: file_entries,
348496 updated_at: now,
497497+ collaborators,
349498 };
350499351500 let record_json =
352352- serde_json::to_value(&record).map_err(|e| format!("serialize SiteRecord: {}", e))?;
501501+ serde_json::to_value(&record).map_err(|e| format!("serialize YrsRepo: {}", e))?;
353502354503 client
355504 .put_record(did, COLLECTION, rkey, record_json, swap_cid)
···377526 content: hash.to_string(),
378527 snapshot_blob: placeholder_blob_ref(),
379528 state_vector: String::new(),
380380- updates_blob: None,
529529+ updates: vec![],
381530 updates_count: 0,
382531 snapshot_at: chrono::Utc::now().to_rfc3339(),
383532 kind: FileKind::Binary,
···563712 fn collect_files_skips_hidden_and_pds_yrs() {
564713 let tmp = tempfile::tempdir().unwrap();
565714 std::fs::write(tmp.path().join("visible.md"), "content").unwrap();
566566- std::fs::create_dir_all(tmp.path().join(".pds-yrs")).unwrap();
567567- std::fs::write(tmp.path().join(".pds-yrs/state.yrs"), "state").unwrap();
715715+ std::fs::create_dir_all(tmp.path().join(".yrs")).unwrap();
716716+ std::fs::write(tmp.path().join(".yrs/state.yrs"), "state").unwrap();
568717 std::fs::create_dir_all(tmp.path().join(".git")).unwrap();
569718 std::fs::write(tmp.path().join(".git/HEAD"), "ref").unwrap();
570719···652801 assert_eq!(entry.kind, FileKind::Binary);
653802 assert_eq!(entry.content, "abc123");
654803 assert!(entry.state_vector.is_empty());
655655- assert!(entry.updates_blob.is_none());
804804+ assert!(entry.updates.is_empty());
656805 }
657806658807 #[test]
+4-2
src/sync.rs
···6666 client: &PdsClient,
6767 did: &str,
6868 rkey: &str,
6969+ project_name: &str,
6970 config: &SyncConfig,
7071) -> Result<Vec<SyncCycleResult>, String> {
7172 let dir = Path::new(&config.dir);
···7980 }
8081 }
81828282- let result = sync_cycle(client, did, rkey, dir, cycle, config).await?;
8383+ let result = sync_cycle(client, did, rkey, project_name, dir, cycle, config).await?;
83848485 if config.verbose {
8586 eprintln!(
···113114 client: &PdsClient,
114115 did: &str,
115116 rkey: &str,
117117+ project_name: &str,
116118 dir: &Path,
117119 cycle: u32,
118120 config: &SyncConfig,
···157159 } else {
158160 // Save local changes with filters
159161 let save_result =
160160- save::save_filtered(dir, client, did, rkey, inc, exc, config.verbose).await?;
162162+ save::save_filtered(dir, client, did, rkey, project_name, None, inc, exc, config.verbose).await?;
161163 files_uploaded = save_result.files_uploaded;
162164163165 // Determine if this cycle should materialize
+37-17
src/types.rs
···11-//! AT Protocol types for CRDT-on-PDS storage.
11+//! ATProto types for CRDT-on-PDS storage.
2233use serde::{Deserialize, Serialize};
44use std::collections::HashMap;
5566-/// Collection name for site records.
77-pub const COLLECTION: &str = "net.commoninternet.lichen.site";
66+/// Collection name for yrs repo records.
77+pub const COLLECTION: &str = "net.commoninternet.yrsrepo";
8899-/// Key for the manifest FileEntry in the SiteRecord.
1010-pub const MANIFEST_KEY: &str = "_manifest";
99+/// Key for the manifest FileEntry in the YrsRepo.
1010+pub const MANIFEST_KEY: &str = "pdsyrs_manifest";
11111212/// File kind — determines how the file is stored and merged.
1313#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
···1919 Binary,
2020}
21212222-/// A site stored on PDS with Yrs CRDT state per file.
2222+/// A collaborator reference — points to another device's rkey, optionally on a different PDS.
2323+#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
2424+pub struct Collaborator {
2525+ /// The rkey of the collaborator's repo record.
2626+ pub rkey: String,
2727+ /// PDS URL if different from the current PDS (for cross-PDS collaboration).
2828+ #[serde(skip_serializing_if = "Option::is_none")]
2929+ pub pds: Option<String>,
3030+}
3131+3232+/// A repo stored on PDS with Yrs CRDT state per file.
3333+///
3434+/// `name` is the project name — shared across all devices/writers for the same project.
3535+/// Each device gets its own rkey (auto-generated), while `name` identifies the project.
3636+/// `collaborators` lists other device rkeys for the same project, enabling merge
3737+/// without needing to list all records.
2338#[derive(Debug, Clone, Serialize, Deserialize)]
2424-pub struct SiteRecord {
3939+pub struct YrsRepo {
2540 pub name: String,
2641 pub files: HashMap<String, FileEntry>,
2742 #[serde(rename = "updatedAt")]
2843 pub updated_at: String,
4444+ #[serde(default, skip_serializing_if = "Vec::is_empty")]
4545+ pub collaborators: Vec<Collaborator>,
2946}
30473148/// A single file's state, stored as Yrs CRDT + plain text.
···4057 /// State vector bytes, base64-encoded for inline storage.
4158 #[serde(rename = "stateVector")]
4259 pub state_vector: String,
4343- /// Incremental updates since snapshot.
4444- #[serde(rename = "updatesBlob", skip_serializing_if = "Option::is_none")]
4545- pub updates_blob: Option<BlobRef>,
6060+ /// Incremental update packs since snapshot (ordered, each points into a pack blob).
6161+ #[serde(default, skip_serializing_if = "Vec::is_empty")]
6262+ pub updates: Vec<PackRef>,
4663 /// Number of incremental updates applied since last snapshot.
4764 #[serde(rename = "updatesCount", default)]
4865 pub updates_count: u32,
···95112 pub total_size: u64,
96113}
971149898-/// AT Protocol blob reference.
115115+/// ATProto blob reference.
99116#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
100117pub struct BlobRef {
101118 #[serde(rename = "$type")]
···172189 100,
173190 ),
174191 state_vector: "AQID".to_string(),
175175- updates_blob: None,
192192+ updates: vec![],
176193 updates_count: 0,
177194 snapshot_at: "2026-03-13T00:00:00Z".to_string(),
178195 kind: FileKind::Text,
···199216 5000,
200217 ),
201218 state_vector: String::new(),
202202- updates_blob: None,
219219+ updates: vec![],
203220 updates_count: 0,
204221 snapshot_at: "2026-03-13T00:00:00Z".to_string(),
205222 kind: FileKind::Binary,
···213230 }
214231215232 #[test]
216216- fn site_record_serialization() {
233233+ fn yrs_repo_serialization() {
217234 let mut files = HashMap::new();
218235 files.insert(
219236 "index.md".to_string(),
···225242 50,
226243 ),
227244 state_vector: "AQID".to_string(),
228228- updates_blob: None,
245245+ updates: vec![],
229246 updates_count: 0,
230247 snapshot_at: "2026-03-13T00:00:00Z".to_string(),
231248 kind: FileKind::Text,
···233250 conflict_source: None,
234251 },
235252 );
236236- let record = SiteRecord {
253253+ let record = YrsRepo {
237254 name: "my-site".to_string(),
238255 files,
239256 updated_at: "2026-03-13T00:00:00Z".to_string(),
257257+ collaborators: vec![],
240258 };
241259 let json = serde_json::to_string(&record).unwrap();
242242- let deserialized: SiteRecord = serde_json::from_str(&json).unwrap();
260260+ assert!(!json.contains("collaborators")); // empty vec is skipped
261261+ let deserialized: YrsRepo = serde_json::from_str(&json).unwrap();
243262 assert_eq!(deserialized.name, "my-site");
244263 assert!(deserialized.files.contains_key("index.md"));
264264+ assert!(deserialized.collaborators.is_empty());
245265 }
246266247267 #[test]
+53-26
src/yrs_pds.rs
···
     let snapshot_blob = client.upload_blob(snapshot.clone()).await?;

     // We need to reference the blob in a record for it to persist,
-    // so we return the FileEntry which will be embedded in a SiteRecord.
+    // so we return the FileEntry which will be embedded in a YrsRepo.

     let now = chrono::Utc::now().to_rfc3339();
     let _ = did; // used by caller for the record
···
         content,
         snapshot_blob,
         state_vector: base64_encode(&sv),
-        updates_blob: None,
+        updates: vec![],
         updates_count: 0,
         snapshot_at: now,
         kind: FileKind::Text,
107107 kind: FileKind::Text,
···123123 let doc = doc_from_snapshot(&snapshot_data)?;
124124125125 // Apply incremental updates if present
126126- if let Some(ref updates_blob) = entry.updates_blob {
127127- let updates_data = client.get_blob(did, updates_blob.cid()).await?;
128128- apply_update(&doc, &updates_data)?;
126126+ for update_ref in &entry.updates {
127127+ let update_data = get_pack_ref_data(update_ref, client, did).await?;
128128+ apply_update(&doc, &update_data)?;
129129 }
130130131131 Ok(doc)
132132+}
133133+134134+/// Extract data from a PackRef by downloading and parsing the pack blob.
135135+pub async fn get_pack_ref_data(
136136+ pack_ref: &crate::types::PackRef,
137137+ client: &PdsClient,
138138+ did: &str,
139139+) -> Result<Vec<u8>, String> {
140140+ let pack_data = if let Some(ref chunks) = pack_ref.chunks {
141141+ let mut chunk_data = Vec::new();
142142+ for chunk_ref in chunks {
143143+ chunk_data.push(client.get_blob(did, chunk_ref.cid()).await?);
144144+ }
145145+ crate::pack::reassemble_chunks(&chunk_data)
146146+ } else {
147147+ client.get_blob(did, pack_ref.blob.cid()).await?
148148+ };
149149+ let (_, blob_data) = crate::pack::parse_pack_auto(&pack_data)?;
150150+ let start = pack_ref.offset as usize;
151151+ let end = start + pack_ref.length as usize;
152152+ if end > blob_data.len() {
153153+ return Err(format!(
154154+ "pack_ref out of bounds: {}..{} in {} bytes",
155155+ start, end, blob_data.len()
156156+ ));
157157+ }
158158+ Ok(blob_data[start..end].to_vec())
132159}
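The offset/length slicing at the end of `get_pack_ref_data` can be tried in isolation. The following is a minimal sketch, not code from this change: `Span` and `slice_span` are hypothetical names, the fields are assumed to be already converted to `usize`, and `checked_add` is swapped in for the plain `+` so that an oversized offset produces an error instead of overflowing.

```rust
/// Hypothetical stand-in for the `offset`/`length` pair carried by a PackRef.
/// The real field types are not shown in the diff; `usize` is an assumption.
struct Span {
    offset: usize,
    length: usize,
}

/// Copy the bytes covered by `span` out of `blob_data`.
/// Returns an error when the span overflows or runs past the end of the data.
fn slice_span(blob_data: &[u8], span: &Span) -> Result<Vec<u8>, String> {
    let start = span.offset;
    // checked_add returns None on overflow instead of panicking or wrapping.
    let end = start
        .checked_add(span.length)
        .ok_or_else(|| "offset + length overflows usize".to_string())?;
    // slice::get returns None when the range does not fit inside the slice.
    blob_data
        .get(start..end)
        .map(|bytes| bytes.to_vec())
        .ok_or_else(|| {
            format!(
                "span out of bounds: {}..{} in {} bytes",
                start,
                end,
                blob_data.len()
            )
        })
}
```

Calling `slice_span(&[10, 20, 30, 40, 50], &Span { offset: 1, length: 3 })` yields the three middle bytes, while a span that reaches past the fifth byte returns the out-of-bounds error.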

 /// Get the raw blob data for a FileEntry, handling pack_ref extraction.
···

     #[test]
     fn manifest_crdt_merge_concurrent_add() {
-        // Two sites start from same base
+        // Two repos start from same base
         let base = new_manifest_doc();
         manifest_insert(&base, "shared.md", &FileKind::Text);
         let base_snapshot = encode_manifest_snapshot(&base);

-        // Site A adds a file
-        let site_a = manifest_from_snapshot(&base_snapshot).unwrap();
-        manifest_insert(&site_a, "page-a.md", &FileKind::Text);
+        // Repo A adds a file
+        let repo_a = manifest_from_snapshot(&base_snapshot).unwrap();
+        manifest_insert(&repo_a, "page-a.md", &FileKind::Text);

-        // Site B adds a different file
-        let site_b = manifest_from_snapshot(&base_snapshot).unwrap();
-        manifest_insert(&site_b, "page-b.md", &FileKind::Text);
+        // Repo B adds a different file
+        let repo_b = manifest_from_snapshot(&base_snapshot).unwrap();
+        manifest_insert(&repo_b, "page-b.md", &FileKind::Text);

         // Merge B into A
-        let sv_a = site_a.transact().state_vector();
-        let diff_b = site_b.transact().encode_diff_v1(&sv_a);
+        let sv_a = repo_a.transact().state_vector();
+        let diff_b = repo_b.transact().encode_diff_v1(&sv_a);
         if let Ok(update) = yrs::Update::decode_v1(&diff_b) {
-            let _ = site_a.transact_mut().apply_update(update);
+            let _ = repo_a.transact_mut().apply_update(update);
         }

-        let entries = manifest_entries(&site_a);
+        let entries = manifest_entries(&repo_a);
         assert_eq!(entries.len(), 3);
         assert!(entries.contains_key("shared.md"));
         assert!(entries.contains_key("page-a.md"));
···
         manifest_insert(&base, "file.md", &FileKind::Text);
         let base_snapshot = encode_manifest_snapshot(&base);

-        // Site A deletes the file
-        let site_a = manifest_from_snapshot(&base_snapshot).unwrap();
-        manifest_remove(&site_a, "file.md");
+        // Repo A deletes the file
+        let repo_a = manifest_from_snapshot(&base_snapshot).unwrap();
+        manifest_remove(&repo_a, "file.md");

-        // Site B re-asserts the file (simulating an edit)
-        let site_b = manifest_from_snapshot(&base_snapshot).unwrap();
-        manifest_insert(&site_b, "file.md", &FileKind::Text);
+        // Repo B re-asserts the file (simulating an edit)
+        let repo_b = manifest_from_snapshot(&base_snapshot).unwrap();
+        manifest_insert(&repo_b, "file.md", &FileKind::Text);

         // Merge B into A — set should win over delete
-        let sv_a = site_a.transact().state_vector();
-        let diff_b = site_b.transact().encode_diff_v1(&sv_a);
+        let sv_a = repo_a.transact().state_vector();
+        let diff_b = repo_b.transact().encode_diff_v1(&sv_a);
         if let Ok(update) = yrs::Update::decode_v1(&diff_b) {
-            let _ = site_a.transact_mut().apply_update(update);
+            let _ = repo_a.transact_mut().apply_update(update);
         }

-        let entries = manifest_entries(&site_a);
+        let entries = manifest_entries(&repo_a);
         assert!(
             entries.contains_key("file.md"),
             "edit wins: set should win over delete"