wip: benchmarks for testing different p2p sync strategies using a pds as a relay

v6-refactor Benchmark Report

Date: 2026-03-14

What changed: pds-yrs was refactored from a single YrsRepo record to a two-record model (YrsRepo + YrsBranch). YrsRepo is the project-level registry (one per project, rkey = project name). YrsBranch is the per-device record holding the file index and blob refs. Merge discovery now reads YrsRepo to find branches instead of scanning via listRecords.

Environment: All benchmarks run against a real PDS (bluesky-pds.t1cc.commoninternet.net), sequentially to avoid network contention.
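
As a rough illustration of the two-record model described above, the sketch below shows one way the records could be shaped. Only the names YrsRepo and YrsBranch and their roles come from this report; every field name, the `BlobRef` type, and the `register_branch` helper are assumptions made for the example, not the actual pds-yrs schema.

```rust
use std::collections::BTreeMap;

/// Project-level registry: one record per project, rkey = project name.
#[derive(Debug, Clone, PartialEq)]
pub struct YrsRepo {
    /// Project name; doubles as the record key.
    pub project: String,
    /// Assumed field: rkeys of the YrsBranch records belonging to this project.
    pub branch_rkeys: Vec<String>,
}

/// Assumed stand-in for a PDS blob reference.
#[derive(Debug, Clone, PartialEq)]
pub struct BlobRef {
    pub cid: String,
    pub size: u64,
}

/// Per-device record: file index plus blob refs.
#[derive(Debug, Clone, PartialEq)]
pub struct YrsBranch {
    pub device: String,
    /// File index: path -> blob holding that file's state.
    pub files: BTreeMap<String, BlobRef>,
}

impl YrsRepo {
    pub fn new(project: &str) -> Self {
        YrsRepo {
            project: project.to_string(),
            branch_rkeys: Vec::new(),
        }
    }

    /// The record key is the project name, so lookup is deterministic.
    pub fn rkey(&self) -> &str {
        &self.project
    }

    /// Register a branch once; returns false if it was already listed.
    pub fn register_branch(&mut self, rkey: &str) -> bool {
        if self.branch_rkeys.iter().any(|k| k == rkey) {
            return false;
        }
        self.branch_rkeys.push(rkey.to_string());
        true
    }
}
```

Under these assumptions, a save from a device updates its own YrsBranch and makes sure the branch is listed in the project's YrsRepo, which is what lets merge discovery start from a single known record key.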


Summary

All 24 scenarios across 5 benchmark suites pass for all 4 strategies. The refactor has no regressions.

Key findings:

  • Strategy 4 (yrs-on-pds) is roughly 2.5-6x faster than git-based strategies for small-to-medium repos (10-200 files): about 4-6x at 10-50 files and 2.5-3x at 200 files with 2 collaborators
  • Strategy 4 is 2-3x slower in the 1000-file stress tests (25-26s vs 8-12s) due to per-file PDS record operations
  • CRDT strategies (2, 3, 4) produce zero conflicts in all scenarios, including guaranteed-conflict scenarios where git produces 10-20 conflicts and the 1000-file stress test where git produces up to 37
  • Strategies 1-3 are network-dominated: git push/clone via PDS takes ~7-12s regardless of file count, making local merge time negligible
  • File size impact: a 50x increase in file size (1KB to 50KB) roughly triples strategy 4 time (1.5s to 5.0s), while strategy 3 slows ~43% (7.2s to 10.3s) and strategy 2 ~16% (8.8s to 10.2s)

1. Default Matrix (9 scenarios)

2 collaborators, varying file count (10/50/200) and conflict rate (low/medium/high).

| Files | Conflict | 1-git (ms) | Git Conflicts | 2-diff (ms) | 3-sidecar (ms) | 4-yrs (ms) |
|---|---|---|---|---|---|---|
| 10 | low | 7,331 | 0 | 7,408 | 9,307 | 1,544 |
| 10 | medium | 6,384 | 1 | 7,468 | 8,372 | 1,658 |
| 10 | high | 6,744 | 3 | 8,322 | 7,036 | 1,748 |
| 50 | low | 8,144 | 0 | 7,558 | 7,748 | 1,966 |
| 50 | medium | 8,112 | 2 | 7,447 | 6,944 | 1,501 |
| 50 | high | 7,430 | 7 | 7,910 | 7,688 | 1,629 |
| 200 | low | 6,939 | 1 | 7,365 | 7,741 | 2,832 |
| 200 | medium | 8,077 | 5 | 7,519 | 8,375 | 3,004 |
| 200 | high | 6,940 | 15 | 7,871 | 7,809 | 2,666 |

Observations:

  • Strategy 4 averages 2.1s across all default scenarios vs 7.6s for strategies 1-3
  • Git conflicts scale with conflict rate and file count as expected (0-1 at low, up to 15 at high with 200 files)
  • Strategies 2 and 3 are nearly identical in performance (~7-9s), dominated by PDS network I/O
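
The averages quoted in these observations can be reproduced from the table with a few lines of arithmetic. The sketch below hard-codes the nine default-matrix rows (timing columns only, Git Conflicts omitted); the constant and function names are made up for the example and are not part of the benchmark harness.

```rust
/// Default-matrix timings in milliseconds, one row per scenario,
/// columns: [1-git, 2-diff, 3-sidecar, 4-yrs].
pub const DEFAULT_MATRIX_MS: [[u32; 4]; 9] = [
    [7331, 7408, 9307, 1544], // 10 files, low
    [6384, 7468, 8372, 1658], // 10 files, medium
    [6744, 8322, 7036, 1748], // 10 files, high
    [8144, 7558, 7748, 1966], // 50 files, low
    [8112, 7447, 6944, 1501], // 50 files, medium
    [7430, 7910, 7688, 1629], // 50 files, high
    [6939, 7365, 7741, 2832], // 200 files, low
    [8077, 7519, 8375, 3004], // 200 files, medium
    [6940, 7871, 7809, 2666], // 200 files, high
];

/// Mean time in seconds over the selected strategy columns (0-based).
/// Returns NaN if `rows` or `columns` is empty.
pub fn mean_secs(rows: &[[u32; 4]], columns: &[usize]) -> f64 {
    let mut total: u64 = 0;
    let mut count: u64 = 0;
    for row in rows {
        for &c in columns {
            total += u64::from(row[c]);
            count += 1;
        }
    }
    total as f64 / count as f64 / 1000.0
}
```

Calling `mean_secs(&DEFAULT_MATRIX_MS, &[3])` gives about 2.06s for strategy 4, and `mean_secs(&DEFAULT_MATRIX_MS, &[0, 1, 2])` gives about 7.63s for strategies 1-3 combined.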

2. Stress Test (3 scenarios)

1000 files, 2 collaborators, 50 edits each.

| Conflict | 1-git (ms) | Git Conflicts | 2-diff (ms) | 3-sidecar (ms) | 4-yrs (ms) |
|---|---|---|---|---|---|
| low | 8,192 | 2 | 9,252 | 9,031 | 24,791 |
| medium | 8,559 | 12 | 9,742 | 10,792 | 26,336 |
| high | 8,326 | 37 | 11,281 | 11,917 | 26,208 |

Observations:

  • Strategy 4 is ~3x slower at 1000 files (~25s vs ~9s) — each file requires individual PDS blob operations
  • Git strategies benefit from bundling all files in a single git push/clone
  • Conflict rate has minimal impact on strategy 4 time (24.8-26.3s): the per-file CRDT merge does the same work whether or not the edits overlap
  • Higher conflict rates do slow strategies 2-3 slightly (9s to 11-12s) as the merge driver processes more conflict regions

3. Guaranteed Conflict (3 scenarios)

Every edit touches the same lines. Maximum conflict scenario for git.

| Files | 1-git (ms) | Git Conflicts | 2-diff (ms) | 3-sidecar (ms) | 4-yrs (ms) |
|---|---|---|---|---|---|
| 10 | 6,630 | 10 | 7,727 | 8,582 | 2,036 |
| 50 | 7,863 | 20 | 8,339 | 8,736 | 1,866 |
| 200 | 7,518 | 20 | 9,099 | 9,114 | 3,204 |

Observations:

  • Git fails every scenario with 10-20 conflict files
  • All three CRDT strategies handle guaranteed conflicts with zero manual intervention
  • Strategy 4 remains fastest (2-3s vs 7-9s)
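
To see why a CRDT merge never reports a conflict even when both sides edit the same line, consider the toy example below. It is a deliberately simple last-writer-wins register per line, not the sequence CRDT that Yrs implements, and all names in it are invented for illustration. An LWW register discards the losing write, whereas a text CRDT would typically retain both edits; the only point here is that merging is deterministic and order-independent, so both replicas converge without manual resolution.

```rust
use std::collections::BTreeMap;

/// A last-writer-wins value; (logical timestamp, replica id) orders writes.
#[derive(Debug, Clone, PartialEq)]
pub struct Lww {
    pub stamp: (u64, u8),
    pub text: String,
}

/// Replica state: line number -> latest write seen for that line.
pub type Doc = BTreeMap<u32, Lww>;

/// Merge `other` into `doc`, keeping the write with the larger stamp per line.
pub fn merge(doc: &mut Doc, other: &Doc) {
    for (line, theirs) in other {
        let keep_ours = doc
            .get(line)
            .map_or(false, |ours| ours.stamp >= theirs.stamp);
        if !keep_ours {
            doc.insert(*line, theirs.clone());
        }
    }
}

/// Two replicas edit the same line concurrently, then merge in both directions.
pub fn same_line_demo() -> (Doc, Doc) {
    let mut a = Doc::new();
    let mut b = Doc::new();
    a.insert(1, Lww { stamp: (1, 0), text: "edit from device A".to_string() });
    b.insert(1, Lww { stamp: (1, 1), text: "edit from device B".to_string() });
    let (a0, b0) = (a.clone(), b.clone());
    merge(&mut a, &b0);
    merge(&mut b, &a0);
    (a, b)
}
```

Both documents returned by `same_line_demo()` are equal: the tie on the timestamp is broken by the replica id, so each side independently picks the same winner.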

4. File Size Sweep (3 scenarios)

50 files, medium conflict, varying file size (1KB/10KB/50KB).

| Avg Size | 1-git (ms) | Git Conflicts | 2-diff (ms) | 3-sidecar (ms) | 4-yrs (ms) |
|---|---|---|---|---|---|
| 1 KB | 7,750 | 2 | 8,829 | 7,174 | 1,532 |
| 10 KB | 7,107 | 2 | 7,381 | 7,624 | 2,372 |
| 50 KB | 7,395 | 2 | 10,216 | 10,252 | 4,986 |

Observations:

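  • Strategy 4 time grows with file size, but less than proportionally: 1.5s (1KB) → 2.4s (10KB) → 5.0s (50KB), about 65-95 ms per additional KB of average file size on top of a ~1.5s base
  • Strategy 1 (git) is essentially unaffected by file size (7.1-7.8s); network overhead dominates
  • At 50KB, strategy 3 slows ~43% relative to 1KB (10.3s vs 7.2s) and strategy 2 ~16% (10.2s vs 8.8s), due to larger CRDT state vectors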
  • Even at 50KB, strategy 4 is still 2x faster than strategies 2-3
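
The file-size scaling of strategy 4 can be checked directly from the three sweep points. The sketch below computes the extra time per additional KB between consecutive points and the overall slowdown from 1KB to 50KB; the names are illustrative only.

```rust
/// (average file size in KB, strategy 4 time in ms) from the sweep table.
pub const YRS_SWEEP: [(f64, f64); 3] = [(1.0, 1532.0), (10.0, 2372.0), (50.0, 4986.0)];

/// Extra milliseconds per additional KB between consecutive sweep points.
pub fn marginal_ms_per_kb(points: &[(f64, f64)]) -> Vec<f64> {
    points
        .windows(2)
        .map(|w| (w[1].1 - w[0].1) / (w[1].0 - w[0].0))
        .collect()
}

/// Ratio of the last measured time to the first; None for an empty slice.
pub fn slowdown(points: &[(f64, f64)]) -> Option<f64> {
    Some(points.last()?.1 / points.first()?.1)
}
```

For `YRS_SWEEP` this yields roughly 93 ms/KB between 1KB and 10KB, roughly 65 ms/KB between 10KB and 50KB, and an overall slowdown of about 3.25x for a 50x increase in file size.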

5. Multi-Collaborator (6 scenarios)

5 collaborators, 50 or 200 files.

| Files | Conflict | 1-git (ms) | Git Conflicts | 2-diff (ms) | 3-sidecar (ms) | 4-yrs (ms) |
|---|---|---|---|---|---|---|
| 50 | low | 10,118 | 0 | 9,741 | 10,629 | 1,718 |
| 200 | low | 11,324 | 1 | 11,433 | 10,537 | 3,366 |
| 50 | medium | 10,142 | 2 | 10,741 | 10,291 | 1,964 |
| 200 | medium | 11,384 | 5 | 12,888 | 12,443 | 3,297 |
| 50 | high | 10,967 | 7 | 12,657 | 11,930 | 2,080 |
| 200 | high | 12,031 | 15 | 12,329 | 14,769 | 3,331 |

Observations:

  • 5 collaborators add ~3s to git strategies vs 2-collaborator scenarios (more branches to push/merge)
  • Strategy 4 time is nearly independent of collaborator count (~2s for 50 files and ~3s for 200 files with either 2 or 5 collaborators)
  • The multi-collab advantage is strategy 4's strongest differentiator: 5-6x faster at 50 files, 3-4x faster at 200 files
  • Git conflict count stays the same as 2-collaborator scenarios (conflicts are pairwise, and the merge is sequential)
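
The speedup ranges quoted for the multi-collaborator runs follow from dividing each git-based timing by the strategy 4 timing in the same row. A small sketch, with illustrative names and the table rows hard-coded:

```rust
/// 5-collaborator timings in ms, columns: [1-git, 2-diff, 3-sidecar, 4-yrs].
pub const FIVE_COLLAB_50: [[u32; 4]; 3] = [
    [10118, 9741, 10629, 1718],  // low
    [10142, 10741, 10291, 1964], // medium
    [10967, 12657, 11930, 2080], // high
];
pub const FIVE_COLLAB_200: [[u32; 4]; 3] = [
    [11324, 11433, 10537, 3366], // low
    [11384, 12888, 12443, 3297], // medium
    [12031, 12329, 14769, 3331], // high
];

/// Smallest and largest speedup of strategy 4 over strategies 1-3.
pub fn speedup_range(rows: &[[u32; 4]]) -> (f64, f64) {
    let mut lo = f64::INFINITY;
    let mut hi = 0.0_f64;
    for row in rows {
        let yrs = f64::from(row[3]);
        for &t in &row[..3] {
            let ratio = f64::from(t) / yrs;
            lo = lo.min(ratio);
            hi = hi.max(ratio);
        }
    }
    (lo, hi)
}
```

On these rows the range is about 5.2x to 6.2x at 50 files and about 3.1x to 4.4x at 200 files.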

Comparison with v5 (pre-refactor)

The refactor (YrsRepo/YrsBranch split) adds one extra PDS API call per save (to create/update the YrsRepo registry record). This adds negligible overhead — strategy 4 times are within normal variance of v5 results.

The main functional improvement is in merge discovery: merge_project() now does a single getRecord on the project's YrsRepo instead of scanning with listRecords. This is both faster and more reliable.
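
The difference between the two discovery paths can be sketched against an in-memory stand-in for the PDS. Everything below is hypothetical: `MockPds`, its `get_record` and `list_records` methods, the `Record` enum, and the rkey naming scheme are simplifications invented for the example and do not mirror the real XRPC signatures or the pds-yrs code. The sketch only counts how many records each approach has to read.

```rust
use std::collections::BTreeMap;

/// Simplified record bodies (assumed shapes, not the real lexicons).
#[derive(Debug, Clone, PartialEq)]
pub enum Record {
    Repo { branches: Vec<String> },
    Branch { project: String },
}

/// Hypothetical in-memory PDS: rkey -> record, plus a read counter.
pub struct MockPds {
    records: BTreeMap<String, Record>,
    pub records_read: usize,
}

impl MockPds {
    /// Fetch one record by key (stand-in for getRecord).
    pub fn get_record(&mut self, rkey: &str) -> Option<Record> {
        self.records_read += 1;
        self.records.get(rkey).cloned()
    }

    /// Return every record (stand-in for paging through listRecords).
    pub fn list_records(&mut self) -> Vec<(String, Record)> {
        self.records_read += self.records.len();
        self.records.iter().map(|(k, v)| (k.clone(), v.clone())).collect()
    }
}

/// Pre-refactor style: scan all records and keep this project's branches.
pub fn discover_by_scan(pds: &mut MockPds, project: &str) -> Vec<String> {
    pds.list_records()
        .into_iter()
        .filter(|(_, rec)| matches!(rec, Record::Branch { project: p } if p.as_str() == project))
        .map(|(rkey, _)| rkey)
        .collect()
}

/// Post-refactor style: one lookup of the registry keyed by project name.
pub fn discover_by_registry(pds: &mut MockPds, project: &str) -> Vec<String> {
    match pds.get_record(project) {
        Some(Record::Repo { branches }) => branches,
        _ => Vec::new(),
    }
}

/// Two projects, three branches, five records in total.
pub fn sample_pds() -> MockPds {
    let mut records = BTreeMap::new();
    let repo = |b: &[&str]| Record::Repo { branches: b.iter().map(|s| s.to_string()).collect() };
    let branch = |p: &str| Record::Branch { project: p.to_string() };
    records.insert("notes".to_string(), repo(&["notes-laptop", "notes-phone"]));
    records.insert("notes-laptop".to_string(), branch("notes"));
    records.insert("notes-phone".to_string(), branch("notes"));
    records.insert("blog".to_string(), repo(&["blog-laptop"]));
    records.insert("blog-laptop".to_string(), branch("blog"));
    MockPds { records, records_read: 0 }
}
```

With the sample data, both functions return the same two branch keys for "notes", but the scan reads all five records while the registry lookup reads one. The scan cost grows with everything stored in the account, whereas the registry lookup stays at a single read per project.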

Conclusions

  1. For repos of up to 200 files, strategy 4 (yrs-on-pds) is the clear winner: roughly 2.5-6x faster than any git-based approach
  2. For 1000+ file repos, git-based strategies are faster due to bundled transport — strategy 4 would need batch upload/download to compete
  3. CRDT merge eliminates all conflicts across all scenarios, including guaranteed-conflict cases
  4. Strategies 2 and 3 perform nearly identically — the sidecar approach adds no measurable overhead over diff-only
  5. The v6 refactor has no performance regressions — the cleaner architecture (deterministic project lookup, no listRecords scanning) maintains the same performance profile