in separate lanes. The iCloud lane uses an AIMD controller (attic's
`AIMDController` implementing LadderKit's `AdaptiveConcurrencyControlling`)
to back off when Photos.app or iCloud pushes back, and to ramp up on a
clean lane. See [Lanes and adaptive concurrency](docs/lanes-and-adaptive-concurrency.md)
for details.
- **Retry queue** — transient failures are remembered on S3
  (`retry-queue.json`) and retried first on the next run, carrying
  attempts/first-seen/last-message for each UUID.
- [Architecture](docs/architecture.md) — How attic works: the backup pipeline,
  photo library access, manifest lifecycle, and design boundaries
- [Lanes and adaptive concurrency](docs/lanes-and-adaptive-concurrency.md) —
  Why attic splits exports into local and iCloud lanes, and how the AIMD
  controller adapts to iCloud throttling
- [Asset Metadata](docs/metadata.md) — Schema reference for the per-asset JSON
  uploaded to S3
docs/lanes-and-adaptive-concurrency.md

# Lanes and adaptive concurrency

Attic backs up two very different kinds of photo assets in parallel: ones that
are already on disk, and ones that still live on Apple's servers. Mixing them
in a single pool means picking one concurrency limit that is wrong for at
least one of them, so the exporter partitions each batch into two **lanes**
and runs each at a limit suited to its behavior.

## The two kinds of assets

With **"Optimize Mac Storage"** enabled, Photos.app keeps only recent or
frequently-accessed originals on disk. Everything else is a thumbnail-only
placeholder whose original lives in iCloud.

### Local assets

The original file is on disk in the Photos library bundle. Exporting one is
essentially a file copy + inline SHA-256 hash.

- **Fast** — hundreds of MB/s, limited by disk and CPU.
- **Predictable** — no network in the loop.
- **Few failure modes** — "disk full" and "permission denied" are about it.
- **Parallelism helps** — more concurrency, more throughput, up to disk
  saturation.
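
The local fast path is simple enough to sketch. The code below is a minimal
illustration, not attic's actual exporter (`exportLocalOriginal` and the 1 MB
chunk size are assumptions): the copy and the SHA-256 share one streaming
pass, which is why the lane is disk-bound rather than CPU-bound.

```swift
import CryptoKit
import Foundation

// Minimal sketch (not attic's real code): stream the original to the
// destination while hashing the same bytes, so a local export costs one
// read pass and one write pass.
func exportLocalOriginal(from source: URL, to destination: URL) throws -> String {
    var hasher = SHA256()
    let input = try FileHandle(forReadingFrom: source)
    defer { try? input.close() }
    FileManager.default.createFile(atPath: destination.path, contents: nil)
    let output = try FileHandle(forWritingTo: destination)
    defer { try? output.close() }

    // Read in 1 MB chunks; hash inline so there is no second read pass.
    while let chunk = try input.read(upToCount: 1 << 20), !chunk.isEmpty {
        hasher.update(data: chunk)
        try output.write(contentsOf: chunk)
    }
    // Hex-encode the digest for the manifest.
    return hasher.finalize().map { String(format: "%02x", $0) }.joined()
}
```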

### iCloud-only assets

Only the thumbnail is on disk. Exporting means asking Photos.app to pull the
original from iCloud, which involves auth, a round-trip to cold storage, and a
write back to local disk.

- **Slow** — network-bound, often seconds per asset.
- **Heavily throttled** — iCloud rate-limits hard if you ask for too many at
  once.
- **Many transient failure modes** — network timeouts, `-1712` AppleEvent
  timeouts, 503s from iCloud, rate-limit waits.
- **Some permanent failures** — shared-album derivatives that have gone
  missing server-side raise `-1728 "Can't get media item"`. Retrying doesn't
  help.
- **Parallelism past a point hurts** — iCloud starts queuing you, and total
  throughput drops.

## Why a single pool is wrong

If you run one pool at a single concurrency limit, you're stuck picking a
number that's wrong for one side:

| Pool size | Local lane | iCloud lane |
|-----------|------------|-------------|
| 16 | great | throttled into the ground |
| 2 | drip-feed | safe but slow |

Neither number is right. Splitting the lanes lets each run at its own pace.

## How the split is decided

LadderKit exposes a `LocalAvailabilityProviding` protocol that answers "is
this asset's original on disk?" The real implementation,
`PhotosDatabaseLocalAvailability`, reads one column from Photos.sqlite:

```
ZINTERNALRESOURCE.ZLOCALAVAILABILITY = 1
```

This is the same flag Photos.app itself uses to decide whether to show the
little download-cloud icon in the UI. It's cheap to read and doesn't touch
PhotoKit.
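
A check in that spirit can be sketched against SQLite directly. This is an
illustration, not `PhotosDatabaseLocalAvailability` itself: the join below is
an assumption (the real Photos.sqlite schema ties `ZINTERNALRESOURCE` rows to
their owning asset, but column details vary across Photos versions), and
`locallyAvailableUUIDs` is a hypothetical name.

```swift
import Foundation
import SQLite3

// Illustrative sketch: collect the UUIDs of assets whose original is on
// disk, by reading the same ZLOCALAVAILABILITY flag Photos.app uses.
// The ZASSET/ZINTERNALRESOURCE join is assumed for illustration.
func locallyAvailableUUIDs(databasePath: String) -> Set<String> {
    var db: OpaquePointer?
    guard sqlite3_open_v2(databasePath, &db, SQLITE_OPEN_READONLY, nil) == SQLITE_OK else {
        return []
    }
    defer { sqlite3_close(db) }

    let sql = """
        SELECT ZASSET.ZUUID
        FROM ZASSET
        JOIN ZINTERNALRESOURCE ON ZINTERNALRESOURCE.ZASSET = ZASSET.Z_PK
        WHERE ZINTERNALRESOURCE.ZLOCALAVAILABILITY = 1
        """
    var statement: OpaquePointer?
    guard sqlite3_prepare_v2(db, sql, -1, &statement, nil) == SQLITE_OK else { return [] }
    defer { sqlite3_finalize(statement) }

    var uuids = Set<String>()
    while sqlite3_step(statement) == SQLITE_ROW {
        if let text = sqlite3_column_text(statement, 0) {
            uuids.insert(String(cString: text))
        }
    }
    return uuids
}
```

Opening read-only matters: the exporter should never take a write lock on a
database Photos.app owns.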

At batch time, `PhotoExporter` partitions every UUID:

- `ZLOCALAVAILABILITY = 1` → **local lane**
- everything else → **iCloud lane**

```
          ┌── local lane ──→ concurrency = maxConcurrency (e.g. 16)
batch ────┤
          └── iCloud lane ──→ concurrency = AIMDController.currentLimit()
                              (adapts: 1-12, starts at 6)
```

Each lane has its own `TaskGroup`, its own concurrency cap, and — critically —
its own feedback signals. Congestion on the iCloud side doesn't slow the local
side down.
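
The per-lane dispatch shape can be sketched with structured concurrency. The
names here (`runLane`, `runBatch`, the `export` closure) are illustrative
rather than LadderKit's real API; the point is that each lane owns its task
group and its cap, and the two lanes run side by side.

```swift
import Foundation

// Illustrative sketch of per-lane dispatch (not attic's real code):
// keep at most `limit` exports in flight, starting a new one as each
// finishes, inside a single task group.
func runLane(uuids: [String],
             limit: Int,
             export: @escaping @Sendable (String) async -> Void) async {
    await withTaskGroup(of: Void.self) { group in
        var pending = uuids.makeIterator()
        var inFlight = 0
        // Seed up to `limit` tasks.
        while inFlight < limit, let uuid = pending.next() {
            group.addTask { await export(uuid) }
            inFlight += 1
        }
        // One new task per completion keeps in-flight count at the cap.
        while await group.next() != nil {
            if let uuid = pending.next() {
                group.addTask { await export(uuid) }
            }
        }
    }
}

// Run both lanes concurrently; neither waits on the other.
func runBatch(local: [String], cloud: [String], cloudLimit: Int,
              export: @escaping @Sendable (String) async -> Void) async {
    async let localLane: Void = runLane(uuids: local, limit: 16, export: export)
    async let cloudLane: Void = runLane(uuids: cloud, limit: cloudLimit, export: export)
    _ = await localLane
    _ = await cloudLane
}
```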

## The iCloud lane is adaptive

Because iCloud's tolerance shifts minute to minute, picking a fixed iCloud
concurrency is also wrong. The iCloud lane is gated by an **AIMD controller**
(attic's `AIMDController`, implementing LadderKit's
`AdaptiveConcurrencyControlling` protocol).

**AIMD** (Additive Increase, Multiplicative Decrease) is the congestion-control
policy TCP uses. The asymmetry is the point: recover cautiously, back off
hard.

- The controller keeps a **sliding window of the last 20 outcomes**.
- **Transient failure rate > 30%** → halve the limit (floor at `minLimit`).
- **Transient failure rate ≤ 5%** → grow the limit by 1 (cap at `maxLimit`).
- **Window clears on every limit change** — prevents stale pre-change
  outcomes from immediately re-tripping the new limit.
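
The policy is small enough to sketch in full. This is an illustrative
reimplementation, not the real `AIMDController` (whose API and exact
evaluation rules may differ); in particular, adjusting only once the window is
full is an assumption here. The `ExportOutcome` cases are the ones named
below, redeclared so the snippet stands alone.

```swift
enum ExportOutcome { case success, transientFailure, permanentFailure }

// Illustrative AIMD sketch: additive increase, multiplicative decrease,
// driven by the transient-failure rate over a sliding 20-outcome window.
final class AIMDSketch {
    private var limit: Int
    private let minLimit: Int
    private let maxLimit: Int
    private var window: [ExportOutcome] = []
    private let windowSize = 20

    init(initial: Int = 6, minLimit: Int = 1, maxLimit: Int = 12) {
        self.limit = initial
        self.minLimit = minLimit
        self.maxLimit = maxLimit
    }

    func currentLimit() -> Int { limit }

    func record(_ outcome: ExportOutcome) {
        // Permanent failures are not a lane-health signal: they never
        // enter the window and never count toward the rate.
        guard outcome != .permanentFailure else { return }
        window.append(outcome)
        if window.count > windowSize { window.removeFirst() }
        // Assumption: only adjust once a full window has accumulated.
        guard window.count == windowSize else { return }
        let failures = window.filter { $0 == .transientFailure }.count
        let rate = Double(failures) / Double(windowSize)
        if rate > 0.30 {
            limit = max(minLimit, limit / 2)   // multiplicative decrease
            window.removeAll()                 // clear on every limit change
        } else if rate <= 0.05 {
            limit = min(maxLimit, limit + 1)   // additive increase
            window.removeAll()
        }
    }
}
```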

The exporter polls `currentLimit()` between dispatches and reports each
`ExportOutcome` (`.success`, `.transientFailure`, `.permanentFailure`) via
`record(_:)`. The controller is observation-only — it doesn't hold permits or
gate dispatch directly; it just publishes a number the exporter reads.

### Why permanent failures don't affect the limit

A batch full of `-1728` shared-album tombstones isn't a lane-health signal —
the lane is fine; those assets just don't exist anymore. Reporting them as
transient failures would permanently pin the lane at `minLimit`.

Ladder classifies each export error as `.other`, `.transientCloud`, or
`.permanentlyUnavailable` and reports `.permanentFailure` to the controller
for the last category. The controller **ignores `.permanentFailure`** entirely
— it doesn't enter the window and doesn't count toward the rate. Attic also
records permanently-unavailable UUIDs in `unavailable-assets.json` so they're
skipped forever on future runs.
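
The classification step can be sketched as a simple mapping. The error codes
and case names come from this document, but the function shapes
(`classify(appleEventError:)`, `outcome(for:)`) and the `.other` handling are
assumptions for illustration, not Ladder's real `ExportClassification` API.

```swift
// Illustrative types, redeclared here so the snippet stands alone.
enum ExportClassification { case other, transientCloud, permanentlyUnavailable }
enum ExportOutcome { case success, transientFailure, permanentFailure }

// Map AppleEvent error codes to a classification.
func classify(appleEventError code: Int) -> ExportClassification {
    switch code {
    case -1728: return .permanentlyUnavailable // "Can't get media item": server-side tombstone
    case -1712: return .transientCloud         // AppleEvent timeout: worth retrying
    default:    return .other
    }
}

// Only transient trouble should tune concurrency; permanent failures are
// reported but ignored by the controller.
func outcome(for classification: ExportClassification) -> ExportOutcome {
    switch classification {
    case .transientCloud:         return .transientFailure
    case .permanentlyUnavailable: return .permanentFailure
    case .other:                  return .transientFailure // conservative default (assumption)
    }
}
```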

## What this looks like in practice

On a mixed Optimize-Storage library:

```
Backup started — 2,431 assets pending
  Local lane:  1,804 assets → running at 16 concurrent
  iCloud lane:   627 assets → running at 6 concurrent (adaptive)

[...]

  iCloud lane throttling — limit 6 → 3
  iCloud lane recovering — limit 3 → 4
  iCloud lane recovering — limit 4 → 5
```

- The local lane blasts through cached originals in parallel.
- The iCloud lane ticks along at whatever rate iCloud currently tolerates.
- Failures in one lane don't slow the other down.
- Permanent failures (tombstones) are skipped, not retried, and don't affect
  concurrency tuning.

## Where this lives in the code

| Layer | Type | Where |
|---|---|---|
| Local/iCloud split | `LocalAvailabilityProviding`, `PhotosDatabaseLocalAvailability` | LadderKit |
| Per-lane dispatch | `PhotoExporter` | LadderKit |
| Controller protocol | `AdaptiveConcurrencyControlling`, `ExportOutcome` | LadderKit |
| Error classification | `ExportClassification` | LadderKit |
| AIMD policy | `AIMDController` | `Sources/AtticCore/AIMDController.swift` |
| Permanent-unavailable store | `UnavailableStore` | `Sources/AtticCore/UnavailableAssets.swift` |

LadderKit supplies the **mechanism** (partitioning, the protocol, outcome
reporting). AtticCore supplies the **policy** (the actual AIMD controller
implementation, the unavailable store, the backup pipeline). The two
responsibilities are cleanly separated, so a different caller could plug in a
different controller (EWMA, PID, token bucket) without touching Ladder.