# Lanes and adaptive concurrency

Attic backs up two very different kinds of photo assets in parallel: ones that are already on disk, and ones that still live on Apple's servers. Mixing them in a single pool is a compromise that loses to both extremes, so the exporter partitions each batch into two **lanes** and runs each at a concurrency limit suited to its behavior.

## The two kinds of assets

With **"Optimize Mac Storage"** enabled, Photos.app keeps only recent or frequently-accessed originals on disk. Everything else is a thumbnail-only placeholder whose original lives in iCloud.

### Local assets

The original file is on disk in the Photos library bundle. Exporting one is essentially a file copy + inline SHA-256 hash.

- **Fast** — hundreds of MB/s, limited by disk and CPU.
- **Predictable** — no network in the loop.
- **Few failure modes** — "disk full" and "permission denied" are about it.
- **Parallelism helps** — more concurrency, more throughput, up to disk saturation.

### iCloud-only assets

Only the thumbnail is on disk. Exporting means asking Photos.app to pull the original from iCloud, which involves auth, a round-trip to cold storage, and a write back to local disk.

- **Slow** — network-bound, often seconds per asset.
- **Heavily throttled** — iCloud rate-limits hard if you ask for too many at once.
- **Many transient failure modes** — timeouts, `-1712` AppleEvent timeouts, 503s from iCloud, rate-limit waits.
- **Some permanent failures** — shared-album derivatives that have gone missing server-side raise `-1728 "Can't get media item"`. Retrying doesn't help.
- **Parallelism past a point hurts** — iCloud starts queuing you, total throughput drops.

## Why a single pool is wrong

If you run one pool at a single concurrency limit, you're stuck picking a number that's wrong for one side:

| Pool size | Local lane | iCloud lane |
|-----------|------------|-------------|
| 16 | great | throttled into the ground |
| 2 | drip-feed | safe but slow |

Neither number is right. Splitting the lanes lets each run at its own pace.

## How the split is decided

LadderKit exposes a `LocalAvailabilityProviding` protocol that answers "is this asset's original on disk?" The real implementation, `PhotosDatabaseLocalAvailability`, reads one column from Photos.sqlite:

```
ZINTERNALRESOURCE.ZLOCALAVAILABILITY = 1
```

This is the same flag Photos.app itself uses to decide whether to show the little download-cloud icon in the UI. It's cheap to read and doesn't touch PhotoKit.

At batch time, `PhotoExporter` partitions every UUID:

- `ZLOCALAVAILABILITY = 1` → **local lane**
- everything else → **iCloud lane**

```
        ┌── local lane ───→ concurrency = maxConcurrency (e.g. 16)
batch ──┤
        └── iCloud lane ──→ concurrency = AIMDController.currentLimit()
                            (adapts: 1-12, starts at 6)
```

Each lane has its own `TaskGroup`, its own concurrency cap, and — critically — its own feedback signals. Congestion on the iCloud side doesn't slow the local side down.
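The partition step itself is small. Here's a minimal sketch, assuming `LocalAvailabilityProviding` exposes a single yes/no query; the member name `isOriginalLocal(_:)`, the `Lanes` struct, and the free function are illustrative, not LadderKit's actual API:

```swift
import Foundation

// Answers "is this asset's original on disk?". The real implementation,
// PhotosDatabaseLocalAvailability, reads ZINTERNALRESOURCE.ZLOCALAVAILABILITY
// from Photos.sqlite; the member name here is a placeholder.
protocol LocalAvailabilityProviding {
    func isOriginalLocal(_ assetUUID: UUID) -> Bool
}

// Illustrative container for the two lanes.
struct Lanes {
    var local: [UUID] = []
    var iCloud: [UUID] = []
}

// Locally-available originals go to the local lane; everything else
// (thumbnail-only placeholders) goes to the iCloud lane.
func partition(_ batch: [UUID], using availability: LocalAvailabilityProviding) -> Lanes {
    var lanes = Lanes()
    for uuid in batch {
        if availability.isOriginalLocal(uuid) {
            lanes.local.append(uuid)
        } else {
            lanes.iCloud.append(uuid)
        }
    }
    return lanes
}
```

From there, each array is dispatched through its own `TaskGroup`: the local lane capped at `maxConcurrency`, the iCloud lane capped at whatever `currentLimit()` returns at dispatch time.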
## The iCloud lane is adaptive

Because iCloud's tolerance shifts minute to minute, picking a fixed iCloud concurrency is also wrong. The iCloud lane is gated by an **AIMD controller** (attic's `AIMDController`, implementing LadderKit's `AdaptiveConcurrencyControlling` protocol).

**AIMD** (Additive Increase, Multiplicative Decrease) is the congestion-control policy TCP uses. The asymmetry is the point: recover cautiously, back off hard.

- The controller keeps a **sliding window of the last 20 outcomes**.
- **Transient failure rate > 30%** → halve the limit (floor at `minLimit`).
- **Transient failure rate ≤ 5%** → grow the limit by 1 (cap at `maxLimit`).
- **Window clears on every limit change** — prevents stale pre-change outcomes from immediately re-tripping the new limit.

The exporter polls `currentLimit()` between dispatches and reports each `ExportOutcome` (`.success`, `.transientFailure`, `.permanentFailure`) via `record(_:)`. The controller is observation-only — it doesn't hold permits or gate dispatch directly; it just publishes a number the exporter reads.

### Why permanent failures don't affect the limit

A batch full of `-1728` shared-album tombstones isn't a lane-health signal — the lane is fine, those assets just don't exist anymore. Reporting them as transient failures would permanently pin the lane at `minLimit`.

LadderKit classifies each export error as `.other`, `.transientCloud`, or `.permanentlyUnavailable`, and reports `.permanentFailure` to the controller for the last category. The controller **ignores `.permanentFailure`** entirely — it doesn't enter the window and doesn't count toward the rate. Attic also records permanent-unavailable UUIDs in `unavailable-assets.json` so they're skipped forever on future runs.

## What this looks like in practice

On a mixed Optimize-Storage library:

```
Backup started — 2,431 assets pending
Local lane:  1,804 assets → running at 16 concurrent
iCloud lane:   627 assets → running at 6 concurrent (adaptive)
[...]
iCloud lane throttling — limit 6 → 3
iCloud lane recovering — limit 3 → 4
iCloud lane recovering — limit 4 → 5
```

- The local lane blasts through cached originals in parallel.
- The iCloud lane ticks along at whatever rate iCloud currently tolerates.
- Failures in one lane don't slow the other down.
- Permanent failures (tombstones) are skipped, not retried, and don't affect concurrency tuning.

## Where this lives in the code

| Layer | Type | Where |
|---|---|---|
| Local/iCloud split | `LocalAvailabilityProviding`, `PhotosDatabaseLocalAvailability` | LadderKit |
| Per-lane dispatch | `PhotoExporter` | LadderKit |
| Controller protocol | `AdaptiveConcurrencyControlling`, `ExportOutcome` | LadderKit |
| Error classification | `ExportClassification` | LadderKit |
| AIMD policy | `AIMDController` | `Sources/AtticCore/AIMDController.swift` |
| Permanent-unavailable store | `UnavailableStore` | `Sources/AtticCore/UnavailableAssets.swift` |

LadderKit supplies the **mechanism** (partitioning, protocol, outcome reporting). AtticCore supplies the **policy** (the actual AIMD controller implementation, the unavailable store, the backup pipeline). The two responsibilities are cleanly separated so a different caller could plug in a different controller (EWMA, PID, token bucket) without touching ladder.
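To make that mechanism/policy split concrete, here is a minimal sketch of a window-based AIMD policy behind a protocol shaped like `AdaptiveConcurrencyControlling`. The window size, thresholds, and clear-on-change behavior follow the description above; the protocol members, the adjust-only-on-a-full-window rule, and the default limits are assumptions, not Attic's exact implementation:

```swift
enum ExportOutcome {
    case success, transientFailure, permanentFailure
}

// Assumed protocol shape: publish a limit, accept outcome reports.
protocol AdaptiveConcurrencyControlling {
    func currentLimit() -> Int
    func record(_ outcome: ExportOutcome)
}

// Additive increase, multiplicative decrease over a sliding outcome window.
// Observation-only: it publishes a number; the exporter does the gating.
final class AIMDController: AdaptiveConcurrencyControlling {
    private let minLimit: Int
    private let maxLimit: Int
    private let windowSize = 20
    private var window: [ExportOutcome] = []
    private var limit: Int

    init(initialLimit: Int = 6, minLimit: Int = 1, maxLimit: Int = 12) {
        self.limit = initialLimit
        self.minLimit = minLimit
        self.maxLimit = maxLimit
    }

    func currentLimit() -> Int { limit }

    func record(_ outcome: ExportOutcome) {
        // Permanent failures (tombstones) say nothing about lane health; ignore them.
        guard outcome != .permanentFailure else { return }

        window.append(outcome)
        if window.count > windowSize { window.removeFirst() }
        // Simplifying assumption in this sketch: only adjust once a full window exists.
        guard window.count == windowSize else { return }

        let failures = window.filter { $0 == .transientFailure }.count
        let failureRate = Double(failures) / Double(windowSize)

        if failureRate > 0.30 {
            limit = max(minLimit, limit / 2)   // multiplicative decrease: back off hard
            window.removeAll()                 // clear so stale outcomes can't re-trip the new limit
        } else if failureRate <= 0.05 {
            limit = min(maxLimit, limit + 1)   // additive increase: recover cautiously
            window.removeAll()
        }
    }
}
```

A real controller would also need synchronization (an actor, say), since outcomes arrive from concurrent export tasks. Swapping in an EWMA or token-bucket policy would mean replacing only this type; the lane partitioning and outcome reporting stay untouched.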