A container registry that uses the AT Protocol for manifest storage and S3 for blob storage.
# Config Blob Storage Decision

## Background

OCI image manifests reference two types of blobs:

1. **Layers**: filesystem diffs (tar+gzip), typically large, content-addressed and shared across users
2. **Config blob**: small JSON (~2-15KB) containing image metadata (architecture, OS, environment variables, entrypoint, Dockerfile build history, and labels)

In ATCR, manifests are stored in the user's PDS while all blobs (layers and config) are stored in S3 via the hold service. The hold tracks layers with `io.atcr.hold.layer` records but has no equivalent tracking for config blobs.

## Considered: Storing Config Blobs in PDS

Config blobs are unique per image build. Unlike layers, which are deduplicated across users, a config blob contains the specific Dockerfile history, env vars, and labels for that build. This makes them conceptually "user data" that could belong in the user's PDS alongside the manifest.

The proposal was to add a `ConfigBlob` field to `ManifestRecord`, uploading the config blob to PDS during push (the data is already fetched from S3 for label extraction). The config would remain in S3 as well since the distribution library puts it there during the blob push phase.

Potential benefits:
- Manifests become more self-contained in PDS
- Config metadata (entrypoint, env, history) available without S3 access (e.g., for web UI)
- Aligns with the principle that user-specific data belongs in the user's PDS

## Decision: Keep Config Blobs in S3 Only

Config blobs can contain sensitive data:

- **Environment variables**: `ENV DATABASE_URL=...`, `ENV API_KEY=...` set in Dockerfiles
- **Build history**: `history[].created_by` reveals exact Dockerfile commands, internal registry URLs, build arguments
- **Labels**: may contain internal metadata not intended for public consumption

ATProto has no private data.
The current storage split creates a useful privacy boundary:

| Storage | Visibility | Contains |
|---------|-----------|----------|
| PDS | Public (anyone) | Manifest structure, tags, repo names, annotations |
| Hold/S3 | Auth-gated | Layers + config (actual image content) |

This boundary enables **semi-private repos**: the public PDS metadata tells you what images exist (names, tags, sizes), but you cannot reconstruct or run the image without hold access. Storing config in PDS would break this: build secrets and Dockerfile history would be publicly readable even when the hold restricts blob access.

We considered making PDS storage optional (only for fully public holds or allow-all-crew holds), but an optional field that can't be relied upon adds complexity without clear benefit, since the config must live in S3 regardless for the pull path.

## Current Status

Config blobs remain in S3 behind hold authorization. GC handles config digests to prevent orphaned deletion (config digests are included in the referenced set alongside layer digests).

## Revisit If

- ATProto adds private data support
- A concrete use case emerges that requires PDS-native config access
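
As an illustration of the GC behavior described under Current Status, the following is a minimal Go sketch of building a referenced set that includes config digests alongside layer digests. It is not ATCR's actual implementation: the `Manifest` type, the `ReferencedDigests` and `Orphans` functions, and the digest strings are all hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// Manifest is a hypothetical, simplified view of an image manifest that
// keeps only the digests a garbage collector needs to look at.
type Manifest struct {
	ConfigDigest string
	LayerDigests []string
}

// ReferencedDigests builds the set of blob digests still referenced by any
// manifest. The config digest is added alongside the layer digests so that
// config blobs are never treated as orphans.
func ReferencedDigests(manifests []Manifest) map[string]struct{} {
	refs := make(map[string]struct{})
	for _, m := range manifests {
		if m.ConfigDigest != "" {
			refs[m.ConfigDigest] = struct{}{}
		}
		for _, d := range m.LayerDigests {
			refs[d] = struct{}{}
		}
	}
	return refs
}

// Orphans returns the stored digests that are absent from the referenced
// set, sorted for stable output. These are the candidates for deletion.
func Orphans(stored []string, refs map[string]struct{}) []string {
	var out []string
	for _, d := range stored {
		if _, ok := refs[d]; !ok {
			out = append(out, d)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	manifests := []Manifest{
		{ConfigDigest: "sha256:cfg1", LayerDigests: []string{"sha256:layerA", "sha256:layerB"}},
	}
	stored := []string{"sha256:cfg1", "sha256:layerA", "sha256:layerB", "sha256:stale"}

	// Only the unreferenced blob is reported; the config blob is kept.
	fmt.Println(Orphans(stored, ReferencedDigests(manifests))) // prints [sha256:stale]
}
```

If the config digest were left out of the referenced set, `sha256:cfg1` would be reported as an orphan and deleted, leaving the image unpullable even though its manifest still exists in the PDS.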