A container registry that uses the AT Protocol for manifest storage and S3 for blob storage.

ui fixes, add ability to warn/hide unreachable manifests from the ui. clean up docs

+3499 -3326
+154 -72
CLAUDE.md
````diff
···
 # Run tests
 go test ./...
 
+# Run tests for specific package
+go test ./pkg/atproto/...
+go test ./pkg/appview/storage/...
+
+# Run specific test
+go test -run TestManifestStore ./pkg/atproto/...
+
 # Run with race detector
 go test -race ./...
+
+# Run tests with verbose output
+go test -v ./...
 
 # Update dependencies
 go mod tidy
···
 2. HTTP Request → /v2/alice/myapp/manifests/latest
 3. Registry Middleware (pkg/appview/middleware/registry.go)
    → Resolves "alice" to DID and PDS endpoint
-   → Queries alice's sailor profile for defaultHold
+   → Queries alice's sailor profile for defaultHold (returns DID if set)
    → If not set, checks alice's io.atcr.hold records
-   → Falls back to AppView's default_storage_endpoint
-   → Stores DID/PDS/storage endpoint in context
+   → Falls back to AppView's default_hold_did
+   → Stores DID/PDS/hold DID in RegistryContext
 4. Routing Repository (pkg/appview/storage/routing_repository.go)
    → Creates RoutingRepository
    → Returns ATProto ManifestStore for manifests
-   → Returns ProxyBlobStore for blobs
-5. Blob PUT → Resolved hold service (redirects to S3/storage)
-6. Manifest PUT → alice's PDS as io.atcr.manifest record (includes holdEndpoint)
+   → Returns ProxyBlobStore for blobs (routes to hold DID)
+5. Blob PUT → ProxyBlobStore calls hold's XRPC multipart upload endpoints:
+   a. POST /xrpc/io.atcr.hold.initiateUpload (gets uploadID)
+   b. POST /xrpc/io.atcr.hold.getPartUploadUrl (gets presigned URL for each part)
+   c. PUT to S3 presigned URL (or PUT /xrpc/io.atcr.hold.uploadPart for buffered mode)
+   d. POST /xrpc/io.atcr.hold.completeUpload (finalizes upload)
+6. Manifest PUT → alice's PDS as io.atcr.manifest record (includes holdDid + holdEndpoint)
+   → Manifest also uploaded to PDS blob storage (ATProto CID format)
 ```
 
 #### Push with BYOS (Bring Your Own Storage)
 ```
 1. Client: docker push atcr.io/alice/myapp:latest
 2. Registry Middleware resolves alice → did:plc:alice123
-3. Hold discovery via findStorageEndpoint():
-   a. Check alice's sailor profile for defaultHold
-   b. If not set, check alice's io.atcr.hold records
-   c. Fall back to AppView's default_storage_endpoint
-4. Found: alice's profile has defaultHold = "https://alice-storage.fly.dev"
-5. Routing Repository returns ProxyBlobStore(alice-storage.fly.dev)
-6. ProxyBlobStore calls alice-storage.fly.dev for presigned URL
-7. Storage service validates alice's DID, generates S3 presigned URL
-8. Client redirected to upload blob directly to alice's S3/Storj
-9. Manifest stored in alice's PDS with holdEndpoint = "https://alice-storage.fly.dev"
+3. Hold discovery via findHoldDID():
+   a. Check alice's sailor profile for defaultHold (returns DID if set)
+   b. If not set, check alice's io.atcr.hold records (legacy)
+   c. Fall back to AppView's default_hold_did
+4. Found: alice's profile has defaultHold = "did:web:alice-storage.fly.dev"
+5. Routing Repository returns ProxyBlobStore(did:web:alice-storage.fly.dev)
+6. ProxyBlobStore:
+   a. Resolves hold DID → https://alice-storage.fly.dev (did:web resolution)
+   b. Gets service token from alice's PDS via com.atproto.server.getServiceAuth
+   c. Calls hold XRPC endpoints with service token authentication:
+      - POST /xrpc/io.atcr.hold.initiateUpload
+      - POST /xrpc/io.atcr.hold.getPartUploadUrl (returns presigned S3 URL)
+      - PUT to S3 presigned URL (direct upload to alice's S3/Storj)
+      - POST /xrpc/io.atcr.hold.completeUpload
+7. Hold service validates service token, checks crew membership, generates presigned URLs
+8. Manifest stored in alice's PDS with:
+   - holdDid = "did:web:alice-storage.fly.dev" (primary)
+   - holdEndpoint = "https://alice-storage.fly.dev" (backward compat)
 ```
 
 #### Pull Flow
···
 1. Client: docker pull atcr.io/alice/myapp:latest
 2. GET /v2/alice/myapp/manifests/latest
 3. AppView fetches manifest from alice's PDS
-4. Manifest contains holdEndpoint = "https://alice-storage.fly.dev"
-5. Hold endpoint cached: (alice's DID, "myapp") → "https://alice-storage.fly.dev"
+4. Manifest contains:
+   - holdDid = "did:web:alice-storage.fly.dev" (primary reference)
+   - holdEndpoint = "https://alice-storage.fly.dev" (legacy fallback)
+5. Hold DID cached: (alice's DID, "myapp") → "did:web:alice-storage.fly.dev"
+   TTL: 10 minutes (covers typical pull operations)
 6. Client requests blobs: GET /v2/alice/myapp/blobs/sha256:abc123
-7. AppView checks cache, routes to hold from manifest (not re-discovered)
-8. ProxyBlobStore calls alice-storage.fly.dev for presigned download URL
-9. Client redirected to download blob directly from alice's S3
+7. AppView checks cache, routes to hold DID from manifest (not re-discovered)
+8. ProxyBlobStore:
+   a. Resolves hold DID → https://alice-storage.fly.dev
+   b. Gets service token from alice's PDS via com.atproto.server.getServiceAuth
+   c. Calls GET /xrpc/com.atproto.sync.getBlob?did={userDID}&cid=sha256:abc123&method=GET
+   d. Hold returns presigned download URL in JSON response
+9. Client redirected to download blob directly from alice's S3 via presigned URL
 ```
 
-**Key insight:** Pull uses the historical `holdEndpoint` from the manifest, ensuring blobs are fetched from the hold where they were originally pushed, even if alice later changes her default hold.
+**Key insight:** Pull uses the historical `holdDid` from the manifest, ensuring blobs are fetched from the hold where they were originally pushed, even if alice later changes her default hold. Hold cache (10min TTL) avoids re-querying PDS for each blob during the same pull operation.
 
 ### Name Resolution
···
 - Uses XRPC protocol (com.atproto.repo.*)
 
 **lexicon.go**: ATProto record schemas
-- `ManifestRecord`: OCI manifest stored as ATProto record (includes `holdEndpoint` field)
+- `ManifestRecord`: OCI manifest stored as ATProto record (includes `holdDid` + `holdEndpoint` fields)
 - `TagRecord`: Tag pointing to manifest digest
-- `HoldRecord`: Storage hold definition (for BYOS)
-- `HoldCrewRecord`: Hold crew membership/permissions
-- `SailorProfileRecord`: User profile with `defaultHold` preference
-- Collections: `io.atcr.manifest`, `io.atcr.tag`, `io.atcr.hold`, `io.atcr.hold.crew`, `io.atcr.sailor.profile`
+- `HoldRecord`: Storage hold definition (LEGACY - for old BYOS model)
+- `HoldCrewRecord`: Hold crew membership (LEGACY - stored in owner's PDS)
+- `CaptainRecord`: Hold ownership record (NEW - stored in hold's embedded PDS at rkey "self")
+- `CrewRecord`: Hold crew membership (NEW - stored in hold's embedded PDS, one record per member)
+- `SailorProfileRecord`: User profile with `defaultHold` preference (can be DID or URL)
+- Collections: `io.atcr.manifest`, `io.atcr.tag`, `io.atcr.hold` (legacy), `io.atcr.hold.crew` (used by both legacy and new models), `io.atcr.hold.captain` (new), `io.atcr.sailor.profile`
 
 **profile.go**: Sailor profile management
 - `EnsureProfile()`: Creates profile with default hold on first authentication
···
 #### Storage Layer (`pkg/appview/storage/`)
 
 **routing_repository.go**: Routes content by type
-- `Manifests()` → returns ATProto ManifestStore (caches instance for hold endpoint extraction)
+- `Manifests()` → returns ATProto ManifestStore (caches instance for hold DID extraction)
 - `Blobs()` → checks hold cache for pull, uses discovery for push
-  - Pull: Uses cached `holdEndpoint` from manifest (historical reference)
-  - Push: Uses discovery-based endpoint from `findStorageEndpoint()`
-  - Always returns ProxyBlobStore (routes to hold service)
+  - Pull: Uses cached `holdDid` from manifest (historical reference)
+  - Push: Uses discovery-based DID from `findHoldDID()` in middleware
+  - Always returns ProxyBlobStore (routes to hold service via DID)
 - Implements `distribution.Repository` interface
+- Uses RegistryContext to pass DID, PDS endpoint, hold DID, OAuth refresher, etc.
 
-**hold_cache.go**: In-memory hold endpoint cache
-- Caches `(DID, repository) → holdEndpoint` for pull operations
+**hold_cache.go**: In-memory hold DID cache
+- Caches `(DID, repository) → holdDid` for pull operations
 - TTL: 10 minutes (covers typical pull operations)
 - Cleanup: Background goroutine runs every 5 minutes
 - **NOTE:** Simple in-memory cache for MVP. For production: use Redis or similar
-- Prevents expensive ATProto lookups on every blob request
+- Prevents expensive PDS manifest lookups on every blob request during pull
 
-**proxy_blob_store.go**: External storage proxy
-- Calls user's storage service for presigned URLs
-- Issues HTTP redirects for blob uploads/downloads
+**proxy_blob_store.go**: External storage proxy (routes to hold via XRPC)
+- Resolves hold DID → HTTP URL for XRPC requests (did:web resolution)
+- Gets service tokens from user's PDS (`com.atproto.server.getServiceAuth`)
+- Calls hold XRPC endpoints with service token authentication:
+  - Multipart upload: initiateUpload, getPartUploadUrl, uploadPart, completeUpload, abortUpload
+  - Blob read: com.atproto.sync.getBlob (returns presigned download URL)
 - Implements full `distribution.BlobStore` interface
-- Supports multipart uploads for large blobs
-- Used when user has `io.atcr.hold` record
+- Supports both presigned URL mode (S3 direct) and buffered mode (proxy via hold)
 
 #### AppView Web UI (`pkg/appview/`)
···
 
 #### Hold Service (`cmd/hold/`)
 
-Lightweight standalone service for BYOS (Bring Your Own Storage):
+Lightweight standalone service for BYOS (Bring Your Own Storage) with embedded PDS:
 
 **Architecture:**
-- Reuses distribution's storage driver factory
-- Supports all distribution drivers: S3, Storj, Minio, Azure, GCS, filesystem
-- Authorization follows ATProto's public-by-default model
-- Generates presigned URLs (15min expiry) or proxies uploads/downloads
+- **Embedded PDS**: Each hold has a full ATProto PDS for storing captain + crew records
+- **DID**: Hold identified by did:web (e.g., `did:web:hold01.atcr.io`)
+- **Storage**: Reuses distribution's storage driver factory (S3, Storj, Minio, Azure, GCS, filesystem)
+- **Authorization**: Based on captain + crew records in embedded PDS
+- **Blob operations**: Generates presigned URLs (15min expiry) or proxies uploads/downloads via XRPC
 
 **Authorization Model:**
 
 Read access:
 - **Public hold** (`HOLD_PUBLIC=true`): Anonymous + all authenticated users
-- **Private hold** (`HOLD_PUBLIC=false`): Authenticated users only (any ATCR user)
+- **Private hold** (`HOLD_PUBLIC=false`): Requires authentication + crew membership with blob:read permission
 
 Write access:
-- Hold owner OR crew members only
+- Hold owner OR crew members with blob:write permission
 - Verified via `io.atcr.hold.crew` records in hold's embedded PDS
 
-Key insight: "Private" gates anonymous access, not authenticated access. This reflects ATProto's current limitation (no private PDS records yet).
+**Embedded PDS Endpoints** (`pkg/hold/pds/xrpc.go`):
 
-**Embedded PDS Endpoints:**
-
-Each hold service includes an embedded PDS (Personal Data Server) that stores captain + crew records:
-
+Standard ATProto sync endpoints:
 - `GET /xrpc/com.atproto.sync.getRepo?did={did}` - Download full repository as CAR file
 - `GET /xrpc/com.atproto.sync.getRepo?did={did}&since={rev}` - Download repository diff since revision
 - `GET /xrpc/com.atproto.sync.subscribeRepos` - WebSocket firehose for real-time events
 - `GET /xrpc/com.atproto.sync.listRepos` - List all repositories (single-user PDS)
+- `GET /xrpc/com.atproto.sync.getBlob?did={did}&cid={digest}` - Get blob or presigned download URL
+
+Repository management:
+- `GET /xrpc/com.atproto.repo.describeRepo?repo={did}` - Repository metadata
+- `GET /xrpc/com.atproto.repo.getRecord?repo={did}&collection={col}&rkey={key}` - Get record
+- `GET /xrpc/com.atproto.repo.listRecords?repo={did}&collection={col}` - List records (supports pagination)
+- `POST /xrpc/com.atproto.repo.deleteRecord` - Delete record (owner/crew admin only)
+- `POST /xrpc/com.atproto.repo.uploadBlob` - Upload ATProto blob (owner/crew admin only)
+
+DID resolution:
 - `GET /.well-known/did.json` - DID document (did:web resolution)
-- Standard ATProto repo endpoints (getRecord, listRecords, etc.)
+- `GET /.well-known/atproto-did` - DID for handle resolution
 
-The `subscribeRepos` endpoint broadcasts #commit events whenever crew membership changes, allowing AppViews to monitor hold access control in real-time.
+Crew management:
+- `POST /xrpc/io.atcr.hold.requestCrew` - Request crew membership (authenticated users)
 
-**Configuration:** Environment variables (see `.env.example`)
-- `HOLD_PUBLIC_URL` - Public URL of hold service (required)
+**OCI Multipart Upload Endpoints** (`pkg/hold/oci/xrpc.go`):
+
+All require blob:write permission via service token authentication:
+- `POST /xrpc/io.atcr.hold.initiateUpload` - Start multipart upload session
+- `POST /xrpc/io.atcr.hold.getPartUploadUrl` - Get presigned URL for uploading a part
+- `PUT /xrpc/io.atcr.hold.uploadPart` - Direct buffered part upload (alternative to presigned URLs)
+- `POST /xrpc/io.atcr.hold.completeUpload` - Finalize multipart upload and move to final location
+- `POST /xrpc/io.atcr.hold.abortUpload` - Cancel multipart upload and cleanup temp data
+
+**AppView-to-Hold Authentication:**
+- AppView uses service tokens from user's PDS (`com.atproto.server.getServiceAuth`)
+- Service tokens are scoped to specific hold DIDs and include the user's DID
+- Hold validates tokens and checks crew membership for authorization
+- Tokens cached for 50 seconds (valid for 60 seconds from PDS)
+
+**Configuration:** Environment variables (see `.env.hold.example`)
+- `HOLD_PUBLIC_URL` - Public URL of hold service (required, used for did:web generation)
 - `STORAGE_DRIVER` - Storage driver type (s3, filesystem)
 - `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` - S3 credentials
 - `S3_BUCKET`, `S3_ENDPOINT` - S3 configuration
 - `HOLD_PUBLIC` - Allow public reads (default: false)
-- `HOLD_OWNER` - DID for auto-registration (optional)
+- `HOLD_OWNER` - DID for captain record creation (optional)
+- `HOLD_ALLOW_ALL_CREW` - Allow any authenticated user to register as crew (default: false)
+- `HOLD_DATABASE_PATH` - Path for embedded PDS database (required)
+- `HOLD_DATABASE_KEY_PATH` - Path for PDS signing keys (optional, generated if missing)
 
 **Deployment:** Can run on Fly.io, Railway, Docker, Kubernetes, etc.
 
···
   "$type": "io.atcr.manifest",
   "repository": "myapp",
   "digest": "sha256:abc123...",
-  "holdEndpoint": "https://hold1.alice.com",
+  "holdDid": "did:web:hold01.atcr.io",
+  "holdEndpoint": "https://hold1.atcr.io",
   "schemaVersion": 2,
   "mediaType": "application/vnd.oci.image.manifest.v1+json",
   "config": { "digest": "sha256:...", "size": 1234 },
   "layers": [
     { "digest": "sha256:...", "size": 5678 }
   ],
+  "manifestBlob": {
+    "$type": "blob",
+    "ref": { "$link": "bafyrei..." },
+    "mimeType": "application/vnd.oci.image.manifest.v1+json",
+    "size": 1234
+  },
   "createdAt": "2025-09-30T..."
 }
 ```
 
+**Key fields:**
+- `holdDid` - DID of the hold service where blobs are stored (PRIMARY reference, new)
+- `holdEndpoint` - HTTP URL of hold service (DEPRECATED, kept for backward compatibility)
+- `manifestBlob` - Reference to manifest blob in ATProto blob storage (CID format)
+
 Record key = manifest digest (without algorithm prefix)
 Collection = `io.atcr.manifest`
 
···
 ```json
 {
   "$type": "io.atcr.sailor.profile",
-  "defaultHold": "https://hold1.alice.com",
+  "defaultHold": "did:web:hold1.alice.com",
   "createdAt": "2025-10-02T...",
   "updatedAt": "2025-10-02T..."
 }
···
 
 **Profile Management:**
 - Created automatically on first authentication (OAuth or Basic Auth)
-- If AppView has `default_storage_endpoint` configured, profile gets that as `defaultHold`
+- `defaultHold` can be a DID (preferred, e.g., `did:web:hold01.atcr.io`) or legacy URL
+- If AppView has `default_hold_did` configured, profile gets that as `defaultHold`
 - Users can update their profile to change default hold (future: via UI)
 - Setting `defaultHold` to null opts out of defaults (use own holds or AppView default)
 
-**Hold Resolution Priority** (in `findStorageEndpoint()`):
-1. **Profile's `defaultHold`** - User's explicit preference
-2. **User's `io.atcr.hold` records** - User's own holds
-3. **AppView's `default_storage_endpoint`** - Fallback default
+**Hold Resolution Priority** (in `findHoldDID()` in middleware):
+1. **Profile's `defaultHold`** - User's explicit preference (DID or URL)
+2. **User's `io.atcr.hold` records** - User's own holds (legacy BYOS model)
+3. **AppView's `default_hold_did`** - Fallback default (configured in middleware)
 
 This ensures:
 - Users can join shared holds by setting their profile's `defaultHold`
···
 **Server:**
 - `ATCR_HTTP_ADDR` - HTTP listen address (default: `:5000`)
 - `ATCR_BASE_URL` - Public URL for OAuth/JWT realm (auto-detected in dev)
-- `ATCR_DEFAULT_HOLD` - Default hold endpoint for blob storage (REQUIRED)
+- `ATCR_DEFAULT_HOLD_DID` - Default hold DID for blob storage (REQUIRED, e.g., `did:web:hold01.atcr.io`)
 
 **Authentication:**
 - `ATCR_AUTH_KEY_PATH` - JWT signing key path (default: `/var/lib/atcr/auth/private-key.pem`)
···
 **Modifying storage routing**:
 1. Edit `pkg/appview/storage/routing_repository.go`
 2. Update `Blobs()` method to change routing logic
-3. Consider context values: `storage.endpoint`, `atproto.did`
+3. Context is passed via RegistryContext struct (holds DID, PDS endpoint, hold DID, OAuth refresher, etc.)
 
 **Changing name resolution**:
 1. Modify `pkg/atproto/resolver.go` for DID/handle resolution
 2. Update `pkg/appview/middleware/registry.go` if changing routing logic
-3. Remember: `findStorageEndpoint()` queries PDS for `io.atcr.hold` records
+3. Remember: `findHoldDID()` checks sailor profile, then `io.atcr.hold` records (legacy), then default hold DID
 
 **Working with OAuth client**:
 - Client is self-contained: pass `baseURL`, it handles client ID/redirect URI/scopes
···
 
 ## Important Context Values
 
-When working with the codebase, these context values are used for routing:
+When working with the codebase, routing information is passed via the `RegistryContext` struct (`pkg/appview/storage/context.go`):
 
-- `atproto.did` - Resolved DID for the user (e.g., `did:plc:alice123`)
-- `atproto.pds` - User's PDS endpoint (e.g., `https://bsky.social`)
-- `atproto.identity` - Original identity string (handle or DID)
-- `storage.endpoint` - Storage service URL (if user has `io.atcr.registry` record)
-- `auth.did` - Authenticated DID from validated token
+- `DID` - User's DID (e.g., `did:plc:alice123`)
+- `PDSEndpoint` - User's PDS endpoint (e.g., `https://bsky.social`)
+- `HoldDID` - Hold service DID (e.g., `did:web:hold01.atcr.io`)
+- `Repository` - Image repository name (e.g., `myapp`)
+- `ATProtoClient` - Client for calling user's PDS with OAuth/Basic Auth
+- `Refresher` - OAuth token refresher for service token requests
+- `Database` - Database for metrics tracking
+- `Authorizer` - Hold authorizer for access control
+
+Legacy context keys (deprecated):
+- `hold.did` - Hold DID (now in RegistryContext)
+- `auth.did` - Authenticated DID from validated token (now in auth middleware)
 
 ## Documentation References
 
````
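The CLAUDE.md changes describe push-time hold discovery in `findHoldDID()` (sailor profile `defaultHold`, then legacy `io.atcr.hold` records, then the AppView's `default_hold_did`), followed by did:web resolution of the chosen hold. The middleware itself is not part of this excerpt, so the following is only a minimal Go sketch of those two steps under stated assumptions: `SailorProfile`, `resolveHoldDID`, `toHoldDID`, and `didWebDocumentURL` are hypothetical names, legacy hold records are reduced to their endpoint strings, and mapping a legacy URL's host onto `did:web` is an assumption about how URL-valued `defaultHold` entries are treated.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// SailorProfile carries the one io.atcr.sailor.profile field used here (hypothetical struct).
type SailorProfile struct {
	DefaultHold string // DID (preferred) or legacy URL; "" means unset
}

// toHoldDID passes DIDs through and maps a legacy URL such as
// https://alice-storage.fly.dev to did:web:alice-storage.fly.dev.
// Assumption: the URL host is the hold's did:web identity.
func toHoldDID(ref string) (string, error) {
	if strings.HasPrefix(ref, "did:") {
		return ref, nil
	}
	u, err := url.Parse(ref)
	if err != nil || u.Host == "" {
		return "", fmt.Errorf("invalid hold reference %q", ref)
	}
	// did:web percent-encodes the colon of a host:port pair.
	return "did:web:" + strings.ReplaceAll(u.Host, ":", "%3A"), nil
}

// resolveHoldDID applies the documented priority: profile defaultHold, then
// legacy io.atcr.hold endpoints, then the AppView's default hold DID.
func resolveHoldDID(profile *SailorProfile, legacyEndpoints []string, defaultHoldDID string) (string, error) {
	if profile != nil && profile.DefaultHold != "" {
		return toHoldDID(profile.DefaultHold)
	}
	for _, ep := range legacyEndpoints {
		if ep != "" {
			return toHoldDID(ep)
		}
	}
	if defaultHoldDID != "" {
		return defaultHoldDID, nil
	}
	return "", errors.New("no hold configured")
}

// didWebDocumentURL applies the did:web method rules: the first segment is the
// percent-encoded host, further ":"-separated segments become a path, and a bare
// host serves its document at /.well-known/did.json.
func didWebDocumentURL(did string) (string, error) {
	rest, ok := strings.CutPrefix(did, "did:web:")
	if !ok {
		return "", fmt.Errorf("not a did:web identifier: %q", did)
	}
	parts := strings.Split(rest, ":")
	for i, p := range parts {
		decoded, err := url.PathUnescape(p)
		if err != nil || decoded == "" {
			return "", fmt.Errorf("invalid did:web segment %q", p)
		}
		parts[i] = decoded
	}
	if len(parts) == 1 {
		return "https://" + parts[0] + "/.well-known/did.json", nil
	}
	return "https://" + strings.Join(parts, "/") + "/did.json", nil
}

func main() {
	did, _ := resolveHoldDID(&SailorProfile{DefaultHold: "https://alice-storage.fly.dev"}, nil, "did:web:hold01.atcr.io")
	doc, _ := didWebDocumentURL(did)
	fmt.Println(did) // did:web:alice-storage.fly.dev
	fmt.Println(doc) // https://alice-storage.fly.dev/.well-known/did.json
}
```

The sketch only computes where the hold's DID document lives; a complete resolver would fetch that document and read the hold's service endpoint from it. The did:web method specifies HTTPS, so a plain-HTTP local hold would need special handling that is not shown here.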
+8 -5
README.md
````diff
···
 1. **AppView** - Registry API + web UI
    - Serves OCI Distribution API (Docker push/pull)
    - Resolves handles/DIDs to PDS endpoints
-   - Routes manifests to PDS, blobs to storage
+   - Routes manifests to user's PDS, blobs to hold services
    - Web interface for browsing/search
 
-2. **Hold Service** - Storage service (optional BYOS)
+2. **Hold Service** - Storage service with embedded PDS (optional BYOS)
+   - Each hold has a full ATProto PDS for access control (captain + crew records)
+   - Identified by did:web (e.g., `did:web:hold01.atcr.io`)
    - Generates presigned URLs for S3/Storj/Minio/etc.
-   - Users can deploy their own storage
+   - Users can deploy their own storage and control access via crew membership
 
 3. **Credential Helper** - Client authentication
    - ATProto OAuth with DPoP
    - Automatic authentication on first push/pull
 
 **Storage model:**
-- Manifests → ATProto records (small JSON)
-- Blobs → S3 or BYOS (large binaries)
+- Manifests → ATProto records in user's PDS (small JSON, includes `holdDid` reference)
+- Blobs → Hold services via XRPC multipart upload (large binaries, stored in S3/etc.)
+- AppView uses service tokens to communicate with holds on behalf of users
 
 ## Features
 
````
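The README change says users control access to their hold via crew membership, and the CLAUDE.md changes spell out the rules: public holds admit anonymous and authenticated readers, private holds require a crew member with `blob:read`, and writes require the owner or a crew member with `blob:write`. A minimal Go sketch of that decision follows. `Captain`, `Crew`, and `Hold` are hypothetical simplifications of the captain and crew records, and treating the captain (owner) as holding every permission is an assumption; the crew `role` field and `HOLD_ALLOW_ALL_CREW` are not modeled.

```go
package main

import (
	"fmt"
	"slices"
)

// Captain holds the io.atcr.hold.captain/self fields used here (hypothetical struct).
type Captain struct {
	Owner  string
	Public bool
}

// Crew holds the io.atcr.hold.crew fields used here (hypothetical struct).
type Crew struct {
	Member      string
	Permissions []string
}

// Hold bundles the records an authorizer would read from the hold's embedded PDS.
type Hold struct {
	Captain Captain
	Crew    []Crew
}

// has reports whether did holds perm. An empty did means an anonymous caller.
// Assumption: the captain (owner) implicitly has every permission.
func (h Hold) has(did, perm string) bool {
	if did == "" {
		return false
	}
	if did == h.Captain.Owner {
		return true
	}
	for _, c := range h.Crew {
		if c.Member == did && slices.Contains(c.Permissions, perm) {
			return true
		}
	}
	return false
}

// CanRead: public holds admit everyone; private holds need blob:read.
func (h Hold) CanRead(did string) bool {
	return h.Captain.Public || h.has(did, "blob:read")
}

// CanWrite: always the owner or a crew member with blob:write.
func (h Hold) CanWrite(did string) bool {
	return h.has(did, "blob:write")
}

func main() {
	hold := Hold{
		Captain: Captain{Owner: "did:plc:alice123", Public: false},
		Crew:    []Crew{{Member: "did:plc:bob456", Permissions: []string{"blob:read", "blob:write"}}},
	}
	fmt.Println(hold.CanRead(""), hold.CanRead("did:plc:bob456"), hold.CanWrite("did:plc:carol789"))
	// false true false
}
```

Keeping write checks independent of the `Public` flag mirrors the documented split: "public" only widens read access and never grants uploads.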
+43 -11
cmd/appview/serve.go
````diff
···
 	"atcr.io/pkg/appview"
 	"atcr.io/pkg/appview/db"
 	uihandlers "atcr.io/pkg/appview/handlers"
+	"atcr.io/pkg/appview/holdhealth"
 	"atcr.io/pkg/appview/jetstream"
 	"github.com/gorilla/mux"
 )
···
 		return fmt.Errorf("failed to initialize UI database - required for session storage")
 	}
 
+	// Initialize hold health checker
+	fmt.Println("Initializing hold health checker...")
+	cacheTTL := 15 * time.Minute // Cache TTL from user requirements
+	healthChecker := holdhealth.NewChecker(cacheTTL)
+
+	// Start background health check worker
+	refreshInterval := 5 * time.Minute // Refresh every 5 minutes
+	dbAdapter := holdhealth.NewDBAdapter(uiDatabase)
+	healthWorker := holdhealth.NewWorker(healthChecker, dbAdapter, refreshInterval)
+
+	// Create context for worker lifecycle management
+	workerCtx, workerCancel := context.WithCancel(context.Background())
+	defer workerCancel() // Ensure context is cancelled on all exit paths
+	healthWorker.Start(workerCtx)
+	fmt.Println("Hold health worker started (5min refresh interval, 15min cache TTL)")
+
 	// Initialize OAuth components
 	fmt.Println("Initializing OAuth components...")
 
···
 	middleware.SetGlobalAuthorizer(holdAuthorizer)
 	fmt.Println("Hold authorizer initialized with database caching")
 
-	// Initialize UI routes with OAuth app, refresher, and device store
-	uiTemplates, uiRouter := initializeUIRoutes(uiDatabase, uiReadOnlyDB, uiSessionStore, oauthApp, refresher, baseURL, deviceStore, defaultHoldDID)
+	// Initialize UI routes with OAuth app, refresher, device store, and health checker
+	uiTemplates, uiRouter := initializeUIRoutes(uiDatabase, uiReadOnlyDB, uiSessionStore, oauthApp, refresher, baseURL, deviceStore, defaultHoldDID, healthChecker)
 
 	// Create OAuth server
 	oauthServer := oauth.NewServer(oauthApp)
···
 	select {
 	case <-stop:
 		fmt.Println("Shutting down registry server...")
+
+		// Stop health worker first
+		fmt.Println("Stopping hold health worker...")
+		healthWorker.Stop()
+
 		shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
 		defer cancel()
 
···
 			return fmt.Errorf("server shutdown error: %w", err)
 		}
 	case err := <-errChan:
+		// Stop health worker on error (workerCancel called by defer)
+		healthWorker.Stop()
 		return fmt.Errorf("server error: %w", err)
 	}
 
···
 // database: read-write connection for auth and writes
 // readOnlyDB: read-only connection for public queries (search, user pages, etc.)
 // defaultHoldDID: DID of the default hold service (e.g., "did:web:hold01.atcr.io")
-func initializeUIRoutes(database *sql.DB, readOnlyDB *sql.DB, sessionStore *db.SessionStore, oauthApp *oauth.App, refresher *oauth.Refresher, baseURL string, deviceStore *db.DeviceStore, defaultHoldDID string) (*template.Template, *mux.Router) {
+// healthChecker: hold endpoint health checker
+func initializeUIRoutes(database *sql.DB, readOnlyDB *sql.DB, sessionStore *db.SessionStore, oauthApp *oauth.App, refresher *oauth.Refresher, baseURL string, deviceStore *db.DeviceStore, defaultHoldDID string, healthChecker *holdhealth.Checker) (*template.Template, *mux.Router) {
 	// Check if UI is enabled
 	uiEnabled := os.Getenv("ATCR_UI_ENABLED")
 	if uiEnabled == "false" {
···
 
 	router.Handle("/api/recent-pushes", middleware.OptionalAuth(sessionStore, database)(
 		&uihandlers.RecentPushesHandler{
-			DB:          readOnlyDB,
-			Templates:   templates,
-			RegistryURL: uihandlers.TrimRegistryURL(baseURL),
+			DB:            readOnlyDB,
+			Templates:     templates,
+			RegistryURL:   uihandlers.TrimRegistryURL(baseURL),
+			HealthChecker: healthChecker,
 		},
 	)).Methods("GET")
 
···
 		},
 	)).Methods("GET")
 
+	// Manifest health check API endpoint (HTMX polling)
+	router.Handle("/api/manifest-health", &uihandlers.ManifestHealthHandler{
+		HealthChecker: healthChecker,
+	}).Methods("GET")
+
 	router.Handle("/u/{handle}", middleware.OptionalAuth(sessionStore, database)(
 		&uihandlers.UserPageHandler{
 			DB:          readOnlyDB,
···
 
 	router.Handle("/r/{handle}/{repository}", middleware.OptionalAuth(sessionStore, database)(
 		&uihandlers.RepositoryPageHandler{
-			DB:          readOnlyDB,
-			Templates:   templates,
-			RegistryURL: uihandlers.TrimRegistryURL(baseURL),
-			Directory:   oauthApp.Directory(),
-			Refresher:   refresher,
+			DB:            readOnlyDB,
+			Templates:     templates,
+			RegistryURL:   uihandlers.TrimRegistryURL(baseURL),
+			Directory:     oauthApp.Directory(),
+			Refresher:     refresher,
+			HealthChecker: healthChecker,
 		},
 	)).Methods("GET")
 
````
+212 -362
docs/BYOS.md
````diff
···
 
 ## Overview
 
-ATCR supports "Bring Your Own Storage" (BYOS) for blob storage. This allows users to:
-- Deploy their own storage service backed by S3/Storj/Minio/filesystem
-- Control who can use their storage (public or private)
-- Keep blob data in their own infrastructure while manifests remain in their ATProto PDS
+ATCR supports "Bring Your Own Storage" (BYOS) for blob storage. Users can:
+- Deploy their own hold service with embedded PDS
+- Control access via crew membership in the hold's PDS
+- Keep blob data in their own S3/Storj/Minio while manifests stay in their user PDS
 
 ## Architecture
 
 ```
-┌─────────────────────────────────────────────┐
-│ ATCR AppView (API)                          │
-│ - Manifests → ATProto PDS                   │
-│ - Auth & token validation                   │
-│ - Blob routing (issues redirects)           │
-│ - Profile management                        │
-└─────────────────┬───────────────────────────┘
-
-                  │ Hold discovery priority:
-                  │ 1. io.atcr.sailor.profile.defaultHold
-                  │ 2. io.atcr.hold records
-                  │ 3. AppView default_storage_endpoint
-
-┌─────────────────────────────────────────────┐
-│ User's PDS                                  │
-│ - io.atcr.sailor.profile (hold preference)  │
-│ - io.atcr.hold records (own holds)          │
-│ - io.atcr.manifest records (with holdEP)    │
-└─────────────────┬───────────────────────────┘
-
-                  │ Redirects to hold
-
-┌─────────────────────────────────────────────┐
-│ Storage Service (Hold)                      │
-│ - Blob storage (S3/Storj/Minio/filesystem)  │
-│ - Presigned URL generation                  │
-│ - Authorization (DID-based)                 │
-└─────────────────────────────────────────────┘
-```
-
-## ATProto Records
-
-### io.atcr.sailor.profile
-
-**NEW:** User profile for hold selection preferences. Created automatically on first authentication.
-
-```json
-{
-  "$type": "io.atcr.sailor.profile",
-  "defaultHold": "https://team-hold.example.com",
-  "createdAt": "2025-10-02T12:00:00Z",
-  "updatedAt": "2025-10-02T12:00:00Z"
-}
+┌──────────────────────────────────────────┐
+│ ATCR AppView (API)                       │
+│ - Manifests → User's PDS                 │
+│ - Auth & service token management        │
+│ - Blob routing via XRPC                  │
+│ - Profile management                     │
+└────────────┬─────────────────────────────┘
+
+             │ Hold discovery priority:
+             │ 1. io.atcr.sailor.profile.defaultHold (DID)
+             │ 2. io.atcr.hold records (legacy)
+             │ 3. AppView default_hold_did
+
+┌──────────────────────────────────────────┐
+│ User's PDS                               │
+│ - io.atcr.sailor.profile (hold DID)      │
+│ - io.atcr.manifest (with holdDid)        │
+└────────────┬─────────────────────────────┘
+
+             │ Service token from user's PDS
+
+┌──────────────────────────────────────────┐
+│ Hold Service (did:web:hold.example.com)  │
+│ ├── Embedded PDS                         │
+│ │   ├── Captain record (ownership)       │
+│ │   └── Crew records (access control)    │
+│ ├── XRPC multipart upload endpoints      │
+│ └── Storage driver (S3/Storj/etc.)       │
+└──────────────────────────────────────────┘
 ```
 
-**Record key:** Always `"self"` (only one profile per user)
-
-**Behavior:**
-- Created automatically when user first authenticates (OAuth or Basic Auth)
-- If AppView has `default_storage_endpoint`, profile gets that as initial `defaultHold`
-- User can update to join shared holds or use their own hold
-- Set `defaultHold` to `null` to opt out of defaults (use own hold or AppView default)
-
-**This solves the multi-hold problem:** Users who are crew members of multiple holds can explicitly choose which one to use via their profile.
+## Hold Service Components
 
-### io.atcr.hold
+Each hold is a full ATProto actor with:
+- **DID**: `did:web:hold.example.com` (hold's identity)
+- **Embedded PDS**: Stores captain + crew records (shared data)
+- **Storage backend**: S3, Storj, Minio, filesystem, etc.
+- **XRPC endpoints**: Standard ATProto + custom OCI multipart upload
 
-Users create a hold record in their PDS to configure their own storage:
+### Records in Hold's PDS
 
+**Captain record** (`io.atcr.hold.captain/self`):
 ```json
 {
-  "$type": "io.atcr.hold",
-  "endpoint": "https://alice-storage.example.com",
+  "$type": "io.atcr.hold.captain",
   "owner": "did:plc:alice123",
   "public": false,
-  "createdAt": "2025-10-01T12:00:00Z"
+  "deployedAt": "2025-10-14T...",
+  "region": "iad",
+  "provider": "fly.io"
 }
 ```
 
-### io.atcr.hold.crew
-
-Hold owners can add crew members (for shared storage):
-
+**Crew records** (`io.atcr.hold.crew/{rkey}`):
 ```json
 {
   "$type": "io.atcr.hold.crew",
-  "hold": "at://did:plc:alice/io.atcr.hold/my-storage",
   "member": "did:plc:bob456",
-  "role": "write",
-  "addedAt": "2025-10-01T12:00:00Z"
+  "role": "admin",
+  "permissions": ["blob:read", "blob:write"],
+  "addedAt": "2025-10-14T..."
 }
 ```
 
-**Note:** Crew records are stored in the **hold owner's PDS**, not the crew member's PDS. This ensures the hold owner maintains full control over access.
+### Sailor Profile (User's PDS)
 
-## Storage Service
+Users set their preferred hold in their sailor profile:
 
-### Deployment
+```json
+{
+  "$type": "io.atcr.sailor.profile",
+  "defaultHold": "did:web:hold.example.com",
+  "createdAt": "2025-10-02T...",
+  "updatedAt": "2025-10-02T..."
+}
+```
 
-The storage service is a lightweight HTTP server that:
-1. Accepts presigned URL requests
-2. Verifies DID authorization
-3. Generates presigned URLs for S3/Storj/etc
-4. Returns URLs to AppView for client redirect
+## Deployment
 
 ### Configuration
 
-The hold service is configured entirely via environment variables. See `.env.example` for all options.
-
-**Required environment variables:**
+Hold service is configured entirely via environment variables:
 
 ```bash
-# Hold service public URL (REQUIRED)
-HOLD_PUBLIC_URL=https://storage.example.com
+# Hold identity (REQUIRED)
+HOLD_PUBLIC_URL=https://hold.example.com
+HOLD_OWNER=did:plc:your-did-here
 
-# Storage driver type
+# Storage backend
 STORAGE_DRIVER=s3
-
-# For S3/Minio
 AWS_ACCESS_KEY_ID=your_access_key
 AWS_SECRET_ACCESS_KEY=your_secret_key
 AWS_REGION=us-east-1
 S3_BUCKET=my-blobs
 
-# For Storj (optional - custom S3 endpoint)
-# S3_ENDPOINT=https://gateway.storjshare.io
+# Access control
+HOLD_PUBLIC=false # Require authentication for reads
+HOLD_ALLOW_ALL_CREW=false # Only explicit crew members can write
 
-# For filesystem storage
-# STORAGE_DRIVER=filesystem
-# STORAGE_ROOT_DIR=/var/lib/atcr-storage
+# Embedded PDS
+HOLD_DATABASE_PATH=/var/lib/atcr-hold/hold.db
+HOLD_DATABASE_KEY_PATH=/var/lib/atcr-hold/keys
 ```
 
-**Authorization:**
-
-ATCR follows ATProto's public-by-default model with gated anonymous access:
-
-**Read Access:**
-- **Public hold** (`HOLD_PUBLIC=true`): Anonymous reads allowed (no authentication)
-- **Private hold** (`HOLD_PUBLIC=false`): Requires authentication (any ATCR user with sailor.profile)
-
-**Write Access:**
-- Always requires authentication
-- Must be hold owner OR crew member (verified via `io.atcr.hold.crew` records in owner's PDS)
-
-**Key Points:**
-- "Private" just means "no anonymous access" - not "limited user access"
-- Any authenticated ATCR user can read from private holds
-- Crew membership only controls WRITE access, not READ access
-- This aligns with ATProto's public records model (no private PDS records yet)
-
-### Running
+### Running Locally
 
 ```bash
 # Build
-go build -o atcr-hold ./cmd/hold
+go build -o bin/atcr-hold ./cmd/hold
 
-# Set environment variables (or use .env file)
-export HOLD_PUBLIC_URL=https://storage.example.com
-export STORAGE_DRIVER=s3
-export AWS_ACCESS_KEY_ID=...
-export AWS_SECRET_ACCESS_KEY=...
-export AWS_REGION=us-east-1
-export S3_BUCKET=my-blobs
+# Run (with env vars or .env file)
+export HOLD_PUBLIC_URL=http://localhost:8080
+export HOLD_OWNER=did:plc:your-did-here
+export STORAGE_DRIVER=filesystem
+export STORAGE_ROOT_DIR=/tmp/atcr-hold
+export HOLD_DATABASE_PATH=/tmp/atcr-hold/hold.db
 
-# Run
-./atcr-hold
+./bin/atcr-hold
 ```
 
-**Registration (required):**
-
-The hold service must be registered in a PDS to be discoverable by the AppView.
-
-**Standard registration workflow:**
-
-1. Set `HOLD_OWNER` to your DID:
-   ```bash
-   export HOLD_OWNER=did:plc:your-did-here
-   ```
-
-2. Start the hold service:
-   ```bash
-   ./atcr-hold
-   ```
-
-3. **Check the logs** for the OAuth authorization URL:
-   ```
-   ================================================================================
-   OAUTH AUTHORIZATION REQUIRED
-   ================================================================================
-
-   Please visit this URL to authorize the hold service:
-
-   https://bsky.app/authorize?client_id=...
-
-   Waiting for authorization...
````
198 - ================================================================================ 199 - ``` 200 - 201 - 4. Visit the URL in your browser and authorize 202 - 203 - 5. The hold service will: 204 - - Exchange the authorization code for a token 205 - - Create `io.atcr.hold` record in your PDS 206 - - Create `io.atcr.hold.crew` record (making you the owner) 207 - - Save registration state 208 - 209 - 6. On subsequent runs, the service checks if already registered and skips OAuth 210 - 211 - **Alternative methods:** 212 - 213 - - **Manual API registration**: Call `POST /register` with your own OAuth token 214 - - **Completely manual**: Create PDS records yourself using any ATProto client 133 + On first run, the hold service creates: 134 + - Captain record in embedded PDS (making you the owner) 135 + - Crew record for owner with all permissions 136 + - DID document at `/.well-known/did.json` 215 137 216 138 ### Deploy to Fly.io 217 139 ··· 223 145 224 146 [env] 225 147 HOLD_PUBLIC_URL = "https://my-atcr-hold.fly.dev" 226 - HOLD_SERVER_ADDR = ":8080" 227 148 STORAGE_DRIVER = "s3" 228 149 AWS_REGION = "us-east-1" 229 150 S3_BUCKET = "my-blobs" 230 151 HOLD_PUBLIC = "false" 152 + HOLD_ALLOW_ALL_CREW = "false" 231 153 232 154 [http_service] 233 155 internal_port = 8080 ··· 250 172 fly secrets set AWS_ACCESS_KEY_ID=... 251 173 fly secrets set AWS_SECRET_ACCESS_KEY=... 252 174 fly secrets set HOLD_OWNER=did:plc:your-did-here 253 - 254 - # Check logs for OAuth URL on first run 255 - fly logs 256 - 257 - # Visit the OAuth URL shown in logs to authorize 258 - # The hold service will register itself in your PDS 259 175 ``` 260 176 261 177 ## Request Flow 262 178 263 179 ### Push with BYOS 264 180 265 - 1. **Docker push** `atcr.io/alice/myapp:latest` 266 - 2. **AppView** resolves `alice` → `did:plc:alice123` 267 - 3. 
**AppView** discovers hold via priority logic: 268 - - Check alice's `io.atcr.sailor.profile` for `defaultHold` 269 - - If not set, check alice's `io.atcr.hold` records 270 - - Fall back to AppView's `default_storage_endpoint` 271 - 4. **Found:** `alice.profile.defaultHold = "https://team-hold.example.com"` 272 - 5. **AppView** → team-hold: POST `/put-presigned-url` 273 - ```json 274 - { 275 - "did": "did:plc:alice123", 276 - "digest": "sha256:abc123...", 277 - "size": 1048576 278 - } 279 - ``` 280 - 6. **Hold service**: 281 - - Verifies alice is authorized (checks crew records) 282 - - Generates S3 presigned upload URL (15min expiry) 283 - - Returns: `{"url": "https://s3.../blob?signature=..."}` 284 - 7. **AppView** → Docker: `307 Redirect` to presigned URL 285 - 8. **Docker** → S3: PUT blob directly (no proxy) 286 - 9. **Manifest** stored in alice's PDS with `holdEndpoint: "https://team-hold.example.com"` 181 + ``` 182 + 1. Client: docker push atcr.io/alice/myapp:latest 287 183 288 - ### Pull with BYOS 184 + 2. AppView resolves alice → did:plc:alice123 289 185 290 - 1. **Docker pull** `atcr.io/alice/myapp:latest` 291 - 2. **AppView** fetches manifest from alice's PDS 292 - 3. **Manifest** contains `holdEndpoint: "https://team-hold.example.com"` 293 - 4. **AppView** caches: `(alice's DID, "myapp") → "https://team-hold.example.com"` (10min TTL) 294 - 5. **Docker** requests blobs: GET `/v2/alice/myapp/blobs/sha256:abc123` 295 - 6. **AppView** uses **cached hold from manifest** (not re-discovered) 296 - 7. **AppView** → team-hold: POST `/get-presigned-url` 297 - 8. **Hold service** returns presigned download URL 298 - 9. **AppView** → Docker: `307 Redirect` 299 - 10. **Docker** → S3: GET blob directly 186 + 3. 
AppView discovers hold DID: 187 + - Check alice's sailor profile for defaultHold 188 + - Returns: "did:web:alice-storage.fly.dev" 300 189 301 - **Key insight:** Pull uses the historical `holdEndpoint` from the manifest, ensuring blobs are fetched from where they were originally pushed, even if alice later changes her profile's `defaultHold`. 190 + 4. AppView gets service token from alice's PDS: 191 + GET /xrpc/com.atproto.server.getServiceAuth?aud=did:web:alice-storage.fly.dev 192 + Response: { "token": "eyJ..." } 302 193 303 - ## Default Registry 194 + 5. AppView initiates multipart upload to hold: 195 + POST https://alice-storage.fly.dev/xrpc/io.atcr.hold.initiateUpload 196 + Authorization: Bearer {serviceToken} 197 + Body: { "digest": "sha256:abc..." } 198 + Response: { "uploadId": "xyz" } 304 199 305 - The AppView can run its own storage service as the default: 200 + 6. For each part: 201 + - AppView: POST /xrpc/io.atcr.hold.getPartUploadUrl 202 + - Hold validates service token, checks crew membership 203 + - Hold returns: { "url": "https://s3.../presigned" } 204 + - Client uploads directly to S3 presigned URL 306 205 307 - ### AppView config 206 + 7. AppView completes upload: 207 + POST /xrpc/io.atcr.hold.completeUpload 208 + Body: { "uploadId": "xyz", "digest": "sha256:abc...", "parts": [...] } 308 209 309 - ```yaml 310 - middleware: 311 - - name: registry 312 - options: 313 - atproto-resolver: 314 - default_storage_endpoint: https://storage.atcr.io 210 + 8. Manifest stored in alice's PDS: 211 + - holdDid: "did:web:alice-storage.fly.dev" 212 + - holdEndpoint: "https://alice-storage.fly.dev" (backward compat) 315 213 ``` 316 214 317 - ### Default hold service config 318 - 319 - ```bash 320 - # Accept any authenticated DID 321 - HOLD_PUBLIC=false # Requires authentication 215 + ### Pull with BYOS 322 216 323 - # Or allow public reads 324 - HOLD_PUBLIC=true # Public reads, auth required for writes 325 217 ``` 218 + 1. 
Client: docker pull atcr.io/alice/myapp:latest 326 219 327 - This provides free-tier shared storage for users who don't want to deploy their own. 220 + 2. AppView fetches manifest from alice's PDS 328 221 329 - ## Storage Drivers Supported 222 + 3. Manifest contains: 223 + - holdDid: "did:web:alice-storage.fly.dev" 330 224 331 - The storage service uses distribution's storage drivers: 225 + 4. AppView caches hold DID for 10 minutes (covers pull operation) 332 226 333 - - **S3** - AWS S3, Minio, Storj (via S3 gateway) 334 - - **Filesystem** - Local disk (for testing) 335 - - **Azure** - Azure Blob Storage 336 - - **GCS** - Google Cloud Storage 337 - - **Swift** - OpenStack Swift 338 - - **OSS** - Alibaba Cloud OSS 227 + 5. Client requests blob: GET /v2/alice/myapp/blobs/sha256:abc123 339 228 340 - ## Quotas 229 + 6. AppView uses cached hold DID from manifest 341 230 342 - Quotas are NOT implemented in the storage service. Instead, use: 231 + 7. AppView gets service token from alice's PDS 343 232 344 - - **S3**: Bucket policies, lifecycle rules 345 - - **Storj**: Project limits in Storj dashboard 346 - - **Minio**: Quota enforcement features 347 - - **Filesystem**: Disk quotas at OS level 233 + 8. AppView calls hold XRPC: 234 + GET /xrpc/com.atproto.sync.getBlob?did={userDID}&cid=sha256:abc123 235 + Authorization: Bearer {serviceToken} 236 + Response: { "url": "https://s3.../presigned-download" } 348 237 349 - ## Security 238 + 9. AppView redirects client to presigned S3 URL 350 239 351 - ### Authorization 240 + 10. Client downloads directly from S3 241 + ``` 352 242 353 - Authorization is based on ATProto's public-by-default model: 243 + **Key insight:** Pull uses the `holdDid` stored in the manifest, ensuring blobs are fetched from where they were originally pushed. 
354 244 355 - **Read Authorization:** 356 - - **Public hold** (`public: true` in hold record): 357 - - Anonymous users: ✅ Allowed 358 - - Any authenticated user: ✅ Allowed 245 + ## Access Control 359 246 360 - - **Private hold** (`public: false` in hold record): 361 - - Anonymous users: ❌ 401 Unauthorized 362 - - Any authenticated ATCR user: ✅ Allowed (no crew membership required) 247 + ### Read Access 363 248 364 - **Write Authorization:** 365 - - Anonymous users: ❌ 401 Unauthorized 366 - - Authenticated non-crew: ❌ 403 Forbidden 367 - - Authenticated crew member: ✅ Allowed 368 - - Hold owner: ✅ Allowed 249 + - **Public hold** (`HOLD_PUBLIC=true`): Anonymous + authenticated users 250 + - **Private hold** (`HOLD_PUBLIC=false`): Authenticated users with crew membership 369 251 370 - **Implementation:** 371 - - Hold service queries owner's PDS for `io.atcr.hold.crew` records 372 - - Crew records are public ATProto records (read without authentication) 373 - - "Private" holds only gate anonymous access, not authenticated user access 374 - - This reflects ATProto's current limitation: no private PDS records 252 + ### Write Access 375 253 376 - ### Presigned URLs 254 + - Hold owner (captain) OR crew members only 255 + - Verified via `io.atcr.hold.crew` records in hold's embedded PDS 256 + - Service token proves user identity (from user's PDS) 377 257 378 - - 15 minute expiry 379 - - Client uploads/downloads directly to storage 380 - - No data flows through AppView or hold service 258 + ### Authorization Flow 381 259 382 - ### Private Holds 260 + ``` 261 + 1. AppView gets service token from user's PDS 262 + 2. AppView sends request to hold with service token 263 + 3. Hold validates service token (checks it's from user's PDS) 264 + 4. Hold extracts user's DID from token 265 + 5. Hold checks crew records in its embedded PDS 266 + 6.
If crew member found → allow, else → deny 267 + ``` 383 268 384 - "Private" holds gate anonymous access while remaining accessible to authenticated users: 269 + ## Managing Crew Members 385 270 386 - **What "Private" Means:** 387 - - `HOLD_PUBLIC=false` prevents anonymous reads 388 - - Any authenticated ATCR user can still read 389 - - This aligns with ATProto's public records model 271 + ### Add Crew Member 390 272 391 - **Write Control:** 392 - - Only hold owner and crew members can write 393 - - Crew membership managed via `io.atcr.hold.crew` records in owner's PDS 394 - - Removing crew member immediately revokes write access 273 + Use ATProto client to create crew record in hold's PDS: 395 274 396 - **Future: True Private Access** 397 - - When ATProto adds private PDS records, ATCR can support truly private repos 398 - - For now, "private" = "authenticated-only access" 275 + ```bash 276 + # Via XRPC (if hold supports it) 277 + POST https://hold.example.com/xrpc/io.atcr.hold.requestCrew 278 + Authorization: Bearer {userOAuthToken} 399 279 400 - ## Example: Personal Storage 280 + # Or manually via captain's OAuth to hold's PDS 281 + atproto put-record \ 282 + --pds https://hold.example.com \ 283 + --collection io.atcr.hold.crew \ 284 + --rkey "{memberDID}" \ 285 + --value '{ 286 + "$type": "io.atcr.hold.crew", 287 + "member": "did:plc:bob456", 288 + "role": "admin", 289 + "permissions": ["blob:read", "blob:write"] 290 + }' 291 + ``` 401 292 402 - Alice wants to use her own Storj account: 293 + ### Remove Crew Member 403 294 404 - 1. 
**Set environment variables**: 405 - ```bash 406 - export HOLD_PUBLIC_URL=https://alice-storage.fly.dev 407 - export HOLD_OWNER=did:plc:alice123 408 - export STORAGE_DRIVER=s3 409 - export AWS_ACCESS_KEY_ID=your_storj_access_key 410 - export AWS_SECRET_ACCESS_KEY=your_storj_secret_key 411 - export S3_ENDPOINT=https://gateway.storjshare.io 412 - export S3_BUCKET=alice-blobs 413 - ``` 295 + ```bash 296 + atproto delete-record \ 297 + --pds https://hold.example.com \ 298 + --collection io.atcr.hold.crew \ 299 + --rkey "{memberDID}" 300 + ``` 414 301 415 - 2. **Deploy hold service** to Fly.io - auto-registration creates hold + crew record 302 + ## Storage Drivers 416 303 417 - 3. **Push images** - AppView automatically routes to her storage 304 + Hold service supports all distribution storage drivers: 305 + - **S3** - AWS S3, Minio, Storj (via S3 gateway) 306 + - **Filesystem** - Local disk (for testing) 307 + - **Azure** - Azure Blob Storage 308 + - **GCS** - Google Cloud Storage 309 + - **Swift** - OpenStack Swift 418 310 419 311 ## Example: Team Hold 420 312 421 - A company wants shared storage for their team: 313 + ```bash 314 + # 1. Deploy hold service 315 + export HOLD_PUBLIC_URL=https://team-hold.fly.dev 316 + export HOLD_OWNER=did:plc:admin 317 + export HOLD_PUBLIC=false # Private 318 + export STORAGE_DRIVER=s3 319 + export AWS_ACCESS_KEY_ID=... 320 + export S3_BUCKET=team-blobs 422 321 423 - 1. **Deploy hold service** with S3 credentials and auto-registration: 424 - ```bash 425 - export HOLD_PUBLIC_URL=https://company-hold.fly.dev 426 - export HOLD_OWNER=did:plc:admin 427 - export HOLD_PUBLIC=false 428 - export STORAGE_DRIVER=s3 429 - export AWS_ACCESS_KEY_ID=... 430 - export AWS_SECRET_ACCESS_KEY=... 431 - export S3_BUCKET=company-blobs 432 - ``` 322 + fly deploy 433 323 434 - 2. **Hold service auto-registers** on first run, creating: 435 - - Hold record in admin's PDS 436 - - Crew record making admin the owner 324 + # 2. 
Hold auto-creates captain + crew records on first run 437 325 438 - 3. **Admin adds crew members** via ATProto client or manually: 439 - ```bash 440 - # Using atproto client 441 - atproto put-record \ 442 - --collection io.atcr.hold.crew \ 443 - --rkey "company-did:plc:engineer1" \ 444 - --value '{ 445 - "$type": "io.atcr.hold.crew", 446 - "hold": "at://did:plc:admin/io.atcr.hold/company", 447 - "member": "did:plc:engineer1", 448 - "role": "write" 449 - }' 450 - ``` 326 + # 3. Admin adds team members via hold's PDS (requires OAuth) 327 + # (TODO: Implement crew management UI/CLI) 451 328 452 - 4. **Team members set their profile** to use the shared hold: 453 - ```bash 454 - # Engineer updates their sailor profile 455 - atproto put-record \ 456 - --collection io.atcr.sailor.profile \ 457 - --rkey "self" \ 458 - --value '{ 459 - "$type": "io.atcr.sailor.profile", 460 - "defaultHold": "https://company-hold.fly.dev" 461 - }' 462 - ``` 329 + # 4. Team members set their sailor profile: 330 + atproto put-record \ 331 + --collection io.atcr.sailor.profile \ 332 + --rkey "self" \ 333 + --value '{ 334 + "$type": "io.atcr.sailor.profile", 335 + "defaultHold": "did:web:team-hold.fly.dev" 336 + }' 463 337 464 - 5. **Hold service queries PDS** for crew records to authorize writes 465 - 6. **Engineers push/pull** using `atcr.io/engineer1/myapp` - blobs go to company hold 338 + # 5. Team members can now push/pull using team hold 339 + ``` 466 340 467 341 ## Limitations 468 342 469 - 1. **No resume/partial uploads** - Storage service doesn't track upload state 470 - 2. **No advanced features** - Just basic put/get, no deduplication logic 471 - 3. **In-memory cache** - Hold endpoint cache is in-memory (for production, use Redis) 472 - 4. 
**Manual profile updates** - No UI for updating sailor profile (must use ATProto client) 343 + ### Current IAM Challenges 473 344 474 - ## Performance Optimization: S3 Presigned URLs 345 + See [EMBEDDED_PDS.md](./EMBEDDED_PDS.md#iam-challenges) for detailed discussion. 475 346 476 - **Status:** Planned implementation (see [PRESIGNED_URLS.md](./PRESIGNED_URLS.md)) 347 + **Known issues:** 348 + 1. **RPC permission format**: Service tokens don't work with IP-based DIDs in local dev 349 + 2. **Dynamic hold discovery**: AppView can't dynamically OAuth arbitrary holds from sailor profiles 350 + 3. **Manual profile management**: No UI for updating sailor profile (must use ATProto client) 477 351 478 - Currently, hold services act as proxies for blob data. With presigned URLs: 479 - 480 - - **Downloads:** Docker → S3 direct (via 307 redirect) 481 - - **Uploads:** Docker → AppView → S3 (via presigned URL) 482 - - **Hold service bandwidth:** Reduced by 99.98% (only orchestration) 483 - 484 - **Benefits:** 485 - - Hold services can run on minimal infrastructure ($5/month instances) 486 - - Direct S3 transfers at maximum speed 487 - - Scales to arbitrarily large images 488 - - Works with Storj, MinIO, Backblaze B2, Cloudflare R2 489 - 490 - See [PRESIGNED_URLS.md](./PRESIGNED_URLS.md) for complete technical details and implementation guide. 352 + **Workaround:** Use hostname-based DIDs (`did:web:hold.example.com`) and public holds for now. 491 353 492 354 ## Future Improvements 493 355 494 - 1. **S3 Presigned URLs** - Implement direct S3 URLs (see [PRESIGNED_URLS.md](./PRESIGNED_URLS.md)) 495 - 2. **Automatic failover** - Multiple storage endpoints, fallback to default 496 - 3. **Storage analytics** - Track usage per DID 497 - 4. **Quota integration** - Optional quota tracking in storage service 498 - 5. **Profile management UI** - Web interface for users to manage their sailor profile 499 - 6. 
**Distributed cache** - Redis/Memcached for hold endpoint cache in multi-instance deployments 500 - 501 - ## Comparison to Default Storage 502 - 503 - | Feature | Default (Shared S3) | BYOS | 504 - |---------|---------------------|------| 505 - | Setup | None required | Deploy storage service | 506 - | Cost | Free (with quota) | User pays for S3/Storj | 507 - | Control | Limited | Full control | 508 - | Performance | Shared | Dedicated | 509 - | Quotas | Enforced by AppView | User managed | 510 - | Privacy | Blobs in shared bucket | Blobs in user's bucket | 356 + 1. **Crew management UI** - Web interface for adding/removing crew members 357 + 2. **Dynamic OAuth** - Support for arbitrary BYOS holds without pre-configuration 358 + 3. **Hold migration** - Tools for moving blobs between holds 359 + 4. **Storage analytics** - Track usage per user/repository 360 + 5. **Distributed cache** - Redis for hold DID cache in multi-instance deployments 511 361 512 362 ## References 513 363 364 + - [EMBEDDED_PDS.md](./EMBEDDED_PDS.md) - Embedded PDS architecture and IAM details 514 365 - [ATProto Lexicon Spec](https://atproto.com/specs/lexicon) 515 366 - [Distribution Storage Drivers](https://distribution.github.io/distribution/storage-drivers/) 516 367 - [S3 Presigned URLs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html) 517 - - [Storj Documentation](https://docs.storj.io/)
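The read and write rules in the new Access Control section of the BYOS diff reduce to two predicates over the hold's captain and crew records. This is a sketch under stated assumptions: `Hold`, `CanRead`, and `CanWrite` are hypothetical names; `userDID` is assumed to be the DID already extracted from a validated service token (empty for anonymous requests); the captain's `public` field is assumed to mirror `HOLD_PUBLIC`; and crew `permissions` are not consulted, matching the "crew record exists → ALLOW" rule.

```go
package main

import "fmt"

// Captain mirrors the io.atcr.hold.captain/self fields used here.
type Captain struct {
	Owner  string // owner DID
	Public bool   // assumed to mirror HOLD_PUBLIC
}

// CrewRecord mirrors an io.atcr.hold.crew record.
type CrewRecord struct {
	Member      string
	Role        string
	Permissions []string // not consulted by this sketch
}

// Hold is a hypothetical in-memory view of the records in a hold's embedded PDS.
type Hold struct {
	Captain Captain
	Crew    []CrewRecord
}

// isOwnerOrCrew reports whether an authenticated DID is the captain or has a crew record.
func (h *Hold) isOwnerOrCrew(userDID string) bool {
	if userDID == "" {
		return false // anonymous request
	}
	if userDID == h.Captain.Owner {
		return true
	}
	for _, c := range h.Crew {
		if c.Member == userDID {
			return true
		}
	}
	return false
}

// CanWrite allows only the captain or crew members; anonymous requests are denied.
func (h *Hold) CanWrite(userDID string) bool {
	return h.isOwnerOrCrew(userDID)
}

// CanRead serves anyone on a public hold and only captain or crew on a private one.
func (h *Hold) CanRead(userDID string) bool {
	return h.Captain.Public || h.isOwnerOrCrew(userDID)
}

func main() {
	hold := &Hold{
		Captain: Captain{Owner: "did:plc:alice123", Public: false},
		Crew: []CrewRecord{{
			Member:      "did:plc:bob456",
			Role:        "admin",
			Permissions: []string{"blob:read", "blob:write"},
		}},
	}
	fmt.Println(hold.CanWrite("did:plc:bob456"), hold.CanRead("")) // true false
}
```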
+142 -1123
docs/CREW_ACCESS_CONTROL.md
··· 2 2 3 3 ## Overview 4 4 5 - ATCR uses a crew-based access control system for hold (storage) services. Hold owners can grant write access to other users by creating crew records in their PDS. This document describes the scalable access control system that supports: 6 - 7 - - **Individual access** - Explicit DID-based crew membership 8 - - **Wildcard access** - Allow all authenticated users 9 - - **Pattern-based access** - Match users by handle patterns (e.g., `*.example.com`) 10 - - **Access revocation** - Bar (ban) specific users or patterns 11 - 12 - ## Problem Statement 13 - 14 - The original crew system required one `io.atcr.hold.crew` record per user. This doesn't scale for: 15 - 16 - 1. **Public/shared holds** - Thousands of users would need individual crew records 17 - 2. **Community holds** - PDS operators want to allow all their users 18 - 3. **Default registries** - AppView operators want to allow all authenticated users 19 - 4. **Access revocation** - No way to selectively remove access from wildcard/pattern grants 20 - 21 - ## Design Goals 22 - 23 - 1. **Preserve ATProto semantics** - Keep `member` as DID type for backlinks 24 - 2. **Scalable** - Support thousands of users with minimal records 25 - 3. **Flexible patterns** - Support wildcards, handle globs, future regex 26 - 4. **Clear semantics** - Separate allow/deny (crew vs barred) 27 - 5. **Backward compatible** - Existing crew records work unchanged 28 - 6. **Performance** - Minimize PDS queries, enable caching 29 - 30 - ## Record Schemas 5 + ATCR uses crew-based access control for hold (storage) services. Crew records are stored in the **hold's embedded PDS** (not the owner's or user's PDS), making the hold a self-contained ATProto actor with its own access control. 31 6 32 - ### io.atcr.hold.crew (Updated) 7 + ## Current Implementation 33 8 34 - Crew membership grants write access to a hold. Stored in the **hold owner's PDS**. 
9 + ### Records in Hold's PDS 35 10 11 + **Captain record** - Hold ownership (single record at `io.atcr.hold.captain/self`): 36 12 ```json 37 13 { 38 - "$type": "io.atcr.hold.crew", 39 - "hold": "at://did:plc:owner/io.atcr.hold/shared", 40 - "member": "did:plc:alice123", // Optional: Explicit DID (for backlinks) 41 - "memberPattern": "*.bsky.social", // Optional: Pattern matching 42 - "role": "write", 43 - "createdAt": "2025-10-13T12:00:00Z" 14 + "$type": "io.atcr.hold.captain", 15 + "owner": "did:plc:alice123", 16 + "public": false, 17 + "deployedAt": "2025-10-14T...", 18 + "region": "iad", 19 + "provider": "fly.io" 44 20 } 45 21 ``` 46 22 47 - **Fields:** 48 - 49 - - `hold` (string, at-uri, required) - AT-URI of the hold record 50 - - `member` (string, did, optional) - Explicit DID for individual access (enables backlinks) 51 - - `memberPattern` (string, optional) - Pattern for matching multiple users 52 - - `role` (string, required) - Role: `"owner"` or `"write"` 53 - - `expiresAt` (string, datetime, optional) - Optional expiration 54 - - `createdAt` (string, datetime, required) - Creation timestamp 55 - 56 - **Validation:** Exactly one of `member` or `memberPattern` must be set. 
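In the pattern-based design described by the removed version of `CREW_ACCESS_CONTROL.md`, the "exactly one of `member` or `memberPattern`" rule, together with the `owner`/`write` role enum, could be enforced by a small validator like the one below. The type and error names are hypothetical, and the `did:` prefix check is a deliberately shallow stand-in for full DID syntax validation.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// CrewRecord holds the identity and role fields of the removed io.atcr.hold.crew schema.
type CrewRecord struct {
	Member        string // explicit DID, e.g. "did:plc:alice123"
	MemberPattern string // handle glob, e.g. "*.bsky.social"
	Role          string // "owner" or "write"
}

var (
	ErrNoMember   = errors.New("one of member or memberPattern must be set")
	ErrBothMember = errors.New("member and memberPattern are mutually exclusive")
)

// validateCrewRecord enforces "exactly one of member or memberPattern" plus the role enum.
func validateCrewRecord(r CrewRecord) error {
	hasMember := r.Member != ""
	hasPattern := r.MemberPattern != ""
	switch {
	case hasMember && hasPattern:
		return ErrBothMember
	case !hasMember && !hasPattern:
		return ErrNoMember
	}
	// Shallow check only; real DID validation is stricter.
	if hasMember && !strings.HasPrefix(r.Member, "did:") {
		return fmt.Errorf("member %q is not a DID", r.Member)
	}
	if r.Role != "owner" && r.Role != "write" {
		return fmt.Errorf("unsupported role %q", r.Role)
	}
	return nil
}

func main() {
	err := validateCrewRecord(CrewRecord{Member: "did:plc:alice123", MemberPattern: "*", Role: "write"})
	fmt.Println(err) // member and memberPattern are mutually exclusive
}
```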
57 - 58 - **Pattern syntax:** 59 - 60 - - `"*"` - Matches all authenticated users 61 - - `"*.domain.com"` - Matches handles ending with `.domain.com` 62 - - `"subdomain.*"` - Matches handles starting with `subdomain.` 63 - - `"*.bsky.*"` - Matches handles containing `.bsky.` 64 - 65 - **Examples:** 66 - 23 + **Crew records** - Access control (one per member at `io.atcr.hold.crew/{rkey}`): 67 24 ```json 68 - // Explicit DID (current behavior, preserved) 69 25 { 70 26 "$type": "io.atcr.hold.crew", 71 - "hold": "at://did:plc:owner/io.atcr.hold/team", 72 - "member": "did:plc:alice123", 73 - "role": "write", 74 - "createdAt": "2025-10-13T12:00:00Z" 75 - } 76 - 77 - // Allow all authenticated users (public hold) 78 - { 79 - "$type": "io.atcr.hold.crew", 80 - "hold": "at://did:plc:owner/io.atcr.hold/shared", 81 - "memberPattern": "*", 82 - "role": "write", 83 - "createdAt": "2025-10-13T12:00:00Z" 84 - } 85 - 86 - // Allow all users from a community 87 - { 88 - "$type": "io.atcr.hold.crew", 89 - "hold": "at://did:plc:owner/io.atcr.hold/community", 90 - "memberPattern": "*.my-community.social", 91 - "role": "write", 92 - "createdAt": "2025-10-13T12:00:00Z" 93 - } 94 - 95 - // Allow specific subdomain 96 - { 97 - "$type": "io.atcr.hold.crew", 98 - "hold": "at://did:plc:owner/io.atcr.hold/corp", 99 - "memberPattern": "*.eng.company.com", 100 - "role": "write", 101 - "createdAt": "2025-10-13T12:00:00Z" 27 + "member": "did:plc:bob456", 28 + "role": "admin", 29 + "permissions": ["blob:read", "blob:write"], 30 + "addedAt": "2025-10-14T..." 102 31 } 103 32 ``` 104 33 105 - ### io.atcr.hold.crew.barred (New) 34 + ### Authorization Logic 106 35 107 - Barred list revokes access for specific users or patterns. Overrides crew membership. Stored in the **hold owner's PDS**. 
36 + Write authorization follows this priority: 108 37 109 - ```json 110 - { 111 - "$type": "io.atcr.hold.crew.barred", 112 - "hold": "at://did:plc:owner/io.atcr.hold/shared", 113 - "member": "did:plc:spammer", // Optional: Explicit DID 114 - "memberPattern": "*.spam-instance.com", // Optional: Pattern matching 115 - "reason": "spam/abuse/policy violation", 116 - "barredAt": "2025-10-13T12:00:00Z" 117 - } 118 38 ``` 119 - 120 - **Fields:** 121 - 122 - - `hold` (string, at-uri, required) - AT-URI of the hold record 123 - - `member` (string, did, optional) - Explicit DID to bar 124 - - `memberPattern` (string, optional) - Pattern for barring multiple users 125 - - `reason` (string, optional) - Human-readable reason for access revocation 126 - - `barredAt` (string, datetime, required) - When user was barred 127 - 128 - **Validation:** Exactly one of `member` or `memberPattern` must be set. 129 - 130 - **Pattern syntax:** Same as crew patterns (wildcards, handle globs). 131 - 132 - **Limitations:** Handle-based barring can be circumvented by users changing their handle or acquiring a new domain. However, this requires significant effort (purchasing domains, changing identity), making it an acceptable deterrent for most abuse cases. DID-based barring is permanent (until user creates new DID). 
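The handle patterns in the removed design (`*`, `*.domain.com`, `subdomain.*`, `*.bsky.*`) are shared by crew and barred records, and the removed authorization code relies on a `globToRegex` helper that it never defines. One way to implement it, where `*` is the only wildcard, is to escape each literal segment with `regexp.QuoteMeta` and join the segments with `.*`, anchored at both ends. This is an illustrative sketch, not the hold service's implementation; lowercasing both sides is an assumption based on ATProto handles being case-insensitive.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// globToRegex converts a handle glob where "*" is the only wildcard into an
// anchored regular expression. Literal segments are escaped with QuoteMeta so
// dots in domains match only literal dots.
func globToRegex(pattern string) string {
	parts := strings.Split(pattern, "*")
	for i, p := range parts {
		parts[i] = regexp.QuoteMeta(p)
	}
	return "^" + strings.Join(parts, ".*") + "$"
}

// matchPattern reports whether handle matches the glob. Both sides are
// lowercased on the assumption that handles compare case-insensitively.
func matchPattern(pattern, handle string) bool {
	pattern = strings.ToLower(pattern)
	if pattern == "*" {
		return true // wildcard matches every authenticated user
	}
	// MustCompile cannot panic here: QuoteMeta-escaped segments joined by ".*"
	// always form a valid expression.
	re := regexp.MustCompile(globToRegex(pattern))
	return re.MatchString(strings.ToLower(handle))
}

func main() {
	fmt.Println(globToRegex("*.example.com"))                        // ^.*\.example\.com$
	fmt.Println(matchPattern("*.bsky.social", "alice.bsky.social")) // true
}
```

Because `*.example.com` requires a literal dot before the domain, the bare handle `example.com` does not match it. A real hold would likely cache compiled expressions per record instead of rebuilding them on every authorization check.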
133 - 134 - **Examples:** 135 - 136 - ```json 137 - // Bar specific user 138 - { 139 - "$type": "io.atcr.hold.crew.barred", 140 - "hold": "at://did:plc:owner/io.atcr.hold/shared", 141 - "member": "did:plc:badactor", 142 - "reason": "Terms of service violation", 143 - "barredAt": "2025-10-13T12:00:00Z" 144 - } 145 - 146 - // Bar all users from a spam PDS 147 - { 148 - "$type": "io.atcr.hold.crew.barred", 149 - "hold": "at://did:plc:owner/io.atcr.hold/shared", 150 - "memberPattern": "*.spam-pds.com", 151 - "reason": "Spam instance", 152 - "barredAt": "2025-10-13T14:30:00Z" 153 - } 154 - 155 - // Bar pattern of suspicious accounts 156 - { 157 - "$type": "io.atcr.hold.crew.barred", 158 - "hold": "at://did:plc:owner/io.atcr.hold/shared", 159 - "memberPattern": "bot*", 160 - "reason": "Automated account abuse", 161 - "barredAt": "2025-10-13T15:00:00Z" 162 - } 39 + isAuthorizedWrite(userDID): 40 + 1. If userDID == captain.owner → ALLOW 41 + 2. If crew record exists for userDID → ALLOW 42 + 3. Default → DENY 163 43 ``` 164 44 165 - ## Authorization Logic 166 - 167 - Write authorization follows this priority order: 168 - 169 - ``` 170 - isAuthorizedWrite(did, handle): 171 - 1. If DID is hold owner → ALLOW 172 - 2. If DID or handle matches barred list → DENY 173 - 3. If DID explicitly in crew list → ALLOW 174 - 4. If handle matches crew pattern → ALLOW 175 - 5. Default → DENY 176 - ``` 177 - 178 - **Detailed algorithm:** 179 - 180 - ```go 181 - func (s *HoldService) isAuthorizedWrite(did string) bool { 182 - // 1. Check if owner 183 - if did == s.config.Registration.OwnerDID { 184 - return true // Owner always has access 185 - } 186 - 187 - // 2. Resolve handle from DID 188 - handle, err := resolveHandle(did) 189 - if err != nil { 190 - log.Printf("Failed to resolve handle for DID %s: %v", did, err) 191 - handle = "" // Continue without handle matching 192 - } 193 - 194 - // 3. 
Check barred list (explicit deny overrides everything) 195 - barred, err := s.isBarred(did, handle) 196 - if err != nil { 197 - log.Printf("Error checking barred status: %v", err) 198 - return false // Fail secure 199 - } 200 - if barred { 201 - return false // Explicitly barred 202 - } 203 - 204 - // 4. Check crew list (explicit allow) 205 - crew, err := s.isCrewMember(did, handle) 206 - if err != nil { 207 - log.Printf("Error checking crew status: %v", err) 208 - return false // Fail secure 209 - } 210 - 211 - return crew // Allow if crew member, deny otherwise 212 - } 213 - 214 - func (s *HoldService) isBarred(did, handle string) (bool, error) { 215 - records := listBarredRecords() 216 - 217 - for _, record := range records { 218 - // Check explicit DID match 219 - if record.Member != "" && record.Member == did { 220 - return true, nil 221 - } 222 - 223 - // Check pattern match (if handle available) 224 - if record.MemberPattern != "" && handle != "" { 225 - if matchPattern(record.MemberPattern, handle) { 226 - return true, nil 227 - } 228 - } 229 - } 230 - 231 - return false, nil 232 - } 233 - 234 - func (s *HoldService) isCrewMember(did, handle string) (bool, error) { 235 - records := listCrewRecords() 236 - 237 - for _, record := range records { 238 - // Check explicit DID match 239 - if record.Member != "" && record.Member == did { 240 - return true, nil 241 - } 242 - 243 - // Check pattern match (if handle available) 244 - if record.MemberPattern != "" && handle != "" { 245 - if matchPattern(record.MemberPattern, handle) { 246 - return true, nil 247 - } 248 - } 249 - } 250 - 251 - return false, nil 252 - } 253 - ``` 254 - 255 - **Pattern matching:** 256 - 257 - ```go 258 - func matchPattern(pattern, handle string) bool { 259 - if pattern == "*" { 260 - return true // Wildcard matches all 261 - } 262 - 263 - // Convert glob pattern to regex 264 - // *.example.com → ^.*\.example\.com$ 265 - // subdomain.* → ^subdomain\..*$ 266 - // *.bsky.* → ^.*\.bsky\..*$ 
267 - 268 - regex := globToRegex(pattern) 269 - matched, _ := regexp.MatchString(regex, handle) 270 - return matched 271 - } 272 - ``` 273 - 274 - ## Use Cases 45 + Read authorization depends on `HOLD_PUBLIC` setting: 46 + - **Public hold** (`HOLD_PUBLIC=true`): Anonymous + all authenticated users can read 47 + - **Private hold** (`HOLD_PUBLIC=false`): Requires crew membership for reads 275 48 276 - ### 1. Public Hold (Allow All Users) 49 + ### Configuration 277 50 278 - **Goal:** Shared storage for any authenticated ATCR user. 279 - 280 - **Setup:** 281 51 ```bash 282 - # Create crew record with wildcard 283 - atproto put-record \ 284 - --collection io.atcr.hold.crew \ 285 - --rkey "all-users" \ 286 - --value '{ 287 - "$type": "io.atcr.hold.crew", 288 - "hold": "at://did:plc:owner/io.atcr.hold/public", 289 - "memberPattern": "*", 290 - "role": "write" 291 - }' 52 + # Access control environment variables 53 + HOLD_PUBLIC=false # Require authentication for reads 54 + HOLD_ALLOW_ALL_CREW=false # Only explicit crew members can write 292 55 ``` 293 56 294 - **Result:** All authenticated users can push. Owner can selectively bar bad actors. 57 + ### Crew Management 295 58 296 - ### 2. Community Hold (PDS-Specific) 59 + Crew records are managed by the hold captain (owner) using standard ATProto operations on the hold's embedded PDS: 297 60 298 - **Goal:** Storage for all users from a specific community/PDS. 
-
-**Setup:**
+**Add crew member:**
 ```bash
-# Allow all community members
+# Via hold's PDS (requires captain's OAuth)
 atproto put-record \
+  --pds https://hold.example.com \
   --collection io.atcr.hold.crew \
-  --rkey "community-hold" \
+  --rkey "{memberDID}" \
   --value '{
     "$type": "io.atcr.hold.crew",
-    "hold": "at://did:plc:owner/io.atcr.hold/community",
-    "memberPattern": "*.my-community.social",
-    "role": "write"
+    "member": "did:plc:bob456",
+    "role": "admin",
+    "permissions": ["blob:read", "blob:write"],
+    "addedAt": "2025-10-14T12:00:00Z"
   }'
 ```
 
-**Result:** Anyone with a `@someone.my-community.social` handle can push.
-
-### 3. Team Hold with Selective Banning
-
-**Goal:** Shared team storage, but remove access from former employees.
-
-**Setup:**
+**Remove crew member:**
 ```bash
-# Allow team domain
-atproto put-record \
+atproto delete-record \
+  --pds https://hold.example.com \
   --collection io.atcr.hold.crew \
-  --rkey "team-hold" \
-  --value '{
-    "$type": "io.atcr.hold.crew",
-    "hold": "at://did:plc:owner/io.atcr.hold/team",
-    "memberPattern": "*.company.com",
-    "role": "write"
-  }'
-
-# Bar former employee
-atproto put-record \
-  --collection io.atcr.hold.crew.barred \
-  --rkey "bar-former-employee" \
-  --value '{
-    "$type": "io.atcr.hold.crew.barred",
-    "hold": "at://did:plc:owner/io.atcr.hold/team",
-    "member": "did:plc:former-employee",
-    "reason": "No longer with company"
-  }'
+  --rkey "{memberDID}"
 ```
 
-**Result:** All `@*.company.com` users can push, except the explicitly barred DID.
-
-### 4. Anti-Spam with Barred Patterns
-
-**Goal:** Public hold with protection against known spam instances.
-
-**Setup:**
+**List crew members:**
 ```bash
-# Allow all users
-atproto put-record \
-  --collection io.atcr.hold.crew \
-  --rkey "public-hold" \
-  --value '{
-    "$type": "io.atcr.hold.crew",
-    "hold": "at://did:plc:owner/io.atcr.hold/public",
-    "memberPattern": "*",
-    "role": "write"
-  }'
-
-# Bar spam instance
-atproto put-record \
-  --collection io.atcr.hold.crew.barred \
-  --rkey "bar-spam-pds" \
-  --value '{
-    "$type": "io.atcr.hold.crew.barred",
-    "hold": "at://did:plc:owner/io.atcr.hold/public",
-    "memberPattern": "*.known-spam.com",
-    "reason": "Spam source"
-  }'
+# Via XRPC
+GET https://hold.example.com/xrpc/com.atproto.repo.listRecords?repo={holdDID}&collection=io.atcr.hold.crew
 ```
 
-**Result:** Everyone can push except users from `*.known-spam.com`.
+## Authentication Flow
 
-### 5. Mixed Access (Explicit + Patterns)
-
-**Goal:** Team pattern plus individual guests.
-
-**Setup:**
-```bash
-# Team pattern
-atproto put-record \
-  --collection io.atcr.hold.crew \
-  --rkey "team-pattern" \
-  --value '{
-    "$type": "io.atcr.hold.crew",
-    "hold": "at://did:plc:owner/io.atcr.hold/team",
-    "memberPattern": "*.company.com",
-    "role": "write"
-  }'
-
-# Individual contractor
-atproto put-record \
-  --collection io.atcr.hold.crew \
-  --rkey "contractor-alice" \
-  --value '{
-    "$type": "io.atcr.hold.crew",
-    "hold": "at://did:plc:owner/io.atcr.hold/team",
-    "member": "did:plc:alice-contractor",
-    "role": "write"
-  }'
 ```
-
-**Result:** Team members + specific contractor all have access.
-
-## Implementation Details
-
-### Code Changes Required
-
-**Files to modify:**
-
-1. **`lexicons/io/atcr/hold/crew.json`**
-   - Make `member` optional (remove from `required`)
-   - Add `memberPattern` field (string, optional)
-   - Update description
+1. User pushes image to atcr.io/alice/myapp
 
-2. **`lexicons/io/atcr/hold/crew/barred.json`** (new file)
-   - Define new lexicon for barred records
-   - Same structure as crew (member + memberPattern)
-   - Add `reason` field
+2. AppView gets service token from alice's PDS:
+   GET /xrpc/com.atproto.server.getServiceAuth?aud={holdDID}
+   Response: { "token": "..." }
 
-3. **`pkg/atproto/lexicon.go`**
-   - Update `HoldCrewRecord` struct (add `MemberPattern` field, make `Member` pointer for optional)
-   - Add `BarredRecord` struct
-   - Add `NewBarredRecord()` constructor
-   - Add `BarredCollection` constant
+3. AppView calls hold with service token:
+   POST /xrpc/io.atcr.hold.initiateUpload
+   Authorization: Bearer {serviceToken}
 
-4. **`pkg/hold/authorization.go`**
-   - Update `isCrewMember()` to check patterns
-   - Add `isBarred()` function
-   - Add `resolveHandle()` helper (DID → handle lookup)
-   - Add `matchPattern()` helper (glob matching)
-   - Update `isAuthorizedWrite()` to check barred first
-
-5. **`pkg/hold/registration.go`**
-   - Add `HOLD_ALLOW_ALL_CREW` env var handling
-   - Check env var on every startup (not just first registration)
-   - Reconcile desired state (env) vs actual state (PDS)
-   - Create/delete wildcard crew record as needed
-
-### Pattern Matching Implementation
-
-```go
-// pkg/hold/patterns.go (new file)
-
-package hold
-
-import (
-    "regexp"
-    "strings"
-)
-
-// matchPattern checks if a handle matches a pattern
-func matchPattern(pattern, handle string) bool {
-    if pattern == "*" {
-        return true
-    }
-
-    // Convert glob to regex
-    regex := globToRegex(pattern)
-    matched, err := regexp.MatchString(regex, handle)
-    if err != nil {
-        return false
-    }
-    return matched
-}
-
-// globToRegex converts a glob pattern to a regex
-// *.example.com → ^.*\.example\.com$
-// subdomain.* → ^subdomain\..*$
-// *.bsky.* → ^.*\.bsky\..*$
-func globToRegex(pattern string) string {
-    // Escape special regex characters except *
-    escaped := regexp.QuoteMeta(pattern)
+4. Hold validates service token:
+   - Checks token is from alice's PDS
+   - Extracts alice's DID from token
 
-    // Replace escaped \* with .*
-    regex := strings.ReplaceAll(escaped, "\\*", ".*")
+5. Hold checks crew membership:
+   - Queries its own PDS: com.atproto.repo.getRecord
+   - Collection: io.atcr.hold.crew
+   - Record key: alice's DID
 
-    // Anchor to start and end
-    return "^" + regex + "$"
-}
+6. If crew record found → allow upload
+   Else → deny with 403 Forbidden
 ```
 
-### Handle Resolution
-
-```go
-// pkg/hold/resolve.go
+**Trust model:** "Trust but verify"
+- User OAuth'd to AppView (proves identity)
+- Service token from user's PDS (proves AppView is acting on behalf of user)
+- Crew record in hold's PDS (proves user has access to this hold)
 
-package hold
+## Use Cases
 
-import (
-    "context"
-    "github.com/bluesky-social/indigo/atproto/identity"
-    "github.com/bluesky-social/indigo/atproto/syntax"
-)
+### 1. Personal Hold (Private)
 
-// resolveHandle resolves a DID to its current handle
-func resolveHandle(did string) (string, error) {
-    ctx := context.Background()
-    directory := identity.DefaultDirectory()
-
-    didParsed, err := syntax.ParseDID(did)
-    if err != nil {
-        return "", err
-    }
-
-    ident, err := directory.LookupDID(ctx, didParsed)
-    if err != nil {
-        return "", err
-    }
-
-    return ident.Handle.String(), nil
-}
+```bash
+# Owner only
+HOLD_PUBLIC=false
+HOLD_ALLOW_ALL_CREW=false
+# No additional crew records needed - captain has implicit access
 ```
 
-### Caching Considerations
-
-**Problem:** Pattern matching requires handle resolution, which adds latency.
-
-**Solution:** Cache handle lookups with TTL.
-
-```go
-type handleCache struct {
-    mu    sync.RWMutex
-    cache map[string]cacheEntry // did → handle
-}
-
-type cacheEntry struct {
-    handle    string
-    expiresAt time.Time
-}
-
-const handleCacheTTL = 10 * time.Minute
-
-func (c *handleCache) get(did string) (string, bool) {
-    c.mu.RLock()
-    defer c.mu.RUnlock()
-
-    entry, ok := c.cache[did]
-    if !ok || time.Now().After(entry.expiresAt) {
-        return "", false
-    }
-    return entry.handle, true
-}
+### 2. Team Hold (Shared)
 
-func (c *handleCache) set(did, handle string) {
-    c.mu.Lock()
-    defer c.mu.Unlock()
+```bash
+# Multiple team members
+HOLD_PUBLIC=false
+HOLD_ALLOW_ALL_CREW=false
 
-    c.cache[did] = cacheEntry{
-        handle:    handle,
-        expiresAt: time.Now().Add(handleCacheTTL),
-    }
-}
+# Captain adds crew members:
+# - did:plc:alice (admin)
+# - did:plc:bob (member)
+# - did:plc:charlie (member)
 ```
 
-**Trade-offs:**
-- **Cache hit:** Authorization instant
-- **Cache miss:** One additional PDS lookup (acceptable for writes)
-- **TTL:** 10 minutes balances freshness vs performance
-
-### HOLD_ALLOW_ALL_CREW Environment Variable
-
-**Purpose:** Automatically manage wildcard crew access via environment variable.
-
-**Behavior:** Checked on **every startup** (not just first registration):
-
-1. **Read env var:** `HOLD_ALLOW_ALL_CREW` (true/false)
-2. **Query PDS:** Check for crew record with rkey `"allow-all"` and `memberPattern: "*"`
-3. **Reconcile state:**
-   - If env=`true` and record missing → **Create wildcard crew record** (requires OAuth)
-   - If env=`false` (or unset) and record exists → **Delete wildcard crew record** (requires OAuth)
-   - Otherwise → No action needed
-
-**Well-known record key:** `"allow-all"` (used exclusively for the managed wildcard record)
-
-**Implementation:**
-
-```go
-// pkg/hold/config.go
-type Config struct {
-    Registration struct {
-        OwnerDID     string
-        AllowAllCrew bool // HOLD_ALLOW_ALL_CREW
-    }
-    // ...
-}
-
-// pkg/hold/registration.go
-func (s *HoldService) ReconcileAllowAllCrew(callbackHandler *http.HandlerFunc) error {
-    desiredState := s.config.Registration.AllowAllCrew
-
-    // Query PDS for "allow-all" crew record
-    actualState, err := s.hasAllowAllCrewRecord()
-    if err != nil {
-        return fmt.Errorf("failed to check allow-all crew record: %w", err)
-    }
-
-    // States match - nothing to do
-    if desiredState == actualState {
-        log.Printf("Allow-all crew state matches desired state: %v", desiredState)
-        return nil
-    }
-
-    // State mismatch - need to reconcile
-    if desiredState && !actualState {
-        // Need to create wildcard crew record
-        log.Printf("Creating allow-all crew record (HOLD_ALLOW_ALL_CREW=true)")
-        return s.createAllowAllCrewRecord(callbackHandler)
-    }
-
-    if !desiredState && actualState {
-        // Need to delete wildcard crew record
-        log.Printf("Deleting allow-all crew record (HOLD_ALLOW_ALL_CREW removed/false)")
-        return s.deleteAllowAllCrewRecord(callbackHandler)
-    }
-
-    return nil
-}
-
-func (s *HoldService) hasAllowAllCrewRecord() (bool, error) {
-    ownerDID := s.config.Registration.OwnerDID
-    if ownerDID == "" {
-        return false, fmt.Errorf("hold owner DID not configured")
-    }
-
-    ctx := context.Background()
-
-    // Resolve owner's PDS
-    pdsEndpoint, err := s.resolveOwnerPDS(ownerDID)
-    if err != nil {
-        return false, err
-    }
-
-    // Query for specific rkey
-    client := atproto.NewClient(pdsEndpoint, ownerDID, "")
-    record, err := client.GetRecord(ctx, atproto.HoldCrewCollection, "allow-all")
-
-    if err != nil {
-        // Record doesn't exist
-        return false, nil
-    }
-
-    // Verify it's the wildcard record (memberPattern: "*")
-    var crewRecord atproto.HoldCrewRecord
-    if err := json.Unmarshal(record.Value, &crewRecord); err != nil {
-        return false, err
-    }
-
-    // Check if it's the exact wildcard pattern
-    return crewRecord.MemberPattern == "*", nil
-}
-
-func (s *HoldService) createAllowAllCrewRecord(callbackHandler *http.HandlerFunc) error {
-    // This requires OAuth - reuse registration OAuth flow
-    // Need authenticated client to create record
-
-    ownerDID := s.config.Registration.OwnerDID
-    pdsEndpoint, err := s.resolveOwnerPDS(ownerDID)
-    if err != nil {
-        return err
-    }
-
-    // Get handle for OAuth
-    handle, err := resolveHandleFromDID(ownerDID)
-    if err != nil {
-        return err
-    }
-
-    // Run OAuth flow (similar to registration)
-    ctx := context.Background()
-    result, err := oauth.InteractiveFlowWithCallback(
-        ctx,
-        s.config.Server.PublicURL,
-        handle,
-        s.getCrewManagementScopes(),
-        func(handler http.HandlerFunc) error {
-            *callbackHandler = handler
-            return nil
-        },
-        func(authURL string) error {
-            log.Printf("\n%s", strings.Repeat("=", 80))
-            log.Printf("OAUTH REQUIRED: Creating allow-all crew record")
-            log.Printf("%s", strings.Repeat("=", 80))
-            log.Printf("\nVisit: %s\n", authURL)
-            log.Printf("Waiting for authorization...")
-            log.Printf("%s\n", strings.Repeat("=", 80))
-            return nil
-        },
-    )
-    if err != nil {
-        return err
-    }
-
-    // Create authenticated client
-    apiClient := result.Session.APIClient()
-    client := atproto.NewClientWithIndigoClient(pdsEndpoint, ownerDID, apiClient)
-
-    // Get hold URI (need to know which hold to grant access to)
-    holdURI, err := s.getHoldURI()
-    if err != nil {
-        return err
-    }
-
-    // Create wildcard crew record
-    crewRecord := atproto.HoldCrewRecord{
-        Type:          atproto.HoldCrewCollection,
-        Hold:          holdURI,
-        MemberPattern: ptr("*"), // Wildcard - allow all
-        Role:          "write",
-        CreatedAt:     time.Now(),
-    }
-
-    _, err = client.PutRecord(ctx, atproto.HoldCrewCollection, "allow-all", &crewRecord)
-    if err != nil {
-        return fmt.Errorf("failed to create allow-all crew record: %w", err)
-    }
-
-    log.Printf("✓ Created allow-all crew record (allows all authenticated users)")
-    return nil
-}
-
-func (s *HoldService) deleteAllowAllCrewRecord(callbackHandler *http.HandlerFunc) error {
-    // Similar OAuth flow for deletion
-    // Only delete if it's the exact wildcard pattern (safety check)
-
-    isWildcard, err := s.hasAllowAllCrewRecord()
-    if err != nil {
-        return err
-    }
-
-    if !isWildcard {
-        log.Printf("Warning: 'allow-all' crew record exists but is not wildcard - skipping deletion")
-        return nil
-    }
-
-    // OAuth flow (same as create)
-    ownerDID := s.config.Registration.OwnerDID
-    pdsEndpoint, err := s.resolveOwnerPDS(ownerDID)
-    if err != nil {
-        return err
-    }
-
-    handle, err := resolveHandleFromDID(ownerDID)
-    if err != nil {
-        return err
-    }
-
-    ctx := context.Background()
-    result, err := oauth.InteractiveFlowWithCallback(
-        ctx,
-        s.config.Server.PublicURL,
-        handle,
-        s.getCrewManagementScopes(),
-        func(handler http.HandlerFunc) error {
-            *callbackHandler = handler
-            return nil
-        },
-        func(authURL string) error {
-            log.Printf("\n%s", strings.Repeat("=", 80))
-            log.Printf("OAUTH REQUIRED: Deleting allow-all crew record")
-            log.Printf("%s", strings.Repeat("=", 80))
-            log.Printf("\nVisit: %s\n", authURL)
-            log.Printf("Waiting for authorization...")
-            log.Printf("%s\n", strings.Repeat("=", 80))
-            return nil
-        },
-    )
-    if err != nil {
-        return err
-    }
-
-    // Create authenticated client
-    apiClient := result.Session.APIClient()
-    client := atproto.NewClientWithIndigoClient(pdsEndpoint, ownerDID, apiClient)
-
-    // Delete the record
-    err = client.DeleteRecord(ctx, atproto.HoldCrewCollection, "allow-all")
-    if err != nil {
-        return fmt.Errorf("failed to delete allow-all crew record: %w", err)
-    }
-
-    log.Printf("✓ Deleted allow-all crew record")
-    return nil
-}
-
-func (s *HoldService) getCrewManagementScopes() []string {
-    return []string{
-        "atproto",
-        fmt.Sprintf("repo:%s?action=create", atproto.HoldCrewCollection),
-        fmt.Sprintf("repo:%s?action=update", atproto.HoldCrewCollection),
-        fmt.Sprintf("repo:%s?action=delete", atproto.HoldCrewCollection),
-    }
-}
-
-// Helper for pointer
-func ptr(s string) *string {
-    return &s
-}
-```
-
-**Startup sequence:**
-
-```go
-// cmd/hold/main.go
-func main() {
-    // ... load config ...
-
-    holdService := hold.NewHoldService(config)
-
-    // Register HTTP routes
-    var oauthCallbackHandler http.HandlerFunc
-    http.HandleFunc("/auth/oauth/callback", func(w http.ResponseWriter, r *http.Request) {
-        if oauthCallbackHandler != nil {
-            oauthCallbackHandler(w, r)
-        } else {
-            http.Error(w, "OAuth callback not initialized", http.StatusInternalServerError)
-        }
-    })
-
-    // Auto-register hold (if HOLD_OWNER set)
-    if config.Registration.OwnerDID != "" {
-        err := holdService.AutoRegister(&oauthCallbackHandler)
-        if err != nil {
-            log.Fatalf("Failed to register hold: %v", err)
-        }
-
-        // Reconcile allow-all crew record
-        err = holdService.ReconcileAllowAllCrew(&oauthCallbackHandler)
-        if err != nil {
-            log.Fatalf("Failed to reconcile allow-all crew: %v", err)
-        }
-    }
-
-    // Start server...
-}
-```
-
-**Key properties:**
-
-1. **Idempotent:** Safe to run on every startup
-2. **Well-known rkey:** Uses `"allow-all"` exclusively for managed record
-3. **Safety:** Only deletes if `memberPattern` is exactly `"*"` (won't touch custom patterns like `*.example.com`)
-4. **OAuth required:** Both create and delete operations need authentication
-5. **Reuses infrastructure:** Same OAuth flow as registration
-
-**Example configurations:**
+### 3. Public Hold (Community)
 
 ```bash
-# Public hold - allow all users
+# Allow any authenticated user (TODO: Implement HOLD_ALLOW_ALL_CREW)
+HOLD_PUBLIC=true
 HOLD_ALLOW_ALL_CREW=true
-
-# Private hold - explicit crew only
-HOLD_ALLOW_ALL_CREW=false
-# (or omit the variable entirely)
 ```
 
-**Edge cases handled:**
-
-- Record exists with different pattern → Won't delete (safety)
-- OAuth fails → Service won't start (explicit failure)
-- PDS unreachable → Startup fails (can't verify state)
-- Record exists but env unset → Deletes wildcard (opt-in behavior)
-
-**Custom patterns preserved:**
-
-Hold owners can still manually create pattern-based crew records with different rkeys:
-
-```bash
-# Manually created pattern (rkey: "community")
-atproto put-record \
-  --collection io.atcr.hold.crew \
-  --rkey "community" \
-  --value '{
-    "memberPattern": "*.my-community.social",
-    "role": "write"
-  }'
-```
-
-The `HOLD_ALLOW_ALL_CREW` management **only touches** the `"allow-all"` rkey with exact `memberPattern: "*"`.
+## Planned Features
 
-## Migration Path
+### Pattern-Based Access Control
 
-**Backward Compatibility:** Fully compatible with existing deployments.
+**Status:** Planned but not yet implemented.
 
-1. **Existing crew records work unchanged**
-   - Records with `member` (DID) continue to work
-   - No changes needed to existing records
-
-2. **Opt-in patterns**
-   - Hold owners can add pattern-based crew records
-   - Mix explicit DIDs and patterns freely
-
-3. **Barred list is optional**
-   - Only needed for selective access revocation
-   - Empty barred list = no blocking
-
-4. **Lexicon evolution**
-   - Making `member` optional is backward compatible (existing records still have it)
-   - Adding `memberPattern` is additive (old clients ignore it)
-
-## Future Enhancements
-
-### 1. PDS-Based Access Control
-
-**Goal:** Allow/bar users based on their PDS (not handle).
-
-**Challenge:** ATProto doesn't give PDSes stable identifiers. PDS endpoints are mutable URLs.
-
-**Potential Solutions:**
-
-#### Option A: PDS DID Standard (if ATProto adds it)
-
-If ATProto introduces PDS DIDs:
+**Concept:** Allow crew records with pattern matching instead of explicit DIDs:
 
 ```json
 {
   "$type": "io.atcr.hold.crew",
-  "hold": "at://did:plc:owner/io.atcr.hold/community",
-  "memberPattern": "pds:did:plc:pds-id",
+  "memberPattern": "*.example.com",
   "role": "write"
 }
 ```
 
-#### Option B: Accept PDS URL Mutability
-
-Store PDS URLs with understanding they can change:
+**Use cases:**
+- `"*"` - Allow all authenticated users
+- `"*.company.com"` - Allow all users from company domain
+- `"*.community.social"` - Allow all community members
 
-```json
-{
-  "$type": "io.atcr.hold.crew",
-  "hold": "at://did:plc:owner/io.atcr.hold/community",
-  "memberPattern": "pds:https://my-community.social",
-  "role": "write"
-}
-```
+**Implementation needed:**
+- Add `memberPattern` field to crew record schema (make `member` optional)
+- Add handle resolution (DID → handle lookup)
+- Add pattern matching logic
+- Update authorization to check patterns
 
-**Trade-off:** User migration bypasses access control, but this requires effort.
+### Barred List (Access Revocation)
 
-#### Option C: PDS Trust Lists (Federated Model)
+**Status:** Planned but not yet implemented.
 
-Reference curated lists of trusted PDSes:
+**Concept:** Explicit deny list that overrides crew membership:
 
 ```json
 {
-  "$type": "io.atcr.hold.crew",
-  "hold": "at://did:plc:owner/io.atcr.hold/community",
-  "memberPattern": "trust-list:at://did:plc:curator/trust.list/vetted-pds",
-  "role": "write"
+  "$type": "io.atcr.hold.crew.barred",
+  "member": "did:plc:former-employee",
+  "reason": "No longer with company",
+  "barredAt": "2025-10-13T12:00:00Z"
 }
 ```
 
-**Status:** Experimental. Requires additional standards.
-
-### 2. Advanced Pattern Matching
+**Priority:** Barred list checked before crew list.
 
-**Goal:** Support more sophisticated patterns.
-
-**Potential patterns:**
+### HOLD_ALLOW_ALL_CREW
 
-- **Regex:** `memberPattern: "regex:^eng-.*@company.com$"`
-- **Multiple patterns:** `memberPattern: ["*.example.com", "*.other.com"]`
-- **NOT patterns:** `memberPattern: "!*.spam.com"` (everything except)
+**Status:** Environment variable exists but full implementation pending.
 
-**Implementation:** Extend `matchPattern()` function with pattern type detection.
+**Concept:** Automatically create/manage wildcard crew record via env var:
 
-### 3. Temporary Access
-
-**Goal:** Time-limited crew membership.
-
-**Current support:** `expiresAt` field already in schema (optional).
-
-**Enhancement:** Hold service automatically checks expiration during authorization:
-
-```go
-if record.ExpiresAt != nil && time.Now().After(*record.ExpiresAt) {
-    continue // Skip expired crew record
-}
+```bash
+HOLD_ALLOW_ALL_CREW=true # Creates crew record with memberPattern: "*"
 ```
 
-### 4. Role-Based Access Control (RBAC)
-
-**Goal:** Fine-grained permissions beyond read/write.
-
-**Potential roles:**
-- `"read"` - Pull only
-- `"write"` - Push + pull
-- `"admin"` - Manage crew records
-- `"owner"` - Full control
-
-**Current status:** `role` field exists but only `"owner"` and `"write"` are used.
-
-### 5. Audit Logging
-
-**Goal:** Track access grants/denials for compliance.
-
-**Implementation:**
-- Log crew checks to structured log
-- Include: DID, handle, result (allow/deny), reason
-- Optional: Write to ATProto audit log record
-
-## Security Considerations
-
-### 1. Public Records
-
-**Consideration:** Crew and barred records are public ATProto records.
-
-**Implications:**
-- Anyone can see who has access to a hold
-- Anyone can see who is barred (and why)
-- Similar to Bluesky block lists being public
-
-**Mitigation:** This is intentional transparency. Hold owners should use generic reasons in barred records if privacy is a concern.
-
-### 2. Handle Changes
-
-**Consideration:** Handles can change, but DIDs are permanent.
-
-**Implications:**
-- Pattern matching based on handles can be bypassed by changing handle
-- DID-based rules are more stable
-- However, changing handles or acquiring new domains requires significant effort:
-  - Purchasing new domain names ($10-100+/year)
-  - Updating identity across platforms
-  - Loss of established reputation/identity
-
-**Recommendation:**
-- Use DID-based crew/barred records for critical access control (permanent)
-- Use pattern-based rules for convenience and community management
-- The effort required to bypass handle patterns makes them an acceptable deterrent
-- Combine both approaches for defense in depth
-
-### 3. PDS Migration
+**Implementation needed:**
+- Auto-create wildcard crew record on startup if env=true
+- Auto-delete wildcard crew record if env changes to false
+- Use well-known rkey "allow-all" for managed record
 
-**Consideration:** Users can migrate to different PDSes.
+## Architecture Notes
 
-**Implications:**
-- PDS-based patterns (future) can be bypassed by migration
-- Handle patterns persist across PDS migration (if handle stays same)
+### Why Hold's Embedded PDS?
 
-**Recommendation:** Accept this as inherent trade-off. Migration requires user effort and is acceptable "escape hatch."
+**Key insight:** Crew records are **shared data** about the hold, not user-specific data.
 
-### 4. Pattern Matching Performance
-
-**Consideration:** Complex patterns could cause ReDoS (regex denial of service).
-
-**Mitigation:**
-- Limit pattern complexity (only basic globs in v1)
-- Cache handle lookups to minimize repeated work
-- Set timeout on pattern matching operations
-
-### 5. Barred List Circumvention
-
-**Consideration:** Barred users might create new DIDs.
-
-**Mitigation:**
-- This is fundamental to decentralized identity (users control DIDs)
-- Hold owners can add new DIDs to barred list as discovered
-- Pattern-based barring (handle/PDS patterns) provides broader coverage
-
-## Testing Strategy
-
-### Unit Tests
-
-**Pattern matching:**
-```go
-func TestMatchPattern(t *testing.T) {
-    tests := []struct{
-        pattern string
-        handle  string
-        want    bool
-    }{
-        {"*", "anything.com", true},
-        {"*.example.com", "alice.example.com", true},
-        {"*.example.com", "bob.other.com", false},
-        {"eng.*", "eng.company.com", true},
-        {"eng.*", "sales.company.com", false},
-    }
-    // ...
-}
-```
+**Benefits:**
+- **Self-contained**: Hold is independent ATProto actor
+- **Portable**: Hold can move without coordinating with user PDSs
+- **Discoverable**: Query hold's PDS to see who has access
+- **Standard**: Uses normal ATProto sync endpoints (subscribeRepos, getRecord, listRecords)
 
-**Authorization logic:**
-```go
-func TestIsAuthorizedWrite(t *testing.T) {
-    // Test: owner always allowed
-    // Test: explicit crew member allowed
-    // Test: pattern match allowed
-    // Test: barred user denied
-    // Test: barred pattern denied
-    // Test: barred overrides crew
-}
-```
+**Comparison:**
+- **User's PDS**: Stores user-specific data (manifests, sailor profile)
+- **Hold's PDS**: Stores hold-specific data (captain, crew, configuration)
+- Clear separation of concerns
 
-### Integration Tests
+### Security Considerations
 
-1. **Create hold with wildcard crew** → verify any user can write
-2. **Add barred record** → verify barred user rejected
-3. **Pattern-based crew** → verify matching handles allowed
-4. **Mixed access** → verify explicit + pattern both work
-5. **Handle resolution failure** → verify fallback to DID-only matching
+1. **Public Records**: Crew records are public (anyone can see who has access to a hold)
+2. **Service Tokens**: Hold trusts user's PDS to issue valid service tokens
+3. **DID-Based**: Crew membership is DID-based (permanent), not handle-based
+4. **Captain Control**: Only captain can modify crew records (via OAuth to hold's PDS)
 
-### Performance Tests
+## Future Improvements
 
-1. **Large crew list** (1000+ records) → measure query time
-2. **Complex patterns** → measure pattern matching time
-3. **Handle cache** → verify cache hit rate
-4. **Concurrent requests** → verify no race conditions
+1. **Crew management UI** - Web interface for adding/removing crew members
+2. **Pattern-based matching** - Implement `memberPattern` field
+3. **Barred list** - Implement access revocation
+4. **Role-based permissions** - Fine-grained permissions beyond read/write
+5. **Temporary access** - Time-limited crew membership (`expiresAt` field)
+6. **Audit logging** - Track access grants/denials
 
 ## References
 
+- [EMBEDDED_PDS.md](./EMBEDDED_PDS.md) - Embedded PDS architecture details
+- [BYOS.md](./BYOS.md) - BYOS deployment and usage
 - [ATProto Lexicon Spec](https://atproto.com/specs/lexicon)
-- [Bluesky Block Lists](https://bsky.app/profile/bsky.app/post/3l7wzyc6i622o) (analogous public records)
-- [Go Glob Matching](https://pkg.go.dev/path/filepath#Match)
-- [OAuth Scopes](https://atproto.com/specs/oauth#scopes) (for crew management permissions)
-
-## Appendix: Lexicon Definitions
-
-### lexicons/io/atcr/hold/crew.json (Updated)
-
-```json
-{
-  "lexicon": 1,
-  "id": "io.atcr.hold.crew",
-  "defs": {
-    "main": {
-      "type": "record",
-      "description": "Crew membership for a storage hold. Stored in the hold owner's PDS to maintain control over write access. Supports explicit DIDs (with backlinks), wildcard access, and handle patterns.",
-      "key": "any",
-      "record": {
-        "type": "object",
-        "required": ["hold", "role", "createdAt"],
-        "properties": {
-          "hold": {
-            "type": "string",
-            "format": "at-uri",
-            "description": "AT-URI of the hold record (e.g., 'at://did:plc:owner/io.atcr.hold/hold1')"
-          },
-          "member": {
-            "type": "string",
-            "format": "did",
-            "description": "DID of crew member (for individual access with backlinks). Exactly one of 'member' or 'memberPattern' must be set."
1146 -           },
1147 -           "memberPattern": {
1148 -             "type": "string",
1149 -             "description": "Pattern for matching multiple users. Supports wildcards: '*' (all users), '*.domain.com' (handle glob). Exactly one of 'member' or 'memberPattern' must be set."
1150 -           },
1151 -           "role": {
1152 -             "type": "string",
1153 -             "description": "Member's role/permissions. 'owner' = hold owner, 'write' = can push blobs.",
1154 -             "knownValues": ["owner", "write"]
1155 -           },
1156 -           "expiresAt": {
1157 -             "type": "string",
1158 -             "format": "datetime",
1159 -             "description": "Optional expiration for this membership"
1160 -           },
1161 -           "createdAt": {
1162 -             "type": "string",
1163 -             "format": "datetime",
1164 -             "description": "Membership creation timestamp"
1165 -           }
1166 -         }
1167 -       }
1168 -     }
1169 -   }
1170 - }
1171 - ```
1172 -
1173 - ### lexicons/io/atcr/hold/crew/barred.json (New)
1174 -
1175 - ```json
1176 - {
1177 -   "lexicon": 1,
1178 -   "id": "io.atcr.hold.crew.barred",
1179 -   "defs": {
1180 -     "main": {
1181 -       "type": "record",
1182 -       "description": "Barred (banned) list for a storage hold. Users/patterns in this list are denied write access, overriding crew membership. Stored in the hold owner's PDS.",
1183 -       "key": "any",
1184 -       "record": {
1185 -         "type": "object",
1186 -         "required": ["hold", "barredAt"],
1187 -         "properties": {
1188 -           "hold": {
1189 -             "type": "string",
1190 -             "format": "at-uri",
1191 -             "description": "AT-URI of the hold record"
1192 -           },
1193 -           "member": {
1194 -             "type": "string",
1195 -             "format": "did",
1196 -             "description": "DID of user to bar. Exactly one of 'member' or 'memberPattern' must be set."
1197 -           },
1198 -           "memberPattern": {
1199 -             "type": "string",
1200 -             "description": "Pattern for barring multiple users. Supports wildcards: '*.spam.com', 'bot*', etc. Exactly one of 'member' or 'memberPattern' must be set."
1201 -           },
1202 -           "reason": {
1203 -             "type": "string",
1204 -             "maxLength": 300,
1205 -             "description": "Optional human-readable reason for barring (e.g., 'spam', 'abuse', 'policy violation')"
1206 -           },
1207 -           "barredAt": {
1208 -             "type": "string",
1209 -             "format": "datetime",
1210 -             "description": "When the user/pattern was barred"
1211 -           }
1212 -         }
1213 -       }
1214 -     }
1215 -   }
1216 - }
1217 - ```
1218 -
1219 - ## Summary
1220 -
1221 - This design enables scalable, flexible access control for ATCR holds while:
1222 -
1223 - - **Preserving ATProto semantics** (DID backlinks, public records)
1224 - - **Supporting massive scale** (one record for thousands of users)
1225 - - **Enabling selective revocation** (barred list)
1226 - - **Maintaining backward compatibility** (existing records work unchanged)
1227 - - **Planning for future enhancements** (PDS-based filtering when possible)
1228 -
1229 - ---
1230 -
1231 - **Note on terminology:** "Barred" is an ironic reversal of the idiom "no holds barred" (meaning "without restrictions"). In wrestling, when all holds are allowed, it's unrestricted. In ATCR, being "barred from a hold" means you're restricted from access. The pun works in reverse! 🥁
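The removed test outline and the crew/barred lexicons describe a single precedence for write access: the hold owner is always allowed, a barred entry overrides crew membership, and a crew entry matches either an exact DID or a handle pattern. The new text lists pattern matching and the barred list only as Future Improvements, so the following is a minimal, hypothetical sketch of that proposed logic, not ATCR code. The `memberRef` type and the `matches`/`IsAuthorizedWrite` functions are illustrative names. The sketch assumes the caller has already tried to resolve the user's handle, uses Go's `path.Match` for globs (the removed references point at the equivalent `filepath.Match`), follows integration test 5 by falling back to DID-only matching when the handle is unknown, and still treats `*` as "all users".

```go
package main

import (
	"fmt"
	"path"
)

// memberRef is a hypothetical in-memory form of one io.atcr.hold.crew or
// io.atcr.hold.crew.barred record: exactly one of Member (a DID) or
// MemberPattern ("*", "*.example.com", "bot*", ...) is set.
type memberRef struct {
	Member        string
	MemberPattern string
}

// matches reports whether ref applies to the user identified by did/handle.
// DIDs are compared exactly and "*" means all users. Any other pattern is a
// glob over the handle; an empty handle (resolution failed) falls back to
// DID-only matching, so it never matches a handle pattern.
func matches(ref memberRef, did, handle string) bool {
	if ref.Member != "" {
		return ref.Member == did
	}
	if ref.MemberPattern == "*" {
		return true
	}
	if ref.MemberPattern == "" || handle == "" {
		return false
	}
	ok, err := path.Match(ref.MemberPattern, handle)
	return err == nil && ok // malformed patterns match nothing
}

// IsAuthorizedWrite applies the precedence from the removed test outline:
// the owner is always allowed, a barred entry overrides crew membership,
// and otherwise any matching crew entry grants write access.
func IsAuthorizedWrite(ownerDID, did, handle string, crew, barred []memberRef) bool {
	if did == ownerDID {
		return true
	}
	for _, b := range barred {
		if matches(b, did, handle) {
			return false
		}
	}
	for _, c := range crew {
		if matches(c, did, handle) {
			return true
		}
	}
	return false
}

func main() {
	owner := "did:plc:owner"
	crew := []memberRef{{Member: "did:plc:bob"}, {MemberPattern: "*.example.com"}}
	barred := []memberRef{{MemberPattern: "bot*"}}

	fmt.Println(IsAuthorizedWrite(owner, owner, "", crew, barred))                            // true: owner
	fmt.Println(IsAuthorizedWrite(owner, "did:plc:bob", "bob.bsky.social", crew, barred))     // true: explicit crew DID
	fmt.Println(IsAuthorizedWrite(owner, "did:plc:carol", "carol.example.com", crew, barred)) // true: handle pattern
	fmt.Println(IsAuthorizedWrite(owner, "did:plc:bot1", "bot1.example.com", crew, barred))   // false: barred beats crew
	fmt.Println(IsAuthorizedWrite(owner, "did:plc:dave", "", crew, barred))                   // false: no handle, no DID match
}
```

Because an unresolved handle matches no handle pattern, a barred pattern such as `bot*` cannot catch a user whose handle lookup failed. A hold that prefers to fail closed could instead deny such requests whenever any barred pattern exists.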
+220 -799
docs/EMBEDDED_PDS.md
···
1 1 # Embedded PDS Architecture for Hold Services
2 2
3 - This document explores the evolution of ATCR's hold service architecture toward becoming an embedded ATProto PDS (Personal Data Server).
3 + This document describes ATCR's hold service architecture using embedded ATProto PDS (Personal Data Server) for access control and federation.
4 4
5 5 ## Motivation
6 6
7 - ### Comparison to Other ATProto Projects
7 + ### The Fragmentation Problem
8 8
9 9 Several ATProto projects face similar challenges with large data storage:
10 10
11 - | Project | Large Data | Metadata | Current Solution |
12 - |---------|-----------|----------|------------------|
11 + | Project | Large Data | Metadata | Solution |
12 + |---------|-----------|----------|----------|
13 13 | **tangled.org** | Git objects | Issues, PRs, comments | External knot storage |
14 14 | **stream.place** | Video segments | Stream info, chat | Embedded "static PDS" |
15 - | **ATCR** | Container blobs | Manifests, comments, builds | External hold service |
16 -
17 - **Common problem:** Large binary data can't realistically live in user PDSs, but interaction metadata gets fragmented across different users' PDSs.
18 -
19 - **Emerging pattern:** Application-specific storage services with embedded minimal PDS implementations.
20 -
21 - ### The Fragmentation Problem
22 -
23 - #### Tangled.org Example
24 - ```
25 - user/myproject repository
26 - ├── Git data → Knot (external storage)
27 - ├── Issues → Created by @alice → Lives in alice's PDS
28 - ├── PRs → Created by @bob → Lives in bob's PDS
29 - └── Comments → Created by @charlie → Lives in charlie's PDS
30 - ```
31 -
32 - **Problems:**
33 - - Repo owner can't export all issues/PRs easily
34 - - No single source of truth for repo metadata
35 - - Interaction history fragmented across PDSs
36 - - Can't encrypt repo data while maintaining collaboration
37 -
38 - #### ATCR's Similar Challenge
39 - ```
40 - atcr.io/alice/myapp
41 - ├── Manifests → alice's PDS
42 - ├── Blobs → Hold service (external)
43 - └── Future: Comments, builds, attestations → Where?
44 - ```
45 -
46 - ### Stream.place's Approach
47 -
48 - Stream.place built a **minimal "static PDS"** embedded in their application with just the XRPC endpoints they need:
49 - - `com.atproto.repo.describeRepo`
50 - - `com.atproto.sync.subscribeRepos`
51 - - Minimal read methods
52 -
53 - **Why:** Avoid rate-limiting Bluesky's infrastructure with video segments while staying ATProto-native.
54 -
55 - ## Current Hold Service Architecture
56 -
57 - The current hold service is intentionally minimal:
58 -
59 - ```
60 - Hold Service =
61 -   - OAuth token validation (call user's PDS)
62 -   - Generate presigned S3 URLs
63 -   - Return HTTP redirects
64 -   - Optional crew membership checks
65 - ```
66 -
67 - **Endpoints:**
68 - - `POST /get-presigned-url` → S3 download URL
69 - - `POST /put-presigned-url` → S3 upload URL
70 - - `GET /blobs/{digest}` → Proxy fallback
71 - - `PUT /blobs/{digest}` → Proxy fallback
72 - - `GET /health` → Health check
73 -
74 - **Resource footprint:**
75 - - Single Go binary (~20MB)
76 - - No database (stateless)
77 - - No PDS (validates against user's PDS)
78 - - Minimal memory/CPU (just signing URLs)
79 - - S3 does all the heavy lifting
80 -
81 - This is already **as cheap as possible** for what it does - just an OAuth validation + URL signing service.
82 -
83 - ## Why Not Force Blobs into User PDSs?
84 -
85 ### Size Considerations
15 + | **ATCR** | Container blobs | Manifests, comments, builds | Embedded PDS in hold service |
21 + ## Current Architecture 99 22 100 - ### Split-Brain Complexity 23 + ### Hold Service Components 101 24 102 - ```go 103 - func (s *SplitBlobStore) Create(ctx context.Context, options ...) { 104 - // Challenges: 105 - // 1. Monolithic uploads: Size known upfront ✓ 106 - // 2. Chunked uploads: Size unknown until complete ✗ 107 - // 3. Resumable uploads: State management across PDS/hold ✗ 108 - // 4. Mount/cross-repo: Which backend to check? ✗ 109 - } 110 25 ``` 111 - 112 - Detection works for simple cases but breaks down with: 113 - - Multipart/chunked uploads (no size until complete) 114 - - Resumable uploads (stateful across boundaries) 115 - - Cross-repository blob mounts (which backend?) 116 - 117 - ### Pragmatic Decision 118 - 119 - **Accept the trade-off:** 120 - - Blobs in holds (practical for large data) 121 - - Manifests in user's PDS (ownership of metadata) 122 - - Focus on making holds easy to deploy and migrate 123 - 124 - Users still own the **important part** - the manifest is the source of truth for what the image is. 125 - 126 - ## Embedded PDS Vision 127 - 128 - ### Key Insight: Hold is the PDS 129 - 130 - Because blobs are **content-addressed** and **deduplicated globally**, there isn't a singular owner of blob data. Multiple images share the same base layer blobs. 131 - 132 - **Therefore:** The **hold itself** is the PDS (with identity `did:web:hold1.example.com`), not individual image repositories. 
133 -
134 - ### Proposed Architecture
135 -
136 - ```
137 - Hold Service = Minimal PDS (did:web:hold1.example.com)
138 - ├── Standard ATProto blob endpoints:
139 - │   ├── com.atproto.sync.uploadBlob
140 - │   ├── com.atproto.sync.getBlob
141 - │   └── Blob storage → S3 (like normal PDS)
142 - ├── Custom XRPC methods:
143 - │   ├── io.atcr.hold.delegateAccess (IAM)
144 - │   ├── io.atcr.hold.getUploadUrl (optimization)
145 - │   ├── io.atcr.hold.getDownloadUrl (optimization)
146 - │   ├── io.atcr.hold.exportImage (data portability)
147 - │   └── io.atcr.hold.getStats (metadata)
148 - └── Records (hold's own PDS):
149 -     ├── io.atcr.hold.captain (single record: ownership & metadata)
150 -     ├── io.atcr.hold.crew/* (crew membership & permissions)
151 -     └── io.atcr.hold.config (hold configuration)
152 - ```
153 -
154 - ### Benefits
155 -
156 - 1. **ATProto-native**: Uses standard XRPC, not custom REST API
157 - 2. **Discoverable**: Hold's DID document advertises capabilities
158 - 3. **Portable**: Users can export images via XRPC
159 - 4. **Standardized**: Blob operations use ATProto conventions
160 - 5. **Future-proof**: Can add more XRPC methods as needed
161 - 6. **Interoperable**: Works with ATProto tooling
162 -
163 - ## Implementation Details
164 -
165 - ### 1. SHA256 to CID Mapping
166 -
167 - ATProto uses CIDs (Content Identifiers) for blobs, while OCI uses SHA256 digests. However, CIDs support SHA256 as the hash function.
168 -
169 - **Key insight:** We can construct CIDs directly from SHA256 digests with no additional storage needed!
170 -
171 - ```go
172 - // pkg/hold/cid.go
173 - func DigestToCID(digest string) (cid.Cid, error) {
174 -     // sha256:abc123... → raw bytes
175 -     hash := parseDigest(digest)
176 -
177 -     // Construct CIDv1 with sha256 codec
178 -     return cid.NewCidV1(
179 -         cid.Raw, // codec
180 -         multihash.SHA2_256, // hash function
181 -         hash, // hash bytes
182 -     )
183 - }
184 -
185 - func CIDToDigest(c cid.Cid) string {
186 -     // Decode multihash → sha256:abc...
187 -     mh := c.Hash()
188 -     return fmt.Sprintf("sha256:%x", mh)
189 - }
190 - ```
191 -
192 - **Mapping:**
193 - ```
194 - OCI digest: sha256:abc123...
195 - ATProto CID: bafybei... (CIDv1 with sha256, base32 encoded)
196 - Storage path: s3://bucket/blobs/sha256/ab/abc123...
197 - ```
198 -
199 - Blobs stay in distribution's layout, we just compute CID on-the-fly. **No mapping records needed.**
200 -
201 - ### 2. Storage: Distribution Layout with PDS Interface
202 -
203 - The hold's blob storage uses distribution's driver directly - no encoding or transformation:
204 -
205 - ```go
206 - type HoldBlobStore struct {
207 -     storageDriver storagedriver.StorageDriver // S3, filesystem, etc
208 - }
209 -
210 - // Implements ATProto blob interface
211 - func (h *HoldBlobStore) UploadBlob(ctx context.Context, data io.Reader) (cid.Cid, error) {
212 -     // 1. Compute sha256 while reading
213 -     digest, size := computeDigest(data)
214 -
215 -     // 2. Store at distribution's path: blobs/sha256/ab/abc123...
216 -     path := h.blobPath(digest)
217 -     h.storageDriver.PutContent(ctx, path, data)
218 -
219 -     // 3. Return CID (computed from sha256)
220 -     return DigestToCID(digest), nil
221 - }
222 -
223 - func (h *HoldBlobStore) GetBlob(ctx context.Context, c cid.Cid) (io.Reader, error) {
224 -     // 1. Convert CID → sha256 digest
225 -     digest := CIDToDigest(c)
226 -
227 -     // 2. Fetch from distribution's path
228 -     path := h.blobPath(digest)
229 -     return h.storageDriver.Reader(ctx, path, 0)
230 - }
231 - ```
232 -
233 - Storage continues to use distribution's existing S3 layout. The PDS interface is just a wrapper.
234 -
235 - ### 3. Authentication & IAM
236 -
237 - **Challenge:** ATProto operations are authenticated AS the account owner. For hold operations, we need actions to be performed AS the hold (not individual users), but authorized BY crew members.
238 -
239 - **Important context:** AppView manages the user's OAuth session. When users authenticate via the credential helper, they actually authenticate through AppView's web interface. AppView obtains and stores the user's OAuth token and DPoP key. The credential helper only receives a registry JWT.
240 -
241 - **Proposed: DPoP Proof Delegation (Standard ATProto Federation)**
242 -
243 - ```
244 - 1. User authenticates via AppView (OAuth flow)
245 -    - AppView obtains: OAuth token, refresh token, DPoP key, DID
246 -    - AppView stores these in its token storage
247 -    - Credential helper receives: Registry JWT only
248 -
249 - 2. When AppView needs blob access, it calls hold:
250 -    POST /xrpc/io.atcr.hold.delegateAccess
251 -    Headers: Authorization: DPoP <user-oauth-token>
252 -             DPoP: <proof-signed-with-user-dpop-key>
253 -    Body: {
254 -      "userDid": "did:plc:alice123",
255 -      "purpose": "blob-upload",
256 -      "duration": 900
257 -    }
258 -
259 - 3. Hold validates (standard ATProto token validation):
260 -    - Verify DPoP proof signature matches token's bound key
261 -    - Call user's PDS: com.atproto.server.getSession (validates token)
262 -    - Extract user's DID from validated session
263 -    - Check user's DID in hold's crew records
264 -    - If authorized, issue temporary token for blob operations
265 -
266 - 4. AppView uses delegated token for blob operations:
267 -    POST /xrpc/com.atproto.sync.uploadBlob
268 -    Headers: Authorization: DPoP <hold-token>
269 -             DPoP: <proof>
26 + Hold Service (did:web:hold01.atcr.io)
27 + ├── Embedded PDS (SQLite carstore) - Shared data only
28 + │   ├── Captain record (ownership metadata)
29 + │   ├── Crew records (access control)
30 + │   └── ATProto sync/repo endpoints
31 + ├── OCI multipart upload (XRPC)
32 + │   ├── io.atcr.hold.initiateUpload
33 + │   ├── io.atcr.hold.getPartUploadUrl
34 + │   ├── io.atcr.hold.uploadPart
35 + │   ├── io.atcr.hold.completeUpload
36 + │   └── io.atcr.hold.abortUpload
37 + └── Storage driver (S3, filesystem, etc.)
270 38 ```
271 39
272 - **This is standard ATProto federation** - services pass OAuth tokens with DPoP proofs between each other. Hold independently validates tokens against the user's PDS, so there's no trust relationship required.
40 + **Important distinction:**
41 + - **Hold's embedded PDS** = Shared data (crew members, hold configuration)
42 + - **User's PDS** = User-specific data (manifests, sailor profile, personal records)
43 + - Hold's PDS does NOT store user-specific container data (that stays in user's own PDS)
273 44
274 - **Records stored in hold's PDS:**
45 + ### Records Structure
275 46
47 + **Captain record** (hold ownership, single record at `io.atcr.hold.captain/self`):
276 48 ```json
277 - // io.atcr.hold.captain (single record - hold metadata)
278 49 {
279 50   "$type": "io.atcr.hold.captain",
280 51   "owner": "did:plc:alice123",
···
283 54   "region": "iad",
284 55   "provider": "fly.io"
285 56 }
57 + ```
286 58
287 - // io.atcr.hold.crew/* (access control records)
59 + **Crew records** (access control, one per member at `io.atcr.hold.crew/{rkey}`):
60 + ```json
288 61 {
289 62   "$type": "io.atcr.hold.crew",
290 -   "member": "did:plc:alice123",
63 +   "member": "did:plc:bob456",
291 64   "role": "admin",
292 -   "permissions": ["blob:read", "blob:write", "crew:manage"],
65 +   "permissions": ["blob:read", "blob:write"],
66 +   "addedAt": "2025-10-14T..."
294 67 }
295 68 ```
296 69
297 - **Semantic separation:**
298 - - **Captain record** = Hold ownership and metadata (who owns it, where it's deployed)
299 - - **Crew records** = Access control (who can use it, what permissions they have)
70 + ### ATProto PDS Endpoints
300 71
301 - **Security considerations:**
302 - - User's OAuth token is exposed to hold during delegation
303 - - However, hold independently validates it (can't be forged)
304 - - Tokens are short-lived (15min typical)
305 - - Hold only accepts tokens for crew members
306 - - Hold validates DPoP binding (requires private key)
307 - - Standard ATProto security model
72 + Standard ATProto sync endpoints:
73 + - `GET /xrpc/com.atproto.sync.getRepo` - Download repository as CAR file
74 + - `GET /xrpc/com.atproto.sync.getBlob` - Get blob or presigned download URL
75 + - `GET /xrpc/com.atproto.sync.subscribeRepos` - Real-time crew changes
76 + - `GET /xrpc/com.atproto.sync.listRepos` - List repositories
308 77
309 - ### 4. Presigned URLs for Optimized Egress
78 + Repository management:
79 + - `GET /xrpc/com.atproto.repo.describeRepo` - Repository metadata
80 + - `GET /xrpc/com.atproto.repo.getRecord` - Get specific record (captain/crew)
81 + - `GET /xrpc/com.atproto.repo.listRecords` - List crew members
82 + - `POST /xrpc/io.atcr.hold.requestCrew` - Request crew membership
310 83
311 - While standard ATProto blob endpoints work, direct S3 access is more efficient. Hold can expose custom XRPC methods:
84 + DID resolution:
85 + - `GET /.well-known/did.json` - DID document (did:web resolution)
86 + - `GET /.well-known/atproto-did` - DID for handle resolution
312 87
313 - ```go
314 - // io.atcr.hold.getUploadUrl - Get presigned upload URL
315 - type GetUploadUrlRequest struct {
316 -     Digest string // sha256:abc...
317 -     Size int64
318 - }
88 + ### OCI Multipart Upload Flow
319 89
320 - type GetUploadUrlResponse struct {
321 -     UploadURL string // Presigned S3 URL
322 -     ExpiresAt time.Time
323 - }
90 + ```
91 + 1. AppView gets service token from user's PDS:
92 +    GET /xrpc/com.atproto.server.getServiceAuth?aud={holdDID}
93 +    Response: { "token": "eyJ..." }
324 94
325 - // io.atcr.hold.getDownloadUrl - Get presigned download URL
326 - type GetDownloadUrlRequest struct {
327 -     Digest string
328 - }
95 + 2. AppView initiates multipart upload:
96 +    POST /xrpc/io.atcr.hold.initiateUpload
97 +    Authorization: Bearer {serviceToken}
98 +    Body: { "digest": "sha256:abc..." }
99 +    Response: { "uploadId": "xyz" }
329 100
330 - type GetDownloadUrlResponse struct {
331 -     DownloadURL string // Presigned S3 URL
332 -     ExpiresAt time.Time
333 - }
334 - ```
101 + 3. For each part:
102 +    POST /xrpc/io.atcr.hold.getPartUploadUrl
103 +    Body: { "uploadId": "xyz", "partNumber": 1 }
104 +    Response: { "url": "https://s3.../presigned" }
335 105
336 - **AppView uses optimized path:**
337 - ```go
338 - func (a *ATProtoBlobStore) ServeBlob(ctx, w, r, dgst) error {
339 -     // Try optimized presigned URL endpoint
340 -     resp, err := a.client.GetDownloadUrl(ctx, dgst)
341 -     if err == nil {
342 -         // Redirect directly to S3
343 -         http.Redirect(w, r, resp.DownloadURL, http.StatusTemporaryRedirect)
344 -         return nil
345 -     }
106 + 4. Upload part to S3 presigned URL:
107 +    PUT {presignedURL}
108 +    Body: [part data]
346 109
347 -     // Fallback: Standard ATProto blob endpoint (proxied)
348 -     reader, _ := a.client.GetBlob(ctx, holdDID, cid)
349 -     io.Copy(w, reader)
350 - }
110 + 5. Complete upload:
111 +    POST /xrpc/io.atcr.hold.completeUpload
112 +    Body: { "uploadId": "xyz", "digest": "sha256:abc...", "parts": [...] }
351 113 ```
352 -
353 - **Best of both worlds:** Standard ATProto interface + S3 optimization for bandwidth efficiency.
354 -
355 - ### 5. Image Export for Portability
115 + ## Implementation Details
356 116
357 - Custom XRPC method enables users to export entire images:
117 + ### Storage: Indigo Carstore with SQLite
358 118
359 119 ```go
360 - // io.atcr.hold.exportImage - Export all blobs for an image
361 - type ExportImageRequest struct {
362 -     Manifest *oci.Manifest // User provides manifest
363 - }
364 -
365 - type ExportImageResponse struct {
366 -     ArchiveURL string // Presigned S3 URL to tar.gz
367 -     ExpiresAt time.Time
120 + type HoldPDS struct {
121 +     did string
122 +     carstore carstore.CarStore
123 +     session *carstore.DeltaSession // Provides blockstore interface
124 +     repo *repo.Repo
125 +     dbPath string
126 +     uid models.Uid // User ID for carstore (fixed: 1)
368 127 }
369 -
370 - // Implementation:
371 - // 1. Extract all blob digests from manifest (config + layers)
372 - // 2. Create tar.gz with all blobs
373 - // 3. Upload to S3 temp location
374 - // 4. Return presigned download URL (15min expiry)
375 128 ```
376 129
377 - Users can request all blobs for their images and migrate to different holds.
378 -
379 - ## Changes Required
380 -
381 - ### AppView Changes
130 + **Storage location:** Single SQLite file (`/var/lib/atcr-hold/hold.db`)
131 + - Contains MST nodes, records, commits in carstore tables
132 + - Handles compaction/cleanup automatically
133 + - Migration path to Postgres if needed (same carstore API)
382 134
383 - **Current:**
384 - ```go
385 - type ProxyBlobStore struct {
386 -     holdURL string // HTTP endpoint
387 - }
135 + ### Key Implementation Lessons
388 136
389 - func (p *ProxyBlobStore) ServeBlob(...) {
390 -     // POST /put-presigned-url
391 -     // Return redirect
392 - }
393 - ```
137 + #### 1. Custom Record Types Need Manual CBOR Decoding
138
394 - **New:**
139 ```go
395 - type ATProtoBlobStore struct {
396 -     holdDID string // did:web:hold1.example.com
397 -     holdURL string // Resolved from DID document
398 -     client *atproto.Client // XRPC client
399 -     delegatedToken string // From io.atcr.hold.delegateAccess
400 - }
140 + // ❌ WRONG - Fails with "unrecognized lexicon type"
141 + record, err := repo.GetRecord(ctx, path, &CrewRecord{})
401 142
402 - func (a *ATProtoBlobStore) ServeBlob(ctx, w, r, dgst) error {
403 -     // Try optimized: io.atcr.hold.getDownloadUrl
404 -     // Fallback: com.atproto.sync.getBlob
405 - }
143 + // ✅ CORRECT - Manual CBOR decoding
144 + recordCID, recBytes, err := repo.GetRecordBytes(ctx, path)
145 + var crewRecord CrewRecord
146 + err = crewRecord.UnmarshalCBOR(bytes.NewReader(*recBytes))
408 147 ```
409 148
410 - ### Hold Service Changes
149 + Indigo's lexicon system doesn't know about custom types like `io.atcr.hold.crew`.
411 150
412 - Transform from simple HTTP server to minimal PDS:
151 + #### 2. JSON and CBOR Struct Tags Must Match
413 152
414 153 ```go
415 - // cmd/hold/main.go
416 - func main() {
417 -     // Storage driver (unchanged)
418 -     storageDriver := buildStorageDriver()
419 -
420 -     // NEW: Embedded PDS
421 -     pds := hold.NewEmbeddedPDS(hold.Config{
422 -         DID: "did:web:hold1.example.com",
423 -         BlobStore: storageDriver,
424 -         Collections: []string{
425 -             "io.atcr.hold.crew",
426 -             "io.atcr.hold.config",
427 -         },
428 -     })
429 -
430 -     // Serve XRPC endpoints
431 -     mux.Handle("/xrpc/", pds.Handler())
432 -
433 -     // Legacy endpoints (optional for backwards compat)
434 -     // mux.Handle("/get-presigned-url", legacyHandler)
154 + // ✅ CORRECT - JSON tags match CBOR tags
155 + type CrewRecord struct {
156 +     Type string `json:"$type" cborgen:"$type"`
157 +     Member string `json:"member" cborgen:"member"`
158 +     Role string `json:"role" cborgen:"role"`
159 +     Permissions []string `json:"permissions" cborgen:"permissions"`
160 +     AddedAt string `json:"addedAt" cborgen:"addedAt"`
435 161 }
436 162 ```
437 163
438 - ## Open Questions
439 -
440 - ### 1. Docker Hub Size Limits
441 -
442 - **Research findings:** Docker Hub has soft limits around 10-20GB per layer, with practical issues beyond that. No hard-coded enforcement.
443 -
444 - **For ATCR:** Hold services can theoretically support larger blobs if S3 and network infrastructure allows. May want configurable limits to prevent abuse.
445 -
446 - ### 2. Token Delegation Security Model
164 + CID verification requires identical bytes from JSON and CBOR encodings.
447 165
448 - **Recommended approach:** DPoP proof delegation (standard ATProto federation pattern)
166 + #### 3. MST ForEach Returns Full Paths
449 167
450 - Open questions:
451 - - How long should delegated tokens last? (15min like presigned URLs?)
452 - - Should delegation be per-operation or session-based?
453 - - Do we need audit logs for delegated operations?
454 - - Can AppView cache delegated tokens across requests?
455 - - Should we implement token refresh for long-running operations?
456 -
457 - ### 3. Migration Path
458 -
459 - - Do we support both HTTP and XRPC APIs during transition?
460 - - How do existing manifests with `holdEndpoint: "https://..."` migrate to `holdDid: "did:web:..."`?
461 - - Can AppView auto-detect if hold supports XRPC vs legacy?
462 -
463 - ### 4. PDS Implementation Scope
464 -
465 - **Minimal endpoints needed:**
466 - - `com.atproto.sync.uploadBlob`
467 - - `com.atproto.sync.getBlob`
468 - - `com.atproto.repo.describeRepo` (discovery)
469 - - Custom XRPC methods (delegation, presigned URLs, export)
470 -
471 - **Not needed:**
472 - - `com.atproto.repo.*` (no user repos)
473 - - `com.atproto.server.*` (no user sessions)
474 - - Most sync/admin endpoints
475 -
476 - Can we build a reusable "static PDS" library for apps like ATCR, tangled.org, stream.place?
477 -
478 - ### 5. Crew Management
479 -
480 - - How are crew members added/removed?
481 - - UI in AppView? CLI tool? Direct XRPC calls?
482 - - Can crew members delegate to other crew members?
483 - - Role hierarchy (owner > admin > member)?
484 -
485 - ### 6. Hold Discovery & Registration
486 -
487 - **Decision: No registration records needed in owner's PDS.**
488 -
489 - Since holds are ATProto actors with did:web identity, they are self-describing:
490 -
491 - **Hold's PDS contains everything:**
492 - ```
493 - did:web:hold01.atcr.io
494 - ├── io.atcr.hold.captain → { owner: "did:plc:alice123", ... }
495 - └── io.atcr.hold.crew/* → Access control records
496 - ```
497 -
498 - **DID Document with Multiple Services:**
499 -
500 - Holds expose multiple service endpoints to distinguish themselves from generic PDSs:
501 -
502 - ```json
503 - {
504 -   "@context": ["https://www.w3.org/ns/did/v1", ...],
505 -   "id": "did:web:hold01.atcr.io",
506 -   "service": [
507 -     {
508 -       "id": "#atproto_pds",
509 -       "type": "AtprotoPersonalDataServer",
510 -       "serviceEndpoint": "https://hold01.atcr.io"
511 -     },
512 -     {
513 -       "id": "#atcr_hold",
514 -       "type": "AtcrHoldService",
515 -       "serviceEndpoint": "https://hold01.atcr.io"
516 -     }
517 -   ]
518 - }
519 - ```
520 -
521 - **Service semantics:**
522 - - **`#atproto_pds`** - Standard ATProto PDS operations (crew queries, record sync)
523 - - **`#atcr_hold`** - ATCR-specific operations (blob storage, presigned URLs)
524 -
525 - **Discovery patterns:**
526 -
527 - 1. **Direct deployment** - Owner deploys hold, knows the DID
528 - 2. **Sailor profiles** - Users reference holds by DID in their profile
529 - 3. **DID resolution** - `did:web:hold01.atcr.io` → `https://hold01.atcr.io/.well-known/did.json`
530 - 4. **Service lookup** - Check for `#atcr_hold` service to identify ATCR holds
531 - 5. **Crew queries** - AppView queries hold's PDS directly via `#atproto_pds` endpoint
532 -
533 - **AppView resolution flow:**
534 168 ```go
535 - // 1. Get hold DID from sailor profile
536 - holdDID := profile.DefaultHold // "did:web:hold01.atcr.io"
537 -
538 - // 2. Resolve DID document
539 - didDoc := resolveDidWeb(holdDID)
540 -
541 - // 3. Extract service endpoints
542 - pdsEndpoint := didDoc.GetService("#atproto_pds") // XRPC operations
543 - holdEndpoint := didDoc.GetService("#atcr_hold") // Blob operations
544 -
545 - // 4. Query crew list via PDS endpoint
546 - crew := xrpcClient.ListRecords(pdsEndpoint, "io.atcr.hold.crew")
547 -
548 - // 5. Check if user has access
549 - hasAccess := crew.Contains(userDID)
550 - ```
551 -
552 - **No need for reverse lookup** (owner → holds). Users know their holds because they deployed them.
553 -
554 - **Benefits:**
555 - - ✅ Single source of truth (hold's PDS)
556 - - ✅ No cross-PDS writes during registration
557 - - ✅ Self-describing ATProto actors
558 - - ✅ Standard DID resolution patterns
559 - - ✅ Clear service semantics (PDS vs ATCR-specific)
560 - - ✅ Discoverable via service type
561 -
562 - **OAuth implications:**
563 - - ✅ OAuth registration removed completely (hold is self-describing)
564 - - Hold creates captain + crew records in its own embedded PDS
565 - - No cross-PDS writes or OAuth flows needed
566 -
567 - ### 7. Multi-Tenancy
568 -
569 - Could one hold PDS serve multiple "logical holds" for different organizations?
570 -
571 - ```
572 - did:web:hold-provider.com/org1
573 - did:web:hold-provider.com/org2
169 + // ✅ CORRECT - Extract just the rkey
170 + err := repo.ForEach(ctx, "io.atcr.hold.crew", func(k string, v cid.Cid) error {
171 +     // k = "io.atcr.hold.crew/3m37dr2ddit22"
172 +     parts := strings.Split(k, "/")
173 +     rkey := parts[len(parts)-1] // "3m37dr2ddit22"
174 +     return nil
175 + })
574 176 ```
575 -
576 - Or should each hold be a separate deployment?
577 -
578 - ### 8. Blob Deduplication
579 -
580 - Current behavior: Global deduplication (same layer shared across all images).
581 -
582 - With embedded PDS:
583 - - Does dedup stay global across all crew/users?
584 - - Or is it per-hold (isolated storage)?
585 - - How do we track blob references for garbage collection?
586 -
587 - ### 9. Cost Model
178 + #### 4. CAR Files Must Include Full MST Path
588 179
589 - - Who pays for S3 storage/egress?
590 - - Hold operator? Image owner? Per-pull?
591 - - How to implement metering/billing via XRPC?
180 + For `com.atproto.sync.getRecord`, return CAR with:
181 + 1. **Commit block** - Repo head with signature
182 + 2. **MST tree nodes** - Path from root to record
183 + 3. **Record block** - The actual record data
592 184
593 - ### 10. Disaster Recovery
185 + Use `util.NewLoggingBstore()` to capture all accessed blocks.
594 186
595 - - How to backup hold's PDS (crew records, config)?
596 - - Can holds replicate to other holds?
597 - - Image export handles blobs - what about metadata?
187 + ## IAM Challenges
598 188
599 - ## Implementation Plan
189 + ### Current Implementation: Service Tokens
600 190
601 - ### Phase 1: Basic PDS with Carstore ✅ COMPLETED
602 -
603 - **Implementation: Using indigo's carstore with SQLite + DeltaSession**
191 + AppView uses `com.atproto.server.getServiceAuth` to get tokens for calling holds:
604 192
605 193 ```go
606 - import (
607 -     "github.com/bluesky-social/indigo/carstore"
608 -     "github.com/bluesky-social/indigo/models"
609 -     "github.com/bluesky-social/indigo/repo"
610 - )
611 -
612 - type HoldPDS struct {
613 -     did string
614 -     carstore carstore.CarStore
615 -     session *carstore.DeltaSession // Provides blockstore interface
616 -     repo *repo.Repo
617 -     dbPath string
618 -     uid models.Uid // User ID for carstore (fixed: 1)
619 - }
620 -
621 - func NewHoldPDS(ctx context.Context, did, dbPath string) (*HoldPDS, error) {
622 -     // Create SQLite-backed carstore
623 -     sqlStore, err := carstore.NewSqliteStore(dbPath)
624 -     sqlStore.Open(dbPath)
625 -     cs := sqlStore.CarStore()
626 -
627 -     // For single-hold use, fixed UID
628 -     uid := models.Uid(1)
629 -
630 -     // Create DeltaSession (provides blockstore interface)
631 -     session, err := cs.NewDeltaSession(ctx, uid, nil)
194 + // AppView requests service token from user's PDS
195 + GET /xrpc/com.atproto.server.getServiceAuth?aud={holdDID}&lxm=com.atproto.repo.getRecord
632 196
633 -     // Create repo with session as blockstore
634 -     r := repo.NewRepo(ctx, did, session)
197 + // PDS returns short-lived token (60 seconds)
198 + { "token": "eyJ..." }
199 635 200
636 -     return &HoldPDS{
637 -         did: did,
638 -         carstore: cs,
639 -         session: session,
640 -         repo: r,
641 -         dbPath: dbPath,
642 -         uid: uid,
643 -     }, nil
644 - }
200 + // AppView uses token to authenticate to hold
201 + Authorization: Bearer eyJ...
645 202 ```
646 203
647 - **Key learnings:**
648 - - ✅ Carstore provides blockstore via `DeltaSession` (not direct access)
649 - - ✅ `models.Uid` is the user ID type (we use fixed UID(1))
650 - - ✅ DeltaSession needs to be a pointer (`*carstore.DeltaSession`)
651 - - ✅ `repo.NewRepo()` accepts the session directly as blockstore
204 + ### Known Issues
652 205
653 - **Storage:**
654 - - Single file: `/var/lib/atcr-hold/hold.db` (SQLite)
655 - - Contains MST nodes, records, commits in carstore tables
656 - - Proper indigo repo/MST implementation (production-tested)
206 + #### 1. RPC Permission Format with IP Addresses
657 207
658 - **Why SQLite carstore:**
659 - - ✅ Single file persistence (like appview's SQLite)
660 - - ✅ Official indigo storage backend
661 - - ✅ Handles compaction/cleanup automatically
662 - - ✅ Migration path to Postgres/Scylla if needed
663 - - ✅ Easy to replicate (Litestream, LiteFS, rsync)
664 - - ✅ CAR import/export support built-in
208 + **Problem:** Service token RPC permissions don't work with IP addresses in the audience (`aud`) field:
665 209
666 - **Scale considerations:**
667 - - SQLite carstore marked "experimental" but suitable for single-hold use
668 - - MST designed for massive scale (O(log n) operations)
669 - - 1000 crew records = ~1-2MB database (trivial)
670 - - Bluesky PDSs use carstore for millions of records
671 - - If needed: migrate to Postgres-backed carstore (same API)
672 -
673 - ### Hold as Proper ATProto User
674 -
675 - **Decision:** Make holds full ATProto actors for discoverability and ecosystem integration.
-
- **What this enables:**
- - Hold becomes discoverable via ATProto directory
- - Can have profile (`app.bsky.actor.profile`)
- - Can post status updates (`app.bsky.feed.post`)
- - Users can follow holds
- - Social proof/reputation via ATProto social graph
-
- **MVP Scope:**
- We're building the minimal PDS needed for discoverability, not a full social client:
- - ✅ Signing keys (ES256K via `atproto/atcrypto`)
- - ✅ DID document (did:web at `/.well-known/did.json`)
- - ✅ Standard XRPC endpoints (`describeRepo`, `getRecord`, `listRecords`)
- - ✅ Profile record (`app.bsky.actor.profile`)
- - ⏸️ Posting functionality (later - other services can read our records)
-
- **Key insight:** Other ATProto services will "just work" as long as they can retrieve records from the hold's PDS. We don't need to implement full social features for the hold to participate in the ecosystem.
-
- ### Crew Management: Captain + Individual Records
-
- **Decision: Captain record (ownership) + Individual crew records (access control)**
-
- ```json
- // io.atcr.hold.captain (single record - hold metadata)
- {
-   "$type": "io.atcr.hold.captain",
-   "owner": "did:plc:alice123",
-   "public": false,
-   "deployedAt": "2025-10-14T...",
-   "region": "iad",
-   "provider": "fly.io"
- }
-
- // io.atcr.hold.crew/{rkey} (access control)
- {
-   "$type": "io.atcr.hold.crew",
-   "member": "did:plc:alice123",
-   "role": "admin", // or "member"
-   "permissions": ["blob:read", "blob:write"],
-   "addedAt": "2025-10-14T..."
- }
-
- // io.atcr.hold.config/policy (optional)
- {
-   "$type": "io.atcr.hold.config",
-   "access": "public",    // or "allowlist"
-   "allowAny": true,      // public: allow any authenticated user
-   "requireAuth": true,   // require authentication (no anonymous)
-   "maxUsers": 1000       // optional limit
- }
  ```
-
- **Semantic separation:**
- - **Captain record** = Who owns/deployed the hold (billing, deletion, migration rights)
- - **Crew records** = Who can use the hold (access control, permissions)
- - **Config record** = Hold-wide policies
-
- **Authorization logic:**
- ```go
- func (p *HoldPDS) CheckAccess(ctx context.Context, userDID string) (bool, error) {
-     policy := p.GetPolicy(ctx)
-
-     if policy.Access == "public" && policy.AllowAny {
-         // Public hold - any authenticated ATCR user allowed
-         // No individual crew record needed
-         return true, nil
-     }
-
-     if policy.Access == "allowlist" {
-         // Check explicit crew membership
-         _, err := p.GetCrewMember(ctx, userDID)
-         return err == nil, nil
-     }
-
-     return false, nil
- }
+ Error: RPC permission format invalid
+ Permission: rpc:com.atproto.repo.getRecord?aud=172.28.0.3:8080#atcr_hold
+ Issue: IP address with port not supported in aud field
  ```

- **Benefits of individual records:**
- - Auditability (track who has access)
- - Per-user permissions (admin vs member)
- - Explicit revocation capabilities
- - Analytics (usage tracking)
- - Rate limiting (per-user quotas)
- - subscribeRepos events on crew changes
+ **Impact:** Local development with IP-based hold DIDs (e.g., `did:web:172.28.0.3:8080`) fails.
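One way to make this local-development failure predictable is to detect IP-based hold DIDs before requesting a service token. The following is a minimal sketch; `isIPBasedDidWeb` is a hypothetical helper, not part of ATCR. It assumes the did:web convention that `:` separates the host from path segments and that a port is percent-encoded as `%3A`. The raw `host:port` form from the example above is read as an IP host followed by a path segment, so it is still detected.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// isIPBasedDidWeb reports whether the host part of a did:web identifier is a
// literal IP address. Hypothetical helper for illustration only.
func isIPBasedDidWeb(did string) bool {
	rest, ok := strings.CutPrefix(did, "did:web:")
	if !ok {
		return false
	}
	// In did:web, ":" separates the host from optional path segments, and a
	// port is percent-encoded as %3A. Only the first segment names the host.
	hostSeg := strings.SplitN(rest, ":", 2)[0]
	hostPort, err := url.PathUnescape(hostSeg)
	if err != nil {
		return false
	}
	host := hostPort
	if h, _, err := net.SplitHostPort(hostPort); err == nil {
		host = h // strip a decoded ":port" suffix
	}
	return net.ParseIP(host) != nil
}

func main() {
	for _, did := range []string{
		"did:web:172.28.0.3:8080",
		"did:web:172.28.0.3%3A8080",
		"did:web:alice-storage.fly.dev",
		"did:plc:alice123",
	} {
		fmt.Printf("%s -> %v\n", did, isIPBasedDidWeb(did))
	}
}
```

AppView could use a check like this to skip the token request for IP-based holds and go straight to the unauthenticated path, instead of discovering the failure at request time.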

- **Use cases:**
- - **Public community hold:** `access: "public", allowAny: true` - no crew records needed
- - **Private team hold:** `access: "allowlist"` - explicit crew membership
- - **Hybrid:** Public access + explicit admin crew records for elevated permissions
+ **Workaround:** AppView falls back to unauthenticated requests (works for public holds only); alternatively, use hostname-based DIDs.

- ### Phase 2: XRPC Endpoints Implementation ✅ COMPLETED
+ #### 2. Dynamic Hold Discovery Limitation

- **Critical Implementation Lessons Learned:**
+ **Problem:** AppView can only OAuth a user's default hold (configured in AppView), not dynamically discovered holds from sailor profiles.

- #### 1. Custom Record Types Require Manual CBOR Decoding
+ **Current limitation:**
+ - User sets `defaultHold = "did:web:alice-storage.fly.dev"` in sailor profile
+ - AppView discovers hold DID when user pushes
+ - AppView tries to get service token for alice's hold from user's PDS
+ - BUT: User never completed OAuth through alice's hold, only through AppView's default hold
+ - Result: No service token available, can't authenticate to alice's hold

- Indigo's `repo.GetRecord()` uses its lexicon decoder which only knows about built-in ATProto types. For custom types, you must use `GetRecordBytes()` and decode manually:
+ **Why this matters:**
+ - Users can't seamlessly use BYOS (Bring Your Own Storage)
+ - Hold references in sailor profiles are non-functional
+ - Limits portability and decentralization goals

- ```go
- // ❌ WRONG - Fails with "unrecognized lexicon type"
- record, err := repo.GetRecord(ctx, path, &CrewRecord{})
+ #### 3. Trust Model: "Trust but Verify"

- // ✅ CORRECT - Manual CBOR decoding
- recordCID, recBytes, err := repo.GetRecordBytes(ctx, path)
- var crewRecord CrewRecord
- err = crewRecord.UnmarshalCBOR(bytes.NewReader(*recBytes))
- ```
+ **Current approach:**
+ 1. User authenticates to AppView via OAuth (credential helper flow)
+ 2. Hold has crew member record for user (authorization)
+ 3. AppView requests service token from user's PDS (proof)
+ 4. Hold validates service token from user's PDS (verification)

- **Why:** Indigo's lexicon system doesn't know about `io.atcr.hold.crew` or other custom types.
+ **Philosophy:** "Trust but verify"
+ - IF user authenticated to AppView via OAuth AND hold has crew member record for user → generally trust
+ - BUT don't want AppView to lie → need proof from user's PDS that it's actually them
+ - Service tokens provide this proof (user's PDS says "yes, I authorized this")

- #### 2. JSON Struct Tags Must Match CBOR Tags Exactly
+ **Challenge:** Service tokens work for this model, but scope/permission format issues (see #1, #2) make it fragile in practice.
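Step 3 of the current approach, AppView asking the user's PDS for a service token, comes down to one XRPC query. The sketch below shows how that request could be built. It assumes only the `aud` and `lxm` query parameters used in the `com.atproto.server.getServiceAuth` example earlier. `serviceAuthURL` and the example hostnames are hypothetical, and the sketch leaves out sending the request and reading the `token` field from the JSON response.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// serviceAuthURL builds the getServiceAuth query AppView would send to the
// user's PDS. aud is the hold's DID; lxm names the single XRPC method the
// resulting token may be used for. Hypothetical helper for illustration.
func serviceAuthURL(pdsEndpoint, holdDID, lxm string) (string, error) {
	u, err := url.Parse(strings.TrimSuffix(pdsEndpoint, "/") + "/xrpc/com.atproto.server.getServiceAuth")
	if err != nil {
		return "", err
	}
	q := url.Values{}
	q.Set("aud", holdDID)
	q.Set("lxm", lxm)
	u.RawQuery = q.Encode() // Encode sorts keys and percent-escapes ":" in DIDs
	return u.String(), nil
}

func main() {
	u, err := serviceAuthURL("https://pds.example.com", "did:web:hold.example.com", "com.atproto.repo.getRecord")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

AppView then sends the returned token to the hold as `Authorization: Bearer <token>`. The hold checks that the token's audience is its own DID before trusting it.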

- For CID verification to work, JSON and CBOR encodings must produce identical bytes:
+ ### Potential Solutions

- ```go
- // ❌ WRONG - JSON uses capital field names (Member, Role)
- type CrewRecord struct {
-     Type        string   `cborgen:"$type"`
-     Member      string   `cborgen:"member"`
-     Role        string   `cborgen:"role"`
-     Permissions []string `cborgen:"permissions"`
-     AddedAt     string   `cborgen:"addedAt"`
- }
+ #### Option A: Direct User-to-Hold Authentication

- // ✅ CORRECT - JSON tags match CBOR tags
- type CrewRecord struct {
-     Type        string   `json:"$type" cborgen:"$type"`
-     Member      string   `json:"member" cborgen:"member"`
-     Role        string   `json:"role" cborgen:"role"`
-     Permissions []string `json:"permissions" cborgen:"permissions"`
-     AddedAt     string   `json:"addedAt" cborgen:"addedAt"`
- }
- ```
+ Users authenticate directly to holds (bypassing AppView service tokens).

- **Why:** Verification code CBOR-encodes the JSON record and compares the CID. Mismatched field names produce different bytes and thus different CIDs.
+ **Pros:**
+ - ✅ Clear trust model (user ↔ hold)
+ - ✅ Works with any hold (BYOS friendly)
+ - ✅ No OAuth scope issues

- #### 3. MST ForEach Returns Full Paths
+ **Cons:**
+ - ❌ Multiple OAuth flows (user's PDS + each hold)
+ - ❌ Complex credential management
+ - ❌ Poor UX (authenticate to each hold separately)

- The `repo.ForEach()` callback receives full collection paths, not just record keys:
+ #### Option B: AppView as OAuth Client

- ```go
- // ❌ WRONG - Prepends collection prefix again
- err := repo.ForEach(ctx, "io.atcr.hold.crew", func(k string, v cid.Cid) error {
-     // k is already "io.atcr.hold.crew/3m37dr2ddit22"
-     path := fmt.Sprintf("%s/%s", collection, k) // Double path!
-     return nil
- })
-
- // ✅ CORRECT - Extract just the rkey
- err := repo.ForEach(ctx, "io.atcr.hold.crew", func(k string, v cid.Cid) error {
-     // k = "io.atcr.hold.crew/3m37dr2ddit22"
-     parts := strings.Split(k, "/")
-     rkey := parts[len(parts)-1] // "3m37dr2ddit22"
-     return nil
- })
- ```
-
- #### 4. All Record Endpoints Must Return CIDs
+ AppView pre-registers with holds and uses its own credentials (not user's).

- Per ATProto spec, `com.atproto.repo.getRecord` and `listRecords` must include the record's CID:
+ **Pros:**
+ - ✅ No OAuth scope issues
+ - ✅ Single OAuth flow for user
+ - ✅ Simpler credential management

- ```go
- // ✅ CORRECT - Include CID in response
- response := map[string]any{
-     "uri":   fmt.Sprintf("at://%s/%s/%s", did, collection, rkey),
-     "cid":   recordCID.String(), // Required!
-     "value": record,
- }
- ```
+ **Cons:**
+ - ❌ Holds must trust AppView (centralization)
+ - ❌ Doesn't work for unknown holds
+ - ❌ Requires registration process

- **Why:** Clients need the CID to verify record integrity via `com.atproto.sync.getRecord`.
+ #### Option C: Public Hold API

- #### 5. sync.getRecord CAR Files Must Include Full MST Path
+ Simplify by making holds public for reads, auth only for writes.
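Option C reduces the hold's authorization decision to a small rule. The sketch below assumes that a hold exposes a single public/private flag. `holdOp` and `requiresAuth` are hypothetical names, not ATCR APIs.

```go
package main

import "fmt"

// holdOp distinguishes the two kinds of hold operations (hypothetical type).
type holdOp int

const (
	opRead holdOp = iota
	opWrite
)

// requiresAuth sketches Option C: reads from a public hold need no
// authentication; writes, and any access to a private hold, do.
func requiresAuth(holdPublic bool, op holdOp) bool {
	return !(holdPublic && op == opRead)
}

func main() {
	for _, public := range []bool{true, false} {
		fmt.Printf("public=%v read=%v write=%v\n",
			public, requiresAuth(public, opRead), requiresAuth(public, opWrite))
	}
}
```

Only one of the four combinations skips authentication. Private holds still take the authenticated path, so Option C narrows the service-token problem without removing it.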

- The `com.atproto.sync.getRecord` endpoint must return a CAR file with ALL blocks needed to verify the record:
+ **Pros:**
+ - ✅ No OAuth complexity for reads
+ - ✅ Works offline (no PDS dependency)

- ```go
- // ❌ WRONG - Only includes the record block
- blk, _ := repo.Blockstore().Get(ctx, recordCID)
- // Write single block to CAR
+ **Cons:**
+ - ❌ Private holds still need auth
+ - ❌ Not standard ATProto pattern

- // ✅ CORRECT - Capture all accessed blocks
- loggingBS := util.NewLoggingBstore(session)
- tempRepo, _ := repo.OpenRepo(ctx, loggingBS, repoHead)
- _, _, _ = tempRepo.GetRecordBytes(ctx, path)
- blocks := loggingBS.GetLoggedBlocks() // Commit + MST nodes + record
- // Write all blocks to CAR
- ```
+ #### Option D: Hybrid Service Token + API Key

- **Components included:**
- 1. **Commit block** - Repo head with signature, data root, version
- 2. **MST tree nodes** - Path from root to record (log N depth)
- 3. **Record block** - The actual record data
+ Use service tokens when available, fall back to API keys for BYOS holds.

- **Why:** Clients need the full Merkle path to cryptographically verify the record against the repo head.
+ **Pros:**
+ - ✅ Optimal for default holds
+ - ✅ BYOS works with API keys
+ - ✅ Backward compatible

- #### 6. CAR Root Must Be Repo Head, Not Record CID
+ **Cons:**
+ - ❌ Two auth mechanisms
+ - ❌ Not pure ATProto

- The CAR file's root CID must be the repo head (commit), not the record:
+ ### Recommended Approach

- ```go
- // ❌ WRONG - Uses record CID as root
- header := &car.CarHeader{
-     Roots:   []cid.Cid{recordCID},
-     Version: 1,
- }
+ **Short-term (MVP):**
+ 1. Public holds (no auth needed for reads)
+ 2. Default hold with service tokens (AppView-managed)
+ 3. Document BYOS limitation

- // ✅ CORRECT - Uses repo head as root
- repoHead, _ := carstore.GetUserRepoHead(ctx, uid)
- header := &car.CarHeader{
-     Roots:   []cid.Cid{repoHead}, // Commit CID
-     Version: 1,
- }
- ```
+ **Medium-term:**
+ 1. Hybrid approach (service tokens + API key fallback)
+ 2. Clear security model for hold operators

- **Why:** The CAR represents a slice of the repo from head to record, not just the record itself.
+ **Long-term:**
+ 1. Explore direct user-to-hold OAuth
+ 2. Credential helper manages multiple hold sessions
+ 3. Auto-discover and authenticate to new holds

- #### 7. Empty Collections Should Return Empty Arrays
+ ### Understanding getServiceAuth

- Handle empty collections gracefully instead of returning errors:
+ **Purpose:** `com.atproto.server.getServiceAuth` returns a short-lived JWT, signed on the user's behalf by their PDS, that another service (the `aud`) can verify; `lxm` can restrict it to a single XRPC method. It's a **temporary grant to a service outside of what you authenticated to via OAuth**.

- ```go
- // ✅ CORRECT - Return empty array for missing collection
- err := repo.ForEach(ctx, collection, func(k string, v cid.Cid) error {
-     // ...
- })
- if err != nil {
-     if err.Error() == "mst: not found" {
-         return []*CrewMemberWithKey{}, nil // Empty collection
-     }
-     return nil, err // Real error
- }
- ```
+ **How ATCR uses it:**
+ - User authenticates to AppView via OAuth (gets broad access to their account)
+ - AppView needs to prove to hold that user authorized it
+ - AppView calls user's PDS: "give me a token scoped for this hold"
+ - User's PDS issues service token with narrow scope (e.g., `rpc:com.atproto.repo.getRecord?aud={holdDID}`)
+ - AppView presents this token to hold as proof

- **Why:** ATProto expects empty arrays for non-existent collections, not 404 errors.
+ **Industry usage:**
+ - `getServiceAuth` appears to be the intended pattern for inter-service auth
+ - Not widely used yet (ATProto ecosystem is young)
+ - Most apps use `transition:generic` scope for everything (too broad, not ideal)
+ - RPC permission scopes are finicky and not well documented

- ### Next Steps
+ ### Open Questions

- 1. ~~**Add indigo dependencies**~~ ✅
- 2. ~~**Implement HoldPDS with carstore**~~ ✅
- 3. ~~**Add crew management**~~ ✅
- 4. ~~**Implement standard PDS endpoints**~~ ✅
- 5. ~~**Add DID document**~~ ✅
- 6. **Custom XRPC methods** - getUploadUrl, getDownloadUrl (presigned URLs)
- 7. **Wire up in cmd/hold** - Serve XRPC alongside existing HTTP
- 8. **Test basic operations** - Add/list crew, policy checks
- 9. **Design delegation/IAM** - Token exchange for authenticated operations
- 10. **Implement AppView XRPC client** - Support PDS-based holds
+ 1. **RPC permission format:** Can the `aud` field in RPC permissions support IP addresses? Is this a spec limitation or implementation bug?
+ 2. **Scope granularity:** What's the right balance between `transition:generic` (too broad) and fine-grained RPC scopes (finicky)?
+ 3. **Dynamic discovery + auth:** How should AppView authenticate to arbitrary holds discovered from sailor profiles without pre-registration?
+ 4. **Service token caching:** Should service tokens be cached across multiple requests? Tokens are currently cached for 50 seconds; is this optimal?

  ## References

  - **Stream.place embedded PDS:** https://streamplace.leaflet.pub/3lut7mgni5s2k/l-quote/6_318-6_554#6
  - **ATProto OAuth spec:** https://atproto.com/specs/oauth
  - **ATProto XRPC spec:** https://atproto.com/specs/xrpc
+ - **ATProto Service Auth:** https://docs.bsky.app/docs/api/com-atproto-server-get-service-auth
  - **CID spec:** https://github.com/multiformats/cid
  - **OCI Distribution Spec:** https://github.com/opencontainers/distribution-spec
+733
docs/IMAGE_SIGNING.md
+ # Image Signing with ATProto
+
+ ATCR can support cryptographic signing of container images to ensure authenticity and integrity. This document explores different approaches and recommends a design based on Notary v2's plugin architecture adapted for ATProto.
+
+ ## Background: Why Not Cosign?
+
+ [Sigstore Cosign](https://github.com/sigstore/cosign) is the most popular OCI image signing tool, but has several incompatibilities with ATProto:
+
+ ### 1. Key Format Mismatch
+
+ **ATProto PDS keys:**
+ - Format: secp256k1 (K-256) for signing in typical PDS deployments (the ATProto spec also allows P-256)
+ - Purpose: ATProto record signatures, DID authentication
+ - Access: Private keys never leave the PDS server
+ - Standard: ATProto specification
+
+ **Cosign expected keys:**
+ - Format: ECDSA P-256, RSA, or Ed25519
+ - Purpose: Image signing (not ATProto records)
+ - Access: User-controlled private keys
+ - Standard: Sigstore/PKIX
+
+ **Problem:** Can't use PDS keys directly for Cosign signing: usually the wrong curve, and always the wrong access model and security boundary.
+
+ ### 2. No Direct PDS Key Access
+
+ **Security model:**
+ - PDS private keys are server-side secrets
+ - Never exposed to clients (even authenticated users)
+ - Used only by PDS for ATProto operations
+ - Exposing them would compromise entire account security
+
+ **Cosign requirement:**
+ - Needs access to private key for signing operations
+ - Expects user-controlled keys or KMS integration
+
+ **Problem:** Can't sign images client-side with PDS keys without fundamentally breaking ATProto security model.
+
+ ### 3. Keyless Signing Complexity
+
+ Cosign supports "keyless" signing via OIDC + Fulcio CA:
+
+ **What it requires:**
+ - OIDC identity provider (Google, GitHub, etc.)
+ - Fulcio certificate authority (issues short-lived certs)
+ - Rekor transparency log (immutable signature log)
+ - All infrastructure managed by Sigstore
+
+ **ATProto adaptation would need:**
+ - **OIDC bridge**: Make ATProto DIDs look like OIDC identities
+   - Map `did:plc:alice123` → OIDC claims
+   - PDS as OIDC provider? (not in spec)
+   - Requires custom OIDC server wrapping ATProto auth
+ - **Fulcio adaptation**: Issue certs based on ATProto identities
+   - Deploy and manage CA infrastructure
+   - Handle DID resolution in cert issuance
+   - Trust anchor distribution
+ - **Rekor instance**: Public transparency log for signatures
+   - High availability requirements
+   - Storage and indexing at scale
+   - Replication and backup
+
+ **Problem:** Too much infrastructure for ATCR to host and manage. Defeats the purpose of decentralized architecture.
+
+ ### 4. Signature Storage
+
+ **Cosign storage:**
+ - OCI registry artifacts (signatures as ORAS manifests)
+ - Stored alongside images in registry
+
+ **ATCR ideal:**
+ - Signatures in ATProto records (user's PDS)
+ - Discoverable via ATProto queries
+ - Integrated with ATProto's existing signature/verification model
+
+ **Problem:** Would need to patch Cosign or run dual storage (OCI + ATProto) which creates consistency issues.
+
+ ### Conclusion: Cosign Doesn't Fit
+
+ While Cosign is excellent for traditional registries, forcing it into ATProto would require:
+ - Breaking ATProto security model (exposing PDS keys), OR
+ - Building massive OIDC/Fulcio/Rekor infrastructure, OR
+ - Running parallel storage systems with consistency problems
+
+ **Better approach:** Use a more flexible signing framework designed for extensibility.
86 + 87 + ## Notary v2: Plugin-Based Architecture 88 + 89 + [Notary v2](https://notaryproject.dev/) (also called "Notation" or "Notary Project") is a CNCF signature specification with a plugin architecture that fits ATProto better. 90 + 91 + ### Why Notary v2? 92 + 93 + **Flexible plugin system:** 94 + - **Trust store plugins**: Custom key resolution (e.g., from ATProto records) 95 + - **Signature plugins**: Custom signature storage (e.g., in PDS) 96 + - **Verification plugins**: Custom verification logic 97 + - Plugins written in any language, communicate via stdio 98 + 99 + **Multiple key types supported:** 100 + - ECDSA, RSA, Ed25519 out of box 101 + - Can support custom key types via plugins 102 + - Signature envelope format is extensible 103 + 104 + **Designed for extensibility:** 105 + - Not tied to specific PKI (unlike Cosign/Sigstore) 106 + - Trust policies are configurable 107 + - Storage backend is pluggable 108 + - Works with custom identity systems 109 + 110 + **Standard CLI:** 111 + - `notation sign` / `notation verify` commands 112 + - Users don't need to learn new tools 113 + - Integration with Docker/containerd 114 + 115 + ### Notary v2 Architecture 116 + 117 + ``` 118 + ┌─────────────────────┐ 119 + │ notation CLI │ User signs/verifies images 120 + └──────────┬──────────┘ 121 + 122 + ├─────────────────────────────────────┐ 123 + │ │ 124 + ┌──────────▼─────────┐ ┌───────────▼──────────┐ 125 + │ Signing Plugin │ │ Trust Store Plugin │ 126 + │ │ │ │ 127 + │ - Read private key │ │ - Resolve DID → PDS │ 128 + │ - Generate sig │ │ - Fetch public keys │ 129 + │ - Store in PDS │ │ - Verify trust │ 130 + └────────────────────┘ └──────────────────────┘ 131 + │ │ 132 + ▼ ▼ 133 + ┌─────────────────────────────────────────────────────────┐ 134 + │ User's PDS (ATProto) │ 135 + │ │ 136 + │ io.atcr.signing.key (public keys) │ 137 + │ io.atcr.signature (signatures) │ 138 + └─────────────────────────────────────────────────────────┘ 139 + ``` 140 + 141 + ## Proposed 
Design: ATProto Signing 142 + 143 + ### Key Management 144 + 145 + **Separate signing keys from PDS keys:** 146 + 147 + 1. **User generates signing key pair locally:** 148 + ```bash 149 + notation key generate --id alice-signing-key --type ecdsa 150 + # Or: --type ed25519, --type rsa 151 + ``` 152 + 153 + 2. **Public key published to ATProto:** 154 + ```json 155 + { 156 + "$type": "io.atcr.signing.key", 157 + "keyId": "alice-signing-key", 158 + "keyType": "ecdsa-p256", 159 + "publicKey": "-----BEGIN PUBLIC KEY-----\nMFkw...", 160 + "validFrom": "2025-10-20T12:00:00Z", 161 + "expiresAt": "2026-10-20T12:00:00Z", 162 + "revoked": false, 163 + "createdAt": "2025-10-20T12:00:00Z" 164 + } 165 + ``` 166 + 167 + 3. **Private key stored locally:** 168 + - Docker credential store 169 + - OS keychain (macOS Keychain, Windows Credential Manager) 170 + - File with restrictive permissions 171 + - Hardware security module (future) 172 + 173 + **Why separate keys?** 174 + - ✅ No need to access PDS private keys 175 + - ✅ Standard key formats (ECDSA, Ed25519, RSA) 176 + - ✅ User controls key lifecycle 177 + - ✅ Can use hardware tokens (YubiKey, etc.) 178 + - ✅ Security boundary separation (signing ≠ identity) 179 + - ✅ Key rotation without changing DID 180 + 181 + ### Signing Flow 182 + 183 + ``` 184 + 1. User: notation sign atcr.io/alice/myapp:latest --key alice-signing-key 185 + 186 + 2. notation-atproto plugin: 187 + a. Resolve image → manifest digest 188 + b. Read private key from local keystore 189 + c. Generate signature over manifest digest 190 + d. Get OAuth token for alice's PDS 191 + e. Create signature record in alice's PDS 192 + 193 + 3. 
Signature stored in alice's PDS: 194 + { 195 + "$type": "io.atcr.signature", 196 + "repository": "alice/myapp", 197 + "digest": "sha256:abc123...", 198 + "signature": "MEUCIQDx...", // base64 signature bytes 199 + "keyId": "alice-signing-key", 200 + "signatureAlgorithm": "ecdsa-p256-sha256", 201 + "signedAt": "2025-10-20T12:34:56Z" 202 + } 203 + 204 + 4. Record key: sha256 of (digest + keyId) for deduplication 205 + ``` 206 + 207 + ### Verification Flow 208 + 209 + ``` 210 + 1. User: notation verify atcr.io/alice/myapp:latest 211 + 212 + 2. notation-atproto plugin: 213 + a. Resolve "alice" → did:plc:alice123 → pds.alice.com 214 + b. Fetch manifest digest: sha256:abc123 215 + c. Query alice's PDS for signatures: 216 + GET /xrpc/com.atproto.repo.listRecords? 217 + repo=did:plc:alice123& 218 + collection=io.atcr.signature 219 + d. Filter records matching digest: sha256:abc123 220 + e. For each signature: 221 + - Fetch public key from io.atcr.signing.key record 222 + - Check key not revoked, not expired 223 + - Verify signature bytes over digest 224 + - Check trust policy (is this key trusted?) 225 + 226 + 3. Trust policy evaluation: 227 + - Signature valid cryptographically? ✓ 228 + - Key belongs to image owner (alice)? ✓ 229 + - Key not revoked? ✓ 230 + - Key not expired? ✓ 231 + - Trust policy satisfied? ✓ 232 + 233 + 4. 
Output: Verification succeeded ✓ 234 + ``` 235 + 236 + ### Trust Policies 237 + 238 + Notary v2 uses trust policies to define what signatures are required: 239 + 240 + ```json 241 + { 242 + "version": "1.0", 243 + "trustPolicies": [ 244 + { 245 + "name": "atcr-images", 246 + "registryScopes": ["atcr.io/*/*"], 247 + "signatureVerification": { 248 + "level": "strict" 249 + }, 250 + "trustStores": ["atproto:default"], 251 + "trustedIdentities": [ 252 + "did:plc:*" // Trust any ATProto DID 253 + ] 254 + } 255 + ] 256 + } 257 + ``` 258 + 259 + **Policy options:** 260 + - `level: strict` - Signature required, verification must pass 261 + - `level: permissive` - Signature optional, but verified if present 262 + - `level: audit` - Signature logged but doesn't block 263 + - `level: skip` - No verification 264 + 265 + **Trust store resolution:** 266 + - `atproto:default` - Use ATProto plugin to resolve keys 267 + - Plugin queries user's PDS for `io.atcr.signing.key` records 268 + - Verifies key is owned by the image owner (DID match) 269 + 270 + ### ATProto Records 271 + 272 + **io.atcr.signing.key** - Public signing keys 273 + 274 + ```json 275 + { 276 + "$type": "io.atcr.signing.key", 277 + "keyId": "alice-signing-key", 278 + "keyType": "ecdsa-p256", 279 + "publicKey": "-----BEGIN PUBLIC KEY-----\nMFkwEwYHKoZI...", 280 + "validFrom": "2025-10-20T12:00:00Z", 281 + "expiresAt": "2026-10-20T12:00:00Z", 282 + "revoked": false, 283 + "purpose": ["image-signing"], 284 + "createdAt": "2025-10-20T12:00:00Z" 285 + } 286 + ``` 287 + 288 + **Record key:** `keyId` (user-chosen identifier) 289 + 290 + **Fields:** 291 + - `keyId`: Unique identifier for this key 292 + - `keyType`: Algorithm (ecdsa-p256, ed25519, rsa-2048, rsa-4096) 293 + - `publicKey`: PEM-encoded public key 294 + - `validFrom`: Key becomes valid at this time 295 + - `expiresAt`: Key expires at this time (null = no expiry) 296 + - `revoked`: Key has been revoked (true/false) 297 + - `purpose`: Array of purposes 
(image-signing, sbom-signing, etc.) 298 + 299 + **io.atcr.signature** - Image signatures 300 + 301 + ```json 302 + { 303 + "$type": "io.atcr.signature", 304 + "repository": "alice/myapp", 305 + "digest": "sha256:abc123...", 306 + "signature": "MEUCIQDxH7...", 307 + "keyId": "alice-signing-key", 308 + "signatureAlgorithm": "ecdsa-p256-sha256", 309 + "signedAt": "2025-10-20T12:34:56Z", 310 + "createdAt": "2025-10-20T12:34:56Z" 311 + } 312 + ``` 313 + 314 + **Record key:** SHA256 hash of `(digest || keyId)` for deduplication 315 + 316 + **Fields:** 317 + - `repository`: Image repository (alice/myapp) 318 + - `digest`: Manifest digest being signed 319 + - `signature`: Base64-encoded signature bytes 320 + - `keyId`: Reference to signing key record 321 + - `signatureAlgorithm`: Algorithm used for signing 322 + - `signedAt`: When signature was created 323 + 324 + ### Plugin Implementation 325 + 326 + **notation-atproto** - Notary v2 plugin for ATProto 327 + 328 + **Trust store plugin:** 329 + ```go 330 + // Implements: notation trust store plugin spec 331 + // https://notaryproject.dev/docs/user-guides/how-to/plugin-management/ 332 + 333 + type ATProtoTrustStore struct { 334 + resolver *atproto.Resolver 335 + client *atproto.Client 336 + } 337 + 338 + // GetKeys resolves public keys for a given identity (DID) 339 + func (t *ATProtoTrustStore) GetKeys(did string) ([]PublicKey, error) { 340 + // 1. Resolve DID → PDS endpoint 341 + pds, err := t.resolver.ResolvePDS(did) 342 + 343 + // 2. Query PDS for io.atcr.signing.key records 344 + records, err := t.client.ListRecords(pds, did, "io.atcr.signing.key") 345 + 346 + // 3. 
Filter active keys (not revoked, not expired) 347 + keys := []PublicKey{} 348 + for _, record := range records { 349 + if !record.Revoked && !record.Expired() { 350 + keys = append(keys, ParsePublicKey(record.PublicKey)) 351 + } 352 + } 353 + 354 + return keys, nil 355 + } 356 + ``` 357 + 358 + **Signature store plugin:** 359 + ```go 360 + // Store signature in user's PDS 361 + func (s *ATProtoSignatureStore) StoreSignature(sig Signature) error { 362 + // 1. Get OAuth token for user's PDS 363 + token, err := s.oauthClient.GetToken() 364 + 365 + // 2. Create signature record 366 + record := SignatureRecord{ 367 + Type: "io.atcr.signature", 368 + Repository: sig.Repository, 369 + Digest: sig.Digest, 370 + Signature: base64.Encode(sig.Bytes), 371 + KeyId: sig.KeyId, 372 + SignatureAlgorithm: sig.Algorithm, 373 + SignedAt: time.Now(), 374 + } 375 + 376 + // 3. Generate record key (hash of digest + keyId) 377 + rkey := sha256.Sum256([]byte(sig.Digest + sig.KeyId)) 378 + 379 + // 4. Write to PDS 380 + err = s.client.PutRecord(pds, did, "io.atcr.signature", hex.Encode(rkey), record) 381 + 382 + return err 383 + } 384 + 385 + // Retrieve signatures for a digest 386 + func (s *ATProtoSignatureStore) GetSignatures(did, digest string) ([]Signature, error) { 387 + // Query PDS for matching signatures 388 + records, err := s.client.ListRecords(pds, did, "io.atcr.signature") 389 + 390 + // Filter by digest 391 + sigs := []Signature{} 392 + for _, record := range records { 393 + if record.Digest == digest { 394 + sigs = append(sigs, ParseSignature(record)) 395 + } 396 + } 397 + 398 + return sigs, nil 399 + } 400 + ``` 401 + 402 + **Plugin installation:** 403 + ```bash 404 + # Install notation CLI 405 + brew install notation 406 + 407 + # Install ATProto plugin 408 + notation plugin install notation-atproto --version v1.0.0 409 + 410 + # Configure trust policy 411 + cat > ~/.config/notation/trustpolicy.json <<EOF 412 + { 413 + "version": "1.0", 414 + "trustPolicies": [ 415 + { 416 
+ "name": "atcr-images", 417 + "registryScopes": ["atcr.io/*/*"], 418 + "signatureVerification": {"level": "strict"}, 419 + "trustStores": ["atproto:default"], 420 + "trustedIdentities": ["did:plc:*"] 421 + } 422 + ] 423 + } 424 + EOF 425 + ``` 426 + 427 + ## User Workflows 428 + 429 + ### Initial Setup 430 + 431 + ```bash 432 + # 1. Generate signing key pair 433 + notation key generate --id alice-signing-key --type ecdsa 434 + 435 + # Private key stored in: ~/.config/notation/keys/ 436 + # Public key extracted by plugin 437 + 438 + # 2. Publish public key to PDS 439 + notation-atproto key publish alice-signing-key 440 + 441 + # Plugin uploads io.atcr.signing.key record to alice's PDS 442 + # Requires OAuth authentication to alice's PDS 443 + 444 + # 3. Verify key is published 445 + notation-atproto key list 446 + 447 + # Output: 448 + # alice-signing-key (ecdsa-p256) - Active 449 + # Published: 2025-10-20T12:00:00Z 450 + # Expires: 2026-10-20T12:00:00Z 451 + # DID: did:plc:alice123 452 + ``` 453 + 454 + ### Signing Images 455 + 456 + ```bash 457 + # Sign an image after pushing 458 + docker push atcr.io/alice/myapp:latest 459 + 460 + notation sign atcr.io/alice/myapp:latest \ 461 + --key alice-signing-key \ 462 + --plugin atproto 463 + 464 + # Plugin: 465 + # 1. Reads private key from ~/.config/notation/keys/ 466 + # 2. Signs manifest digest 467 + # 3. Uploads signature to alice's PDS (io.atcr.signature record) 468 + # 4. Returns success 469 + 470 + # Output: 471 + # Successfully signed atcr.io/alice/myapp:latest 472 + # Signature stored in PDS: did:plc:alice123 473 + ``` 474 + 475 + ### Verifying Images 476 + 477 + ```bash 478 + # Verify before running 479 + notation verify atcr.io/alice/myapp:latest 480 + 481 + # Plugin: 482 + # 1. Resolves "alice" → did:plc:alice123 → pds.alice.com 483 + # 2. Fetches manifest digest 484 + # 3. Queries alice's PDS for signatures 485 + # 4. Fetches public key from io.atcr.signing.key 486 + # 5. 
Verifies signature cryptographically 487 + # 6. Checks trust policy 488 + 489 + # Output: 490 + # ✓ Signature verification succeeded 491 + # 492 + # Signed by: did:plc:alice123 493 + # Key ID: alice-signing-key 494 + # Signed at: 2025-10-20T12:34:56Z 495 + # Algorithm: ecdsa-p256-sha256 496 + ``` 497 + 498 + ### Key Rotation 499 + 500 + ```bash 501 + # Generate new key 502 + notation key generate --id alice-signing-key-2 --type ecdsa 503 + 504 + # Publish new key 505 + notation-atproto key publish alice-signing-key-2 506 + 507 + # Re-sign images with new key 508 + notation sign atcr.io/alice/myapp:latest --key alice-signing-key-2 509 + 510 + # Revoke old key 511 + notation-atproto key revoke alice-signing-key 512 + 513 + # Plugin updates io.atcr.signing.key record: 514 + # { ..., "revoked": true, "revokedAt": "2025-11-01T..." } 515 + 516 + # Old signatures still exist but verification will fail 517 + # (revoked key = untrusted) 518 + ``` 519 + 520 + ### Key Expiration 521 + 522 + ```bash 523 + # Generate key with expiration 524 + notation key generate \ 525 + --id alice-signing-key \ 526 + --type ecdsa \ 527 + --expires 365d # 1 year 528 + 529 + # Publish with expiration 530 + notation-atproto key publish alice-signing-key 531 + 532 + # PDS record: 533 + # { 534 + # "validFrom": "2025-10-20T12:00:00Z", 535 + # "expiresAt": "2026-10-20T12:00:00Z" 536 + # } 537 + 538 + # After expiration, verification fails: 539 + notation verify atcr.io/alice/myapp:latest 540 + # ✗ Signature verification failed 541 + # Signing key expired on 2026-10-20T12:00:00Z 542 + ``` 543 + 544 + ## Security Considerations 545 + 546 + ### Key Storage 547 + 548 + **Private keys must be protected:** 549 + - File permissions: `0600` (owner read/write only) 550 + - Use OS keychain when possible (macOS Keychain, Windows Credential Manager) 551 + - Consider hardware tokens (YubiKey, TPM) for production 552 + - Never commit private keys to git 553 + 554 + **Public keys are public:** 555 + - Stored in 
user's PDS (publicly readable) 556 + - Anyone can verify signatures 557 + - Revocation is public and immediate 558 + 559 + ### Trust Model 560 + 561 + **What signatures prove:** 562 + - ✅ Image manifest hasn't been tampered with since signing 563 + - ✅ Signer had access to private key at signing time 564 + - ✅ Signer's DID matches image owner (alice signed alice/myapp) 565 + 566 + **What signatures don't prove:** 567 + - ❌ Image is free of vulnerabilities 568 + - ❌ Image contents are safe to run 569 + - ❌ Signer's identity is verified (depends on DID trust) 570 + 571 + **Trust anchors:** 572 + - Trust PDS to correctly serve signing key records 573 + - Trust DID resolution (PLC directory, did:web DNS) 574 + - Trust signature algorithms (ECDSA, Ed25519, RSA) 575 + - Trust user to protect their private keys 576 + 577 + ### Key Compromise 578 + 579 + If a private signing key is compromised: 580 + 581 + ```bash 582 + # 1. Immediately revoke the key 583 + notation-atproto key revoke alice-signing-key --reason "Key compromised" 584 + 585 + # 2. Generate new key 586 + notation key generate --id alice-signing-key-new --type ecdsa 587 + 588 + # 3. Publish new key 589 + notation-atproto key publish alice-signing-key-new 590 + 591 + # 4. Re-sign all of alice's ATCR images with the new key 592 + for image in $(docker images --format "{{.Repository}}:{{.Tag}}" | grep '^atcr.io/alice/'); do 593 + notation sign "$image" --key alice-signing-key-new 594 + done 595 + 596 + # 5.
Alert users to only trust new key 597 + # (Old signatures will fail verification due to revocation) 598 + ``` 599 + 600 + **Revocation is immediate:** 601 + - PDS record updated with `"revoked": true` 602 + - Any verification that fetches the key record after revocation rejects the key 603 + - No need to update certificate revocation lists (CRLs) 604 + - Verifiers that query the PDS directly see the current record state (a cached copy of the key record would delay revocation) 605 + 606 + ### Multiple Signatures 607 + 608 + Images can have multiple signatures: 609 + 610 + ```bash 611 + # Alice signs with her key 612 + notation sign atcr.io/alice/myapp:latest --key alice-signing-key 613 + 614 + # CI/CD system signs with separate key 615 + notation sign atcr.io/alice/myapp:latest --key ci-signing-key 616 + 617 + # Both signatures stored in alice's PDS 618 + # Verification can require both (via the proposed trust policy extension below) 619 + ``` 620 + 621 + **Trust policy** (proposed extension; stock Notation has no `override.all` option): 622 + ```json 623 + { 624 + "trustPolicies": [{ 625 + "name": "require-dual-signature", 626 + "registryScopes": ["atcr.io/alice/*"], 627 + "signatureVerification": { 628 + "level": "strict", 629 + "verifyTimestamp": true, 630 + "override": { 631 + "all": ["alice-signing-key", "ci-signing-key"] 632 + } 633 + } 634 + }] 635 + } 636 + ``` 637 + 638 + ## Implementation Roadmap 639 + 640 + ### Phase 1: Core Plugin (4-6 weeks) 641 + 642 + **Week 1-2: Trust store plugin** 643 + - Implement DID resolution 644 + - Query `io.atcr.signing.key` records 645 + - Parse and validate public keys 646 + - Handle revocation and expiration 647 + 648 + **Week 3-4: Signature store plugin** 649 + - OAuth integration for PDS writes 650 + - Create `io.atcr.signature` records 651 + - Query signatures for verification 652 + - Handle record key generation 653 + 654 + **Week 5-6: Integration testing** 655 + - End-to-end sign/verify workflows 656 + - Key rotation scenarios 657 + - Revocation handling 658 + - Multi-signature support 659 + 660 + ### Phase 2: Tooling (2-3 weeks) 661 + 662 + **CLI commands:** 663 + ```bash 664 + notation-atproto key generate 665 +
notation-atproto key publish 666 + notation-atproto key list 667 + notation-atproto key revoke 668 + notation-atproto signature list <image> 669 + notation-atproto signature inspect <image> 670 + ``` 671 + 672 + **Helper utilities:** 673 + - Bulk re-signing for key rotation 674 + - Signature audit logs 675 + - Trust policy generators 676 + - Key lifecycle management 677 + 678 + ### Phase 3: AppView Integration (2-3 weeks) 679 + 680 + **Web UI features:** 681 + - Display signature status on repository pages 682 + - Show signing keys for users 683 + - Signature verification badges 684 + - Key management interface 685 + 686 + **API endpoints:** 687 + - `GET /v2/alice/myapp/signatures` - List signatures for image 688 + - `GET /v2/alice/keys` - List user's signing keys 689 + - `POST /v2/alice/keys/revoke` - Revoke key via web UI 690 + 691 + ### Phase 4: Advanced Features (ongoing) 692 + 693 + **Hardware token support:** 694 + - YubiKey integration 695 + - TPM-backed keys 696 + - Hardware-backed keystores 697 + 698 + **Timestamp verification:** 699 + - Trusted timestamp authorities 700 + - Prove signature was created at specific time 701 + - Long-term signature validity 702 + 703 + **SBOM signing:** 704 + - Sign SBOMs with same keys 705 + - Link SBOM signatures to image signatures 706 + - Unified verification workflow 707 + 708 + ## Comparison: Cosign vs Notary v2 for ATCR 709 + 710 + | Feature | Cosign | Notary v2 | Winner | 711 + |---------|--------|-----------|--------| 712 + | **ATProto integration** | Requires OIDC bridge | Plugin system | ✅ Notary | 713 + | **Key format flexibility** | Limited | Extensible | ✅ Notary | 714 + | **Custom storage** | OCI only | Pluggable | ✅ Notary | 715 + | **Infrastructure needs** | Fulcio + Rekor | None | ✅ Notary | 716 + | **Keyless signing** | Yes (complex) | No | ⚠️ Cosign* | 717 + | **Ecosystem maturity** | High | Medium | ⚠️ Cosign* | 718 + | **CLI simplicity** | Very simple | Simple | ⚠️ Cosign* | 719 + | **Plugin 
development** | N/A | Required | ⚠️ Mixed | 720 + 721 + *Cosign advantages don't outweigh ATProto incompatibilities 722 + 723 + **Recommendation: Notary v2 with ATProto plugin** 724 + 725 + ## References 726 + 727 + - [Notary v2 Specification](https://notaryproject.dev/) 728 + - [Notation CLI](https://github.com/notaryproject/notation) 729 + - [Notary Plugin Specification](https://notaryproject.dev/docs/user-guides/how-to/plugin-management/) 730 + - [Sigstore Cosign](https://github.com/sigstore/cosign) (for comparison) 731 + - [ATProto Specification](https://atproto.com/) 732 + - [OCI Artifacts](https://github.com/opencontainers/artifacts) 733 + - [RFC 7515 - JSON Web Signature](https://datatracker.ietf.org/doc/html/rfc7515) (signature formats)
+365 -72
docs/SAILOR.md
··· 1 - User approved Claude's plan: 2 - ⎿ Implement Sailor Profile & Hold Discovery System 1 + # Sailor Profile System 2 + 3 + ## Overview 4 + 5 + The sailor profile system allows users to choose which hold (storage service) to use for their container images. This enables: 6 + - **Personal holds** - Use your own S3/Storj/Minio storage 7 + - **Shared holds** - Join a team or community hold 8 + - **Default holds** - Use AppView's default storage (free tier) 9 + - **Transparent infrastructure** - Hold choice doesn't affect image URL 10 + 11 + ## Concepts 12 + 13 + **Sailor Profile** (`io.atcr.sailor.profile`): 14 + - Record stored in user's PDS 15 + - Contains `defaultHold` preference (DID or URL) 16 + - Created automatically on first authentication 17 + - Managed via web UI or ATProto client 18 + 19 + **Hold Discovery Priority**: 20 + 1. User's sailor profile `defaultHold` (if set) 21 + 2. User's own hold records (`io.atcr.hold`) - legacy 22 + 3. AppView's `default_hold_did` configuration 23 + 24 + ## Sailor Profile Record 25 + 26 + ```json 27 + { 28 + "$type": "io.atcr.sailor.profile", 29 + "defaultHold": "did:web:hold.example.com", 30 + "createdAt": "2025-10-02T12:00:00Z", 31 + "updatedAt": "2025-10-02T12:00:00Z" 32 + } 33 + ``` 34 + 35 + **Fields:** 36 + - `defaultHold` (string, optional) - Hold DID or URL (auto-normalized to DID) 37 + - `createdAt` (datetime, required) - Profile creation timestamp 38 + - `updatedAt` (datetime, required) - Last update timestamp 39 + 40 + **Record key:** Always `"self"` (only one profile per user) 41 + 42 + **Collection:** `io.atcr.sailor.profile` 43 + 44 + ## Profile Management 45 + 46 + ### Automatic Creation 47 + 48 + Profiles are created automatically on first authentication: 49 + 50 + ```go 51 + // During OAuth login or Basic Auth token exchange 52 + func (h *Handler) HandleCallback(w http.ResponseWriter, r *http.Request) { 53 + // ... OAuth flow ... 
54 + 55 + // Create ATProto client with user's OAuth session 56 + client := atproto.NewClientWithIndigoClient(pdsEndpoint, did, apiClient) 57 + 58 + // Ensure profile exists (creates with AppView's default if not) 59 + err := atproto.EnsureProfile(ctx, client, appViewDefaultHoldDID) 60 + } 61 + ``` 62 + 63 + **Behavior:** 64 + - If profile exists → no-op 65 + - If profile doesn't exist → creates with `defaultHold` set to AppView's default 66 + - If AppView has no default configured → creates with empty `defaultHold` 67 + 68 + ### Web UI Management 69 + 70 + Users can update their profile via the settings page (`/settings`): 71 + 72 + **View current profile:** 73 + ``` 74 + GET /settings 75 + → Shows current defaultHold value 76 + ``` 77 + 78 + **Update defaultHold:** 79 + ``` 80 + POST /api/settings/update-hold 81 + Form data: hold_endpoint=did:web:team-hold.fly.dev 82 + 83 + → Updates sailor profile in user's PDS 84 + → Returns success confirmation 85 + ``` 86 + 87 + **Implementation** (`pkg/appview/handlers/settings.go`): 88 + - Requires OAuth session (user must be logged in) 89 + - Fetches existing profile or creates new one 90 + - Normalizes URLs to DIDs automatically 91 + - Updates `updatedAt` timestamp 92 + 93 + ### ATProto Client Management 94 + 95 + Users can also manage their profile with any ATProto client (the `atproto` commands below are illustrative): 96 + 97 + **Get profile:** 98 + ```bash 99 + atproto get-record \ 100 + --collection io.atcr.sailor.profile \ 101 + --rkey self 102 + ``` 103 + 104 + **Update profile:** 105 + ```bash 106 + atproto put-record \ 107 + --collection io.atcr.sailor.profile \ 108 + --rkey self \ 109 + --value '{ 110 + "$type": "io.atcr.sailor.profile", 111 + "defaultHold": "did:web:my-hold.example.com", 112 + "createdAt": "2025-10-02T12:00:00Z", "updatedAt": "2025-10-20T12:00:00Z" 113 + }' 114 + ``` 115 + 116 + **Clear default hold** (opt out): 117 + ```bash 118 + atproto put-record \ 119 + --collection io.atcr.sailor.profile \ 120 + --rkey self \ 121 + --value '{ 122 + "$type": "io.atcr.sailor.profile",
123 + "defaultHold": "", 124 + "createdAt": "2025-10-02T12:00:00Z", "updatedAt": "2025-10-20T12:00:00Z" 125 + }' 126 + ``` 127 + 128 + ## URL-to-DID Migration 129 + 130 + The system automatically migrates old URL-based `defaultHold` values to DID format for consistency: 131 + 132 + **Old format (deprecated):** 133 + ```json 134 + { 135 + "defaultHold": "https://hold.example.com" 136 + } 137 + ``` 138 + 139 + **New format (preferred):** 140 + ```json 141 + { 142 + "defaultHold": "did:web:hold.example.com" 143 + } 144 + ``` 145 + 146 + **Migration behavior:** 147 + - `GetProfile()` detects URL format automatically 148 + - Converts URL → DID transparently (strips the scheme and prefixes `did:web:`, e.g. `https://hold.example.com` → `did:web:hold.example.com`) 149 + - Persists migration to PDS in background goroutine 150 + - Uses locks to prevent duplicate migrations 151 + - Completely transparent to user 152 + 153 + **Why DIDs?** 154 + - **Stable**: a DID names the hold itself rather than a transport URL (resolving a `did:web` DID still requires DNS and HTTPS) 155 + - **Canonical**: One DID per hold, multiple URLs possible 156 + - **Standard**: ATProto uses DIDs for identity 157 + 158 + ## Hold Discovery Flow 159 + 160 + When a user pushes an image, AppView discovers which hold to use: 161 + 162 + ``` 163 + 1. User: docker push atcr.io/alice/myapp:latest 164 + 165 + 2. AppView resolves alice → did:plc:alice123 166 + 167 + 3. AppView calls findHoldDID(did, pdsEndpoint): 168 + a. Query alice's PDS for io.atcr.sailor.profile/self 169 + b. If profile.defaultHold is set → use it 170 + c. Else check alice's io.atcr.hold records (legacy) 171 + d. Else use AppView's default_hold_did 172 + 173 + 4. Found: alice.profile.defaultHold = "did:web:team-hold.fly.dev" 174 + 175 + 5. AppView uses team-hold.fly.dev for blob storage 176 + 177 + 6.
Manifest stored in alice's PDS includes: 178 + - holdDid: "did:web:team-hold.fly.dev" (for future pulls) 179 + - holdEndpoint: "https://team-hold.fly.dev" (backward compat) 180 + ``` 181 + 182 + **Implementation** (`pkg/appview/middleware/registry.go:findHoldDID()`): 183 + 184 + ```go 185 + func (nr *NamespaceResolver) findHoldDID(ctx context.Context, did, pdsEndpoint string) string { 186 + client := atproto.NewClient(pdsEndpoint, did, "") 187 + 188 + // 1. Check sailor profile 189 + profile, err := atproto.GetProfile(ctx, client) 190 + if err == nil && profile != nil && profile.DefaultHold != "" { 191 + return profile.DefaultHold // DID or URL (auto-normalized) 192 + } 193 + 194 + // 2. Check own hold records (legacy) 195 + records, _ := client.ListRecords(ctx, "io.atcr.hold", 10) 196 + for _, record := range records { 197 + // holdRecord = record decoded as an io.atcr.hold record (decoding elided); return the first hold's endpoint 198 + if holdRecord.Endpoint != "" { 199 + return atproto.ResolveHoldDIDFromURL(holdRecord.Endpoint) 200 + } 201 + } 202 + 203 + // 3. Use AppView default 204 + return nr.defaultHoldDID 205 + } 206 + ``` 207 + 208 + ## Use Cases 209 + 210 + ### 1. Default Hold (Free Tier) 211 + 212 + User doesn't need to do anything: 213 + 214 + ``` 215 + 1. User authenticates to atcr.io 216 + 2. Profile created with defaultHold = AppView's default 217 + 3. User pushes images → blobs go to default hold 218 + ``` 219 + 220 + **Profile:** 221 + ```json 222 + { 223 + "defaultHold": "did:web:hold01.atcr.io" 224 + } 225 + ``` 3 226 4 - Summary 227 + ### 2. Join Team Hold 5 228 6 - Add io.atcr.sailor.profile record type to manage user's default hold preference, and update manifest to store historical hold endpoint reference. This enables transparent hold 7 - routing while preserving image ownership semantics. 229 + User joins a shared team hold: 8 230 9 - Changes Required 231 + ``` 232 + 1. Team admin deploys hold service (did:web:team-hold.fly.dev) 233 + 2. Team admin adds user to crew (via hold's PDS) 234 + 3.
User updates profile: 235 + - Via web UI: /settings → set hold to "did:web:team-hold.fly.dev" 236 + - Or via ATProto client: put-record 237 + 4. User pushes images → blobs go to team hold 238 + ``` 10 239 11 - 1. Create Sailor Profile Lexicon 240 + **Profile:** 241 + ```json 242 + { 243 + "defaultHold": "did:web:team-hold.fly.dev" 244 + } 245 + ``` 12 246 13 - File: lexicons/io/atcr/sailor/profile.json 14 - - New record type: io.atcr.sailor.profile 15 - - Fields: defaultHold (string, nullable), createdAt, updatedAt 247 + **Benefits:** 248 + - Team pays for storage (not individual users) 249 + - Centralized access control 250 + - Shared bandwidth limits 16 251 17 - 2. Update Manifest Lexicon 252 + ### 3. Personal Hold (BYOS) 18 253 19 - File: lexicons/io/atcr/manifest.json 20 - - Add holdEndpoint field (string, required) 21 - - This is historical reference (immutable per manifest) 254 + User deploys their own hold: 22 255 23 - 3. Update Go Types 256 + ``` 257 + 1. User deploys hold service to Fly.io (did:web:alice-hold.fly.dev) 258 + 2. Hold auto-creates captain + crew records on first run 259 + 3. User updates profile to use their hold 260 + 4. User pushes images → blobs go to personal hold 261 + ``` 24 262 25 - File: pkg/atproto/lexicon.go 26 - - Add SailorProfileCollection = "io.atcr.sailor.profile" 27 - - Add SailorProfileRecord struct 28 - - Add NewSailorProfileRecord() constructor 29 - - Update ManifestRecord struct to include HoldEndpoint field 263 + **Profile:** 264 + ```json 265 + { 266 + "defaultHold": "did:web:alice-hold.fly.dev" 267 + } 268 + ``` 30 269 31 - 4. Create Profile Management 270 + **Benefits:** 271 + - Full control over storage 272 + - Choose storage provider (S3, Storj, Minio, etc.) 273 + - No quotas/limits (except what you pay for) 32 274 33 - File: pkg/atproto/profile.go (new file) 34 - - EnsureProfile(ctx, client, defaultHoldDID) function 35 - - Logic: check if profile exists, create with default if not 275 + ### 4. 
Opt Out of Defaults 36 276 37 - 5. Update Auth Handlers 277 + User wants to use only their own hold records (legacy model): 38 278 39 - Files: pkg/auth/exchange/handler.go and pkg/auth/token/service.go 40 - - Call EnsureProfile() after token validation 41 - - Use authenticated client (has write access to user's PDS) 42 - - Pass AppView's default_hold_did config (format: "did:web:hold01.atcr.io") 279 + ```json 280 + { 281 + "defaultHold": "" 282 + } 283 + ``` 43 284 44 - 6. Update Hold Resolution 285 + **Behavior:** 286 + - Skips profile's defaultHold (set to empty/null) 287 + - Falls back to `io.atcr.hold` records in user's PDS 288 + - If no hold records found → uses AppView default 45 289 46 - File: pkg/middleware/registry.go 47 - - Update findStorageEndpoint() priority: 48 - a. Check io.atcr.sailor.profile.defaultHold 49 - b. If null (opted out): check user's io.atcr.hold, then AppView default 50 - c. If no profile: check user's io.atcr.hold, then AppView default 290 + ## Architecture Notes 51 291 52 - 7. Store Hold in Manifest 292 + ### Why Sailor Profile? 53 293 54 - File: pkg/atproto/manifest_store.go 55 - - When creating manifest, include resolved holdEndpoint 56 - - Pass hold endpoint through context or parameter 294 + **Problem solved:** 295 + - Users can be crew members of multiple holds 296 + - Need explicit way to choose which hold to use 297 + - Want to support both personal and shared holds 57 298 58 - 8. Update Pull to Use Manifest Hold 299 + **Without sailor profile:** 300 + ``` 301 + Alice is crew of: 302 + - team-hold.fly.dev (team storage) 303 + - community-hold.fly.dev (community storage) 59 304 60 - File: pkg/atproto/manifest_store.go and pkg/storage/routing_repository.go 61 - - On pull, extract holdEndpoint from manifest 62 - - Route blob requests to that hold (not via discovery) 305 + Which one should AppView use? 🤔 306 + ``` 63 307 64 - 9. 
Update Documentation 308 + **With sailor profile:** 309 + ``` 310 + Alice sets profile.defaultHold = "did:web:team-hold.fly.dev" 311 + → AppView knows to use team hold 312 + → Alice can change anytime via settings 313 + ``` 314 + 315 + ### Image Ownership vs Hold Choice 316 + 317 + **Key insight:** Image ownership stays with the user, hold is just infrastructure. 318 + 319 + **URL structure:** `atcr.io/<owner>/<image>:<tag>` 320 + - Owner = Alice (clear ownership) 321 + - Hold = Team storage (infrastructure detail) 322 + 323 + **Analogy:** Like choosing an S3 region 324 + - Your files, your ownership 325 + - Region is just where bits live 326 + - Can move regions without changing ownership 327 + 328 + ### Historical Hold References 329 + 330 + Manifests store `holdDid` for immutable blob location tracking: 331 + 332 + ```json 333 + { 334 + "digest": "sha256:abc123", 335 + "holdDid": "did:web:team-hold.fly.dev", 336 + "holdEndpoint": "https://team-hold.fly.dev", 337 + "layers": [...] 338 + } 339 + ``` 340 + 341 + **Why store hold in manifest?** 342 + - Pull uses historical reference (not re-discovered) 343 + - Image stays pullable even if user changes defaultHold 344 + - Blobs fetched from where they were originally pushed 345 + - Immutable references (manifests don't change) 346 + 347 + **Hold cache:** 348 + - In-memory cache: `(userDID, repository) → holdDid` 349 + - TTL: 10 minutes (covers typical pull operation) 350 + - Avoids re-querying PDS for every blob 351 + 352 + ## Configuration 353 + 354 + ### AppView Configuration 65 355 66 - Files: CLAUDE.md, docs/BYOS.md, .env.example 67 - - Document sailor profile concept 68 - - Explain hold resolution priority 69 - - Update examples for shared holds 70 - - Document how crew members configure profile 356 + ```bash 357 + # Default hold for new users 358 + ATCR_DEFAULT_HOLD_DID=did:web:hold01.atcr.io 359 + 360 + # Test mode: fallback to default if user's hold unreachable 361 + ATCR_TEST_MODE=false 362 + ``` 71 363 72 - 
Benefits 364 + **Test mode behavior:** 365 + - Checks if user's defaultHold is reachable (HTTP/HTTPS) 366 + - Falls back to AppView default if unreachable 367 + - Useful for local development (prevents errors from unreachable holds) 73 368 74 - - ✅ URL structure remains atcr.io/<owner>/<image> (ownership clear) 75 - - ✅ Hold is transparent infrastructure (like S3 region) 76 - - ✅ Supports personal, shared, and public holds 77 - - ✅ Historical reference in manifest (pull works even if defaults change) 78 - - ✅ User can opt-out (set defaultHold to null) 79 - - ✅ Future: UI for self-service profile management 369 + ### Legacy Support 80 370 371 + **Old hold registration model** (`io.atcr.hold` records in user's PDS): 372 + - Still supported for backward compatibility 373 + - Checked if profile.defaultHold is empty 374 + - New deployments should use sailor profiles instead 81 375 82 - Progress Summary 376 + **Migration path:** 377 + - Existing holds continue to work 378 + - Users with `io.atcr.hold` records can set profile.defaultHold 379 + - Profile takes priority over hold records 83 380 84 - ✅ Completed: 85 - 1. Created io.atcr.sailor.profile lexicon 86 - 2. Updated io.atcr.manifest lexicon with holdEndpoint field 87 - 3. Updated Go types in pkg/atproto/lexicon.go 88 - 4. Created profile management in pkg/atproto/profile.go 89 - 5. Updated /auth/exchange handler to manage profile 381 + ## Future Improvements 90 382 91 - ⏳ In Progress: 92 - - Need to update /auth/token handler similarly (add defaultHoldDID parameter and profile management) 93 - - Fix compilation error in extractDefaultHoldDID() - should use configuration.Middleware type not any 383 + 1. **Multi-hold support** - Set different holds for different repositories 384 + 2. **Hold suggestions** - Recommend holds based on geography/cost 385 + 3. **Hold migration tools** - Move blobs between holds 386 + 4. **Profile templates** - Pre-configured profiles for teams 387 + 5. 
**Hold analytics** - Show storage usage per hold in UI 94 388 95 - 🔜 Remaining: 96 - - Update findStorageEndpoint() for new priority logic (check profile → own hold → default) 97 - - Update manifest_store.go to include holdEndpoint when creating manifests 98 - - Update pull flow to use manifest holdEndpoint 99 - - Update documentation 389 + ## References 100 390 101 - The architecture is solid - just need to finish the token handler update and fix the type issue in the config extraction. Would you like me to continue? 391 + - [BYOS.md](./BYOS.md) - BYOS deployment and hold management 392 + - [EMBEDDED_PDS.md](./EMBEDDED_PDS.md) - Hold's embedded PDS architecture 393 + - [CREW_ACCESS_CONTROL.md](./CREW_ACCESS_CONTROL.md) - Crew membership and permissions 394 + - [ATProto Lexicon Spec](https://atproto.com/specs/lexicon)
+568
docs/SBOM_SCANNING.md
··· 1 + # SBOM Scanning 2 + 3 + ATCR supports optional Software Bill of Materials (SBOM) generation for container images stored in holds. This feature enables automated security scanning and vulnerability analysis while maintaining the decentralized architecture. 4 + 5 + ## Overview 6 + 7 + When enabled, holds automatically generate SBOMs for uploaded container images in the background. The scanning process: 8 + 9 + - **Async execution**: Scanning happens after upload completes (non-blocking) 10 + - **ORAS artifacts**: SBOMs stored as OCI Registry as Storage (ORAS) artifacts 11 + - **ATProto integration**: Scan results stored as `io.atcr.manifest` records in hold's embedded PDS 12 + - **Tool agnostic**: Results accessible via XRPC, ATProto queries, and direct blob URLs 13 + - **Opt-in**: Disabled by default, enabled per-hold via configuration 14 + 15 + ### Default Scanner: Syft 16 + 17 + ATCR uses [Anchore Syft](https://github.com/anchore/syft) for SBOM generation: 18 + - Industry-standard SBOM generator 19 + - Supports SPDX and CycloneDX formats 20 + - Comprehensive package detection (OS packages, language libraries, etc.) 21 + - Actively maintained (Syft inventories packages; matching them against vulnerability databases is done by tools such as Grype) 22 + 23 + Future enhancements may include [Grype](https://github.com/anchore/grype) for vulnerability scanning and [Trivy](https://github.com/aquasecurity/trivy) for comprehensive security analysis.
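The async, non-blocking execution described in the overview can be pictured as a bounded job queue drained by a fixed pool of workers, with the pool size playing the role of `HOLD_SBOM_WORKERS`. The Go sketch below is illustrative only: `ScanJob`, `Scanner`, `NewScanner`, `Enqueue`, and `Close` are hypothetical names rather than ATCR code, and dropping a job when the queue is full is an assumed policy that keeps the upload path from ever waiting on a scan.

```go
package main

import (
	"fmt"
	"sync"
)

// ScanJob is a hypothetical description of one image to scan.
type ScanJob struct {
	OwnerDID   string
	Repository string
	Digest     string
}

// Scanner is a hypothetical bounded queue drained by a fixed worker pool.
type Scanner struct {
	jobs chan ScanJob
	wg   sync.WaitGroup
}

// NewScanner starts `workers` goroutines (cf. HOLD_SBOM_WORKERS), each of
// which calls scan for every job it receives until the queue is closed.
func NewScanner(workers, queueSize int, scan func(ScanJob)) *Scanner {
	s := &Scanner{jobs: make(chan ScanJob, queueSize)}
	for i := 0; i < workers; i++ {
		s.wg.Add(1)
		go func() {
			defer s.wg.Done()
			for job := range s.jobs {
				scan(job)
			}
		}()
	}
	return s
}

// Enqueue never blocks the caller (the upload path): if the queue is full,
// the job is dropped and false is returned. This drop policy is an assumption.
func (s *Scanner) Enqueue(job ScanJob) bool {
	select {
	case s.jobs <- job:
		return true
	default:
		return false
	}
}

// Close stops accepting jobs and waits for queued and in-flight scans to finish.
func (s *Scanner) Close() {
	close(s.jobs)
	s.wg.Wait()
}

func main() {
	var mu sync.Mutex
	var scanned []string
	s := NewScanner(2, 8, func(j ScanJob) {
		mu.Lock()
		scanned = append(scanned, j.Digest)
		mu.Unlock()
	})
	ok := s.Enqueue(ScanJob{OwnerDID: "did:plc:alice123", Repository: "alice/myapp", Digest: "sha256:abc123"})
	s.Close()
	fmt.Println(ok, len(scanned)) // prints: true 1
}
```

Whether a full queue should drop jobs, block briefly, or spill to durable storage is a design choice; dropping is simply the easiest way to guarantee that `docker push` latency is unaffected by scanning.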
24 + 25 + ## Trust Model 26 + 27 + ### Same Trust as Docker Hub 28 + 29 + SBOM scanning follows the same trust model as Docker Hub or other centralized registries: 30 + 31 + **Docker Hub model:** 32 + - Docker Hub scans your image on their infrastructure 33 + - Results stored in their database 34 + - You trust Docker Hub's scanner version and scan integrity 35 + 36 + **ATCR hold model:** 37 + - The hold scans the image on its operator's infrastructure 38 + - Results stored in hold's embedded PDS 39 + - You trust hold operator's scanner version and scan integrity 40 + 41 + The security comes from **reproducibility** and **transparency**, not storage location: 42 + - Anyone can re-scan the same digest with the same scanner version and compare results 43 + - Multiple holds scanning the same image provide independent verification 44 + - Scanner version and scan timestamp are recorded in ATProto records 45 + 46 + ### Why Hold's PDS? 47 + 48 + Scan results are stored in the **hold's embedded PDS** rather than the user's PDS: 49 + 50 + **Advantages:** 51 + 1. **No OAuth expiry issues**: Hold owns its PDS, no service tokens needed 52 + 2. **Hold-scoped metadata**: Scanner version, scan time, hold configuration 53 + 3. **Multiple perspectives**: Different holds can scan the same image independently 54 + 4. **Simpler auth**: Hold writes directly to its own PDS 55 + 5. **Keeps user PDS lean**: Potentially large SBOM data doesn't bloat user's repo 56 + 57 + **Security properties:** 58 + - Same trust level as trusting hold to serve correct blobs 59 + - Repo commit signatures, verifiable against the hold's DID document, show which hold published the SBOM record 60 + - Reproducible scans enable independent verification 61 + - Multiple holds scanning same digest → compare results for tampering detection 62 + 63 + ## ORAS Manifest Format 64 + 65 + SBOMs are stored as ORAS artifacts that reference their subject image using the OCI referrers specification.
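Finding the SBOMs for an image comes down to matching each artifact manifest's `subject.digest` against the image digest and checking its `artifactType`. A minimal Go sketch of that filter, assuming a simplified record type: `ManifestRecord` and `Descriptor` here carry only a few of the fields in the example record below, and `SBOMReferrers` is a hypothetical helper, not a function from the ATCR codebase.

```go
package main

import "fmt"

// Descriptor mirrors the OCI descriptor fields used in the example record.
type Descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

// ManifestRecord is a simplified, hypothetical view of an io.atcr.manifest
// record: just enough to tell an SBOM artifact apart from a plain image.
type ManifestRecord struct {
	Digest       string      `json:"digest"`
	ArtifactType string      `json:"artifactType,omitempty"`
	Subject      *Descriptor `json:"subject,omitempty"`
}

// sbomTypes lists the artifactType values this document uses for SBOMs.
var sbomTypes = map[string]bool{
	"application/spdx+json":          true,
	"application/vnd.cyclonedx+json": true,
}

// SBOMReferrers returns the SBOM manifests whose subject is imageDigest.
func SBOMReferrers(records []ManifestRecord, imageDigest string) []ManifestRecord {
	var out []ManifestRecord
	for _, r := range records {
		if r.Subject != nil && r.Subject.Digest == imageDigest && sbomTypes[r.ArtifactType] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	records := []ManifestRecord{
		{Digest: "sha256:abc123"}, // the image itself: no subject
		{Digest: "sha256:4a5e", ArtifactType: "application/spdx+json",
			Subject: &Descriptor{Digest: "sha256:abc123"}},
		{Digest: "sha256:9999", ArtifactType: "application/spdx+json",
			Subject: &Descriptor{Digest: "sha256:other"}},
	}
	for _, r := range SBOMReferrers(records, "sha256:abc123") {
		fmt.Println(r.Digest) // prints: sha256:4a5e
	}
}
```

This is the same subject matching the OCI referrers API (`/v2/<name>/referrers/<digest>`) performs server-side, narrowed to SBOM artifact types.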
66 + 67 + ### Example Manifest Record 68 + 69 + ```json 70 + { 71 + "$type": "io.atcr.manifest", 72 + "repository": "alice/myapp", 73 + "digest": "sha256:4a5e...", 74 + "holdDid": "did:web:hold01.atcr.io", 75 + "holdEndpoint": "https://hold01.atcr.io", 76 + "schemaVersion": 2, 77 + "mediaType": "application/vnd.oci.image.manifest.v1+json", 78 + "artifactType": "application/spdx+json", 79 + "subject": { 80 + "mediaType": "application/vnd.oci.image.manifest.v1+json", 81 + "digest": "sha256:abc123...", 82 + "size": 1234 83 + }, 84 + "config": { 85 + "mediaType": "application/vnd.oci.empty.v1+json", 86 + "digest": "sha256:44136f...", 87 + "size": 2 88 + }, 89 + "layers": [ 90 + { 91 + "mediaType": "application/spdx+json", 92 + "digest": "sha256:def456...", 93 + "size": 5678, 94 + "annotations": { 95 + "org.opencontainers.image.title": "sbom.spdx.json" 96 + } 97 + } 98 + ], 99 + "manifestBlob": { 100 + "$type": "blob", 101 + "ref": { "$link": "bafyrei..." }, 102 + "mimeType": "application/vnd.oci.image.manifest.v1+json", 103 + "size": 789 104 + }, 105 + "ownerDid": "did:plc:alice123", 106 + "scannedAt": "2025-10-20T12:34:56.789Z", 107 + "scannerVersion": "syft-v1.0.0", 108 + "createdAt": "2025-10-20T12:34:56.789Z" 109 + } 110 + ``` 111 + 112 + ### Key Fields 113 + 114 + - `artifactType`: Distinguishes SBOM artifact from regular image manifest 115 + - `application/spdx+json` for SPDX format 116 + - `application/vnd.cyclonedx+json` for CycloneDX format 117 + - `subject`: Reference to the original image manifest 118 + - `ownerDid`: DID of the image owner (for multi-tenant holds) 119 + - `scannedAt`: ISO 8601 timestamp of when scan completed 120 + - `scannerVersion`: Tool version for reproducibility tracking 121 + 122 + ### SBOM Blob 123 + 124 + The actual SBOM document is stored as a blob in the hold's storage backend and referenced in the manifest's `layers` array. The blob contains the full SPDX or CycloneDX JSON document. 
125 + 126 + ## Configuration 127 + 128 + SBOM scanning is configured via environment variables on the hold service. 129 + 130 + ### Environment Variables 131 + 132 + ```bash 133 + # Enable SBOM scanning (opt-in) 134 + HOLD_SBOM_ENABLED=true 135 + 136 + # Number of concurrent scan workers (default: 2) 137 + # Higher values = faster scanning, more CPU/memory usage 138 + HOLD_SBOM_WORKERS=4 139 + 140 + # SBOM output format (default: spdx-json) 141 + # Options: spdx-json, cyclonedx-json 142 + HOLD_SBOM_FORMAT=spdx-json 143 + 144 + # Future: Enable vulnerability scanning with Grype 145 + # HOLD_VULN_ENABLED=true 146 + ``` 147 + 148 + ### Example Configuration 149 + 150 + ```bash 151 + # .env.hold 152 + HOLD_PUBLIC_URL=https://hold01.atcr.io 153 + STORAGE_DRIVER=s3 154 + S3_BUCKET=my-hold-blobs 155 + HOLD_OWNER=did:plc:xyz123 156 + HOLD_DATABASE_PATH=/var/lib/atcr/hold.db 157 + 158 + # Enable SBOM scanning 159 + HOLD_SBOM_ENABLED=true 160 + HOLD_SBOM_WORKERS=2 161 + HOLD_SBOM_FORMAT=spdx-json 162 + ``` 163 + 164 + ## Scanning Workflow 165 + 166 + ### 1. Upload Completes 167 + 168 + When a container image is successfully pushed to a hold: 169 + 170 + ``` 171 + 1. Client: docker push atcr.io/alice/myapp:latest 172 + 2. AppView routes blobs to hold service 173 + 3. Hold receives multipart upload via XRPC 174 + 4. Hold completes upload and stores blobs 175 + 5. Hold checks: HOLD_SBOM_ENABLED=true? 176 + 6. If yes: enqueue scan job (non-blocking) 177 + 7. Upload completes immediately 178 + ``` 179 + 180 + ### 2. Background Scanning 181 + 182 + Scan workers process jobs from the queue: 183 + 184 + ``` 185 + 1. Worker pulls job from queue 186 + 2. Extracts image layers from storage 187 + 3. Runs Syft on extracted filesystem 188 + 4. Generates SBOM in configured format 189 + 5. Uploads SBOM blob to storage 190 + 6. Creates ORAS manifest record in hold's PDS 191 + 7. Job complete 192 + ``` 193 + 194 + ### 3. 
Result Storage 195 + 196 + SBOM results are stored in two places: 197 + 198 + 1. **SBOM blob**: Full JSON document in hold's blob storage 199 + 2. **ORAS manifest**: Metadata record in hold's embedded PDS 200 + - Collection: `io.atcr.manifest` 201 + - Record key: SBOM manifest digest 202 + - Contains reference to subject image 203 + 204 + ## Accessing SBOMs 205 + 206 + Multiple methods for discovering and retrieving SBOM data. 207 + 208 + ### 1. XRPC Query Endpoint 209 + 210 + Query for SBOMs by image digest: 211 + 212 + ```bash 213 + # Get SBOM for a specific image 214 + curl "https://hold01.atcr.io/xrpc/io.atcr.hold.getSBOM?\ 215 + digest=sha256:abc123&\ 216 + ownerDid=did:plc:alice123&\ 217 + repository=alice/myapp" 218 + 219 + # Response: ORAS manifest JSON 220 + { 221 + "manifest": { 222 + "schemaVersion": 2, 223 + "mediaType": "application/vnd.oci.image.manifest.v1+json", 224 + "artifactType": "application/spdx+json", 225 + "subject": { "digest": "sha256:abc123...", ... }, 226 + "layers": [ { "digest": "sha256:def456...", ... } ] 227 + }, 228 + "scannedAt": "2025-10-20T12:34:56.789Z", 229 + "scannerVersion": "syft-v1.0.0" 230 + } 231 + ``` 232 + 233 + ### 2. ATProto Repository Queries 234 + 235 + Use standard ATProto XRPC to list all SBOMs: 236 + 237 + ```bash 238 + # List all SBOM manifests in hold's PDS 239 + curl "https://hold01.atcr.io/xrpc/com.atproto.repo.listRecords?\ 240 + repo=did:web:hold01.atcr.io&\ 241 + collection=io.atcr.manifest" 242 + 243 + # Filter by artifactType (requires AppView indexing) 244 + # Returns all SBOM artifacts 245 + ``` 246 + 247 + ### 3. Direct SBOM Blob Download 248 + 249 + Download the full SBOM JSON file: 250 + 251 + ```bash 252 + # Get SBOM blob CID from manifest layers[0].digest 253 + SBOM_DIGEST="sha256:def456..." 
254 + 255 + # Request presigned download URL 256 + curl "https://hold01.atcr.io/xrpc/com.atproto.sync.getBlob?\ 257 + did=did:web:hold01.atcr.io&\ 258 + cid=$SBOM_DIGEST" 259 + 260 + # Response: presigned S3 URL or direct blob 261 + { 262 + "url": "https://s3.amazonaws.com/bucket/blob?signature=...", 263 + "expiresAt": "2025-10-20T12:49:56Z" 264 + } 265 + 266 + # Download SBOM JSON ($URL = the "url" field from the response above) 267 + curl "$URL" > sbom.spdx.json 268 + ``` 269 + 270 + ### 4. ORAS CLI Integration 271 + 272 + Use the ORAS CLI to discover and pull SBOMs: 273 + 274 + ```bash 275 + # Discover referrers (SBOMs) for an image 276 + oras discover atcr.io/alice/myapp:latest 277 + 278 + # Output shows SBOM artifacts: 279 + # digest: sha256:abc123... 280 + # referrers: 281 + # - artifactType: application/spdx+json 282 + # digest: sha256:4a5e... 283 + 284 + # Pull SBOM artifact 285 + oras pull atcr.io/alice/myapp@sha256:4a5e... 286 + 287 + # Downloads sbom.spdx.json to current directory 288 + ``` 289 + 290 + ### 5. AppView Web UI (Future) 291 + 292 + Future enhancement: AppView web interface will display SBOM information on repository pages: 293 + 294 + - Link to SBOM JSON download 295 + - Vulnerability count (if Grype enabled) 296 + - Scanner version and scan timestamp 297 + - Comparison across multiple holds 298 + 299 + ## Tool Integration 300 + 301 + ### SPDX/CycloneDX Tools 302 + 303 + Any tool that understands SPDX or CycloneDX formats can consume the SBOMs: 304 + 305 + **Example tools:** 306 + - [OSV Scanner](https://github.com/google/osv-scanner) - Vulnerability scanning 307 + - [Grype](https://github.com/anchore/grype) - Vulnerability scanning 308 + - [Dependency-Track](https://dependencytrack.org/) - Software composition analysis 309 + - [SBOM Quality Score](https://github.com/eBay/sbom-scorecard) - SBOM completeness 310 + 311 + **Usage:** 312 + ```bash 313 + # Download SBOM 314 + curl "https://hold01.atcr.io/xrpc/io.atcr.hold.getSBOM?..." | \ 315 + jq -r '.manifest.layers[0].digest' # prints the SBOM blob digest 316 + # ...
fetch blob ... > sbom.spdx.json 317 + 318 + # Scan with OSV 319 + osv-scanner --sbom sbom.spdx.json 320 + 321 + # Scan with Grype 322 + grype sbom:./sbom.spdx.json 323 + ``` 324 + 325 + ### OCI Registry API 326 + 327 + ORAS manifests are fully OCI-compliant and discoverable via standard registry APIs: 328 + 329 + ```bash 330 + # Discover referrers for an image 331 + curl -H "Accept: application/vnd.oci.image.index.v1+json" \ 332 + "https://atcr.io/v2/alice/myapp/referrers/sha256:abc123" 333 + 334 + # Returns referrers index with SBOM manifests 335 + { 336 + "schemaVersion": 2, 337 + "mediaType": "application/vnd.oci.image.index.v1+json", 338 + "manifests": [ 339 + { 340 + "mediaType": "application/vnd.oci.image.manifest.v1+json", 341 + "digest": "sha256:4a5e...", 342 + "artifactType": "application/spdx+json" 343 + } 344 + ] 345 + } 346 + ``` 347 + 348 + ### Programmatic Access 349 + 350 + Use the ATProto SDK to query SBOMs: 351 + 352 + ```go 353 + import "github.com/bluesky-social/indigo/atproto" 354 + 355 + // List all SBOMs for a hold 356 + records, err := client.RepoListRecords(ctx, 357 + "did:web:hold01.atcr.io", 358 + "io.atcr.manifest", 359 + 100, // limit 360 + "", // cursor 361 + ) 362 + 363 + // Filter for SBOM artifacts 364 + for _, record := range records.Records { 365 + manifest := record.Value.(ManifestRecord) 366 + if manifest.ArtifactType == "application/spdx+json" { 367 + // Process SBOM manifest 368 + } 369 + } 370 + ``` 371 + 372 + ## Future Enhancements 373 + 374 + ### Vulnerability Scanning (Grype) 375 + 376 + Add vulnerability scanning to SBOM generation: 377 + 378 + ```bash 379 + # Configuration 380 + HOLD_VULN_ENABLED=true 381 + HOLD_VULN_DB_UPDATE_INTERVAL=24h 382 + 383 + # Extended manifest with vulnerability count 384 + { 385 + "artifactType": "application/spdx+json", 386 + "annotations": { 387 + "io.atcr.vuln.critical": "2", 388 + "io.atcr.vuln.high": "15", 389 + "io.atcr.vuln.medium": "42", 390 + "io.atcr.vuln.low": "8", 391 + 
"io.atcr.vuln.scannedWith": "grype-v0.74.0", 392 + "io.atcr.vuln.dbVersion": "2025-10-20" 393 + } 394 + } 395 + ``` 396 + 397 + ### Multi-Scanner Support (Trivy) 398 + 399 + Support multiple scanner backends: 400 + 401 + ```bash 402 + HOLD_SBOM_SCANNER=trivy # syft (default), trivy, grype 403 + HOLD_TRIVY_SCAN_TYPE=os,library,config,secret 404 + ``` 405 + 406 + ### Multi-Hold Verification 407 + 408 + Compare SBOMs from different holds for the same image: 409 + 410 + ```bash 411 + # Alice pushes to hold1 and hold2 412 + docker push atcr.io/alice/myapp:latest 413 + 414 + # Both holds scan independently 415 + # Compare results: 416 + atcr-cli compare-sboms \ 417 + --image atcr.io/alice/myapp:latest \ 418 + --holds hold1.atcr.io,hold2.atcr.io 419 + 420 + # Output: Package count differences, version mismatches, etc. 421 + ``` 422 + 423 + ### Signature Verification (Cosign) 424 + 425 + Sign SBOMs with Sigstore Cosign: 426 + 427 + ```bash 428 + HOLD_SBOM_SIGN=true 429 + HOLD_COSIGN_KEY_PATH=/var/lib/atcr/cosign.key 430 + 431 + # SBOM artifacts get signed 432 + # Verification: 433 + cosign verify --key cosign.pub atcr.io/alice/myapp@sha256:4a5e... 
```

## Security Considerations

### Reproducibility

SBOMs should be reproducible for the same image digest:

**Best practices:**
- Pin scanner versions in production holds
- Record scanner version in manifest annotations
- Document vulnerability database versions
- Re-scan periodically to catch new CVEs

**Validation:**
```bash
# Compare SBOMs from different holds
# (SPDX creationInfo.created and documentNamespace change on every run, so drop them first)
diff <(curl -s hold1/sbom.json | jq -S 'del(.creationInfo.created, .documentNamespace)') \
     <(curl -s hold2/sbom.json | jq -S 'del(.creationInfo.created, .documentNamespace)')

# Differences indicate:
# - Different scanner versions
# - Different scan times (new CVEs discovered)
# - Potential tampering (investigate)
```

### Multiple Hold Verification

Running multiple holds provides defense in depth:

1. User pushes to hold1 (uses hold1 by default)
2. User also pushes to hold2 (backup/verification)
3. Both holds scan independently
4. Compare SBOM results:
   - Similar results = confidence in accuracy
   - Divergent results = investigate discrepancy

### Transparency

Hold operators should publish scanning policies:

- Scanner version and update schedule
- Vulnerability database update frequency
- SBOM format and schema version
- Data retention policies

### Trust Anchors

Users can verify scanner integrity:

1. **Scanner version**: Check that the `scannerVersion` field matches the expected version
2. **DID signature**: ATProto record signed by the hold's DID
3. **Timestamp**: Check `scannedAt` for stale scans
4. **Reproducibility**: Re-scan locally and compare results

## Example Workflows

### Enable Scanning on Your Hold

```bash
# 1. Configure hold with SBOM enabled
cat > .env.hold <<EOF
HOLD_PUBLIC_URL=https://myhold.example.com
STORAGE_DRIVER=s3
S3_BUCKET=my-blobs
HOLD_OWNER=did:plc:myid

# Enable SBOM scanning
HOLD_SBOM_ENABLED=true
HOLD_SBOM_WORKERS=2
HOLD_SBOM_FORMAT=spdx-json
EOF

# 2. Start hold service
./bin/atcr-hold

# 3. Push an image
docker push atcr.io/alice/myapp:latest

# 4. Wait for background scan (check logs)
# 2025-10-20T12:34:56Z INFO Scanning image sha256:abc123...
# 2025-10-20T12:35:12Z INFO SBOM generated sha256:def456...

# 5. Query for SBOM
curl "https://myhold.example.com/xrpc/io.atcr.hold.getSBOM?..."
```

### Consume SBOMs in CI/CD

```yaml
# .github/workflows/security-scan.yml
name: Security Scan
on: push

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - name: Pull image
        run: docker pull atcr.io/alice/myapp:latest

      - name: Get SBOM from hold
        run: |
          # RepoDigests entries look like "repo@sha256:..."; keep only the digest
          IMAGE_DIGEST=$(docker inspect atcr.io/alice/myapp:latest \
            --format='{{index .RepoDigests 0}}' | cut -d@ -f2)

          curl "https://hold01.atcr.io/xrpc/io.atcr.hold.getSBOM?\
          digest=$IMAGE_DIGEST&\
          ownerDid=did:plc:alice123&\
          repository=alice/myapp" \
            -o sbom-manifest.json

          SBOM_DIGEST=$(jq -r '.manifest.layers[0].digest' sbom-manifest.json)

          curl "https://hold01.atcr.io/xrpc/com.atproto.sync.getBlob?\
          did=did:web:hold01.atcr.io&\
          cid=$SBOM_DIGEST" \
            | jq -r '.url' | xargs curl -o sbom.spdx.json

      - name: Scan with Grype
        uses: anchore/scan-action@v3
        with:
          sbom: sbom.spdx.json
          fail-build: true
          severity-cutoff: high
```

## References

- [ORAS Specification](https://oras.land/)
- [OCI Artifacts](https://github.com/opencontainers/artifacts)
- [SPDX Specification](https://spdx.dev/)
- [CycloneDX Specification](https://cyclonedx.org/)
- [Syft Documentation](https://github.com/anchore/syft)
- [ATProto Specification](https://atproto.com/)
-821
docs/XRPC_BLOB_MIGRATION.md
# XRPC Blob Upload Migration

This document describes how to migrate from separate legacy multipart upload endpoints to a unified `com.atproto.repo.uploadBlob` endpoint that supports both standard single-blob uploads and OCI container layer multipart uploads.

## Current State

### Legacy HTTP Endpoints (cmd/hold/main.go)

```go
// Unified presigned URL endpoint (handles upload AND download)
mux.HandleFunc("/presigned-url", service.HandlePresignedURL)

// Internal move operation (used by multipart complete)
mux.HandleFunc("/move", service.HandleMove)

// Multipart upload endpoints
mux.HandleFunc("/start-multipart", service.HandleStartMultipart)
mux.HandleFunc("/part-presigned-url", service.HandleGetPartURL)
mux.HandleFunc("/complete-multipart", service.HandleCompleteMultipart)
mux.HandleFunc("/abort-multipart", service.HandleAbortMultipart)

// Buffered part upload (when presigned URLs unavailable)
mux.HandleFunc("/multipart-parts/", func(w http.ResponseWriter, r *http.Request) {
	// Parse URL: /multipart-parts/{uploadID}/{partNumber}
	// ...
	service.HandleMultipartPartUpload(w, r, uploadID, partNumber, did, service.MultipartMgr)
})
```

### Existing XRPC Endpoint (pkg/hold/pds/xrpc.go)

```go
// Current implementation - redirects to presigned URL
func (h *XRPCHandler) HandleUploadBlob(w http.ResponseWriter, r *http.Request) {
	digest := r.URL.Query().Get("digest")
	uploadURL, err := h.blobStore.GetPresignedUploadURL(digest)
	if err != nil {
		http.Error(w, fmt.Sprintf("failed to get upload URL: %v", err), http.StatusInternalServerError)
		return
	}
	http.Redirect(w, r, uploadURL, http.StatusFound)
}
```

### Supporting Code

**pkg/hold/multipart.go:**
- `MultipartManager` - Tracks upload sessions
- `MultipartSession` - State for each upload (parts, mode, etc.)
- Modes: `S3Native` (presigned URLs), `Buffered` (proxy uploads)

**pkg/hold/blobstore_adapter.go:**
- `HoldServiceBlobStore` - Adapter wrapping HoldService for XRPC handlers
- Implements presigned URL generation
- Currently not used by XRPC handlers

**pkg/hold/handlers.go:**
- `HandlePresignedURL()` - Unified endpoint for GET/HEAD/PUT presigned URLs
- `HandleMove()` - Moves blob from temp to final location (internal operation)
- `HandleStartMultipart()` - Starts upload, returns uploadID
- `HandleGetPartURL()` - Returns presigned URL for part
- `HandleCompleteMultipart()` - Finalizes upload, assembles parts (calls Move internally)
- `HandleAbortMultipart()` - Cancels upload
- `HandleMultipartPartUpload()` - Buffered part upload fallback

## Legacy Endpoint Mapping

### `/presigned-url` → Multiple XRPC Operations

The legacy `/presigned-url` endpoint is a **unified endpoint** that handles both upload and download operations, selected by the `operation` field in the JSON body:

**Legacy format:**
```
POST /presigned-url
Content-Type: application/json

{
  "operation": "GET",  // or "HEAD" or "PUT"
  "did": "did:plc:alice123",
  "digest": "sha256:abc123...",
  "size": 1234567890   // Only for PUT operations
}

Response:
{
  "url": "https://s3.amazonaws.com/...",
  "expires_at": "2025-10-16T..."
}
```

**XRPC mapping:**
- `operation: "GET"` → `GET /xrpc/com.atproto.sync.getBlob?did=...&cid=sha256:abc...`
- `operation: "HEAD"` → `HEAD /xrpc/com.atproto.sync.getBlob?did=...&cid=sha256:abc...`
- `operation: "PUT"` → `com.atproto.repo.uploadBlob` (single upload via presigned URL)

**Note:** For GET/HEAD operations, AppView passes the OCI digest directly as the `cid` parameter. The hold detects the `sha256:` prefix and uses the digest directly (no CID conversion needed).
### `/move` → Internal to Multipart Complete

The legacy `/move` endpoint moves a blob from a temporary location to its final digest-based location:

**Legacy format:**
```
POST /move?from=uploads/temp-123&to=sha256:abc123...&did=did:plc:alice123

Response: 200 OK
```

**Purpose:** Server-side S3 copy after multipart assembly. Used in this flow:

1. Multipart parts uploaded → `uploads/temp-{uploadID}/part-1`, `part-2`, etc.
2. Complete multipart → S3 assembles parts at `uploads/temp-{uploadID}`
3. **Move operation** → S3 copy from `uploads/temp-{uploadID}` → `blobs/sha256/ab/abc123...`

**XRPC mapping:**
- **Not a separate endpoint** - becomes an internal step of `uploadBlob` with `"action": "complete"`
- The `complete` action automatically handles the move after multipart assembly
- AppView doesn't need to call move explicitly in the XRPC flow

## New Unified Design

### Single Endpoint: `com.atproto.repo.uploadBlob`

Content-Type discrimination determines the operation:
- `application/octet-stream` → Standard blob upload (profile images, small media)
- `application/json` → Multipart operations (large OCI layers)

### Complementary Endpoint: `com.atproto.sync.getBlob`

For blob downloads (maps from legacy `/presigned-url` with operation=GET/HEAD):

**Standard ATProto blobs (CID):**
```
GET /xrpc/com.atproto.sync.getBlob?did={holdDID}&cid=bafkrei...

Response: 307 Temporary Redirect
Location: https://s3.amazonaws.com/bucket/...?presigned-params
```

**OCI container layers (digest):**
```
GET /xrpc/com.atproto.sync.getBlob?did={holdDID}&cid=sha256:abc123...

Response: 307 Temporary Redirect
Location: https://s3.amazonaws.com/bucket/...?presigned-params
```

**Implementation - Flexible CID parameter:**
```go
func (h *XRPCHandler) HandleGetBlob(w http.ResponseWriter, r *http.Request) {
	cidOrDigest := r.URL.Query().Get("cid")

	var digest string
	if strings.HasPrefix(cidOrDigest, "sha256:") {
		// OCI digest - use directly (no conversion needed)
		digest = cidOrDigest
	} else {
		// Standard CID - convert to digest (bafkrei... → sha256:abc...)
		c, err := cid.Decode(cidOrDigest)
		if err != nil {
			http.Error(w, "invalid cid", http.StatusBadRequest)
			return
		}
		digest = cidToDigest(c)
	}

	// Generate presigned URL for S3
	url, err := h.blobStore.GetPresignedDownloadURL(digest)
	if err != nil {
		http.Error(w, fmt.Sprintf("failed to get download URL: %v", err), http.StatusInternalServerError)
		return
	}
	http.Redirect(w, r, url, http.StatusTemporaryRedirect)
}
```

**Key insight:** The `cid` parameter accepts both formats. The hold service checks the prefix and handles each accordingly. This keeps the endpoint spec-compliant (GET with query params) while supporting OCI digests natively.

### API Specification

#### Standard Single Upload (ATProto Spec Compliant)

```
POST /xrpc/com.atproto.repo.uploadBlob
Content-Type: application/octet-stream

[raw blob bytes]

Response (200 OK):
{
  "blob": {
    "$type": "blob",
    "ref": {
      "$link": "bafkrei..."  // CID
    },
    "mimeType": "application/octet-stream",
    "size": 12345
  }
}
```

**Use case:** Profile images, small media (< 10MB), standard ATProto blobs

#### Multipart Start (ATCR Extension)

```
POST /xrpc/com.atproto.repo.uploadBlob
Content-Type: application/json

{
  "action": "start",
  "digest": "sha256:abc123...",
  "size": 1234567890  // Optional hint for storage allocation
}

Response (200 OK):
{
  "uploadId": "upload-1634567890",
  "expiresAt": "2025-10-16T12:00:00Z",
  "mode": "s3-native"  // or "buffered"
}
```

**Implementation:**
- Calls `service.StartMultipartUploadWithManager(ctx, digest, multipartMgr)`
- Returns uploadID and mode from MultipartSession

#### Multipart Get Part URL (ATCR Extension)

```
POST /xrpc/com.atproto.repo.uploadBlob
Content-Type: application/json

{
  "action": "part",
  "uploadId": "upload-1634567890",
  "partNumber": 1,
  "digest": "sha256:abc123..."
}

Response (200 OK):
{
  "url": "https://s3.amazonaws.com/bucket/...?X-Amz-...",
  "expiresAt": "2025-10-16T12:15:00Z",
  "method": "PUT"
}

// OR for buffered mode:
{
  "url": "https://hold01.atcr.io/xrpc/com.atproto.repo.uploadBlob",
  "method": "PUT",
  "headers": {
    "X-Upload-Id": "upload-1634567890",
    "X-Part-Number": "1"
  },
  "expiresAt": "2025-10-16T12:15:00Z"
}
```

**Implementation:**
- Retrieve session: `multipartMgr.GetSession(uploadID)`
- S3Native mode: Call `service.GetPartUploadURL(ctx, session, partNumber, did)`
- Buffered mode: Return self-referential URL with headers

#### Multipart Upload Part (Buffered Mode)

```
PUT /xrpc/com.atproto.repo.uploadBlob
Content-Type: application/octet-stream
X-Upload-Id: upload-1634567890
X-Part-Number: 1

[part data bytes]

Response (200 OK):
{
  "etag": "abc123def456",
  "partNumber": 1
}
```

**Implementation:**
- Extract headers: `X-Upload-Id`, `X-Part-Number`
- Call `service.HandleMultipartPartUpload(w, r, uploadID, partNumber, did, multipartMgr)`
- Return ETag for completion

#### Multipart Complete (ATCR Extension)

```
POST /xrpc/com.atproto.repo.uploadBlob
Content-Type: application/json

{
  "action": "complete",
  "uploadId": "upload-1634567890",
  "digest": "sha256:abc123...",
  "parts": [
    { "partNumber": 1, "etag": "abc123" },
    { "partNumber": 2, "etag": "def456" }
  ]
}

Response (200 OK):
{
  "status": "completed",
  "blob": {
    "$type": "blob",
    "ref": {
      "$link": "bafkrei..."  // CID computed from digest
    },
    "mimeType": "application/octet-stream",
    "size": 1234567890
  }
}
```

**Implementation:**
- Retrieve session: `multipartMgr.GetSession(uploadID)`
- For S3Native: Record parts via `session.RecordS3Part()`
- Call `service.CompleteMultipartUploadWithManager(ctx, session, multipartMgr)`
  - This internally calls S3 CompleteMultipartUpload to assemble parts
  - Then performs a server-side S3 copy from the temp location to the final digest location
  - Equivalent to the legacy `/move` endpoint operation
- Convert digest to CID for response

#### Multipart Abort (ATCR Extension)

```
POST /xrpc/com.atproto.repo.uploadBlob
Content-Type: application/json

{
  "action": "abort",
  "uploadId": "upload-1634567890",
  "digest": "sha256:abc123..."
}

Response (200 OK):
{
  "status": "aborted"
}
```

**Implementation:**
- Retrieve session: `multipartMgr.GetSession(uploadID)`
- Call `service.AbortMultipartUploadWithManager(ctx, session, multipartMgr)`

## Implementation Strategy

### Phase 1: Add Unified Handler (Keep Legacy Endpoints)

**File:** `pkg/hold/pds/xrpc.go`

```go
// HandleUploadBlob is a unified handler supporting both single and multipart uploads
func (h *XRPCHandler) HandleUploadBlob(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost && r.Method != http.MethodPut {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}

	contentType := r.Header.Get("Content-Type")

	// Buffered multipart part upload (PUT with headers)
	if r.Method == http.MethodPut && r.Header.Get("X-Upload-Id") != "" {
		h.handleBufferedPartUpload(w, r)
		return
	}

	// Multipart operations (JSON body)
	if strings.Contains(contentType, "application/json") {
		h.handleMultipartOperation(w, r)
		return
	}

	// Standard single blob upload (raw bytes)
	h.handleSingleBlobUpload(w, r)
}

func (h *XRPCHandler) handleMultipartOperation(w http.ResponseWriter, r *http.Request) {
	var req struct {
		Action     string `json:"action"`
		Digest     string `json:"digest,omitempty"`
		Size       int64  `json:"size,omitempty"`
		UploadID   string `json:"uploadId,omitempty"`
		PartNumber int    `json:"partNumber,omitempty"`
		Parts      []struct {
			PartNumber int    `json:"partNumber"`
			ETag       string `json:"etag"`
		} `json:"parts,omitempty"`
	}

	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, fmt.Sprintf("invalid JSON: %v", err), http.StatusBadRequest)
		return
	}

	// TODO: Add authentication check
	// user, err := ValidateDPoPRequest(r)

	switch req.Action {
	case "start":
		h.handleMultipartStart(w, r, req.Digest, req.Size)
	case "part":
		h.handleMultipartPart(w, r, req.UploadID, req.PartNumber, req.Digest)
	case "complete":
		h.handleMultipartComplete(w, r, req.UploadID, req.Digest, req.Parts)
	case "abort":
		h.handleMultipartAbort(w, r, req.UploadID, req.Digest)
	default:
		http.Error(w, "invalid action", http.StatusBadRequest)
	}
}

func (h *XRPCHandler) handleMultipartStart(w http.ResponseWriter, r *http.Request, digest string, size int64) {
	ctx := r.Context()

	// Use HoldService multipart manager
	// Note: h.blobStore is HoldServiceBlobStore, which wraps the service
	uploadID, mode, err := h.blobStore.StartMultipart(ctx, digest, size)
	if err != nil {
		http.Error(w, fmt.Sprintf("failed to start upload: %v", err), http.StatusInternalServerError)
		return
	}

	response := map[string]any{
		"uploadId":  uploadID,
		"expiresAt": time.Now().Add(24 * time.Hour),
		"mode":      mode, // "s3-native" or "buffered"
	}

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(response)
}

func (h *XRPCHandler) handleMultipartPart(w http.ResponseWriter, r *http.Request, uploadID string, partNumber int, digest string) {
	ctx := r.Context()

	// Get part upload URL (presigned S3 or buffered endpoint)
	partURL, err := h.blobStore.GetPartUploadURL(ctx, uploadID, partNumber, digest)
	if err != nil {
		http.Error(w, fmt.Sprintf("failed to get part URL: %v", err), http.StatusInternalServerError)
		return
	}

	response := map[string]any{
		"url":       partURL,
		"expiresAt": time.Now().Add(15 * time.Minute),
		"method":    "PUT",
	}

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(response)
}

// parts must use the same struct type (including JSON tags) as req.Parts in
// handleMultipartOperation; otherwise the call there does not compile.
func (h *XRPCHandler) handleMultipartComplete(w http.ResponseWriter, r *http.Request, uploadID string, digest string, parts []struct {
	PartNumber int    `json:"partNumber"`
	ETag       string `json:"etag"`
}) {
	ctx := r.Context()

	// Convert parts format
	completedParts := make([]hold.CompletedPart, len(parts))
	for i, p := range parts {
		completedParts[i] = hold.CompletedPart{
			PartNumber: p.PartNumber,
			ETag:       p.ETag,
		}
	}

	// Complete upload
	if err := h.blobStore.CompleteMultipart(ctx, uploadID, digest, completedParts); err != nil {
		http.Error(w, fmt.Sprintf("failed to complete upload: %v", err), http.StatusInternalServerError)
		return
	}

	// Convert digest to CID for ATProto response format
	blobCID, err := digestToCID(digest)
	if err != nil {
		http.Error(w, fmt.Sprintf("failed to generate CID: %v", err), http.StatusInternalServerError)
		return
	}

	response := map[string]any{
		"status": "completed",
		"blob": map[string]any{
			"$type": "blob",
			"ref": map[string]any{
				"$link": blobCID.String(),
			},
			"mimeType": "application/octet-stream",
			// Size would need to be tracked in session
		},
	}

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(response)
}

func (h *XRPCHandler) handleMultipartAbort(w http.ResponseWriter, r *http.Request, uploadID string, digest string) {
	ctx := r.Context()

	if err := h.blobStore.AbortMultipart(ctx, uploadID, digest); err != nil {
		http.Error(w, fmt.Sprintf("failed to abort upload: %v", err), http.StatusInternalServerError)
		return
	}

	response := map[string]any{
		"status": "aborted",
	}

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(response)
}

func (h *XRPCHandler) handleBufferedPartUpload(w http.ResponseWriter, r *http.Request) {
	uploadID := r.Header.Get("X-Upload-Id")
	partNumberStr := r.Header.Get("X-Part-Number")

	partNumber, err := strconv.Atoi(partNumberStr)
	if err != nil {
		http.Error(w, "invalid part number", http.StatusBadRequest)
		return
	}

	// Stream part data to storage
	etag, err := h.blobStore.UploadPart(r.Context(), uploadID, partNumber, r.Body)
	if err != nil {
		http.Error(w, fmt.Sprintf("failed to upload part: %v", err), http.StatusInternalServerError)
		return
	}

	response := map[string]any{
		"etag":       etag,
		"partNumber": partNumber,
	}

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(response)
}

func (h *XRPCHandler) handleSingleBlobUpload(w http.ResponseWriter, r *http.Request) {
	// Standard ATProto uploadBlob behavior: stream the request body to
	// storage in a single operation (no need to buffer it in memory first)
	blobCID, size, err := h.blobStore.UploadBlob(r.Context(), r.Body)
	if err != nil {
		http.Error(w, fmt.Sprintf("failed to upload blob: %v", err), http.StatusInternalServerError)
		return
	}

	// Standard ATProto blob response format
	response := map[string]any{
		"blob": map[string]any{
			"$type": "blob",
			"ref": map[string]any{
				"$link": blobCID.String(),
			},
			"mimeType": "application/octet-stream",
			"size":     size,
		},
	}

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(response)
}

// digestToCID converts an OCI digest (sha256:abc...) to an ATProto blob CID
func digestToCID(digest string) (cid.Cid, error) {
	// Implementation in pkg/hold/cid.go or similar
	// Strip "sha256:" prefix, decode hex, construct a CIDv1 (raw codec) with a sha2-256 multihash
	return cid.Undef, fmt.Errorf("not implemented")
}
```

### Phase 2: Extend HoldServiceBlobStore (pkg/hold/blobstore_adapter.go)

The `HoldServiceBlobStore` currently wraps HoldService for presigned URLs. Extend it to support multipart operations:

```go
// Add multipart methods to HoldServiceBlobStore

func (h *HoldServiceBlobStore) StartMultipart(ctx context.Context, digest string, size int64) (uploadID string, mode string, err error) {
	uploadID, uploadMode, err := h.service.StartMultipartUploadWithManager(ctx, digest, h.service.MultipartMgr)
	if err != nil {
		return "", "", err
	}

	modeStr := "s3-native"
	if uploadMode == hold.Buffered {
		modeStr = "buffered"
	}

	return uploadID, modeStr, nil
}

func (h *HoldServiceBlobStore) GetPartUploadURL(ctx context.Context, uploadID string, partNumber int, digest string) (string, error) {
	session, err := h.service.MultipartMgr.GetSession(uploadID)
	if err != nil {
		return "", err
	}

	// For S3Native: return presigned URL
	// For Buffered: return self-referential URL with upload instructions
	if session.Mode == hold.S3Native {
		return h.service.GetPartUploadURL(ctx, session, partNumber, h.holdDID)
	}

	// Buffered mode: client will PUT to uploadBlob with headers
	return fmt.Sprintf("%s/xrpc/com.atproto.repo.uploadBlob", h.publicURL), nil
}

func (h *HoldServiceBlobStore) UploadPart(ctx context.Context, uploadID string, partNumber int, data io.Reader) (string, error) {
	// Buffered part upload - streams data to storage
	// Used when client PUTs to uploadBlob with X-Upload-Id header
	session, err := h.service.MultipartMgr.GetSession(uploadID)
	if err != nil {
		return "", err
	}

	// Stream to storage, return ETag. UploadPartBuffered is a new HoldService
	// method that would wrap the HandleMultipartPartUpload logic.
	return h.service.UploadPartBuffered(ctx, session, partNumber, data)
}

func (h *HoldServiceBlobStore) CompleteMultipart(ctx context.Context, uploadID string, digest string, parts []hold.CompletedPart) error {
	session, err := h.service.MultipartMgr.GetSession(uploadID)
	if err != nil {
		return err
	}

	// For S3Native: record parts ETags
	if session.Mode == hold.S3Native {
		for _, p := range parts {
			session.RecordS3Part(p.PartNumber, p.ETag, 0)
		}
	}

	return h.service.CompleteMultipartUploadWithManager(ctx, session, h.service.MultipartMgr)
}

func (h *HoldServiceBlobStore) AbortMultipart(ctx context.Context, uploadID string, digest string) error {
	session, err := h.service.MultipartMgr.GetSession(uploadID)
	if err != nil {
		return err
	}

	return h.service.AbortMultipartUploadWithManager(ctx, session, h.service.MultipartMgr)
}

func (h *HoldServiceBlobStore) UploadBlob(ctx context.Context, data io.Reader) (cid.Cid, int64, error) {
	// Single blob upload for standard ATProto use case
	// Compute digest, store via service driver
	// Return CID and size
	// Implementation TBD
	return cid.Undef, 0, fmt.Errorf("not implemented")
}
```

### Phase 3: Update AppView Client (pkg/appview/storage/)

Create a new XRPC client, or update ProxyBlobStore to use the unified endpoint:

**Download (GET/HEAD):**
```go
func (p *ProxyBlobStore) ServeBlob(ctx context.Context, w http.ResponseWriter, r *http.Request, dgst digest.Digest) error {
	// Pass digest directly as cid parameter (no conversion)
	url := fmt.Sprintf("%s/xrpc/com.atproto.sync.getBlob?did=%s&cid=%s",
		p.storageEndpoint, p.holdDID, dgst.String()) // cid=sha256:abc...

	http.Redirect(w, r, url, http.StatusTemporaryRedirect)
	return nil
}
```

**Multipart Upload:**
```go
// callUploadBlob (a helper introduced for this sketch) POSTs a JSON action to
// the hold's unified uploadBlob endpoint and decodes the JSON response into out.
func (p *ProxyBlobStore) callUploadBlob(ctx context.Context, reqBody map[string]any, out any) error {
	body, err := json.Marshal(reqBody)
	if err != nil {
		return err
	}
	url := fmt.Sprintf("%s/xrpc/com.atproto.repo.uploadBlob", p.storageEndpoint)
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := p.httpClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("uploadBlob %v: unexpected status %d", reqBody["action"], resp.StatusCode)
	}
	return json.NewDecoder(resp.Body).Decode(out)
}

func (p *ProxyBlobStore) startMultipartUpload(ctx context.Context, digest string) (string, error) {
	var out struct {
		UploadID string `json:"uploadId"`
	}
	err := p.callUploadBlob(ctx, map[string]any{"action": "start", "digest": digest}, &out)
	return out.UploadID, err
}

func (p *ProxyBlobStore) getPartPresignedURL(ctx context.Context, digest, uploadID string, partNumber int) (string, error) {
	var out struct {
		URL string `json:"url"`
	}
	err := p.callUploadBlob(ctx, map[string]any{
		"action":     "part",
		"uploadId":   uploadID,
		"partNumber": partNumber,
		"digest":     digest,
	}, &out)
	return out.URL, err
}

// Similar for complete, abort
```

### Phase 4: Testing Period

**During transition:**
- Both legacy HTTP endpoints AND new XRPC endpoint active
- AppView can use either based on configuration/feature flag
- New deployments use XRPC
- Old deployments continue with legacy

**Detection logic:**
```go
func (r *RoutingRepository) Blobs(ctx context.Context) distribution.BlobStore {
	// Try XRPC first (check for /.well-known/did.json)
	if supportsXRPC(storageEndpoint) {
		return NewXRPCBlobStore(storageEndpoint, ...)
721 - 	}
722 - 	// Fallback to legacy
723 - 	return NewProxyBlobStore(storageEndpoint, ...)
724 - }
725 - ```
726 -
727 - ### Phase 5: Remove Legacy Endpoints
728 -
729 - Once all holds migrated and tested:
730 -
731 - **cmd/hold/main.go - Remove:**
732 - ```go
733 - // DELETE these lines
734 - mux.HandleFunc("/presigned-url", service.HandlePresignedURL)
735 - mux.HandleFunc("/move", service.HandleMove)
736 - mux.HandleFunc("/start-multipart", service.HandleStartMultipart)
737 - mux.HandleFunc("/part-presigned-url", service.HandleGetPartURL)
738 - mux.HandleFunc("/complete-multipart", service.HandleCompleteMultipart)
739 - mux.HandleFunc("/abort-multipart", service.HandleAbortMultipart)
740 - mux.HandleFunc("/multipart-parts/", ...)
741 - ```
742 -
743 - **pkg/hold/handlers.go - Remove HTTP handler wrappers:**
744 - ```go
745 - // DELETE these functions:
746 - // - HandlePresignedURL() - replaced by uploadBlob + getBlob XRPC endpoints
747 - // - HandleMove() - now internal operation in CompleteMultipartUploadWithManager()
748 - // - HandleStartMultipart() - replaced by uploadBlob?action=start
749 - // - HandleGetPartURL() - replaced by uploadBlob?action=part
750 - // - HandleCompleteMultipart() - replaced by uploadBlob?action=complete
751 - // - HandleAbortMultipart() - replaced by uploadBlob?action=abort
752 - // - HandleMultipartPartUpload() - replaced by uploadBlob PUT with headers
753 -
754 - // KEEP internal service methods:
755 - // - s.getPresignedURL() - still used by blobstore_adapter
756 - // - s.driver.Move() - still used for temp→final move
757 - // - s.StartMultipartUploadWithManager() - core multipart logic
758 - // - s.GetPartUploadURL() - presigned URL generation
759 - // - s.CompleteMultipartUploadWithManager() - includes move operation
760 - // - s.AbortMultipartUploadWithManager() - cleanup logic
761 - ```
762 -
763 - ## Key Design Decisions
764 -
765 - 1. **Content-Type discrimination**: Natural way to distinguish single vs multipart uploads
766 - 2. **JSON bodies for all parameters**: Follows XRPC conventions (like putRecord, deleteRecord)
767 -    - **No query parameters** - all operation details in request body
768 -    - Makes requests more inspectable and debuggable
769 -    - Easier to extend with new fields
770 - 3. **Preserve standard uploadBlob**: Raw bytes still work for profile images, small media
771 - 4. **Reuse existing code**: HoldService multipart logic unchanged, just new HTTP layer
772 - 5. **Backward compatibility**: Both endpoints active during transition
773 - 6. **Action-based routing**: Clear, extensible JSON structure
774 - 7. **Move is internal**: `/move` endpoint logic absorbed into multipart complete operation
775 -    - No separate XRPC endpoint needed
776 -    - Simplifies AppView client code
777 - 8. **Unified presigned URL handling**: Single `uploadBlob`/`getBlob` pair replaces operation-based routing
778 - 9. **Flexible CID parameter**: `getBlob` accepts both standard CIDs and OCI digests via prefix detection
779 -    - Keeps endpoint spec-compliant (GET with query params)
780 -    - No conversion overhead on AppView side
781 -    - Hold does simple prefix check: `sha256:` → use directly, else → convert CID
782 -
783 - ## Benefits
784 -
785 - - ✅ Single endpoint for all blob operations
786 - - ✅ Standard ATProto uploadBlob preserved
787 - - ✅ XRPC-like JSON request/response
788 - - ✅ Reuses existing multipart.go logic
789 - - ✅ Gradual migration path
790 - - ✅ Less endpoints to maintain
791 - - ✅ Cleaner AppView client code
792 -
793 - ## Testing Checklist
794 -
795 - - [ ] Single blob upload (< 10MB, raw bytes)
796 - - [ ] Multipart start → part → complete flow
797 - - [ ] S3Native mode (presigned URLs)
798 - - [ ] Buffered mode (proxy uploads)
799 - - [ ] Multipart abort
800 - - [ ] Large blob upload (> 5GB, many parts)
801 - - [ ] Concurrent uploads
802 - - [ ] Upload resume after network failure
803 - - [ ] Legacy endpoint backward compatibility
804 - - [ ] AppView XRPC client integration
805 - - [ ] Performance comparison (XRPC vs legacy)
806 -
807 - ## Migration Timeline
808 -
809 - 1. **Week 1**: Implement unified uploadBlob handler (Phase 1-2)
810 - 2. **Week 2**: Update AppView client, feature flag (Phase 3)
811 - 3. **Week 3**: Deploy to dev/staging, test both paths (Phase 4)
812 - 4. **Week 4**: Roll out to production (gradual)
813 - 5. **Week 5-6**: Monitor, verify all holds migrated
814 - 6. **Week 7**: Remove legacy endpoints (Phase 5)
815 -
816 - ## References
817 -
818 - - ATProto uploadBlob spec: https://docs.bsky.app/docs/api/com-atproto-repo-upload-blob
819 - - XRPC conventions: https://atproto.com/specs/xrpc
820 - - Existing multipart implementation: pkg/hold/multipart.go
821 - - Blob store adapter: pkg/hold/blobstore_adapter.go
+19
pkg/appview/db/migrations/0005_normalize_hold_endpoint_to_did.yaml
···
1 + description: Normalize hold_endpoint column to store DIDs instead of URLs
2 + query: |
3 +   -- Convert any URL-formatted hold_endpoint values to DID format
4 +   -- This ensures all hold identifiers are stored consistently as did:web:hostname
5 +
6 +   -- Convert HTTPS URLs to did:web: format
7 +   -- https://hold.example.com → did:web:hold.example.com
8 +   UPDATE manifests
9 +   SET hold_endpoint = 'did:web:' || substr(hold_endpoint, 9)
10 +   WHERE hold_endpoint LIKE 'https://%';
11 +
12 +   -- Convert HTTP URLs to did:web: format
13 +   -- http://172.28.0.3:8080 → did:web:172.28.0.3:8080
14 +   UPDATE manifests
15 +   SET hold_endpoint = 'did:web:' || substr(hold_endpoint, 8)
16 +   WHERE hold_endpoint LIKE 'http://%';
17 +
18 +   -- Entries already in did:web: format are left unchanged
19 +   -- did:web:hold.example.com → did:web:hold.example.com (no change)
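SQLite's `substr` is 1-based, so `substr(hold_endpoint, 9)` skips the 8 characters of `https://` and `substr(hold_endpoint, 8)` skips the 7 of `http://`. The same rule could be applied in Go before rows are written, so that new values never depend on the migration. The sketch below assumes that use; `normalizeHoldEndpoint` is a hypothetical helper, not code from this commit.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeHoldEndpoint (hypothetical) applies the rule from migration 0005:
// strip an "https://" or "http://" prefix and prepend "did:web:".
// Values already in did:web: form, or in any other form, are returned unchanged.
// Like the SQL, it does not trim paths or trailing slashes.
func normalizeHoldEndpoint(endpoint string) string {
	if rest, ok := strings.CutPrefix(endpoint, "https://"); ok {
		return "did:web:" + rest
	}
	if rest, ok := strings.CutPrefix(endpoint, "http://"); ok {
		return "did:web:" + rest
	}
	return endpoint
}

func main() {
	for _, in := range []string{
		"https://hold.example.com",
		"http://172.28.0.3:8080",
		"did:web:hold.example.com",
	} {
		fmt.Printf("%s -> %s\n", in, normalizeHoldEndpoint(in))
	}
}
```

Like the SQL, the helper keeps a port's colon literal (`did:web:172.28.0.3:8080`), whereas the did:web method specification percent-encodes it as `%3A`; `TestCheckHealth_WithDID` in this commit exercises the literal form. One small divergence: SQLite's `LIKE` is case-insensitive for ASCII by default, while `strings.CutPrefix` is case-sensitive.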
+15 -11
pkg/appview/db/models.go
···
65 65
66 66 // Push represents a combined tag and manifest for the recent pushes view
67 67 type Push struct {
68 - 	DID         string
69 - 	Handle      string
70 - 	Repository  string
71 - 	Tag         string
72 - 	Digest      string
73 - 	Title       string
74 - 	Description string
75 - 	IconURL     string
76 - 	StarCount   int
77 - 	PullCount   int
78 - 	CreatedAt   time.Time
68 + 	DID          string
69 + 	Handle       string
70 + 	Repository   string
71 + 	Tag          string
72 + 	Digest       string
73 + 	Title        string
74 + 	Description  string
75 + 	IconURL      string
76 + 	StarCount    int
77 + 	PullCount    int
78 + 	CreatedAt    time.Time
79 + 	HoldEndpoint string // Hold endpoint for health checking
80 + 	Reachable    bool   // Whether the hold endpoint is reachable
79 81 }
80 82
81 83 // Repository represents an aggregated view of a user's repository
···
156 158 	Platforms      []PlatformInfo
157 159 	PlatformCount  int
158 160 	IsManifestList bool
161 + 	Reachable      bool // Whether the hold endpoint is reachable
162 + 	Pending        bool // Whether health check is still in progress
159 163 }
+6 -4
pkg/appview/db/queries.go
···
44 44 		COALESCE(m.icon_url, ''),
45 45 		COALESCE(rs.pull_count, 0),
46 46 		COALESCE((SELECT COUNT(*) FROM stars WHERE owner_did = u.did AND repository = t.repository), 0),
47 - 		t.created_at
47 + 		t.created_at,
48 + 		m.hold_endpoint
48 49 	FROM tags t
49 50 	JOIN users u ON t.did = u.did
50 51 	JOIN manifests m ON t.did = m.did AND t.repository = m.repository AND t.digest = m.digest
···
70 71 	var pushes []Push
71 72 	for rows.Next() {
72 73 		var p Push
73 - 		if err := rows.Scan(&p.DID, &p.Handle, &p.Repository, &p.Tag, &p.Digest, &p.Title, &p.Description, &p.IconURL, &p.PullCount, &p.StarCount, &p.CreatedAt); err != nil {
74 + 		if err := rows.Scan(&p.DID, &p.Handle, &p.Repository, &p.Tag, &p.Digest, &p.Title, &p.Description, &p.IconURL, &p.PullCount, &p.StarCount, &p.CreatedAt, &p.HoldEndpoint); err != nil {
74 75 			return nil, 0, err
75 76 		}
76 77 		pushes = append(pushes, p)
···
113 114 		COALESCE(m.icon_url, ''),
114 115 		COALESCE(rs.pull_count, 0),
115 116 		COALESCE((SELECT COUNT(*) FROM stars WHERE owner_did = u.did AND repository = t.repository), 0),
116 - 		t.created_at
117 + 		t.created_at,
118 + 		m.hold_endpoint
117 119 	FROM tags t
118 120 	JOIN users u ON t.did = u.did
119 121 	JOIN manifests m ON t.did = m.did AND t.repository = m.repository AND t.digest = m.digest
···
136 138 	var pushes []Push
137 139 	for rows.Next() {
138 140 		var p Push
139 - 		if err := rows.Scan(&p.DID, &p.Handle, &p.Repository, &p.Tag, &p.Digest, &p.Title, &p.Description, &p.IconURL, &p.PullCount, &p.StarCount, &p.CreatedAt); err != nil {
141 + 		if err := rows.Scan(&p.DID, &p.Handle, &p.Repository, &p.Tag, &p.Digest, &p.Title, &p.Description, &p.IconURL, &p.PullCount, &p.StarCount, &p.CreatedAt, &p.HoldEndpoint); err != nil {
140 142 			return nil, 0, err
141 143 		}
142 144 		pushes = append(pushes, p)
+1 -1
pkg/appview/db/schema.go
···
38 38 	did TEXT NOT NULL,
39 39 	repository TEXT NOT NULL,
40 40 	digest TEXT NOT NULL,
41 - 	hold_endpoint TEXT NOT NULL,
41 + 	hold_endpoint TEXT NOT NULL, -- Stored as DID (e.g., did:web:hold.example.com)
42 42 	schema_version INTEGER NOT NULL,
43 43 	media_type TEXT NOT NULL,
44 44 	config_digest TEXT,
+34 -3
pkg/appview/handlers/home.go
···
7 7 	"strconv"
8 8
9 9 	"atcr.io/pkg/appview/db"
10 + 	"atcr.io/pkg/appview/holdhealth"
10 11 )
11 12
12 13 // HomeHandler handles the home page
···
54 55
55 56 // RecentPushesHandler handles the HTMX request for recent pushes
56 57 type RecentPushesHandler struct {
57 - 	DB          *sql.DB
58 - 	Templates   *template.Template
59 - 	RegistryURL string
58 + 	DB            *sql.DB
59 + 	Templates     *template.Template
60 + 	RegistryURL   string
61 + 	HealthChecker *holdhealth.Checker
60 62 }
61 63
62 64 func (h *RecentPushesHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
···
76 78 	if err != nil {
77 79 		http.Error(w, err.Error(), http.StatusInternalServerError)
78 80 		return
81 + 	}
82 +
83 + 	// Check health status and filter out unreachable manifests for home page
84 + 	// Use GetCachedStatus only (no blocking) - background worker keeps cache fresh
85 + 	if h.HealthChecker != nil {
86 + 		reachablePushes := []db.Push{}
87 + 		for i := range pushes {
88 + 			if pushes[i].HoldEndpoint != "" {
89 + 				// Use cached status only - don't block on health checks
90 + 				cached := h.HealthChecker.GetCachedStatus(pushes[i].HoldEndpoint)
91 + 				if cached != nil {
92 + 					pushes[i].Reachable = cached.Reachable
93 + 					// Only show reachable pushes on home page
94 + 					if cached.Reachable {
95 + 						reachablePushes = append(reachablePushes, pushes[i])
96 + 					}
97 + 				} else {
98 + 					// No cached status - optimistically show it (background worker will check)
99 + 					pushes[i].Reachable = true
100 + 					reachablePushes = append(reachablePushes, pushes[i])
101 + 				}
102 + 			}
103 + 		}
104 + 		pushes = reachablePushes
105 + 	} else {
106 + 		// If no health checker, assume all are reachable (backward compatibility)
107 + 		for i := range pushes {
108 + 			pushes[i].Reachable = true
109 + 		}
79 110 	}
80 111
81 112 	data := struct {
+76
pkg/appview/handlers/manifest_health.go
···
1 + package handlers
2 +
3 + import (
4 + 	"context"
5 + 	"net/http"
6 + 	"net/url"
7 + 	"time"
8 +
9 + 	"atcr.io/pkg/appview/holdhealth"
10 + )
11 +
12 + // ManifestHealthHandler handles HTMX polling for manifest health status
13 + type ManifestHealthHandler struct {
14 + 	HealthChecker *holdhealth.Checker
15 + }
16 +
17 + func (h *ManifestHealthHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
18 + 	// Get endpoint from query parameter
19 + 	endpoint := r.URL.Query().Get("endpoint")
20 + 	if endpoint == "" {
21 + 		http.Error(w, "endpoint parameter required", http.StatusBadRequest)
22 + 		return
23 + 	}
24 +
25 + 	// Decode URL-encoded endpoint
26 + 	endpoint, err := url.QueryUnescape(endpoint)
27 + 	if err != nil {
28 + 		http.Error(w, "invalid endpoint parameter", http.StatusBadRequest)
29 + 		return
30 + 	}
31 +
32 + 	// Try to get cached status first (instant if background worker has checked it)
33 + 	cached := h.HealthChecker.GetCachedStatus(endpoint)
34 + 	if cached != nil {
35 + 		// Cache hit - return final status
36 + 		h.renderBadge(w, endpoint, cached.Reachable, false)
37 + 		return
38 + 	}
39 +
40 + 	// Cache miss - perform quick check with 2 second timeout
41 + 	ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
42 + 	defer cancel()
43 +
44 + 	reachable, err := h.HealthChecker.CheckHealth(ctx, endpoint)
45 +
46 + 	if ctx.Err() == context.DeadlineExceeded {
47 + 		// Still pending - render "Checking..." badge with HTMX retry
48 + 		h.renderBadge(w, endpoint, false, true)
49 + 	} else if err != nil {
50 + 		// Error - mark as unreachable
51 + 		h.renderBadge(w, endpoint, false, false)
52 + 	} else {
53 + 		// Success
54 + 		h.renderBadge(w, endpoint, reachable, false)
55 + 	}
56 + }
57 +
58 + // renderBadge renders the appropriate badge HTML snippet
59 + func (h *ManifestHealthHandler) renderBadge(w http.ResponseWriter, endpoint string, reachable, pending bool) {
60 + 	w.Header().Set("Content-Type", "text/html")
61 +
62 + 	if pending {
63 + 		// Still checking - render badge with HTMX retry after 3 seconds
64 + 		retryURL := "/api/manifest-health?endpoint=" + url.QueryEscape(endpoint)
65 + 		w.Write([]byte(`<span class="checking-badge"
66 + 			hx-get="` + retryURL + `"
67 + 			hx-trigger="load delay:3s"
68 + 			hx-swap="outerHTML">🔄 Checking...</span>`))
69 + 	} else if !reachable {
70 + 		// Unreachable - render offline badge
71 + 		w.Write([]byte(`<span class="offline-badge">⚠️ Offline</span>`))
72 + 	} else {
73 + 		// Reachable - no badge (empty response)
74 + 		w.Write([]byte(``))
75 + 	}
76 + }
+80 -12
pkg/appview/handlers/repository.go
···
1 1 package handlers
2 2
3 3 import (
4 + 	"context"
4 5 	"database/sql"
5 6 	"html/template"
6 7 	"log"
7 8 	"net/http"
9 + 	"sync"
10 + 	"time"
8 11
9 12 	"atcr.io/pkg/appview/db"
13 + 	"atcr.io/pkg/appview/holdhealth"
10 14 	"atcr.io/pkg/appview/middleware"
11 15 	"atcr.io/pkg/atproto"
12 16 	"atcr.io/pkg/auth/oauth"
···
16 20
17 21 // RepositoryPageHandler handles the public repository page
18 22 type RepositoryPageHandler struct {
19 - 	DB          *sql.DB
20 - 	Templates   *template.Template
21 - 	RegistryURL string
22 - 	Directory   identity.Directory
23 - 	Refresher   *oauth.Refresher
23 + 	DB            *sql.DB
24 + 	Templates     *template.Template
25 + 	RegistryURL   string
26 + 	Directory     identity.Directory
27 + 	Refresher     *oauth.Refresher
28 + 	HealthChecker *holdhealth.Checker
24 29 }
25 30
26 31 func (h *RepositoryPageHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
···
54 59 		return
55 60 	}
56 61
62 + 	// Check health status for each manifest's hold endpoint (concurrent with 1s timeout)
63 + 	if h.HealthChecker != nil {
64 + 		// Create context with 1 second deadline for fast-fail
65 + 		ctx, cancel := context.WithTimeout(r.Context(), 1*time.Second)
66 + 		defer cancel()
67 +
68 + 		var wg sync.WaitGroup
69 + 		var mu sync.Mutex
70 +
71 + 		for i := range manifests {
72 + 			if manifests[i].HoldEndpoint == "" {
73 + 				// No hold endpoint, mark as unreachable
74 + 				manifests[i].Reachable = false
75 + 				manifests[i].Pending = false
76 + 				continue
77 + 			}
78 +
79 + 			wg.Add(1)
80 + 			go func(idx int) {
81 + 				defer wg.Done()
82 +
83 + 				endpoint := manifests[idx].HoldEndpoint
84 +
85 + 				// Try to get cached status first (instant)
86 + 				if cached := h.HealthChecker.GetCachedStatus(endpoint); cached != nil {
87 + 					mu.Lock()
88 + 					manifests[idx].Reachable = cached.Reachable
89 + 					manifests[idx].Pending = false
90 + 					mu.Unlock()
91 + 					return
92 + 				}
93 +
94 + 				// Perform health check with timeout context
95 + 				reachable, err := h.HealthChecker.CheckHealth(ctx, endpoint)
96 +
97 + 				mu.Lock()
98 + 				if ctx.Err() == context.DeadlineExceeded {
99 + 					// Timeout - mark as pending for HTMX polling
100 + 					manifests[idx].Reachable = false
101 + 					manifests[idx].Pending = true
102 + 				} else if err != nil {
103 + 					// Error - mark as unreachable
104 + 					manifests[idx].Reachable = false
105 + 					manifests[idx].Pending = false
106 + 				} else {
107 + 					// Success
108 + 					manifests[idx].Reachable = reachable
109 + 					manifests[idx].Pending = false
110 + 				}
111 + 				mu.Unlock()
112 + 			}(i)
113 + 		}
114 +
115 + 		// Wait for all checks to complete or timeout
116 + 		wg.Wait()
117 + 	} else {
118 + 		// If no health checker, assume all are reachable (backward compatibility)
119 + 		for i := range manifests {
120 + 			manifests[i].Reachable = true
121 + 			manifests[i].Pending = false
122 + 		}
123 + 	}
124 +
57 125 	if len(tagsWithPlatforms) == 0 && len(manifests) == 0 {
58 126 		http.Error(w, "Repository not found", http.StatusNotFound)
59 127 		return
···
100 168
101 169 	data := struct {
102 170 		PageData
103 - 		Owner      *db.User                  // Repository owner
104 - 		Repository *db.Repository            // Repository summary
105 - 		Tags       []db.TagWithPlatforms     // Tags with platform info
106 - 		Manifests  []db.ManifestWithMetadata // Top-level manifests only
107 - 		StarCount  int
108 - 		IsStarred  bool
109 - 		IsOwner    bool // Whether current user owns this repository
171 + 		Owner      *db.User                  // Repository owner
172 + 		Repository *db.Repository            // Repository summary
173 + 		Tags       []db.TagWithPlatforms     // Tags with platform info
174 + 		Manifests  []db.ManifestWithMetadata // Top-level manifests only
175 + 		StarCount  int
176 + 		IsStarred  bool
177 + 		IsOwner    bool // Whether current user owns this repository
110 178 	}{
111 179 		PageData: NewPageData(r, h.RegistryURL),
112 180 		Owner:    owner,
+179
pkg/appview/holdhealth/checker.go
···
1 + package holdhealth
2 +
3 + import (
4 + 	"context"
5 + 	"fmt"
6 + 	"net/http"
7 + 	"sync"
8 + 	"time"
9 +
10 + 	"atcr.io/pkg/appview"
11 + )
12 +
13 + // HealthStatus represents the health status of a hold endpoint
14 + type HealthStatus struct {
15 + 	Reachable   bool
16 + 	LastChecked time.Time
17 + 	LastError   error
18 + }
19 +
20 + // Checker manages health checking for hold endpoints
21 + type Checker struct {
22 + 	client    *http.Client
23 + 	cache     map[string]*HealthStatus
24 + 	cacheMu   sync.RWMutex
25 + 	cacheTTL  time.Duration
26 + 	cleanupMu sync.Mutex
27 + }
28 +
29 + // NewChecker creates a new health checker with the specified cache TTL
30 + func NewChecker(cacheTTL time.Duration) *Checker {
31 + 	return NewCheckerWithTimeout(cacheTTL, 2*time.Second)
32 + }
33 +
34 + // NewCheckerWithTimeout creates a new health checker with custom timeout
35 + // Useful for testing with shorter timeouts
36 + func NewCheckerWithTimeout(cacheTTL, httpTimeout time.Duration) *Checker {
37 + 	return &Checker{
38 + 		client: &http.Client{
39 + 			Timeout: httpTimeout,
40 + 		},
41 + 		cache:    make(map[string]*HealthStatus),
42 + 		cacheTTL: cacheTTL,
43 + 	}
44 + }
45 +
46 + // CheckHealth performs an HTTP health check on the hold endpoint
47 + // Accepts either DID (did:web:host) or URL (https://host) format
48 + // Checks {endpoint}/xrpc/_health and returns true if reachable
49 + func (c *Checker) CheckHealth(ctx context.Context, endpoint string) (bool, error) {
50 + 	// Convert DID to HTTP URL if needed
51 + 	// did:web:hold.example.com → https://hold.example.com
52 + 	// https://hold.example.com → https://hold.example.com (passthrough)
53 + 	httpURL := appview.ResolveHoldURL(endpoint)
54 +
55 + 	// Build health check URL
56 + 	healthURL := httpURL + "/xrpc/_health"
57 +
58 + 	// Create request with context
59 + 	req, err := http.NewRequestWithContext(ctx, "GET", healthURL, nil)
60 + 	if err != nil {
61 + 		return false, fmt.Errorf("failed to create request: %w", err)
62 + 	}
63 +
64 + 	// Perform request
65 + 	resp, err := c.client.Do(req)
66 + 	if err != nil {
67 + 		return false, fmt.Errorf("request failed: %w", err)
68 + 	}
69 + 	defer resp.Body.Close()
70 +
71 + 	// Check status code
72 + 	if resp.StatusCode >= 200 && resp.StatusCode < 300 {
73 + 		return true, nil
74 + 	}
75 +
76 + 	return false, fmt.Errorf("unexpected status code: %d", resp.StatusCode)
77 + }
78 +
79 + // GetStatus returns the cached health status for an endpoint
80 + // If the cache is expired or missing, it performs an on-demand check
81 + func (c *Checker) GetStatus(ctx context.Context, endpoint string) *HealthStatus {
82 + 	// Check cache first
83 + 	c.cacheMu.RLock()
84 + 	status, exists := c.cache[endpoint]
85 + 	c.cacheMu.RUnlock()
86 +
87 + 	// If cached and not expired, return it
88 + 	if exists && time.Since(status.LastChecked) < c.cacheTTL {
89 + 		return status
90 + 	}
91 +
92 + 	// On-demand check
93 + 	reachable, err := c.CheckHealth(ctx, endpoint)
94 +
95 + 	// Update cache
96 + 	newStatus := &HealthStatus{
97 + 		Reachable:   reachable,
98 + 		LastChecked: time.Now(),
99 + 		LastError:   err,
100 + 	}
101 +
102 + 	c.cacheMu.Lock()
103 + 	c.cache[endpoint] = newStatus
104 + 	c.cacheMu.Unlock()
105 +
106 + 	return newStatus
107 + }
108 +
109 + // GetCachedStatus returns the cached status without performing a check
110 + // Returns nil if no cached status exists
111 + func (c *Checker) GetCachedStatus(endpoint string) *HealthStatus {
112 + 	c.cacheMu.RLock()
113 + 	defer c.cacheMu.RUnlock()
114 +
115 + 	status, exists := c.cache[endpoint]
116 + 	if !exists {
117 + 		return nil
118 + 	}
119 +
120 + 	// Return nil if expired
121 + 	if time.Since(status.LastChecked) > c.cacheTTL {
122 + 		return nil
123 + 	}
124 +
125 + 	return status
126 + }
127 +
128 + // SetStatus manually sets the health status for an endpoint
129 + // Used by the background worker to update cache
130 + func (c *Checker) SetStatus(endpoint string, reachable bool, err error) {
131 + 	status := &HealthStatus{
132 + 		Reachable:   reachable,
133 + 		LastChecked: time.Now(),
134 + 		LastError:   err,
135 + 	}
136 +
137 + 	c.cacheMu.Lock()
138 + 	c.cache[endpoint] = status
139 + 	c.cacheMu.Unlock()
140 + }
141 +
142 + // Cleanup removes stale cache entries (older than 30 minutes)
143 + func (c *Checker) Cleanup() {
144 + 	c.cleanupMu.Lock()
145 + 	defer c.cleanupMu.Unlock()
146 +
147 + 	c.cacheMu.Lock()
148 + 	defer c.cacheMu.Unlock()
149 +
150 + 	cutoff := time.Now().Add(-30 * time.Minute)
151 + 	for endpoint, status := range c.cache {
152 + 		if status.LastChecked.Before(cutoff) {
153 + 			delete(c.cache, endpoint)
154 + 		}
155 + 	}
156 + }
157 +
158 + // GetCacheStats returns cache statistics for debugging
159 + func (c *Checker) GetCacheStats() map[string]any {
160 + 	c.cacheMu.RLock()
161 + 	defer c.cacheMu.RUnlock()
162 +
163 + 	reachable := 0
164 + 	unreachable := 0
165 +
166 + 	for _, status := range c.cache {
167 + 		if status.Reachable {
168 + 			reachable++
169 + 		} else {
170 + 			unreachable++
171 + 		}
172 + 	}
173 +
174 + 	return map[string]any{
175 + 		"total":       len(c.cache),
176 + 		"reachable":   reachable,
177 + 		"unreachable": unreachable,
178 + 	}
179 + }
+253
pkg/appview/holdhealth/checker_test.go
···
1 + package holdhealth
2 +
3 + import (
4 + 	"context"
5 + 	"net/http"
6 + 	"net/http/httptest"
7 + 	"testing"
8 + 	"time"
9 + )
10 +
11 + func TestNewChecker(t *testing.T) {
12 + 	cacheTTL := 15 * time.Minute
13 + 	checker := NewChecker(cacheTTL)
14 +
15 + 	if checker == nil {
16 + 		t.Fatal("NewChecker returned nil")
17 + 	}
18 +
19 + 	if checker.cacheTTL != cacheTTL {
20 + 		t.Errorf("Expected cache TTL %v, got %v", cacheTTL, checker.cacheTTL)
21 + 	}
22 +
23 + 	if checker.cache == nil {
24 + 		t.Error("Cache map not initialized")
25 + 	}
26 + }
27 +
28 + func TestCheckHealth_Success(t *testing.T) {
29 + 	// Create test server that returns 200
30 + 	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
31 + 		if r.URL.Path != "/xrpc/_health" {
32 + 			t.Errorf("Expected path /xrpc/_health, got %s", r.URL.Path)
33 + 		}
34 + 		w.WriteHeader(http.StatusOK)
35 + 		w.Write([]byte(`{"version": "1.0.0"}`))
36 + 	}))
37 + 	defer server.Close()
38 +
39 + 	checker := NewChecker(15 * time.Minute)
40 + 	ctx := context.Background()
41 +
42 + 	reachable, err := checker.CheckHealth(ctx, server.URL)
43 + 	if err != nil {
44 + 		t.Errorf("CheckHealth returned error: %v", err)
45 + 	}
46 +
47 + 	if !reachable {
48 + 		t.Error("Expected hold to be reachable")
49 + 	}
50 + }
51 +
52 + func TestCheckHealth_WithDID(t *testing.T) {
53 + 	// Create test server that returns 200
54 + 	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
55 + 		if r.URL.Path != "/xrpc/_health" {
56 + 			t.Errorf("Expected path /xrpc/_health, got %s", r.URL.Path)
57 + 		}
58 + 		w.WriteHeader(http.StatusOK)
59 + 		w.Write([]byte(`{"version": "1.0.0"}`))
60 + 	}))
61 + 	defer server.Close()
62 +
63 + 	checker := NewChecker(15 * time.Minute)
64 + 	ctx := context.Background()
65 +
66 + 	// Test with DID format (did:web:host)
67 + 	// Extract host:port from test server URL
68 + 	// http://127.0.0.1:12345 → did:web:127.0.0.1:12345
69 + 	serverURL := server.URL
70 + 	didFormat := "did:web:" + serverURL[7:] // Remove "http://"
71 +
72 + 	reachable, err := checker.CheckHealth(ctx, didFormat)
73 + 	if err != nil {
74 + 		t.Errorf("CheckHealth with DID returned error: %v", err)
75 + 	}
76 +
77 + 	if !reachable {
78 + 		t.Error("Expected hold to be reachable with DID format")
79 + 	}
80 + }
81 +
82 + func TestCheckHealth_Failure(t *testing.T) {
83 + 	// Create test server that returns 500
84 + 	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
85 + 		w.WriteHeader(http.StatusInternalServerError)
86 + 	}))
87 + 	defer server.Close()
88 +
89 + 	checker := NewChecker(15 * time.Minute)
90 + 	ctx := context.Background()
91 +
92 + 	reachable, err := checker.CheckHealth(ctx, server.URL)
93 + 	if err == nil {
94 + 		t.Error("Expected error for 500 status code")
95 + 	}
96 +
97 + 	if reachable {
98 + 		t.Error("Expected hold to be unreachable")
99 + 	}
100 + }
101 +
102 + func TestCheckHealth_Timeout(t *testing.T) {
103 + 	// Create test server that delays longer than client timeout
104 + 	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
105 + 		time.Sleep(200 * time.Millisecond) // Longer than 100ms test timeout
106 + 	}))
107 + 	defer server.Close()
108 +
109 + 	// Use custom timeout of 100ms for faster test
110 + 	checker := NewCheckerWithTimeout(15*time.Minute, 100*time.Millisecond)
111 + 	ctx := context.Background()
112 +
113 + 	reachable, err := checker.CheckHealth(ctx, server.URL)
114 + 	if err == nil {
115 + 		t.Error("Expected timeout error")
116 + 	}
117 +
118 + 	if reachable {
119 + 		t.Error("Expected hold to be unreachable due to timeout")
120 + 	}
121 + }
122 +
123 + func TestGetStatus_CacheHit(t *testing.T) {
124 + 	checker := NewChecker(15 * time.Minute)
125 + 	endpoint := "https://example.com"
126 +
127 + 	// Manually set cached status
128 + 	checker.SetStatus(endpoint, true, nil)
129 +
130 + 	// Get status should return cached value
131 + 	status := checker.GetStatus(context.Background(), endpoint)
132 + 	if status == nil {
133 + 		t.Fatal("GetStatus returned nil")
134 + 	}
135 +
136 + 	if !status.Reachable {
137 + 		t.Error("Expected cached status to be reachable")
138 + 	}
139 +
140 + 	if status.LastError != nil {
141 + 		t.Errorf("Expected no error, got %v", status.LastError)
142 + 	}
143 + }
144 +
145 + func TestGetStatus_CacheMiss(t *testing.T) {
146 + 	// Create test server
147 + 	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
148 + 		w.WriteHeader(http.StatusOK)
149 + 	}))
150 + 	defer server.Close()
151 +
152 + 	checker := NewChecker(15 * time.Minute)
153 +
154 + 	// Get status should perform check on cache miss
155 + 	status := checker.GetStatus(context.Background(), server.URL)
156 + 	if status == nil {
157 + 		t.Fatal("GetStatus returned nil")
158 + 	}
159 +
160 + 	if !status.Reachable {
161 + 		t.Error("Expected status to be reachable")
162 + 	}
163 + }
164 +
165 + func TestGetStatus_CacheExpiry(t *testing.T) {
166 + 	// Create checker with very short TTL
167 + 	checker := NewChecker(100 * time.Millisecond)
168 + 	endpoint := "https://example.com"
169 +
170 + 	// Set cached status
171 + 	checker.SetStatus(endpoint, true, nil)
172 +
173 + 	// Wait for cache to expire
174 + 	time.Sleep(150 * time.Millisecond)
175 +
176 + 	// GetCachedStatus should return nil for expired entry
177 + 	status := checker.GetCachedStatus(endpoint)
178 + 	if status != nil {
179 + 		t.Error("Expected nil for expired cache entry")
180 + 	}
181 + }
182 +
183 + func TestSetStatus(t *testing.T) {
184 + 	checker := NewChecker(15 * time.Minute)
185 + 	endpoint := "https://example.com"
186 +
187 + 	// Set status
188 + 	checker.SetStatus(endpoint, true, nil)
189 +
190 + 	// Verify it was set
191 + 	status := checker.GetCachedStatus(endpoint)
192 + 	if status == nil {
193 + 		t.Fatal("Status not found in cache")
194 + 	}
195 +
196 + 	if !status.Reachable {
197 + 		t.Error("Expected status to be reachable")
198 + 	}
199 + }
200 +
201 + func TestCleanup(t *testing.T) {
202 + 	checker := NewChecker(1 * time.Minute)
203 +
204 + 	// Add old entry (simulate old timestamp by manually setting it)
205 + 	endpoint := "https://example.com"
206 + 	checker.cache[endpoint] = &HealthStatus{
207 + 		Reachable:   true,
208 + 		LastChecked: time.Now().Add(-31 * time.Minute), // 31 minutes ago
209 + 	}
210 +
211 + 	// Add recent entry
212 + 	recentEndpoint := "https://recent.com"
213 + 	checker.SetStatus(recentEndpoint, true, nil)
214 +
215 + 	// Run cleanup
216 + 	checker.Cleanup()
217 +
218 + 	// Old entry should be removed
219 + 	if checker.GetCachedStatus(endpoint) != nil {
220 + 		t.Error("Expected old entry to be cleaned up")
221 + 	}
222 +
223 + 	// Recent entry should remain
224 + 	if checker.GetCachedStatus(recentEndpoint) == nil {
225 + 		t.Error("Expected recent entry to remain after cleanup")
226 + 	}
227 + }
228 +
229 + func TestGetCacheStats(t *testing.T) {
230 + 	checker := NewChecker(15 * time.Minute)
231 +
232 + 	// Add some entries
233 + 	checker.SetStatus("https://reachable1.com", true, nil)
234 + 	checker.SetStatus("https://reachable2.com", true, nil)
235 + 	checker.SetStatus("https://unreachable1.com", false, nil)
236 +
237 + 	stats := checker.GetCacheStats()
238 +
239 + 	total, ok := stats["total"].(int)
240 + 	if !ok || total != 3 {
241 + 		t.Errorf("Expected total=3, got %v", stats["total"])
242 + 	}
243 +
244 + 	reachable, ok := stats["reachable"].(int)
245 + 	if !ok || reachable != 2 {
246 + 		t.Errorf("Expected reachable=2, got %v", stats["reachable"])
247 + 	}
248 +
249 + 	unreachable, ok := stats["unreachable"].(int)
250 + 	if !ok || unreachable != 1 {
251 + 		t.Errorf("Expected unreachable=1, got %v", stats["unreachable"])
252 + 	}
253 + }
+169
pkg/appview/holdhealth/worker.go
···
1 + package holdhealth
2 +
3 + import (
4 + 	"context"
5 + 	"database/sql"
6 + 	"fmt"
7 + 	"log"
8 + 	"sync"
9 + 	"time"
10 + )
11 +
12 + // DBQuerier interface for database queries (allows mocking in tests)
13 + type DBQuerier interface {
14 + 	GetUniqueHoldEndpoints() ([]string, error)
15 + }
16 +
17 + // Worker runs background health checks for hold endpoints
18 + type Worker struct {
19 + 	checker       *Checker
20 + 	db            DBQuerier
21 + 	refreshTicker *time.Ticker
22 + 	cleanupTicker *time.Ticker
23 + 	stopChan      chan struct{}
24 + 	wg            sync.WaitGroup
25 + }
26 +
27 + // NewWorker creates a new background worker
28 + func NewWorker(checker *Checker, db DBQuerier, refreshInterval time.Duration) *Worker {
29 + 	return &Worker{
30 + 		checker:       checker,
31 + 		db:            db,
32 + 		refreshTicker: time.NewTicker(refreshInterval),
33 + 		cleanupTicker: time.NewTicker(30 * time.Minute), // Cleanup every 30 minutes
34 + 		stopChan:      make(chan struct{}),
35 + 	}
36 + }
37 +
38 + // Start begins the background worker
39 + func (w *Worker) Start(ctx context.Context) {
40 + 	w.wg.Add(1)
41 + 	go func() {
42 + 		defer w.wg.Done()
43 +
44 + 		log.Println("Hold health worker: Starting background health checks")
45 +
46 + 		// Perform initial check immediately
47 + 		w.refreshAllHolds(ctx)
48 +
49 + 		for {
50 + 			select {
51 + 			case <-ctx.Done():
52 + 				log.Println("Hold health worker: Context cancelled, stopping")
53 + 				return
54 + 			case <-w.stopChan:
55 + 				log.Println("Hold health worker: Stop signal received")
56 + 				return
57 + 			case <-w.refreshTicker.C:
58 + 				w.refreshAllHolds(ctx)
59 + 			case <-w.cleanupTicker.C:
60 + 				log.Println("Hold health worker: Running cache cleanup")
61 + 				w.checker.Cleanup()
62 + 			}
63 + 		}
64 + 	}()
65 + }
66 +
67 + // Stop gracefully stops the worker
68 + func (w *Worker) Stop() {
69 + 	close(w.stopChan)
70 + 	w.refreshTicker.Stop()
71 + 	w.cleanupTicker.Stop()
72 + 	w.wg.Wait()
73 + 	log.Println("Hold health worker: Stopped")
74 + }
75 +
76 + // refreshAllHolds queries the database for unique hold endpoints and refreshes their health status
77 + func (w *Worker) refreshAllHolds(ctx context.Context) {
78 + 	log.Println("Hold health worker: Starting refresh cycle")
79 +
80 + 	// Get unique hold endpoints from database
81 + 	endpoints, err := w.db.GetUniqueHoldEndpoints()
82 + 	if err != nil {
83 + 		log.Printf("Hold health worker: Failed to fetch hold endpoints: %v", err)
84 + 		return
85 + 	}
86 +
87 + 	if len(endpoints) == 0 {
88 + 		log.Println("Hold health worker: No hold endpoints to check")
89 + 		return
90 + 	}
91 +
92 + 	log.Printf("Hold health worker: Checking %d unique hold endpoints", len(endpoints))
93 +
94 + 	// Check health concurrently with rate limiting
95 + 	// Use a semaphore to limit concurrent requests (max 10 at a time)
96 + 	sem := make(chan struct{}, 10)
97 + 	var wg sync.WaitGroup
98 +
99 + 	reachable := 0
100 + 	unreachable := 0
101 + 	var statsMu sync.Mutex
102 +
103 + 	for _, endpoint := range endpoints {
104 + 		wg.Add(1)
105 +
106 + 		go func(ep string) {
107 + 			defer wg.Done()
108 +
109 + 			// Acquire semaphore
110 + 			sem <- struct{}{}
111 + 			defer func() { <-sem }()
112 +
113 + 			// Check health
114 + 			isReachable, err := w.checker.CheckHealth(ctx, ep)
115 +
116 + 			// Update cache
117 + 			w.checker.SetStatus(ep, isReachable, err)
118 +
119 + 			// Update stats
120 + 			statsMu.Lock()
121 + 			if isReachable {
122 + 				reachable++
123 + 			} else {
124 + 				unreachable++
125 + 				log.Printf("Hold health worker: Hold unreachable: %s (error: %v)", ep, err)
126 + 			}
127 + 			statsMu.Unlock()
128 + 		}(endpoint)
129 + 	}
130 +
131 + 	// Wait for all checks to complete
132 + 	wg.Wait()
133 +
134 + 	log.Printf("Hold health worker: Refresh complete - %d reachable, %d unreachable", reachable, unreachable)
135 + }
136 +
137 + // DBAdapter wraps sql.DB to implement DBQuerier interface
138 + type DBAdapter struct {
139 + 	db *sql.DB
140 + }
141 +
142 + // NewDBAdapter creates a new database adapter
143 + func NewDBAdapter(db *sql.DB) *DBAdapter {
144 + 	return &DBAdapter{db: db}
145 + }
146 +
147 + // GetUniqueHoldEndpoints queries the database for unique hold endpoints
148 + func (a *DBAdapter) GetUniqueHoldEndpoints() ([]string, error) {
149 + 	rows, err := a.db.Query(`SELECT DISTINCT hold_endpoint FROM manifests WHERE hold_endpoint != ''`)
150 + 	if err != nil {
151 + 		return nil, fmt.Errorf("failed to query hold endpoints: %w", err)
152 + 	}
153 + 	defer rows.Close()
154 +
155 + 	var endpoints []string
156 + 	for rows.Next() {
157 + 		var endpoint string
158 + 		if err := rows.Scan(&endpoint); err != nil {
159 + 			return nil, fmt.Errorf("failed to scan endpoint: %w", err)
160 + 		}
161 + 		endpoints = append(endpoints, endpoint)
162 + 	}
163 +
164 + 	if err := rows.Err(); err != nil {
165 + 		return nil, fmt.Errorf("error iterating rows: %w", err)
166 + 	}
167 +
168 + 	return endpoints, nil
169 + }
+55
pkg/appview/static/css/style.css
```diff
···
     margin-top: 0.5rem;
 }
 
+/* Offline manifest badge */
+.offline-badge {
+    display: inline-block;
+    padding: 0.25rem 0.5rem;
+    background: var(--warning-bg);
+    color: var(--warning);
+    border: 1px solid var(--warning);
+    border-radius: 4px;
+    font-size: 0.85rem;
+    font-weight: 600;
+    margin-left: 0.5rem;
+}
+
+/* Checking manifest badge (health check in progress) */
+.checking-badge {
+    display: inline-block;
+    padding: 0.25rem 0.5rem;
+    background: #e3f2fd;
+    color: #1976d2;
+    border: 1px solid #1976d2;
+    border-radius: 4px;
+    font-size: 0.85rem;
+    font-weight: 600;
+    margin-left: 0.5rem;
+}
+
+/* Hide offline manifests by default */
+.manifest-item[data-reachable="false"] {
+    display: none;
+}
+
+/* Show offline manifests when toggle is checked */
+.manifests-list.show-offline .manifest-item[data-reachable="false"] {
+    display: block;
+    opacity: 0.6;
+}
+
+/* Show offline images toggle styling */
+.show-offline-toggle {
+    display: flex;
+    align-items: center;
+    gap: 0.5rem;
+    cursor: pointer;
+    user-select: none;
+}
+
+.show-offline-toggle input[type="checkbox"] {
+    cursor: pointer;
+}
+
+.show-offline-toggle span {
+    font-size: 0.9rem;
+    color: var(--border-dark);
+}
+
 .manifest-detail-label {
     font-weight: 500;
     color: var(--secondary);
```
+38
pkg/appview/static/js/app.js
```diff
···
         console.error('Error loading star count:', err);
     }
 }
+
+// Toggle offline manifests visibility
+function toggleOfflineManifests() {
+    const checkbox = document.getElementById('show-offline-toggle');
+    const manifestsList = document.querySelector('.manifests-list');
+
+    if (!checkbox || !manifestsList) return;
+
+    // Store preference in localStorage
+    localStorage.setItem('showOfflineManifests', checkbox.checked);
+
+    // Toggle visibility of offline manifests
+    if (checkbox.checked) {
+        manifestsList.classList.add('show-offline');
+    } else {
+        manifestsList.classList.remove('show-offline');
+    }
+}
+
+// Restore offline manifests toggle state on page load
+document.addEventListener('DOMContentLoaded', () => {
+    const checkbox = document.getElementById('show-offline-toggle');
+    if (!checkbox) return;
+
+    // Restore state from localStorage
+    const showOffline = localStorage.getItem('showOfflineManifests') === 'true';
+    checkbox.checked = showOffline;
+
+    // Apply initial state
+    const manifestsList = document.querySelector('.manifests-list');
+    if (manifestsList) {
+        if (showOffline) {
+            manifestsList.classList.add('show-offline');
+        } else {
+            manifestsList.classList.remove('show-offline');
+        }
+    }
+});
```
+4 -15
pkg/appview/storage/proxy_blob_store.go
```diff
···
 	"io"
 	"net/http"
 	"net/url"
-	"strings"
 	"sync"
 	"time"
 
+	"atcr.io/pkg/appview"
 	"atcr.io/pkg/atproto"
 	"github.com/distribution/distribution/v3"
 	"github.com/opencontainers/go-digest"
···
 	return nil
 }
 
-// resolveHoldURL converts a hold DID to an HTTP URL for XRPC requests
-// did:web:hold01.atcr.io → https://hold01.atcr.io
-// did:web:172.28.0.3:8080 → http://172.28.0.3:8080
+// resolveHoldURL converts a hold identifier (DID or URL) to an HTTP URL
+// Deprecated: Use appview.ResolveHoldURL instead
 func resolveHoldURL(holdDID string) string {
-	hostname := strings.TrimPrefix(holdDID, "did:web:")
-
-	// Use HTTP for localhost/IP addresses with ports, HTTPS for domains
-	if strings.Contains(hostname, ":") ||
-		strings.Contains(hostname, "127.0.0.1") ||
-		strings.Contains(hostname, "localhost") ||
-		// Check if it's an IP address (contains only digits and dots)
-		(len(hostname) > 0 && (hostname[0] >= '0' && hostname[0] <= '9')) {
-		return "http://" + hostname
-	}
-	return "https://" + hostname
+	return appview.ResolveHoldURL(holdDID)
 }
 
 // Stat returns the descriptor for a blob
```
+18 -2
pkg/appview/templates/pages/repository.html
```diff
···
 
     <!-- Manifests Section -->
     <div class="repo-section">
-        <h2>Manifests</h2>
+        <div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 1rem;">
+            <h2>Manifests</h2>
+            <label class="show-offline-toggle">
+                <input type="checkbox" id="show-offline-toggle" onchange="toggleOfflineManifests()">
+                <span>Show offline images</span>
+            </label>
+        </div>
         {{ if .Manifests }}
         <div class="manifests-list">
             {{ range .Manifests }}
-            <div class="manifest-item" id="manifest-{{ sanitizeID .Manifest.Digest }}">
+            <div class="manifest-item" id="manifest-{{ sanitizeID .Manifest.Digest }}" data-reachable="{{ .Reachable }}">
                 <div class="manifest-item-header">
                     <div>
                         {{ if .IsManifestList }}
                         <span class="manifest-type">📦 Multi-arch</span>
                         {{ else }}
                         <span class="manifest-type">📄 Image</span>
+                        {{ end }}
+                        {{ if .Pending }}
+                        <span class="checking-badge"
+                              hx-get="/api/manifest-health?endpoint={{ .Manifest.HoldEndpoint | urlquery }}"
+                              hx-trigger="load delay:2s"
+                              hx-swap="outerHTML">
+                            🔄 Checking...
+                        </span>
+                        {{ else if not .Reachable }}
+                        <span class="offline-badge">⚠️ Offline</span>
                         {{ end }}
                         <code class="manifest-digest">{{ .Manifest.Digest }}</code>
                     </div>
```
+33
pkg/appview/utils.go
```go
package appview

import "strings"

// ResolveHoldURL converts a hold identifier (DID or URL) to an HTTP/HTTPS URL
// Handles both formats for backward compatibility:
//   - DID format: did:web:hold01.atcr.io → https://hold01.atcr.io
//   - DID with port: did:web:172.28.0.3:8080 → http://172.28.0.3:8080
//   - URL format: https://hold.example.com → https://hold.example.com (passthrough)
func ResolveHoldURL(holdIdentifier string) string {
	// If it's already a URL (has scheme), return as-is
	if strings.HasPrefix(holdIdentifier, "http://") || strings.HasPrefix(holdIdentifier, "https://") {
		return holdIdentifier
	}

	// If it's a DID, convert to URL
	if strings.HasPrefix(holdIdentifier, "did:web:") {
		hostname := strings.TrimPrefix(holdIdentifier, "did:web:")

		// Use HTTP for localhost/IP addresses with ports, HTTPS for domains
		if strings.Contains(hostname, ":") ||
			strings.Contains(hostname, "127.0.0.1") ||
			strings.Contains(hostname, "localhost") ||
			// Heuristic: treat a hostname whose first character is a digit as an IP address
			(len(hostname) > 0 && hostname[0] >= '0' && hostname[0] <= '9') {
			return "http://" + hostname
		}
		return "https://" + hostname
	}

	// Fallback: assume it's a hostname and use HTTPS
	return "https://" + holdIdentifier
}
```
+61
pkg/appview/utils_test.go
```go
package appview

import "testing"

func TestResolveHoldURL(t *testing.T) {
	tests := []struct {
		name     string
		input    string
		expected string
	}{
		{
			name:     "DID with HTTPS domain",
			input:    "did:web:hold.example.com",
			expected: "https://hold.example.com",
		},
		{
			name:     "DID with HTTP and port (IP)",
			input:    "did:web:172.28.0.3:8080",
			expected: "http://172.28.0.3:8080",
		},
		{
			name:     "DID with HTTP and port (localhost)",
			input:    "did:web:127.0.0.1:8080",
			expected: "http://127.0.0.1:8080",
		},
		{
			name:     "DID with localhost",
			input:    "did:web:localhost:8080",
			expected: "http://localhost:8080",
		},
		{
			name:     "Already HTTPS URL (passthrough)",
			input:    "https://hold.example.com",
			expected: "https://hold.example.com",
		},
		{
			name:     "Already HTTP URL (passthrough)",
			input:    "http://172.28.0.3:8080",
			expected: "http://172.28.0.3:8080",
		},
		{
			name:     "Plain hostname (fallback to HTTPS)",
			input:    "hold.example.com",
			expected: "https://hold.example.com",
		},
		{
			name:     "DID with subdomain",
			input:    "did:web:hold01.atcr.io",
			expected: "https://hold01.atcr.io",
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			result := ResolveHoldURL(tt.input)
			if result != tt.expected {
				t.Errorf("ResolveHoldURL(%q) = %q, want %q", tt.input, result, tt.expected)
			}
		})
	}
}
```
+13 -13
pkg/atproto/profile_test.go
```diff
···
 // TestGetProfile tests retrieving a user's profile
 func TestGetProfile(t *testing.T) {
 	tests := []struct {
-		name string
-		serverResponse string
-		serverStatus int
-		wantProfile *SailorProfileRecord
-		wantNil bool
-		wantErr bool
-		expectMigration bool // Whether URL-to-DID migration should happen
-		originalHoldURL string
-		expectedHoldDID string
+		name            string
+		serverResponse  string
+		serverStatus    int
+		wantProfile     *SailorProfileRecord
+		wantNil         bool
+		wantErr         bool
+		expectMigration bool // Whether URL-to-DID migration should happen
+		originalHoldURL string
+		expectedHoldDID string
 	}{
 		{
 			name: "profile with DID (no migration needed)",
···
 // TestUpdateProfile tests updating a user's profile
 func TestUpdateProfile(t *testing.T) {
 	tests := []struct {
-		name string
-		profile *SailorProfileRecord
-		wantNormalized string // Expected defaultHold after normalization
-		wantErr bool
+		name           string
+		profile        *SailorProfileRecord
+		wantNormalized string // Expected defaultHold after normalization
+		wantErr        bool
 	}{
 		{
 			name: "update with DID",
```