# Embedded PDS Architecture for Hold Services

This document describes ATCR's hold service architecture using embedded ATProto PDS (Personal Data Server) for access control and federation.

## Motivation

### The Fragmentation Problem

Several ATProto projects face similar challenges with large data storage:

| Project | Large Data | Metadata | Solution |
|---------|-----------|----------|----------|
| **tangled.org** | Git objects | Issues, PRs, comments | External knot storage |
| **stream.place** | Video segments | Stream info, chat | Embedded "static PDS" |
| **ATCR** | Container blobs | Manifests, comments, builds | Embedded PDS in hold service |

**Common problem:** Large binary data can't realistically live in user PDSs, but application metadata needs a federated home.

**ATCR's approach:** Each hold service is a full ATProto actor with its own embedded PDS for **shared data** (captain and crew records, not user-specific data). This PDS stores access control and metadata about the hold itself.

## Current Architecture

### Hold Service Components

```
Hold Service (did:web:hold01.atcr.io)
├── Embedded PDS (SQLite carstore) - Shared data only
│   ├── Captain record (ownership metadata)
│   ├── Crew records (access control)
│   └── ATProto sync/repo endpoints
├── OCI multipart upload (XRPC)
│   ├── io.atcr.hold.initiateUpload
│   ├── io.atcr.hold.getPartUploadUrl
│   ├── io.atcr.hold.uploadPart
│   ├── io.atcr.hold.completeUpload
│   └── io.atcr.hold.abortUpload
└── Storage driver (S3, filesystem, etc.)
```

**Important distinction:**

- **Hold's embedded PDS** = shared data (crew members, hold configuration)
- **User's PDS** = user-specific data (manifests, sailor profile, personal records)
- The hold's PDS does NOT store user-specific container data (that stays in the user's own PDS)

### Records Structure

**Captain record** (hold ownership, single record at `io.atcr.hold.captain/self`):

```json
{
  "$type": "io.atcr.hold.captain",
  "owner": "did:plc:alice123",
  "public": false,
  "deployedAt": "2025-10-14T...",
  "region": "iad",
  "provider": "fly.io"
}
```

**Crew records** (access control, one per member at `io.atcr.hold.crew/{rkey}`):

```json
{
  "$type": "io.atcr.hold.crew",
  "member": "did:plc:bob456",
  "role": "admin",
  "permissions": ["blob:read", "blob:write"],
  "addedAt": "2025-10-14T..."
}
```

### ATProto PDS Endpoints

Standard ATProto sync endpoints:

- `GET /xrpc/com.atproto.sync.getRepo` - Download the repository as a CAR file
- `GET /xrpc/com.atproto.sync.getBlob` - Get a blob or a presigned download URL
- `GET /xrpc/com.atproto.sync.subscribeRepos` - Real-time crew changes
- `GET /xrpc/com.atproto.sync.listRepos` - List repositories

Repository management:

- `GET /xrpc/com.atproto.repo.describeRepo` - Repository metadata
- `GET /xrpc/com.atproto.repo.getRecord` - Get a specific record (captain/crew)
- `GET /xrpc/com.atproto.repo.listRecords` - List crew members
- `POST /xrpc/io.atcr.hold.requestCrew` - Request crew membership

DID resolution:

- `GET /.well-known/did.json` - DID document (did:web resolution)
- `GET /.well-known/atproto-did` - DID for handle resolution

### OCI Multipart Upload Flow

```
1. AppView gets a service token from the user's PDS:
   GET /xrpc/com.atproto.server.getServiceAuth?aud={holdDID}
   Response: { "token": "eyJ..." }

2. AppView initiates the multipart upload:
   POST /xrpc/io.atcr.hold.initiateUpload
   Authorization: Bearer {serviceToken}
   Body: { "digest": "sha256:abc..." }
   Response: { "uploadId": "xyz" }

3. For each part:
   POST /xrpc/io.atcr.hold.getPartUploadUrl
   Body: { "uploadId": "xyz", "partNumber": 1 }
   Response: { "url": "https://s3.../presigned" }

4. Upload the part to the S3 presigned URL:
   PUT {presignedURL}
   Body: [part data]

5. Complete the upload:
   POST /xrpc/io.atcr.hold.completeUpload
   Body: { "uploadId": "xyz", "digest": "sha256:abc...", "parts": [...] }
```

## Implementation Details

### Storage: Indigo Carstore with SQLite

```go
type HoldPDS struct {
	did      string
	carstore carstore.CarStore
	session  *carstore.DeltaSession // Provides the blockstore interface
	repo     *repo.Repo
	dbPath   string
	uid      models.Uid // User ID for the carstore (fixed: 1)
}
```

**Storage location:** a single SQLite file (`/var/lib/atcr-hold/hold.db`)

- Contains MST nodes, records, and commits in carstore tables
- Handles compaction/cleanup automatically
- Migration path to Postgres if needed (same carstore API)

### Key Implementation Lessons

#### 1. Custom Record Types Need Manual CBOR Decoding

```go
// ❌ WRONG - Fails with "unrecognized lexicon type"
record, err := repo.GetRecord(ctx, path, &CrewRecord{})

// ✅ CORRECT - Manual CBOR decoding
recordCID, recBytes, err := repo.GetRecordBytes(ctx, path)
var crewRecord CrewRecord
err = crewRecord.UnmarshalCBOR(bytes.NewReader(*recBytes))
```

Indigo's lexicon system doesn't know about custom types like `io.atcr.hold.crew`.

#### 2. JSON and CBOR Struct Tags Must Match

```go
// ✅ CORRECT - JSON tags match CBOR tags
type CrewRecord struct {
	Type        string   `json:"$type" cborgen:"$type"`
	Member      string   `json:"member" cborgen:"member"`
	Role        string   `json:"role" cborgen:"role"`
	Permissions []string `json:"permissions" cborgen:"permissions"`
	AddedAt     string   `json:"addedAt" cborgen:"addedAt"`
}
```

CID verification requires identical bytes from the JSON and CBOR encodings.

#### 3. MST ForEach Returns Full Paths

```go
// ✅ CORRECT - Extract just the rkey
err := repo.ForEach(ctx, "io.atcr.hold.crew", func(k string, v cid.Cid) error {
	// k = "io.atcr.hold.crew/3m37dr2ddit22"
	parts := strings.Split(k, "/")
	rkey := parts[len(parts)-1] // "3m37dr2ddit22"
	return nil
})
```

#### 4. CAR Files Must Include the Full MST Path

For `com.atproto.sync.getRecord`, return a CAR file containing:

1. **Commit block** - Repo head with signature
2. **MST tree nodes** - Path from the root to the record
3. **Record block** - The actual record data

Use `util.NewLoggingBstore()` to capture all accessed blocks.

## IAM Challenges

### Current Implementation: Service Tokens

The AppView uses `com.atproto.server.getServiceAuth` to get tokens for calling holds:

```
// AppView requests a service token from the user's PDS
GET /xrpc/com.atproto.server.getServiceAuth?aud={holdDID}&lxm=com.atproto.repo.getRecord

// PDS returns a short-lived token (60 seconds)
{ "token": "eyJ..." }

// AppView uses the token to authenticate to the hold
Authorization: Bearer eyJ...
```

### Known Issues

#### 1. RPC Permission Format with IP Addresses

**Problem:** Service token RPC permissions don't work with IP addresses in the audience (`aud`) field:

```
Error: RPC permission format invalid
Permission: rpc:com.atproto.repo.getRecord?aud=172.28.0.3:8080#atcr_hold
Issue: IP address with port not supported in aud field
```

**Impact:** Local development with IP-based hold DIDs (e.g., `did:web:172.28.0.3:8080`) fails.

**Workaround:** Fall back to unauthenticated requests (works for public holds only), or use hostname-based DIDs.

#### 2. Dynamic Hold Discovery Limitation

**Problem:** The AppView can only OAuth a user's default hold (configured in the AppView), not dynamically discovered holds from sailor profiles.
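A minimal sketch of this failure mode, assuming a hypothetical `serviceTokenFor` helper over an in-memory view of which audiences the user's PDS will mint tokens for (neither the helper nor the map is a real indigo or ATCR API):

```go
package main

import (
	"errors"
	"fmt"
)

// serviceTokenFor models the AppView's attempt to obtain a service token for
// a given hold DID. In practice only the AppView's configured default hold
// has an OAuth session behind it, so only that audience yields a token.
func serviceTokenFor(authorizedAuds map[string]string, holdDID string) (string, error) {
	tok, ok := authorizedAuds[holdDID]
	if !ok {
		return "", errors.New("no service token: user never OAuth'd for audience " + holdDID)
	}
	return tok, nil
}

func main() {
	auds := map[string]string{
		"did:web:hold01.atcr.io": "eyJ...", // default hold, configured in the AppView
	}

	// Default hold: a token is available.
	if tok, err := serviceTokenFor(auds, "did:web:hold01.atcr.io"); err == nil {
		fmt.Println("default hold ok:", tok)
	}

	// Hold discovered dynamically from a sailor profile: no session, no token.
	if _, err := serviceTokenFor(auds, "did:web:alice-storage.fly.dev"); err != nil {
		fmt.Println("BYOS hold fails:", err)
	}
}
```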
**Current limitation:**

- User sets `defaultHold = "did:web:alice-storage.fly.dev"` in their sailor profile
- AppView discovers the hold DID when the user pushes
- AppView tries to get a service token for alice's hold from the user's PDS
- BUT: the user never OAuth'd through alice's hold, only through the AppView's default hold
- Result: no service token is available, so the AppView can't authenticate to alice's hold

**Why this matters:**

- Users can't seamlessly use BYOS (Bring Your Own Storage)
- Hold references in sailor profiles are non-functional
- Limits portability and decentralization goals

#### 3. Trust Model: "Trust but Verify"

**Current approach:**

1. User OAuth's to the AppView (credential helper flow)
2. Hold has a crew member record for the user (authorization)
3. AppView requests a service token from the user's PDS (proof)
4. Hold validates the service token from the user's PDS (verification)

**Philosophy:** "Trust but verify"

- IF the user OAuth'd to the AppView AND the hold has a crew member record for the user → generally trust
- BUT we don't want the AppView to be able to lie → need proof from the user's PDS that it's actually them
- Service tokens provide this proof (the user's PDS says "yes, I authorized this")

**Challenge:** Service tokens work for this model, but the scope/permission format issues (see #1, #2) make them fragile in practice.

### Potential Solutions

#### Option A: Direct User-to-Hold Authentication

Users authenticate directly to holds (bypassing AppView service tokens).

**Pros:**

- ✅ Clear trust model (user ↔ hold)
- ✅ Works with any hold (BYOS friendly)
- ✅ No OAuth scope issues

**Cons:**

- ❌ Multiple OAuth flows (user's PDS + each hold)
- ❌ Complex credential management
- ❌ Poor UX (authenticate to each hold separately)

#### Option B: AppView as OAuth Client

AppView pre-registers with holds and uses its own credentials (not the user's).
**Pros:**

- ✅ No OAuth scope issues
- ✅ Single OAuth flow for the user
- ✅ Simpler credential management

**Cons:**

- ❌ Holds must trust the AppView (centralization)
- ❌ Doesn't work for unknown holds
- ❌ Requires a registration process

#### Option C: Public Hold API

Simplify by making holds public for reads, with auth required only for writes.

**Pros:**

- ✅ No OAuth complexity for reads
- ✅ Works offline (no PDS dependency)

**Cons:**

- ❌ Private holds still need auth
- ❌ Not a standard ATProto pattern

#### Option D: Hybrid Service Token + API Key

Use service tokens when available; fall back to API keys for BYOS holds.

**Pros:**

- ✅ Optimal for default holds
- ✅ BYOS works with API keys
- ✅ Backward compatible

**Cons:**

- ❌ Two auth mechanisms
- ❌ Not pure ATProto

### Recommended Approach

**Short-term (MVP):**

1. Public holds (no auth needed for reads)
2. Default hold with service tokens (AppView-managed)
3. Document the BYOS limitation

**Medium-term:**

1. Hybrid approach (service tokens + API key fallback)
2. Clear security model for hold operators

**Long-term:**

1. Explore direct user-to-hold OAuth
2. Credential helper manages multiple hold sessions
3. Auto-discover and authenticate to new holds

### Understanding getServiceAuth

**Purpose:** `com.atproto.server.getServiceAuth` issues a JWT that grants a service access to specific functions in the user's PDS. It's a **temporary grant to a service outside of what you OAuth'd to**.
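At the HTTP level this is a single XRPC GET against the user's PDS. A minimal sketch of constructing that request URL with the `aud` and `lxm` parameters used elsewhere in this document (the `buildServiceAuthURL` helper name and `pds.example.com` host are ours, for illustration):

```go
package main

import (
	"fmt"
	"net/url"
)

// buildServiceAuthURL builds the getServiceAuth query the AppView sends to
// the user's PDS: aud is the hold's DID, lxm optionally restricts the token
// to a single XRPC method.
func buildServiceAuthURL(pdsHost, aud, lxm string) string {
	q := url.Values{}
	q.Set("aud", aud)
	if lxm != "" {
		q.Set("lxm", lxm)
	}
	return fmt.Sprintf("https://%s/xrpc/com.atproto.server.getServiceAuth?%s", pdsHost, q.Encode())
}

func main() {
	u := buildServiceAuthURL("pds.example.com", "did:web:hold01.atcr.io", "com.atproto.repo.getRecord")
	fmt.Println(u)
	// → https://pds.example.com/xrpc/com.atproto.server.getServiceAuth?aud=did%3Aweb%3Ahold01.atcr.io&lxm=com.atproto.repo.getRecord
}
```

The response body (`{ "token": "eyJ..." }`) is then forwarded to the hold as a `Authorization: Bearer` header, as shown in the flow above.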
**How ATCR uses it:**

- User OAuth's to the AppView (which gets broad access to their account)
- The AppView needs to prove to the hold that the user authorized it
- The AppView calls the user's PDS: "give me a token scoped for this hold"
- The user's PDS issues a service token with a narrow scope (e.g., `rpc:com.atproto.repo.getRecord?aud={holdDID}`)
- The AppView presents this token to the hold as proof

**Industry usage:**

- `getServiceAuth` appears to be the intended pattern for inter-service auth
- Not widely used yet (the ATProto ecosystem is young)
- Most apps use the `transition:generic` scope for everything (too broad, not ideal)
- RPC permission scopes are finicky and not well documented

### Open Questions

1. **RPC permission format:** Can the `aud` field in RPC permissions support IP addresses? Is this a spec limitation or an implementation bug?
2. **Scope granularity:** What's the right balance between `transition:generic` (too broad) and fine-grained RPC scopes (finicky)?
3. **Dynamic discovery + auth:** How should the AppView authenticate to arbitrary holds discovered from sailor profiles without pre-registration?
4. **Service token caching:** Should service tokens be cached across multiple requests? The current cache is 50 seconds; is that optimal?

## References

- **Stream.place embedded PDS:** https://streamplace.leaflet.pub/3lut7mgni5s2k/l-quote/6_318-6_554#6
- **ATProto OAuth spec:** https://atproto.com/specs/oauth
- **ATProto XRPC spec:** https://atproto.com/specs/xrpc
- **ATProto Service Auth:** https://docs.bsky.app/docs/api/com-atproto-server-get-service-auth
- **CID spec:** https://github.com/multiformats/cid
- **OCI Distribution Spec:** https://github.com/opencontainers/distribution-spec