# Embedded PDS Architecture for Hold Services

This document describes ATCR's hold-service architecture, which uses an embedded ATProto PDS (Personal Data Server) for access control and federation.
## Motivation

### The Fragmentation Problem
Several ATProto projects face similar challenges with large data storage:
| Project | Large Data | Metadata | Solution |
|---|---|---|---|
| tangled.org | Git objects | Issues, PRs, comments | External knot storage |
| stream.place | Video segments | Stream info, chat | Embedded "static PDS" |
| ATCR | Container blobs | Manifests, comments, builds | Embedded PDS in hold service |
Common problem: Large binary data can't realistically live in user PDSs, but application metadata needs a federated home.
ATCR's approach: Each hold service is a full ATProto actor with its own embedded PDS for shared data (captain + crew records, not user-specific data). This PDS stores access control and metadata about the hold itself.
## Current Architecture

### Hold Service Components
```
Hold Service (did:web:hold01.atcr.io)
├── Embedded PDS (SQLite carstore) - Shared data only
│   ├── Captain record (ownership metadata)
│   ├── Crew records (access control)
│   └── ATProto sync/repo endpoints
├── OCI multipart upload (XRPC)
│   ├── io.atcr.hold.initiateUpload
│   ├── io.atcr.hold.getPartUploadUrl
│   ├── io.atcr.hold.uploadPart
│   ├── io.atcr.hold.completeUpload
│   └── io.atcr.hold.abortUpload
└── Storage driver (S3, filesystem, etc.)
```
Important distinction:
- Hold's embedded PDS = Shared data (crew members, hold configuration)
- User's PDS = User-specific data (manifests, sailor profile, personal records)
- Hold's PDS does NOT store user-specific container data (that stays in user's own PDS)
### Records Structure
Captain record (hold ownership; a single record at `io.atcr.hold.captain/self`):

```json
{
  "$type": "io.atcr.hold.captain",
  "owner": "did:plc:alice123",
  "public": false,
  "deployedAt": "2025-10-14T...",
  "region": "iad",
  "provider": "fly.io"
}
```
Crew records (access control; one record per member at `io.atcr.hold.crew/{rkey}`):

```json
{
  "$type": "io.atcr.hold.crew",
  "member": "did:plc:bob456",
  "role": "admin",
  "permissions": ["blob:read", "blob:write"],
  "addedAt": "2025-10-14T..."
}
```
### ATProto PDS Endpoints

Standard ATProto sync endpoints:

- `GET /xrpc/com.atproto.sync.getRepo` - Download the repository as a CAR file
- `GET /xrpc/com.atproto.sync.getBlob` - Get a blob or a presigned download URL
- `GET /xrpc/com.atproto.sync.subscribeRepos` - Real-time crew changes
- `GET /xrpc/com.atproto.sync.listRepos` - List repositories

Repository management:

- `GET /xrpc/com.atproto.repo.describeRepo` - Repository metadata
- `GET /xrpc/com.atproto.repo.getRecord` - Get a specific record (captain/crew)
- `GET /xrpc/com.atproto.repo.listRecords` - List crew members
- `POST /xrpc/io.atcr.hold.requestCrew` - Request crew membership

DID resolution:

- `GET /.well-known/did.json` - DID document (did:web resolution)
- `GET /.well-known/atproto-did` - DID for handle resolution
### OCI Multipart Upload Flow

1. AppView gets a service token from the user's PDS:

   ```
   GET /xrpc/com.atproto.server.getServiceAuth?aud={holdDID}
   Response: { "token": "eyJ..." }
   ```

2. AppView initiates the multipart upload:

   ```
   POST /xrpc/io.atcr.hold.initiateUpload
   Authorization: Bearer {serviceToken}
   Body: { "digest": "sha256:abc..." }
   Response: { "uploadId": "xyz" }
   ```

3. For each part, request a presigned URL:

   ```
   POST /xrpc/io.atcr.hold.getPartUploadUrl
   Body: { "uploadId": "xyz", "partNumber": 1 }
   Response: { "url": "https://s3.../presigned" }
   ```

4. Upload the part to the S3 presigned URL:

   ```
   PUT {presignedURL}
   Body: [part data]
   ```

5. Complete the upload:

   ```
   POST /xrpc/io.atcr.hold.completeUpload
   Body: { "uploadId": "xyz", "digest": "sha256:abc...", "parts": [...] }
   ```
## Implementation Details

### Storage: Indigo Carstore with SQLite
```go
type HoldPDS struct {
	did      string
	carstore carstore.CarStore
	session  *carstore.DeltaSession // Provides blockstore interface
	repo     *repo.Repo
	dbPath   string
	uid      models.Uid // User ID for carstore (fixed: 1)
}
```
Storage location: a single SQLite file (`/var/lib/atcr-hold/hold.db`)
- Contains MST nodes, records, commits in carstore tables
- Handles compaction/cleanup automatically
- Migration path to Postgres if needed (same carstore API)
### Key Implementation Lessons

#### 1. Custom Record Types Need Manual CBOR Decoding
```go
// ❌ WRONG - Fails with "unrecognized lexicon type"
record, err := repo.GetRecord(ctx, path, &CrewRecord{})

// ✅ CORRECT - Manual CBOR decoding
recordCID, recBytes, err := repo.GetRecordBytes(ctx, path)
var crewRecord CrewRecord
err = crewRecord.UnmarshalCBOR(bytes.NewReader(*recBytes))
```
Indigo's lexicon system doesn't know about custom types like `io.atcr.hold.crew`.
#### 2. JSON and CBOR Struct Tags Must Match
```go
// ✅ CORRECT - JSON tags match CBOR tags
type CrewRecord struct {
	Type        string   `json:"$type" cborgen:"$type"`
	Member      string   `json:"member" cborgen:"member"`
	Role        string   `json:"role" cborgen:"role"`
	Permissions []string `json:"permissions" cborgen:"permissions"`
	AddedAt     string   `json:"addedAt" cborgen:"addedAt"`
}
```
CID verification requires identical bytes from JSON and CBOR encodings.
#### 3. MST ForEach Returns Full Paths
```go
// ✅ CORRECT - Extract just the rkey
err := repo.ForEach(ctx, "io.atcr.hold.crew", func(k string, v cid.Cid) error {
	// k = "io.atcr.hold.crew/3m37dr2ddit22"
	parts := strings.Split(k, "/")
	rkey := parts[len(parts)-1] // "3m37dr2ddit22"
	return nil
})
```
#### 4. CAR Files Must Include Full MST Path

For `com.atproto.sync.getRecord`, return a CAR file containing:

- Commit block - the repo head with its signature
- MST tree nodes - the path from the root to the record
- Record block - the actual record data

Use `util.NewLoggingBstore()` to capture all accessed blocks.
## IAM Challenges

### Current Implementation: Service Tokens
AppView uses `com.atproto.server.getServiceAuth` to get tokens for calling holds:

```
// AppView requests a service token from the user's PDS
GET /xrpc/com.atproto.server.getServiceAuth?aud={holdDID}&lxm=com.atproto.repo.getRecord

// PDS returns a short-lived token (60 seconds)
{ "token": "eyJ..." }

// AppView uses the token to authenticate to the hold
Authorization: Bearer eyJ...
```
### Known Issues

#### 1. RPC Permission Format with IP Addresses
Problem: Service-token RPC permissions don't work with IP addresses in the audience (`aud`) field:

```
Error: RPC permission format invalid
Permission: rpc:com.atproto.repo.getRecord?aud=172.28.0.3:8080#atcr_hold
Issue: IP address with port not supported in aud field
```
Impact: Local development with IP-based hold DIDs (e.g., `did:web:172.28.0.3:8080`) fails.

Workaround: Fall back to unauthenticated requests (works for public holds only), or use hostname-based DIDs.
#### 2. Dynamic Hold Discovery Limitation
Problem: AppView can only OAuth a user's default hold (configured in AppView), not holds dynamically discovered from sailor profiles.

Current limitation:

- User sets `defaultHold = "did:web:alice-storage.fly.dev"` in their sailor profile
- AppView discovers the hold DID when the user pushes
- AppView tries to get a service token for alice's hold from the user's PDS
- BUT: the user never OAuth'd through alice's hold, only through AppView's default hold
- Result: no service token is available, so AppView can't authenticate to alice's hold

Why this matters:

- Users can't seamlessly use BYOS (Bring Your Own Storage)
- Hold references in sailor profiles are non-functional
- This limits portability and decentralization goals
#### 3. Trust Model: "Trust but Verify"
Current approach:
- User OAuth's to AppView (credential helper flow)
- Hold has crew member record for user (authorization)
- AppView requests service token from user's PDS (proof)
- Hold validates service token from user's PDS (verification)
Philosophy: "Trust but verify"

- IF the user OAuth'd to AppView AND the hold has a crew record for the user → generally trust
- BUT we don't want AppView to be able to lie → we need proof from the user's PDS that it's actually the user
- Service tokens provide this proof (the user's PDS says "yes, I authorized this")
Challenge: Service tokens work for this model, but scope/permission format issues (see #1, #2) make it fragile in practice.
## Potential Solutions

### Option A: Direct User-to-Hold Authentication
Users authenticate directly to holds (bypassing AppView service tokens).
Pros:
- ✅ Clear trust model (user ↔ hold)
- ✅ Works with any hold (BYOS friendly)
- ✅ No OAuth scope issues
Cons:
- ❌ Multiple OAuth flows (user's PDS + each hold)
- ❌ Complex credential management
- ❌ Poor UX (authenticate to each hold separately)
### Option B: AppView as OAuth Client
AppView pre-registers with holds and uses its own credentials (not user's).
Pros:
- ✅ No OAuth scope issues
- ✅ Single OAuth flow for user
- ✅ Simpler credential management
Cons:
- ❌ Holds must trust AppView (centralization)
- ❌ Doesn't work for unknown holds
- ❌ Requires registration process
### Option C: Public Hold API
Simplify by making holds public for reads, auth only for writes.
Pros:
- ✅ No OAuth complexity for reads
- ✅ Works offline (no PDS dependency)
Cons:
- ❌ Private holds still need auth
- ❌ Not standard ATProto pattern
### Option D: Hybrid Service Token + API Key
Use service tokens when available, fall back to API keys for BYOS holds.
Pros:
- ✅ Optimal for default holds
- ✅ BYOS works with API keys
- ✅ Backward compatible
Cons:
- ❌ Two auth mechanisms
- ❌ Not pure ATProto
### Recommended Approach
Short-term (MVP):
- Public holds (no auth needed for reads)
- Default hold with service tokens (AppView-managed)
- Document BYOS limitation
Medium-term:
- Hybrid approach (service tokens + API key fallback)
- Clear security model for hold operators
Long-term:
- Explore direct user-to-hold OAuth
- Credential helper manages multiple hold sessions
- Auto-discover and authenticate to new holds
## Understanding getServiceAuth

Purpose: `com.atproto.server.getServiceAuth` issues a JWT that gives a service access to specific functions in the user's PDS. It's a temporary, scoped grant to a service outside of what the client originally OAuth'd to.
How ATCR uses it:

- User OAuth's to AppView (gets broad access to their account)
- AppView needs to prove to the hold that the user authorized it
- AppView calls the user's PDS: "give me a token scoped for this hold"
- The user's PDS issues a service token with a narrow scope (e.g., `rpc:com.atproto.repo.getRecord?aud={holdDID}`)
- AppView presents this token to the hold as proof
Industry usage:

- `getServiceAuth` appears to be the intended pattern for inter-service auth
- Not widely used yet (the ATProto ecosystem is young)
- Most apps use the `transition:generic` scope for everything (too broad, not ideal)
- RPC permission scopes are finicky and not well documented
## Open Questions

- RPC permission format: Can the `aud` field in RPC permissions support IP addresses? Is this a spec limitation or an implementation bug?
- Scope granularity: What's the right balance between `transition:generic` (too broad) and fine-grained RPC scopes (finicky)?
- Dynamic discovery + auth: How should AppView authenticate to arbitrary holds discovered from sailor profiles without pre-registration?
- Service token caching: Should service tokens be cached across multiple requests? Current: a 50-second cache; is this optimal?
## References
- Stream.place embedded PDS: https://streamplace.leaflet.pub/3lut7mgni5s2k/l-quote/6_318-6_554#6
- ATProto OAuth spec: https://atproto.com/specs/oauth
- ATProto XRPC spec: https://atproto.com/specs/xrpc
- ATProto Service Auth: https://docs.bsky.app/docs/api/com-atproto-server-get-service-auth
- CID spec: https://github.com/multiformats/cid
- OCI Distribution Spec: https://github.com/opencontainers/distribution-spec