# ATCR Troubleshooting Guide

A container registry that uses the AT Protocol for manifest storage and S3 for blob storage.

This document provides troubleshooting guidance for common ATCR deployment and operational issues.

## OAuth Authentication Failures

### JWT Timestamp Validation Errors

**Symptom:**
```
error: invalid_client
error_description: Validation of "client_assertion" failed: "iat" claim timestamp check failed (it should be in the past)
```

**Root Cause:**
The AppView server's system clock is ahead of the PDS server's clock. When the AppView generates a JWT for OAuth client authentication (confidential client mode), the "iat" (issued at) claim appears to be in the future from the PDS's perspective.

**Diagnosis:**

1. Check AppView system time:
```bash
date -u
timedatectl status
```

2. Check if NTP is active and synchronized:
```bash
timedatectl show-timesync --all
```

3. Compare AppView time with PDS time (if accessible):
```bash
# On AppView
date +%s

# On PDS (or via HTTP headers)
curl -I https://your-pds.example.com | grep -i date
```

4. Check AppView logs for clock information (logged at startup):
```bash
docker logs atcr-appview 2>&1 | grep "Configured confidential OAuth client"
```

Example log output:
```
level=INFO msg="Configured confidential OAuth client"
 key_id=did:key:z...
 system_time_unix=1763389815
 system_time_rfc3339=2025-11-17T14:30:15Z
 timezone=UTC
```

**Solution:**

1. **Enable NTP synchronization** (recommended):

   On most Linux systems using systemd:
   ```bash
   # Enable and start systemd-timesyncd
   sudo timedatectl set-ntp true

   # Verify NTP is active
   timedatectl status
   ```

   Expected output:
   ```
   System clock synchronized: yes
   NTP service: active
   ```

2. **Alternative: Use chrony** (if systemd-timesyncd is not available):
   ```bash
   # Install chrony
   sudo apt-get install chrony  # Debian/Ubuntu
   sudo yum install chrony      # RHEL/CentOS

   # Enable and start chronyd
   sudo systemctl enable chronyd
   sudo systemctl start chronyd

   # Check sync status
   chronyc tracking
   ```

3. **Force immediate sync**:
   ```bash
   # systemd-timesyncd
   sudo systemctl restart systemd-timesyncd

   # Or with chrony
   sudo chronyc makestep
   ```

4. **In Docker/Kubernetes environments:**

   The container inherits the host's system clock, so fix NTP on the **host** machine:
   ```bash
   # On Docker host
   sudo timedatectl set-ntp true

   # Restart AppView container to pick up correct time
   docker restart atcr-appview
   ```

5. **Verify clock skew is resolved**:
   ```bash
   # Should show clock offset < 1 second
   timedatectl timesync-status
   ```

**Acceptable Clock Skew:**
- Most OAuth implementations tolerate ±30-60 seconds of clock skew
- DPoP proof validation is typically stricter (±10 seconds)
- Aim for < 1 second skew for reliable operation

**Prevention:**
- Configure NTP synchronization in your infrastructure-as-code (Terraform, Ansible, etc.)
- Monitor clock skew in production (e.g., Prometheus node_exporter includes clock metrics)
- Use managed container platforms (ECS, GKE, AKS) that handle NTP automatically

---

### DPoP Nonce Mismatch Errors

**Symptom:**
```
error: use_dpop_nonce
error_description: DPoP "nonce" mismatch
```

This error may repeat several times, potentially followed by:
```
error: server_error
error_description: Server error
```

**Root Cause:**
DPoP (Demonstrating Proof-of-Possession) requires a server-provided nonce for replay protection. These errors typically occur when:
1. Multiple concurrent requests create a DPoP nonce race condition
2. Clock skew causes DPoP proof timestamps to fail validation
3. PDS session state becomes corrupted after repeated failures
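The nonce dance behind cause 1 looks roughly like this sketch, where the `server` type and `requestWithRetry` are illustrative stand-ins for the transport (the real client signs a DPoP proof per request):

```go
package main

import (
	"errors"
	"fmt"
)

// errUseDPoPNonce stands in for the "use_dpop_nonce" OAuth error.
var errUseDPoPNonce = errors.New("use_dpop_nonce")

// server rejects any request whose proof was not built with its current
// nonce, returning the nonce the client should use next.
type server struct{ nonce string }

func (s *server) send(proofNonce string) (string, error) {
	if proofNonce != s.nonce {
		return s.nonce, errUseDPoPNonce
	}
	return "", nil
}

// requestWithRetry retries once with the server-provided nonce, the same
// pattern the OAuth client library applies automatically.
func requestWithRetry(s *server, cachedNonce string) error {
	fresh, err := s.send(cachedNonce)
	if errors.Is(err, errUseDPoPNonce) {
		_, err = s.send(fresh) // rebuild the proof with the fresh nonce
	}
	return err
}

func main() {
	s := &server{nonce: "n2"}
	// A stale cached nonce triggers one use_dpop_nonce round trip,
	// then the retry with the fresh nonce succeeds.
	fmt.Println(requestWithRetry(s, "n1"))
}
```

The race arises when two concurrent requests both retry and each invalidates the other's "fresh" nonce, which is why serializing per identity (see below) helps.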

**Diagnosis:**

1. Check if errors occur during concurrent operations:
```bash
# During docker push with multiple layers
docker logs atcr-appview 2>&1 | grep "use_dpop_nonce" | wc -l
```

2. Check for clock skew (see the section above):
```bash
timedatectl status
```

3. Look for session lock acquisition in logs:
```bash
docker logs atcr-appview 2>&1 | grep "Acquired session lock"
```

**Solution:**

1. **If caused by clock skew**: Fix NTP synchronization (see the section above)

2. **If caused by session corruption**:
   ```bash
   # The AppView automatically deletes corrupted sessions;
   # the user just needs to re-authenticate
   docker login atcr.io
   ```

3. **If persistent despite clock sync**:
   - Check PDS health and logs (it may be a PDS-side issue)
   - Verify network connectivity between the AppView and the PDS
   - Check whether the PDS supports the latest OAuth/DPoP specifications

**What ATCR does automatically:**
- Per-DID locking prevents concurrent DPoP nonce races
- The indigo library automatically retries with fresh nonces
- Sessions are auto-deleted after repeated failures
- A service token cache prevents excessive PDS requests
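The per-DID locking idea can be sketched as a lazily populated map of mutexes (a minimal illustration, not ATCR's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// didLocks hands out one mutex per DID so that concurrent operations for
// the same identity (e.g. pushing several layers at once) serialize their
// token/nonce updates instead of racing.
type didLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (d *didLocks) lock(did string) *sync.Mutex {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.locks == nil {
		d.locks = make(map[string]*sync.Mutex)
	}
	m, ok := d.locks[did]
	if !ok {
		m = &sync.Mutex{}
		d.locks[did] = m
	}
	return m
}

func main() {
	var dl didLocks
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m := dl.lock("did:plc:example")
			m.Lock()
			counter++ // a nonce/session update would happen here, serialized
			m.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println(counter)
}
```

Requests for different DIDs still proceed in parallel; only operations on the same session contend for the same mutex.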

**Prevention:**
- Ensure reliable NTP synchronization
- Use a stable, well-maintained PDS implementation
- Monitor AppView error rates for DPoP-related issues

---

### OAuth Session Not Found

**Symptom:**
```
error: failed to get OAuth session: no session found for DID
```

**Root Cause:**
- The user has never authenticated via OAuth
- The OAuth session was deleted due to corruption or expiry
- A database migration cleared sessions

**Solution:**

1. The user re-authenticates via the OAuth flow:
   ```bash
   docker login atcr.io
   # Or for the web UI: visit https://atcr.io/login
   ```

2. If using app passwords (legacy), re-establish the cached token:
   ```bash
   docker logout atcr.io
   docker login atcr.io -u your.handle -p your-app-password
   ```

---

## AppView Deployment Issues

### Client Metadata URL Not Accessible

**Symptom:**
```
error: unauthorized_client
error_description: Client metadata endpoint returned 404
```

**Root Cause:**
The PDS cannot fetch OAuth client metadata from `{ATCR_BASE_URL}/client-metadata.json`.

**Diagnosis:**

1. Verify the client metadata endpoint is accessible:
   ```bash
   curl https://your-atcr-instance.com/client-metadata.json
   ```

2. Check AppView logs for startup errors:
   ```bash
   docker logs atcr-appview 2>&1 | grep "client-metadata"
   ```

3. Verify `ATCR_BASE_URL` is set correctly:
   ```bash
   echo $ATCR_BASE_URL
   ```

**Solution:**

1. Ensure `ATCR_BASE_URL` matches your public URL:
   ```bash
   export ATCR_BASE_URL=https://atcr.example.com
   ```

2. Verify the reverse proxy (nginx, Caddy, etc.) routes `/.well-known/*` and `/client-metadata.json`:
   ```nginx
   location / {
       proxy_pass http://localhost:5000;
       proxy_set_header Host $host;
       proxy_set_header X-Forwarded-Proto $scheme;
   }
   ```

3. Check that firewall rules allow inbound HTTPS:
   ```bash
   sudo ufw status
   sudo iptables -L -n | grep 443
   ```

---

## Hold Service Issues

### Blob Storage Connectivity

**Symptom:**
```
error: failed to upload blob: connection refused
```

**Diagnosis:**

1. Check the hold service logs:
   ```bash
   docker logs atcr-hold 2>&1 | grep -i error
   ```

2. Verify the S3 credentials are correct:
   ```bash
   # Test S3 access
   aws s3 ls s3://your-bucket --endpoint-url=$S3_ENDPOINT
   ```

3. Check the hold configuration:
   ```bash
   env | grep -E "(S3_|AWS_|STORAGE_)"
   ```

**Solution:**

1. Verify the environment variables in the hold service:
   ```bash
   export AWS_ACCESS_KEY_ID=your-key
   export AWS_SECRET_ACCESS_KEY=your-secret
   export S3_BUCKET=your-bucket
   export S3_ENDPOINT=https://s3.us-west-2.amazonaws.com
   ```

2. Test S3 connectivity from the hold container:
   ```bash
   docker exec atcr-hold curl -v $S3_ENDPOINT
   ```

3. Check the S3 bucket permissions (the service needs `s3:PutObject`, `s3:GetObject`, and `s3:DeleteObject`)
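For AWS S3, the permission set in step 3 corresponds to a minimal IAM policy along these lines (the bucket name and `Sid` are placeholders; note that the `aws s3 ls` diagnosis step additionally needs `s3:ListBucket` on the bucket itself):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ATCRHoldBlobAccess",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::your-bucket/*"
    }
  ]
}
```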

---

## Performance Issues

### High Database Lock Contention

**Symptom:**
Slow Docker push/pull operations and high CPU usage on the AppView.

**Diagnosis:**

1. Check the SQLite database size:
   ```bash
   ls -lh /var/lib/atcr/ui.db
   ```

2. Look for lock contention in the logs:
   ```bash
   docker logs atcr-appview 2>&1 | grep "database is locked"
   ```

**Solution:**

1. For production, migrate to PostgreSQL (recommended):
   ```bash
   export ATCR_UI_DATABASE_TYPE=postgres
   export ATCR_UI_DATABASE_URL=postgresql://user:pass@localhost/atcr
   ```

2. Or serialize SQLite access by limiting the pool to a single connection, which avoids most "database is locked" errors:
   ```go
   // For SQLite, restrict database/sql to one open connection
   db.SetMaxOpenConns(1)
   ```

3. Vacuum the database to reclaim space:
   ```bash
   sqlite3 /var/lib/atcr/ui.db "VACUUM;"
   ```

---

## Logging and Debugging

### Enable Debug Logging

Set the log level to debug for detailed troubleshooting:

```bash
export ATCR_LOG_LEVEL=debug
docker restart atcr-appview
```

### Useful Log Queries

**OAuth token exchange errors:**
```bash
docker logs atcr-appview 2>&1 | grep "OAuth callback failed"
```

**Service token request failures:**
```bash
docker logs atcr-appview 2>&1 | grep "OAuth authentication failed during service token request"
```

**Clock diagnostics:**
```bash
docker logs atcr-appview 2>&1 | grep "system_time"
```

**DPoP nonce issues:**
```bash
docker logs atcr-appview 2>&1 | grep -E "(use_dpop_nonce|DPoP)"
```

### Health Checks

**AppView health:**
```bash
curl http://localhost:5000/v2/
# Should return: {"errors":[{"code":"UNAUTHORIZED",...}]}
```

**Hold service health:**
```bash
curl http://localhost:8080/.well-known/did.json
# Should return DID document
```

---

## Getting Help

If issues persist after following this guide:

1. **Check GitHub Issues**: https://github.com/ericvolp12/atcr/issues
2. **Collect logs**: Include output from `docker logs` for the AppView and Hold services
3. **Include diagnostics**:
   - `timedatectl status` output
   - AppView version: `docker exec atcr-appview cat /VERSION` (if available)
   - PDS version and implementation (Bluesky PDS or other)
4. **File an issue** with reproducible steps

---

## Common Error Reference

| Error Code | Component | Common Cause | Fix |
|------------|-----------|--------------|-----|
| `invalid_client` (iat timestamp) | OAuth | Clock skew | Enable NTP sync |
| `use_dpop_nonce` | OAuth/DPoP | Concurrent requests or clock skew | Fix NTP, wait for auto-retry |
| `server_error` (500) | PDS | PDS internal error | Check PDS logs |
| `invalid_grant` | OAuth | Expired auth code | Retry OAuth flow |
| `unauthorized_client` | OAuth | Client metadata unreachable | Check `ATCR_BASE_URL` and firewall |
| `RecordNotFound` | ATProto | Manifest doesn't exist | Verify the repository name |
| Connection refused | Hold/S3 | Network/credentials | Check S3 config and connectivity |
433| Connection refused | Hold/S3 | Network/credentials | Check S3 config and connectivity |