# ATCR Troubleshooting Guide

ATCR is a container registry that uses the AT Protocol for manifest storage and S3 for blob storage. This document provides troubleshooting guidance for common ATCR deployment and operational issues.

## OAuth Authentication Failures

### JWT Timestamp Validation Errors

**Symptom:**
```
error: invalid_client
error_description: Validation of "client_assertion" failed: "iat" claim timestamp check failed (it should be in the past)
```

**Root Cause:**
The AppView server's system clock is ahead of the PDS server's clock. When the AppView generates a JWT for OAuth client authentication (confidential client mode), the "iat" (issued-at) claim appears to be in the future from the PDS's perspective.

**Diagnosis:**

1. Check the AppView system time:
```bash
date -u
timedatectl status
```

2. Check if NTP is active and synchronized:
```bash
timedatectl show-timesync --all
```

3. Compare the AppView time with the PDS time, if accessible (a Go version of this check appears after this list):
```bash
# On AppView
date +%s

# On PDS (or via HTTP headers)
curl -I https://your-pds.example.com | grep -i date
```

4. Check AppView logs for clock information (logged at startup):
```bash
docker logs atcr-appview 2>&1 | grep "Configured confidential OAuth client"
```

Example log output:
```
level=INFO msg="Configured confidential OAuth client"
  key_id=did:key:z...
  system_time_unix=1763389815
  system_time_rfc3339=2025-11-17T14:30:15Z
  timezone=UTC
```
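For a scripted version of step 3, the sketch below estimates skew from the PDS's HTTP `Date` header. It assumes the PDS (or its reverse proxy) sets that header and, like the reference PDS, serves `GET /xrpc/_health`; the hostname is a placeholder. One-second header resolution makes this a coarse check, not a replacement for NTP:

```go
// skewcheck.go: rough clock-skew estimate against a PDS.
// Assumptions: the PDS (or its proxy) sets a standard HTTP Date header,
// and the URL below is a placeholder for your PDS.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	resp, err := http.Get("https://your-pds.example.com/xrpc/_health")
	if err != nil {
		panic(err)
	}
	resp.Body.Close()

	// Date headers have one-second resolution, so this is a coarse
	// estimate only; use NTP tooling for precise measurement.
	pdsTime, err := http.ParseTime(resp.Header.Get("Date"))
	if err != nil {
		panic(err)
	}

	// Positive means the local (AppView) clock is ahead of the PDS,
	// which is exactly the condition that makes "iat" validation fail.
	fmt.Printf("approximate skew vs PDS: %s\n", time.Since(pdsTime).Round(time.Second))
}
```

A persistently positive offset of more than a few seconds reproduces the `iat` failure above.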
**Solution:**

1. **Enable NTP synchronization** (recommended):

   On most Linux systems using systemd:
   ```bash
   # Enable and start systemd-timesyncd
   sudo timedatectl set-ntp true

   # Verify NTP is active
   timedatectl status
   ```

   Expected output:
   ```
   System clock synchronized: yes
   NTP service: active
   ```

2. **Alternative: Use chrony** (if systemd-timesyncd is not available):
   ```bash
   # Install chrony
   sudo apt-get install chrony   # Debian/Ubuntu
   sudo yum install chrony       # RHEL/CentOS

   # Enable and start chronyd
   sudo systemctl enable chronyd
   sudo systemctl start chronyd

   # Check sync status
   chronyc tracking
   ```

3. **Force immediate sync**:
   ```bash
   # systemd-timesyncd
   sudo systemctl restart systemd-timesyncd

   # Or with chrony
   sudo chronyc makestep
   ```

4. **In Docker/Kubernetes environments:**

   The container inherits the host's system clock, so fix NTP on the **host** machine:
   ```bash
   # On Docker host
   sudo timedatectl set-ntp true

   # Restart AppView container to pick up the correct time
   docker restart atcr-appview
   ```

5. **Verify clock skew is resolved**:
   ```bash
   # Should show clock offset < 1 second
   timedatectl timesync-status
   ```

**Acceptable Clock Skew:**
- Most OAuth implementations tolerate ±30-60 seconds of clock skew
- DPoP proof validation is typically stricter (±10 seconds)
- Aim for < 1 second of skew for reliable operation

**Prevention:**
- Configure NTP synchronization in your infrastructure-as-code (Terraform, Ansible, etc.)
- Monitor clock skew in production (e.g., Prometheus node_exporter includes clock metrics)
- Use managed container platforms (ECS, GKE, AKS) that handle NTP automatically

---

### DPoP Nonce Mismatch Errors

**Symptom:**
```
error: use_dpop_nonce
error_description: DPoP "nonce" mismatch
```

Repeated multiple times, potentially followed by:
```
error: server_error
error_description: Server error
```

**Root Cause:**
DPoP (Demonstrating Proof-of-Possession) requires a server-provided nonce for replay protection. These errors typically occur when:
1. Multiple concurrent requests create a DPoP nonce race condition
2. Clock skew causes DPoP proof timestamps to fail validation
3. PDS session state becomes corrupted after repeated failures

**Diagnosis:**

1. Check if the errors occur during concurrent operations:
```bash
# During docker push with multiple layers
docker logs atcr-appview 2>&1 | grep "use_dpop_nonce" | wc -l
```

2. Check for clock skew (see the section above):
```bash
timedatectl status
```

3. Look for session lock acquisition in the logs:
```bash
docker logs atcr-appview 2>&1 | grep "Acquired session lock"
```

**Solution:**

1. **If caused by clock skew**: Fix NTP synchronization (see the section above)

2. **If caused by session corruption**:
   ```bash
   # The AppView automatically deletes corrupted sessions;
   # the user just needs to re-authenticate
   docker login atcr.io
   ```

3. **If persistent despite clock sync**:
   - Check PDS health and logs (it may be a PDS-side issue)
   - Verify network connectivity between the AppView and the PDS
   - Check whether the PDS supports the latest OAuth/DPoP specifications

**What ATCR does automatically:**
- Per-DID locking prevents concurrent DPoP nonce races (sketched below)
- The Indigo library automatically retries with fresh nonces
- Sessions are auto-deleted after repeated failures
- A service token cache prevents excessive PDS requests
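The first item deserves a sketch: per-DID locking is what stops a `docker push` with many concurrent layer uploads from racing on the nonce. The following is a minimal illustration of the pattern's general shape, not ATCR's actual implementation:

```go
// didlock.go: illustrative per-DID serialization (not ATCR's real code).
package main

import "sync"

// DIDLocker hands out one mutex per DID so that DPoP-signed requests
// for the same account never interleave and race on the server nonce.
type DIDLocker struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func NewDIDLocker() *DIDLocker {
	return &DIDLocker{locks: make(map[string]*sync.Mutex)}
}

// Lock blocks until the caller holds the mutex for did, creating it on
// first use. Different DIDs proceed in parallel; same-DID calls serialize.
func (l *DIDLocker) Lock(did string) *sync.Mutex {
	l.mu.Lock()
	m, ok := l.locks[did]
	if !ok {
		m = &sync.Mutex{}
		l.locks[did] = m
	}
	l.mu.Unlock()
	m.Lock()
	return m
}

func main() {
	locker := NewDIDLocker()
	m := locker.Lock("did:plc:example")
	defer m.Unlock()
	// ...refresh tokens / send DPoP-bound requests for this DID here;
	// within the critical section the stored nonce stays consistent.
}
```

Because a stale nonce then surfaces at most once per critical section, a single automatic retry with the fresh nonce is enough.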
**Prevention:**
- Ensure reliable NTP synchronization
- Use a stable, well-maintained PDS implementation
- Monitor AppView error rates for DPoP-related issues

---

### OAuth Session Not Found

**Symptom:**
```
error: failed to get OAuth session: no session found for DID
```

**Root Cause:**
- The user has never authenticated via OAuth
- The OAuth session was deleted due to corruption or expiry
- A database migration cleared sessions

**Solution:**

1. The user re-authenticates via the OAuth flow:
   ```bash
   docker login atcr.io
   # Or for the web UI: visit https://atcr.io/login
   ```

2. If using app passwords (legacy), verify the token is cached:
   ```bash
   # Check if an app-password token exists
   docker logout atcr.io
   docker login atcr.io -u your.handle -p your-app-password
   ```

---

## AppView Deployment Issues

### Client Metadata URL Not Accessible

**Symptom:**
```
error: unauthorized_client
error_description: Client metadata endpoint returned 404
```

**Root Cause:**
The PDS cannot fetch OAuth client metadata from `{ATCR_BASE_URL}/client-metadata.json`.

**Diagnosis:**

1. Verify the client metadata endpoint is accessible:
   ```bash
   curl https://your-atcr-instance.com/client-metadata.json
   ```

2. Check AppView logs for startup errors:
   ```bash
   docker logs atcr-appview 2>&1 | grep "client-metadata"
   ```

3. Verify `ATCR_BASE_URL` is set correctly:
   ```bash
   echo $ATCR_BASE_URL
   ```

**Solution:**

1. Ensure `ATCR_BASE_URL` matches your public URL:
   ```bash
   export ATCR_BASE_URL=https://atcr.example.com
   ```

2. Verify the reverse proxy (nginx, Caddy, etc.) routes `/.well-known/*` and `/client-metadata.json`:
   ```nginx
   location / {
       proxy_pass http://localhost:5000;
       proxy_set_header Host $host;
       proxy_set_header X-Forwarded-Proto $scheme;
   }
   ```

3. Check that firewall rules allow inbound HTTPS:
   ```bash
   sudo ufw status
   sudo iptables -L -n | grep 443
   ```

---

## Hold Service Issues

### Blob Storage Connectivity

**Symptom:**
```
error: failed to upload blob: connection refused
```

**Diagnosis:**

1. Check hold service logs:
   ```bash
   docker logs atcr-hold 2>&1 | grep -i error
   ```

2. Verify the S3 credentials are correct:
   ```bash
   # Test S3 access
   aws s3 ls s3://your-bucket --endpoint-url=$S3_ENDPOINT
   ```

3. Check the hold configuration:
   ```bash
   env | grep -E "(S3_|AWS_|STORAGE_)"
   ```

**Solution:**

1. Verify the environment variables in the hold service:
   ```bash
   export AWS_ACCESS_KEY_ID=your-key
   export AWS_SECRET_ACCESS_KEY=your-secret
   export S3_BUCKET=your-bucket
   export S3_ENDPOINT=https://s3.us-west-2.amazonaws.com
   ```

2. Test S3 connectivity from the hold container (or use the standalone Go probe after this list):
   ```bash
   docker exec atcr-hold curl -v $S3_ENDPOINT
   ```

3. Check the S3 bucket permissions (PutObject, GetObject, and DeleteObject are required)
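For a credentials-and-connectivity check that exercises the same path as the hold service (DNS, TLS, auth, bucket access), a standalone probe can help. This sketch uses aws-sdk-go-v2, which is an assumption about the hold service's SDK; it reads the `AWS_*`/`S3_*` variables shown above and may also require `AWS_REGION`:

```go
// s3probe.go: standalone S3 reachability and permissions probe.
// Assumes aws-sdk-go-v2 and the environment variables listed above.
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()

	// Picks up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION
	// from the environment, like the hold service itself.
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		fmt.Fprintln(os.Stderr, "load config:", err)
		os.Exit(1)
	}

	client := s3.NewFromConfig(cfg, func(o *s3.Options) {
		if ep := os.Getenv("S3_ENDPOINT"); ep != "" {
			o.BaseEndpoint = aws.String(ep) // non-AWS endpoints (e.g. MinIO)
			o.UsePathStyle = true           // often required off AWS
		}
	})

	// HeadBucket checks DNS, TLS, credentials, and bucket access in one
	// call without transferring any object data.
	_, err = client.HeadBucket(ctx, &s3.HeadBucketInput{
		Bucket: aws.String(os.Getenv("S3_BUCKET")),
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "head bucket:", err)
		os.Exit(1)
	}
	fmt.Println("ok: bucket reachable and accessible")
}
```

If the probe succeeds but uploads still fail, the gap is usually object-level permissions (step 3 above) rather than connectivity.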
---

## Performance Issues

### High Database Lock Contention

**Symptom:**
Slow Docker push/pull operations, high CPU usage on the AppView

**Diagnosis:**

1. Check the SQLite database size:
   ```bash
   ls -lh /var/lib/atcr/ui.db
   ```

2. Look for long-running queries:
   ```bash
   docker logs atcr-appview 2>&1 | grep "database is locked"
   ```

**Solution:**

1. For production, migrate to PostgreSQL (recommended):
   ```bash
   export ATCR_UI_DATABASE_TYPE=postgres
   export ATCR_UI_DATABASE_URL=postgresql://user:pass@localhost/atcr
   ```

2. Or reduce SQLite write contention by limiting the AppView to a single connection:
   ```go
   db.SetMaxOpenConns(1) // SQLite allows one writer at a time; one connection avoids "database is locked"
   ```

3. Vacuum the database to reclaim space:
   ```bash
   sqlite3 /var/lib/atcr/ui.db "VACUUM;"
   ```

---

## Logging and Debugging

### Enable Debug Logging

Set the log level to debug for detailed troubleshooting:

```bash
export ATCR_LOG_LEVEL=debug
docker restart atcr-appview
```

### Useful Log Queries

**OAuth token exchange errors:**
```bash
docker logs atcr-appview 2>&1 | grep "OAuth callback failed"
```

**Service token request failures:**
```bash
docker logs atcr-appview 2>&1 | grep "OAuth authentication failed during service token request"
```

**Clock diagnostics:**
```bash
docker logs atcr-appview 2>&1 | grep "system_time"
```

**DPoP nonce issues:**
```bash
docker logs atcr-appview 2>&1 | grep -E "(use_dpop_nonce|DPoP)"
```

### Health Checks

**AppView health:**
```bash
curl http://localhost:5000/v2/
# Should return: {"errors":[{"code":"UNAUTHORIZED",...}]}
```

**Hold service health:**
```bash
curl http://localhost:8080/.well-known/did.json
# Should return a DID document
```

---

## Getting Help

If issues persist after following this guide:

1. **Check GitHub Issues**: https://github.com/ericvolp12/atcr/issues
2. **Collect logs**: Include output from `docker logs` for the AppView and Hold services
3. **Include diagnostics**:
   - `timedatectl status` output
   - AppView version: `docker exec atcr-appview cat /VERSION` (if available)
   - PDS version and implementation (Bluesky PDS or other)
4. **File an issue** with reproducible steps

---

## Common Error Reference

| Error Code | Component | Common Cause | Fix |
|------------|-----------|--------------|-----|
| `invalid_client` (iat timestamp) | OAuth | Clock skew | Enable NTP sync |
| `use_dpop_nonce` | OAuth/DPoP | Concurrent requests or clock skew | Fix NTP, wait for auto-retry |
| `server_error` (500) | PDS | PDS internal error | Check PDS logs |
| `invalid_grant` | OAuth | Expired auth code | Retry OAuth flow |
| `unauthorized_client` | OAuth | Client metadata unreachable | Check ATCR_BASE_URL and firewall |
| `RecordNotFound` | ATProto | Manifest doesn't exist | Verify repository name |
| Connection refused | Hold/S3 | Network/credentials | Check S3 config and connectivity |