WIP! A BB-style forum, on the ATmosphere! We're still working... we'll be back soon when we have something to show off!
node typescript hono htmx atproto

docs: add ATB-18 Forum DID agent design specification

Design document for ForumAgent service with:
- Graceful degradation (soft failure with fallback)
- Smart retry logic (network errors retry, auth errors fail permanently)
- Proactive session refresh
- Health endpoint with granular status states
- AppContext integration pattern
- Comprehensive testing strategy

Malpercio e21f20b1 da743dc4

+444
docs/plans/2026-02-13-atb-18-forum-agent-design.md
# ATB-18: Forum DID Authenticated Agent Design

**Date:** 2026-02-13
**Status:** Approved
**Linear Issue:** [ATB-18](https://linear.app/atbb/issue/ATB-18)

## Overview

The ForumAgent service provides authenticated AT Protocol agent functionality for the Forum DID, enabling server-side PDS writes for role assignment, moderation actions, and category management. The design emphasizes graceful degradation, smart retry logic, and operational visibility.

## Architecture

### Components

1. **ForumAgent class** (`apps/appview/src/lib/forum-agent.ts`)
   - Wraps `@atproto/api` AtpAgent for Forum DID authentication
   - Manages authentication lifecycle (login, refresh, retry)
   - Exposes `isAuthenticated()` and `getAgent()` methods
   - Handles errors gracefully with network vs auth error distinction

2. **AppContext integration** (modify `apps/appview/src/lib/app-context.ts`)
   - Add `forumAgent: ForumAgent | null` to AppContext interface
   - Initialize in `createAppContext()` after DB but before routes
   - Clean up in `destroyAppContext()` (stop refresh timers)

3. **Health endpoint** (`apps/appview/src/routes/health.ts`)
   - Unauthenticated `GET /api/health` endpoint
   - Reports status of database, firehose, forumAgent
   - Safe for public exposure (no sensitive data)

4. **Configuration** (extend `apps/appview/src/lib/config.ts`)
   - Read `FORUM_HANDLE` and `FORUM_PASSWORD` from env
   - Optional `FORUM_PDS_URL` (defaults to `PDS_URL`)

## ForumAgent Service Design

### Class Structure

```typescript
// Signature outline only; method bodies are omitted.
export class ForumAgent {
  private agent: AtpAgent | null = null;
  private status: ForumAgentStatus = 'initializing';
  private authenticated = false;
  private retryCount = 0;
  private maxRetries = 5;
  private refreshTimer: NodeJS.Timeout | null = null;
  private lastError: string | null = null;
  private lastAuthAttempt: Date | null = null;
  private nextRetryAt: Date | null = null;

  constructor(
    private pdsUrl: string,
    private handle: string,
    private password: string
  ) {}

  async initialize(): Promise<void>
  isAuthenticated(): boolean
  getAgent(): AtpAgent | null
  async shutdown(): Promise<void>
  getStatus(): ForumAgentState
}

type ForumAgentStatus =
  | 'initializing'   // First auth attempt in progress
  | 'authenticated'  // Successfully authenticated and ready
  | 'retrying'       // Failed but retrying with backoff (transient error)
  | 'failed'         // Permanently failed (auth error or max retries exceeded)
  | 'unavailable'    // Not configured (missing credentials in env)

interface ForumAgentState {
  status: ForumAgentStatus;
  authenticated: boolean;
  lastAuthAttempt?: Date;
  nextRetryAt?: Date;
  retryCount?: number;
  error?: string; // User-safe error message (no credentials)
}
```

### Initialization Flow

1. **`initialize()`** - Called from `createAppContext()`
   - Create `AtpAgent` with `pdsUrl`
   - Attempt `agent.login({ identifier: handle, password })`
   - If success: set status to `authenticated`, schedule proactive refresh
   - If network error: set status to `retrying`, schedule retry with exponential backoff
   - If auth error (401): set status to `failed`, don't retry, log clear error
   - Never throw - always return gracefully (soft failure pattern)

2. **Proactive session refresh**
   - AT Protocol sessions include `accessJwt` and `refreshJwt` tokens
   - `AtpAgent` from `@atproto/api` refreshes the access token automatically when a request is rejected for an expired token; `agent.resumeSession()` can restore a previously stored session
   - Schedule refresh check every 30 minutes
   - If session expires, attempt re-login with same error handling as initialization

3. **Retry mechanism**
   - Network errors: retry at 10s, 30s, 1m, 5m, 10m (exponential backoff)
   - After 5 failed attempts, stop retrying (set status to `failed`)
   - Auth errors: fail permanently on first attempt (no retry, prevent account lockouts)

### Status Transitions

```
unavailable (if credentials missing)

initializing
  ├─→ authenticated (success)
  ├─→ retrying (network error)
  └─→ failed (auth error)

retrying
  ├─→ authenticated (retry succeeds)
  └─→ failed (max retries exceeded)

authenticated
  └─→ retrying (session refresh fails with network error)
```

## Error Handling & Retry Logic

### Error Classification

**Network/transient errors (safe to retry):**
- `ECONNREFUSED`, `ETIMEDOUT`, `ENOTFOUND` (DNS)
- Network unreachable, connection reset
- PDS returning 503 Service Unavailable
- **Action:** Retry with exponential backoff

**Authentication errors (fail permanently):**
- 401 Unauthorized (wrong credentials)
- 400 Bad Request (invalid handle format)
- AtpAgent throws "Invalid identifier or password"
- **Action:** Log clear error, set status to `failed`, never retry (prevents account lockouts)

**Server errors (limited retry):**
- 500 Internal Server Error from PDS
- Unexpected errors from AtpAgent
- **Action:** Retry 2-3 times with backoff, then set status to `failed`

### Implementation Pattern

```typescript
// Documented backoff schedule: 10s, 30s, 1m, 5m, 10m
const RETRY_DELAYS_MS = [10_000, 30_000, 60_000, 300_000, 600_000];

private async attemptAuth(): Promise<boolean> {
  if (!this.agent) return false; // initialize() must create the agent first
  this.lastAuthAttempt = new Date(); // Record every attempt, not only failures

  try {
    await this.agent.login({ identifier: this.handle, password: this.password });
    this.status = 'authenticated';
    this.authenticated = true;
    this.retryCount = 0;
    this.lastError = null;
    this.nextRetryAt = null;
    return true;
  } catch (error) {
    // A failed re-login must not leave a stale "authenticated" flag behind
    this.authenticated = false;

    // Check error type
    if (isAuthError(error)) {
      // Permanent failure - don't retry
      this.status = 'failed';
      this.lastError = "Authentication failed: invalid credentials";
      console.error("Forum DID auth failed permanently", { handle: this.handle });
      return false;
    }

    if (isNetworkError(error) && this.retryCount < this.maxRetries) {
      // Schedule retry using the documented delay for this attempt
      this.status = 'retrying';
      const delay = RETRY_DELAYS_MS[Math.min(this.retryCount, RETRY_DELAYS_MS.length - 1)];
      this.retryCount++;
      this.nextRetryAt = new Date(Date.now() + delay);
      setTimeout(() => this.attemptAuth(), delay);
      return false;
    }

    // Unknown error or max retries exceeded
    this.status = 'failed';
    this.lastError = "Authentication failed: retries exhausted or unexpected error";
    return false;
  }
}
```

### Logging Strategy

- Auth errors: `console.error()` with clear operator message
- Network errors on retry: `console.warn()`
- Successful auth after retry: `console.info()`
- All structured logs include: `{ service: "ForumAgent", handle, attempt: retryCount }`

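
The implementation pattern above calls `isAuthError` and `isNetworkError` without defining them. The following is a minimal sketch of how the three error classes could be told apart. It assumes that errors thrown by the agent carry a numeric `status` field and that Node network failures expose a `code` either on the error or on its `cause`; `classifyError`, `readField`, and `NETWORK_CODES` are hypothetical names introduced here, not part of the design.

```typescript
type ErrorClass = "auth" | "network" | "server";

// Codes named in the Error Classification section, plus the usual Node codes
// for "connection reset" and "network unreachable" (assumed spellings).
const NETWORK_CODES = ["ECONNREFUSED", "ETIMEDOUT", "ENOTFOUND", "ECONNRESET", "ENETUNREACH"];

// Safely read a property from an unknown value.
function readField(value: unknown, key: string): unknown {
  return typeof value === "object" && value !== null
    ? (value as Record<string, unknown>)[key]
    : undefined;
}

function classifyError(error: unknown): ErrorClass {
  const status = readField(error, "status");
  const message = String(readField(error, "message") ?? "");
  // Assumption: fetch-style failures may nest the code under `cause`.
  const code = String(
    readField(error, "code") ?? readField(readField(error, "cause"), "code") ?? ""
  );

  // Auth errors are checked first so they are never retried.
  if (status === 401 || status === 400 || message.includes("Invalid identifier or password")) {
    return "auth";
  }
  if (status === 503 || NETWORK_CODES.some((c) => code === c || message.includes(c))) {
    return "network";
  }
  // Everything else (500s, unexpected errors) falls into the limited-retry class.
  return "server";
}

const isAuthError = (e: unknown): boolean => classifyError(e) === "auth";
const isNetworkError = (e: unknown): boolean => classifyError(e) === "network";
```

Checking the auth conditions before the network conditions keeps the "never retry auth errors" rule intact even if an error happens to match both.
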
## Health Endpoint Design

### API Contract

```
GET /api/health
Status: 200 OK (always returns 200, even if services degraded)

Response:
{
  "status": "healthy" | "degraded" | "unhealthy",
  "timestamp": "2026-02-13T10:30:00.000Z",
  "services": {
    "database": {
      "status": "up" | "down",
      "latency_ms": 5
    },
    "firehose": {
      "status": "up" | "down",
      "connected": true,
      "last_event_at": "2026-02-13T10:29:55.000Z"
    },
    "forumAgent": {
      "status": "initializing" | "authenticated" | "retrying" | "failed" | "unavailable",
      "authenticated": false,
      "last_auth_attempt": "2026-02-13T10:29:00.000Z",
      "next_retry_at": "2026-02-13T10:31:00.000Z", // Only if status=retrying
      "retry_count": 3, // Only if status=retrying
      "error": "Connection to PDS temporarily unavailable" // User-safe message
    }
  }
}
```

### Overall Status Logic

- `healthy`: All services up, forumAgent authenticated
- `degraded`: Database + firehose up, forumAgent not authenticated (read-only mode)
- `unhealthy`: Database or firehose down

### Security Considerations

- ✅ No authentication required (public endpoint for monitoring)
- ✅ No DIDs, handles, or PDS URLs exposed
- ✅ Error messages are user-safe (no stack traces, no credential hints)
- ✅ Only exposes operational state, not configuration details

### Usage Examples

- **Kubernetes liveness probe:** Check `status !== "unhealthy"`
- **Web UI banner:** Show read-only warning if `forumAgent.status !== "authenticated"`
- **Admin dashboard:** Show retry countdown if `forumAgent.status === "retrying"`

## AppContext Integration

### Interface Changes

```typescript
// apps/appview/src/lib/app-context.ts

export interface AppContext {
  config: AppConfig;
  db: Database;
  firehose: FirehoseService;
  oauthClient: NodeOAuthClient;
  oauthStateStore: OAuthStateStore;
  oauthSessionStore: OAuthSessionStore;
  cookieSessionStore: CookieSessionStore;
  forumAgent: ForumAgent | null; // ← NEW: null if credentials not configured
}
```

### Initialization

```typescript
export async function createAppContext(config: AppConfig): Promise<AppContext> {
  const db = createDb(config.databaseUrl);
  const firehose = new FirehoseService(db, config.jetstreamUrl);

  // ... existing OAuth setup ...

  // Initialize ForumAgent (soft failure - never throws)
  let forumAgent: ForumAgent | null = null;
  if (config.forumHandle && config.forumPassword) {
    forumAgent = new ForumAgent(
      config.pdsUrl,
      config.forumHandle,
      config.forumPassword
    );
    await forumAgent.initialize(); // Returns gracefully even on failure
  } else {
    console.warn("Forum DID credentials not configured - write operations disabled");
  }

  return {
    config,
    db,
    firehose,
    oauthClient,
    oauthStateStore,
    oauthSessionStore,
    cookieSessionStore,
    forumAgent, // ← NEW
  };
}
```

### Cleanup

```typescript
export async function destroyAppContext(ctx: AppContext): Promise<void> {
  await ctx.firehose.stop();

  if (ctx.forumAgent) {
    await ctx.forumAgent.shutdown(); // Stop refresh timers, clear resources
  }

  ctx.oauthStateStore.destroy();
  ctx.oauthSessionStore.destroy();
  ctx.cookieSessionStore.destroy();
}
```

### Usage in Route Handlers

```typescript
// Example: Future role assignment endpoint
export function createRoleRoutes(ctx: AppContext) {
  return new Hono().post("/assign", async (c) => {
    // Check if ForumAgent is available
    if (!ctx.forumAgent?.isAuthenticated()) {
      return c.json(
        { error: "Forum write operations temporarily unavailable" },
        503
      );
    }

    const agent = ctx.forumAgent.getAgent()!;
    // Use agent to write role record to Forum PDS...
  });
}
```

## Testing Strategy

### Unit Tests

```typescript
// apps/appview/src/lib/__tests__/forum-agent.test.ts

describe("ForumAgent", () => {
  describe("initialization", () => {
    it("authenticates successfully on first attempt")
    it("transitions to 'authenticated' status after successful login")
    it("schedules proactive session refresh after successful auth")
    it("handles network errors with retry backoff")
    it("handles auth errors without retry (permanent failure)")
    it("handles missing credentials (unavailable status)")
  });

  describe("session refresh", () => {
    it("proactively refreshes session before expiry")
    it("retries if refresh fails with network error")
    it("fails permanently if refresh fails with auth error")
  });

  describe("retry mechanism", () => {
    it("retries network errors with exponential backoff")
    it("stops retrying after max attempts")
    it("resets retry count after successful auth")
    it("never retries auth errors (401)")
  });

  describe("status reporting", () => {
    it("returns correct status for each state")
    it("includes nextRetryAt when status is 'retrying'")
    it("includes user-safe error messages (no credentials)")
  });
});
```

### Mocking Approach

```typescript
import { vi } from 'vitest';
import { AtpAgent } from '@atproto/api';

// Mock the AtpAgent module
vi.mock('@atproto/api', () => ({
  AtpAgent: vi.fn(() => ({
    login: vi.fn(),
    session: null,
  })),
}));

// In tests, control login behavior
const mockLogin = vi.fn();
(AtpAgent as any).mockImplementation(() => ({
  login: mockLogin,
  session: { did: 'did:plc:test', accessJwt: 'token' },
}));

// Simulate success
mockLogin.mockResolvedValueOnce({ success: true });

// Simulate auth error (permanent failure)
mockLogin.mockRejectedValueOnce(new Error('Invalid identifier or password'));

// Simulate network error (retry)
mockLogin.mockRejectedValueOnce(new Error('ECONNREFUSED'));
```

### Integration Tests

```typescript
// apps/appview/src/lib/__tests__/app-context.test.ts

describe("AppContext with ForumAgent", () => {
  it("creates context with authenticated ForumAgent")
  it("creates context with null ForumAgent if credentials missing")
  it("gracefully handles ForumAgent auth failure during context creation")
  it("cleans up ForumAgent resources on destroyAppContext()")
});

// apps/appview/src/routes/__tests__/health.test.ts

describe("GET /api/health", () => {
  it("returns 'healthy' when all services up")
  it("returns 'degraded' when ForumAgent unavailable")
  it("includes retry info when ForumAgent is retrying")
  it("does not expose sensitive data (DIDs, credentials)")
});
```

## Acceptance Criteria

- [x] Design validated with stakeholder
- [ ] `ForumAgent` service authenticates as Forum DID on AppView startup
- [ ] Available via `ctx.forumAgent` in AppContext
- [ ] Auto-refreshes expired sessions proactively
- [ ] Graceful degradation if auth fails (server starts, write ops return 503)
- [ ] Smart retry logic: network errors retry with backoff, auth errors fail permanently
- [ ] Health endpoint exposes ForumAgent status with granular states
- [ ] Unit tests with mocked PDS
- [ ] Integration tests verifying agent is wired into AppContext
- [ ] Health endpoint tests verify no sensitive data exposure

## Implementation Notes

- Use exponential backoff: 10s, 30s, 1m, 5m, 10m (max 5 retries)
- Session refresh interval: 30 minutes
- Never expose credentials, DIDs, or detailed errors in health endpoint
- Log all auth attempts with structured context for debugging
- Distinguish network errors (safe to retry) from auth errors (fail permanently)

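
The overall status rules from the Health Endpoint Design section (`healthy`, `degraded`, `unhealthy`) reduce to a small pure function, which may be easier to unit-test than the route handler itself. This is a sketch under assumptions: the `ServiceSnapshot` shape and the name `deriveOverallStatus` are hypothetical and do not appear in the design.

```typescript
type OverallStatus = "healthy" | "degraded" | "unhealthy";

// Hypothetical input: one boolean per service reported by the health endpoint.
interface ServiceSnapshot {
  databaseUp: boolean;
  firehoseUp: boolean;
  forumAgentAuthenticated: boolean;
}

function deriveOverallStatus(s: ServiceSnapshot): OverallStatus {
  // Database or firehose down: the AppView cannot serve reads reliably.
  if (!s.databaseUp || !s.firehoseUp) return "unhealthy";
  // Core services up but no Forum DID session: read-only mode.
  return s.forumAgentAuthenticated ? "healthy" : "degraded";
}
```

Because the endpoint always responds with 200, consumers such as probes or UI banners would read this value from the `status` field of the response body rather than from the HTTP status code.
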