audio streaming app plyr.fm

chore: stop tracking STATUS.md (#314)

* update gitignore

Stop tracking STATUS.md as it is a local living document.

* jrrz

authored by nate nowack and committed by GitHub
c9004561 a1b3efc3

+3 -1149
.gitignore (+3 -1)

```diff
@@ -57,4 +57,6 @@
 simple-build.log
 
 # gemini
-.gemini/
+.gemini/
+
+STATUS.md
```
STATUS.md (-1148)
```diff
@@ -1,1148 +0,0 @@
-# plyr.fm - status update
-
-Status as of: 2025-11-18
-
-## long-term vision
-
-### the problem
-
-today's music streaming is fundamentally broken:
-- spotify and apple music trap your data in proprietary silos
-- artists pay distribution fees and streaming cuts to multiple gatekeepers
-- listeners can't own their music collections - they rent them
-- switching platforms means losing everything: playlists, play history, social connections
-
-### the atproto solution
-
-plyr.fm is built on the AT Protocol (the protocol powering Bluesky) and enables:
-- **portable identity**: your music collection, playlists, and listening history belong to you, stored in your personal data server (PDS)
-- **decentralized distribution**: artists publish directly to the network without platform gatekeepers
-- **interoperable data**: any client can read your music records - you're not locked into plyr.fm
-- **authentic social**: artist profiles are real ATProto identities with verifiable handles (@artist.bsky.social)
-
-### the dream state
-
-plyr.fm should become:
-
-1. **for artists**: the easiest way to publish music to the decentralized web
-   - upload once, available everywhere in the ATProto network
-   - direct connection to listeners without platform intermediaries
-   - real ownership of audience relationships
-
-2. **for listeners**: a streaming platform where you actually own your data
-   - your collection lives in your PDS, playable by any ATProto music client
-   - switch between plyr.fm and other clients freely - your data travels with you
-   - share tracks as native ATProto posts to Bluesky
-
-3. **for developers**: a reference implementation showing how to build on ATProto
-   - open source end-to-end example of ATProto integration
-   - demonstrates OAuth, record creation, federation patterns
-   - proves decentralized music streaming is viable
-
-## medium-term vision (next 3-6 months)
-
-### core feature priorities
-
-1. **rich track metadata** (#155)
-   - genres, tags, descriptions
-   - enhanced discoverability
-   - proper music taxonomy
-
-2. **audio transcoding pipeline** (#153)
-   - support AIFF/AIF and other formats
-   - automatic conversion to web-friendly formats
-   - consistent playback experience
-
-3. **PWA installability** (#165)
-   - desktop and mobile installation
-   - offline capability exploration
-   - native app-like experience
-
-4. **fullscreen player view** (#122)
-   - immersive playback interface
-   - album art showcase
-   - enhanced mobile experience
-
-### platform maturity
-
-1. **content moderation** (#166, #167)
-   - image content moderation for user uploads
-   - DMCA safe harbor compliance
-   - automated detection systems
-
-2. **public developer API** (#56)
-   - versioned REST API
-   - authentication patterns
-   - rate limiting and quotas
-   - documentation and SDKs
-
-3. **content-addressable storage** (#146)
-   - hash-based URLs for deduplication
-   - bandwidth optimization
-   - cache-friendly architecture
-
-### ecosystem integration
-
-- deeper ATProto federation
-- cross-client compatibility testing
-- Bluesky social sharing improvements
-- PDS integration patterns
-
-## current state
-
-**production is live and gaining real users** 🎉
-- latest release: 2025.1110.042349
-- frontend: https://plyr.fm
-- backend: https://relay-api.fly.dev
-- three-tier deployment working: dev → staging → production
-- first external user (@stellz) actively uploading content (75+ minute tracks tested successfully)
-
-## short-term priorities (this week)
-
-### active issues
-
-1. **playback auto-start on refresh** (#225)
-   - symptom: page refresh sometimes starts playing immediately
-   - suspected cause: client-side caching of playback state or queue restoration
-   - `autoplay_next` preference set to false but not always respected
-   - needs investigation into what state is persisting and why
-
-2. **liquid glass visual effects** (#186)
-   - user-configurable frosted glass effects
-   - aesthetic enhancement feature
-   - low priority, high polish
-
-## recent work (november 2025)
-
-### summary
-
-**major features**:
-- ✅ ATProto namespace separation (#263-264) - environment-specific namespaces, removed hardcoded strings, migrated staging data
-- ✅ secure browser authentication (#237, #239, #244) - HttpOnly cookies, XSS protection, environment-aware configuration
-- ✅ albums feature (PRs #214-222) - database schema, CRUD, browsing, detail pages, cover art
-- ✅ frontend data loading overhaul (PRs #210, #227) - server-side rendering, centralized auth
-- ✅ link preview system (PRs #230-231) - rich OG metadata for albums, homepage, tracks
-- ✅ liked tracks feature (#157) - persistent collections with ATProto records
-- ✅ transcoder API service (#156) - standalone Rust service for AIFF/FLAC→MP3 conversion
-- ✅ track detail pages (#164) - dedicated pages with large cover art
-
-**data integrity fixes** (PR #191, Nov 13, 2025):
-- ✅ duplicate upload detection - prevents re-uploading same file from creating multiple tracks pointing to shared R2 object
-- ✅ refcount-based R2 deletion - only deletes R2 files when refcount = 1, prevents breaking other tracks
-- ✅ ATProto cleanup on delete - removes PDS records when tracks deleted, prevents orphaned records
-- ✅ exact key deletion - uses stored `file_type` to delete precise R2 key instead of guessing extensions
-
-**performance improvements**:
-- ✅ eliminated N+1 R2 API calls (#184) - store image URLs in database
-- ✅ async I/O throughout backend (#149-151) - async R2 operations, concurrent PDS resolution, async storage writes
-- ✅ queue hydration optimization - per-session token locks prevent race conditions
-
-**reliability & UX**:
-- ✅ streaming uploads with progress (#182, #282) - prevents OOM, better mobile experience
-- ✅ graceful ATProto recovery (#180) - tracks with missing records can self-restore
-- ✅ mobile UI polish (#159-185) - consistent layouts, better touch targets, improved navigation
-- ✅ wave loading animation (#283) - music-themed loading states, clickable refresh on homepage
-- ✅ share action parity - added artist detail share button with mobile-friendly placement for consistent copying behavior across track/album/artist pages
-- ✅ modularized player - split artwork/metadata and transport controls into focused components for easier maintenance and future UX tweaks
-
-**security & validation**:
-- ✅ origin validation for image URLs (#168)
-- ✅ AIFF/AIF format rejection (#152) - prevents browser compatibility issues
-
-### detailed history
-
-### wave loading animation (PR #283, Nov 18, 2025)
-
-**motivation**: loading states across the app used inconsistent text ("loading...", "loading tracks...") or generic spinners. wanted a distinctive, music-themed animation that reflects the platform's aesthetic and provides visual consistency.
```
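
The secure browser authentication entry in the summary (#237, #239, #244) hinges on how the backend builds its session cookie. Below is a minimal sketch using only the standard library's `http.cookies`, not the project's FastAPI code; `build_session_cookie`, `LOCAL_HOSTS`, and the 14-day `Max-Age` (borrowed from the session lifetime the removed notes mention) are assumptions. It reproduces the attributes those notes list: `HttpOnly`, `SameSite=Lax`, `Secure` everywhere except localhost, and no `Domain` attribute.

```python
from http.cookies import SimpleCookie

# Assumption: cookie lifetime matches the 14-day session lifetime in the notes.
SESSION_MAX_AGE = 14 * 24 * 60 * 60
LOCAL_HOSTS = {"localhost", "127.0.0.1"}  # hypothetical list of dev hosts


def build_session_cookie(session_id: str, request_host: str) -> str:
    """Return the value of a Set-Cookie header for the session (hypothetical helper)."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    morsel = cookie["session_id"]
    morsel["httponly"] = True   # JavaScript cannot read it, so an XSS payload can't steal it
    morsel["samesite"] = "Lax"  # withheld from cross-site subrequests (CSRF mitigation)
    morsel["path"] = "/"
    morsel["max-age"] = SESSION_MAX_AGE
    # Plain-HTTP local development can't use Secure cookies; every other host can.
    hostname = request_host.split(":")[0]
    morsel["secure"] = hostname not in LOCAL_HOSTS
    # No "domain" key is set: the cookie stays host-only, so staging and
    # production sessions never leak into each other.
    return morsel.OutputString()


if __name__ == "__main__":
    print(build_session_cookie("abc123", "api.plyr.fm"))
    print(build_session_cookie("abc123", "localhost:8001"))
```

Called with `"api.plyr.fm"` the header carries `Secure`; called with `"localhost:8001"` the flag is dropped so plain-HTTP development keeps working.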
```diff
-
-**what shipped**:
-- **WaveLoading component**: new loading animation with 5 vertical bars that pulse in sequence like an audio equalizer
-  - configurable sizes: sm (16px), md (24px), lg (32px)
-  - uses accent color for brand consistency
-  - optional message text below animation
-  - smooth ease-in-out animations for polish
-- **replaced all loading states**:
-  - homepage: added `initialLoad` state to prevent "no tracks yet" flash before data loads
-  - portal: main loading + tracks/albums sections
-  - broken tracks: replaced LoadingSpinner with WaveLoading
-- **clickable refresh**: "latest tracks" heading on homepage now interactive
-  - click to force fresh data fetch (bypasses cache)
-  - hover effect shows it's clickable
-  - simple way for users to check for new content
-- **cleanup**: deleted unused LoadingSpinner and LoadingOverlay components
-
-**design rationale**:
-- wave pattern chosen as music-themed visual metaphor (like audio visualizer)
-- vertical bars instead of horizontal to save space and work well in narrow layouts
-- staggered animation (0.1s delay per bar) creates smooth wave effect
-- accent color ties to brand identity
-- kept intentionally simple - can enhance later with more bars, color variations, or audio responsiveness
-
-**impact**:
-- ✅ consistent loading experience across entire app
-- ✅ on-brand visual identity (music-themed)
-- ✅ eliminated "no tracks yet" flash on homepage
-- ✅ users can manually refresh latest tracks
-- ✅ net code reduction (-10 lines: +112 new, -122 removed)
-
-**future enhancements** (optional):
-- could add more bars for fuller effect
-- could vary colors or respond to playing audio
-- could add particle effects or other visual flourishes
-- established pattern for future loading states
-
-### ATProto namespace separation (PRs #263-264, Nov 17, 2025)
-
-**motivation**: dev and staging environments were writing ATProto records to production `fm.plyr.*` namespace, polluting production collections with test data. namespace configuration was also hardcoded in multiple places instead of using environment-aware config.
-
-**what shipped**:
-- **removed hardcoded namespaces** (PR #263):
-  - replaced hardcoded `"fm.plyr.like"` strings in `src/backend/_internal/atproto/records.py`
-  - added `like_collection` computed field to config (mirrors existing `track_collection`)
-  - fixed OAuth scope generation to use computed fields instead of hardcoded strings
-  - updated `scripts/backfill_atproto_records.py` to use settings (was using hardcoded namespace)
-- **environment-specific namespaces**:
-  - development: `fm.plyr.dev` (local .env)
-  - staging: `fm.plyr.stg` (flyctl secrets)
-  - production: `fm.plyr` (flyctl secrets)
-- **data migration**:
-  - migrated 7 tracks + 5 likes from `fm.plyr.*` to `fm.plyr.dev.*` in development
-  - migrated 7 tracks + 5 likes from `fm.plyr.*` to `fm.plyr.stg.*` in staging
-  - used combination of automated script + manual cleanup with neon MCP and pdsx
-  - cleaned up old staging records from production namespace
-- **documentation** (PR #264):
-  - updated `docs/deployment/environments.md` with namespace configuration
-  - updated `docs/backend/configuration.md` with environment-specific examples
-  - removed typer from project dependencies (moved to PEP 723 inline script deps)
-  - created `sandbox/stg-namespace-migration/README.md` documenting migration process
-
-**impact**:
-- ✅ test tracks/likes no longer pollute production collections
-- ✅ OAuth scopes environment-specific and automatically generated from config
-- ✅ database and ATProto records stay aligned within each environment
-- ✅ proper data separation for dev/staging/production environments
-- ✅ eliminated hardcoded namespace strings throughout codebase
-
-**lessons learned**:
-- PEP 723 inline script dependencies work well for ad-hoc migration scripts
-- database as source of truth more reliable than PDS for stale record lookups
-- manual cleanup sometimes faster than debugging complex migration logic
-
-**follow-up cleanup** (Nov 18, 2025):
-- discovered 82 orphaned test/dev records remaining in production `fm.plyr.track` namespace
-- created analysis script (`scripts/identify_orphaned_records.py`) to cross-reference PDS records against production database
-- verified all 13 production tracks safe (including critical tracks: webhook with features, dinah, lil blues improv)
-- automated deletion via generated script with proper PDS authentication
-- result: 95 → 13 records in production namespace, all production data intact
-- filed upstream issue ([pdsx#43](https://github.com/zzstoatzz/pdsx/issues/43)) for batch/concurrent CRUD operations
-
-### mobile UI polish (PRs #259-261, #265, #268, Nov 17, 2025)
-
-**serialization improvements** (PRs #259-260):
-- created `TrackResponse` Pydantic model for consistent track serialization
-- fixed album endpoint to properly serialize tracks (was mixing dict/model types)
-- eliminated manual dict construction in favor of model-based serialization
-- better type safety and consistency across endpoints
-
-**notifications fix** (PR #261):
-- notification bot was using hardcoded `https://plyr.fm` URL
-- now uses environment-aware `settings.frontend.url` (staging uses `https://stg.plyr.fm`)
-- ensures notifications link to correct environment
-
-**sticky player padding** (PRs #265, #268):
-- fixed album tracks overlapping with sticky bottom player on mobile (#265)
-- attempted centralized padding approach (#266) but created excessive whitespace on mobile
-- reverted to per-page padding handling (#268) while keeping album track clearance fix
-- mobile padding now matches pre-centralization behavior
-
-**impact**:
-- ✅ consistent track serialization across all endpoints
-- ✅ notifications link to correct environment
-- ✅ album tracks properly clear sticky player on mobile
-- ✅ mobile padding back to appropriate levels (no excessive whitespace)
-
-### secure browser authentication (issue #237, PRs #239-244, Nov 14-15, 2025)
-
-**motivation**: session tokens stored in localStorage were vulnerable to XSS attacks. any malicious script could read `session_id` from localStorage and hijack accounts for the full 14-day session lifetime.
-
-**what shipped**:
-- **HttpOnly cookies** (PR #244): backend sets `Set-Cookie: session_id=...; HttpOnly; Secure; SameSite=Lax`
-  - HttpOnly prevents JavaScript access (XSS protection)
-  - Secure requires HTTPS (except localhost for dev)
-  - SameSite=Lax prevents CSRF while allowing same-site requests
-  - cookies automatically sent with requests (no manual auth header management)
-- **cookie-aware auth dependencies** (PR #243):
-  - `require_auth` checks cookies first, falls back to Authorization header
-  - `require_artist_profile` updated with same pattern
-  - optional auth endpoints (tracks list, track detail, album detail) now support cookies
-  - proper parameter aliasing (`Cookie(alias="session_id")`) for FastAPI
-- **environment-aware cookie configuration**:
-  - localhost: `secure=False` for HTTP development
-  - staging/production: `secure=True` for HTTPS
-  - no explicit domain set (prevents cross-environment session leakage)
-- **same-site detection**:
-  - compares origin host vs request host
-  - uses `SameSite=lax` when same-site (localhost→localhost, stg.plyr.fm→api-stg.plyr.fm)
-  - prevents cookies from being sent cross-site
-- **frontend cleanup** (PR #239):
-  - removed all localStorage session_id read/write operations
-  - removed `getSessionId()`, `setSessionId()`, `getAuthHeaders()` helpers
-  - all fetch calls use `credentials: 'include'` to send cookies
-  - `XMLHttpRequest` uses `withCredentials: true`
-  - auth state now managed entirely by backend via HttpOnly cookies
-
-**environment architecture**:
-- all environments use custom domains on same eTLD+1 for cookie sharing:
-  - **staging**: `stg.plyr.fm` → `api-stg.plyr.fm` (both `.plyr.fm`)
-  - **production**: `plyr.fm` → `api.plyr.fm` (both `.plyr.fm`)
-  - **local**: `localhost:5173` → `localhost:8001` (both `localhost`)
-- separate cloudflare pages projects prevent staging/production cookie conflicts:
-  - `plyr-fm-stg` for staging (tracks `main` branch)
-  - `plyr-fm` for production (tracks `production-fe` branch)
-
-**security improvements**:
-- ✅ eliminated XSS session hijacking vector
-- ✅ tokens no longer accessible to JavaScript
-- ✅ CSRF protection via SameSite=Lax
-- ✅ secure transport enforcement (HTTPS in production)
-- ✅ environment isolation (no cookie sharing between staging/prod)
-
-**compatibility maintained**:
-- browser clients: use HttpOnly cookies automatically
-- future SDK/CLI clients: can still use `Authorization: Bearer <token>` header
-- backend accepts both cookie and header auth (cookie preferred)
-
-**documentation created**:
-- `docs/backend/atproto-identity.md`: ATProto OAuth client metadata discovery patterns
-- `docs/deployment/environments.md`: updated with staging/production cookie architecture
-- PR #243 description: comprehensive explanation of cookie domain behavior
-
-**impact**:
-- closed high-priority security issue #237
-- production-grade auth implementation
-- foundation for future session management features (device tracking, forced logout)
-- eliminated most common web application security vulnerability
-
-### albums feature (PRs #214-222, Nov 13-14, 2025)
-
-**motivation**: users wanted to group tracks into albums with dedicated pages, cover art, and metadata.
-
-**what shipped**:
-- **database schema** (PR #222): new `albums` table with title, slug, description, image_id, artist_did
-  - album-track relationship via `album_rel` on tracks table
-  - migration to backfill albums from existing track `extra->>'album'` metadata
-  - 8 albums created from existing 32 tracks in production
-- **backend CRUD** (PR #222): full album management endpoints
-  - `GET /albums/{handle}` - list artist's albums
-  - `GET /albums/{handle}/{slug}` - album detail with tracks
-  - `POST /albums` - create album (authenticated)
-  - `PATCH /albums/{id}` - update album metadata
-  - album cover art upload and storage in R2
-- **frontend pages** (PRs #214, #216-220):
-  - album detail pages (`/u/{handle}/album/{slug}`) with track lists
-  - artist discography sections on artist pages
-  - album cover art display throughout UI
-  - server-side rendering for SEO and link previews
-- **UI polish** (PR #228): long album title handling
-  - 100-character slug limit with word-boundary truncation
-  - CSS text truncation for inline album links
-  - proper wrapping for album detail page titles
-  - tested with 91-character production album title
-- **link previews** (PRs #230-231):
-  - rich Open Graph metadata for albums (music.album type)
-  - artist musician property, image dimensions, canonical URLs
-  - fixed layout metadata conflicts (prevented generic tags from overriding page-specific ones)
-
-**what's NOT done** (issue #221 still open):
-- ATProto records for albums (consciously deferred)
-- reason: want to thoughtfully design the lexicon before committing to a schema
-- tracks work fine without album ATProto records for now
-
-**impact**:
-- albums now first-class citizens in UI and database
-- better content organization for artists with multiple releases
-- improved SEO with album-specific link previews
-- foundation for future features (album likes, album playlists)
-
-### frontend architecture improvements (PRs #210, #227, Nov 13-14, 2025)
-
-**motivation**: eliminate "flash of loading", improve SEO, reduce code duplication, fix performance bottlenecks.
-
-**PR #210 - centralized auth and client-side load functions**:
-- created `lib/auth.svelte.ts` - centralized auth manager with SSR-safe guards
-- added `+layout.ts` - loads auth state once for entire app
-- added `+page.ts` to liked tracks page - loads data before component mounts
-- refactored all pages to use centralized auth (eliminated scattered localStorage calls)
-- code reduction: +256 lines, -308 lines (net -52 lines)
-
-**PR #227 - artist pages moved to server-side rendering**:
-- replaced client-side `onMount` fetches with `+page.server.ts`
-- parallel server loading of artist info, tracks, and albums
-- data ready before page renders (eliminates loading states)
-- performance: ~1.66s sequential waterfall → instant render
-
-**pattern shift**:
-```
-old: page loads → onMount → fetch artist → fetch tracks → fetch albums → render
-new: server fetches all in parallel → page renders immediately with data
-```
-
-**impact**:
-- eliminated "flash of loading" across artist and album pages
-- improved lighthouse scores and SEO (real data in initial HTML)
-- consistent auth patterns throughout app
-- better UX - pages feel instant instead of progressive
-
-**documentation**: see `docs/frontend/data-loading.md` for patterns and anti-patterns
-
-### link preview system (PRs #230-231, Nov 14, 2025)
-
-**problem**: album pages and homepage had no Open Graph metadata, leading to poor link previews on social media.
-
-**PR #230 - add rich metadata**:
-- homepage: complete OG tags (type, title, description, url, site_name)
-- album pages: rich music.album metadata matching track page quality
-- added canonical URL, site name, musician property
-- added image dimensions (1200x1200), alt text, secure_url
-- improved meta description
-
-**PR #231 - fix metadata conflicts**:
-- root layout was rendering duplicate OG tags on all pages
-- social scrapers use first tags encountered (generic layout ones)
-- page-specific metadata was being ignored
-- solution: exclude pages with their own metadata from layout defaults
-  - homepage (`/`)
-  - track pages (`/track/*`)
-  - album pages (`/u/*/album/*`)
-
-**result**: album links now show rich previews with cover art, artist info, track counts when shared on social platforms.
-
-### Banana mix incident fixes (PR #191, Nov 13, 2025)
-
-**Why:** stellz uploaded "banana mix" twice due to slow UI feedback, creating duplicate tracks (56 and 57)
-pointing to the same R2 file. When track 57 was deleted, it removed the shared R2 file, breaking track 56
-with 404 errors. ATProto record for track 57 was orphaned on her PDS. Investigation also revealed storage
-layer was guessing file extensions by trying all formats until finding a match.
-
-**What shipped:**
-- **duplicate detection** (tracks.py:181-203): after saving file, checks if track with same `file_id`
-  and `artist_did` exists. rejects upload with error instead of creating duplicate.
-- **refcount-based deletion** (r2.py:175-197): before deleting R2 file, queries database for refcount.
-  only deletes if `refcount == 1`. logs when deletion skipped due to `refcount > 1`.
```
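
The refcount-based deletion described for PR #191 can be sketched as follows. This is not the project's `r2.py`: it assumes a `tracks` table with `id` and `file_id` columns, uses an in-memory SQLite database, and stands in for R2 with a Python set of stored keys (`delete_track` is a hypothetical name). The point is the ordering: count how many rows reference the file before touching storage, and only delete the object when the track being removed is its last reference.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def delete_track(conn: sqlite3.Connection, storage: set, track_id: int) -> bool:
    """Delete one track row; delete its file only when no other track shares it.

    Returns True if the stored file itself was removed.
    """
    row = conn.execute("SELECT file_id FROM tracks WHERE id = ?", (track_id,)).fetchone()
    if row is None:
        return False
    file_id = row[0]
    # How many tracks (including this one) point at the same stored object?
    (refcount,) = conn.execute(
        "SELECT COUNT(*) FROM tracks WHERE file_id = ?", (file_id,)
    ).fetchone()
    removed_file = False
    if refcount == 1:
        storage.discard(file_id)  # stand-in for the R2 delete call
        removed_file = True
    else:
        logger.info("skipping delete of %s: refcount=%d", file_id, refcount)
    conn.execute("DELETE FROM tracks WHERE id = ?", (track_id,))
    conn.commit()
    return removed_file


if __name__ == "__main__":
    # Recreate the "banana mix" situation: tracks 56 and 57 share one file.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tracks (id INTEGER PRIMARY KEY, file_id TEXT)")
    conn.executemany("INSERT INTO tracks VALUES (?, ?)", [(56, "banana"), (57, "banana")])
    storage = {"banana"}
    print(delete_track(conn, storage, 57), storage)  # file kept for track 56
    print(delete_track(conn, storage, 56), storage)  # last reference, file removed
```

A production version would want the count and the row delete inside one transaction (or under a lock); otherwise two simultaneous deletions of the last two references could each see a count of 2 and both skip the file, leaving an orphaned object.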
```diff
-- **exact key deletion** (r2.py:163-233, filesystem.py:85-123): updated `delete()` signature to accept
-  optional `file_type` parameter. when provided, deletes exact key `audio/{file_id}.{file_type}` instead
-  of looping through all formats. fallback to loop only when `file_type` is None (legacy rows, images).
-  - upload cleanup passes `audio_format.value`
-  - track deletion passes `track.file_type`
-  - image deletion still uses fallback (no `image_format` field yet - tech debt)
-- **ATProto cleanup** (tracks.py:683-712): deletes PDS record when track deleted. handles 404 gracefully
-  (record already gone), bubbles other errors.
-
-**Impact:** prevents "delete duplicate and nuke original" scenario. logs show exact keys being deleted
-instead of trying wrong extensions first. manual e2e test confirmed: uploaded .wav file, verified exact
-key deletion via R2 API, confirmed clean deletion with no orphans in DB/PDS/R2.
-
-**Tech debt identified:**
-- storage layer has accumulated naive patterns that work but aren't elegant:
-  - image deletion still loops through formats (no `image_format` column on tracks)
-  - could store image format alongside `image_id` to enable exact deletion
-  - or maintain separate image metadata table
-  - functional for now, but should clean up later
-
-### detailed history
-
-### Queue hydration + ATProto token hardening (Nov 12, 2025)
-
-**Why:** queue endpoints were occasionally taking 2s+ and restore operations could 401
-when multiple requests refreshed an expired ATProto token simultaneously.
-
-**What shipped:**
-- Added persistent `image_url` on `Track` rows so queue hydration no longer probes R2
-  for every track. Queue payloads now pull art directly from Postgres, with a one-time
-  fallback for legacy rows.
```
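
The exact-key deletion change replaces extension guessing with a direct lookup whenever the stored `file_type` is known. A minimal filesystem-only sketch of that control flow follows; `delete_audio` and the `SUPPORTED_AUDIO_FORMATS` tuple (taken from the "supported: mp3, wav, m4a" upload hint in the removed notes) are assumptions, not the project's storage API. Only the `audio/{file_id}.{file_type}` key layout comes from the notes.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Assumption: probe order for legacy rows that have no stored file_type.
SUPPORTED_AUDIO_FORMATS = ("mp3", "wav", "m4a")


def delete_audio(root: Path, file_id: str, file_type: Optional[str] = None) -> Optional[str]:
    """Delete the audio object for file_id and return the key removed, if any."""
    if file_type is not None:
        extensions = [file_type]  # exact key: one lookup, no guessing
    else:
        extensions = list(SUPPORTED_AUDIO_FORMATS)  # legacy fallback: probe each format
    for ext in extensions:
        key = f"audio/{file_id}.{ext}"
        path = root / key
        if path.exists():
            path.unlink()
            return key
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "audio").mkdir()
        (root / "audio" / "abc.wav").write_bytes(b"RIFF")
        print(delete_audio(root, "abc", "wav"))  # exact key is deleted
        print(delete_audio(root, "abc"))         # fallback probes all formats, finds nothing
```

The fallback branch is the same probing the tech-debt notes say image deletion still relies on; storing an image format alongside `image_id` would let images take the exact-key branch too.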
```diff
-- Updated `_internal/queue.py` to backfill any missing URLs once (with caching) instead
-  of per-request GETs.
-- Introduced per-session locks in `_refresh_session_tokens` so only one coroutine hits
-  `oauth_client.refresh_session` at a time; others reuse the refreshed tokens. This
-  removes the race that caused the batch restore flow to intermittently 500/401.
-
-**Impact:** queue tail latency dropped back under 500 ms in staging tests, ATProto
-restore flows are now reliable under concurrent use, and Logfire no longer shows 500s
-from the PDS.
-
-### Liked tracks feature (PR #157, Nov 11, 2025)
-
-- ✅ server-side persistent collections
-- ✅ ATProto record publication for cross-platform visibility
-- ✅ UI for adding/removing tracks from liked collection
-- ✅ like counts displayed in track responses and analytics (#170)
-- ✅ analytics cards now clickable links to track detail pages (#171)
-- ✅ liked state shown on artist page tracks (#163)
-
-### Upload streaming + progress UX (PR #182, Nov 11, 2025)
-
-- Frontend switched from `fetch` to `XMLHttpRequest` so we can display upload progress
-  toasts (critical for >50 MB mixes on mobile).
-- Upload form now clears only after the request succeeds; failed attempts leave the
-  form intact so users don't lose metadata.
-- Backend writes uploads/images to temp files in 8 MB chunks before handing them to the
-  storage layer, eliminating whole-file buffering and iOS crashes for hour-long mixes.
-- Deployment verified locally and by rerunning the exact repro Stella hit (85 minute
-  mix from mobile).
-
-### transcoder API deployment (PR #156, Nov 11, 2025)
-
-**standalone Rust transcoding service** 🎉
-- **deployed**: https://plyr-transcoder.fly.dev/
-- **purpose**: convert AIFF/FLAC/etc. to MP3 for browser compatibility
-- **technology**: Axum + ffmpeg + Docker
-- **security**: `X-Transcoder-Key` header authentication (shared secret)
-- **capacity**: handles 1GB uploads, tested with 85-minute AIFF files (~858MB → 195MB MP3 in 32 seconds)
-- **architecture**:
-  - 2 Fly machines for high availability
-  - auto-stop/start for cost efficiency
-  - stateless design (no R2 integration yet)
-  - 320kbps MP3 output with proper ID3 tags
-- **status**: deployed and tested, ready for integration into plyr.fm upload pipeline
-- **next steps**: wire into backend with R2 integration and job queue (see issue #153)
-
-### AIFF/AIF browser compatibility fix (PR #152, Nov 11, 2025)
-
-**format validation improvements**
-- **problem discovered**: AIFF/AIF files only work in Safari, not Chrome/Firefox
-  - browsers throw `MediaError code 4: MEDIA_ERR_SRC_NOT_SUPPORTED`
-  - users could upload files but they wouldn't play in most browsers
-- **immediate solution**: reject AIFF/AIF uploads at both backend and frontend
-  - removed AIFF/AIF from AudioFormat enum
-  - added format hints to upload UI: "supported: mp3, wav, m4a"
-  - client-side validation with helpful error messages
-- **long-term solution**: deployed standalone transcoder service (see above)
-  - separate Rust/Axum service with ffmpeg
-  - accepts all formats, converts to browser-compatible MP3
-  - integration into upload pipeline pending (issue #153)
-
-**observability improvements**:
-- added logfire instrumentation to upload background tasks
-- added logfire spans to R2 storage operations
-- documented logfire querying patterns in `docs/logfire-querying.md`
-
-### async I/O performance fixes (PRs #149-151, Nov 10-11, 2025)
-
-Eliminated event loop blocking across backend with three critical PRs:
-
-1. **PR #149: async R2 reads** - converted R2 `head_object` operations from sync boto3 to async aioboto3
-   - portal page load time: 2+ seconds → ~200ms
-   - root cause: `track.image_url` was blocking on serial R2 HEAD requests
-
-2. **PR #150: concurrent PDS resolution** - parallelized ATProto PDS URL lookups
-   - homepage load time: 2-6 seconds → 200-400ms
-   - root cause: serial `resolve_atproto_data()` calls (8 artists × 200-300ms each)
-   - fix: `asyncio.gather()` for batch resolution, database caching for subsequent loads
-
-3. **PR #151: async storage writes/deletes** - made save/delete operations non-blocking
-   - R2: switched to `aioboto3` for uploads/deletes (async S3 operations)
-   - filesystem: used `anyio.Path` and `anyio.open_file()` for chunked async I/O (64KB chunks)
-   - impact: multi-MB uploads no longer monopolize worker thread, constant memory usage
-
-### cover art support (PRs #123-126, #132-139)
-- ✅ track cover image upload and storage (separate R2 bucket)
-- ✅ image display on track pages and player
-- ✅ Open Graph meta tags for track sharing
-- ✅ mobile-optimized layouts with cover art
-- ✅ sticky bottom player on mobile with cover
-
-### track detail pages (PR #164, Nov 12, 2025)
-
-- ✅ dedicated track detail pages with large cover art
-- ✅ play button updates queue state correctly (#169)
-- ✅ liked state loaded efficiently via server-side fetch
-- ✅ mobile-optimized layouts with proper scrolling constraints
-- ✅ origin validation for image URLs (#168)
-
-### mobile UI improvements (PRs #159-185, Nov 11-12, 2025)
-
-- ✅ compact action menus and better navigation (#161)
-- ✅ improved mobile responsiveness (#159)
-- ✅ consistent button layouts across mobile/desktop (#176-181, #185)
-- ✅ always show play count and like count on mobile (#177)
-- ✅ login page UX improvements (#174-175)
-- ✅ liked page UX improvements (#173)
-- ✅ accent color for liked tracks (#160)
-
-### queue management improvements (PRs #110-113, #115)
-- ✅ visual feedback on queue add/remove
-- ✅ toast notifications for queue actions
-- ✅ better error handling for queue operations
-- ✅ improved shuffle and auto-advance UX
-
-### infrastructure and tooling
-- ✅ R2 bucket separation: audio-prod and images-prod (PR #124)
-- ✅ admin script for content moderation (`scripts/delete_track.py`)
-- ✅ bluesky attribution link in header
-- ✅ changelog target added (#183)
-- ✅ documentation updates (#158)
-- ✅ track metadata edits now persist correctly (#162)
-
-## immediate priorities
-
-### high priority features
-1. **audio transcoding pipeline integration** (issue #153)
-   - ✅ standalone transcoder service deployed at https://plyr-transcoder.fly.dev/
-   - ✅ Rust/Axum service with ffmpeg, tested with 85-minute files
-   - ✅ secure auth via X-Transcoder-Key header
-   - ⏳ next: integrate into plyr.fm upload pipeline
-     - backend calls transcoder API for unsupported formats
-     - queue-based job system for async processing
-     - R2 integration (fetch original, store MP3)
-     - maintain original file hash for deduplication
-     - handle transcoding failures gracefully
-
-### critical bugs
-1. **upload reliability** (issue #147): upload returns 200 but file missing from R2, no error logged
-   - priority: high (data loss risk)
-   - need better error handling and retry logic in background upload task
-
-2. **database connection pool SSL errors**: intermittent failures on first request
-   - symptom: `/tracks/` returns 500 on first request, succeeds after
-   - fix: set `pool_pre_ping=True`, adjust `pool_recycle` for Neon timeouts
-   - documented in `docs/logfire-querying.md`
-
-### performance optimizations
-3. **persist concrete file extensions in database**: currently brute-force probing all supported formats on read
-   - already know `Track.file_type` and image format during upload
-   - eliminating repeated `exists()` checks reduces filesystem/R2 HEAD spam
-   - improves audio streaming latency (`/audio/{file_id}` endpoint walks extensions sequentially)
-
-4. **stream large uploads directly to storage**: current implementation reads entire file into memory before background task
-   - multi-GB uploads risk OOM
-   - stream from `UploadFile.file` → storage backend for constant memory usage
-
-### new features
-5. **content-addressable storage** (issue #146)
-   - hash-based file storage for automatic deduplication
-   - reduces storage costs when multiple artists upload same file
-   - enables content verification
-
-6. **liked tracks feature** (issue #144): design schema and ATProto record format
-   - server-side persistent collections
-   - ATProto record publication for cross-platform visibility
-   - UI for adding/removing tracks from liked collection
-
-## open issues by timeline
-
-### immediate
-- issue #153: audio transcoding pipeline (ffmpeg worker for AIFF/FLAC→MP3)
-- issue #147: upload reliability bug (data loss risk)
-- issue #144: likes feature for personal collections
-
-### short-term
-- issue #146: content-addressable storage (hash-based deduplication)
-- issue #24: implement play count abuse prevention
-- database connection pool tuning (SSL errors)
-- file extension persistence in database
-
-### medium-term
-- issue #39: postmortem - cross-domain auth deployment and remaining security TODOs
-- issue #46: consider removing init_db() from lifespan in favor of migration-only approach
-- issue #56: design public developer API and versioning
-- issue #57: support multiple audio item types (voice memos/snippets)
-- issue #122: fullscreen player for immersive playback
-
-### long-term
-- migrate to plyr-owned lexicon (custom ATProto namespace with richer metadata)
-- publish to multiple ATProto AppViews for cross-platform visibility
-- explore ATProto-native notifications (replace Bluesky DM bot)
-- realtime queue syncing across devices via SSE/WebSocket
-- artist analytics dashboard improvements
-- issue #44: modern music streaming feature parity
-
-## technical state
-
-### architecture
-
-**backend**
-- language: Python 3.11+
-- framework: FastAPI with uvicorn
-- database: Neon PostgreSQL (serverless, fully managed)
-- storage: Cloudflare R2 (S3-compatible object storage)
-- hosting: Fly.io (2x shared-cpu VMs, auto-scaling)
-- observability: Pydantic Logfire (traces, metrics, logs)
-- auth: ATProto OAuth 2.1 (forked SDK: github.com/zzstoatzz/atproto)
-
-**frontend**
-- framework: SvelteKit (latest v2.43.2)
-- runtime: Bun (fast JS runtime)
-- hosting: Cloudflare Pages (edge network)
-- styling: vanilla CSS with lowercase aesthetic
-- state management: Svelte 5 runes ($state, $derived, $effect)
-
-**deployment**
-- ci/cd: GitHub Actions
-- backend: automatic on main branch merge (fly.io deploy)
-- frontend: automatic on every push to main (cloudflare pages)
-- migrations: automated via fly.io release_command
-- environments: dev → staging → production (full separation)
-- versioning: nebula timestamp format (YYYY.MMDD.HHMMSS)
-
-**key dependencies**
-- atproto: forked SDK for OAuth and record management
-- sqlalchemy: async ORM for postgres
-- alembic: database migrations
-- boto3/aioboto3: R2 storage client
-- logfire: observability (FastAPI + SQLAlchemy instrumentation)
-- httpx: async HTTP client
-
-### what's working
-
```
**core functionality**
- ✅ ATProto OAuth 2.1 authentication with encrypted state
- ✅ secure session management via HttpOnly cookies (XSS protection)
- ✅ artist profiles synced with Bluesky (avatar, display name, handle)
- ✅ track upload with streaming to prevent OOM
- ✅ track edit (title, artist, album, features metadata)
- ✅ track deletion with cascade cleanup
- ✅ audio streaming via HTML5 player with 307 redirects to R2 CDN
- ✅ track metadata published as ATProto records (fm.plyr.track namespace)
- ✅ play count tracking with threshold (30% or 30s, whichever comes first)
- ✅ like functionality with counts
- ✅ artist analytics dashboard
- ✅ queue management (shuffle, auto-advance, reorder)
- ✅ mobile-optimized responsive UI
- ✅ cross-tab queue synchronization via BroadcastChannel
- ✅ share tracks via URL with Open Graph previews (including cover art)
- ✅ image URL caching in database (eliminates N+1 R2 calls)
- ✅ format validation (rejects AIFF/AIF, accepts MP3/WAV/M4A with helpful error messages)
- ✅ admin content moderation script for removing inappropriate uploads

**albums**
- ✅ album database schema with track relationships
- ✅ album browsing pages (`/u/{handle}` shows discography)
- ✅ album detail pages (`/u/{handle}/album/{slug}`) with full track lists
- ✅ album cover art upload and display
- ✅ server-side rendering for SEO
- ✅ rich Open Graph metadata for link previews (music.album type)
- ✅ long album title handling (100-char slugs, CSS truncation)
- ⏸ ATProto records for albums (deferred, see issue #221)

**frontend architecture**
- ✅ server-side data loading (`+page.server.ts`) for artist and album pages
- ✅ client-side data loading (`+page.ts`) for auth-dependent pages
- ✅ centralized auth manager (`lib/auth.svelte.ts`)
- ✅ layout-level auth state (`+layout.ts`) shared across all pages
- ✅ eliminated "flash of loading" via proper load functions
- ✅ consistent auth patterns (no scattered localStorage calls)

**deployment (fully automated)**
- **production**:
  - frontend: https://plyr.fm (cloudflare pages)
  - backend: https://relay-api.fly.dev (fly.io: 2 machines, 1GB RAM, 1 shared CPU, min 1 running)
  - database: neon postgresql
  - storage: cloudflare R2 (audio-prod and images-prod buckets)
  - deploy: github release → automatic

- **staging**:
  - backend: https://api-stg.plyr.fm (fly.io: relay-api-staging)
  - frontend: https://stg.plyr.fm (cloudflare pages: plyr-fm-stg)
  - database: neon postgresql (relay-staging)
  - storage: cloudflare R2 (audio-stg bucket)
  - deploy: push to main → automatic

- **development**:
  - backend: localhost:8000
  - frontend: localhost:5173
  - database: neon postgresql (relay-dev)
  - storage: cloudflare R2 (audio-dev and images-dev buckets)

- **developer tooling**:
  - `just serve` - run backend locally
  - `just dev` - run frontend locally
  - `just test` - run test suite
  - `just release` - create production release (backend + frontend)
  - `just release-frontend-only` - deploy only frontend changes (added Nov 13)

### what's in progress

**immediate work**
- investigating playback auto-start behavior (#225)
  - page refresh sometimes starts playing immediately
  - may be related to queue state restoration or localStorage caching
  - `autoplay_next` preference not being respected in all cases
- liquid glass effects as a user-configurable setting (#186)

**active research**
- transcoding pipeline architecture (see sandbox/transcoding-pipeline-plan.md)
- content moderation systems (#166, #167)
- PWA capabilities and offline support (#165)

### known issues

**player behavior**
- playback auto-start on refresh (#225)
  - sometimes plays immediately after page load
  - investigating localStorage/queue state persistence
  - may not respect `autoplay_next` preference in all scenarios

**missing features**
- no ATProto records for albums yet (#221 - consciously deferred)
- no track genres/tags/descriptions yet (#155)
- no AIFF/AIF transcoding support (#153)
- no PWA installation prompts (#165)
- no fullscreen player view (#122)
- no public API for third-party integrations (#56)

**technical debt**
- multi-tab playback synchronization could be more robust
- queue state conflicts can occur with rapid operations
- no automated content moderation yet
- no DMCA compliance workflow

### technical decisions

**why Python/FastAPI instead of Rust?**
- rapid prototyping velocity during MVP phase
- rich ecosystem for web APIs (fastapi, sqlalchemy, pydantic)
- excellent async support with asyncio
- lower barrier to contribution
- trade-off: accepting higher latency for faster development
- future: can migrate hot paths to Rust if needed (the standalone transcoder is already a Rust/Axum service)

**why Fly.io instead of AWS/GCP?**
- simple deployment model (dockerfile → production)
- automatic SSL/TLS certificates
- built-in global load balancing
- reasonable pricing for MVP ($5/month)
- easy migration path to larger providers later
- trade-off: vendor-specific features, less control

**why Cloudflare R2 instead of S3?**
- zero egress fees (critical for audio streaming)
- S3-compatible API (easy migration if needed)
- integrated CDN for fast delivery
- significantly cheaper than S3 for bandwidth-heavy workloads

**why forked atproto SDK?**
- upstream SDK lacked OAuth 2.1 support
- needed custom record management patterns
- maintains compatibility with ATProto spec
- contributes improvements back when possible

**why SvelteKit instead of React/Next.js?**
- Svelte 5 runes provide excellent reactivity model
- smaller bundle sizes (critical for mobile)
- less boilerplate than React
- SSR + static generation flexibility
- modern DX with TypeScript

**why Neon instead of self-hosted Postgres?**
- serverless autoscaling (no capacity planning)
- branch-per-PR workflow (preview databases)
- automatic backups and point-in-time recovery
- generous free tier for MVP
- trade-off: higher latency than co-located DB, but acceptable

**why reject AIFF instead of transcoding immediately?**
- MVP speed: transcoding requires queue infrastructure, ffmpeg setup, error handling
- user communication: better to be upfront about limitations than to fail silently
- resource management: transcoding is CPU-intensive, needs proper worker architecture
- future flexibility: can add transcoding as optional feature (high-quality uploads → MP3 delivery)
- trade-off: some users can't upload AIFF now, but those who can upload MP3 have a working experience

**why async everywhere?**
- event loop performance: single-threaded async handles high concurrency
- I/O-bound workload: most time spent waiting on network/disk
- recent work (PRs #149-151) eliminated all blocking operations
- alternative: thread pools for blocking I/O, but increases complexity
- trade-off: debugging async code harder than sync, but worth throughput gains

**why anyio.Path over hand-rolled thread pools?**
- awaitable file I/O: `anyio` runs the blocking file calls in worker threads for us, so the event loop stays free without hand-written executor code
- constant memory: chunked reads/writes (64KB) prevent OOM on large files
- thread pools: `anyio` already uses one under the hood, so the gain over calling `run_in_executor` directly is cleaner code, not fewer context switches
- trade-off: anyio API slightly different from stdlib `pathlib`, but cleaner async semantics

## cost structure

current monthly costs: ~$5-6

- cloudflare pages: $0 (free tier)
- cloudflare R2: ~$0.16 (storage + operations, no egress fees)
- fly.io production: $5.00 (2x shared-cpu-1x VMs with auto-stop)
- fly.io staging: $0 (auto-stop, only runs during testing)
- neon: $0 (free tier, 0.5 CPU, 512MB RAM, 3GB storage)
- logfire: $0 (free tier)
- domain: $12/year (~$1/month)

## deployment URLs

- **production frontend**: https://plyr.fm
- **production backend**: https://relay-api.fly.dev (redirects to https://api.plyr.fm)
- **staging backend**: https://api-stg.plyr.fm
- **staging frontend**: https://stg.plyr.fm
- **repository**: https://github.com/zzstoatzz/plyr.fm (private)
- **monitoring**: https://logfire-us.pydantic.dev/zzstoatzz/relay
- **bluesky**: https://bsky.app/profile/plyr.fm
- **latest release**: 2025.1110.042349

## health indicators

**production status**: ✅ healthy
- uptime: consistently available
- response times: <500ms p95 for API endpoints
- error rate: <1% (mostly invalid OAuth states)
- storage: ~12 tracks uploaded, functioning correctly

**key metrics**
- total tracks: ~12
- total artists: ~3
- play counts: tracked per-track
- storage used: <1GB R2
- database size: <10MB postgres

## next session prep

**context for new agent:**
1. player race condition was attempted but reverted (PR #187)
2. main branch is clean and deployable
3. current branch: fix/player-rapid-click-race-condition (can be deleted)
4. focus should be on understanding what broke with the race condition fix
5. liquid glass effects (#186) are a nice-to-have enhancement

**debugging resources:**
- Logfire telemetry: logfire-us.pydantic.dev/zzstoatzz/relay
- recent work documented in: sandbox/double-loading-analysis.md
- relevant code: frontend/src/lib/components/Player.svelte, frontend/src/lib/queue.svelte.ts

**useful commands:**
- `just serve` - run backend
- `just dev` - run frontend
- `just test` - run test suite
- `date` - get current time (don't assume, always check)
- `git log --oneline -20` - see recent work
- `gh issue list` - check open issues

## admin tooling

### content moderation
script: `scripts/delete_track.py`
- requires `ADMIN_*` prefixed environment variables
- deletes audio file from R2
- deletes cover image from R2 (if exists)
- deletes database record (cascades to likes and queue entries)
- notes ATProto records for manual cleanup (can't delete from other users' PDS)

usage:
```bash
# dry run
uv run scripts/delete_track.py <track_id> --dry-run

# delete with confirmation
uv run scripts/delete_track.py <track_id>

# delete without confirmation
uv run scripts/delete_track.py <track_id> --yes

# by URL
uv run scripts/delete_track.py --url https://plyr.fm/track/34
```

required environment variables:
- `ADMIN_DATABASE_URL` - production database connection
- `ADMIN_AWS_ACCESS_KEY_ID` - R2 access key
- `ADMIN_AWS_SECRET_ACCESS_KEY` - R2 secret
- `ADMIN_R2_ENDPOINT_URL` - R2 endpoint
- `ADMIN_R2_BUCKET` - R2 bucket name

## known issues

### non-blocking
- cloudflare pages preview URLs return 404 (production works fine)
- some "relay" references remain in docs and comments
- ATProto like records can't be deleted when removing tracks (orphaned on users' PDS)
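
orphaned ATProto records have to be cleaned up by hand (the admin script only notes them), so it helps to break each stored record URI into the pieces that cleanup needs: the owning repo, the collection, and the record key. this is a minimal sketch, assuming URIs follow the standard `at://<authority>/<collection>/<rkey>` shape. `AtUri`, `parse_at_uri`, and `cleanup_lines` are hypothetical helpers, not existing plyr.fm code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtUri:
    """hypothetical container for the three parts of an at:// record URI."""

    authority: str  # repo DID (or handle) that owns the record
    collection: str  # NSID, e.g. fm.plyr.track
    rkey: str  # record key within the collection


def parse_at_uri(uri: str) -> AtUri:
    """split at://<authority>/<collection>/<rkey> into its parts.

    sketch only: checks the overall shape, but does not validate DID, NSID,
    or record-key syntax, and does not handle query strings or fragments.
    """
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an at:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    # exactly three non-empty segments: authority, collection, rkey
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://<authority>/<collection>/<rkey>, got {uri!r}")
    authority, collection, rkey = parts
    return AtUri(authority, collection, rkey)


def cleanup_lines(uris: list[str]) -> list[str]:
    """group record URIs by owning repo, one sorted report line per repo."""
    by_repo: dict[str, list[str]] = {}
    for uri in uris:
        ref = parse_at_uri(uri)
        by_repo.setdefault(ref.authority, []).append(f"{ref.collection}/{ref.rkey}")
    return [f"{repo}: {', '.join(sorted(keys))}" for repo, keys in sorted(by_repo.items())]


if __name__ == "__main__":
    # made-up DID and record key, for illustration only
    print(parse_at_uri("at://did:plc:abc123/fm.plyr.track/3kxyz"))
```

grouping by repo follows from the constraint above. only the owner of a PDS can delete records from it, so a per-repo list is the natural thing to hand off.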
## for new contributors

### getting started
1. clone: `gh repo clone zzstoatzz/plyr.fm`
2. install dependencies: `uv sync && cd frontend && bun install`
3. run backend: `uv run uvicorn backend.main:app --reload`
4. run frontend: `cd frontend && bun run dev`
5. visit http://localhost:5173

### development workflow
1. create issue on github
2. create PR from feature branch
3. ensure pre-commit hooks pass
4. test locally
5. merge to main → deploys to staging automatically
6. verify on staging
7. create github release → deploys to production automatically

### key principles
- type hints everywhere
- lowercase aesthetic
- generic terminology (use "items" not "tracks" where appropriate)
- ATProto first
- mobile matters
- cost conscious
- async everywhere (no blocking I/O)

### project structure
```
plyr.fm/
├── src/backend/          # fastapi backend
│   ├── api/              # public HTTP endpoints
│   ├── _internal/        # internal services
│   ├── atproto/          # ATProto integration
│   ├── models/           # sqlalchemy schemas
│   ├── storage/          # R2 and filesystem backends
│   └── utilities/        # helpers and config
├── frontend/             # sveltekit app
│   ├── src/lib/          # components and stores
│   └── src/routes/       # pages
├── tests/                # pytest suite
├── alembic/              # database migrations
├── docs/                 # deployment guides
├── .github/workflows/    # CI/CD pipelines
└── CLAUDE.md             # project instructions
```

## documentation

- [deployment overview](docs/deployment/overview.md)
- [configuration guide](docs/configuration.md)
- [queue design](docs/queue-design.md)
- [logfire querying](docs/logfire-querying.md)
- [pdsx guide](docs/pdsx-guide.md)
- [neon mcp guide](docs/neon-mcp-guide.md)

## performance optimization session (Nov 12, 2025)

### issue: slow /tracks/liked endpoint

**symptoms**:
- `/tracks/liked` taking 600-900ms consistently
- only ~25ms spent in database queries
- mysterious 575ms gap with no spans in Logfire traces
- endpoint felt sluggish compared to other pages

**investigation**:
- examined Logfire traces for `/tracks/liked` requests
- found 5-6 liked tracks being returned per request
- DB queries completing fast (track data, artist info, like counts all under 10ms each)
- noticed R2 storage calls weren't appearing in traces despite taking the majority of request time

**root cause**:
- PR #184 added `image_url` column to tracks table to eliminate N+1 R2 API calls
- new tracks (uploaded after PR) have `image_url` populated at upload time ✅
- legacy tracks (15 tracks uploaded before PR) had `image_url = NULL` ❌
- fallback code called `track.get_image_url()` for NULL values
- `get_image_url()` makes uninstrumented R2 `head_object` API calls to find image extensions
- each track with NULL `image_url` = ~100-120ms of R2 API calls per request
- 5 tracks × 120ms = ~600ms of uninstrumented latency

**why R2 calls weren't visible**:
- `storage.get_url()` method had no Logfire instrumentation
- R2 API calls happening but not creating spans
- appeared as mysterious gap in trace timeline

**solution implemented**:
1. created `scripts/backfill_image_urls.py` to populate missing `image_url` values
2. ran script against production database with production R2 credentials
3. backfilled 11 tracks successfully (4 already done in previous partial run)
4. 3 tracks "failed" but actually have no cover image (images are optional, so this is expected)
5. script uses concurrent `asyncio.gather()` for performance

**key learning: environment configuration matters**:
- initial script runs failed silently because:
  - script used local `.env` credentials (dev R2 bucket)
  - production images stored in different R2 bucket (`images-prod`)
  - `get_url()` returned `None` when images not found in dev bucket
- fix: passed production R2 credentials via environment variables:
  - `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
  - `R2_IMAGE_BUCKET=images-prod`
  - `R2_PUBLIC_IMAGE_BUCKET_URL=https://pub-7ea7ea9a6f224f4f8c0321a2bb008c5a.r2.dev`

**results**:
- before: 15 tracks needed backfill, causing ~600-900ms latency on `/tracks/liked`
- after: 13 tracks populated with `image_url`, 3 legitimately have no images
- `/tracks/liked` now loads with 0 R2 API calls instead of 5-11
- endpoint feels "really, really snappy" (user feedback)
- performance improvement visible immediately after backfill

**database cleanup: queue_state table bloat**:
- discovered `queue_state` had 265% bloat (53 dead rows, 20 live rows)
- ran `VACUUM (FULL, ANALYZE) queue_state` against production
- result: 0 dead rows, table clean
- autovacuum for queue_state still needs tuning to prevent future bloat:
  - frequent updates to this table make it prone to bloat
  - should tune `autovacuum_vacuum_scale_factor` to 0.05 (5% vs default 20%)

**endpoint performance snapshot** (post-fix, last 10 minutes):
- `GET /tracks/`: 410ms (down from 2+ seconds)
- `GET /queue/`: 399ms (down from 2+ seconds)
- `GET /tracks/liked`: now sub-200ms (down from 600-900ms)
- `GET /preferences/`: 200ms median
- `GET /auth/me`: 114ms median
- `POST /tracks/{track_id}/play`: 34ms

**PR #184 context**:
- PR claimed "opportunistic backfill: legacy records update on first access"
- but the actual implementation never saved the computed `image_url` back to the database
- fallback code only computed URLs on demand and didn't persist them
- this is why repeated visits kept hitting the R2 API for the same tracks
- a one-time backfill script was the right fix, rather than adding write logic to read endpoints

**graceful ATProto recovery (PR #180)**:
- reviewed recent work on handling tracks with missing `atproto_record_uri`
- 4 tracks in production have NULL ATProto records (expected from upload failures)
- system already handles this gracefully:
  - like buttons disabled with helpful tooltips
  - track owners can self-service restore via portal
  - `restore-record` endpoint recreates with correct TID timestamps
- no action needed - existing recovery system working as designed

**performance metrics pre/post all recent PRs**:
- PR #184 (image_url storage): eliminated hundreds of R2 API calls per request
- today's backfill: eliminated remaining R2 calls for legacy tracks
- combined impact: queue/tracks endpoints now 5-10x faster than before PR #184
- all endpoints now consistently sub-second response times

**documentation created**:
- `docs/neon-mcp-guide.md`: comprehensive guide for using Neon MCP
  - project/branch management
  - database schema inspection
  - SQL query patterns for plyr.fm
  - connection string generation
  - environment mapping (dev/staging/prod)
  - debugging workflows
- `scripts/backfill_image_urls.py`: reusable for any future image_url gaps
  - dry-run mode for safety
  - concurrent R2 API calls
  - detailed error logging
  - production-tested

**tools and patterns established**:
- Neon MCP for database inspection and queries
- Logfire arbitrary queries for performance analysis
- production secret management via Fly.io
- `flyctl ssh console` for environment inspection
- backfill scripts with dry-run mode
- environment variable overrides for production operations

**system health indicators**:
- ✅ no 5xx errors in recent spans
- ✅ database queries all under 70ms p95
- ✅ SSL connection pool issues resolved (no errors in recent traces)
- ✅ queue_state table bloat eliminated
- ✅ all track images either in DB or legitimately NULL
- ✅ application feels fast and responsive

**next steps**:
1. configure autovacuum for `queue_state` table (prevent future bloat)
2. add Logfire instrumentation to `storage.get_url()` for visibility
3. monitor `/tracks/liked` performance over next few days
4. consider adding similar backfill pattern for any future column additions

---

this is a living document. last updated 2025-11-18 after ATProto namespace cleanup.