# plyr.fm - status update

Status as of: 2025-11-18

## long-term vision

### the problem

today's music streaming is fundamentally broken:
- spotify and apple music trap your data in proprietary silos
- artists pay distribution fees and streaming cuts to multiple gatekeepers
- listeners can't own their music collections - they rent them
- switching platforms means losing everything: playlists, play history, social connections

### the atproto solution

plyr.fm is built on the AT Protocol (the protocol powering Bluesky) and enables:
- **portable identity**: your music collection, playlists, and listening history belong to you, stored in your personal data server (PDS)
- **decentralized distribution**: artists publish directly to the network without platform gatekeepers
- **interoperable data**: any client can read your music records - you're not locked into plyr.fm
- **authentic social**: artist profiles are real ATProto identities with verifiable handles (@artist.bsky.social)

### the dream state

plyr.fm should become:

1. **for artists**: the easiest way to publish music to the decentralized web
   - upload once, available everywhere in the ATProto network
   - direct connection to listeners without platform intermediaries
   - real ownership of audience relationships

2. **for listeners**: a streaming platform where you actually own your data
   - your collection lives in your PDS, playable by any ATProto music client
   - switch between plyr.fm and other clients freely - your data travels with you
   - share tracks as native ATProto posts to Bluesky

3. **for developers**: a reference implementation showing how to build on ATProto
   - open source end-to-end example of ATProto integration
   - demonstrates OAuth, record creation, federation patterns
   - proves decentralized music streaming is viable

## medium-term vision (next 3-6 months)

### core feature priorities

1. **rich track metadata** (#155)
   - genres, tags, descriptions
   - enhanced discoverability
   - proper music taxonomy

2. **audio transcoding pipeline** (#153)
   - support AIFF/AIF and other formats
   - automatic conversion to web-friendly formats
   - consistent playback experience

3. **PWA installability** (#165)
   - desktop and mobile installation
   - offline capability exploration
   - native app-like experience

4. **fullscreen player view** (#122)
   - immersive playback interface
   - album art showcase
   - enhanced mobile experience

### platform maturity

1. **content moderation** (#166, #167)
   - image content moderation for user uploads
   - DMCA safe harbor compliance
   - automated detection systems

2. **public developer API** (#56)
   - versioned REST API
   - authentication patterns
   - rate limiting and quotas
   - documentation and SDKs

3. **content-addressable storage** (#146)
   - hash-based URLs for deduplication
   - bandwidth optimization
   - cache-friendly architecture

### ecosystem integration

- deeper ATProto federation
- cross-client compatibility testing
- Bluesky social sharing improvements
- PDS integration patterns

## current state

**production is live and gaining real users** 🎉
- latest release: 2025.1110.042349
- frontend: https://plyr.fm
- backend: https://relay-api.fly.dev
- three-tier deployment working: dev → staging → production
- first external user (@stellz) actively uploading content (75+ minute tracks tested successfully)

## short-term priorities (this week)

### active issues

1. **playback auto-start on refresh** (#225)
   - symptom: page refresh sometimes starts playing immediately
   - suspected cause: client-side caching of playback state or queue restoration
   - `autoplay_next` preference set to false but not always respected
   - needs investigation into what state is persisting and why

2. **liquid glass visual effects** (#186)
   - user-configurable frosted glass effects
   - aesthetic enhancement feature
   - low priority, high polish

## recent work (november 2025)

### summary

**major features**:
- ✅ ATProto namespace separation (#263-264) - environment-specific namespaces, removed hardcoded strings, migrated staging data
- ✅ secure browser authentication (#237, #239, #244) - HttpOnly cookies, XSS protection, environment-aware configuration
- ✅ albums feature (PRs #214-222) - database schema, CRUD, browsing, detail pages, cover art
- ✅ frontend data loading overhaul (PRs #210, #227) - server-side rendering, centralized auth
- ✅ link preview system (PRs #230-231) - rich OG metadata for albums, homepage, tracks
- ✅ liked tracks feature (#157) - persistent collections with ATProto records
- ✅ transcoder API service (#156) - standalone Rust service for AIFF/FLAC→MP3 conversion
- ✅ track detail pages (#164) - dedicated pages with large cover art

**data integrity fixes** (PR #191, Nov 13, 2025):
- ✅ duplicate upload detection - prevents re-uploading same file from creating multiple tracks pointing to shared R2 object
- ✅ refcount-based R2 deletion - only deletes R2 files when refcount = 1, prevents breaking other tracks
- ✅ ATProto cleanup on delete - removes PDS records when tracks deleted, prevents orphaned records
- ✅ exact key deletion - uses stored `file_type` to delete precise R2 key instead of guessing extensions

**performance improvements**:
- ✅ eliminated N+1 R2 API calls (#184) - store image URLs in database
- ✅ async I/O throughout backend (#149-151) - async R2 operations, concurrent PDS resolution, async storage writes
- ✅ queue hydration optimization - per-session token locks prevent race conditions

**reliability & UX**:
- ✅ streaming uploads with progress (#182, #282) - prevents OOM, better mobile experience
- ✅ graceful ATProto recovery (#180) - tracks with missing records can self-restore
- ✅ mobile UI polish (#159-185) - consistent layouts, better touch targets, improved navigation
- ✅ wave loading animation (#283) - music-themed loading states, clickable refresh on homepage
- ✅ share action parity - added artist detail share button with mobile-friendly placement for consistent copying behavior across track/album/artist pages
- ✅ modularized player - split artwork/metadata and transport controls into focused components for easier maintenance and future UX tweaks

**security & validation**:
- ✅ origin validation for image URLs (#168)
- ✅ AIFF/AIF format rejection (#152) - prevents browser compatibility issues

### detailed history

### wave loading animation (PR #283, Nov 18, 2025)

**motivation**: loading states across the app used inconsistent text ("loading...", "loading tracks...") or generic spinners. wanted a distinctive, music-themed animation that reflects the platform's aesthetic and provides visual consistency.

**what shipped**:
- **WaveLoading component**: new loading animation with 5 vertical bars that pulse in sequence like an audio equalizer
  - configurable sizes: sm (16px), md (24px), lg (32px)
  - uses accent color for brand consistency
  - optional message text below animation
  - smooth ease-in-out animations for polish
- **replaced all loading states**:
  - homepage: added `initialLoad` state to prevent "no tracks yet" flash before data loads
  - portal: main loading + tracks/albums sections
  - broken tracks: replaced LoadingSpinner with WaveLoading
- **clickable refresh**: "latest tracks" heading on homepage now interactive
  - click to force fresh data fetch (bypasses cache)
  - hover effect shows it's clickable
  - simple way for users to check for new content
- **cleanup**: deleted unused LoadingSpinner and LoadingOverlay components

**design rationale**:
- wave pattern chosen as music-themed visual metaphor (like audio visualizer)
- vertical bars instead of horizontal to save space and work well in narrow layouts
- staggered animation (0.1s delay per bar) creates smooth wave effect
- accent color ties to brand identity
- kept intentionally simple - can enhance later with more bars, color variations, or audio responsiveness

**impact**:
- ✅ consistent loading experience across entire app
- ✅ on-brand visual identity (music-themed)
- ✅ eliminated "no tracks yet" flash on homepage
- ✅ users can manually refresh latest tracks
- ✅ net code reduction (-10 lines: +112 new, -122 removed)

**future enhancements** (optional):
- could add more bars for fuller effect
- could vary colors or respond to playing audio
- could add particle effects or other visual flourishes
- established pattern for future loading states

### ATProto namespace separation (PRs #263-264, Nov 17, 2025)

**motivation**: dev and staging environments were writing ATProto records to production `fm.plyr.*` namespace, polluting production collections with test data. namespace configuration was also hardcoded in multiple places instead of using environment-aware config.

**what shipped**:
- **removed hardcoded namespaces** (PR #263):
  - replaced hardcoded `"fm.plyr.like"` strings in `src/backend/_internal/atproto/records.py`
  - added `like_collection` computed field to config (mirrors existing `track_collection`)
  - fixed OAuth scope generation to use computed fields instead of hardcoded strings
  - updated `scripts/backfill_atproto_records.py` to use settings (was using hardcoded namespace)
- **environment-specific namespaces**:
  - development: `fm.plyr.dev` (local .env)
  - staging: `fm.plyr.stg` (flyctl secrets)
  - production: `fm.plyr` (flyctl secrets)
- **data migration**:
  - migrated 7 tracks + 5 likes from `fm.plyr.*` to `fm.plyr.dev.*` in development
  - migrated 7 tracks + 5 likes from `fm.plyr.*` to `fm.plyr.stg.*` in staging
  - used combination of automated script + manual cleanup with neon MCP and pdsx
  - cleaned up old staging records from production namespace
- **documentation** (PR #264):
  - updated `docs/deployment/environments.md` with namespace configuration
  - updated `docs/backend/configuration.md` with environment-specific examples
  - removed typer from project dependencies (moved to PEP 723 inline script deps)
  - created `sandbox/stg-namespace-migration/README.md` documenting migration process
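
a minimal sketch of how collection names can be derived from one per-environment namespace setting, so no call site hardcodes `"fm.plyr.like"`. only `track_collection`, `like_collection`, and the three namespace values come from the notes above; the `AtprotoSettings` class, the `app_namespace` field, and `scoped_collections` are hypothetical names, and the real config presumably uses a settings library's computed fields rather than plain properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtprotoSettings:
    """Hypothetical settings object; the real config class may look different."""

    # one of "fm.plyr.dev", "fm.plyr.stg", "fm.plyr", supplied per environment
    app_namespace: str = "fm.plyr"

    @property
    def track_collection(self) -> str:
        # collection name for track records in this environment
        return f"{self.app_namespace}.track"

    @property
    def like_collection(self) -> str:
        # collection name for like records in this environment
        return f"{self.app_namespace}.like"


def scoped_collections(settings: AtprotoSettings) -> list[str]:
    """Collections an OAuth scope would need to cover (hypothetical helper)."""
    return [settings.track_collection, settings.like_collection]


staging = AtprotoSettings(app_namespace="fm.plyr.stg")
print(scoped_collections(staging))
```

because everything derives from one value, switching environments changes records, scopes, and scripts together.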

**impact**:
- ✅ test tracks/likes no longer pollute production collections
- ✅ OAuth scopes environment-specific and automatically generated from config
- ✅ database and ATProto records stay aligned within each environment
- ✅ proper data separation for dev/staging/production environments
- ✅ eliminated hardcoded namespace strings throughout codebase

**lessons learned**:
- PEP 723 inline script dependencies work well for ad-hoc migration scripts
- database as source of truth more reliable than PDS for stale record lookups
- manual cleanup sometimes faster than debugging complex migration logic

**follow-up cleanup** (Nov 18, 2025):
- discovered 82 orphaned test/dev records remaining in production `fm.plyr.track` namespace
- created analysis script (`scripts/identify_orphaned_records.py`) to cross-reference PDS records against production database
- verified all 13 production tracks safe (including critical tracks: webhook with features, dinah, lil blues improv)
- automated deletion via generated script with proper PDS authentication
- result: 95 → 13 records in production namespace, all production data intact
- filed upstream issue ([pdsx#43](https://github.com/zzstoatzz/pdsx/issues/43)) for batch/concurrent CRUD operations
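
the cross-reference step amounts to a set comparison with the database as source of truth. the sketch below is illustrative only: it is not the contents of `scripts/identify_orphaned_records.py`, and `split_orphans`, the placeholder DID, and the generated record keys are all made up for the example.

```python
def split_orphans(pds_uris: list[str], db_uris: set[str]) -> tuple[list[str], list[str]]:
    """Partition PDS record URIs into (keep, orphaned), trusting the database."""
    keep = [uri for uri in pds_uris if uri in db_uris]
    orphaned = [uri for uri in pds_uris if uri not in db_uris]
    return keep, orphaned


# illustrative data shaped like the cleanup above:
# 95 records on the PDS, 13 of them known to the production database
collection = "fm.plyr.track"
pds_uris = [f"at://did:plc:example/{collection}/rkey{i:03d}" for i in range(95)]
db_uris = set(pds_uris[:13])

keep, orphaned = split_orphans(pds_uris, db_uris)
print(len(keep), len(orphaned))  # 13 82
```

only the `orphaned` list would be handed to a deletion step; reviewing `keep` first is what makes the "all production data intact" check possible.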

### mobile UI polish (PRs #259-261, #265, #268, Nov 17, 2025)

**serialization improvements** (PRs #259-260):
- created `TrackResponse` Pydantic model for consistent track serialization
- fixed album endpoint to properly serialize tracks (was mixing dict/model types)
- eliminated manual dict construction in favor of model-based serialization
- better type safety and consistency across endpoints

**notifications fix** (PR #261):
- notification bot was using hardcoded `https://plyr.fm` URL
- now uses environment-aware `settings.frontend.url` (staging uses `https://stg.plyr.fm`)
- ensures notifications link to correct environment

**sticky player padding** (PRs #265, #268):
- fixed album tracks overlapping with sticky bottom player on mobile (#265)
- attempted centralized padding approach (#266) but created excessive whitespace on mobile
- reverted to per-page padding handling (#268) while keeping album track clearance fix
- mobile padding now matches pre-centralization behavior

**impact**:
- ✅ consistent track serialization across all endpoints
- ✅ notifications link to correct environment
- ✅ album tracks properly clear sticky player on mobile
- ✅ mobile padding back to appropriate levels (no excessive whitespace)

### secure browser authentication (issue #237, PRs #239-244, Nov 14-15, 2025)

**motivation**: session tokens stored in localStorage were vulnerable to XSS attacks. any malicious script could read `session_id` from localStorage and hijack accounts for the full 14-day session lifetime.

**what shipped**:
- **HttpOnly cookies** (PR #244): backend sets `Set-Cookie: session_id=...; HttpOnly; Secure; SameSite=Lax`
  - HttpOnly prevents JavaScript access (XSS protection)
  - Secure requires HTTPS (except localhost for dev)
  - SameSite=Lax mitigates CSRF by withholding the cookie from cross-site subrequests and non-GET requests, while allowing same-site requests
  - cookies automatically sent with requests (no manual auth header management)
- **cookie-aware auth dependencies** (PR #243):
  - `require_auth` checks cookies first, falls back to Authorization header
  - `require_artist_profile` updated with same pattern
  - optional auth endpoints (tracks list, track detail, album detail) now support cookies
  - proper parameter aliasing (`Cookie(alias="session_id")`) for FastAPI
- **environment-aware cookie configuration**:
  - localhost: `secure=False` for HTTP development
  - staging/production: `secure=True` for HTTPS
  - no explicit domain set (prevents cross-environment session leakage)
- **same-site detection**:
  - compares origin host vs request host
  - uses `SameSite=lax` when same-site (localhost→localhost, stg.plyr.fm→api-stg.plyr.fm)
  - keeps cookies off cross-site subrequests (top-level GET navigations still carry them under Lax)
- **frontend cleanup** (PR #239):
  - removed all localStorage session_id read/write operations
  - removed `getSessionId()`, `setSessionId()`, `getAuthHeaders()` helpers
  - all fetch calls use `credentials: 'include'` to send cookies
  - `XMLHttpRequest` uses `withCredentials: true`
  - auth state now managed entirely by backend via HttpOnly cookies
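
a framework-free sketch of the two backend pieces described above: building the session cookie and resolving a session from cookie first, header second. it uses only the standard library instead of FastAPI, so `build_session_cookie` and `resolve_session_id` are hypothetical stand-ins for the real `require_auth` dependency, and tying the cookie's Max-Age to the 14-day session lifetime is an assumption.

```python
from http.cookies import SimpleCookie
from typing import Mapping, Optional

# assumption: cookie lifetime mirrors the 14-day session lifetime
SESSION_MAX_AGE = 14 * 24 * 60 * 60


def build_session_cookie(session_id: str, *, secure: bool) -> str:
    """Return a Set-Cookie header value for the session."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    morsel = cookie["session_id"]
    morsel["httponly"] = True  # not readable from JavaScript
    morsel["samesite"] = "Lax"
    morsel["max-age"] = SESSION_MAX_AGE
    if secure:  # left off only for localhost HTTP development
        morsel["secure"] = True
    # no "domain" attribute is set, so the cookie stays host-only
    return morsel.OutputString()


def resolve_session_id(
    cookies: Mapping[str, str], headers: Mapping[str, str]
) -> Optional[str]:
    """Prefer the session cookie; fall back to an Authorization: Bearer header."""
    if cookies.get("session_id"):
        return cookies["session_id"]
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    return None


print(build_session_cookie("abc123", secure=True))
```

keeping the header fallback is what lets non-browser clients authenticate without cookies while browsers never touch the token.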

**environment architecture**:
- all environments use custom domains on the same eTLD+1, so frontend and API are same-site for cookies:
  - **staging**: `stg.plyr.fm` → `api-stg.plyr.fm` (both `.plyr.fm`)
  - **production**: `plyr.fm` → `api.plyr.fm` (both `.plyr.fm`)
  - **local**: `localhost:5173` → `localhost:8001` (both `localhost`)
- separate cloudflare pages projects prevent staging/production cookie conflicts:
  - `plyr-fm-stg` for staging (tracks `main` branch)
  - `plyr-fm` for production (tracks `production-fe` branch)

**security improvements**:
- ✅ eliminated XSS session hijacking vector
- ✅ tokens no longer accessible to JavaScript
- ✅ CSRF protection via SameSite=Lax
- ✅ secure transport enforcement (HTTPS in production)
- ✅ environment isolation (no cookie sharing between staging/prod)

**compatibility maintained**:
- browser clients: use HttpOnly cookies automatically
- future SDK/CLI clients: can still use `Authorization: Bearer <token>` header
- backend accepts both cookie and header auth (cookie preferred)

**documentation created**:
- `docs/backend/atproto-identity.md`: ATProto OAuth client metadata discovery patterns
- `docs/deployment/environments.md`: updated with staging/production cookie architecture
- PR #243 description: comprehensive explanation of cookie domain behavior

**impact**:
- closed high-priority security issue #237
- production-grade auth implementation
- foundation for future session management features (device tracking, forced logout)
- removed the XSS session-theft vector, one of the most common classes of web application vulnerability

### albums feature (PRs #214-222, Nov 13-14, 2025)

**motivation**: users wanted to group tracks into albums with dedicated pages, cover art, and metadata.

**what shipped**:
- **database schema** (PR #222): new `albums` table with title, slug, description, image_id, artist_did
  - album-track relationship via `album_rel` on tracks table
  - migration to backfill albums from existing track `extra->>'album'` metadata
  - 8 albums created from existing 32 tracks in production
- **backend CRUD** (PR #222): full album management endpoints
  - `GET /albums/{handle}` - list artist's albums
  - `GET /albums/{handle}/{slug}` - album detail with tracks
  - `POST /albums` - create album (authenticated)
  - `PATCH /albums/{id}` - update album metadata
  - album cover art upload and storage in R2
- **frontend pages** (PRs #214, #216-220):
  - album detail pages (`/u/{handle}/album/{slug}`) with track lists
  - artist discography sections on artist pages
  - album cover art display throughout UI
  - server-side rendering for SEO and link previews
- **UI polish** (PR #228): long album title handling
  - 100-character slug limit with word-boundary truncation
  - CSS text truncation for inline album links
  - proper wrapping for album detail page titles
  - tested with 91-character production album title
- **link previews** (PRs #230-231):
  - rich Open Graph metadata for albums (music.album type)
  - artist musician property, image dimensions, canonical URLs
  - fixed layout metadata conflicts (prevented generic tags from overriding page-specific ones)
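
one way the 100-character slug limit with word-boundary truncation could work is sketched below. the actual slug rules are not shown in these notes, so the `slugify` helper, its ASCII-only character handling, and the hard cut for a single over-long word are assumptions.

```python
import re

MAX_SLUG_LENGTH = 100


def slugify(title: str, max_length: int = MAX_SLUG_LENGTH) -> str:
    """Lowercase, hyphenate, and truncate a title without splitting a word."""
    # collapse every run of non-alphanumeric characters into a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if len(slug) <= max_length:
        return slug
    truncated = slug[:max_length]
    # if the cut landed mid-word, back up to the previous word boundary;
    # a single word longer than the limit is simply cut at the limit
    if slug[max_length] != "-" and "-" in truncated:
        truncated = truncated.rsplit("-", 1)[0]
    return truncated.strip("-")


print(slugify("Hello, World!"))
print(slugify("alpha beta gamma", max_length=7))
```

truncating at a hyphen keeps URLs readable, at the cost of slugs that can be somewhat shorter than the limit.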

**what's NOT done** (issue #221 still open):
- ATProto records for albums (consciously deferred)
- reason: want to thoughtfully design the lexicon before committing to a schema
- tracks work fine without album ATProto records for now

**impact**:
- albums now first-class citizens in UI and database
- better content organization for artists with multiple releases
- improved SEO with album-specific link previews
- foundation for future features (album likes, album playlists)

### frontend architecture improvements (PRs #210, #227, Nov 13-14, 2025)

**motivation**: eliminate "flash of loading", improve SEO, reduce code duplication, fix performance bottlenecks.

**PR #210 - centralized auth and client-side load functions**:
- created `lib/auth.svelte.ts` - centralized auth manager with SSR-safe guards
- added `+layout.ts` - loads auth state once for entire app
- added `+page.ts` to liked tracks page - loads data before component mounts
- refactored all pages to use centralized auth (eliminated scattered localStorage calls)
- code reduction: +256 lines, -308 lines (net -52 lines)

**PR #227 - artist pages moved to server-side rendering**:
- replaced client-side `onMount` fetches with `+page.server.ts`
- parallel server loading of artist info, tracks, and albums
- data ready before page renders (eliminates loading states)
- performance: ~1.66s sequential waterfall → instant render

**pattern shift**:
```
old: page loads → onMount → fetch artist → fetch tracks → fetch albums → render
new: server fetches all in parallel → page renders immediately with data
```

**impact**:
- eliminated "flash of loading" across artist and album pages
- improved lighthouse scores and SEO (real data in initial HTML)
- consistent auth patterns throughout app
- better UX - pages feel instant instead of progressive

**documentation**: see `docs/frontend/data-loading.md` for patterns and anti-patterns

### link preview system (PRs #230-231, Nov 14, 2025)

**problem**: album pages and homepage had no Open Graph metadata, leading to poor link previews on social media.

**PR #230 - add rich metadata**:
- homepage: complete OG tags (type, title, description, url, site_name)
- album pages: rich music.album metadata matching track page quality
  - added canonical URL, site name, musician property
  - added image dimensions (1200x1200), alt text, secure_url
  - improved meta description

**PR #231 - fix metadata conflicts**:
- root layout was rendering duplicate OG tags on all pages
- social scrapers use first tags encountered (generic layout ones)
- page-specific metadata was being ignored
- solution: exclude pages with their own metadata from layout defaults
  - homepage (`/`)
  - track pages (`/track/*`)
  - album pages (`/u/*/album/*`)

**result**: album links now show rich previews with cover art, artist info, track counts when shared on social platforms.

### Banana mix incident fixes (PR #191, Nov 13, 2025)

**Why:** stellz uploaded "banana mix" twice due to slow UI feedback, creating duplicate tracks (56 and 57)
pointing to the same R2 file. When track 57 was deleted, it removed the shared R2 file, breaking track 56
with 404 errors. ATProto record for track 57 was orphaned on her PDS. Investigation also revealed storage
layer was guessing file extensions by trying all formats until finding a match.

**What shipped:**
- **duplicate detection** (tracks.py:181-203): after saving file, checks if track with same `file_id`
  and `artist_did` exists. rejects upload with error instead of creating duplicate.
- **refcount-based deletion** (r2.py:175-197): before deleting R2 file, queries database for refcount.
  only deletes if `refcount == 1`. logs when deletion skipped due to `refcount > 1`.
- **exact key deletion** (r2.py:163-233, filesystem.py:85-123): updated `delete()` signature to accept
  optional `file_type` parameter. when provided, deletes exact key `audio/{file_id}.{file_type}` instead
  of looping through all formats. fallback to loop only when `file_type` is None (legacy rows, images).
  - upload cleanup passes `audio_format.value`
  - track deletion passes `track.file_type`
  - image deletion still uses fallback (no `image_format` field yet - tech debt)
- **ATProto cleanup** (tracks.py:683-712): deletes PDS record when track deleted. handles 404 gracefully
  (record already gone), bubbles other errors.
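
the three storage-side rules (reject duplicates, delete only at refcount 1, delete the exact key when the type is known) can be sketched over an in-memory track list. this is an illustration, not the code in `tracks.py` or `r2.py`: the `Track` dataclass, `check_duplicate`, `keys_to_delete`, the sample DID, and the fallback format list are hypothetical, and the real implementation queries the database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Track:
    id: int
    file_id: str
    file_type: Optional[str]  # None on legacy rows
    artist_did: str


class DuplicateUpload(Exception):
    """Raised when the same artist re-uploads an identical file."""


def check_duplicate(tracks: list[Track], file_id: str, artist_did: str) -> None:
    # reject instead of creating a second track that shares the stored object
    for t in tracks:
        if t.file_id == file_id and t.artist_did == artist_did:
            raise DuplicateUpload(f"track {t.id} already uses file {file_id}")


def keys_to_delete(
    tracks: list[Track], track: Track, formats: tuple[str, ...] = ("mp3", "wav", "m4a")
) -> list[str]:
    """Object keys that are safe to delete when `track` is removed."""
    refcount = sum(1 for t in tracks if t.file_id == track.file_id)
    if refcount > 1:
        return []  # another track still points at the object: skip deletion
    if track.file_type is not None:
        return [f"audio/{track.file_id}.{track.file_type}"]  # exact key
    # legacy fallback: candidate keys for every known format
    return [f"audio/{track.file_id}.{ext}" for ext in formats]


# the incident: tracks 56 and 57 shared one stored file
t56 = Track(56, "abc", "wav", "did:plc:example")
t57 = Track(57, "abc", "wav", "did:plc:example")
print(keys_to_delete([t56, t57], t57))  # nothing deleted while 56 still needs the file
print(keys_to_delete([t56], t56))       # exact key once 56 is the last reference
```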

**Impact:** prevents "delete duplicate and nuke original" scenario. logs show exact keys being deleted
instead of trying wrong extensions first. manual e2e test confirmed: uploaded .wav file, verified exact
key deletion via R2 API, confirmed clean deletion with no orphans in DB/PDS/R2.

**Tech debt identified:**
- storage layer has accumulated naive patterns that work but aren't elegant:
  - image deletion still loops through formats (no `image_format` column on tracks)
  - could store image format alongside `image_id` to enable exact deletion
  - or maintain separate image metadata table
  - functional for now, but should clean up later

### Queue hydration + ATProto token hardening (Nov 12, 2025)

**Why:** queue endpoints were occasionally taking 2s+ and restore operations could 401
when multiple requests refreshed an expired ATProto token simultaneously.

**What shipped:**
- Added persistent `image_url` on `Track` rows so queue hydration no longer probes R2
  for every track. Queue payloads now pull art directly from Postgres, with a one-time
  fallback for legacy rows.
- Updated `_internal/queue.py` to backfill any missing URLs once (with caching) instead
  of per-request GETs.
- Introduced per-session locks in `_refresh_session_tokens` so only one coroutine hits
  `oauth_client.refresh_session` at a time; others reuse the refreshed tokens. This
  removes the race that caused the batch restore flow to intermittently 500/401.
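
the per-session lock pattern can be sketched with `asyncio.Lock` and a re-check after acquiring the lock. this is not the body of `_refresh_session_tokens`: `SessionTokenCache`, `fake_refresh`, and `demo` are hypothetical names, and the refresh call is simulated with a short sleep.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable


class SessionTokenCache:
    """Ensure only one coroutine per session performs the refresh."""

    def __init__(self, refresh: Callable[[str], Awaitable[str]]) -> None:
        self._refresh = refresh
        self._locks = defaultdict(asyncio.Lock)  # one lock per session id
        self._tokens = {}

    async def get(self, session_id: str) -> str:
        if session_id in self._tokens:
            return self._tokens[session_id]
        async with self._locks[session_id]:
            # re-check: another coroutine may have refreshed while we waited
            if session_id in self._tokens:
                return self._tokens[session_id]
            token = await self._refresh(session_id)
            self._tokens[session_id] = token
            return token


async def demo() -> tuple[int, set[str]]:
    calls = 0

    async def fake_refresh(session_id: str) -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulated network round trip
        return f"token-for-{session_id}-{calls}"

    cache = SessionTokenCache(fake_refresh)
    # five concurrent requests for the same session with no cached token
    tokens = await asyncio.gather(*(cache.get("s1") for _ in range(5)))
    return calls, set(tokens)
```

running `asyncio.run(demo())` should report a single refresh call shared by all five requests, which is the property that removes the race.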

**Impact:** queue tail latency dropped back under 500 ms in staging tests, ATProto
restore flows are now reliable under concurrent use, and Logfire no longer shows 500s
from the PDS.

### Liked tracks feature (PR #157, Nov 11, 2025)

- ✅ server-side persistent collections
- ✅ ATProto record publication for cross-platform visibility
- ✅ UI for adding/removing tracks from liked collection
- ✅ like counts displayed in track responses and analytics (#170)
- ✅ analytics cards now clickable links to track detail pages (#171)
- ✅ liked state shown on artist page tracks (#163)

### Upload streaming + progress UX (PR #182, Nov 11, 2025)

- Frontend switched from `fetch` to `XMLHttpRequest` so we can display upload progress
  toasts (critical for >50 MB mixes on mobile).
- Upload form now clears only after the request succeeds; failed attempts leave the
  form intact so users don't lose metadata.
- Backend writes uploads/images to temp files in 8 MB chunks before handing them to the
  storage layer, eliminating whole-file buffering and iOS crashes for hour-long mixes.
- Deployment verified locally and by rerunning the exact repro Stella hit (85 minute
  mix from mobile).
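
chunked spooling to a temp file keeps memory bounded by the chunk size rather than the upload size. the sketch below is synchronous and stdlib-only for clarity, whereas the real backend is async; `spool_to_tempfile` is a hypothetical helper, and only the 8 MB chunk size comes from the notes above.

```python
import io
import tempfile
from typing import BinaryIO

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB


def spool_to_tempfile(source: BinaryIO, chunk_size: int = CHUNK_SIZE) -> tuple[str, int]:
    """Copy `source` to a temp file chunk by chunk; return (path, bytes written).

    The caller is responsible for removing the file after handing it to storage.
    """
    total = 0
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        while True:
            chunk = source.read(chunk_size)  # at most one chunk held in memory
            if not chunk:
                break
            tmp.write(chunk)
            total += len(chunk)
        return tmp.name, total


# small stand-in for an upload stream, with a tiny chunk size to force several reads
path, written = spool_to_tempfile(io.BytesIO(b"x" * 10_243), chunk_size=1024)
print(written)  # 10243
```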

### transcoder API deployment (PR #156, Nov 11, 2025)

**standalone Rust transcoding service** 🎉
- **deployed**: https://plyr-transcoder.fly.dev/
- **purpose**: convert AIFF/FLAC/etc. to MP3 for browser compatibility
- **technology**: Axum + ffmpeg + Docker
- **security**: `X-Transcoder-Key` header authentication (shared secret)
- **capacity**: handles 1GB uploads, tested with 85-minute AIFF files (~858MB → 195MB MP3 in 32 seconds)
- **architecture**:
  - 2 Fly machines for high availability
  - auto-stop/start for cost efficiency
  - stateless design (no R2 integration yet)
  - 320kbps MP3 output with proper ID3 tags
- **status**: deployed and tested, ready for integration into plyr.fm upload pipeline
- **next steps**: wire into backend with R2 integration and job queue (see issue #153)

### AIFF/AIF browser compatibility fix (PR #152, Nov 11, 2025)

**format validation improvements**
- **problem discovered**: AIFF/AIF files only work in Safari, not Chrome/Firefox
  - browsers throw `MediaError code 4: MEDIA_ERR_SRC_NOT_SUPPORTED`
  - users could upload files but they wouldn't play in most browsers
- **immediate solution**: reject AIFF/AIF uploads at both backend and frontend
  - removed AIFF/AIF from AudioFormat enum
  - added format hints to upload UI: "supported: mp3, wav, m4a"
  - client-side validation with helpful error messages
- **long-term solution**: deployed standalone transcoder service (see above)
  - separate Rust/Axum service with ffmpeg
  - accepts all formats, converts to browser-compatible MP3
  - integration into upload pipeline pending (issue #153)

**observability improvements**:
- added logfire instrumentation to upload background tasks
- added logfire spans to R2 storage operations
- documented logfire querying patterns in `docs/logfire-querying.md`

### async I/O performance fixes (PRs #149-151, Nov 10-11, 2025)

Eliminated event loop blocking across backend with three critical PRs:

1. **PR #149: async R2 reads** - converted R2 `head_object` operations from sync boto3 to async aioboto3
   - portal page load time: 2+ seconds → ~200ms
   - root cause: `track.image_url` was blocking on serial R2 HEAD requests

2. **PR #150: concurrent PDS resolution** - parallelized ATProto PDS URL lookups
   - homepage load time: 2-6 seconds → 200-400ms
   - root cause: serial `resolve_atproto_data()` calls (8 artists × 200-300ms each)
   - fix: `asyncio.gather()` for batch resolution, database caching for subsequent loads

3. **PR #151: async storage writes/deletes** - made save/delete operations non-blocking
   - R2: switched to `aioboto3` for uploads/deletes (async S3 operations)
   - filesystem: used `anyio.Path` and `anyio.open_file()` for chunked async I/O (64KB chunks)
   - impact: multi-MB uploads no longer monopolize worker thread, constant memory usage
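
the serial-versus-concurrent difference behind PR #150 can be illustrated with simulated lookups. `resolve_pds_url` below is a hypothetical stand-in for `resolve_atproto_data()` (whose signature is not shown in these notes), the 50 ms delay is made up, and the database caching layer is left out.

```python
import asyncio


async def resolve_pds_url(did: str) -> str:
    """Simulated network lookup; the real call talks to the network."""
    await asyncio.sleep(0.05)
    return f"https://pds.example/{did}"


async def resolve_serial(dids: list[str]) -> list[str]:
    # each lookup waits for the previous one: total time grows with len(dids)
    return [await resolve_pds_url(did) for did in dids]


async def resolve_concurrent(dids: list[str]) -> list[str]:
    # all lookups run at once; gather returns results in input order
    return list(await asyncio.gather(*(resolve_pds_url(did) for did in dids)))


dids = [f"did:plc:artist{i}" for i in range(8)]
urls = asyncio.run(resolve_concurrent(dids))
print(len(urls))  # 8
```

with eight artists, the serial version pays the lookup latency eight times while the concurrent version pays roughly the latency of the slowest single lookup.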

### cover art support (PRs #123-126, #132-139)
- ✅ track cover image upload and storage (separate R2 bucket)
- ✅ image display on track pages and player
- ✅ Open Graph meta tags for track sharing
- ✅ mobile-optimized layouts with cover art
- ✅ sticky bottom player on mobile with cover

### track detail pages (PR #164, Nov 12, 2025)

- ✅ dedicated track detail pages with large cover art
- ✅ play button updates queue state correctly (#169)
- ✅ liked state loaded efficiently via server-side fetch
- ✅ mobile-optimized layouts with proper scrolling constraints
- ✅ origin validation for image URLs (#168)

### mobile UI improvements (PRs #159-185, Nov 11-12, 2025)

- ✅ compact action menus and better navigation (#161)
- ✅ improved mobile responsiveness (#159)
- ✅ consistent button layouts across mobile/desktop (#176-181, #185)
- ✅ always show play count and like count on mobile (#177)
- ✅ login page UX improvements (#174-175)
- ✅ liked page UX improvements (#173)
- ✅ accent color for liked tracks (#160)

### queue management improvements (PRs #110-113, #115)
- ✅ visual feedback on queue add/remove
- ✅ toast notifications for queue actions
- ✅ better error handling for queue operations
- ✅ improved shuffle and auto-advance UX

### infrastructure and tooling
- ✅ R2 bucket separation: audio-prod and images-prod (PR #124)
- ✅ admin script for content moderation (`scripts/delete_track.py`)
- ✅ bluesky attribution link in header
- ✅ changelog target added (#183)
- ✅ documentation updates (#158)
- ✅ track metadata edits now persist correctly (#162)
585585-586586-## immediate priorities
587587-588588-### high priority features
589589-1. **audio transcoding pipeline integration** (issue #153)
590590- - ✅ standalone transcoder service deployed at https://plyr-transcoder.fly.dev/
591591- - ✅ Rust/Axum service with ffmpeg, tested with 85-minute files
592592- - ✅ secure auth via X-Transcoder-Key header
593593- - ⏳ next: integrate into plyr.fm upload pipeline
594594- - backend calls transcoder API for unsupported formats
595595- - queue-based job system for async processing
596596- - R2 integration (fetch original, store MP3)
597597- - maintain original file hash for deduplication
598598- - handle transcoding failures gracefully
599599-600600-### critical bugs
601601-1. **upload reliability** (issue #147): upload returns 200 but file missing from R2, no error logged
602602- - priority: high (data loss risk)
603603- - need better error handling and retry logic in background upload task
604604-605605-2. **database connection pool SSL errors**: intermittent failures on first request
606606- - symptom: `/tracks/` returns 500 on first request, succeeds after
607607- - fix: set `pool_pre_ping=True`, adjust `pool_recycle` for Neon timeouts
608608- - documented in `docs/logfire-querying.md`
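
one possible shape for the "better error handling and retry logic" mentioned under issue #147, as a sketch only: the `upload` and `exists` callables are hypothetical placeholders for whatever the storage backend exposes, and the attempt count and delays are arbitrary. the key idea is to verify the object after a reported success, so a missing file becomes a loud error rather than a silent 200.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def upload_with_verification(
    upload: Callable[[], Awaitable[None]],
    exists: Callable[[], Awaitable[bool]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> int:
    """Upload, confirm the object is really there, and retry with backoff.

    Returns the number of attempts used; raises RuntimeError if all fail.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            await upload()
            if await exists():
                return attempt
            last_error = RuntimeError("upload reported success but object is missing")
        except Exception as exc:  # real code should catch the storage client's error types
            last_error = exc
        if attempt < attempts:
            # Exponential backoff: base_delay, 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"upload failed after {attempts} attempts") from last_error


# Demo with an in-memory "bucket" whose first upload silently drops the file.
bucket: set[str] = set()
calls = {"count": 0}


async def flaky_upload() -> None:
    calls["count"] += 1
    if calls["count"] >= 2:
        bucket.add("track.mp3")


async def object_exists() -> bool:
    return "track.mp3" in bucket


attempts_used = asyncio.run(upload_with_verification(flaky_upload, object_exists, base_delay=0))
```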
609609-610610-### performance optimizations
611611-3. **persist concrete file extensions in database**: currently brute-force probing all supported formats on read
612612- - already know `Track.file_type` and image format during upload
613613- - eliminating repeated `exists()` checks reduces filesystem/R2 HEAD spam
614614- - improves audio streaming latency (`/audio/{file_id}` endpoint walks extensions sequentially)
615615-616616-4. **stream large uploads directly to storage**: current implementation reads entire file into memory before background task
617617- - multi-GB uploads risk OOM
618618- - stream from `UploadFile.file` → storage backend for constant memory usage
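
the streaming idea in item 4 can be sketched with plain file-like objects. this is an illustration, not the project's storage interface: `stream_to_storage` is a hypothetical name, and the 64KB chunk size is borrowed from the filesystem backend described earlier. `UploadFile.file` is a file-like object, so the same loop shape applies; in the async backend the reads and writes would go through an async API or a worker thread instead.

```python
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # 64KB chunks keep memory use constant regardless of file size


def stream_to_storage(source: BinaryIO, destination: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy source to destination one chunk at a time; return bytes written."""
    total = 0
    # read() returns b"" at end of file, which ends the loop.
    while chunk := source.read(chunk_size):
        destination.write(chunk)
        total += len(chunk)
    return total


# Demo: a payload slightly larger than three chunks, copied without ever
# holding more than one chunk in the loop.
payload = b"\x00" * (3 * CHUNK_SIZE + 10)
source = io.BytesIO(payload)
destination = io.BytesIO()
written = stream_to_storage(source, destination)
```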
619619-620620-### new features
621621-5. **content-addressable storage** (issue #146)
622622- - hash-based file storage for automatic deduplication
623623- - reduces storage costs when multiple artists upload same file
624624- - enables content verification
625625-626626-6. **liked tracks feature** (issue #144): design schema and ATProto record format
627627- - server-side persistent collections
628628- - ATProto record publication for cross-platform visibility
629629- - UI for adding/removing tracks from liked collection
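
for the content-addressable storage idea in item 5, a minimal sketch assuming SHA-256 as the hash and a `<hash>.<extension>` key layout; neither choice is confirmed by issue #146, and `content_key` is a hypothetical helper. identical bytes always map to the same key, which is what makes deduplication and later content verification possible.

```python
import hashlib
import io
from typing import BinaryIO


def content_key(source: BinaryIO, extension: str, chunk_size: int = 64 * 1024) -> str:
    """Derive a storage key from file contents, hashing in chunks to bound memory."""
    digest = hashlib.sha256()
    while chunk := source.read(chunk_size):
        digest.update(chunk)
    # Normalize the extension so ".MP3" and "mp3" produce the same key.
    return f"{digest.hexdigest()}.{extension.lstrip('.').lower()}"


# Two uploads of the same bytes collapse to one key; different bytes do not.
key_a = content_key(io.BytesIO(b"same audio bytes"), "mp3")
key_b = content_key(io.BytesIO(b"same audio bytes"), ".MP3")
key_c = content_key(io.BytesIO(b"different audio bytes"), "mp3")
```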
630630-631631-## open issues by timeline
632632-633633-### immediate
634634-- issue #153: audio transcoding pipeline (ffmpeg worker for AIFF/FLAC→MP3)
635635-- issue #147: upload reliability bug (data loss risk)
636636-- issue #144: likes feature for personal collections
637637-638638-### short-term
639639-- issue #146: content-addressable storage (hash-based deduplication)
640640-- issue #24: implement play count abuse prevention
641641-- database connection pool tuning (SSL errors)
642642-- file extension persistence in database
643643-644644-### medium-term
645645-- issue #39: postmortem - cross-domain auth deployment and remaining security TODOs
646646-- issue #46: consider removing init_db() from lifespan in favor of migration-only approach
647647-- issue #56: design public developer API and versioning
648648-- issue #57: support multiple audio item types (voice memos/snippets)
649649-- issue #122: fullscreen player for immersive playback
650650-651651-### long-term
652652-- migrate to plyr-owned lexicon (custom ATProto namespace with richer metadata)
653653-- publish to multiple ATProto AppViews for cross-platform visibility
654654-- explore ATProto-native notifications (replace Bluesky DM bot)
655655-- realtime queue syncing across devices via SSE/WebSocket
656656-- artist analytics dashboard improvements
657657-- issue #44: modern music streaming feature parity
658658-659659-## technical state
660660-661661-### architecture
662662-663663-**backend**
664664-- language: Python 3.11+
665665-- framework: FastAPI with uvicorn
666666-- database: Neon PostgreSQL (serverless, fully managed)
667667-- storage: Cloudflare R2 (S3-compatible object storage)
668668-- hosting: Fly.io (2x shared-cpu VMs, auto-scaling)
669669-- observability: Pydantic Logfire (traces, metrics, logs)
670670-- auth: ATProto OAuth 2.1 (forked SDK: github.com/zzstoatzz/atproto)
671671-672672-**frontend**
673673-- framework: SvelteKit (latest v2.43.2)
674674-- runtime: Bun (fast JS runtime)
675675-- hosting: Cloudflare Pages (edge network)
676676-- styling: vanilla CSS with lowercase aesthetic
677677-- state management: Svelte 5 runes ($state, $derived, $effect)
678678-679679-**deployment**
680680-- ci/cd: GitHub Actions
681681-- backend: automatic on main branch merge (fly.io deploy)
682682-- frontend: automatic on every push to main (cloudflare pages)
683683-- migrations: automated via fly.io release_command
684684-- environments: dev → staging → production (full separation)
685685-- versioning: nebula timestamp format (YYYY.MMDD.HHMMSS)
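
the `YYYY.MMDD.HHMMSS` version format maps directly onto a `strftime` pattern. a small sketch, assuming UTC timestamps (the document does not say which timezone releases use); `make_version` and `parse_version` are hypothetical helper names.

```python
from datetime import datetime, timezone
from typing import Optional

VERSION_FORMAT = "%Y.%m%d.%H%M%S"  # YYYY.MMDD.HHMMSS


def make_version(moment: Optional[datetime] = None) -> str:
    """Format a timestamp as a release version (defaults to now, in UTC)."""
    moment = moment or datetime.now(timezone.utc)
    return moment.strftime(VERSION_FORMAT)


def parse_version(version: str) -> datetime:
    """Recover the (naive) timestamp encoded in a release version."""
    return datetime.strptime(version, VERSION_FORMAT)


example = make_version(datetime(2025, 11, 10, 4, 23, 49))
round_trip = parse_version(example)
```

because every field is zero-padded and ordered from most to least significant, plain string comparison sorts versions chronologically.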
686686-687687-**key dependencies**
688688-- atproto: forked SDK for OAuth and record management
689689-- sqlalchemy: async ORM for postgres
690690-- alembic: database migrations
691691-- boto3/aioboto3: R2 storage client
692692-- logfire: observability (FastAPI + SQLAlchemy instrumentation)
693693-- httpx: async HTTP client
694694-695695-### what's working
696696-697697-**core functionality**
698698-- ✅ ATProto OAuth 2.1 authentication with encrypted state
699699-- ✅ secure session management via HttpOnly cookies (XSS protection)
700700-- ✅ artist profiles synced with Bluesky (avatar, display name, handle)
701701-- ✅ track upload with streaming to prevent OOM
702702-- ✅ track edit (title, artist, album, features metadata)
703703-- ✅ track deletion with cascade cleanup
704704-- ✅ audio streaming via HTML5 player with 307 redirects to R2 CDN
705705-- ✅ track metadata published as ATProto records (fm.plyr.track namespace)
706706-- ✅ play count tracking with threshold (30% or 30s, whichever comes first)
707707-- ✅ like functionality with counts
708708-- ✅ artist analytics dashboard
709709-- ✅ queue management (shuffle, auto-advance, reorder)
710710-- ✅ mobile-optimized responsive UI
711711-- ✅ cross-tab queue synchronization via BroadcastChannel
712712-- ✅ share tracks via URL with Open Graph previews (including cover art)
713713-- ✅ image URL caching in database (eliminates N+1 R2 calls)
714714-- ✅ format validation (rejects AIFF/AIF, accepts MP3/WAV/M4A with helpful error messages)
715715-- ✅ admin content moderation script for removing inappropriate uploads
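
the play count threshold above ("30% or 30s, whichever comes first") reduces to taking the smaller of the two limits. a sketch with hypothetical function names; how the real player measures listened time is not covered here.

```python
def play_threshold_seconds(duration_seconds: float) -> float:
    """Seconds of listening required before a play counts."""
    # "Whichever comes first" means the smaller of 30s and 30% of the track.
    return min(30.0, 0.30 * duration_seconds)


def should_count_play(listened_seconds: float, duration_seconds: float) -> bool:
    """Return True once the listener has passed the threshold."""
    if duration_seconds <= 0:
        return False  # unknown or invalid duration: never count
    return listened_seconds >= play_threshold_seconds(duration_seconds)


short_track = should_count_play(listened_seconds=20, duration_seconds=60)   # 30% rule applies
long_track = should_count_play(listened_seconds=29, duration_seconds=300)   # 30s rule applies
```

for tracks shorter than 100 seconds the 30% rule is the binding one; for anything longer, the 30s cap applies.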
716716-717717-**albums**
718718-- ✅ album database schema with track relationships
719719-- ✅ album browsing pages (`/u/{handle}` shows discography)
720720-- ✅ album detail pages (`/u/{handle}/album/{slug}`) with full track lists
721721-- ✅ album cover art upload and display
722722-- ✅ server-side rendering for SEO
723723-- ✅ rich Open Graph metadata for link previews (music.album type)
724724-- ✅ long album title handling (100-char slugs, CSS truncation)
725725-- ⏸ ATProto records for albums (deferred, see issue #221)
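
one way the 100-character slug limit could work, as a sketch: the slug rules here (lowercase, non-alphanumerics collapsed to hyphens) are an assumption, since the document only states the length cap, and `album_slug` is a hypothetical name.

```python
import re

MAX_SLUG_LENGTH = 100  # cap stated for album slugs


def album_slug(title: str, max_length: int = MAX_SLUG_LENGTH) -> str:
    """Build a URL-safe slug and truncate it to the maximum length."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # Truncate, then drop a trailing hyphen left behind by the cut.
    return slug[:max_length].rstrip("-")


short_slug = album_slug("Hello, World!")
long_slug = album_slug("a b " * 100)
```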
726726-727727-**frontend architecture**
728728-- ✅ server-side data loading (`+page.server.ts`) for artist and album pages
729729-- ✅ client-side data loading (`+page.ts`) for auth-dependent pages
730730-- ✅ centralized auth manager (`lib/auth.svelte.ts`)
731731-- ✅ layout-level auth state (`+layout.ts`) shared across all pages
732732-- ✅ eliminated "flash of loading" via proper load functions
733733-- ✅ consistent auth patterns (no scattered localStorage calls)
734734-735735-**deployment (fully automated)**
736736-- **production**:
737737- - frontend: https://plyr.fm (cloudflare pages)
738738- - backend: https://relay-api.fly.dev (fly.io: 2 machines, 1GB RAM, 1 shared CPU, min 1 running)
739739- - database: neon postgresql
740740- - storage: cloudflare R2 (audio-prod and images-prod buckets)
741741- - deploy: github release → automatic
742742-743743-- **staging**:
744744- - backend: https://api-stg.plyr.fm (fly.io: relay-api-staging)
745745- - frontend: https://stg.plyr.fm (cloudflare pages: plyr-fm-stg)
746746- - database: neon postgresql (relay-staging)
747747- - storage: cloudflare R2 (audio-stg bucket)
748748- - deploy: push to main → automatic
749749-750750-- **development**:
751751- - backend: localhost:8000
752752- - frontend: localhost:5173
753753- - database: neon postgresql (relay-dev)
754754- - storage: cloudflare R2 (audio-dev and images-dev buckets)
755755-756756-- **developer tooling**:
757757- - `just serve` - run backend locally
758758- - `just dev` - run frontend locally
759759- - `just test` - run test suite
760760- - `just release` - create production release (backend + frontend)
761761- - `just release-frontend-only` - deploy only frontend changes (added Nov 13)
762762-763763-### what's in progress
764764-765765-**immediate work**
766766-- investigating playback auto-start behavior (#225)
767767- - page refresh sometimes starts playing immediately
768768- - may be related to queue state restoration or localStorage caching
769769- - `autoplay_next` preference not being respected in all cases
770770-- liquid glass effects as user-configurable setting (#186)
771771-772772-**active research**
773773-- transcoding pipeline architecture (see sandbox/transcoding-pipeline-plan.md)
774774-- content moderation systems (#166, #167)
775775-- PWA capabilities and offline support (#165)
776776-777777-### known issues
778778-779779-**player behavior**
780780-- playback auto-start on refresh (#225)
781781- - sometimes plays immediately after page load
782782- - investigating localStorage/queue state persistence
783783- - may not respect `autoplay_next` preference in all scenarios
784784-785785-**missing features**
786786-- no ATProto records for albums yet (#221 - consciously deferred)
787787-- no track genres/tags/descriptions yet (#155)
788788-- no AIFF/AIF transcoding support (#153)
789789-- no PWA installation prompts (#165)
790790-- no fullscreen player view (#122)
791791-- no public API for third-party integrations (#56)
792792-793793-**technical debt**
794794-- multi-tab playback synchronization could be more robust
795795-- queue state conflicts can occur with rapid operations
796796-- no automated content moderation yet
797797-- no DMCA compliance workflow
798798-799799-### technical decisions
800800-801801-**why Python/FastAPI instead of Rust?**
802802-- rapid prototyping velocity during MVP phase
803803-- rich ecosystem for web APIs (fastapi, sqlalchemy, pydantic)
804804-- excellent async support with asyncio
805805-- lower barrier to contribution
806806-- trade-off: accepting higher latency for faster development
807807-- future: can migrate hot paths to Rust if needed (the standalone transcoder is already a Rust/Axum service)
808808-809809-**why Fly.io instead of AWS/GCP?**
810810-- simple deployment model (dockerfile → production)
811811-- automatic SSL/TLS certificates
812812-- built-in global load balancing
813813-- reasonable pricing for MVP ($5/month)
814814-- easy migration path to larger providers later
815815-- trade-off: vendor-specific features, less control
816816-817817-**why Cloudflare R2 instead of S3?**
818818-- zero egress fees (critical for audio streaming)
819819-- S3-compatible API (easy migration if needed)
820820-- integrated CDN for fast delivery
821821-- significantly cheaper than S3 for bandwidth-heavy workloads
822822-823823-**why forked atproto SDK?**
824824-- upstream SDK lacked OAuth 2.1 support
825825-- needed custom record management patterns
826826-- maintains compatibility with ATProto spec
827827-- contributes improvements back when possible
828828-829829-**why SvelteKit instead of React/Next.js?**
830830-- Svelte 5 runes provide excellent reactivity model
831831-- smaller bundle sizes (critical for mobile)
832832-- less boilerplate than React
833833-- SSR + static generation flexibility
834834-- modern DX with TypeScript
835835-836836-**why Neon instead of self-hosted Postgres?**
837837-- serverless autoscaling (no capacity planning)
838838-- branch-per-PR workflow (preview databases)
839839-- automatic backups and point-in-time recovery
840840-- generous free tier for MVP
841841-- trade-off: higher latency than co-located DB, but acceptable
842842-843843-**why reject AIFF instead of transcoding immediately?**
844844-- MVP speed: transcoding requires queue infrastructure, ffmpeg setup, error handling
845845-- user communication: better to be upfront about limitations than silent failures
846846-- resource management: transcoding is CPU-intensive, needs proper worker architecture
847847-- future flexibility: can add transcoding as optional feature (high-quality uploads → MP3 delivery)
848848-- trade-off: some users can't upload AIFF for now, but everyone who uploads MP3/WAV/M4A gets a working experience
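
the "be upfront about limitations" choice can be sketched as an extension check that returns a helpful message instead of failing silently. the accepted and rejected formats come from this document; the function name and message wording are hypothetical.

```python
from pathlib import Path
from typing import Optional

ACCEPTED_EXTENSIONS = {".mp3", ".wav", ".m4a"}
NOT_YET_SUPPORTED = {".aiff", ".aif"}  # rejected until transcoding is integrated


def validate_audio_filename(filename: str) -> Optional[str]:
    """Return None if the file type is accepted, otherwise an error message."""
    extension = Path(filename).suffix.lower()
    if extension in ACCEPTED_EXTENSIONS:
        return None
    if extension in NOT_YET_SUPPORTED:
        return (
            f"{extension} files are not supported yet; "
            "please convert to MP3, WAV, or M4A before uploading"
        )
    return f"unsupported file type: {extension or 'no extension'}"


ok = validate_audio_filename("song.MP3")
aiff_error = validate_audio_filename("take1.aif")
other_error = validate_audio_filename("notes.txt")
```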
849849-850850-**why async everywhere?**
851851-- event loop performance: single-threaded async handles high concurrency
852852-- I/O-bound workload: most time spent waiting on network/disk
853853-- recent work (PRs #149-151) eliminated all blocking operations
854854-- alternative: thread pools for blocking I/O, but increases complexity
855855-- trade-off: debugging async code harder than sync, but worth throughput gains
856856-857857-**why anyio.Path over thread pools?**
858858-- non-blocking API: `anyio.Path` and `anyio.open_file()` run the underlying blocking file calls in worker threads, so handlers just `await` them and the event loop stays free
859859-- constant memory: chunked reads/writes (64KB) prevent OOM on large files
860860-- hand-rolled thread pools: would work, but every call site needs its own executor plumbing
861861-- trade-off: anyio API slightly different from stdlib `pathlib`, but cleaner async semantics
862862-863863-## cost structure
864864-865865-current monthly costs: ~$6 (the line items below sum to about $6.16)
866866-867867-- cloudflare pages: $0 (free tier)
868868-- cloudflare R2: ~$0.16 (storage + operations, no egress fees)
869869-- fly.io production: $5.00 (2x shared-cpu-1x VMs with auto-stop)
870870-- fly.io staging: $0 (auto-stop, only runs during testing)
871871-- neon: $0 (free tier, 0.5 CPU, 512MB RAM, 3GB storage)
872872-- logfire: $0 (free tier)
873873-- domain: $12/year (~$1/month)
874874-875875-## deployment URLs
876876-877877-- **production frontend**: https://plyr.fm
878878-- **production backend**: https://relay-api.fly.dev (redirects to https://api.plyr.fm)
879879-- **staging backend**: https://api-stg.plyr.fm
880880-- **staging frontend**: https://stg.plyr.fm
881881-- **repository**: https://github.com/zzstoatzz/plyr.fm (private)
882882-- **monitoring**: https://logfire-us.pydantic.dev/zzstoatzz/relay
883883-- **bluesky**: https://bsky.app/profile/plyr.fm
884884-- **latest release**: 2025.1110.042349
885885-886886-## health indicators
887887-888888-**production status**: ✅ healthy
889889-- uptime: consistently available
890890-- response times: <500ms p95 for API endpoints
891891-- error rate: <1% (mostly invalid OAuth states)
892892-- storage: ~12 tracks uploaded, functioning correctly
893893-894894-**key metrics**
895895-- total tracks: ~12
896896-- total artists: ~3
897897-- play counts: tracked per-track
898898-- storage used: <1GB R2
899899-- database size: <10MB postgres
900900-901901-## next session prep
902902-903903-**context for new agent:**
904904-1. a fix for the player race condition was attempted but reverted (PR #187)
905905-2. main branch is clean and deployable
906906-3. current branch: fix/player-rapid-click-race-condition (can be deleted)
907907-4. focus should be on understanding what broke with the race condition fix
908908-5. liquid glass effects (#186) is a nice-to-have enhancement
909909-910910-**debugging resources:**
911911-- Logfire telemetry: logfire-us.pydantic.dev/zzstoatzz/relay
912912-- recent work documented in: sandbox/double-loading-analysis.md
913913-- relevant code: frontend/src/lib/components/Player.svelte, frontend/src/lib/queue.svelte.ts
914914-915915-**useful commands:**
916916-- `just serve` - run backend
917917-- `just dev` - run frontend
918918-- `just test` - run test suite
919919-- `date` - get current time (don't assume, always check)
920920-- `git log --oneline -20` - see recent work
921921-- `gh issue list` - check open issues
922922-923923-## admin tooling
924924-925925-### content moderation
926926-script: `scripts/delete_track.py`
927927-- requires `ADMIN_*` prefixed environment variables
928928-- deletes audio file from R2
929929-- deletes cover image from R2 (if exists)
930930-- deletes database record (cascades to likes and queue entries)
931931-- notes ATProto records for manual cleanup (can't delete from other users' PDS)
932932-933933-usage:
934934-```bash
935935-# dry run
936936-uv run scripts/delete_track.py <track_id> --dry-run
937937-938938-# delete with confirmation
939939-uv run scripts/delete_track.py <track_id>
940940-941941-# delete without confirmation
942942-uv run scripts/delete_track.py <track_id> --yes
943943-944944-# by URL
945945-uv run scripts/delete_track.py --url https://plyr.fm/track/34
946946-```
947947-948948-required environment variables:
949949-- `ADMIN_DATABASE_URL` - production database connection
950950-- `ADMIN_AWS_ACCESS_KEY_ID` - R2 access key
951951-- `ADMIN_AWS_SECRET_ACCESS_KEY` - R2 secret
952952-- `ADMIN_R2_ENDPOINT_URL` - R2 endpoint
953953-- `ADMIN_R2_BUCKET` - R2 bucket name
954954-955955-## known issues
956956-957957-### non-blocking
958958-- cloudflare pages preview URLs return 404 (production works fine)
959959-- some "relay" references remain in docs and comments
960960-- ATProto like records can't be deleted when removing tracks (orphaned on users' PDS)
961961-962962-## for new contributors
963963-964964-### getting started
965965-1. clone: `gh repo clone zzstoatzz/plyr.fm`
966966-2. install dependencies: `uv sync && cd frontend && bun install`
967967-3. run backend: `uv run uvicorn backend.main:app --reload`
968968-4. run frontend: `cd frontend && bun run dev`
969969-5. visit http://localhost:5173
970970-971971-### development workflow
972972-1. create issue on github
973973-2. create PR from feature branch
974974-3. ensure pre-commit hooks pass
975975-4. test locally
976976-5. merge to main → deploys to staging automatically
977977-6. verify on staging
978978-7. create github release → deploys to production automatically
979979-980980-### key principles
981981-- type hints everywhere
982982-- lowercase aesthetic
983983-- generic terminology (use "items" not "tracks" where appropriate)
984984-- ATProto first
985985-- mobile matters
986986-- cost conscious
987987-- async everywhere (no blocking I/O)
988988-989989-### project structure
990990-```
991991-plyr.fm/
992992-├── src/backend/ # fastapi backend
993993-│ ├── api/ # public HTTP endpoints
994994-│ ├── _internal/ # internal services
995995-│ ├── atproto/ # ATProto integration
996996-│ ├── models/ # sqlalchemy schemas
997997-│ ├── storage/ # R2 and filesystem backends
998998-│ └── utilities/ # helpers and config
999999-├── frontend/ # sveltekit app
10001000-│ ├── src/lib/ # components and stores
10011001-│ └── src/routes/ # pages
10021002-├── tests/ # pytest suite
10031003-├── alembic/ # database migrations
10041004-├── docs/ # deployment guides
10051005-├── .github/workflows/ # CI/CD pipelines
10061006-└── CLAUDE.md # project instructions
10071007-```
10081008-10091009-## documentation
10101010-10111011-- [deployment overview](docs/deployment/overview.md)
10121012-- [configuration guide](docs/configuration.md)
10131013-- [queue design](docs/queue-design.md)
10141014-- [logfire querying](docs/logfire-querying.md)
10151015-- [pdsx guide](docs/pdsx-guide.md)
10161016-- [neon mcp guide](docs/neon-mcp-guide.md)
10171017-10181018-## performance optimization session (Nov 12, 2025)
10191019-10201020-### issue: slow /tracks/liked endpoint
10211021-10221022-**symptoms**:
10231023-- `/tracks/liked` taking 600-900ms consistently
10241024-- only ~25ms spent in database queries
10251025-- mysterious 575ms gap with no spans in Logfire traces
10261026-- endpoint felt sluggish compared to other pages
10271027-10281028-**investigation**:
10291029-- examined Logfire traces for `/tracks/liked` requests
10301030-- found 5-6 liked tracks being returned per request
10311031-- DB queries completing fast (track data, artist info, like counts all under 10ms each)
10321032-- noticed R2 storage calls weren't appearing in traces despite taking majority of request time
10331033-10341034-**root cause**:
10351035-- PR #184 added `image_url` column to tracks table to eliminate N+1 R2 API calls
10361036-- new tracks (uploaded after PR) have `image_url` populated at upload time ✅
10371037-- legacy tracks (15 tracks uploaded before PR) had `image_url = NULL` ❌
10381038-- fallback code called `track.get_image_url()` for NULL values
10391039-- `get_image_url()` makes uninstrumented R2 `head_object` API calls to find image extensions
10401040-- each track with NULL `image_url` = ~100-120ms of R2 API calls per request
10411041-- 5 tracks × 120ms = ~600ms of uninstrumented latency
10421042-10431043-**why R2 calls weren't visible**:
10441044-- `storage.get_url()` method had no Logfire instrumentation
10451045-- R2 API calls happening but not creating spans
10461046-- appeared as mysterious gap in trace timeline
10471047-10481048-**solution implemented**:
10491049-1. created `scripts/backfill_image_urls.py` to populate missing `image_url` values
10501050-2. ran script against production database with production R2 credentials
10511051-3. backfilled 11 tracks successfully (4 already done in previous partial run)
10521052-4. 3 tracks "failed" only because they have no cover image (images are optional, so this is expected)
10531053-5. script uses concurrent `asyncio.gather()` for performance
10541054-10551055-**key learning: environment configuration matters**:
10561056-- initial script runs failed silently because:
10571057- - script used local `.env` credentials (dev R2 bucket)
10581058- - production images stored in different R2 bucket (`images-prod`)
10591059- - `get_url()` returned `None` when images not found in dev bucket
10601060-- fix: passed production R2 credentials via environment variables:
10611061- - `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
10621062- - `R2_IMAGE_BUCKET=images-prod`
10631063- - `R2_PUBLIC_IMAGE_BUCKET_URL=https://pub-7ea7ea9a6f224f4f8c0321a2bb008c5a.r2.dev`
10641064-10651065-**results**:
10661066-- before: 15 tracks needed backfill, causing ~600-900ms latency on `/tracks/liked`
10671067-- after: 13 tracks populated with `image_url`, 3 legitimately have no images
10681068-- `/tracks/liked` now loads with 0 R2 API calls instead of 5-11
10691069-- endpoint feels "really, really snappy" (user feedback)
10701070-- performance improvement visible immediately after backfill
10711071-10721072-**database cleanup: queue_state table bloat**:
10731073-- discovered `queue_state` had 265% bloat (53 dead rows, 20 live rows)
10741074-- ran `VACUUM (FULL, ANALYZE) queue_state` against production
10751075-- result: 0 dead rows, table clean
10761076-- autovacuum for `queue_state` still needs tuning to prevent future bloat (tracked under next steps):
10771077- - frequent updates to this table make it prone to bloat
10781078- - should tune `autovacuum_vacuum_scale_factor` to 0.05 (5% vs default 20%)
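
the bloat figure and the proposed tuning can be written out as a sketch. the 265% number is dead rows divided by live rows; the helper names are hypothetical, and the generated statement uses PostgreSQL's per-table storage parameter syntax (whether it has been applied in production is not stated here).

```python
def bloat_percent(dead_rows: int, live_rows: int) -> float:
    """Dead rows as a percentage of live rows."""
    return 100.0 * dead_rows / live_rows


def autovacuum_statement(table: str, scale_factor: float) -> str:
    """Build a per-table autovacuum override.

    The table name is interpolated directly, so only pass trusted identifiers.
    """
    return f"ALTER TABLE {table} SET (autovacuum_vacuum_scale_factor = {scale_factor})"


observed_bloat = bloat_percent(dead_rows=53, live_rows=20)
statement = autovacuum_statement("queue_state", 0.05)
```

with a scale factor of 0.05, autovacuum triggers once dead rows exceed about 5% of the table (plus the fixed threshold) instead of the default 20%.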
10791079-10801080-**endpoint performance snapshot** (post-fix, last 10 minutes):
10811081-- `GET /tracks/`: 410ms (down from 2+ seconds)
10821082-- `GET /queue/`: 399ms (down from 2+ seconds)
10831083-- `GET /tracks/liked`: now sub-200ms (down from 600-900ms)
10841084-- `GET /preferences/`: 200ms median
10851085-- `GET /auth/me`: 114ms median
10861086-- `POST /tracks/{track_id}/play`: 34ms
10871087-10881088-**PR #184 context**:
10891089-- PR claimed "opportunistic backfill: legacy records update on first access"
10901090-- but actual implementation never saved computed `image_url` back to database
10911091-- fallback code only computed URLs on-demand, didn't persist them
10921092-- this is why repeated visits kept hitting R2 API for same tracks
10931093-- one-time backfill script was correct solution vs adding write logic to read endpoints
10941094-10951095-**graceful ATProto recovery (PR #180)**:
10961096-- reviewed recent work on handling tracks with missing `atproto_record_uri`
10971097-- 4 tracks in production have NULL ATProto records (expected from upload failures)
10981098-- system already handles this gracefully:
10991099- - like buttons disabled with helpful tooltips
11001100- - track owners can self-service restore via portal
11011101- - `restore-record` endpoint recreates with correct TID timestamps
11021102-- no action needed - existing recovery system working as designed
11031103-11041104-**performance metrics pre/post all recent PRs**:
11051105-- PR #184 (image_url storage): eliminated hundreds of R2 API calls per request
11061106-- today's backfill: eliminated remaining R2 calls for legacy tracks
11071107-- combined impact: queue/tracks endpoints now 5-10x faster than before PR #184
11081108-- all endpoints now consistently respond in under a second
11091109-11101110-**documentation created**:
11111111-- `docs/neon-mcp-guide.md`: comprehensive guide for using Neon MCP
11121112- - project/branch management
11131113- - database schema inspection
11141114- - SQL query patterns for plyr.fm
11151115- - connection string generation
11161116- - environment mapping (dev/staging/prod)
11171117- - debugging workflows
11181118-- `scripts/backfill_image_urls.py`: reusable for any future image_url gaps
11191119- - dry-run mode for safety
11201120- - concurrent R2 API calls
11211121- - detailed error logging
11221122- - production-tested
11231123-11241124-**tools and patterns established**:
11251125-- Neon MCP for database inspection and queries
11261126-- Logfire arbitrary queries for performance analysis
11271127-- production secret management via Fly.io
11281128-- `flyctl ssh console` for environment inspection
11291129-- backfill scripts with dry-run mode
11301130-- environment variable overrides for production operations
11311131-11321132-**system health indicators**:
11331133-- ✅ no 5xx errors in recent spans
11341134-- ✅ database queries all under 70ms p95
11351135-- ✅ SSL connection pool issues resolved (no errors in recent traces)
11361136-- ✅ queue_state table bloat eliminated
11371137-- ✅ all track images either in DB or legitimately NULL
11381138-- ✅ application feels fast and responsive
11391139-11401140-**next steps**:
11411141-1. configure autovacuum for `queue_state` table (prevent future bloat)
11421142-2. add Logfire instrumentation to `storage.get_url()` for visibility
11431143-3. monitor `/tracks/liked` performance over next few days
11441144-4. consider adding similar backfill pattern for any future column additions
11451145-11461146----
11471147-11481148-this is a living document. last updated 2025-11-18 after ATProto namespace cleanup.