docs: fix stale references and update README to reflect full feature set (#888)

+34 -21

README.md

···
 - **styling**: vanilla CSS (lowercase aesthetic)

 ### services
-- **moderation**: Rust ATProto labeler for copyright/sensitive content
-- **transcoder**: Rust audio conversion service (ffmpeg)
+- **transcoder**: Rust audio conversion service (ffmpeg, Fly.io)
+- **moderation**: Rust ATProto labeler for copyright/sensitive content (Fly.io)
+- **mood search**: [CLAP](https://github.com/LAION-AI/CLAP) audio embeddings ([Modal](https://modal.com))
+- **genre classification**: [effnet-discogs](https://replicate.com/) ML tagging ([Replicate](https://replicate.com))
+- **vector search**: [turbopuffer](https://turbopuffer.com) for semantic audio queries

 </details>

···
 ### listening
 - audio playback with persistent queue across tabs
 - like tracks, add to playlists
-- browse artist profiles and discographies
-- share tracks, albums, and playlists with link previews
-- unified search with Cmd/Ctrl+K
-- teal.fm scrobbling
+- browse by artist, album, tag, or playlist
+- share tracks and albums with embeddable players and link previews
+- mood search - describe a vibe, get matching tracks (CLAP embeddings)
+- unified search with Cmd/Ctrl+K (fuzzy match across tracks, artists, albums, tags, playlists)
+- genre browsing and tag filtering
+- platform media controls (Media Session API)
+- teal.fm scrobbling and now-playing reporting

 ### creating
-- OAuth authentication via ATProto (bluesky accounts)
+- OAuth authentication via ATProto (bluesky accounts), multi-account support
 - upload tracks with title, artwork, tags, and featured artists
-- organize tracks into albums and playlists
-- drag-and-drop reordering
+- lossless audio support (AIFF/FLAC) with automatic MP3 transcoding for universal playback
+- auto-tagging via ML genre classification
+- organize tracks into albums and playlists with drag-and-drop reordering
 - timed comments with clickable timestamps
-- artist support links (ko-fi, patreon, etc.)
+- artist support links and supporter-gated content
+- copyright scanning via audio fingerprinting
+- content reporting and automated sensitive content filtering

 ### data ownership
 - tracks, likes, playlists synced to your PDS as ATProto records
+- bulk media export (download all your tracks)
 - portable identity - your data travels with you
 - public by default - any client can read your music records
+
+> some features may be paywalled in the future for the financial viability of the project. if you have thoughts on what should or shouldn't be gated, open a [discussion on GitHub](https://github.com/zzstoatzz/plyr.fm/discussions) or [tangled](https://tangled.sh/@zzstoatzz.io/plyr.fm).

 </details>

···

 ```
 plyr.fm/
-├── backend/ # FastAPI app
+├── backend/ # FastAPI app & Python tooling
 │   ├── src/backend/ # application code
-│   │   ├── api/ # public endpoints
-│   │   ├── _internal/ # services (auth, atproto, background tasks)
-│   │   ├── models/ # database schemas
-│   │   └── storage/ # R2 adapter
 │   ├── tests/ # pytest suite
-│   └── alembic/ # migrations
+│   └── alembic/ # database migrations
 ├── frontend/ # SvelteKit app
 │   ├── src/lib/ # components & state
 │   └── src/routes/ # pages
-├── moderation/ # Rust labeler service
-├── transcoder/ # Rust audio service
-├── redis/ # self-hosted Redis config
+├── services/
+│   ├── transcoder/ # Rust audio transcoding (Fly.io)
+│   ├── moderation/ # Rust content moderation (Fly.io)
+│   └── clap/ # ML embeddings (Python, Modal)
+├── infrastructure/
+│   └── redis/ # self-hosted Redis (Fly.io)
 ├── docs/ # documentation
 └── justfile # task runner
 ```
···

 <details>
 <summary>costs</summary>

-~$20/month:
-- fly.io (backend + redis + moderation): ~$14/month
+~$25/month:
+- fly.io (backend + transcoder + redis + moderation): ~$14/month
 - neon postgres: $5/month
 - cloudflare (pages + r2): ~$1/month
 - audd audio fingerprinting: $5-10/month (usage-based)
+- modal (CLAP embeddings): free tier / scales to zero
+- replicate (genre classification): <$1/month

 live dashboard: https://plyr.fm/costs

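The new ~$25/month headline can be checked against the itemized lines in the costs hunk. A minimal sketch, assuming "~" values are taken at face value, modal's free tier counts as $0, and replicate's "<$1/month" counts as a $0-1 range:

```python
# Itemized monthly costs from the README costs section, as (low, high) USD.
# Assumptions: "~" values taken as exact, modal free tier = $0,
# replicate "<$1/month" treated as $0-1.
COSTS = {
    "fly.io (backend + transcoder + redis + moderation)": (14, 14),
    "neon postgres": (5, 5),
    "cloudflare (pages + r2)": (1, 1),
    "audd audio fingerprinting": (5, 10),
    "modal (CLAP embeddings)": (0, 0),
    "replicate (genre classification)": (0, 1),
}


def monthly_range(costs: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low ends and the high ends of every line item."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


if __name__ == "__main__":
    low, high = monthly_range(COSTS)
    print(f"${low}-{high}/month")  # prints "$25-31/month"
```

Under these assumptions the ~$25 headline is the low end of the range; usage-based fingerprinting is what moves it toward the high end.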

+4 -1

docs/README.md

···
 - **[feature-flags.md](./backend/feature-flags.md)** - per-user feature rollout system
 - **[streaming-uploads.md](./backend/streaming-uploads.md)** - SSE progress tracking
 - **[transcoder.md](./backend/transcoder.md)** - rust audio conversion service (lossless support)
-- **[vibe-search.md](./backend/vibe-search.md)** - semantic search with CLAP embeddings (Modal + turbopuffer)
+- **[mood-search.md](./backend/mood-search.md)** - semantic search with CLAP embeddings (Modal + turbopuffer)
 - **[genre-classification.md](./backend/genre-classification.md)** - ML genre tagging via effnet-discogs (Replicate)

 ### frontend
···
 - **[logfire.md](./tools/logfire.md)** - SQL query patterns for observability
 - **[neon.md](./tools/neon.md)** - postgres database management
 - **[pdsx.md](./tools/pdsx.md)** - ATProto PDS explorer
+- **[plyrfm.md](./tools/plyrfm.md)** - Python SDK and MCP server
+- **[tap.md](./tools/tap.md)** - ATProto sync utility for backfilling custom lexicons
+- **[status-maintenance.md](./tools/status-maintenance.md)** - automated status podcasts

 ### atproto
 - **[lexicons/](./lexicons/)** - record schemas (track, like, comment, list, profile)

+20 -10

docs/backend/background-tasks.md

···

 ### ⚠️ worker settings - do not modify

-the worker is initialized in `backend/_internal/background.py` with pydocket's defaults. **do not change these settings without extensive testing:**
+the worker is initialized in `backend/_internal/background.py` using tasks registered via `backend/_internal/tasks/__init__.py`. **do not change these settings without extensive testing:**

 | setting | default | why it matters |
 |---------|---------|----------------|
···
 ### scheduling a task

 ```python
-from backend._internal.background_tasks import schedule_copyright_scan, schedule_export
+from backend._internal.tasks import schedule_copyright_scan
+from backend._internal.tasks import schedule_genre_classification, schedule_embedding_generation

 # automatically uses docket if enabled, else asyncio.create_task
 await schedule_copyright_scan(track_id, audio_url)
-await schedule_export(export_id, artist_did)
+await schedule_genre_classification(track_id)
+await schedule_embedding_generation(track_id, audio_url)
 ```

 ### adding new tasks

-1. define the task function in `backend/_internal/background_tasks.py`:
+1. define the task function in a module under `backend/_internal/tasks/` (organized by domain):
+- `tasks/copyright.py` - copyright scanning and resolution sync
+- `tasks/ml.py` - genre classification, CLAP embeddings
+- `tasks/pds.py` - PDS record sync (likes, comments)
+- `tasks/storage.py` - R2 file operations
+- `tasks/sync.py` - ATProto sync, scrobbling, album lists
+
 ```python
+# e.g., backend/_internal/tasks/ml.py
 async def my_new_task(arg1: str, arg2: int) -> None:
     """task functions must be async and JSON-serializable args only."""
     # do work here
     pass
 ```

-2. register it in `backend/_internal/background.py`:
+2. export it from `backend/_internal/tasks/__init__.py` and add it to the `background_tasks` list:
 ```python
-def _register_tasks(docket: Docket) -> None:
-    from backend._internal.background_tasks import my_new_task, scan_copyright
+from backend._internal.tasks.ml import my_new_task

-    docket.register(scan_copyright)
-    docket.register(my_new_task) # add here
+background_tasks = [
+    # ... existing tasks
+    my_new_task,
+]
 ```

-3. create a scheduler helper if needed:
+3. create a scheduler helper in the same module:
 ```python
 async def schedule_my_task(arg1: str, arg2: int) -> None:
     """schedule with docket if enabled, else asyncio."""
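The registry pattern in steps 2 and 3 (a `background_tasks` list, plus a scheduler that uses docket when available and otherwise falls back to `asyncio.create_task`) can be sketched without pydocket installed. `FakeDocket`, its `enqueue` method, and `register_all` are hypothetical stand-ins written for illustration. They are not pydocket's API, and passing a docket explicitly stands in for "docket is enabled":

```python
from __future__ import annotations

import asyncio
import json
from collections.abc import Awaitable, Callable

TaskFn = Callable[..., Awaitable[None]]


class FakeDocket:
    """Hypothetical stand-in for a docket instance: records registrations and queued calls."""

    def __init__(self) -> None:
        self.registered: dict[str, TaskFn] = {}
        self.queued: list[tuple[str, tuple]] = []

    def register(self, fn: TaskFn) -> None:
        self.registered[fn.__name__] = fn

    def enqueue(self, fn: TaskFn, *args: object) -> None:
        # a worker can only run tasks it knows about, and args must survive JSON
        if fn.__name__ not in self.registered:
            raise KeyError(f"task {fn.__name__!r} is not registered")
        json.dumps(args)  # raises TypeError for non-JSON-serializable args
        self.queued.append((fn.__name__, args))


async def my_new_task(arg1: str, arg2: int) -> None:
    """task functions must be async and take JSON-serializable args only."""
    await asyncio.sleep(0)  # do work here


# mirrors the `background_tasks` list exported from tasks/__init__.py
background_tasks: list[TaskFn] = [my_new_task]


def register_all(docket: FakeDocket) -> None:
    """Register every task in the list with the worker's docket."""
    for task in background_tasks:
        docket.register(task)


async def schedule_my_task(
    arg1: str, arg2: int, docket: FakeDocket | None = None
) -> asyncio.Task[None] | None:
    """Queue via docket when one is available, else fall back to asyncio.create_task."""
    if docket is not None:
        docket.enqueue(my_new_task, arg1, arg2)
        return None
    return asyncio.create_task(my_new_task(arg1, arg2))
```

Keeping a single exported list means the worker setup in `background.py` can register tasks in a loop instead of changing every time a task is added. That appears to be the motivation for moving away from the per-task `docket.register(...)` calls this diff removes.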

+4 -4

docs/backend/database/connection-pooling.md

···
 # additional connections to create on demand when pool is exhausted (default: 5)
 DATABASE_MAX_OVERFLOW=5

-# how long before recycling a connection, in seconds (default: 7200 = 2 hours)
-DATABASE_POOL_RECYCLE=7200
+# how long before recycling a connection, in seconds (default: 1800 = 30 minutes)
+DATABASE_POOL_RECYCLE=1800

 # verify connection health before using from pool (default: true)
 DATABASE_POOL_PRE_PING=true
···
 **pool_recycle:**
 - prevents stale connections from lingering
 - should be less than your database's connection timeout
-- 2 hours is a safe default for most PostgreSQL configurations
+- 30 minutes is a safe default for most PostgreSQL configurations

 **pool_pre_ping:**
 - adds small overhead (SELECT 1) before each connection use
···
 DATABASE_CONNECTION_TIMEOUT=10.0

 # standard recycle
-DATABASE_POOL_RECYCLE=7200
+DATABASE_POOL_RECYCLE=1800
 ```

 this configuration:
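The env vars in this hunk correspond to SQLAlchemy's engine keyword arguments `max_overflow`, `pool_recycle`, and `pool_pre_ping`. A minimal sketch, assuming that one-to-one mapping and the defaults documented above. `pool_kwargs_from_env` and `recycle_is_safe` are hypothetical helper names, and `DATABASE_CONNECTION_TIMEOUT` is left out because its mapping isn't shown here:

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def pool_kwargs_from_env(env: Mapping[str, str] | None = None) -> dict[str, object]:
    """Build engine pool kwargs from the documented env vars, using the documented defaults."""
    env = os.environ if env is None else env
    return {
        "max_overflow": int(env.get("DATABASE_MAX_OVERFLOW", "5")),
        # 1800 s = 30 minutes; should stay below the server's own idle timeout
        "pool_recycle": int(env.get("DATABASE_POOL_RECYCLE", "1800")),
        "pool_pre_ping": _as_bool(env.get("DATABASE_POOL_PRE_PING", "true")),
    }


def recycle_is_safe(pool_recycle: int, server_idle_timeout: int) -> bool:
    """pool_recycle should be shorter than the database's connection timeout."""
    return pool_recycle < server_idle_timeout
```

The result could then be splatted into `create_async_engine(database_url, **pool_kwargs_from_env())`. `server_idle_timeout` is whatever your Postgres host enforces, and this doc does not specify a value for it.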

+63 -1

docs/frontend/search.md

···

 **parameters**:
 - `q` (required): search query, 2-100 characters
-- `type` (optional): filter by type(s), comma-separated: `tracks`, `artists`, `albums`, `tags`
+- `type` (optional): filter by type(s), comma-separated: `tracks`, `artists`, `albums`, `tags`, `playlists`
 - `limit` (optional): max results per type, 1-50, default 20

 **response**:
···
 - "bufo" matches "bufo" (1.0), "bufo mix" (0.6), "buffalo" (0.4)
 - "zz" matches "zzstoatzz" (0.3), "jazz" (0.25)

+## semantic search (mood search)
+
+in addition to keyword search, plyr.fm supports semantic search via CLAP audio embeddings. users describe a mood or vibe in natural language and get tracks ranked by audio similarity.
+
+**gated by feature flag**: `vibe-search` (per-user)
+
+### how it works
+
+1. user types a text description (e.g., "chill lo-fi beats")
+2. frontend sends query to `GET /search/semantic?q=...`
+3. backend embeds the text via CLAP model (Modal)
+4. text embedding is compared against pre-computed audio embeddings in turbopuffer
+5. matching tracks returned with similarity scores
+
+### frontend behavior
+
+- **debounce**: 500ms (vs 150ms for keyword search)
+- **minimum query length**: 3 characters (vs 2 for keyword)
+- **max results**: 5 per query
+- results are deduplicated against keyword results
+- similarity scores are merged alongside relevance scores
+
+### backend endpoint
+
+```
+GET /search/semantic?q=chill+vibes&limit=10
+```
+
+**parameters**:
+- `q` (required): text description, 3-200 characters
+- `limit` (optional): max results, 1-50, default 10 (capped at 5 internally)
+
+**response**:
+
+```json
+{
+  "results": [
+    {
+      "type": "track",
+      "id": 456,
+      "title": "midnight drift",
+      "artist_handle": "artist.bsky.social",
+      "artist_display_name": "artist name",
+      "image_url": "https://...",
+      "similarity": 0.7234
+    }
+  ],
+  "query": "chill vibes",
+  "available": true
+}
+```
+
+`available: false` indicates the embedding service is down - the frontend falls back to keyword-only search.
+
+see [mood-search.md](../backend/mood-search.md) for full backend architecture.
+
 ## result types

 ### tracks
···
 ### albums

 - links to `/u/{artist_handle}/album/{slug}`
+- shows cover art if available
+- subtitle: "by {artist_display_name}"
+
+### playlists
+
+- links to `/u/{artist_handle}/playlist/{slug}`
 - shows cover art if available
 - subtitle: "by {artist_display_name}"

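The frontend rules for semantic results can be sketched as a pure merge function. These rules are the 3-character minimum, at most 5 semantic hits, deduplication against keyword results, and keyword-only fallback when `available` is false. This is a Python sketch of logic the real frontend implements in its own code. The function name, the `(type, id)` dedupe key, and appending semantic hits after keyword hits are assumptions:

```python
from __future__ import annotations

from typing import Any

Result = dict[str, Any]

SEMANTIC_MIN_QUERY = 3    # vs 2 for keyword search
SEMANTIC_MAX_RESULTS = 5  # semantic hits added per query


def merge_results(
    query: str, keyword: list[Result], semantic_response: dict[str, Any] | None
) -> list[Result]:
    """Append deduplicated semantic hits after keyword results (assumed ordering)."""
    merged = list(keyword)
    if len(query.strip()) < SEMANTIC_MIN_QUERY or not semantic_response:
        return merged
    if not semantic_response.get("available", False):
        return merged  # embedding service down: keyword-only fallback
    seen = {(r["type"], r["id"]) for r in keyword}
    added = 0
    for hit in semantic_response.get("results", []):
        if added >= SEMANTIC_MAX_RESULTS:
            break
        key = (hit["type"], hit["id"])
        if key in seen:
            continue  # already shown via keyword search
        seen.add(key)
        merged.append(hit)
        added += 1
    return merged
```

Keying on `(type, id)` rather than `id` alone avoids collisions if tracks, albums, and playlists share numeric id ranges. This sketch doesn't know whether that happens in practice.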

+11 -11

docs/moderation/copyright-detection.md

···
 ### list all flagged tracks

 ```sql
-SELECT t.id, t.title, a.handle, cf.confidence_score, cf.matched_tracks
-FROM copyright_flags cf
-JOIN tracks t ON t.id = cf.track_id
+SELECT t.id, t.title, a.handle, cs.highest_score, cs.matches
+FROM copyright_scans cs
+JOIN tracks t ON t.id = cs.track_id
 JOIN artists a ON a.did = t.artist_did
-WHERE cf.status = 'flagged'
-ORDER BY cf.confidence_score DESC;
+WHERE cs.is_flagged = true
+ORDER BY cs.highest_score DESC;
 ```

 ### scan statistics

 ```sql
 SELECT
-    status,
+    is_flagged,
     COUNT(*) as count,
-    AVG(confidence_score) as avg_score
-FROM copyright_flags
-GROUP BY status;
+    AVG(highest_score) as avg_score
+FROM copyright_scans
+GROUP BY is_flagged;
 ```

 ### tracks pending scan
···
 SELECT t.id, t.title, t.created_at
 FROM tracks t
-LEFT JOIN copyright_flags cf ON cf.track_id = t.id
-WHERE cf.id IS NULL OR cf.status = 'pending'
+LEFT JOIN copyright_scans cs ON cs.track_id = t.id
+WHERE cs.id IS NULL
 ORDER BY t.created_at DESC;
 ```

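To try the updated `copyright_scans` queries without a Postgres instance, you can run them against an in-memory database using Python's built-in `sqlite3`. A minimal sketch follows. Its schema covers only the columns the queries touch and is an assumption, not the real migration. `matches` is stored as JSON text, and `is_flagged` uses 0/1 (so `= 1` replaces Postgres's `= true`). The rows are toy data:

```python
import json
import sqlite3

# assumed minimal schema: only the columns referenced by the doc's queries
SCHEMA = """
CREATE TABLE artists (did TEXT PRIMARY KEY, handle TEXT NOT NULL);
CREATE TABLE tracks (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    artist_did TEXT NOT NULL REFERENCES artists(did),
    created_at TEXT NOT NULL
);
CREATE TABLE copyright_scans (
    id INTEGER PRIMARY KEY,
    track_id INTEGER NOT NULL REFERENCES tracks(id),
    is_flagged INTEGER NOT NULL,
    highest_score REAL,
    matches TEXT
);
"""

FLAGGED = """
SELECT t.id, t.title, a.handle, cs.highest_score, cs.matches
FROM copyright_scans cs
JOIN tracks t ON t.id = cs.track_id
JOIN artists a ON a.did = t.artist_did
WHERE cs.is_flagged = 1
ORDER BY cs.highest_score DESC
"""

STATS = """
SELECT is_flagged, COUNT(*) AS count, AVG(highest_score) AS avg_score
FROM copyright_scans
GROUP BY is_flagged
"""

PENDING = """
SELECT t.id, t.title, t.created_at
FROM tracks t
LEFT JOIN copyright_scans cs ON cs.track_id = t.id
WHERE cs.id IS NULL
ORDER BY t.created_at DESC
"""


def demo() -> sqlite3.Connection:
    """Create an in-memory DB with one clean, one flagged, and one unscanned track."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO artists VALUES ('did:plc:example', 'artist.bsky.social')")
    conn.executemany(
        "INSERT INTO tracks VALUES (?, ?, 'did:plc:example', ?)",
        [(1, "clean", "2025-01-01"), (2, "suspicious", "2025-01-02"), (3, "new upload", "2025-01-03")],
    )
    conn.executemany(
        "INSERT INTO copyright_scans (track_id, is_flagged, highest_score, matches) VALUES (?, ?, ?, ?)",
        [(1, 0, 12.0, json.dumps([])), (2, 1, 91.0, json.dumps([{"title": "some song"}]))],
    )
    return conn


if __name__ == "__main__":
    conn = demo()
    for name, sql in [("flagged", FLAGGED), ("stats", STATS), ("pending", PENDING)]:
        print(name, conn.execute(sql).fetchall())
```

With the `status` column gone, "pending" now means only "no scan row exists yet". A scan that is in progress is no longer distinguishable by this query.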
