audio streaming app plyr.fm

fix(copyright): suppress self-match flags before they reach UI/DM (#1341)

flo.by uploaded his catalog; AuDD identified each track's dominant
match as "Floby IV" (his stage name). every scan returned
is_flagged=true, which:
- showed a red "potential copyright violation" badge to the
artist on his own /portal page
- fired an admin DM ("copyright flag on plyr.fm / primary: X
by Floby IV") — admin received ~30 DMs in one session

`sync_copyright_resolutions` flipped is_flagged=false within 5min,
but only after the artist had already seen the flag and the DM
spam had landed.

fix: in `_store_scan_result`, look up the uploader's artist record
when is_flagged=true and compare slugified forms of the dominant
match artist to the uploader's handle and display name. on a
self-match, demote is_flagged to false at write time so the UI
flag and the DM never fire. logs `copyright self-match suppressed`
for observability.
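
a minimal standalone sketch of the comparison described above, mirroring the helpers in the moderation.py diff. the names here drop the leading underscore and the sample inputs are illustrative only; the real check runs inside `_store_scan_result` against the uploader's artist record.

```python
MIN_SLUG_LEN = 4  # mirrors _SELF_MATCH_MIN_SLUG_LEN in the diff


def slugify_artist(name: str) -> str:
    """lowercase and keep only alphanumeric characters."""
    return "".join(c for c in name.lower() if c.isalnum())


def is_self_match(match_artist: str, handle: str, display_name: str) -> bool:
    """true when the match slug contains, or is contained in, an uploader slug."""
    m = slugify_artist(match_artist)
    if len(m) < MIN_SLUG_LEN:
        return False  # too short to compare safely
    for candidate in (handle, display_name):
        if not candidate:
            continue
        c = slugify_artist(candidate)
        if len(c) >= MIN_SLUG_LEN and (c in m or m in c):
            return True
    return False


# "flo.by" -> "floby", which is a substring of "Floby IV" -> "flobyiv"
print(is_self_match("Floby IV", "flo.by", ""))    # True
# unrelated artist: no substring relation in either direction
print(is_self_match("Some Label", "flo.by", ""))  # False
# match slug "flo" is shorter than the minimum length
print(is_self_match("Flo", "flo.by", ""))         # False
```

the minimum slug length trades recall for safety: a three-letter match artist is never suppressed, even if it is contained in the uploader's handle.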

separate semantic bug (sync flipping flags whose URI was never
labelled, not just negated) is unaddressed here — this is the
short-term fix to stop creator-visible flags + DM spam.

Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com>

authored by nate nowack and Claude Opus 4 (1M context), committed by GitHub
c213426b f7946389

+323 -13
+91 -13
backend/src/backend/_internal/moderation.py
···
 """moderation service integration for copyright scanning."""

 import logging
+from collections import Counter
 from typing import Any

 import logfire
···

 logger = logging.getLogger(__name__)

+_SELF_MATCH_MIN_SLUG_LEN = 4
+
+
+def _slugify_artist(name: str) -> str:
+    """lowercase, alphanumeric-only — for fuzzy artist-name comparison."""
+    return "".join(c for c in name.lower() if c.isalnum())
+
+
+def _is_self_match(
+    match_artist: str, uploader_handle: str, uploader_display: str
+) -> bool:
+    """detect when a copyright match's artist is the uploader themselves.
+
+    AuDD frequently identifies an artist's own catalog uploads as
+    "violations" of their own published works elsewhere (e.g. dominant
+    match "Floby IV" on a track uploaded by handle "flo.by"). this is
+    a false positive — flagging it spams admin DMs and shows a red
+    badge to the artist on their own portal.
+
+    we compare slugified forms (lowercase, alphanumeric only) of the
+    match artist against the uploader's handle and display name. a
+    bidirectional substring check catches stage-name variants in
+    either direction (e.g. "flo.by" → "floby" is contained in
+    "Floby IV" → "flobyiv"). minimum length avoids accidental
+    matches on very short slugs.
+    """
+    m = _slugify_artist(match_artist)
+    if len(m) < _SELF_MATCH_MIN_SLUG_LEN:
+        return False
+    for candidate in (uploader_handle, uploader_display):
+        if not candidate:
+            continue
+        c = _slugify_artist(candidate)
+        if len(c) >= _SELF_MATCH_MIN_SLUG_LEN and (c in m or m in c):
+            return True
+    return False
+
+
+def _dominant_match_artist(matches: list[dict[str, Any]]) -> str | None:
+    """return the most frequent artist in scan matches, or None if empty."""
+    counts = Counter(
+        (m.get("artist") or "").strip() for m in matches if m.get("artist")
+    )
+    if not counts:
+        return None
+    artist, _ = counts.most_common(1)[0]
+    return artist or None
+

 async def scan_track_for_copyright(track_id: int, audio_url: str) -> None:
     """scan a track for potential copyright matches.
···
         result: ScanResult from moderation client
     """
     async with db_session() as db:
+        # decide effective is_flagged BEFORE the row is written so the
+        # transient flag never reaches the UI / DM path. self-matches
+        # (uploader is the dominant match artist) get demoted to clear.
+        is_flagged = result.is_flagged
+        suppressed_self_match: str | None = None
+
+        if is_flagged:
+            track = await db.scalar(
+                select(Track)
+                .options(joinedload(Track.artist))
+                .where(Track.id == track_id)
+            )
+            dominant = _dominant_match_artist(result.matches)
+            if (
+                track
+                and track.artist
+                and dominant
+                and _is_self_match(
+                    dominant, track.artist.handle, track.artist.display_name or ""
+                )
+            ):
+                is_flagged = False
+                suppressed_self_match = dominant
+        else:
+            track = None
+
         scan = CopyrightScan(
             track_id=track_id,
-            is_flagged=result.is_flagged,
+            is_flagged=is_flagged,
             highest_score=result.highest_score,
             matches=result.matches,
             raw_response=result.raw_response,
···
         db.add(scan)
         await db.commit()

+        if suppressed_self_match:
+            logfire.info(
+                "copyright self-match suppressed",
+                track_id=track_id,
+                dominant_artist=suppressed_self_match,
+                uploader_handle=track.artist.handle if track and track.artist else None,
+            )
+            return
+
         logfire.info(
             "copyright scan stored",
             track_id=track_id,
···
         )

         # notify admin only — never DM the artist
-        if result.is_flagged:
-            track = await db.scalar(
-                select(Track)
-                .options(joinedload(Track.artist))
-                .where(Track.id == track_id)
+        if is_flagged and track and track.artist:
+            await notification_service.send_copyright_flag_notification(
+                track_id=track_id,
+                track_title=track.title,
+                artist_handle=track.artist.handle,
+                matches=scan.matches,
             )
-            if track and track.artist:
-                await notification_service.send_copyright_flag_notification(
-                    track_id=track_id,
-                    track_title=track.title,
-                    artist_handle=track.artist.handle,
-                    matches=scan.matches,
-                )


 async def _store_scan_error(track_id: int, error: str) -> None:
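
a small standalone sketch of how the dominant match artist is picked, following `_dominant_match_artist` from the diff. the sample matches are made up for illustration; only the `artist` key is assumed, since that is the only key the helper reads.

```python
from __future__ import annotations

from collections import Counter
from typing import Any


def dominant_match_artist(matches: list[dict[str, Any]]) -> str | None:
    """most frequent non-empty artist across matches, or None."""
    counts = Counter(
        (m.get("artist") or "").strip() for m in matches if m.get("artist")
    )
    if not counts:
        return None
    artist, _ = counts.most_common(1)[0]
    # a whitespace-only artist strips to "", which is reported as None
    return artist or None


# made-up scan matches, shaped like the dicts the helper reads "artist" from
sample = [
    {"artist": "Floby IV", "title": "a"},
    {"artist": " Floby IV ", "title": "b"},  # stripped, so counted with the first
    {"artist": "Someone Else", "title": "c"},
    {"title": "no artist key"},              # skipped entirely
]

print(dominant_match_artist(sample))               # Floby IV
print(dominant_match_artist([]))                   # None
print(dominant_match_artist([{"artist": "   "}]))  # None
```

on ties, `Counter.most_common` returns the artist encountered first, so the result depends on the order of matches in the scan response.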