audio streaming app plyr.fm

fix: use dominant match detection for copyright flagging (#748)

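AudD doesn't return confidence scores: the `score` field was always
empty and defaulted to 0. This broke copyright detection after #703
reintroduced score-based thresholding, because a highest score of 0 can
never reach the default threshold of 85, so no track was ever flagged.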

New approach: count how many times each unique (artist, title) pair
appears across matched segments (compared case-insensitively). If one
song dominates (>= 30% of matches), flag it. This filters out false
positives where scattered segments match different songs because of
common chord progressions or drum patterns.
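A minimal standalone sketch of the dominance check, assuming a simplified, hypothetical `Segment` type in place of the real `AuddMatch` (which has more fields); `dominant_pct` and `should_flag` are illustrative names, not functions from this commit. It shows why scattered matches stay below the threshold while a repeated song crosses it.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for one matched segment.
struct Segment<'a> {
    artist: &'a str,
    title: &'a str,
}

/// Share (0-100, truncated) of segments belonging to the most frequent
/// (artist, title) pair, compared case-insensitively.
fn dominant_pct(segments: &[Segment<'_>]) -> i32 {
    if segments.is_empty() {
        return 0;
    }
    let mut counts: HashMap<(String, String), usize> = HashMap::new();
    for s in segments {
        let key = (s.artist.to_lowercase(), s.title.to_lowercase());
        *counts.entry(key).or_insert(0) += 1;
    }
    // Largest per-song count; the map is non-empty here.
    let top = counts.values().copied().max().unwrap_or(0);
    (top * 100 / segments.len()) as i32
}

/// Flag only when a single song accounts for at least `threshold_pct` of segments.
fn should_flag(segments: &[Segment<'_>], threshold_pct: i32) -> bool {
    dominant_pct(segments) >= threshold_pct
}
```

With four segments matching four different songs, each song holds 25%, so nothing is flagged at 30; with two of three segments matching the same song (regardless of letter case), the dominant share is 66% and the track is flagged.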

For track 594 (MARINA - BUTTERFLY):
- 18/53 matches (33%) were MARINA - BUTTERFLY
- Next highest was 11/53 (20%) for Eugenio Tokarev
- Would now be correctly flagged at the 30% threshold
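The percentages above can be checked against the integer arithmetic used in `find_dominant_match` (`count * 100 / total`), which truncates rather than rounds. The helper name `truncated_pct` is hypothetical; only the formula comes from the diff.

```rust
/// Integer percentage as computed by `count * 100 / total`;
/// the fractional part is discarded, not rounded.
fn truncated_pct(count: usize, total: usize) -> i32 {
    (count * 100 / total) as i32
}
```

For track 594, `truncated_pct(18, 53)` is 33 (33.96% truncated) and `truncated_pct(11, 53)` is 20, so only MARINA - BUTTERFLY clears 30. Because of truncation, with 53 segments at least 16 matches (30.19%, reported as 30) are needed to flag, while 15 matches report 28.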

Changes:
- Add find_dominant_match() to count matches per unique song
- Flag based on dominant_match_pct >= threshold (default 30%)
- Add dominant_match and dominant_match_pct to scan response
- highest_score now always 0 (legacy field)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

authored by nate nowack and Claude Opus 4.5, committed by GitHub
63fd1d3a e2ff7b85

Total: +54 -7 across 3 files

moderation/src/audd.rs (+50 -4)
```diff
 //! AuDD audio fingerprinting integration.
 
+use std::collections::HashMap;
+
 use axum::{extract::State, Json};
 use serde::{Deserialize, Serialize};
 use tracing::info;
···
 pub struct ScanResponse {
     pub matches: Vec<AuddMatch>,
     pub is_flagged: bool,
+    /// Percentage of matched segments belonging to the dominant song (0-100)
+    pub dominant_match_pct: i32,
+    /// The dominant song if one exists (artist - title)
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub dominant_match: Option<String>,
+    /// Legacy field - always 0 since AudD doesn't return scores
     pub highest_score: i32,
     pub raw_response: serde_json::Value,
 }
···
     }
 
     let matches = extract_matches(&audd_response);
-    let highest_score = matches.iter().map(|m| m.score).max().unwrap_or(0);
-    let is_flagged = highest_score >= state.copyright_score_threshold;
+    let (dominant_match, dominant_match_pct) = find_dominant_match(&matches);
+
+    // Flag if any single song dominates the matches (>= threshold % of segments)
+    // This filters out false positives where random segments match different songs
+    let is_flagged = dominant_match_pct >= state.copyright_score_threshold;
 
     info!(
         match_count = matches.len(),
-        highest_score, is_flagged, "scan complete"
+        dominant_match_pct,
+        dominant_match = dominant_match.as_deref().unwrap_or("none"),
+        is_flagged,
+        "scan complete"
     );
 
     Ok(Json(ScanResponse {
         matches,
         is_flagged,
-        highest_score,
+        dominant_match_pct,
+        dominant_match,
+        highest_score: 0, // AudD doesn't return scores
         raw_response,
     }))
 }
···
         _ => None,
     }
 }
+
+/// Find the dominant song in matches (the one that appears most frequently).
+/// Returns (dominant_song_name, percentage_of_total_matches).
+///
+/// AudD doesn't return confidence scores, so we use match frequency as a proxy:
+/// if the same song matches across many segments of the track, it's likely real.
+/// Random false positives tend to be scattered across different songs.
+fn find_dominant_match(matches: &[AuddMatch]) -> (Option<String>, i32) {
+    if matches.is_empty() {
+        return (None, 0);
+    }
+
+    // Count matches per unique song (artist + title)
+    let mut song_counts: HashMap<(String, String), usize> = HashMap::new();
+    for m in matches {
+        let key = (m.artist.to_lowercase(), m.title.to_lowercase());
+        *song_counts.entry(key).or_insert(0) += 1;
+    }
+
+    // Find the song with the most matches
+    let (dominant_key, dominant_count) = song_counts
+        .into_iter()
+        .max_by_key(|(_, count)| *count)
+        .unwrap(); // Safe: matches is non-empty
+
+    let pct = (dominant_count * 100 / matches.len()) as i32;
+    let dominant_name = format!("{} - {}", dominant_key.0, dominant_key.1);
+
+    (Some(dominant_name), pct)
+}
```
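One detail worth noting in `find_dominant_match`: `HashMap` iteration order is unspecified, and `Iterator::max_by_key` returns the last of several equal maxima, so when two songs tie for the most segments the reported `dominant_match` name can differ between runs. The percentage, and therefore `is_flagged`, is the same either way. If a stable name matters, one option (an assumption, not part of this commit) is to break ties on the key itself; `dominant_key_stable` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Hypothetical variant: pick the most frequent (artist, title) key,
/// breaking ties in favour of the lexicographically smallest key.
fn dominant_key_stable(
    counts: HashMap<(String, String), usize>,
) -> Option<((String, String), usize)> {
    counts
        .into_iter()
        // Higher count wins; on equal counts the keys are compared in
        // reverse, so the smaller key ranks as the "greater" element.
        .max_by(|(ka, ca), (kb, cb)| ca.cmp(cb).then_with(|| kb.cmp(ka)))
}
```

Keys in a `HashMap` are distinct, so this comparator is a total order over the entries and the result no longer depends on iteration order.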
moderation/src/config.rs (+3 -2)
```diff
     pub claude_api_key: Option<String>,
     /// Claude model to use (default: claude-sonnet-4-5-20250929)
     pub claude_model: String,
-    /// Minimum AuDD score to flag as potential copyright violation (default: 85)
+    /// Minimum percentage of matches that must belong to a single song to flag (default: 30)
+    /// AudD doesn't return confidence scores, so we use match frequency as a proxy.
     pub copyright_score_threshold: i32,
 }
 
···
             copyright_score_threshold: env::var("MODERATION_COPYRIGHT_SCORE_THRESHOLD")
                 .ok()
                 .and_then(|v| v.parse().ok())
-                .unwrap_or(85),
+                .unwrap_or(30),
         })
     }
 
```
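The environment variable keeps its name while its meaning changes from a score to a percentage, so a deployment that had explicitly set `MODERATION_COPYRIGHT_SCORE_THRESHOLD=85` would now require 85% of segments to match one song. The sketch below mirrors the `.ok().and_then(|v| v.parse().ok()).unwrap_or(30)` chain to show the fallback behaviour; `threshold_from` and `threshold_from_env` are hypothetical helpers, not functions in this commit.

```rust
/// Hypothetical helper mirroring the config chain: a missing or
/// unparsable value falls back to the default of 30.
fn threshold_from(raw: Option<&str>) -> i32 {
    raw.and_then(|v| v.parse::<i32>().ok()).unwrap_or(30)
}

/// Reads the same variable the moderation config uses.
fn threshold_from_env() -> i32 {
    threshold_from(
        std::env::var("MODERATION_COPYRIGHT_SCORE_THRESHOLD")
            .ok()
            .as_deref(),
    )
}
```

Note that `i32` parsing does not strip a `%` sign or surrounding whitespace, so values like `"30%"` or `" 40"` silently fall back to 30 rather than failing loudly.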
moderation/src/state.rs (+1 -1)
```diff
     pub label_tx: Option<broadcast::Sender<(i64, Label)>>,
     /// Claude client for image moderation (if configured)
     pub claude: Option<Arc<ClaudeClient>>,
-    /// Minimum AuDD score to flag as potential copyright violation
+    /// Minimum percentage of matches that must belong to a single song to flag
     pub copyright_score_threshold: i32,
 }
 
```