observe/transcribe: restore CMN and snip_edges=True in wespeaker fbank front-end

Phase 2 calibration measured 16.39% EER on VoxCeleb1-O (22.7x worse than
the published 0.723%) with snip_edges=False and no CMN. Restore the
WeSpeaker training-time convention: snip_edges=True framing plus
per-utterance cepstral mean normalization, matching the POC reference
at scratch/wespeaker-poc/wespeaker_encoder.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
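
For context on the framing half of this change, here is a minimal sketch of how the frame count differs between the two snip_edges settings under Kaldi-style framing. The 16 kHz sample rate and the `num_frames` helper are assumptions for illustration only. The diff itself reads `sample_rate` at runtime and fixes only the 25 ms window and 10 ms shift.

```python
def num_frames(num_samples: int, frame_length: int, frame_shift: int,
               snip_edges: bool) -> int:
    """Frame count under Kaldi-style framing (hypothetical helper)."""
    if snip_edges:
        # Only frames that fit entirely inside the signal are kept.
        if num_samples < frame_length:
            return 0
        return 1 + (num_samples - frame_length) // frame_shift
    # With snip_edges=False the signal is reflected at the edges, so the
    # count depends only on the shift (rounded to the nearest frame).
    return (num_samples + frame_shift // 2) // frame_shift


SAMPLE_RATE = 16000                      # assumption, not stated in the commit
FRAME_LENGTH = SAMPLE_RATE * 25 // 1000  # 25 ms -> 400 samples
FRAME_SHIFT = SAMPLE_RATE * 10 // 1000   # 10 ms -> 160 samples

one_second = SAMPLE_RATE
frames_snipped = num_frames(one_second, FRAME_LENGTH, FRAME_SHIFT, True)    # 98
frames_padded = num_frames(one_second, FRAME_LENGTH, FRAME_SHIFT, False)    # 100
```

The two settings therefore produce different frame counts and different edge frames for the same audio, so the features no longer match the front-end the model was trained with.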

Jer Miller 2 months ago 810d274d 94b5aeed

+4 -2

1 changed file

observe/transcribe/main.py

@@ -170,7 +170,7 @@
     opts = knf.FbankOptions()
     opts.frame_opts.samp_freq = float(sample_rate)
     opts.frame_opts.dither = 0.0
-    opts.frame_opts.snip_edges = False
+    opts.frame_opts.snip_edges = True
     opts.frame_opts.frame_length_ms = 25.0
     opts.frame_opts.frame_shift_ms = 10.0
     opts.mel_opts.num_bins = 80
@@ -186,7 +186,9 @@
     if not frames:
         return np.zeros((0, 80), dtype=np.float32)
 
-    return np.stack(frames, axis=0).astype(np.float32)
+    feats = np.stack(frames, axis=0).astype(np.float32)
+    feats = feats - feats.mean(axis=0, keepdims=True)
+    return feats
 
 
 def _get_jsonl_path(audio_path: Path) -> Path:
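
The normalization added in the second hunk can be checked in isolation. The sketch below is an illustration rather than the project's code: `apply_cmn` is a hypothetical helper name and the random input stands in for real fbank features. It performs the same per-utterance mean subtraction over the time axis.

```python
import numpy as np


def apply_cmn(feats: np.ndarray) -> np.ndarray:
    """Subtract the per-utterance mean of each feature bin (hypothetical helper).

    feats has shape (num_frames, num_bins); the mean is taken over frames.
    """
    if feats.shape[0] == 0:
        # Mirror the early return in the diff: nothing to normalize.
        return feats
    return feats - feats.mean(axis=0, keepdims=True)


# Synthetic stand-in for an 80-bin fbank matrix with a constant offset.
rng = np.random.default_rng(0)
feats = rng.normal(loc=5.0, scale=2.0, size=(98, 80)).astype(np.float32)

normalized = apply_cmn(feats)
per_bin_mean = normalized.mean(axis=0)  # close to zero in every bin
```

This is mean-only normalization: each bin's average over the utterance is removed, while its variance is left untouched.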
