observe/transcribe: restore CMN and snip_edges=True in wespeaker fbank front-end

Phase 2 calibration measured 16.39% EER on VoxCeleb1-O, about 22.7x the
published 0.723%, with snip_edges=False and no CMN applied to the fbank
features. Restore the WeSpeaker training-time convention: snip_edges=True
framing plus per-utterance cepstral mean normalization (CMN), matching
the POC reference at scratch/wespeaker-poc/wespeaker_encoder.py.
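
For reference, a minimal sketch of the two conventions this change
restores. It is not the code in this commit or in the POC file. The
frame counts follow the Kaldi-style framing rules, the 400-sample window
and 160-sample shift (25 ms / 10 ms at 16 kHz) are assumed defaults, and
num_frames and apply_cmn are hypothetical helper names.

```python
def num_frames(num_samples, frame_length=400, frame_shift=160, snip_edges=True):
    """Kaldi-style frame count. Defaults assume 25 ms / 10 ms at 16 kHz."""
    if snip_edges:
        # Keep only frames that fit entirely inside the signal.
        if num_samples < frame_length:
            return 0
        return 1 + (num_samples - frame_length) // frame_shift
    # snip_edges=False: the count depends only on the frame shift,
    # because the signal is reflected at the edges to fill partial frames.
    return (num_samples + frame_shift // 2) // frame_shift


def apply_cmn(feats):
    """Subtract the per-utterance mean of each feature dimension.

    feats is a list of frames, each a list of floats of equal length.
    """
    if not feats:
        return []
    count = len(feats)
    # zip(*feats) iterates over feature dimensions (columns).
    means = [sum(column) / count for column in zip(*feats)]
    return [[value - mean for value, mean in zip(frame, means)] for frame in feats]
```

Under these assumed defaults, one second of 16 kHz audio yields 98
frames with snip_edges=True and 100 with snip_edges=False, so the two
settings do not produce the same feature matrix for the same utterance.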

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>