transcribe: swap speaker embedder from resemblyzer to WeSpeaker ONNX

Replace resemblyzer's torch-backed VoiceEncoder with a vendored WeSpeaker
ResNet34 ONNX model (256-d embeddings), loaded via ONNX Runtime with CoreML
on Darwin and CPU elsewhere. kaldi-native-fbank provides the 80-bin Kaldi
fbank features; the model metadata's normalize_samples=0 setting requires
scaling the [-1, 1] waveform by 32768 before feature extraction. Every new
segment NPZ now carries an `encoder="wespeaker-resnet34-256"` provenance
field alongside the unchanged `embeddings` and `statement_ids` arrays.
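
A minimal sketch of the three details described above: provider selection, waveform scaling, and the NPZ provenance field. The function names are hypothetical and not taken from observe/transcribe/main.py, and the CPU fallback listed after CoreML on Darwin is an assumption. Only the provider names, the 32768 factor, and the array/field names come from ONNX Runtime and this message.

```python
import io
import platform

import numpy as np

ENCODER_ID = "wespeaker-resnet34-256"  # provenance value quoted in this message


def select_providers(system=None):
    """Return ONNX Runtime execution providers: CoreML on Darwin, CPU elsewhere."""
    system = system or platform.system()
    if system == "Darwin":
        # Assumption: keep CPU as a fallback behind CoreML.
        return ["CoreMLExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]


def scale_for_fbank(waveform):
    """Scale a [-1, 1] float waveform to int16 range (normalize_samples=0)."""
    return np.asarray(waveform, dtype=np.float32) * 32768.0


def save_segment(fileobj, embeddings, statement_ids):
    """Write a segment NPZ with the encoder provenance field."""
    np.savez(
        fileobj,
        embeddings=embeddings,
        statement_ids=statement_ids,
        encoder=np.array(ENCODER_ID),  # 0-d string array, no pickling needed
    )


def load_encoder(fileobj):
    """Return the encoder field, or None for an NPZ written without it."""
    with np.load(fileobj) as npz:
        return npz["encoder"].item() if "encoder" in npz.files else None
```

A reader can use `load_encoder` to tell which embedder produced a given NPZ before comparing embeddings across files. How NPZs without the field are treated is not stated here, so the `None` return is only a placeholder.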

Drops the torch + CUDA + librosa + numba + llvmlite stack:

    .venv 5.7G → 1.0G (-4.7G)

The 26 MB .onnx model is committed directly (no LFS configured). Its
sha256 is verified at install time by the .installed Makefile target,
using a single constant in observe/transcribe/main.py as the source of
truth. Bundle source: sherpa-onnx `speaker-recongition-models` release,
Apache-2.0 licensed; upstream metadata confirms the wespeaker/voxceleb
provenance.
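
The checksum check described above could look roughly like the following sketch. The function names are hypothetical, and the expected digest is passed in as a parameter rather than reproduced here; in the repository it would come from the constant in observe/transcribe/main.py.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through sha256 so the model is not read in one go."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path, expected_hex):
    """Raise ValueError if the file's sha256 does not match expected_hex."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise ValueError(
            f"sha256 mismatch for {path}: expected {expected_hex}, got {actual}"
        )
```

Keeping the expected digest in one place means the Makefile target and the runtime code cannot drift apart.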

sympy remains in the dep tree as a transitive dependency of onnxruntime
(pulled in both directly and via faster-whisper → onnxruntime); it is no
longer pulled in by the torch stack.