personal memory agent

observe/transcribe: add parakeet backend (FluidAudio 0.14.0)

Opt-in local STT backend for Apple Silicon via a Swift helper built
with `make parakeet-helper`. Helper ships as a SwiftPM package pinned
to FluidAudio 0.14.0 and emits low-level token timings over JSON;
Python collapses subwords to the repo's word schema and reuses
build_statements_from_acoustic so statements are interchangeable
with whisper's.

Whisper remains the default; parakeet is wired into BACKEND_REGISTRY
and BACKEND_METADATA so selection works through CLI, API, and journal
config with no UI-side changes required (workspace fieldsets remain
whisper/revai/gemini-only and can be extended later).

Reference implementation: local spike at f808049 in
/Users/jer/tmp/parakeet-spike/ and the report at
extro:vpe/workspace/parakeet-spike-260423/report.md.

Co-Authored-By: OpenAI Codex <codex@openai.com>

+1234 -2
+22
INSTALL.md
···
 brew install git uv
 ```

+#### Parakeet backend (optional, macOS only)
+
+- Apple Silicon only. The helper is not supported on Intel Macs or Linux.
+- Xcode command line tools are required because the helper is a Swift package; if the `xcodebuild -version` check above fails, fix that first.
+- Build the helper from the repo root with `make parakeet-helper`.
+- Enable it by setting the following in `journal/config/journal.json`:
+
+```json
+{
+  "transcribe": {
+    "backend": "parakeet",
+    "parakeet": {
+      "model_version": "v3",
+      "timeout_sec": 120.0
+    }
+  }
+}
+```
+
+- First run downloads roughly 461 MB of model data into `~/Library/Application Support/solstone/parakeet/models`.
+- Helper contract details live in `observe/transcribe/parakeet_helper/README.md`.
+
 ## install

 ```bash
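As a rough illustration of how a config like the one added to INSTALL.md might be read, the sketch below parses the JSON and falls back to the defaults documented in this commit. The function name `read_parakeet_settings` is hypothetical and not part of the repo; only the keys and the default values (`whisper`, `v3`, `120.0`) come from the diff.

```python
import json

# Defaults as documented in this commit; the function below is a hypothetical helper.
DEFAULT_BACKEND = "whisper"
DEFAULT_MODEL_VERSION = "v3"
DEFAULT_TIMEOUT_SEC = 120.0


def read_parakeet_settings(raw_json: str) -> dict:
    """Parse a journal.json-style document and apply the documented defaults."""
    config = json.loads(raw_json)
    transcribe = config.get("transcribe", {})
    parakeet = transcribe.get("parakeet", {})
    return {
        "backend": transcribe.get("backend", DEFAULT_BACKEND),
        "model_version": parakeet.get("model_version", DEFAULT_MODEL_VERSION),
        "timeout_sec": float(parakeet.get("timeout_sec", DEFAULT_TIMEOUT_SEC)),
    }


example = '{"transcribe": {"backend": "parakeet", "parakeet": {"model_version": "v2"}}}'
settings = read_parakeet_settings(example)
print(settings)
```

An empty document (`{}`) resolves to the whisper backend here, which matches the commit's statement that whisper remains the default.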
+10 -1
Makefile
···
 # all runs to one path and pytest wipes it on startup, destroying concurrent state.
 export TMPDIR := /var/tmp

-.PHONY: install uninstall test test-apps test-app test-only test-integration test-integration-only test-all format format-check install-checks ci clean clean-install coverage watch versions update update-prices pre-commit skills dev all sandbox sandbox-stop install-pinchtab verify-browser update-browser-baselines review verify verify-api update-api-baselines install-service uninstall-service service-logs gate-agents-rename check-layer-hygiene doctor FORCE
+.PHONY: install uninstall test test-apps test-app test-only test-integration test-integration-only test-all format format-check install-checks ci clean clean-install coverage watch versions update update-prices pre-commit skills dev all sandbox sandbox-stop install-pinchtab parakeet-helper parakeet-helper-clean verify-browser update-browser-baselines review verify verify-api update-api-baselines install-service uninstall-service service-logs gate-agents-rename check-layer-hygiene doctor FORCE

 # Default target - install package in editable mode
 all: install
···
 		echo "Installing pinchtab..."; \
 		curl -fsSL https://pinchtab.com/install.sh | bash; \
 	fi
+
+# Build the parakeet helper binary (macOS/arm64 only, requires Xcode CLT)
+parakeet-helper:
+	cd observe/transcribe/parakeet_helper && swift build -c release
+	@echo "built: $$(pwd)/observe/transcribe/parakeet_helper/.build/release/parakeet-helper"
+
+# Remove parakeet helper build artifacts
+parakeet-helper-clean:
+	rm -rf observe/transcribe/parakeet_helper/.build observe/transcribe/parakeet_helper/.swiftpm observe/transcribe/parakeet_helper/Package.resolved

 # Run browser scenarios against sandbox
 verify-browser: .installed
+8
observe/transcribe/__init__.py
···
     - whisper: Local faster-whisper (default, GPU/CPU)
     - revai: Rev.ai cloud API (speaker diarization)
     - gemini: Google Gemini API (speaker diarization)
+    - parakeet: Local Apple Silicon processing via helper

 Backend Interface:
     Each backend module must export a transcribe() function:
···
     "whisper": "observe.transcribe.whisper",
     "revai": "observe.transcribe.revai",
     "gemini": "observe.transcribe.gemini",
+    "parakeet": "observe.transcribe.parakeet",
 }

 # ---------------------------------------------------------------------------
···
         "description": "Cloud-based transcription with speaker identification",
         "env_key": "GOOGLE_API_KEY",
         "settings": [],
+    },
+    "parakeet": {
+        "label": "Parakeet - Local processing (Apple Silicon, optional helper)",
+        "description": "On-device speech recognition via FluidAudio + Parakeet TDT; requires `make parakeet-helper`",
+        "env_key": None,
+        "settings": ["model_version", "timeout_sec"],
     },
 }

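The registry above maps backend names to module paths. The diff does not show how the registry is consumed, so the following is only a sketch of one common way such a name-to-module mapping could be resolved, using `importlib`. `EXAMPLE_REGISTRY` and `load_backend` are hypothetical names, and stdlib modules stand in for the `observe.transcribe.*` modules so the example runs anywhere.

```python
import importlib
from types import ModuleType

# Stand-in registry (assumption): the real BACKEND_REGISTRY points at
# observe.transcribe.* modules; stdlib modules are used here for portability.
EXAMPLE_REGISTRY = {
    "json": "json",
    "csv": "csv",
}


def load_backend(name: str, registry: dict[str, str]) -> ModuleType:
    """Import the module registered under `name`; raise ValueError for unknown names."""
    try:
        module_path = registry[name]
    except KeyError:
        valid = ", ".join(sorted(registry))
        raise ValueError(f"unknown backend {name!r}; valid: {valid}") from None
    return importlib.import_module(module_path)


backend = load_backend("json", EXAMPLE_REGISTRY)
print(backend.__name__)
```

With a lookup of this shape, adding a backend only requires a new registry entry, which is consistent with the commit's note that no UI-side changes were needed.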
+13 -1
observe/transcribe/main.py
···
     - <stem>.npz: Sentence-level voice embeddings indexed by statement id

 Configuration (journal config transcribe section):
-    - transcribe.backend: STT backend ("whisper", "revai", "gemini"). Default: "whisper"
+    - transcribe.backend: STT backend ("whisper", "revai", "gemini", "parakeet"). Default: "whisper"
     - transcribe.enrich: Enable/disable LLM enrichment (default: true)
     - transcribe.preserve_all: Keep audio files even when no speech detected (default: false)
     - transcribe.min_speech_seconds: Minimum speech duration to proceed. Default: 1.0
···
 Gemini backend settings (transcribe.gemini):
     - No configuration needed (model resolved by think.models context system)
     - Includes speaker diarization
+
+Parakeet backend settings (transcribe.parakeet):
+    - model_version: Parakeet model version ("v2", "v3"). Default: "v3"
+    - cache_dir: Optional helper cache directory
+    - timeout_sec: Helper timeout in seconds. Default: 120.0

 Platform optimizations (Whisper):
     - CUDA GPU: Uses float16 for GPU-optimized inference
···
         # Pass entities to Rev.ai for custom vocabulary
         if entity_names:
             backend_config["entities"] = entity_names
+    elif backend == "parakeet":
+        parakeet_config = transcribe_config.get("parakeet", {})
+        backend_config = {
+            k: v
+            for k, v in parakeet_config.items()
+            if k in ("model_version", "cache_dir", "timeout_sec")
+        }
     elif backend == "gemini":
         # Gemini backend - model resolved by think.models based on context
         # Entity names handled by enrich step, not passed to transcription
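The parakeet branch forwards only an allow-list of keys to the backend. A minimal standalone sketch of that filtering step is shown below; `select_parakeet_config` and `ALLOWED_KEYS` are illustrative names, while the three keys themselves are taken from the diff.

```python
# Keys the diff forwards to the parakeet backend; everything else is dropped.
ALLOWED_KEYS = ("model_version", "cache_dir", "timeout_sec")


def select_parakeet_config(transcribe_config: dict) -> dict:
    """Return only the allow-listed keys from the transcribe.parakeet section."""
    section = transcribe_config.get("parakeet", {})
    return {key: value for key, value in section.items() if key in ALLOWED_KEYS}


filtered = select_parakeet_config(
    {"backend": "parakeet", "parakeet": {"model_version": "v2", "unknown_flag": True}}
)
print(filtered)
```

Filtering before the call means stray keys in the journal config never reach the backend, so validation there only has to deal with the three documented settings.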
+374
observe/transcribe/parakeet.py
···
+# SPDX-License-Identifier: AGPL-3.0-only
+# Copyright (c) 2026 sol pbc
+
+"""Parakeet STT backend via a Swift helper.
+
+This backend shells out to `observe/transcribe/parakeet_helper/` for FluidAudio
+inference, then rebuilds repo-standard statements with
+`observe.transcribe.utils.build_statements_from_acoustic`.
+"""
+
+from __future__ import annotations
+
+import difflib
+import json
+import logging
+import os
+import string
+import subprocess
+import tempfile
+from pathlib import Path
+
+import numpy as np
+import soundfile as sf
+
+from observe.transcribe.utils import build_statements_from_acoustic
+
+_VERSION_CACHE: dict[str, dict] = {}
+_DEFAULT_MODEL_VERSION = "v3"
+_DEFAULT_TIMEOUT_SEC = 120.0
+_DEFAULT_CACHE_DIR = (
+    Path.home() / "Library/Application Support/solstone/parakeet/models"
+)
+_VALID_MODEL_VERSIONS = frozenset({"v2", "v3"})
+_PUNCTUATION_TOKENS = frozenset(string.punctuation) | frozenset(
+    {"—", "…", "’", "‘", "“", "”"}
+)
+_HELPER_ENV_KEY = "SOLSTONE_PARAKEET_HELPER"
+
+
+def transcribe(
+    audio: np.ndarray,
+    sample_rate: int,
+    config: dict,
+) -> list[dict]:
+    """Transcribe audio by invoking the Parakeet helper and rebuilding statements."""
+    model_version, cache_dir, timeout_sec = _validate_config(config)
+    helper_path = _resolve_helper_path()
+    model_info = get_model_info(config)
+
+    temp_path = None
+    try:
+        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as handle:
+            temp_path = Path(handle.name)
+
+        audio_int16 = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
+        sf.write(temp_path, audio_int16, sample_rate, format="WAV", subtype="PCM_16")
+
+        argv = [
+            str(helper_path),
+            "--cache-dir",
+            str(cache_dir),
+            "--model",
+            model_version,
+            str(temp_path),
+        ]
+
+        try:
+            result = subprocess.run(
+                argv,
+                check=False,
+                capture_output=True,
+                text=True,
+                timeout=timeout_sec,
+            )
+        except subprocess.TimeoutExpired as exc:
+            raise RuntimeError(
+                f"Parakeet helper timed out after {timeout_sec:.1f}s. Rebuild with "
+                f"'make parakeet-helper', increase transcribe.parakeet.timeout_sec, "
+                f"or set ${_HELPER_ENV_KEY} to a different helper build."
+            ) from exc
+
+        if result.returncode != 0:
+            stderr = (result.stderr or "").strip()
+            try:
+                stderr_json = json.loads(stderr) if stderr else {}
+            except json.JSONDecodeError:
+                stderr_json = {}
+            message = stderr_json.get("message") or stderr or "unknown helper failure"
+
+            if result.returncode == 2:
+                raise RuntimeError(
+                    f"Parakeet helper rejected validated arguments (internal bug — file an issue): {message}"
+                )
+            if result.returncode == 3:
+                raise RuntimeError(
+                    f"Parakeet helper could not prepare cache dir {cache_dir}: {message}"
+                )
+            if result.returncode == 4:
+                raise RuntimeError(
+                    f"Parakeet helper failed to download or load model '{model_version}'. "
+                    f"Valid values: v2, v3. {message}"
+                )
+            if result.returncode == 5:
+                raise RuntimeError(
+                    f"Parakeet helper failed to transcribe audio: {message}"
+                )
+            raise RuntimeError(
+                f"Parakeet helper failed with exit code {result.returncode}: {message}"
+            )
+
+        try:
+            payload = json.loads(result.stdout)
+        except json.JSONDecodeError as exc:
+            raise RuntimeError(f"Parakeet helper returned invalid JSON: {exc}") from exc
+        if not isinstance(payload, dict):
+            raise RuntimeError("Parakeet helper returned a non-object JSON payload")
+
+        helper_transcript = str(payload.get("transcript", "")).strip()
+        token_timings = payload.get("token_timings", [])
+        if not token_timings:
+            if helper_transcript:
+                raise RuntimeError(
+                    "Parakeet helper returned transcript text without token timings "
+                    "(internal bug — file an issue)."
+                )
+            return []
+
+        words = _collapse_subwords_to_words(token_timings)
+        if not words:
+            if helper_transcript:
+                raise RuntimeError(
+                    "Parakeet helper returned token timings that collapsed to no words "
+                    "(internal bug — file an issue)."
+                )
+            return []
+
+        acoustic_segments = [
+            {
+                "id": 1,
+                "start": words[0]["start"],
+                "end": words[-1]["end"],
+                "text": helper_transcript,
+                "words": words,
+            }
+        ]
+        statements = build_statements_from_acoustic(acoustic_segments)
+        for statement in statements:
+            statement["speaker"] = None
+
+        rebuilt_text = " ".join(statement["text"] for statement in statements).strip()
+        _log_drift_if_needed(helper_transcript, rebuilt_text)
+
+        audio_sec = float(payload.get("audio_sec", len(audio) / sample_rate))
+        transcribe_ms = int(payload.get("transcribe_ms", 0))
+        tx_sec = transcribe_ms / 1000.0
+        rtfx = float(payload.get("rtfx", 0.0))
+        logging.info(
+            f" Transcribed {len(statements)} statements, {audio_sec:.2f}s speech "
+            f"in {tx_sec:.2f}s (RTFx: {rtfx:.2f}) [model={model_info['model']}]"
+        )
+
+        return statements
+
+    finally:
+        if temp_path and temp_path.exists():
+            temp_path.unlink()
+
+
+def get_model_info(config: dict) -> dict:
+    """Return Parakeet model metadata for transcript JSONL headers."""
+    model_version, _cache_dir, _timeout_sec = _validate_config(config)
+    helper_path = _resolve_helper_path()
+    version_envelope = _probe_helper_version(helper_path)
+    return {
+        "model": f"parakeet-tdt-0.6b-{model_version}",
+        "device": "ane",
+        "compute_type": "coreml_fp16",
+        "fluidaudio_version": version_envelope["fluidaudio_version"],
+        "helper_hardware": version_envelope["hardware"],
+    }
+
+
+def _resolve_helper_path() -> Path:
+    """Resolve the Parakeet helper path from env override or default build path."""
+    env_path = os.getenv(_HELPER_ENV_KEY)
+    if env_path:
+        candidate = Path(env_path).expanduser().resolve()
+        if not candidate.exists():
+            raise RuntimeError(
+                f"Parakeet helper not found at ${_HELPER_ENV_KEY}={candidate}. "
+                f"Run 'make parakeet-helper' or point ${_HELPER_ENV_KEY} at a valid executable."
+            )
+        if not candidate.is_file() or not os.access(candidate, os.X_OK):
+            raise RuntimeError(
+                f"Parakeet helper at ${_HELPER_ENV_KEY}={candidate} is not executable. "
+                f"Run 'make parakeet-helper', chmod +x the file, or point "
+                f"${_HELPER_ENV_KEY} at a valid executable."
+            )
+        return candidate
+
+    candidate = (
+        Path(__file__).with_name("parakeet_helper")
+        / ".build"
+        / "release"
+        / "parakeet-helper"
+    ).resolve()
+    if (
+        not candidate.exists()
+        or not candidate.is_file()
+        or not os.access(candidate, os.X_OK)
+    ):
+        raise RuntimeError(
+            f"Parakeet helper not found at {candidate}. Run 'make parakeet-helper' "
+            f"or set ${_HELPER_ENV_KEY} to a valid executable."
+        )
+    return candidate
+
+
+def _probe_helper_version(helper_path: Path) -> dict:
+    """Probe and cache the helper version envelope by resolved binary path."""
+    cache_key = str(helper_path.resolve())
+    if cache_key in _VERSION_CACHE:
+        return _VERSION_CACHE[cache_key]
+
+    try:
+        result = subprocess.run(
+            [str(helper_path), "--version"],
+            check=False,
+            capture_output=True,
+            text=True,
+            timeout=10.0,
+        )
+    except subprocess.TimeoutExpired as exc:
+        raise RuntimeError(
+            f"Parakeet helper timed out after 10.0s. Rebuild with "
+            f"'make parakeet-helper', increase transcribe.parakeet.timeout_sec, "
+            f"or set ${_HELPER_ENV_KEY} to a different helper build."
+        ) from exc
+
+    if result.returncode != 0:
+        stderr = (result.stderr or "").strip()
+        try:
+            stderr_json = json.loads(stderr) if stderr else {}
+        except json.JSONDecodeError:
+            stderr_json = {}
+        message = stderr_json.get("message") or stderr or "unknown helper failure"
+        if result.returncode == 2:
+            raise RuntimeError(
+                f"Parakeet helper rejected validated arguments (internal bug — file an issue): {message}"
+            )
+        if result.returncode == 3:
+            raise RuntimeError(
+                f"Parakeet helper could not prepare cache dir {_DEFAULT_CACHE_DIR}: {message}"
+            )
+        if result.returncode == 4:
+            raise RuntimeError(
+                f"Parakeet helper failed to download or load model '{_DEFAULT_MODEL_VERSION}'. "
+                f"Valid values: v2, v3. {message}"
+            )
+        if result.returncode == 5:
+            raise RuntimeError(f"Parakeet helper failed to transcribe audio: {message}")
+        raise RuntimeError(
+            f"Parakeet helper failed with exit code {result.returncode}: {message}"
+        )
+
+    try:
+        payload = json.loads(result.stdout)
+    except json.JSONDecodeError as exc:
+        raise RuntimeError(
+            f"Parakeet helper returned invalid version JSON: {exc}"
+        ) from exc
+    if not isinstance(payload, dict):
+        raise RuntimeError("Parakeet helper returned a non-object version payload")
+
+    _VERSION_CACHE[cache_key] = payload
+    return payload
+
+
+def _collapse_subwords_to_words(token_timings: list[dict]) -> list[dict]:
+    """Collapse helper token timings into repo-standard word dicts."""
+    if not token_timings:
+        return []
+
+    words = []
+    current_parts: list[str] = []
+    current_confidences: list[float] = []
+    current_start: float | None = None
+    current_end: float | None = None
+
+    def flush() -> None:
+        nonlocal current_parts, current_confidences, current_start, current_end
+        if not current_parts or current_start is None or current_end is None:
+            current_parts = []
+            current_confidences = []
+            current_start = None
+            current_end = None
+            return
+        text = "".join(current_parts).lstrip()
+        words.append(
+            {
+                "word": f" {text}",
+                "start": current_start,
+                "end": current_end,
+                "probability": min(current_confidences),
+            }
+        )
+        current_parts = []
+        current_confidences = []
+        current_start = None
+        current_end = None
+
+    for token in token_timings:
+        raw = str(token.get("token", ""))
+        is_punctuation = bool(raw) and all(
+            char in _PUNCTUATION_TOKENS for char in raw if char
+        )
+        starts_new = raw.startswith("▁") or raw.startswith(" ")
+
+        if starts_new and current_parts and not is_punctuation:
+            flush()
+
+        cleaned = raw.lstrip("▁")
+        if starts_new:
+            cleaned = cleaned.lstrip(" ")
+
+        current_parts.append(cleaned)
+        current_confidences.append(float(token.get("confidence", 0.0)))
+
+        if not is_punctuation and current_start is None:
+            current_start = float(token["start"])
+        if not is_punctuation:
+            current_end = float(token["end"])
+
+    flush()
+    return words
+
+
+def _log_drift_if_needed(fluid_transcript: str, rebuilt_text: str) -> None:
+    """Warn when helper text and rebuilt text materially diverge."""
+    fluid_transcript = fluid_transcript.strip()
+    rebuilt_text = rebuilt_text.strip()
+    if not fluid_transcript or not rebuilt_text:
+        return
+
+    ratio = difflib.SequenceMatcher(None, fluid_transcript, rebuilt_text).ratio()
+    if ratio < 0.95:
+        logging.warning(
+            "Parakeet transcript drift detected (ratio=%.3f): helper=%r rebuilt=%r",
+            ratio,
+            fluid_transcript,
+            rebuilt_text,
+        )
+
+
+def _validate_config(config: dict) -> tuple[str, Path, float]:
+    """Validate backend config before spawning the helper."""
+    model_version = config.get("model_version", _DEFAULT_MODEL_VERSION)
+    if model_version not in _VALID_MODEL_VERSIONS:
+        raise ValueError("model_version must be one of: v2, v3")
+
+    raw_timeout = config.get("timeout_sec", _DEFAULT_TIMEOUT_SEC)
+    try:
+        timeout_sec = float(raw_timeout)
+    except (TypeError, ValueError) as exc:
+        raise ValueError(f"timeout_sec must be > 0, got {raw_timeout!r}") from exc
+    if timeout_sec <= 0:
+        raise ValueError(f"timeout_sec must be > 0, got {raw_timeout!r}")
+
+    raw_cache_dir = config.get("cache_dir")
+    cache_dir = (
+        Path(raw_cache_dir).expanduser() if raw_cache_dir else _DEFAULT_CACHE_DIR
+    )
+
+    return model_version, cache_dir, timeout_sec
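The drift check in `_log_drift_if_needed` relies on `difflib.SequenceMatcher.ratio()` with a 0.95 threshold. The sketch below isolates that comparison so its behavior can be tried on sample strings; `drift_ratio` and `has_drift` are illustrative names, not functions from the repo, and the sample sentences are made up.

```python
import difflib

DRIFT_THRESHOLD = 0.95  # same threshold as _log_drift_if_needed in the diff


def drift_ratio(helper_text: str, rebuilt_text: str) -> float:
    """Similarity ratio in [0, 1] between the helper transcript and the rebuilt text."""
    matcher = difflib.SequenceMatcher(None, helper_text.strip(), rebuilt_text.strip())
    return matcher.ratio()


def has_drift(helper_text: str, rebuilt_text: str) -> bool:
    """True when both texts are non-empty and similarity falls below the threshold."""
    if not helper_text.strip() or not rebuilt_text.strip():
        return False
    return drift_ratio(helper_text, rebuilt_text) < DRIFT_THRESHOLD


same = has_drift("the quick brown fox", "the quick brown fox")
different = has_drift("the quick brown fox", "the fox")
print(same, different)
```

Because the ratio is character-based, small whitespace or punctuation differences tend to stay above the threshold, while dropped words pull the ratio down quickly on short transcripts.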
+3
observe/transcribe/parakeet_helper/.gitignore
···
+.build/
+.swiftpm/
+Package.resolved
+24
observe/transcribe/parakeet_helper/Package.swift
···
+// swift-tools-version: 5.9
+import PackageDescription
+
+let package = Package(
+    name: "parakeet-helper",
+    platforms: [.macOS(.v14)],
+    products: [
+        .executable(name: "parakeet-helper", targets: ["parakeet-helper"])
+    ],
+    dependencies: [
+        .package(
+            url: "https://github.com/FluidInference/FluidAudio.git",
+            exact: "0.14.0"
+        )
+    ],
+    targets: [
+        .executableTarget(
+            name: "parakeet-helper",
+            dependencies: [
+                .product(name: "FluidAudio", package: "FluidAudio")
+            ]
+        )
+    ]
+)
+99
observe/transcribe/parakeet_helper/README.md
···
+# parakeet-helper
+
+Swift helper for the Parakeet v3 STT backend in solstone.
+
+## Build
+
+```bash
+swift build -c release
+```
+
+Built binary:
+
+```text
+.build/release/parakeet-helper
+```
+
+## CLI
+
+```text
+parakeet-helper --version
+parakeet-helper [--cache-dir PATH] [--model v2|v3] <audio.wav>
+```
+
+Rules:
+
+- `--version` is standalone.
+- `--cache-dir` defaults to `~/Library/Application Support/solstone/parakeet/models`.
+- `--model` defaults to `v3`.
+- accepted `--model` values: `v2`, `v3`.
+
+## Stdout Schema
+
+Success emits one UTF-8 JSON object followed by `\n`:
+
+```json
+{
+  "path": "string",
+  "transcript": "string",
+  "confidence": 0.0,
+  "audio_sec": 0.0,
+  "load_ms": 0,
+  "transcribe_ms": 0,
+  "rtfx": 0.0,
+  "token_timings": [
+    {
+      "token": "string",
+      "token_id": 0,
+      "start": 0.0,
+      "end": 0.0,
+      "confidence": 0.0
+    }
+  ],
+  "model_version": "parakeet-tdt-0.6b-v3",
+  "fluidaudio_version": "0.14.0",
+  "hardware": "MacBookPro18,3 / Apple M4 Max",
+  "macos_version": "26.4.1",
+  "swift_version": "Apple Swift version 6.3.1"
+}
+```
+
+`--version` emits one UTF-8 JSON object followed by `\n`:
+
+```json
+{
+  "fluidaudio_version": "0.14.0",
+  "model_version_default": "v3",
+  "swift_version": "Apple Swift version 6.3.1",
+  "hardware": "MacBookPro18,3 / Apple M4 Max",
+  "macos_version": "26.4.1"
+}
+```
+
+## Stderr Schema
+
+Non-zero exits emit one UTF-8 JSON object followed by `\n`:
+
+```json
+{
+  "category": "argv|cache|model_download|transcribe",
+  "message": "human-readable string",
+  "detail": "optional extra detail"
+}
+```
+
+## Exit Codes
+
+- `0`: success
+- `2`: argv / input parsing error
+- `3`: cache directory creation / write failure
+- `4`: model download / load failure
+- `5`: transcription failure
+
+## Fixture Regeneration
+
+```bash
+say "The quick brown fox jumps over the lazy dog." -o /tmp/parakeet_sample.aiff
+afconvert -f WAVE -d LEI16@16000 /tmp/parakeet_sample.aiff tests/fixtures/parakeet_sample.wav
+rm /tmp/parakeet_sample.aiff
+```
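The stderr schema and exit codes documented in the README can be combined into a single error message on the caller's side. The sketch below is one possible way to do that, not the repo's implementation; `describe_failure` and `EXIT_CODE_MEANINGS` are hypothetical names, and the descriptions paraphrase the README's exit-code list. Unlike the backend code in this commit, it also guards against stderr that is valid JSON but not an object.

```python
import json

# Exit codes as listed in the helper README (descriptions paraphrased).
EXIT_CODE_MEANINGS = {
    2: "argv / input parsing error",
    3: "cache directory failure",
    4: "model download / load failure",
    5: "transcription failure",
}


def describe_failure(returncode: int, stderr: str) -> str:
    """Turn a helper exit code and its stderr text into one readable message."""
    text = stderr.strip()
    try:
        envelope = json.loads(text) if text else {}
    except json.JSONDecodeError:
        envelope = {}
    if not isinstance(envelope, dict):
        # Valid JSON that is not an object (for example a list) has no "message" field.
        envelope = {}
    message = envelope.get("message") or text or "unknown helper failure"
    meaning = EXIT_CODE_MEANINGS.get(returncode, f"exit code {returncode}")
    return f"{meaning}: {message}"


sample = '{"category": "model_download", "message": "failed to download or load model"}\n'
print(describe_failure(4, sample))
```

Falling back to the raw stderr text keeps crashes that bypass the JSON envelope (for example a dynamic-linker error) visible to the user.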
+350
observe/transcribe/parakeet_helper/Sources/parakeet-helper/main.swift
···
+// SPDX-License-Identifier: AGPL-3.0-only
+// Copyright (c) 2026 sol pbc
+
+import AVFoundation
+import Darwin
+import FluidAudio
+import Foundation
+
+private let fluidAudioVersion = "0.14.0"
+private let defaultModelVersion = "v3"
+
+private struct JSONTokenTiming: Encodable {
+    let token: String
+    let token_id: Int
+    let start: Double
+    let end: Double
+    let confidence: Float
+}
+
+private struct JSONOutput: Encodable {
+    let path: String
+    let transcript: String
+    let confidence: Float
+    let audio_sec: Double
+    let load_ms: Int
+    let transcribe_ms: Int
+    let rtfx: Double
+    let token_timings: [JSONTokenTiming]
+    let model_version: String
+    let fluidaudio_version: String
+    let hardware: String
+    let macos_version: String
+    let swift_version: String
+}
+
+private struct VersionOutput: Encodable {
+    let fluidaudio_version: String
+    let model_version_default: String
+    let swift_version: String
+    let hardware: String
+    let macos_version: String
+}
+
+private struct ErrorOutput: Encodable {
+    let category: String
+    let message: String
+    let detail: String?
+}
+
+private enum HelperModel: String {
+    case v2
+    case v3
+
+    var asrModelVersion: AsrModelVersion {
+        switch self {
+        case .v2:
+            return .v2
+        case .v3:
+            return .v3
+        }
+    }
+
+    var fullModelVersion: String {
+        "parakeet-tdt-0.6b-\(rawValue)"
+    }
+}
+
+private enum ParsedCommand {
+    case version
+    case transcribe(audioPath: String, cacheDir: URL, model: HelperModel)
+}
+
+private func writeJSONLine<T: Encodable>(_ value: T, to handle: FileHandle) {
+    let encoder = JSONEncoder()
+    encoder.outputFormatting = [.withoutEscapingSlashes]
+    do {
+        let data = try encoder.encode(value)
+        handle.write(data)
+        handle.write("\n".data(using: .utf8)!)
+    } catch {
+        handle.write(
+            #"{"category":"transcribe","message":"failed to encode JSON output"}"#.data(
+                using: .utf8
+            )!
+        )
+        handle.write("\n".data(using: .utf8)!)
+    }
+}
+
+private func fail(
+    code: Int32,
+    category: String,
+    message: String,
+    detail: String? = nil
+) -> Never {
+    writeJSONLine(
+        ErrorOutput(category: category, message: message, detail: detail),
+        to: FileHandle.standardError
+    )
+    exit(code)
+}
+
+private func sysctlString(_ name: String) -> String {
+    var size = 0
+    guard sysctlbyname(name, nil, &size, nil, 0) == 0, size > 0 else {
+        return "unknown"
+    }
+    var buffer = [CChar](repeating: 0, count: size)
+    guard sysctlbyname(name, &buffer, &size, nil, 0) == 0 else {
+        return "unknown"
+    }
+    return String(cString: buffer)
+}
+
+private func hardwareString() -> String {
+    let model = sysctlString("hw.model")
+    let brand = sysctlString("machdep.cpu.brand_string")
+    return "\(model) / \(brand)"
+}
+
+private func macosVersionString() -> String {
+    let version = ProcessInfo.processInfo.operatingSystemVersion
+    return "\(version.majorVersion).\(version.minorVersion).\(version.patchVersion)"
+}
+
+private func swiftVersionString() -> String {
+    let process = Process()
+    let pipe = Pipe()
+    process.executableURL = URL(fileURLWithPath: "/usr/bin/swift")
+    process.arguments = ["--version"]
+    process.standardOutput = pipe
+    process.standardError = Pipe()
+    do {
+        try process.run()
+        process.waitUntilExit()
+        guard process.terminationStatus == 0 else {
+            return "unknown"
+        }
+        let data = pipe.fileHandleForReading.readDataToEndOfFile()
+        guard
+            let output = String(data: data, encoding: .utf8)?
+                .split(separator: "\n")
+                .first
+        else {
+            return "unknown"
+        }
+        return String(output)
+    } catch {
+        return "unknown"
+    }
+}
+
+private func expandedURL(path: String) -> URL {
+    URL(fileURLWithPath: NSString(string: path).expandingTildeInPath)
+}
+
+private func defaultCacheDir() -> URL {
+    expandedURL(path: "~/Library/Application Support/solstone/parakeet/models")
+}
+
+private func parseCommand() -> ParsedCommand {
+    let args = Array(CommandLine.arguments.dropFirst())
+    if args.contains("--version") {
+        guard args.count == 1, args.first == "--version" else {
+            fail(
+                code: 2,
+                category: "argv",
+                message: "--version must be used without other arguments"
+            )
+        }
+        return .version
+    }
+
+    var cacheDir = defaultCacheDir()
+    var model = HelperModel.v3
+    var audioPath: String?
+    var index = 0
+
+    while index < args.count {
+        let arg = args[index]
+        switch arg {
+        case "--cache-dir":
+            guard index + 1 < args.count else {
+                fail(
+                    code: 2,
+                    category: "argv",
+                    message: "--cache-dir requires a path argument"
+                )
+            }
+            cacheDir = expandedURL(path: args[index + 1])
+            index += 2
+        case "--model":
+            guard index + 1 < args.count else {
+                fail(
+                    code: 2,
+                    category: "argv",
+                    message: "--model requires one of: v2, v3"
+                )
+            }
+            guard let parsedModel = HelperModel(rawValue: args[index + 1]) else {
+                fail(
+                    code: 2,
+                    category: "argv",
+                    message: "unknown --model value '\(args[index + 1])'; valid values: v2, v3"
+                )
+            }
+            model = parsedModel
+            index += 2
+        default:
+            if arg.hasPrefix("--") {
+                fail(code: 2, category: "argv", message: "unknown flag: \(arg)")
+            }
+            guard audioPath == nil else {
+                fail(
+                    code: 2,
+                    category: "argv",
+                    message: "expected exactly one positional audio path"
+                )
+            }
+            audioPath = arg
+            index += 1
+        }
+    }
+
+    guard let audioPath else {
+        fail(code: 2, category: "argv", message: "missing required positional audio path")
+    }
+
+    return .transcribe(audioPath: audioPath, cacheDir: cacheDir, model: model)
+}
+
+private func createCacheDir(_ cacheDir: URL) {
+    do {
+        try FileManager.default.createDirectory(
+            at: cacheDir,
+            withIntermediateDirectories: true,
+            attributes: nil
+        )
+    } catch {
+        fail(
+            code: 3,
+            category: "cache",
+            message: "failed to create cache dir",
+            detail: String(describing: error)
+        )
+    }
+}
+
+private func audioDurationSeconds(url: URL) throws -> Double {
+    let file = try AVAudioFile(forReading: url)
+    let format = file.processingFormat
+    return Double(file.length) / format.sampleRate
+}
+
+@main
+struct Main {
+    static func main() async {
+        let hardware = hardwareString()
+        let macosVersion = macosVersionString()
+        let swiftVersion = swiftVersionString()
+
+        switch parseCommand() {
+        case .version:
+            writeJSONLine(
+                VersionOutput(
+                    fluidaudio_version: fluidAudioVersion,
+                    model_version_default: defaultModelVersion,
+                    swift_version: swiftVersion,
+                    hardware: hardware,
+                    macos_version: macosVersion
+                ),
+                to: FileHandle.standardOutput
+            )
+        case let .transcribe(audioPath, cacheDir, model):
+            createCacheDir(cacheDir)
+
+            let loadStart = DispatchTime.now().uptimeNanoseconds
+            let manager = AsrManager()
+            let loadMs: Int
+            do {
+                let models = try await AsrModels.downloadAndLoad(
+                    to: cacheDir,
+                    version: model.asrModelVersion
+                )
+                try await manager.loadModels(models)
+                loadMs = Int(
+                    (DispatchTime.now().uptimeNanoseconds - loadStart) / 1_000_000
+                )
+            } catch {
+                fail(
+                    code: 4,
+                    category: "model_download",
+                    message: "failed to download or load model",
+                    detail: String(describing: error)
+                )
+            }
+
+            do {
+                let audioURL = URL(fileURLWithPath: audioPath)
+                let audioSec = try audioDurationSeconds(url: audioURL)
+                var decoderState = try TdtDecoderState()
+
+                let txStart = DispatchTime.now().uptimeNanoseconds
+                let result = try await manager.transcribe(
+                    audioURL,
+                    decoderState: &decoderState
+                )
+                let transcribeMs =
+                    Int((DispatchTime.now().uptimeNanoseconds - txStart) / 1_000_000)
+                let txSeconds = max(Double(transcribeMs) / 1000.0, 1e-6)
+
+                let timings = (result.tokenTimings ?? []).map {
+                    JSONTokenTiming(
+                        token: $0.token,
+                        token_id: $0.tokenId,
+                        start: $0.startTime,
+                        end: $0.endTime,
+                        confidence: $0.confidence
+                    )
+                }
+
+                writeJSONLine(
+                    JSONOutput(
+                        path: audioPath,
+                        transcript: result.text,
+                        confidence: result.confidence,
+                        audio_sec: audioSec,
+                        load_ms: loadMs,
+                        transcribe_ms: transcribeMs,
+                        rtfx: audioSec / txSeconds,
+                        token_timings: timings,
+                        model_version: model.fullModelVersion,
+                        fluidaudio_version: fluidAudioVersion,
+                        hardware: hardware,
+                        macos_version: macosVersion,
+                        swift_version: swiftVersion
+                    ),
+                    to: FileHandle.standardOutput
+                )
+            } catch {
+                fail(
+                    code: 5,
+                    category: "transcribe",
+                    message: "failed to transcribe audio",
+                    detail: String(describing: error)
+                )
+            }
+        }
+    }
+}
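The helper reports `rtfx` as audio duration divided by transcription time, with the elapsed time floored at `1e-6` seconds to avoid division by zero when the millisecond counter rounds down to 0. The same arithmetic, written as a small Python sketch (`real_time_factor` is an illustrative name, and the sample numbers are invented):

```python
def real_time_factor(audio_sec: float, transcribe_ms: int) -> float:
    """Seconds of audio processed per wall-clock second, using the helper's 1e-6 floor."""
    tx_seconds = max(transcribe_ms / 1000.0, 1e-6)
    return audio_sec / tx_seconds


# 30 s of audio transcribed in 1500 ms.
rtfx = real_time_factor(audio_sec=30.0, transcribe_ms=1500)
print(f"RTFx: {rtfx:.2f}")
```

Values above 1.0 mean transcription ran faster than real time. A `transcribe_ms` of 0 produces a very large but finite value rather than an error.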
+11
tests/baselines/api/settings/transcribe.json
··· 1 1 { 2 2 "api_keys": { 3 3 "gemini": false, 4 + "parakeet": false, 4 5 "revai": false, 5 6 "whisper": false 6 7 }, ··· 30 31 "device", 31 32 "model", 32 33 "compute_type" 34 + ] 35 + }, 36 + { 37 + "description": "On-device speech recognition via FluidAudio + Parakeet TDT; requires `make parakeet-helper`", 38 + "env_key": null, 39 + "label": "Parakeet - Local processing (Apple Silicon, optional helper)", 40 + "name": "parakeet", 41 + "settings": [ 42 + "model_version", 43 + "timeout_sec" 33 44 ] 34 45 } 35 46 ],
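The baseline above exposes two parakeet settings, `model_version` and `timeout_sec`. The tests in this commit expect `parakeet._validate_config` to raise `ValueError` with messages containing `v2, v3` and `> 0`. A minimal sketch of validation with that behavior follows; the function name `validate_parakeet_config`, the exact message wording, and the defaults (`v3`, `120.0`, borrowed from the INSTALL.md example) are assumptions, not the repo's implementation.

```python
ALLOWED_MODEL_VERSIONS = ("v2", "v3")


def validate_parakeet_config(config: dict) -> dict:
    """Return normalized parakeet settings (hypothetical helper)."""
    model_version = config.get("model_version", "v3")  # assumed default
    timeout_sec = config.get("timeout_sec", 120.0)  # assumed default

    if model_version not in ALLOWED_MODEL_VERSIONS:
        allowed = ", ".join(ALLOWED_MODEL_VERSIONS)
        raise ValueError(f"model_version must be one of: {allowed}")

    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(timeout_sec, (int, float)) and not isinstance(
        timeout_sec, bool
    )
    if not is_number or timeout_sec <= 0:
        raise ValueError("timeout_sec must be a number > 0")

    return {"model_version": model_version, "timeout_sec": float(timeout_sec)}
```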
tests/fixtures/parakeet_sample.wav

This is a binary file and will not be displayed.

+320
tests/test_transcribe_parakeet.py
··· 1 + # SPDX-License-Identifier: AGPL-3.0-only 2 + # Copyright (c) 2026 sol pbc 3 + 4 + import json 5 + import platform 6 + import subprocess 7 + from pathlib import Path 8 + from types import SimpleNamespace 9 + 10 + import numpy as np 11 + import pytest 12 + import soundfile as sf 13 + 14 + import observe.transcribe.parakeet as parakeet 15 + from observe.transcribe import BACKEND_METADATA, BACKEND_REGISTRY 16 + 17 + 18 + def _skip_reason() -> str | None: 19 + if platform.system() != "Darwin": 20 + return "requires Darwin" 21 + if platform.machine() != "arm64": 22 + return "requires arm64" 23 + try: 24 + parakeet._resolve_helper_path() 25 + except RuntimeError as exc: 26 + return str(exc) 27 + return None 28 + 29 + 30 + def test_collapse_empty_input(): 31 + assert parakeet._collapse_subwords_to_words([]) == [] 32 + 33 + 34 + def test_collapse_single_word(): 35 + tokens = [ 36 + { 37 + "token": "▁hello", 38 + "token_id": 1, 39 + "start": 0.0, 40 + "end": 0.1, 41 + "confidence": 0.9, 42 + } 43 + ] 44 + words = parakeet._collapse_subwords_to_words(tokens) 45 + assert words == [{"word": " hello", "start": 0.0, "end": 0.1, "probability": 0.9}] 46 + 47 + 48 + def test_collapse_two_words_with_boundary(): 49 + tokens = [ 50 + { 51 + "token": "▁the", 52 + "token_id": 1, 53 + "start": 0.0, 54 + "end": 0.1, 55 + "confidence": 0.9, 56 + }, 57 + { 58 + "token": " quick", 59 + "token_id": 2, 60 + "start": 0.1, 61 + "end": 0.2, 62 + "confidence": 0.8, 63 + }, 64 + ] 65 + words = parakeet._collapse_subwords_to_words(tokens) 66 + assert [word["word"] for word in words] == [" the", " quick"] 67 + 68 + 69 + def test_collapse_subword_rebuild(): 70 + tokens = [ 71 + { 72 + "token": "▁the", 73 + "token_id": 1, 74 + "start": 0.0, 75 + "end": 0.1, 76 + "confidence": 0.9, 77 + }, 78 + { 79 + "token": "▁qu", 80 + "token_id": 2, 81 + "start": 0.1, 82 + "end": 0.18, 83 + "confidence": 0.8, 84 + }, 85 + { 86 + "token": "ick", 87 + "token_id": 3, 88 + "start": 0.18, 89 + "end": 0.25, 90 + 
"confidence": 0.95, 91 + }, 92 + ] 93 + words = parakeet._collapse_subwords_to_words(tokens) 94 + assert len(words) == 2 95 + assert words[0]["word"] == " the" 96 + assert words[1]["word"] == " quick" 97 + assert words[1]["probability"] == pytest.approx(0.8) 98 + 99 + 100 + def test_collapse_contraction(): 101 + tokens = [ 102 + { 103 + "token": "▁don", 104 + "token_id": 1, 105 + "start": 0.0, 106 + "end": 0.1, 107 + "confidence": 0.9, 108 + }, 109 + { 110 + "token": "'", 111 + "token_id": 2, 112 + "start": 0.1, 113 + "end": 0.12, 114 + "confidence": 0.8, 115 + }, 116 + { 117 + "token": "t", 118 + "token_id": 3, 119 + "start": 0.12, 120 + "end": 0.18, 121 + "confidence": 0.85, 122 + }, 123 + ] 124 + words = parakeet._collapse_subwords_to_words(tokens) 125 + assert [word["word"] for word in words] == [" don't"] 126 + 127 + 128 + def test_collapse_trailing_punctuation_attaches(): 129 + tokens = [ 130 + { 131 + "token": "▁fox", 132 + "token_id": 1, 133 + "start": 0.0, 134 + "end": 0.2, 135 + "confidence": 0.9, 136 + }, 137 + { 138 + "token": ".", 139 + "token_id": 2, 140 + "start": 0.2, 141 + "end": 0.24, 142 + "confidence": 0.7, 143 + }, 144 + ] 145 + words = parakeet._collapse_subwords_to_words(tokens) 146 + assert words == [{"word": " fox.", "start": 0.0, "end": 0.2, "probability": 0.7}] 147 + 148 + 149 + def test_collapse_confidence_is_min(): 150 + tokens = [ 151 + { 152 + "token": "▁hel", 153 + "token_id": 1, 154 + "start": 0.0, 155 + "end": 0.1, 156 + "confidence": 0.9, 157 + }, 158 + { 159 + "token": "lo", 160 + "token_id": 2, 161 + "start": 0.1, 162 + "end": 0.2, 163 + "confidence": 0.5, 164 + }, 165 + { 166 + "token": "!", 167 + "token_id": 3, 168 + "start": 0.2, 169 + "end": 0.24, 170 + "confidence": 0.7, 171 + }, 172 + ] 173 + words = parakeet._collapse_subwords_to_words(tokens) 174 + assert words[0]["probability"] == pytest.approx(0.5) 175 + 176 + 177 + def test_collapse_leading_space_invariant(): 178 + tokens = [ 179 + { 180 + "token": "▁the", 181 + 
"token_id": 1, 182 + "start": 0.0, 183 + "end": 0.1, 184 + "confidence": 0.9, 185 + }, 186 + { 187 + "token": " quick", 188 + "token_id": 2, 189 + "start": 0.1, 190 + "end": 0.2, 191 + "confidence": 0.8, 192 + }, 193 + { 194 + "token": " brown", 195 + "token_id": 3, 196 + "start": 0.2, 197 + "end": 0.3, 198 + "confidence": 0.85, 199 + }, 200 + { 201 + "token": ".", 202 + "token_id": 4, 203 + "start": 0.3, 204 + "end": 0.34, 205 + "confidence": 0.95, 206 + }, 207 + ] 208 + words = parakeet._collapse_subwords_to_words(tokens) 209 + assert all( 210 + word["word"].startswith(" ") and not word["word"].startswith("  ") 211 + for word in words 212 + ) 213 + 214 + 215 + def test_validate_config_bad_model(): 216 + with pytest.raises(ValueError, match="v2, v3"): 217 + parakeet._validate_config({"model_version": "v4"}) 218 + 219 + 220 + def test_validate_config_bad_timeout(): 221 + with pytest.raises(ValueError, match="> 0"): 222 + parakeet._validate_config({"timeout_sec": -1}) 223 + 224 + 225 + def test_registry_has_parakeet(): 226 + assert "parakeet" in BACKEND_REGISTRY 227 + assert "parakeet" in BACKEND_METADATA 228 + 229 + 230 + def test_metadata_settings_list_of_str(): 231 + assert all(isinstance(key, str) for key in BACKEND_METADATA["parakeet"]["settings"]) 232 + 233 + 234 + def test_transcribe_rejects_transcript_without_token_timings( 235 + monkeypatch: pytest.MonkeyPatch, 236 + ): 237 + monkeypatch.setattr( 238 + parakeet, "_resolve_helper_path", lambda: Path("/tmp/parakeet-helper") 239 + ) 240 + monkeypatch.setattr( 241 + parakeet, 242 + "get_model_info", 243 + lambda config: {"model": "parakeet-tdt-0.6b-v3"}, 244 + ) 245 + monkeypatch.setattr( 246 + subprocess, 247 + "run", 248 + lambda *args, **kwargs: SimpleNamespace( 249 + returncode=0, 250 + stdout=json.dumps( 251 + { 252 + "transcript": "hello world", 253 + "token_timings": [], 254 + "audio_sec": 1.0, 255 + "transcribe_ms": 50, 256 + "rtfx": 20.0, 257 + } 258 + ), 259 + stderr="", 260 + ), 261 + ) 262 + 263 + 
with pytest.raises( 264 + RuntimeError, 265 + match="transcript text without token timings", 266 + ): 267 + parakeet.transcribe(np.zeros(16000, dtype=np.float32), 16000, {}) 268 + 269 + 270 + @pytest.mark.skipif(_skip_reason() is not None, reason=_skip_reason() or "") 271 + @pytest.mark.timeout(120) 272 + def test_helper_version_envelope(): 273 + helper_path = parakeet._resolve_helper_path() 274 + result = subprocess.run( 275 + [str(helper_path), "--version"], 276 + check=False, 277 + capture_output=True, 278 + text=True, 279 + ) 280 + assert result.returncode == 0 281 + payload = json.loads(result.stdout) 282 + assert payload["fluidaudio_version"] == "0.14.0" 283 + assert set(payload) >= { 284 + "fluidaudio_version", 285 + "model_version_default", 286 + "swift_version", 287 + "hardware", 288 + "macos_version", 289 + } 290 + 291 + 292 + @pytest.mark.skipif(_skip_reason() is not None, reason=_skip_reason() or "") 293 + @pytest.mark.timeout(120) 294 + def test_transcribe_pangram_end_to_end(): 295 + fixture_path = Path("tests/fixtures/parakeet_sample.wav") 296 + audio, sample_rate = sf.read(fixture_path, dtype="float32") 297 + statements = parakeet.transcribe(audio, sample_rate, {}) 298 + assert statements 299 + 300 + combined_text = " ".join(statement["text"] for statement in statements) 301 + tokens = combined_text.split() 302 + assert "quick" in tokens 303 + assert "fox" in tokens 304 + 305 + for statement in statements: 306 + assert set(statement) >= {"id", "start", "end", "text", "words", "speaker"} 307 + assert statement["speaker"] is None 308 + assert statement["words"] 309 + assert all( 310 + word["word"].startswith(" ") and not word["word"].startswith("  ") 311 + for word in statement["words"] 312 + ) 313 + 314 + 315 + @pytest.mark.skipif(_skip_reason() is not None, reason=_skip_reason() or "") 316 + @pytest.mark.timeout(120) 317 + def test_transcribe_empty_audio(): 318 + audio = np.zeros(5 * 16000, dtype=np.float32) 319 + statements = 
parakeet.transcribe(audio, 16000, {}) 320 + assert statements == []
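
The collapse tests above pin down a small contract: a token beginning with the SentencePiece marker `▁` (or a plain space) opens a new word, every word carries exactly one leading space, continuation tokens and punctuation attach to the current word, the word's probability is the minimum token confidence, and trailing punctuation does not extend the word's `end`. The sketch below is inferred only from those expectations. The name `collapse_tokens` is hypothetical, and treating "no alphanumeric characters" as punctuation is an assumption; the repo's `_collapse_subwords_to_words` may differ.

```python
BOUNDARY_CHARS = "\u2581 "  # SentencePiece "▁" marker or a plain space


def collapse_tokens(tokens: list[dict]) -> list[dict]:
    """Merge subword tokens into word dicts (hypothetical sketch)."""
    words: list[dict] = []
    for tok in tokens:
        text = tok["token"]
        starts_word = text[:1] in BOUNDARY_CHARS and text != ""
        piece = text.lstrip(BOUNDARY_CHARS)

        if starts_word or not words:
            # Open a new word with exactly one leading space.
            words.append(
                {
                    "word": " " + piece,
                    "start": tok["start"],
                    "end": tok["end"],
                    "probability": tok["confidence"],
                }
            )
            continue

        current = words[-1]
        current["word"] += piece
        current["probability"] = min(current["probability"], tok["confidence"])
        # Assumption: punctuation-only pieces keep the spoken word's end time.
        if any(ch.isalnum() for ch in piece):
            current["end"] = tok["end"]
    return words
```

Taking the minimum confidence is the conservative choice: one doubtful subword marks the whole word as doubtful, which matches `test_collapse_confidence_is_min`.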