Disable thinking for Gemini transcription to fix max_tokens truncation

Gemini 3 Flash applies a default thinking budget when none is specified,
and those thinking tokens count against the 16384-token max_output_tokens
limit. Dense 5-minute audio segments were hitting max_tokens with only
~2000 chars of transcript generated. Setting thinking_budget=0 gives the
full token budget to the JSON output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
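
The token accounting described in the commit message can be sketched as simple arithmetic. This is an illustration only: the helper name `visible_output_budget` and the thinking-token figure are hypothetical and were not measured from the real transcription runs. Only the 16384 limit comes from the commit.

```python
def visible_output_budget(max_output_tokens: int, thinking_tokens: int) -> int:
    """Return the tokens left for the visible response after thinking is deducted."""
    if max_output_tokens < 0 or thinking_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Thinking and visible output share one limit, so the remainder is
    # whatever thinking did not use (never below zero).
    return max(max_output_tokens - thinking_tokens, 0)


MAX_OUTPUT_TOKENS = 16384  # limit used in the commit

# Hypothetical figure chosen for illustration; the real thinking usage is unknown.
HYPOTHETICAL_THINKING_TOKENS = 15000

before = visible_output_budget(MAX_OUTPUT_TOKENS, HYPOTHETICAL_THINKING_TOKENS)
after = visible_output_budget(MAX_OUTPUT_TOKENS, 0)  # thinking_budget=0

print(before, after)  # 1384 16384
```

With thinking disabled, the whole limit is available to the transcript JSON, which is the effect the commit relies on.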

Jer Miller 3 months ago 51254ac7 b6a043da

1 changed file


observe/transcribe/gemini.py

@@ -321,12 +321,15 @@
 ]
 
 # Call Gemini via think.models.generate()
+# thinking_budget=0 disables thinking — transcription is extraction, not
+# reasoning, and Gemini's default thinking budget consumes output tokens.
 response_text = generate(
     contents=contents,
     context="observe.transcribe.gemini",
     temperature=0.3,
     max_output_tokens=16384,
     json_output=True,
+    thinking_budget=0,
 )
 
 transcribe_time = time.perf_counter() - t0
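
The diff does not show how `generate()` forwards `thinking_budget` to Gemini. A minimal sketch of one way a wrapper could do it is below. The function `build_generation_config` is hypothetical, and the dictionary keys (`thinking_config`, `thinking_budget`, `response_mime_type`) are assumed to mirror the generation config fields of the google-genai SDK; check them against the SDK version in use.

```python
from typing import Any, Optional


def build_generation_config(
    *,
    temperature: float,
    max_output_tokens: int,
    json_output: bool = False,
    thinking_budget: Optional[int] = None,
) -> dict[str, Any]:
    """Build a generation config dict (hypothetical helper, not the repo's code)."""
    config: dict[str, Any] = {
        "temperature": temperature,
        "max_output_tokens": max_output_tokens,
    }
    if json_output:
        config["response_mime_type"] = "application/json"
    # Compare against None rather than truthiness: 0 is a meaningful value
    # (thinking disabled) and must not be dropped as if it were unset.
    if thinking_budget is not None:
        config["thinking_config"] = {"thinking_budget": thinking_budget}
    return config


# Mirrors the arguments passed in the diff above.
config = build_generation_config(
    temperature=0.3,
    max_output_tokens=16384,
    json_output=True,
    thinking_budget=0,
)
default_config = build_generation_config(temperature=0.3, max_output_tokens=16384)
```

Omitting the key when `thinking_budget` is `None` leaves the model's default in place, which matches the pre-commit behavior described in the message.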
