providers/anthropic: drop thinking from retry_kwargs when forcing tool use

Live validation of the L3 sense pilot exposed a bug in L1's Anthropic
structured-output fallback path. When the primary output_config call
raises BadRequestError, the fallback to forced tool_use kept the
`thinking` parameter, which Anthropic's API rejects ("Thinking may not
be enabled when tool_choice forces tool use"). Instead of recovering,
the fallback raised a confusing secondary 400.

Drop `thinking` from retry_kwargs in both the sync and async paths.
Restore the temperature value that `thinking` originally displaced (the
primary path sets exactly one of `thinking` or `temperature`). Add a
regression test asserting that the retry kwargs strip `thinking` and
carry `temperature` forward.
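
For reviewers, the retry-kwargs change can be sketched roughly as
follows. This is an illustration only, not the code in this commit:
`build_retry_kwargs`, its parameters, and the `emit_result` tool name are
hypothetical, and removing `output_config` on the retry is an assumption
about how the fallback swaps structured output for a forced tool call.

```python
from typing import Any, Optional


def build_retry_kwargs(
    primary_kwargs: dict[str, Any],
    tool: dict[str, Any],
    temperature: Optional[float],
) -> dict[str, Any]:
    """Build kwargs for the forced tool_use retry (hypothetical helper)."""
    # Shallow copy so the caller's primary kwargs are left untouched.
    retry_kwargs = dict(primary_kwargs)

    # The API rejects thinking when tool_choice forces tool use, so drop it.
    had_thinking = retry_kwargs.pop("thinking", None) is not None

    # Assumption: the retry replaces output_config with a forced tool call.
    retry_kwargs.pop("output_config", None)

    # The primary path sets thinking or temperature, never both, so put
    # back the temperature that thinking displaced.
    if had_thinking and temperature is not None:
        retry_kwargs["temperature"] = temperature

    retry_kwargs["tools"] = [tool]
    retry_kwargs["tool_choice"] = {"type": "tool", "name": tool["name"]}
    return retry_kwargs


# Example: a thinking-enabled primary call falling back to forced tool use.
tool = {"name": "emit_result", "input_schema": {"type": "object"}}
primary = {
    "model": "example-model",
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 2048},
}
retry = build_retry_kwargs(primary, tool, temperature=0.2)
```

A regression test along the lines described above would then assert that
`"thinking"` is absent from `retry` and that `retry["temperature"]`
equals the caller's original value.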

The same live test surfaced two pre-existing Anthropic constraints that
are out of scope here:

1. max_tokens must be greater than thinking.budget_tokens (production
   sense defaults satisfy this).
2. The SDK requires streaming when max_tokens is large enough that the
   request could take more than 10 minutes (~30k+ for sonnet). The
   production sense default of 49152 hits this.

Both affect any thinking-enabled Anthropic caller, with or without a
schema. Filed as separate VPE follow-up notes.
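
As a note for the follow-up work, the first constraint could be guarded
with a pre-flight check like the sketch below. `check_thinking_budget` is
a hypothetical name and is not part of this change; the budget value in
the example is made up for illustration.

```python
def check_thinking_budget(max_tokens: int, budget_tokens: int) -> None:
    """Raise ValueError unless max_tokens exceeds the thinking budget."""
    if max_tokens <= budget_tokens:
        raise ValueError(
            f"max_tokens ({max_tokens}) must be greater than "
            f"thinking.budget_tokens ({budget_tokens})"
        )


# 49152 is the production sense default cited above; 16000 is an
# illustrative budget, not a value taken from the codebase.
check_thinking_budget(max_tokens=49152, budget_tokens=16000)
```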

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>