cogitate: retry first-event CLI timeouts, bump to 90s

Bump CLIRunner's default first_event_timeout from 30s to 90s.
Healthy gemini CLI pre-init already takes 5-10s, and 90s leaves
headroom for one transient retry/backoff before giving up.

Rationale follows the CTO investigation in
~/projects/extro/cto/workspace/gemini-cogitate-hang-investigation-260426.md.

Retry exactly once on first-event timeout, gated by
_already_retried_first_event. Mid-stream (full-run) timeouts are not
retried; the existing 600s timeout continues to cover that case.
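
The retry gate described above can be sketched roughly as follows. This
is an illustration only, not the CLIRunner code: RetryOnceRunner, the
two exception classes, the attempt callable, and the error text are all
hypothetical stand-ins. Only the _already_retried_first_event flag name
and the 90s default come from this change.

```python
class FirstEventTimeout(Exception):
    """Hypothetical marker: no event arrived within first_event_timeout."""


class FullRunTimeout(Exception):
    """Hypothetical marker: the run exceeded the overall (600s) timeout."""


class RetryOnceRunner:
    """Illustrative stand-in for the retry gate; not the real CLIRunner."""

    def __init__(self, attempt, first_event_timeout=90.0):
        # attempt is a hypothetical callable(timeout) that returns a result
        # or raises one of the timeout markers above.
        self._attempt = attempt
        self.first_event_timeout = first_event_timeout
        self._already_retried_first_event = False

    def run(self):
        while True:
            try:
                return self._attempt(self.first_event_timeout)
            except FirstEventTimeout:
                if self._already_retried_first_event:
                    # Second first-event timeout: give up for good.
                    raise RuntimeError("first-event timeout") from None
                # First first-event timeout: flip the gate and loop once more.
                self._already_retried_first_event = True
            # FullRunTimeout is deliberately not caught, so a mid-stream
            # timeout propagates without any retry.
```

Because the flag lives on the instance, a runner that has already used
its retry gives up on the next first-event timeout.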
Write /tmp/gemini-cogitate-timeout-<unix_ms>.log on every timeout:
pre-retry, final give-up, and full-run. Each log is created with mode
0o600 and records cmd, cwd, sorted env keys (never values),
which_timeout, and captured stderr. Write failures are logged via
LOG.warning and swallowed so they never mask the underlying
RuntimeError. Add the module-scope _TIMEOUT_LOG_DIR seam so tests can
monkeypatch the path.
Keep scope confined to CLIRunner in think/providers/cli.py and its
tests. build_cogitate_env, assemble_prompt, ThinkingAggregator, and
provider-side translate functions are intentionally untouched per the
CTO investigation scope.

Keep the give-up surface unchanged: the emitted error event payload and
the RuntimeError(error_message) text remain byte-identical to prior
behavior.