fix(plaud,supervisor): timeout stalled S3 streams; cap scheduled task runtime
Two coupled fixes for the 2026-04-28 wedge, where a half-closed S3 socket
left download_to_file blocked inside iter_content for 7+ hours and seven
sched:sync:plaud refs coalesced onto the one frozen task.
think/importers/plaud.py:
- download_to_file: switch the session.get(stream=True) call to a
(connect=30s, read=45s) tuple timeout so a stall between chunks fails
fast. Catch RequestException around the streaming loop, log the failure
clearly, and return False.
- Add a progress_cb to the iter_content loop, throttled to roughly one
call every 5s. PlaudBackend.sync passes a closure that refreshes
last_completed mid-download, so the outer inactivity timer is reset by
in-flight download progress rather than only by per-file completion.
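A minimal stdlib-only sketch of how the throttled callback and the inactivity timer interact. write_chunks, InactivityTimer and PROGRESS_INTERVAL_S are hypothetical names, not the importer's actual code. In the real download_to_file the chunks would presumably come from the iter_content iterator of a response opened with session.get(url, stream=True, timeout=(30, 45)); with requests, a stall longer than the 45s read timeout surfaces as a RequestException subclass, which is what the function catches and turns into False.

```python
import time
from typing import BinaryIO, Callable, Iterable, Optional

# Hypothetical names throughout; the real importer's signatures may differ.
PROGRESS_INTERVAL_S = 5.0  # "~5s" throttle from the commit message


def write_chunks(
    chunks: Iterable[bytes],
    fh: BinaryIO,
    progress_cb: Optional[Callable[[int], None]] = None,
    interval: float = PROGRESS_INTERVAL_S,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Copy chunks into fh, reporting bytes written at most once per interval."""
    written = 0
    last_report = clock()
    for chunk in chunks:
        if not chunk:  # skip empty chunks defensively
            continue
        fh.write(chunk)
        written += len(chunk)
        now = clock()
        if progress_cb is not None and now - last_report >= interval:
            progress_cb(written)
            last_report = now
    return written


class InactivityTimer:
    """Outer watchdog: expires when nothing has refreshed it for timeout_s."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self.last_completed = clock()

    def touch(self) -> None:
        # Called on per-file completion and, via the closure, mid-download.
        self.last_completed = self._clock()

    def expired(self) -> bool:
        return self._clock() - self.last_completed > self.timeout_s
```

Wiring it like the closure in PlaudBackend.sync means passing progress_cb=lambda _n: timer.touch(), so each throttled report refreshes last_completed and timer.expired() only goes true once no report has arrived for timeout_s.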
think/supervisor.py + think/scheduler.py + think/utils.py:
- New TaskQueue.set_cap / enforce_deadlines: runtime caps keyed by
cmd_name. supervise() calls enforce_deadlines once per second; on a cap
breach it sends a non-blocking SIGTERM (raw send_signal, not
RunnerManagedProcess.terminate, which blocks on wait(timeout=15)),
records when termination started, and escalates to SIGKILL after 15s
using a sidecar dict. This mirrors the existing _restart_requests
pattern.
- scheduler.load_config parses an optional max_runtime per entry via a
new shared parse_duration_seconds helper (accepts int, Ns, Nm, Nh).
Invalid values log a warning and drop the cap; the entry still loads.
- scheduler.collect_runtime_caps feeds caps into the TaskQueue at
supervisor boot, keyed on TaskQueue.get_command_name(cmd).
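A rough sketch of the non-blocking cap enforcement, assuming POSIX signals and a Popen-like child exposing send_signal() and poll(). DeadlineQueue, track() and KILL_GRACE_S are hypothetical; the real TaskQueue keys caps on get_command_name(cmd) and tracks refs its own way. The point it illustrates is that each tick only sends signals and records timestamps in a sidecar dict, so a wedged child cannot block the supervisor loop.

```python
import signal
import time
from typing import Callable, Dict, Optional, Protocol, Tuple

KILL_GRACE_S = 15.0  # SIGTERM -> SIGKILL escalation delay from the commit message


class Proc(Protocol):
    """The two subprocess.Popen methods this sketch relies on."""

    def send_signal(self, sig: int) -> None: ...
    def poll(self) -> Optional[int]: ...


class DeadlineQueue:
    """Hypothetical stand-in for TaskQueue.set_cap / enforce_deadlines (POSIX)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._caps: Dict[str, float] = {}  # cmd_name -> max runtime in seconds
        self._running: Dict[str, Tuple[str, Proc, float]] = {}  # ref -> (cmd_name, proc, start)
        self._terminating: Dict[str, Tuple[float, bool]] = {}  # sidecar: ref -> (SIGTERM time, SIGKILL sent?)

    def set_cap(self, cmd_name: str, seconds: float) -> None:
        self._caps[cmd_name] = seconds

    def track(self, ref: str, cmd_name: str, proc: Proc) -> None:
        self._running[ref] = (cmd_name, proc, self._clock())

    def enforce_deadlines(self) -> None:
        """Call once per supervisor tick; never waits on the child."""
        now = self._clock()
        for ref, (cmd_name, proc, started) in list(self._running.items()):
            if proc.poll() is not None:
                # Exited (normally or via our signal): clear all state for the ref.
                del self._running[ref]
                self._terminating.pop(ref, None)
                continue
            term = self._terminating.get(ref)
            if term is not None:
                sent_at, killed = term
                if not killed and now - sent_at >= KILL_GRACE_S:
                    proc.send_signal(signal.SIGKILL)  # grace period over
                    self._terminating[ref] = (sent_at, True)
                continue
            cap = self._caps.get(cmd_name)
            if cap is not None and now - started > cap:
                # Raw send_signal instead of a blocking terminate() + wait().
                proc.send_signal(signal.SIGTERM)
                self._terminating[ref] = (now, False)
```

The supervisor side is then a once-per-second loop that calls queue.enforce_deadlines() and sleeps. Because termination state is cleared as soon as poll() reports an exit, a ref that dies on SIGTERM within the grace period never receives SIGKILL.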
Caps are off by default; an entry without max_runtime has no cap. Note:
no historical sync:plaud import logs were reachable from this worktree
to tune the schedule value. Operators can set "max_runtime": "30m" on
the sync:plaud entry in journal/config/schedules.json, which is
comfortably above normal sync duration and well below the hourly
cadence.
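A sketch of the duration parsing and the warn-and-drop behavior, under stated assumptions: parse_duration_seconds here raises ValueError on bad input (the real helper's error contract is not stated), zero is rejected along with negatives (an assumption), and runtime_cap is a hypothetical stand-in for the per-entry step inside scheduler.load_config.

```python
import logging
import re
from typing import Optional

log = logging.getLogger(__name__)

_DURATION_RE = re.compile(r"(\d+)([smh])")
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration_seconds(value: object) -> int:
    """Parse an int or an 'Ns'/'Nm'/'Nh' string into positive whole seconds."""
    # bool is a subclass of int, so reject it explicitly as the wrong type.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise ValueError(f"unsupported duration type: {type(value).__name__}")
    if isinstance(value, int):
        seconds = value
    else:
        match = _DURATION_RE.fullmatch(value.strip())
        if match is None:
            raise ValueError(f"unparseable duration: {value!r}")
        seconds = int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    if seconds <= 0:  # assumption: zero is rejected along with negatives
        raise ValueError(f"duration must be positive: {value!r}")
    return seconds


def runtime_cap(name: str, entry: dict) -> Optional[int]:
    """Hypothetical load_config step: a bad max_runtime warns and drops the cap."""
    raw = entry.get("max_runtime")
    if raw is None:
        return None  # no max_runtime means no cap
    try:
        return parse_duration_seconds(raw)
    except ValueError as exc:
        log.warning("schedule %s: ignoring max_runtime (%s)", name, exc)
        return None
```

With the suggested setting, runtime_cap("sync:plaud", {"max_runtime": "30m"}) yields 1800 seconds, while a garbage value only logs a warning and the entry keeps loading without a cap.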
Tests: new tests/test_plaud_download.py covers read-timeout failure,
throttled progress callback, and the inactivity-timer trip when sync
makes no progress. tests/test_utils.py covers parse_duration_seconds.
tests/test_supervisor.py covers set_cap, SIGTERM on deadline, SIGKILL
escalation after 15s, termination-state cleanup on ref exit, and the
no-cap noop. tests/test_scheduler.py covers max_runtime parsing
(valid string, valid int, invalid negative/garbage/wrong-type), and
collect_runtime_caps filtering.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>