persistent host reconnect: never give up on known hosts

subscribers now retry with exponential backoff capped at 30 min
(previously a 60s cap with a hard kill after 15 failures). on a
successful connect, backoff resets to 1s and the host flips back to
active. hosts that fail 15+ consecutive times are marked dormant
(observable), but the subscriber keeps retrying. a reconciliation
loop runs every 5 min and respawns any active/dormant host missing
from the worker map.

this removes the dependence on the external reconnect cron for host
retention, so the cron can be reduced to discovery-only.
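
a minimal sketch of the retry and reconciliation behavior described
above, not the actual implementation. the 1s reset, 30 min cap,
15-failure dormant threshold, and active/dormant states come from this
message; the doubling factor and every name (HostState, on_failure,
on_success, reconcile, spawn) are assumptions made for illustration.

```python
from dataclasses import dataclass

BASE_DELAY_S = 1.0        # backoff after a successful connect (from the message)
MAX_DELAY_S = 30 * 60.0   # 30 min cap (from the message)
DORMANT_AFTER = 15        # consecutive failures before a host is marked dormant


@dataclass
class HostState:
    """Hypothetical per-host retry state; field names are illustrative."""
    status: str = "active"   # "active" or "dormant"
    failures: int = 0
    delay_s: float = BASE_DELAY_S

    def on_failure(self) -> float:
        """Record a failed connect and return the wait before the next attempt."""
        self.failures += 1
        if self.failures >= DORMANT_AFTER:
            self.status = "dormant"  # observable only; retries continue
        wait = self.delay_s
        # Doubling is an assumed growth factor; the message only fixes the cap.
        self.delay_s = min(self.delay_s * 2, MAX_DELAY_S)
        return wait

    def on_success(self) -> None:
        """Reset backoff to the base delay and flip the host back to active."""
        self.failures = 0
        self.delay_s = BASE_DELAY_S
        self.status = "active"


def reconcile(known_hosts: dict, workers: dict, spawn) -> list:
    """Respawn every active/dormant host that is missing from the worker map."""
    respawned = []
    for host, state in known_hosts.items():
        if state.status in ("active", "dormant") and host not in workers:
            workers[host] = spawn(host)  # spawn is a hypothetical worker factory
            respawned.append(host)
    return respawned
```

in this sketch a host is never dropped for failing: dormant is just a
label, and reconcile() would be called from a 5 min timer to restore
any worker that exited for another reason.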
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>