phi monitors prefect flow runs — same shape as relay checks
phi now watches the prefect-server via the prefect MCP on an hourly cadence
and notifies nate when a flow fails persistently. closes the gap that
let the last 6 days of broken ingest go unnoticed: with this in place,
the first failure would have been recorded silently and the second
would have triggered a notification.
architecture (mirrors relay watching):
- prefect MCP: https://prefect-by-zzstoatzz.fastmcp.app/mcp, same per-request
header auth pattern pdsx uses (x-prefect-api-url +
x-prefect-api-auth-string)
- tool_prefix="prefect" so tools become prefect_get_flow_runs,
prefect_get_flow_run_logs, etc — 13 tools from the MCP
- scheduled entry point process_flow_check, cadence ~1h (more
time-sensitive than relay's 3h)
- de-dup via the active observations pool: first failure → silent
observe(); same flow failing again → post + tag nate. one-off blips
don't wake anyone up; persistent problems do.
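
the de-dup rule above can be sketched roughly as follows. this is an
illustration only: in the real change phi makes this call from the task
prompt and the [ACTIVE OBSERVATIONS] context, not from hard-coded logic.
ObservationPool and handle_failure are hypothetical names, not code from
this commit.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ObservationPool:
    """Hypothetical stand-in for the active observations pool."""

    failing_flows: Set[str] = field(default_factory=set)

    def observe(self, flow_name: str) -> None:
        # Record the failure silently; nobody is notified.
        self.failing_flows.add(flow_name)


def handle_failure(pool: ObservationPool, flow_name: str) -> str:
    """Return the action taken for one failed flow seen during a check."""
    if flow_name in pool.failing_flows:
        # The same flow already failed on an earlier check: escalate.
        return f"post: {flow_name} is failing persistently (tagging nate)"
    pool.observe(flow_name)
    return "observe"


pool = ObservationPool()
first = handle_failure(pool, "ingest")   # first failure: silent observe
second = handle_failure(pool, "ingest")  # repeat failure: post + tag
other = handle_failure(pool, "export")   # a different flow: silent again
```

the pool is keyed by flow name in this sketch, so a blip in one flow
never escalates a first failure in another.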
files:
- config: prefect_mcp_url, prefect_api_url, prefect_api_auth_string (from
fly secret), flow_check_interval_polls=360 (~1h at 10s poll interval)
- agent.py _mcp_toolsets: appends prefect MCP when auth is configured
(graceful degrade for dev/local without the secret)
- agent.py process_flow_check: new entry point, task prompt tells phi
to check [ACTIVE OBSERVATIONS] first and escalate on repeat
- message_handler.check_flows: thin wrapper that passes recent posts
through to process_flow_check
- notification_poller: _polls_since_last_flow_check counter,
_should_check_flows / _maybe_check_flows gate (scheduled independently
from relay checks)
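
a minimal sketch of the poll-count gate, assuming the counter is bumped
once per poll and reset when a check fires. FlowCheckGate and tick are
hypothetical names used for illustration; only the counter and
_should_check_flows names come from this change, and the real poller
may be structured differently.

```python
class FlowCheckGate:
    """Hypothetical reduction of the poller's flow-check counter."""

    def __init__(self, interval_polls: int = 360) -> None:
        # 360 polls * 10s poll interval = 3600s, i.e. about one hour.
        self.interval_polls = interval_polls
        self._polls_since_last_flow_check = 0

    def _should_check_flows(self) -> bool:
        return self._polls_since_last_flow_check >= self.interval_polls

    def tick(self) -> bool:
        """Count one poll; return True when a flow check is due."""
        self._polls_since_last_flow_check += 1
        if self._should_check_flows():
            # Reset so the next check is a full interval away.
            self._polls_since_last_flow_check = 0
            return True
        return False


gate = FlowCheckGate(interval_polls=360)
# Simulate two hours of polling and record which polls trigger a check.
due_at = [n for n in range(1, 721) if gate.tick()]
```

keeping a separate counter for flows means the ~1h flow cadence and the
3h relay cadence can drift or be retuned without affecting each other.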
fly secret PREFECT_API_AUTH_STRING already set, pulled from the k8s
prefect-auth secret so both consumers share auth. 102 tests pass.
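
for reference, the per-request header auth and the graceful degrade
could look roughly like this. prefect_mcp_headers is a hypothetical
helper and the example URL is a placeholder; only the two header names
are taken from this change.

```python
from typing import Dict, Optional


def prefect_mcp_headers(
    api_url: Optional[str], auth_string: Optional[str]
) -> Optional[Dict[str, str]]:
    """Build per-request headers, or None when auth is not configured."""
    if not api_url or not auth_string:
        # Dev/local without the secret: skip the prefect MCP entirely.
        return None
    return {
        "x-prefect-api-url": api_url,
        "x-prefect-api-auth-string": auth_string,
    }


# Placeholder values for illustration only.
configured = prefect_mcp_headers("https://prefect.example/api", "user:pass")
unconfigured = prefect_mcp_headers("https://prefect.example/api", None)
```

returning None rather than raising lets the caller decide whether to
append the toolset at all.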
Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>