# SML Rule Regression Tests

Regression harness for the SML rules under `../rules/`. Runs fixtures
through a live Osprey worker (via a slim Kafka + Postgres stack) and
asserts on the emitted labels and verdicts.
## Why this exists

The SML rules are small and readable, but their composition is not —
changing `bulk_extreme.sml`'s threshold can shadow `warming.sml`'s
label, and we'd only notice in prod. This harness freezes current
behavior so rule changes show up as explicit diffs in expected
verdicts, not as silent drift.
## Running locally

### Prerequisites

- Docker or Colima running
- Python 3.11+
- (Optional) a virtualenv — the harness's Python deps are `kafka-python` and `PyYAML`, both tiny
### One-time setup

```sh
cd osprey/tests
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
```
### A full test run

```sh
# 1. Bring the test stack up (takes ~30s for Kafka leader election)
docker compose -f docker-compose.test.yml -p osprey-test up -d

# 2. Wait for the worker to be ready — cold start takes another
#    ~15-30s while it compiles rules and attaches to Kafka.
docker compose -f docker-compose.test.yml -p osprey-test logs -f osprey-worker
# Watch for "Starting to consume" or similar, then Ctrl-C.

# 3. Run the harness
./run.py

# 4. Tear down when you're done
docker compose -f docker-compose.test.yml -p osprey-test down -v
```
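If you'd rather script the wait in step 2 than eyeball the logs, a
small poll does the job. This is a sketch, not part of the harness: it
assumes the worker emits the "Starting to consume" line quoted above
and gives up after 90 seconds.

```python
# wait_ready.py, a hypothetical helper (not shipped with the harness).
# Polls the worker's logs until the readiness marker shows up.
import subprocess
import sys
import time

LOGS_CMD = [
    "docker", "compose", "-f", "docker-compose.test.yml",
    "-p", "osprey-test", "logs", "osprey-worker",
]
READY_MARKER = "Starting to consume"  # adjust if the worker's wording differs
DEADLINE = time.time() + 90

while time.time() < DEADLINE:
    logs = subprocess.run(LOGS_CMD, capture_output=True, text=True).stdout
    if READY_MARKER in logs:
        print("worker ready")
        sys.exit(0)
    time.sleep(2)

sys.exit("worker not ready after 90s; check the logs manually")
```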
## Fixture layout

```
fixtures/<scenario>/
├── input.json    a single relay event envelope
└── expect.yaml   labels_applied / labels_forbidden / verdicts / verdicts_forbidden
```

`input.json` is the exact payload that gets published to
`osprey.actions_input`. The harness replaces `action_id` with a unique
value per run so parallel invocations don't cross-match.
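For reference, the publish step boils down to something like the
sketch below. This is illustrative, not the harness's actual code: it
assumes `kafka-python` (already in `requirements.txt`), the test
stack's offset broker port (19092), and a top-level `action_id` field
in the envelope.

```python
import json
import uuid

from kafka import KafkaProducer  # kafka-python

def publish_fixture(path: str) -> str:
    with open(path) as f:
        event = json.load(f)
    # Uniquify action_id so parallel runs can't match each other's results.
    event["action_id"] = action_id = f"test-{uuid.uuid4()}"
    producer = KafkaProducer(
        bootstrap_servers="localhost:19092",  # offset port from docker-compose.test.yml
        value_serializer=lambda v: json.dumps(v).encode(),
    )
    producer.send("osprey.actions_input", event)
    producer.flush()
    return action_id  # used to find the matching execution_result later
```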
`expect.yaml` shape:

```yaml
description: free-text explanation
labels_applied:
  - SenderDID/<label>/add    # must appear in __entity_label_mutations
labels_forbidden:
  - SenderDID/<label>/add    # must NOT appear
verdicts:
  - reject                   # must appear in __verdicts
verdicts_forbidden:
  - reject                   # must NOT appear
```

All four fields are optional. Empty fixtures test nothing — at minimum,
declare one `labels_applied` or `verdicts` entry.
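Concretely, the four fields are set-membership checks against the
execution result. A sketch of the semantics, assuming the result
carries `__entity_label_mutations` and `__verdicts` as lists of
strings (the fields the comments above refer to):

```python
def check_expectations(expect: dict, result: dict) -> list[str]:
    """Return a list of failure messages; empty means the scenario passes."""
    mutations = set(result.get("__entity_label_mutations", []))
    verdicts = set(result.get("__verdicts", []))
    failures = []
    for label in expect.get("labels_applied", []):
        if label not in mutations:
            failures.append(f"expected label missing: {label}")
    for label in expect.get("labels_forbidden", []):
        if label in mutations:
            failures.append(f"forbidden label present: {label}")
    for verdict in expect.get("verdicts", []):
        if verdict not in verdicts:
            failures.append(f"expected verdict missing: {verdict}")
    for verdict in expect.get("verdicts_forbidden", []):
        if verdict in verdicts:
            failures.append(f"forbidden verdict present: {verdict}")
    return failures
```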
## Adding a scenario

- Pick a representative event from `../seed-events.sh` or the real
  relay (query `relay_events.raw` in `relay.sqlite`).
- Drop it into a new `fixtures/<name>/input.json` (a scaffold sketch
  follows this list).
- Read the SML rule you expect to fire. Write `expect.yaml` asserting
  on the labels and verdicts.
- Run `./run.py` — iterate until the scenario passes.
- Commit. The CI workflow picks it up automatically.
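As a convenience, a hypothetical scaffold script (not part of the
repo) can lay out the directory shape described earlier. The
`input.json` it writes is a placeholder; replace it with a real
envelope before running:

```python
# scaffold.py, a hypothetical helper (not shipped with the harness).
import json
import pathlib

name = "my-scenario"  # pick something descriptive
root = pathlib.Path("fixtures") / name
root.mkdir(parents=True, exist_ok=True)

# Placeholder only: copy a real envelope from ../seed-events.sh or
# relay.sqlite here. The harness overwrites action_id at run time anyway.
(root / "input.json").write_text(json.dumps({"action_id": "placeholder"}, indent=2))

(root / "expect.yaml").write_text(
    "description: <why this scenario exists>\n"
    "labels_applied:\n"
    "  - SenderDID/<label>/add\n"
    "verdicts:\n"
    "  - reject\n"
)
```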
## Common pitfalls

- "no execution_result with action_id=... within Ns" — usually means
  the worker isn't consuming yet. Watch
  `docker compose logs osprey-worker` for "Starting to consume
  actions_input". Bump `--timeout` if the worker is slow on your machine.
- Rules that require history — velocity and bounce-rate rules read
  pre-computed fields (`sends_last_hour`, `bounce_rate`) that the relay
  stamps on outgoing events. The harness does not run the relay, so
  those counters are whatever you put in the fixture JSON. Set them
  explicitly (see the sketch after this list).
- Label shape mismatch — mutations are stringified as
  `<Entity>/<label>/<add|remove>`. Case matters, and the entity must
  match the `relay.sml` definitions (`SenderDID`, `SenderDomain`,
  `RecipientDomain`).
- Host-port conflicts — `docker-compose.test.yml` uses offset ports
  (Kafka 19092, Postgres 15433) so it can run alongside the dev stack
  in `../docker-compose.yml`. If you've also started the dev stack,
  double-check which broker you're pointing `run.py` at.
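For the history pitfall, stamping the counters onto a fixture is a
small edit. A sketch, assuming the fields sit at the top level of the
envelope (check a real relay event if yours nests them):

```python
import json
import pathlib

fixture = pathlib.Path("fixtures/my-scenario/input.json")
event = json.loads(fixture.read_text())
# Values are illustrative: pick whatever crosses (or stays under)
# the thresholds in the rule you're testing.
event["sends_last_hour"] = 500
event["bounce_rate"] = 0.4
fixture.write_text(json.dumps(event, indent=2))
```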
## What this doesn't cover

- Rule compilation errors — `validate-sml.yml` already does that.
- Velocity counters crossing event boundaries — the harness is
  one-event-per-scenario by design; add more scenarios if you need
  sequence coverage.
- The OspreyEnforcer-side policy derivation in the relay — that's
  covered by `internal/relay/ospreyenforce_test.go` with a pure
  `policyFromLabels` harness.