# SML Rule Regression Tests

Regression harness for the SML rules under `../rules/`. Runs fixtures
through a live Osprey worker (via a slim Kafka + Postgres stack) and
asserts on the emitted labels and verdicts.
## Why this exists

The SML rules are small and readable, but their composition is not —
changing `bulk_extreme.sml`'s threshold can shadow `warming.sml`'s
label, and we'd only notice in prod. This harness freezes current
behavior so rule changes show up as explicit diffs in expected
verdicts, not as silent drift.
## Running locally

### Prerequisites

- Docker or Colima running
- Python 3.11+
- (Optional) a virtualenv — the harness's Python deps are `kafka-python` and `PyYAML`, both tiny
### One-time setup

```sh
cd osprey/tests
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
```
### A full test run

```sh
# 1. Bring the test stack up (takes ~30s for Kafka leader election)
docker compose -f docker-compose.test.yml -p osprey-test up -d

# 2. Wait for the worker to be ready — cold start takes another
#    ~15-30s while it compiles rules and attaches to Kafka.
docker compose -f docker-compose.test.yml -p osprey-test logs -f osprey-worker
# Watch for "Starting to consume" or similar, then Ctrl-C.

# 3. Run the harness
./run.py

# 4. Tear down when you're done
docker compose -f docker-compose.test.yml -p osprey-test down -v
```
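If you'd rather script the wait in step 2 than eyeball the logs, a
small poll does the job. This is a sketch, not part of the harness: it
assumes the worker emits the "Starting to consume" line quoted above
and gives up after 90 seconds.

```python
# wait_ready.py, a hypothetical helper (not shipped with the harness).
# Polls the worker's logs until the readiness marker shows up.
import subprocess
import sys
import time

LOGS_CMD = [
    "docker", "compose", "-f", "docker-compose.test.yml",
    "-p", "osprey-test", "logs", "osprey-worker",
]
READY_MARKER = "Starting to consume"  # adjust if the worker's wording differs
DEADLINE = time.time() + 90

while time.time() < DEADLINE:
    logs = subprocess.run(LOGS_CMD, capture_output=True, text=True).stdout
    if READY_MARKER in logs:
        print("worker ready")
        sys.exit(0)
    time.sleep(2)

sys.exit("worker not ready after 90s; check the logs manually")
```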
## Fixture layout

```
fixtures/<scenario>/
├── input.json    a single relay event envelope
└── expect.yaml   labels_applied / labels_forbidden / verdicts / verdicts_forbidden
```

`input.json` is the exact payload that gets published to
`osprey.actions_input`. The harness replaces `action_id` with a unique
value per run so parallel invocations don't cross-match.
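For reference, the publish step boils down to something like the
sketch below. This is illustrative, not the harness's actual code: it
assumes `kafka-python` (already in `requirements.txt`), the test
stack's offset broker port (19092), and a top-level `action_id` field
in the envelope.

```python
import json
import uuid

from kafka import KafkaProducer  # kafka-python

def publish_fixture(path: str) -> str:
    with open(path) as f:
        event = json.load(f)
    # Uniquify action_id so parallel runs can't match each other's results.
    event["action_id"] = action_id = f"test-{uuid.uuid4()}"
    producer = KafkaProducer(
        bootstrap_servers="localhost:19092",  # offset port from docker-compose.test.yml
        value_serializer=lambda v: json.dumps(v).encode(),
    )
    producer.send("osprey.actions_input", event)
    producer.flush()
    return action_id  # used to find the matching execution_result later
```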
`expect.yaml` shape:

```yaml
description: free-text explanation
labels_applied:
  - SenderDID/<label>/add    # must appear in __entity_label_mutations
labels_forbidden:
  - SenderDID/<label>/add    # must NOT appear
verdicts:
  - reject                   # must appear in __verdicts
verdicts_forbidden:
  - reject                   # must NOT appear
```

All four fields are optional. Empty fixtures test nothing — at minimum,
declare one `labels_applied` or `verdicts` entry.
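Concretely, the four fields are set-membership checks against the
execution result. A sketch of the semantics, assuming the result
carries `__entity_label_mutations` and `__verdicts` as lists of
strings (the fields the comments above refer to):

```python
def check_expectations(expect: dict, result: dict) -> list[str]:
    """Return a list of failure messages; empty means the scenario passes."""
    mutations = set(result.get("__entity_label_mutations", []))
    verdicts = set(result.get("__verdicts", []))
    failures = []
    for label in expect.get("labels_applied", []):
        if label not in mutations:
            failures.append(f"expected label missing: {label}")
    for label in expect.get("labels_forbidden", []):
        if label in mutations:
            failures.append(f"forbidden label present: {label}")
    for verdict in expect.get("verdicts", []):
        if verdict not in verdicts:
            failures.append(f"expected verdict missing: {verdict}")
    for verdict in expect.get("verdicts_forbidden", []):
        if verdict in verdicts:
            failures.append(f"forbidden verdict present: {verdict}")
    return failures
```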
## Adding a scenario

- Pick a representative event from `../seed-events.sh` or the real
  relay (query `relay_events.raw` in `relay.sqlite`).
- Drop it into a new `fixtures/<name>/input.json` (a scaffold sketch
  follows this list).
- Read the SML rule you expect to fire. Write `expect.yaml` asserting
  on the labels and verdicts.
- Run `./run.py` — iterate until the scenario passes.
- Commit. The CI workflow picks it up automatically.
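As a convenience, a hypothetical scaffold script (not part of the
repo) can lay out the directory shape described earlier. The
`input.json` it writes is a placeholder; replace it with a real
envelope before running:

```python
# scaffold.py, a hypothetical helper (not shipped with the harness).
import json
import pathlib

name = "my-scenario"  # pick something descriptive
root = pathlib.Path("fixtures") / name
root.mkdir(parents=True, exist_ok=True)

# Placeholder only: copy a real envelope from ../seed-events.sh or
# relay.sqlite here. The harness overwrites action_id at run time anyway.
(root / "input.json").write_text(json.dumps({"action_id": "placeholder"}, indent=2))

(root / "expect.yaml").write_text(
    "description: <why this scenario exists>\n"
    "labels_applied:\n"
    "  - SenderDID/<label>/add\n"
    "verdicts:\n"
    "  - reject\n"
)
```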
## Common pitfalls

- "no execution_result with action_id=... within Ns" — usually means
  the worker isn't consuming yet. Watch
  `docker compose logs osprey-worker` for "Starting to consume
  actions_input". Bump `--timeout` if the worker is slow on your machine.
- Rules that require history — velocity and bounce-rate rules read
  pre-computed fields (`sends_last_hour`, `bounce_rate`) that the relay
  stamps on outgoing events. The harness does not run the relay, so
  those counters are whatever you put in the fixture JSON. Set them
  explicitly (see the sketch after this list).
- Label shape mismatch — mutations are stringified as
  `<Entity>/<label>/<add|remove>`. Case matters, and the entity must
  match the `relay.sml` definitions (`SenderDID`, `SenderDomain`,
  `RecipientDomain`).
- Host-port conflicts — `docker-compose.test.yml` uses offset ports
  (Kafka 19092, Postgres 15433) so it can run alongside the dev stack
  in `../docker-compose.yml`. If you've also started the dev stack,
  double-check which broker you're pointing `run.py` at.
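For the history pitfall, stamping the counters onto a fixture is a
small edit. A sketch, assuming the fields sit at the top level of the
envelope (check a real relay event if yours nests them):

```python
import json
import pathlib

fixture = pathlib.Path("fixtures/my-scenario/input.json")
event = json.loads(fixture.read_text())
# Values are illustrative: pick whatever crosses (or stays under)
# the thresholds in the rule you're testing.
event["sends_last_hour"] = 500
event["bounce_rate"] = 0.4
fixture.write_text(json.dumps(event, indent=2))
```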
## What this doesn't cover

- Rule compilation errors — `validate-sml.yml` already does that.
- Velocity counters crossing event boundaries — the harness is
  one-event-per-scenario by design; add more scenarios if you need
  sequence coverage.
- The OspreyEnforcer-side policy derivation in the relay — that's
  covered by `internal/relay/ospreyenforce_test.go` with a pure
  `policyFromLabels` harness.