Mirror of https://github.com/roostorg/osprey
# Osprey Docs

## Rules

### Creating Rules

Rules in Osprey are written in `Some Madeup Language (SML)` and follow most syntax conventions present in the Osprey Query UI. SML is a subset of Python with additional restrictions that make rules simpler to craft.

A rule by itself only creates a variable. Without a corresponding `WhenRules()` call, it has no effect beyond evaluation and query functionality.

At present, the `Rule()` function supports the following concepts.

- Name

    `Rule_Name = Rule()`

    The name of the rule also serves as a conventional “RuleId” and as the name of the bool used to query individual rule hits in the Osprey Query UI. As a result, changing the name of a rule after activation may affect historical query results in the UI if not logged externally.

- Logic

    `when_all=[]`

    The logic used to evaluate a rule is expressed as a single comma-delimited list of signals in the `when_all` parameter of the `Rule()` function. It supports Labels, Plugins, UDFs and other values to help enrich heuristics.

    At present, when evaluating UDFs or abstracted variables, any `NULL` evaluation in the series causes the entire rule to evaluate as `NULL`, which may be undesirable.

- Description

    `description=f''`

    An optional string description can be emitted alongside the rule to external systems, such as logging and ticketing systems. It gives work-streams plain-language context on what the rule criteria are and what the rule is intended to do.

    It may be helpful to include dynamic variables in the description so that operational workflows can identify the specific values related to the trigger criteria.


Below is an example of a simple rule that uses various signal evaluations and out-of-the-box UDFs.

```python
My_Rule_Name_v2 = Rule(
    when_all=[
        # Primary Signal
        MyFirstValue == True,
        HasLabel(entity=MyEntityName, label='MyLabel'),
        ListLength(list=UsersValues) == 5,
        # Secondary Signal
        RegexMatch(target=MyStringValue, pattern='(hello|world)'),
        MySecondValue >= 3,
        MyThirdValue != Null,
        # Guardrail Signal
        (_LocalValue in [1, 2, 3, 5]) or (GlobalValue in ['hello', 'howdy']),
        not HasLabel(entity=MySecondEntityName, label='MySecondLabel'),
    ],
    description=f"{UserA} performed {ActionB} in this way. Emit warning",
)
```

### Instrumenting Rules with WhenRules

The `WhenRules()` function connects rules with external services, declarations or internal label modifications. Rule objects are listed in sequence in the `rules_any=[]` parameter, and the effects to apply (UDF outputs that extend `EffectBase`) are listed in the `then=[]` parameter. By default, operators and designers can use UDFs with predefined effects such as `DeclareVerdict()`, `LabelAdd()`, and `LabelRemove()` on positive rule evaluations.

Below is an example of a `WhenRules()` block that verifies an email and rejects a request.

```python
WhenRules(
    rules_any=[
        Enabled_Rule_1,
        Enabled_Rule_2,
        # Staged_Rule_1,
    ],
    then=[
        # Verdicts
        DeclareVerdict(verdict='reject'),
        # Labels
        LabelAdd(entity=UserId, label='recently_challenged', expires_after=TimeDelta(days=7)),
        LabelAdd(entity=UserId, label='verify', apply_if=NotVerified),
        LabelAdd(entity=Email, label='pending_verify'),
        LabelAdd(entity=Domain, label='recently_seen', expires_after=TimeDelta(days=7)),
    ],
)
```

`WhenRules()` must appear after the rules it references have been created within the file. Because rule outcomes become hard to interpret when effects are scattered across a file, it can be beneficial to place effects toward the bottom of workflows.

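To make the relationship between `rules_any` and `then` concrete, here is a minimal plain-Python sketch. It is not Osprey's implementation. It assumes that every effect in `then` is emitted when at least one listed rule evaluated to true, and that a rule which evaluated to null counts as not triggered. The `when_rules` helper and the string effects are hypothetical stand-ins.

```python
from typing import Optional


def when_rules(rules_any: list[Optional[bool]], then: list[str]) -> list[str]:
    """Return the effects to emit: all of `then` if any rule is True, else none.

    Assumption: a rule that evaluated to None (null) is treated as not triggered.
    """
    triggered = any(result is True for result in rules_any)
    return list(then) if triggered else []


# Hypothetical rule results for one action: one hit, one miss, one null.
effects = when_rules(
    rules_any=[True, False, None],
    then=["DeclareVerdict(reject)", "LabelAdd(UserId, recently_challenged)"],
)
print(effects)

# No rule hit, so nothing is emitted.
print(when_rules(rules_any=[False, None], then=["DeclareVerdict(reject)"]))
```

Under these assumptions a single hit is enough to emit every effect in the block, which is one reason to keep staged rules commented out of `rules_any` until they are ready.
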
## Output Sinks

After all rules have run, a set of output sinks takes the resulting `ExecutionResult` and performs additional work based on that data. Sinks may be defined as part of a plugin as a means of performing domain-specific work.

Some default sinks include `StdoutOutputSink`, which simply outputs the result to the log, `KafkaOutputSink`, which pipes data to Kafka (used for the Osprey UI), and `LabelsSink`, which can add stateful data to be used in future rule executions.

```python
class StdoutOutputSink(BaseOutputSink):
    """An output sink that prints to standard out!"""

    def __init__(self, log_sampler: Optional[DynamicLogSampler] = None):
        pass

    def will_do_work(self, result: ExecutionResult) -> bool:
        return True

    def push(self, result: ExecutionResult) -> None:
        print(f'result: {result.extracted_features_json} {result.verdicts}')

    def stop(self) -> None:
        pass
```

Data is passed to these output sinks in a standardized way through `Effects`, which are the outputs of certain functions, usually UDFs.

```python
def push(self, result: ExecutionResult) -> None:
    users_to_ban = result.effects[BanUserEffect]
    ban_users(users_to_ban)
```

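The lookup `result.effects[BanUserEffect]` above suggests that effects are grouped by their class. The following self-contained sketch illustrates that idea with stand-in types. `FakeExecutionResult`, `BanUserSink`, and the fields on `BanUserEffect` are hypothetical and only mirror the shape of the snippets in this section; they are not Osprey's classes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BanUserEffect:
    """Hypothetical effect carrying the data a sink needs to act on."""
    entity: str
    comment: str


@dataclass
class FakeExecutionResult:
    """Stand-in for a rules-run result: effects grouped by their class."""
    effects: dict = field(default_factory=lambda: defaultdict(list))


class BanUserSink:
    """Hypothetical sink that only does work when ban effects are present."""

    def __init__(self) -> None:
        self.banned: list[str] = []

    def will_do_work(self, result: FakeExecutionResult) -> bool:
        # .get() avoids creating an empty entry for a missing effect type.
        return bool(result.effects.get(BanUserEffect))

    def push(self, result: FakeExecutionResult) -> None:
        for effect in result.effects[BanUserEffect]:
            self.banned.append(effect.entity)


result = FakeExecutionResult()
result.effects[BanUserEffect].append(BanUserEffect(entity="user-1", comment="spam"))

sink = BanUserSink()
if sink.will_do_work(result):
    sink.push(result)
print(sink.banned)
```

Keying effects by type lets each sink pick out only the effects it understands and skip runs that produced none of them.
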
## User Defined Functions (UDFs)

User Defined Functions (UDFs) are plugins that let users of Osprey extend and customize their use of SML. UDFs are implemented in Python and registered as a plugin. Each one extends the `UDFBase` abstract base class with a set of arguments and an output, and is executed whenever it is called in SML.

```python
# example_plugins/text_contains.py
import re

class TextContainsArguments(ArgumentsBase):
    text: str
    phrase: str
    case_sensitive = False

class TextContains(UDFBase[TextContainsArguments, bool]):
    def execute(self, execution_context: ExecutionContext, arguments: TextContainsArguments) -> bool:
        escaped = re.escape(arguments.phrase)
        pattern = rf'\b{escaped}\b'
        flags = 0 if arguments.case_sensitive else re.IGNORECASE
        regex = re.compile(pattern, flags)
        return bool(regex.search(arguments.text))

# example_plugins/register_plugins.py
@hookimpl_osprey
def register_udfs():
    return [TextContains]
```

Usage in SML:

```python
# example_rules/post_contains_hello.sml
ContainsHello = Rule(
    when_all=[
        EventType == 'create_post',
        TextContains(text=PostText, phrase='hello'),
    ],
    description='Post contains the word "hello"',
)
```

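The matching logic inside `TextContains.execute` can be tried outside Osprey with only the standard library. The sketch below copies the regular-expression steps into a free function, `text_contains`, which is a hypothetical helper for illustration and not part of the plugin.

```python
import re


def text_contains(text: str, phrase: str, case_sensitive: bool = False) -> bool:
    """Whole-word match of `phrase` in `text`, mirroring the UDF body above."""
    pattern = rf"\b{re.escape(phrase)}\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return bool(re.search(pattern, text, flags))


print(text_contains("Hello, world", "hello"))                       # True: case-insensitive by default
print(text_contains("Othello is a play", "hello"))                  # False: not a whole word
print(text_contains("Hello, world", "hello", case_sensitive=True))  # False: case differs
```

Because `\b` only matches between a word character and a non-word character, a phrase that begins or ends with punctuation (for example `c++`) will not match when it is followed by whitespace.
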
### Effects

Plugins may also define external effects, which are useful for triggering functionality in your primary service. Effects are simply passed to output sinks at the end of a rule run. The UDFs that produce them have an output that extends `EffectBase`, and they can be called from the `then` list of a `WhenRules`.

```python
# example_plugins/src/ban_user.py
class BanUser(UDFBase[BanUserArguments, BanUserEffect]):
    category = UdfCategories.ENGINE

    def execute(self, execution_context: ExecutionContext, arguments: BanUserArguments) -> BanUserEffect:
        return BanUserEffect(
            entity=arguments.entity,
            comment=arguments.comment,
        )

# example_rules/post_contains_hello.sml
WhenRules(
    rules_any=[ContainsHello],
    then=[BanUser(entity=UserId, comment='User said "hello"')],
)
```

UDF outputs can also implement the `CustomExtractedFeature` interface, in which case they are persisted in the outputs for the UI. `EffectToCustomExtractedFeatureBase` can be used when effects need additional processing for use in the UI.

## Labels

**NOTE: Labels are currently not in v0, so users will be unable to add or edit labels via the UI**

Labels are a standard plugin that enables stateful rules, and they touch many parts of Osprey. They are effectively tags on various entities, and can be arbitrarily defined.

### Creating Entities

Labels are applied to Entities, which are dynamically interpreted from the outputs of the `EntityJson` UDF. Entities are usually created from pieces of data that are generally consistent across actions, such as a user ID or email.

```python
# user.sml
UserId: Entity[str] = EntityJson(
    type='User',
    path='$.user_id'
)
```

It is also possible to write new UDFs that create entities by setting the UDF's output type to `EntityT`.

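As a rough illustration of how a `path` such as `'$.user_id'` could select a value from an action payload, the sketch below resolves simple dotted paths against a parsed JSON document. It assumes the path is read as a JSONPath-style expression and handles only `$` followed by dot-separated keys. The `extract` helper is hypothetical and is not how `EntityJson` is implemented.

```python
import json
from typing import Any, Optional


def extract(path: str, payload: dict) -> Optional[Any]:
    """Resolve a simple dotted path such as '$.post.text'; return None if missing.

    Only handles '$' followed by dot-separated keys, not full JSONPath.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current: Any = payload
    # Drop the leading '$' and skip the empty segment before the first dot.
    for key in filter(None, path[1:].split(".")):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


action = json.loads('{"user_id": "u-42", "post": {"text": "hello"}}')
print(extract("$.user_id", action))
print(extract("$.post.text", action))
print(extract("$.property_that_doesnt_exist", action))
```

Returning `None` for a missing key models the null case: the value simply does not exist for that action.
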
### Adding Labels

Labels can be added in a `WhenRules` clause. This causes the labels output sink to tag the given entity with the given label at the end of the rules run.

```python
WhenRules(
    rules_any=[
        Sent_Too_Many_DMs,
    ],
    then=[
        LabelAdd(entity=UserId, label='likely_spammer')
    ],
)
```

### Using Labels

Since labels can be retrieved during a rule run, they can be used effectively as state for your rules.

```python
Should_Warn_User_Of_Spammer = Rule(
    when_all=[
        HasLabel(entity=UserId, label='likely_spammer'),
        This_Is_A_New_DM,
    ],
)
```

Labels are also shown in the UI for entities, and can be set manually. Note that because the UI only searches across actions, `HasLabel` will not work in the Query UI. Instead, you may use `DidAddLabel`, which is true when the given action added a label to a specific entity.

```python
# UI Query
DidAddLabel(entity_type="UserId", label_name="likely_spammer")
```

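The way labels act as state between rule runs, including the `expires_after` argument seen in earlier `LabelAdd` examples, can be pictured with a small in-memory model. This is only a sketch under the assumption that a label stops matching once its expiry time has passed. `LabelStore` and its methods are hypothetical and do not reflect how Osprey stores labels.

```python
from datetime import datetime, timedelta
from typing import Optional


class LabelStore:
    """Hypothetical in-memory store mapping (entity, label) to an expiry time."""

    def __init__(self) -> None:
        self._labels: dict[tuple[str, str], Optional[datetime]] = {}

    def add(self, entity: str, label: str, now: datetime,
            expires_after: Optional[timedelta] = None) -> None:
        # A label without expires_after never expires in this model.
        expiry = now + expires_after if expires_after is not None else None
        self._labels[(entity, label)] = expiry

    def has(self, entity: str, label: str, now: datetime) -> bool:
        if (entity, label) not in self._labels:
            return False
        expiry = self._labels[(entity, label)]
        return expiry is None or now < expiry


store = LabelStore()
t0 = datetime(2024, 1, 1)
store.add("user-1", "recently_challenged", now=t0, expires_after=timedelta(days=7))
store.add("user-1", "likely_spammer", now=t0)

print(store.has("user-1", "recently_challenged", now=t0 + timedelta(days=3)))  # True
print(store.has("user-1", "recently_challenged", now=t0 + timedelta(days=8)))  # False
print(store.has("user-1", "likely_spammer", now=t0 + timedelta(days=365)))     # True
```
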
## Notable Gotchas

### Nulls

A null is the case where a rule or variable in SML does not exist. This can occur for many reasons: for example, a piece of data is missing or a rule didn’t run. Unlike in many programming languages, a rule that references a null-valued variable is generally not evaluated, and so downstream rules that depend on it are not evaluated either. The exception is when a rule checks for null explicitly. For example:

```python
Thing: int = JsonData(path='$.property_that_doesnt_exist')

# Evaluates to False
MyFirstRule = Rule(when_all=[
    Thing != Null,
])

# Skips evaluation and sets to Null
MySecondRule = Rule(when_all=[
    Thing > 1,
])

# Skips evaluation and sets to Null
MyThirdRule = Rule(when_all=[
    MySecondRule,
])
```

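The three outcomes above can be reproduced with a small plain-Python model of null propagation, using `None` for null. This is a sketch of the behavior as described in this section, not Osprey's evaluator. The helpers `rule`, `gt`, and `is_not_null` are hypothetical.

```python
from typing import Optional


def rule(when_all: list[Optional[bool]]) -> Optional[bool]:
    """Any null signal makes the whole rule null; otherwise all must be true."""
    if any(signal is None for signal in when_all):
        return None
    return all(when_all)


def gt(value: Optional[int], bound: int) -> Optional[bool]:
    """Comparison that propagates null instead of raising."""
    return None if value is None else value > bound


def is_not_null(value: object) -> bool:
    """Explicit null check: always yields a real boolean."""
    return value is not None


thing: Optional[int] = None  # models a lookup on a property that doesn't exist

my_first_rule = rule([is_not_null(thing)])   # explicit check, so a real False
my_second_rule = rule([gt(thing, 1)])        # null propagates from the comparison
my_third_rule = rule([my_second_rule])       # null propagates from the upstream rule

print(my_first_rule, my_second_rule, my_third_rule)
```

The practical consequence is that a guard such as `Thing != Null` yields a definite boolean, whereas any other use of a missing value silently nulls out every rule downstream of it.
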
## Workflow Structure and File Placement

SML files can be composed to make your rules easier to understand. The `Import` statement lets you include rules and variables defined in other files.

```python
# models/action_name.sml
ActionName = "foo"

# main.sml
Import(
    rules=[
        'models/action_name.sml',
        'models/http_request.sml',
    ]
)

MyRule = Rule(when_all=[ActionName == "foo"])
```

`Require` lets you selectively run other SML scripts. It supports templating and conditionals, so scripts can be filtered out when they are not needed. This is important when some rules or UDFs are particularly expensive to run (such as those that call an AI service).

```python
# main.sml
Require(rule=f'actions/{ActionName}.sml')  # will execute 'actions/foo.sml'

Require(rule='ai_services/my_ai_service.sml', require_if=ActionName == "register")
```
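
The combination of templating and `require_if` can be sketched in plain Python as a function that either returns the script path to run or signals that the script is skipped. This is an illustration only. `resolve_require` is a hypothetical helper, and treating a null condition as "skip" is an assumption.

```python
from typing import Optional


def resolve_require(rule: str, variables: dict[str, str],
                    require_if: Optional[bool] = True) -> Optional[str]:
    """Return the path of the script to run, or None if it should be skipped.

    Assumption: a false or null `require_if` means the script is not run.
    """
    if not require_if:
        return None
    # Fill placeholders such as {ActionName} from the known variables.
    return rule.format(**variables)


variables = {"ActionName": "foo"}

print(resolve_require("actions/{ActionName}.sml", variables))
print(resolve_require(
    "ai_services/my_ai_service.sml",
    variables,
    require_if=variables["ActionName"] == "register",
))
```

With `ActionName` set to `"foo"`, the first call resolves to `actions/foo.sml` and the second returns `None`, so the expensive script is never loaded for that action.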