# Osprey Docs

![images/rules_architecture.png](images/rules_architecture.png)

## Rules

### Creating Rules

Rules in Osprey are written in `Some Madeup Language (SML)` and follow most syntax conventions present in the Osprey Query UI. SML is a subset of Python with additional restrictions that make rules simpler to craft.

Rules by themselves only create variables. Without a corresponding `WhenRules()` function call, a rule has no effects outside of evaluation and query functionality.

Rules are created with the `Rule()` function, which at present supports the following concepts.

- Name

    `Rule_Name = Rule()`

    The name of the rule also functions as a conventional “RuleId” and as the name of the bool that can be used to query individual rule hits in the Osprey Query UI. As a result, changing the name of a rule after activation may affect historical query results in the UI if not logged externally.

- Logic

    `when_all=[]`

    The logic used to evaluate an Osprey rule is expressed as a single comma-delimited list of signals within the `when_all` parameter of the `Rule()` function. It supports the use of Labels, Plugins, UDFs and other values to help enrich heuristics.

    At present, when evaluating UDFs or abstracted variables, any `NULL` evaluation in the series will cause the entire rule function to evaluate as `NULL`, which may be undesirable.

- Description

    `description=f''`

    An additional string description field can be emitted alongside the rule itself to external systems, such as logging and ticketing systems, to enrich work-streams that benefit from plain-language context on what the rule criteria are and what the rule intends to do.

    It may be helpful to include dynamic variables as well, so that operational workflows can identify the specific values related to the trigger criteria.

Below is an example of a simple rule using various signal evaluations and out-of-the-box UDFs.

```python
My_Rule_Name_v2 = Rule(
    when_all=[
        # Primary Signal
        MyFirstValue == True,
        HasLabel(entity=MyEntityName, label='MyLabel'),
        ListLength(list=UsersValues) == 5,
        # Secondary Signal
        RegexMatch(target=MyStringValue, pattern='(hello|world)'),
        MySecondValue >= 3,
        MyThirdValue != Null,
        # Guardrail Signal
        (_LocalValue in [1, 2, 3, 5]) or (GlobalValue in ['hello', 'howdy']),
        not HasLabel(entity=MySecondEntityName, label='MySecondLabel'),
    ],
    description=f"{UserA} performed {ActionB} in this way. Emit warning",
)
```

### Instrumenting Rules with WhenRules

The `WhenRules()` function connects rules with external services, declarations or internal label modifications. Rule objects are listed in sequence within the `rules_any=[]` parameter, and the effects to apply (outputs that extend `EffectBase`) are listed within the `then=[]` parameter. By default, operators and designers can utilize UDFs with predefined effects such as `DeclareVerdict()`, `LabelAdd()`, and `LabelRemove()` on positive rule evaluations.

Below is an example of a `WhenRules()` block that verifies an email and rejects a request.

```python
WhenRules(
    rules_any=[
        Enabled_Rule_1,
        Enabled_Rule_2,
        # Staged_Rule_1,
    ],
    then=[
        # Verdicts
        DeclareVerdict(verdict='reject'),
        # Labels
        LabelAdd(entity=UserId, label='recently_challenged', expires_after=TimeDelta(days=7)),
        LabelAdd(entity=UserId, label='verify', apply_if=NotVerified),
        LabelAdd(entity=Email, label='pending_verify'),
        LabelAdd(entity=Domain, label='recently_seen', expires_after=TimeDelta(days=7)),
    ],
)
```

`WhenRules()` must occur after the rules it references are created within the file. Because rule outcomes become difficult to interpret when effects are spread throughout a workflow, it can be beneficial to place effects toward the bottom of workflows.
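
The relationship between `rules_any` and `then` can be illustrated with a small plain-Python model. This is only a sketch of the semantics described above, not Osprey's implementation: the `when_rules` helper, the string stand-ins for effects, and the choice to represent rule results as `True`/`False`/`None` are all assumptions made for illustration.

```python
from typing import List, Optional


def when_rules(rules_any: List[Optional[bool]], then: List[str]) -> List[str]:
    """Return the effects to emit: all of `then` if any rule evaluated to True.

    Hypothetical model: rule results are True, False, or None (Null);
    only an explicit True counts as a positive evaluation.
    """
    if any(result is True for result in rules_any):
        return list(then)
    return []


# One of the listed rules hit, so every effect in `then` is emitted.
effects = when_rules(
    rules_any=[False, True],
    then=["DeclareVerdict(reject)", "LabelAdd(UserId, recently_challenged)"],
)

# No rule hit (False and Null only), so nothing is emitted.
no_effects = when_rules(rules_any=[False, None], then=["DeclareVerdict(reject)"])
```

In this model a commented-out rule, such as `Staged_Rule_1` in the example above, simply never contributes a result.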

## Output Sinks

After the rules are all run, a set of output sinks takes the resulting `ExecutionOutput` and performs additional work based on that data. Output sinks may be defined as part of a plugin as a means to perform domain-specific work.

Default sinks include `StdoutOutputSink`, which simply outputs the result to the log; `KafkaOutputSink`, which pipes data to Kafka (used for the Osprey UI); and `LabelsSink`, which can add stateful data to be used in future rule executions.

```python
class StdoutOutputSink(BaseOutputSink):
    """An output sink that prints to standard out!"""

    def __init__(self, log_sampler: Optional[DynamicLogSampler] = None):
        pass

    def will_do_work(self, result: ExecutionResult) -> bool:
        return True

    def push(self, result: ExecutionResult) -> None:
        print(f'result: {result.extracted_features_json} {result.verdicts}')

    def stop(self) -> None:
        pass
```

Passing data to these output sinks is standardized through the use of `Effects`, which are outputs of some functions, usually UDFs.

```python
def push(self, result: ExecutionResult) -> None:
    users_to_ban = result.effects[BanUserEffect]
    ban_users(users_to_ban)
```
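
To make the `result.effects[BanUserEffect]` lookup concrete, the sketch below models effects grouped by their type, so that each sink picks out only the effects it handles. The `entity` and `comment` fields follow the `BanUserEffect` usage elsewhere in this document; `LabelAddEffect`, `ExecutionResultSketch`, `group_effects` and `BanSink` are hypothetical stand-ins, and the real Osprey classes may differ.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class BanUserEffect:
    entity: str
    comment: str


@dataclass(frozen=True)
class LabelAddEffect:  # hypothetical second effect type
    entity: str
    label: str


@dataclass
class ExecutionResultSketch:
    # Effects keyed by their class, so sinks can look up only what they handle.
    effects: Dict[type, List[object]] = field(default_factory=dict)


def group_effects(raw_effects: List[object]) -> Dict[type, List[object]]:
    """Group a flat list of effects by their concrete type."""
    grouped: Dict[type, List[object]] = defaultdict(list)
    for effect in raw_effects:
        grouped[type(effect)].append(effect)
    return dict(grouped)


class BanSink:
    """Hypothetical sink that only reacts to BanUserEffect."""

    def __init__(self) -> None:
        self.banned: List[str] = []

    def will_do_work(self, result: ExecutionResultSketch) -> bool:
        return bool(result.effects.get(BanUserEffect))

    def push(self, result: ExecutionResultSketch) -> None:
        for effect in result.effects.get(BanUserEffect, []):
            self.banned.append(effect.entity)


result = ExecutionResultSketch(
    effects=group_effects(
        [BanUserEffect("user_1", "spam"), LabelAddEffect("user_2", "verify")]
    )
)
sink = BanSink()
if sink.will_do_work(result):
    sink.push(result)
```

Keying effects by type keeps sinks decoupled from the UDFs that produce them: a sink only needs to know the effect class it consumes.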

## User Defined Functions (UDFs)

User Defined Functions (UDFs) are plugins that enable users of Osprey to extend and customize their use of the Osprey SML. UDFs are implemented in Python and registered as a plugin. They extend the `UDFBase` abstract base class with a set of arguments and an output, and are executed whenever they are called in SML.

```python
# example_plugins/text_contains.py
import re

class TextContainsArguments(ArgumentsBase):
    text: str
    phrase: str
    case_sensitive = False

class TextContains(UDFBase[TextContainsArguments, bool]):
    def execute(self, execution_context: ExecutionContext, arguments: TextContainsArguments) -> bool:
        # Escape the phrase so it is matched literally, as a whole word
        escaped = re.escape(arguments.phrase)
        pattern = rf'\b{escaped}\b'
        flags = 0 if arguments.case_sensitive else re.IGNORECASE
        regex = re.compile(pattern, flags)
        return bool(regex.search(arguments.text))

# example_plugins/register_plugins.py
@hookimpl_osprey
def register_udfs():
    return [TextContains]
```

Usage in SML:

```python
# example_rules/post_contains_hello.sml
ContainsHello = Rule(
  when_all=[
    EventType == 'create_post',
    TextContains(text=PostText, phrase='hello'),
  ],
  description='Post contains the word "hello"',
)
```
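
The matching behavior of `TextContains` can be checked outside of Osprey. The sketch below reuses the same `re` calls as the UDF above in a standalone function; `text_contains` is a hypothetical helper name, not part of Osprey.

```python
import re


def text_contains(text: str, phrase: str, case_sensitive: bool = False) -> bool:
    # Escape the phrase so characters such as "." or "+" match literally.
    escaped = re.escape(phrase)
    # \b anchors the phrase to word boundaries, so substrings of longer words do not match.
    pattern = rf"\b{escaped}\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return bool(re.compile(pattern, flags).search(text))


print(text_contains("Hello there", "hello"))                       # True
print(text_contains("Othello is a play", "hello"))                 # False
print(text_contains("Hello there", "hello", case_sensitive=True))  # False
```

The second call shows why the word-boundary anchors matter: without them, a rule looking for "hello" would also fire on "Othello".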

### Effects

Plugins may also define external effects, which are useful for triggering functionality in your primary service. Effects are simply passed to output sinks at the end of a rule run. The UDFs that produce them have an output that extends `EffectBase`, and can be called as a result of a `WhenRules`.

```python
# example_plugins/src/ban_user.py
class BanUser(UDFBase[BanUserArguments, BanUserEffect]):
    category = UdfCategories.ENGINE

    def execute(self, execution_context: ExecutionContext, arguments: BanUserArguments) -> BanUserEffect:
        return BanUserEffect(
            entity=arguments.entity,
            comment=arguments.comment,
        )

# example_rules/post_contains_hello.sml
WhenRules(
  rules_any=[ContainsHello],
  then=[BanUser(entity=UserId, comment='User said "hello"')],
)
```

UDF outputs can also implement the `CustomExtractedFeature` interface, in which case they are persisted in the outputs for the UI. `EffectToCustomExtractedFeatureBase` can also be used when effects need additional processing for use in the UI.

## Labels
**NOTE: Labels are currently not in v0, so users will be unable to add or edit labels via the UI**

Labels are a standard plugin that enables stateful rules, and they touch many parts of Osprey. They are effectively tags on various entities, which can be arbitrarily defined.

### Creating Entities

Labels are applied to Entities, which are dynamically interpreted from outputs of the UDF `EntityJson`. Entities are usually created from pieces of data that are generally consistent across actions, such as a User ID or email.

```python
# user.sml
UserId: Entity[str] = EntityJson(
  type='User',
  path='$.user_id'
)
```

It is possible to create new UDFs that also create entities by setting the output of the UDF to `EntityT`.

### Adding Labels

Labels can be added in a `WhenRules` clause. This will cause the labels output sink to tag the given entity with the given label at the end of the rules run.

```python
WhenRules(
    rules_any=[
        Sent_Too_Many_DMs,
    ],
    then=[
        LabelAdd(entity=UserId, label='likely_spammer')
    ],
)
```

### Using Labels

Since Labels can be retrieved during a rule run, they can be effectively used as state for your rules.

```python
Should_Warn_User_Of_Spammer = Rule(
    when_all=[
        HasLabel(entity=UserId, label='likely_spammer'),
        This_Is_A_New_DM,
    ],
)
```
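
One way to picture labels as state is a small in-memory store keyed by entity, with an optional expiry like the `expires_after` argument used with `LabelAdd`. This is a hypothetical model for illustration only: `LabelStore` and its methods are not Osprey APIs, and Osprey's actual label storage is not described here.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple


class LabelStore:
    """Hypothetical in-memory model of entity labels with optional expiry."""

    def __init__(self) -> None:
        # (entity, label) -> expiry time, or None for labels that never expire
        self._labels: Dict[Tuple[str, str], Optional[datetime]] = {}

    def label_add(self, entity: str, label: str, now: datetime,
                  expires_after: Optional[timedelta] = None) -> None:
        expiry = now + expires_after if expires_after is not None else None
        self._labels[(entity, label)] = expiry

    def has_label(self, entity: str, label: str, now: datetime) -> bool:
        if (entity, label) not in self._labels:
            return False
        expiry = self._labels[(entity, label)]
        return expiry is None or now < expiry


store = LabelStore()
t0 = datetime(2024, 1, 1)

# A rule run at t0 tags the user; the label lapses after seven days.
store.label_add("user_1", "likely_spammer", now=t0, expires_after=timedelta(days=7))

# Later rule runs read the label back as state.
flagged_next_day = store.has_label("user_1", "likely_spammer", now=t0 + timedelta(days=1))
flagged_after_expiry = store.has_label("user_1", "likely_spammer", now=t0 + timedelta(days=8))
```

Here `flagged_next_day` is `True` and `flagged_after_expiry` is `False`, which is the behavior a rule would rely on when using a label as a temporary flag.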

Labels will also be shown in the UI for entities, and can also be set manually. Note that since the UI only searches across actions, `HasLabel` will not work in the Query UI. Instead, you may use `DidAddLabel`, which will be true when the given action added a label to a specific entity.

```python
# UI Query
DidAddLabel(entity_type="UserId", label_name="likely_spammer")
```

## Notable Gotchas

### Nulls

A Null is the case where a rule or variable in SML does not exist. This can occur for many reasons: either a piece of data is missing or a rule didn’t run. Unlike in many programming languages, a rule that references a null-valued variable is generally not evaluated (and thus downstream rules are not evaluated either). The exception is when nulls are explicitly checked in a rule. For example:

```python
Thing: int = JsonData(path='$.property_that_doesnt_exist')

# Evaluates to False
MyFirstRule = Rule(when_all=[
    Thing != Null,
])

# Skips evaluation and sets to Null
MySecondRule = Rule(when_all=[
    Thing > 1,
])

# Skips evaluation and sets to Null
MyThirdRule = Rule(when_all=[
    MySecondRule,
])
```
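
The three outcomes above can be reproduced with a small plain-Python model in which `None` stands in for Null. This is a sketch of the behavior described in this section, not Osprey's evaluator; `rule` and `gt` are hypothetical helpers.

```python
from typing import List, Optional


def rule(when_all: List[Optional[bool]]) -> Optional[bool]:
    """Model of Rule(when_all=[...]): any Null signal makes the rule Null."""
    if any(signal is None for signal in when_all):
        return None
    return all(when_all)


def gt(value: Optional[int], threshold: int) -> Optional[bool]:
    """A comparison on a Null value is itself Null rather than False."""
    return None if value is None else value > threshold


thing: Optional[int] = None  # the JSON path did not exist

my_first_rule = rule([thing is not None])   # explicit Null check -> False
my_second_rule = rule([gt(thing, 1)])       # Null comparison -> None
my_third_rule = rule([my_second_rule])      # depends on a Null rule -> None
```

The distinction between `False` and `None` is what makes the explicit `!= Null` check useful: it converts a missing value into an ordinary boolean that downstream rules can evaluate.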

## Workflow Structure and File Placement

SML files can be composed to make your rules easier to understand. The `Import` statement allows you to include rules and variables found in other files.

```python
# models/action_name.sml
ActionName = "foo"

# main.sml
Import(
    rules=[
        'models/action_name.sml',
        'models/http_request.sml',
    ]
)

MyRule = Rule(when_all=[ActionName == "foo"])
```

`Require` allows you to selectively run other SML scripts. `Require` supports templating and conditionals, allowing scripts to be filtered out if necessary. This is important in situations where some rules or UDFs are particularly expensive to run (such as making a call to an AI service, for example).

```python
# main.sml
Require(rule=f'actions/{ActionName}.sml')  # will execute 'actions/foo.sml'

Require(rule='ai_services/my_ai_service.sml', require_if=ActionName == "register")
```
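
The effect of templating and `require_if` can be sketched as a function that decides which scripts would run for a given action. `resolve_requires` and its list-of-tuples format are hypothetical and only illustrate the selection logic described above; they are not how Osprey loads files.

```python
from typing import List, Tuple


def resolve_requires(action_name: str) -> List[str]:
    """Return the SML paths that would be executed for `action_name`."""
    # Each entry pairs a path with the condition under which it is required.
    requires: List[Tuple[str, bool]] = [
        (f"actions/{action_name}.sml", True),  # templated, always required
        ("ai_services/my_ai_service.sml", action_name == "register"),
    ]
    return [path for path, require_if in requires if require_if]


print(resolve_requires("foo"))       # ['actions/foo.sml']
print(resolve_requires("register"))  # ['actions/register.sml', 'ai_services/my_ai_service.sml']
```

Only the `register` action pulls in the expensive AI service script, which is the filtering that `require_if` is meant to provide.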