···

 ## Creating Rules

-Rules in Osprey are written in `Some Madeup Language (SML)` and follow most syntax conventions present in the Osprey Query UI. SML is a subset of Python with additional restrictions to make the rules simpler to craft.
+Osprey rules are written in SML (Some Madeup Language), which is a subset of Python with additional restrictions to simplify rule writing. You may write rules that are specific to
+single event types on a network, or ones that are applied to multiple event types.

-Rules by themselves only create variables, and without a corresponding `WhenRules()` function call, the rule will have no effects outside of evaluation and query functionality.
+By themselves, rules only create variables, and without a corresponding `WhenRules()` function call, a rule has no effect outside of evaluation and query functionality.

-Rules at present support the following concepts through the `Rule()` function of the same name.
+Rules currently support the following concepts through the `Rule(...)` function of the same name.

 - Name

-  `Rule_Name = Rule()`
+  `Rule_Name = Rule(...)`

-  The name of the rule also functions as a conventional “RuleId” and the name of the bool that can be used to query individual rule hits in the Osprey Query UI. As a result, changing the name of a rule after activation may affect historical query results in the UI if not logged externally.
+  The name of the rule also functions as a conventional "RuleId" and the name of the bool that can be used to query individual rule hits in the Osprey Query UI. As a result, changing the name of a rule after activation may affect historical query results in the UI if not logged externally.

 - Logic

   `when_all=[]`

-  The actual logic that will be used to evaluate Osprey rules is all encompassed as single comma-delimited list of signals within the `when_all` parameter of the Rule() function and supports the use of Labels, Plugins, UDFs and other values to help enrich heuristics.
+  The logic used to evaluate a rule is expressed as a single comma-delimited list of signals in the `when_all` parameter of the `Rule(...)` function, which supports the use of Labels, Plugins, UDFs and other values to help enrich heuristics.

   At present, when evaluating UDFs or abstracted variables, any `NULL` evaluations in the series will cause the entire rule function to evaluate as `NULL`, which may be undesirable.

···
 )
 ```

-## Instrumenting Rules with WhenRules
+## Rule Structuring
+
+You will likely find it useful to maintain two subdirectories inside of your main rules directory: a `rules` directory where the actual logic lives and a `models` directory that defines the features occurring in all or only some event types. For example, your structure may look something like this:

-The `WhenRules()` function allows for the connection of rules with external services, declarations or internal label modifications by listing Rule objects in sequence within the `rules_any=[]` parameter and `EffectBase`. By default, operators and designers can utilize UDFs with predefined effects such as `DeclareVerdict()`, `LabelAdd()`, and `LabelRemove()` on positive rule evaluations.
+```bash
+example-rules/
+| rules/
+| | record/
+| | | post/
+| | | | first_post_link.sml
+| | | | index.sml
+| | | like/
+| | | | like_own_post.sml
+| | | | index.sml
+| | account/
+| | | signup/
+| | | | high_risk_signup.sml
+| | | | index.sml
+| | index.sml
+| models/
+| | record/
+| | | post.sml
+| | | like.sml
+| | account/
+| | | signup.sml
+| main.sml
+```
+
+The `main.sml` file at the root of your rules directory serves as the entrypoint. It uses `Import` and `Require` statements to control which other files are loaded and when, allowing you to compose logic across the project. This sort of structure lets you define rules and models that are specific to certain event types so that only the necessary rules are run for each event type. For example, you likely have some rules that should only be run on a `post` event, since only a `post` will have features like `text` or `mention_count`.
+
+Inside of each directory, you may maintain an `index.sml` file that defines the conditions under which the rules inside that directory are actually included for execution. Although you could handle all of this conditional logic inside of a single file, maintaining a separate `index.sml` per directory keeps the project neatly organized. See [Workflow Structure and File Placement](#workflow-structure-and-file-placement) for more on `Import` and `Require`.
+
+## Models
+
+Before you actually write a rule, you'll need to define a "model" for an event type. For this example, we will assume that you run a social media website that lets users create posts, either at the "top level" or as a reply to another top level post. Each post may include text, mentions of other users on your network, and an optional link embed in the post. Let's say that the event's JSON structure looks like this:
+
+```json
+{
+  "eventType": "userPost",
+  "user": {
+    "userId": "user_id_789",
+    "handle": "carol",
+    "postCount": 3,
+    "accountAgeSeconds": 9002
+  },
+  "postId": "abc123xyz",
+  "replyId": null,
+  "text": "Is anyone online right now? @alice or @bob, you there? If so check this video out",
+  "mentionIds": ["user_id_123", "user_id_456"],
+  "embedLink": "https://youtube.com/watch?id=1"
+}
+```
+
+Inside of our `models/record` directory, we should now create a `post.sml` file where we will define the features for a post.
+
+```python
+PostId: Entity[str] = EntityJson(
+    type='PostId',
+    path='$.postId',
+)
+
+PostText: str = JsonData(
+    path='$.text',
+)
+
+MentionIds: List[str] = JsonData(
+    path='$.mentionIds',
+)
+
+EmbedLink: Optional[str] = JsonData(
+    path='$.embedLink',
+    required=False,
+)
+
+ReplyId: Optional[str] = JsonData(
+    path='$.replyId',
+    required=False,
+)
+```
+
+The [`JsonData` UDF](#user-defined-functions-udfs) lets us take the event's JSON and define features based on the contents of that JSON. These features can then be referenced in other rules that we import the `models/record/post.sml` model into. If a value inside your JSON object may not always be present, you can set `required` to `False`, and the feature will be `None` whenever the value is not present.
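To make the `required` behavior concrete, here is a minimal plain-Python sketch of a path lookup that returns `None` for optional features. It is not Osprey's implementation: the `json_data` helper, its handling of simple `$.a.b` paths, and the choice to raise `KeyError` when a required value is missing are all assumptions made for illustration.

```python
from typing import Any, Optional

_MISSING = object()


def json_data(event: dict, path: str, required: bool = True) -> Optional[Any]:
    """Resolve a simple dotted path such as '$.user.userId' (hypothetical helper)."""
    if not path.startswith('$.'):
        raise ValueError(f'unsupported path: {path}')
    current: Any = event
    for key in path[2:].split('.'):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            current = _MISSING
            break
    # Assumption: an explicit JSON null is treated the same as an absent key.
    if current is _MISSING or current is None:
        if required:
            raise KeyError(f'required feature missing at {path}')
        return None
    return current


event = {
    'eventType': 'userPost',
    'user': {'userId': 'user_id_789', 'postCount': 3},
    'postId': 'abc123xyz',
    'replyId': None,
    'mentionIds': ['user_id_123', 'user_id_456'],
}

post_id = json_data(event, '$.postId')
reply_id = json_data(event, '$.replyId', required=False)
embed_link = json_data(event, '$.embedLink', required=False)
```

Here `reply_id` and `embed_link` both come back as `None`, one because the value is null and the other because the key is absent, which mirrors how the optional features in `post.sml` are described.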
+
+Note that we did not actually create any features for things like `userId` or `handle`. That is because these values will be present in *any* event. It wouldn't be very nice to have to copy these features into each event type's model. Therefore, we will actually create a `base.sml` model that defines these features which are always present. Inside of `models/base.sml`, let's define these.
+
+```python
+EventType: str = JsonData(
+    path='$.eventType',
+)
+
+UserId: Entity[str] = EntityJson(
+    type='UserId',
+    path='$.user.userId',
+)
+
+Handle: Entity[str] = EntityJson(
+    type='Handle',
+    path='$.user.handle',
+)
+
+PostCount: int = JsonData(
+    path='$.user.postCount',
+)
+
+AccountAgeSeconds: int = JsonData(
+    path='$.user.accountAgeSeconds',
+)
+```
+
+Here, instead of simply using `JsonData`, we use the `EntityJson` UDF for the `UserId`. This is covered in the [UDFs section](#user-defined-functions-udfs), but as a rule of thumb, you will likely want values such as a user's ID to be entities. This pays off later, such as when doing data explorations within the Osprey UI.
+
+### Model Hierarchy
+
+In practice, you may find it useful to create a hierarchy of base models:
+
+- `base.sml` for features present in every event (user IDs, handles, account stats, etc.)
+- `account_base.sml` for features that appear only in account-related events, but always appear in each account-related event. Similarly, you may add one like `record_base.sml` for those features which appear in all record events.
+
+This type of hierarchy prevents duplication (which Osprey does not allow) and ensures features are defined at the appropriate level of abstraction.
+
+## Effects with WhenRules
+
+The `WhenRules()` function allows for creating effects that trigger external services, create declarations, or modify internal labels by listing `Rule` objects in sequence within the
+`rules_any` parameter of `WhenRules()`. By default, operators and designers may utilize UDFs with predefined effects such as `DeclareVerdict()`, `LabelAdd()`, or `LabelRemove()` upon
+positive rule evaluation.

 Below is an example of the use of a `WhenRules()` block to verify an email and reject a request.

···
     rules_any=[
         Enabled_Rule_1,
         Enabled_Rule_2,
-        # Staged_Rule_1,
+        # Disabled_Rule_1,
     ],
     then=[
         # Verdicts
···
 )
 ```

-WhenRules() must occur after rule creations within the file, and may become difficult to interpret outcomes of rules if too distributed so it can be beneficial to place any effects toward the bottom of workflows.
+`WhenRules()` must be placed after rule declaration within a file, and it may become difficult to interpret outcomes of rules that are too distributed. Therefore, it may be beneficial
+to place any effects toward the bottom of workflows.
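The `rules_any` / `then` relationship can be sketched in plain Python as follows. This is only an illustration of the "any listed rule is true, so emit every effect" behavior described above; the `when_rules` function and the string placeholders standing in for effects are hypothetical and are not part of Osprey.

```python
from typing import Callable, List, Optional


def when_rules(
    rules_any: List[Optional[bool]],
    then: List[Callable[[], str]],
) -> List[str]:
    """Return the effects to emit if at least one listed rule evaluated to True."""
    # Assumption: a rule that evaluated to None (null) does not count as a hit.
    if any(result is True for result in rules_any):
        return [effect() for effect in then]
    return []


enabled_rule_1 = False
enabled_rule_2 = True

effects = when_rules(
    rules_any=[enabled_rule_1, enabled_rule_2],
    then=[
        lambda: 'verdict:reject',     # placeholder for a verdict effect
        lambda: 'label_add:spammer',  # placeholder for a label effect
    ],
)

no_effects = when_rules(rules_any=[False, None], then=[lambda: 'verdict:reject'])
```

Because one of the two rules is true, `effects` contains both placeholders, while `no_effects` is empty.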

 ## Output Sinks

-After the rules are all run, a set of output sinks takes the resulting `ExecutionOutput` and performs additional work based on that data. These may be defined as part of a plugin as a means to perform domain specific work.
+After all rules are evaluated for an input event, a set of output sinks takes the resulting `ExecutionResult` and performs additional work based on that data. These may be defined
+as part of a plugin for performing domain-specific work.

-Some default use cases include a `StdoutOutputSink` which simply outputs the result to the log, a `KafkaOutputSink` which pipes data to Kafka (used for Osprey UI), or the `LabelsSink` which can add some stateful data to be used in future rules executions.
+Default sinks include a `StdoutOutputSink`, which simply outputs the result to the log; a `KafkaOutputSink`, which pipes data to Kafka (used for the Osprey UI); and the
+`LabelOutputSink`, which can add stateful data to be used in future rule executions.

 ```python
 class StdoutOutputSink(BaseOutputSink):
···

 ## User Defined Functions (UDFs)

-User Defined Functions (UDFs) are plugins that enable users of Osprey to extend and customize their use of the Osprey SML. UDFs are implemented python functions defined and registered as a plugin. They extend the `UDFBase` abstract base class with a set of arguments, and an output. These will be executed whenever called in the sml.
+User Defined Functions (UDFs) are plugins written in Python that enable users of Osprey to extend and customize their use of the Osprey SML. Each UDF is implemented as a Python class and registered
+as a plugin. It extends the `UDFBase` abstract base class with a set of arguments and an output, and is executed whenever it is called in SML.

 ```python
-# example_plugins/text_[contains.py](http://contains.py)
+# example_plugins/text_contains.py
 class TextContainsArguments(ArgumentsBase):
     text: str
     phrase: str
···
     def execute(self, execution_context: ExecutionContext, arguments: TextContainsArguments) -> bool:
         escaped = re.escape(arguments.phrase)
         pattern = rf'\b{escaped}\b'
-        flags = 0 if [arguments.case](http://arguments.case)_sensitive else re.IGNORECASE
+        flags = 0 if arguments.case_sensitive else re.IGNORECASE
         regex = re.compile(pattern, flags)
-        return bool([regex.search](http://regex.search)(arguments.text))
+        return bool(regex.search(arguments.text))

-# example_plugins/register_[plugins.py](http://plugins.py)
+# example_plugins/register_plugins.py
 @hookimpl_osprey
 def register_udfs():
     return [TextContains]
···
 )
 ```

-### Effects
+### Effect UDFs

-Plugins may also define external effects, which are useful for performing functionality in your primary service. Effects are simply passed to output sinks at the end of a rule run. These UDFs have an output that extends `EffectBase`, and can be called as a result of a `WhenRules`.
+Plugins may also define external effects, which are useful for triggering actions in your primary service. Effects are simply passed to output sinks at the end of a rule run.
+These UDFs have an output that extends `EffectBase`, and can be triggered from a `WhenRules()` call.

 ```python
-# example_plugins/src/ban_[user.py](http://user.py)
+# example_plugins/src/ban_user.py
 class BanUser(UDFBase[BanUserArguments, BanUserEffect]):
     category = UdfCategories.ENGINE

···

 ## Labels

-Labels are a standard plugin that enable stateful rules, and touch many parts of Osprey. They are effectively tags on various entities, which can be arbitrarily defined.
+Labels are a standard plugin that enables stateful rules and touches many parts of Osprey. They are effectively tags on various entities, which may be arbitrarily defined.

 ### Creating Entities

···

 ### Adding Labels

-Labels can be added in `WhenRules` clause. This will cause the labels output sink to tag the given entity with the given label at the end of the rules run.
+Labels may be added in a `WhenRules()` clause. This will cause the labels output sink to tag the given entity with the given label at the end of the rules run.

 ```python
 WhenRules(
···

 ### Using Labels

-Since Labels can be retrieved during a rule run, they can be effectively used as state for your rules.
+Since labels may be retrieved during a rule run, they can be effectively used as state for your rules.

 ```python
 Should_Warn_User_Of_Spammer = Rule(
···
 )
 ```
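As a rough mental model of "labels as state", the plain-Python sketch below tags an entity while handling one event and reads the tag back while handling a later one. The `LabelStore` class, its method names, and the `likely_spammer` label are hypothetical stand-ins chosen for illustration; they are not Osprey's label service or its SML functions.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class LabelStore:
    """In-memory stand-in for label state (hypothetical, not an Osprey class)."""

    def __init__(self) -> None:
        self._labels: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, entity: str, label: str) -> None:
        self._labels[entity].add(label)

    def remove(self, entity: str, label: str) -> None:
        self._labels[entity].discard(label)

    def has(self, entity: str, label: str) -> bool:
        # .get() avoids creating an empty entry for unknown entities.
        return label in self._labels.get(entity, set())


store = LabelStore()

# Event 1: a rule fires and its effect tags the user.
store.add('user_id_789', 'likely_spammer')

# Event 2: a later rule reads the label back as state.
should_warn = store.has('user_id_789', 'likely_spammer')
other_user_flagged = store.has('user_id_123', 'likely_spammer')
```

The second lookup is false because that entity was never tagged, which is the property that lets a label act as per-entity state across rule runs.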

-Labels will also be shown in the UI for entities, and can also be set manually. Note that since the UI only searches across actions, `HasLabel` will not work in the Query UI. Instead, you may use `DidAddLabel`, which will be true when the given action added a label to a specific entity.
+Labels will also be shown in the UI for entities, and can also be set manually. Note that since the UI only searches across actions, `HasLabel()` will not work in the Query UI.
+Instead, you may use `DidAddLabel`, which will be true when the given action added a label to a specific entity.

 ```python
 # UI Query
···

 ### Nulls

-Nulls are the case where a rule or variable in SML does not exist. This can occur for many reasons - either a piece of data is missing or a rule didn’t run. Unlike many programming languages, generally rules with null valued variables will not evaluate that rule (and thus, downstream rules will not evaluate either). The exception cases are when nulls are explicitly checked in a rule. For example:
+Nulls represent a rule or variable in SML that does not exist. This can occur for many reasons: either a piece of data is missing or a rule didn't run. Unlike in many programming languages, a rule that references a null-valued variable is generally not evaluated (and thus, downstream rules will not be evaluated either). The exception is when nulls are explicitly checked in a rule. For example:

 ```python
 Thing: int = JsonData(path='$.property_that_doesnt_exist')
···
 ])
 ```
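The null behavior described above resembles three-valued logic, which the following plain-Python sketch models with `None`. The helper names (`evaluate_when_all`, `equals`, `is_null`) are hypothetical, and the exact propagation rule (any null condition makes the whole rule null) is an assumption based on the description in this document rather than on Osprey's source.

```python
from typing import Any, Optional, Sequence


def equals(value: Any, expected: Any) -> Optional[bool]:
    """Comparison that propagates null instead of returning False."""
    if value is None:
        return None
    return value == expected


def is_null(value: Any) -> bool:
    """Explicit null check: always yields a real boolean."""
    return value is None


def evaluate_when_all(conditions: Sequence[Optional[bool]]) -> Optional[bool]:
    """Null if any condition is null, otherwise the AND of all conditions."""
    if any(condition is None for condition in conditions):
        return None
    return all(conditions)


thing = None  # the underlying data was missing

rule_using_thing = evaluate_when_all([equals(thing, 1)])
rule_checking_null = evaluate_when_all([is_null(thing)])
downstream_rule = evaluate_when_all([rule_using_thing, True])
```

In this sketch `rule_using_thing` and `downstream_rule` are both `None`, while `rule_checking_null` is `True` because it checks for the null explicitly.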

-## Workflow Structure and File Placement
+### Workflow Structure and File Placement

 SML files can be composed to make your rules easier to understand. The `Import` statement allows you to include rules and variables found in other files.

···

 Require(rule='ai_services/my_ai_service.sml', require_if=ActionName == "register")
 ```
+
+## Full Example
+
+The following is a complete walkthrough of writing a rule using the project structure described above. The goal is to flag accounts whose first post mentions at least one user and includes a link.
+
+### Writing the Rule
+
+We'll create `rules/record/post/first_post_link.sml` for the rule logic. This file defines both the conditions that cause the rule to evaluate to `True` and the actions to take when it does.
+
+```python
+# First, import the models that you will need inside of this rule
+Import(
+    rules=[
+        'models/base.sml',
+        'models/record/post.sml',
+    ],
+)
+
+# Next, define a variable that uses the `Rule` UDF
+FirstPostLinkRule = Rule(
+    # Set the conditions in which this rule will be `True`
+    when_all=[
+        PostCount == 1,  # if this is the user's first post
+        EmbedLink != None,  # if there is a link inside of the post
+        ListLength(list=MentionIds) >= 1,  # if there is at least one mention in the post
+    ],
+    description='First post for user includes a link embed',
+)
+
+# Finally, set which effect UDFs will be triggered
+WhenRules(
+    rules_any=[FirstPostLinkRule],
+    then=[
+        # This is a custom effect UDF that we have implemented
+        ReportRecord(
+            entity=PostId,
+            comment='This was the first post by a user and included a link',
+            severity=3,
+        ),
+    ],
+)
+```
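To sanity-check the three conditions outside of Osprey, the same logic can be expressed as a plain-Python predicate over the example event from the Models section. The `first_post_link_rule` function is a hypothetical stand-in and reads the raw JSON directly instead of going through SML features.

```python
def first_post_link_rule(event: dict) -> bool:
    """Plain-Python stand-in for the three `when_all` conditions (hypothetical)."""
    user = event.get('user', {})
    return (
        user.get('postCount') == 1                  # user's first post
        and event.get('embedLink') is not None      # post includes a link
        and len(event.get('mentionIds', [])) >= 1   # at least one mention
    )


sample_event = {
    'eventType': 'userPost',
    'user': {'userId': 'user_id_789', 'handle': 'carol', 'postCount': 3},
    'postId': 'abc123xyz',
    'mentionIds': ['user_id_123', 'user_id_456'],
    'embedLink': 'https://youtube.com/watch?id=1',
}

# Same event, but pretending it is the user's first post.
first_post_event = {**sample_event, 'user': {**sample_event['user'], 'postCount': 1}}

sample_matches = first_post_link_rule(sample_event)
first_post_matches = first_post_link_rule(first_post_event)
```

The example event from the Models section does not match because its `postCount` is 3; only the variant with `postCount` set to 1 satisfies all three conditions.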
+
+### Wiring Up the Rule
+
+We want this rule to run *only* when the event is a post event. Using the project structure described above, this involves three files.
+
+First, `main.sml` at the project root includes a single `Require` statement pointing to the top-level rules index:
+
+```python
+Require(
+    rule='rules/index.sml',
+)
+```
+
+Next, `rules/index.sml` conditionally requires the post rules when the event type matches:
+
+```python
+Import(
+    rules=[
+        'models/base.sml',
+    ],
+)
+
+Require(
+    rule='rules/record/post/index.sml',
+    require_if=EventType == 'userPost',
+)
+```
+
+Finally, `rules/record/post/index.sml` requires the new rule:
+
+```python
+Import(
+    rules=[
+        'models/base.sml',
+        'models/record/post.sml',
+    ],
+)
+
+Require(
+    rule='rules/record/post/first_post_link.sml',
+)
+```
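The effect of this chain of `Require` statements can be sketched in plain Python as a small graph walk. The `REQUIRES` table below mirrors the three files above, and the `files_to_run` function is a hypothetical model of conditional loading, not a description of how Osprey resolves files internally.

```python
from typing import Callable, Dict, List, Optional, Tuple

Event = Dict[str, object]
Requirement = Tuple[str, Optional[Callable[[Event], bool]]]

# Each file maps to the files it requires, with an optional condition
# standing in for `require_if`.
REQUIRES: Dict[str, List[Requirement]] = {
    'main.sml': [('rules/index.sml', None)],
    'rules/index.sml': [
        ('rules/record/post/index.sml', lambda e: e.get('eventType') == 'userPost'),
    ],
    'rules/record/post/index.sml': [
        ('rules/record/post/first_post_link.sml', None),
    ],
}


def files_to_run(entrypoint: str, event: Event) -> List[str]:
    """Walk the require graph from the entrypoint, honoring conditions."""
    ordered: List[str] = []

    def visit(path: str) -> None:
        if path in ordered:
            return
        ordered.append(path)
        for target, condition in REQUIRES.get(path, []):
            if condition is None or condition(event):
                visit(target)

    visit(entrypoint)
    return ordered


post_files = files_to_run('main.sml', {'eventType': 'userPost'})
like_files = files_to_run('main.sml', {'eventType': 'userLike'})
```

For a `userPost` event the walk reaches `first_post_link.sml`, while for any other event type it stops at `rules/index.sml`, so the post rule is never evaluated.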