Mirror of https://github.com/roostorg/osprey

update/add to rules documentation (#145)

Co-authored-by: Emelia Smith <ThisIsMissEm@users.noreply.github.com>

authored by hailey and Emelia Smith, and committed by GitHub
941f169e 63abe402

+238 -26
docs/rules.md
···

  ## Creating Rules

- Rules in Osprey are written in `Some Madeup Language (SML)` and follow most syntax conventions present in the Osprey Query UI. SML is a subset of Python with additional restrictions to make the rules simpler to craft.
+ Osprey rules are written in SML (Some Madeup Language) which is a subset of Python with additional restrictions to simplify rule writing. You may write rules that are specific to
+ single event types on a network, or ones that are applied to multiple event types.

- Rules by themselves only create variables, and without a corresponding `WhenRules()` function call, the rule will have no effects outside of evaluation and query functionality.
+ By themselves, rules only create variables, and without a corresponding `WhenRules()` function call, the rule will have no effects outside of evaluation and query functionality.

- Rules at present support the following concepts through the `Rule()` function of the same name.
+ Rules currently support the following concepts through the `Rule(...)` function of the same name.

  - Name

- `Rule_Name = Rule()`
+ `Rule_Name = Rule(...)`

- The name of the rule also functions as a conventional “RuleId” and the name of the bool that can be used to query individual rule hits in the Osprey Query UI. As a result, changing the name of a rule after activation may affect historical query results in the UI if not logged externally.
+ The name of the rule also functions as a conventional "RuleId" and the name of the bool that can be used to query individual rule hits in the Osprey Query UI. As a result, changing the name of a rule after activation may affect historical query results in the UI if not logged externally.

  - Logic

  `when_all=[]`

- The actual logic that will be used to evaluate Osprey rules is all encompassed as single comma-delimited list of signals within the `when_all` parameter of the Rule() function and supports the use of Labels, Plugins, UDFs and other values to help enrich heuristics.
+ The actual logic that will be used to evaluate Osprey rules is all encompassed as a single comma-delimited list of signals within the `when_all` parameter of the `Rule(...)` function and supports the use of Labels, Plugins, UDFs and other values to help enrich heuristics.

  At present, when evaluating UDFs or abstracted variables, any `NULL` evaluations in the series will cause the entire rule function to evaluate as `NULL`, which may be undesirable.

···
  )
  ```

- ## Instrumenting Rules with WhenRules
+ ## Rule Structuring
+
+ You will likely find it useful to maintain two subdirectories inside of your main rules directory - a `rules` directory where actual logic will be added and a `models` directory for defining the various features that occur in any or specific event types. For example, your structure may look something like this:

- The `WhenRules()` function allows for the connection of rules with external services, declarations or internal label modifications by listing Rule objects in sequence within the `rules_any=[]` parameter and `EffectBase`. By default, operators and designers can utilize UDFs with predefined effects such as `DeclareVerdict()`, `LabelAdd()`, and `LabelRemove()` on positive rule evaluations.
+ ```bash
+ example-rules/
+ | rules/
+ | | record/
+ | | | post/
+ | | | | first_post_link.sml
+ | | | | index.sml
+ | | | like/
+ | | | | like_own_post.sml
+ | | | | index.sml
+ | | account/
+ | | | signup/
+ | | | | high_risk_signup.sml
+ | | | | index.sml
+ | | index.sml
+ | models/
+ | | record/
+ | | | post.sml
+ | | | like.sml
+ | | account/
+ | | | signup.sml
+ | main.sml
+ ```
+
+ The `main.sml` file at the root of your rules directory serves as the entrypoint. It uses `Import` and `Require` statements to control which other files are loaded and when, allowing you to compose together logic across the project. This sort of structure lets you define rules and models that are specific to certain event types so that only the necessary rules are run for various event types. For example, you likely have some rules that should only be run on a `post` event, since only a `post` will have features like `text` or `mention_count`.
+
+ Inside of each directory, you may maintain an `index.sml` file that will define the conditional logic in which the rules inside that directory are actually included for execution. Although you could handle all of this conditional logic inside of a single file, maintaining separate `index.sml`s per directory greatly helps with neat organization. See [Workflow Structure and File Placement](#workflow-structure-and-file-placement) for more on `Import` and `Require`.
+
+ ## Models
+
+ Before you actually write a rule, you'll need to define a "model" for an event type. For this example, we will assume that you run a social media website that lets users create posts, either at the "top level" or as a reply to another top level post. Each post may include text, mentions of other users on your network, and an optional link embed in the post. Let's say that the event's JSON structure looks like this:
+
+ ```json
+ {
+   "eventType": "userPost",
+   "user": {
+     "userId": "user_id_789",
+     "handle": "carol",
+     "postCount": 3,
+     "accountAgeSeconds": 9002
+   },
+   "postId": "abc123xyz",
+   "replyId": null,
+   "text": "Is anyone online right now? @alice or @bob, you there? If so check this video out",
+   "mentionIds": ["user_id_123", "user_id_456"],
+   "embedLink": "https://youtube.com/watch?id=1"
+ }
+ ```
+
+ Inside of our `models/record` directory, we should now create a `post.sml` file where we will define the features for a post.
+
+ ```python
+ PostId: Entity[str] = EntityJson(
+     type='PostId',
+     path='$.postId',
+ )
+
+ PostText: str = JsonData(
+     path='$.text',
+ )
+
+ MentionIds: List[str] = JsonData(
+     path='$.mentionIds',
+ )
+
+ EmbedLink: Optional[str] = JsonData(
+     path='$.embedLink',
+     required=False,
+ )
+
+ ReplyId: Entity[str] = JsonData(
+     path='$.replyId',
+     required=False,
+ )
+ ```
+
+ The [`JsonData` UDF](#user-defined-functions-udfs) lets us take the event's JSON and define features based on the contents of that JSON. These features can then be referenced in other rules that we import the `models/record/post.sml` model into. If you have any values inside your JSON object that may not always be present, you can set `required` to `False`, and these features will be `None` whenever the feature is not present.
+
+ Note that we did not actually create any features for things like `userId` or `handle`. That is because these values will be present in *any* event. It wouldn't be very nice to have to copy these features into each event type's model. Therefore, we will actually create a `base.sml` model that defines these features which are always present. Inside of `models/base.sml`, let's define these.
+
+ ```python
+ EventType = JsonData(
+     path='$.eventType',
+ )
+
+ UserId: Entity[str] = EntityJson(
+     type='UserId',
+     path='$.user.userId',
+ )
+
+ Handle: Entity[str] = EntityJson(
+     type='Handle',
+     path='$.user.handle',
+ )
+
+ PostCount: int = JsonData(
+     path='$.user.postCount',
+ )
+
+ AccountAgeSeconds: int = JsonData(
+     path='$.user.accountAgeSeconds',
+ )
+ ```
+
+ Here, instead of simply using `JsonData`, we use the `EntityJson` UDF for the `UserId`. This is covered in the [UDFs section](#user-defined-functions-udfs), but as a rule of thumb, you likely will want to have values for things like a user's ID set to be entities. This will help more later, such as when doing data explorations within the Osprey UI.
+
+ ### Model Hierarchy
+
+ In practice, you may find it useful to create a hierarchy of base models:
+
+ - `base.sml` for features present in every event (user IDs, handles, account stats, etc.)
+ - `account_base.sml` for features that appear only in account related events, but always appear in each account related event. Similarly, you may add one like `record_base.sml` for those features which appear in all record events.
+
+ This type of hierarchy prevents duplication (which Osprey does not allow) and ensures features are defined at the appropriate level of abstraction.
+
+ ## Effects with WhenRules
+
+ The `WhenRules()` function allows for creating effects that trigger external services, create declarations, or modify internal labels by listing `Rule` objects in sequence within the
+ `rules_any` parameter of `WhenRules()`. By default, operators and designers may utilize UDFs with predefined effects such as `DeclareVerdict()`, `LabelAdd()`, or `LabelRemove()` upon
+ positive rule evaluation.

  Below is an example of the use of a WhenRules() block to verify an email and reject a request.

···
      rules_any=[
          Enabled_Rule_1,
          Enabled_Rule_2,
-         # Staged_Rule_1,
+         # Disabled_Rule_1,
      ],
      then=[
          # Verdicts
···
  )
  ```

- WhenRules() must occur after rule creations within the file, and may become difficult to interpret outcomes of rules if too distributed so it can be beneficial to place any effects toward the bottom of workflows.
+ `WhenRules()` must be placed after rule declaration within a file, and it may become difficult to interpret outcomes of rules that are too distributed. Therefore, it may be beneficial
+ to place any effects toward the bottom of workflows.

  ## Output Sinks

- After the rules are all run, a set of output sinks takes the resulting `ExecutionOutput` and performs additional work based on that data. These may be defined as part of a plugin as a means to perform domain specific work.
+ After all rules are evaluated for an input event, a set of output sinks takes the resulting `ExecutionResult` and performs additional work based on that data. These may be defined
+ as part of a plugin for performing domain specific work.

- Some default use cases include a `StdoutOutputSink` which simply outputs the result to the log, a `KafkaOutputSink` which pipes data to Kafka (used for Osprey UI), or the `LabelsSink` which can add some stateful data to be used in future rules executions.
+ Some default use cases include a `StdoutOutputSink` which simply outputs the result to the log, a `KafkaOutputSink` which pipes data to Kafka (used for Osprey UI), or the
+ `LabelOutputSink` which can add some stateful data to be used in future rules executions.

  ```python
  class StdoutOutputSink(BaseOutputSink):
···

  ## User Defined Functions (UDFs)

- User Defined Functions (UDFs) are plugins that enable users of Osprey to extend and customize their use of the Osprey SML. UDFs are implemented python functions defined and registered as a plugin. They extend the `UDFBase` abstract base class with a set of arguments, and an output. These will be executed whenever called in the sml.
+ User Defined Functions (UDFs) are plugins written in Python that enable users of Osprey to extend and customize their use of the Osprey SML. UDFs are implemented as Python functions and are registered
+ as a plugin. They extend the `UDFBase` abstract base class with a set of arguments and an output. These will be executed whenever called in SML.

  ```python
- # example_plugins/text_[contains.py](http://contains.py)
+ # example_plugins/text_contains.py
  class TextContainsArguments(ArgumentsBase):
      text: str
      phrase: str
···
      def execute(self, execution_context: ExecutionContext, arguments: TextContainsArguments) -> bool:
          escaped = re.escape(arguments.phrase)
          pattern = rf'\b{escaped}\b'
-         flags = 0 if [arguments.case](http://arguments.case)_sensitive else re.IGNORECASE
+         flags = 0 if arguments.case_sensitive else re.IGNORECASE
          regex = re.compile(pattern, flags)
-         return bool([regex.search](http://regex.search)(arguments.text))
+         return bool(regex.search(arguments.text))

- # example_plugins/register_[plugins.py](http://plugins.py)
+ # example_plugins/register_plugins.py
  @hookimpl_osprey
  def register_udfs():
      return [TextContains]
···
  )
  ```

- ### Effects
+ ### Effect UDFs

- Plugins may also define external effects, which are useful for performing functionality in your primary service. Effects are simply passed to output sinks at the end of a rule run. These UDFs have an output that extends `EffectBase`, and can be called as a result of a `WhenRules`.
+ Plugins may also define external effects, which are useful for performing functionality in your primary service. Effects are simply passed to output sinks at the end of a rule run.
+ These UDFs have an output that extends `EffectBase`, and can be called as a result of a `WhenRules`.

  ```python
- # example_plugins/src/ban_[user.py](http://user.py)
+ # example_plugins/src/ban_user.py
  class BanUser(UDFBase[BanUserArguments, BanUserEffect]):
      category = UdfCategories.ENGINE

···

  ## Labels

- Labels are a standard plugin that enable stateful rules, and touch many parts of Osprey. They are effectively tags on various entities, which can be arbitrarily defined.
+ Labels are a standard plugin that enable stateful rules, and touch many parts of Osprey. They are effectively tags on various entities, which may be arbitrarily defined.

  ### Creating Entities

···

  ### Adding Labels

- Labels can be added in `WhenRules` clause. This will cause the labels output sink to tag the given entity with the given label at the end of the rules run.
+ Labels may be added in a `WhenRules()` clause. This will cause the labels output sink to tag the given entity with the given label at the end of the rules run.

  ```python
  WhenRules(
···

  ### Using Labels

- Since Labels can be retrieved during a rule run, they can be effectively used as state for your rules.
+ Since labels may be retrieved during a rule run, they can be effectively used as state for your rules.

  ```python
  Should_Warn_User_Of_Spammer = Rule(
···
  )
  ```

- Labels will also be shown in the UI for entities, and can also be set manually. Note that since the UI only searches across actions, `HasLabel` will not work in the Query UI. Instead, you may use `DidAddLabel`, which will be true when the given action added a label to a specific entity.
+ Labels will also be shown in the UI for entities, and can also be set manually. Note that since the UI only searches across actions, `HasLabel()` will not work in the Query UI.
+ Instead, you may use `DidAddLabel`, which will be true when the given action added a label to a specific entity.

  ```python
  # UI Query
···

  ### Nulls

- Nulls are the case where a rule or variable in SML does not exist. This can occur for many reasons - either a piece of data is missing or a rule didn’t run. Unlike many programming languages, generally rules with null valued variables will not evaluate that rule (and thus, downstream rules will not evaluate either). The exception cases are when nulls are explicitly checked in a rule. For example:
+ Nulls are the case where a rule or variable in SML does not exist. This can occur for many reasons - either a piece of data is missing or a rule didn't run. Unlike many programming languages, generally rules with null valued variables will not evaluate that rule (and thus, downstream rules will not evaluate either). The exception cases are when nulls are explicitly checked in a rule. For example:

  ```python
  Thing: int = JsonData(path='$.property_that_doesnt_exist')
···
  ])
  ```

- ## Workflow Structure and File Placement
+ ### Workflow Structure and File Placement

  SML files can be composed to make your rules easier to understand. The `Import` statement allows you to include rules and variables found in other files.

···

  Require(rule='ai_services/my_ai_service.sml', require_if=ActionName == "register")
  ```
+
+ ## Full Example
+
+ The following is a complete walkthrough of writing a rule using the project structure described above. The goal is to flag accounts whose first post mentions at least one user and includes a link.
+
+ ### Writing the Rule
+
+ We'll create `rules/record/post/first_post_link.sml` for the rule logic. This file defines both the conditions that cause the rule to evaluate to `True` and the actions to take when it does.
+
+ ```python
+ # First, import the models that you will need inside of this rule
+ Import(
+     rules=[
+         'models/base.sml',
+         'models/record/post.sml',
+     ],
+ )
+
+ # Next, define a variable that uses the `Rule` UDF
+ FirstPostLinkRule = Rule(
+     # Set the conditions in which this rule will be `True`
+     when_all=[
+         PostCount == 1, # if this is the user's first post
+         EmbedLink != None, # if there is a link inside of the post
+         ListLength(list=MentionIds) >= 1, # if there is at least one mention in the post
+     ],
+     description='First post for user includes a link embed',
+ )
+
+ # Finally, set which effect UDFs will be triggered
+ WhenRules(
+     rules_any=[FirstPostLinkRule],
+     then=[
+         # This is a custom effect UDF that we have implemented
+         ReportRecord(
+             entity=PostId,
+             comment='This was the first post by a user and included a link',
+             severity=3,
+         ),
+     ],
+ )
+ ```
+
+ ### Wiring Up the Rule
+
+ We want this rule to run *only* when the event is a post event. Using the project structure described above, this involves three files.
+
+ First, `main.sml` at the project root includes a single `Require` statement pointing to the top-level rules index:
+
+ ```python
+ Require(
+     rule='rules/index.sml',
+ )
+ ```
+
+ Next, `rules/index.sml` conditionally requires the post rules when the event type matches:
+
+ ```python
+ Import(
+     rules=[
+         'models/base.sml',
+     ],
+ )
+
+ Require(
+     rule='rules/record/post/index.sml',
+     require_if=EventType == 'userPost',
+ )
+ ```
+
+ Finally, `rules/record/post/index.sml` requires the new rule:
+
+ ```python
+ Import(
+     rules=[
+         'models/base.sml',
+         'models/record/post.sml',
+     ],
+ )
+
+ Require(
+     rule='rules/record/post/first_post_link.sml',
+ )
+ ```
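
The control flow in the added walkthrough can be modeled in plain Python. The following is a minimal sketch and not Osprey code: `loaded_files`, `feature`, `when_all`, `first_post_link_rule`, and `when_rules` are hypothetical helpers written for illustration, and the depth-first handling of `Require` is an assumption rather than documented engine behavior. It mirrors three things the docs describe: `require_if` gating which files load, a `NULL` signal making the whole rule `NULL` unless the null is checked explicitly, and `rules_any` triggering the `then` effects on a positive evaluation.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Event = Dict[str, Any]

# The sample event from the Models section of the diff.
EVENT: Event = {
    "eventType": "userPost",
    "user": {"userId": "user_id_789", "handle": "carol",
             "postCount": 3, "accountAgeSeconds": 9002},
    "postId": "abc123xyz",
    "replyId": None,
    "text": "Is anyone online right now? @alice or @bob, you there? If so check this video out",
    "mentionIds": ["user_id_123", "user_id_456"],
    "embedLink": "https://youtube.com/watch?id=1",
}

# Hypothetical model of the three Require statements: file -> [(target, condition)].
# A condition of None means the Require is unconditional.
REQUIRES: Dict[str, List[Tuple[str, Optional[Callable[[Event], bool]]]]] = {
    "main.sml": [("rules/index.sml", None)],
    "rules/index.sml": [
        ("rules/record/post/index.sml", lambda e: e.get("eventType") == "userPost"),
    ],
    "rules/record/post/index.sml": [("rules/record/post/first_post_link.sml", None)],
}


def loaded_files(entry: str, event: Event) -> List[str]:
    """Follow Require edges depth-first (an assumption), skipping failed conditions."""
    files = [entry]
    for target, condition in REQUIRES.get(entry, []):
        if condition is None or condition(event):
            files.extend(loaded_files(target, event))
    return files


def feature(event: Event, path: str) -> Any:
    """Resolve a dotted path such as 'user.postCount'; missing data becomes None."""
    value: Any = event
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def when_all(signals: List[Optional[bool]]) -> Optional[bool]:
    """Any null signal makes the whole rule null; otherwise all must be True."""
    if any(signal is None for signal in signals):
        return None
    return all(signals)


def first_post_link_rule(event: Event) -> Optional[bool]:
    post_count = feature(event, "user.postCount")
    embed_link = feature(event, "embedLink")
    mention_ids = feature(event, "mentionIds")
    return when_all([
        None if post_count is None else post_count == 1,
        embed_link is not None,  # explicit null check, so it always yields a bool
        None if mention_ids is None else len(mention_ids) >= 1,
    ])


def when_rules(rules_any: List[Optional[bool]], then: List[str]) -> List[str]:
    """Return the effects to emit: all of `then` if any rule evaluated to True."""
    return list(then) if any(result is True for result in rules_any) else []


if __name__ == "__main__":
    print(loaded_files("main.sml", EVENT))
    hit = first_post_link_rule(EVENT)
    print(hit)
    print(when_rules([hit], ["ReportRecord"]))
```

With the sample event the rule does not fire, because `postCount` is 3; changing it to 1 produces a hit, and removing the `user` object yields a null result rather than `False`.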