Mirror of https://github.com/roostorg/osprey

# Osprey Rules

![images/rules_architecture.png](images/rules_architecture.png)

## Creating Rules

Osprey rules are written in SML (Some Madeup Language), a subset of Python with additional restrictions that simplify rule writing. You may write rules that are specific to a single event type on a network, or rules that apply to multiple event types.

By themselves, rules only create variables. Without a corresponding `WhenRules()` function call, a rule has no effects outside of evaluation and query functionality.

Rules currently support the following concepts through the `Rule(...)` function.

- Name

  `Rule_Name = Rule(...)`

  The name of the rule also functions as a conventional "RuleId" and as the name of the bool that can be used to query individual rule hits in the Osprey Query UI. As a result, changing the name of a rule after activation may affect historical query results in the UI if not logged externally.

- Logic

  `when_all=[]`

  The logic used to evaluate an Osprey rule is expressed as a single comma-delimited list of signals within the `when_all` parameter of the `Rule(...)` function. It supports the use of Labels, Plugins, UDFs, and other values to help enrich heuristics.

  At present, when evaluating UDFs or abstracted variables, any `NULL` evaluation in the series causes the entire rule to evaluate as `NULL`, which may be undesirable.

- Description

  `description=f''`

  An additional string description field can be emitted alongside the rule to external systems, such as logging and ticketing systems, to enrich work-streams that benefit from plain-language context on what the rule's criteria are and what the rule is intended to do.

  It may be helpful to include dynamic variables as well, so that operational workflows can identify the specific values that met the trigger criteria.

Below is an example of a simple rule using various signal evaluations and out-of-the-box UDFs.

```python
My_Rule_Name_v2 = Rule(
    when_all=[
        # Primary Signal
        MyFirstValue == True,
        HasLabel(entity=MyEntityName, label='MyLabel'),
        ListLength(list=UsersValues) == 5,
        # Secondary Signal
        RegexMatch(target=MyStringValue, pattern='(hello|world)'),
        MySecondValue >= 3,
        MyThirdValue != Null,
        # Guardrail Signal
        (_LocalValue in [1, 2, 3, 5]) or (GlobalValue in ['hello', 'howdy']),
        not HasLabel(entity=MySecondEntityName, label='MySecondLabel'),
    ],
    description=f"{UserA} performed {ActionB} in this way. Emit warning",
)
```

## Rule Structuring

You will likely find it useful to maintain two subdirectories inside of your main rules directory: a `rules` directory where the actual logic is added, and a `models` directory that defines the features occurring in all or only specific event types. For example, your structure may look something like this:

```bash
example-rules/
|  rules/
|  |  record/
|  |  |  post/
|  |  |  |  first_post_link.sml
|  |  |  |  index.sml
|  |  |  like/
|  |  |  |  like_own_post.sml
|  |  |  |  index.sml
|  |  account/
|  |  |  signup/
|  |  |  |  high_risk_signup.sml
|  |  |  |  index.sml
|  |  index.sml
|  models/
|  |  record/
|  |  |  post.sml
|  |  |  like.sml
|  |  account/
|  |  |  signup.sml
|  main.sml
```

The `main.sml` file at the root of your rules directory serves as the entrypoint. It uses `Import` and `Require` statements to control which other files are loaded and when, allowing you to compose logic across the project. This structure lets you define rules and models that are specific to certain event types, so that only the necessary rules run for each event type.
For example, you likely have some rules that should only be run on a `post` event, since only a `post` will have features like `text` or `mention_count`.

Inside of each directory, you may maintain an `index.sml` file that defines the conditions under which the rules inside that directory are included for execution. Although you could handle all of this conditional logic inside of a single file, maintaining a separate `index.sml` per directory keeps the project organized. See [Workflow Structure and File Placement](#workflow-structure-and-file-placement) for more on `Import` and `Require`.

## Models

Before you actually write a rule, you'll need to define a "model" for an event type. For this example, we will assume that you run a social media website that lets users create posts, either at the "top level" or as a reply to another top-level post. Each post may include text, mentions of other users on your network, and an optional link embed. Let's say that the event's JSON structure looks like this:

```json
{
    "eventType": "userPost",
    "user": {
        "userId": "user_id_789",
        "handle": "carol",
        "postCount": 3,
        "accountAgeSeconds": 9002
    },
    "postId": "abc123xyz",
    "replyId": null,
    "text": "Is anyone online right now? @alice or @bob, you there? If so check this video out",
    "mentionIds": ["user_id_123", "user_id_456"],
    "embedLink": "https://youtube.com/watch?id=1"
}
```

Inside of our `models/record` directory, we should now create a `post.sml` file where we will define the features for a post.

```python
PostId: Entity[str] = EntityJson(
    type='PostId',
    path='$.postId',
)

PostText: str = JsonData(
    path='$.text',
)

MentionIds: List[str] = JsonData(
    path='$.mentionIds',
)

EmbedLink: Optional[str] = JsonData(
    path='$.embedLink',
    required=False,
)

ReplyId: Entity[str] = JsonData(
    path='$.replyId',
    required=False,
)
```

The [`JsonData` UDF](#user-defined-functions-udfs) lets us take the event's JSON and define features based on its contents. These features can then be referenced in any rule file that imports the `models/record/post.sml` model. If a value inside your JSON object may not always be present, you can set `required` to `False`, and the feature will be `None` whenever the value is missing.

Note that we did not create any features for things like `userId` or `handle`. That is because these values will be present in *any* event, and it wouldn't be very nice to have to copy these features into each event type's model. Instead, we will create a `base.sml` model that defines the features which are always present. Inside of `models/base.sml`, let's define these.

```python
EventType = JsonData(
    path='$.eventType',
)

UserId: Entity[str] = EntityJson(
    type='UserId',
    path='$.user.userId',
)

Handle: Entity[str] = EntityJson(
    type='Handle',
    path='$.user.handle',
)

PostCount: int = JsonData(
    path='$.user.postCount',
)

AccountAgeSeconds: int = JsonData(
    path='$.user.accountAgeSeconds',
)
```

Here, instead of `JsonData`, we use the `EntityJson` UDF for `UserId` and `Handle`. This is covered in the [UDFs section](#user-defined-functions-udfs), but as a rule of thumb, you will likely want values such as a user's ID to be entities.
This pays off later, for example when exploring data in the Osprey UI.

### Model Hierarchy

In practice, you may find it useful to create a hierarchy of base models:

- `base.sml` for features present in every event (user IDs, handles, account stats, etc.)
- `account_base.sml` for features that appear in every account-related event and only in those events. Similarly, you may add a `record_base.sml` for features that appear in all record events.

This type of hierarchy prevents duplication (which Osprey does not allow) and ensures features are defined at the appropriate level of abstraction.

## Effects with WhenRules

The `WhenRules()` function creates effects that trigger external services, create declarations, or modify internal labels. You list `Rule` objects in the `rules_any` parameter and the effects in the `then` parameter. By default, operators and designers may use UDFs with predefined effects such as `DeclareVerdict()`, `LabelAdd()`, or `LabelRemove()`, which apply upon positive rule evaluation.

Below is an example of a `WhenRules()` block that rejects a request and labels the user and email for verification.

```python
WhenRules(
    rules_any=[
        Enabled_Rule_1,
        Enabled_Rule_2,
        # Disabled_Rule_1,
    ],
    then=[
        # Verdicts
        DeclareVerdict(verdict='reject'),
        # Labels
        LabelAdd(entity=UserId, label='recently_challenged', expires_after=TimeDelta(days=7)),
        LabelAdd(entity=UserId, label='verify', apply_if=NotVerified),
        LabelAdd(entity=Email, label='pending_verify'),
        LabelAdd(entity=Domain, label='recently_seen', expires_after=TimeDelta(days=7)),
    ],
)
```

`WhenRules()` must be placed after the declaration of the rules it references within a file. Because the outcomes of rules become difficult to interpret when their effects are scattered, it may be beneficial to place effects toward the bottom of workflows.
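
The "any rule hit triggers every effect" behaviour can be pictured with a small plain-Python model. This is only a sketch and not Osprey's implementation: `run_when_rules` is a hypothetical helper, effects are represented as simple callables returning strings, and it assumes that a rule which evaluated to null (`None` here) does not count as a hit.

```python
from typing import Callable, List, Optional


def run_when_rules(
    rules_any: List[Optional[bool]],
    then: List[Callable[[], str]],
) -> List[str]:
    """Toy model of WhenRules: produce every effect if any rule is True."""
    # Assumption: a null (None) rule result is not treated as a hit.
    if any(result is True for result in rules_any):
        return [effect() for effect in then]
    return []


# Hypothetical rule results for one event: False, null, True.
effects = run_when_rules(
    rules_any=[False, None, True],
    then=[lambda: "verdict:reject", lambda: "label_add:recently_challenged"],
)
print(effects)  # ['verdict:reject', 'label_add:recently_challenged']

# No rule hit, so no effects are produced.
print(run_when_rules(rules_any=[False, None], then=[lambda: "verdict:reject"]))  # []
```

Per-effect conditions such as `apply_if` are left out of this model.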

## Output Sinks

After all rules are evaluated for an input event, a set of output sinks takes the resulting `ExecutionResult` and performs additional work based on that data. Sinks may be defined as part of a plugin to perform domain-specific work.

Default sinks include a `StdoutOutputSink`, which simply outputs the result to the log, a `KafkaOutputSink`, which pipes data to Kafka (used for the Osprey UI), and the `LabelOutputSink`, which can add stateful data to be used in future rule executions.

```python
from typing import Optional  # Osprey imports omitted


class StdoutOutputSink(BaseOutputSink):
    """An output sink that prints to standard out!"""

    def __init__(self, log_sampler: Optional[DynamicLogSampler] = None):
        pass

    def will_do_work(self, result: ExecutionResult) -> bool:
        return True

    def push(self, result: ExecutionResult) -> None:
        print(f'result: {result.extracted_features_json} {result.verdicts}')

    def stop(self) -> None:
        pass
```

Passing data to these output sinks is standardized through the use of `Effects`, which are outputs of some functions, usually UDFs.

```python
def push(self, result: ExecutionResult) -> None:
    users_to_ban = result.effects[BanUserEffect]
    ban_users(users_to_ban)
```

## User Defined Functions (UDFs)

User Defined Functions (UDFs) are plugins written in Python that enable users of Osprey to extend and customize their use of the Osprey SML. Each UDF is a Python class that extends the `UDFBase` abstract base class with a set of arguments and an output, and is registered as a plugin. A UDF is executed whenever it is called in SML.

```python
# example_plugins/text_contains.py
import re  # Osprey imports omitted


class TextContainsArguments(ArgumentsBase):
    text: str
    phrase: str
    case_sensitive: bool = False


class TextContains(UDFBase[TextContainsArguments, bool]):
    def execute(self, execution_context: ExecutionContext, arguments: TextContainsArguments) -> bool:
        # Match the phrase only on word boundaries
        escaped = re.escape(arguments.phrase)
        pattern = rf'\b{escaped}\b'
        flags = 0 if arguments.case_sensitive else re.IGNORECASE
        regex = re.compile(pattern, flags)
        return bool(regex.search(arguments.text))


# example_plugins/register_plugins.py
@hookimpl_osprey
def register_udfs():
    return [TextContains]
```

Usage in SML:

```python
# example_rules/post_contains_hello.sml
ContainsHello = Rule(
    when_all=[
        EventType == 'create_post',
        TextContains(text=PostText, phrase='hello'),
    ],
    description='Post contains the word "hello"',
)
```

### Effect UDFs

Plugins may also define external effects, which are useful for performing functionality in your primary service. Effects are simply passed to output sinks at the end of a rule run. These UDFs have an output that extends `EffectBase`, and can be called in the `then` list of a `WhenRules`.

```python
# example_plugins/src/ban_user.py
class BanUser(UDFBase[BanUserArguments, BanUserEffect]):
    category = UdfCategories.ENGINE

    def execute(self, execution_context: ExecutionContext, arguments: BanUserArguments) -> BanUserEffect:
        return BanUserEffect(
            entity=arguments.entity,
            comment=arguments.comment,
        )


# example_rules/post_contains_hello.sml
WhenRules(
    rules_any=[ContainsHello],
    then=[BanUser(entity=UserId, comment='User said "hello"')],
)
```

UDF outputs can also implement the `CustomExtractedFeature` interface, in which case they are persisted in the outputs for the UI.
`EffectToCustomExtractedFeatureBase` can also be used when effects need additional processing for use in the UI.

## Labels

Labels are a standard plugin that enables stateful rules and touches many parts of Osprey. They are effectively tags on various entities, and may be arbitrarily defined.

### Creating Entities

Labels are applied to Entities, which are dynamically interpreted from outputs of the `EntityJson` UDF. Entities are usually created for pieces of data that are generally consistent across actions, such as a user ID or email.

```python
# user.sml
UserId: Entity[str] = EntityJson(
    type='User',
    path='$.user_id'
)
```

It is possible to create new UDFs that also create entities by setting the output of the UDF to `EntityT`.

### Adding Labels

Labels may be added in a `WhenRules()` clause. This will cause the labels output sink to tag the given entity with the given label at the end of the rules run.

```python
WhenRules(
    rules_any=[
        Sent_Too_Many_DMs,
    ],
    then=[
        LabelAdd(entity=UserId, label='likely_spammer')
    ],
)
```

### Using Labels

Since labels may be retrieved during a rule run, they can be used effectively as state for your rules.

```python
Should_Warn_User_Of_Spammer = Rule(
    when_all=[
        HasLabel(entity=UserId, label='likely_spammer'),
        This_Is_A_New_DM,
    ],
)
```

Labels are also shown in the UI for entities, and can be set manually there. Note that since the UI only searches across actions, `HasLabel()` will not work in the Query UI. Instead, you may use `DidAddLabel`, which is true when the given action added a label to a specific entity.

```python
# UI Query
DidAddLabel(entity_type="UserId", label_name="likely_spammer")
```

## Notable Gotchas

### Nulls

A null represents a rule or variable in SML that has no value.
This can occur for many reasons: either a piece of data is missing or a rule didn't run. Unlike in many programming languages, a rule that references a null-valued variable is generally not evaluated (and thus, rules downstream of it are not evaluated either). The exception is when nulls are explicitly checked in a rule. For example:

```python
Thing: int = JsonData(path='$.property_that_doesnt_exist')

# Evaluates to False
MyFirstRule = Rule(when_all=[
    Thing != Null,
])

# Skips evaluation and sets to Null
MySecondRule = Rule(when_all=[
    Thing > 1,
])

# Skips evaluation and sets to Null
MyThirdRule = Rule(when_all=[
    MySecondRule,
])
```

### Workflow Structure and File Placement

SML files can be composed to make your rules easier to understand. The `Import` statement allows you to include rules and variables found in other files.

```python
# models/action_name.sml
ActionName = "foo"

# main.sml
Import(
    rules=[
        'models/action_name.sml',
        'models/http_request.sml',
    ]
)

MyRule = Rule(when_all=[ActionName == "foo"])
```

`Require` allows you to selectively run other SML scripts. `Require` supports templating and conditionals, allowing scripts to be filtered out when they are not needed. This is important in situations where some rules or UDFs are particularly expensive to run (such as making a call to an AI service).

```python
# main.sml
Require(rule=f'actions/{ActionName}.sml')  # will execute 'actions/foo.sml'

Require(rule='ai_services/my_ai_service.sml', require_if=ActionName == "register")
```

## Full Example

The following is a complete walkthrough of writing a rule using the project structure described above. The goal is to flag accounts whose first post mentions at least one user and includes a link.
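
Before expressing this in SML, it can help to state the target condition in plain Python against the sample event from the Models section. The sketch below is only an illustration: `is_first_post_with_link_and_mention` is a hypothetical helper, it reads the raw JSON directly rather than going through Osprey models, and it ignores SML null handling.

```python
from typing import Any, Dict


def is_first_post_with_link_and_mention(event: Dict[str, Any]) -> bool:
    """Plain-Python restatement of the three conditions the rule checks."""
    is_first_post = event["user"]["postCount"] == 1
    has_link = event.get("embedLink") is not None
    has_mention = len(event.get("mentionIds") or []) >= 1
    return is_first_post and has_link and has_mention


sample_event = {
    "eventType": "userPost",
    "user": {
        "userId": "user_id_789",
        "handle": "carol",
        "postCount": 3,
        "accountAgeSeconds": 9002,
    },
    "postId": "abc123xyz",
    "replyId": None,
    "text": "Is anyone online right now? @alice or @bob, you there? If so check this video out",
    "mentionIds": ["user_id_123", "user_id_456"],
    "embedLink": "https://youtube.com/watch?id=1",
}

# The sample event is the user's third post, so it is not flagged.
print(is_first_post_with_link_and_mention(sample_event))  # False

# The same event as a first post is flagged.
first_post = {**sample_event, "user": {**sample_event["user"], "postCount": 1}}
print(is_first_post_with_link_and_mention(first_post))  # True
```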

### Writing the Rule

We'll create `rules/record/post/first_post_link.sml` for the rule logic. This file defines both the conditions that cause the rule to evaluate to `True` and the actions to take when it does.

```python
# First, import the models that you will need inside of this rule
Import(
    rules=[
        'models/base.sml',
        'models/record/post.sml',
    ],
)

# Next, define a variable that uses the `Rule` UDF
FirstPostLinkRule = Rule(
    # Set the conditions in which this rule will be `True`
    when_all=[
        PostCount == 1,  # if this is the user's first post
        EmbedLink != None,  # if there is a link inside of the post
        ListLength(list=MentionIds) >= 1,  # if there is at least one mention in the post
    ],
    description='First post for user includes a link embed',
)

# Finally, set which effect UDFs will be triggered
WhenRules(
    rules_any=[FirstPostLinkRule],
    then=[
        # This is a custom effect UDF that we have implemented
        ReportRecord(
            entity=PostId,
            comment='This was the first post by a user and included a link',
            severity=3,
        ),
    ],
)
```

### Wiring Up the Rule

We want this rule to run *only* when the event is a post event. Using the project structure described above, this involves three files.

First, `main.sml` at the project root includes a single `Require` statement pointing to the top-level rules index:

```python
Require(
    rule='rules/index.sml',
)
```

Next, `rules/index.sml` conditionally requires the post rules when the event type matches:

```python
Import(
    rules=[
        'models/base.sml',
    ],
)

Require(
    rule='rules/record/post/index.sml',
    require_if=EventType == 'userPost',
)
```

Finally, `rules/record/post/index.sml` requires the new rule:

```python
Import(
    rules=[
        'models/base.sml',
        'models/record/post.sml',
    ],
)

Require(
    rule='rules/record/post/first_post_link.sml',
)
```
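
To see how the three files chain together, the `Require` graph can be modelled in plain Python. This is a toy sketch rather than how Osprey resolves files: the `REQUIRES` table and `files_executed` helper are hypothetical, `Import` statements are not modelled, and `'userLike'` is a made-up event type used only for contrast.

```python
from typing import Callable, Dict, List, Optional, Tuple

Event = Dict[str, str]
Condition = Optional[Callable[[Event], bool]]

# Hypothetical in-memory stand-in for the files above: each file maps to
# the (target, require_if) pairs it declares. None means "always require".
REQUIRES: Dict[str, List[Tuple[str, Condition]]] = {
    "main.sml": [("rules/index.sml", None)],
    "rules/index.sml": [
        ("rules/record/post/index.sml", lambda e: e["eventType"] == "userPost"),
    ],
    "rules/record/post/index.sml": [
        ("rules/record/post/first_post_link.sml", None),
    ],
    "rules/record/post/first_post_link.sml": [],
}


def files_executed(event: Event, entrypoint: str = "main.sml") -> List[str]:
    """Return the files reached from the entrypoint for this event, in order."""
    executed = [entrypoint]
    for target, require_if in REQUIRES[entrypoint]:
        # Follow a Require only when it is unconditional or its condition holds.
        if require_if is None or require_if(event):
            executed.extend(files_executed(event, target))
    return executed


# A post event reaches the new rule file.
print(files_executed({"eventType": "userPost"}))
# ['main.sml', 'rules/index.sml', 'rules/record/post/index.sml',
#  'rules/record/post/first_post_link.sml']

# Any other event type stops at the top-level index.
print(files_executed({"eventType": "userLike"}))
# ['main.sml', 'rules/index.sml']
```

In this model, the `require_if` condition in `rules/index.sml` is the single gate that keeps post-specific rules from running on other event types.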