Mirror of https://github.com/roostorg/coop

# Signals

Signals are what make Coop powerful. You use Signals to analyze **Items** and judge their characteristics. A Signal can be as simple as a basic check for a keyword or as complex as running an Item through an LLM or other AI model. A Signal takes an Item and spits out some information about the Item that you can use to make automated moderation decisions.

Coop has a library of Signals that you can use in your Rules, and each of them provides the flexibility to choose how strict or lax you want to be. For example, if your service is primarily for children, you'll want to prevent any form of nudity or sexual content. So you'll create a Rule, and in that Rule you'll choose Signals designed to detect nudity, such as nudity classifiers. If a nudity classifier Signal assigns a score of 95% to a user's profile picture (i.e. there is a 95% likelihood that the profile picture contains nudity), then you might have your Rule automatically ban the user.

The Signals library contains four types of Signals:

1. **Text Analysis:** Coop offers a number of Signals to run analysis on text, including:
   1. **Exact Keyword Matching:** Look for exact words or phrases in your content.
   2. **Regular Expression (Regex) Matching:** Look for text patterns in your Items using [regular expressions](https://en.wikipedia.org/wiki/Regular_expression).
   3. **Text Variant Matching:** Coop has an algorithm to detect common variants of strings of text. This is particularly useful for catching bad actors trying to evade your enforcement by using [leetspeak](https://en.wikipedia.org/wiki/Leet), replacing characters, adding punctuation in the middle of words, or other forms of evasion. For example, if you're looking for the word "Hello", we'll detect "h3||0" and "helllllllloooo" as matches.
2. **3rd Party Integrations:** Connect to free safety-oriented APIs at the click of a button. We've built integrations with those companies' APIs so you don't have to; just enter your API key.
3. **Location Matching:** If you want to set up Rules that target specific locations, you can do so with Coop's location matching Signal. With every Item you send to Coop, you'll need to include a [geohash](https://en.wikipedia.org/wiki/Geohash) representing the latitude-longitude location of the user who created it. Then you can create Rules that only action on Items created in or around particular locations. You can even create Matching Banks that contain geohash locations, so you can easily manage a large set of locations in one place.
4. **Custom Signals:** You can add any custom Signal! If you've built your own machine learning models or have some internal data that Coop can't access, you can add it through the signalService.

## External Signals Integration Guide

Coop supports integrating external classifiers (signals) for content moderation, such as:

- **Prebuilt APIs**: OpenAI Moderation API, Google Content Safety API

## How It Works

1. **Configure Integration** - Add API credentials for the external service
2. **Use Signal in Rules** - Reference the signal in your moderation rules
3. **Content Evaluation** - When content is submitted, the signal is called and returns a score
4. **Action Execution** - If the score exceeds the threshold, the rule's action is executed

### Example: OpenAI Integration

Here's a complete example showing how OpenAI is integrated into Coop.

#### 1. Signal Configuration

**Signal Class** - Each third-party signal extends the `SignalBase` class and implements the `run` method to call the external API.

**Registration** - Signals are instantiated and registered in `server/services/signalsService/helpers/instantiateBuiltInSignals.ts`.

#### 2. Rules Implementation

**Using the Signal in a Rule**:

```ts
{
  "name": "Block Hate Speech",
  "conditions": {
    "field": "post.text",
    "signal": {
      "type": "OPEN_AI_HATE_TEXT_MODEL"
    },
    "comparator": "GREATER_THAN",
    "threshold": 0.8
  },
  "actions": [
    { "type": "BLOCK" }
  ]
}
```

**Execution Flow** (`server/condition_evaluator/leafCondition.ts`):

```ts
// 1. Extract content field
const value = getFieldValue(content, condition.field); // "post.text"

// 2. Get signal implementation
const signal = signalsService.getSignal(condition.signal.type);

// 3. Run signal
const result = await signal.run({
  value: { type: 'STRING', value },
  orgId: org.id,
});

// 4. Compare to threshold
const conditionMet = result.score > condition.threshold; // 0.85 > 0.8 = true

// 5. Execute action if condition met
if (conditionMet) {
  await executeAction({ type: 'BLOCK' });
}
```

## Supported Integrations

### Prebuilt APIs

| Integration | Signals | Configuration |
| :---- | :---- | :---- |
| **Moderation API by OpenAI** | There are two models you can use with this endpoint: <br> <br> **omni-moderation-latest:** This model and all snapshots support more categorization options and multi-modal inputs. <br> <br> **text-moderation-latest (Legacy):** Older model that supports only text inputs and fewer input categorizations. The newer omni-moderation models are the best choice for new applications. | OpenAI API key |
| **Content Safety API by Google** | V0: image classification | Content Safety API key[^csapi] |

[^csapi]: Industry and civil society third parties seeking to protect their platforms against abuse can sign up to access the Content Safety API. Applications are subject to approval. You can submit an interest form through [Google's Child Safety Toolkit program](https://protectingchildren.google/toolkit-interest-form/?roost-coop).
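To make the `SignalBase` contract concrete, here is a minimal sketch of a custom signal. This is not Coop's actual implementation: the real base class lives in `server/services/signalsService/signals/SignalBase.ts` and its exact types may differ; the `SignalInput`/`SignalResult` shapes are inferred from the execution-flow example above, and `KeywordSignal` and `keywordScore` are hypothetical names for illustration.

```ts
// Sketch only: types are assumptions based on the execution-flow example,
// not the actual Coop API.
type SignalInput = { value: { type: 'STRING'; value: string }; orgId: string };
type SignalResult = { score: number };

abstract class SignalBase {
  abstract run(input: SignalInput): Promise<SignalResult>;
}

// Hypothetical scoring helper: 1 if any banned keyword appears, else 0.
function keywordScore(text: string, keywords: string[]): number {
  const lower = text.toLowerCase();
  return keywords.some((k) => lower.includes(k.toLowerCase())) ? 1 : 0;
}

// Hypothetical custom signal wrapping the helper.
class KeywordSignal extends SignalBase {
  constructor(private keywords: string[]) {
    super();
  }

  async run(input: SignalInput): Promise<SignalResult> {
    return { score: keywordScore(input.value.value, this.keywords) };
  }
}
```

A Rule could then reference a signal like this with a threshold below 1, so any keyword hit meets the condition.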
#### Moderation API by OpenAI

Use the [moderations endpoint](https://platform.openai.com/docs/guides/moderation) to check whether text or images are potentially harmful. If harmful content is identified, you can take corrective action, like filtering content or intervening with user accounts creating offending content. The moderation endpoint is free to use.

#### Content Safety API by Google

The Content Safety API is an AI classifier that issues a child safety prioritization recommendation on content sent to it. Content Safety API users must conduct their own manual review to determine whether to take action on the content, and must comply with applicable local reporting laws. [Apply for an API key](https://protectingchildren.google/toolkit-interest-form/?roost-coop) and mention in your application that you are using the Coop review tool. After reviewing your application, Google will be in touch to take it forward if you qualify.

The API accepts a list of raw image bytes. The supported file types are listed below:

* BMP
* GIF
* ICO
* JPEG
* PNG
* PPM
* TIFF
* WEBP

**Issue an HTTP Request**

To upload an image, issue a POST request to the API access point:

```http
POST /v1beta1/images:classify?key=your_key HTTP/1.1
Host: contentsafety.googleapis.com
Content-Type: application/json

{
  "images": ["<base64 encoding>"]
}
```

#### Response

The response contains 1 of 5 priorities:

| Priority ENUM |
| ------------- |
| VERY_LOW |
| LOW |
| MEDIUM |
| HIGH |
| VERY_HIGH |

The higher the priority, the more likely the image contains abusive content. However, this is an indication, not a confirmation; you must always do a manual review to confirm and avoid false positives. This signal is only available for manual routing rules, not automated action rules.
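Since this signal can only drive manual routing rules, a service typically maps the returned priority to a review-queue decision. A sketch, assuming a simple rank ordering; the `MEDIUM` default threshold and the function names are our own choices, not part of the API:

```ts
// Priorities as documented for the Content Safety API response.
type Priority = 'VERY_LOW' | 'LOW' | 'MEDIUM' | 'HIGH' | 'VERY_HIGH';

// Rank the enum so priorities can be compared numerically.
const PRIORITY_RANK: Record<Priority, number> = {
  VERY_LOW: 0,
  LOW: 1,
  MEDIUM: 2,
  HIGH: 3,
  VERY_HIGH: 4,
};

// Route anything at or above minPriority to a human review queue.
// The MEDIUM default is an illustrative policy choice, not an API rule.
function shouldRouteToManualReview(
  p: Priority,
  minPriority: Priority = 'MEDIUM'
): boolean {
  return PRIORITY_RANK[p] >= PRIORITY_RANK[minPriority];
}
```

Whatever threshold you pick, the routed content still requires the manual review described above; the priority is a triage hint, not a verdict.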
#### Best practices

* It is recommended that image resolution be around 640x480 pixels (about 300K pixels) for best performance.

* If you have an image smaller than 300K pixels, do NOT resize it to a larger image, as this introduces noise and does not improve performance.

* For images larger than 300K pixels, you may consider resizing them to 300K. Performance is not expected to degrade in this case.

* It is generally suggested to compress your images with a quality-preserving codec (for example, WEBP or JPEG at 90+ quality) to reduce request size.

#### Limitations

* Up to 32 images can be sent at a time.
* Images must be in one of the formats listed above.
* The total JSON body can't exceed 10MB in size.
* **Maximum QPS**: 200.

## Code Structure

```
server/
├── services/
│   ├── signalsService/
│   │   ├── signals/
│   │   │   ├── third_party_signals/
│   │   │   │   ├── open_ai/          # OpenAI implementation
│   │   │   │   └── google/           # Google implementation
│   │   │   └── SignalBase.ts         # Base class
│   │   └── helpers/
│   │       └── instantiateBuiltInSignals.ts
│   └── signalAuthService/            # Credential storage
│       └── signalAuthService.ts
├── rule_engine/
│   └── RuleEvaluator.ts              # Rule execution
└── condition_evaluator/
    └── leafCondition.ts              # Signal execution
```

## Key Files

- **Signal implementations**: `server/services/signalsService/signals/third_party_signals/`
- **Credential management**: `server/services/signalAuthService/signalAuthService.ts`
- **GraphQL schema**: `server/graphql/modules/integration.ts`
- **Rule evaluation**: `server/condition_evaluator/leafCondition.ts`
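The batch and body-size limits described under Limitations can be enforced client-side before issuing requests. A sketch that splits a list of base64-encoded images into compliant batches; the constants mirror the documented limits, while the helper name and per-image byte estimate are our own:

```ts
// Documented Content Safety API limits: at most 32 images per request
// and a total JSON body under 10MB.
const MAX_IMAGES_PER_REQUEST = 32;
const MAX_BODY_BYTES = 10 * 1024 * 1024;

// Split base64 images into batches that respect both limits.
function batchImages(base64Images: string[]): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let currentBytes = 0;

  for (const img of base64Images) {
    // Rough per-image size: string length plus quotes and comma (assumption).
    const imgBytes = img.length + 4;
    const wouldOverflow =
      current.length >= MAX_IMAGES_PER_REQUEST ||
      currentBytes + imgBytes > MAX_BODY_BYTES;
    if (wouldOverflow && current.length > 0) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(img);
    currentBytes += imgBytes;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each resulting batch can then be sent as one `images:classify` request; the 200 QPS ceiling still has to be respected separately with rate limiting.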