Mirror of https://github.com/roostorg/coop

# Coop Architecture

This document provides an overview of Coop's system architecture for developers and operators.

### Overview

Coop is built as a monorepo with a React frontend, a Node.js backend, and a multi-database architecture designed for high-throughput content moderation at scale. Coop:

* Lets operations and policy teams manage settings, such as which queue to send reports to or the number of strikes per enforcement, without requiring engineers to change backend code
* Supports both automation and a manual review process
* Provides an intuitive UI with role-based access control
* Includes an embedded media player for images and video
* Includes built-in best-practice wellness features
* Uses a webhook-based architecture to link effects with events
* Logs an audit trail of actions taken, metadata about each action (including when it happened and who performed it), and the corresponding policy
* Provides a dev/staging environment for manual testing and automated integration tests

### Technology Stack

| Layer | Technologies |
| :---- | :---- |
| **Frontend** | React, TypeScript, Ant Design, TailwindCSS, Apollo Client |
| **Backend** | Node.js, Express, Apollo Server, TypeScript |
| **Databases** | PostgreSQL, Scylla (5.2), ClickHouse, Redis |
| **Messaging** | Kafka (optional), BullMQ |
| **ORM** | Sequelize, Kysely |
| **Auth** | Passport.js, express-session, SAML (SSO) |
| **Observability** | OpenTelemetry |

## Directory Structure

```
coop/
├── client/                     # React frontend
│   └── src/
│       ├── webpages/           # Page components
│       ├── graphql/            # GraphQL queries/mutations
│       ├── components/         # Shared UI components
│       └── utils/              # Utility functions
│
├── server/                     # Node.js backend
│   ├── bin/                    # CLI scripts
│   ├── graphql/                # GraphQL schema and resolvers
│   ├── iocContainer/           # Dependency injection setup
│   ├── models/                 # Sequelize ORM models
│   ├── routes/                 # REST API routes
│   ├── rule_engine/            # Rule evaluation logic
│   ├── services/               # Business logic services including NCMEC
│   └── workers_jobs/           # Background processing
│
├── .devops/
│   └── migrator/               # Database migrations
│       └── src/scripts/
│           ├── api-server-pg/  # PostgreSQL
│           ├── clickhouse/     # ClickHouse
│           └── scylla/         # Scylla
│
└── docs/                       # Documentation
```

# Coop Core Components

## API

Coop accepts both synchronous and asynchronous input.

* Synchronous input is handled via REST APIs and supports item submission, action execution, reporting workflows, policy retrieval, and related operations.
* Asynchronous input is handled via Kafka-based event streaming using the `ITEM_SUBMISSION_EVENT` topic.

All API requests require an organization API key passed via the `x-api-key` header.

### Content Submission

* **File**: `/server/routes/content/ContentRoutes.ts`
* **Route**: `POST /api/v1/content/`
* **Header**: `x-api-key: <org-api-key>`

Accepts any item (e.g., content, user, thread), but only a single item per request. By default, requests are processed asynchronously. To force synchronous mode, set `sync: true`.

**Example request body (JSON):**

```json
{
  "contentId": "unique-id-123",
  "contentType": "Comment",
  "content": {
    "text": "Hello world",
    "authorId": "user-456",
    "createdAt": "2024-01-01T00:00:00Z"
  },
  "userId": "user-456",
  "sync": false
}
```

### Item Submission

* **File**: `/server/routes/items/ItemRoutes.ts`
* **Route**: `POST /api/v1/items/async/`
* **Header**: `x-api-key: <org-api-key>`

Accepts one or more arbitrary items (users, threads, etc.). All processing is asynchronous.
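
As a rough illustration of calling this endpoint from a client, the sketch below builds (without sending) a request that carries the `x-api-key` header and an `items` array. The helper name `buildItemSubmissionRequest`, the `ItemSubmission` type, and the `http://localhost:8080` base URL are assumptions made for this example and are not part of Coop; the item fields mirror the example request body in this section.

```typescript
// Hypothetical client-side helper; not part of the Coop codebase.
interface ItemSubmission {
  id: string;
  data: Record<string, unknown>;
  typeId: string;
  typeVersion?: string;
  typeSchemaVariant?: string;
}

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds (but does not send) a request for POST /api/v1/items/async/.
function buildItemSubmissionRequest(
  baseUrl: string, // assumption: wherever your Coop server is reachable
  apiKey: string,
  items: ItemSubmission[],
): PreparedRequest {
  if (items.length === 0) {
    throw new Error("at least one item is required");
  }
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/items/async/`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-api-key": apiKey,
      },
      body: JSON.stringify({ items }),
    },
  };
}

const request = buildItemSubmissionRequest("http://localhost:8080", "example-api-key", [
  { id: "unique-item-id-123", data: { fieldName1: "value1" }, typeId: "your-item-type-id" },
]);
// To send it: await fetch(request.url, request.init);
```

Separating request construction from sending keeps the header and body shape easy to unit test without a running server.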

**Example request body (JSON):**

```json
{
  "items": [
    {
      "id": "unique-item-id-123",
      "data": {
        "fieldName1": "value1",
        "fieldName2": 123
      },
      "typeId": "your-item-type-id",
      "typeVersion": "optional-version-string",
      "typeSchemaVariant": "original"
    }
  ]
}
```

### Action Execution

* **File**: `/server/routes/action/ActionRoutes.ts`
* **Route**: `POST /api/v1/actions`
* **Header**: `x-api-key: <org-api-key>`

**Example request body (JSON):**

```json
{
  "actionId": "action-id-to-execute",
  "itemId": "target-item-id",
  "itemTypeId": "item-type-id",
  "policyIds": ["policy-id-1", "policy-id-2"],
  "reportedItems": [
    {
      "id": "reported-item-id",
      "typeId": "reported-item-type-id"
    }
  ],
  "actorId": "user-id-who-triggered-action"
}
```

### Reporting

* **File**: `/server/routes/reporting/ReportingRoutes.ts`
* **Route**: `POST /api/v1/report`
* **Header**: `x-api-key: <org-api-key>`

Used to submit reports from users or systems, including contextual items and thread history.
The payload supports:

* Reporter identity
* Reported item
* Thread context
* Policy reason(s)
* Additional contextual items

**Example request body (JSON):**

```json
{
  "reporter": {
    "kind": "user",
    "typeId": "reporter-user-type-id",
    "id": "reporter-user-id"
  },
  "reportedAt": "2024-01-15T10:30:00.000Z",
  "reportedForReason": {
    "policyId": "violated-policy-id",
    "reason": "Free-text reason from reporter",
    "csam": false
  },
  "reportedItem": {
    "id": "reported-item-id",
    "data": { "fieldName": "value" },
    "typeId": "item-type-id"
  },
  "reportedItemThread": [
    {
      "id": "thread-message-1",
      "data": { "content": "message content" },
      "typeId": "message-type-id"
    }
  ],
  "reportedItemsInThread": [
    { "id": "specific-reported-message", "typeId": "message-type-id" }
  ],
  "additionalItems": [
    { "id": "additional-context-item", "data": {}, "typeId": "item-type-id" }
  ]
}
```

### Appeal

* **File**: `/server/routes/reporting/ReportingRoutes.ts:105-154`
* **Route**: `POST /api/v1/report/appeal`
* **Header**: `x-api-key: <org-api-key>`

Appeals allow users to contest actions taken against items. Appeals include the original action, violated policies, appeal reason, and optional additional context.

**Example request body (JSON):**

```json
{
  "appealId": "customer-internal-appeal-id",
  "appealedBy": {
    "typeId": "appealer-user-type-id",
    "id": "appealer-user-id"
  },
  "appealedAt": "2024-01-15T12:00:00.000Z",
  "actionedItem": {
    "id": "item-that-was-actioned",
    "data": { "fieldName": "value" },
    "typeId": "item-type-id"
  },
  "actionsTaken": ["action-id-1", "action-id-2"],
  "appealReason": "User's explanation for why they are appealing",
  "violatingPolicies": [
    { "id": "policy-id-1" },
    { "id": "policy-id-2" }
  ],
  "additionalItems": [
    { "id": "additional-context-item", "data": {}, "typeId": "item-type-id" }
  ]
}
```

### Supporting API Endpoints

* **Policies**: `GET /api/v1/policies/`
* **User Scores**: `GET /api/v1/user_scores`
* **GDPR Deletion**: `POST /api/v1/gdpr/delete`

### Errors

All API errors use a consistent JSON structure:

```json
{
  "errors": [
    {
      "status": 400,
      "type": ["/errors/invalid-user-input"],
      "title": "Short error description",
      "detail": "Detailed explanation (optional)",
      "pointer": "/path/to/problematic/field (optional)",
      "requestId": "correlation-id (optional)"
    }
  ]
}
```

## Rules Engine

When an item is submitted, Coop retrieves all [rules](RULES.md) associated with the item’s type. Each rule is evaluated by recursively processing its `conditionSet`, extracting values from the item, optionally passing them through signals, and comparing results using configured comparators.
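
The recursive evaluation can be sketched roughly as follows, together with the cost ordering and short-circuiting listed in this section. This is a minimal sketch under stated assumptions, not Coop's implementation: the `LeafCondition` and `ConditionSet` shapes, the comparator names, and the synchronous `signal` function are all hypothetical, and XOR is omitted for brevity.

```typescript
// Hypothetical shapes for illustration; Coop's real ConditionSet types may differ.
type Item = Record<string, unknown>;
type Comparator = "EQUALS" | "GREATER_THAN" | "CONTAINS";

interface LeafCondition {
  field: string; // which item field to extract
  signal?: (value: unknown) => unknown; // optional scoring step
  comparator: Comparator;
  threshold: unknown;
  cost: number; // cheaper conditions run first
}

interface ConditionSet {
  conjunction: "AND" | "OR";
  conditions: Array<LeafCondition | ConditionSet>;
}

function isSet(c: LeafCondition | ConditionSet): c is ConditionSet {
  return "conjunction" in c;
}

// Assumption: a nested set costs as much as the sum of its members.
function costOf(c: LeafCondition | ConditionSet): number {
  return isSet(c) ? c.conditions.reduce((sum, x) => sum + costOf(x), 0) : c.cost;
}

function compare(value: unknown, comparator: Comparator, threshold: unknown): boolean {
  switch (comparator) {
    case "EQUALS":
      return value === threshold;
    case "GREATER_THAN":
      return typeof value === "number" && typeof threshold === "number" && value > threshold;
    case "CONTAINS":
      return typeof value === "string" && typeof threshold === "string" && value.includes(threshold);
  }
}

function evaluate(c: LeafCondition | ConditionSet, item: Item): boolean {
  if (!isSet(c)) {
    const raw = item[c.field];
    const value = c.signal ? c.signal(raw) : raw;
    return compare(value, c.comparator, c.threshold);
  }
  const ordered = [...c.conditions].sort((a, b) => costOf(a) - costOf(b));
  // every()/some() stop at the first decisive result, so costlier conditions are skipped.
  return c.conjunction === "AND"
    ? ordered.every((x) => evaluate(x, item))
    : ordered.some((x) => evaluate(x, item));
}

let expensiveCalls = 0;
const exampleRule: ConditionSet = {
  conjunction: "AND",
  conditions: [
    {
      field: "text",
      signal: () => { expensiveCalls += 1; return 0.9; }, // stand-in for a costly ML call
      comparator: "GREATER_THAN",
      threshold: 0.8,
      cost: 100,
    },
    { field: "text", comparator: "CONTAINS", threshold: "spam", cost: 1 },
  ],
};
const matched = evaluate(exampleRule, { text: "hello world" });
```

In this example the cheap `CONTAINS` check runs first and fails, so the expensive stand-in signal is never invoked.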

Key characteristics:

* Conditions are evaluated in ascending cost order
* Short-circuiting is applied based on conjunction type (AND / OR / XOR)
* Expensive signals are skipped when earlier conditions fail
* Actions are deduplicated before execution

For rules in actionable environments (e.g., `LIVE`, `MANUAL`), actions are published via the `ActionPublisher`, which handles:

* Customer webhooks
* MRT enqueueing
* NCMEC routing

**Location**: `/server/rule_engine`

**Rule structure:** `/server/models/rules/RuleModel.ts`

```typescript
// Simplified shape of a rule. RuleStatus, RuleType, and ConditionSet
// are types defined elsewhere in the server code.
interface Rule {
  id: string;
  name: string;
  status: RuleStatus;
  ruleType: RuleType;
  conditionSet: ConditionSet;
  orgId: string;
  tags: string[];
  maxDailyActions: number; // per-rule daily action limit
}
```

## Manual Review Tool (MRT)

The Manual Review Tool (MRT) is a BullMQ-backed queue system used for human review. Items enter MRT via rule actions or user reports. Each job is enriched with context (user scores, related items) and routed to a named queue by routing rules configured in the UI. Moderators claim jobs via exclusive locks (so only one moderator can hold a given job at a time) and submit decisions (i.e., take actions), which trigger downstream callbacks or reporting workflows (e.g., NCMEC).
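
To make the exclusive-lock idea concrete, here is a minimal in-memory sketch. It is not how Coop implements locking (the real MRT is BullMQ-backed); the `InMemoryReviewQueue` class, its method signatures, and the counter-based lock tokens are assumptions made purely for illustration.

```typescript
// Hypothetical in-memory stand-in for the lock semantics; the real MRT is BullMQ-backed.
interface Job {
  id: string;
  payload: unknown;
}

interface Claim {
  job: Job;
  lockToken: string;
}

class InMemoryReviewQueue {
  private readonly pending: Job[] = [];
  private readonly locks = new Map<string, { userId: string; lockToken: string }>();
  private readonly decisions = new Map<string, string>();
  private tokenCounter = 0;

  enqueue(job: Job): void {
    this.pending.push(job);
  }

  // Hands the next job to exactly one moderator, or returns null if nothing is waiting.
  dequeueNextJob(userId: string): Claim | null {
    const job = this.pending.shift();
    if (!job) return null;
    // A real system would use an unguessable token; a counter keeps the sketch simple.
    this.tokenCounter += 1;
    const lockToken = `lock-${this.tokenCounter}`;
    this.locks.set(job.id, { userId, lockToken });
    return { job, lockToken };
  }

  // Only the holder of the matching lock token may record a decision.
  submitDecision(jobId: string, lockToken: string, decision: string): boolean {
    const lock = this.locks.get(jobId);
    if (!lock || lock.lockToken !== lockToken) return false;
    this.decisions.set(jobId, decision);
    this.locks.delete(jobId);
    return true;
  }

  decisionFor(jobId: string): string | undefined {
    return this.decisions.get(jobId);
  }
}

const queue = new InMemoryReviewQueue();
queue.enqueue({ id: "job-1", payload: { itemId: "item-1" } });
const first = queue.dequeueNextJob("moderator-a");
const second = queue.dequeueNextJob("moderator-b"); // null: job-1 is already claimed
```

Because a claimed job leaves the pending list, a second moderator cannot receive it, and a decision is only accepted with the lock token issued at claim time.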

### Queue Management

#### Queue Operations

**File**: `/server/services/manualReviewToolService/modules/QueueOperations.ts`

Jobs can be enqueued from:

* Rules engine execution
* User reports
* Post-action workflows
* MRT internal jobs

**Users:**

* Dequeue jobs with exclusive locks
* Submit decisions
* Trigger post-decision webhooks or NCMEC reporting

**Supported decision types:**

* `IGNORE`
* `CUSTOM_ACTION`
* `SUBMIT_NCMEC_REPORT`
* `ACCEPT_APPEAL`
* `REJECT_APPEAL`
* `TRANSFORM_JOB_AND_RECREATE_IN_QUEUE`
* `AUTOMATIC_CLOSE`

**Manual Enqueue:**

```typescript
{
  orgId: string;
  correlationId: RuleExecutionCorrelationId | ActionExecutionCorrelationId;
  createdAt: Date;
  enqueueSource: 'REPORT' | 'RULE_EXECUTION' | 'POST_ACTIONS' | 'MRT_JOB';
  enqueueSourceInfo: ReportEnqueueSourceInfo | RuleExecutionEnqueueSourceInfo | ...;
  payload: ManualReviewJobPayloadInput;
  policyIds: string[];
}
```

**Entry from Rules Engine** (`ActionPublisher.ts`):

```typescript
case ActionType.ENQUEUE_TO_MRT:
  await this.manualReviewToolService.enqueue({
    orgId,
    payload: { kind: 'DEFAULT', item, reportHistory: [], ... },
    enqueueSource: 'RULE_EXECUTION',
    enqueueSourceInfo: { kind: 'RULE_EXECUTION', rules: rules.map(x => x.id) },
    correlationId,
    policyIds: policies.map(it => it.id),
  });
```

**Dequeue with lock:**

```typescript
async dequeueNextJob(opts: {
  orgId: string;
  queueId: string;
  userId: string;
}): Promise<{ job: ManualReviewJob; lockToken: string } | null>
```

**Submit Decisions:**

```typescript
async submitDecision(opts: SubmitDecisionInput): Promise<SubmitDecisionResponse>
```

## Actions

Actions are created when a rule matches or a moderator submits a decision.
Coop determines *when* an action should occur; the customer determines *what* happens as a result (label, warn, ban, remove content, etc.). The customer carries out the actual action after Coop triggers it.

Action types:

* `CUSTOMER_DEFINED_ACTION`: POST webhook to customer infrastructure
* `ENQUEUE_TO_MRT`: Add item to the manual review queue
* `ENQUEUE_TO_NCMEC`: Route to NCMEC reporting queue

**Webhook structure:**

```json
{
  "item": { "id": "...", "typeId": "..." },
  "policies": [{ "id": "...", "name": "...", "penalty": "..." }],
  "rules": [{ "id": "...", "name": "..." }],
  "action": { "id": "..." },
  "custom": {},
  "actorEmail": "moderator@example.com"
}
```

Failed webhook deliveries are retried five times with exponential backoff.

## Storage

Coop uses multiple databases, each suited to a different workload:

* **PostgreSQL** stores configuration, rules, users, sessions, and MRT decisions with ACID guarantees.
* **Redis (via BullMQ)** powers MRT job queues, caching, and aggregation counters at very low latency.
* **ScyllaDB (5.2)** stores item submission history for high-throughput writes, with materialized views for varied access patterns.
* **ClickHouse** serves as the analytics warehouse for rule executions, actions, and user statistics.

### PostgreSQL

ACID-compliant storage for configuration, auth, rules, and operational data, organized into the following schemas:

* *public*: orgs, users, actions, policies, item_types, banks, api_keys
* *jobs*: Scheduled job tracking
* *manual_review_tool*: Manual review queues, decisions, routing rules, comments
* *ncmec_reporting*: Child safety NCMEC reports
* *reporting_rules*: User / content reporting rules
* *signal_service*: Signal configuration
* *user_management_service*: User management
* *users_statistics_service*: User statistics

### Redis

Used as a low-latency hot cache for:

* **MRT**: BullMQ job queues
* **Caching**: Sets, Sorted Sets, Lua scripts
* **Distributed counters**

### ScyllaDB

Used for high-throughput item history (Investigations tool and associated users/items). It serves as time-series item submission storage with multiple access patterns.

Tables/Views

* **item_submission_by_thread**: Primary table
* **item_submission_by_item_id**: Lookup by item ID
* **item_submission_by_thread_and_time**: Thread and time range
* **item_submission_by_creator**: Lookup by creator

### ClickHouse

Serves as the OLAP storage for analytics, aggregations, and audit trails.

Databases and key tables

* **Analytics:** RULE_EXECUTIONS, ACTION_EXECUTIONS, CONTENT_API_REQUESTS, ITEM_MODEL_SCORES_LOG
* **Action executions:** ACTION_STATISTICS_SERVICE: BY_ACTION, BY_RULE, BY_POLICY, ACTIONED_SUBMISSION_COUNTS
  * MANUAL_REVIEW_TOOL: ROUTING_RULE_EXECUTIONS
* **Reporting and appeal stats:** REPORTING_SERVICE: REPORTS, APPEALS, REPORTING_RULE_EXECUTIONS
* **User-level metrics:** USER_STATISTICS_SERVICE: LIFETIME_ACTION_STATS, SUBMISSION_STATS, USER_SCORES

## Signals

Signals are scoring or evaluation functions used by rules. They range from simple text matching to third-party ML services.
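
As a rough illustration of the simple end of that range, the sketch below scores text by the fraction of configured keywords it contains. It is a hypothetical, simplified example: `KeywordMatchSignal` and `keywordScore` are names invented here, the class does not extend Coop's actual signal base class, and the scoring formula is an assumption.

```typescript
// Hypothetical, simplified signal; it does not extend Coop's actual SignalBase class.

// Returns the fraction of keywords found in the text (case-insensitive), from 0 to 1.
function keywordScore(text: string, keywords: readonly string[]): number {
  if (keywords.length === 0) return 0;
  const haystack = text.toLowerCase();
  const hits = keywords.filter((k) => haystack.includes(k.toLowerCase())).length;
  return hits / keywords.length;
}

class KeywordMatchSignal {
  private readonly keywords: readonly string[];

  constructor(keywords: readonly string[]) {
    this.keywords = keywords;
  }

  // Text matching is cheap, so it reports a low cost and can be scheduled early.
  getCost(): number {
    return 1;
  }

  // Async to resemble signals that call out to external services.
  async run(input: string): Promise<{ score: number }> {
    return { score: keywordScore(input, this.keywords) };
  }
}

const keywordSignal = new KeywordMatchSignal(["spam", "scam"]);
// Usage: const { score } = await keywordSignal.run("Buy SPAM now");
```

A third-party ML signal would follow the same outline but report a higher cost and perform a network call inside `run`.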

The rules engine calls signals when evaluating conditions that need a score. Signals run in cost order (e.g., text matching runs early). If an early condition fails, the more expensive signals are skipped. Results are memoized and cached for 30 seconds for reuse. Signals extend a shared base class and define metadata, cost, and execution logic.

File: `/server/services/signalsService`

**Signals Base Class:**
File: `/server/services/signalsService/signals/SignalBase.ts`

```typescript
abstract class SignalBase<Input, OutputType, MatchingValue, Type> {
  abstract get id(): SignalId;
  abstract get displayName(): string;
  abstract get description(): string;
  abstract get eligibleInputs(): readonly Input[];
  abstract get outputType(): OutputType;
  abstract get supportedLanguages(): readonly Language[] | 'ALL';
  abstract get integration(): Integration | null;
  abstract getCost(): number;
  abstract run(input: SignalInput): Promise<SignalResult | SignalErrorResult>;
}
```

# Services Required

* PostgreSQL
* Redis
* Kafka
  * Schema registry
  * Zookeeper
* ClickHouse
* ScyllaDB
* Metrics
  * Jaeger
  * OpenTelemetry

# Configuration

Server configuration lives in `/server/.env.example`

* Database: PostgreSQL
* Analytics, Warehouse: ClickHouse
* Redis: Redis
* Scylla: Scylla

Rules

* Configured in the frontend via the GraphQL/dashboard UI
* Rate limiting via `maxDailyActions` for each rule
* Rule status: `LIVE`, `DRAFT`, `BACKGROUND`, `EXPIRED`
* Signals: Configured in the rules frontend

User roles

* `ADMIN`: Full access
* `RULES_MANAGER`: Can modify live rules
* `ANALYST`: View insights
* `MODERATOR_MANAGER`: Manages MRT queues
* `MODERATOR`: Reviews assigned queues
* `CHILD_SAFETY_MODERATOR`: Access to NCMEC data
* `EXTERNAL_MODERATOR`: View-only MRT access

Permissions

* `MANAGE_ORG`: `ADMIN`
* `MUTATE_LIVE_RULES`: `ADMIN`, `RULES_MANAGER`
* `VIEW_MRT`: All moderator roles
* `EDIT_MRT_QUEUES`: `ADMIN`, `MODERATOR_MANAGER`
* `VIEW_CHILD_SAFETY_DATA`: `ADMIN`, `MODERATOR_MANAGER`, `CHILD_SAFETY_MODERATOR`

# Action Rules vs Routing Rules

Coop supports two sets of [rules](RULES.md). Each has separate code paths, storage tables, and UI surfaces.

1. [Automated Action rules](RULES.md#automated-action-rules): All rules run in parallel on every event to determine automated actions and MRT decisioning.
2. [Routing rules](RULES.md#routing-rules): Routing rules are evaluated in order, and the first one that succeeds routes the MRT-bound event into the appropriate queue to await review.

## Rules Engine Rules

Code: `/server/models/rules/RuleModel.ts`

UI: `/client/src/webpages/dashboard/rules/`

Storage tables:

* public.rules
* public.rules_and_actions
* public.rules_and_item_types
* public.rules_and_policies
* public.rules_history

## Routing Rules

Code: `/server/services/manualReviewToolService/modules/JobRouting.ts`

UI: `/client/src/webpages/dashboard/mrt/queue_routing/`

Storage tables:

* manual_review_tool.routing_rules
* manual_review_tool.routing_rules_to_item_types
* manual_review_tool.routing_rules_history
* manual_review_tool.appeal_routing_rules
* manual_review_tool.appeal_routing_rules_to_item_types

## Authentication

Coop supports three authentication methods: API key authentication for programmatic access, session-based authentication for the dashboard UI, and SAML/SSO for enterprise single sign-on.

### API Key Authentication

API keys authenticate programmatic requests to REST endpoints. All API requests require the `x-api-key` header.

1. Middleware extracts the `x-api-key` header
2. Key is validated via SHA-256 hash lookup in the database
3. If valid, `orgId` is set on the request for downstream handlers
4. Returns 401 Unauthorized if invalid or missing

Key properties:

* Keys are 32-byte random values, SHA-256 hashed before storage
* Each key is scoped to a single team (useful when different teams in the same organization have data that should not mix)
* Last-used timestamp is tracked for auditing
* Keys can be rotated (creates a new key, deactivates the old one)

Files:

* Middleware: `/server/utils/apiKeyMiddleware.ts`
* Service: `/server/services/apiKeyService/apiKeyService.ts`

### Session-Based Authentication

Session authentication is used for dashboard UI access via GraphQL.

1. User submits credentials via the GraphQL login mutation
2. Passport's `GraphQLLocalStrategy` validates email/password
3. Password is verified via bcrypt comparison
4. On success, the user is serialized to the session via `passport.serializeUser()`
5. Session is stored in PostgreSQL via `connect-pg-simple`

Session configuration:

* Store: PostgreSQL-backed
* Cookie: Secure flag in production, 30-day expiry
* Session secret: `process.env.SESSION_SECRET`

Files: `/server/api.ts`

### SAML/SSO Authentication

Enterprise SSO uses SAML with per-organization configuration.

1. User navigates to `/saml/login/{orgId}`
2. Passport's `MultiSamlStrategy` retrieves org-specific SAML settings
3. User is redirected to the configured SAML provider
4. Provider authenticates and posts the assertion to the callback URL
5. User email is extracted from the SAML assertion
6. User record is looked up and a session is created

Configuration (per org in the `org_settings` table):

* `saml_enabled`: Boolean flag
* `sso_url`: SAML entry point URL
* `cert`: Certificate for validation

Files:

* `/server/api.ts` (lines 142-227)
* `/server/services/SSOService/SSOService.ts`