# Osprey Rules Engine Evaluation

Evaluation of [Osprey](https://github.com/roostorg/osprey) (by ROOST /
internet.dev) as a potential replacement or complement to Arabica's moderation
system.

## What Is Osprey?

Osprey is a **real-time safety rules engine** for processing event streams and
making automated decisions about user behavior. Originally built at Discord,
open-sourced through ROOST (Robust Open Online Safety Tools). Tagline: "Automate
the obvious and investigate the ambiguous."

Adopted by **Bluesky, Discord, and Matrix.org**. Apache 2.0 licensed, reached
v1.0.1 (March 2026), actively maintained.

### Core Concepts

**SML (Some Madeup Language)** — A Python-subset DSL for writing rules:

```python
# Models: extract features from event JSON
UserId: Entity[str] = EntityJson(type='User', path='$.user.userId')
PostText: str = JsonData(path='$.text')

# Rules: boolean conditions
SpamLinkRule = Rule(
    when_all=[
        PostCount == 1,
        EmbedLink != None,
        ListLength(list=MentionIds) >= 1,
    ],
    description='First post with link embed',
)

# Effects: actions when rules match
WhenRules(
    rules_any=[SpamLinkRule],
    then=[
        DeclareVerdict(verdict='reject'),
        LabelAdd(entity=UserId, label='likely_spammer'),
    ],
)
```

**Labels** — Stateful tags on entities (users, IPs, etc.) that persist across
evaluations. Support expiry (`expires_after=TimeDelta(days=7)`). Enable stateful
rules like "if this user was flagged as a spammer last week, auto-reject."

**Entities** — Special features (UserID, IP, etc.) that can carry labels and
have effects applied to them.

**File Organization** — Rules compose across files via `Import()` and
conditional `Require(require_if=EventType == 'userPost')`.

### Architecture

Osprey is a **multi-service system**, not an embeddable library:

| Component | Language | Purpose |
|-----------|----------|---------|
| Worker | Python | Core rules engine, consumes Kafka events |
| Coordinator | Rust | Distributed deployment coordination (optional) |
| UI | TypeScript/React | Investigation dashboard, querying, labeling |
| UI API | Python/Flask | Backend for the UI, queries Druid |
| RPC | gRPC/Protobuf | Inter-service communication |

**Infrastructure requirements:**
- Kafka (KRaft mode) — event I/O
- PostgreSQL — labels, execution results
- Apache Druid — OLAP for UI queries
- MinIO — object storage for Druid
- Google Bigtable (optional) — labels at scale

**Data flow:**
1. Events arrive on Kafka input topic
2. Worker evaluates SML rules against events
3. Rules produce verdicts and effects (hide, label, reject)
4. Results dispatched to output sinks (Kafka, Labels, stdout)
5. Execution results flow to Druid for UI querying

### Plugin System

Python `pluggy`-based. Plugins can register:
- **UDFs** — Custom functions for SML rules
- **Output Sinks** — Custom result destinations
- **AST Validators** — Custom rule validation

## Current Arabica Moderation System

For comparison, here's what Arabica has today (~2,100 lines across 3 layers):

### Capabilities

| Feature | Implementation |
|---------|---------------|
| Role-based access | JSON config with admin/moderator roles, 8 granular permissions |
| Record hiding | Manual + automod (3 reports → auto-hide) |
| User blacklisting | Manual ban/unban by moderators |
| Reports | User submission with rate limiting (10/hr), duplicate detection |
| Automod | Threshold-based: 3 reports/URI or 5 reports/user → auto-hide |
| Audit log | All actions logged including automod flag |
| Admin dashboard | HTMX-powered with stats, reports, hidden records, blacklist |
| Feed filtering | Batch-loads hidden URIs for efficient filtering |

### Code Organization

```
internal/moderation/
  models.go      # Roles, permissions, data types
  service.go     # Thread-safe permission checks
  store.go       # 16-method store interface
internal/database/sqlitestore/
  moderation.go  # SQLite implementation
internal/handlers/
  admin.go       # Dashboard + mod actions (662 lines)
  report.go      # Report submission + automod (260 lines)
```

### What Works Well

- Optional design — gracefully degrades without config
- Thread-safe service with RWMutex
- Comprehensive audit trail
- CSRF protection on all mutations
- Efficient batch feed filtering
- Flexible role/permission model

### Pain Points

- Automod thresholds are hardcoded constants
- Permission checks are manual boilerplate in every handler
- Report enrichment makes unbatched PDS calls
- Config changes require server restart
- No rule composition or conditional logic beyond fixed thresholds

## Fit Assessment

### Where Osprey Shines vs Arabica's Needs

**Osprey's strengths:**
- Sophisticated rule composition (AND/OR, conditional loading, labels)
- Stateful rules across evaluations (labels with TTL)
- Investigation UI for T&S teams
- Built for high-throughput event streams
- Plugin system for custom detection logic

**What Arabica could use:**
- More flexible automod rules (not just hardcoded thresholds)
- Rule composition (e.g., "new user + link + mention → suspicious")
- Stateful tracking (e.g., "user had 3 reports dismissed this month")
- Easier rule iteration without code deploys

### Why It Doesn't Fit Today

| Concern | Detail |
|---------|--------|
| **Massive infrastructure overhead** | Kafka + Postgres + Druid + MinIO is orders of magnitude more infra than Arabica's SQLite-based stack. Arabica runs as a single Go binary. |
| **No Go SDK** | Osprey is Python-native. No Go client library exists. Integration would require Kafka as middleware or raw gRPC proto compilation. |
| **Scale mismatch** | Osprey was built for Discord-scale (millions of events/sec). Arabica is a small community app. The operational complexity is not justified. |
| **Python dependency** | Arabica is a pure Go project. Adding a Python rules engine (plus its infra) contradicts the project's preference for stdlib solutions and minimal dependencies. |
| **Overlapping concerns** | Osprey would replace ~2,100 lines of straightforward Go code with a multi-service deployment. The current system is well-structured and maintainable. |

### What Could Be Borrowed (Ideas, Not Code)

Even though deploying Osprey doesn't make sense, some of its concepts are worth
adopting in Arabica's existing moderation code:

1. **Configurable thresholds** — Move automod constants (3 reports/URI, 5
   reports/user) into the moderators JSON config so they're tunable without
   deploys.

2. **Labels / stateful tags** — Add a lightweight label system for users (e.g.,
   `new_account`, `warned`, `trusted`). Labels could influence automod behavior:
   a `warned` user might have a lower auto-hide threshold.

3. **Rule composition** — Express automod rules as config rather than code:
   ```json
   {
     "automod_rules": [
       {
         "name": "high_report_volume",
         "conditions": {"reports_on_uri": {"gte": 3}},
         "action": "hide_record"
       },
       {
         "name": "repeated_offender",
         "conditions": {"reports_on_user": {"gte": 5}, "user_label": "warned"},
         "action": "blacklist_user"
       }
     ]
   }
   ```

4. **Permission middleware** — Replace per-handler permission boilerplate with
   middleware that checks permissions based on route patterns.

5. **TTL-based labels** — Osprey's label expiry is useful for temporary states
   like "under review" or "rate-limited for 24h."

## Recommendation

**Don't integrate Osprey.** The infrastructure and language mismatch is too
large, and Arabica's moderation needs are well-served by its current
~2,100-line Go implementation.

**Do consider** extracting the best ideas (configurable thresholds, labels,
rule composition as config) into the existing system. This would address the
current pain points (hardcoded thresholds, no stateful tracking) without the
operational burden of a multi-service Python deployment.

If Arabica ever grows to need a dedicated T&S team with investigation tooling,
Osprey becomes worth revisiting — but that's a fundamentally different scale
than today.