# Osprey Rules Engine Evaluation

Evaluation of [Osprey](https://github.com/roostorg/osprey) (by ROOST /
internet.dev) as a potential replacement or complement to Arabica's moderation
system.

## What Is Osprey?

Osprey is a **real-time safety rules engine** for processing event streams and
making automated decisions about user behavior. It was originally built at
Discord and open-sourced through ROOST (Robust Open Online Safety Tools).
Tagline: "Automate the obvious and investigate the ambiguous."

It has been adopted by **Bluesky, Discord, and Matrix.org**, is Apache 2.0
licensed, reached v1.0.1 in March 2026, and is actively maintained.

### Core Concepts

**SML (Some Madeup Language)** — A Python-subset DSL for writing rules:

```python
# Models: extract features from event JSON
UserId: Entity[str] = EntityJson(type='User', path='$.user.userId')
PostText: str = JsonData(path='$.text')

# Rules: boolean conditions
# (PostCount, EmbedLink, and MentionIds are features assumed to be defined
# elsewhere; they are not shown in this excerpt.)
SpamLinkRule = Rule(
    when_all=[
        PostCount == 1,
        EmbedLink != None,
        ListLength(list=MentionIds) >= 1,
    ],
    description='First post with link embed',
)

# Effects: actions when rules match
WhenRules(
    rules_any=[SpamLinkRule],
    then=[
        DeclareVerdict(verdict='reject'),
        LabelAdd(entity=UserId, label='likely_spammer'),
    ],
)
```

**Labels** — Stateful tags on entities (users, IPs, etc.) that persist across
evaluations. Support expiry (`expires_after=TimeDelta(days=7)`). Enable stateful
rules like "if this user was flagged as a spammer last week, auto-reject."

**Entities** — Special features (UserID, IP, etc.) that can carry labels and
have effects applied to them.

**File Organization** — Rules compose across files via `Import()` and
conditional `Require(require_if=EventType == 'userPost')`.

### Architecture

Osprey is a **multi-service system**, not an embeddable library:

| Component | Language | Purpose |
|-----------|----------|---------|
| Worker | Python | Core rules engine, consumes Kafka events |
| Coordinator | Rust | Distributed deployment coordination (optional) |
| UI | TypeScript/React | Investigation dashboard, querying, labeling |
| UI API | Python/Flask | Backend for the UI, queries Druid |
| RPC | gRPC/Protobuf | Inter-service communication |

**Infrastructure requirements:**
- Kafka (KRaft mode) — event I/O
- PostgreSQL — labels, execution results
- Apache Druid — OLAP for UI queries
- MinIO — object storage for Druid
- Google Bigtable (optional) — labels at scale

**Data flow:**
1. Events arrive on Kafka input topic
2. Worker evaluates SML rules against events
3. Rules produce verdicts and effects (hide, label, reject)
4. Results dispatched to output sinks (Kafka, Labels, stdout)
5. Execution results flow to Druid for UI querying

### Plugin System

Python `pluggy`-based. Plugins can register:
- **UDFs** — Custom functions for SML rules
- **Output Sinks** — Custom result destinations
- **AST Validators** — Custom rule validation

## Current Arabica Moderation System

For comparison, here's what Arabica has today (~2,100 lines across 3 layers):

### Capabilities

| Feature | Implementation |
|---------|---------------|
| Role-based access | JSON config with admin/moderator roles, 8 granular permissions |
| Record hiding | Manual + automod (3 reports → auto-hide) |
| User blacklisting | Manual ban/unban by moderators |
| Reports | User submission with rate limiting (10/hr), duplicate detection |
| Automod | Threshold-based: 3 reports/URI or 5 reports/user → auto-hide |
| Audit log | All actions logged including automod flag |
| Admin dashboard | HTMX-powered with stats, reports, hidden records, blacklist |
| Feed filtering | Batch-loads hidden URIs for efficient filtering |

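As a rough illustration of the threshold-based automod row in the table, the decision can be sketched in Go as below. Only the two thresholds (3 reports per URI, 5 reports per user) come from this document; the constant and function names are hypothetical and are not taken from Arabica's source.

```go
package main

// Thresholds as listed in the Capabilities table. The names are
// hypothetical; Arabica's real constants may be named and placed differently.
const (
	reportsPerURIThreshold  = 3
	reportsPerUserThreshold = 5
)

// shouldAutoHide reports whether automod would trigger an auto-hide, given
// the number of reports against a record's URI and against its author.
// Either threshold on its own is enough (logical OR).
func shouldAutoHide(reportsOnURI, reportsOnUser int) bool {
	return reportsOnURI >= reportsPerURIThreshold ||
		reportsOnUser >= reportsPerUserThreshold
}
```

With these values, `shouldAutoHide(3, 0)` and `shouldAutoHide(0, 5)` both return `true`, while `shouldAutoHide(2, 4)` returns `false`.
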
### Code Organization

```
internal/moderation/
  models.go       # Roles, permissions, data types
  service.go      # Thread-safe permission checks
  store.go        # 16-method store interface
internal/database/sqlitestore/
  moderation.go   # SQLite implementation
internal/handlers/
  admin.go        # Dashboard + mod actions (662 lines)
  report.go       # Report submission + automod (260 lines)
```

### What Works Well

- Optional design — gracefully degrades without config
- Thread-safe service with RWMutex
- Comprehensive audit trail
- CSRF protection on all mutations
- Efficient batch feed filtering
- Flexible role/permission model

### Pain Points

- Automod thresholds are hardcoded constants
- Permission checks are manual boilerplate in every handler
- Report enrichment makes unbatched PDS calls
- Config changes require server restart
- No rule composition or conditional logic beyond fixed thresholds

## Fit Assessment

### Where Osprey Shines vs Arabica's Needs

**Osprey's strengths:**
- Sophisticated rule composition (AND/OR, conditional loading, labels)
- Stateful rules across evaluations (labels with TTL)
- Investigation UI for T&S teams
- Built for high-throughput event streams
- Plugin system for custom detection logic

**What Arabica could use:**
- More flexible automod rules (not just hardcoded thresholds)
- Rule composition (e.g., "new user + link + mention → suspicious")
- Stateful tracking (e.g., "user had 3 reports dismissed this month")
- Easier rule iteration without code deploys

### Why It Doesn't Fit Today

| Concern | Detail |
|---------|--------|
| **Massive infrastructure overhead** | Kafka + Postgres + Druid + MinIO is orders of magnitude more infra than Arabica's SQLite-based stack. Arabica runs as a single Go binary. |
| **No Go SDK** | Osprey is Python-native. No Go client library exists. Integration would require Kafka as middleware or raw gRPC proto compilation. |
| **Scale mismatch** | Osprey was built for Discord-scale (millions of events/sec). Arabica is a small community app. The operational complexity is not justified. |
| **Python dependency** | Arabica is a pure Go project. Adding a Python rules engine (plus its infra) contradicts the project's preference for stdlib solutions and minimal dependencies. |
| **Overlapping concerns** | Osprey would replace ~2,100 lines of straightforward Go code with a multi-service deployment. The current system is well-structured and maintainable. |

### What Could Be Borrowed (Ideas, Not Code)

Even though deploying Osprey doesn't make sense, some of its concepts are worth
adopting in Arabica's existing moderation code:

1. **Configurable thresholds** — Move automod constants (3 reports/URI, 5
   reports/user) into the moderators JSON config so they're tunable without
   deploys.

2. **Labels / stateful tags** — Add a lightweight label system for users (e.g.,
   `new_account`, `warned`, `trusted`). Labels could influence automod behavior:
   a `warned` user might have a lower auto-hide threshold.

3. **Rule composition** — Express automod rules as config rather than code:
   ```json
   {
     "automod_rules": [
       {
         "name": "high_report_volume",
         "conditions": {"reports_on_uri": {"gte": 3}},
         "action": "hide_record"
       },
       {
         "name": "repeated_offender",
         "conditions": {"reports_on_user": {"gte": 5}, "user_label": "warned"},
         "action": "blacklist_user"
       }
     ]
   }
   ```

4. **Permission middleware** — Replace per-handler permission boilerplate with
   middleware that checks permissions based on route patterns.

5. **TTL-based labels** — Osprey's label expiry is useful for temporary states
   like "under review" or "rate-limited for 24h."

## Recommendation

**Don't integrate Osprey.** The infrastructure and language mismatch is too
large, and Arabica's moderation needs are well-served by its current
~2,100-line Go implementation.

**Do consider** extracting the best ideas (configurable thresholds, labels,
rule composition as config) into the existing system. This would address the
current pain points (hardcoded thresholds, no stateful tracking) without the
operational burden of a multi-service Python deployment.

If Arabica ever grows to need a dedicated T&S team with investigation tooling,
Osprey becomes worth revisiting — but that's a fundamentally different scale
than today.