automod: expanded keyword and slur detection (#524) · alice.mosphere.at/indigo@7f2ed82

The motivation is to move slur detection out of appview and into
automod. A secondary goal is to reduce the extremely high current rate
of false positives (reports every minute or so, almost never actioned).
A final goal is to flag in a couple of new locations, like did:web
identifiers and post alt-text.

The `automod/keyword` package provides a couple of new string-processing
helpers:

- slur "fuzzy matching" using fixed regexes for a small number of
explicit slurs. the regexes have been tweaked to reduce false positives
in some cases. these operate on "slugified" strings, which have had all
whitespace and non-letter characters removed, and should match regardless
of diacritics and most "l33t" transformations
- tokenization of both identifiers and general text (different
tokenizers), with some basic normalization (eg, unicode folding,
lowercase, remove non-letter characters). intended to then be used to
match against configurable lists of slurs and offensive words. does not
(yet) do robust stemming (eg, de-pluralization) or "l33t" matching
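
The slugify-then-regex idea from the first bullet could be sketched roughly as below. This is not the code in `automod/keyword`: the names `slugify`, `fuzzyMatch`, `leetMap`, and `foldMap` are hypothetical, the substitution tables are deliberately tiny, and the pattern targets the placeholder "badword" rather than any real term.

```go
package main

import (
	"regexp"
	"strings"
	"unicode"
)

// leetMap and foldMap are small illustrative tables (assumptions for this
// sketch), not the tables the real package uses.
var leetMap = map[rune]rune{
	'0': 'o', '1': 'l', '3': 'e', '4': 'a', '5': 's', '7': 't', '@': 'a', '$': 's',
}

var foldMap = map[rune]rune{
	'á': 'a', 'à': 'a', 'ä': 'a', 'é': 'e', 'è': 'e', 'ë': 'e',
	'í': 'i', 'ï': 'i', 'ó': 'o', 'ö': 'o', 'ú': 'u', 'ü': 'u',
}

// slugify lowercases the input, undoes a few common "l33t" substitutions,
// folds a few precomposed diacritics, and drops every rune that is not a
// letter (whitespace, punctuation, digits, combining marks).
func slugify(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		if m, ok := leetMap[r]; ok {
			r = m
		} else if m, ok := foldMap[r]; ok {
			r = m
		}
		if unicode.IsLetter(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// fuzzyPattern is a fixed regex for the placeholder "badword"; repeated
// letters are tolerated so stretched spellings still match.
var fuzzyPattern = regexp.MustCompile(`b+a+d+w+o+r+d+`)

// fuzzyMatch reports whether the slugified input contains the pattern.
func fuzzyMatch(s string) bool {
	return fuzzyPattern.MatchString(slugify(s))
}
```

Because the regex runs on the slug, spacing and punctuation tricks such as `b.a.d.w.0.r.d` collapse to the same string before matching. The tradeoff is that matches can span word boundaries, which is one source of the false positives mentioned above.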

Here is a summary of the included rules, and which record fields they
run against:

```
- longer text: "worst" tokens, or slur fuzzy match
    x post: body, alt-text (UPDATE: only "worst" tokens and subset of fuzzy match)
    x profile: description
    x record: list+feed descriptions
    TODO: mod service description
- short names: "bad" tokens, or slur fuzzy match
    x profile: just display-name
    x other records: list+feed display names
    TODO: mod service display name
- identifier: "bad" tokens (identifier split to tokens); or slur fuzzy match
    x identity: handle, DID (UPDATE: only on did:web, not did:plc; fuzzy too many false-positives on random base32)
    x any record: record-key (UPDATE: only full-identifier fuzzy on rkey, not subset)
- tags: "bad" hashtags, "bad" tokens, or slur fuzzy match
    x post (in-text or external)
    NOTE: profile doesn't do richtext or tags yet
```
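
As a rough illustration of the identifier row, the sketch below splits an identifier into tokens and only does so for did:web DIDs, skipping did:plc because its random base32 suffix is just noise. The function names `tokenizeIdentifier` and `didTokensToCheck` are hypothetical and the splitting rule is an assumption, not the package's actual tokenizer.

```go
package main

import (
	"strings"
	"unicode"
)

// tokenizeIdentifier lowercases an identifier and splits it on every rune
// that is neither a letter nor a digit (dots, dashes, colons, ...).
func tokenizeIdentifier(ident string) []string {
	return strings.FieldsFunc(strings.ToLower(ident), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// didTokensToCheck returns the tokens worth comparing against the "bad"
// word list. Only did:web identifiers are considered; any other DID method
// (including did:plc) yields nil.
func didTokensToCheck(did string) []string {
	const prefix = "did:web:"
	if !strings.HasPrefix(did, prefix) {
		return nil
	}
	return tokenizeIdentifier(strings.TrimPrefix(did, prefix))
}
```

For example, `didTokensToCheck("did:web:Bad-Word.example.com")` yields the tokens `bad`, `word`, `example`, `com`, while any did:plc value yields nothing to check.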

Mod service records haven't been spec'd yet, but there is a clear place
to handle them, same as feed gen records and mod list records.

The "bad" word list is a superset of the "worst" word list. The longer
list includes words like "nazi" which are common in conversation, but
should be flagged in the context of identifiers or names.
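
The two-list arrangement could look something like the following sketch. The sets, the placeholder entry "worstword", and the helper names `tokenizeText` and `firstTokenIn` are all hypothetical; the real lists are configurable and the real tokenizer also does unicode folding.

```go
package main

import (
	"strings"
	"unicode"
)

// worstWords is the short list applied to longer text. The entry is a
// placeholder, not a real list item.
var worstWords = map[string]bool{"worstword": true}

// badWords is a superset of worstWords, applied to names and identifiers.
var badWords = map[string]bool{"worstword": true, "nazi": true}

// tokenizeText lowercases the text, splits on whitespace, and strips
// non-letter runes from each token. No stemming is attempted.
func tokenizeText(text string) []string {
	var tokens []string
	for _, field := range strings.Fields(strings.ToLower(text)) {
		tok := strings.Map(func(r rune) rune {
			if unicode.IsLetter(r) {
				return r
			}
			return -1 // drop the rune
		}, field)
		if tok != "" {
			tokens = append(tokens, tok)
		}
	}
	return tokens
}

// firstTokenIn returns the first token found in the set, or "" if none.
func firstTokenIn(tokens []string, set map[string]bool) string {
	for _, t := range tokens {
		if set[t] {
			return t
		}
	}
	return ""
}
```

With these sets, a display name containing "nazi" is caught by `badWords`, while the same word in a post body passes the `worstWords` check. Without stemming, the plural "nazis" matches neither list.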

In all the above cases, the action will be to report for human mod
review.

Keep in mind that our client and PDS implementations will continue to
prevent taking actions like:

- signup or changing handle to a slur
- using slurs in the "short names" cases above when creating or updating
records

We have an existing mod action which looks for reply posts consisting of
a single word/token, where that token is any of the "bad" words. We may
want to escalate the action strength for that case, when there is no
follow relationship, to an immediate label on the reply post, as that is
a fairly strong signal of harassment.
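
The proposed escalation might be sketched as below. This describes the idea only, not existing automod behavior: the `Action` type, its values, and `singleTokenReplyAction` are hypothetical, and how a "follow relationship" is determined is left to the caller.

```go
package main

import (
	"strings"
	"unicode"
)

// Action is a hypothetical enum of moderation outcomes for this sketch.
type Action int

const (
	NoAction Action = iota
	Report          // queue the reply for human mod review
	Label           // apply a label to the reply immediately
)

// singleTokenReplyAction handles replies that consist of exactly one token.
// If that token is on the "bad" list, the reply is reported when the two
// accounts have a follow relationship and labeled immediately otherwise.
func singleTokenReplyAction(replyText string, hasFollowRelationship bool, badWords map[string]bool) Action {
	fields := strings.Fields(strings.ToLower(replyText))
	if len(fields) != 1 {
		return NoAction
	}
	// Strip punctuation and digits so "BadWord!" and "badword" compare equal.
	token := strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) {
			return r
		}
		return -1
	}, fields[0])
	if !badWords[token] {
		return NoAction
	}
	if hasFollowRelationship {
		return Report
	}
	return Label
}
```

Keeping the weaker report action when a follow relationship exists reflects the assumption that one-word replies between accounts that follow each other are more likely to be banter than harassment.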

authored by bnewbold and committed by GitHub 2 years ago 7f2ed82f 0051bd3e

+705 -45

18 changed files
