automod: expanded keyword and slur detection (#524)

The motivation is to move slur detection out of appview and into
automod. A secondary goal is to reduce the extremely high current rate
of false-positives (reports every minute or so, almost never actioned).
And finally, to flag in a couple of new locations, like did:web
identifiers and post alt-text.

The `automod/keyword` package provides a couple new string-processing
helpers:

- slur "fuzzy matching" using fixed regexes for a small number of
explicit slurs. the regexes have been tweaked to reduce false-positives
in some cases. these operate on "slugified" strings, which have had all
whitespace and other non-letter, non-digit characters removed, and should
match regardless of diacritics and most "l33t" transformations
- tokenization of both identifiers and general text (different
tokenizers), with some basic normalization (eg, unicode folding,
lowercase, remove non-letter characters). intended to then be used to
match against configurable lists of slurs and offensive words. does not
(yet) do robust stemming (eg, de-pluralization) or "l33t" matching
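
As a rough illustration of the two helpers described above, here is a minimal standard-library sketch. The lowercase names `slugify` and `tokenizeIdentifier` are stand-ins chosen for this example, not the package's exported API, and the sketch leaves out the unicode folding that the text tokenizer is described as doing.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
)

// nonSlugChars matches runs of characters that are neither letters nor digits.
var nonSlugChars = regexp.MustCompile(`[^\pL\pN]+`)

// slugify (illustrative name) drops every non-letter, non-digit character
// and lower-cases the result.
func slugify(orig string) string {
	return strings.ToLower(nonSlugChars.ReplaceAllString(orig, ""))
}

// tokenizeIdentifier (illustrative name) splits an identifier on any
// non-letter, non-digit rune and discards single-byte tokens.
func tokenizeIdentifier(orig string) []string {
	fields := strings.FieldsFunc(orig, func(c rune) bool {
		return !unicode.IsLetter(c) && !unicode.IsNumber(c)
	})
	out := make([]string, 0, len(fields))
	for _, f := range fields {
		tok := slugify(f)
		if len(tok) > 1 {
			out = append(out, tok)
		}
	}
	return out
}

func main() {
	fmt.Println(slugify("He-llo, W0rld!"))                    // hellow0rld
	fmt.Println(tokenizeIdentifier("the-handle.example.com")) // [the handle example com]
}
```

Slugs suit substring-style regex matching because separators can no longer hide a word, while tokens suit exact lookups against a configured set.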

Here is a summary of the included rules, and which record fields they
run against:

```
- longer text: "worst" tokens, or slur fuzzy match
x post: body, alt-text (UPDATE: only "worst" tokens and subset of fuzzy match)
x profile: description
x record: list+feed descriptions
TODO: mod service description
- short names: "bad" tokens, or slur fuzzy match
x profile: just display-name
x other records: list+feed display names
TODO: mod service display name
- identifier: "bad" tokens (identifier split to tokens); or slur fuzzy match
x identity: handle, DID (UPDATE: only on did:web, not did:plc; fuzzy too many false-positives on random base32)
x any record: record-key (UPDATE: only full-identifier fuzzy on rkey, not subset)
- tags: "bad" hashtags, "bad" tokens, or slur fuzzy match
x post (in-text or external)
NOTE: profile doesn't do richtext or tags yet
```

Mod service records haven't been spec'd yet, but there is a clear place
to handle them, same as feed gen records and mod list records.

The "bad" word list is a superset of the "worst" word list. The longer
list includes words like "nazi" which are common in conversation, but
should be flagged in the context of identifiers or names.
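
The two-tier lookup can be sketched as follows. This is only an illustration under assumptions: the map contents are the placeholder words from the test fixture, and `flagged` and `isSuperset` are hypothetical helpers, not functions from the automod package.

```go
package main

import (
	"fmt"
	"strings"
)

// Placeholder lists borrowed from the test fixture; real lists are configured separately.
var (
	badWords   = map[string]bool{"hardr": true, "hardestr": true}
	worstWords = map[string]bool{"hardestr": true}
)

// isSuperset reports whether every entry of sub also appears in super.
func isSuperset(super, sub map[string]bool) bool {
	for w := range sub {
		if !super[w] {
			return false
		}
	}
	return true
}

// flagged (hypothetical helper) checks short names and identifiers against
// the longer "bad" list, and longer text against the "worst" list after
// trimming a trailing "s" as a crude de-pluralization.
func flagged(tok string, shortName bool) bool {
	if shortName {
		return badWords[tok]
	}
	return worstWords[strings.TrimSuffix(tok, "s")]
}

func main() {
	fmt.Println(isSuperset(badWords, worstWords)) // true
	fmt.Println(flagged("hardr", true))           // true: "bad" word in a name
	fmt.Println(flagged("hardr", false))          // false: tolerated in longer text
	fmt.Println(flagged("hardestrs", false))      // true: "worst" word, plural trimmed
}
```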

In all the above cases, the action will be to report for human mod
review.

Keep in mind that our client and PDS implementations will continue to
prevent taking actions like:

- signup or changing handle to a slur
- using slurs in the "short names" cases above when creating or updating
records

We have an existing mod action which looks for reply posts that consist
of a single word/token, where that token is any of the "bad" words. We
may want to escalate the action strength for that case, when there is no
follow relationship, to an immediate label on the reply post, as that is
a fairly strong signal of harassment.
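
A sketch of how that possible escalation might be expressed. This is not implemented in the commit; the `reply` struct, the `action` values, and `decide` are all hypothetical names used only to make the decision concrete.

```go
package main

import "fmt"

// action is a hypothetical outcome type for this sketch.
type action string

const (
	actionNone   action = "none"
	actionReport action = "report" // current behavior: report for human review
	actionLabel  action = "label"  // proposed escalation: immediate label
)

// reply holds the facts the decision needs; all fields are assumptions.
type reply struct {
	tokens             []string // tokenized post text
	isReply            bool
	selfThread         bool // replying to one's own post
	followRelationship bool // some follow exists between the two accounts
}

// decide returns the proposed action for a single-bad-word reply.
func decide(r reply, bad map[string]bool) action {
	if !r.isReply || r.selfThread || len(r.tokens) != 1 || !bad[r.tokens[0]] {
		return actionNone
	}
	if !r.followRelationship {
		return actionLabel
	}
	return actionReport
}

func main() {
	bad := map[string]bool{"hardr": true} // placeholder word from the test fixture
	stranger := reply{tokens: []string{"hardr"}, isReply: true}
	friend := reply{tokens: []string{"hardr"}, isReply: true, followRelationship: true}
	fmt.Println(decide(stranger, bad)) // label
	fmt.Println(decide(friend, bad))   // report
}
```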

authored by bnewbold and committed by GitHub
7f2ed82f 0051bd3e

+705 -45
+14 -4
automod/engine/engine.go
···
 	ctx, cancel := context.WithTimeout(ctx, identityEventTimeout)
 	defer cancel()
 
+	// first purge any caches; we need to re-resolve from scratch on identity updates
+	if err := eng.PurgeAccountCaches(ctx, did); err != nil {
+		eng.Logger.Error("failed to purge identity cache; identity rule may not run correctly", "err", err)
+	}
 	ident, err := eng.Directory.LookupDID(ctx, did)
 	if err != nil {
 		return fmt.Errorf("resolving identity: %w", err)
···
 		return fmt.Errorf("rule execution failed: %w", err)
 	}
 	eng.CanonicalLogLineAccount(&ac)
-	eng.PurgeAccountCaches(ctx, am.Identity.DID)
 	if err := eng.persistAccountModActions(&ac); err != nil {
 		return fmt.Errorf("failed to persist actions for identity event: %w", err)
 	}
···
 	eng.CanonicalLogLineRecord(&rc)
 	// purge the account meta cache when profile is updated
 	if rc.RecordOp.Collection == "app.bsky.actor.profile" {
-		eng.PurgeAccountCaches(ctx, am.Identity.DID)
+		if err := eng.PurgeAccountCaches(ctx, op.DID); err != nil {
+			eng.Logger.Error("failed to purge identity cache", "err", err)
+		}
 	}
 	if err := eng.persistRecordModActions(&rc); err != nil {
 		return fmt.Errorf("failed to persist actions for record event: %w", err)
···
 
 // Purge metadata caches for a specific account.
 func (e *Engine) PurgeAccountCaches(ctx context.Context, did syntax.DID) error {
-	e.Directory.Purge(ctx, did.AtIdentifier())
-	return e.Cache.Purge(ctx, "acct", did.String())
+	e.Logger.Debug("purging account caches", "did", did.String())
+	dirErr := e.Directory.Purge(ctx, did.AtIdentifier())
+	cacheErr := e.Cache.Purge(ctx, "acct", did.String())
+	if dirErr != nil {
+		return dirErr
+	}
+	return cacheErr
 }
 
 func (e *Engine) CanonicalLogLineAccount(c *AccountContext) {
+5
automod/engine/testing.go
···
 	sets := setstore.NewMemSetStore()
 	sets.Sets["bad-hashtags"] = make(map[string]bool)
 	sets.Sets["bad-hashtags"]["slur"] = true
+	sets.Sets["bad-words"] = make(map[string]bool)
+	sets.Sets["bad-words"]["hardr"] = true
+	sets.Sets["bad-words"]["hardestr"] = true
+	sets.Sets["worst-words"] = make(map[string]bool)
+	sets.Sets["worst-words"]["hardestr"] = true
 	dir := identity.NewMockDirectory()
 	id1 := identity.Identity{
 		DID: syntax.DID("did:plc:abc111"),
+94
automod/keyword/cmd/kw-cli/main.go
··· 1 + package main 2 + 3 + import ( 4 + "bufio" 5 + "context" 6 + "fmt" 7 + "log/slog" 8 + "os" 9 + 10 + "github.com/bluesky-social/indigo/automod/keyword" 11 + "github.com/bluesky-social/indigo/automod/setstore" 12 + 13 + "github.com/urfave/cli/v2" 14 + ) 15 + 16 + func main() { 17 + app := cli.App{ 18 + Name: "kw-cli", 19 + Usage: "informal debugging CLI tool for keyword matching", 20 + } 21 + app.Commands = []*cli.Command{ 22 + &cli.Command{ 23 + Name: "fuzzy", 24 + Usage: "reads lines of text from stdin, runs regex fuzzy matching, outputs matches", 25 + Action: runFuzzy, 26 + }, 27 + &cli.Command{ 28 + Name: "tokens", 29 + Usage: "reads lines of text from stdin, tokenizes and matches against set", 30 + Action: runTokens, 31 + Flags: []cli.Flag{ 32 + &cli.StringFlag{ 33 + Name: "json-set-file", 34 + Usage: "path to JSON file containing bad word sets", 35 + Value: "automod/rules/example_sets.json", 36 + }, 37 + &cli.StringFlag{ 38 + Name: "set-name", 39 + Usage: "which set within the set file to use", 40 + Value: "bad-words", 41 + }, 42 + &cli.BoolFlag{ 43 + Name: "identifiers", 44 + Usage: "whether to parse the line as identifiers (instead of text)", 45 + }, 46 + }, 47 + }, 48 + } 49 + h := slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelDebug}) 50 + slog.SetDefault(slog.New(h)) 51 + app.RunAndExitOnError() 52 + } 53 + 54 + func runFuzzy(cctx *cli.Context) error { 55 + scanner := bufio.NewScanner(os.Stdin) 56 + for scanner.Scan() { 57 + line := scanner.Text() 58 + word := keyword.SlugContainsExplicitSlur(keyword.Slugify(line)) 59 + if word != "" { 60 + fmt.Printf("MATCH\t%s\t%s\n", word, line) 61 + } 62 + } 63 + return nil 64 + } 65 + 66 + func runTokens(cctx *cli.Context) error { 67 + ctx := context.Background() 68 + sets := setstore.NewMemSetStore() 69 + if err := sets.LoadFromFileJSON(cctx.String("json-set-file")); err != nil { 70 + return err 71 + } 72 + setName := cctx.String("set-name") 73 + identMode := cctx.Bool("identifiers") 74 + 
scanner := bufio.NewScanner(os.Stdin) 75 + for scanner.Scan() { 76 + line := scanner.Text() 77 + var tokens []string 78 + if identMode { 79 + tokens = keyword.TokenizeIdentifier(line) 80 + } else { 81 + tokens = keyword.TokenizeText(line) 82 + } 83 + for _, tok := range tokens { 84 + match, err := sets.InSet(ctx, setName, tok) 85 + if err != nil { 86 + return err 87 + } 88 + if match { 89 + fmt.Printf("MATCH\t%s\t%s\n", tok, line) 90 + } 91 + } 92 + } 93 + return nil 94 + }
+2
automod/keyword/doc.go
···
+// String processing helpers for doing fuzzy detection and normalized token matching against keyword lists.
+package keyword
+11
automod/keyword/keyword.go
···
+package keyword
+
+// Helper to check a single token against a list of tokens
+func TokenInSet(tok string, set []string) bool {
+	for _, v := range set {
+		if tok == v {
+			return true
+		}
+	}
+	return false
+}
+20
automod/keyword/keyword_test.go
···
+package keyword
+
+import (
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+)
+
+func TestTokenInSet(t *testing.T) {
+	assert := assert.New(t)
+
+	keywords := []string{
+		"example",
+		"bunch",
+	}
+
+	assert.True(TokenInSet("example", keywords))
+	assert.False(TokenInSet("Example", keywords))
+	assert.False(TokenInSet("elephant", keywords))
+}
+13
automod/keyword/slugify.go
···
+package keyword
+
+import (
+	"regexp"
+	"strings"
+)
+
+var nonSlugChars = regexp.MustCompile(`[^\pL\pN]+`)
+
+// Takes an arbitrary string (eg, an identifier or free-form text) and returns a version with all non-letter, non-digit characters removed, and all lower-case
+func Slugify(orig string) string {
+	return strings.ToLower(nonSlugChars.ReplaceAllString(orig, ""))
+}
+42
automod/keyword/slur_regex.go
··· 1 + package keyword 2 + 3 + import ( 4 + "regexp" 5 + ) 6 + 7 + // regexes taken from: https://github.com/Blank-Cheque/Slurs 8 + var explicitSlurRegexes = map[string]*regexp.Regexp{ 9 + "chink": regexp.MustCompile("[cĆćĈĉČčĊċÇçḈḉȻȼꞒꞓꟄꞔƇƈɕ][hĤĥȞȟḦḧḢḣḨḩḤḥḪḫH̱ẖĦħⱧⱨꞪɦꞕΗНн][iÍíi̇́Ììi̇̀ĬĭÎîǏǐÏïḮḯĨĩi̇̃ĮįĮ́į̇́Į̃į̇̃ĪīĪ̀ī̀ỈỉȈȉI̋i̋ȊȋỊịꞼꞽḬḭƗɨᶖİiIıIi1lĺľļḷḹl̃ḽḻłŀƚꝉⱡɫɬꞎꬷꬸꬹᶅɭȴLl][nŃńǸǹŇňÑñṄṅŅņṆṇṊṋṈṉN̈n̈ƝɲŊŋꞐꞑꞤꞥᵰᶇɳȵꬻꬼИиПпNn][kḰḱǨǩĶķḲḳḴḵƘƙⱩⱪᶄꝀꝁꝂꝃꝄꝅꞢꞣ][sŚśṤṥŜŝŠšṦṧṠṡŞşṢṣṨṩȘșS̩s̩ꞨꞩⱾȿꟅʂᶊᵴ]?"), 10 + // modified to not match "cocoon", "raccoon", "racoon", or "tycoon" 11 + "coon": regexp.MustCompile("(^|[^cayo])[cĆćĈĉČčĊċÇçḈḉȻȼꞒꞓꟄꞔƇƈɕ][ÓóÒòŎŏÔôỐốỒồỖỗỔổǑǒÖöȪȫŐőÕõṌṍṎṏȬȭȮȯO͘o͘ȰȱØøǾǿǪǫǬǭŌōṒṓṐṑỎỏȌȍȎȏƠơỚớỜờỠỡỞởỢợỌọỘộO̩o̩Ò̩ò̩Ó̩ó̩ƟɵꝊꝋꝌꝍⱺOo0]{2}[nŃńǸǹŇňÑñṄṅŅņṆṇṊṋṈṉN̈n̈ƝɲŊŋꞐꞑꞤꞥᵰᶇɳȵꬻꬼИиПпNn][sŚśṤṥŜŝŠšṦṧṠṡŞşṢṣṨṩȘșS̩s̩ꞨꞩⱾȿꟅʂᶊᵴ]?"), 12 + "faggot": regexp.MustCompile("[fḞḟƑƒꞘꞙᵮᶂ][aÁáÀàĂăẮắẰằẴẵẲẳÂâẤấẦầẪẫẨẩǍǎÅåǺǻÄäǞǟÃãȦȧǠǡĄąĄ́ą́Ą̃ą̃ĀāĀ̀ā̀ẢảȀȁA̋a̋ȂȃẠạẶặẬậḀḁȺⱥꞺꞻᶏẚAa@4][gǴǵĞğĜĝǦǧĠġG̃g̃ĢģḠḡǤǥꞠꞡƓɠᶃꬶGgqꝖꝗꝘꝙɋʠ]{2}([ÓóÒòŎŏÔôỐốỒồỖỗỔổǑǒÖöȪȫŐőÕõṌṍṎṏȬȭȮȯO͘o͘ȰȱØøǾǿǪǫǬǭŌōṒṓṐṑỎỏȌȍȎȏƠơỚớỜờỠỡỞởỢợỌọỘộO̩o̩Ò̩ò̩Ó̩ó̩ƟɵꝊꝋꝌꝍⱺOo0e3ЄєЕеÉéÈèĔĕÊêẾếỀềỄễỂểÊ̄ê̄Ê̌ê̌ĚěËëẼẽĖėĖ́ė́Ė̃ė̃ȨȩḜḝĘęĘ́ę́Ę̃ę̃ĒēḖḗḔḕẺẻȄȅE̋e̋ȆȇẸẹỆệḘḙḚḛɆɇE̩e̩È̩è̩É̩é̩ᶒⱸꬴꬳEeiÍíi̇́Ììi̇̀ĬĭÎîǏǐÏïḮḯĨĩi̇̃ĮįĮ́į̇́Į̃į̇̃ĪīĪ̀ī̀ỈỉȈȉI̋i̋ȊȋỊịꞼꞽḬḭƗɨᶖİiIıIi1lĺľļḷḹl̃ḽḻłŀƚꝉⱡɫɬꞎꬷꬸꬹᶅɭȴLl][tŤťṪṫŢţṬṭȚțṰṱṮṯŦŧȾⱦ ƬƭƮʈT̈ẗᵵƫȶ]{1,2}([rŔŕŘřṘṙŖŗȐȑȒȓṚṛṜṝṞṟR̃r̃ɌɍꞦꞧⱤɽᵲᶉꭉ][yÝýỲỳŶŷY̊ẙŸÿỸỹẎẏȲȳỶỷỴỵɎɏƳƴỾỿ]|[rŔŕŘřṘṙŖŗȐȑȒȓṚṛṜṝṞṟR̃r̃ɌɍꞦꞧⱤɽᵲᶉꭉ][iÍíi̇́Ììi̇̀ĬĭÎîǏǐÏïḮḯĨĩi̇̃ĮįĮ́į̇́Į̃į̇̃ĪīĪ̀ī̀ỈỉȈȉI̋i̋ȊȋỊịꞼꞽḬḭƗɨᶖİiIıIi1lĺľļḷḹl̃ḽḻłŀƚꝉⱡɫɬꞎꬷꬸꬹᶅɭȴLl][e3ЄєЕеÉéÈèĔĕÊêẾế ỀềỄễỂểÊ̄ê̄Ê̌ê̌ĚěËëẼẽĖėĖ́ė́Ė̃ė̃ȨȩḜḝĘęĘ́ę́Ę̃ę̃ĒēḖḗḔḕẺẻȄȅE̋e̋ȆȇẸẹỆệḘḙḚḛɆɇE̩e̩È̩è̩É̩é̩ᶒⱸꬴꬳEe])?)?[sŚśṤṥŜŝŠšṦṧṠṡŞşṢṣṨṩȘșS̩s̩ꞨꞩⱾȿꟅʂᶊᵴ]?"), 13 + "kike": regexp.MustCompile("[kḰḱǨǩĶķḲḳḴḵƘƙⱩⱪᶄꝀꝁꝂꝃꝄꝅꞢꞣ][iÍíi̇́Ììi̇̀ĬĭÎîǏǐÏïḮḯĨĩi̇̃ĮįĮ́į̇́Į̃į̇̃ĪīĪ̀ī̀ỈỉȈȉI̋i̋ȊȋỊịꞼꞽḬḭƗɨᶖİiIıIi1lĺľļḷḹl̃ḽḻłŀƚꝉⱡɫɬꞎꬷꬸꬹᶅɭȴLlyÝýỲỳŶŷY̊ẙŸÿỸỹẎẏȲȳỶỷỴỵɎɏƳƴỾỿ][kḰḱǨǩĶķḲḳḴḵƘƙⱩⱪᶄꝀꝁꝂꝃꝄꝅꞢꞣ][e3ЄєЕеÉéÈèĔĕÊêẾếỀềỄễỂểÊ̄ê̄Ê̌ê̌ĚěËëẼẽĖėĖ́ė́Ė̃ė̃ȨȩḜḝ 
ĘęĘ́ę́Ę̃ę̃ĒēḖḗḔḕẺẻȄȅE̋e̋ȆȇẸẹỆệḘḙḚḛɆɇE̩e̩È̩è̩É̩é̩ᶒⱸꬴꬳEe]([rŔŕŘřṘṙŖŗȐȑȒȓṚṛṜṝṞṟR̃r̃ɌɍꞦꞧⱤɽᵲᶉꭉ][yÝýỲỳŶŷY̊ẙŸÿỸỹẎẏȲȳỶỷỴỵɎɏƳƴỾỿ]|[rŔŕŘřṘṙŖŗȐȑȒȓṚṛṜṝṞṟR̃r̃ɌɍꞦꞧⱤɽᵲᶉꭉ][iÍíi̇́Ììi̇̀ĬĭÎîǏǐÏïḮḯĨĩi̇̃ĮįĮ́į̇́Į̃į̇̃ĪīĪ̀ī̀ỈỉȈȉI̋i̋ȊȋỊịꞼꞽḬḭƗɨᶖİiIıIi1lĺľļḷḹl̃ḽḻłŀƚꝉⱡɫ ɬꞎꬷꬸꬹᶅɭȴLl][e3ЄєЕеÉéÈèĔĕÊêẾếỀềỄễỂểÊ̄ê̄Ê̌ê̌ĚěËëẼẽĖėĖ́ė́Ė̃ė̃ȨȩḜḝĘęĘ́ę́Ę̃ę̃ĒēḖḗḔḕẺẻȄȅE̋e̋ȆȇẸẹỆệḘḙḚḛɆɇE̩e̩È̩è̩É̩é̩ᶒⱸꬴꬳEe])?[sŚśṤṥŜŝŠšṦṧṠṡŞşṢṣṨṩȘșS̩s̩ꞨꞩⱾȿꟅʂᶊᵴ]*"), 14 + // modified to not match "snigger" 15 + "nigger": regexp.MustCompile("(^|[^s])[nŃńǸǹŇňÑñṄṅŅņṆṇṊṋṈṉN̈n̈ƝɲŊŋꞐꞑꞤꞥᵰᶇɳȵꬻꬼИиПпNn][iÍíi̇́Ììi̇̀ĬĭÎîǏǐÏïḮḯĨĩi̇̃ĮįĮ́į̇́Į̃į̇̃ĪīĪ̀ī̀ỈỉȈȉI̋i̋ȊȋỊịꞼꞽḬḭƗɨᶖİiIıIi1lĺľļḷḹl̃ḽḻłŀƚꝉⱡɫɬꞎꬷꬸꬹᶅɭȴLloÓóÒòŎŏÔôỐốỒồỖỗỔổǑǒÖöȪȫŐőÕõṌṍṎṏȬȭȮȯO͘o͘ȰȱØøǾǿǪǫǬǭŌōṒṓṐṑỎỏȌȍȎȏƠơỚớỜờỠỡỞởỢợỌọỘộO̩o̩Ò̩ ò̩Ó̩ó̩ƟɵꝊꝋꝌꝍⱺOoІіa4ÁáÀàĂăẮắẰằẴẵẲẳÂâẤấẦầẪẫẨẩǍǎÅåǺǻÄäǞǟÃãȦȧǠǡĄąĄ́ą́Ą̃ą̃ĀāĀ̀ā̀ẢảȀȁA̋a̋ȂȃẠạẶặẬậḀḁȺⱥꞺꞻᶏẚAa][gǴǵĞğĜĝǦǧĠġG̃g̃ĢģḠḡǤǥꞠꞡƓɠᶃꬶGgqꝖꝗꝘꝙɋʠ]{2}([e3ЄєЕеÉéÈèĔĕÊêẾếỀềỄễỂểÊ̄ê̄Ê̌ê̌ĚěËëẼẽĖėĖ́ė́Ė̃ė̃ȨȩḜḝĘęĘ́ę́Ę̃ę̃ĒēḖḗḔḕẺẻȄȅE̋e̋ȆȇẸẹỆệḘḙḚḛɆɇE̩e̩È̩è̩É̩é̩ᶒⱸꬴꬳEeaÁáÀàĂăẮắẰằẴẵẲẳÂâẤấẦầẪẫẨẩǍǎÅåǺǻÄäǞǟÃãȦȧǠǡĄąĄ́ą́Ą̃ą̃ĀāĀ̀ā̀ẢảȀȁA̋a̋ȂȃẠạẶặẬậḀḁȺⱥꞺꞻᶏẚAa][rŔŕŘřṘṙŖŗȐȑȒȓṚṛṜṝṞṟR̃r̃ɌɍꞦꞧⱤɽᵲᶉꭉ ]?|n[ÓóÒòŎŏÔôỐốỒồỖỗỔổǑǒÖöȪȫŐőÕõṌṍṎṏȬȭȮȯO͘o͘ȰȱØøǾǿǪǫǬǭŌōṒṓṐṑỎỏȌȍȎȏƠơỚớỜờỠỡỞởỢợỌọỘộO̩o̩Ò̩ò̩Ó̩ó̩ƟɵꝊꝋꝌꝍⱺOo0][gǴǵĞğĜĝǦǧĠġG̃g̃ĢģḠḡǤǥꞠꞡƓɠᶃꬶGgqꝖꝗꝘꝙɋʠ]|[a4ÁáÀàĂăẮắẰằẴẵẲẳÂâẤấẦầẪẫẨẩǍǎÅåǺǻÄäǞǟÃãȦȧǠǡĄąĄ́ą́Ą̃ą̃ĀāĀ̀ā̀ẢảȀȁA̋a̋ȂȃẠạẶặẬậḀḁȺⱥꞺꞻᶏẚ Aa])[sŚśṤṥŜŝŠšṦṧṠṡŞşṢṣṨṩȘșS̩s̩ꞨꞩⱾȿꟅʂᶊᵴ]?"), 16 + "tranny": regexp.MustCompile("[tŤťṪṫŢţṬṭȚțṰṱṮṯŦŧȾⱦƬƭƮʈT̈ẗᵵƫȶ][rŔŕŘřṘṙŖŗȐȑȒȓṚṛṜṝṞṟR̃r̃ɌɍꞦꞧⱤɽᵲᶉꭉ][aÁáÀàĂăẮắẰằẴẵẲẳÂâẤấẦầẪẫẨẩǍǎÅåǺǻÄäǞǟÃãȦȧǠǡĄąĄ́ą́Ą̃ą̃ĀāĀ̀ā̀ẢảȀȁA̋a̋ȂȃẠạẶặẬậḀḁȺⱥꞺꞻᶏẚAa4]+[nŃńǸǹŇňÑñṄṅŅņṆṇṊṋṈṉN̈n̈ƝɲŊŋꞐꞑꞤꞥᵰᶇɳȵꬻꬼИиПпNn]{1,2}([iÍíi̇́Ììi̇̀ĬĭÎîǏ ǐÏïḮḯĨĩi̇̃ĮįĮ́į̇́Į̃į̇̃ĪīĪ̀ī̀ỈỉȈȉI̋i̋ȊȋỊịꞼꞽḬḭƗɨᶖİiIıIi1lĺľļḷḹl̃ḽḻłŀƚꝉⱡɫɬꞎꬷꬸꬹᶅɭȴLl][e3ЄєЕеÉéÈèĔĕÊêẾếỀềỄễỂểÊ̄ê̄Ê̌ê̌ĚěËëẼẽĖėĖ́ė́Ė̃ė̃ȨȩḜḝĘęĘ́ę́Ę̃ę̃ĒēḖḗḔḕẺẻȄȅE̋e̋ȆȇẸẹỆệḘḙḚḛɆɇE̩e̩È̩è̩É̩é̩ᶒⱸꬴꬳEe]|[yÝýỲỳŶŷY̊ẙŸÿỸỹẎẏȲȳỶỷỴỵɎɏƳƴỾỿ]|[e3ЄєЕеÉéÈèĔĕÊêẾếỀềỄễ 
ỂểÊ̄ê̄Ê̌ê̌ĚěËëẼẽĖėĖ́ė́Ė̃ė̃ȨȩḜḝĘęĘ́ę́Ę̃ę̃ĒēḖḗḔḕẺẻȄȅE̋e̋ȆȇẸẹỆệḘḙḚḛɆɇE̩e̩È̩è̩É̩é̩ᶒⱸꬴꬳEe][rŔŕŘřṘṙŖŗȐȑȒȓṚṛṜṝṞṟR̃r̃ɌɍꞦꞧⱤɽᵲᶉꭉ])[sŚśṤṥŜŝŠšṦṧṠṡŞşṢṣṨṩȘșS̩s̩ꞨꞩⱾȿꟅʂᶊᵴ]?"), 17 + } 18 + 19 + // For a small set of frequently-abused explicit slurs, checks for a of permissive set of "l33t-speak" variations of the keyword. This is intended to be used with pre-processed "slugs", which are strings with all whitespace, punctuation, and other characters removed. These could be pre-processed identifiers (like handles or record keys), or pre-processed free-form text. 20 + // 21 + // If there is a match, returns a plan-text version of the slur. 22 + // 23 + // This is a loose port of the 'hasExplicitSlur' function from the `@atproto/pds` TypeScript package. 24 + func SlugContainsExplicitSlur(raw string) string { 25 + for word, r := range explicitSlurRegexes { 26 + if r.MatchString(raw) { 27 + return word 28 + } 29 + } 30 + return "" 31 + } 32 + 33 + // Variant of `SlugContainsExplicitSlur` where the entire slug must match. 34 + func SlugIsExplicitSlur(raw string) string { 35 + for word, r := range explicitSlurRegexes { 36 + m := r.FindString(raw) 37 + if m != "" && m == raw { 38 + return word 39 + } 40 + } 41 + return "" 42 + }
+69
automod/keyword/slur_regex_test.go
··· 1 + package keyword 2 + 3 + import ( 4 + "testing" 5 + 6 + "github.com/stretchr/testify/assert" 7 + ) 8 + 9 + func TestSlugContainsExplicitSlur(t *testing.T) { 10 + assert := assert.New(t) 11 + 12 + fixtures := []struct { 13 + text string 14 + contains string 15 + is string 16 + }{ 17 + {contains: "", is: "", text: ""}, 18 + {contains: "", is: "", text: "hello"}, 19 + {contains: "chink", is: "chink", text: "chink"}, 20 + {contains: "faggot", is: "faggot", text: "faggot"}, 21 + {contains: "faggot", is: "faggot", text: "f4gg0t"}, 22 + {contains: "coon", is: "coon", text: "coon"}, 23 + {contains: "coon", is: "coon", text: "coons"}, 24 + {contains: "", is: "", text: "raccoon"}, 25 + {contains: "", is: "", text: "racoon"}, 26 + {contains: "", is: "", text: "tycoon"}, 27 + {contains: "", is: "", text: "cocoon"}, 28 + {contains: "kike", is: "kike", text: "kike"}, 29 + {contains: "nigger", is: "nigger", text: "nigger"}, 30 + {contains: "nigger", is: "nigger", text: "niggers"}, 31 + {contains: "nigger", is: "nigger", text: "n1gg4"}, 32 + {contains: "nigger", is: "nigger", text: "niggas"}, 33 + {contains: "", is: "", text: "niggle"}, 34 + {contains: "", is: "", text: "niggling"}, 35 + {contains: "", is: "", text: "snigger"}, 36 + {contains: "tranny", is: "tranny", text: "tranny"}, 37 + {contains: "tranny", is: "tranny", text: "trannie"}, 38 + {contains: "tranny", is: "", text: "blahtrannie"}, 39 + } 40 + 41 + for _, fix := range fixtures { 42 + assert.Equal(fix.contains, SlugContainsExplicitSlur(fix.text)) 43 + assert.Equal(fix.is, SlugIsExplicitSlur(fix.text)) 44 + } 45 + } 46 + 47 + func TestStringContainsExplicitSlur(t *testing.T) { 48 + assert := assert.New(t) 49 + 50 + fixtures := []struct { 51 + text string 52 + out string 53 + }{ 54 + {out: "", text: ""}, 55 + {out: "", text: "hello"}, 56 + {out: "chink", text: "CHINK"}, 57 + {out: "faggot", text: "f-a-g-g-o-t"}, 58 + {out: "faggot", text: "f a g g o t"}, 59 + {out: "faggot", text: "f\na\ng\ng\no\nt"}, 60 + {out: 
"kike", text: "kike"}, 61 + {out: "nigger", text: "niggers"}, 62 + {out: "nigger", text: "niggers.bsky.social"}, 63 + {out: "tranny", text: "trannie"}, 64 + } 65 + 66 + for _, fix := range fixtures { 67 + assert.Equal(fix.out, SlugContainsExplicitSlur(Slugify(fix.text))) 68 + } 69 + }
+51
automod/keyword/tokenize.go
···
+package keyword
+
+import (
+	"log/slog"
+	"regexp"
+	"strings"
+	"unicode"
+
+	"golang.org/x/text/runes"
+	"golang.org/x/text/transform"
+	"golang.org/x/text/unicode/norm"
+)
+
+var (
+	puncChars     = regexp.MustCompile(`[[:punct:]]+`)
+	nonTokenChars = regexp.MustCompile(`[^\pL\pN\s]+`)
+	normFunc      = transform.Chain(norm.NFD, runes.Remove(runes.In(unicode.Mn)), norm.NFC)
+)
+
+// Splits free-form text in to tokens, including lower-case, unicode normalization, and some unicode folding.
+//
+// The intent is for this to work similarly to an NLP tokenizer, as might be used in a fulltext search engine, and enable fast matching to a list of known tokens. It might eventually even do stemming, removing pluralization (trailing "s" for English), etc.
+func TokenizeText(text string) []string {
+	split := strings.ToLower(nonTokenChars.ReplaceAllString(text, " "))
+	bare := strings.ToLower(nonTokenChars.ReplaceAllString(split, ""))
+	norm, _, err := transform.String(normFunc, bare)
+	if err != nil {
+		slog.Warn("unicode normalization error", "err", err)
+		norm = bare
+	}
+	return strings.Fields(norm)
+}
+
+func splitIdentRune(c rune) bool {
+	return !unicode.IsLetter(c) && !unicode.IsNumber(c)
+}
+
+// Splits an identifier in to tokens. Removes any single-character tokens.
+//
+// For example, the-handle.bsky.social would be split in to ["the", "handle", "bsky", "social"]
+func TokenizeIdentifier(orig string) []string {
+	fields := strings.FieldsFunc(orig, splitIdentRune)
+	out := make([]string, 0, len(fields))
+	for _, v := range fields {
+		tok := Slugify(v)
+		if len(tok) > 1 {
+			out = append(out, tok)
+		}
+	}
+	return out
+}
+42
automod/keyword/tokenize_test.go
···
+package keyword
+
+import (
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+)
+
+func TestTokenizeText(t *testing.T) {
+	assert := assert.New(t)
+
+	fixtures := []struct {
+		text string
+		out  []string
+	}{
+		{text: "", out: []string{}},
+		{text: "Hello, โลก!", out: []string{"hello", "โลก"}},
+		{text: "Gdańsk", out: []string{"gdansk"}},
+		{text: " foo1;bar2,baz3...", out: []string{"foo1", "bar2", "baz3"}},
+	}
+
+	for _, fix := range fixtures {
+		assert.Equal(fix.out, TokenizeText(fix.text))
+	}
+}
+
+func TestTokenizeIdentifier(t *testing.T) {
+	assert := assert.New(t)
+
+	fixtures := []struct {
+		ident string
+		out   []string
+	}{
+		{ident: "", out: []string{}},
+		{ident: "the-handle.example.com", out: []string{"the", "handle", "example", "com"}},
+		{ident: "@a-b-c", out: []string{}},
+	}
+
+	for _, fix := range fixtures {
+		assert.Equal(fix.out, TokenizeIdentifier(fix.ident))
+	}
+}
+7 -3
automod/rules/all.go
···
 			//AccountDemoPostRule,
 			AccountPrivateDemoPostRule,
 			GtubePostRule,
-			KeywordPostRule,
-			ReplySingleKeywordPostRule,
+			BadWordPostRule,
+			ReplySingleBadWordPostRule,
 			AggressivePromotionRule,
 			IdenticalReplyPostRule,
 			DistinctMentionsRule,
···
 		},
 		ProfileRules: []automod.ProfileRuleFunc{
 			GtubeProfileRule,
-			KeywordProfileRule,
+			BadWordProfileRule,
 		},
 		RecordRules: []automod.RecordRuleFunc{
 			InteractionChurnRule,
+			BadWordRecordKeyRule,
+			BadWordOtherRecordRule,
 		},
 		RecordDeleteRules: []automod.RecordRuleFunc{
 			DeleteInteractionRule,
 		},
 		IdentityRules: []automod.IdentityRuleFunc{
 			NewAccountRule,
+			BadWordHandleRule,
+			BadWordDIDRule,
 		},
 		BlobRules: []automod.BlobRuleFunc{
 			//BlobVerifyRule,
+5 -2
automod/rules/example_sets.json
···
 {
   "bad-hashtags": [
-    "slur",
     "deathtooutgroup"
   ],
   "bad-words": [
-    "hardar"
+    "hardar",
+    "veryhardar"
+  ],
+  "worst-words": [
+    "veryhardar"
   ],
   "promo-domain": [
     "buy-crypto.example.com"
+11 -6
automod/rules/hashtags.go
··· 3 3 import ( 4 4 appbsky "github.com/bluesky-social/indigo/api/bsky" 5 5 "github.com/bluesky-social/indigo/automod" 6 + "github.com/bluesky-social/indigo/automod/keyword" 6 7 ) 7 - 8 - var _ automod.PostRuleFunc = BadHashtagsPostRule 9 8 10 9 // looks for specific hashtags from known lists 11 10 func BadHashtagsPostRule(c *automod.RecordContext, post *appbsky.FeedPost) error { 12 - for _, tag := range ExtractHashtags(post) { 11 + for _, tag := range ExtractHashtagsPost(post) { 13 12 tag = NormalizeHashtag(tag) 14 - if c.InSet("bad-hashtags", tag) { 13 + if c.InSet("bad-hashtags", tag) || c.InSet("bad-words", tag) { 15 14 c.AddRecordFlag("bad-hashtag") 16 15 c.Notify("slack") 17 16 break 18 17 } 18 + word := keyword.SlugContainsExplicitSlur(keyword.Slugify(tag)) 19 + if word != "" { 20 + c.AddAccountFlag("bad-hashtag") 21 + } 19 22 } 20 23 return nil 21 24 } 22 25 23 - var _ automod.PostRuleFunc = TooManyHashtagsPostRule 26 + var _ automod.PostRuleFunc = BadHashtagsPostRule 24 27 25 28 // if a post is "almost all" hashtags, it might be a form of search spam 26 29 func TooManyHashtagsPostRule(c *automod.RecordContext, post *appbsky.FeedPost) error { 27 - tags := ExtractHashtags(post) 30 + tags := ExtractHashtagsPost(post) 28 31 tagChars := 0 29 32 for _, tag := range tags { 30 33 tagChars += len(tag) ··· 40 43 } 41 44 return nil 42 45 } 46 + 47 + var _ automod.PostRuleFunc = TooManyHashtagsPostRule
+25 -15
automod/rules/helpers.go
··· 3 3 import ( 4 4 "fmt" 5 5 "regexp" 6 - "strings" 7 - "unicode" 8 6 9 7 appbsky "github.com/bluesky-social/indigo/api/bsky" 10 8 "github.com/bluesky-social/indigo/atproto/syntax" 11 9 "github.com/bluesky-social/indigo/automod" 10 + "github.com/bluesky-social/indigo/automod/keyword" 12 11 13 12 "github.com/spaolacci/murmur3" 14 13 ) ··· 25 24 return out 26 25 } 27 26 28 - func ExtractHashtags(post *appbsky.FeedPost) []string { 27 + func ExtractHashtagsPost(post *appbsky.FeedPost) []string { 29 28 var tags []string 30 29 for _, tag := range post.Tags { 31 30 tags = append(tags, tag) ··· 41 40 } 42 41 43 42 func NormalizeHashtag(raw string) string { 44 - return strings.ToLower(raw) 43 + return keyword.Slugify(raw) 45 44 } 46 45 47 46 type PostFacet struct { ··· 117 116 return dedupeStrings(out) 118 117 } 119 118 120 - // NOTE: this function has not been optimiszed at all! 121 - func ExtractTextTokens(raw string) []string { 122 - raw = strings.ToLower(raw) 123 - f := func(c rune) bool { 124 - return !unicode.IsLetter(c) && !unicode.IsNumber(c) 119 + func ExtractTextTokensPost(post *appbsky.FeedPost) []string { 120 + s := post.Text 121 + if post.Embed != nil { 122 + if post.Embed.EmbedImages != nil { 123 + for _, img := range post.Embed.EmbedImages.Images { 124 + if img.Alt != "" { 125 + s += " " + img.Alt 126 + } 127 + } 128 + } 129 + if post.Embed.EmbedRecordWithMedia != nil { 130 + media := post.Embed.EmbedRecordWithMedia.Media 131 + if media.EmbedImages != nil { 132 + for _, img := range media.EmbedImages.Images { 133 + if img.Alt != "" { 134 + s += " " + img.Alt 135 + } 136 + } 137 + } 138 + } 125 139 } 126 - return strings.FieldsFunc(raw, f) 127 - } 128 - 129 - func ExtractTextTokensPost(post *appbsky.FeedPost) []string { 130 - return ExtractTextTokens(post.Text) 140 + return keyword.TokenizeText(s) 131 141 } 132 142 133 143 func ExtractTextTokensProfile(profile *appbsky.ActorProfile) []string { ··· 138 148 if profile.DisplayName != nil { 139 149 s += " " + 
*profile.DisplayName 140 150 } 141 - return ExtractTextTokens(s) 151 + return keyword.TokenizeText(s) 142 152 } 143 153 144 154 // based on: https://stackoverflow.com/a/48769624, with no trailing period allowed
+2 -1
automod/rules/helpers_test.go
··· 3 3 import ( 4 4 "testing" 5 5 6 + "github.com/bluesky-social/indigo/automod/keyword" 6 7 "github.com/stretchr/testify/assert" 7 8 ) 8 9 ··· 28 29 } 29 30 30 31 for _, fix := range fixtures { 31 - assert.Equal(fix.out, ExtractTextTokens(fix.s)) 32 + assert.Equal(fix.out, keyword.TokenizeText(fix.s)) 32 33 } 33 34 } 34 35
+190 -14
automod/rules/keyword.go
··· 2 2 3 3 import ( 4 4 "fmt" 5 + "strings" 5 6 6 7 appbsky "github.com/bluesky-social/indigo/api/bsky" 7 8 "github.com/bluesky-social/indigo/automod" 9 + "github.com/bluesky-social/indigo/automod/keyword" 8 10 ) 9 11 10 - var _ automod.PostRuleFunc = KeywordPostRule 11 - 12 - func KeywordPostRule(c *automod.RecordContext, post *appbsky.FeedPost) error { 12 + func BadWordPostRule(c *automod.RecordContext, post *appbsky.FeedPost) error { 13 13 for _, tok := range ExtractTextTokensPost(post) { 14 - if c.InSet("bad-words", tok) { 15 - c.AddRecordFlag("bad-word") 16 - c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad-word: %s", tok)) 14 + word := keyword.SlugIsExplicitSlur(tok) 15 + // used very frequently in a reclaimed context 16 + if word != "" && word != "faggot" && word != "tranny" { 17 + c.AddRecordFlag("bad-word-text") 18 + // TODO: c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad word in post text: %s", word)) 19 + c.Notify("slack") 20 + break 21 + } 22 + // de-pluralize 23 + tok = strings.TrimSuffix(tok, "s") 24 + if c.InSet("worst-words", tok) { 25 + c.AddRecordFlag("bad-word-text") 26 + // TODO: c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad word in post text: %s", word)) 27 + c.Notify("slack") 17 28 break 18 29 } 19 30 } 20 31 return nil 21 32 } 22 33 23 - var _ automod.ProfileRuleFunc = KeywordProfileRule 34 + var _ automod.PostRuleFunc = BadWordPostRule 24 35 25 - func KeywordProfileRule(c *automod.RecordContext, profile *appbsky.ActorProfile) error { 36 + func BadWordProfileRule(c *automod.RecordContext, profile *appbsky.ActorProfile) error { 37 + if profile.DisplayName != nil { 38 + word := keyword.SlugContainsExplicitSlur(keyword.Slugify(*profile.DisplayName)) 39 + if word != "" { 40 + c.AddRecordFlag("bad-word-name") 41 + // TODO: c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad word in display name: %s", word)) 42 + c.Notify("slack") 43 + } 44 + } 26 45 for _, tok := range ExtractTextTokensProfile(profile) { 27 - 
if c.InSet("bad-words", tok) { 28 - c.AddRecordFlag("bad-word") 29 - c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad-word: %s", tok)) 46 + // de-pluralize 47 + tok = strings.TrimSuffix(tok, "s") 48 + if c.InSet("worst-words", tok) { 49 + c.AddRecordFlag("bad-word-text") 50 + // TODO: c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad word in profile description: %s", word)) 51 + c.Notify("slack") 30 52 break 31 53 } 32 54 } 33 55 return nil 34 56 } 35 57 36 - var _ automod.PostRuleFunc = ReplySingleKeywordPostRule 58 + var _ automod.ProfileRuleFunc = BadWordProfileRule 37 59 38 - func ReplySingleKeywordPostRule(c *automod.RecordContext, post *appbsky.FeedPost) error { 60 + // looks for the specific harassment situation of a replay to another user with only a single word 61 + func ReplySingleBadWordPostRule(c *automod.RecordContext, post *appbsky.FeedPost) error { 39 62 if post.Reply != nil && !IsSelfThread(c, post) { 40 63 tokens := ExtractTextTokensPost(post) 41 - if len(tokens) == 1 && c.InSet("bad-words", tokens[0]) { 64 + if len(tokens) != 1 { 65 + return nil 66 + } 67 + tok := tokens[0] 68 + if c.InSet("bad-words", tok) || keyword.SlugIsExplicitSlur(tok) != "" { 42 69 c.AddRecordFlag("reply-single-bad-word") 70 + c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad single-word reply: %s", tok)) 71 + c.Notify("slack") 43 72 } 44 73 } 45 74 return nil 46 75 } 76 + 77 + var _ automod.PostRuleFunc = ReplySingleBadWordPostRule 78 + 79 + // scans for bad keywords in records other than posts and profiles 80 + func BadWordOtherRecordRule(c *automod.RecordContext) error { 81 + name := "" 82 + text := "" 83 + switch c.RecordOp.Collection.String() { 84 + case "app.bsky.graph.list": 85 + list, ok := c.RecordOp.Value.(*appbsky.GraphList) 86 + if !ok { 87 + return fmt.Errorf("mismatch between collection (%s) and type", c.RecordOp.Collection) 88 + } 89 + name += " " + list.Name 90 + if list.Description != nil { 91 + text += " " + *list.Description 92 + 
		}
+		if list.Purpose != nil {
+			text += " " + *list.Purpose
+		}
+	case "app.bsky.feed.generator":
+		generator, ok := c.RecordOp.Value.(*appbsky.FeedGenerator)
+		if !ok {
+			return fmt.Errorf("mismatch between collection (%s) and type", c.RecordOp.Collection)
+		}
+		name += " " + generator.DisplayName
+		if generator.Description != nil {
+			text += " " + *generator.Description
+		}
+	}
+	if name != "" {
+		// check for explicit slurs or bad word tokens
+		word := keyword.SlugContainsExplicitSlur(keyword.Slugify(name))
+		if word != "" {
+			c.AddRecordFlag("bad-word-name")
+			// TODO: c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad word in name: %s", tok))
+			c.Notify("slack")
+		}
+		tokens := keyword.TokenizeText(name)
+		for _, tok := range tokens {
+			if c.InSet("bad-words", tok) {
+				c.AddRecordFlag("bad-word-name")
+				// TODO: c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad word in name: %s", tok))
+				c.Notify("slack")
+				break
+			}
+		}
+	}
+	if text != "" {
+		// check for explicit slurs or worst word tokens
+		word := keyword.SlugContainsExplicitSlur(keyword.Slugify(text))
+		if word != "" {
+			c.AddRecordFlag("bad-word-text")
+			// TODO: c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad word in description: %s", word))
+			c.Notify("slack")
+		}
+		tokens := keyword.TokenizeText(text)
+		for _, tok := range tokens {
+			// de-pluralize
+			tok = strings.TrimSuffix(tok, "s")
+			if c.InSet("worst-words", tok) {
+				c.AddRecordFlag("bad-word-text")
+				// TODO: c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad word in description: %s", tok))
+				c.Notify("slack")
+				break
+			}
+		}
+	}
+	return nil
+}
+
+var _ automod.RecordRuleFunc = BadWordOtherRecordRule
+
+// scans the record-key for all records
+func BadWordRecordKeyRule(c *automod.RecordContext) error {
+	// check record key
+	word := keyword.SlugIsExplicitSlur(keyword.Slugify(c.RecordOp.RecordKey.String()))
+	if word != "" {
+		c.AddRecordFlag("bad-word-recordkey")
+		// TODO: c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad word in record-key (URL): %s", word))
+		c.Notify("slack")
+	}
+	tokens := keyword.TokenizeIdentifier(c.RecordOp.RecordKey.String())
+	for _, tok := range tokens {
+		if c.InSet("bad-words", tok) {
+			c.AddRecordFlag("bad-word-recordkey")
+			// TODO: c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad word in record-key (URL): %s", tok))
+			c.Notify("slack")
+			break
+		}
+	}
+
+	return nil
+}
+
+var _ automod.RecordRuleFunc = BadWordRecordKeyRule
+
+func BadWordHandleRule(c *automod.AccountContext) error {
+	word := keyword.SlugContainsExplicitSlur(keyword.Slugify(c.Account.Identity.Handle.String()))
+	if word != "" {
+		c.AddAccountFlag("bad-word-handle")
+		// TODO: c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad word in handle (username): %s", word))
+		c.Notify("slack")
+		return nil
+	}
+
+	tokens := keyword.TokenizeIdentifier(c.Account.Identity.Handle.String())
+	for _, tok := range tokens {
+		if c.InSet("bad-words", tok) {
+			c.AddAccountFlag("bad-word-handle")
+			// TODO: c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad word in handle (username): %s", tok))
+			c.Notify("slack")
+			break
+		}
+	}
+
+	return nil
+}
+
+var _ automod.IdentityRuleFunc = BadWordHandleRule
+
+func BadWordDIDRule(c *automod.AccountContext) error {
+	if c.Account.Identity.DID.Method() == "plc" {
+		return nil
+	}
+	word := keyword.SlugContainsExplicitSlur(keyword.Slugify(c.Account.Identity.DID.String()))
+	if word != "" {
+		c.AddAccountFlag("bad-word-did")
+		// TODO: c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad word in DID (account identifier): %s", word))
+		c.Notify("slack")
+		return nil
+	}
+
+	tokens := keyword.TokenizeIdentifier(c.Account.Identity.DID.String())
+	for _, tok := range tokens {
+		if c.InSet("bad-words", tok) {
+			c.AddAccountFlag("bad-word-did")
+			// TODO: c.ReportRecord(automod.ReportReasonRude, fmt.Sprintf("bad word in DID (account identifier): %s", tok))
+			c.Notify("slack")
+			break
+		}
+	}
+
+	return nil
+}
+
+var _ automod.IdentityRuleFunc = BadWordDIDRule
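The rules above all follow the same two-stage pattern: first a fuzzy check against the slugified string, then an exact lookup of each token in a configured word set. The following is a minimal standalone sketch of that pattern, not the actual `automod/keyword` implementation. The helpers `slugify`, `tokenize`, and `checkName` are hypothetical stand-ins, the fuzzy stage is reduced to a plain substring test instead of the package's regexes, no diacritic folding is done, and the word lists are harmless placeholders.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// slugify is a simplified stand-in for keyword.Slugify (assumption):
// lowercase the input and drop everything that is not a letter or digit.
func slugify(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// tokenize is a simplified stand-in for keyword.TokenizeText (assumption):
// lowercase the input and split on any non-letter, non-digit character.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// checkName returns the flag name if either stage matches, or "" otherwise.
// Stage 1: substring match of explicit terms against the slug.
// Stage 2: exact match of individual tokens against a word set.
func checkName(name string, explicit []string, badWords map[string]bool) string {
	slug := slugify(name)
	for _, w := range explicit {
		if strings.Contains(slug, w) {
			return "bad-word-name"
		}
	}
	for _, tok := range tokenize(name) {
		if badWords[tok] {
			return "bad-word-name"
		}
	}
	return ""
}

func main() {
	// Placeholder lists; a real deployment would load configured sets.
	explicit := []string{"examplebad"}
	badWords := map[string]bool{"blocked": true}

	// Punctuation and spacing are removed by slugify, so stage 1 matches.
	fmt.Printf("%q\n", checkName("My ex.ample-bad list", explicit, badWords))
	// "blocked" is a whole token, so stage 2 matches.
	fmt.Printf("%q\n", checkName("a blocked feed", explicit, badWords))
	// "unblocked" is a different token, so nothing matches.
	fmt.Printf("%q\n", checkName("unblocked feed", explicit, badWords))
}
```

The token stage only matches whole tokens, which is why a word that merely contains a listed term does not trigger it; the slug stage is the one that catches terms split up by punctuation or spacing.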
+102
automod/rules/keyword_test.go
+package rules
+
+import (
+	"context"
+	"testing"
+
+	appbsky "github.com/bluesky-social/indigo/api/bsky"
+	"github.com/bluesky-social/indigo/atproto/identity"
+	"github.com/bluesky-social/indigo/atproto/syntax"
+	"github.com/bluesky-social/indigo/automod"
+	"github.com/bluesky-social/indigo/automod/engine"
+
+	"github.com/stretchr/testify/assert"
+)
+
+func TestBadWordHandleRule(t *testing.T) {
+	assert := assert.New(t)
+	ctx := context.Background()
+
+	eng := engine.EngineTestFixture()
+	am1 := automod.AccountMeta{
+		Identity: &identity.Identity{
+			DID:    syntax.DID("did:plc:abc111"),
+			Handle: syntax.Handle("handle.example.com"),
+		},
+	}
+	am2 := automod.AccountMeta{
+		Identity: &identity.Identity{
+			DID:    syntax.DID("did:plc:abc222"),
+			Handle: syntax.Handle("hardr.example.com"),
+		},
+	}
+	am3 := automod.AccountMeta{
+		Identity: &identity.Identity{
+			DID:    syntax.DID("did:plc:abc333"),
+			Handle: syntax.Handle("f.agg.ot"),
+		},
+	}
+
+	// clean handle: the rule should not add any account flags
+	ac1 := engine.NewAccountContext(ctx, &eng, am1)
+	assert.NoError(BadWordHandleRule(&ac1))
+	eff1 := engine.ExtractEffects(&ac1.BaseContext)
+	assert.Empty(eff1.AccountFlags)
+
+	ac2 := engine.NewAccountContext(ctx, &eng, am2)
+	assert.NoError(BadWordHandleRule(&ac2))
+	eff2 := engine.ExtractEffects(&ac2.BaseContext)
+	assert.Equal([]string{"bad-word-handle"}, eff2.AccountFlags)
+
+	ac3 := engine.NewAccountContext(ctx, &eng, am3)
+	assert.NoError(BadWordHandleRule(&ac3))
+	eff3 := engine.ExtractEffects(&ac3.BaseContext)
+	assert.Equal([]string{"bad-word-handle"}, eff3.AccountFlags)
+}
+
+func TestBadWordPostRule(t *testing.T) {
+	assert := assert.New(t)
+	ctx := context.Background()
+
+	eng := engine.EngineTestFixture()
+	am1 := automod.AccountMeta{
+		Identity: &identity.Identity{
+			DID:    syntax.DID("did:plc:abc111"),
+			Handle: syntax.Handle("handle.example.com"),
+		},
+	}
+
+	// record key
+	cid1 := syntax.CID("cid123")
+	p1 := appbsky.FeedPost{
+		Text: "some post blah",
+	}
+	op := engine.RecordOp{
+		Action:     engine.CreateOp,
+		DID:        am1.Identity.DID,
+		Collection: syntax.NSID("app.bsky.feed.post"),
+		RecordKey:  syntax.RecordKey("fagg0t"),
+		CID:        &cid1,
+		Value:      p1,
+	}
+	c1 := engine.NewRecordContext(ctx, &eng, am1, op)
+	assert.NoError(BadWordRecordKeyRule(&c1))
+	eff1 := engine.ExtractEffects(&c1.BaseContext)
+	assert.Equal([]string{"bad-word-recordkey"}, eff1.RecordFlags)
+
+	// token in body
+	p2 := appbsky.FeedPost{
+		Text: "some post hardestr blah",
+	}
+	op2 := engine.RecordOp{
+		Action:     engine.CreateOp,
+		DID:        am1.Identity.DID,
+		Collection: syntax.NSID("app.bsky.feed.post"),
+		RecordKey:  syntax.RecordKey("abc123"),
+		CID:        &cid1,
+		Value:      p2,
+	}
+	c2 := engine.NewRecordContext(ctx, &eng, am1, op2)
+	assert.NoError(BadWordPostRule(&c2, &p2))
+	eff2 := engine.ExtractEffects(&c2.BaseContext)
+	assert.Equal([]string{"bad-word-text"}, eff2.RecordFlags)
+}