# awesome-safety-tools
A curated collection of open source tools for online safety

Inspired by prior work like [Awesome Red Teaming](https://github.com/yeyintminthuhtut/Awesome-Red-Teaming/) and [Awesome Phishing](https://github.com/PhishyAlice/awesome-phishing).

Help out by opening a pull request to add more resources and tools!

## Hash Matching
* [Hasher-Matcher-Actioner (HMA) by Meta](https://github.com/facebook/ThreatExchange/tree/main/hasher-matcher-actioner)
  * hashing algorithm, matching function, and ability to hook into actions
* [PDQ by Meta](https://github.com/facebook/ThreatExchange/tree/main/pdq)
  * perceptual hash algorithm for images
* [TMK by Meta](https://github.com/facebook/ThreatExchange/tree/main/tmk)
  * visual similarity match for videos
* [VPDQ](https://github.com/facebook/ThreatExchange/tree/main/vpdq)
  * visual similarity match for videos using the PDQ algorithm
* [Hasher-Matcher-Actioner (CLIP demo)](https://github.com/juanmrad/HMA-CLIP-demo)
  * HMA extension for CLIP, as a reference for adding other format extensions
* [Perception](https://github.com/thorn-oss/perception)
  * provides a common wrapper around existing, popular perceptual hashes (such as those implemented by ImageHash)
* [Altitude by Jigsaw](https://github.com/jigsaw-code/altitude)
  * web UI and hash matching for violent extremism and terrorism content
* [Lattice Extract by Adobe](https://github.com/adobe/lattice_extract)
  * grid and lattice detection to guard against false positives in hash matching
* [RocketChat CSAM](https://github.com/prostasia/rocketchatcsam)
  * CSAM hash matching for RocketChat
* [MediaModeration (Wiki Extension)](https://github.com/wikimedia/mediawiki-extensions-MediaModeration?tab=readme-ov-file)
  * CSAM hash matching for Wikimedia
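
For readers new to the idea, perceptual hash matching usually comes down to comparing fixed-length bit strings by Hamming distance and treating anything under a threshold as a likely match. The sketch below illustrates that idea only; it does not use the API of any tool listed above. The function names, the toy 64-bit hash values, and the `max_distance` threshold are all assumptions made for illustration, and real deployments should take hash sizes and thresholds from the documentation of the algorithm they use.

```python
def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Count the differing bits between two equal-length hex-encoded hashes."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def find_matches(query: str, bank: dict, max_distance: int) -> list:
    """Return (label, distance) pairs within max_distance bits, closest first."""
    hits = []
    for label, known_hash in bank.items():
        distance = hamming_distance(query, known_hash)
        if distance <= max_distance:
            hits.append((label, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Toy 64-bit hashes (hypothetical values, not produced by any real algorithm).
bank = {
    "known-item-a": "f0f0f0f0f0f0f0f0",
    "known-item-b": "0123456789abcdef",
}
query = "f0f0f0f0f0f0f0f1"  # differs from known-item-a by a single bit

print(find_matches(query, bank, max_distance=4))
```

A linear scan like this is fine for small hash banks; larger banks generally need an index built for nearest-neighbor search.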

## Classification
* [OSmod by Jigsaw](https://github.com/conversationai/conversationai-moderator)
  * toolkit of machine learning (ML) tools, models, and APIs that platforms can use to moderate content
* [Perspective API by Jigsaw](https://github.com/conversationai/perspectiveapi)
  * machine learning-powered tool that helps platforms detect and assess the toxicity of online conversations
* [Presidio by Microsoft](https://github.com/microsoft/presidio)
  * toolset for detecting Personally Identifiable Information (PII) and other sensitive data in images and text
* [Llama Guard by Meta](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)
  * AI-powered content moderation model to detect harm in text-based interactions
* [Purple Llama by Meta](https://github.com/meta-llama/PurpleLlama)
  * set of tools to assess and improve LLM security. Includes Llama Guard, CyberSec Eval, and Code Shield
* [ShieldGemma by Google DeepMind](https://www.kaggle.com/code/fernandosr85/shieldgemma-web-content-safety-analyzer?scriptVersionId=198456916)
  * safety content moderation models designed to help detect and mitigate harmful or unsafe outputs in LLM applications
* [Roblox Voice Safety Classifier](https://github.com/Roblox/voice-safety-classifier)
  * machine learning model that detects and moderates harmful content in real-time voice chat on Roblox. Focuses on spoken language detection.
* [Detoxify by Unitary AI](https://github.com/unitaryai/detoxify)
  * detects and mitigates generalized toxic language (including hate speech, harassment, bullying) in text
* [NSFW Filtering](https://github.com/nsfw-filter/nsfw-filter)
  * user-facing browser extension that blocks explicit images on online platforms
* [NSFW Keras Model](https://github.com/GantMan/nsfw_model)
  * convolutional neural network (CNN) based ML model for explicit images
* [Guardrails AI](https://github.com/guardrails-ai/guardrails)
  * a Python framework that helps build safe AI applications by checking inputs and outputs for predefined risks

## Core Infrastructure
* [Mjolnir by Matrix](https://github.com/matrix-org/mjolnir)
  * moderation bot for the Matrix protocol that automatically enforces content policies
* [AbuseIO](https://github.com/AbuseIO/AbuseIO)
  * abuse management platform designed to help organizations handle and track abuse complaints related to online content, infrastructure, or services
* [Ozone by Bluesky](https://github.com/bluesky-social/ozone)
  * labeling tool designed for Bluesky. Includes moderation features to act on abuse flags, policy enforcement tools, and investigation features
* [Open Truss by GitHub](https://github.com/open-truss/open-truss)
  * framework designed to help users create internal tools without needing to write code
* [Access by Discord](https://github.com/discord/access)
  * a centralized portal for managing access to internal systems within any organization

## Clustering
* [SpamAssassin by Apache](https://spamassassin.apache.org)
  * anti-spam platform that uses a variety of techniques, including text analysis, Bayesian filtering, and DNS blocklists, to classify and block unsolicited email
* [scikit-learn](https://github.com/scikit-learn/scikit-learn)
  * Python library that includes clustering algorithms such as K-Means, DBSCAN, and hierarchical clustering

## Rules Engines
* [RulesEngine by Microsoft](https://microsoft.github.io/RulesEngine/)
  * a library for abstracting business logic, rules, and policies from a system via JSON for .NET language families
* [Marble](https://github.com/checkmarble/marble)
  * a real-time fraud detection and compliance engine tailored for fintech companies and financial institutions
* [Automod](https://github.com/bluesky-social/indigo/tree/main/automod)
  * a tool for automating content moderation processes for the Bluesky social network and other apps on the AT Protocol
* [Wikimedia Smite Spam](https://github.com/wikimedia/mediawiki-extensions-SmiteSpam)
  * an extension for MediaWiki that helps identify and manage spam content on a wiki
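
As a rough illustration of what "rules defined as data" means, the sketch below keeps rules in JSON and evaluates them against an event in a few lines of Python. It is not the rule format or API of any engine listed above: the rule schema (`field`, `op`, `value`, `action`), the rule names, and the `evaluate` function are all hypothetical.

```python
import json
import operator

# Hypothetical rule schema: compare one event field against a value and,
# when the comparison holds, emit the named action.
RULES_JSON = """
[
  {"name": "link-heavy-post", "field": "links_per_post", "op": "gt",
   "value": 5, "action": "queue_for_review"},
  {"name": "blocked-domain", "field": "domain", "op": "eq",
   "value": "spam.example", "action": "block"}
]
"""

# Supported comparison operators (an illustrative subset).
OPS = {"gt": operator.gt, "lt": operator.lt, "eq": operator.eq}


def evaluate(event: dict, rules: list) -> list:
    """Return the action of every rule whose condition holds for the event."""
    actions = []
    for rule in rules:
        value = event.get(rule["field"])
        # Rules that reference a field missing from the event are skipped.
        if value is not None and OPS[rule["op"]](value, rule["value"]):
            actions.append(rule["action"])
    return actions


rules = json.loads(RULES_JSON)
print(evaluate({"links_per_post": 8, "domain": "news.example"}, rules))
```

Keeping rules outside the code lets policy teams change thresholds without a deploy, which is the main reason these engines exist.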

## Review
* [RabbitMQ](https://github.com/rabbitmq)
  * a message broker that enables applications to communicate with each other by sending messages through queues
* [BullMQ](https://github.com/taskforcesh/bullmq)
  * message queue and batch processing for Node.js and Python based on Redis
* [Owlculus](https://github.com/be0vlk/owlculus)
  * an OSINT (Open-Source Intelligence) toolkit and case management platform
* [NCMEC Reporting by ello](https://github.com/ello/ncmec_reporting)
  * a Ruby client library for reporting incidents to the National Center for Missing & Exploited Children (NCMEC) CyberTipline

## Threat Intelligence
* [ThreatExchange by Meta](https://github.com/facebook/ThreatExchange)
  * a platform that enables organizations to share information about threats, such as malware, phishing attacks, and online safety harms in a structured and privacy-compliant manner
* [ThreatExchange Client via PHP](https://github.com/certly/threatexchange)
  * a PHP client for ThreatExchange
* [ThreatExchange via Python](https://github.com/facebook/ThreatExchange/tree/main/python-threatexchange)
  * a Python library for ThreatExchange
* [FediCheck](https://about.iftas.org/activities/moderation-as-a-service/fedicheck/)
  * a web service designed to assist ActivityPub service providers, such as Mastodon servers, with moderation

## Datasets
* [Aegis Content Safety by NVIDIA](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0)
  * a dataset created by NVIDIA to aid in content moderation and toxicity detection
* [Toxicity by Jigsaw](https://huggingface.co/datasets/google/jigsaw_toxicity_pred)
  * a large set of Wikipedia comments labeled by human raters for toxic behavior