Mirror of https://github.com/roostorg/awesome-safety-tools
# awesome-safety-tools
A curated collection of open source tools for online safety

Inspired by prior work like [Awesome Redteaming](https://github.com/yeyintminthuhtut/Awesome-Red-Teaming/) and [Awesome Phishing](https://github.com/PhishyAlice/awesome-phishing).

Help out by opening a pull request to add more resources and tools!

## Hash Matching
* [Hasher-Matcher-Actioner (HMA) by Meta](https://github.com/facebook/ThreatExchange/tree/main/hasher-matcher-actioner)
  * hashing algorithms, matching functions, and the ability to hook into actions
* [PDQ by Meta](https://github.com/facebook/ThreatExchange/tree/main/pdq)
  * perceptual hash algorithm for images
* [TMK by Meta](https://github.com/facebook/ThreatExchange/tree/main/tmk)
  * visual similarity matching for videos
* [vPDQ by Meta](https://github.com/facebook/ThreatExchange/tree/main/vpdq)
  * visual similarity matching for videos using the PDQ algorithm
* [Hasher-Matcher-Actioner (CLIP demo)](https://github.com/juanmrad/HMA-CLIP-demo)
  * HMA extension for CLIP, serving as a reference for adding other format extensions
* [Perception by Thorn](https://github.com/thorn-oss/perception)
  * provides a common wrapper around existing, popular perceptual hashes (such as those implemented by ImageHash)
* [Altitude by Jigsaw](https://github.com/jigsaw-code/altitude)
  * web UI and hash matching for violent extremism and terrorism content
* [Lattice Extract by Adobe](https://github.com/adobe/lattice_extract)
  * grid and lattice detection to guard against false positives in hash matching
* [RocketChat CSAM](https://github.com/prostasia/rocketchatcsam)
  * CSAM hash matching for RocketChat
* [MediaModeration (MediaWiki extension)](https://github.com/wikimedia/mediawiki-extensions-MediaModeration?tab=readme-ov-file)
  * CSAM hash matching for Wikimedia

## Classification
* [OSmod by Jigsaw](https://github.com/conversationai/conversationai-moderator)
  * toolkit of machine learning (ML) tools, models, and APIs that platforms can use to moderate content
* [Perspective API by Jigsaw](https://github.com/conversationai/perspectiveapi)
  * machine-learning-powered tool that helps platforms detect and assess the toxicity of online conversations
* [Presidio by Microsoft](https://github.com/microsoft/presidio)
  * toolset for detecting personally identifiable information (PII) and other sensitive data in images and text
* [Llama Guard by Meta](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)
  * AI-powered content moderation model to detect harm in text-based interactions
* [Purple Llama by Meta](https://github.com/meta-llama/PurpleLlama)
  * set of tools to assess and improve LLM security; includes Llama Guard, CyberSec Eval, and Code Shield
* [ShieldGemma by Google DeepMind](https://www.kaggle.com/code/fernandosr85/shieldgemma-web-content-safety-analyzer?scriptVersionId=198456916)
  * AI safety toolkit designed to help detect and mitigate harmful or unsafe outputs in LLM applications
* [Roblox Voice Safety Classifier](https://github.com/Roblox/voice-safety-classifier)
  * machine learning model that detects and moderates harmful content in real-time voice chat on Roblox; focuses on spoken-language detection
* [Detoxify by Unitary AI](https://github.com/unitaryai/detoxify)
  * detects generalized toxic language (including hate speech, harassment, and bullying) in text
* [NSFW Filtering](https://github.com/nsfw-filter/nsfw-filter)
  * user-facing browser extension that blocks explicit images on online platforms
* [NSFW Keras Model](https://github.com/GantMan/nsfw_model)
  * convolutional neural network (CNN)-based ML model for detecting explicit images
* [Guardrails AI](https://github.com/guardrails-ai/guardrails)
  * a Python framework that helps build safe AI applications by checking input/output for predefined risks

## Core Infrastructure
* [Mjolnir by Matrix](https://github.com/matrix-org/mjolnir)
  * moderation bot for the Matrix protocol that automatically enforces content policies
* [AbuseIO](https://github.com/AbuseIO/AbuseIO)
  * abuse management platform designed to help organizations handle and track abuse complaints related to online content, infrastructure, or services
* [Ozone by Bluesky](https://github.com/bluesky-social/ozone)
  * labeling tool designed for Bluesky; includes moderation features to action abuse flags, policy enforcement tools, and investigation features
* [Open Truss by GitHub](https://github.com/open-truss/open-truss)
  * framework designed to help users create internal tools without needing to write code
* [Access by Discord](https://github.com/discord/access)
  * a centralized portal for managing access to internal systems within any organization

## Clustering
* [SpamAssassin by Apache](https://spamassassin.apache.org)
  * anti-spam platform that uses a variety of techniques, including text analysis, Bayesian filtering, and DNS blocklists, to classify and block unsolicited email
* [scikit-learn](https://github.com/scikit-learn/scikit-learn)
  * Python library that includes clustering via various algorithms, such as K-Means, DBSCAN, and hierarchical clustering

## Rules Engines
* [RulesEngine by Microsoft](https://microsoft.github.io/RulesEngine/)
  * a library for abstracting business logic, rules, and policies from a system via JSON, for the .NET family of languages
* [Marble](https://github.com/checkmarble/marble)
  * a real-time fraud detection and compliance engine tailored for fintech companies and financial institutions
* [Automod](https://github.com/bluesky-social/indigo/tree/main/automod)
  * a tool for automating content moderation for the Bluesky social network and other apps on the AT Protocol
* [Wikimedia Smite Spam](https://github.com/wikimedia/mediawiki-extensions-SmiteSpam)
  * an extension for MediaWiki that helps identify and manage spam content on a wiki

## Review
* [RabbitMQ](https://github.com/rabbitmq)
  * a message broker that enables applications to communicate with each other by sending messages through queues
* [BullMQ](https://github.com/taskforcesh/bullmq)
  * message queue and batch processing for Node.js and Python, based on Redis
* [Owlculus](https://github.com/be0vlk/owlculus)
  * an OSINT (open-source intelligence) toolkit and case management platform
* [NCMEC Reporting by Ello](https://github.com/ello/ncmec_reporting)
  * a Ruby client library for reporting incidents to the National Center for Missing & Exploited Children (NCMEC) CyberTipline

## Threat Intelligence
* [ThreatExchange by Meta](https://github.com/facebook/ThreatExchange)
  * a platform that enables organizations to share information about threats, such as malware, phishing attacks, and online safety harms, in a structured and privacy-compliant manner
* [ThreatExchange Client via PHP](https://github.com/certly/threatexchange)
  * a PHP client for ThreatExchange
* [ThreatExchange via Python](https://github.com/facebook/ThreatExchange/tree/main/python-threatexchange)
  * a Python library for ThreatExchange
* [FediCheck](https://about.iftas.org/activities/moderation-as-a-service/fedicheck/)
  * a web service designed to assist ActivityPub service providers, such as Mastodon servers

## Datasets
* [Aegis Content Safety by NVIDIA](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0)
  * a dataset created by NVIDIA to aid in content moderation and toxicity detection
* [Toxicity by Jigsaw](https://huggingface.co/datasets/google/jigsaw_toxicity_pred)
  * a large number of Wikipedia comments labeled by human raters for toxic behavior
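The perceptual-hash matchers listed under Hash Matching (PDQ, vPDQ, HMA) generally share one core step: two fixed-length bit hashes "match" when their Hamming distance falls under a threshold. A minimal pure-Python sketch of that step, with made-up 16-bit hash values and an illustrative threshold (real PDQ hashes are 256-bit):

```python
# Sketch of the matching step behind perceptual-hash systems: count the
# differing bits between a query hash and each banked hash, and report
# banked entries within a distance threshold. Hash values here are
# invented for illustration, not real PDQ output.

def hamming_distance(a: int, b: int) -> int:
    """Number of bits that differ between two equal-length bit hashes."""
    return bin(a ^ b).count("1")

def matches(query: int, bank: dict[str, int], threshold: int = 8) -> list[str]:
    """Return IDs of banked hashes within `threshold` bits of `query`."""
    return [name for name, h in bank.items()
            if hamming_distance(query, h) <= threshold]

bank = {
    "banned-image-1": 0b1011_0110_1100_0011,
    "banned-image-2": 0b0000_1111_0000_1111,
}

# A near-duplicate: banned-image-1 with its two lowest bits flipped.
query = 0b1011_0110_1100_0000
print(matches(query, bank))  # ['banned-image-1']
```

Tolerating a small Hamming distance is what lets these systems catch re-encoded or lightly edited copies that an exact cryptographic hash would miss.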
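For the Clustering section, grouping similar items (say, embeddings of abuse reports from the same spam campaign) with scikit-learn's K-Means looks roughly like this; the feature vectors are toy data, not output of any listed tool:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D feature vectors: the first three and last three rows are
# meant to stand in for embeddings from two distinct spam campaigns.
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

# n_init controls restarts; random_state makes the run reproducible.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # first three rows share one label, last three the other
```

Swapping in `DBSCAN` or `AgglomerativeClustering` from the same `sklearn.cluster` module is a near drop-in change, which is why the library pairs well with exploratory abuse-pattern analysis.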
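The Rules Engines section's tools share a pattern: policy lives as data (often JSON) evaluated against events, so enforcement logic can change without redeploying code. A minimal illustration of that pattern in Python; the rule schema and field names are invented for this sketch and do not reproduce any listed engine's format:

```python
import json

# Hypothetical JSON rule format, loosely in the spirit of JSON-driven
# engines such as Microsoft's RulesEngine: each rule names a field, an
# operator, a comparison value, and an action to take when it fires.
RULES_JSON = """
[
  {"field": "report_count", "op": "gte", "value": 3, "action": "queue_for_review"},
  {"field": "account_age_days", "op": "lt", "value": 1, "action": "rate_limit"}
]
"""

OPS = {"gte": lambda a, b: a >= b, "lt": lambda a, b: a < b}

def evaluate(event: dict, rules: list[dict]) -> list[str]:
    """Return the action of every rule that matches the event."""
    return [r["action"] for r in rules
            if OPS[r["op"]](event[r["field"]], r["value"])]

rules = json.loads(RULES_JSON)
event = {"report_count": 5, "account_age_days": 0.5}
print(evaluate(event, rules))  # ['queue_for_review', 'rate_limit']
```

Keeping the operator table (`OPS`) separate from the rule data is what makes the rules safely editable by non-engineers: new thresholds need only a JSON change.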