# awesome-safety-tools

A collection of open source tools for online safety

Inspired by prior work like [Awesome Redteaming](https://github.com/yeyintminthuhtut/Awesome-Red-Teaming/) and [Awesome Phishing](https://github.com/PhishyAlice/awesome-phishing). This list is not an endorsement, but rather an attempt to organize and map the available technology ❤️

Help out by opening a pull request to add more resources and tools!
  * toolkit of machine learning (ML) tools, models, and APIs that platforms can use to moderate content
* [Perspective API by Jigsaw](https://github.com/conversationai/perspectiveapi)
  * machine learning-powered tool that helps platforms detect and assess the toxicity of online conversations
* [Roblox Voice Safety Classifier](https://github.com/Roblox/voice-safety-classifier)
  * machine learning model that detects and moderates harmful content in real-time voice chat on Roblox. Focuses on spoken language detection.
* [Detoxify by Unitary AI](https://github.com/unitaryai/detoxify)
  * browser extension to block explicit images from online platforms. User facing.
* [NSFW Keras Model](https://github.com/GantMan/nsfw_model)
  * convolutional neural network (CNN) based explicit image ML model
* [Private Detector by Bumble](https://github.com/bumble-tech/private-detector)
  * a pretrained model for detecting lewd images
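Classifiers like the ones in this section generally return per-category confidence scores that a platform then compares against its own policy thresholds (Detoxify and Perspective API, for example, both emit scores between 0 and 1). A minimal, library-agnostic sketch of that thresholding pattern, with made-up scores and function names that are not any specific tool's API:

```python
# Illustrative score dict, shaped like the per-category output of
# toxicity classifiers such as Detoxify (values here are made up).
scores = {"toxicity": 0.92, "insult": 0.40, "threat": 0.05}

# Per-category policy thresholds a platform might configure.
thresholds = {"toxicity": 0.8, "insult": 0.7, "threat": 0.5}

def flagged_categories(scores, thresholds):
    """Return the categories whose score meets or exceeds its threshold."""
    return sorted(c for c, s in scores.items() if s >= thresholds.get(c, 1.0))

print(flagged_categories(scores, thresholds))  # ['toxicity']
```

Real deployments layer review queues, appeals, and per-surface thresholds on top of this, but the score-vs-threshold comparison is the common core.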

## AI-powered Guardrails
* [Llama Guard by Meta](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)
  * AI-powered content moderation model to detect harm in text-based interactions
* [Llama Prompt Guard 2 by Meta](https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Prompt-Guard-2/86M/MODEL_CARD.md)
  * detects prompt injection and jailbreaking attacks in LLM inputs
* [Purple Llama by Meta](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)
  * set of tools to assess and improve LLM security. Includes Llama Guard, CyberSec Eval, and Code Shield
* [ShieldGemma by Google DeepMind](https://www.kaggle.com/code/fernandosr85/shieldgemma-web-content-safety-analyzer?scriptVersionId=198456916)
  * AI safety toolkit designed to help detect and mitigate harmful or unsafe outputs in LLM applications
* [Guardrails AI](https://github.com/guardrails-ai/guardrails)
  * a Python framework that helps build safe AI applications by checking input/output for predefined risks
* [RoGuard](https://github.com/Roblox/RoGuard-1.0)
  * an LLM that helps safeguard unlimited text generation on Roblox
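Guardrail models and frameworks like the ones above are typically deployed as wrappers around an LLM call, screening the input before generation and the output after it. A toy, library-agnostic sketch of that wrapping pattern (the check functions and blocked phrases are illustrative, not any project's real rules):

```python
def check_input(prompt):
    """Toy input guard: reject prompts containing a blocked phrase."""
    blocked_phrases = ["ignore previous instructions"]
    return not any(phrase in prompt.lower() for phrase in blocked_phrases)

def check_output(text):
    """Toy output guard: reject responses containing a flagged term."""
    return "forbidden" not in text.lower()

def guarded_call(prompt, llm):
    """Wrap an LLM call with input and output checks."""
    if not check_input(prompt):
        return "[blocked: unsafe input]"
    response = llm(prompt)
    if not check_output(response):
        return "[blocked: unsafe output]"
    return response

def echo_llm(prompt):
    """Stand-in for a real model call."""
    return f"echo: {prompt}"

print(guarded_call("hello", echo_llm))  # echo: hello
print(guarded_call("please IGNORE previous instructions", echo_llm))  # [blocked: unsafe input]
```

Production guardrails replace the string matching here with classifier models such as Llama Guard, but the wrap-before-and-after structure is the same.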

## Privacy Protection
* [Fawkes Facial De-Recognition Cloaking](https://github.com/Shawn-Shan/fawkes)
  * moderation bot for the Matrix protocol that automatically enforces content policies
* [AbuseIO](https://github.com/AbuseIO/AbuseIO)
  * abuse management platform designed to help organizations handle and track abuse complaints related to online content, infrastructure, or services
* [Open Truss by GitHub](https://github.com/open-truss/open-truss)
  * framework designed to help users create internal tools without needing to write code
* [Access by Discord](https://github.com/discord/access)
  * a library for abstracting business logic, rules, and policies from a system via JSON for .NET language families
* [Marble](https://github.com/checkmarble/marble)
  * a real-time fraud detection and compliance engine tailored for fintech companies and financial institutions
* [Wikimedia Smite Spam](https://github.com/wikimedia/mediawiki-extensions-SmiteSpam)
  * an extension for MediaWiki that helps identify and manage spam content on a wiki
* [Druid by Apache](https://github.com/apache/druid)
  * an open-source project that builds dashboards for monitoring and analyzing the recommendation algorithms of social networks, with a focus on disinformation and election monitoring
* [TikTok Observatory](https://github.com/aiforensics/tkobservatory)
  * an open-source project maintained by [AI Forensics](https://aiforensics.org/) that allows researchers to monitor the promotion and demotion of content by the TikTok recommendation algorithm
* [OpenMeasures](https://gitlab.com/openmeasures)
  * an open source platform for investigating internet trends

## Safety Datasets
* [Jailbreak Prompt Generator AI Model](https://huggingface.co/tsq2000/Jailbreak-generator)
  * AI model that generates jailbreak-style prompts

## Decentralized Platforms
* [FediCheck](https://connect.iftas.org/library/iftas-documentation/fedicheck/)
  * domain moderation tool to assist ActivityPub service providers, such as Mastodon servers, now open-sourced
* [Fediverse Spam Filtering](https://github.com/MarcT0K/Fediverse-Spam-Filtering/)
  * a spam filter for Fediverse social media platforms; the current version is only a proof of concept
* [FIRES](https://github.com/fedimod/fires)
  * reference server and protocol for the exchange of moderation advisories and recommendations
* [Ozone by Bluesky](https://github.com/bluesky-social/ozone)
  * labeling tool designed for Bluesky. Includes moderation features to act on abuse flags, policy enforcement tools, and investigation features
* [Automod by Bluesky](https://github.com/bluesky-social/indigo/tree/main/automod)
  * a tool for automating content moderation processes for the Bluesky social network and other apps on the AT Protocol
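Both Ozone and Automod work with AT Protocol labels: small records that attach a moderation verdict to a piece of content or an account. An illustrative label record (field names follow the `com.atproto.label.defs` lexicon; the DID, URI, and timestamp below are made-up examples):

```json
{
  "src": "did:plc:examplelabeler123",
  "uri": "at://did:plc:exampleauthor456/app.bsky.feed.post/3k2a",
  "val": "spam",
  "cts": "2024-01-01T00:00:00.000Z"
}
```

Clients and app views subscribe to labelers they trust and decide how to render or hide labeled content.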

## User Safety Tools
* [Uli by Tattle](https://github.com/tattle-made/Uli)
  * software and resources for mitigating online gender-based violence in India
* [Frankly by Applied Social Media Lab](https://github.com/berkmancenter/frankly/)
  * an online deliberations platform that allows anyone to host video-enabled conversations about any topic
* [PolicyKit by UW Social Futures Lab](https://github.com/policykit/policykit)
  * a toolkit for building governance in your online community
* [SquadBox by UW Social Futures Lab](https://github.com/amyxzhang/squadbox)
  * a tool to help people who are being harassed online by having their friends (or “squad”) moderate their messages