Mirror of https://github.com/roostorg/awesome-safety-tools

Added tools from the UW Social Futures Lab, RoGuard from Roblox, and Frankly from the Applied Social Media Lab; reorganized decentralized platform tools

+30 -19
README.md
```diff
 # awesome-safety-tools
-A curated collection of open source tools for online safety
+A collection of open source tools for online safety
 
 
-Inspired by prior work like [Awesome Redteaming](https://github.com/yeyintminthuhtut/Awesome-Red-Teaming/) and [Awesome Phishing](https://github.com/PhishyAlice/awesome-phishing).
+Inspired by prior work like [Awesome Redteaming](https://github.com/yeyintminthuhtut/Awesome-Red-Teaming/) and [Awesome Phishing](https://github.com/PhishyAlice/awesome-phishing). This list is not an endorsement, but rather an attempt to organize and map the available technology ❤️
 
 Help and contribute by adding a pull request to add more resources and tools!
 
···
   * toolkit of machine learning (ML) tools, models, and APIs that platforms can use to moderate content
 * [Perspective API by Jigsaw](https://github.com/conversationai/perspectiveapi)
   * machine learning-powered tool that helps platforms detect and assess the toxicity of online conversations
-* [Llama Guard by Meta](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)
-  * AI-powered content moderation model to detect harm in text-based interactions
-* [Llama Prompt Guard 2 by Meta](https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Prompt-Guard-2/86M/MODEL_CARD.md)
-  * Detects prompt injection and jailbreaking attacks in LLM inputs.
-* [Purple Llama by Meta](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)
-  * set of tools to assess and improve LLM security. Includes Llama Guard, CyberSec Eval, and Code Shield
-* [ShieldGemma by Google DeepMind](https://www.kaggle.com/code/fernandosr85/shieldgemma-web-content-safety-analyzer?scriptVersionId=198456916)
-  * AI safety toolkit by Google DeepMind designed to help detect and mitigate harmful or unsafe outputs in LLM applications
 * [Roblox Voice Safety Classifier](https://github.com/Roblox/voice-safety-classifier)
   * machine learning model that detects and moderates harmful content in real-time voice chat on Roblox. Focuses on spoken language detection.
 * [Detoxify by Unitary AI](https://github.com/unitaryai/detoxify)
···
   * browser extension to block explicit images from online platforms. User facing.
 * [NSFW Keras Model](https://github.com/GantMan/nsfw_model)
   * convolutional neural network (CNN) based explicit image ML model
+* [Private Detector by Bumble](https://github.com/bumble-tech/private-detector)
+  * a pretrained model for detecting lewd images
+
+## AI-powered Guardrails
+* [Llama Guard by Meta](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)
+  * AI-powered content moderation model to detect harm in text-based interactions
+* [Llama Prompt Guard 2 by Meta](https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Prompt-Guard-2/86M/MODEL_CARD.md)
+  * Detects prompt injection and jailbreaking attacks in LLM inputs.
+* [Purple Llama by Meta](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)
+  * set of tools to assess and improve LLM security. Includes Llama Guard, CyberSec Eval, and Code Shield
+* [ShieldGemma by Google DeepMind](https://www.kaggle.com/code/fernandosr85/shieldgemma-web-content-safety-analyzer?scriptVersionId=198456916)
+  * AI safety toolkit by Google DeepMind designed to help detect and mitigate harmful or unsafe outputs in LLM applications
 * [Guardrails AI](https://github.com/guardrails-ai/guardrails)
   * a Python framework that helps build safe AI applications checking input/output for predefined risks
-* [Private Detector by Bumble](https://github.com/bumble-tech/private-detector)
-  * a pretrained model for detecting lewd images
+* [RoGuard](https://github.com/Roblox/RoGuard-1.0)
+  * an LLM that helps safeguard unlimited text generation on Roblox
 
 ## Privacy Protection
 * [Fawkes Facial De-Recognition Cloaking](https://github.com/Shawn-Shan/fawkes)
···
   * moderation bot for the Matrix protocol that automatically enforces content policies
 * [AbuseIO](https://github.com/AbuseIO/AbuseIO)
   * abuse management platform designed to help organizations handle and track abuse complaints related to online content, infrastructure, or services
-* [Ozone by Bluesky](https://github.com/bluesky-social/ozone)
-  * labeling tool designed for Bluesky. Includes moderation features to act on abuse flags, policy enforcement tools, and investigation features
 * [Open Truss by Github](https://github.com/open-truss/open-truss)
   * framework designed to help users create internal tools without needing to write code
 * [Access by Discord](https://github.com/discord/access)
···
   * a library for abstracting business logic, rules, and policies from a system via JSON for .NET language families
 * [Marble](https://github.com/checkmarble/marble)
   * a real-time fraud detection and compliance engine tailored for fintech companies and financial institutions
-* [Automod by Bluesky](https://github.com/bluesky-social/indigo/tree/main/automod)
-  * a tool for automating content moderation processes for the Bluesky social network and other apps on the AT Protocol
 * [Wikimedia Smite Spam](https://github.com/wikimedia/mediawiki-extensions-SmiteSpam)
   * an extension for MediaWiki that helps identify and manage spam content on a wiki
 * [Druid by Apache](https://github.com/apache/druid)
···
   * An open-source project that builds dashboards for monitoring and analyzing the recommendation algorithms of social networks, with a focus on disinformation and election monitoring.
 * [TikTok Observatory](https://github.com/aiforensics/tkobservatory)
   * An open-source project maintained by [AI Forensics](https://aiforensics.org/) that allows researchers to monitor the promotion and demotion of content by the TikTok recommendation algorithm.
+* [OpenMeasures](https://gitlab.com/openmeasures)
+  * an open source platform for investigating internet trends
 
 
 ## Safety Datasets
···
 * [Jailbreak Prompt Generator AI Model](https://huggingface.co/tsq2000/Jailbreak-generator)
   * AI model that generates jailbreak-style prompts.
 
-## Fediverse
+## Decentralized Platforms
 * [FediCheck](https://connect.iftas.org/library/iftas-documentation/fedicheck/)
   * domain moderation tool to assist ActivityPub service providers, such as Mastodon servers, now open-sourced.
-
 * [Fediverse Spam Filtering](https://github.com/MarcT0K/Fediverse-Spam-Filtering/)
   * a spam filter for Fediverse social media platforms. For now, the current version is only a proof of concept.
-
 * [FIRES](https://github.com/fedimod/fires)
   * reference server + protocol for the exchange of moderation advisories and recommendations
+* [Ozone by Bluesky](https://github.com/bluesky-social/ozone)
+  * labeling tool designed for Bluesky. Includes moderation features to act on abuse flags, policy enforcement tools, and investigation features
+* [Automod by Bluesky](https://github.com/bluesky-social/indigo/tree/main/automod)
+  * a tool for automating content moderation processes for the Bluesky social network and other apps on the AT Protocol
 
 ## User Safety Tools
 * [Uli by Tattle](https://github.com/tattle-made/Uli)
   * Software and Resources for Mitigating Online Gender Based Violence in India
+* [Frankly by Applied Social Media Lab](https://github.com/berkmancenter/frankly/)
+  * an online deliberations platform that allows anyone to host video-enabled conversations about any topic
+* [PolicyKit by UW Social Futures Lab](https://github.com/policykit/policykit)
+  * a toolkit for building governance in your online community
+* [SquadBox by UW Social Futures Lab](https://github.com/amyxzhang/squadbox)
+  * a tool to help people who are being harassed online by having their friends (or “squad”) moderate their messages
 
```