# awesome-safety-tools

A collection of open source tools for online safety

Inspired by prior work like [Awesome Redteaming](https://github.com/yeyintminthuhtut/Awesome-Red-Teaming/) and [Awesome Phishing](https://github.com/PhishyAlice/awesome-phishing). This list is not an endorsement, but rather an attempt to organize and map the available technology ❤️

Help out by opening a pull request to add more resources and tools!
  * toolkit of machine learning (ML) tools, models, and APIs that platforms can use to moderate content
* [Perspective API by Jigsaw](https://github.com/conversationai/perspectiveapi)
  * machine learning-powered tool that helps platforms detect and assess the toxicity of online conversations
* [Roblox Voice Safety Classifier](https://github.com/Roblox/voice-safety-classifier)
  * machine learning model that detects and moderates harmful content in real-time voice chat on Roblox. Focuses on spoken language detection.
* [Detoxify by Unitary AI](https://github.com/unitaryai/detoxify)
  * browser extension to block explicit images from online platforms. User facing.
* [NSFW Keras Model](https://github.com/GantMan/nsfw_model)
  * convolutional neural network (CNN) based explicit image ML model
* [Private Detector by Bumble](https://github.com/bumble-tech/private-detector)
  * a pretrained model for detecting lewd images
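Classifiers like the ones in this section generally return per-category confidence scores that a platform then compares against its own policy thresholds (Detoxify and Perspective API, for example, both emit scores between 0 and 1). A minimal, library-agnostic sketch of that thresholding pattern, with made-up scores and function names that are not any specific tool's API:

```python
# Illustrative score dict, shaped like the per-category output of
# toxicity classifiers such as Detoxify (values here are made up).
scores = {"toxicity": 0.92, "insult": 0.40, "threat": 0.05}

# Per-category policy thresholds a platform might configure.
thresholds = {"toxicity": 0.8, "insult": 0.7, "threat": 0.5}

def flagged_categories(scores, thresholds):
    """Return the categories whose score meets or exceeds its threshold."""
    return sorted(c for c, s in scores.items() if s >= thresholds.get(c, 1.0))

print(flagged_categories(scores, thresholds))  # ['toxicity']
```

Real deployments layer review queues, appeals, and per-surface thresholds on top of this, but the score-vs-threshold comparison is the common core.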

## AI-powered Guardrails
* [Llama Guard by Meta](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)
  * AI-powered content moderation model to detect harm in text-based interactions
* [Llama Prompt Guard 2 by Meta](https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Prompt-Guard-2/86M/MODEL_CARD.md)
  * detects prompt injection and jailbreaking attacks in LLM inputs
* [Purple Llama by Meta](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)
  * set of tools to assess and improve LLM security. Includes Llama Guard, CyberSec Eval, and Code Shield
* [ShieldGemma by Google DeepMind](https://www.kaggle.com/code/fernandosr85/shieldgemma-web-content-safety-analyzer?scriptVersionId=198456916)
  * AI safety toolkit designed to help detect and mitigate harmful or unsafe outputs in LLM applications
* [Guardrails AI](https://github.com/guardrails-ai/guardrails)
  * a Python framework that helps build safe AI applications by checking input/output for predefined risks
* [RoGuard](https://github.com/Roblox/RoGuard-1.0)
  * an LLM that helps safeguard unlimited text generation on Roblox
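Guardrail models and frameworks like the ones above are typically deployed as wrappers around an LLM call, screening the input before generation and the output after it. A toy, library-agnostic sketch of that wrapping pattern (the check functions and blocked phrases are illustrative, not any project's real rules):

```python
def check_input(prompt):
    """Toy input guard: reject prompts containing a blocked phrase."""
    blocked_phrases = ["ignore previous instructions"]
    return not any(phrase in prompt.lower() for phrase in blocked_phrases)

def check_output(text):
    """Toy output guard: reject responses containing a flagged term."""
    return "forbidden" not in text.lower()

def guarded_call(prompt, llm):
    """Wrap an LLM call with input and output checks."""
    if not check_input(prompt):
        return "[blocked: unsafe input]"
    response = llm(prompt)
    if not check_output(response):
        return "[blocked: unsafe output]"
    return response

def echo_llm(prompt):
    """Stand-in for a real model call."""
    return f"echo: {prompt}"

print(guarded_call("hello", echo_llm))  # echo: hello
print(guarded_call("please IGNORE previous instructions", echo_llm))  # [blocked: unsafe input]
```

Production guardrails replace the string matching here with classifier models such as Llama Guard, but the wrap-before-and-after structure is the same.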

## Privacy Protection
* [Fawkes Facial De-Recognition Cloaking](https://github.com/Shawn-Shan/fawkes)
  * moderation bot for the Matrix protocol that automatically enforces content policies
* [AbuseIO](https://github.com/AbuseIO/AbuseIO)
  * abuse management platform designed to help organizations handle and track abuse complaints related to online content, infrastructure, or services
* [Open Truss by GitHub](https://github.com/open-truss/open-truss)
  * framework designed to help users create internal tools without needing to write code
* [Access by Discord](https://github.com/discord/access)
  * a library for abstracting business logic, rules, and policies from a system via JSON for .NET language families
* [Marble](https://github.com/checkmarble/marble)
  * a real-time fraud detection and compliance engine tailored for fintech companies and financial institutions
* [Wikimedia Smite Spam](https://github.com/wikimedia/mediawiki-extensions-SmiteSpam)
  * an extension for MediaWiki that helps identify and manage spam content on a wiki
* [Druid by Apache](https://github.com/apache/druid)
  * an open-source project that builds dashboards for monitoring and analyzing the recommendation algorithms of social networks, with a focus on disinformation and election monitoring
* [TikTok Observatory](https://github.com/aiforensics/tkobservatory)
  * an open-source project maintained by [AI Forensics](https://aiforensics.org/) that allows researchers to monitor the promotion and demotion of content by the TikTok recommendation algorithm
* [OpenMeasures](https://gitlab.com/openmeasures)
  * an open source platform for investigating internet trends

## Safety Datasets
* [Jailbreak Prompt Generator AI Model](https://huggingface.co/tsq2000/Jailbreak-generator)
  * AI model that generates jailbreak-style prompts

## Decentralized Platforms
* [FediCheck](https://connect.iftas.org/library/iftas-documentation/fedicheck/)
  * domain moderation tool to assist ActivityPub service providers, such as Mastodon servers, now open-sourced
* [Fediverse Spam Filtering](https://github.com/MarcT0K/Fediverse-Spam-Filtering/)
  * a spam filter for Fediverse social media platforms; the current version is only a proof of concept
* [FIRES](https://github.com/fedimod/fires)
  * reference server and protocol for the exchange of moderation advisories and recommendations
* [Ozone by Bluesky](https://github.com/bluesky-social/ozone)
  * labeling tool designed for Bluesky. Includes moderation features to act on abuse flags, policy enforcement tools, and investigation features
* [Automod by Bluesky](https://github.com/bluesky-social/indigo/tree/main/automod)
  * a tool for automating content moderation processes for the Bluesky social network and other apps on the AT Protocol
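Both Ozone and Automod work with AT Protocol labels: small records that attach a moderation verdict to a piece of content or an account. An illustrative label record (field names follow the `com.atproto.label.defs` lexicon; the DID, URI, and timestamp below are made-up examples):

```json
{
  "src": "did:plc:examplelabeler123",
  "uri": "at://did:plc:exampleauthor456/app.bsky.feed.post/3k2a",
  "val": "spam",
  "cts": "2024-01-01T00:00:00.000Z"
}
```

Clients and app views subscribe to labelers they trust and decide how to render or hide labeled content.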

## User Safety Tools
* [Uli by Tattle](https://github.com/tattle-made/Uli)
  * software and resources for mitigating online gender-based violence in India
* [Frankly by Applied Social Media Lab](https://github.com/berkmancenter/frankly/)
  * an online deliberations platform that allows anyone to host video-enabled conversations about any topic
* [PolicyKit by UW Social Futures Lab](https://github.com/policykit/policykit)
  * a toolkit for building governance in your online community
* [SquadBox by UW Social Futures Lab](https://github.com/amyxzhang/squadbox)
  * a tool to help people who are being harassed online by having their friends (or “squad”) moderate their messages