Mirror of https://github.com/roostorg/awesome-safety-tools
0
fork

Configure Feed

Select the types of activity you want to include in your feed.

Merging in changes from T&S Tooling Consortium

Thanks to @marielpovolny, @dsiegel17, @jenrweed

authored by

juliet and committed by
GitHub
45631ffb 035d3f85

+34 -1
+34 -1
README.md
··· 64 64 * framework designed to help users create internal tools without needing to write code 65 65 * [Access by Discord](https://github.com/discord/access) 66 66 * a centralized portal for managing access to internal systems within any organization 67 + 68 + ## Redteaming Tools 69 + 70 + * [PyRIT Documentation](https://azure.github.io/PyRIT/) 71 + * Microsoft’s Python-based tool for AI red teaming and security testing. 72 + * [AI Benchmarking Tool](https://github.com/LLM-Canary/LLM-Canary) 73 + * Evaluates AI models for security vulnerabilities and adversarial robustness. 74 + * [Prompt Fuzzer Red Teaming Tool](https://github.com/prompt-security/ps-fuzz) 75 + * Tool for testing prompt injection vulnerabilities in AI systems. 76 + * [Open Source Red Teaming Tool – Nvidia](https://github.com/NVIDIA/garak) 77 + * Framework for adversarial testing and model evaluation. 78 + * [Tool that Enables Models to Chat with One Another](https://github.com/socketteer?tab=repositories) 79 + * Allows AI models to interact, helping test conversational weaknesses. 80 + * [Microsoft AI Tool – Counterfit](https://github.com/Azure/counterfit/) 81 + * Automation tool for assessing AI model security and robustness. 67 82 68 83 ## Clustering 69 84 * [SpamAssassin by Apache](https://spamassassin.apache.org) ··· 103 118 * [FediCheck]( https://about.iftas.org/activities/moderation-as-a-service/fedicheck/) 104 119 * a web service designed to assist ActivityPub service providers, such as Mastodon servers 105 120 106 - ## Datasets 121 + ## Safety Datasets 107 122 * [Aegis Content Safety by NVIDIA](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) 108 123 * a dataset created by NVIDIA to aid in content moderation and toxicity detection 109 124 * [Toxicity by Jigsaw](https://huggingface.co/datasets/google/jigsaw_toxicity_pred) 110 125 * a large number of Wikipedia comments which have been labeled by human raters for toxic behavior 126 + 127 + ## Red Teaming Datasets 128 + * [Red Team Resistance Leaderboard](https://huggingface.co/spaces/HaizeLabs/red-teaming-resistance-benchmark) 129 + * rankings of AI models based on resistance to adversarial attacks. 130 + * [SidFeel Jailbreak Dataset](https://github.com/sidfeels/PromptsDB) 131 + * a collection of prompts used for jailbreaking AI models. 132 + * [HackAPrompt Jailbreak Dataset](https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset/viewer/default/train?p=1&row=137) 133 + * a dataset for testing AI vulnerability to prompt-based jailbreaking. 134 + * [HiroKachi Jailbreak Dataset](https://sizu.me/love) 135 + * adataset focused on adversarial AI prompt attacks. 136 + * [Rentry Jailbreak Datasets](https://rentry.org/gpt0721) 137 + * collection of datasets related to jailbreak attempts on AI models. 138 + * [DEFCOM Red Teaming Dataset](https://github.com/humane-intelligence/ai_village_defcon_grt_data) 139 + * dataset from DEF CON’s AI red teaming event. 140 + * [Anthropic’s AI Alignment Dataset](https://atlas.nomic.ai/map/anthropic_rlhf) 141 + * data used for reinforcement learning with human feedback (RLHF) to align AI models. 142 + * [Jailbreak Prompt Generator AI Model](https://huggingface.co/tsq2000/Jailbreak-generator) 143 + * AI model that generates jailbreak-style prompts.