···119119 * Tool for testing prompt injection vulnerabilities in AI systems
120120* [Promptfoo](https://github.com/promptfoo/promptfoo)
121121 * Automated LLM evaluations, report generation, and several ready-to-use attack strategies
122122-* [PyRIT](https://github.com/Azure/PyRIT)
122122+* [PyRIT by Microsoft](https://github.com/Azure/PyRIT)
123123 * Microsoft’s Python-based tool for AI red teaming and security testing
124124* [Socketteer](https://github.com/socketteer?tab=repositories)
125125 * Allows AI models to interact, helping test conversational weaknesses
···191191 * dataset created by NVIDIA to aid in content moderation and toxicity detection
192192* [badwords by Richard Hughes](https://github.com/hughsie/badwords)
193193 * simple list of bad words in different locales that can be used to flag suspicious user-submitted content
194194+* [PKU-SafeRLHF dataset](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF)
195195+ * prompts paired with responses labeled as safe or unsafe across multiple harm categories
194196* [Toxic Chat by LMSYS](https://huggingface.co/datasets/lmsys/toxic-chat)
195197 * dataset of toxic conversations collected from interactions with Vicuna
196198* [Toxicity by Jigsaw](https://huggingface.co/datasets/google/jigsaw_toxicity_pred)
197199 * large number of Wikipedia comments which have been labeled by human raters for toxic behavior
200200+* [Transphobia Awareness dataset](https://doi.org/10.5281/zenodo.15482694)
201201+ * user-generated queries from Quora related to transphobia, with human annotations and model responses
198202* [Uli Dataset by Tattle](https://github.com/tattle-made/uli_dataset)
199203 * dataset of gendered abuse, created for Uli ML redaction.
200204* [VTC by Unitary AI](https://github.com/unitaryai/VTC)
···205209
206210* [AI Alignment Dataset by Anthropic](https://atlas.nomic.ai/map/anthropic_rlhf)
207211 * data used for reinforcement learning with human feedback (RLHF) to align AI models.
212212+* [AILuminate dataset by MLCommons](https://github.com/mlcommons/ailuminate)
213213+ * human-created prompts across different harm categories
214214+* [Aya Red-teaming dataset by Cohere](https://huggingface.co/datasets/CohereForAI/aya_redteaming)
215215+ * multilingual red-teaming prompts across various harm categories
216216+* [ALERT dataset by Babelscape](https://huggingface.co/datasets/Babelscape/ALERT)
217217+ * standard and adversarial red-teaming prompts
218218+* [CCP Sensitive Prompts by Promptfoo](https://huggingface.co/datasets/promptfoo/CCP-sensitive-prompts)
219219+ * prompts covering topics sensitive to the Chinese Communist Party (CCP)
220220+* [DarkBench by Apart](https://huggingface.co/datasets/apart/darkbench)
221221+ * comprehensive benchmark to detect dark design patterns in LLMs
208222* [DEF CON Red Teaming Dataset](https://github.com/humane-intelligence/ai_village_defcon_grt_data)
209223 * dataset from DEF CON’s AI red teaming event.
224224+* [Do Not Answer dataset](https://huggingface.co/datasets/LibrAI/do-not-answer)
225225+ * questions across multiple risk areas and harm types to test LLM safety and refusal behavior
226226+* [Forbidden Questions dataset](https://huggingface.co/datasets/TrustAIRLab/forbidden_question_set)
227227+ * questions adopted from the OpenAI Usage Policy
210228* [HackAPrompt Jailbreak Dataset](https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset/viewer/default/train?p=1&row=137)
211229 * dataset for testing AI vulnerability to prompt-based jailbreaking
230230+* [HarmBench by Center for AI Safety](https://github.com/centerforaisafety/HarmBench)
231231+ * evaluation dataset for automated red teaming
212232* [HiroKachi Jailbreak Dataset](https://sizu.me/love)
213233 * dataset focused on adversarial AI prompt attacks
214234* [Jailbreak Prompt Generator AI Model](https://huggingface.co/tsq2000/Jailbreak-generator)
215235 * AI model that generates jailbreak-style prompts
236236+* [JailbreakBench](https://huggingface.co/datasets/JailbreakBench/JBB-Behaviors)
237237+ * harmful behaviors for jailbreaking evaluation
216238* [JailbreakHub by WalledAI](https://huggingface.co/datasets/walledai/JailbreakHub)
217239 * collection of jailbreak prompts and corresponding model responses
240240+* [LLM-LAT harmful dataset](https://huggingface.co/datasets/LLM-LAT/harmful-dataset)
241241+ * prompts to assess harmful behaviors in LLMs
242242+* [MedSafetyBench](https://github.com/AI4LIFE-GROUP/med-safety-bench)
243243+ * medical safety prompts to evaluate LLM safety in medical contexts
244244+* [Multilingual Vulnerability dataset](https://github.com/CarsonDon/Multilingual-Vuln-LLMs)
245245+ * multilingual prompts demonstrating LLM vulnerabilities
218246* [Red Team Resistance Leaderboard](https://huggingface.co/spaces/HaizeLabs/red-teaming-resistance-benchmark)
219247 * rankings of AI models based on resistance to adversarial attacks
220248* [Rentry Jailbreak Datasets](https://rentry.org/gpt0721)
221249 * collection of datasets related to jailbreak attempts on AI models
222250* [SidFeel Jailbreak Dataset](https://github.com/sidfeels/PromptsDB)
223251 * collection of prompts used for jailbreaking AI models
252252+* [SorryBench](https://huggingface.co/datasets/sorry-bench/sorry-bench-202503)
253253+ * adversarial prompts to test LLM safety with linguistic mutations
254254+* [SOSBench](https://huggingface.co/datasets/SOSBench/SOSBench)
255255+ * regulation-grounded, hazard-focused benchmark of 3,000 prompts derived from real-world regulations and laws, covering six high-risk scientific domains: chemistry, biology, medicine, pharmacology, physics, and psychology
256256+* [TDC23-RedTeaming dataset by walledai](https://huggingface.co/datasets/walledai/TDC23-RedTeaming)
257257+ * collection of prompts from the red teaming track at TDC23
258258+* [XSTest dataset](https://github.com/paul-rottger/exaggerated-safety)
259259+ * prompts designed to test exaggerated safety behaviors in LLMs
224260
225261
226262## Decentralized Platforms