···
 * machine learning model that detects and moderates harmful content in real-time voice chat on Roblox. Focuses on spoken language detection.
* [Detoxify by Unitary AI](https://github.com/unitaryai/detoxify)
 * detects and mitigates generalized toxic language (including hate speech, harassment, bullying) in text
+* [Toxic Prompt RoBERTa by Intel](https://huggingface.co/Intel/toxic-prompt-roberta)
+ * a RoBERTa-based model for detecting toxic content in prompts to language models
* [NSFW Filtering](https://github.com/nsfw-filter/nsfw-filter)
 * browser extension to block explicit images from online platforms. User facing.
* [NSFW Keras Model](https://github.com/GantMan/nsfw_model)
···
 * a dataset created by NVIDIA to aid in content moderation and toxicity detection
* [Toxicity by Jigsaw](https://huggingface.co/datasets/google/jigsaw_toxicity_pred)
 * a large number of Wikipedia comments which have been labeled by human raters for toxic behavior
+* [Toxic Chat by LMSYS](https://huggingface.co/datasets/lmsys/toxic-chat)
+ * a dataset of toxic conversations collected from interactions with Vicuna

## Red Teaming Datasets
* [Red Team Resistance Leaderboard](https://huggingface.co/spaces/HaizeLabs/red-teaming-resistance-benchmark)
 * rankings of AI models based on resistance to adversarial attacks.
+* [JailbreakHub by WalledAI](https://huggingface.co/datasets/walledai/JailbreakHub)
+ * a collection of jailbreak prompts and corresponding model responses
* [SidFeel Jailbreak Dataset](https://github.com/sidfeels/PromptsDB)
 * a collection of prompts used for jailbreaking AI models.
* [HackAPrompt Jailbreak Dataset](https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset/viewer/default/train?p=1&row=137)