Mirror of https://github.com/roostorg/awesome-safety-tools

Update README.md

README.md (+6)
···
 46  46     * machine learning model that detects and moderates harmful content in real-time voice chat on Roblox. Focuses on spoken language detection.
 47  47   * [Detoxify by Unitary AI](https://github.com/unitaryai/detoxify)
 48  48     * detects and mitigates generalized toxic language (including hate speech, harassment, bullying) in text
     49 + * [Toxic Prompt RoBERTa by Intel](https://huggingface.co/Intel/toxic-prompt-roberta)
     50 +   * a BERT-based model for detecting toxic content in prompts to language models
 49  51   * [NSFW Filtering](https://github.com/nsfw-filter/nsfw-filter)
 50  52     * browser extension to block explicit images from online platforms. User facing.
 51  53   * [NSFW Keras Model](https://github.com/GantMan/nsfw_model)
···
131 133     * a dataset created by NVIDIA to aid in content moderation and toxicity detection
132 134   * [Toxicity by Jigsaw](https://huggingface.co/datasets/google/jigsaw_toxicity_pred)
133 135     * a large number of Wikipedia comments which have been labeled by human raters for toxic behavior
    136 + * [Toxic Chat by LMSYS](https://huggingface.co/datasets/lmsys/toxic-chat)
    137 +   * a dataset of toxic conversations collected from interactions with Vicuna
134 138
135 139   ## Red Teaming Datasets
136 140   * [Red Team Resistance Leaderboard](https://huggingface.co/spaces/HaizeLabs/red-teaming-resistance-benchmark)
137 141     * rankings of AI models based on resistance to adversarial attacks.
    142 + * [JailbreakHub by WalledAI](https://huggingface.co/datasets/walledai/JailbreakHub)
    143 +   * a collection of jailbreak prompts and corresponding model responses
138 144   * [SidFeel Jailbreak Dataset](https://github.com/sidfeels/PromptsDB)
139 145     * a collection of prompts used for jailbreaking AI models.
140 146   * [HackAPrompt Jailbreak Dataset](https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset/viewer/default/train?p=1&row=137)