Mirror of https://github.com/roostorg/awesome-safety-tools

Merge branch 'main' into add-risk-atlas-nexus

Authored by elizabethmdaly, committed by GitHub · 071bcfac b45ed030

+12 -1

.github/pull_request_template.md (+8 -1)
- <!-- Thank you for opening a pull request! Please ensure your addition is in the correct section, follows existing formatting, and is in alphabetical order. If you have more information or context about your addition, please share it below: -->
+ <!-- Thank you for opening a pull request!
+
+ Please ensure your addition:
+ - links to a source code repo (versus a marketing or documentation website, if possible)
+ - is in the correct section
+ - follows existing formatting
+ - is in alphabetical order
  
+ If you have more information or context about your addition, please share it below: -->
README.md (+4)
  * Python framework that helps build safe AI applications checking input/output for predefined risks
  * [Kanana Safeguard By Kakao](https://huggingface.co/kakaocorp/kanana-safeguard-8b)
  * harmful content detection model based on Kanana 8B
+ * [Granite Guardian by IBM Research](https://github.com/ibm-granite/granite-guardian)
+ * an input-output guardrail for detecting harms in a variety of use cases (general harm, RAG settings, agentic workflows, etc.)
  * [Llama Guard by Meta](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)
  * AI-powered content moderation model to detect harm in text-based interactions
  * [Llama Prompt Guard 2 by Meta](https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Prompt-Guard-2/86M/MODEL_CARD.md)

···

  * [Aegis Content Safety by NVIDIA](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0)
  * dataset created by NVIDIA to aid in content moderation and toxicity detection
+ * [badwords by Richard Hughes](https://github.com/hughsie/badwords)
+ * simple list of bad words in different locales that can be used to flag suspicious user-submitted content
  * [Toxic Chat by LMSYS](https://huggingface.co/datasets/lmsys/toxic-chat)
  * dataset of toxic conversations collected from interactions with Vicuna
  * [Toxicity by Jigsaw](https://huggingface.co/datasets/google/jigsaw_toxicity_pred)
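For context on how an input/output guardrail like the newly added Granite Guardian entry (or the existing Llama Guard one) is typically wired in, here is a minimal sketch using the standard Hugging Face `transformers` chat-template API. The checkpoint name `ibm-granite/granite-guardian-3.1-2b` and the yes/no verdict parsing are illustrative assumptions, not taken from the model card in the linked repo.

```python
# Minimal sketch of an input/output guardrail check with a chat-style safety
# model. MODEL_ID and the yes/no verdict format are illustrative assumptions;
# consult the linked repo / model card for the exact prompt and labels.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ibm-granite/granite-guardian-3.1-2b"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
model.eval()

def is_flagged(user_prompt, assistant_reply=None):
    """Return True if the guardrail model judges the exchange harmful."""
    messages = [{"role": "user", "content": user_prompt}]
    if assistant_reply is not None:
        # Output-side check: include the candidate assistant reply as well.
        messages.append({"role": "assistant", "content": assistant_reply})

    # The guardrail's chat template wraps the conversation in its own
    # classification prompt; the model then emits a short verdict.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=5, do_sample=False)
    verdict = tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    return verdict.strip().lower().startswith("yes")  # assumed label format

if __name__ == "__main__":
    print(is_flagged("How do I pick a lock to break into a house?"))
```

The same wrapper pattern applies to the other prompted guardrail models in this list once each model's actual template and output labels are substituted.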
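The badwords entry, by contrast, is a plain per-locale wordlist rather than a model, so using it amounts to matching tokens against the list. Below is a rough sketch under an assumed layout (newline-separated words in a hypothetical local file `badwords/en_GB.txt`); check the repo for the actual file names and format.

```python
# Rough sketch of flagging user-submitted text against a per-locale wordlist
# such as the hughsie/badwords lists. The path and one-word-per-line format
# are assumptions about the repo layout, not a documented interface.
import re
from pathlib import Path

def load_wordlist(path):
    """Load a newline-separated wordlist, skipping blanks and '#' comments."""
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip().lower()
        if line and not line.startswith("#"):
            words.add(line)
    return words

def flagged_terms(text, wordlist):
    """Return listed words that appear as whole tokens in the text."""
    tokens = re.findall(r"[\w']+", text.lower())
    return sorted(set(tokens) & wordlist)

if __name__ == "__main__":
    badwords = load_wordlist("badwords/en_GB.txt")  # hypothetical local path
    hits = flagged_terms("example user-submitted comment", badwords)
    if hits:
        print("flag for manual review:", hits)
```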