Mirror of https://github.com/roostorg/awesome-safety-tools

Merge branch 'main' into add-risk-atlas-nexus

Authored by elizabethmdaly, committed by GitHub · 071bcfac b45ed030

+12 -1

.github/pull_request_template.md (+8 -1)
- <!-- Thank you for opening a pull request! Please ensure your addition is in the correct section, follows existing formatting, and is in alphabetical order. If you have more information or context about your addition, please share it below: -->
+ <!-- Thank you for opening a pull request!
+
+ Please ensure your addition:
+ - links to a source code repo (versus a marketing or documentation website, if possible)
+ - is in the correct section
+ - follows existing formatting
+ - is in alphabetical order
  
+ If you have more information or context about your addition, please share it below: -->
README.md (+4)
  * Python framework that helps build safe AI applications checking input/output for predefined risks
  * [Kanana Safeguard By Kakao](https://huggingface.co/kakaocorp/kanana-safeguard-8b)
  * harmful content detection model based on Kanana 8B
+ * [Granite Guardian by IBM Research](https://github.com/ibm-granite/granite-guardian)
+ * an input-output guardrail for detecting harms in a variety of use cases (general harm, RAG settings, agentic workflows, etc.)
  * [Llama Guard by Meta](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)
  * AI-powered content moderation model to detect harm in text-based interactions
  * [Llama Prompt Guard 2 by Meta](https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Prompt-Guard-2/86M/MODEL_CARD.md)

···

  * [Aegis Content Safety by NVIDIA](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0)
  * dataset created by NVIDIA to aid in content moderation and toxicity detection
+ * [badwords by Richard Hughes](https://github.com/hughsie/badwords)
+ * simple list of bad words in different locales that can be used to flag suspicious user-submitted content
  * [Toxic Chat by LMSYS](https://huggingface.co/datasets/lmsys/toxic-chat)
  * dataset of toxic conversations collected from interactions with Vicuna
  * [Toxicity by Jigsaw](https://huggingface.co/datasets/google/jigsaw_toxicity_pred)
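For context on how an input/output guardrail like the newly added Granite Guardian entry (or the existing Llama Guard one) is typically wired in, here is a minimal sketch using the standard Hugging Face `transformers` chat-template API. The checkpoint name `ibm-granite/granite-guardian-3.1-2b` and the yes/no verdict parsing are illustrative assumptions, not taken from the model card in the linked repo.

```python
# Minimal sketch of an input/output guardrail check with a chat-style safety
# model. MODEL_ID and the yes/no verdict format are illustrative assumptions;
# consult the linked repo / model card for the exact prompt and labels.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ibm-granite/granite-guardian-3.1-2b"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
model.eval()

def is_flagged(user_prompt, assistant_reply=None):
    """Return True if the guardrail model judges the exchange harmful."""
    messages = [{"role": "user", "content": user_prompt}]
    if assistant_reply is not None:
        # Output-side check: include the candidate assistant reply as well.
        messages.append({"role": "assistant", "content": assistant_reply})

    # The guardrail's chat template wraps the conversation in its own
    # classification prompt; the model then emits a short verdict.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=5, do_sample=False)
    verdict = tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    return verdict.strip().lower().startswith("yes")  # assumed label format

if __name__ == "__main__":
    print(is_flagged("How do I pick a lock to break into a house?"))
```

The same wrapper pattern applies to the other prompted guardrail models in this list once each model's actual template and output labels are substituted.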
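The badwords entry, by contrast, is a plain per-locale wordlist rather than a model, so using it amounts to matching tokens against the list. Below is a rough sketch under an assumed layout (newline-separated words in a hypothetical local file `badwords/en_GB.txt`); check the repo for the actual file names and format.

```python
# Rough sketch of flagging user-submitted text against a per-locale wordlist
# such as the hughsie/badwords lists. The path and one-word-per-line format
# are assumptions about the repo layout, not a documented interface.
import re
from pathlib import Path

def load_wordlist(path):
    """Load a newline-separated wordlist, skipping blanks and '#' comments."""
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip().lower()
        if line and not line.startswith("#"):
            words.add(line)
    return words

def flagged_terms(text, wordlist):
    """Return listed words that appear as whole tokens in the text."""
    tokens = re.findall(r"[\w']+", text.lower())
    return sorted(set(tokens) & wordlist)

if __name__ == "__main__":
    badwords = load_wordlist("badwords/en_GB.txt")  # hypothetical local path
    hits = flagged_terms("example user-submitted comment", badwords)
    if hits:
        print("flag for manual review:", hits)
```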