Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
Updated Mar 13, 2025 - Python
🐢 Open-Source Evaluation & Testing for AI & LLM systems
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
The open-sourced Python toolbox for backdoor attacks and defenses.
Deliver safe & effective language models
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
[NeurIPS-2023] Annual Conference on Neural Information Processing Systems
[NeurIPS'24] "Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration"
🚀 A fast safe reinforcement learning library in PyTorch
A comprehensive toolbox for model inversion attacks and defenses that is easy to get started with.
Code of the paper: A Recipe for Watermarking Diffusion Models
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
Neural Network Verification Software Tool
Official code repo for the O'Reilly Book - Machine Learning for High-Risk Applications
A toolkit for tools and techniques related to the privacy and compliance of AI models.
TrustEval: A modular and extensible toolkit for comprehensive trust evaluation of generative foundation models (GenFMs)
The official implementation for ICLR23 paper "GNNSafe: Energy-based Out-of-Distribution Detection for Graph Neural Networks"