🐢 Open-Source Evaluation & Testing for ML & LLM systems
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Aligning AI With Shared Human Values (ICLR 2021)
Deliver safe & effective language models
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022
[NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
[CCS'24] SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models
RuLES: a benchmark for evaluating rule-following in language models
Code accompanying the paper Pretraining Language Models with Human Preferences
An attack that induces hallucinations in LLMs
[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
An unrestricted attack based on diffusion models that can achieve both good transferability and imperceptibility.
[SafeAI'21] Feature Space Singularity for Out-of-Distribution Detection.
LAMBDA is a model-based reinforcement learning agent that uses Bayesian world models for safe policy optimization
[ICCV2021 Oral] Fooling LiDAR by Attacking GPS Trajectory
A project that adds scalable, state-of-the-art out-of-distribution detection (open-set recognition) support by changing two lines of code. Inference stays efficient (no added inference time), and detection works without a drop in classification accuracy, hyperparameter tuning, or collecting additional data (see the illustrative sketch after this list).
LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
Scan your AI/ML models for problems before you put them into production.
[Findings of EMNLP 2022] Holistic Sentence Embeddings for Better Out-of-Distribution Detection
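Several entries above concern post-hoc out-of-distribution detection. As a point of reference only, the following is a minimal, generic sketch of one classic baseline, maximum softmax probability scoring (Hendrycks & Gimpel, 2017). It is a hypothetical illustration and does not reproduce the API or method of any repository listed here; the model and data loader are assumed to be supplied by the user.

```python
# Generic post-hoc OOD scoring sketch (maximum softmax probability baseline).
# Not the API of any repository listed above; purely illustrative.
import torch
import torch.nn.functional as F

@torch.no_grad()
def msp_ood_scores(model, loader, device="cpu"):
    """Return one OOD score per input: 1 - max softmax probability.
    Higher scores suggest the input lies further from the training distribution."""
    model.eval().to(device)
    scores = []
    for inputs, *_ in loader:  # assumes the loader yields (inputs, ...) tuples
        logits = model(inputs.to(device))
        probs = F.softmax(logits, dim=-1)
        scores.append(1.0 - probs.max(dim=-1).values)
    return torch.cat(scores)

# Usage sketch: flag inputs whose score exceeds a threshold chosen on a
# held-out in-distribution validation set (e.g., its 95th percentile).
```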