ai-safety

Here are 44 public repositories matching this topic...

- (Updated May 12, 2024 - Python)
- Hardened AI Assurance reference platform (Updated Jan 23, 2023 - Python)
- A Python-based toolkit for comparing transformers (Updated Jun 12, 2024 - Python)
- Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models - 🔥 ICLR 2024 Spotlight - 🏆 Best Paper Award SoCal NLP 2023 (Updated Jun 6, 2024 - Python)
- Implementation of adaptive constrained RL algorithms. Child repository of @lasgroup/safe-adaptation-gym (Updated Oct 5, 2022 - Python)
- The Model Library maps the risks associated with modern machine learning systems (Updated Apr 4, 2024 - Python)
- Stubborn: An Environment for Evaluating Stubbornness between Agents with Aligned Incentives (Updated Jun 4, 2023 - Python)
- An educational resource to help anyone learn safe reinforcement learning, inspired by openai/spinningup (Updated May 18, 2022 - Python)
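A standard starting point in safe reinforcement learning, and one such a resource typically covers, is the constrained MDP solved with a Lagrangian penalty: the policy maximizes reward minus a learned multiplier times cost, while dual ascent raises the multiplier whenever the cost constraint is violated. A minimal sketch, with a toy two-action setup whose numbers and names are hypothetical and not taken from the repository:

```python
# Minimal sketch of a Lagrangian approach to constrained ("safe") RL.
# The two-action toy problem and all numbers are hypothetical
# illustrations, not taken from the repository above.

def lagrangian_update(avg_cost, cost_limit, lam, lr=0.1):
    """Dual ascent on the Lagrange multiplier: raise lam when average
    cost exceeds the limit, lower it (toward 0) otherwise."""
    lam += lr * (avg_cost - cost_limit)
    return max(0.0, lam)  # the multiplier must stay non-negative

def penalized_reward(reward, cost, lam):
    """Objective the policy actually optimizes: reward - lam * cost."""
    return reward - lam * cost

# Toy usage: one risky action (high reward, high cost), one safe action.
actions = {"risky": (1.0, 0.8), "safe": (0.5, 0.1)}  # (reward, cost)
lam, cost_limit = 0.0, 0.2
for _ in range(100):
    # Greedy policy with respect to the penalized objective.
    act = max(actions, key=lambda a: penalized_reward(*actions[a], lam))
    _, cost = actions[act]
    lam = lagrangian_update(cost, cost_limit, lam)
# As lam grows, the safe action comes to dominate the risky one.
```

The multiplier oscillates near the value where the two actions tie, which is the expected behavior of dual ascent on a discrete toy problem; real implementations update a parametric policy and estimate costs from rollouts.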
- Code for our paper "Model-less Is the Best Model: Generating Pure Code Implementations to Replace On-Device DL Models", accepted at ISSTA'24 (Updated Mar 31, 2024 - Python)
- NeurIPS workshop: We examine the risk of powerful malignant intelligent actors spreading their influence over networks of agents with varying intelligence and motivations (Updated Dec 11, 2023 - Python)
- Extended, multi-agent and multi-objective (MaMoRL) environments based on DeepMind's AI Safety Gridworlds. This is a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. It is made compatible with OpenAI's Gym/Gymnasium and Farama Foundation PettingZoo (Updated Jun 12, 2024 - Python)
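Gymnasium-compatible environments like these follow a standard reset/step contract: `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`. A minimal self-contained sketch of that contract; the tiny 1-D gridworld below is a hypothetical illustration, not one of the repository's environments:

```python
# Minimal sketch of the Gymnasium-style reset/step contract that such
# environments implement. The 1-D gridworld is a hypothetical
# illustration, not an environment from the repository above.

class TinyGridworld:
    """Agent starts at cell 0 and must reach cell size - 1.
    Actions: 0 = move left, 1 = move right."""

    def __init__(self, size=4, max_steps=20):
        self.size, self.max_steps = size, max_steps

    def reset(self, seed=None):
        self.pos, self.steps = 0, 0
        return self.pos, {}  # (observation, info), as in Gymnasium

    def step(self, action):
        self.steps += 1
        delta = 1 if action == 1 else -1
        self.pos = min(self.size - 1, max(0, self.pos + delta))
        terminated = self.pos == self.size - 1     # reached the goal
        truncated = self.steps >= self.max_steps   # hit the time limit
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}

# Standard interaction loop.
env = TinyGridworld()
obs, info = env.reset()
total = 0.0
terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(1)  # always right
    total += reward
print(obs, total)  # prints: 3 1.0
```

Note the Gymnasium-era split between `terminated` (the MDP reached a terminal state) and `truncated` (an external time limit cut the episode short), which matters for bootstrapping in RL algorithms.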
- An LLM evaluation tool for robustness, consistency, and credibility (Updated Aug 30, 2023 - Python)
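Consistency evaluations of this kind typically ask the same question in several paraphrased forms and measure how often the model's (normalized) answers agree. A minimal sketch with a stub in place of a real model call; the function names and prompts are hypothetical and do not reflect the tool's actual API:

```python
# Minimal sketch of a paraphrase-consistency check for an LLM.
# stub_model stands in for a real model call; all names and prompts
# are hypothetical, not taken from the tool above.
from collections import Counter

def stub_model(prompt):
    # A real evaluation would query an LLM here.
    return "Paris" if "capital" in prompt.lower() else "unknown"

def consistency_score(model, paraphrases):
    """Fraction of paraphrases whose normalized answer matches the
    majority answer; 1.0 means fully consistent."""
    answers = [model(p).strip().lower() for p in paraphrases]
    _, majority_count = Counter(answers).most_common(1)[0]
    return majority_count / len(answers)

prompts = [
    "What is the capital of France?",
    "Name France's capital city.",
    "Which city is France's capital?",
    "Which city is the seat of the French government?",  # no keyword match
]
print(consistency_score(stub_model, prompts))  # prints: 0.75
```

The stub deliberately fails on the last paraphrase to show a score below 1.0; a real harness would also normalize answers more carefully (case, punctuation, aliases) before comparing them.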
- An implementation of iterated distillation and amplification (Updated Jun 22, 2022 - Python)
- A prototype AI safety library in which an agent maximizes its reward by solving a puzzle, designed to prevent the worst-case outcomes of perverse instantiation (Updated Nov 8, 2022 - Python)
- A novel physical adversarial attack tackling the Digital-to-Physical Visual Inconsistency problem (Updated May 23, 2024 - Python)
- [Findings of EMNLP 2022] Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks (Updated Feb 26, 2023 - Python)
- LAWLIA is an open-source computational legal framework for legal reasoning and analysis. It combines large language models with a structured syntactical grammar to produce precise legal assessments, truth values, and verdicts (Updated Dec 6, 2023 - Python)
- A DPLL(T)-based verification tool for DNNs (Updated Jun 10, 2024 - Python)