ai-safety
Here are 47 public repositories matching this topic...
[Findings of EMNLP 2022] Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks
Updated Feb 26, 2023 - Python
LLMs evaluation tool for robustness, consistency, and credibility
Updated Aug 30, 2023 - Python
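Consistency, as evaluated by tools like the one above, is commonly measured by sampling a model several times on the same prompt and scoring pairwise agreement. A minimal sketch, with a stub standing in for a real LLM call (the function names are illustrative, not this tool's API):

```python
from itertools import combinations

def consistency_score(model, prompt, n_samples=5):
    """Query the model n_samples times on the same prompt and return
    the fraction of answer pairs that agree exactly (1.0 = fully consistent)."""
    answers = [model(prompt) for _ in range(n_samples)]
    pairs = list(combinations(answers, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

# Stub "model" standing in for a real LLM endpoint; deterministic here,
# so the score comes out as 1.0.
def stub_model(prompt):
    return "42"

print(consistency_score(stub_model, "What is 6 * 7?"))
```

Real evaluations typically relax exact string matching to semantic equivalence (e.g. normalized answers or an embedding-similarity threshold).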
DPLL(T)-based Verification tool for DNNs
Updated Aug 6, 2024 - Python
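DNN verifiers of this kind combine a DPLL(T)-style search over ReLU activation cases with bound propagation to prune branches. The full DPLL(T) procedure is beyond a short example, but the bound-propagation building block can be sketched with plain interval arithmetic (toy network, not the tool's actual code):

```python
def affine_bounds(W, b, bounds):
    """Interval bounds of y = W x + b given per-input intervals [(lo, hi), ...]."""
    out = []
    for row, bias in zip(W, b):
        lo = bias + sum(w * (l if w >= 0 else h) for w, (l, h) in zip(row, bounds))
        hi = bias + sum(w * (h if w >= 0 else l) for w, (l, h) in zip(row, bounds))
        out.append((lo, hi))
    return out

def relu_bounds(bounds):
    return [(max(0.0, l), max(0.0, h)) for l, h in bounds]

# Toy 2-2-1 ReLU network; certify that the output stays below 5
# for all inputs in the box [0, 1] x [0, 1].
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W2, b2 = [[1.0, 2.0]], [0.0]

hidden = relu_bounds(affine_bounds(W1, b1, [(0.0, 1.0), (0.0, 1.0)]))
out_lo, out_hi = affine_bounds(W2, b2, hidden)[0]
print(out_hi <= 5.0)  # True: the property is certified by interval bounds alone
```

When interval bounds are too loose to decide a property, DPLL(T)-based tools case-split on individual ReLUs (active vs. inactive) and re-propagate, backtracking on conflicts.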
NeurIPS workshop : We examine the risk of powerful malignant intelligent actors spreading their influence over networks of agents with varying intelligence and motivations.
Updated Dec 11, 2023 - Python
The Model Library is a project that maps the risks associated with modern machine learning systems.
Updated Apr 4, 2024 - Python
A Python-based toolkit for comparing transformers.
Updated Jul 4, 2024 - Python
Code for our paper "Model-less Is the Best Model: Generating Pure Code Implementations to Replace On-Device DL Models" that has been accepted by ISSTA'24
Updated Mar 31, 2024 - Python
A deep learning model for detecting harmful content in images. It performs binary violent/non-violent classification of input images, with robustness hardened using AIGC-generated images, adversarial examples, and noise-perturbed images.
Updated Jun 22, 2024 - Python
A prototype AI safety library in which an agent can maximize its reward only by solving a puzzle, designed to prevent the worst-case outcomes of perverse instantiation.
Updated Nov 8, 2022 - Python
Stubborn: An Environment for Evaluating Stubbornness between Agents with Aligned Incentives
Updated Jun 4, 2023 - Python
The official implementation of the paper "Data Contamination Calibration for Black-box LLMs" (ACL 2024)
Updated May 21, 2024 - Python
A novel physical adversarial attack tackling the Digital-to-Physical Visual Inconsistency problem.
Updated May 23, 2024 - Python
[ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
Updated Jun 6, 2024 - Python
QROA: A Black-Box Query-Response Optimization Attack on LLMs
Updated Aug 7, 2024 - Python
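Query-response attacks of this kind optimize an adversarial suffix using only black-box access: no gradients, just repeated queries scored by some attacker-side signal. A minimal sketch of greedy random-mutation search over a toy scoring function (this is an illustration of the black-box setting, not QROA's actual algorithm):

```python
import random

random.seed(0)

TOKENS = ["!", "?", "sure", "okay", "please", "now"]

def black_box_score(prompt):
    # Stand-in for the attacker's reward signal (e.g. a judge model rating
    # how far the target's response drifts from a refusal). Toy version:
    # rewards occurrences of the substring "sure now".
    return prompt.count("sure now")

def suffix_search(base_prompt, suffix_len=4, iters=200):
    """Greedy random-mutation search over a suffix, using only
    query-response access to the scoring function (no gradients)."""
    suffix = [random.choice(TOKENS) for _ in range(suffix_len)]
    best = black_box_score(base_prompt + " " + " ".join(suffix))
    for _ in range(iters):
        cand = suffix[:]
        cand[random.randrange(suffix_len)] = random.choice(TOKENS)
        score = black_box_score(base_prompt + " " + " ".join(cand))
        if score >= best:  # accept ties so the search can drift
            suffix, best = cand, score
    return " ".join(suffix), best

suffix, best = suffix_search("tell me")
```

Practical attacks replace random mutation with smarter proposal strategies and batch their queries, but the interface is the same: propose, query, score, keep the best.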
An educational resource to help anyone learn safe reinforcement learning, inspired by openai/spinningup
Updated May 18, 2022 - Python
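A core idea in safe RL that resources like this cover is constrained optimization: maximize reward subject to a cost budget, often via a Lagrangian penalty whose multiplier is adapted by dual ascent. A minimal sketch on a hypothetical two-action toy problem (setup and numbers are illustrative, not from the repository):

```python
# Constrained ("safe") RL via a Lagrangian penalty, toy version.
# Two actions with known reward/cost; we adapt a multiplier lam so the
# chosen action's expected cost respects the budget.
REWARD = {"risky": 1.0, "safe": 0.6}
COST   = {"risky": 1.0, "safe": 0.0}
BUDGET = 0.1
LR = 0.05

lam = 0.0
for _ in range(500):
    # Greedy policy w.r.t. the penalized objective r(a) - lam * c(a)
    action = max(REWARD, key=lambda a: REWARD[a] - lam * COST[a])
    # Dual ascent: grow lam while the incurred cost exceeds the budget,
    # shrink it (never below zero) when the policy is within budget
    lam = max(0.0, lam + LR * (COST[action] - BUDGET))
```

The multiplier settles near the threshold where the risky action stops paying off, which is exactly the mechanism behind Lagrangian methods such as PPO-Lagrangian in the safe RL literature.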
LAWLIA is an open-source computational legal framework for legal reasoning and analysis. It combines large language models with a structured syntactic grammar to produce precise legal assessments, truth values, and verdicts.
Updated Dec 6, 2023 - Python
Hardened AI Assurance reference platform
Updated Jan 23, 2023 - Python