ai-safety

Here are 44 public repositories matching this topic...

- (Updated May 12, 2024 - Python)
- Hardened AI Assurance reference platform (Updated Jan 23, 2023 - Python)
- A Python-based toolkit for comparing transformers (Updated Jun 12, 2024 - Python)
- Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models - 🔥 ICLR 2024 Spotlight - 🏆 Best Paper Award SoCal NLP 2023 (Updated Jun 6, 2024 - Python)
- Implementation of adaptive constrained RL algorithms. Child repository of @lasgroup/safe-adaptation-gym (Updated Oct 5, 2022 - Python)
- The Model Library maps the risks associated with modern machine learning systems (Updated Apr 4, 2024 - Python)
- Stubborn: An Environment for Evaluating Stubbornness between Agents with Aligned Incentives (Updated Jun 4, 2023 - Python)
- An educational resource to help anyone learn safe reinforcement learning, inspired by openai/spinningup (Updated May 18, 2022 - Python)
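A standard starting point in safe reinforcement learning, and one such a resource typically covers, is the constrained MDP solved with a Lagrangian penalty: the policy maximizes reward minus a learned multiplier times cost, while dual ascent raises the multiplier whenever the cost constraint is violated. A minimal sketch, with a toy two-action setup whose numbers and names are hypothetical and not taken from the repository:

```python
# Minimal sketch of a Lagrangian approach to constrained ("safe") RL.
# The two-action toy problem and all numbers are hypothetical
# illustrations, not taken from the repository above.

def lagrangian_update(avg_cost, cost_limit, lam, lr=0.1):
    """Dual ascent on the Lagrange multiplier: raise lam when average
    cost exceeds the limit, lower it (toward 0) otherwise."""
    lam += lr * (avg_cost - cost_limit)
    return max(0.0, lam)  # the multiplier must stay non-negative

def penalized_reward(reward, cost, lam):
    """Objective the policy actually optimizes: reward - lam * cost."""
    return reward - lam * cost

# Toy usage: one risky action (high reward, high cost), one safe action.
actions = {"risky": (1.0, 0.8), "safe": (0.5, 0.1)}  # (reward, cost)
lam, cost_limit = 0.0, 0.2
for _ in range(100):
    # Greedy policy with respect to the penalized objective.
    act = max(actions, key=lambda a: penalized_reward(*actions[a], lam))
    _, cost = actions[act]
    lam = lagrangian_update(cost, cost_limit, lam)
# As lam grows, the safe action comes to dominate the risky one.
```

The multiplier oscillates near the value where the two actions tie, which is the expected behavior of dual ascent on a discrete toy problem; real implementations update a parametric policy and estimate costs from rollouts.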
- Code for our paper "Model-less Is the Best Model: Generating Pure Code Implementations to Replace On-Device DL Models", accepted at ISSTA'24 (Updated Mar 31, 2024 - Python)
- NeurIPS workshop: We examine the risk of powerful malignant intelligent actors spreading their influence over networks of agents with varying intelligence and motivations (Updated Dec 11, 2023 - Python)
- Extended, multi-agent and multi-objective (MaMoRL) environments based on DeepMind's AI Safety Gridworlds. This is a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. It is made compatible with OpenAI's Gym/Gymnasium and Farama Foundation PettingZoo (Updated Jun 12, 2024 - Python)
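Gymnasium-compatible environments like these follow a standard reset/step contract: `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`. A minimal self-contained sketch of that contract; the tiny 1-D gridworld below is a hypothetical illustration, not one of the repository's environments:

```python
# Minimal sketch of the Gymnasium-style reset/step contract that such
# environments implement. The 1-D gridworld is a hypothetical
# illustration, not an environment from the repository above.

class TinyGridworld:
    """Agent starts at cell 0 and must reach cell size - 1.
    Actions: 0 = move left, 1 = move right."""

    def __init__(self, size=4, max_steps=20):
        self.size, self.max_steps = size, max_steps

    def reset(self, seed=None):
        self.pos, self.steps = 0, 0
        return self.pos, {}  # (observation, info), as in Gymnasium

    def step(self, action):
        self.steps += 1
        delta = 1 if action == 1 else -1
        self.pos = min(self.size - 1, max(0, self.pos + delta))
        terminated = self.pos == self.size - 1     # reached the goal
        truncated = self.steps >= self.max_steps   # hit the time limit
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}

# Standard interaction loop.
env = TinyGridworld()
obs, info = env.reset()
total = 0.0
terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(1)  # always right
    total += reward
print(obs, total)  # prints: 3 1.0
```

Note the Gymnasium-era split between `terminated` (the MDP reached a terminal state) and `truncated` (an external time limit cut the episode short), which matters for bootstrapping in RL algorithms.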
- An LLM evaluation tool for robustness, consistency, and credibility (Updated Aug 30, 2023 - Python)
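Consistency evaluations of this kind typically ask the same question in several paraphrased forms and measure how often the model's (normalized) answers agree. A minimal sketch with a stub in place of a real model call; the function names and prompts are hypothetical and do not reflect the tool's actual API:

```python
# Minimal sketch of a paraphrase-consistency check for an LLM.
# stub_model stands in for a real model call; all names and prompts
# are hypothetical, not taken from the tool above.
from collections import Counter

def stub_model(prompt):
    # A real evaluation would query an LLM here.
    return "Paris" if "capital" in prompt.lower() else "unknown"

def consistency_score(model, paraphrases):
    """Fraction of paraphrases whose normalized answer matches the
    majority answer; 1.0 means fully consistent."""
    answers = [model(p).strip().lower() for p in paraphrases]
    _, majority_count = Counter(answers).most_common(1)[0]
    return majority_count / len(answers)

prompts = [
    "What is the capital of France?",
    "Name France's capital city.",
    "Which city is France's capital?",
    "Which city is the seat of the French government?",  # no keyword match
]
print(consistency_score(stub_model, prompts))  # prints: 0.75
```

The stub deliberately fails on the last paraphrase to show a score below 1.0; a real harness would also normalize answers more carefully (case, punctuation, aliases) before comparing them.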
- An implementation of iterated distillation and amplification (Updated Jun 22, 2022 - Python)
- A prototype AI safety library in which an agent maximizes its reward by solving a puzzle, designed to prevent the worst-case outcomes of perverse instantiation (Updated Nov 8, 2022 - Python)
- A novel physical adversarial attack tackling the Digital-to-Physical Visual Inconsistency problem (Updated May 23, 2024 - Python)
- [Findings of EMNLP 2022] Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks (Updated Feb 26, 2023 - Python)
- LAWLIA is an open-source computational legal framework for legal reasoning and analysis. It combines large language models with a structured syntactical grammar to produce precise legal assessments, truth values, and verdicts (Updated Dec 6, 2023 - Python)
- A DPLL(T)-based verification tool for DNNs (Updated Jun 10, 2024 - Python)