ai-safety

Here are 95 public repositories matching this topic...

blandfort / perspectival

A Python-based toolkit for comparing transformers.

transformer ai-safety generative-ai

Updated Jun 13, 2024
Python

PKU-Alignment / safe-rlhf

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

Updated Jun 13, 2024
Python

Giskard-AI / giskard

🐢 Open-Source Evaluation & Testing for LLMs and ML models

Updated Jun 13, 2024
Python

normster / llm_rules

RuLES: a benchmark for evaluating rule-following in language models

ai-safety ai-security gpt-4

Updated Jun 13, 2024
Python

levitation-opensource / ai-safety-gridworlds

Extended, multi-agent and multi-objective (MaMoRL) environments based on DeepMind's AI Safety Gridworlds. This is a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. It is made compatible with OpenAI's Gym/Gymnasium and Farama Foundation PettingZoo.

Updated Jun 12, 2024
Python

IQTLabs / daisybell

Scan your AI/ML models for problems before you put them into production.

cybersecurity ai-safety bias-correction bias-detection ai-alignment model-poison ai-assurance

Updated Jun 13, 2024
Python

jphall663 / awesome-machine-learning-interpretability

A curated list of awesome responsible machine learning resources.

Updated Jun 11, 2024

StampyAI / stampy-ui

AI Safety Q&A web frontend

Updated Jun 13, 2024
TypeScript

Nkluge-correa / Aira

Aira is a series of chatbots developed as an experimentation playground for value alignment.

natural-language-processing ai chatbot alignment language-model ai-safety

Updated Jun 10, 2024
Jupyter Notebook

PKU-Alignment / llms-resist-alignment

Repo for paper "Language Models Resist Alignment"

alignment llama safe alpaca ai-safety vicuna llm llms rlhf safe-rlhf llama2 llama3

Updated Jun 9, 2024
Python

moonwatcher-ai / moonwatcher

Evaluation & testing framework for computer vision models

computer-vision ai-safety ethical-artificial-intelligence ai-security mlops ml-safety ml-validation trustworthy-ai ml-testing

Updated Jun 7, 2024
Python

erfanshayegani / Jailbreak-In-Pieces

[ICLR 2024 Spotlight 🔥] - [Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models

alignment ai-safety vlm llm vision-language-models cross-modality-safety-alignment multi-modal-models

Updated Jun 6, 2024
Python

ztjona / ztjona.github.io

My personal website.

machine-learning deep-learning ai-safety

Updated Jun 5, 2024
HTML

riceissa / aiwatch

Website to track people, organizations, and products (tools, websites, etc.) in AI safety

mysql php database dataset ai-safety data-portal aisafety ai-alignment

Updated Jun 13, 2024
HTML

dynaroars / neuralsat

DPLL(T)-based Verification tool for DNNs

abstraction sat-solver software-verification ai-safety robustness dpll adversarial-attacks robustness-verification dnn-verification ai-assurance neural-network-veri

Updated Jun 13, 2024
Python

wesg52 / universal-neurons

Universal Neurons in GPT2 Language Models

ai-safety interpretability llm mechanistic-interpretability

Updated May 28, 2024
Jupyter Notebook

SafeAILab / RAIN

[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning

alignment ai-safety large-language-models

Updated May 23, 2024
Python

WindVChen / VCO-AP

A novel physical adversarial attack tackling the Digital-to-Physical Visual Inconsistency problem.

remote-sensing object-detection ai-safety adversarial-attacks physical-attacks oriented-object-detection adversarial-patches physical-adversarial-attacks

Updated May 23, 2024
Python

yyy01 / PAC

The official implementation of the paper "Data Contamination Calibration for Black-box LLMs" (ACL 2024)

nlp machine-learning ai-safety data-contamination membership-inference-attack large-language-models

Updated May 21, 2024
Python

zhoumingyi / ModelObfuscator

Code for our paper "Modelobfuscator: Obfuscating Model Information to Protect Deployed ML-Based Systems", published at ISSTA'23

obfuscation deep-learning ai-safety

Updated May 18, 2024
C++
