ai-safety

A project to add scalable, state-of-the-art out-of-distribution detection (open set recognition) support by changing two lines of code! Perform efficient inference (i.e., without increasing inference time) and detection without a drop in classification accuracy, hyperparameter tuning, or additional data collection.

machine-learning deep-learning pytorch ood osr ai-safety open-set anomaly-detection novelty-detection robust-machine-learning open-set-recognition out-of-distribution out-of-distribution-detection ood-detection trustworthy-machine-learning trustworthy-ai

Updated Sep 22, 2022
Python

dlmacedo / distinction-maximization-loss

A project to improve out-of-distribution detection (open set recognition) and uncertainty estimation by changing a few lines of code in your project! Perform efficient inference (i.e., without increasing inference time) without repetitive model training, hyperparameter tuning, or additional data collection.

machine-learning deep-learning pytorch classification ood osr uncertainty-estimation ai-safety open-set anomaly-detection novelty-detection robust-machine-learning open-set-recognition out-of-distribution out-of-distribution-detection ood-detection trustworthy-machine-learning trustworthy-ai

Updated Sep 22, 2022
Python

lasgroup / safe-adaptation-agents

Implementation of adaptive constrained RL algorithms. Child repository of @lasgroup/safe-adaptation-gym

machine-learning reinforcement-learning ai-safety meta-learning safe-adaptation

Updated Oct 5, 2022
Python

ai-fail-safe / safe-reward

A prototype for an AI safety library that allows an agent to maximize its reward by solving a puzzle, in order to prevent the worst-case outcomes of perverse instantiation.

failsafe ai-safety anomaly-detection ai-alignment fail-safe

Updated Nov 8, 2022
Python

LuanAdemi / toumei

An interpretability library for PyTorch

python deep-learning pytorch transformer modularity ai-safety interpretability feature-visualization

Updated Dec 31, 2022
Python

yardenas / la-mbda

LAMBDA is a model-based reinforcement learning agent that uses Bayesian world models for safe policy optimization

machine-learning reinforcement-learning deep-learning constrained-optimization ai-safety model-based-reinforcement-learning safe-reinforcement-learning

Updated Jan 16, 2023
Python

IQTLabs / aia-platform

Hardened AI Assurance reference platform

cybersecurity ai-safety devsecops ai-assurance

Updated Jan 23, 2023
Python

lancopku / DAN

[Findings of EMNLP 2022] Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks

natural-language-processing ai-safety backdoor-attacks backdoor-defense

Updated Feb 26, 2023
Python

hendrycks / ethics

Aligning AI With Shared Human Values (ICLR 2021)

ai-safety machine-ethics ml-safety ethical-ai gpt-3

Updated Apr 21, 2023
Python

cool-RR / stubborn

Stubborn: An Environment for Evaluating Stubbornness between Agents with Aligned Incentives

reinforcement-learning deep-learning ai-safety multi-agent-reinforcement-learning

Updated Jun 4, 2023
Python

lancopku / Avg-Avg

[Findings of EMNLP 2022] Holistic Sentence Embeddings for Better Out-of-Distribution Detection

natural-language-processing ai-safety robust-machine-learning ood-detection trustworthy-machine-learning

Updated Jun 14, 2023
Python

yyy01 / LLMRiskEval_RCC

An LLM evaluation tool for robustness, consistency, and credibility

evaluation ai-safety adversarial-attacks large-language-models

Updated Aug 30, 2023
Python

cure-lab / ContraNet

This is the official implementation of ContraNet (NDSS 2022).

defense ai-safety adversarial-attacks

Updated Aug 31, 2023
Python

Omegastick / credit-hacking

Eliciting credit hacking behaviours in large language models

ai-safety llm

Updated Sep 14, 2023
Python

microsoft / SafeNLP

Safety Score for Pre-Trained Language Models

nlp ai-safety fairness-ai

Updated Oct 18, 2023
Python

jehumtine / LAWLIA

LAWLIA is an open-source computational legal framework designed to revolutionize legal reasoning and analysis. It combines the power of large language models with a structured syntactical grammar to facilitate precise legal assessments, truth values, and verdicts. LAWLIA is the future of computational jurisprudence.

law ai computational-linguistics agents ai-safety computational-law legal-system legal-framework large-language-models legal-agent legal-analysis legal-automation legal-linguistics legal-computing

Updated Dec 6, 2023
Python

Here are 44 public repositories matching this topic...

megvii-research / FSSD_OoD_Detection

RongRG / saferRL

rmoehn / amplification

ai4ce / FLAT

dlmacedo / entropic-out-of-distribution-detection