# ai-alignment

Here are 27 public repositories matching this topic...

riceissa / miri-top-contributors

sql ai-safety ai-alignment donations-list-website

Updated Sep 2, 2018
HTML

EveryOneIsGross / bbBOT

bbBOT is a flexible persona-based branching binary sentiment chatbot.

openai tree-structure chatbot-framework ai-alignment python-ai openai-chatgpt

Updated Jul 13, 2023
Python

veeara282 / alignment-jam-2024may

Code for our May 2024 AI security evaluation research sprint project

ai-alignment openai-api

Updated Jun 12, 2024
Python

ai-fail-safe / honeypot

a project to detect environment tampering on the part of an agent

failsafe ai-safety anomaly-detection ai-alignment fail-safe

Updated Oct 31, 2022

EveryOneIsGross / sinewCHAT

sinewCHAT uses instanced chatbots to emulate neural nodes, enriching and generating positively weighted responses.

ml ai-alignment rrn openai-api

Updated Jul 16, 2023
Python

EveryOneIsGross / areteCHAT

A persona chat based on the VIA Character Strengths. Reads emotional tone and summons appropriate virtue to respond.

sentiment-analysis chatbot via chatbot-framework ai-alignment virtues

Updated Jul 14, 2023
Python

ai-fail-safe / life-span

a project to ensure an artificial agent will eventually reach the end of its existence

failsafe ai-safety ai-alignment fail-safe

Updated Oct 29, 2022

ai-fail-safe / mulligan

a library designed to shut down an agent exhibiting unexpected behavior providing a potential "mulligan" to human civilization; IN CASE OF FAILURE, DO NOT JUST REMOVE THIS CONSTRAINT AND START IT BACK UP AGAIN

failsafe ai-safety anomaly-detection ai-alignment fail-safe

Updated Oct 30, 2022

ai-fail-safe / gene-drive

a project to ensure that all child processes created by an agent "inherit" the agent's safety controls

failsafe ai-safety ai-alignment fail-safe

Updated Oct 29, 2022

rmoehn / jursey

Q&A system with reflection and automation, similar to Patchwork, Affable, Mosaic

reflection ida datomic hch ai-alignment factored-cognition

Updated Mar 10, 2019
Clojure

rmoehn / amplification

An implementation of iterated distillation and amplification

transformer ida supervised-learning ai-safety ai-alignment

Updated Jun 22, 2022
Python

rmoehn / farlamp

IDA with RL and overseer failures

ida research-project ai-alignment

Updated Jul 31, 2021
TeX

ai-fail-safe / safe-reward

a prototype for an AI safety library that allows an agent to maximize its reward by solving a puzzle in order to prevent the worst-case outcomes of perverse instantiation

failsafe ai-safety anomaly-detection ai-alignment fail-safe

Updated Nov 8, 2022
Python

phelps-sg / llm-cooperation

Code and materials for the paper S. Phelps and Y. I. Russell, Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics, working paper, arXiv:2305.07970, May 2023

economics ai-safety gametheory experimental-economics behavioral-economics prisoners-dilemma ai-alignment experimental-psychology social-dilemmas gpt-3 gpt-4 llm principal-agent-problem

Updated Mar 1, 2024
Python

IQTLabs / daisybell

Scan your AI/ML models for problems before you put them into production.

cybersecurity ai-safety bias-correction bias-detection ai-alignment model-poison ai-assurance

Updated Jun 20, 2024
Python

liondw / Signal-Alignment

An initiative to create concise and widely shareable educational resources, infographics, and animated explainers on the latest contributions to the community AI alignment effort. Boosting the signal and moving the community towards finding and building solutions.

education design ai ai-alignment

Updated Jul 9, 2023

UCSC-VLAA / Sight-Beyond-Text

This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"

alignment vlm ai-alignment vision-language vicuna llm mllm llava llama2

Updated Sep 15, 2023
Python

riceissa / aiwatch

Website to track people, organizations, and products (tools, websites, etc.) in AI safety

mysql php database dataset ai-safety data-portal aisafety ai-alignment

Updated Jun 21, 2024
HTML

RLHFlow / Directional-Preference-Alignment

Directional Preference Alignment

ai-alignment large-language-models rlhf

Updated May 23, 2024

wesg52 / sparse-probing-paper

Sparse probing paper full code.

ai-safety interpretability ai-alignment mechanistic-interpretability

Updated Dec 17, 2023
Jupyter Notebook
