ai-alignment

a prototype for an AI safety library that allows an agent to maximize its reward by solving a puzzle in order to prevent the worst-case outcomes of perverse instantiation

failsafe ai-safety anomaly-detection ai-alignment fail-safe

Updated Nov 8, 2022
Python

EveryOneIsGross / bbBOT

bbBOT is a flexible, persona-based, branching binary sentiment chatbot.

openai tree-structure chatbot-framework ai-alignment python-ai openai-chatgpt

Updated Jul 13, 2023
Python

EveryOneIsGross / sinewCHAT

sinewCHAT uses instanced chatbots to emulate neural nodes, enriching and generating positively weighted responses.

ml ai-alignment rrn openai-api

Updated Jul 16, 2023
Python

EveryOneIsGross / areteCHAT

A persona chat based on the VIA Character Strengths. Reads emotional tone and summons the appropriate virtue to respond.

sentiment-analysis chatbot via chatbot-framework ai-alignment virtues

Updated Jul 14, 2023
Python

a library designed to shut down an agent exhibiting unexpected behavior, providing a potential "mulligan" to human civilization; IN CASE OF FAILURE, DO NOT JUST REMOVE THIS CONSTRAINT AND START IT BACK UP AGAIN

failsafe ai-safety anomaly-detection ai-alignment fail-safe

Updated Oct 30, 2022

ai-fail-safe / gene-drive

a project to ensure that all child processes created by an agent "inherit" the agent's safety controls

failsafe ai-safety ai-alignment fail-safe

Updated Oct 29, 2022

veeara282 / alignment-jam-2024may

Code for our May 2024 AI security evaluation research sprint project

ai-alignment openai-api

Updated Jun 12, 2024
Python

ai-fail-safe / life-span

a project to ensure an artificial agent will eventually reach the end of its existence

failsafe ai-safety ai-alignment fail-safe

Updated Oct 29, 2022

liondw / Signal-Alignment

An initiative to create concise and widely shareable educational resources, infographics, and animated explainers on the latest contributions to the community AI alignment effort. Boosting the signal and moving the community towards finding and building solutions.

education design ai ai-alignment

Updated Jul 9, 2023

RLHFlow / Directional-Preference-Alignment

Directional Preference Alignment

ai-alignment large-language-models rlhf

Updated May 23, 2024

UCSC-VLAA / Sight-Beyond-Text

This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"

alignment vlm ai-alignment vision-language vicuna llm mllm llava llama2

Updated Sep 15, 2023
Python

rmoehn / amplification

An implementation of iterated distillation and amplification

transformer ida supervised-learning ai-safety ai-alignment

Updated Jun 22, 2022
Python

phelps-sg / llm-cooperation

Code and materials for the paper S. Phelps and Y. I. Russell, Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics, working paper, arXiv:2305.07970, May 2023