a project to detect environment tampering on the part of an agent
Updated Oct 31, 2022
Q&A system with reflection and automation, similar to Patchwork, Affable, Mosaic
a prototype for an AI safety library that allows an agent to maximize its reward by solving a puzzle in order to prevent the worst-case outcomes of perverse instantiation
bbBOT is a flexible, persona-based, branching binary sentiment chatbot.
sinewCHAT uses instanced chatbots to emulate neural nodes to enrich and generate positively weighted responses.
A persona chat based on the VIA Character Strengths. Reads emotional tone and summons appropriate virtue to respond.
a library designed to shut down an agent exhibiting unexpected behavior, providing a potential "mulligan" to human civilization; IN CASE OF FAILURE, DO NOT JUST REMOVE THIS CONSTRAINT AND START IT BACK UP AGAIN
a project to ensure that all child processes created by an agent "inherit" the agent's safety controls
Code for our May 2024 AI security evaluation research sprint project
a project to ensure an artificial agent will eventually reach the end of its existence
An initiative to create concise and widely shareable educational resources, infographics, and animated explainers on the latest contributions to the community AI alignment effort. Boosting the signal and moving the community towards finding and building solutions.
This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
An implementation of iterated distillation and amplification
Code and materials for the paper S. Phelps and Y. I. Russell, Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics, working paper, arXiv:2305.07970, May 2023
Reading list for adversarial perspective and robustness in deep reinforcement learning.
Website to track people, organizations, and products (tools, websites, etc.) in AI safety
Scan your AI/ML models for problems before you put them into production.