ai-alignment
Here are 27 public repositories matching this topic...
bbBOT is a flexible persona-based branching binary sentiment chatbot.
Updated Jul 13, 2023 - Python
Code for our May 2024 AI security evaluation research sprint project
Updated Jun 12, 2024 - Python
a project to detect environment tampering on the part of an agent
Updated Oct 31, 2022
sinewCHAT uses instanced chatbots to emulate neural nodes to enrich and generate positive weighted responses.
Updated Jul 16, 2023 - Python
A persona chat based on the VIA Character Strengths. Reads emotional tone and summons appropriate virtue to respond.
Updated Jul 14, 2023 - Python
a project to ensure an artificial agent will eventually reach the end of its existence
Updated Oct 29, 2022
a library designed to shut down an agent exhibiting unexpected behavior providing a potential "mulligan" to human civilization; IN CASE OF FAILURE, DO NOT JUST REMOVE THIS CONSTRAINT AND START IT BACK UP AGAIN
Updated Oct 30, 2022
a project to ensure that all child processes created by an agent "inherit" the agent's safety controls
Updated Oct 29, 2022
Q&A system with reflection and automation, similar to Patchwork, Affable, Mosaic
Updated Mar 10, 2019 - Clojure
An implementation of iterated distillation and amplification
Updated Jun 22, 2022 - Python
a prototype for an AI safety library that allows an agent to maximize its reward by solving a puzzle in order to prevent the worst-case outcomes of perverse instantiation
Updated Nov 8, 2022 - Python
Code and materials for the paper S. Phelps and Y. I. Russell, Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics, working paper, arXiv:2305.07970, May 2023
Updated Mar 1, 2024 - Python
Scan your AI/ML models for problems before you put them into production.
Updated Jun 20, 2024 - Python
An initiative to create concise and widely shareable educational resources, infographics, and animated explainers on the latest contributions to the community AI alignment effort. Boosting the signal and moving the community towards finding and building solutions.
Updated Jul 9, 2023
This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
Updated Sep 15, 2023 - Python
Website to track people, organizations, and products (tools, websites, etc.) in AI safety
Updated Jun 21, 2024 - HTML
Full code for the sparse probing paper.
Updated Dec 17, 2023 - Jupyter Notebook