ai-alignment

Star

Here are 12 public repositories matching this topic...

IQTLabs / daisybell

Star

Scan your AI/ML models for problems before you put them into production.

cybersecurity ai-safety bias-correction bias-detection ai-alignment model-poison ai-assurance

Updated Nov 6, 2024
Python

tashakim / otherness_and_control

Star

Interpretation of otherness and control

ai-alignment

Updated Oct 1, 2024
Python

veeara282 / alignment-jam-2024may

Star

Code for our May 2024 AI security evaluation research sprint project

ai-alignment openai-api

Updated Aug 16, 2024
Python

Code and materials for the paper S. Phelps and Y. I. Russell, Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics, working paper, arXiv:2305.07970, May 2023

economics ai-safety gametheory experimental-economics behavioral-economics prisoners-dilemma ai-alignment experimental-psychology social-dilemmas gpt-3 gpt-4 llm principal-agent-problem

Updated Mar 1, 2024
Python

agencyenterprise / PromptInject

Star

PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022

machine-learning agi language-models ai-safety adversarial-attacks ai-alignment ml-safety gpt-3 large-language-models prompt-engineering chain-of-thought agi-alignment

Updated Feb 26, 2024
Python

tomekkorbak / pretraining-with-human-feedback

Star

Code accompanying the paper Pretraining Language Models with Human Preferences

reinforcement-learning gpt language-models ai-safety ai-alignment pretraining decision-transformers rlhf

Updated Feb 13, 2024
Python

UCSC-VLAA / Sight-Beyond-Text

Star

This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"

alignment vlm ai-alignment vision-language vicuna llm mllm llava llama2

Updated Sep 15, 2023
Python

EveryOneIsGross / sinewCHAT

Star

sinewCHAT uses instanced chatbots to emulate neural nodes to enrich and generate positive weighted responses.

ml ai-alignment rrn openai-api

Updated Jul 16, 2023
Python

EveryOneIsGross / areteCHAT

Star

A persona chat based on the VIA Character Strengths. Reads emotional tone and summons appropriate virtue to respond.

sentiment-analysis chatbot via chatbot-framework ai-alignment virtues

Updated Jul 14, 2023
Python

EveryOneIsGross / bbBOT

Star

bbBOT is a felixble persona based branching binary sentiment chatbot.

openai tree-structure chatbot-framework ai-alignment python-ai openai-chatgpt

Updated Jul 13, 2023
Python

ai-fail-safe / safe-reward

Star

a prototype for an AI safety library that allows an agent to maximize its reward by solving a puzzle in order to prevent the worst-case outcomes of perverse instantiation

failsafe ai-safety anomaly-detection ai-alignment fail-safe

Updated Nov 8, 2022
Python

rmoehn / amplification

Star

An implementation of iterated distillation and amplification

transformer ida supervised-learning ai-safety ai-alignment

Updated Jun 22, 2022
Python

Improve this page

Add a description, image, and links to the ai-alignment topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the ai-alignment topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ai-alignment

Here are 12 public repositories matching this topic...

IQTLabs / daisybell

tashakim / otherness_and_control

veeara282 / alignment-jam-2024may

phelps-sg / llm-cooperation

agencyenterprise / PromptInject

tomekkorbak / pretraining-with-human-feedback

UCSC-VLAA / Sight-Beyond-Text

EveryOneIsGross / sinewCHAT

EveryOneIsGross / areteCHAT

EveryOneIsGross / bbBOT

ai-fail-safe / safe-reward

rmoehn / amplification

Improve this page

Add this topic to your repo