An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
Code for the Bachelor's thesis "The Human Factor: Addressing Diversity in Reinforcement Learning from Human Feedback".
This repository contains the implementation of a Reinforcement Learning from Human Feedback (RLHF) system using custom datasets. The project uses the trlX library to train a preference model that integrates human feedback directly into the optimization of language models.
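For orientation, a minimal sketch of what a trlX training run looks like; the reward function below is a placeholder heuristic standing in for a trained preference model, and the model name and prompts are illustrative choices, not this repository's defaults:

```python
import trlx

# Placeholder reward: stands in for a trained preference model that scores
# each generated sample (higher = more preferred). Purely illustrative.
def reward_fn(samples, **kwargs):
    return [float(len(sample.split())) for sample in samples]

trainer = trlx.train(
    "gpt2",  # illustrative base model
    reward_fn=reward_fn,
    prompts=["Explain RLHF in one sentence."],
    eval_prompts=["What is a reward model?"],
)
```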
RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
[TSMC] Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
A repo for RLHF training and best-of-N (BoN) sampling over LLMs, with support for reward model ensembles.
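As a hedged sketch of the BoN-with-ensemble idea (all function names here are hypothetical, not this repo's API): draw N completions, score each with every reward model, and keep the completion with the best average score.

```python
from statistics import mean
from typing import Callable, List

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],                    # hypothetical LLM sampler
    reward_models: List[Callable[[str, str], float]],  # hypothetical RM ensemble
    n: int = 16,
) -> str:
    """Sample n completions; return the one with the highest mean ensemble reward."""
    candidates = [generate(prompt) for _ in range(n)]
    # Averaging over an ensemble makes the selection harder to reward-hack
    # than scoring with any single reward model.
    scored = [(mean(rm(prompt, c) for rm in reward_models), c) for c in candidates]
    return max(scored, key=lambda t: t[0])[1]
```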
Shaping Language Models with Cognitive Insights
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
LMRax is a JAX-based framework for training transformer language models with reinforcement learning, including reward model training.
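To illustrate the reward-model side, here is a pairwise Bradley-Terry loss in JAX, a common choice for preference training; the arrays are toy placeholders, not LMRax code:

```python
import jax
import jax.numpy as jnp

def preference_loss(reward_chosen, reward_rejected):
    # Bradley-Terry pairwise objective: maximize the margin by which the
    # chosen response outscores the rejected one, -log sigmoid(r_c - r_r).
    return -jnp.mean(jax.nn.log_sigmoid(reward_chosen - reward_rejected))

# Toy reward scores for a batch of three preference pairs (placeholders).
chosen = jnp.array([1.2, 0.7, 2.1])
rejected = jnp.array([0.3, 0.9, 1.0])

loss = preference_loss(chosen, rejected)
grads = jax.grad(preference_loss)(chosen, rejected)  # gradients w.r.t. chosen scores
```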