# rlhf

Here are 148 public repositories matching this topic...

arunprsh / ChatGPT-Decoded-GPT2-FAQ-Bot-RLHF-PPO

A Practical Guide to Developing a Reliable FAQ Chatbot with Reinforcement Learning and Human Feedback using GPT-2 on AWS

aws reinforcement-learning chatbot transformers question-answering sagemaker gpt-2 gpt2 rlhf

Updated Feb 11, 2023
Jupyter Notebook

xrsrke / instructGOOSE

Implementation of Reinforcement Learning from Human Feedback (RLHF)

reinforcement-learning chatgpt human-feedback rlhf instructgpt

Updated Apr 7, 2023
Jupyter Notebook

saschaschramm / tiny-chatgpt

Researching the reinforcement learning algorithm of ChatGPT

gae temporal-differencing-learning ppo chatgpt rlhf general-advantage-estimation

Updated Apr 7, 2023
Jupyter Notebook

jeremy-collins / robot-rlhf

Robot Learning from Human Feedback. Inspired by advancements in NLP, we train a robot policy via reinforcement learning using a reward function learned exclusively from human preferences.

reinforcement-learning robotics alignment chatgpt rlhf

Updated Apr 16, 2023
Python

01Kevin01 / awesome-RLHF-Turkish

A curated list of reinforcement learning with human feedback resources [awesome-RLHF-Turkish] (continually updated)

ai artificial-intelligence turkish-language general-language-model human-feedback rlhf value-alignment awesome-rlhf rlhf-turkish

Updated Apr 27, 2023

phonism / llm4cp

Large Language Model for Competitive Programming

competitive-programming llama ppo large-language-models rlhf

Updated Apr 28, 2023
Python

jackaduma / ChatGLM-LoRA-RLHF-PyTorch

A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT, but with ChatGLM.

pytorch llama gpt lora finetune ppo peft deepspeed llm chatgpt rlhf reward-models chatglm chatglm-6b

Updated Apr 28, 2023
Python

jackaduma / Alpaca-LoRA-RLHF-PyTorch

A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT, but with Alpaca.

pytorch llama gpt lora alpaca finetune ppo peft deepspeed llm chatgpt rlhf reward-models

Updated Apr 28, 2023
Python

log10-io / log10js

JavaScript client library for managing your LLM data in one place

javascript debugging ai monitoring logging artificial-intelligence openai autonomous-agents openai-api langchain rlhf llmops langchain-js

Updated May 3, 2023
JavaScript

jianzhnie / awesome-open-chatgpt

Open efforts to implement ChatGPT-like models and beyond.

llama gpt4 chatgpt rlhf instruct-gpt

Updated May 10, 2023

l294265421 / my-alpaca

Reproduce alpaca

colab alpaca multi-turn autodl chatgpt rlhf alpaca-lora

Updated May 14, 2023
Jupyter Notebook

Miraclemarvel55 / LLaMA-MOSS-RLHF-LoRA

Training LLaMA and MOSS with RLHF, optionally using LoRA

similarity chinese rl llama lora moss reward ppo rlhf

Updated May 16, 2023
Python

jasonvanf / llama-trl

LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA

adapter transformer llama gpt lora ppo peft trl gpt-4 chatgpt rlhf

Updated May 23, 2023
Python

Miraclemarvel55 / ChatGLM-RLHF

Apply RLHF directly to ChatGLM to raise or lower the probability of target outputs (modify ChatGLM output with RLHF alone)

custom similarity reward nickname ppo rlhf chatglm

Updated May 23, 2023
Python

ymnseol / weekly-paper-reading-group

Summaries of papers related to the alignment problem in NLP

nlp natural-language-processing rlhf instruction-tuning reinforcement-learning-from-human-feedback

Updated May 29, 2023

l294265421 / alpaca-rlhf

Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat

reinforcement-learning llama language-model alpaca large-language-models llm chatgpt rlhf

Updated Jun 5, 2023
Python

victor-iyi / rlhf-trl

Reinforcement Learning from Human Feedback with 🤗 TRL

reinforcment-learning human-feedback rlhf

Updated Jun 14, 2023
Python

shanggangli / ChatGLM-6B-finetuning

ChatGLM-6B-finetuning

pytorch lora natural-language-generation rlhf chatglm-6b p-tuning-v2

Updated Jun 18, 2023
Python

jerry1993-tech / Cornucopia-LLaMA-Fin-Chinese

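Cornucopia: a series of open-source, commercially usable Chinese financial large language models, together with an efficient, lightweight framework for training vertical-domain LLMs (pretraining, SFT, RLHF, quantization, etc.)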

nlp finance qa transformers text-generation chinese llama sft large-language-models rlhf

Updated Jun 30, 2023
Python

akashsonowal / ddpo-pytorch

RLHF for Stable Diffusion

reinforcement-learning diffusion stable-diffusion rlhf

Updated Jul 9, 2023
Python
