A Practical Guide to Developing a Reliable FAQ Chatbot with Reinforcement Learning and Human Feedback using GPT-2 on AWS
Updated Feb 11, 2023 - Jupyter Notebook
Implementation of Reinforcement Learning from Human Feedback (RLHF)
Researching the reinforcement learning algorithm of ChatGPT
Robot Learning from Human Feedback. Inspired by advancements in NLP, we train a robot policy via reinforcement learning using a reward function learned exclusively from human preferences.
A curated list of reinforcement learning with human feedback resources [awesome-RLHF-Turkish] (continually updated)
Large Language Model for Competitive Programming
A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM
A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca
JavaScript client library for managing your LLM data in one place
Open efforts to implement ChatGPT-like models and beyond.
Reproduce Alpaca
Summaries of papers related to the alignment problem in NLP
Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
Reinforcement Learning from Human Feedback with 🤗 TRL
ChatGLM-6B-finetuning
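聚宝盆 (Cornucopia): a series of open-source, commercially usable Chinese financial large language models, together with an efficient, lightweight training framework for vertical-domain LLMs (pretraining, SFT, RLHF, quantization, etc.)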
RLHF for Stable Diffusion