OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
The official GitHub page for the survey paper "A Survey of Large Language Models".
Official release of the InternLM2.5 base and chat models, with 1M-token context support
Robust recipes to align language models with human and AI preferences
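A core technique in such alignment recipes is Direct Preference Optimization (DPO), which fine-tunes a policy directly on preference pairs without a separate reward model. A minimal sketch of the pairwise DPO loss, assuming summed response log-probabilities are already available (the `beta=0.1` default and function signature here are illustrative, not any particular library's API):

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Pairwise DPO loss for one preference pair.

    pi_logp_* / ref_logp_* are the summed token log-probabilities of the
    chosen (w) and rejected (l) responses under the trainable policy and
    the frozen reference model. beta scales the implicit reward margin.
    """
    # Implicit reward of each response: beta * log(pi / pi_ref)
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    # -log(sigmoid(margin)), i.e. softplus(-margin)
    return math.log1p(math.exp(-margin))
```

When the policy prefers the chosen response more than the reference does, the margin is positive and the loss drops below `log(2)`; a negative margin pushes it higher, which is what drives the gradient update.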
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
Fine-tuning ChatGLM-6B with PEFT | Efficient PEFT-based ChatGLM fine-tuning
A Doctor for your data
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
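SimPO replaces DPO's reference-model log-ratio with a length-normalized policy log-probability plus a target margin, so no frozen reference model is needed. A minimal sketch of the pairwise loss, assuming summed response log-probabilities and token counts as inputs (the `beta` and `gamma` defaults here are illustrative placeholders, not the paper's tuned values):

```python
import math

def simpo_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
               beta=2.0, gamma=0.5):
    """Reference-free SimPO pairwise loss for one preference pair.

    logp_* are the summed token log-probabilities of the chosen/rejected
    responses under the policy; len_* are their token counts. The reward
    is the average per-token log-probability scaled by beta, and gamma is
    a fixed target margin between chosen and rejected rewards.
    """
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    margin = reward_chosen - reward_rejected - gamma
    # -log(sigmoid(margin)), i.e. softplus(-margin)
    return math.log1p(math.exp(-margin))
```

Length normalization keeps the objective from simply favoring longer responses, which is one of the paper's stated motivations for dropping the reference model.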
Cornucopia (聚宝盆): a family of open-source, commercially usable Chinese financial LLMs, with an efficient, lightweight training framework for vertical-domain LLMs (pretraining, SFT, RLHF, quantization, etc.)
Implementation of ChatGPT-style RLHF (Reinforcement Learning from Human Feedback) for any generation model in Hugging Face Transformers (bloomz-176B/BLOOM/GPT/BART/T5/MetaICL)
The official implementation of Self-Play Preference Optimization (SPPO)