OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
The official GitHub page for the survey paper "A Survey of Large Language Models".
Official release of the InternLM2.5 base and chat models, with 1M-token context support
Robust recipes to align language models with human and AI preferences
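A core technique in such alignment recipes is Direct Preference Optimization (DPO), which fine-tunes a policy directly on preference pairs without a separate reward model. A minimal sketch of the pairwise DPO loss, assuming summed response log-probabilities are already available (the `beta=0.1` default and function signature here are illustrative, not any particular library's API):

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Pairwise DPO loss for one preference pair.

    pi_logp_* / ref_logp_* are the summed token log-probabilities of the
    chosen (w) and rejected (l) responses under the trainable policy and
    the frozen reference model. beta scales the implicit reward margin.
    """
    # Implicit reward of each response: beta * log(pi / pi_ref)
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    # -log(sigmoid(margin)), i.e. softplus(-margin)
    return math.log1p(math.exp(-margin))
```

When the policy prefers the chosen response more than the reference does, the margin is positive and the loss drops below `log(2)`; a negative margin pushes it higher, which is what drives the gradient update.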
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
Fine-tuning ChatGLM-6B with PEFT | Efficient PEFT-based ChatGLM fine-tuning
A Doctor for your data
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
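SimPO replaces DPO's reference-model log-ratio with a length-normalized policy log-probability plus a target margin, so no frozen reference model is needed. A minimal sketch of the pairwise loss, assuming summed response log-probabilities and token counts as inputs (the `beta` and `gamma` defaults here are illustrative placeholders, not the paper's tuned values):

```python
import math

def simpo_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
               beta=2.0, gamma=0.5):
    """Reference-free SimPO pairwise loss for one preference pair.

    logp_* are the summed token log-probabilities of the chosen/rejected
    responses under the policy; len_* are their token counts. The reward
    is the average per-token log-probability scaled by beta, and gamma is
    a fixed target margin between chosen and rejected rewards.
    """
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    margin = reward_chosen - reward_rejected - gamma
    # -log(sigmoid(margin)), i.e. softplus(-margin)
    return math.log1p(math.exp(-margin))
```

Length normalization keeps the objective from simply favoring longer responses, which is one of the paper's stated motivations for dropping the reference model.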
Cornucopia (聚宝盆): a family of open-source, commercially usable Chinese financial LLMs, with an efficient, lightweight training framework for vertical-domain LLMs (pretraining, SFT, RLHF, quantization, etc.)
Implementation of ChatGPT-style RLHF (Reinforcement Learning from Human Feedback) for any generation model in Hugging Face Transformers (bloomz-176B/BLOOM/GPT/BART/T5/MetaICL)
The official implementation of Self-Play Preference Optimization (SPPO)