dpo
Here are 27 public repositories matching this topic...
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization (Python, updated Jun 1, 2024)
$\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ (Python, updated Jul 10, 2024)
An open-source framework designed to adapt pre-trained large language models (LLMs), such as Llama, Mistral, and Mixtral, to a wide array of domains and languages. (Python, updated May 27, 2024)
Proof-of-concept leveraging the DPO loss to fine-tune a ResNet to classify images from the CIFAR-10 dataset. (Python, updated Jul 16, 2024)
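Several entries above build on the DPO loss. A minimal sketch of that loss for a single preference pair, assuming per-sequence log-probabilities from the policy and a frozen reference model are already available (all names here are hypothetical, not from any listed repo):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l)))."""
    # Implicit reward margins relative to the reference model
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # Numerically plain log-sigmoid; real implementations use a stable variant
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy matches the reference exactly, both margins are zero and the loss is log 2; raising the chosen completion's log-probability relative to the rejected one drives the loss toward zero.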
EasyRLHF aims to provide an easy, minimal interface for training aligned language models using off-the-shelf solutions and datasets. (Python, updated Dec 12, 2023)
Examples for using the SiLLM framework for training and running large language models (LLMs) on Apple Silicon. (Python, updated Jul 5, 2024)
[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning (Python, updated Jul 26, 2024)
Data and models for the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data". (Python, updated Jul 19, 2024)
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step (Python, updated Jul 10, 2024)
CodeUltraFeedback: aligning large language models to coding preferences. (Python, updated Jun 25, 2024)