dpo
Here are 27 public repositories matching this topic...
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization (Python, updated Jun 1, 2024)
$\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ (Python, updated Jul 10, 2024)
An open-source framework designed to adapt pre-trained large language models (LLMs), such as Llama, Mistral, and Mixtral, to a wide array of domains and languages. (Python, updated May 27, 2024)
Proof-of-concept leveraging the DPO loss to fine-tune a ResNet to classify images from the CIFAR-10 dataset. (Python, updated Jul 16, 2024)
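Several entries above build on the DPO loss. A minimal sketch of that loss for a single preference pair, assuming per-sequence log-probabilities from the policy and a frozen reference model are already available (all names here are hypothetical, not from any listed repo):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l)))."""
    # Implicit reward margins relative to the reference model
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # Numerically plain log-sigmoid; real implementations use a stable variant
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy matches the reference exactly, both margins are zero and the loss is log 2; raising the chosen completion's log-probability relative to the rejected one drives the loss toward zero.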
EasyRLHF aims to provide an easy, minimal interface for training aligned language models using off-the-shelf solutions and datasets. (Python, updated Dec 12, 2023)
Examples for using the SiLLM framework for training and running large language models (LLMs) on Apple Silicon. (Python, updated Jul 5, 2024)
[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning (Python, updated Jul 26, 2024)
Data and models for the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data". (Python, updated Jul 19, 2024)
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step (Python, updated Jul 10, 2024)
CodeUltraFeedback: aligning large language models to coding preferences. (Python, updated Jun 25, 2024)