Proof-of-concept leveraging the DPO loss to fine-tune a ResNet to classify images from the CIFAR-10 dataset.
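A minimal sketch of what a DPO-style preference loss over class labels could look like for a ResNet on CIFAR-10 (the pairing of a correct "chosen" label with a "rejected" label, and all names below, are illustrative assumptions, not the repository's actual code):

```python
# Minimal sketch: DPO-style loss over class labels for an image classifier.
# Model setup, label pairing, and names are illustrative assumptions.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

policy = resnet18(num_classes=10)        # model being fine-tuned
reference = resnet18(num_classes=10)     # frozen reference copy
reference.load_state_dict(policy.state_dict())
for p in reference.parameters():
    p.requires_grad_(False)

def dpo_loss(images, chosen, rejected, beta=0.1):
    """-log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    logp = F.log_softmax(policy(images), dim=-1)
    with torch.no_grad():
        logp_ref = F.log_softmax(reference(images), dim=-1)
    idx_c, idx_r = chosen.unsqueeze(1), rejected.unsqueeze(1)
    policy_ratio = logp.gather(1, idx_c) - logp.gather(1, idx_r)
    ref_ratio = logp_ref.gather(1, idx_c) - logp_ref.gather(1, idx_r)
    return -F.logsigmoid(beta * (policy_ratio - ref_ratio)).mean()
```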
DPO using human votes: Model Combat is an application that compares responses from different AI models (ChatGPT, Hanooman, and Cohere) to user inputs. Users can vote on which model gives the better response or leave remarks on the responses; the results are saved to Google Sheets for further analysis.
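The Google Sheets logging step could be done along these lines with the gspread library (spreadsheet name, credential file, and column layout are assumptions for illustration, not the application's actual code):

```python
# Minimal sketch of recording one user vote to Google Sheets via gspread.
# Spreadsheet name, credentials path, and column layout are assumptions.
import gspread

gc = gspread.service_account(filename="credentials.json")  # service-account auth
votes = gc.open("model_combat_votes").sheet1

def record_vote(prompt, responses, winner, remark=""):
    """Append one vote (prompt, each model's response, winner, remark) as a row."""
    row = [prompt] + [responses.get(m, "") for m in ("ChatGPT", "Hanooman", "Cohere")]
    votes.append_row(row + [winner, remark])
```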
EasyRLHF aims to provide an easy and minimal interface to train aligned language models, using off-the-shelf solutions and datasets
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
An open-source framework designed to adapt pre-trained large language models (LLMs), such as Llama, Mistral, and Mixtral, to a wide array of domains and languages.
Learning to route instances for Human vs AI Feedback
Data and models for the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data"
Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on Apple Silicon
🌾 OAT: Online AlignmenT for LLMs
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning