# dpo

Here are 24 public repositories matching this topic...

dbf / django-dpotools

An open source collection of tools meant to simplify the life of data protection officers (DPOs) of large entities

python django dsb gdpr vvt rpa dsgvo dpo

Updated Apr 18, 2023
Python

zakcali / pandas-ta2numba

Replaces pandas-ta calls with NumPy/Numba functions to speed up the calculation of EMA, TEMA, RSI, MFI, ADX, and DPO

python finance trading trading-algorithms mfi adx tema dpo

Updated Oct 23, 2023
Python

DaehanKim / EasyRLHF

EasyRLHF aims to provide an easy and minimal interface to train aligned language models, using off-the-shelf solutions and datasets

language-model ipo sft dpo rlhf instruction-tuning rrhf

Updated Dec 12, 2023
Python

argilla-io / notus

Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach

zephyr fine-tuning dpo trl lm-alignment preference-data alignment-handbook

Updated Jan 15, 2024
Python

levje / resnet-dpo

Proof of concept that uses the DPO loss to fine-tune a ResNet to classify images from the CIFAR10 dataset.

pytorch alignment classification dpo

Updated Mar 29, 2024
Python

vicgalle / configurable-safety-tuning

Data and models for the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data"

alignment safety preference-learning dpo llm

Updated Apr 23, 2024
Python

ssbuild / llm_dpo

dpo finetuning

Updated Apr 23, 2024
Python

CyberAgentAILab / filtered-dpo

Introduces Filtered Direct Preference Optimization (fDPO), which improves language model alignment with human preferences by discarding samples of lower quality than those generated by the learning model

alignment dpo rlhf

Updated Apr 25, 2024
Python

adithya-s-k / Indic-llm

An open-source framework designed to adapt pre-trained Language Models (LLMs), such as Llama, Mistral, and Mixtral, to a wide array of domains and languages.

lora finetuning dpo llm finetuning-llms continual-pre-training

Updated May 27, 2024
Python

ContextualAI / HALOs

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

alignment ppo halos dpo kto rlhf

Updated May 30, 2024
Python

kyryl-opens-ml / rlfh-dagster-modal

Re-usable & scalable RLHF training pipeline with Dagster and Modal.

modal dpo dagster llm rlhf

Updated Jun 11, 2024
Python

TideDra / VL-RLHF

An RLHF Infrastructure for Vision-Language Models

vlm lmm dpo llm rlhf mllm

Updated Jun 12, 2024
Python

OctopusMind / DPO

Implementation of the DPO algorithm

lora dpo rlhf qwen

Updated Jun 12, 2024
Python

RockeyCoss / SPO

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

text-to-image dpo diffusion-models text-to-image-generation sdxl

Updated Jun 21, 2024
Python

martin-wey / CodeUltraFeedback

CodeUltraFeedback: aligning large language models to coding preferences

alignment code-generation dpo large-language-models llm-as-a-judge codeultrafeedback codal-bench

Updated Jun 25, 2024
Python

shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.

medical llama gpt dpo llm chatgpt medicalgpt

Updated Jun 28, 2024
Python

sugarandgugu / Simple-Trl-Training

Fine-tunes large language models with the DPO algorithm; simple and easy to get started with.

simple dpo trl llm rlhf

Updated Jul 3, 2024
Python

jianzhnie / LLamaTuner

Easy and efficient fine-tuning of LLMs (supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.

llama ppo dpo chatgpt rlhf qlora qwen mixtral llama3

Updated Jul 3, 2024
Python

armbues / SiLLM

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.

lora mlx dpo apple-silicon large-language-models llm llm-training llm-inference

Updated Jul 3, 2024
Python

dvlab-research / Step-DPO

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

math reasoning dpo llm

Updated Jul 4, 2024
Python
