dpo
Here are 73 public repositories matching this topic...
Align Anything: Training All-Modality Models with Feedback
Updated Mar 13, 2025 - Python
A Deep Learning NLP repository using TensorFlow, covering everything from text preprocessing to downstream tasks for recent models such as Topic Models, BERT, GPT, and LLMs.
Updated Sep 6, 2024 - Jupyter Notebook
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.
Updated Mar 13, 2025 - Python
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
Updated Mar 10, 2025 - Python
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
Updated Dec 17, 2024 - Python
[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Updated Feb 19, 2025 - Python
Notus is a collection of LLMs fine-tuned with SFT, DPO, SFT+DPO, and other RLHF techniques, always with a data-first approach.
Updated Jan 15, 2024 - Python
CodeUltraFeedback: aligning large language models to coding preferences
Updated Jun 25, 2024 - Python
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
Updated Feb 28, 2025 - Python
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
Updated Oct 23, 2024 - Python
[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
Updated Jul 28, 2024 - Python
A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
Updated Feb 19, 2025 - Python
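Most of the repositories above build on Direct Preference Optimization (DPO) or a variant of it. For orientation only, a minimal sketch of the standard DPO objective in PyTorch follows; the function name and the assumption that per-sequence log-probabilities are precomputed are illustrative and not taken from any specific repository listed here.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             reference_chosen_logps, reference_rejected_logps, beta=0.1):
    # Each argument: tensor of summed per-sequence log-probabilities, shape (batch,).
    # beta scales the implicit reward and limits drift from the reference model.
    chosen_rewards = beta * (policy_chosen_logps - reference_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - reference_rejected_logps)
    # The preferred response should receive a higher implicit reward than the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

Variants in this listing, such as $\beta$-DPO, adapt the beta coefficient during training rather than keeping it fixed.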