dpo
Here are 73 public repositories matching this topic...
Align Anything: Training All-Modality Models with Feedback
Updated Mar 13, 2025 - Python
A Deep Learning NLP repository using TensorFlow, covering everything from text preprocessing to downstream tasks for recent models such as Topic Models, BERT, GPT, and LLMs.
Updated Sep 6, 2024 - Jupyter Notebook
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.
Updated Mar 13, 2025 - Python
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
Updated Mar 10, 2025 - Python
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
Updated Dec 17, 2024 - Python
[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Updated Feb 19, 2025 - Python
Notus is a collection of LLMs fine-tuned with SFT, DPO, SFT+DPO, and other RLHF techniques, always with a data-first approach.
Updated Jan 15, 2024 - Python
CodeUltraFeedback: aligning large language models to coding preferences
Updated Jun 25, 2024 - Python
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
Updated Feb 28, 2025 - Python
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
Updated Oct 23, 2024 - Python
[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
Updated Jul 28, 2024 - Python
A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
Updated Feb 19, 2025 - Python
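Most of the repositories above build on Direct Preference Optimization (DPO) or a variant of it. For orientation only, a minimal sketch of the standard DPO objective in PyTorch follows; the function name and the assumption that per-sequence log-probabilities are precomputed are illustrative and not taken from any specific repository listed here.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             reference_chosen_logps, reference_rejected_logps, beta=0.1):
    # Each argument: tensor of summed per-sequence log-probabilities, shape (batch,).
    # beta scales the implicit reward and limits drift from the reference model.
    chosen_rewards = beta * (policy_chosen_logps - reference_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - reference_rejected_logps)
    # The preferred response should receive a higher implicit reward than the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

Variants in this listing, such as $\beta$-DPO, adapt the beta coefficient during training rather than keeping it fixed.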