Awesome Reasoning LLM Tutorial/Survey/Guide
Updated Jun 5, 2025 - Python
Mental-health large language models (LLM x Mental Health): pre- & post-training, datasets, evaluation, deployment & RAG, with the InternLM / Qwen / Baichuan / DeepSeek / Mixtral / LLaMA / GLM series of models
Explore the Multimodal "Aha Moment" on a 2B Model
Train a Language Model with GRPO to create a schedule from a list of events and priorities
A brief and partial summary of RLHF algorithms.
[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis
Revisiting Mid-training in the Era of RL Scaling
Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".
A high-efficiency system for large language model based search agents
[EMNLP 2022] Continual Training of Language Models for Few-Shot Learning
A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
Pure RL to post-train base models for social reasoning capabilities. Lightweight replication of DeepSeek-R1-Zero with Social IQa dataset.
This repository collects research papers on learning from rewards in the context of post-training and test-time scaling of large language models (LLMs).
The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)
RFTT: Reasoning with Reinforced Functional Token Tuning
Official implementation for "Diffusion Instruction Tuning"
Official repository for the paper "Monocular Event-Based Vision for Obstacle Avoidance with a Quadrotor" by Bhattacharya, et al. (2024) from GRASP, Penn & RPG, UZH.
Code repository for "Post-pre-training for Modality Alignment in Vision-Language Foundation Models" (CVPR2025)
Official repo for "ProSec: Fortifying Code LLMs with Proactive Security Alignment"