Awesome Reasoning LLM Tutorial/Survey/Guide
Updated Mar 11, 2025 - Python
Explore the Multimodal “Aha Moment” on 2B Model
A brief and partial summary of RLHF algorithms.
[EMNLP 2022] Continual Training of Language Models for Few-Shot Learning
A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
Official implementation for "Diffusion Instruction Tuning"
RFTT: Reasoning with Reinforced Functional Token Tuning
Official repository for the paper "Monocular Event-Based Vision for Obstacle Avoidance with a Quadrotor" by Bhattacharya, et al. (2024) from GRASP, Penn & RPG, UZH.
Official repo for "ProSec: Fortifying Code LLMs with Proactive Security Alignment"
An Approach to Enhancing the Efficacy of Post-Training Using Synthetic Data by Iterative Data Selection
Machine Reading Comprehension competition with a Korean BERT model
Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".
Reproducible figures for "Post Training in Deep Learning"
We use RL to train a SOTA MLLM captioner.
Post Training Android Part 4 for Software Laboratory Center 19-2 Binus University
Pure RL without SFT to post-train base models for social reasoning capabilities. Lightweight replication of DeepSeek-R1-Zero with Social IQa dataset.
Code repository for "Post-pre-training for Modality Alignment in Vision-Language Foundation Models" (CVPR2025)
Post Training Android Part 2 for Software Laboratory Center 19-2 Binus University
Post Training Android Part 1 for Software Laboratory Center 19-2 Binus University