r1
Here are 46 public repositories matching this topic...
Explore the Multimodal “Aha Moment” on 2B Model
Updated Mar 10, 2025 - Python
Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
Updated Feb 19, 2025 - Python
Doge family of small language models
Updated Mar 13, 2025 - Python
Model Context Protocol server for DeepSeek's advanced language models
Updated Mar 13, 2025 - JavaScript
SOTA RL fine-tuning solution for advanced math reasoning of LLM
Updated Mar 11, 2025 - Python
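Uses langchain for task planning and builds conversation-scenario resources for each subtask. An MCTS task executor then lets each subtask draw on the resources in its context and explore through self-reflection to reach its own best answer to the problem. This approach depends on the model's alignment preferences; for each preference we designed an engineering framework that implements a sampling strategy over the model's own rewards for different answers.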
Updated Mar 9, 2025 - Jupyter Notebook
Auto-generate fallback and meter display from existing group info in d&b audiotechnik's R1 and ArrayCalc software.
Updated Mar 7, 2024 - Python
Recreating the minimal training methods of DeepSeek-R1 for small language models.
Updated Feb 10, 2025 - Python
Latest Advances on (RL based) Multimodal Reasoning in Multimodal Large Language Models
Updated Mar 12, 2025
A multi-stage pipeline that enhances Qwen2.5 language models with DeepSeek Reasoner's chain-of-thought capabilities. Implements the DeepSeek-R1 methodology through cold-start SFT, reasoning-oriented RL, rejection sampling, and optional model distillation.
Updated Jan 24, 2025 - Python
Distributed Reinforcement Learning for LLM Fine-Tuning with multi-GPU utilization
Updated Mar 12, 2025 - Python