r1

Here are 46 public repositories matching this topic...

zzli2022 / Awesome-System2-Reasoning-LLM

Latest Advances on System-2 Reasoning

benchmark mcts rl reasoning r1 prm o3 o1 slow-fast system-2 self-improve macro-action

Updated Mar 12, 2025
Python

turningpoint-ai / VisualThinker-R1-Zero

Explore the Multimodal “Aha Moment” on 2B Model

reinforcement-learning reasoning r1 post-training multimodal deepseek deepseek-r1 grpo deepseek-r1-zero r1-zero multimodal-journey multimodal-r1

Updated Mar 10, 2025
Python

modelscope / awesome-deep-reasoning

Collect every awesome work about r1!

collection rl reasoning r1 o1 qwen deepseek grpo

Updated Mar 12, 2025
Python

RyanLiu112 / compute-optimal-tts

Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".

r1 o1 large-language-model test-time-scaling

Updated Feb 19, 2025
Python

SmallDoges / small-doge

Doge Family of Small Language Models

python nlp natural-language-processing reinforcement-learning deep-learning pytorch transformer chinese attention-mechanism r1 attention-is-all-you-need mechine-learning foundation-models small-language-models dynamic-mask-attention cross-domain-mixture-of-experts deepseek-r1

Updated Mar 13, 2025
Python

DMontgomery40 / deepseek-mcp-server

Model Context Protocol server for DeepSeek's advanced language models

mcp r1 deepseek-chat deepseek-api model-context-protocol deepseek-v3 deepseek-r1

Updated Mar 13, 2025
JavaScript

CJReinforce / PURE

SOTA RL fine-tuning solution for advanced math reasoning of LLM

reinforcement-learning mathematics rl reasoning r1 o1 llm reinforcement-finetuning

Updated Mar 11, 2025
Python

LazaUK / AIFoundry-DeepSeek-SDK

Notebooks to demo the use of Azure AI Python SDK / LangChain with DeepSeek R1 reasoning model in Azure AI Foundry.

python sdk ai azure openai foundry r1 langchain deepseek

Updated Feb 6, 2025
Jupyter Notebook

glide-the / InterpretationoDreams

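Uses LangChain for task planning and builds conversation-scenario resources for each subtask. An MCTS task executor lets each subtask draw on the resources in its context and, through self-reflective exploration, arrive at its own best answer to the problem. This approach depends on the model's alignment preferences; for each preference we designed an engineering framework that implements a sampling strategy based on self-assigned rewards for different answers.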

task-planning r1 cot mcts-agents deep-research

Updated Mar 9, 2025
Jupyter Notebook

lachlancresswell / AutoR1

Auto-generate fallback and meter display from existing group info in d&b audiotechnik's R1 and ArrayCalc software.

r1 dbaudio dbaudiotechnik arraycalc

Updated Mar 7, 2024
Python

The-Swarm-Corporation / AgentGym

A framework that makes it effortless to convert any LLM into a reasoning agent like o1 or DeepSeek's R1

ai rl agents alibaba r1 o1 llms qwen deepseek

Updated Feb 10, 2025
Python

sdiehl / tiny-r1

Recreating the minimal training methods of DeepSeek-R1 for small language models.

reasoning r1 grpo grpotrainer

Updated Feb 10, 2025
Python

IoTDevice / phicomm-r1-controler

Control program for the Phicomm R1 speaker

phicomm r1 yinxiang feixun

Updated Feb 28, 2021
Go

The-Martyr / Awesome-Multimodal-Reasoning

Latest Advances on (RL based) Multimodal Reasoning in Multimodal Large Language Models

reinforcement-learning rl image-generation video-understanding r1 image-understanding multimodal-learning cot o1 video-reasoning large-language-models llm chain-of-thought mllm lvlm multimodal-reasoning image-reasoning

Updated Mar 12, 2025

ericsson-iap / python-sample-app

Python Sample App for SMO Systems like Ericsson Intelligent Automation Platform. We aim to be ORAN aligned. Use this to kickstart your own app!

python smo 3gpp r1 eic ric ran oran o-ran rapp eiap non-rt-ric

Updated Oct 25, 2024
Python

sylvain-wei / 24-Game-Reasoning

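A very simple reproduction of DeepSeek-R1-Zero and DeepSeek-R1, using the 24 Game as an example. It applies zero-RL, SFT, and SFT+RL to elicit the LLM's ability to verify and reflect on its own answers.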

alignment reasoning r1 post-training cot sft o1 24game llm rlhf deepseek r1-zero verl long-cot

Updated Mar 3, 2025
Python

nschlaepfer / ChainForge-R1-SuperCoT

A multi-stage pipeline that enhances Qwen2.5 language models with DeepSeek Reasoner's chain-of-thought capabilities. Implements the DeepSeek-R1 methodology through cold-start SFT, reasoning-oriented RL, rejection sampling, and optional model distillation.

training ai reasoning r1 qwen deepseek deepseek-r1 cold-start-sft

Updated Jan 24, 2025
Python

BY571 / DistRL-LLM

Distributed Reinforcement Learning for LLM Fine-Tuning with multi-GPU utilization

reinforcement-learning pg r1 multi-gpu-training multi-gpu-inference llm llm-training llm-finetuning llm-fine-tuning grpo reinforcement-learning-fine-tuning

Updated Mar 12, 2025
Python

OnerootProject / r1

R1 Protocol

protocol exchange dex r1

Updated Mar 7, 2019
JavaScript

lechmazur / goods

LLM public goods game

evaluation economics r1 o1 llm o3-mini

Updated Feb 22, 2025
