LLM reasoning with DeepSeek-R1 distilled models

DeepSeek-R1 is an open-source reasoning model developed by DeepSeek for tasks requiring logical inference, mathematical problem-solving, and real-time decision-making. Because the model exposes its reasoning process, you can follow its logic, making it easier to understand and, if necessary, challenge its output. This capability gives reasoning models an edge in fields where outcomes need to be explainable, such as research or complex decision-making.
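Since following the model's logic is a key feature, it helps to know that R1-family models wrap their chain of thought in `<think>...</think>` tags before the final answer. Below is a minimal sketch of separating the reasoning from the answer; the helper name and the example response string are illustrative, not part of the tutorial's code:

```python
import re


def split_reasoning(text: str) -> tuple[str, str]:
    """Split a DeepSeek-R1 response into (reasoning, answer).

    R1-family models emit their chain of thought inside <think>...</think>
    tags before the final answer; if no tags are present, the whole text
    is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


# Fabricated example response for illustration:
response = "<think>2 + 2 is basic addition; the sum is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(response)
print(answer)  # -> The answer is 4.
```

This makes it easy to display the reasoning trace separately from the final answer, or to hide it entirely.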

Distillation in AI creates smaller, more efficient models from larger ones, preserving much of their reasoning power while reducing computational demands. DeepSeek applied this technique to create a suite of distilled models from R1, using Qwen and Llama architectures. This makes it possible to try DeepSeek-R1 capabilities locally on an ordinary laptop.

In this tutorial, we show how to run DeepSeek-R1 distilled models using OpenVINO.

The tutorial supports several models; you can select one of the options below to compare the quality of their solutions:

  • DeepSeek-R1-Distill-Llama-8B is a distilled model based on Llama-3.1-8B that prioritizes high performance and advanced reasoning capabilities, particularly excelling in tasks requiring mathematical and factual precision. Check the model card for more info.
  • DeepSeek-R1-Distill-Qwen-1.5B is the smallest DeepSeek-R1 distilled model, based on Qwen2.5-Math-1.5B. Despite its compact size, the model demonstrates strong capabilities in solving basic mathematical tasks, although its programming capabilities are limited. Check the model card for more info.
  • DeepSeek-R1-Distill-Qwen-7B is a distilled model based on Qwen2.5-Math-7B. The model demonstrates a good balance between mathematical and factual reasoning but may be less suited for complex coding tasks. Check the model card for more info.
  • DeepSeek-R1-Distill-Qwen-14B is a distilled model based on Qwen2.5-14B that shows strong competence in factual reasoning and solving complex mathematical tasks. Check the model card for more info.
  • DeepSeek-R1-Distill-Qwen-32B is a distilled model based on Qwen2.5-32B with capability comparable to OpenAI o1-mini. Check the model card for more info. Since the original model is about 65GB, quantizing it to INT4 requires 32GB of RAM, a 200GB swap file, and another 200GB of storage to save the models. The INT4-quantized model is about 16GB in size and requires 32GB of RAM for inference on a CPU, or 64GB of RAM on an iGPU.

Learn how to accelerate DeepSeek-R1-Distill-Llama-8B with FastDraft and the OpenVINO GenAI speculative decoding pipeline in this notebook.

Notebook Contents

The tutorial consists of the following steps:

Installation Instructions

This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. For details, please refer to the Installation Guide.
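A minimal sketch of setting up an isolated environment for the notebook; the environment name and the commented-out package names are assumptions based on typical OpenVINO notebook setups, so check the Installation Guide for the exact requirements:

```shell
# Create and activate an isolated virtual environment for the notebook.
python3 -m venv openvino_env
. openvino_env/bin/activate
python -m pip install --upgrade pip

# Then install the notebook dependencies and start Jupyter
# (package names assumed; see the Installation Guide):
# pip install openvino-genai jupyterlab
# jupyter lab
```

Keeping the notebook's dependencies in their own environment avoids conflicts with other locally installed Python packages.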