# **DeepSeek R1-Zero – Quantization & Custom Code Demo**

This notebook demonstrates how to load the [DeepSeek R1-Zero model](https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero) when it:

- Uses **custom code** (model/config) on the Hugging Face Hub.
- Declares **FP8 quantization** in its config, and may also be loaded with **4-bit or 8-bit quantization** via `bitsandbytes` (which requires a recent release).
- Throws errors about **mismatched model types** or **unsupported quantization**.

We'll address the main points:

1. **Installing/Upgrading** `transformers`, `accelerate`, and `bitsandbytes`.
2. Loading the model with **`AutoModelForCausalLM`** and **`trust_remote_code=True`**.
3. **Disabling** or **overriding** quantization settings if needed.
4. **Troubleshooting** common errors (e.g., config mismatch, missing `bitsandbytes` support).

## **1. Install/Upgrade Dependencies**

We'll:
- Install or upgrade `transformers`, `accelerate`, and `bitsandbytes`.
- Restart the kernel after installation if any of these packages were already imported, so the new versions are picked up (most environments need a manual restart).

If you get a prompt asking to trust remote code, it means the model repository has custom Python code you need to run locally. This is normal for custom architectures.

**Important**: If you still see an `ImportError` about `bitsandbytes`, make sure the installed version is 0.39.0 or later.

In [None]:
!pip install --upgrade transformers accelerate
!pip install --upgrade bitsandbytes
print("Installation complete. If transformers or bitsandbytes was already imported in this session, restart the kernel before continuing.")
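To confirm the 0.39.0 requirement without guessing, you can compare the installed `bitsandbytes` version against it using only the standard library. This is a minimal sketch: `numeric_prefix`, `meets_minimum`, and `check_bitsandbytes` are hypothetical helper names, and the comparison only looks at the leading numeric parts of the version string (so `0.45.0.dev0` is treated as `0.45.0`), which is coarser than full PEP 440 ordering.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_BNB_4BIT = "0.39.0"  # minimum version this notebook asks for 4-bit loading


def numeric_prefix(v: str) -> tuple:
    """Keep the leading integer parts: '0.43.1' -> (0, 43, 1), '0.45.0.dev0' -> (0, 45, 0)."""
    parts = []
    for piece in v.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component (e.g. 'dev0')
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_BNB_4BIT) -> bool:
    """Hypothetical helper: True if `installed` >= `minimum` on numeric components."""
    a, b = numeric_prefix(installed), numeric_prefix(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so (0, 39) compares equal to (0, 39, 0).
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_bitsandbytes(minimum: str = MIN_BNB_4BIT) -> str:
    """Hypothetical helper: report whether bitsandbytes is installed and new enough."""
    try:
        installed = version("bitsandbytes")
    except PackageNotFoundError:
        return "bitsandbytes is not installed; run: pip install --upgrade bitsandbytes"
    if meets_minimum(installed, minimum):
        return f"bitsandbytes {installed} is OK (>= {minimum})"
    return f"bitsandbytes {installed} is older than {minimum}; run: pip install --upgrade bitsandbytes"


print(check_bitsandbytes())
```

If you need exact PEP 440 semantics (pre-releases, post-releases), a dedicated version parser is the safer choice; this sketch only aims to catch clearly outdated installs.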

## **2. Import Libraries**

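We rely on the `transformers` auto classes with `trust_remote_code=True`, which lets the custom modeling code in the `deepseek-ai/DeepSeek-R1-Zero` repository (a decoder-only `deepseek_v3` architecture) run instead of forcing the checkpoint into a standard T5 class. Because the model is decoder-only, the matching auto class is `AutoModelForCausalLM`, not `AutoModelForSeq2SeqLM`.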

If you want to load in 4-bit mode, make sure your GPU (and `bitsandbytes`) supports it. If you prefer standard float16 or float32, we’ll show how to do that as well.

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

print("Imports successful.")
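Before picking an auto class, it can help to look at what the repository's `config.json` actually declares. The sketch below is a hedged illustration: `load_hub_config`, `pick_auto_class`, and `declared_quantization` are hypothetical helpers, and `example_config` is an abridged, assumed stand-in holding only the fields discussed in this notebook (the `deepseek_v3` model type and `fp8` quantization match the error messages in Section 5). Call `load_hub_config`, which uses `huggingface_hub.hf_hub_download` and needs network access, to inspect the real file.

```python
import json


def load_hub_config(repo_id: str) -> dict:
    """Download and parse config.json from the Hub (needs network and huggingface_hub)."""
    from huggingface_hub import hf_hub_download  # imported lazily so the rest runs offline
    path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def pick_auto_class(config: dict) -> str:
    """Heuristic: prefer the repo's auto_map, then fall back to the architecture name."""
    auto_map = config.get("auto_map", {})
    for name in ("AutoModelForCausalLM", "AutoModelForSeq2SeqLM"):
        if name in auto_map:
            return name
    for arch in config.get("architectures", []):
        if arch.endswith("ForCausalLM"):
            return "AutoModelForCausalLM"
    raise ValueError("Could not infer an auto class; check the model card.")


def declared_quantization(config: dict):
    """Return the checkpoint's own quant_method (e.g. 'fp8'), or None if absent."""
    return (config.get("quantization_config") or {}).get("quant_method")


# Abridged, assumed stand-in for the repo's config.json (not the full file).
example_config = {
    "model_type": "deepseek_v3",
    "architectures": ["DeepseekV3ForCausalLM"],
    "auto_map": {"AutoModelForCausalLM": "modeling_deepseek.DeepseekV3ForCausalLM"},
    "quantization_config": {"quant_method": "fp8"},
}

print(example_config["model_type"], pick_auto_class(example_config), declared_quantization(example_config))

# Against the real repository (needs network):
# cfg = load_hub_config("deepseek-ai/DeepSeek-R1-Zero")
# print(cfg["model_type"], pick_auto_class(cfg), declared_quantization(cfg))
```

A non-`None` result from `declared_quantization` means the checkpoint is already quantized, which is relevant to the `fp8` error covered in the troubleshooting section.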

## **3. Load the Model & Tokenizer**

Below are **two** examples:

1. **Full / Half Precision** (float32 or float16), *no quantization.*
2. **4-Bit Quantization** (Requires updated `bitsandbytes`).

### **(A) Full/Half Precision**
If 4-bit is giving trouble or you just want to avoid quantization, load the model in standard float32 or float16:

```python
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Zero",
    trust_remote_code=True,
    torch_dtype=torch.float16,   # or torch.float32
    device_map="auto"            # optional: spreads the model over available GPUs
)
```

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_precision = "float16"  # or "float32"

if model_precision == "float16":
    dtype = torch.float16
else:
    dtype = torch.float32

tokenizer_fp = AutoTokenizer.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Zero",
    trust_remote_code=True
)

# DeepSeek R1-Zero is decoder-only, so use the causal-LM auto class.
model_fp = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Zero",
    trust_remote_code=True,
    torch_dtype=dtype,
    device_map="auto"  # spreads layers over available GPUs, spilling to CPU if needed
)

print("Model loaded in", model_precision, "precision.")
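It is worth estimating memory before running the cell above. The arithmetic below is a rough, weights-only sketch assuming roughly 671B total parameters (the figure DeepSeek reports for the R1/V3 family); it ignores activations, the KV cache, and quantization scales, so real usage is higher. The helper name `weight_memory_gb` is hypothetical.

```python
# Weights-only memory estimate. Assumption: ~671e9 total parameters.
TOTAL_PARAMS = 671e9

# Storage bits per parameter for each loading mode discussed in this notebook.
BITS_PER_PARAM = {
    "float32": 32,
    "float16": 16,
    "8-bit (e.g. FP8)": 8,
    "4-bit": 4,
}


def weight_memory_gb(n_params: float, bits: int) -> float:
    """Bytes for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits / 8 / 1e9


for mode, bits in BITS_PER_PARAM.items():
    print(f"{mode:>17}: ~{weight_memory_gb(TOTAL_PARAMS, bits):,.0f} GB")
```

Even the 4-bit row lands in the hundreds of gigabytes, which is why `device_map="auto"` usually has to spread the model across several GPUs and may offload to CPU.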

### **(B) 4-bit Quantization**
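If you specifically **want** 4-bit quantization, pass a `BitsAndBytesConfig` through `quantization_config` (recent `transformers` releases deprecate passing `load_in_4bit=True` directly):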

```python
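from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Zero",
    trust_remote_code=True,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)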
```
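This requires a `bitsandbytes` release with 4-bit support (0.39.0 or later). If you get `ImportError: Using bitsandbytes 4-bit quantization requires the latest version...`, re-run the installation cell (or `pip install --upgrade bitsandbytes`) and restart the kernel. Note also that the repository's config already declares `fp8` quantization, which may conflict with a 4-bit request; see troubleshooting item 3 if loading fails.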

In [None]:
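# Optional 4-bit loading, gated by the use_4bit flag (off by default).
# Set use_4bit = True to try it.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

use_4bit = False  # Change to True if you want to attempt 4-bit

if use_4bit:
    tokenizer_4b = AutoTokenizer.from_pretrained(
        "deepseek-ai/DeepSeek-R1-Zero",
        trust_remote_code=True
    )

    # Store weights in 4-bit via bitsandbytes; run computation in float16.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    model_4b = AutoModelForCausalLM.from_pretrained(
        "deepseek-ai/DeepSeek-R1-Zero",
        trust_remote_code=True,
        device_map="auto",
        quantization_config=bnb_config,
    )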
    
    print("Model loaded in 4-bit quantization.")
else:
    print("4-bit loading is disabled (use_4bit=False). If you want to try it, set use_4bit=True.")

## **4. Test Inference**

We run a simple generation test on the float16/float32 model loaded in (A). If you loaded the 4-bit version instead, swap in `model_4b` and `tokenizer_4b`.

Feel free to change the prompt to something more relevant.

In [None]:
prompt = "Who was the first person to walk on the Moon?"

# We'll use model_fp if we loaded in float16 or float32.
# If you loaded 4-bit, swap in model_4b / tokenizer_4b below.

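inputs = tokenizer_fp(prompt, return_tensors="pt").to(model_fp.device)
# max_new_tokens counts only generated tokens (max_length would also count the prompt).
outputs = model_fp.generate(**inputs, max_new_tokens=50)
# A decoder-only model returns prompt + continuation; decode only the new tokens.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
answer = tokenizer_fp.decode(new_tokens, skip_special_tokens=True)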

print("Prompt:", prompt)
print("Answer:", answer)
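According to the DeepSeek-R1 report, R1-Zero was trained with a template that puts its reasoning inside `<think>...</think>` and its final answer inside `<answer>...</answer>`. The hypothetical helper below is a sketch that splits decoded text on those tags. It assumes the tags survive `decode(..., skip_special_tokens=True)` and that the model follows this format for your prompt, which it may only do reliably with a similar template; if no `</think>` is found (for example, the output hit the generation length limit), the whole text is returned as the answer.

```python
import re
from typing import Tuple

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Hypothetical helper: return (reasoning, answer) from R1-style output.

    Assumes the <think>...</think> / <answer>...</answer> convention. The opening
    <think> may be absent if a chat template already placed it in the prompt.
    """
    if "</think>" not in text:
        return "", text.strip()  # no reasoning block: treat everything as the answer
    before, _, after = text.partition("</think>")
    reasoning = before.split("<think>", 1)[-1].strip()
    match = ANSWER_RE.search(after)
    answer = match.group(1) if match else after
    return reasoning, answer.strip()


demo = "<think>Apollo 11 landed in July 1969.</think> <answer>Neil Armstrong</answer>"
print(split_reasoning(demo))
```

In the notebook, call `split_reasoning(answer)` on the decoded text from the inference cell.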

## **5. Additional Troubleshooting**

1. **`ImportError: Using bitsandbytes 4-bit quantization requires the latest version of bitsandbytes`**:
   - Make sure you ran `!pip install --upgrade bitsandbytes`.
   - Restart the runtime if needed, then retry.
   - If it still fails, check that `bitsandbytes.__version__` reports 0.39.0 or later.

2. **`You are using a model of type deepseek_v3 to instantiate a model of type t5`**:
   - The checkpoint's config declares `model_type: deepseek_v3`, but it is being loaded into a T5 class.
   - **Solution**: Use `AutoModelForCausalLM.from_pretrained(..., trust_remote_code=True)` so the library loads the repository's own model class.
   - Don't force `T5ForConditionalGeneration.from_pretrained` on this checkpoint; DeepSeek R1-Zero is a decoder-only model, not a T5 variant.

3. **`ValueError: Unknown quantization type, got fp8...`**:
   - The checkpoint's config declares `fp8` quantization (its weights are stored in FP8), and older `transformers` releases don't recognize that quantization method.
   - Upgrade `transformers` first (`pip install --upgrade transformers`) and restart the kernel.
   - Passing your own `torch_dtype` or removing `load_in_4bit`/`load_in_8bit` does not remove the `quantization_config` stored in the checkpoint's config, so those changes alone are unlikely to clear this error.

4. **If you see**: `Do you wish to run the custom code? [y/N]`
   - This is normal for repos with a custom architecture. Type `y` to run the code for this load, or pass `trust_remote_code=True` so you aren't prompted.
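Most of the errors above come down to library versions, so it helps to print them in one place before retrying or asking for help. This is a minimal sketch using the standard library plus an optional `torch` import; `environment_report` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ("torch", "transformers", "accelerate", "bitsandbytes")


def environment_report(packages=PACKAGES) -> dict:
    """Hypothetical helper: installed version per package (None if missing) plus CUDA status."""
    report = {}
    for pkg in packages:
        try:
            report[pkg] = version(pkg)
        except PackageNotFoundError:
            report[pkg] = None  # not installed in this environment
    try:
        import torch  # optional: only used to check for a CUDA device
        report["cuda_available"] = torch.cuda.is_available()
    except (ImportError, OSError):
        report["cuda_available"] = None  # torch missing or failed to load
    return report


for name, value in environment_report().items():
    print(f"{name:>14}: {value}")
```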

## **6. Next Steps**
With the model loaded, you can:

- **Experiment with Prompts**: Try different questions or instructions.
- **Adjust Inference Parameters**: `max_new_tokens`, `num_beams`, and, with `do_sample=True`, sampling controls such as `temperature` and `top_k`.
- **Fine-tune**: If you have a custom dataset, you can attempt further training with the Hugging Face [Trainer API](https://github.com/huggingface/transformers).
- **Evaluate**: Integrate the model into your workflow for summarization, QA, or general text generation.
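One point that often trips people up: `temperature` and `top_k` only take effect when `do_sample=True`; otherwise `generate` decodes greedily, or with beam search when `num_beams > 1`. The hypothetical helper below bundles those settings into keyword arguments for `model.generate`; the preset values are illustrative starting points, not tuned recommendations.

```python
def generation_kwargs(strategy: str = "greedy", max_new_tokens: int = 256, **overrides) -> dict:
    """Hypothetical helper: build keyword arguments for model.generate().

    Sampling knobs (temperature, top_k) only matter when do_sample=True;
    num_beams > 1 without sampling selects beam search.
    """
    presets = {
        "greedy": {"do_sample": False},
        "beam": {"do_sample": False, "num_beams": 4},
        "sample": {"do_sample": True, "temperature": 0.6, "top_k": 50},
    }
    if strategy not in presets:
        raise ValueError(f"unknown strategy {strategy!r}; choose from {sorted(presets)}")
    # Later entries win, so explicit overrides replace preset values.
    return {"max_new_tokens": max_new_tokens, **presets[strategy], **overrides}


print(generation_kwargs("sample", temperature=0.7))
```

For example, `outputs = model_fp.generate(**inputs, **generation_kwargs("sample", temperature=0.7))`.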

If you run into further issues, always check:
- That your library versions are aligned.
- The repository docs on [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero) for any special instructions.
- That you are indeed trusting remote code for custom architectures.


# **Done!**

You now have a notebook that:
1. Installs and upgrades required libraries.
2. Demonstrates how to load DeepSeek R1-Zero with custom code.
3. Explains how to handle quantization errors (`bitsandbytes`).
4. Provides a test inference example.

Use this as a template for your own experiments or expansions!