# Qwen Distillation Lab (System 1 + System 2)

**Colab-ready notebook** to distill **Qwen2.5-7B-Instruct** into two smaller students:
- **System 1** — instruction-following (7B → 0.5B) via black-box KD on DistilQwen_100k
- **System 2** — reasoning / chain-of-thought (1.5B) via SFT-style KD on OmniThought

The notebook uses `distill_app.py` for data preparation and EasyDistill for training. **Run cells in order** (or *Run all*); a **GPU runtime is recommended**. In Colab, the repo must be present under `/content` (the **"Clone repo (if needed)"** cell handles this); locally, open the notebook from the repo root.

**Important:** Run the **"Project root and imports"** cell before any Prepare / Run Distillation / Test / Comparison cells.

## 0️⃣ Runtime setup

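Confirm that a GPU is attached and check the Python version. In Colab, select **Runtime → Change runtime type → GPU** (e.g. T4) before running. If `nvidia-smi` reports `command not found`, no NVIDIA driver is visible to the runtime.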

In [1]:
!nvidia-smi
!python --version

/bin/bash: line 1: nvidia-smi: command not found
Python 3.12.12


## 1️⃣ Install dependencies

Installs the core libraries plus **EasyDistill from source**. EasyDistill is cloned so that the `easydistill` CLI and its templates are available.

### TPU setup (optional, for faster training)

If you chose a **TPU** runtime (Runtime → Change runtime type → TPU), run this cell once. It checks that PyTorch/XLA works and reinstalls it only if the import fails, so that teacher labeling and student training can run on TPU. **Note:** EasyDistill's vLLM-based teacher inference is GPU-only; on TPU the notebook uses its own path (HF `generate` for the teacher plus in-notebook TPU training).

In [2]:
import os, subprocess, sys, importlib.util
from pathlib import Path

# Detect TPU: Colab TPU runtimes pre-install torch_xla; also check legacy indicators
_has_tpu_env = bool(os.environ.get("COLAB_TPU_ADDR"))
_has_pjrt_dev = Path("/dev/accel0").exists()
_has_tpu_name = bool(os.environ.get("TPU_NAME"))
_has_xla_pkg = importlib.util.find_spec("torch_xla") is not None

IS_TPU_RUNTIME = _has_tpu_env or _has_pjrt_dev or _has_tpu_name or _has_xla_pkg

print(f"TPU detection: COLAB_TPU_ADDR={_has_tpu_env}, /dev/accel0={_has_pjrt_dev}, "
      f"TPU_NAME={_has_tpu_name}, torch_xla_installed={_has_xla_pkg}")

if IS_TPU_RUNTIME:
    print("TPU runtime detected!")
    # Colab ships matched torch + torch_xla — do NOT reinstall torch.
    # Only reinstall torch_xla if the import is broken (ABI mismatch).
    try:
        import torch_xla.core.xla_model as xm
        dev = xm.xla_device()
        print(f"torch_xla works. XLA device: {dev}")
    except Exception as e:
        print(f"torch_xla import failed: {e}")
        print("Reinstalling matched torch + torch_xla from libtpu index...")
        subprocess.run([sys.executable, "-m", "pip", "install",
                        "torch", "torch_xla[tpu]",
                        "-f", "https://storage.googleapis.com/libtpu-releases/index.html",
                        "--force-reinstall", "--no-deps"], check=False)
        # Also reinstall libtpu
        subprocess.run([sys.executable, "-m", "pip", "install", "libtpu",
                        "-f", "https://storage.googleapis.com/libtpu-releases/index.html"], check=False)
        try:
            import importlib
            import torch_xla
            importlib.reload(torch_xla)
            import torch_xla.core.xla_model as xm
            dev = xm.xla_device()
            print(f"After reinstall — torch_xla works. XLA device: {dev}")
        except Exception as e2:
            print(f"Still broken after reinstall: {e2}")
            print("You may need to restart the runtime (Runtime → Restart runtime) and re-run.")
else:
    print("No TPU detected. Using GPU path.")

TPU detection: COLAB_TPU_ADDR=False, /dev/accel0=False, TPU_NAME=False, torch_xla_installed=True
TPU runtime detected!


  dev = xm.xla_device()


torch_xla works. XLA device: xla:0
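
In the output above, every hardware indicator is `False`, and detection fires only because `torch_xla` is importable. Since `torch_xla` can be installed on machines without a TPU, package presence alone may produce a false positive. The sketch below treats the package as necessary but not sufficient. It assumes that a real TPU runtime exposes at least one of the environment or device indicators already checked in this cell; the helper name `looks_like_tpu` and its injectable parameters are hypothetical.

```python
import importlib.util
import os
from pathlib import Path


def looks_like_tpu(env=None, accel_device="/dev/accel0", xla_installed=None):
    """Return True only if torch_xla is importable AND a hardware hint exists.

    Hypothetical helper: `env`, `accel_device`, and `xla_installed` are
    injectable so the logic can be exercised without TPU hardware.
    """
    env = os.environ if env is None else env
    if xla_installed is None:
        xla_installed = importlib.util.find_spec("torch_xla") is not None
    hardware_hint = (
        bool(env.get("COLAB_TPU_ADDR"))
        or bool(env.get("TPU_NAME"))
        or Path(accel_device).exists()
    )
    return xla_installed and hardware_hint


# Same indicator values as the run above: package present, no hardware hints.
print(looks_like_tpu(env={}, accel_device="/nonexistent/accel0", xla_installed=True))
```

With the values from this run the check returns `False`, which would route the notebook to the non-TPU path.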


In [3]:
import importlib.util as _ilu

# On TPU runtimes, torch + torch_xla are pre-matched — do NOT upgrade torch
if _ilu.find_spec("torch_xla") is not None:
    print("TPU runtime: skipping torch install to preserve torch/torch_xla ABI match")
    %pip install -q "transformers>=4.36.0" "datasets>=2.16.0" "accelerate>=0.25.0" "sentencepiece>=0.1.99"
else:
    %pip install -q "torch>=2.1.0" "transformers>=4.36.0" "datasets>=2.16.0" "accelerate>=0.25.0" "sentencepiece>=0.1.99"

# Quote version specifiers: an unquoted ">=" is treated by the shell as an output redirect
%pip install -q "bitsandbytes>=0.43.0" tqdm nltk rouge-score jsonlines "trl>=0.7.0"


TPU runtime: skipping torch install to preserve torch/torch_xla ABI match


In [4]:
import os
import subprocess
import sys
from pathlib import Path

if Path("/content").exists():
    EASYDIR = Path("/content/easydistill")
else:
    EASYDIR = Path.cwd() / "easydistill"

if not EASYDIR.exists():
    subprocess.run(["git", "clone", "https://github.com/modelscope/easydistill.git", str(EASYDIR)], check=True)
r = subprocess.run([sys.executable, "-m", "pip", "install", "-e", str(EASYDIR)], capture_output=True, text=True)
if r.returncode != 0:
    print("Standard install failed:")
    print(r.stderr or r.stdout or "(no output)")
    print("Trying fallback: install with --no-deps, then requirements...")
    r2 = subprocess.run([sys.executable, "-m", "pip", "install", "-e", str(EASYDIR), "--no-deps"], capture_output=True, text=True)
    if r2.returncode != 0:
        print("Fallback --no-deps also failed:", r2.stderr or r2.stdout)
        raise SystemExit(r.returncode)
    req = EASYDIR / "requirements.txt"
    if req.exists():
        subprocess.run([sys.executable, "-m", "pip", "install", "-r", str(req)], check=False)
    subprocess.run([sys.executable, "-m", "pip", "install", "vllm"], check=False)
print("EasyDistill installed from", EASYDIR)

EasyDistill installed from /content/easydistill


### Clone repo (if needed)

If the **Project root and imports** cell below raises `FileNotFoundError` (e.g. the notebook was opened from Drive or uploaded), run this cell once, then re-run that cell. Set `GITHUB_REPO` to your fork if needed.

In [5]:
# Point this at your fork if you are not using the original repo
GITHUB_REPO = "https://github.com/zacharias1219/distilled-model-research.git"

import os
import subprocess
from pathlib import Path

REPO_DIR = Path("/content/distilled-model-research")
if Path("/content").exists() and not (REPO_DIR / "distill_app.py").exists():
    subprocess.run(["git", "clone", GITHUB_REPO, str(REPO_DIR)], check=True)
    os.chdir(REPO_DIR)
    print("Cloned. Now run the 'Project root and imports' cell below.")
else:
    print("Not in Colab or repo already present. If you still see FileNotFoundError, run locally from the repo root.")

Cloned. Now run the 'Project root and imports' cell below.


### HF token (optional)

**Colab (open from GitHub):** Add `HF_TOKEN` in Colab Secrets (key icon in the left sidebar) so Hugging Face uses it for auth and higher rate limits. Run this cell once.

**Local:** If you have a `.env` in the repo root with `HF_TOKEN=...`, it is loaded when you import `distill_app` below; no need to do anything here.

In [6]:
import os
try:
    from google.colab import userdata
    os.environ["HF_TOKEN"] = userdata.get("HF_TOKEN")
    print("HF_TOKEN set from Colab secrets.")
except Exception:
    pass  # Local or no secret: .env will be used when distill_app is imported

### Project root and imports

Ensure the working directory is the repo root (where `distill_app.py` lives). In Colab, the repo is usually at `/content/distilled-model-research`.

In [7]:
import sys
from pathlib import Path

def _find_project_root():
    if Path("/content").exists():
        for d in Path("/content").iterdir():
            if d.is_dir() and (d / "distill_app.py").exists():
                return d
    for p in [Path.cwd()] + list(Path.cwd().parents):
        if (p / "distill_app.py").exists():
            return p
    return Path.cwd()
    

ROOT = _find_project_root()
if ROOT != Path.cwd():
    import os
    os.chdir(ROOT)
    print("Working directory:", ROOT)
sys.path.insert(0, str(ROOT))

if not (ROOT / "distill_app.py").exists():
    raise FileNotFoundError(
        "distill_app.py not found. Colab (Drive/upload): run the 'Clone repo (if needed)' cell above, then re-run this cell. "
        "Local: run this notebook from the repo root (the folder that contains distill_app.py)."
    )

from distill_app import (
    load_teacher,
    prepare_system1_dataset,
    prepare_system2_dataset,
    distill_system1,
    distill_system2,
    compare_models,
    load_student,
    infer_student,
    format_prompt,
    find_checkpoint,
    evaluate_student,
)
print("distill_app imported from", ROOT)


distill_app imported from /content/distilled-model-research



---
## 2️⃣ System 1: Instruction-following distillation (7B → 0.5B)

Load a subset of **DistilQwen_100k**, optionally re-label with the teacher, then run black-box KD.

### Config

Increase `DATASET_SLICE_SYS1` (e.g. `train[:5000]`) or `NUM_EPOCHS_SYS1` for better quality.

In [9]:
TEACHER_MODEL_SYS1 = "Qwen/Qwen2.5-7B-Instruct"
STUDENT_MODEL_SYS1 = "Qwen/Qwen2.5-0.5B-Instruct"
DATASET_SLICE_SYS1 = "train[:1000]"
NUM_EPOCHS_SYS1 = 1

### Prepare Data & Label (Optional)

Loads DistilQwen_100k, maps to `{instruction, input, output}`. Set `RELABEL_WITH_TEACHER = True` to re-generate outputs with the teacher (slower, more VRAM).
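
The target schema can be illustrated with a small mapping function. This is a sketch only: the fallback source keys (`prompt`, `response`) and the helper `to_kd_record` are assumptions for illustration, not the actual logic inside `prepare_system1_dataset`.

```python
import json


def to_kd_record(raw):
    """Map one raw example to the {instruction, input, output} schema.

    The fallback keys ("prompt", "response") are hypothetical; adjust them
    to whatever columns the dataset actually exposes.
    """
    instruction = raw.get("instruction") or raw.get("prompt") or ""
    output = raw.get("output") or raw.get("response") or ""
    return {
        "instruction": instruction.strip(),
        "input": (raw.get("input") or "").strip(),
        "output": output.strip(),
    }


rows = [{"prompt": " What is KD? ", "response": "Knowledge distillation."}]
records = [to_kd_record(r) for r in rows]
print(json.dumps(records, indent=2))
```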

**Note:** HF Hub may show warnings about `HF_TOKEN` / unauthenticated requests. You can ignore them; downloads still work. For higher rate limits, add `HF_TOKEN` in Colab secrets (key icon in the sidebar) and run `from huggingface_hub import login; login()`.

In [10]:
RELABEL_WITH_TEACHER = False  # Set True to re-label with teacher (requires loading teacher first)

teacher_sys1 = None
tokenizer_sys1 = None
if RELABEL_WITH_TEACHER:
    teacher_sys1, tokenizer_sys1 = load_teacher(TEACHER_MODEL_SYS1)

prepare_system1_dataset(
    slice_str=DATASET_SLICE_SYS1,
    teacher_model=teacher_sys1,
    teacher_tokenizer=tokenizer_sys1,
    relabel_with_teacher=RELABEL_WITH_TEACHER,
    out_instructions="data/train_instructions.json",
    out_labeled="data/train_labeled.json",
)

Error while fetching `HF_TOKEN` secret value from your vault: 'Requesting secret HF_TOKEN timed out. Secrets can only be fetched when running from the Colab UI.'.
You are not authenticated with the Hugging Face Hub in this notebook.
If the error persists, please let us know by opening an issue on GitHub (https://github.com/huggingface/huggingface_hub/issues/new).


Saved System 1 instructions to data/train_instructions.json
Saved System 1 labeled data to data/train_labeled.json


### Run Distillation

Calls EasyDistill (black-box KD). The checkpoint is written to `./distilled-qwen2.5-0.5b`, or to whatever `out_dir` is set to in `config_sys1`.
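
The distillation cells look for the EasyDistill chat template in two places: the Colab clone first, then a local clone. A small helper can express the same search order once. `find_kd_template` is a hypothetical name; the relative path is the one used in this notebook.

```python
from pathlib import Path

TEMPLATE_REL = Path("configs") / "chat_template" / "chat_template_kd.jinja"


def find_kd_template(roots):
    """Return the first existing template path under the given roots, else None."""
    for root in roots:
        candidate = Path(root) / TEMPLATE_REL
        if candidate.exists():
            return str(candidate)
    return None


# Search order mirrors the notebook: Colab clone first, then a local clone.
template = find_kd_template([Path("/content/easydistill"), Path.cwd() / "easydistill"])
print(template)
```

A `None` result means neither clone contains the template, in which case `template_path` can simply be left unset.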

In [12]:
config_sys1 = {
    "teacher_model": TEACHER_MODEL_SYS1,
    "student_model": STUDENT_MODEL_SYS1,
    "labeled_path": "data/train_labeled.json",
    "num_epochs": NUM_EPOCHS_SYS1,
    "out_dir": "./distilled-qwen2.5-0.5b",
    "config_path": "configs/kd_black_box_qwen_0_5b.json",
    "template_path": None,
}

# If EasyDistill was cloned, point to its template (configs/chat_template/chat_template_kd.jinja)
if Path("/content").exists() and Path("/content/easydistill/configs/chat_template/chat_template_kd.jinja").exists():
    config_sys1["template_path"] = "/content/easydistill/configs/chat_template/chat_template_kd.jinja"
elif (Path.cwd() / "easydistill" / "configs" / "chat_template" / "chat_template_kd.jinja").exists():
    config_sys1["template_path"] = str(Path.cwd() / "easydistill" / "configs" / "chat_template" / "chat_template_kd.jinja")

path_sys1 = distill_system1(config_sys1)
if path_sys1:
    print("Final checkpoint path:", path_sys1)

TPU detected: using TPU training path (no vllm).


TypeError: SFTTrainer.__init__() got an unexpected keyword argument 'tokenizer'
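
The `TypeError` above is what recent TRL releases raise when `SFTTrainer` is given `tokenizer=`; those releases take the tokenizer as `processing_class` instead. Because `distill_app.py` is not shown here, the following is only a sketch of a version-tolerant call, assuming the trainer is constructed with keyword arguments. The helper name `tokenizer_kwarg` is hypothetical, and the two stand-in classes exist only to demonstrate the signature check without importing TRL.

```python
import inspect


def tokenizer_kwarg(trainer_cls, tokenizer):
    """Pick the keyword that trainer_cls.__init__ accepts for the tokenizer."""
    params = inspect.signature(trainer_cls.__init__).parameters
    if "processing_class" in params:
        return {"processing_class": tokenizer}
    if "tokenizer" in params:
        return {"tokenizer": tokenizer}
    raise TypeError(f"{trainer_cls.__name__} accepts neither keyword")


# Stand-ins for the old and new constructor signatures (not the real TRL classes).
class OldStyleTrainer:
    def __init__(self, model=None, tokenizer=None):
        self.tokenizer = tokenizer


class NewStyleTrainer:
    def __init__(self, model=None, processing_class=None):
        self.processing_class = processing_class


print(tokenizer_kwarg(OldStyleTrainer, "tok"))
print(tokenizer_kwarg(NewStyleTrainer, "tok"))
```

With TRL installed, the call site would then look like `SFTTrainer(model=model, **tokenizer_kwarg(SFTTrainer, tokenizer), ...)`. Pinning TRL to a release that still accepts `tokenizer=` is the alternative.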

In [None]:
!cat debug-5b3ceb.log

{"sessionId": "5b3ceb", "location": "distill_app.py:_is_tpu", "message": "TPU env check", "data": {"COLAB_TPU_ADDR": null, "pjrt_dev": false, "TPU_NAME": null}, "hypothesisId": "H1", "timestamp": 1772367744797}
{"sessionId": "5b3ceb", "location": "distill_app.py:_is_tpu", "message": "torch_xla NOT installed", "data": {"pjrt_dev": false}, "hypothesisId": "H3", "timestamp": 1772367744801}
{"sessionId": "5b3ceb", "location": "distill_app.py:distill_system1", "message": "path selection", "data": {"tpu_detected": false, "cuda_available": false}, "hypothesisId": "H1", "timestamp": 1772367744801}


### Test System 1 student

Load the distilled model and run a few prompts.

In [12]:
try:
    _base = ROOT
except NameError:
    _base = Path.cwd()

# Use find_checkpoint to locate the actual model files
student_path_sys1 = find_checkpoint(str(_base / "distilled-qwen2.5-0.5b"))
if not student_path_sys1 and "path_sys1" in dir() and path_sys1:
    student_path_sys1 = find_checkpoint(path_sys1)

if student_path_sys1:
    print("Loading checkpoint:", student_path_sys1)
    student_sys1, tok_sys1 = load_student(student_path_sys1)
    for p in [
        "Explain what a large language model is to a high school student.",
        "Write a Python function to check if a number is prime.",
        "Give me three use cases of knowledge distillation in deep learning.",
    ]:
        print("=" * 72)
        print("Prompt:", p)
        print("Student (System 1):", infer_student(student_sys1, tok_sys1, p, mode="system1", max_new_tokens=256))
        print()
else:
    print("Checkpoint not found at distilled-qwen2.5-0.5b/")
    print("Run System 1 distillation first.")
    # Diagnostic info
    _p = _base / "distilled-qwen2.5-0.5b"
    if _p.exists():
        print(f"Directory exists but contains: {[f.name for f in _p.iterdir()][:20]}")


Checkpoint not found at distilled-qwen2.5-0.5b/
Run System 1 distillation first.


### Evaluate System 1 student

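Compute perplexity, BLEU, and ROUGE-L on the last 50 items of the labeled data file. Unless `distill_system1` excludes these items, they were also used for training, so treat the scores as an optimistic sanity check rather than a true held-out evaluation.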
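
For the evaluation items to be genuinely held out, they need to be set aside before training. A minimal sketch, assuming the labeled data is a JSON list and that training would then be pointed at the train split only (`split_holdout` is a hypothetical helper, not part of `distill_app`):

```python
def split_holdout(records, n_eval=50):
    """Split a list into (train, eval), reserving the last n_eval items for eval."""
    if n_eval <= 0 or n_eval >= len(records):
        raise ValueError("n_eval must be between 1 and len(records) - 1")
    return records[:-n_eval], records[-n_eval:]


data = [{"instruction": f"q{i}", "output": f"a{i}"} for i in range(1000)]
train, held_out = split_holdout(data, n_eval=50)
print(len(train), len(held_out))
```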

In [13]:
if student_path_sys1 and 'student_sys1' in dir():
    from distill_app import read_json, evaluate_student
    # Use last 50 items from labeled data as eval set
    _eval_data = read_json("data/train_labeled.json")[-50:]
    print(f"Evaluating System 1 on {len(_eval_data)} held-out samples...")
    eval_results_sys1 = evaluate_student(
        student_sys1, tok_sys1, _eval_data, mode="system1", max_new_tokens=256, max_eval=50
    )
else:
    print("System 1 student not loaded. Run distillation and test cells first.")


System 1 student not loaded. Run distillation and test cells first.


---
## 3️⃣ System 2: Reasoning / CoT distillation (1.5B)

Train a CoT-capable student on OmniThought so it shows step-by-step reasoning.

### Config

In [14]:
STUDENT_MODEL_SYS2 = "Qwen/Qwen2.5-1.5B-Instruct"
DATASET_SLICE_SYS2 = "train[:2000]"
RV_MIN = 0.6
CD_MIN = 0.6
NUM_EPOCHS_SYS2 = 1

### Prepare CoT Data

Load OmniThought, filter by RV/CD if present, map to `{instruction, output=cot}` and save to `data/omnithought_cot.json`.
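
The "filter if present" behavior can be sketched as follows. The score field names `rv` and `cd` are placeholders, and `keep_sample` and `to_cot_record` are hypothetical helpers rather than the code in `prepare_system2_dataset`. The `question` and `reasoning` keys match the sample keys printed by the run in this section.

```python
def keep_sample(sample, rv_min=0.6, cd_min=0.6):
    """Keep a sample unless a score is present and falls below its threshold."""
    rv = sample.get("rv")
    cd = sample.get("cd")
    if rv is not None and rv < rv_min:
        return False
    if cd is not None and cd < cd_min:
        return False
    return True


def to_cot_record(sample):
    """Map a raw sample to {instruction, output}, with the CoT text as output."""
    return {"instruction": sample["question"], "output": sample["reasoning"]}


samples = [
    {"question": "2+2?", "reasoning": "2+2=4.", "rv": 0.9, "cd": 0.7},
    {"question": "3+3?", "reasoning": "3+3=6.", "rv": 0.2, "cd": 0.7},
    {"question": "4+4?", "reasoning": "4+4=8."},  # no scores: kept
]
cot = [to_cot_record(s) for s in samples if keep_sample(s)]
print([r["instruction"] for r in cot])
```

Samples without scores pass through unchanged, so a dataset slice that carries no RV/CD fields is not filtered at all.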

In [15]:
prepare_system2_dataset(
    slice_str=DATASET_SLICE_SYS2,
    rv_min=RV_MIN,
    cd_min=CD_MIN,
    out_cot="data/omnithought_cot.json",
)

Preparing System 2 dataset: collecting 2000 samples via streaming...


Streaming OmniThought:  39%|███▉      | 779/2000 [00:09<00:08, 141.06it/s]

First OmniThought sample keys: ['question', 'reasoning']


Streaming OmniThought: 100%|█████████▉| 1999/2000 [00:09<00:00, 203.96it/s]


Collected 2000 CoT samples via streaming.
Saved System 2 CoT data to data/omnithought_cot.json


### Run CoT Distillation

Calls EasyDistill (kd_black_box_train_only). Checkpoint: `./distilled-qwen2.5-1.5b-cot`.

In [16]:
config_sys2 = {
    "student_model": STUDENT_MODEL_SYS2,
    "cot_path": "data/omnithought_cot.json",
    "num_epochs": NUM_EPOCHS_SYS2,
    "out_dir": "./distilled-qwen2.5-1.5b-cot",
    "config_path": "configs/kd_cot_qwen_1_5b.json",
}
# Use the EasyDistill template from the clone (same as System 1)
_easydistill_root = Path("/content/easydistill") if Path("/content").exists() else Path.cwd() / "easydistill"
_tpl = _easydistill_root / "configs" / "chat_template" / "chat_template_kd.jinja"
if _tpl.exists():
    config_sys2["template_path"] = str(_tpl)

path_sys2 = distill_system2(config_sys2)
if path_sys2:
    print("Final checkpoint path:", path_sys2)

Wrote System 2 KD config to configs/kd_cot_qwen_1_5b.json
Running: /usr/bin/python3 -m easydistill.cli --config /content/distilled-model-research/configs/kd_cot_qwen_1_5b.json
[stderr] 2026-02-28 12:58:08,470 - INFO - Running command: accelerate launch --config_file /content/easydistill/configs/accelerate_config/muti_gpu.yaml /content/easydistill/easydistill/kd/train.py --config /content/distilled-model-research/configs/kd_cot_qwen_1_5b.json
[stderr] 2026-02-28 12:58:10,653 - INFO - Traceback (most recent call last):
[stderr] 2026-02-28 12:58:10,653 - ERROR - Detected error in output: Traceback (most recent call last):
[stderr] 2026-02-28 12:58:10,653 - INFO -   File "/usr/local/bin/accelerate", line 10, in <module>
[stderr] 2026-02-28 12:58:10,653 - INFO -     sys.exit(main())
[stderr] 2026-02-28 12:58:10,653 - INFO -              ^^^^^^
[stderr] 2026-02-28 12:58:10,653 - INFO -   File "/usr/local/lib/python3.12/dist-packages/accelerate/commands/accelerate_cli.py", line 50, in main
[s

### Test System 2 (CoT) student

Prompts include CoT instruction; responses should show step-by-step reasoning.

In [None]:
try:
    _base2 = ROOT
except NameError:
    _base2 = Path.cwd()

# Use find_checkpoint to locate the actual model files
student_path_sys2 = find_checkpoint(str(_base2 / "distilled-qwen2.5-1.5b-cot"))
if not student_path_sys2 and "path_sys2" in dir() and path_sys2:
    student_path_sys2 = find_checkpoint(path_sys2)

if student_path_sys2:
    print("Loading checkpoint:", student_path_sys2)
    student_sys2, tok_sys2 = load_student(student_path_sys2)
    for p in [
        "A train travels 120 km in 2 hours. If it continues at the same speed, how far will it travel in 5 hours?",
        "You flip a fair coin 3 times. What is the probability of getting exactly two heads?",
        "Explain the difference between overfitting and underfitting with an example.",
    ]:
        print("=" * 72)
        print("Prompt:", p)
        print("Student (System 2 CoT):", infer_student(student_sys2, tok_sys2, p, mode="system2", max_new_tokens=512))
        print()
else:
    print("Checkpoint not found at distilled-qwen2.5-1.5b-cot/")
    print("Run System 2 distillation first.")
    _p = _base2 / "distilled-qwen2.5-1.5b-cot"
    if _p.exists():
        print(f"Directory exists but contains: {[f.name for f in _p.iterdir()][:20]}")


### Evaluate System 2 (CoT) student

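Compute perplexity, BLEU, and ROUGE-L on the last 50 items of the CoT data file. Unless `distill_system2` excludes these items, they were also used for training, so treat the scores as an optimistic sanity check rather than a true held-out evaluation.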
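
Of the three metrics, perplexity has the simplest definition: the exponential of the mean negative log-likelihood per token. The sketch below computes it from per-token log-probabilities. How `evaluate_student` obtains those values is not shown in this notebook, so the input format here is an assumption.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-mean(log p)) over natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every token has perplexity 4.
print(round(perplexity([math.log(0.25)] * 10), 6))
```

Lower is better: a perplexity of 4 means the model is, on average, as uncertain as a uniform choice among four tokens.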

In [None]:
if student_path_sys2 and 'student_sys2' in dir():
    from distill_app import read_json, evaluate_student
    # Use last 50 items from CoT data as eval set
    _eval_data_cot = read_json("data/omnithought_cot.json")[-50:]
    print(f"Evaluating System 2 on {len(_eval_data_cot)} held-out samples...")
    eval_results_sys2 = evaluate_student(
        student_sys2, tok_sys2, _eval_data_cot, mode="system2", max_new_tokens=512, max_eval=50
    )
else:
    print("System 2 student not loaded. Run distillation and test cells first.")


---
## 4️⃣ Teacher vs student comparison

Side-by-side: **Prompt → Teacher | System 1 | System 2**. Missing checkpoints are skipped with a clear message.
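
Whether a directory looks like a loadable checkpoint can be checked before any model is loaded. The sketch below assumes the usual Hugging Face layout (a `config.json` plus at least one `.safetensors` or `.bin` weight file). `is_loadable_checkpoint` is a hypothetical helper and not necessarily how `find_checkpoint` or `compare_models` decide.

```python
from pathlib import Path


def is_loadable_checkpoint(path):
    """Heuristic: directory with config.json and at least one weight file."""
    p = Path(path)
    if not p.is_dir() or not (p / "config.json").exists():
        return False
    return any(p.glob("*.safetensors")) or any(p.glob("*.bin"))


for name in ["distilled-qwen2.5-0.5b", "distilled-qwen2.5-1.5b-cot"]:
    status = "ok" if is_loadable_checkpoint(name) else "missing or incomplete"
    print(f"{name}: {status}")
```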

In [17]:
COMPARE_PROMPTS = [
    "Explain what overfitting means.",
    "What is the time complexity of binary search?",
    "A train travels 120 km in 2 hours. What is its average speed?",
    "Explain the concept of knowledge distillation and why it is useful.",
]

try:
    _base = ROOT
except NameError:
    _base = Path.cwd()

_s1 = next((p for p in [
    path_sys1 if "path_sys1" in dir() and path_sys1 else None,
    str(_base / "distilled-qwen2.5-0.5b"),
] if p and Path(p).exists()), str(_base / "distilled-qwen2.5-0.5b"))

_s2 = next((p for p in [
    path_sys2 if "path_sys2" in dir() and path_sys2 else None,
    str(_base / "distilled-qwen2.5-1.5b-cot"),
] if p and Path(p).exists()), str(_base / "distilled-qwen2.5-1.5b-cot"))

compare_models(
    COMPARE_PROMPTS,
    teacher_path="Qwen/Qwen2.5-7B-Instruct",
    system1_path=_s1,
    system2_path=_s2,
)

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
None of the available devices `available_devices = None` are supported by the bitsandbytes version you have installed: `bnb_supported_devices = {'npu', 'hpu', 'xpu', 'cuda', '"cpu" (needs an Intel CPU and intel_extension_for_pytorch installed and compatible with the PyTorch version)', 'mps'}`. Please check the docs to see if the backend you intend to use is available and how to install it: https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend


Model not found at Qwen/Qwen2.5-7B-Instruct (None of the available devices `available_devices = None` are supported by the bitsandbytes version you have installed: `bnb_supported_devices = {'npu', 'hpu', 'xpu', 'cuda', '"cpu" (needs an Intel CPU and intel_extension_for_pytorch installed and compatible with the PyTorch version)', 'mps'}`. Please check the docs to see if the backend you intend to use is available and how to install it: https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend)
Model not found at /content/distilled-model-research/distilled-qwen2.5-0.5b
Model not found at /content/distilled-model-research/distilled-qwen2.5-1.5b-cot

[Prompt 1] Explain what overfitting means.
Teacher: (skipped)
System 1 student: (not loaded)
System 2 student: (not loaded)


[Prompt 2] What is the time complexity of binary search?
Teacher: (skipped)
System 1 student: (not loaded)
System 2 student: (not loaded)


[Prompt 3] A train travels 120 km in 2 hours. What is its aver
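
The teacher load above fails because bitsandbytes found no supported device on this runtime, and the warning notes that `load_in_4bit` is deprecated in favor of passing a `BitsAndBytesConfig`. One way to handle both is to request 4-bit quantization only when CUDA is available and otherwise fall back to an unquantized load. The sketch assumes the teacher is loaded with `AutoModelForCausalLM.from_pretrained`; the helper `teacher_load_kwargs` is hypothetical, and `load_teacher` in `distill_app.py` may be structured differently.

```python
def teacher_load_kwargs(cuda_available):
    """Build from_pretrained kwargs; quantize to 4-bit only on CUDA."""
    if not cuda_available:
        # No bitsandbytes backend here: load unquantized (needs more memory).
        return {}
    # Imported lazily so the non-CUDA path does not need these packages.
    import torch
    from transformers import BitsAndBytesConfig

    return {
        "quantization_config": BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
        ),
        "device_map": "auto",
    }


print(teacher_load_kwargs(cuda_available=False))
```

At the call site this would be used as `AutoModelForCausalLM.from_pretrained(name, **teacher_load_kwargs(torch.cuda.is_available()))`. An unquantized 7B teacher may not fit in memory on small runtimes, so the fallback trades a crash for a possible out-of-memory error.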

### Optional: Teacher vs System 2 CoT (side-by-side)

Compare the teacher and the System 2 student on reasoning prompts with CoT-style prompting. The cell loads both models, and is skipped if the System 2 checkpoint directory does not exist.

In [18]:
COT_COMPARE = [
    "A bag has 3 red balls and 2 blue balls. If you draw two without replacement, what is the probability both are red?",
    "What is the derivative of x^3 + 2x^2 - 5x + 7? Explain the steps.",
]
if Path("./distilled-qwen2.5-1.5b-cot").exists():
    try:
        _t, _tt = load_teacher("Qwen/Qwen2.5-7B-Instruct")
        _s2, _ts2 = load_student("./distilled-qwen2.5-1.5b-cot")
        for p in COT_COMPARE:
            print("#" * 72)
            print("Prompt:", p)
            print("\n[Teacher CoT]", infer_student(_t, _tt, p, mode="system2", max_new_tokens=512)[:1000])
            print("\n[Student CoT]", infer_student(_s2, _ts2, p, mode="system2", max_new_tokens=512)[:1000])
            print()
    except Exception as e:
        print("Could not load models:", e)
else:
    print("Run System 2 distillation first.")

Run System 2 distillation first.


---
## 5️⃣ Evaluation summary

Side-by-side metrics for both distilled students.

In [None]:
# Collect results from both evaluations
_s1 = eval_results_sys1 if 'eval_results_sys1' in dir() else {}
_s2 = eval_results_sys2 if 'eval_results_sys2' in dir() else {}

print("\n" + "=" * 60)
print("DISTILLATION EVALUATION SUMMARY")
print("=" * 60)
print(f"{'Metric':<20} {'System 1 (0.5B)':>18} {'System 2 (1.5B CoT)':>20}")
print("-" * 60)
for metric in ['perplexity', 'bleu', 'rouge_l']:
    v1 = _s1.get(metric, 'N/A')
    v2 = _s2.get(metric, 'N/A')
    print(f"{metric:<20} {str(v1):>18} {str(v2):>20}")
n1 = _s1.get('num_evaluated', 0)
n2 = _s2.get('num_evaluated', 0)
print(f"{'num_evaluated':<20} {str(n1):>18} {str(n2):>20}")
print("=" * 60)


---
## 6️⃣ Scaling up

Once a small run works:
- **Data:** Increase slices (e.g. `train[:10000]` System 1, `train[:5000]` System 2).
- **Epochs:** Set `NUM_EPOCHS_SYS1` / `NUM_EPOCHS_SYS2` to 2–3.
- **Batch size:** Increase in generated configs if VRAM allows.
- **System 2 student:** Use `Qwen/Qwen2.5-0.5B-Instruct` if VRAM is tight.
- **Relabeling:** Set `RELABEL_WITH_TEACHER = True` for teacher-generated labels (slower, often better).

### Quick single-prompt inference

Uncomment and run after you have a checkpoint.

In [19]:
# model, tokenizer = load_student("./distilled-qwen2.5-0.5b")
# print(infer_student(model, tokenizer, "Explain what overfitting means.", mode="system1"))