# OpenR1 Qwen2-0.5B-math-sft

- gpu: T4*2
- model: Qwen/Qwen2-0.5B
- data: stpete2/openr1-math-part
- method: sft
- output: Qwen2-0.5B-math-sft

## Open-R1
Open-R1 is an open initiative to replicate and extend the techniques behind DeepSeek-R1, a state-of-the-art reasoning model, in a fully transparent and collaborative way:

https://github.com/huggingface/open-r1



After selecting a model, dataset, and training method, I ran the training command from the command line and training completed successfully in the Open-R1 environment.

Considering the limitations of the notebook environment, I kept the model and dataset as small as possible and used the following techniques:

1. LoRA (Low-Rank Adaptation)
2. Gradient checkpointing
3. Small per-device batches with gradient accumulation
4. BF16 mixed precision
5. Sequence length limit
6. Data packing
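
As a rough illustration of why LoRA keeps the trainable footprint small, the sketch below counts the parameters one adapter adds to a single weight matrix. The rank (8) and alpha (16) match the values passed to the training command in this notebook; the 896 x 896 matrix shape is an assumption chosen purely for illustration, and the `alpha / r` scaling is the usual LoRA convention rather than something verified against the Open-R1 code.

```python
def lora_added_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def dense_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight matrix that the adapter wraps."""
    return d_in * d_out


# Values taken from the training command in this notebook.
lora_r = 8
lora_alpha = 16

# Assumption: a square 896 x 896 projection, used only as an example shape.
d_in = d_out = 896

added = lora_added_params(d_in, d_out, lora_r)
frozen = dense_params(d_in, d_out)
scaling = lora_alpha / lora_r  # conventional LoRA scaling applied to the B @ A update

print(f"adapter params: {added}")
print(f"frozen params:  {frozen}")
print(f"trainable fraction for this matrix: {added / frozen:.4f}")
print(f"update scaling (alpha / r): {scaling}")
```

For this example shape the adapter trains under 2% of the weights in the matrix it wraps. The real fraction depends on which modules are adapted and on their actual dimensions.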

This minimal configuration is far from sufficient for effective training, but it lets the whole training pipeline be validated quickly on limited resources, which makes it a useful starting point before scaling up to larger experiments.
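
To make the batching settings concrete, here is a minimal sketch of the effective batch size implied by the values used later in this notebook (`per_device_train_batch_size 2`, `gradient_accumulation_steps 8`, `num_processes: 2`, `max_seq_length 1024`). The token count is an upper bound; it assumes packing fills every sequence to the maximum length, which may not hold exactly in practice.

```python
# Values copied from the accelerate config and launch command in this notebook.
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_processes = 2          # one process per T4 GPU
max_seq_length = 1024

# Sequences that contribute to one optimizer update across all GPUs.
sequences_per_step = (
    per_device_train_batch_size * gradient_accumulation_steps * num_processes
)

# Upper bound on tokens per optimizer update (assumes fully packed sequences).
max_tokens_per_step = sequences_per_step * max_seq_length

print(f"sequences per optimizer step: {sequences_per_step}")
print(f"max tokens per optimizer step: {max_tokens_per_step}")
```

Gradient accumulation trades wall-clock time for memory: each GPU only ever holds 2 sequences at once, yet the optimizer sees gradients averaged over 32.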

In [1]:
from kaggle_secrets import UserSecretsClient
import wandb
user_secrets = UserSecretsClient()
secret_value = user_secrets.get_secret("wandb_api_key")
wandb.login(key=secret_value)

# save metrics into wandb folder
import os
os.environ["WANDB_DIR"] = "./wandb"
wandb.init(project="250413or", mode="online")

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33mstpeteishii[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Tracking run with wandb version 0.19.1
[34m[1mwandb[0m: Run data is saved locally in [35m[1m/kaggle/working/wandb/run-20250414_014044-dbeve63o[0m
[34m[1mwandb[0m: Run [1m`wandb offline`[0m to turn off syncing.
[34m[1mwandb[0m: Syncing run [33mfearless-lion-7[0m
[34m[1mwandb[0m: ⭐️ View project at [34m[4mhttps://wandb.ai/stpeteishii/250413or[0m
[34m[1mwandb[0m: 🚀 View run at [34m[4mhttps://wandb.ai/stpeteishii/250413or/runs/dbeve63o[0m


In [2]:
!git clone https://github.com/huggingface/open-r1.git
!pip install -e ./open-r1
!pip show open-r1

Cloning into 'open-r1'...
remote: Enumerating objects: 2985, done.
remote: Counting objects: 100% (300/300), done.
remote: Compressing objects: 100% (126/126), done.
remote: Total 2985 (delta 266), reused 174 (delta 174), pack-reused 2685 (from 3)
Receiving objects: 100% (2985/2985), 1.25 MiB | 9.31 MiB/s, done.
Resolving deltas: 100% (1689/1689), done.
Obtaining file:///kaggle/working/open-r1
  Preparing metadata (setup.py) ... done
Collecting trl@ git+https://github.com/huggingface/trl.git@d625c5533a6b1c84d3565c8080857f6bb81c538a (from open-r1==0.1.0.dev0)
  Cloning https://github.com/huggingface/trl.git (to revision d625c5533a6b1c84d3565c8080857f6bb81c538a) to /tmp/pip-install-sx0zphp5/trl_6946923d46ba4341a0e5c5e4e5c4d2dd
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/trl.git /tmp/pip-install-sx0zphp5/trl_6946923d46ba4341a0e5c5e4e5c4d2dd
  Running command git rev-parse -q --verify 'sha^d625c5533a6b1c84d3565c8

In [3]:
import os
os.chdir('./open-r1')

In [4]:
!ls

assets	 logs	   README.md  scripts	 setup.py  src
LICENSE  Makefile  recipes    setup.cfg  slurm	   tests


In [5]:
from pathlib import Path

config_content = """
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_clipping: 1.0
  zero3_init_flag: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
"""

config_path = "custom_config.yaml"
Path(config_path).write_text(config_content)

!accelerate launch --config_file custom_config.yaml src/open_r1/sft.py \
    --model_name_or_path Qwen/Qwen2-0.5B \
    --dataset_name stpete2/openr1-math-part \
    --learning_rate 1.0e-5 \
    --num_train_epochs 1 \
    --packing \
    --max_seq_length 1024 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --gradient_checkpointing \
    --bf16 \
    --use_peft \
    --lora_alpha 16 \
    --lora_dropout 0.1 \
    --lora_r 8 \
    --output_dir data/Qwen2-0.5B-math-sft

[2025-04-14 01:41:59,454] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
df: /root/.triton/autotune: No such file or directory
2025-04-14 01:42:04.182740: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-04-14 01:42:04.434533: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-04-14 01:42:04.502407: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0414 01:42:15.653000 262 torch/distributed/run.py:793] 
W0414 01:42:15.653000 262 torch/distributed/run.py:793] *****************************************
W0414 01:42:15.653000 262 tor