## OpenR1 Qwen2.5-0.5B-gsm8k-sft 
* num_train_epochs: 10


https://github.com/lzhxmu/CPPO.git


- gpu: T4*2
- model: Qwen/Qwen2.5-0.5B
- data: stpete2/openai-gsm8k-part
- method: sft
- output: Qwen2.5-0.5B-gsm8k-sft

###### unique setting for cppo in custom_config2.yaml
- metric: smallest
- pruning: 0.5 
- allocation: true
###### unique setting for drgrpo in custom_config2.yaml
- scale_rewards: false

## Open-R1
Open-R1 is an open initiative to replicate and extend the techniques behind DeepSeek-R1, a state-of-the-art reasoning model, in a fully transparent and collaborative way:

https://github.com/huggingface/open-r1



After selecting the model, dataset, and method, I ran the training command from the command line and was able to train successfully in the Open-R1 environment.

Considering the limitations of the notebook environment, I kept the model and data to a minimum and used the following techniques:

1. LoRA (Low-Rank Adaptation)
2. Gradient checkpointing
3. Batching optimizations
4. BF16 mixed precision
5. Sequence length limit
6. Data packing
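
To give a rough sense of why LoRA helps under tight memory, the sketch below counts the trainable parameters LoRA adds to a single linear layer. The rank `r = 8` and `lora_alpha = 16` match the training config used in this notebook; the 896 x 896 layer shape is an illustrative assumption rather than a value read from the model, and the helper names are hypothetical.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Number of weights in a dense d_out x d_in matrix."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, r: int) -> int:
    """LoRA trains two low-rank factors: B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)


# Assumed, illustrative layer shape; r and lora_alpha come from the config.
d_out, d_in = 896, 896
r, lora_alpha = 8, 16

full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, r)
fraction = lora / full
scaling = lora_alpha / r  # factor applied to the low-rank update B @ A

print(f"full: {full}, LoRA: {lora}, fraction: {fraction:.2%}, scaling: {scaling}")
```

Under these assumptions the adapter trains under 2% of the weights of that layer, while the frozen base weights need no gradients or optimizer state.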

This configuration is far from sufficient for effective training, but it lets us verify in a short time that the method runs end to end.

It allows rapid validation of the training pipeline even with limited resources, and is a useful starting point before scaling up to larger experiments.
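
As a quick sanity check on the batching settings, the snippet below works out how many sequences and tokens contribute to each optimizer step. The numbers are the ones set in the config cells of this notebook (`per_device_train_batch_size: 4`, `gradient_accumulation_steps: 4`, `num_processes: 2`, `max_seq_length: 1024`). The token count is an upper bound that assumes packing fills every sequence to the maximum length.

```python
# Values copied from the config cells of this notebook.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_processes = 2  # one process per T4 GPU
max_seq_length = 1024

# Sequences that contribute to a single optimizer update.
sequences_per_step = (
    per_device_train_batch_size * gradient_accumulation_steps * num_processes
)

# Upper bound on tokens per update, assuming fully packed sequences.
max_tokens_per_step = sequences_per_step * max_seq_length

print(sequences_per_step, max_tokens_per_step)
```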

In [1]:
from kaggle_secrets import UserSecretsClient
import wandb
user_secrets = UserSecretsClient()
secret_value = user_secrets.get_secret("wandb_api_key")
wandb.login(key=secret_value)

# save metrics into wandb folder
import os
os.environ["WANDB_DIR"] = "./wandb"
wandb.init(project="250424sft", mode="online")

wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: stpeteishii. Use `wandb login --relogin` to force relogin
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Tracking run with wandb version 0.19.1
wandb: Run data is saved locally in /kaggle/working/wandb/run-20250503_115322-vle4dhhu
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run dandy-fire-12
wandb: ⭐️ View project at https://wandb.ai/stpeteishii/250424sft
wandb: 🚀 View run at https://wandb.ai/stpeteishii/250424sft/runs/vle4dhhu


In [2]:
!git clone https://github.com/huggingface/open-r1.git
!pip install -e ./open-r1
!pip show open-r1

Cloning into 'open-r1'...
remote: Enumerating objects: 3297, done.
remote: Counting objects: 100% (1343/1343), done.
remote: Compressing objects: 100% (302/302), done.
remote: Total 3297 (delta 1239), reused 1049 (delta 1034), pack-reused 1954 (from 3)
Receiving objects: 100% (3297/3297), 1.27 MiB | 11.29 MiB/s, done.
Resolving deltas: 100% (1926/1926), done.
Obtaining file:///kaggle/working/open-r1
  Preparing metadata (setup.py) ... done
Collecting transformers@ git+https://github.com/huggingface/transformers.git@acdbe627e323dbc822f21499fead789b439cf45b (from open-r1==0.1.0.dev0)
  Cloning https://github.com/huggingface/transformers.git (to revision acdbe627e323dbc822f21499fead789b439cf45b) to /tmp/pip-install-bh1kki5r/transformers_2b07aae2853647c3961a853b8b06044b
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers.git /tmp/pip-install-bh1kki5r/transformers_2b07aae2853647c3961a853b8b06044b
  Running co

In [3]:
import os
os.chdir('./open-r1')

In [4]:
!ls

assets	 logs	   README.md  scripts	 setup.py  src
LICENSE  Makefile  recipes    setup.cfg  slurm	   tests


In [5]:
#!pip install flash-attn --no-build-isolation
# flash-attn is not installed, so the config below uses
# attn_implementation: eager instead of flash_attention_2
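
One possible reason to stay on `eager` attention in this environment: FlashAttention-2 targets NVIDIA GPUs of the Ampere generation or newer (compute capability 8.0 and up), while the T4 is a Turing card with compute capability 7.5. The helpers below are a hypothetical sketch of that check and are not part of Open-R1; on a real machine the capability tuple could come from `torch.cuda.get_device_capability()`.

```python
from typing import Tuple


def supports_flash_attention_2(capability: Tuple[int, int]) -> bool:
    """Return True for compute capability 8.0 or higher (Ampere and newer)."""
    major, _minor = capability
    return major >= 8


def pick_attn_implementation(capability: Tuple[int, int]) -> str:
    """Choose the attn_implementation value for the training config."""
    return "flash_attention_2" if supports_flash_attention_2(capability) else "eager"


T4_CAPABILITY = (7, 5)    # Turing
A100_CAPABILITY = (8, 0)  # Ampere

print(pick_attn_implementation(T4_CAPABILITY))
print(pick_attn_implementation(A100_CAPABILITY))
```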

In [6]:
from pathlib import Path


config_content = """
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_clipping: 1.0
  zero3_init_flag: false
  zero_stage: 1
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
"""

config_path = "custom_config.yaml"
Path(config_path).write_text(config_content)


#################################


config_content2 = """
# Model arguments
model_name_or_path: Qwen/Qwen2.5-0.5B
model_revision: main
torch_dtype: bfloat16
attn_implementation: eager

# Training arguments
dataset_name: stpete2/openai-gsm8k-part
learning_rate: 2.0e-06
dataset_text_field: question

num_train_epochs: 10

packing: true
max_seq_length: 1024
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
gradient_checkpointing: true
bf16: true
use_peft: true

# LoRA configuration
lora_alpha: 16
lora_dropout: 0.1
lora_r: 8

# Output
output_dir: data/Qwen2.5-0.5B-gsm8k-sft
"""

config_path2 = "custom_config2.yaml"
Path(config_path2).write_text(config_content2)


##########################################################


!accelerate launch --config_file custom_config.yaml src/open_r1/sft.py \
--config custom_config2.yaml \
--disable_tqdm=False

[2025-05-03 11:58:36,472] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2025-05-03 11:58:40.999493: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-05-03 11:58:41.281089: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-05-03 11:58:41.346667: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0503 11:58:52.327000 389 torch/distributed/run.py:792] 
W0503 11:58:52.327000 389 torch/distributed/run.py:792] *****************************************
W0503 11:58:52.327000 389 torch/distributed/run.py:792] Setting OMP_NUM_THREADS envi