# Quick Start

This is a quick start guide for training the SSA RL version. Because of VRAM limits, we use the Qwen2.5-0.5B model; even so, it requires a 40GB A100 on Colab. For demo purposes, data shuffling is set to False so that the run only covers some GSM8K steps, whose solutions are much shorter than the MATH solutions. Please set it back to True for actual training! In that case you will likely need a GPU with more VRAM (probably 48GB+): the longer MATH solutions increase VRAM usage enough that a 40GB A100 runs out of memory (OOM).
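
Before starting, it can help to check which of these two regimes your GPU falls into. The following is a minimal sketch, not part of the repository: the function names and the 1 GiB tolerance are assumptions for illustration, and the 40 and 48 thresholds are simply the figures quoted above. The tolerance is there because drivers usually report slightly less memory than the nominal size.

```python
def vram_advice(total_gib, demo_min_gib=40.0, full_min_gib=48.0, tolerance_gib=1.0):
    """Classify a GPU by total memory against the thresholds quoted in this guide.

    Returns "full" (enough for shuffled training), "demo" (enough for the
    unshuffled GSM8K demo only), or "insufficient".
    """
    if total_gib + tolerance_gib >= full_min_gib:
        return "full"
    if total_gib + tolerance_gib >= demo_min_gib:
        return "demo"
    return "insufficient"


def detect_total_gib():
    """Return the total memory of GPU 0 in GiB, or None without PyTorch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


if __name__ == "__main__":
    total = detect_total_gib()
    if total is None:
        print("No CUDA GPU detected")
    else:
        print(f"GPU 0: {total:.1f} GiB -> {vram_advice(total)}")
```

Treat the result as a rough guide only; actual memory use depends on sequence length, batch size, and the other settings in the config.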

## Install dependencies

### Step 1: Install dependencies
On Colab, the runtime restarts after the installation. Run Step 2 once the installation has finished.
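
The install cell that follows rewrites `requirements_colab.txt` before calling pip: it drops comments and conda or `file://` path entries and keeps only pinned `package==version` specs. The same filter can be sketched as a standalone function, which is easier to try outside a notebook. The function name and the sample lines (including `somepkg`) are hypothetical and only illustrate the behaviour.

```python
def clean_requirements(lines):
    """Keep only pinned `name==version` specs; drop comments and local-path entries."""
    cleaned = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if "/home/conda/" in line or "file://" in line:
            continue  # conda build path, not installable with pip on Colab
        if "==" in line:
            # Keep just the first token, i.e. the package==version part
            cleaned.append(line.split()[0])
    return cleaned


sample = [
    "absl-py==2.1.0\n",
    "# a comment\n",
    "somepkg @ file:///home/conda/feedstock_root/somepkg\n",  # hypothetical entry
    "accelerate==1.4.0  # pinned\n",
    "unpinned-package\n",
]
print(clean_requirements(sample))
```

Note that unpinned entries are dropped as well, so any package listed without `==` has to be installed separately.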

In [2]:
try:
    import google.colab
    IN_COLAB = True
    print(f"Running in Google Colab: {IN_COLAB}")
    print("Installing repo")
    !git clone https://github.com/user074/ssa.git
    %cd ssa

    # Read and clean the requirements file
    with open('requirements_colab.txt', 'r') as f:
        lines = f.readlines()

    # Filter out conda-specific paths and keep only standard package specs
    clean_lines = []
    for line in lines:
        line = line.strip()
        if not line.startswith('#') and '/home/conda/' not in line and 'file://' not in line:
            # Extract just the package name and version if it's a standard format
            if '==' in line:
                clean_lines.append(line.split()[0])  # Take just the package==version part

    # Create a new clean requirements file
    with open('requirements_clean.txt', 'w') as f:
        f.write('\n'.join(clean_lines))

    # Now install from the clean file
    !pip install -r requirements_clean.txt

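except ImportError:  # google.colab is only importable inside Colab
    IN_COLAB = False
    print(f"Running in Google Colab: {IN_COLAB}")
    !conda env create -f environment.yml
    # Note: `!conda activate` runs in a subshell and does not switch this
    # kernel's environment; activate SSA in your shell before launching Jupyter.
    !conda activate SSA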


Running in Google Colab: True
Installing repo
fatal: destination path 'ssa' already exists and is not an empty directory.
/content/ssa
Collecting absl-py==2.1.0 (from -r requirements_clean.txt (line 1))
  Downloading absl_py-2.1.0-py3-none-any.whl.metadata (2.3 kB)
Collecting accelerate==1.4.0 (from -r requirements_clean.txt (line 2))
  Downloading accelerate-1.4.0-py3-none-any.whl.metadata (19 kB)
Collecting aiohappyeyeballs==2.6.0 (from -r requirements_clean.txt (line 3))
  Downloading aiohappyeyeballs-2.6.0-py3-none-any.whl.metadata (5.9 kB)
Collecting airportsdata==20250224 (from -r requirements_clean.txt (line 6))
  Downloading airportsdata-20250224-py3-none-any.whl.metadata (9.0 kB)
Collecting antlr4-python3-runtime==4.9.3 (from -r requirements_clean.txt (line 8))
  Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 117.0/117.0 kB 12.3 MB/s eta 0:00:00
  Preparing metadata (setup.py)

### Step 2: Continue the installation
Once the runtime has restarted, run the following code:

In [1]:
try:
    import google.colab
    %cd ssa
except ImportError:  # not running on Colab
    pass
!git clone https://github.com/openai/prm800k
%cd torchtune
!pip install -e .
%cd ..

/content/ssa
fatal: destination path 'prm800k' already exists and is not an empty directory.
/content/ssa/torchtune
Obtaining file:///content/ssa/torchtune
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: torchtune
  Building editable for torchtune (pyproject.toml) ... done
  Created wheel for torchtune: filename=torchtune-0.0.0-0.editable-py3-none-any.whl size=12313 sha256=529aec1c419dc40163326fd3a8392477025b98cd627f4fc2edae9f535dde4411
  Stored in directory: /tmp/pip-ephem-wheel-cache-j85bpiwk/wheels/b3/5c/62/9d1f60c2689fadf56e4ad76d51631d9e9837e64164dd3b6f3a
Successfully built torchtune
Installing collected packages: torchtune
  Attempting uninstall: torchtune
    Found existing installation: torchtune 0.0.0
    Uninstalling tor
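
Both install cells print `fatal: destination path ... already exists and is not an empty directory` when they are re-run, because `git clone` refuses to write into a non-empty directory. The message is harmless here, but the clone step can be made idempotent. A minimal sketch, assuming `git` is on the PATH; the helper name `clone_if_missing` is hypothetical and not part of the repository:

```python
import subprocess
from pathlib import Path


def clone_if_missing(url, dest):
    """Clone `url` into `dest` unless `dest` already exists and is non-empty.

    Returns True if a clone was performed, False if it was skipped.
    """
    path = Path(dest)
    if path.is_dir() and any(path.iterdir()):
        return False  # already cloned (or at least not empty), so leave it alone
    subprocess.run(["git", "clone", url, str(path)], check=True)
    return True
```

For example, `clone_if_missing("https://github.com/openai/prm800k", "prm800k")` would replace the `!git clone` line in Step 2. The check only looks at whether the directory is non-empty, so it does not verify that an existing checkout is complete.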

## Download the model

For the demo, we use the Qwen2.5-0.5B model, which you can download from Hugging Face.

In [2]:
!pip install "huggingface_hub[hf_transfer]"
!HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download Qwen/Qwen2.5-0.5B --local-dir model/Qwen2.5-0.5B

Downloading '.gitattributes' to 'model/Qwen2.5-0.5B/.cache/huggingface/download/wPaCkH-WbT7GsmxMKKrNZTV4nSM=.a6344aac8c09253b3b630fb776ae94478aa0275b.incomplete'
.gitattributes: 100% 1.52k/1.52k [00:00<00:00, 11.7MB/s]
Download complete. Moving file to model/Qwen2.5-0.5B/.gitattributes
Downloading 'LICENSE' to 'model/Qwen2.5-0.5B/.cache/huggingface/download/DhCjcNQuMpl4FL346qr3tvNUCgY=.6634c8cc3133b3848ec74b9f275acaaa1ea618ab.incomplete'
LICENSE: 100% 11.3k/11.3k [00:00<00:00, 48.7MB/s]
Download complete. Moving file to model/Qwen2.5-0.5B/LICENSE
Downloading 'README.md' to 'model/Qwen2.5-0.5B/.cache/huggingface/download/Xn7B-BWUGOee2Y6hCZtEhtFu4BE=.2d22b523f8968e42fc6b805e656133fa83f771ab.incomplete'
README.md: 100% 3.85k/3.85k [00:00<00:00, 28.0MB/s]
Download complete. Moving file to model/Qwen2.5-0.5B/README.md
Downloading 'config.json' to 'model/Qwen2.5-0.5B/.cache/huggingface/download/8_PA_wEVGiVa2goH2H4KQOQpvVY=.141506237b9022c4ab3e01734cfafb9310b57d2d.incomplete'
config.json: 100
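
As the log shows, each file is first written under `.cache/huggingface/download` with an `.incomplete` suffix and then moved into place. If a download is interrupted, a quick check along these lines can tell whether it is worth re-running the command. This is a sketch under assumptions: the helper names are hypothetical, and only `config.json` is checked by default because it is the only model file confirmed by the log above.

```python
from pathlib import Path


def missing_files(model_dir, expected=("config.json",)):
    """Return the names in `expected` that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


def incomplete_downloads(model_dir):
    """Return leftover `*.incomplete` file names from an interrupted download."""
    cache = Path(model_dir) / ".cache" / "huggingface" / "download"
    if not cache.is_dir():
        return []
    return sorted(p.name for p in cache.glob("*.incomplete"))
```

Calling `missing_files("model/Qwen2.5-0.5B")` and `incomplete_downloads("model/Qwen2.5-0.5B")` should both return empty lists after a clean download. Pass further file names through `expected` once you know which ones your checkpoint needs.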

## Train

For the dataset, we use our cleaned dataset on Hugging Face, `user074/concat_cleaned_gsm8k_math_5`. It is ready to use as-is. We have also prepared a config file for the training.
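
The demo config has shuffling disabled, as noted at the top of this guide, and it needs to be switched back for a real run. The sketch below edits the YAML text directly. It assumes the config has exactly one top-level `shuffle:` key, which is common in torchtune configs but has not been verified against `05B_rl_SSA_qwen.yaml`; check the file and adjust the key name if it differs. The function name `set_shuffle` is hypothetical.

```python
import re


def set_shuffle(config_text, enabled):
    """Rewrite a top-level `shuffle:` line in YAML text, leaving the rest untouched."""
    # Match the key at the start of a line, its value, and any trailing comment.
    pattern = re.compile(r"^(shuffle:[ \t]*)\S+(.*)$", flags=re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{bool(enabled)}{m.group(2)}", config_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one top-level 'shuffle:' key, found {count}")
    return new_text


demo = "seed: 42\nshuffle: False  # demo only\nbatch_size: 1\n"  # made-up config text
print(set_shuffle(demo, True))
```

To apply it to the real file, read the config with `Path("05B_rl_SSA_qwen.yaml").read_text()`, pass the text through `set_shuffle(..., True)`, and write it back. Editing the one line by hand works just as well.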

In [4]:
# First, log in to wandb
import wandb
wandb.login()

True

Here is an example of the demo results from the wandb log. The success rate rises quickly, even within only 200 steps.

![alt text](figures/demo-results.png "Demo 200 steps")
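
The sampled answers printed during training end with a `#### <answer>` line, the convention GSM8K uses. As a rough illustration of how a success rate could be computed from such completions, the sketch below extracts the final answer and compares its leading number to a reference. This is not the repository's reward code: the function names are hypothetical and the matching rule (first number after the last `####`) is an assumption.

```python
import re


def extract_final_answer(completion):
    """Return the text after the last `####` marker, or None if there is none."""
    matches = re.findall(r"####\s*(.+)", completion)
    if not matches:
        return None
    return matches[-1].strip().rstrip(".")


def leading_number(answer):
    """Return the first number in `answer` as a float, or None if there is none."""
    m = re.search(r"-?\d+(?:\.\d+)?", answer.replace(",", ""))
    return float(m.group()) if m else None


def success_rate(completions, reference):
    """Fraction of completions whose extracted number equals `reference`."""
    if not completions:
        return 0.0
    hits = 0
    for text in completions:
        answer = extract_final_answer(text)
        if answer is not None and leading_number(answer) == float(reference):
            hits += 1
    return hits / len(completions)
```

For example, a completion ending in `#### 6 hours.` yields the answer `6 hours` and the number `6.0`. Exact numeric matching is deliberately strict; MATH-style answers with fractions or symbolic expressions would need a more careful comparison.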


Start training now! You can see the log of each step below.

In [5]:
!tune run --nproc_per_node 1 dev/grpo_full_finetune_distributed --config ./05B_rl_SSA_qwen.yaml

Streaming output truncated to the last 5000 lines.
To convert the driving time from minutes to hours:
\[ 240 \text{ minutes} \div 60 \text{ minutes per hour} = 4 \text{ hours} \]

Additionally, Manex stays 2 hours at the destination. Therefore, the total time for the entire tour is the sum of the driving time and the time spent at the destination:
\[ 4 \text{ hours} + 2 \text{ hours} = 6 \text{ hours} \]

Thus, the total time for the entire tour is:
#### 6 hours.

Answer 4:
First, let's calculate the total distance Manex will travel. The trip to the destination is 55 miles, and the return trip is 10 miles farther, which means it is 55 + 10 = 65 miles.

Next, we add the distance to the destination and the return trip:
\[ 55 \text{ miles} + 65 \text{ miles} = 120 \text{ miles} \]

Since Manex drives 1 mile in 2 minutes, we can calculate the total driving time by multiplying the total distance by the time per mile:
\[ 120 \text{ miles} \times 2 \text{ minutes/mile} = 240 \te