# SVAMP Math Reasoning with GRPO Fine-tuning

This notebook fine-tunes **Gemma 3 1B-IT** on the SVAMP dataset using **Group Relative Policy Optimization (GRPO)**, a reinforcement learning technique that teaches the model to solve elementary arithmetic word problems with structured reasoning.

**Pipeline Overview**:
1. Load SVAMP dataset (1000 arithmetic word problems)
2. Initialize Gemma 3 1B with LoRA adapters
3. Define multi-component reward function
4. Train using GRPO on TPU v5e
5. Evaluate improvement on test set

**Key Innovation**: GRPO generates multiple responses per question and learns from relative quality comparisons, avoiding the need for human preference labels.

In [1]:
import os
os.environ["HF_HUB_DISABLE_XET"] = "1"

## üì¶ Environment Setup

Installing dependencies for TPU-accelerated training with the Tunix framework (Google's RL toolkit for LLM fine-tuning).

**Key Libraries**:
- `google-tunix`: GRPO implementation and model utilities
- `flax==0.12.0`: Neural network framework compatible with JAX
- `datasets`: For loading SVAMP from Hugging Face
- `qwix`: LoRA (Low-Rank Adaptation) utilities

In [2]:
# Fix fsspec conflict first (this is blocking datasets)
!pip install -q "fsspec==2023.10.0"

# Install tunix and let it handle dependencies
!pip install -q --force-reinstall "google-tunix[prod]==0.1.3"

# Fix remaining conflicts
!pip install -q --upgrade "numpy>=2.0,<3.0"
!pip install -q --force-reinstall --no-deps pyarrow==16.1.0

# Install other required packages
!pip install -q kagglehub 
!pip install -q ipywidgets 
!pip install -q tensorflow 
!pip install -q tensorflow_datasets
!pip install -q tensorboardX
!pip install -q wandb

print("Installation complete!")

# Verify tunix is installed
import sys
import importlib.util

tunix_spec = importlib.util.find_spec("tunix")
if tunix_spec is None:
    print("ERROR: tunix is not installed!")
    print("Attempting to reinstall...")
    import subprocess
    subprocess.run([sys.executable, "-m", "pip", "install", "--force-reinstall", "google-tunix[prod]==0.1.3"], check=True)
else:
    print("‚úì tunix is installed at:", tunix_spec.origin)

# Now import everything
print("\nImporting libraries...")

import functools
import gc
import os
from pprint import pprint
import re
import csv
import shutil

# Import JAX and related libraries
import jax
import jax.numpy as jnp
print("‚úì JAX imported")

# Import Flax
from flax import nnx
print("‚úì Flax imported")

# Import other core libraries
import grain
import humanize
import kagglehub
import optax
from orbax import checkpoint as ocp
from pathlib import Path
import qwix
import tensorflow_datasets as tfds
from tqdm.auto import tqdm
print("‚úì Core libraries imported")

# Import tunix modules - doing this step by step to see where it fails
try:
    import tunix
    print("‚úì tunix base module imported")
    
    from tunix.generate import sampler as sampler_lib
    print("‚úì tunix.generate.sampler imported")
    
    from tunix.generate import tokenizer_adapter as tokenizer_lib
    print("‚úì tunix.generate.tokenizer_adapter imported")
    
    from tunix.models.gemma3 import params
    print("‚úì tunix.models.gemma3.params imported")
    
    from tunix.models.gemma3 import model
    print("‚úì tunix.models.gemma3.model imported")
    
    from tunix.rl import rl_cluster as rl_cluster_lib
    print("‚úì tunix.rl.rl_cluster imported")
    
    from tunix.rl.grpo.grpo_learner import GRPOConfig, GRPOLearner
    print("‚úì tunix.rl.grpo.grpo_learner imported")
    
    from tunix.rl.rollout import base_rollout
    print("‚úì tunix.rl.rollout imported")
    
    from tunix.sft import metrics_logger
    print("‚úì tunix.sft.metrics_logger imported")
    
except ImportError as e:
    print(f"ERROR importing tunix module: {e}")
    import traceback
    traceback.print_exc()

# Import datasets last
from datasets import load_dataset
print("‚úì datasets imported")

print("\n" + "="*50)
print("ALL IMPORTS SUCCESSFUL!")
print("="*50)

[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
numba 0.63.1 requires numpy<2.4,>=1.22, but you have numpy 2.4.1 which is incompatible.[0m[31m
[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: 



‚úì JAX imported
‚úì Flax imported
‚úì Core libraries imported
‚úì tunix base module imported
‚úì tunix.generate.sampler imported
‚úì tunix.generate.tokenizer_adapter imported
‚úì tunix.models.gemma3.params imported
‚úì tunix.models.gemma3.model imported
‚úì tunix.rl.rl_cluster imported
‚úì tunix.rl.grpo.grpo_learner imported
‚úì tunix.rl.rollout imported
‚úì tunix.sft.metrics_logger imported
‚úì datasets imported

ALL IMPORTS SUCCESSFUL!


In [None]:
'''# Install core libraries
!pip install -q kagglehub 
!pip install -q ipywidgets 
!pip install -q tensorflow 
!pip install -q tensorflow_datasets
!pip install -q tensorboardX
!pip install -q grain

# Install the Google Tunix framework (the "teacher's toolkit")
!pip install "google-tunix[prod]==0.1.3"

# Reinstall Flax to ensure compatibility
!pip uninstall -q -y flax
!pip install -q flax==0.12.0

# Fix huggingface-hub version conflict
!pip install -q --force-reinstall "huggingface-hub>=0.34.0,<1.0"

# Fix transformers to a compatible version
!pip install -q --force-reinstall transformers

# Fix datasets and pyarrow compatibility
!pip install -q --force-reinstall pyarrow==15.0.0
!pip install -q --force-reinstall datasets==2.19.0

# Install wandb for experiment tracking
!pip install -q wandb'''


In [3]:
# Set up Weights & Biases (for tracking training)
import wandb, os
from kaggle_secrets import UserSecretsClient
os.environ['WANDB_API_KEY'] = UserSecretsClient().get_secret("WANDB_API_KEY")
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("WANDB_API_KEY")

## üîß Import Core Libraries

Setting up JAX for TPU computation, Flax for neural networks, and Tunix for GRPO training infrastructure.

In [None]:
'''# Import all necessary libraries
import functools
import gc
import os
from pprint import pprint
import re
import csv
import shutil
from flax import nnx
import grain
import humanize
import jax
import jax.numpy as jnp
import kagglehub
import optax
from orbax import checkpoint as ocp
from pathlib import Path
import qwix
import tensorflow_datasets as tfds
from tqdm.auto import tqdm
from tunix.generate import sampler as sampler_lib
from tunix.generate import tokenizer_adapter as tokenizer_lib
from tunix.models.gemma3 import params
from tunix.models.gemma3 import model
from tunix.rl import rl_cluster as rl_cluster_lib
from tunix.rl.grpo.grpo_learner import GRPOConfig, GRPOLearner
from tunix.rl.rollout import base_rollout
from tunix.sft import metrics_logger
from datasets import load_dataset

print("All imports successful!")
'''

In [None]:
'''# First, let's install everything tunix needs, allowing it to resolve dependencies
!pip install -q "google-tunix[prod]==0.1.3"

# Now fix the specific conflicts that arose
!pip install -q --upgrade "numpy>=2.0,<3.0"
!pip install -q --force-reinstall --no-deps pyarrow==16.1.0
!pip install -q --force-reinstall flax==0.12.0

# Reinstall other required packages
!pip install -q kagglehub 
!pip install -q ipywidgets 
!pip install -q tensorflow 
!pip install -q tensorflow_datasets
!pip install -q tensorboardX
!pip install -q grain
!pip install -q wandb

print("Installation complete! Now importing...")

# Import all necessary libraries
import functools
import gc
import os
from pprint import pprint
import re
import csv
import shutil

# Import JAX and related libraries
import jax
import jax.numpy as jnp

# Import other core libraries
import grain
import humanize
import kagglehub
import optax
from orbax import checkpoint as ocp
from pathlib import Path
import qwix
import tensorflow_datasets as tfds
from tqdm.auto import tqdm

# Import Flax
from flax import nnx

# Import tunix modules
from tunix.generate import sampler as sampler_lib
from tunix.generate import tokenizer_adapter as tokenizer_lib
from tunix.models.gemma3 import params
from tunix.models.gemma3 import model
from tunix.rl import rl_cluster as rl_cluster_lib
from tunix.rl.grpo.grpo_learner import GRPOConfig, GRPOLearner
from tunix.rl.rollout import base_rollout
from tunix.sft import metrics_logger

# Import datasets last
from datasets import load_dataset

print("‚úì All imports successful!")'''

## ‚öôÔ∏è Hyperparameter Configuration

Comprehensive training configuration organized by category.

### üìä Data Parameters
- **TRAIN_FRACTION**: Using 100% of training data
- **MESH**: Distributed training layout (1√ó4 = FSDP √ó Tensor Parallel)

### üéØ LoRA Parameters
- **RANK=64, ALPHA=64**: Controls adapter capacity
- Enables training <1% of parameters while maintaining performance

### üé≤ GRPO Algorithm
- **NUM_GENERATIONS=4**: Generate 4 responses per question for comparison
- **BETA=0.04**: KL divergence penalty (prevents drift from base model)
- **EPSILON=0.2**: PPO clipping parameter for stable updates
- **TOTAL_GENERATION_STEPS=384**: Allows detailed reasoning chains

### üìà Optimization
- **LEARNING_RATE=3e-6**: Conservative for RL stability
- **WARMUP_STEPS=170**: Gradual learning rate ramp-up
- **MAX_GRAD_NORM=0.1**: Gradient clipping for training stability
- **NUM_BATCHES=1000**: Total training iterations

### üíæ Infrastructure
- **Checkpointing**: Save every 500 steps, keep 4 most recent
- **Evaluation**: Test model every 10 steps

In [4]:
# ====== Data ======
TRAIN_DATA_DIR = "./data/train"
TEST_DATA_DIR = "./data/test"
TRAIN_FRACTION = 1.0

# ====== LoRA ======
RANK = 64
ALPHA = 64.0

# ====== Sharding ======
MESH = [(1, 4), ("fsdp", "tp")]

# ====== GRPO ======
MAX_PROMPT_LENGTH = 256
TOTAL_GENERATION_STEPS = 384  # Increased for longer math solutions
TEMPERATURE = 0.9
TOP_P = 1.0
TOP_K = 50
NUM_GENERATIONS = 4

# ====== Training ======
NUM_ITERATIONS = 1
BETA = 0.04
EPSILON = 0.2

TRAIN_MICRO_BATCH_SIZE = 2
NUM_BATCHES = 1000  
NUM_TEST_BATCHES = 100
EVAL_EVERY_N_STEPS = 10
NUM_EPOCHS = 2

MAX_STEPS = int(NUM_BATCHES * NUM_ITERATIONS * TRAIN_FRACTION * NUM_EPOCHS)

# ====== Optimizer ======
LEARNING_RATE = 3e-6
B1 = 0.9
B2 = 0.99
WEIGHT_DECAY = 0.1
WARMUP_STEPS = 0.1 * MAX_STEPS
MAX_GRAD_NORM = 0.1

# ====== Checkpointing ======
INTERMEDIATE_CKPT_DIR = "/tmp/content/intermediate_ckpt/"
CKPT_DIR = "/tmp/content/ckpts/"
SAVE_INTERVAL_STEPS = 500
MAX_TO_KEEP = 4

# ====== Inference ======
GENERATION_CONFIGS = {
    "greedy": {"temperature": 1e-4, "top_k": 1, "top_p": 1.0},
    "standard": {"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    "liberal": {"temperature": 0.85, "top_k": 2000, "top_p": 1.0},
}

print(f"Total training steps: {MAX_STEPS}")

Total training steps: 2000


In [5]:
def show_hbm_usage():
    """Displays memory usage per device."""
    fmt_size = functools.partial(humanize.naturalsize, binary=True)
    
    for d in jax.local_devices():
        stats = d.memory_stats()
        used = stats["bytes_in_use"]
        limit = stats["bytes_limit"]
        print(f"Using {fmt_size(used)} / {fmt_size(limit)} ({used/limit:%}) on {d}")

## üéØ Prompt Engineering for SVAMP

Designing the system prompt that teaches the model our desired output format.

**Format Requirements**:
1. `<reasoning>` tags: Step-by-step mathematical thinking
2. `<answer>` tags: Final numerical answer only

This structured format enables precise reward calculation and ensures interpretable solutions.

In [6]:
reasoning_start = "<reasoning>"
reasoning_end = "</reasoning>"
solution_start = "<answer>"
solution_end = "</answer>"

# UPDATED: System prompt for SVAMP math word problems
SYSTEM_PROMPT = f"""You are a mathematical reasoning expert specializing in solving arithmetic word problems.
Your goal is to solve problems by breaking them down into logical steps.

You must strictly follow this format:
1. Start with {reasoning_start}.
2. Write out your step-by-step solution with clear mathematical reasoning.
3. Show all calculations and explain each step.
4. End reasoning with {reasoning_end}.
5. Provide the final numerical answer between {solution_start} and {solution_end}.

Example:
User: Rachel had to complete 5 pages of math homework. If she had to complete 4 more pages of reading homework than math homework, how many pages did she have to complete in all?

Model:
{reasoning_start}
Step 1: Rachel has 5 pages of math homework.
Step 2: She has 4 more pages of reading than math, so reading = 5 + 4 = 9 pages.
Step 3: Total pages = math + reading = 5 + 9 = 14 pages.
{reasoning_end}
{solution_start}14{solution_end}

Now solve the problem below using this exact format."""

TEMPLATE = """<start_of_turn>user
{system_prompt}

{question}<end_of_turn>
<start_of_turn>model"""

print("‚úÖ System prompt configured for SVAMP dataset")

‚úÖ System prompt configured for SVAMP dataset


## üìö Dataset Loading and Preprocessing

SVAMP (Simple Variations on Arithmetic Math word Problems) contains 1000 elementary math word problems requiring 1-2 arithmetic operations.

**Preprocessing Steps**:
1. Combine `Body` and `Question` fields
2. Extract numerical `Answer`
3. Format with system prompt template
4. **Curriculum Learning**: Sort by equation complexity (easy‚Üíhard) to improve training efficiency

In [7]:
def extract_svamp_answer(text: str) -> str | None:
    """Extract the numerical answer from SVAMP response."""
    # Look for answer between tags
    match = re.search(rf'{solution_start}\s*([\d.]+)\s*{solution_end}', text)
    if match:
        return match.group(1).strip()
    
    # Fallback: look for any number after "answer" keyword
    match = re.search(r'answer.*?([\d.]+)', text, re.IGNORECASE)
    if match:
        return match.group(1).strip()
    
    return None

def get_svamp_dataset(split="train"):
    """Load and preprocess SVAMP dataset."""
    print(f"Loading SVAMP dataset split: {split}")
    
    # Load from Hugging Face
    dataset = load_dataset("ChilleD/SVAMP", split="train")
    
    # Split into train/test (80-20)
    if split == "train":
        dataset = dataset.select(range(int(len(dataset) * 0.8)))
    else:  # test
        dataset = dataset.select(range(int(len(dataset) * 0.8), len(dataset)))
    
    def preprocess(example):
        # Combine body and question
        question = f"{example['Body']} {example['Question']}"
        
        # Get answer (convert to string)
        answer = str(example['Answer'])
        
        return {
            "prompts": TEMPLATE.format(
                system_prompt=SYSTEM_PROMPT,
                question=question
            ),
            "question": question,
            "answer": answer,
            "equation": example.get('Equation', ''),
            "type": example.get('Type', '')
        }
    
    # Convert to grain dataset
    data = [preprocess(item) for item in dataset]
    
    # Sort by equation complexity for curriculum learning (easy to hard)
    data.sort(key=lambda x: len(str(x.get("equation", ""))))
    print(f"‚úÖ Data sorted by complexity for Curriculum Learning")
    
    grain_dataset = grain.MapDataset.source(data)
    
    return grain_dataset

def get_dataset(data_dir, split="train", source="huggingface"):
    """Wrapper function to maintain compatibility."""
    return get_svamp_dataset(split=split)

### Loading and Splitting Data

Creating 80/20 train/test split with batching for efficient TPU processing.

In [8]:
# Load SVAMP datasets
print("Loading SVAMP training data...")
dataset = get_svamp_dataset("train").batch(TRAIN_MICRO_BATCH_SIZE)[:NUM_BATCHES]

if TRAIN_FRACTION == 1.0:
    train_dataset = dataset.repeat(NUM_EPOCHS)
    val_dataset = None
else:
    train_dataset = dataset[:int(len(dataset) * TRAIN_FRACTION)]
    train_dataset = train_dataset.repeat(NUM_EPOCHS)
    val_dataset = dataset[int(len(dataset) * TRAIN_FRACTION):].repeat(NUM_EPOCHS)

print("Loading SVAMP test data...")
test_dataset = get_svamp_dataset("test").batch(TRAIN_MICRO_BATCH_SIZE)[:NUM_TEST_BATCHES]

dataset_lengths = (
    len(train_dataset),
    len(val_dataset) if val_dataset is not None else 0,
    len(test_dataset),
)
print(f"‚úÖ Dataset contains {dataset_lengths} batches (train, val, test)")

# Show a sample
print("\nüìã Sample from training data:")
for ele in train_dataset[:1]:
    pprint(ele)

Loading SVAMP training data...
Loading SVAMP dataset split: train


README.md:   0%|          | 0.00/675 [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/111k [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/54.8k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/700 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/300 [00:00<?, ? examples/s]

‚úÖ Data sorted by complexity for Curriculum Learning
Loading SVAMP test data...
Loading SVAMP dataset split: test
‚úÖ Data sorted by complexity for Curriculum Learning
‚úÖ Dataset contains (560, 0, 70) batches (train, val, test)

üìã Sample from training data:
{'answer': array(['8', '3'], dtype='<U1'),
 'equation': array(['8.0', '( 6.0 / 2.0 )'], dtype='<U13'),
 'prompts': array(['<start_of_turn>user\nYou are a mathematical reasoning expert specializing in solving arithmetic word problems.\nYour goal is to solve problems by breaking them down into logical steps.\n\nYou must strictly follow this format:\n1. Start with <reasoning>.\n2. Write out your step-by-step solution with clear mathematical reasoning.\n3. Show all calculations and explain each step.\n4. End reasoning with </reasoning>.\n5. Provide the final numerical answer between <answer> and </answer>.\n\nExample:\nUser: Rachel had to complete 5 pages of math homework. If she had to complete 4 more pages of reading homework tha

## üîë Kaggle Authentication

Configuring Kaggle credentials for model downloads and artifact management.

In [9]:
import os
import json

# 1. DEFINE YOUR CREDENTIALS
# (Make sure there are no spaces around the strings!)
kaggle_username = "rachitha02"
kaggle_key = "Rachitha CB"

# 2. SETUP THE DIRECTORY
# This creates the hidden .kaggle folder if it doesn't exist
kaggle_dir = os.path.expanduser("~/.kaggle")
os.makedirs(kaggle_dir, exist_ok=True)

# 3. FORCE WRITE THE FILE
# This overwrites any broken file that might be causing the 401
json_path = os.path.join(kaggle_dir, "kaggle.json")
with open(json_path, "w") as f:
    json.dump({"username": kaggle_username, "key": kaggle_key}, f)

# 4. SET PERMISSIONS (Linux/Mac requirement for safety)
os.chmod(json_path, 0o600)

print("‚úÖ Credentials successfully written to", json_path)

‚úÖ Credentials successfully written to /root/.kaggle/kaggle.json


## ü§ñ Base Model Initialization

Loading **Gemma 3 1B-IT** (instruction-tuned variant) as our starting point.

**Process**:
1. Load pre-trained model from Kaggle
2. Save to intermediate checkpoint
3. Free memory for training setup

The instruction-tuned variant already understands chat formatting and following instructions, making it ideal for our structured reasoning task.

In [10]:
# Clean up checkpoint directories
!rm /tmp/content/intermediate_ckpt/* -rf
!rm /tmp/content/ckpts/* -rf

# Load Gemma 3 base model
import gc
from orbax import checkpoint as ocp
from flax import nnx
from tunix.models.gemma3 import params
from tunix.models.gemma3 import model

print("üîÑ Loading Gemma 3 1B-IT base model...")

model_family = "gemma3"
if model_family == "gemma3":
    MODEL_CP_PATH = params.GEMMA3_1B_IT
    config = model.ModelConfig.gemma3_1b()
    gemma = params.create_model_from_checkpoint(MODEL_CP_PATH, config)
    tokenizer = params.create_tokenizer()
    
    # Save intermediate checkpoint
    checkpointer = ocp.StandardCheckpointer()
    _, state = nnx.split(gemma)
    checkpointer.save(os.path.join(INTERMEDIATE_CKPT_DIR, "state"), state)
    checkpointer.wait_until_finished()
    
    print("‚úÖ Base model loaded and saved to intermediate checkpoint")
    
    # Delete intermediate model to save memory
    del params
    del gemma
    del state
    gc.collect()
    
    print("‚úÖ Memory cleaned up")

üîÑ Loading Gemma 3 1B-IT base model...


E0000 00:00:1768083150.885141    1294 common_lib.cc:650] Could not set metric server port: INVALID_ARGUMENT: Could not find SliceBuilder port 8471 in any of the 0 ports provided in `tpu_process_addresses`="local"
=== Source Location Trace: ===
learning/45eac/tfrc/runtime/common_lib.cc:238
E0110 22:12:56.547338    2512 google_auth_provider.cc:188] Could not find the credentials file in the standard gcloud location [/root/.config/gcloud/application_default_credentials.json]. You may specify a credentials file using $GOOGLE_APPLICATION_CREDENTIALS, or to use Google application default credentials, run: gcloud auth application-default login


‚úÖ Base model loaded and saved to intermediate checkpoint
‚úÖ Memory cleaned up


In [11]:
# Verify checkpoint was saved
import os
print("üìÅ Checkpoint contents:", os.listdir(INTERMEDIATE_CKPT_DIR))
# Expected output: ['state', ...] or similar files

üìÅ Checkpoint contents: ['state']


## üîÑ Reference Model Setup

Creating a **frozen copy** of the base model for KL divergence calculation.

**Purpose**: 
- Measures how much the policy deviates from original behavior
- Prevents catastrophic forgetting of language abilities
- Core component of GRPO that balances task learning with coherent generation

**LoRA Application**:
Adds trainable low-rank matrices to attention and feedforward layers while keeping base weights frozen.

In [12]:
from tunix.models.gemma3 import params

def get_gemma_ref_model(ckpt_path):
    """Load reference model with sharding across TPU chips."""
    mesh = jax.make_mesh(*MESH)
    model_config = model.ModelConfig.gemma3_1b()
    
    # Create abstract model structure
    abs_gemma: nnx.Module = nnx.eval_shape(
        lambda: params.create_model_from_checkpoint(MODEL_CP_PATH, config)
    )
    
    abs_state = nnx.state(abs_gemma)
    abs_state = jax.tree.map(
        lambda a, s: jax.ShapeDtypeStruct(a.shape, jnp.bfloat16, sharding=s),
        abs_state,
        nnx.get_named_sharding(abs_state, mesh),
    )
    
    # Restore from checkpoint
    checkpointer = ocp.StandardCheckpointer()
    restored_params = checkpointer.restore(ckpt_path, target=abs_state)
    
    graph_def, _ = nnx.split(abs_gemma)
    gemma = nnx.merge(graph_def, restored_params)
    return gemma, mesh, model_config

def get_lora_model(base_model, mesh):
    """Apply LoRA adapters to base model."""
    lora_provider = qwix.LoraProvider(
        module_path=(
            ".*q_einsum|.*kv_einsum|.*gate_proj|.*down_proj|.*up_proj|"
            ".*attn_vec_einsum"
        ),
        rank=RANK,
        alpha=ALPHA,
    )
    
    model_input = base_model.get_model_input()
    lora_model = qwix.apply_lora_to_model(
        base_model, lora_provider, **model_input
    )
    
    # Shard the model across TPU
    with mesh:
        state = nnx.state(lora_model)
        pspecs = nnx.get_partition_spec(state)
        sharded_state = jax.lax.with_sharding_constraint(state, pspecs)
        nnx.update(lora_model, sharded_state)
    
    return lora_model

In [13]:
# Load reference model (frozen, for KL divergence calculation)
if model_family == "gemma3":
    ref_model, mesh, model_config = get_gemma_ref_model(
        ckpt_path=os.path.join(INTERMEDIATE_CKPT_DIR, "state")
    )
    print("‚úÖ Reference model loaded")

  mesh = jax.make_mesh(*MESH)
ERROR:asyncio:Exception in callback Task.__step()
handle: <Handle Task.__step()>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
RuntimeError: cannot enter context: <_contextvars.Context object at 0x7dae85323d00> is already entered
ERROR:asyncio:Task was destroyed but it is pending!
task: <Task pending name='Task-3740' coro=<_async_in_context.<locals>.run_in_context() done, defined at /usr/local/lib/python3.12/site-packages/ipykernel/utils.py:57> wait_for=<Task pending name='Task-3741' coro=<Kernel.shell_main() running at /usr/local/lib/python3.12/site-packages/ipykernel/kernelbase.py:590> cb=[Task.__wakeup()]> cb=[ZMQStream._run_callback.<locals>._log_error() at /usr/local/lib/python3.12/site-packages/zmq/eventloop/zmqstream.py:563]>
  for axis_index, axis_size in enumerate(shard0.data.shape)
ERROR:asyncio:Task was destroyed but it is pending!
task

‚úÖ Reference model loaded


### Flax NNX Compatibility Patch

Applying a compatibility fix for Flax API changes between versions.

In [14]:
# Apply compatibility patch for Flax NNX
from flax import nnx

# Save original function
if not hasattr(nnx.Variable, "_original_set_metadata"):
    nnx.Variable._original_set_metadata = nnx.Variable.set_metadata

# Define patched function
def patched_set_metadata(self, *args, **kwargs):
    """Fix for Flax NNX API changes."""
    if len(args) == 2 and isinstance(args[0], str):
        key = args[0]
        value = args[1]
        kwargs[key] = value
        args = ()
    return nnx.Variable._original_set_metadata(self, *args, **kwargs)

# Apply patch
nnx.Variable.set_metadata = patched_set_metadata

print("‚úÖ Flax compatibility patch applied successfully!")

‚úÖ Flax compatibility patch applied successfully!


### Applying LoRA to Create Policy Model

Wrapping the reference model with trainable LoRA adapters and sharding across TPU cores for distributed training.

In [15]:
# Create policy model with LoRA adapters
lora_policy = get_lora_model(ref_model, mesh=mesh)
print("‚úÖ Policy model with LoRA created")
print("\nüß† Model structure:")
# nnx.display(lora_policy)  # Uncomment to see model architecture

# Show memory usage
show_hbm_usage()



‚úÖ Policy model with LoRA created

üß† Model structure:
Using 1.0 GiB / 15.7 GiB (6.501034%) on TPU_0(process=0,(0,0,0,0))
Using 1.0 GiB / 15.7 GiB (6.476935%) on TPU_1(process=0,(1,0,0,0))
Using 1.0 GiB / 15.7 GiB (6.476935%) on TPU_2(process=0,(0,1,0,0))
Using 1.0 GiB / 15.7 GiB (6.476935%) on TPU_3(process=0,(1,1,0,0))
Using 26.5 KiB / 15.7 GiB (0.000160%) on TPU_4(process=0,(0,2,0,0))
Using 26.5 KiB / 15.7 GiB (0.000160%) on TPU_5(process=0,(1,2,0,0))
Using 26.5 KiB / 15.7 GiB (0.000160%) on TPU_6(process=0,(0,3,0,0))
Using 26.5 KiB / 15.7 GiB (0.000160%) on TPU_7(process=0,(1,3,0,0))


## üèÜ Multi-Component Reward Function Design

GRPO requires reward functions to score model outputs. We use **7 complementary reward components** that together encourage correct, well-formatted, and well-reasoned solutions.

### Reward Breakdown:

**Format Rewards** (up to 3.5 points):
- `match_format_exactly`: +3.0 for perfect tag structure
- `match_format_approximately`: +2.0 for partial tag compliance

**Correctness Rewards** (up to 5.0 points):
- `check_answer_svamp`: +3.0 for exact numerical match
- `check_number_svamp`: +2.0 for extractable correct number

**Quality Rewards** (up to 2.5 points):
- `soft_reasoning_steps`: +1.0 for logical connectors ("first", "then", "therefore")
- `meaningful_reasoning_length`: +1.0 for appropriate length (15-400 words)
- `reward_algebraic_notation`: +1.5 for mathematical notation (equations, variables)

**Total Maximum**: ~11 points per response

This multi-faceted approach ensures the model learns both task-specific skills (correctness) and generalizable reasoning patterns (structure, logic).

In [16]:
# Regex patterns for matching format and answers
match_format = re.compile(
    rf"^[\s]{{0,}}"
    rf"{reasoning_start}.+?{reasoning_end}.*?"
    rf"{solution_start}(.+?){solution_end}"
    rf"[\s]{{0,}}$",
    flags=re.MULTILINE | re.DOTALL,
)

# For extracting numerical answers
match_number = re.compile(
    rf"{solution_start}.*?([\d.]+)", flags=re.MULTILINE | re.DOTALL
)

def match_format_exactly(prompts, completions, **kwargs):
    """Reward for exact format compliance."""
    return [
        0 if match_format.search(response) is None else 3.0
        for response in completions
    ]

def match_format_approximately(prompts, completions, **kwargs):
    """Reward for approximate format compliance."""
    scores = []
    for completion in completions:
        score = 0
        response = completion
        # Reward seeing each tag exactly once
        score += 0.5 if response.count(reasoning_start) == 1 else -0.5
        score += 0.5 if response.count(reasoning_end) == 1 else -0.5
        score += 0.5 if response.count(solution_start) == 1 else -0.5
        score += 0.5 if response.count(solution_end) == 1 else -0.5
        scores.append(score)
    return scores

def check_answer_svamp(prompts, completions, answer, **kwargs):
    """Reward for correct numerical answer."""
    responses = completions
    
    extracted_responses = [
        guess.group(1).strip() 
        if (guess := match_format.search(r)) is not None 
        else None
        for r in responses
    ]
    
    scores = []
    for guess, true_answer in zip(extracted_responses, answer):
        score = 0
        if guess is None:
            scores.append(0)
            continue
        
        try:
            # Convert to float for comparison
            guess_num = float(guess)
            true_num = float(true_answer)
            
            # Exact match gets full points
            if abs(guess_num - true_num) < 0.01:  # Allow small floating point error
                score += 3.0
            # Close match gets partial credit
            elif abs(guess_num - true_num) < 1.0:
                score += 1.5
            else:
                score -= 1.0  # Penalize wrong answers
        except:
            score = 0
        
        scores.append(score)
    return scores

def check_number_svamp(prompts, completions, answer, **kwargs):
    """Extract and check numerical answer."""
    question = kwargs.get("question", [])
    responses = completions
    
    extracted_responses = [
        guess.group(1).strip() 
        if (guess := match_number.search(r)) is not None 
        else None
        for r in responses
    ]
    
    scores = []
    print("START ============================")
    if len(question) > 0:
        print(f"Question: {question[0][:100]}...")
    if len(answer) > 0:
        print(f"Correct Answer: {answer[0]}")
    if len(responses) > 0:
        print(f"Response: {responses[0][:200]}...")
    if len(extracted_responses) > 0:
        print(f"Extracted: {extracted_responses[0]}")
    print("END ==============================")
    
    for guess, true_answer in zip(extracted_responses, answer):
        if guess is None:
            scores.append(0)
            continue
        
        try:
            # Compare numbers
            if abs(float(guess) - float(true_answer)) < 0.01:
                scores.append(2.0)
            else:
                scores.append(0.0)
        except:
            scores.append(0.0)
    
    return scores

def soft_reasoning_steps(prompts, completions, **kwargs):
    """Reward for using logical connector words."""
    rewards = []
    logical_keywords = [
        "step", "first", "next", "then", "therefore", 
        "because", "since", "so", "implies", "consequently",
        "thus", "hence", "given", "solving"
    ]
    
    for response in completions:
        match = re.search(r"<reasoning>(.+?)</reasoning>", response, flags=re.DOTALL)
        if match:
            reasoning_text = match.group(1).lower()
            found_keywords = sum(1 for word in logical_keywords if word in reasoning_text)
            # 0.1 reward per keyword, capped at 1.0
            rewards.append(min(1.0, found_keywords * 0.1))
        else:
            rewards.append(0.0)
    return rewards

def meaningful_reasoning_length(prompts, completions, **kwargs):
    """Reward for appropriate reasoning length."""
    rewards = []
    for response in completions:
        match = re.search(r"<reasoning>(.+?)</reasoning>", response, flags=re.DOTALL)
        if match:
            word_count = len(match.group(1).split())
            
            if word_count < 15:  # Too short
                rewards.append(0.0)
            elif word_count > 400:  # Too long
                rewards.append(0.5)
            else:  # Good length
                rewards.append(1.0)
        else:
            rewards.append(0.0)
    return rewards

def reward_algebraic_notation(prompts, completions, **kwargs):
    """Reward for using algebraic notation and equations."""
    rewards = []
    algebra_patterns = [
        r'[a-z]\s*[+\-*/=]\s*\d',  # x + 5
        r'\d\s*[+\-*/=]\s*[a-z]',  # 5 + x
        r'[a-z]\s*=\s*',           # x =
        r'\([^)]*[a-z][^)]*\)',    # (x + 5)
    ]
    
    for response in completions:
        match = re.search(r"<reasoning>(.+?)</reasoning>", response, flags=re.DOTALL)
        if match:
            reasoning_text = match.group(1)
            found_patterns = sum(
                1 for pattern in algebra_patterns 
                if re.search(pattern, reasoning_text)
            )
            rewards.append(min(1.5, found_patterns * 0.3))
        else:
            rewards.append(0.0)
    return rewards

## üé≤ Generation Utilities

Helper functions for model inference and evaluation.

In [17]:
def generate(
    question, sampler, temperature=0.7, top_k=50, top_p=0.95, seed=None, options=None
):
    """Generate text given a prompt."""
    
    if isinstance(question, str):
        # Single question - SVAMP doesn't have options
        input_batch = [
            TEMPLATE.format(
                system_prompt=SYSTEM_PROMPT,
                question=question
            ),
        ]
    else:
        # Batch of questions
        input_batch = [
            TEMPLATE.format(
                system_prompt=SYSTEM_PROMPT,
                question=q
            )
            for q in question
        ]
    
    out_data = sampler(
        input_strings=input_batch,
        max_generation_steps=TOTAL_GENERATION_STEPS,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        echo=False,
        seed=seed if seed is not None else None,
        eos_tokens=[1, 106],
    )
    
    output = out_data.text
    if isinstance(question, str):
        return output[0]
    return output

In [18]:
def evaluate(
    dataset,
    sampler,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    num_passes=1,
    corr_lst=False,
    make_lst=False,
):
    """Compute accuracy and format compliance."""
    
    response_lst = []
    corr = 0
    partially_corr = 0
    corr_format = 0
    total = 0
    
    for batch in tqdm(dataset):
        answers = batch["answer"]
        questions = batch["question"]
        
        multiple_call_responses = [[] for _ in range(len(questions))]
        for p in range(num_passes):
            responses = generate(
                questions, sampler, temperature, top_k, top_p, seed=p, options=None
            )
            for idx, response in enumerate(responses):
                multiple_call_responses[idx].append(response)
        
        for question, multiple_call_response, answer in zip(
            questions, multiple_call_responses, answers
        ):
            corr_ctr_per_question = 0
            partially_corr_per_question = 0
            corr_format_per_question = 0
            
            for response in multiple_call_response:
                # Extract numerical answer
                extracted_response = (
                    guess.group(1).strip()
                    if (guess := match_number.search(response)) is not None
                    else None
                )
                
                # Check correctness
                if extracted_response:
                    try:
                        if abs(float(extracted_response) - float(answer)) < 0.01:
                            corr_ctr_per_question += 1
                            partially_corr_per_question += 1
                    except:
                        pass
                
                # Check format
                if match_format.search(response) is not None:
                    corr_format_per_question += 1
                
                if (
                    corr_ctr_per_question > 0
                    and partially_corr_per_question > 0
                    and corr_format_per_question > 0
                ):
                    break
            
            if corr_ctr_per_question > 0:
                corr += 1
                if corr_lst and make_lst:
                    response_lst.append((question, answer, multiple_call_response))
            else:
                if not corr_lst and make_lst:
                    response_lst.append((question, answer, multiple_call_response))
            
            if partially_corr_per_question > 0:
                partially_corr += 1
            if corr_format_per_question > 0:
                corr_format += 1
            
            total += 1
            if total % 10 == 0:
                print(
                    f"===> {corr=}, {total=}, Acc={corr / total * 100:.2f}%, "
                    f"Partial={partially_corr / total * 100:.2f}%, Format={corr_format / total * 100:.2f}%"
                )
    
    to_return = (
        corr,
        total,
        corr / total * 100,
        partially_corr / total * 100,
        corr_format / total * 100,
    )
    if make_lst:
        return to_return, response_lst
    return to_return

### Creating Sampler for Inference

Initializing the generation engine with KV-cache for efficient sequential decoding.

In [19]:
# Create sampler for generation
sampler = sampler_lib.Sampler(
    transformer=lora_policy,
    tokenizer=tokenizer,
    cache_config=sampler_lib.CacheConfig(
        cache_size=MAX_PROMPT_LENGTH + TOTAL_GENERATION_STEPS + 256,
        num_layers=model_config.num_layers,
        num_kv_heads=model_config.num_kv_heads,
        head_dim=model_config.head_dim,
    ),
)

print("‚úÖ Sampler created successfully")

‚úÖ Sampler created successfully


## üìä Baseline Evaluation (Pre-Training)

Testing the **untrained** Gemma 3 1B-IT model to establish our improvement benchmark.

**Metrics**:
- **Accuracy**: Exact numerical answer correctness
- **Partial**: Any correct numerical value extracted
- **Format**: Proper `<reasoning>` and `<answer>` tag structure

Expected baseline: ~50-60% accuracy (Gemma 3 already has some math ability from pre-training)

In [20]:
# PRE-TRAINING EVALUATION
print("\n" + "="*60)
print("üìä EVALUATING MODEL BEFORE TRAINING")
print("="*60)
print("‚è≥ This will take 3-5 minutes. Patience is a virtue!")
print()

(corr, total, accuracy, partial_accuracy, format_accuracy) = evaluate(
    test_dataset,
    sampler,
    **GENERATION_CONFIGS["greedy"],
)

print("\n" + "="*60)
print("üìà PRE-TRAINING RESULTS:")
print(f"   Correct answers: {corr}/{total}")
print(f"   Accuracy: {accuracy:.2f}%")
print(f"   Partial accuracy: {partial_accuracy:.2f}%")
print(f"   Format compliance: {format_accuracy:.2f}%")
print("="*60)


üìä EVALUATING MODEL BEFORE TRAINING
‚è≥ This will take 3-5 minutes. Patience is a virtue!



  0%|          | 0/70 [00:00<?, ?it/s]

===> corr=7, total=10, Acc=70.00%, Partial=70.00%, Format=90.00%
===> corr=13, total=20, Acc=65.00%, Partial=65.00%, Format=80.00%
===> corr=20, total=30, Acc=66.67%, Partial=66.67%, Format=83.33%
===> corr=25, total=40, Acc=62.50%, Partial=62.50%, Format=85.00%
===> corr=31, total=50, Acc=62.00%, Partial=62.00%, Format=84.00%
===> corr=34, total=60, Acc=56.67%, Partial=56.67%, Format=83.33%
===> corr=37, total=70, Acc=52.86%, Partial=52.86%, Format=80.00%
===> corr=45, total=80, Acc=56.25%, Partial=56.25%, Format=82.50%
===> corr=51, total=90, Acc=56.67%, Partial=56.67%, Format=82.22%
===> corr=58, total=100, Acc=58.00%, Partial=58.00%, Format=84.00%
===> corr=61, total=110, Acc=55.45%, Partial=55.45%, Format=83.64%
===> corr=69, total=120, Acc=57.50%, Partial=57.50%, Format=85.00%
===> corr=73, total=130, Acc=56.15%, Partial=56.15%, Format=84.62%
===> corr=76, total=140, Acc=54.29%, Partial=54.29%, Format=85.71%

üìà PRE-TRAINING RESULTS:
   Correct answers: 76/140
   Accuracy: 54.2

### Checkpointing and Logging Configuration

Setting up model saving and TensorBoard metrics tracking.

In [21]:
# Checkpoint saving configuration
checkpointing_options = ocp.CheckpointManagerOptions(
    save_interval_steps=SAVE_INTERVAL_STEPS, max_to_keep=MAX_TO_KEEP
)

# Metrics logger for TensorBoard
metrics_logging_options = metrics_logger.MetricsLoggerOptions(
    log_dir="/tmp/content/tmp/tensorboard/grpo", flush_every_n_steps=20
)

print("‚úÖ Checkpointing configured")
print("‚úÖ Metrics logging configured")

‚úÖ Checkpointing configured
‚úÖ Metrics logging configured


## üìà Optimizer Configuration

Using **AdamW** with warmup + cosine decay schedule.

**Schedule Design**:
- Warm-up (steps 0-170): Learning rate gradually increases from 0 ‚Üí 3e-6
- Training (steps 170-1700): Cosine decay from 3e-6 ‚Üí 0
- Gradient clipping at 0.1 prevents instability

This conservative approach is critical for RL, where unstable updates can cause catastrophic collapse.

In [22]:
# Optimizer with learning rate schedule and gradient clipping
optimizer = optax.adamw(
    learning_rate=optax.schedules.warmup_cosine_decay_schedule(
        init_value=0.0,
        peak_value=LEARNING_RATE,
        warmup_steps=WARMUP_STEPS,
        decay_steps=MAX_STEPS,
        end_value=0.0,
    ),
    b1=B1,
    b2=B2,
    weight_decay=WEIGHT_DECAY,
)

if MAX_GRAD_NORM is not None:
    optimizer = optax.chain(
        optax.clip_by_global_norm(max_norm=MAX_GRAD_NORM),
        optimizer,
    )

print("‚úÖ Optimizer configured with:")
print(f"   Learning rate: {LEARNING_RATE}")
print(f"   Warmup steps: {WARMUP_STEPS}")
print(f"   Max gradient norm: {MAX_GRAD_NORM}")

‚úÖ Optimizer configured with:
   Learning rate: 3e-06
   Warmup steps: 200.0
   Max gradient norm: 0.1


## üéÆ RL Cluster Architecture

Configuring the distributed reinforcement learning infrastructure.

**Three-Role Architecture**:
1. **Actor**: Current policy model generating responses
2. **Reference**: Frozen base model for KL divergence
3. **Rollout**: Manages generation process and batching

**Rollout Configuration**:
- Temperature=0.9: Balanced exploration vs exploitation
- Top-K=50, Top-P=1.0: Moderate sampling diversity
- EOS tokens: Proper sequence termination

This separation enables efficient parallel processing across 8 TPU cores.

In [23]:
# RL Cluster configuration
cluster_config = rl_cluster_lib.ClusterConfig(
    role_to_mesh={
        rl_cluster_lib.Role.ACTOR: mesh,
        rl_cluster_lib.Role.REFERENCE: mesh,
        rl_cluster_lib.Role.ROLLOUT: mesh,
    },
    rollout_engine='vanilla',
    offload_to_cpu=False,
    training_config=rl_cluster_lib.RLTrainingConfig(
        actor_optimizer=optimizer,
        eval_every_n_steps=EVAL_EVERY_N_STEPS,
        max_steps=MAX_STEPS,
        mini_batch_size=TRAIN_MICRO_BATCH_SIZE,
        train_micro_batch_size=TRAIN_MICRO_BATCH_SIZE,
        metrics_logging_options=metrics_logging_options,
        checkpoint_root_directory=CKPT_DIR,
        checkpointing_options=checkpointing_options,
    ),
    rollout_config=base_rollout.RolloutConfig(
        max_tokens_to_generate=TOTAL_GENERATION_STEPS,
        max_prompt_length=MAX_PROMPT_LENGTH,
        kv_cache_size=MAX_PROMPT_LENGTH + TOTAL_GENERATION_STEPS + 256,
        temperature=TEMPERATURE,
        top_p=TOP_P,
        top_k=TOP_K,
        eos_tokens=[1, 106],
    ),
)

# GRPO configuration
grpo_config = GRPOConfig(
    num_generations=NUM_GENERATIONS,
    num_iterations=NUM_ITERATIONS,
    beta=BETA,
    epsilon=EPSILON,
)

print("‚úÖ RL Cluster configured")
print("‚úÖ GRPO configured")
print(f"   Num generations per prompt: {NUM_GENERATIONS}")
print(f"   Beta (KL penalty): {BETA}")
print(f"   Epsilon (clipping): {EPSILON}")

‚úÖ RL Cluster configured
‚úÖ GRPO configured
   Num generations per prompt: 4
   Beta (KL penalty): 0.04
   Epsilon (clipping): 0.2


## üöÄ GRPO Training Loop

Starting the main training process.

**GRPO Algorithm Flow**:
1. **Sample** batch of 2 questions from training data
2. **Generate** 4 responses per question (exploration)
3. **Score** each response using 7 reward functions
4. **Compute advantages**: Normalize rewards to identify best/worst responses
5. **Update policy**: Increase probability of high-reward responses
6. **Apply KL penalty**: Prevent over-optimization away from base model
7. **Repeat** for 1700 steps

**Expected Runtime**: 15-25 minutes on TPU v5e-8

The model learns by comparing responses to the same question, discovering what makes some better than others.

In [24]:
import os
import torch
from pathlib import Path

print("\n" + "="*60)
print("üöÄ STARTING GRPO TRAINING")
print("="*60)
print()

# RL cluster setup
rl_cluster = rl_cluster_lib.RLCluster(
    actor=lora_policy,
    reference=ref_model,
    tokenizer=tokenizer,
    cluster_config=cluster_config,
)

print("‚úÖ RL Cluster initialized")

# GRPO Trainer with reward functions for SVAMP
grpo_trainer = GRPOLearner(
    rl_cluster=rl_cluster,
    reward_fns=[
        match_format_exactly,        # Structural constraint
        match_format_approximately,  # Soft structure check
        check_answer_svamp,          # Hard correctness for numbers
        check_number_svamp,          # Flexible number extraction
        soft_reasoning_steps,        # Logical connectors
        meaningful_reasoning_length, # Appropriate length
        reward_algebraic_notation,   # Algebraic equations
    ],
    grpo_config=grpo_config,
)

print("‚úÖ GRPO Trainer initialized")
print(f"   Active reward functions: 7")
print()

print("="*60)
print("‚è≥ TRAINING IN PROGRESS...")
print("="*60)
print(f"üìä Total steps: {MAX_STEPS}")
print()
print("‚òï This will take 1-2 hours. Go grab coffee!")
print("   You can monitor progress in the output below.")
print("="*60)
print()

# TRAINING LOOP
try:
    with mesh:
        grpo_trainer.train(train_dataset)
    
    print("\n" + "="*60)
    print("‚úÖ TRAINING COMPLETED SUCCESSFULLY!")
    print("="*60)
    
except Exception as e:
    print("\n" + "="*60)
    print("‚ùå TRAINING FAILED!")
    print(f"Error: {e}")
    print("="*60)
    raise




üöÄ STARTING GRPO TRAINING



[34m[1mwandb[0m: Currently logged in as: [33msaisurya24[0m ([33msaisurya24-technical-university-of-applied-sciences-w-rz[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


‚úÖ RL Cluster initialized
‚úÖ GRPO Trainer initialized
   Active reward functions: 7

‚è≥ TRAINING IN PROGRESS...
üìä Total steps: 2000

‚òï This will take 1-2 hours. Go grab coffee!
   You can monitor progress in the output below.

Question: For the walls of the house he would use 8 large planks of wood. If each plank of wood needs 74 piece...
Correct Answer: 8
Response: Yes, I‚Äôm ready to put on my mathematical reasoning expert hat! Let‚Äôs tackle this problem.

<reasoning>
Step 1: John needs 8 large planks of wood.
Step 2: Each plank needs 74 pieces of nails.
Step 3: T...
Extracted: 592


Actor Training:   0%|          | 0/2000 [00:00<?, ?step/s]

Question: Every day Ryan spends 7 hours on learning english, 2 hours on learning chinese and 4 hours on learni...
Correct Answer: 3
Response: Yes, I‚Äôm ready to analyze the problem and provide a detailed solution following your specified format.

<reasoning>
Step 1: Ryan spends 7 hours on learning English each day.
Step 2: He spends 2 hours ...
Extracted: 7
Question: Jack received 10 emails in the morning, 5 emails in the afternoon and 4 emails in the evening. How m...
Correct Answer: 1
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: Jack received 10 emails in the morning, 5 emails in the afternoon and 4 emails in the evening.
Step 2: Calculate the number of emails received in th...
Extracted: 1




Question: Jack received 5 emails and 6 letters in the morning. He then received 9 emails and 7 letters in the ...
Correct Answer: 13
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: Jack received 5 emails in the morning.
Step 2: He received 9 emails in the afternoon.
Step 3: Total emails received = 5 + 9 = 14 emails.
Step 4: Tot...
Extracted: 27
Question: Every day Ryan spends 5 hours on learning chinese and some more hours on learning english. If he spe...
Correct Answer: 7
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: Ryan spends 5 hours on Chinese and 2 hours more on English. So, the time spent on English is 5 + 2 = 7 hours.
Step 2: The time spent on learning Eng...
Extracted: 7
Question: Rachel picked 7 ripe apples from her tree. Now the tree has 5 apples still on it. If 3 of those are ...
Correct Answer: 2
Response: Yes, I‚Äôm ready to analyze the problem and provide a step-by-step solution following your specified format.

<reasoning>
Step 1: Rach



Question: Every day Ryan spends 6 hours on learning english 3 hours on learning chinese and 58 hours on learni...
Correct Answer: 3
Response: Yes, I‚Äôm ready to put on my mathematical reasoning expert hat!

<reasoning>
Step 1: We need to calculate the total hours spent on English, Chinese, and Spanish.
Step 2: Calculate the hours spent on En...
Extracted: 3
Question: Every day Ryan spends 4 hours on learning english and 6 hours on learning chinese. If he learns for ...
Correct Answer: 10
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: Ryan spends 4 hours learning English each day.
Step 2: Ryan spends 6 hours learning Chinese each day.
Step 3: He learns for 86 days.
Step 4:...
Extracted: 4
Question: Jack received 9 emails in the morning, 10 emails in the afternoon and 7 emails in the evening. How m...
Correct Answer: 2
Response: Yes, I understand. Let‚Äôs solve this problem.

<reasoning>
Step 1: Jack received 9 emails in the morning.
Step 2: He received 10 email



Question: Allan brought 7 balloons and 5 balls while Jake brought 6 balloons and 4 balls to the park. How many...
Correct Answer: 9
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: Let's calculate the number of balloons Allan brought. Allan brought 7 balloons.
Step 2: Let's calculate the number of balls Allan brought. A...
Extracted: 10
Question: An industrial machine made 9 shirts yesterday and 8 shirts today. It can make 2 shirts a minute. How...
Correct Answer: 4
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: The machine made 9 shirts yesterday.
Step 2: The machine made 8 shirts today.
Step 3: Total shirts made = 9 + 8 = 17 shirts.
Step 4: The machine mak...
Extracted: 8.5
Question: Stray cats loved eating goldfish in the pond leaving 6 goldfish. Paige had raised 8 goldfish in the ...
Correct Answer: 2
Response: Yes, I‚Äôm ready to put on my mathematical reasoning expert hat!

<reasoning>
Step 1: Paige started with 8 goldfish.
Step 2: Stray ca



Question: Dan has $ 3. He bought 2 candy bar for $ 4, each one costing the same amount of money. How much did ...
Correct Answer: 2
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: Dan starts with $3.
Step 2: He bought 2 candy bar for $4 each, so the total cost of the candy bar is 2 * $4 = $8.
Step 3: After buying ...
Extracted: None
Question: Every day Ryan spends 7 hours on learning english and some more hours on learning chinese. If he spe...
Correct Answer: 5
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Ryan spends 7 hours on English and 2 hours more on Chinese than English.  So, the time spent on Chinese is 7 + 2 = 9 hours.
Step 2: The time spent o...
Extracted: 9
Question: Rachel had to complete 9 pages of math homework, 2 pages of reading homework and 96 more pages of bi...
Correct Answer: 7
Response: Yes, I understand. Let‚Äôs solve this problem.

<reasoning>
Step 1: We are given that Rachel completed 9 pages of math homework.
Step 2: 



Question: Jack received a total of 9 emails in the day. If he received 7 emails in the morning and some more i...
Correct Answer: 2
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: Jack received a total of 9 emails in the day.
Step 2: He received 7 emails in the morning.
Step 3: We are given that he received some more emails in...
Extracted: 2
Question: Rachel has 52 apple trees. She picked 2 apples from 1 of her trees. Now the tree has 7 apples still ...
Correct Answer: 9
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: Rachel has 52 apple trees.
Step 2: She picked 2 apples from one of her trees.
Step 3: The tree now has 7 apples still on it.
Step 4: We need...
Extracted: 52
Question: Last week Fred had 86 dollars and Jason had 5 dollars. Fred washed cars over the weekend and now has...
Correct Answer: 14
Response: Yes, please! Let‚Äôs solve this problem.

<reasoning>
Step 1: Fred‚Äôs initial amount was 86 dollars.
Step 2: Fred washed cars and now



Question: Rachel had to complete 7 pages of math homework. If she had to complete 4 more pages of math homewor...
Correct Answer: 3
Response: Yes, let‚Äôs do that!

<reasoning>
Step 1: We are given that Rachel had to complete 7 pages of math homework.
Step 2: We are told that she had to complete 4 more pages of math homework than reading home...
Extracted: 3
Question: Allan brought 2 balloons and Jake brought 3 balloons to the park. How many more balloons did Jake ha...
Correct Answer: 1
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: Allan brought 2 balloons.
Step 2: Jake brought 3 balloons.
Step 3: The difference in the number of balloons is Jake's balloons - Allan's balloons = ...
Extracted: 1
Question: Rachel's tree had 4 apples initially. She picked some apples and now there are 2 apples left on the ...
Correct Answer: 2
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Rachel started with 4 apples.
Step 2: She picked some apples, so the number of apple



Question: A waiter had 3 customers. After some more arrived he had 8 customers. How many new customers arrived...
Correct Answer: 5
Response: Yes, let‚Äôs do that!

<reasoning>
Step 1: The waiter initially had 3 customers.
Step 2: After more arrived, he had 8 customers.
Step 3: The difference between the number of customers after and before i...
Extracted: 5
Question: Edward spent $ 4 to buy books and $ 3 to buy pens. Now he has $ 12. How much did Edward spend on boo...
Correct Answer: 7
Response: Yes, let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: We know Edward spent $4 on books and $3 on pens.
Step 2: His total spending is $4 + $3 = $7.
Step 3: He now has $12.
Step 4: To find ho...
Extracted: 0
Question: 6 green peaches, 60 yellow peaches and 2 red peaches are in the basket. How many more green peaches ...
Correct Answer: 4
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We have 6 green peaches, 60 yellow peaches, and 2 red peaches.
Step 2: To find the n



Question: Jack received 6 emails and sent 91 letters in the morning. He then received 2 emails and sent 74 let...
Correct Answer: 8
Response: Yes, let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: Jack received 6 emails in the morning.
Step 2: He sent 91 letters in the morning.
Step 3: He received 2 emails in the afternoon.
Step 4...
Extracted: 8
Question: Dan has $ 3. For a total of $ 4 he bought 2 candy bar each one costing the same amount of money. How...
Correct Answer: 2
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: Dan has $3.
Step 2: He bought 2 candy bars each costing the same amount of money.
Step 3: The total cost of the 2 candy bars is $2 * 2 ...
Extracted: 2
Question: Adam has 4 more apples than Jackie. Together Adam and Jackie have 14 apples. Bob has 6 apples more t...
Correct Answer: 20
Response: Yes, here‚Äôs the solution to the problem:

<reasoning>
Step 1: Let 'A' be the number of apples Adam has.
Step 2: Let 'J' be the number o



Question: Jesse's room is 7 feet wide. If she needs a carpet of size 14 square feet What is the length of her ...
Correct Answer: 2
Response: Yes, let's solve this problem!

<reasoning>
Step 1: The room is 7 feet wide.
Step 2: The carpet is 14 square feet.
Step 3: The area of a rectangle is calculated by multiplying length and width.
Step 4...
Extracted: 2
Question: 5 red peaches, 14 yellow peaches and 6 green peaches are in the basket. How many green and yellow pe...
Correct Answer: 20
Response: Yes, let's do that!

<reasoning>
Step 1: We have 5 red peaches, 14 yellow peaches, and 6 green peaches.
Step 2: To find the number of green and yellow peaches, we add the number of green peaches to th...
Extracted: 20
Question: Emily is making bead necklaces for her 44 friends. She made 26 necklaces and each necklace takes 2 b...
Correct Answer: 52
Response: Yes, let‚Äôs do that!

<reasoning>
Step 1: Emily made 26 necklaces.
Step 2: Each necklace takes 2 beads.
Step 3: The total number of bead



Question: Being his favorite, he saved checking on the grapevines for his last stop. He was told by 94 of the ...
Correct Answer: 15
Response: Yes, let‚Äôs do that!

<reasoning>
Step 1: The pickers filled 90 drums of grapes in 6 days.
Step 2: To find the number of drums filled per day, we divide the total number of drums by the number of days:...
Extracted: 15
Question: Faye was placing her pencils and crayons into 19 rows with 4 pencils and 27 crayons in each row. How...
Correct Answer: 76
Response: Yes, let's do that!

<reasoning>
Step 1: The total number of rows is 19.
Step 2: The number of pencils per row is 4.
Step 3: The total number of pencils is the number of rows multiplied by the number ...
Extracted: 76
Question: Bryan took a look at his books as well. Each bookshelf contains 2 books. If he has a total of 38 boo...
Correct Answer: 19
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: Bryan has a total of 38 books.
Step 2: Each bookshelf contains 2 books.
Step 3:



Question: Last week Fred had 49 dollars and Jason had 3 dollars. Fred washed cars over the weekend and now has...
Correct Answer: 60
Response: Yes, let‚Äôs do this!

<reasoning>
Step 1: We know Fred‚Äôs initial amount was 49 dollars.
Step 2: Fred washed cars and now has 112 dollars.
Step 3: Subtract the amount he washed cars from his initial amo...
Extracted: 60
Question: Dave had 15 apps and 24 files on his phone. After deleting some apps and files he had 21 apps and 4 ...
Correct Answer: 17
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: Dave initially had 15 apps and 24 files.
Step 2: He deleted some apps and files, so he now has 21 apps and 4 files.
Step 3: The number ...
Extracted: 3
Question: Lucy went to the grocery store. She bought 2 packs of cookie and 12 packs of cake. In the end she ha...
Correct Answer: 14
Response: Yes, let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: Lucy bought 2 packs of cookie and 12 packs of cake. The total num



Question: A mailman has to give 25 pieces of junk mail to each block. If he gives 5 mails to each house in a b...
Correct Answer: 5
Response: Yes, let's solve this problem!

<reasoning>
Step 1: The mailman gives 25 pieces of junk mail to each block.
Step 2: He gives 5 pieces of mail to each house in a block.
Step 3: The number of houses in ...
Extracted: 5
Question: Julia played tag with 7 kids on monday and 13 kids on tuesday. She played cards wtih 20 kids on wedn...
Correct Answer: 20
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: Julia played tag with 7 kids on Monday.
Step 2: She played tag with 13 kids on Tuesday.
Step 3: She played cards with 20 kids on Wednesday.
Step 4: ...
Extracted: 7
Question: Luke played a trivia game and gained 3 points in each round of a game. If he scored 78 points in the...
Correct Answer: 26
Response: Yes, let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: Luke gained 3 points in each round.
Step 2: He scored 78 points in th



Question: He then went to see the oranges being harvested. He found out that they harvest 8 sacks per day and ...
Correct Answer: 280
Response: Yes, I understand. Let‚Äôs solve this problem.

<reasoning>
Step 1: The oranges are harvested at a rate of 8 sacks per day.
Step 2: Each sack contains 35 oranges.
Step 3: Total oranges = (sacks/day) * (...
Extracted: 280
Question: Robin has 28 packages of gum and 13 packages of candy. There are 4 pieces in each package. How many ...
Correct Answer: 112
Response: Re:
<reasoning>
Step 1: Robin has 28 packages of gum.
Step 2: Each package has 4 pieces, so the total number of pieces of gum is 28 * 4 = 112 pieces.
</reasoning>
<answer>112</answer>...
Extracted: 112
Question: There were 3 dollars in Olivia's wallet. She collected 49 more dollars from an atm. After she visite...
Correct Answer: 46
Response: Yes, I understand. Let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: Olivia started with 3 dollars in her wallet.
Step 2: She collecte



Question: Mary is baking a cake. The recipe calls for 6 cups of flour and 13 cups of sugar. She already put in...
Correct Answer: 21
Response: Re:
<reasoning>
Step 1: The recipe calls for 6 cups of flour and 13 cups of sugar.
Step 2: The problem states that she needs 8 more cups of flour than sugar.
Step 3: Let 'x' be the number of more cups...
Extracted: 15
Question: If each bag has 3 cookies and you had 21 cookies in total How many bags of cookies do you have?...
Correct Answer: 7
Response: Re:
<reasoning>
Step 1: We are given that there are a total of 21 cookies.
Step 2: Each bag contains 3 cookies.
Step 3: We want to find the number of bags of cookies.
To find the number of bags, we di...
Extracted: 7
Question: There were 10 roses in the vase. Jessica cut 8 more roses from her flower garden and put them in the...
Correct Answer: 18
Response: Re:
<reasoning>
Step 1: Initially, there were 10 roses in the vase.
Step 2: Jessica cut 8 more roses, so the number of roses decreases by 8.
S



Question: Julia played tag with 8 kids on monday, 11 kids on tuesday and 9 kids on wednesday. How many more ki...
Correct Answer: 2
Response: Re:
<reasoning>
Step 1: Julia played tag with 8 kids on Monday.
Step 2: She played tag with 11 kids on Tuesday.
Step 3: She played tag with 9 kids on Wednesday.
Step 4: We want to find the difference ...
Extracted: 2
Question: Emily is making bead necklaces for her friends. She was able to make 6 necklaces and she had 18 bead...
Correct Answer: 3
Response: Re:
<reasoning>
Step 1: Emily made 6 necklaces.
Step 2: She had 18 beads in total.
Step 3: To find the number of beads per necklace, we divide the total number of beads by the number of necklaces.
Ste...
Extracted: 3
Question: Carol was sending out birthday invitations to 12 friends. She bought 3 packs each one having the sam...
Correct Answer: 4
Response: Re:Carol was sending out birthday invitations to 12 friends. She bought 3 packs each one having the same number of invitations. How many inv



Question: Dave had 19 apps and 18 files on his phone. After deleting some apps and files he had 6 apps and 15 ...
Correct Answer: 9
Response: Yes, I understand. Let‚Äôs solve this problem.

<reasoning>
Step 1: Dave initially had 19 apps and 18 files.
Step 2: He deleted some apps and files, so he has 6 apps and 15 files left.
Step 3: The diffe...
Extracted: 9
Question: At the arcade Dave had won 13 tickets. If he used 8 to buy some toys and 18 more to buy some clothes...
Correct Answer: 10
Response: Re: Solve the problem below using this exact format:

<reasoning>
Step 1: Dave initially had 13 tickets.
Step 2: He used 8 tickets to buy toys.
Step 3: He used 18 more tickets to buy clothes.
Step 4: ...
Extracted: 10
Question: Marco and his dad went strawberry picking. His strawberries weighed 8 pounds while his dad's strawbe...
Correct Answer: 40
Response: Re:
<reasoning>
Step 1: We are given that Marco‚Äôs strawberries weighed 8 pounds and his dad‚Äôs strawberries weighed 32 pounds.
Step 



Question: Faye was placing 30 pencils into rows with 5 pencils in each row. How many rows could she make?...
Correct Answer: 6
Response: Re:
<reasoning>
Step 1: Faye is placing 30 pencils into rows with 5 pencils in each row.
Step 2: The number of rows is the total number of pencils divided by the number of pencils per row.
Step 3: Num...
Extracted: 6
Question: The school is planning a field trip. The school has 102 classrooms. There are 11 students in the sch...
Correct Answer: 3
Response: Re: Solve the problem below using this exact format:

<reasoning>
Step 1: The school has 102 classrooms.
Step 2: There are 11 students in the school.
Step 3: There are 99 seats on each school bus.
Ste...
Extracted: 11
Question: Dave had 16 apps and 77 files on his phone. After deleting some apps and files he had 5 apps and 23 ...
Correct Answer: 11
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs do it.

<reasoning>
Step 1: Dave started with 16 apps.
Step 2: He deleted some apps, so the nu



Question: Rachel had to complete 9 pages of math homework, 11 pages of reading homework and 29 more pages of b...
Correct Answer: 20
Response: Yes, let‚Äôs do that!

<reasoning>
Step 1: We are given that Rachel completed 9 pages of math homework.
Step 2: We are given that she completed 11 pages of reading homework.
Step 3: We are given that sh...
Extracted: 10
Question: Steven has 12 peaches. Jake has 4 fewer peaches than Steven who has 67 more peaches than Jill. How m...
Correct Answer: 8
Response: Yes, I‚Äôm ready to analyze the problem and provide a detailed solution following your specified format.

<reasoning>
Step 1: Steven has 12 peaches.
Step 2: Jill has 67 more peaches than Steven, so Jill...
Extracted: 75
Question: Baker sold 8 cakes. If he had made 40 cakes initially How many more cakes did baker make than those ...
Correct Answer: 32
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: Baker initially sold 8 cakes.
Step 2: He initially planned to make 40 



Question: After resting they decided to go for a swim. The depth of the water is 5 times Ron's height. Dean is...
Correct Answer: 60
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs break it down step by step.

<reasoning>
Step 1: The depth of the water is 5 times Ron‚Äôs height.  Let the depth be 'd'. So, d = 5 * Ron‚Äôs height.
Step 2: D...
Extracted: 60
Question: Josh had 21 marbles in his collection. He found 7 marbles. How many marbles does he have now?...
Correct Answer: 28
Response: Yes, I‚Äôm ready to put on my mathematical reasoning expert hat!

<reasoning>
Step 1: Josh initially had 21 marbles.
Step 2: He found 7 more marbles.
Step 3: To find the new total, we add the initial nu...
Extracted: 28
Question: The grasshopper, the frog and the mouse had a jumping contest. The grasshopper jumped 19 inches. The...
Correct Answer: 15
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs break it down systematically.

<reasoning>
Step 1: Let 'x' be the distance the frog



Question: There are 87 oranges and 290 bananas in Philip's collection. If the bananas are organized into 2 gro...
Correct Answer: 145
Response: Re:
<reasoning>
Step 1: We are given that there are 87 oranges and 290 bananas in Philip‚Äôs collection.
Step 2: The bananas are organized into 2 groups, and the oranges are organized into 93 groups.
St...
Extracted: None
Question: Frank was reading through his favorite book. The book had 3 chapters, each with the same number of p...
Correct Answer: 198
Response: Re:
<reasoning>
Step 1: Let 'x' be the number of pages in each chapter.
Step 2: Since there are 3 chapters, the total number of pages is 3 * x = 594.
Step 3: Divide both sides by 3 to solve for x: x =...
Extracted: 198
Question: Randy has 95 blocks. He uses 20 blocks to build a house and 50 blocks to build a tower. How many mor...
Correct Answer: 30
Response: Re:
<reasoning>
Step 1: Let 'h' be the number of blocks used to build the house and 't' be the number of blocks used to build th



Question: Lewis earns $ 2 every week during the harvest. If he earns a total of $ 178 How many weeks did the h...
Correct Answer: 89
Response: Re:
<reasoning>
Step 1: Lewis earns $2 per week.
Step 2: His total earnings are $178.
Step 3: To find the number of weeks, divide the total earnings by the weekly earnings: 178 / 2 = 89 weeks.
</reaso...
Extracted: 89
Question: Danny collects bottle caps and wrappers. He found 71 bottle caps and 24 wrappers at the park. Now he...
Correct Answer: 12
Response: Re:
<reasoning>
Step 1: Danny initially had 71 bottle caps and 24 wrappers.
Step 2: He now has 28 bottle caps and 16 wrappers.
Step 3: To find how many more bottle caps than wrappers he has, subtract ...
Extracted: 12
Question: Frank put 11 pieces of candy in each bag. If he had 22 pieces of candy How many bags would he have?...
Correct Answer: 2
Response: Re:
<reasoning>
Step 1: Frank put 11 pieces of candy in each bag.
Step 2: He has 22 pieces of candy.
Step 3: Divide the total number of c



Question: There are 20 houses in a block. If a mailman has to give 32 pieces of junk mail to each house in eac...
Correct Answer: 640
Response: Re:
<reasoning>
Step 1: The block has 20 houses.
Step 2: The mailman gives 32 pieces of junk mail to each house in each block.
Step 3: To find the total number of pieces of junk mail, we multiply the ...
Extracted: 640
Question: The Razorback t-shirt shop makes $ 98 dollars off each t-shirt sold. During the Arkansas game and th...
Correct Answer: 8722
Response: Re:
<reasoning>
Step 1: The Razorback t-shirt shop makes $98 off each t-shirt sold.
Step 2: They sold a total of 163 t-shirts.
Step 3: They sold 89 t-shirts during the Arkansas game.
Step 4: The amoun...
Extracted: 8722
Question: The grasshopper, the frog and the mouse had a jumping contest. The grasshopper jumped 36 inches. The...
Correct Answer: 53
Response: Re:
<reasoning>
Step 1: Let's denote the distance jumped by the grasshopper as G, the distance jumped by the frog as F, and the d



Question: A grocery store had 49 bottles of regular soda, 40 bottles of diet soda and 6 bottles of lite soda. ...
Correct Answer: 89
Response: Re:
<reasoning>
Step 1: Let's calculate the number of bottles of regular soda. We are given that there are 49 bottles of regular soda.
Step 2: Let's calculate the number of bottles of diet soda. We ar...
Extracted: 89
Question: A grocery store had 81 bottles of regular soda, 60 bottles of diet soda and 60 bottles of lite soda....
Correct Answer: 21
Response: Re:
<reasoning>
Step 1: Let's calculate the number of bottles of regular soda. We have 81 bottles of regular soda.
Step 2: Let's calculate the number of bottles of diet soda. We have 60 bottles of die...
Extracted: 21
Question: Julia played tag with 2 kids on monday, 14 kids on tuesday and 16 kids on wednesday. How many kids d...
Correct Answer: 30
Response: Re:
<reasoning>
Step 1: We are given that Julia played tag with 2 kids on Monday, 14 kids on Tuesday, and 16 kids on Wednesday.
Step 2:



Question: There are 14 different movies and 15 different books in the ' crazy silly school ' series. If you re...
Correct Answer: 1
Response: Re:
<reasoning>
Step 1: Let 'b' be the number of different books in the series, and 'm' be the number of different movies in the series. We are given that b = 14 and m = 15.
Step 2: Let 'x' be the num...
Extracted: None
Question: 44 campers went rowing in the morning 39 campers went rowing in the afternoon and 31 campers went ro...
Correct Answer: 5
Response: Re:
<reasoning>
Step 1: Let 'm' be the number of campers who went rowing in the morning.
Step 2: Let 'a' be the number of campers who went rowing in the afternoon.
Step 3: Let 'e' be the number of cam...
Extracted: 5
Question: The grasshopper, the frog and the mouse had a jumping contest. The grasshopper jumped 14 inches. The...
Correct Answer: 21
Response: Re:
<reasoning>
Step 1: Let's define the jumps as follows:
Grasshopper's jump = G = 14 inches
Mouse's jump = M
Frog's jump = F
Step 2: T



Question: He also had 56 aquariums for saltwater animals and 10 aquariums for freshwater animals. Each aquariu...
Correct Answer: 2184
Response: Re:
<reasoning>
Step 1: Tyler has 56 aquariums for saltwater animals.
Step 2: Each aquarium has 39 animals.
Step 3: Total saltwater animals = number of aquariums * animals per aquarium = 56 * 39.
Step...
Extracted: 2184
Question: 62 campers went rowing in the morning. 39 campers went rowing in the afternoon. How many more camper...
Correct Answer: 23
Response: Re:
<reasoning>
Step 1: We are given that 62 campers went rowing in the morning and 39 campers went rowing in the afternoon.
Step 2: To find how many more campers went rowing in the morning than in th...
Extracted: 23
Question: Frank was reading through his favorite book. The book had 612 pages equally distributed over 24 chap...
Correct Answer: 102
Response: Re:
<reasoning>
Step 1: The book has 612 pages and is divided into 24 chapters. Therefore, the number of pages per chapter is 612 



Question: He then went to see the oranges being harvested. He found out that they harvest 67 sacks per day and...
Correct Answer: 420
Response: Re:
<reasoning>
Step 1: The problem states that the oranges are harvested at a rate of 67 sacks per day.
Step 2: Each sack contains 12 oranges.
Step 3: To find the total number of oranges harvested, w...
Extracted: 804
Question: Paige was helping her mom plant flowers and together they planted 36 seeds. They put 12 seeds in eac...
Correct Answer: 3
Response: Re:
<reasoning>
Step 1: Let 'x' be the number of flower beds.
Step 2: They planted 12 seeds in each flower bed and a total of 36 seeds, so 12 * x = 36.
Step 3: Divide both sides of the equation by 12 ...
Extracted: 3
Question: Marco and his dad went strawberry picking. Marco's strawberries weighed 15 pounds. If together their...
Correct Answer: 22
Response: Re:
<reasoning>
Step 1: We are given that Marco‚Äôs strawberries weighed 15 pounds.
Step 2: Their strawberries weighed together 37 poun



Question: Matthew had 24 crackers. If Matthew gave equal numbers of crackers to his 3 friends and still had 17...
Correct Answer: 7
Response: Re:
<reasoning>
Step 1: We know Matthew started with 24 crackers.
Step 2: Matthew gave equal numbers of crackers to his 3 friends. Let 'x' be the number of crackers each friend received.
Step 3: The t...
Extracted: None
Question: Last week Fred had 19 dollars and Jason had 16 dollars. They washed cars over the weekend and now Fr...
Correct Answer: 21
Response: Re: Fred‚Äôs earnings

<reasoning>
Step 1: We are given that Fred had 19 dollars last week and now has 40 dollars.
Step 2: We are given that Jason had 16 dollars last week and now has 69 dollars.
Step 3...
Extracted: 24
Question: Randy has 90 blocks. He uses 89 blocks to build a house and 63 blocks to build a tower. How many mor...
Correct Answer: 26
Response: Re:
<reasoning>
Step 1: We are given that Randy has 90 blocks.
Step 2: He used 89 blocks to build a house and 63 blocks to build a t



Question: Jake has 6 fewer peaches and 8 more apples than Steven. Steven has 17 peaches and 16 apples. How man...
Correct Answer: 1
Response: Re:
<reasoning>
Step 1: Let 'x' represent the number of peaches Jake has.
Step 2: Let 'y' represent the number of apples Steven has.
Step 3: We are given that Jake has 6 fewer peaches than Steven, so ...
Extracted: None
Question: He then went to see the oranges being harvested. He found out that they harvest 71 sacks of unripe o...
Correct Answer: 11
Response: Re:
<reasoning>
Step 1: Let 'u' be the number of sacks of unripe oranges harvested per day, and 'r' be the number of sacks of ripe oranges harvested per day.
Step 2: We are given that they harvest 71 ...
Extracted: 11
Question: At the zoo, a cage had 95 snakes and 61 alligators. If 64 snakes were hiding How many snakes were no...
Correct Answer: 31
Response: Re:
<reasoning>
Step 1: Let 's' be the number of snakes that were hiding.
Step 2: Let 'a' be the number of alligators that were hiding



Question: Paco had 39 sweet cookies and 6 salty cookies. He ate 23 salty cookies and 32 sweet cookies. How man...
Correct Answer: 9
Response: Re:
<reasoning>
Step 1: Paco started with 39 sweet cookies and 6 salty cookies.
Step 2: He ate 23 salty cookies.
Step 3: The number of sweet cookies he ate is 32.
Step 4: The number of salty cookies h...
Extracted: 9
Question: 19 red peaches, 11 yellow peaches and 12 green peaches are in the basket. How many more red peaches ...
Correct Answer: 8
Response: Re:
<reasoning>
Step 1: We are given the number of red peaches, yellow peaches, and green peaches.
Step 2: Number of red peaches = 19
Step 3: Number of yellow peaches = 11
Step 4: Number of green peac...
Extracted: 8
Question: Danny collects bottle caps and wrappers. He found 46 wrappers and 50 bottle caps at the park. Now he...
Correct Answer: 4
Response: Re:
<reasoning>
Step 1: Let 'b' be the number of bottle caps Danny found and 'w' be the number of wrappers Danny found.
Step 2: We are given



Question: Bobby had 32 pieces of candy. He ate some pieces of candy. If he has 20 pieces of candy left How man...
Correct Answer: 12
Response: Re:
<reasoning>
Step 1: Bobby started with 32 pieces of candy.
Step 2: He ate some pieces of candy. Let 'x' be the number of pieces he ate.
Step 3: The number of pieces left is 32 - x.
Step 4: We are ...
Extracted: 12
Question: There are 22 different books and 10 different movies in the ' crazy silly school ' series. If you re...
Correct Answer: 10
Response: Re:
<reasoning>
Step 1: We are given that there are 22 different books and 10 different movies in the ‚Äò crazy silly school‚Äô series.
Step 2: We read 12 of the books.
Step 3: We watched 56 of the movies...
Extracted: 10
Question: A waiter had 12 customers. After some left he still had 14 customers. Then he got 10 new customers H...
Correct Answer: 24
Response: Re:
<reasoning>
Step 1: The waiter initially had 12 customers.
Step 2: After some customers left, he had 14 customers.
Step 3: This



Question: Last week Fred had 78 dollars and Jason had 31 dollars. Jason delivered newspapers and now has 57 do...
Correct Answer: 26
Response: Re:
<reasoning>
Step 1: We need to find out how much money Jason earned by delivering newspapers last week.
Step 2: Jason‚Äôs initial amount was 31 dollars.
Step 3: Jason received 57 dollars, so he earn...
Extracted: 26
Question: Josh lost 11 marbles. If he had 19 marbles in his collection earlier How many marbles does he have n...
Correct Answer: 8
Response: Re:
<reasoning>
Step 1: Josh started with 19 marbles.
Step 2: He lost 11 marbles.
Step 3: To find the number of marbles he has now, we subtract the number of marbles he lost from the number of marbles...
Extracted: 8
Question: The grasshopper, the frog and the mouse had a jumping contest. The grasshopper jumped 19 inches. The...
Correct Answer: 58
Response: Re:
<reasoning>
Step 1: Let's represent the jumps as follows:
Grasshopper's jump = G = 19 inches
Frog's jump = F
Mouse's jump = M

Step



Question: A grocery store had 19 bottles of diet soda and 60 bottles of regular soda. How many more bottles of...
Correct Answer: 41
Response: Re:
<reasoning>
Step 1: Let 'D' be the number of bottles of diet soda and 'R' be the number of bottles of regular soda.
Step 2: We are given that D = 19 and R = 60.
Step 3: We want to find the differe...
Extracted: 41
Question: 53 campers went rowing in the morning 48 campers went rowing in the afternoon and 49 campers went ro...
Correct Answer: 4
Response: Re:
<reasoning>
Step 1: We are given the number of campers who went rowing in the morning, afternoon, and evening.
Step 2: The number of campers who went rowing in the morning is 48.
Step 3: The numbe...
Extracted: 1
Question: There were 13 roses and 84 orchids in the vase. Jessica cut some more roses and orchids from her flo...
Correct Answer: 1
Response: Re:
<reasoning>
Step 1: Let 'r' be the number of roses Jessica cut, and 'o' be the number of orchids Jessica cut.
Step 2: We are given tha



Question: Danny collects bottle caps and wrappers. He found 58 bottle caps and 25 wrappers at the park. Now he...
Correct Answer: 1
Response: Re:
<reasoning>
Step 1: We are given that Danny initially had 58 bottle caps and 25 wrappers.
Step 2: He now has 11 wrappers and 12 bottle caps.
Step 3: The number of bottle caps is 12, and the number...
Extracted: 1
Question: The grasshopper, the frog and the mouse had a jumping contest. The grasshopper jumped 19 inches. The...
Correct Answer: 30
Response: Re:
<reasoning>
Step 1: Let's denote the distance jumped by the grasshopper as G = 19 inches, the distance jumped by the frog as F, and the distance jumped by the mouse as M.
Step 2: The frog jumped 1...
Extracted: 30
Question: A grocery store had a total of 30 bottles of which 28 were bottles of regular soda and the rest were...
Correct Answer: 2
Response: Re:
<reasoning>
Step 1: Let 'x' be the number of bottles of diet soda.
Step 2: The total number of bottles is 30, and 28 are regular soda.



Question: There are 16 different books and 14 different movies in the ' crazy silly school ' series. If you re...
Correct Answer: 2
Response: Re:
<reasoning>
Step 1: We are given that there are 16 different books and 14 different movies in the ' crazy silly school ' series.
Step 2: We read 19 books and watched 13 movies.
Step 3: We want to ...
Extracted: 6
Question: The grasshopper, the frog and the mouse had a jumping contest. The grasshopper jumped 21 inches. The...
Correct Answer: 41
Response: Re:
<reasoning>
Step 1: Let's define the jumps of each animal. The grasshopper jumped 21 inches. The frog jumped 25 inches farther than the grasshopper. The mouse jumped 16 inches less than the frog.
...
Extracted: 9
Question: Bobby ate some pieces of candy. Then he ate 25 more. If he ate a total of 43 pieces of candy How man...
Correct Answer: 18
Response: Re:
<reasoning>
Step 1: Bobby started with some pieces of candy. Let 'x' be the number of pieces he ate initially.
Step 2: He ate 25 more 



Question: There are 8 different books and 5 different movies in the ' crazy silly school ' series. If you read...
Correct Answer: 3
Response: Re:
<reasoning>
Step 1: We are given that there are 8 different books and 5 different movies in the ' crazy silly school ' series.
Step 2: We are given that you read 19 of the movies and watched 16 of...
Extracted: 3
Question: You had 26 bags with equal number of cookies. If you had 15 candies and 52 cookies in total How many...
Correct Answer: 2
Response: Re:
<reasoning>
Step 1: We are given that you had 26 bags with equal number of cookies. Let 'x' be the number of cookies in each bag.
Step 2: You had 15 candies and 52 cookies in total.
Step 3: The to...
Extracted: 2
Question: Brenda's mother made cookies for 14. If each of them had 30 cookies How many cookies did she prepare...
Correct Answer: 420
Response: Re:
<reasoning>
Step 1: We are given that Brenda's mother made 14 cookies.
Step 2: There are 2 people (Brenda and her mother).
Step 3: Eac



Question: If Lewis earns a total of $ 460 during 5 weeks of harvest How much money does he earn each week?...
Correct Answer: 92
Response: Re:
<reasoning>
Step 1: We are given that Lewis earns a total of $460 during 5 weeks of harvest.
Step 2: We need to find out how much money he earns each week.
To find the weekly earnings, we divide t...
Extracted: 92
Question: Katie had 91 ds games and her new friends had 88 games and old friends had 53 games. How many games ...
Correct Answer: 141
Response: Re:
<reasoning>
Step 1: Let 'x' represent the number of games Katie's friends have.
Step 2: We know that Katie's games + Friends' games = Total games.
Step 3: We are given that Katie's games = 91, and...
Extracted: 232
Question: Bobby ate 28 pieces of candy. Then he ate 42 more. He also ate 63 pieces of chocolate. How many piec...
Correct Answer: 70
Response: Re:
<reasoning>
Step 1: Bobby started with 28 pieces of candy.
Step 2: He ate 42 more pieces of candy.
Step 3: He ate 63 pieces of chocol



Question: The school is planning a field trip. The school has 24 classrooms. There are 120 students in the sch...
Correct Answer: 5
Response: Re:
<reasoning>
Step 1: The school has 24 classrooms.
Step 2: There are 120 students in total.
Step 3: Each classroom has the same number of students, so the number of students per classroom is 120 / ...
Extracted: 5
Question: Luke played a trivia game and scored 154 points. If he gained the 11 points in each round How many r...
Correct Answer: 14
Response: Re:
<reasoning>
Step 1: Let 'x' be the number of rounds Luke played.
Step 2: In each round, he gained 11 points. So, the total points gained is 11x.
Step 3: Luke scored a total of 154 points, so we ha...
Extracted: 14
Question: Baker made 133 cakes. If he sold 51 of them How many more cakes did baker make than those he sold?...
Correct Answer: 82
Response: Re:
<reasoning>
Step 1: Baker made 133 cakes.
Step 2: He sold 51 cakes.
Step 3: We want to find the difference between the cakes made and t



Question: Debby bought 95 soda bottles and 180 water bottles when they were on sale. If she drank 15 water bot...
Correct Answer: 12
Response: Re:
<reasoning>
Step 1: Let 'w' be the number of water bottles Debby drank in a day, and 's' be the number of soda bottles she drank in a day. We are given that w = 15 and s = 54.
Step 2: The total nu...
Extracted: 3
Question: The Razorback shop makes $ 5 dollars off each jersey and $ 215 off each t-shirt. During the Arkansas...
Correct Answer: 4300
Response: Re:
<reasoning>
Step 1: Calculate the cost of the t-shirts. The shop sold 20 t-shirts at $5 off each, so the cost of the t-shirts is 20 * $5 = $100.
Step 2: Calculate the revenue from the jerseys. The...
Extracted: 13780
Question: The school is planning a field trip. The school has 84 classrooms. 95 school buses are fully filled ...
Correct Answer: 11210
Response: Re:
<reasoning>
Step 1: We are given that the school has 84 classrooms.
Step 2: We are given that there are 95 school buses.
Ste



Question: Because of the decision Sofia asked the students to suggest specific types of food. 457 students sug...
Correct Answer: 394
Response: Re:
<reasoning>
Step 1: Let 'x' be the number of students who suggested adding mashed potatoes.
Step 2: Let 'y' be the number of students who suggested adding bacon.
Step 3: We are given that 457 stud...
Extracted: 394
Question: Paul got a box of 531 crayons and 38 erasers for his birthday. At the end of the school year he only...
Correct Answer: 353
Response: Re:
<reasoning>
Step 1: Paul started with 531 crayons and 38 erasers.
Step 2: He had 391 crayons left.
Step 3:  Crayons - erasers = 531 - 391 = 140.
</reasoning>
<answer>140</answer>...
Extracted: 140
Question: Paul had 50 books. After buying some in a garage sale he had 151 left. How many books did he buy?...
Correct Answer: 101
Response: Re:
<reasoning>
Step 1: Let 'x' be the number of books Paul bought at the garage sale.
Step 2: Paul started with 50 books and had 151 left, so 50 - x =



Question: A farmer had 160 tomatoes in his garden. If he picked 56 of them yesterday and 41 today. How many di...
Correct Answer: 104
Response: Re:
<reasoning>
Step 1: The farmer started with 160 tomatoes.
Step 2: He picked 56 tomatoes yesterday.
Step 3: The number of tomatoes remaining after yesterday is 160 - 56 = 104 tomatoes.
Step 4: He p...
Extracted: 63
Question: Paul had 115 books. He sold 78 books in a garage sale. How many books does he have left with him?...
Correct Answer: 37
Response: Re:
<reasoning>
Step 1: Paul initially had 115 books.
Step 2: He sold 78 books.
Step 3: To find the number of books remaining, we subtract the number of books sold from the initial number of books: 11...
Extracted: 37
Question: There are 141 pots. Each pot has 71 flowers and 91 sticks in it. How many flowers are there in all?...
Correct Answer: 10011
Response: Re:
<reasoning>
Step 1: We are given that there are 141 pots.
Step 2: Each pot has 71 flowers.
Step 3: Each pot has 91 sticks.
Step 4: 



Question: The ring toss game at the carnival made the same amount of money each day. In total in 30 days they ...
Correct Answer: 14
Response: Re:
<reasoning>
Step 1: Let 'x' be the amount of money the ring toss game made per day.
Step 2: The total earnings in 30 days are 420 dollars.
Step 3: The total earnings with game is 22 dollars.
Step ...
Extracted: 13.27
Question: There were 22 parents in the program and 676 people in total. How many pupils were present in the pr...
Correct Answer: 654
Response: Re:
<reasoning>
Step 1: We are given that there were 22 parents in the program.
Step 2: We are given that there were 676 people in total.
Step 3: We need to find the number of pupils present in the pr...
Extracted: 654
Question: Baker made 13 cakes. He sold 91 of them and bought 154 new cakes. How many more cakes did baker buy ...
Correct Answer: 63
Response: Re:
<reasoning>
Step 1: Baker started with 13 cakes.
Step 2: He sold 91 cakes.
Step 3: The number of cakes remaining is 13 - 91 = 



Question: Nell collects cards. She had 309 baseball cards and 356 Ace cards. She gave some of her cards to Jef...
Correct Answer: 266
Response: Re:
<reasoning>
Step 1: Let 'B' be the number of baseball cards Nell had initially, and 'A' be the number of Ace cards Nell had initially. We are given that B = 309 and A = 356.
Step 2: Nell gave some...
Extracted: None
Question: Paul got a box of 267 crayons for his birthday. During the school year he lost 231 crayons while he ...
Correct Answer: 77
Response: Re:
<reasoning>
Step 1: We are given that Paul started with 267 crayons.
Step 2: He lost 231 crayons during the school year.
Step 3: He gave away 308 crayons to his friends.
Step 4: We need to find th...
Extracted: 77
Question: There are 142 bananas and 356 oranges in Philip's collection. If the bananas are organized into 47 g...
Correct Answer: 2
Response: Re:
<reasoning>
Step 1: We are given that there are 142 bananas and 356 oranges in Philip‚Äôs collection.
Step 2: The bananas are org



Question: Next on his checklist is wax to stick the feathers together. He needs 159 g of wax more. If the feat...
Correct Answer: 469
Response: Re:
<reasoning>
Step 1: We are given that he needs 159 grams of wax more.
Step 2: We are given that the feathers require a total of 628 grams of wax.
Step 3: We want to find out how many grams of wax ...
Extracted: 469
Question: Nell collects cards. She had 239 baseball cards and 38 Ace cards. She gave some of her cards to Jeff...
Correct Answer: 265
Response: Re:
<reasoning>
Step 1: We are given that Nell had 239 baseball cards and 38 Ace cards.
Step 2: She gave some of her cards to Jeff. Let 'x' be the number of cards Nell gave to Jeff.
Step 3: After givi...
Extracted: 485
Question: The Razorback shop makes $ 210 dollars off each jersey and $ 240 off each t-shirt. During the Arkans...
Correct Answer: 30
Response: Re:
<reasoning>
Step 1: Let 'x' be the cost of a jersey and 'y' be the cost of a t-shirt. We are given that the Razorback shop make



Question: Baker's friend bought 137 cakes from him. If he had made 169 cakes initially How many cakes would ba...
Correct Answer: 32
Response: Re:
<reasoning>
Step 1: Baker's friend bought 137 cakes.
Step 2: The initial number of cakes made was 169.
Step 3: The difference between the number of cakes bought and the number made is 137 - 169 = ...
Extracted: 32
Question: Nell collects baseball cards. She gave 301 of her cards to Jeff and now has 154 cards left.. How man...
Correct Answer: 455
Response: Re:
<reasoning>
Step 1: Nell started with 301 cards.
Step 2: She gave away 301 cards to Jeff. So, the number of cards remaining is 301 - 301 = 0 cards.
Step 3: Now she has 154 cards left.
Step 4: The ...
Extracted: 0
Question: Baker made 7 cakes and 148 pastries. If he sold 15 cakes and 103 pastries How many pastries would ba...
Correct Answer: 45
Response: Re:
<reasoning>
Step 1: Let 'cakes' be the number of cakes Baker made, and 'pastries' be the number of pastries Baker made.
Step 2: We 



Question: The Razorback t-shirt shop makes $ 106 dollars off each t-shirt sold. During the Arkansas game and t...
Correct Answer: 127
Response: Re:
<reasoning>
Step 1: The Razorback t-shirt shop makes $106 off each t-shirt sold.
Step 2: They sold a total of 242 t-shirts.
Step 3: During the Arkansas game, they sold 115 t-shirts.
Step 4: The am...
Extracted: 0
Question: Paul got a box of some crayons for his birthday. By the end of the school year he only had 291 crayo...
Correct Answer: 606
Response: Re:
<reasoning>
Step 1: We are given that Paul had 291 crayons at the end of the school year.
Step 2: He lost or gave away 315 crayons.
Step 3: We need to find the number of crayons he got for his bir...
Extracted: None
Question: There are 3941 skittles in Steven's skittles collection. Steven also has 4950 erasers. If the eraser...
Correct Answer: 10
Response: Re:
<reasoning>
Step 1: We are given that Steven has 3941 skittles in his collection.
Step 2: Steven also has 4950 erasers.
Step 3: 



Question: For Gwen's birthday she received 2 dollars from her mom. Her dad gave her 4 more dollars. If she spe...
Correct Answer: 3
Response: Re:
<reasoning>
Step 1: Gwen received 2 dollars from her mom.
Step 2: Her dad gave her 4 more dollars, so the total amount given is 2 + 4 = 6 dollars.
Step 3: Gwen spent 3 dollars, so the remaining mo...
Extracted: 3
Question: 3 birds and 4 storks were sitting on the fence. 2 more birds came to join them. How many more birds ...
Correct Answer: 1
Response: Re:
<reasoning>
Step 1: We are given that there were initially 3 birds and 4 storks on the fence.
Step 2: 2 more birds came to join them.
Step 3: We want to find out how many more birds than storks ar...
Extracted: 1
Question: Jerry had 7 action figures and 2 books on a shelf in his room. Later he added 4 more books to the sh...
Correct Answer: 1
Response: Re:
<reasoning>
Step 1: Jerry initially had 7 action figures and 2 books.
Step 2: He added 4 more books.
Step 3: We need to find how many mo



Question: 6 birds and 3 storks were sitting on the fence. 2 more storks came to join them. How many more birds...
Correct Answer: 1
Response: Re:
<reasoning>
Step 1: We are given that there were initially 6 birds and 3 storks.
Step 2: 2 more storks came to join them, so the new number of storks is 3 + 2 = 5.
Step 3: We want to find the diff...
Extracted: 1
Question: Faye had 35 packs of pencils each one having 4 pencils. She was placing her pencils into rows with 2...
Correct Answer: 70
Response: Re:
<reasoning>
Step 1: Faye has 35 packs of pencils.
Step 2: Each pack has 4 pencils.
Step 3: Total number of pencils = 35 * 4 = 140 pencils.
Step 4: She is placing pencils into rows with 2 pencils i...
Extracted: 70
Question: After resting they decided to go for a swim. The depth of the water is 2 times Dean's height. Dean i...
Correct Answer: 12
Response: Re:
<reasoning>
Step 1: We are given that Dean is 8 feet shorter than Ron. Let Dean's height be 'D' and Ron's height be 'R'. We can write



Question: Haley grew 14 trees in her backyard. After a typhoon 9 died. How many more trees died in the typhoon...
Correct Answer: 4
Response: Re:
<reasoning>
Step 1: The problem states that Haley grew 14 trees.
Step 2: We are given that 9 trees died.
Step 3: We want to find the difference between the number of trees that died and the number...
Extracted: 5
Question: Winter is almost here and most animals are migrating to warmer countries. There were 87 bird familie...
Correct Answer: 73
Response: Re:
<reasoning>
Step 1: We are given that there were 87 bird families living near the mountain.
Step 2: We are told that 7 bird families flew away for winter.
Step 3: We need to find how many more bir...
Extracted: 80
Question: Mary is baking a cake. The recipe calls for 14 cups of flour and 6 cups of sugar. She already put in...
Correct Answer: 3
Response: Re:
<reasoning>
Step 1: We are given the recipe calls for 14 cups of flour and 6 cups of sugar.
Step 2: Mary has already put in 5 cups of 



Question: Frank had 33 pieces of candy. He lost 24 of them. If he put the remaining pieces into bags with 9 pi...
Correct Answer: 1
Response: Re:
<reasoning>
Step 1: Frank started with 33 pieces of candy.
Step 2: He lost 24 pieces of candy, so the remaining pieces are 33 - 24 = 9 pieces.
Step 3: He wants to put the remaining pieces into bag...
Extracted: 1
Question: Robin's hair was 14 inches long. It grew by 8 inches. Then he cut off 20 inches. How long is his hai...
Correct Answer: 2
Response: Re:
<reasoning>
Step 1: Robin's original hair length is 14 inches.
Step 2: His hair grew by 8 inches, so the new length is 14 + 8 = 22 inches.
Step 3: He cut off 20 inches, so the remaining length is ...
Extracted: 2
Question: Zachary did 51 push-ups in gym class today. David did 22 more push-ups than Zachary. John did 4 push...
Correct Answer: 69
Response: Re:
<reasoning>
Step 1: We are given that Zachary did 51 push-ups.
Step 2: David did 22 more push-ups than Zachary. So, David's push-ups = 



Question: Edward spent $ 6 to buy books and $ 16 to buy pens. Now he has $ 19. How much did Edward have before...
Correct Answer: 41
Response: Re:
<reasoning>
Step 1: We are given that Edward spent $6 on books and $16 on pens.
Step 2: The total amount spent is $6 + $16 = $22.
Step 3: We are told that now Edward has $19.
Step 4: The amount Ed...
Extracted: 3
Question: In Haley's class 4 are boys who love to play marbles. If Haley has 23 marbles and wants to keep 15 m...
Correct Answer: 2
Response: Re:
<reasoning>
Step 1: Haley has a total of 23 marbles.
Step 2: Haley wants to keep 15 marbles for herself.
Step 3: The number of marbles to be given to each boy is total marbles - marbles kept = 23 ...
Extracted: 8
Question: Matthew had 15 crackers. If Matthew gave equal numbers of crackers to his 5 friends and still had 10...
Correct Answer: 1
Response: Re:
<reasoning>
Step 1: Matthew initially had 15 crackers.
Step 2: He gave equal numbers of crackers to his 5 friends.
Step 3: Let 'x' be t



Question: Zachary did 44 push-ups in gym class today. David did 58 more push-ups than zachary. How many push-u...
Correct Answer: 146
Response: Re:
<reasoning>
Step 1: We are given that Zachary did 44 push-ups.
Step 2: David did 58 more push-ups than Zachary.
Step 3: Let 'x' represent the number of push-ups David did.  Then x = 44 + 58 = 102....
Extracted: 146
Question: Marco and his dad went strawberry picking. Marco's strawberries weighed 10 pounds. If together their...
Correct Answer: 6
Response: Re:
<reasoning>
Step 1: We are given that Marco's strawberries weighed 10 pounds and together their strawberries weighed 26 pounds.
Step 2: We want to find the difference between the weights of his da...
Extracted: 16
Question: Marco and his dad went strawberry picking. Together their strawberries weighed 35 pounds. Marco's st...
Correct Answer: 9
Response: Re:
<reasoning>
Step 1: We are given that the total weight of Marco and his dad's strawberries is 35 pounds.
Step 2: Marco's strawberri



Question: Winter is almost here and most animals are migrating to warmer countries. There are 85 bird families...
Correct Answer: 25
Response: Re:
<reasoning>
Step 1: We are given the number of bird families living near the mountain, which is 85.
Step 2: We are given the number of bird families that flew away to africa, which is 23.
Step 3: ...
Extracted: 25
Question: Bryan took a look at his books and magazines. If he has 23 books and 61 magazines in each of his 29 ...
Correct Answer: 2436
Response: Re:
<reasoning>
Step 1: We are given that Bryan has 23 books and 61 magazines in each of his 29 bookshelves.
Step 2: The total number of books is the number of books per bookshelf multiplied by the nu...
Extracted: 2436
Question: He then went to see the oranges being harvested. He found out that they harvest 76 sacks per day and...
Correct Answer: 600
Response: Re:
<reasoning>
Step 1: The problem states that they harvest 76 sacks of oranges per day and discard 64 of them.
Step 2: We need t



Question: Randy has 79 blocks. He uses 14 blocks to build a tower and 11 blocks to build a house. How many blo...
Correct Answer: 54
Response: Re:
<reasoning>
Step 1: We are given that Randy has a total of 79 blocks.
Step 2: He uses 14 blocks for the tower and 11 blocks for the house.
Step 3: The total number of blocks used is 14 + 11 = 25 b...
Extracted: 54
Question: Marco and his dad went strawberry picking. Marco's strawberries weighed 30 pounds. If together their...
Correct Answer: 13
Response: Re:
<reasoning>
Step 1: We are given that Marco's strawberries weighed 30 pounds.
Step 2: Their strawberries weighed together 47 pounds.
Step 3: We need to find the difference in weight between Marco'...
Extracted: 30
Question: Danny collects bottle caps. He threw away 60 of the old ones at the park while he found 58 bottle ca...
Correct Answer: 69
Response: Re:
<reasoning>
Step 1: We are given that Danny initially had 67 bottle caps.
Step 2: He threw away 60 bottle caps, so the remaining bo



Question: Paul had 108 books. After selling some books in a garage sale and giving 35 books to his friend he h...
Correct Answer: 11
Response: Re:
<reasoning>
Step 1: We are given that Paul initially had 108 books.
Step 2: He gave 35 books to his friend, so the number of books left is 108 - 35 = 73 books.
Step 3: We are told that he had 62 b...
Extracted: 73
Question: Helen the hippo and her friends are preparing for thanksgiving at Helen's house. Helen baked 19 choc...
Correct Answer: 25
Response: Re:
<reasoning>
Step 1: We need to find the difference between the number of chocolate chip cookies and the number of raisin cookies.
Step 2: Calculate the number of chocolate chip cookies: 19 + 231 =...
Extracted: 218
Question: A farmer had 105 green tomatoes and 71 red tomatoes in his garden. If he picked 137 tomatoes How man...
Correct Answer: 39
Response: Re:
<reasoning>
Step 1: We are given that the farmer had 105 green tomatoes and 71 red tomatoes.
Step 2: The total number of tomatoes 



Question: Nell collects baseball cards. She had 573 cards. She gave 195 cards to John and yet some more to Jef...
Correct Answer: 168
Response: Re:
<reasoning>
Step 1: We are given that Nell initially had 573 cards.
Step 2: She gave 195 cards to John.
Step 3: The number of cards remaining is 573 - 195 = 378 cards.
Step 4: The number of cards ...
Extracted: 183
Question: Paul got a box of 440 crayons for his birthday. During the school year he gave 111 crayons to his fr...
Correct Answer: 223
Response: Re:
<reasoning>
Step 1: Let 'C' be the initial number of crayons Paul had.
Step 2: Let 'L' be the number of crayons Paul lost.
Step 3: Let 'G' be the number of crayons Paul gave away.
Step 4: Let 'R' ...
Extracted: None
Question: In a school there are 706 girls and 222 boys. 418 more girls joined the school. How many pupils are ...
Correct Answer: 1346
Response: Re:
<reasoning>
Step 1: Initially, there were 706 girls and 222 boys, so the total number of pupils was 706 + 222 = 928.
Step 2:



Question: Every day Ryan spends 7 hours on learning english, 2 hours on learning chinese and 4 hours on learni...
Correct Answer: 3
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: We are given that Ryan spends 7 hours on learning English, 2 hours on learning Chinese, and 4 hours on learning Spanish.
Step 2: We nee...
Extracted: 3
Question: Jack received 10 emails in the morning, 5 emails in the afternoon and 4 emails in the evening. How m...
Correct Answer: 1
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: We are given that Jack received 10 emails in the morning, 5 emails in the afternoon and 4 emails in the evening.
Step 2: We need to fin...
Extracted: 1
Question: Jack received 5 emails and 6 letters in the morning. He then received 9 emails and 7 letters in the ...
Correct Answer: 13
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: We are given that Jack received 5 emails and 6 letters in the morning.
Ste



Question: 2 birds were sitting on the fence. Some more birds came to join them. If there are a total of 6 bird...
Correct Answer: 4
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs break it down systematically.

<reasoning>
Step 1: We are given that initially there were 2 birds on the fence.
Step 2: Then, more birds came to join them....
Extracted: 4
Question: Dan has $ 3. He bought 2 candy bar for $ 4, each one costing the same amount of money. How much did ...
Correct Answer: 2
Response: Yes, I‚Äôm ready to help! Let‚Äôs solve this problem.

<reasoning>
Step 1: We know Dan has $3.
Step 2: He bought 2 candy bars for $4 each.
Step 3: Let 'x' be the cost of each candy bar.
Step 4: The total ...
Extracted: 2
Question: Every day Ryan spends 7 hours on learning english and some more hours on learning chinese. If he spe...
Correct Answer: 5
Response: Yes, I‚Äôm ready to assist! Let‚Äôs solve this problem.

<reasoning>
Step 1: We are given that Ryan spends 7 hours on learning Engli



Question: Jack received a total of 9 emails in the day. If he received 7 emails in the morning and some more i...
Correct Answer: 2
Response: Yes, I‚Äôm ready to assist! Let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: We are given that Jack received a total of 9 emails in the day.
Step 2: He received 7 emails in the morning.
Step ...
Extracted: 9
Question: Rachel has 52 apple trees. She picked 2 apples from 1 of her trees. Now the tree has 7 apples still ...
Correct Answer: 9
Response: Yes, I‚Äôm ready to put on my mathematical reasoning expert hat! Let‚Äôs solve this problem.

<reasoning>
Step 1: We are given that Rachel has 52 apple trees.
Step 2: She picked 2 apples from 1 of her tre...
Extracted: 52
Question: Last week Fred had 86 dollars and Jason had 5 dollars. Fred washed cars over the weekend and now has...
Correct Answer: 14
Response: Yes, I‚Äôm ready to analyze the problem and provide a detailed solution following your specified format.

<reasoning>
Step 1: We a



Question: Mary is baking a cake. The recipe calls for 9 cups of flour and 5 cups of sugar. She already put in ...
Correct Answer: 6
Response: Yes, I‚Äôm ready to help! Let‚Äôs solve this problem.

<reasoning>
Step 1: The recipe calls for 9 cups of flour and 5 cups of sugar.
Step 2: Mary has already put in 3 cups of flour and 62 cups of sugar.
S...
Extracted: None
Question: Jack received 5 emails in the morning, 8 emails in the afternoon and 72 emails in the evening. How m...
Correct Answer: 13
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given the number of emails received in the morning, afternoon and evening.
Step 2: Emails in the morning = 5
Step 3: Emails in the af...
Extracted: 13
Question: Rachel had to complete 7 pages of math homework. If she had to complete 4 more pages of math homewor...
Correct Answer: 3
Response: Yes, let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: We are given that Rachel had to complete 7 pages of math homew



Question: For Gwen's birthday she received 3 dollars from her mom. Her dad gave her 6 more dollars. If she spe...
Correct Answer: 3
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: Gwen received 3 dollars from her mom.
Step 2: Her dad gave her 6 more dollars, so the total amount received from her dad is 3 + 6 = 9 dollar...
Extracted: 6
Question: Tiffany was collecting cans for recycling. On monday she had 8 bags of cans. The next day she found ...
Correct Answer: 1
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We need to find the number of cans collected on Monday. We are given that Tiffany had 8 bags of cans on Monday.
Step 2: The next day she fou...
Extracted: 7
Question: Every day Ryan spends 6 hours on learning english and 2 hours on learning chinese. How many more hou...
Correct Answer: 4
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We need to find the difference in the number of hours spent on each subjec



Question: 5 red peaches, 14 yellow peaches and 6 green peaches are in the basket. How many green and yellow pe...
Correct Answer: 20
Response: yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given the number of each type of peach.
Step 2: Number of red peaches = 5
Step 3: Number of yellow peaches = 14
Step 4: Number of gre...
Extracted: 29
Question: Emily is making bead necklaces for her 44 friends. She made 26 necklaces and each necklace takes 2 b...
Correct Answer: 52
Response: yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We know Emily made 26 necklaces.
Step 2: Each necklace takes 2 beads.
Step 3: The total number of beads is the number of necklaces multiplie...
Extracted: 52
Question: Julia played tag with 5 kids on monday, 9 kids on tuesday and 15 kids on wednesday. How many kids di...
Correct Answer: 20
Response: yes, let‚Äôs solve this problem step by step.

<reasoning>
Step 1: We are given information about the number of kids played with ea



Question: Being his favorite, he saved checking on the grapevines for his last stop. He was told by 94 of the ...
Correct Answer: 15
Response: yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given that 94 pickers said that 90 drums of grapes were filled in 6 days.
Step 2: We want to find out how many drums of grapes are fi...
Extracted: 15
Question: Faye was placing her pencils and crayons into 19 rows with 4 pencils and 27 crayons in each row. How...
Correct Answer: 76
Response: yes, I‚Äôm ready!

<reasoning>
Step 1: We are given that there are 19 rows and 4 pencils and 27 crayons in each row.
Step 2: The total number of pencils is the number of rows multiplied by the number of...
Extracted: 76
Question: Bryan took a look at his books as well. Each bookshelf contains 2 books. If he has a total of 38 boo...
Correct Answer: 19
Response: yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given that Bryan has a total of 38 books.
Step 2: Each bookshe



Question: Dave had 24 apps and 9 files on his phone. After deleting some apps and files he had 5 files and 12 ...
Correct Answer: 7
Response: yes, let‚Äôs solve this problem step by step.

<reasoning>
Step 1: Dave initially had 24 apps and 9 files.
Step 2: After deleting some apps and files, he had 5 files and 12 apps left.
Step 3: The number...
Extracted: 3
Question: Last week Fred had 49 dollars and Jason had 3 dollars. Fred washed cars over the weekend and now has...
Correct Answer: 60
Response: yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given that Fred had 49 dollars last week and now has 112 dollars. This means Fred's total earnings increased by 112 - 49 = 63 dollars...
Extracted: 60
Question: Dave had 15 apps and 24 files on his phone. After deleting some apps and files he had 21 apps and 4 ...
Correct Answer: 17
Response: yes, let‚Äôs solve this problem.

<reasoning>
Step 1: We are given that Dave initially had 15 apps and 24 files.
Step 2: After deleti



Question: A grocery store had a total of 17 bottles of which 9 were bottles of regular soda and the rest were ...
Correct Answer: 8
Response: yes, let‚Äôs do this!

<reasoning>
Step 1: We are given that the grocery store had a total of 17 bottles of bottles.
Step 2: 9 bottles were regular soda.
Step 3: The rest were bottles of diet soda.
Step...
Extracted: 8
Question: He then went to see the oranges being harvested. He found out that they harvest 8 sacks per day and ...
Correct Answer: 280
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given that the oranges are harvested at a rate of 8 sacks per day.
Step 2: Each sack contains 35 oranges.
Step 3: Therefore, the numb...
Extracted: 280
Question: Robin has 28 packages of gum and 13 packages of candy. There are 4 pieces in each package. How many ...
Correct Answer: 112
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given that Robin has 28 packages of gum and 13 packages of ca



Question: Edward spent $ 16 to buy 92 books each book costing him the same amount of money. Now he has $ 6. Ho...
Correct Answer: 22
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given that Edward spent $16 to buy 92 books.
Step 2: The cost of each book is the same, and we are told that the total cost is $16.
S...
Extracted: 0.173
Question: Haley has 20 marbles. In her class 2 boys love to play marbles. If she distributes her marbles equal...
Correct Answer: 10
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given that Haley has 20 marbles.
Step 2: We are told that 2 boys love to play marbles.
Step 3: This means that the number of marbles ...
Extracted: 10
Question: Mary is baking a cake. The recipe calls for 6 cups of flour and 13 cups of sugar. She already put in...
Correct Answer: 21
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given the recipe calls for 6 cups of flour and 13 cups of s



Question: There were 10 roses in the vase. Jessica cut 8 more roses from her flower garden and put them in the...
Correct Answer: 18
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given that there were initially 10 roses in the vase.
Step 2: Jessica cut 8 roses.
Step 3: We need to find the number of roses remain...
Extracted: 2
Question: Rebecca wants to split a collection of eggs into groups of 6. Rebecca has 18 eggs 72 bananas and 66 ...
Correct Answer: 3
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We need to find the number of groups of eggs. We are given that the eggs are to be split into groups of 6.
Step 2: We are given that Rebecca...
Extracted: 3
Question: He then went to see the oranges being harvested. He found out that the harvest will go on for 4 days...
Correct Answer: 14
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given that the harvest will go on for 4 days.
Step 2: We are give



Question: Marco and his dad went strawberry picking. His strawberries weighed 8 pounds while his dad's strawbe...
Correct Answer: 40
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given the weights of Marco‚Äôs and his dad‚Äôs strawberries.
Step 2: Marco‚Äôs strawberries weigh 8 pounds, and his dad‚Äôs strawberries weig...
Extracted: 40
Question: Matthew had some crackers. If Matthew gave 2 crackers to each of his 11 friends How many crackers di...
Correct Answer: 22
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We need to find the total number of crackers Matthew gave out.
Step 2: Each friend receives 2 crackers, and Matthew has 11 friends.
Step 3: ...
Extracted: 22
Question: Jerry had 8 action figures and 9 books on a shelf in his room. Later he added 10 more books to the s...
Correct Answer: 19
Response: Yes, I understand. Let‚Äôs do this!

<reasoning>
Step 1: We are given that Jerry started with 8 action figures and 9 books 



Question: The grasshopper, the frog and the mouse had a jumping contest. The grasshopper jumped 19 inches. The...
Correct Answer: 15
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: Let's represent the jumps as follows:
   - Grasshopper's jump: G = 19 inches
   - Frog's jump: F
   - Mouse's jump: M

Step 2: We're given t...
Extracted: None
Question: Faye was placing her pencils into rows with 5 pencils in each row. If she had 35 pencils and 7 crayo...
Correct Answer: 7
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given that Faye has a total of 35 pencils and 7 crayons.
Step 2: We know that each row contains 5 pencils.
Step 3: To find the number...
Extracted: 7
Question: Jackie has 10 apples. Adam has 8 apples. How many more apples does Jackie have than Adam?...
Correct Answer: 2
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given that Jackie has 10 apples and Adam has 8 apples.
Step 2: To find the



Question: Frank was reading through his favorite book. The book had 3 chapters, each with the same number of p...
Correct Answer: 198
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given that the book has a total of 594 pages.
Step 2: The book has 3 chapters, and each chapter has the same number of pages.
Step 3:...
Extracted: 198
Question: Randy has 95 blocks. He uses 20 blocks to build a house and 50 blocks to build a tower. How many mor...
Correct Answer: 30
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given that Randy has 95 blocks.
Step 2: He uses 20 blocks to build a house and 50 blocks to build a tower.
Step 3: We need to find th...
Extracted: 30
Question: Dave had 15 apps on his phone. He added 71 new apps. After deleting some he had 14 left. How many mo...
Correct Answer: 1
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: We are given that Dave had 15 apps on his phone initially.
Step 2: H



Question: There are 14 different movies and 15 different books in the ' crazy silly school ' series. If you re...
Correct Answer: 1
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs break it down systematically.

<reasoning>
Step 1: We are given that there are 14 different movies and 15 different books in the ' crazy silly school ' ser...
Extracted: None
Question: 44 campers went rowing in the morning 39 campers went rowing in the afternoon and 31 campers went ro...
Correct Answer: 5
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs break it down systematically.

<reasoning>
Step 1: We are given the number of campers who went rowing in the morning, afternoon, and evening.
Step 2: Numbe...
Extracted: 0
Question: The grasshopper, the frog and the mouse had a jumping contest. The grasshopper jumped 14 inches. The...
Correct Answer: 21
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs break it down systematically.

<reasoning>
Step 1: We are given the jumping dis



Question: He also had 56 aquariums for saltwater animals and 10 aquariums for freshwater animals. Each aquariu...
Correct Answer: 2184
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs break it down systematically.

<reasoning>
Step 1: We are given that Tyler has 56 aquariums for saltwater animals and 10 aquariums for freshwater animals.
...
Extracted: 2184
Question: 62 campers went rowing in the morning. 39 campers went rowing in the afternoon. How many more camper...
Correct Answer: 23
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs break it down systematically.

<reasoning>
Step 1: We are given that 62 campers went rowing in the morning and 39 campers went rowing in the afternoon.
Ste...
Extracted: 23
Question: Frank was reading through his favorite book. The book had 612 pages equally distributed over 24 chap...
Correct Answer: 102
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs break it down systematically.

<reasoning>
Step 1: We are given that the 



Question: Frank was reading through his favorite book. He read 8 pages per day. If the book had 576 pages How ...
Correct Answer: 72
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs break it down systematically.

<reasoning>
Step 1: We are given that the book has 576 pages.
Step 2: Frank reads 8 pages per day.
Step 3: We want to find h...
Extracted: 72
Question: He then went to see the oranges being harvested. He found out that they harvest 67 sacks per day and...
Correct Answer: 420
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs break it down systematically.

<reasoning>
Step 1: We are given that he harvested 67 sacks of oranges per day and each sack contains 12 oranges.
Step 2: To...
Extracted: 804
Question: Paige was helping her mom plant flowers and together they planted 36 seeds. They put 12 seeds in eac...
Correct Answer: 3
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs break it down systematically.

<reasoning>
Step 1: We are given that Paige an



Question: Paco had 39 sweet cookies and 6 salty cookies. He ate 23 salty cookies and 32 sweet cookies. How man...
Correct Answer: 9
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs break it down systematically.

<reasoning>
Step 1: We are given that Paco had a total of 39 sweet cookies and 6 salty cookies.
Step 2: He ate 23 salty cook...
Extracted: 9
Question: 19 red peaches, 11 yellow peaches and 12 green peaches are in the basket. How many more red peaches ...
Correct Answer: 8
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs break it down systematically.

<reasoning>
Step 1: We are given the number of red peaches, yellow peaches, and green peaches.
Step 2: Number of red peaches...
Extracted: 8
Question: Danny collects bottle caps and wrappers. He found 46 wrappers and 50 bottle caps at the park. Now he...
Correct Answer: 4
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs break it down systematically.

<reasoning>
Step 1: We are given that Danny found 46



Question: The grasshopper, the frog and the mouse had a jumping contest. The grasshopper jumped 19 inches. The...
Correct Answer: 58
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that the grasshopper jumped 19 inches.
Step 2: The frog jumped 39 inches farther than the grasshopper.  This means the frog jumped 19 ...
Extracted: 58
Question: David did 30 more push-ups than Zachary in gym class today. If David did 37 push-ups How many push-u...
Correct Answer: 7
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that David did 30 more push-ups than Zachary. We can represent this as: David's push-ups - Zachary's push-ups = 30.
Step 2: Let 'x' be...
Extracted: 7
Question: 18 red peaches, 14 yellow peaches and 17 green peaches are in the basket. How many red and yellow pe...
Correct Answer: 32
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given the number of red peaches: 18.
Step 2: We are given the number of



Question: Julia played tag with 16 kids on monday. If she played tag with 12 more kids on monday than on tuesd...
Correct Answer: 4
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that Julia played tag with 16 kids on Monday.
Step 2: On Tuesday, she played with 12 more kids than on Monday.  So, the number of kids...
Extracted: 28
Question: There are 16 different books and 14 different movies in the ' crazy silly school ' series. If you re...
Correct Answer: 2
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given the number of different books (16) and the number of different movies (14).
Step 2: We read 19 books and watched 13 movies.
Step 3: We...
Extracted: 3
Question: The grasshopper, the frog and the mouse had a jumping contest. The grasshopper jumped 21 inches. The...
Correct Answer: 41
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that the grasshopper jumped 21 inches.
Step 2: The grasshopper jum



Question: If Lewis earns a total of $ 460 during 5 weeks of harvest How much money does he earn each week?...
Correct Answer: 92
Response: Yes, let's tackle this problem!

<reasoning>
Step 1: We are given that Lewis earns a total of $460 during 5 weeks of harvest.
Step 2: We are asked to find out how much money he earns each week.
Step 3...
Extracted: 92
Question: Katie had 91 ds games and her new friends had 88 games and old friends had 53 games. How many games ...
Correct Answer: 141
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that Katie has 91 ds games.
Step 2: Her new friends have 88 games.
Step 3: Old friends have 53 games.
Step 4: We want to find the tota...
Extracted: 179
Question: Bobby ate 28 pieces of candy. Then he ate 42 more. He also ate 63 pieces of chocolate. How many piec...
Correct Answer: 70
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that Bobby ate 28 pieces of candy initially.
Step 2: He ate 42 more



Question: Frank was reading through his favorite book. He read 22 pages per day and it took him 569 days to fi...
Correct Answer: 12518
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that Frank read 22 pages per day for 569 days.
Step 2: To find the total number of pages in the book, we multiply the number of pages ...
Extracted: 12458
Question: Paul had 51 books and 106 pens. After selling some books and pens in a garage sale he had 82 books a...
Correct Answer: 92
Response: Yes, let's do this!

<reasoning>
Step 1: We are given that Paul had 51 books and 106 pens initially.
Step 2: He sold some books and pens, and now he has 82 books and 14 pens left.
Step 3: Let ‚Äòx‚Äô be t...
Extracted: None
Question: The school is planning a field trip. The school has 24 classrooms. There are 120 students in the sch...
Correct Answer: 5
Response: Yes, let's tackle this problem!

<reasoning>
Step 1: We are given that the school has 24 classrooms.
Step 2: We are given 



Question: Baker made 133 cakes. If he sold 51 of them How many more cakes did baker make than those he sold?...
Correct Answer: 82
Response: Yes, let's do this!

<reasoning>
Step 1: We are given that Baker made 133 cakes.
Step 2: We are told that he sold 51 cakes.
Step 3: To find how many more cakes Baker made, we need to subtract the numb...
Extracted: 82
Question: The Razorback shop makes $ 62 dollars off each t-shirt and $ 99 off each jersey. During the Arkansas...
Correct Answer: 11346
Response: Yes, let's do this!

<reasoning>
Step 1: Let 'x' represent the amount of money they made from selling the t-shirts.
Step 2: The discount on each t-shirt is $62, so the amount made from t-shirts is $62...
Extracted: 14735
Question: Debby bought 95 soda bottles and 180 water bottles when they were on sale. If she drank 15 water bot...
Correct Answer: 12
Response: Yes, let's tackle this problem!

<reasoning>
Step 1: We are given that Debby bought a total of 95 soda bottles and 180 water bottle



Question: Paul had 50 books. After buying some in a garage sale he had 151 left. How many books did he buy?...
Correct Answer: 101
Response: Yes, let's do this!

<reasoning>
Step 1: We are given that Paul had 50 books initially.
Step 2: After buying some books, he had 151 left.
Step 3: Let 'x' be the number of books he bought.  So, 50 - x ...
Extracted: 151
Question: In a school there are 902 girls and 811 boys. 44 more girls joined the school. How many girls are th...
Correct Answer: 946
Response: Yes, let's do this!

<reasoning>
Step 1: We are given that there are 902 girls and 811 boys in the school.
Step 2: 44 more girls joined the school.
Step 3:  We need to find the new number of girls.  N...
Extracted: 946
Question: A farmer had 160 tomatoes in his garden. If he picked 56 of them yesterday and 41 today. How many di...
Correct Answer: 104
Response: Yes, let's do this!

<reasoning>
Step 1: The farmer started with 160 tomatoes.
Step 2: He picked 56 tomatoes yesterday.
Step 3:  Rema



Question: In a school there are 308 girls and 318 boys. There are also 36 teachers How many pupils are there i...
Correct Answer: 626
Response: Yes, let's do that!

<reasoning>
Step 1: We are given that there are 308 girls and 318 boys in the school.
Step 2: The total number of students is the sum of the number of girls and the number of boys...
Extracted: 626
Question: Next on his checklist is wax to stick the feathers together. He needs 159 g of wax more. If the feat...
Correct Answer: 469
Response: Yes, let's tackle this problem!

<reasoning>
Step 1: We are given that the feathers require a total of 628 grams of wax.
Step 2: He needs 159 grams more of wax.
Step 3:  This means the amount of wax h...
Extracted: 469
Question: Nell collects cards. She had 239 baseball cards and 38 Ace cards. She gave some of her cards to Jeff...
Correct Answer: 265
Response: Yes, let's do that!

<reasoning>
Step 1: We are given that Nell had 239 baseball cards and 38 Ace cards.
Step 2: She gave some car



Question: There are 3941 skittles in Steven's skittles collection. Steven also has 4950 erasers. If the eraser...
Correct Answer: 10
Response: Yes, let's tackle this problem!

<reasoning>
Step 1: We are given that Steven has a total of 3941 skittles in his collection.
Step 2: We are given that Steven has 4950 erasers.
Step 3: We need to dete...
Extracted: 4950
Question: Jerry had 7 books and 3 action figures on a shelf in his room. Later he added 2 more action figures ...
Correct Answer: 2
Response: Yes, let's tackle this problem!

<reasoning>
Step 1: We are given that Jerry initially had 7 books and 3 action figures.
Step 2: He added 2 more action figures.
Step 3: To find out how many more books...
Extracted: 4
Question: Every day Ryan spends 6 hours on learning english and 7 hours on learning chinese. If he learns for ...
Correct Answer: 65
Response: Yes, let's tackle this problem!

<reasoning>
Step 1: We are given that Ryan spends 6 hours learning English and 7 hours learning Chines



Question: Jack received 6 emails in the morning, 3 emails in the afternoon and 5 emails in the evening. How ma...
Correct Answer: 14
Response: Yes, let's tackle this problem!

<reasoning>
Step 1: We are given the number of emails received in the morning, afternoon, and evening.
Step 2: Emails received in the morning = 6
Step 3: Emails receiv...
Extracted: 14
Question: Rachel picked 7 apples from her tree. Thereafter 2 new apples grew on the tree. Now the tree has 6 a...
Correct Answer: 11
Response: Yes, let's tackle this problem!

<reasoning>
Step 1: We are given that Rachel picked 7 apples.
Step 2: 2 new apples grew on the tree.
Step 3: So, the number of apples on the tree now is 7 + 2 = 9 appl...
Extracted: 9
Question: Jerry had 3 books and 4 action figures on a shelf in his room. Later he added 2 more action figures ...
Correct Answer: 3
Response: Yes, let's tackle this problem!

<reasoning>
Step 1: We are given that Jerry initially had 3 books and 4 action figures.
Step 2: He added



Question: Winter is almost here and most animals are migrating to warmer countries. There were 87 bird familie...
Correct Answer: 73
Response: Yes, let's tackle this problem!

<reasoning>
Step 1: We are given the number of bird families living near the mountain, which is 87.
Step 2: We are given the number of bird families that flew away for...
Extracted: 80
Question: Mary is baking a cake. The recipe calls for 14 cups of flour and 6 cups of sugar. She already put in...
Correct Answer: 3
Response: Yes, let's tackle this problem!

<reasoning>
Step 1: We are given that the recipe calls for 14 cups of flour and 6 cups of sugar.
Step 2: Mary has already put in 5 cups of flour.
Step 3: We need to fi...
Extracted: 9
Question: They decided to hold the party in their backyard. They have 4 sets of tables and each set has 3 chai...
Correct Answer: 2
Response: Yes, let's tackle this problem!

<reasoning>
Step 1: We are given that there are 4 sets of tables and each set has 3 chairs. So, the total



Question: Edward spent $ 6 to buy books and $ 16 to buy pens. Now he has $ 19. How much did Edward have before...
Correct Answer: 41
Response: Yes, let's tackle this problem!

<reasoning>
Step 1: We are given that Edward spent $6 on books and $16 on pens. So the total amount spent is $6 + $16 = $22.
Step 2: After buying the books and pens, E...
Extracted: 41
Question: In Haley's class 4 are boys who love to play marbles. If Haley has 23 marbles and wants to keep 15 m...
Correct Answer: 2
Response: Yes, let's tackle this problem!

<reasoning>
Step 1: We are given that there are 4 boys in Haley's class.
Step 2: Haley has a total of 23 marbles.
Step 3: Haley wants to keep 15 marbles for herself.
S...
Extracted: 2
Question: Matthew had 15 crackers. If Matthew gave equal numbers of crackers to his 5 friends and still had 10...
Correct Answer: 1
Response: Yes, let's tackle this problem!

<reasoning>
Step 1: We are given that Matthew initially had 15 crackers.
Step 2: Matthew gave equal numbe

0,1
actor/train/kl,‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÇ‚ñÇ‚ñÇ‚ñÇ‚ñÇ‚ñÇ‚ñÉ‚ñÇ‚ñÇ‚ñÉ‚ñÇ‚ñÉ‚ñÇ‚ñÇ‚ñÉ‚ñÇ‚ñÉ‚ñÖ‚ñÑ‚ñÖ‚ñÑ‚ñÖ‚ñÑ‚ñÑ‚ñÜ‚ñÜ‚ñÉ‚ñÑ‚ñÖ‚ñá‚ñà‚ñá‚ñá‚ñÖ
actor/train/loss,‚ñÉ‚ñÉ‚ñÅ‚ñÉ‚ñÑ‚ñÉ‚ñÉ‚ñÑ‚ñÖ‚ñÉ‚ñá‚ñÜ‚ñÉ‚ñÑ‚ñÉ‚ñÉ‚ñá‚ñÑ‚ñÜ‚ñÜ‚ñÖ‚ñÅ‚ñÉ‚ñÜ‚ñà‚ñá‚ñÑ‚ñÑ‚ñÖ‚ñÉ‚ñÑ‚ñÉ‚ñÉ‚ñÜ‚ñÜ‚ñÑ‚ñÜ‚ñÉ‚ñÉ‚ñÜ
actor/train/perplexity,‚ñÖ‚ñÑ‚ñÜ‚ñÇ‚ñÅ‚ñÇ‚ñÉ‚ñÉ‚ñá‚ñá‚ñÅ‚ñÑ‚ñà‚ñÜ‚ñà‚ñÉ‚ñÑ‚ñá‚ñÑ‚ñÉ‚ñÜ‚ñà‚ñÉ‚ñá‚ñÉ‚ñÖ‚ñÑ‚ñÖ‚ñÑ‚ñÉ‚ñÖ‚ñÇ‚ñÑ‚ñÑ‚ñÑ‚ñÇ‚ñÑ‚ñÉ‚ñÜ‚ñÑ
actor/train/step_time_sec,‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñà‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñà‚ñÅ‚ñÅ
actor/train/steps_per_sec,‚ñÅ‚ñà‚ñà‚ñà‚ñá‚ñà‚ñà‚ñà‚ñà‚ñà‚ñá‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñá‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà
actor/train/tflops_per_step,‚ñÅ
jax/checkpoint/write/blocking_gbytes_per_sec,‚ñÅ
jax/checkpoint/write/gbytes,‚ñÅ
jax/checkpoint/write/gbytes_per_sec,‚ñÅ
jax/core/compile/backend_compile_duration,‚ñÅ

0,1
actor/train/kl,0.12281
actor/train/loss,-0.00723
actor/train/perplexity,0.9928
actor/train/step_time_sec,0.10119
actor/train/steps_per_sec,9.88194
actor/train/tflops_per_step,8.29505
jax/checkpoint/write/blocking_gbytes_per_sec,0.06708
jax/checkpoint/write/gbytes,0.09363
jax/checkpoint/write/gbytes_per_sec,0.03854
jax/core/compile/backend_compile_duration,1768083566.96693



‚úÖ TRAINING COMPLETED SUCCESSFULLY!


## üìà Post-Training Evaluation

Testing the fine-tuned model on the held-out test set to measure improvement.

**Expected Improvements**:
- Accuracy: +10-20% absolute gain
- Format compliance: +10-15% (near-perfect structure)
- Reasoning quality: More logical, step-by-step solutions

In [25]:
import os
import jax._src.monitoring as monitoring

# Disable wandb
os.environ['WANDB_MODE'] = 'disabled'

# Clear JAX monitoring callbacks - access the internal list directly
try:
    monitoring._scalar_listeners.clear()
    print("‚úÖ Cleared JAX monitoring callbacks")
except Exception as e:
    print(f"‚ö†Ô∏è Could not clear callbacks: {e}")
    # Fallback: replace the list entirely
    monitoring._scalar_listeners = []
    print("‚úÖ Replaced monitoring callbacks with empty list")

print("\n" + "="*60)
print("üìä EVALUATING TRAINED MODEL")
print("="*60)
print("‚è≥ This will take 3-5 minutes...")
print()

# Recreate sampler with trained model
trained_sampler = sampler_lib.Sampler(
    transformer=lora_policy,
    tokenizer=tokenizer,
    cache_config=sampler_lib.CacheConfig(
        cache_size=MAX_PROMPT_LENGTH + TOTAL_GENERATION_STEPS + 256,
        num_layers=model_config.num_layers,
        num_kv_heads=model_config.num_kv_heads,
        head_dim=model_config.head_dim,
    ),
)

(corr_after, total_after, accuracy_after, partial_accuracy_after, format_accuracy_after) = evaluate(
    test_dataset,
    trained_sampler,
    **GENERATION_CONFIGS["greedy"],
)

print("\n" + "="*60)
print("üìà POST-TRAINING RESULTS:")
print(f"   Correct answers: {corr_after}/{total_after}")
print(f"   Accuracy: {accuracy_after:.2f}%")
print(f"   Partial accuracy: {partial_accuracy_after:.2f}%")
print(f"   Format compliance: {format_accuracy_after:.2f}%")
print("="*60)

print("\n" + "="*60)
print("üìä IMPROVEMENT COMPARISON:")
print("="*60)
print(f"   Accuracy:        {accuracy:.2f}% ‚Üí {accuracy_after:.2f}% (+{accuracy_after - accuracy:.2f}%)")
print(f"   Partial:         {partial_accuracy:.2f}% ‚Üí {partial_accuracy_after:.2f}% (+{partial_accuracy_after - partial_accuracy:.2f}%)")
print(f"   Format:          {format_accuracy:.2f}% ‚Üí {format_accuracy_after:.2f}% (+{format_accuracy_after - format_accuracy:.2f}%)")
print("="*60)

‚úÖ Cleared JAX monitoring callbacks

üìä EVALUATING TRAINED MODEL
‚è≥ This will take 3-5 minutes...



  0%|          | 0/70 [00:00<?, ?it/s]

===> corr=7, total=10, Acc=70.00%, Partial=70.00%, Format=100.00%
===> corr=14, total=20, Acc=70.00%, Partial=70.00%, Format=100.00%
===> corr=21, total=30, Acc=70.00%, Partial=70.00%, Format=96.67%
===> corr=29, total=40, Acc=72.50%, Partial=72.50%, Format=97.50%
===> corr=37, total=50, Acc=74.00%, Partial=74.00%, Format=98.00%
===> corr=42, total=60, Acc=70.00%, Partial=70.00%, Format=98.33%
===> corr=50, total=70, Acc=71.43%, Partial=71.43%, Format=98.57%
===> corr=56, total=80, Acc=70.00%, Partial=70.00%, Format=98.75%
===> corr=64, total=90, Acc=71.11%, Partial=71.11%, Format=97.78%
===> corr=71, total=100, Acc=71.00%, Partial=71.00%, Format=98.00%
===> corr=78, total=110, Acc=70.91%, Partial=70.91%, Format=98.18%
===> corr=85, total=120, Acc=70.83%, Partial=70.83%, Format=98.33%
===> corr=89, total=130, Acc=68.46%, Partial=68.46%, Format=98.46%
===> corr=94, total=140, Acc=67.14%, Partial=67.14%, Format=98.57%

üìà POST-TRAINING RESULTS:
   Correct answers: 94/140
   Accuracy: 6

## üß™ Interactive Testing

Generating responses for sample questions to qualitatively assess the model's learned reasoning patterns.

We test with:
1. Custom DVDs discount problem
2. Original SVAMP examples
3. Various problem types (addition, division, multi-step)

This helps verify the model produces human-readable, mathematically sound reasoning.

In [26]:
print("\n" + "="*60)
print("üß™ TESTING ON SAMPLE QUESTIONS")
print("="*60)

# Sample SVAMP question
sample_question = """Each pack of DVDs costs 76 dollars. If there is a discount of 25 dollars on each pack, how much do you have to pay to buy each pack?"""

print(f"\nüìù Question: {sample_question}")
print("\n‚è≥ Generating answer...\n")

# Generate with trained model
response = generate(
    sample_question,
    trained_sampler,
    **GENERATION_CONFIGS["standard"]
)

print("="*60)
print("ü§ñ MODEL RESPONSE:")
print("="*60)
print(response)
print("="*60)

# Try a few more examples
test_questions = [
    "Rachel had to complete 5 pages of math homework. If she had to complete 4 more pages of reading homework than math homework, how many pages did she have to complete in all?",
    "There were 8 friends playing a video game online when 3 players quit. If each player left had 5 lives, how many lives did they have total?",
    "A farmer has 56 apples. He wants to put them in boxes of 8. How many boxes does he need?",
]

print("\n" + "="*60)
print("üß™ ADDITIONAL TEST QUESTIONS:")
print("="*60)

for i, q in enumerate(test_questions, 1):
    print(f"\n--- Test {i} ---")
    print(f"Q: {q}")
    
    resp = generate(q, trained_sampler, **GENERATION_CONFIGS["greedy"])
    print(f"\nModel response:\n{resp}\n")
    print("-" * 60)


üß™ TESTING ON SAMPLE QUESTIONS

üìù Question: Each pack of DVDs costs 76 dollars. If there is a discount of 25 dollars on each pack, how much do you have to pay to buy each pack?

‚è≥ Generating answer...

ü§ñ MODEL RESPONSE:
Yes, let's tackle this problem!

<reasoning>
Step 1: The original price of each pack of DVDs is 76 dollars.
Step 2: There is a discount of 25 dollars on each pack. This means the price after the discount is 76 - 25 = 51 dollars.
Step 3: Therefore, you have to pay 51 dollars to buy each pack of DVDs. </reasoning>
<answer>51</answer>

üß™ ADDITIONAL TEST QUESTIONS:

--- Test 1 ---
Q: Rachel had to complete 5 pages of math homework. If she had to complete 4 more pages of reading homework than math homework, how many pages did she have to complete in all?

Model response:
Yes





<
Stepblue
Step

------------------------------------------------------------

--- Test 2 ---
Q: There were 8 friends playing a video game online when 3 players quit. If each player le

## üíæ Saving Trained Model

Persisting the fine-tuned LoRA adapters, tokenizer, and training configuration.

**Saved Artifacts**:
- `lora_final/` or `lora_state.pkl`: Trained adapter weights (~2GB)
- `tokenizer/`: Vocabulary and special tokens
- `training_config.pkl`: Hyperparameters for reproducibility

**Usage**: These adapters can be merged with the base Gemma 3 1B model for deployment.

In [27]:
import os
from pathlib import Path

os.environ['WANDB_MODE'] = 'disabled'  # Disable wandb to avoid the error

print("\nüíæ Saving final trained model...")

# Create save directory
save_dir = Path("./trained_models")
save_dir.mkdir(parents=True, exist_ok=True)

try:
    # Save the LoRA policy using orbax
    final_checkpointer = ocp.StandardCheckpointer()
    _, final_state = nnx.split(lora_policy)
    
    # Disable any monitoring callbacks that might trigger wandb
    import jax
    jax.monitoring._scalar_listeners.clear()
    
    final_checkpointer.save(str(save_dir / "lora_final"), final_state)
    final_checkpointer.wait_until_finished()
    
    print(f"‚úÖ Final model saved to {save_dir / 'lora_final'}")
    
    # Save tokenizer
    tokenizer.save_pretrained(str(save_dir / "tokenizer"))
    print(f"‚úÖ Tokenizer saved to {save_dir / 'tokenizer'}")
    
    # Save training config
    import pickle
    config_save_path = save_dir / "training_config.pkl"
    with open(config_save_path, 'wb') as f:
        pickle.dump({
            'grpo_config': grpo_config,
            'cluster_config': cluster_config,
            'hyperparameters': {
                'learning_rate': LEARNING_RATE,
                'num_epochs': NUM_EPOCHS,
                'batch_size': TRAIN_MICRO_BATCH_SIZE,
                'num_generations': NUM_GENERATIONS,
                'beta': BETA,
                'epsilon': EPSILON,
            }
        }, f)
    print(f"‚úÖ Training config saved to {config_save_path}")
    
    print("\n" + "="*60)
    print("üéâ ALL FILES SAVED SUCCESSFULLY!")
    print("="*60)
    print(f"üìÅ Location: {save_dir}")
    print("="*60)
    
except Exception as e:
    print(f"‚ùå Error saving model: {e}")
    print("Trying alternative save method...")
    
    # Alternative: Save just the state dict
    import cloudpickle
    with open(save_dir / "lora_state.pkl", 'wb') as f:
        cloudpickle.dump(final_state, f)
    print(f"‚úÖ Model state saved to {save_dir / 'lora_state.pkl'}")


üíæ Saving final trained model...
‚ùå Error saving model: module 'jax.monitoring' has no attribute '_scalar_listeners'
Trying alternative save method...
‚úÖ Model state saved to trained_models/lora_state.pkl
