# SVAMP Math Reasoning with GRPO Fine-tuning

This notebook fine-tunes **Gemma 3 1B-IT** on the SVAMP dataset using **Group Relative Policy Optimization (GRPO)**, a reinforcement learning technique that teaches the model to solve elementary arithmetic word problems with structured reasoning.

**Pipeline Overview**:
1. Load SVAMP dataset (1000 arithmetic word problems)
2. Initialize Gemma 3 1B with LoRA adapters
3. Define multi-component reward function
4. Train using GRPO on TPU v5e
5. Evaluate improvement on test set

**Key Innovation**: GRPO generates multiple responses per question and learns from relative quality comparisons, avoiding the need for human preference labels.

In [1]:
import os
os.environ["HF_HUB_DISABLE_XET"] = "1"

## üì¶ Environment Setup

Installing dependencies for TPU-accelerated training with the Tunix framework (Google's RL toolkit for LLM fine-tuning).

**Key Libraries**:
- `google-tunix`: GRPO implementation and model utilities
- `flax==0.12.0`: Neural network framework compatible with JAX
- `datasets`: For loading SVAMP from Hugging Face
- `qwix`: LoRA (Low-Rank Adaptation) utilities

In [2]:
# Install core libraries
!pip install -q kagglehub 
!pip install -q ipywidgets 
!pip install -q tensorflow 
!pip install -q tensorflow_datasets
!pip install -q tensorboardX
!pip install -q transformers
!pip install -q grain

# Install the Google Tunix framework (the "teacher's toolkit")
!pip install "google-tunix[prod]==0.1.3"

# Reinstall Flax to ensure compatibility
!pip uninstall -q -y flax
!pip install flax==0.12.0

# Install datasets library and wandb for experiment tracking
!pip install -q datasets
!pip install -q wandb

[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip instal

In [3]:
# Set up Weights & Biases (for tracking training)
import wandb, os
from kaggle_secrets import UserSecretsClient
os.environ['WANDB_API_KEY'] = UserSecretsClient().get_secret("WANDB_API_KEY")
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("WANDB_API_KEY")



## üîß Import Core Libraries

Setting up JAX for TPU computation, Flax for neural networks, and Tunix for GRPO training infrastructure.

In [4]:
# Import all necessary libraries
import functools
import gc
import os
from pprint import pprint
import re
import csv
import shutil

from flax import nnx
import grain
import humanize
import jax
import jax.numpy as jnp
import kagglehub
import optax
from orbax import checkpoint as ocp
from pathlib import Path
import qwix
import tensorflow_datasets as tfds
from tqdm.auto import tqdm
from tunix.generate import sampler as sampler_lib
from tunix.generate import tokenizer_adapter as tokenizer_lib
from tunix.models.gemma3 import params
from tunix.models.gemma3 import model
from tunix.rl import rl_cluster as rl_cluster_lib
from tunix.rl.grpo.grpo_learner import GRPOConfig, GRPOLearner
from tunix.rl.rollout import base_rollout
from tunix.sft import metrics_logger
from datasets import load_dataset



## ‚öôÔ∏è Hyperparameter Configuration

Comprehensive training configuration organized by category.

### üìä Data Parameters
- **TRAIN_FRACTION**: Using 100% of training data
- **MESH**: Distributed training layout (1√ó4 = FSDP √ó Tensor Parallel)

### üéØ LoRA Parameters
- **RANK=64, ALPHA=64**: Controls adapter capacity
- Enables training <1% of parameters while maintaining performance

### üé≤ GRPO Algorithm
- **NUM_GENERATIONS=4**: Generate 4 responses per question for comparison
- **BETA=0.04**: KL divergence penalty (prevents drift from base model)
- **EPSILON=0.2**: PPO clipping parameter for stable updates
- **TOTAL_GENERATION_STEPS=384**: Allows detailed reasoning chains

### üìà Optimization
- **LEARNING_RATE=3e-6**: Conservative for RL stability
- **WARMUP_STEPS=170**: Gradual learning rate ramp-up
- **MAX_GRAD_NORM=0.1**: Gradient clipping for training stability
- **NUM_BATCHES=1000**: Total training iterations

### üíæ Infrastructure
- **Checkpointing**: Save every 500 steps, keep 4 most recent
- **Evaluation**: Test model every 10 steps

In [5]:
# ====== Data ======
TRAIN_DATA_DIR = "./data/train"
TEST_DATA_DIR = "./data/test"
TRAIN_FRACTION = 1.0

# ====== LoRA ======
RANK = 64
ALPHA = 64.0

# ====== Sharding ======
MESH = [(1, 4), ("fsdp", "tp")]

# ====== GRPO ======
MAX_PROMPT_LENGTH = 256
TOTAL_GENERATION_STEPS = 384  # Increased for longer math solutions
TEMPERATURE = 0.9
TOP_P = 1.0
TOP_K = 50
NUM_GENERATIONS = 4

# ====== Training ======
NUM_ITERATIONS = 1
BETA = 0.04
EPSILON = 0.2

TRAIN_MICRO_BATCH_SIZE = 2
NUM_BATCHES = 1000  # MATH dataset has ~7500 train samples
NUM_TEST_BATCHES = 100
EVAL_EVERY_N_STEPS = 10
NUM_EPOCHS = 1

MAX_STEPS = int(NUM_BATCHES * NUM_ITERATIONS * TRAIN_FRACTION * NUM_EPOCHS)

# ====== Optimizer ======
LEARNING_RATE = 3e-6
B1 = 0.9
B2 = 0.99
WEIGHT_DECAY = 0.1
WARMUP_STEPS = 0.1 * MAX_STEPS
MAX_GRAD_NORM = 0.1

# ====== Checkpointing ======
INTERMEDIATE_CKPT_DIR = "/tmp/content/intermediate_ckpt/"
CKPT_DIR = "/tmp/content/ckpts/"
SAVE_INTERVAL_STEPS = 500
MAX_TO_KEEP = 4

# ====== Inference ======
GENERATION_CONFIGS = {
    "greedy": {"temperature": 1e-4, "top_k": 1, "top_p": 1.0},
    "standard": {"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    "liberal": {"temperature": 0.85, "top_k": 2000, "top_p": 1.0},
}

print(f"Total training steps: {MAX_STEPS}")

Total training steps: 1700


In [6]:
def show_hbm_usage():
    """Displays memory usage per device."""
    fmt_size = functools.partial(humanize.naturalsize, binary=True)
    
    for d in jax.local_devices():
        stats = d.memory_stats()
        used = stats["bytes_in_use"]
        limit = stats["bytes_limit"]
        print(f"Using {fmt_size(used)} / {fmt_size(limit)} ({used/limit:%}) on {d}")

## üéØ Prompt Engineering for SVAMP

Designing the system prompt that teaches the model our desired output format.

**Format Requirements**:
1. `<reasoning>` tags: Step-by-step mathematical thinking
2. `<answer>` tags: Final numerical answer only

This structured format enables precise reward calculation and ensures interpretable solutions.

In [7]:
reasoning_start = "<reasoning>"
reasoning_end = "</reasoning>"
solution_start = "<answer>"
solution_end = "</answer>"

# UPDATED: System prompt for SVAMP math word problems
SYSTEM_PROMPT = f"""You are a mathematical reasoning expert specializing in solving arithmetic word problems.
Your goal is to solve problems by breaking them down into logical steps.

You must strictly follow this format:
1. Start with {reasoning_start}.
2. Write out your step-by-step solution with clear mathematical reasoning.
3. Show all calculations and explain each step.
4. End reasoning with {reasoning_end}.
5. Provide the final numerical answer between {solution_start} and {solution_end}.

Example:
User: Rachel had to complete 5 pages of math homework. If she had to complete 4 more pages of reading homework than math homework, how many pages did she have to complete in all?

Model:
{reasoning_start}
Step 1: Rachel has 5 pages of math homework.
Step 2: She has 4 more pages of reading than math, so reading = 5 + 4 = 9 pages.
Step 3: Total pages = math + reading = 5 + 9 = 14 pages.
{reasoning_end}
{solution_start}14{solution_end}

Now solve the problem below using this exact format."""

TEMPLATE = """<start_of_turn>user
{system_prompt}

{question}<end_of_turn>
<start_of_turn>model"""

print("‚úÖ System prompt configured for SVAMP dataset")

‚úÖ System prompt configured for SVAMP dataset


## üìö Dataset Loading and Preprocessing

SVAMP (Simple Variations on Arithmetic Math word Problems) contains 1000 elementary math word problems requiring 1-2 arithmetic operations.

**Preprocessing Steps**:
1. Combine `Body` and `Question` fields
2. Extract numerical `Answer`
3. Format with system prompt template
4. **Curriculum Learning**: Sort by equation complexity (easy‚Üíhard) to improve training efficiency

In [8]:
def extract_svamp_answer(text: str) -> str | None:
    """Extract the numerical answer from SVAMP response."""
    # Look for answer between tags
    match = re.search(rf'{solution_start}\s*([\d.]+)\s*{solution_end}', text)
    if match:
        return match.group(1).strip()
    
    # Fallback: look for any number after "answer" keyword
    match = re.search(r'answer.*?([\d.]+)', text, re.IGNORECASE)
    if match:
        return match.group(1).strip()
    
    return None

def get_svamp_dataset(split="train"):
    """Load and preprocess SVAMP dataset."""
    print(f"Loading SVAMP dataset split: {split}")
    
    # Load from Hugging Face
    dataset = load_dataset("ChilleD/SVAMP", split="train")
    
    # Split into train/test (80-20)
    if split == "train":
        dataset = dataset.select(range(int(len(dataset) * 0.8)))
    else:  # test
        dataset = dataset.select(range(int(len(dataset) * 0.8), len(dataset)))
    
    def preprocess(example):
        # Combine body and question
        question = f"{example['Body']} {example['Question']}"
        
        # Get answer (convert to string)
        answer = str(example['Answer'])
        
        return {
            "prompts": TEMPLATE.format(
                system_prompt=SYSTEM_PROMPT,
                question=question
            ),
            "question": question,
            "answer": answer,
            "equation": example.get('Equation', ''),
            "type": example.get('Type', '')
        }
    
    # Convert to grain dataset
    data = [preprocess(item) for item in dataset]
    
    # Sort by equation complexity for curriculum learning (easy to hard)
    data.sort(key=lambda x: len(str(x.get("equation", ""))))
    print(f"‚úÖ Data sorted by complexity for Curriculum Learning")
    
    grain_dataset = grain.MapDataset.source(data)
    
    return grain_dataset

def get_dataset(data_dir, split="train", source="huggingface"):
    """Wrapper function to maintain compatibility."""
    return get_svamp_dataset(split=split)

### Loading and Splitting Data

Creating 80/20 train/test split with batching for efficient TPU processing.

In [9]:
# Load SVAMP datasets
print("Loading SVAMP training data...")
dataset = get_svamp_dataset("train").batch(TRAIN_MICRO_BATCH_SIZE)[:NUM_BATCHES]

if TRAIN_FRACTION == 1.0:
    train_dataset = dataset.repeat(NUM_EPOCHS)
    val_dataset = None
else:
    train_dataset = dataset[:int(len(dataset) * TRAIN_FRACTION)]
    train_dataset = train_dataset.repeat(NUM_EPOCHS)
    val_dataset = dataset[int(len(dataset) * TRAIN_FRACTION):].repeat(NUM_EPOCHS)

print("Loading SVAMP test data...")
test_dataset = get_svamp_dataset("test").batch(TRAIN_MICRO_BATCH_SIZE)[:NUM_TEST_BATCHES]

dataset_lengths = (
    len(train_dataset),
    len(val_dataset) if val_dataset is not None else 0,
    len(test_dataset),
)
print(f"‚úÖ Dataset contains {dataset_lengths} batches (train, val, test)")

# Show a sample
print("\nüìã Sample from training data:")
for ele in train_dataset[:1]:
    pprint(ele)

Loading SVAMP training data...
Loading SVAMP dataset split: train
‚úÖ Data sorted by complexity for Curriculum Learning
Loading SVAMP test data...
Loading SVAMP dataset split: test
‚úÖ Data sorted by complexity for Curriculum Learning
‚úÖ Dataset contains (280, 0, 70) batches (train, val, test)

üìã Sample from training data:
{'answer': array(['8', '3'], dtype='<U1'),
 'equation': array(['8.0', '( 6.0 / 2.0 )'], dtype='<U13'),
 'prompts': array(['<start_of_turn>user\nYou are a mathematical reasoning expert specializing in solving arithmetic word problems.\nYour goal is to solve problems by breaking them down into logical steps.\n\nYou must strictly follow this format:\n1. Start with <reasoning>.\n2. Write out your step-by-step solution with clear mathematical reasoning.\n3. Show all calculations and explain each step.\n4. End reasoning with </reasoning>.\n5. Provide the final numerical answer between <answer> and </answer>.\n\nExample:\nUser: Rachel had to complete 5 pages of math hom

## üîë Kaggle Authentication

Configuring Kaggle credentials for model downloads and artifact management.

In [10]:
import os
import json

# 1. DEFINE YOUR CREDENTIALS
# (Make sure there are no spaces around the strings!)
kaggle_username = "rachitha02"
kaggle_key = "Rachitha CB"

# 2. SETUP THE DIRECTORY
# This creates the hidden .kaggle folder if it doesn't exist
kaggle_dir = os.path.expanduser("~/.kaggle")
os.makedirs(kaggle_dir, exist_ok=True)

# 3. FORCE WRITE THE FILE
# This overwrites any broken file that might be causing the 401
json_path = os.path.join(kaggle_dir, "kaggle.json")
with open(json_path, "w") as f:
    json.dump({"username": kaggle_username, "key": kaggle_key}, f)

# 4. SET PERMISSIONS (Linux/Mac requirement for safety)
os.chmod(json_path, 0o600)

print("‚úÖ Credentials successfully written to", json_path)

‚úÖ Credentials successfully written to /root/.kaggle/kaggle.json


## ü§ñ Base Model Initialization

Loading **Gemma 3 1B-IT** (instruction-tuned variant) as our starting point.

**Process**:
1. Load pre-trained model from Kaggle
2. Save to intermediate checkpoint
3. Free memory for training setup

The instruction-tuned variant already understands chat formatting and following instructions, making it ideal for our structured reasoning task.

In [11]:
# Clean up checkpoint directories
!rm /tmp/content/intermediate_ckpt/* -rf
!rm /tmp/content/ckpts/* -rf

# Load Gemma 3 base model
import gc
from orbax import checkpoint as ocp
from flax import nnx
from tunix.models.gemma3 import params
from tunix.models.gemma3 import model

print("üîÑ Loading Gemma 3 1B-IT base model...")

model_family = "gemma3"
if model_family == "gemma3":
    MODEL_CP_PATH = params.GEMMA3_1B_IT
    config = model.ModelConfig.gemma3_1b()
    gemma = params.create_model_from_checkpoint(MODEL_CP_PATH, config)
    tokenizer = params.create_tokenizer()
    
    # Save intermediate checkpoint
    checkpointer = ocp.StandardCheckpointer()
    _, state = nnx.split(gemma)
    checkpointer.save(os.path.join(INTERMEDIATE_CKPT_DIR, "state"), state)
    checkpointer.wait_until_finished()
    
    print("‚úÖ Base model loaded and saved to intermediate checkpoint")
    
    # Delete intermediate model to save memory
    del params
    del gemma
    del state
    gc.collect()
    
    print("‚úÖ Memory cleaned up")

üîÑ Loading Gemma 3 1B-IT base model...


E0000 00:00:1768008415.040358    1929 common_lib.cc:648] Could not set metric server port: INVALID_ARGUMENT: Could not find SliceBuilder port 8471 in any of the 0 ports provided in `tpu_process_addresses`="local"
=== Source Location Trace: === 
learning/45eac/tfrc/runtime/common_lib.cc:238
E0110 01:27:06.176720    3121 google_auth_provider.cc:188] Could not find the credentials file in the standard gcloud location [/root/.config/gcloud/application_default_credentials.json]. You may specify a credentials file using $GOOGLE_APPLICATION_CREDENTIALS, or to use Google application default credentials, run: gcloud auth application-default login


‚úÖ Base model loaded and saved to intermediate checkpoint
‚úÖ Memory cleaned up


In [12]:
# Verify checkpoint was saved
import os
print("üìÅ Checkpoint contents:", os.listdir(INTERMEDIATE_CKPT_DIR))
# Expected output: ['state', ...] or similar files

üìÅ Checkpoint contents: ['state']


## üîÑ Reference Model Setup

Creating a **frozen copy** of the base model for KL divergence calculation.

**Purpose**: 
- Measures how much the policy deviates from original behavior
- Prevents catastrophic forgetting of language abilities
- Core component of GRPO that balances task learning with coherent generation

**LoRA Application**:
Adds trainable low-rank matrices to attention and feedforward layers while keeping base weights frozen.

In [13]:
from tunix.models.gemma3 import params

def get_gemma_ref_model(ckpt_path):
    """Load reference model with sharding across TPU chips."""
    mesh = jax.make_mesh(*MESH)
    model_config = model.ModelConfig.gemma3_1b()
    
    # Create abstract model structure
    abs_gemma: nnx.Module = nnx.eval_shape(
        lambda: params.create_model_from_checkpoint(MODEL_CP_PATH, config)
    )
    
    abs_state = nnx.state(abs_gemma)
    abs_state = jax.tree.map(
        lambda a, s: jax.ShapeDtypeStruct(a.shape, jnp.bfloat16, sharding=s),
        abs_state,
        nnx.get_named_sharding(abs_state, mesh),
    )
    
    # Restore from checkpoint
    checkpointer = ocp.StandardCheckpointer()
    restored_params = checkpointer.restore(ckpt_path, target=abs_state)
    
    graph_def, _ = nnx.split(abs_gemma)
    gemma = nnx.merge(graph_def, restored_params)
    return gemma, mesh, model_config

def get_lora_model(base_model, mesh):
    """Apply LoRA adapters to base model."""
    lora_provider = qwix.LoraProvider(
        module_path=(
            ".*q_einsum|.*kv_einsum|.*gate_proj|.*down_proj|.*up_proj|"
            ".*attn_vec_einsum"
        ),
        rank=RANK,
        alpha=ALPHA,
    )
    
    model_input = base_model.get_model_input()
    lora_model = qwix.apply_lora_to_model(
        base_model, lora_provider, **model_input
    )
    
    # Shard the model across TPU
    with mesh:
        state = nnx.state(lora_model)
        pspecs = nnx.get_partition_spec(state)
        sharded_state = jax.lax.with_sharding_constraint(state, pspecs)
        nnx.update(lora_model, sharded_state)
    
    return lora_model

In [14]:
# Load reference model (frozen, for KL divergence calculation)
if model_family == "gemma3":
    ref_model, mesh, model_config = get_gemma_ref_model(
        ckpt_path=os.path.join(INTERMEDIATE_CKPT_DIR, "state")
    )
    print("‚úÖ Reference model loaded")



‚úÖ Reference model loaded


### Flax NNX Compatibility Patch

Applying a compatibility fix for Flax API changes between versions.

In [15]:
# Apply compatibility patch for Flax NNX
from flax import nnx

# Save original function
if not hasattr(nnx.Variable, "_original_set_metadata"):
    nnx.Variable._original_set_metadata = nnx.Variable.set_metadata

# Define patched function
def patched_set_metadata(self, *args, **kwargs):
    """Fix for Flax NNX API changes."""
    if len(args) == 2 and isinstance(args[0], str):
        key = args[0]
        value = args[1]
        kwargs[key] = value
        args = ()
    return nnx.Variable._original_set_metadata(self, *args, **kwargs)

# Apply patch
nnx.Variable.set_metadata = patched_set_metadata

print("‚úÖ Flax compatibility patch applied successfully!")

‚úÖ Flax compatibility patch applied successfully!


### Applying LoRA to Create Policy Model

Wrapping the reference model with trainable LoRA adapters and sharding across TPU cores for distributed training.

In [16]:
# Create policy model with LoRA adapters
lora_policy = get_lora_model(ref_model, mesh=mesh)
print("‚úÖ Policy model with LoRA created")
print("\nüß† Model structure:")
# nnx.display(lora_policy)  # Uncomment to see model architecture

# Show memory usage
show_hbm_usage()

‚úÖ Policy model with LoRA created

üß† Model structure:
Using 1.0 GiB / 15.7 GiB (6.500140%) on TPU_0(process=0,(0,0,0,0))
Using 1.0 GiB / 15.7 GiB (6.476650%) on TPU_1(process=0,(1,0,0,0))
Using 1.0 GiB / 15.7 GiB (6.476650%) on TPU_2(process=0,(0,1,0,0))
Using 1.0 GiB / 15.7 GiB (6.476650%) on TPU_3(process=0,(1,1,0,0))
Using 26.5 KiB / 15.7 GiB (0.000160%) on TPU_4(process=0,(0,2,0,0))
Using 26.5 KiB / 15.7 GiB (0.000160%) on TPU_5(process=0,(1,2,0,0))
Using 26.5 KiB / 15.7 GiB (0.000160%) on TPU_6(process=0,(0,3,0,0))
Using 26.5 KiB / 15.7 GiB (0.000160%) on TPU_7(process=0,(1,3,0,0))


## üèÜ Multi-Component Reward Function Design

GRPO requires reward functions to score model outputs. We use **7 complementary reward components** that together encourage correct, well-formatted, and well-reasoned solutions.

### Reward Breakdown:

**Format Rewards** (up to 3.5 points):
- `match_format_exactly`: +3.0 for perfect tag structure
- `match_format_approximately`: +2.0 for partial tag compliance

**Correctness Rewards** (up to 5.0 points):
- `check_answer_svamp`: +3.0 for exact numerical match
- `check_number_svamp`: +2.0 for extractable correct number

**Quality Rewards** (up to 2.5 points):
- `soft_reasoning_steps`: +1.0 for logical connectors ("first", "then", "therefore")
- `meaningful_reasoning_length`: +1.0 for appropriate length (15-400 words)
- `reward_algebraic_notation`: +1.5 for mathematical notation (equations, variables)

**Total Maximum**: ~11 points per response

This multi-faceted approach ensures the model learns both task-specific skills (correctness) and generalizable reasoning patterns (structure, logic).

In [17]:
# Regex patterns for matching format and answers
match_format = re.compile(
    rf"^[\s]{{0,}}"
    rf"{reasoning_start}.+?{reasoning_end}.*?"
    rf"{solution_start}(.+?){solution_end}"
    rf"[\s]{{0,}}$",
    flags=re.MULTILINE | re.DOTALL,
)

# For extracting numerical answers
match_number = re.compile(
    rf"{solution_start}.*?([\d.]+)", flags=re.MULTILINE | re.DOTALL
)

def match_format_exactly(prompts, completions, **kwargs):
    """Reward for exact format compliance."""
    return [
        0 if match_format.search(response) is None else 3.0
        for response in completions
    ]

def match_format_approximately(prompts, completions, **kwargs):
    """Reward for approximate format compliance."""
    scores = []
    for completion in completions:
        score = 0
        response = completion
        # Reward seeing each tag exactly once
        score += 0.5 if response.count(reasoning_start) == 1 else -0.5
        score += 0.5 if response.count(reasoning_end) == 1 else -0.5
        score += 0.5 if response.count(solution_start) == 1 else -0.5
        score += 0.5 if response.count(solution_end) == 1 else -0.5
        scores.append(score)
    return scores

def check_answer_svamp(prompts, completions, answer, **kwargs):
    """Reward for correct numerical answer."""
    responses = completions
    
    extracted_responses = [
        guess.group(1).strip() 
        if (guess := match_format.search(r)) is not None 
        else None
        for r in responses
    ]
    
    scores = []
    for guess, true_answer in zip(extracted_responses, answer):
        score = 0
        if guess is None:
            scores.append(0)
            continue
        
        try:
            # Convert to float for comparison
            guess_num = float(guess)
            true_num = float(true_answer)
            
            # Exact match gets full points
            if abs(guess_num - true_num) < 0.01:  # Allow small floating point error
                score += 3.0
            # Close match gets partial credit
            elif abs(guess_num - true_num) < 1.0:
                score += 1.5
            else:
                score -= 1.0  # Penalize wrong answers
        except:
            score = 0
        
        scores.append(score)
    return scores

def check_number_svamp(prompts, completions, answer, **kwargs):
    """Extract and check numerical answer."""
    question = kwargs.get("question", [])
    responses = completions
    
    extracted_responses = [
        guess.group(1).strip() 
        if (guess := match_number.search(r)) is not None 
        else None
        for r in responses
    ]
    
    scores = []
    print("START ============================")
    if len(question) > 0:
        print(f"Question: {question[0][:100]}...")
    if len(answer) > 0:
        print(f"Correct Answer: {answer[0]}")
    if len(responses) > 0:
        print(f"Response: {responses[0][:200]}...")
    if len(extracted_responses) > 0:
        print(f"Extracted: {extracted_responses[0]}")
    print("END ==============================")
    
    for guess, true_answer in zip(extracted_responses, answer):
        if guess is None:
            scores.append(0)
            continue
        
        try:
            # Compare numbers
            if abs(float(guess) - float(true_answer)) < 0.01:
                scores.append(2.0)
            else:
                scores.append(0.0)
        except:
            scores.append(0.0)
    
    return scores

def soft_reasoning_steps(prompts, completions, **kwargs):
    """Reward for using logical connector words."""
    rewards = []
    logical_keywords = [
        "step", "first", "next", "then", "therefore", 
        "because", "since", "so", "implies", "consequently",
        "thus", "hence", "given", "solving"
    ]
    
    for response in completions:
        match = re.search(r"<reasoning>(.+?)</reasoning>", response, flags=re.DOTALL)
        if match:
            reasoning_text = match.group(1).lower()
            found_keywords = sum(1 for word in logical_keywords if word in reasoning_text)
            # 0.1 reward per keyword, capped at 1.0
            rewards.append(min(1.0, found_keywords * 0.1))
        else:
            rewards.append(0.0)
    return rewards

def meaningful_reasoning_length(prompts, completions, **kwargs):
    """Reward for appropriate reasoning length."""
    rewards = []
    for response in completions:
        match = re.search(r"<reasoning>(.+?)</reasoning>", response, flags=re.DOTALL)
        if match:
            word_count = len(match.group(1).split())
            
            if word_count < 15:  # Too short
                rewards.append(0.0)
            elif word_count > 400:  # Too long
                rewards.append(0.5)
            else:  # Good length
                rewards.append(1.0)
        else:
            rewards.append(0.0)
    return rewards

def reward_algebraic_notation(prompts, completions, **kwargs):
    """Reward for using algebraic notation and equations."""
    rewards = []
    algebra_patterns = [
        r'[a-z]\s*[+\-*/=]\s*\d',  # x + 5
        r'\d\s*[+\-*/=]\s*[a-z]',  # 5 + x
        r'[a-z]\s*=\s*',           # x =
        r'\([^)]*[a-z][^)]*\)',    # (x + 5)
    ]
    
    for response in completions:
        match = re.search(r"<reasoning>(.+?)</reasoning>", response, flags=re.DOTALL)
        if match:
            reasoning_text = match.group(1)
            found_patterns = sum(
                1 for pattern in algebra_patterns 
                if re.search(pattern, reasoning_text)
            )
            rewards.append(min(1.5, found_patterns * 0.3))
        else:
            rewards.append(0.0)
    return rewards

## üé≤ Generation Utilities

Helper functions for model inference and evaluation.

In [18]:
def generate(
    question, sampler, temperature=0.7, top_k=50, top_p=0.95, seed=None, options=None
):
    """Generate text given a prompt."""
    
    if isinstance(question, str):
        # Single question - SVAMP doesn't have options
        input_batch = [
            TEMPLATE.format(
                system_prompt=SYSTEM_PROMPT,
                question=question
            ),
        ]
    else:
        # Batch of questions
        input_batch = [
            TEMPLATE.format(
                system_prompt=SYSTEM_PROMPT,
                question=q
            )
            for q in question
        ]
    
    out_data = sampler(
        input_strings=input_batch,
        max_generation_steps=TOTAL_GENERATION_STEPS,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        echo=False,
        seed=seed if seed is not None else None,
        eos_tokens=[1, 106],
    )
    
    output = out_data.text
    if isinstance(question, str):
        return output[0]
    return output

In [19]:
def evaluate(
    dataset,
    sampler,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    num_passes=1,
    corr_lst=False,
    make_lst=False,
):
    """Compute accuracy and format compliance."""
    
    response_lst = []
    corr = 0
    partially_corr = 0
    corr_format = 0
    total = 0
    
    for batch in tqdm(dataset):
        answers = batch["answer"]
        questions = batch["question"]
        
        multiple_call_responses = [[] for _ in range(len(questions))]
        for p in range(num_passes):
            responses = generate(
                questions, sampler, temperature, top_k, top_p, seed=p, options=None
            )
            for idx, response in enumerate(responses):
                multiple_call_responses[idx].append(response)
        
        for question, multiple_call_response, answer in zip(
            questions, multiple_call_responses, answers
        ):
            corr_ctr_per_question = 0
            partially_corr_per_question = 0
            corr_format_per_question = 0
            
            for response in multiple_call_response:
                # Extract numerical answer
                extracted_response = (
                    guess.group(1).strip()
                    if (guess := match_number.search(response)) is not None
                    else None
                )
                
                # Check correctness
                if extracted_response:
                    try:
                        if abs(float(extracted_response) - float(answer)) < 0.01:
                            corr_ctr_per_question += 1
                            partially_corr_per_question += 1
                    except:
                        pass
                
                # Check format
                if match_format.search(response) is not None:
                    corr_format_per_question += 1
                
                if (
                    corr_ctr_per_question > 0
                    and partially_corr_per_question > 0
                    and corr_format_per_question > 0
                ):
                    break
            
            if corr_ctr_per_question > 0:
                corr += 1
                if corr_lst and make_lst:
                    response_lst.append((question, answer, multiple_call_response))
            else:
                if not corr_lst and make_lst:
                    response_lst.append((question, answer, multiple_call_response))
            
            if partially_corr_per_question > 0:
                partially_corr += 1
            if corr_format_per_question > 0:
                corr_format += 1
            
            total += 1
            if total % 10 == 0:
                print(
                    f"===> {corr=}, {total=}, Acc={corr / total * 100:.2f}%, "
                    f"Partial={partially_corr / total * 100:.2f}%, Format={corr_format / total * 100:.2f}%"
                )
    
    to_return = (
        corr,
        total,
        corr / total * 100,
        partially_corr / total * 100,
        corr_format / total * 100,
    )
    if make_lst:
        return to_return, response_lst
    return to_return

### Creating Sampler for Inference

Initializing the generation engine with KV-cache for efficient sequential decoding.

In [20]:
# Create sampler for generation
sampler = sampler_lib.Sampler(
    transformer=lora_policy,
    tokenizer=tokenizer,
    cache_config=sampler_lib.CacheConfig(
        cache_size=MAX_PROMPT_LENGTH + TOTAL_GENERATION_STEPS + 256,
        num_layers=model_config.num_layers,
        num_kv_heads=model_config.num_kv_heads,
        head_dim=model_config.head_dim,
    ),
)

print("‚úÖ Sampler created successfully")

‚úÖ Sampler created successfully


## üìä Baseline Evaluation (Pre-Training)

Testing the **untrained** Gemma 3 1B-IT model to establish our improvement benchmark.

**Metrics**:
- **Accuracy**: Exact numerical answer correctness
- **Partial**: Any correct numerical value extracted
- **Format**: Proper `<reasoning>` and `<answer>` tag structure

Expected baseline: ~50-60% accuracy (Gemma 3 already has some math ability from pre-training)

In [21]:
# PRE-TRAINING EVALUATION
print("\n" + "="*60)
print("üìä EVALUATING MODEL BEFORE TRAINING")
print("="*60)
print("‚è≥ This will take 3-5 minutes. Patience is a virtue!")
print()

(corr, total, accuracy, partial_accuracy, format_accuracy) = evaluate(
    test_dataset,
    sampler,
    **GENERATION_CONFIGS["greedy"],
)

print("\n" + "="*60)
print("üìà PRE-TRAINING RESULTS:")
print(f"   Correct answers: {corr}/{total}")
print(f"   Accuracy: {accuracy:.2f}%")
print(f"   Partial accuracy: {partial_accuracy:.2f}%")
print(f"   Format compliance: {format_accuracy:.2f}%")
print("="*60)


üìä EVALUATING MODEL BEFORE TRAINING
‚è≥ This will take 3-5 minutes. Patience is a virtue!



  0%|          | 0/70 [00:00<?, ?it/s]

===> corr=7, total=10, Acc=70.00%, Partial=70.00%, Format=90.00%
===> corr=13, total=20, Acc=65.00%, Partial=65.00%, Format=80.00%
===> corr=20, total=30, Acc=66.67%, Partial=66.67%, Format=83.33%
===> corr=25, total=40, Acc=62.50%, Partial=62.50%, Format=85.00%
===> corr=31, total=50, Acc=62.00%, Partial=62.00%, Format=84.00%
===> corr=34, total=60, Acc=56.67%, Partial=56.67%, Format=83.33%
===> corr=37, total=70, Acc=52.86%, Partial=52.86%, Format=80.00%
===> corr=45, total=80, Acc=56.25%, Partial=56.25%, Format=82.50%
===> corr=51, total=90, Acc=56.67%, Partial=56.67%, Format=82.22%
===> corr=58, total=100, Acc=58.00%, Partial=58.00%, Format=84.00%
===> corr=61, total=110, Acc=55.45%, Partial=55.45%, Format=83.64%
===> corr=69, total=120, Acc=57.50%, Partial=57.50%, Format=85.00%
===> corr=73, total=130, Acc=56.15%, Partial=56.15%, Format=84.62%
===> corr=76, total=140, Acc=54.29%, Partial=54.29%, Format=85.71%

üìà PRE-TRAINING RESULTS:
   Correct answers: 76/140
   Accuracy: 54.2

### Checkpointing and Logging Configuration

Setting up model saving and TensorBoard metrics tracking.

In [22]:
# Checkpoint saving configuration
checkpointing_options = ocp.CheckpointManagerOptions(
    save_interval_steps=SAVE_INTERVAL_STEPS, max_to_keep=MAX_TO_KEEP
)

# Metrics logger for TensorBoard
metrics_logging_options = metrics_logger.MetricsLoggerOptions(
    log_dir="/tmp/content/tmp/tensorboard/grpo", flush_every_n_steps=20
)

print("‚úÖ Checkpointing configured")
print("‚úÖ Metrics logging configured")

‚úÖ Checkpointing configured
‚úÖ Metrics logging configured


## üìà Optimizer Configuration

Using **AdamW** with warmup + cosine decay schedule.

**Schedule Design**:
- Warm-up (steps 0-170): Learning rate gradually increases from 0 ‚Üí 3e-6
- Training (steps 170-1700): Cosine decay from 3e-6 ‚Üí 0
- Gradient clipping at 0.1 prevents instability

This conservative approach is critical for RL, where unstable updates can cause catastrophic collapse.

In [23]:
# Optimizer with learning rate schedule and gradient clipping
optimizer = optax.adamw(
    learning_rate=optax.schedules.warmup_cosine_decay_schedule(
        init_value=0.0,
        peak_value=LEARNING_RATE,
        warmup_steps=WARMUP_STEPS,
        decay_steps=MAX_STEPS,
        end_value=0.0,
    ),
    b1=B1,
    b2=B2,
    weight_decay=WEIGHT_DECAY,
)

if MAX_GRAD_NORM is not None:
    optimizer = optax.chain(
        optax.clip_by_global_norm(max_norm=MAX_GRAD_NORM),
        optimizer,
    )

print("‚úÖ Optimizer configured with:")
print(f"   Learning rate: {LEARNING_RATE}")
print(f"   Warmup steps: {WARMUP_STEPS}")
print(f"   Max gradient norm: {MAX_GRAD_NORM}")

‚úÖ Optimizer configured with:
   Learning rate: 3e-06
   Warmup steps: 170.0
   Max gradient norm: 0.1


## üéÆ RL Cluster Architecture

Configuring the distributed reinforcement learning infrastructure.

**Three-Role Architecture**:
1. **Actor**: Current policy model generating responses
2. **Reference**: Frozen base model for KL divergence
3. **Rollout**: Manages generation process and batching

**Rollout Configuration**:
- Temperature=0.9: Balanced exploration vs exploitation
- Top-K=50, Top-P=1.0: Moderate sampling diversity
- EOS tokens: Proper sequence termination

This separation enables efficient parallel processing across 8 TPU cores.

In [24]:
# RL Cluster configuration
cluster_config = rl_cluster_lib.ClusterConfig(
    role_to_mesh={
        rl_cluster_lib.Role.ACTOR: mesh,
        rl_cluster_lib.Role.REFERENCE: mesh,
        rl_cluster_lib.Role.ROLLOUT: mesh,
    },
    rollout_engine='vanilla',
    offload_to_cpu=False,
    training_config=rl_cluster_lib.RLTrainingConfig(
        actor_optimizer=optimizer,
        eval_every_n_steps=EVAL_EVERY_N_STEPS,
        max_steps=MAX_STEPS,
        mini_batch_size=TRAIN_MICRO_BATCH_SIZE,
        train_micro_batch_size=TRAIN_MICRO_BATCH_SIZE,
        metrics_logging_options=metrics_logging_options,
        checkpoint_root_directory=CKPT_DIR,
        checkpointing_options=checkpointing_options,
    ),
    rollout_config=base_rollout.RolloutConfig(
        max_tokens_to_generate=TOTAL_GENERATION_STEPS,
        max_prompt_length=MAX_PROMPT_LENGTH,
        kv_cache_size=MAX_PROMPT_LENGTH + TOTAL_GENERATION_STEPS + 256,
        temperature=TEMPERATURE,
        top_p=TOP_P,
        top_k=TOP_K,
        eos_tokens=[1, 106],
    ),
)

# GRPO configuration
grpo_config = GRPOConfig(
    num_generations=NUM_GENERATIONS,
    num_iterations=NUM_ITERATIONS,
    beta=BETA,
    epsilon=EPSILON,
)

print("‚úÖ RL Cluster configured")
print("‚úÖ GRPO configured")
print(f"   Num generations per prompt: {NUM_GENERATIONS}")
print(f"   Beta (KL penalty): {BETA}")
print(f"   Epsilon (clipping): {EPSILON}")

‚úÖ RL Cluster configured
‚úÖ GRPO configured
   Num generations per prompt: 4
   Beta (KL penalty): 0.04
   Epsilon (clipping): 0.2


## üöÄ GRPO Training Loop

Starting the main training process.

**GRPO Algorithm Flow**:
1. **Sample** batch of 2 questions from training data
2. **Generate** 4 responses per question (exploration)
3. **Score** each response using 7 reward functions
4. **Compute advantages**: Normalize rewards to identify best/worst responses
5. **Update policy**: Increase probability of high-reward responses
6. **Apply KL penalty**: Prevent over-optimization away from base model
7. **Repeat** for 1700 steps

**Expected Runtime**: 15-25 minutes on TPU v5e-8

The model learns by comparing responses to the same question, discovering what makes some better than others.

In [25]:
import os
import torch
from pathlib import Path

print("\n" + "="*60)
print("üöÄ STARTING GRPO TRAINING")
print("="*60)
print()

# RL cluster setup
rl_cluster = rl_cluster_lib.RLCluster(
    actor=lora_policy,
    reference=ref_model,
    tokenizer=tokenizer,
    cluster_config=cluster_config,
)

print("‚úÖ RL Cluster initialized")

# GRPO Trainer with reward functions for SVAMP
grpo_trainer = GRPOLearner(
    rl_cluster=rl_cluster,
    reward_fns=[
        match_format_exactly,        # Structural constraint
        match_format_approximately,  # Soft structure check
        check_answer_svamp,          # Hard correctness for numbers
        check_number_svamp,          # Flexible number extraction
        soft_reasoning_steps,        # Logical connectors
        meaningful_reasoning_length, # Appropriate length
        reward_algebraic_notation,   # Algebraic equations
    ],
    grpo_config=grpo_config,
)

print("‚úÖ GRPO Trainer initialized")
print(f"   Active reward functions: 7")
print()

print("="*60)
print("‚è≥ TRAINING IN PROGRESS...")
print("="*60)
print(f"üìä Total steps: {MAX_STEPS}")
print()
print("‚òï This will take 1-2 hours. Go grab coffee!")
print("   You can monitor progress in the output below.")
print("="*60)
print()

# TRAINING LOOP
try:
    with mesh:
        grpo_trainer.train(train_dataset)
    
    print("\n" + "="*60)
    print("‚úÖ TRAINING COMPLETED SUCCESSFULLY!")
    print("="*60)
    
except Exception as e:
    print("\n" + "="*60)
    print("‚ùå TRAINING FAILED!")
    print(f"Error: {e}")
    print("="*60)
    raise




üöÄ STARTING GRPO TRAINING



[34m[1mwandb[0m: Currently logged in as: [33msaisurya24[0m ([33msaisurya24-technical-university-of-applied-sciences-w-rz[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


‚úÖ RL Cluster initialized
‚úÖ GRPO Trainer initialized
   Active reward functions: 7

‚è≥ TRAINING IN PROGRESS...
üìä Total steps: 1700

‚òï This will take 1-2 hours. Go grab coffee!
   You can monitor progress in the output below.

Question: For the walls of the house he would use 8 large planks of wood. If each plank of wood needs 74 piece...
Correct Answer: 8
Response: Yes, I‚Äôm ready to put on my mathematical reasoning expert hat! Let‚Äôs tackle this problem.

<reasoning>
Step 1: John needs 8 large planks of wood.
Step 2: Each plank needs 74 pieces of nails.
Step 3: T...
Extracted: 592


Actor Training:   0%|          | 0/1700 [00:00<?, ?step/s]

Question: Every day Ryan spends 7 hours on learning english, 2 hours on learning chinese and 4 hours on learni...
Correct Answer: 3
Response: Yes, I‚Äôm ready to analyze the problem and provide a detailed solution following your specified format.

<reasoning>
Step 1: Ryan spends 7 hours on learning English each day.
Step 2: He spends 2 hours ...
Extracted: 7
Question: Jack received 10 emails in the morning, 5 emails in the afternoon and 4 emails in the evening. How m...
Correct Answer: 1
Response: Yes, I‚Äôm ready to analyze the problem and provide a detailed solution following your specified format.

<reasoning>
Step 1: Jack received 10 emails in the morning.
Step 2: He received 5 emails in the ...
Extracted: 1
Question: Jack received 5 emails and 6 letters in the morning. He then received 9 emails and 7 letters in the ...
Correct Answer: 13
Response: Yes, let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: Jack received 5 emails in the morning.
Step 2: He received 9 emails 



Question: Every day Ryan spends 5 hours on learning chinese and some more hours on learning english. If he spe...
Correct Answer: 7
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: Ryan spends 5 hours on learning Chinese and 2 hours more on learning English. So, the time spent on English is 5 + 2 = 7 hours.
Step 2:...
Extracted: 7
Question: Rachel picked 7 ripe apples from her tree. Now the tree has 5 apples still on it. If 3 of those are ...
Correct Answer: 2
Response: Yes, I‚Äôm ready to analyze the problem and provide a step-by-step solution following your specified format.

<reasoning>
Step 1: Rachel picked 7 ripe apples.
Step 2: The tree now has 5 apples remaining...
Extracted: 4
Question: 7 red peaches, 71 yellow peaches and 8 green peaches are in the basket. How many more green peaches ...
Correct Answer: 1
Response: Yes, I understand. Let‚Äôs solve this problem.

<reasoning>
Step 1: We have 7 red peaches, 71 yellow peaches, and 8 green peaches.
Step 2:



Question: Every day Ryan spends 4 hours on learning english and 6 hours on learning chinese. If he learns for ...
Correct Answer: 10
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: Ryan spends 4 hours learning English each day.
Step 2: He spends 6 hours learning Chinese each day.
Step 3: He learns for 86 days.
Step 4: T...
Extracted: 4
Question: Jack received 9 emails in the morning, 10 emails in the afternoon and 7 emails in the evening. How m...
Correct Answer: 2
Response: Yes, I‚Äôm ready to tackle this problem!

<reasoning>
Step 1: Jack received 9 emails in the morning.
Step 2: He received 10 emails in the afternoon.
Step 3: He received 7 emails in the evening.
Step 4: ...
Extracted: 2
Question: Rachel had to complete 8 pages of math homework. If she had to complete 6 more pages of reading home...
Correct Answer: 14
Response: Yes, here‚Äôs the solution:

<reasoning>
Step 1: We are given that Rachel had to complete 8 pages of math homework.
Step 2: We are tol



Question: An industrial machine made 9 shirts yesterday and 8 shirts today. It can make 2 shirts a minute. How...
Correct Answer: 4
Response: Yes, let's solve this problem!

<reasoning>
Step 1: The machine made 9 shirts yesterday.
Step 2: The machine made 8 shirts today.
Step 3: The total number of shirts made is 9 + 8 = 17 shirts.
Step 4: ...
Extracted: 8.5
Question: Stray cats loved eating goldfish in the pond leaving 6 goldfish. Paige had raised 8 goldfish in the ...
Correct Answer: 2
Response: Yes, I‚Äôm ready to analyze the problem and provide a step-by-step solution following your specified format.

<reasoning>
Step 1: Paige started with 8 goldfish.
Step 2: Stray cats ate 6 goldfish.
Step 3...
Extracted: 2
Question: For Gwen's birthday she received 8 dollars from her mom. Her dad gave her 5 more dollars. If she spe...
Correct Answer: 3
Response: Yes, I understand. Let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: Gwen received 8 dollars from her mom.
Step 2: Her dad g



Question: Every day Ryan spends 7 hours on learning english and some more hours on learning chinese. If he spe...
Correct Answer: 5
Response: Yes, here‚Äôs the solution:

<reasoning>
Step 1: Let 'e' represent the number of hours Ryan spends on learning English each day.
Step 2: Let 'c' represent the number of hours Ryan spends on learning Chi...
Extracted: 2.5
Question: Rachel had to complete 9 pages of math homework, 2 pages of reading homework and 96 more pages of bi...
Correct Answer: 7
Response: Yes, here‚Äôs the solution:

<reasoning>
Step 1: Calculate the number of pages of math homework: Rachel has 9 pages of math homework.
Step 2: Calculate the number of pages of reading homework: She has 2...
Extracted: 7
Question: Every day Ryan spends 6 hours on learning english and 7 hours on learning chinese. How many more hou...
Correct Answer: 1
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Ryan spends 6 hours on English and 7 hours on Chinese each day.
Step 2: To find th



Question: Rachel has 52 apple trees. She picked 2 apples from 1 of her trees. Now the tree has 7 apples still ...
Correct Answer: 9
Response: Yes, here‚Äôs the solution to the problem:

<reasoning>
Step 1: Rachel has 52 apple trees.
Step 2: She picked 2 apples from 1 of her trees.
Step 3: The tree now has 7 apples still on it.
Step 4: Let 'x'...
Extracted: 9
Question: Last week Fred had 86 dollars and Jason had 5 dollars. Fred washed cars over the weekend and now has...
Correct Answer: 14
Response: Yes, here‚Äôs the solution to the problem:

<reasoning>
Step 1: Fred's initial amount was 86 dollars.
Step 2: Fred now has 9 dollars.
Step 3: Jason had 5 dollars last week, and he didn‚Äôt work this week,...
Extracted: 9
Question: Josh had 3 marbles in his collection. He found 6 marbles while he lost 5 marbles. How many more marb...
Correct Answer: 1
Response: Yes, please! Let‚Äôs tackle this problem.

<reasoning>
Step 1: Josh initially had 3 marbles.
Step 2: He found 6 marbles, so the numbe



Question: Rachel had to complete 7 pages of math homework. If she had to complete 4 more pages of math homewor...
Correct Answer: 3
Response: Yes, here‚Äôs the solution to the problem:

<reasoning>
Step 1: We are given that Rachel had to complete 7 pages of math homework.
Step 2: We are told that she had to complete 4 more pages of math homew...
Extracted: 4
Question: Allan brought 2 balloons and Jake brought 3 balloons to the park. How many more balloons did Jake ha...
Correct Answer: 1
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: Allan brought 2 balloons.
Step 2: Jake brought 3 balloons.
Step 3: The difference between the number of balloons brought by Jake and Allan is 3 - 2 ...
Extracted: 1
Question: Rachel's tree had 4 apples initially. She picked some apples and now there are 2 apples left on the ...
Correct Answer: 2
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Rachel started with 4 apples.
Step 2: She picked some apples, so the number of apple



Question: Edward spent $ 4 to buy books and $ 3 to buy pens. Now he has $ 12. How much did Edward spend on boo...
Correct Answer: 7
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: We need to determine the cost of books and pens.
Step 2: Edward spent $4 on books and $3 on pens.
Step 3: His total spending is $4 + $3...
Extracted: 7
Question: 6 green peaches, 60 yellow peaches and 2 red peaches are in the basket. How many more green peaches ...
Correct Answer: 4
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We have 6 green peaches, 60 yellow peaches, and 2 red peaches.
Step 2: We want to find how many more green peaches than red peaches there are.
Step ...
Extracted: 4
Question: For Gwen's birthday she received 3 dollars from her mom. Her dad gave her 6 more dollars. If she spe...
Correct Answer: 3
Response: Yes, here‚Äôs the solution to Gwen‚Äôs birthday problem:

<reasoning>
Step 1: Gwen received 3 dollars from her mom.
Step 2: Her dad gave her



Question: Adam has 4 more apples than Jackie. Together Adam and Jackie have 14 apples. Bob has 6 apples more t...
Correct Answer: 20
Response: Yes, I understand. Let‚Äôs solve this problem.

<reasoning>
Step 1: Let 'A' be the number of apples Adam has, and 'J' be the number of apples Jackie has.
Step 2: We are given that Adam has 4 more apples...
Extracted: 14
Question: A mailman gives 2 junk mails to each house in a block. If the mailman has to give 14 pieces of junk ...
Correct Answer: 7
Response: Yes, let's solve this problem!

<reasoning>
Step 1: The mailman gives 2 junk mails to each house.
Step 2: He has to give 14 pieces of junk mail to each block.
Step 3: Let 'n' be the number of houses i...
Extracted: 7
Question: Jesse's room is 7 feet wide. If she needs a carpet of size 14 square feet What is the length of her ...
Correct Answer: 2
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We are given that the width of Jesse's room is 7 feet.
Step 2: We are given that the



Question: Julia played tag with 5 kids on monday, 9 kids on tuesday and 15 kids on wednesday. How many kids di...
Correct Answer: 20
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We are given that Julia played with 5 kids on Monday, 9 kids on Tuesday, and 15 kids on Wednesday.
Step 2: We want to find the number of kids she pl...
Extracted: 20
Question: There were 5 roses and 3 orchids in the vase. Jessica cut some more roses and orchids from her flowe...
Correct Answer: 10
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Initially, there were 5 roses and 3 orchids in the vase.
Step 2: Jessica cut some roses and orchids. Let 'r' be the number of roses cut and 'o' be t...
Extracted: 2
Question: Being his favorite, he saved checking on the grapevines for his last stop. He was told by 94 of the ...
Correct Answer: 15
Response: Yes, let's do that!

<reasoning>
Step 1: The pickers filled 90 drums of grapes in 6 days.
Step 2: To find the number of drums filled per



Question: Bryan took a look at his books as well. Each bookshelf contains 2 books. If he has a total of 38 boo...
Correct Answer: 19
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Bryan has a total of 38 books.
Step 2: Each bookshelf contains 2 books.
Step 3: To find the number of bookshelves, divide the total number of books ...
Extracted: 19
Question: Frank had 42 pieces of candy. If he put them equally into 2 bags How many pieces of candy are in eac...
Correct Answer: 21
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Frank has 42 pieces of candy.
Step 2: He wants to put them equally into 2 bags.
Step 3: To find out how many pieces of candy are in each bag, we nee...
Extracted: 21
Question: Dave had 24 apps and 9 files on his phone. After deleting some apps and files he had 5 files and 12 ...
Correct Answer: 7
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Dave initially had 24 apps and 9 files.
Step 2: After deleting some apps and files, 



Question: Some campers went rowing in the morning while 7 campers went rowing in the afternoon. If a total of ...
Correct Answer: 53
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We are given that 7 campers went rowing in the morning and 2 campers went rowing in the afternoon.
Step 2: The total number of campers who went rowi...
Extracted: 58
Question: The school is planning a field trip. The school has 21 classrooms. There are 98 students in the scho...
Correct Answer: 14
Response: Yes, let's solve this problem!

<reasoning>
Step 1: The school has 21 classrooms.
Step 2: There are 98 students in the school.
Step 3: There are 7 school buses.
Step 4: Each bus can hold 78 students.
...
Extracted: 105
Question: A mailman has to give 25 pieces of junk mail to each block. If he gives 5 mails to each house in a b...
Correct Answer: 5
Response: Yes, let's solve this problem!

<reasoning>
Step 1: The mailman gives 25 pieces of junk mail to each block.
Step 2: He gives 5 pieces o



Question: 3 red peaches, and some more green peaches are in the basket. If there are 10 more green peaches tha...
Correct Answer: 13
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We are given that there are 3 red peaches and some more green peaches in the basket.
Step 2: We are told that there are 10 more green peaches than r...
Extracted: 13
Question: A grocery store had a total of 17 bottles of which 9 were bottles of regular soda and the rest were ...
Correct Answer: 8
Response: Yes, here‚Äôs the solution:

<reasoning>
Step 1: The total number of bottles is 17.
Step 2: The number of bottles of regular soda is 9.
Step 3: The number of bottles of diet soda is the total number of ...
Extracted: 8
Question: He then went to see the oranges being harvested. He found out that they harvest 8 sacks per day and ...
Correct Answer: 280
Response: Yes, here‚Äôs the solution:

<reasoning>
Step 1: The oranges are harvested at a rate of 8 sacks per day.
Step 2: Each sack contains 35



Question: Edward spent $ 16 to buy 92 books each book costing him the same amount of money. Now he has $ 6. Ho...
Correct Answer: 22
Response: Yes, let's solve this problem!

<reasoning>
Step 1: The cost of each book is $16.
Step 2: Edward bought 92 books.
Step 3: The total cost of the books is $16 * 92 = $1472.
Step 4:  After buying the boo...
Extracted: 1478
Question: Haley has 20 marbles. In her class 2 boys love to play marbles. If she distributes her marbles equal...
Correct Answer: 10
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Haley has 20 marbles.
Step 2: There are 2 boys in her class who love to play marbles.
Step 3: She distributes her marbles equally among the boys, so...
Extracted: 10
Question: Mary is baking a cake. The recipe calls for 6 cups of flour and 13 cups of sugar. She already put in...
Correct Answer: 21
Response: Yes, let's solve this problem!

<reasoning>
Step 1: The recipe calls for 6 cups of flour and 13 cups of sugar.
Step 2: The amount of 



Question: Rebecca wants to split a collection of eggs into groups of 6. Rebecca has 18 eggs 72 bananas and 66 ...
Correct Answer: 3
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We need to find the total number of eggs, bananas, and marbles.
Step 2: The number of eggs is 18.
Step 3: The number of bananas is 72.
Step 4: The n...
Extracted: 6
Question: He then went to see the oranges being harvested. He found out that the harvest will go on for 4 days...
Correct Answer: 14
Response: Yes, let's solve this problem!

<reasoning>
Step 1: The harvest will last for 4 days.
Step 2: We need to find out how many sacks are harvested per day.
Step 3: We know the total number of sacks harves...
Extracted: 56
Question: 2 red peaches, 6 yellow peaches and 14 green peaches are in the basket. How many more green peaches ...
Correct Answer: 8
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We have 2 red peaches, 6 yellow peaches, and 14 green peaches.
Step 2: To find the num



Question: Carol was sending out birthday invitations to 12 friends. She bought 3 packs each one having the sam...
Correct Answer: 4
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Carol has 12 friends.
Step 2: She bought 3 packs of invitations.
Step 3: Each pack has the same number of invitations, so the number of invitations ...
Extracted: 4
Question: Paco had 28 cookies. He ate some of them and had 7 cookies left. How many cookies did Paco eat?...
Correct Answer: 21
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We know Paco started with 28 cookies.
Step 2: He ate some cookies, so he has 7 cookies left.
Step 3: To find how many cookies he ate, we subtract th...
Extracted: 21
Question: Dave had 19 apps and 18 files on his phone. After deleting some apps and files he had 6 apps and 15 ...
Correct Answer: 9
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Dave initially had 19 apps and 18 files.
Step 2: He deleted some apps and files, so the num



Question: Matthew had some crackers. If Matthew gave 2 crackers to each of his 11 friends How many crackers di...
Correct Answer: 22
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Matthew has 11 friends.
Step 2: He gives 2 crackers to each friend, so he gives 11 * 2 = 22 crackers.
Step 3: Matthew started with some crackers and...
Extracted: 22
Question: Jerry had 8 action figures and 9 books on a shelf in his room. Later he added 10 more books to the s...
Correct Answer: 19
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: Jerry initially had 8 action figures and 9 books.
Step 2: He added 10 more books.
Step 3: To find the total number of books, we add the initial numb...
Extracted: 27
Question: An industrial machine can make 6 shirts a minute. It worked for 5 minutes yesterday and for 12 minut...
Correct Answer: 72
Response: Yes, let's solve this problem!

<reasoning>
Step 1: The machine can make 6 shirts per minute.
Step 2: Yesterday, it worked for 5 minu



Question: Dave had 16 apps and 77 files on his phone. After deleting some apps and files he had 5 apps and 23 ...
Correct Answer: 11
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Dave initially had 16 apps and 77 files.
Step 2: He deleted some apps and files, so he has 5 apps and 23 files left.
Step 3: To find the number of a...
Extracted: 11
Question: Rebecca wants to split a collection of eggs into groups of 5. Rebecca has 10 marbles and 15 eggs. Ho...
Correct Answer: 3
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We need to determine how many groups of 5 eggs Rebecca can make from her 10 marbles. We divide the number of marbles by the number of eggs per group...
Extracted: 5
Question: Rachel had to complete 11 pages of math homework, 2 pages of reading homework and 3 more pages of bi...
Correct Answer: 8
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We are given that Rachel completed 11 pages of math homework.
Step 2: She completed 2 



Question: Steven has 12 peaches. Jake has 4 fewer peaches than Steven who has 67 more peaches than Jill. How m...
Correct Answer: 8
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Steven has 12 peaches.
Step 2: Steven has 67 more peaches than Jill. We need to find the number of peaches Jill has. Let 'j' repre...
Extracted: None
Question: Baker sold 8 cakes. If he had made 40 cakes initially How many more cakes did baker make than those ...
Correct Answer: 32
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Baker initially sold 8 cakes.
Step 2: He had made 40 cakes initially.
Step 3: The difference between the initial number of cakes and the number of c...
Extracted: 32
Question: Dave had 11 apps and 3 files on his phone. After deleting some apps and files he had 24 files and 2 ...
Correct Answer: 22
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Dave initially had 11 apps and 3 files.
Step 2: After deleting some apps and fil



Question: The grasshopper, the frog and the mouse had a jumping contest. The grasshopper jumped 19 inches. The...
Correct Answer: 15
Response: Yes, let‚Äôs solve this problem step-by-step!

<reasoning>
Step 1: Let‚Äôs denote the distance the frog jumped as 'x' inches.
Step 2: The grasshopper jumped 19 inches.
Step 3: The grasshopper jumped 4 inc...
Extracted: None
Question: Faye was placing her pencils into rows with 5 pencils in each row. If she had 35 pencils and 7 crayo...
Correct Answer: 7
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Faye has 35 pencils and 7 crayons.
Step 2: The number of pencils in each row is 5.
Step 3: We need to find out how many rows she c...
Extracted: 7
Question: Jackie has 10 apples. Adam has 8 apples. How many more apples does Jackie have than Adam?...
Correct Answer: 2
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: Jackie has 10 apples.
Step 2: Adam has 8 apples.
Step 3: To find the difference, we sub



Question: There are 87 oranges and 290 bananas in Philip's collection. If the bananas are organized into 2 gro...
Correct Answer: 145
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: We are given that there are 87 oranges and 290 bananas in Philip's collection.
Step 2: The bananas are organized into 2 groups.
Step 3:...
Extracted: 145
Question: Frank was reading through his favorite book. The book had 3 chapters, each with the same number of p...
Correct Answer: 198
Response: Yes, let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: We know the book has a total of 594 pages.
Step 2: It took Frank 607 days to finish the book.
Step 3: The book has 3 chapters, each wit...
Extracted: 198
Question: Randy has 95 blocks. He uses 20 blocks to build a house and 50 blocks to build a tower. How many mor...
Correct Answer: 30
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Randy has 95 blocks.
Step 2: He used 20 blocks to build a



Question: Zachary did 46 push-ups and 58 crunches in gym class today. David did 38 more push-ups but 62 less c...
Correct Answer: 12
Response: Yes, let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: We need to find the number of crunches Zachary did minus the number of push-ups he did.
Step 2: Let ‚Äòx‚Äô be the number of crunches Zacha...
Extracted: None
Question: Lewis earns $ 2 every week during the harvest. If he earns a total of $ 178 How many weeks did the h...
Correct Answer: 89
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: Lewis earns $2 every week.
Step 2: His total earnings are $178.
Step 3: To find the number of weeks, we divide the total earnings by th...
Extracted: 89
Question: Danny collects bottle caps and wrappers. He found 71 bottle caps and 24 wrappers at the park. Now he...
Correct Answer: 12
Response: Yes, let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: Danny started with 71 bottle caps and 24 wrappers.
Step 2: H



Question: There are 20 houses in a block. If a mailman has to give 32 pieces of junk mail to each house in eac...
Correct Answer: 640
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: We are given that there are 20 houses in a block.
Step 2: The mailman has to give 32 pieces of junk mail to each house in each block.
S...
Extracted: 640
Question: The Razorback t-shirt shop makes $ 98 dollars off each t-shirt sold. During the Arkansas game and th...
Correct Answer: 8722
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: The Razorback t-shirt shop makes $98 per t-shirt sold.
Step 2: They sold a total of 163 t-shirts.
Step 3: They sold 89 t-shirts during ...
Extracted: 8732
Question: The grasshopper, the frog and the mouse had a jumping contest. The grasshopper jumped 36 inches. The...
Correct Answer: 53
Response: Yes, let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: We are given that the grasshopper jumped 36 inches.
Step 2: Th



Question: 13 campers went rowing and 59 campers went hiking in the morning. 21 campers went rowing in the afte...
Correct Answer: 34
Response: Yes, let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: We are given that 13 campers went rowing and 59 campers went hiking in the morning.
Step 2: We are also given that 21 campers went rowi...
Extracted: 42
Question: A grocery store had 49 bottles of regular soda, 40 bottles of diet soda and 6 bottles of lite soda. ...
Correct Answer: 89
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that the grocery store has 49 bottles of regular soda, 40 bottles of diet soda, and 6 bottles of lite soda.
Step 2: We want to find the...
Extracted: 89
Question: A grocery store had 81 bottles of regular soda, 60 bottles of diet soda and 60 bottles of lite soda....
Correct Answer: 21
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that the grocery store has 81 bottles of regular soda, 60 bottle



Question: Last week Fred had 60 dollars and Jason had 19 dollars. Jason washed cars over the weekend and now h...
Correct Answer: 14
Response: Yes, let‚Äôs solve this problem step-by-step!

<reasoning>
Step 1: We need to find out how much money Jason made washing cars.
Step 2: Jason started with 19 dollars and now has 33 dollars. The differenc...
Extracted: 14
Question: There are 14 different movies and 15 different books in the ' crazy silly school ' series. If you re...
Correct Answer: 1
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We are given that there are 14 different movies and 15 different books in the ' crazy silly school ' series.
Step 2: We read 11 of the books and wat...
Extracted: 1
Question: 44 campers went rowing in the morning 39 campers went rowing in the afternoon and 31 campers went ro...
Correct Answer: 5
Response: Yes, let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: We are given the number of campers who went rowing in the morning, in



Question: Julia played tag with some kids on monday. She played tag with 14 kids on tuesday. If she played wit...
Correct Answer: 2
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Julia played tag with 14 kids on Tuesday.
Step 2: We are told that she played with a total of 16 kids on Tuesday.
Step 3: We want ...
Extracted: 2
Question: He also had 56 aquariums for saltwater animals and 10 aquariums for freshwater animals. Each aquariu...
Correct Answer: 2184
Response: Yes, I‚Äôm ready to tackle this problem! Let‚Äôs do it.

<reasoning>
Step 1: Tyler has 56 saltwater aquariums and 10 freshwater aquariums.
Step 2: Each aquarium has 39 animals.
Step 3: Total number of ani...
Extracted: 1894
Question: 62 campers went rowing in the morning. 39 campers went rowing in the afternoon. How many more camper...
Correct Answer: 23
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that 62 campers went rowing in the morning and 39 campers we



Question: Frank was reading through his favorite book. He read 8 pages per day. If the book had 576 pages How ...
Correct Answer: 72
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Frank reads 8 pages per day.
Step 2: We are given that the book has 576 pages.
Step 3: To find the number of days, we divide the t...
Extracted: 72
Question: He then went to see the oranges being harvested. He found out that they harvest 67 sacks per day and...
Correct Answer: 420
Response: Yes, I‚Äôm ready to put on my mathematical reasoning expert hat!

<reasoning>
Step 1: The oranges harvest each day is 67 sacks/day * 12 oranges/sack = 804 oranges/day.
Step 2: They harvest a total of 35...
Extracted: 28140
Question: Paige was helping her mom plant flowers and together they planted 36 seeds. They put 12 seeds in eac...
Correct Answer: 3
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We know the total number of seeds planted is 36.
Step 2: They planted 12 s



Question: Matthew had 24 crackers. If Matthew gave equal numbers of crackers to his 3 friends and still had 17...
Correct Answer: 7
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: Matthew started with 24 crackers.
Step 2: He gave equal numbers of crackers to his 3 friends. Let ‚Äòx‚Äô be the number of crackers each friend received...
Extracted: None
Question: Last week Fred had 19 dollars and Jason had 16 dollars. They washed cars over the weekend and now Fr...
Correct Answer: 21
Response: Yes, I understand. Let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: Let 'f' be the amount of money Fred earned.
Step 2: Let 'j' be the amount of money Jason earned.
Step 3: We are given tha...
Extracted: 109
Question: Randy has 90 blocks. He uses 89 blocks to build a house and 63 blocks to build a tower. How many mor...
Correct Answer: 26
Response: Yes, I understand. Let‚Äôs solve this problem.

<reasoning>
Step 1: We are given that Randy has 90 blocks.
Step 2: He used 89



Question: Jake has 6 fewer peaches and 8 more apples than Steven. Steven has 17 peaches and 16 apples. How man...
Correct Answer: 1
Response: Yes, let‚Äôs do this!

<reasoning>
Step 1: We are given that Jake has 6 fewer peaches than Steven. Let 'x' represent the number of peaches Jake has. We can write this as: x = 17 - 6 = 11 peaches.
Step 2...
Extracted: 5
Question: He then went to see the oranges being harvested. He found out that they harvest 71 sacks of unripe o...
Correct Answer: 11
Response: Yes, I understand. Let‚Äôs tackle this problem!

<reasoning>
Step 1: The problem states that he harvested 71 sacks of unripe oranges and 60 sacks of ripe oranges per day.
Step 2: Let 'x' be the number o...
Extracted: 11
Question: At the zoo, a cage had 95 snakes and 61 alligators. If 64 snakes were hiding How many snakes were no...
Correct Answer: 31
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that there were 95 snakes and 61 alligators at the zoo.
Step 2: W



Question: Paco had 39 sweet cookies and 6 salty cookies. He ate 23 salty cookies and 32 sweet cookies. How man...
Correct Answer: 9
Response: Yes, let‚Äôs do this!

<reasoning>
Step 1: We are given that Paco has 39 sweet cookies and 6 salty cookies.
Step 2: He ate 23 salty cookies.
Step 3: We are given that he ate 32 sweet cookies.
Step 4: We...
Extracted: 9
Question: 19 red peaches, 11 yellow peaches and 12 green peaches are in the basket. How many more red peaches ...
Correct Answer: 8
Response: Yes, let‚Äôs do this!

<reasoning>
Step 1: We have 19 red peaches, 11 yellow peaches, and 12 green peaches.
Step 2: We want to find how many more red peaches than yellow peaches there are.
Step 3:  Let ...
Extracted: 8
Question: Danny collects bottle caps and wrappers. He found 46 wrappers and 50 bottle caps at the park. Now he...
Correct Answer: 4
Response: Yes, I understand. Let‚Äôs solve this problem step-by-step.

<reasoning>
Step 1: Danny starts with 46 wrapper and 50 bottle caps.
Step 2



Question: Bobby had 32 pieces of candy. He ate some pieces of candy. If he has 20 pieces of candy left How man...
Correct Answer: 12
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: Bobby started with 32 pieces of candy.
Step 2: He ate some pieces of candy, so we need to subtract the number of pieces he ate from the total number...
Extracted: 12
Question: There are 22 different books and 10 different movies in the ' crazy silly school ' series. If you re...
Correct Answer: 10
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that there are 22 different books and 10 different movies in the ' crazy silly school ' series.
Step 2: We read 12 of the books and wa...
Extracted: 10
Question: A waiter had 12 customers. After some left he still had 14 customers. Then he got 10 new customers H...
Correct Answer: 24
Response: Yes, let‚Äôs solve this problem step-by-step!

<reasoning>
Step 1: The waiter initially had 12 customers.
Step 2: After some left, 



Question: Josh lost 11 marbles. If he had 19 marbles in his collection earlier How many marbles does he have n...
Correct Answer: 8
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: Josh started with 19 marbles.
Step 2: He lost 11 marbles.
Step 3: To find the number of marbles he now has, we subtract the number of marbles he los...
Extracted: 8
Question: The grasshopper, the frog and the mouse had a jumping contest. The grasshopper jumped 19 inches. The...
Correct Answer: 58
Response: Yes, let‚Äôs solve this word problem step-by-step!

<reasoning>
Step 1: Let 'x' represent the distance the mouse jumped.
Step 2: The frog jumped 39 inches farther than the grasshopper, so the frog jumpe...
Extracted: None
Question: David did 30 more push-ups than Zachary in gym class today. If David did 37 push-ups How many push-u...
Correct Answer: 7
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that David did 30 more push-ups than Zachary. We can represent th



Question: A grocery store had 19 bottles of diet soda and 60 bottles of regular soda. How many more bottles of...
Correct Answer: 41
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that the grocery store had 19 bottles of diet soda and 60 bottles of regular soda.
Step 2: We want to find the difference between the n...
Extracted: 41
Question: 53 campers went rowing in the morning 48 campers went rowing in the afternoon and 49 campers went ro...
Correct Answer: 4
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given the number of campers who went rowing in the morning, in the afternoon, and in the evening.
Step 2: Let 'm' represent the number of cam...
Extracted: 1
Question: There were 13 roses and 84 orchids in the vase. Jessica cut some more roses and orchids from her flo...
Correct Answer: 1
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that there were initially 13 roses and 84 orchids in the vase.
Ste



Question: The grasshopper, the frog and the mouse had a jumping contest. The grasshopper jumped 19 inches. The...
Correct Answer: 30
Response: Yes, let‚Äôs solve this problem step-by-step!

<reasoning>
Step 1: Let 'x' represent the distance the mouse jumped in inches.
Step 2: The frog jumped 10 inches farther than the grasshopper, so the frog ...
Extracted: None
Question: A grocery store had a total of 30 bottles of which 28 were bottles of regular soda and the rest were...
Correct Answer: 2
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that the grocery store had a total of 30 bottles of soda.
Step 2: 28 of the bottles were regular soda.
Step 3: The rest of the bottles ...
Extracted: 2
Question: Julia played tag with 16 kids on monday. If she played tag with 12 more kids on monday than on tuesd...
Correct Answer: 4
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: On Monday, Julia played with 16 kids.
Step 2: On Tuesday, she played with 12 m



Question: The grasshopper, the frog and the mouse had a jumping contest. The grasshopper jumped 21 inches. The...
Correct Answer: 41
Response: Yes, let‚Äôs solve this problem step-by-step!

<reasoning>
Step 1: Let 'x' be the number of inches the mouse jumped.
Step 2: The grasshopper jumped 21 inches.
Step 3: The grasshopper jumped 25 inches fa...
Extracted: 37
Question: Bobby ate some pieces of candy. Then he ate 25 more. If he ate a total of 43 pieces of candy How man...
Correct Answer: 18
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Bobby ate 25 more pieces of candy.
Step 2: We are told that he ate a total of 43 pieces of candy.
Step 3: Let 'x' be the number of...
Extracted: 18
Question: Winter is almost here and most animals are migrating to warmer countries. There are 38 bird families...
Correct Answer: 47
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that there are 38 bird families living near the mountain.
Step 



Question: Brenda's mother made cookies for 14. If each of them had 30 cookies How many cookies did she prepare...
Correct Answer: 420
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Brenda‚Äôs mother made a total of 14 cookies.
Step 2: Each of them had 30 cookies.
Step 3: We need to find the total number of cooki...
Extracted: 90
Question: Robin cut off 13 inches of his hair. If his hair was 14 inches long initially How long is his hair n...
Correct Answer: 1
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Robin cut off 13 inches of his hair initially.
Step 2: Let the new length of his hair be 'x' inches.
Step 3: We are told that his ...
Extracted: 1
Question: If Lewis earns a total of $ 460 during 5 weeks of harvest How much money does he earn each week?...
Correct Answer: 92
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Lewis earns a total of $460 during 5 weeks of harvest.
Step 2:



Question: Paul got a box of 110 crayons for his birthday. During the school year he gave 90 crayons to his fri...
Correct Answer: 322
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: Let 'c' be the number of crayons Paul started with. We are given that he had 110 crayons, so c = 110.
Step 2: Let ‚Äòg‚Äô be the number of crayons he g...
Extracted: 322
Question: You have 104 dollars. How many packs of dvds can you buy if each pack costs 26 dollars?...
Correct Answer: 4
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that you have 104 dollars.
Step 2: We are also given that each pack of dvds costs 26 dollars.
Step 3: We need to find out how many pack...
Extracted: 4
Question: Next on his checklist is wax to stick the feathers together. If he has 557 g of wax and right now he...
Correct Answer: 574
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that the person has 557 grams of wax.
Step 2: He needs 17 grams of wax 



Question: The school is planning a field trip. The school has 24 classrooms. There are 120 students in the sch...
Correct Answer: 5
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that the school has 24 classrooms.
Step 2: We are given that there are 120 students in the school.
Step 3: We are given that each clas...
Extracted: 5
Question: Luke played a trivia game and scored 154 points. If he gained the 11 points in each round How many r...
Correct Answer: 14
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Luke scored a total of 154 points.
Step 2: We are told that he gained 11 points in each round.
Step 3: Let 'r' be the number of ro...
Extracted: 14
Question: Baker made 133 cakes. If he sold 51 of them How many more cakes did baker make than those he sold?...
Correct Answer: 82
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Baker made 133 cakes.
Step 2: We are told that he sold 51 of th



Question: Debby bought 95 soda bottles and 180 water bottles when they were on sale. If she drank 15 water bot...
Correct Answer: 12
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Debby bought 95 soda bottles and 180 water bottles.
Step 2: She drank 15 water bottles and 54 soda bottles a day.
Step 3: We want ...
Extracted: 12
Question: The Razorback shop makes $ 5 dollars off each jersey and $ 215 off each t-shirt. During the Arkansas...
Correct Answer: 4300
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: Let 't' be the price of each t-shirt and 'j' be the price of each jersey.
Step 2: The total revenue from t-shirts is 20t and the total revenue from ...
Extracted: None
Question: The school is planning a field trip. The school has 84 classrooms. 95 school buses are fully filled ...
Correct Answer: 11210
Response: ÊàòÊñóÔºÅ

<reasoning>
Step 1: We are given that the school has 84 classrooms.
Step 2: We are given that 95 school buses are



Question: Baker made 19 cakes and 131 pastries. If he sold 70 cakes and 88 pastries How many more pastries tha...
Correct Answer: 112
Response: ÊàòÊñóÔºÅ

<reasoning>
Let‚Äôs break down this problem step-by-step.
First, we need to find out how many cakes Baker made. He made 19 cakes.
Next, let‚Äôs find out how many pastries he made. He sold 70 cakes an...
Extracted: 18
Question: Because of the decision Sofia asked the students to suggest specific types of food. 457 students sug...
Correct Answer: 394
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: Let 'x' be the number of students who suggested adding mashed potatoes.
Step 2: Let 'y' be the number of students who suggested adding bacon.
Step ...
Extracted: 197
Question: Paul got a box of 531 crayons and 38 erasers for his birthday. At the end of the school year he only...
Correct Answer: 353
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: Let ‚Äòc‚Äô be the number of crayons Paul had initially and ‚Äòe‚



Question: In a school there are 902 girls and 811 boys. 44 more girls joined the school. How many girls are th...
Correct Answer: 946
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given the number of girls in the school initially: 902.
Step 2: We are given the number of boys in the school initially: 811.
Step 3: We are ...
Extracted: 946
Question: A farmer had 160 tomatoes in his garden. If he picked 56 of them yesterday and 41 today. How many di...
Correct Answer: 104
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that the farmer initially had 160 tomatoes.
Step 2: He picked 56 tomatoes yesterday.
Step 3: We are told he picked 41 tomatoes today.
S...
Extracted: 104
Question: Paul had 115 books. He sold 78 books in a garage sale. How many books does he have left with him?...
Correct Answer: 37
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Paul had 115 books initially.
Step 2: He sold 78 books.
St



Question: The bananas in Philip's collection are organized into groups of size 18. If there are a total of 180...
Correct Answer: 10
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that there are a total of 180 bananas in Philip‚Äôs banana collection.
Step 2: The bananas are organized into groups of size 18.
Step 3:...
Extracted: 10
Question: The ring toss game at the carnival made the same amount of money each day. In total in 30 days they ...
Correct Answer: 14
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that the total earnings for the 30 days were 420 dollars.
Step 2: We are also given that the total earnings with the game were 22 doll...
Extracted: 14
Question: There were 22 parents in the program and 676 people in total. How many pupils were present in the pr...
Correct Answer: 654
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that there were 22 parents in the program.
Step 2: We are giv



Question: Because of the decision Sofia asked 288 students to suggest specific types of food. 264 students sug...
Correct Answer: 24
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that 264 students suggested adding bacon, and 240 students suggested adding mashed potatoes.
Step 2: We want to find out how many stud...
Extracted: 504
Question: In a school there are 308 girls and 318 boys. There are also 36 teachers How many pupils are there i...
Correct Answer: 626
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: Let 'g' be the number of girls and 'b' be the number of boys.
Step 2: We are given that there are 308 girls and 318 boys, so g = 308 and b = 318.
St...
Extracted: 662
Question: Next on his checklist is wax to stick the feathers together. He needs 159 g of wax more. If the feat...
Correct Answer: 469
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that the feathers require a total of 628 grams of wax.
Step



Question: Baker made 8 cakes. He bought 139 new cakes and sold 145 cakes. How many more cakes did baker sell t...
Correct Answer: 6
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Baker made 8 cakes initially.
Step 2: He bought 139 new cakes.
Step 3: He sold 145 cakes.
Step 4: We want to find the difference b...
Extracted: 137
Question: After eating a hearty meal they went to see the Buckingham palace. There, Rachel learned that 132 vi...
Correct Answer: 274
Response: Yes, let‚Äôs tackle this problem!

<reasoning>
Step 1: We are given that 132 people visited the Buckingham palace on the day they ate a hearty meal.
Step 2: We are also given that there were 327 days in...
Extracted: None
Question: The Razorback t-shirt shop makes $ 106 dollars off each t-shirt sold. During the Arkansas game and t...
Correct Answer: 127
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that the Razorback t-shirt shop makes $106 per t-shirt sold.



Question: There are 3941 skittles in Steven's skittles collection. Steven also has 4950 erasers. If the eraser...
Correct Answer: 10
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We are given that Steven has 3941 skittles.
Step 2: We are given that Steven has 4950 erasers.
Step 3: We are told that the erasers are organized in...
Extracted: 10
Question: Jerry had 7 books and 3 action figures on a shelf in his room. Later he added 2 more action figures ...
Correct Answer: 2
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Jerry had 7 books and 3 action figures on a shelf initially.
Step 2: He added 2 more action figures to the shelf.
Step 3: We want ...
Extracted: 2
Question: Every day Ryan spends 6 hours on learning english and 7 hours on learning chinese. If he learns for ...
Correct Answer: 65
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Ryan spends 6 hours on learning English and 7 hours on learning



Question: Rachel picked 7 apples from her tree. Thereafter 2 new apples grew on the tree. Now the tree has 6 a...
Correct Answer: 11
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Rachel picked 7 apples from her tree.
Step 2: Then 2 new apples grew on the tree. So the total number of apples now is 7 + 2 = 9 a...
Extracted: 3
Question: Jerry had 3 books and 4 action figures on a shelf in his room. Later he added 2 more action figures ...
Correct Answer: 3
Response: Yes, let‚Äôs solve this problem!

<reasoning>
Step 1: We are given that Jerry had 3 books and 4 action figures on a shelf initially.
Step 2: He added 2 more action figures to the shelf.
Step 3: We need ...
Extracted: 3
Question: 3 birds were sitting on the fence. 6 more storks and 2 more birds came to join them. How many more s...
Correct Answer: 1
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We are given that there were initially 3 birds sitting on the fence.
Step 2: Then 6



Question: Jerry had 7 action figures and 2 books on a shelf in his room. Later he added 4 more books to the sh...
Correct Answer: 1
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We are given that Jerry had 7 action figures and 2 books on a shelf initially.
Step 2: He added 4 more books to the shelf.
Step 3: We need to find h...
Extracted: 1
Question: Mary is baking a cake. The recipe calls for 9 cups of sugar 7 cups of flour and 4 cups of salt. She ...
Correct Answer: 1
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We are given the recipe for a cake, requiring 9 cups of sugar, 7 cups of flour, and 4 cups of salt.
Step 2: Mary has already put in 2 cups of flour....
Extracted: 7
Question: Jerry had 5 action figures and 9 books on a shelf in his room. Later he added 7 more action figures ...
Correct Answer: 3
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We are given that Jerry had 5 action figures and 9 books on a shelf.
Step 2: He added 7 



Question: Haley grew 14 trees in her backyard. After a typhoon 9 died. How many more trees died in the typhoon...
Correct Answer: 4
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We are given that Haley grew 14 trees in her backyard.
Step 2: A typhoon killed 9 trees.
Step 3: We need to find out how many more trees died in the...
Extracted: 5
Question: Winter is almost here and most animals are migrating to warmer countries. There were 87 bird familie...
Correct Answer: 73
Response: Yes, let's solve this word problem!

<reasoning>
Step 1: We are given that there were 87 bird families living near the mountain.
Step 2: We are told that 7 bird families flew away for winter.
Step 3: ...
Extracted: 80
Question: Mary is baking a cake. The recipe calls for 14 cups of flour and 6 cups of sugar. She already put in...
Correct Answer: 3
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We are given the recipe calls for 14 cups of flour and 6 cups of sugar.
Step 2: Mary a



Question: Marco and his dad went strawberry picking. Together they collected strawberries that weighed 24 poun...
Correct Answer: 12
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Let 'x' be the weight of Marco's dad's strawberries in pounds.
Step 2: We can write a system of two equations based on the given information:
   *  ...
Extracted: 12
Question: Mary is baking a cake. The recipe calls for 11 cups of flour and 7 cups of sugar. She already put in...
Correct Answer: 2
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Let 'f' represent the number of cups of flour Mary put in.
Step 2: Let 's' represent the number of cups of sugar Mary put in.
Step 3: We are given t...
Extracted: 9
Question: Mary is baking a cake. The recipe calls for 9 cups of flour and 11 cups of sugar. She already put in...
Correct Answer: 6
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Let 'f' represent the cups of flour needed, and 's' represent the cups of sugar needed



Question: Matthew had 23 crackers. He has 11 crackers left after he gave equal numbers of crackers to his 2 fr...
Correct Answer: 6
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Let 'x' be the number of crackers each friend ate.
Step 2: Matthew gave away a total of 23 crackers, so x + x = 23.
Step 3: Combine like terms: 2x =...
Extracted: 6
Question: Edward spent $ 6 to buy books and $ 16 to buy pens. Now he has $ 19. How much did Edward have before...
Correct Answer: 41
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: Let 'x' represent the amount of money Edward had before buying the books and pens.
Step 2: He spent $6 on books, so the amount spent on...
Extracted: 41
Question: In Haley's class 4 are boys who love to play marbles. If Haley has 23 marbles and wants to keep 15 m...
Correct Answer: 2
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Let 'x' be the number of marbles each boy receives.
Step 2: Haley has 23 marbles, and 



Question: Matthew had 14 crackers and 21 cakes. If Matthew gave equal numbers of crackers and cakes to his 7 f...
Correct Answer: 5
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Let 'x' be the number of crackers each friend receives.
Step 2: Let 'y' be the number of cakes each friend receives.
Step 3: We know that Matthew ga...
Extracted: 7
Question: Zachary did 44 push-ups in gym class today. David did 58 more push-ups than zachary. How many push-u...
Correct Answer: 146
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We are given that Zachary did 44 push-ups.
Step 2: David did 58 more push-ups than Zachary. This means David's push-ups = Zachary's push-ups + 58.
S...
Extracted: 146
Question: Marco and his dad went strawberry picking. Marco's strawberries weighed 10 pounds. If together their...
Correct Answer: 6
Response: Yes, let's solve this problem!

<reasoning>
Step 1: We are given that Marco's strawberries weighed 10 pounds.
Step 2: We are given that 



Question: The grasshopper, the frog and the mouse had a jumping contest. The grasshopper jumped 25 inches. The...
Correct Answer: 31
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Let's denote the distance the grasshopper jumped as 'g', the distance the frog jumped as 'f', and the distance the mouse jumped as 'm'.
Step 2: We a...
Extracted: 31
Question: There were 13 roses in the vase. Jessica cut some more roses from her flower garden which had a tota...
Correct Answer: 4
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Let 'x' be the number of roses Jessica cut from the garden.
Step 2: We know that Jessica cut 12 roses, so x = 12.
Step 3: After cutting, there were ...
Extracted: 5
Question: Winter is almost here and most animals are migrating to warmer countries. There are 85 bird families...
Correct Answer: 25
Response: Yes, let's solve this arithmetic word problem!

<reasoning>
Step 1: Let 'x' be the number of bird families remaining near the mountain.
S



Question: Mary is baking a cake. The recipe calls for 12 cups of sugar and 14 cups of flour. She already put i...
Correct Answer: 12
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Let 's' represent the amount of sugar needed, and 'f' represent the amount of flour needed.
Step 2: The recipe calls for 12 cups of sugar and 14 cup...
Extracted: 12
Question: Jessie currently weighs 9 kilograms. After she started to go jogging everyday she lost 62 kilograms ...
Correct Answer: 211
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: Let ‚Äòx‚Äô be the initial weight of Jessie in kilograms.
Step 2: After the first week, Jessie lost 62 kilograms, so her weight became x - ...
Extracted: 202
Question: Randy has 79 blocks. He uses 14 blocks to build a tower and 11 blocks to build a house. How many blo...
Correct Answer: 54
Response: Yes, let's solve this problem!

<reasoning>
Step 1: Let 'x' be the number of blocks remaining.
Step 2: We know Randy started with



Question: A farmer had 105 green tomatoes and 71 red tomatoes in his garden. If he picked 137 tomatoes How man...
Correct Answer: 39
Response: Yes, let's solve this problem step by step.

<reasoning>
Step 1: Let 'g' be the number of green tomatoes and 'r' be the number of red tomatoes.
Step 2: We are given that the farmer has 105 green tomat...
Extracted: 39
Question: Each pack of dvds costs 107 dollars. If there is a discount of 106 dollars on each pack How many pac...
Correct Answer: 93
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: Let $n$ be the number of packs of dvds that can be bought. The total cost of $n$ packs of dvds is $107n$ dollars.
Step 2: We are given ...
Extracted: 0
Question: There were 100 dollars in Olivia's wallet. She collected 148 more dollars from an atm. After she vis...
Correct Answer: 89
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: Let‚Äôs represent the initial amount of money in Olivia‚Äôs wallet as $



Question: Paul got a box of 440 crayons for his birthday. During the school year he gave 111 crayons to his fr...
Correct Answer: 223
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: Let the number of crayons Paul had initially be $C = 440$.
Step 2: Let the number of crayons Paul gave to his friends be $G = 111$.
Ste...
Extracted: 223
Question: In a school there are 706 girls and 222 boys. 418 more girls joined the school. How many pupils are ...
Correct Answer: 1346
Response: Yes, let's solve this problem step-by-step.

<reasoning>
Step 1: Let $G$ be the number of girls and $B$ be the number of boys in the school initially. We are given that $G = 706$ and $B = 222$.
Step 2...
Extracted: 1346


ERROR:asyncio:Exception in callback Task.__step()
handle: <Handle Task.__step()>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
RuntimeError: cannot enter context: <_contextvars.Context object at 0x7bcd82f3ffc0> is already entered
ERROR:asyncio:Exception in callback Task.__step()
handle: <Handle Task.__step()>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
RuntimeError: cannot enter context: <_contextvars.Context object at 0x7bcd82f3ffc0> is already entered
ERROR:asyncio:Exception in callback Task.__step()
handle: <Handle Task.__step()>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
RuntimeError: cannot enter context: <_contextvars.Context object at 0x7bcd82f3ff

0,1
actor/train/kl,‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÇ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñà‚ñÅ‚ñÅ‚ñÇ‚ñÅ‚ñÇ‚ñÅ‚ñÅ‚ñÅ‚ñÇ‚ñÇ‚ñÅ‚ñÇ‚ñÇ‚ñÉ‚ñÇ‚ñÉ‚ñÉ‚ñÇ‚ñÇ‚ñÇ‚ñÇ‚ñÉ‚ñÇ‚ñÇ‚ñÑ‚ñÇ
actor/train/loss,‚ñÇ‚ñÉ‚ñà‚ñÑ‚ñÉ‚ñÖ‚ñÉ‚ñÉ‚ñÉ‚ñà‚ñÉ‚ñÉ‚ñÑ‚ñÉ‚ñÉ‚ñÑ‚ñÑ‚ñÑ‚ñÇ‚ñÅ‚ñÜ‚ñÇ‚ñÜ‚ñÉ‚ñÑ‚ñÇ‚ñÑ‚ñÉ‚ñÜ‚ñÖ‚ñÅ‚ñÖ‚ñÇ‚ñá‚ñÖ‚ñÖ‚ñÅ‚ñÇ‚ñÑ‚ñÑ
actor/train/perplexity,‚ñÑ‚ñÑ‚ñà‚ñá‚ñÉ‚ñÑ‚ñÑ‚ñÉ‚ñÖ‚ñÇ‚ñÇ‚ñÉ‚ñÑ‚ñÑ‚ñÉ‚ñÑ‚ñÉ‚ñÉ‚ñá‚ñÜ‚ñÇ‚ñÉ‚ñÖ‚ñÅ‚ñÑ‚ñÉ‚ñÑ‚ñÑ‚ñÖ‚ñÖ‚ñÇ‚ñá‚ñÑ‚ñÜ‚ñÑ‚ñÖ‚ñÖ‚ñÑ‚ñÖ‚ñÖ
actor/train/step_time_sec,‚ñà‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ‚ñÅ
actor/train/steps_per_sec,‚ñá‚ñà‚ñà‚ñà‚ñá‚ñà‚ñá‚ñà‚ñá‚ñà‚ñá‚ñá‚ñá‚ñà‚ñà‚ñÅ‚ñá‚ñá‚ñà‚ñá‚ñà‚ñà‚ñá‚ñà‚ñà‚ñá‚ñà‚ñá‚ñà‚ñá‚ñÅ‚ñá‚ñá‚ñà‚ñà‚ñá‚ñá‚ñá‚ñá‚ñà
actor/train/tflops_per_step,‚ñÅ
jax/core/compile/backend_compile_duration,‚ñÅ
jax/core/compile/jaxpr_to_mlir_module_duration,‚ñÅ
jax/core/compile/jaxpr_trace_duration,‚ñÅ
jax/orbax/write/replicated_array_gb,‚ñÅ

0,1
actor/train/kl,0.03727
actor/train/loss,-0.04859
actor/train/perplexity,0.95257
actor/train/step_time_sec,0.08434
actor/train/steps_per_sec,11.85686
actor/train/tflops_per_step,8.26922
jax/core/compile/backend_compile_duration,1768009363.65133
jax/core/compile/jaxpr_to_mlir_module_duration,1768009362.16526
jax/core/compile/jaxpr_trace_duration,1768009360.51058
jax/orbax/write/replicated_array_gb,6e-05



‚úÖ TRAINING COMPLETED SUCCESSFULLY!


## üìà Post-Training Evaluation

Testing the fine-tuned model on the held-out test set to measure improvement.

**Expected Improvements**:
- Accuracy: +10-20% absolute gain
- Format compliance: +10-15% (near-perfect structure)
- Reasoning quality: More logical, step-by-step solutions

In [26]:
import os
import jax._src.monitoring as monitoring

# Disable wandb
os.environ['WANDB_MODE'] = 'disabled'

# Clear JAX monitoring callbacks - access the internal list directly
try:
    monitoring._scalar_listeners.clear()
    print("‚úÖ Cleared JAX monitoring callbacks")
except Exception as e:
    print(f"‚ö†Ô∏è Could not clear callbacks: {e}")
    # Fallback: replace the list entirely
    monitoring._scalar_listeners = []
    print("‚úÖ Replaced monitoring callbacks with empty list")

print("\n" + "="*60)
print("üìä EVALUATING TRAINED MODEL")
print("="*60)
print("‚è≥ This will take 3-5 minutes...")
print()

# Recreate sampler with trained model
trained_sampler = sampler_lib.Sampler(
    transformer=lora_policy,
    tokenizer=tokenizer,
    cache_config=sampler_lib.CacheConfig(
        cache_size=MAX_PROMPT_LENGTH + TOTAL_GENERATION_STEPS + 256,
        num_layers=model_config.num_layers,
        num_kv_heads=model_config.num_kv_heads,
        head_dim=model_config.head_dim,
    ),
)

(corr_after, total_after, accuracy_after, partial_accuracy_after, format_accuracy_after) = evaluate(
    test_dataset,
    trained_sampler,
    **GENERATION_CONFIGS["greedy"],
)

print("\n" + "="*60)
print("üìà POST-TRAINING RESULTS:")
print(f"   Correct answers: {corr_after}/{total_after}")
print(f"   Accuracy: {accuracy_after:.2f}%")
print(f"   Partial accuracy: {partial_accuracy_after:.2f}%")
print(f"   Format compliance: {format_accuracy_after:.2f}%")
print("="*60)

print("\n" + "="*60)
print("üìä IMPROVEMENT COMPARISON:")
print("="*60)
print(f"   Accuracy:        {accuracy:.2f}% ‚Üí {accuracy_after:.2f}% (+{accuracy_after - accuracy:.2f}%)")
print(f"   Partial:         {partial_accuracy:.2f}% ‚Üí {partial_accuracy_after:.2f}% (+{partial_accuracy_after - partial_accuracy:.2f}%)")
print(f"   Format:          {format_accuracy:.2f}% ‚Üí {format_accuracy_after:.2f}% (+{format_accuracy_after - format_accuracy:.2f}%)")
print("="*60)

‚úÖ Cleared JAX monitoring callbacks

üìä EVALUATING TRAINED MODEL
‚è≥ This will take 3-5 minutes...



  0%|          | 0/70 [00:00<?, ?it/s]

===> corr=7, total=10, Acc=70.00%, Partial=70.00%, Format=100.00%
===> corr=15, total=20, Acc=75.00%, Partial=75.00%, Format=100.00%
===> corr=21, total=30, Acc=70.00%, Partial=70.00%, Format=96.67%
===> corr=26, total=40, Acc=65.00%, Partial=65.00%, Format=97.50%
===> corr=34, total=50, Acc=68.00%, Partial=68.00%, Format=98.00%
===> corr=39, total=60, Acc=65.00%, Partial=65.00%, Format=96.67%
===> corr=48, total=70, Acc=68.57%, Partial=68.57%, Format=97.14%
===> corr=55, total=80, Acc=68.75%, Partial=68.75%, Format=97.50%
===> corr=62, total=90, Acc=68.89%, Partial=68.89%, Format=97.78%
===> corr=71, total=100, Acc=71.00%, Partial=71.00%, Format=98.00%
===> corr=77, total=110, Acc=70.00%, Partial=70.00%, Format=98.18%
===> corr=84, total=120, Acc=70.00%, Partial=70.00%, Format=98.33%
===> corr=91, total=130, Acc=70.00%, Partial=70.00%, Format=97.69%
===> corr=98, total=140, Acc=70.00%, Partial=70.00%, Format=97.86%

üìà POST-TRAINING RESULTS:
   Correct answers: 98/140
   Accuracy: 7

## üß™ Interactive Testing

Generating responses for sample questions to qualitatively assess the model's learned reasoning patterns.

We test with:
1. Custom DVDs discount problem
2. Original SVAMP examples
3. Various problem types (addition, division, multi-step)

This helps verify the model produces human-readable, mathematically sound reasoning.

In [27]:
print("\n" + "="*60)
print("üß™ TESTING ON SAMPLE QUESTIONS")
print("="*60)

# Sample SVAMP question
sample_question = """Each pack of DVDs costs 76 dollars. If there is a discount of 25 dollars on each pack, how much do you have to pay to buy each pack?"""

print(f"\nüìù Question: {sample_question}")
print("\n‚è≥ Generating answer...\n")

# Generate with trained model
response = generate(
    sample_question,
    trained_sampler,
    **GENERATION_CONFIGS["standard"]
)

print("="*60)
print("ü§ñ MODEL RESPONSE:")
print("="*60)
print(response)
print("="*60)

# Try a few more examples
test_questions = [
    "Rachel had to complete 5 pages of math homework. If she had to complete 4 more pages of reading homework than math homework, how many pages did she have to complete in all?",
    "There were 8 friends playing a video game online when 3 players quit. If each player left had 5 lives, how many lives did they have total?",
    "A farmer has 56 apples. He wants to put them in boxes of 8. How many boxes does he need?",
]

print("\n" + "="*60)
print("üß™ ADDITIONAL TEST QUESTIONS:")
print("="*60)

for i, q in enumerate(test_questions, 1):
    print(f"\n--- Test {i} ---")
    print(f"Q: {q}")
    
    resp = generate(q, trained_sampler, **GENERATION_CONFIGS["greedy"])
    print(f"\nModel response:\n{resp}\n")
    print("-" * 60)


üß™ TESTING ON SAMPLE QUESTIONS

üìù Question: Each pack of DVDs costs 76 dollars. If there is a discount of 25 dollars on each pack, how much do you have to pay to buy each pack?

‚è≥ Generating answer...

ü§ñ MODEL RESPONSE:
Yes, let's solve this problem!

<reasoning>
Step 1: The original price of each pack of DVDs is 76 dollars.
Step 2: There is a discount of 25 dollars on each pack. So the discount amount is 25 dollars.
Step 3: The price after discount is the original price minus the discount amount. Price after discount = 76 - 25 = 51 dollars.
</reasoning>
<answer>51</answer>

üß™ ADDITIONAL TEST QUESTIONS:

--- Test 1 ---
Q: Rachel had to complete 5 pages of math homework. If she had to complete 4 more pages of reading homework than math homework, how many pages did she have to complete in all?

Model response:















------------------------------------------------------------

--- Test 2 ---
Q: There were 8 friends playing a video game online when 3 players quit. I

## üíæ Saving Trained Model

Persisting the fine-tuned LoRA adapters, tokenizer, and training configuration.

**Saved Artifacts**:
- `lora_final/` or `lora_state.pkl`: Trained adapter weights (~2GB)
- `tokenizer/`: Vocabulary and special tokens
- `training_config.pkl`: Hyperparameters for reproducibility

**Usage**: These adapters can be merged with the base Gemma 3 1B model for deployment.

In [28]:
import os
from pathlib import Path

os.environ['WANDB_MODE'] = 'disabled'  # Disable wandb to avoid the error

print("\nüíæ Saving final trained model...")

# Create save directory
save_dir = Path("./trained_models")
save_dir.mkdir(parents=True, exist_ok=True)

try:
    # Save the LoRA policy using orbax
    final_checkpointer = ocp.StandardCheckpointer()
    _, final_state = nnx.split(lora_policy)
    
    # Disable any monitoring callbacks that might trigger wandb
    import jax
    jax.monitoring._scalar_listeners.clear()
    
    final_checkpointer.save(str(save_dir / "lora_final"), final_state)
    final_checkpointer.wait_until_finished()
    
    print(f"‚úÖ Final model saved to {save_dir / 'lora_final'}")
    
    # Save tokenizer
    tokenizer.save_pretrained(str(save_dir / "tokenizer"))
    print(f"‚úÖ Tokenizer saved to {save_dir / 'tokenizer'}")
    
    # Save training config
    import pickle
    config_save_path = save_dir / "training_config.pkl"
    with open(config_save_path, 'wb') as f:
        pickle.dump({
            'grpo_config': grpo_config,
            'cluster_config': cluster_config,
            'hyperparameters': {
                'learning_rate': LEARNING_RATE,
                'num_epochs': NUM_EPOCHS,
                'batch_size': TRAIN_MICRO_BATCH_SIZE,
                'num_generations': NUM_GENERATIONS,
                'beta': BETA,
                'epsilon': EPSILON,
            }
        }, f)
    print(f"‚úÖ Training config saved to {config_save_path}")
    
    print("\n" + "="*60)
    print("üéâ ALL FILES SAVED SUCCESSFULLY!")
    print("="*60)
    print(f"üìÅ Location: {save_dir}")
    print("="*60)
    
except Exception as e:
    print(f"‚ùå Error saving model: {e}")
    print("Trying alternative save method...")
    
    # Alternative: Save just the state dict
    import cloudpickle
    with open(save_dir / "lora_state.pkl", 'wb') as f:
        cloudpickle.dump(final_state, f)
    print(f"‚úÖ Model state saved to {save_dir / 'lora_state.pkl'}")


üíæ Saving final trained model...
‚ùå Error saving model: module 'jax.monitoring' has no attribute '_scalar_listeners'
Trying alternative save method...
‚úÖ Model state saved to trained_models/lora_state.pkl
