# Qwen2.5-Math-7B HuggingFace Inference Pipeline

> **Purpose**: Generate responses from the Qwen2.5-Math-7B-Instruct model on 30 linear algebra problems

**Notebook**: `4_qwen_math_7b_hf_inference_and_verification_pipeline.ipynb`  
**Author**: Dr. Shradha's Research Team | **Date**: 2025-12-11 | **Version**: 1.0.0

## Step 1: Import Dependencies & Configure Paths

Import the packages needed for HuggingFace inference and define the input, output, and checkpoint file names.

In [None]:
import pandas as pd
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import re
import time

# ================================================================================
# CONFIGURATION & FILE DEFINITIONS
# ================================================================================

INPUT_FILE = "data/ds2_qna.xlsx"
OUTPUT_FILE = "data/qna_responses_qwen2.5_math_7b.xlsx"
CHECKPOINT_PREFIX = "qwen_math_7b_checkpoint_row"

print("Configuration loaded:")
print(f"  Input:  {INPUT_FILE}")
print(f"  Output: {OUTPUT_FILE}")

## Step 2: Mount Google Drive & Set Working Directory

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
import os
os.chdir('/content/drive/MyDrive/MS&T_Projects/LLM_Research/Exp02-Visual QnA')

In [4]:
# Load the main dataset
df = pd.read_excel(INPUT_FILE)
print(f"Loaded {len(df)} problems")
print("Columns:", list(df.columns))
print('\nSample of id and instruction columns:')
print(df[['id', 'instruction']].head())

Loaded 30 problems
Columns: ['id', 'level', 'category', 'problem_text', 'problem_latex', 'answer_latex', 'instruction']

Sample of id and instruction columns:
                id                                        instruction
0  L3_det_3x3_7420  Solve this linear algebra problem. Show your w...
1  L3_det_3x3_8513  Solve this linear algebra problem. Show your w...
2  L3_det_3x3_1161  Solve this linear algebra problem. Show your w...
3  L3_det_3x3_9684  Solve this linear algebra problem. Show your w...
4     L3_mult_9996  Solve this linear algebra problem. Show your w...


In [5]:
# Load the model and tokenizer
model_name = "Qwen/Qwen2.5-Math-7B-Instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"

print(f"Loading model: {model_name}")
print(f"Device: {device}")

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16 if device == "cuda" else torch.float32,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

print("Model and tokenizer loaded successfully!")

Loading model: Qwen/Qwen2.5-Math-7B-Instruct
Device: cuda


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


`torch_dtype` is deprecated! Use `dtype` instead!

Model and tokenizer loaded successfully!


In [6]:
def get_qwen_math_response(instruction, model, tokenizer, device):
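    """
    Generate a response to a single instruction with the Qwen Math model.

    Returns the decoded text, or a string starting with "Error:" if generation fails.
    """
    try:
        # Single user turn; the instruction text itself asks the model to show its work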
        messages = [
            {"role": "user", "content": instruction}
        ]

        text = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )

        model_inputs = tokenizer([text], return_tensors="pt").to(device)

        generated_ids = model.generate(
            **model_inputs,
            max_new_tokens=4000,
            do_sample=False,  # greedy, deterministic decoding
            pad_token_id=tokenizer.eos_token_id
        )

        # Remove input tokens from generated output
        generated_ids_trimmed = [
            output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
        ]

        response = tokenizer.batch_decode(generated_ids_trimmed, skip_special_tokens=True)[0]

        return response
    except Exception as e:
        return f"Error: {str(e)}"

# Test with an example
example_instruction = df.iloc[0]['instruction']
print("Example instruction:", example_instruction[:200], "...")

# Uncomment to test with example
# example_response = get_qwen_math_response(example_instruction, model, tokenizer, device)
# print("Example response:", example_response[:500], "...")

Example instruction: Solve this linear algebra problem. Show your work and give the final answer.

Find the determinant of the 3×3 matrix A.

A = \begin{bmatrix} 18.25 & 8.75 & -5.50 \\ -12.75 & 0.75 & 18.20 \\ -1.50 & 2. ...
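
The trimming step inside `get_qwen_math_response` works because, for decoder-only models, `generate` returns each prompt followed by the newly generated tokens. The sketch below reproduces that slicing with plain Python lists. The token IDs and the helper name `trim_prompt_tokens` are made up for illustration and are not part of the notebook.

```python
def trim_prompt_tokens(input_ids_batch, output_ids_batch):
    """Drop the echoed prompt tokens from the front of each generated sequence."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]

# Made-up token IDs: one prompt of three tokens followed by two new tokens.
prompt_batch = [[101, 102, 103]]
generated_batch = [[101, 102, 103, 7, 8]]
new_tokens = trim_prompt_tokens(prompt_batch, generated_batch)
print(new_tokens)  # [[7, 8]]
```

Decoding only the trimmed IDs keeps the prompt text out of the stored response.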


In [None]:
# Process each row and get responses for the Qwen-Math-7B column where it's empty
print(f"Starting to process {len(df)} rows...")

for idx, row in df.iterrows():
    # Check current value in Qwen-Math-7B column
    current_response = row.get('Qwen-Math-7B', pd.NA)
    current_pid = row.get('id', '')

    # Only process if the response is empty/NaN
    if pd.isna(current_response) or current_response == "" or str(current_response) == "nan":
        print(f"Processing row {idx+1}/{len(df)}: {row['id']}")

        instruction = row['instruction']
        response = get_qwen_math_response(instruction, model, tokenizer, device)

        # Add response to dataframe
        df.at[idx, 'Qwen-Math-7B'] = response

        print(f"  Got response ({len(response)} chars): {response[:100]}...")

        # Save periodically to prevent data loss
        if (idx + 1) % 5 == 0:  # Save every 5 rows
            df.to_excel(f'{CHECKPOINT_PREFIX}_{idx+1}.xlsx', index=False)
            print(f"  Saved checkpoint at row {idx+1}")

        # Brief pause between generations (optional for local inference)
        time.sleep(0.5)
    else:
        print(f"Skipping row {idx+1}/{len(df)}: {row['id']} (already has response)")

print("Completed processing all rows!")
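
If the session disconnects mid-run, the checkpoints written by the loop above can be used to resume. A minimal sketch for locating the most recent one, assuming the `<prefix>_<row>.xlsx` naming used in the loop; `latest_checkpoint` is a hypothetical helper, not part of the notebook.

```python
import glob
import os
import re

def latest_checkpoint(prefix, directory="."):
    """Return (row_number, path) for the highest-numbered checkpoint, or None."""
    name_pattern = re.compile(re.escape(prefix) + r"_(\d+)\.xlsx")
    search = os.path.join(glob.escape(directory), glob.escape(prefix) + "_*.xlsx")
    candidates = []
    for path in glob.glob(search):
        match = name_pattern.fullmatch(os.path.basename(path))
        if match:  # ignore files whose suffix is not a row number
            candidates.append((int(match.group(1)), path))
    return max(candidates) if candidates else None

# Prints None when no checkpoint exists in the current directory.
print(latest_checkpoint("qwen_math_7b_checkpoint_row"))
```

Loading the returned path with `pd.read_excel` before running the loop would let the existing empty-response check skip the rows that are already filled.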

In [None]:
# Save the final dataframe with Qwen-Math responses
df.to_excel(OUTPUT_FILE, index=False)
print(f"Final dataset saved to {OUTPUT_FILE}!")

In [None]:
# Verify the results
print('Final results summary:')
for idx, row in df.iterrows():
    qwen_resp = row.get('Qwen-Math-7B', pd.NA)
    if not pd.isna(qwen_resp) and qwen_resp != "" and str(qwen_resp) != "nan":
        print(f"{row['id']}: Response length = {len(str(qwen_resp))}")
    else:
        print(f"{row['id']}: NO RESPONSE")
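
Response length only confirms that something was generated. As a possible next step toward verification, the sketch below pulls the last `\boxed{...}` expression out of a response so it could be compared with `answer_latex`. It assumes the model marks its final answer with `\boxed{}`, which this notebook does not enforce, and `extract_last_boxed` is a hypothetical helper.

```python
def extract_last_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    content_start = start + len(marker)
    depth = 0  # nesting level of braces opened inside the box
    for i in range(content_start, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                return text[content_start:i]
            depth -= 1
    return None  # braces never closed, e.g. a truncated response

# Made-up response text, for illustration only.
sample = r"Expanding along the first row, the determinant is \boxed{-42.5}."
print(extract_last_boxed(sample))  # -42.5
```

Brace counting rather than a single regular expression keeps nested LaTeX such as `\frac{1}{2}` intact inside the box.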