### 🏆 SQuAD v2: T5 finetuning  

The **Stanford Question Answering Dataset v2 (SQuAD v2)** is a benchmark dataset for machine reading comprehension. It builds upon **SQuAD v1.1** by introducing a crucial twist: **unanswerable questions**.  

📖 **Key Features:**  
- Over **150,000** questions based on Wikipedia articles (the public train and validation splits loaded below contain 130,319 and 11,873 examples).  
- Includes **impossible questions**, where no answer exists in the given passage.  
- Designed to test **both answer extraction and rejection** skills in NLP models.  

🤖 **Why It Matters:**  
- Helps evaluate how well models distinguish between **answerable** and **unanswerable** questions.  
- Encourages the development of **more reliable** and **human-like** QA systems. 



This notebook demonstrates how to fine-tune and evaluate the T5 model on the SQuAD v2 dataset. We will:

- Understand the SQuAD v2 dataset
- Preprocess and tokenize the data
- Evaluate the pre-trained T5 model
- Fine-tune T5 on SQuAD v2
- Re-evaluate after fine-tuning

In [2]:
#Import necessary libraries
from datasets import load_dataset
import pandas as pd
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration, TrainingArguments, Trainer
import numpy as np
pd.set_option('display.max_colwidth', None)  # Show full text in columns
pd.set_option('display.max_columns', None)  # Show all columns



#### Loading the dataset and Analysis

In [3]:
# Load the SQuAD v2 dataset
dataset = load_dataset("squad_v2")

# Dataset Analysis
print("Dataset Structure:")
print(dataset)

# Example data
print("\nExample Data:")
df = pd.DataFrame(dataset['train'])
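# Shuffled copy of the DataFrame (kept for spot checks; not displayed below)
df_shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
# Display the first rows in their original order
df.head(20)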



Dataset Structure:
DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 130319
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 11873
    })
})

Example Data:


Unnamed: 0,id,title,context,question,answers
0,56be85543aeaaa14008c9063,Beyoncé,"Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles ""Crazy in Love"" and ""Baby Boy"".",When did Beyonce start becoming popular?,"{'text': ['in the late 1990s'], 'answer_start': [269]}"
1,56be85543aeaaa14008c9065,Beyoncé,"Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles ""Crazy in Love"" and ""Baby Boy"".",What areas did Beyonce compete in when she was growing up?,"{'text': ['singing and dancing'], 'answer_start': [207]}"
2,56be85543aeaaa14008c9066,Beyoncé,"Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles ""Crazy in Love"" and ""Baby Boy"".",When did Beyonce leave Destiny's Child and become a solo singer?,"{'text': ['2003'], 'answer_start': [526]}"
3,56bf6b0f3aeaaa14008c9601,Beyoncé,"Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles ""Crazy in Love"" and ""Baby Boy"".",In what city and state did Beyonce grow up?,"{'text': ['Houston, Texas'], 'answer_start': [166]}"
4,56bf6b0f3aeaaa14008c9602,Beyoncé,"Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles ""Crazy in Love"" and ""Baby Boy"".",In which decade did Beyonce become famous?,"{'text': ['late 1990s'], 'answer_start': [276]}"
5,56bf6b0f3aeaaa14008c9603,Beyoncé,"Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles ""Crazy in Love"" and ""Baby Boy"".",In what R&B group was she the lead singer?,"{'text': ['Destiny's Child'], 'answer_start': [320]}"
6,56bf6b0f3aeaaa14008c9604,Beyoncé,"Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles ""Crazy in Love"" and ""Baby Boy"".",What album made her a worldwide known artist?,"{'text': ['Dangerously in Love'], 'answer_start': [505]}"
7,56bf6b0f3aeaaa14008c9605,Beyoncé,"Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles ""Crazy in Love"" and ""Baby Boy"".",Who managed the Destiny's Child group?,"{'text': ['Mathew Knowles'], 'answer_start': [360]}"
8,56d43c5f2ccc5a1400d830a9,Beyoncé,"Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles ""Crazy in Love"" and ""Baby Boy"".",When did Beyoncé rise to fame?,"{'text': ['late 1990s'], 'answer_start': [276]}"
9,56d43c5f2ccc5a1400d830aa,Beyoncé,"Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles ""Crazy in Love"" and ""Baby Boy"".",What role did Beyoncé have in Destiny's Child?,"{'text': ['lead singer'], 'answer_start': [290]}"


In [4]:
# Function to check if 'text' is empty
def is_text_empty(answer):
    #print(answer['text'])
    return answer['text'] == []

# Filter rows where 'text' is empty
empty_text_rows = df[df['answers'].apply(is_text_empty)]

empty_text_rows.head()

Unnamed: 0,id,title,context,question,answers
2075,5a8d7bf7df8bba001a0f9ab1,The_Legend_of_Zelda:_Twilight_Princess,"The Legend of Zelda: Twilight Princess (Japanese: ゼルダの伝説 トワイライトプリンセス, Hepburn: Zeruda no Densetsu: Towairaito Purinsesu?) is an action-adventure game developed and published by Nintendo for the GameCube and Wii home video game consoles. It is the thirteenth installment in the The Legend of Zelda series. Originally planned for release on the GameCube in November 2005, Twilight Princess was delayed by Nintendo to allow its developers to refine the game, add more content, and port it to the Wii. The Wii version was released alongside the console in North America in November 2006, and in Japan, Europe, and Australia the following month. The GameCube version was released worldwide in December 2006.[b]",What category of game is Legend of Zelda: Australia Twilight?,"{'text': [], 'answer_start': []}"
2076,5a8d7bf7df8bba001a0f9ab2,The_Legend_of_Zelda:_Twilight_Princess,"The Legend of Zelda: Twilight Princess (Japanese: ゼルダの伝説 トワイライトプリンセス, Hepburn: Zeruda no Densetsu: Towairaito Purinsesu?) is an action-adventure game developed and published by Nintendo for the GameCube and Wii home video game consoles. It is the thirteenth installment in the The Legend of Zelda series. Originally planned for release on the GameCube in November 2005, Twilight Princess was delayed by Nintendo to allow its developers to refine the game, add more content, and port it to the Wii. The Wii version was released alongside the console in North America in November 2006, and in Japan, Europe, and Australia the following month. The GameCube version was released worldwide in December 2006.[b]",What consoles can be used to play Australia Twilight?,"{'text': [], 'answer_start': []}"
2077,5a8d7bf7df8bba001a0f9ab3,The_Legend_of_Zelda:_Twilight_Princess,"The Legend of Zelda: Twilight Princess (Japanese: ゼルダの伝説 トワイライトプリンセス, Hepburn: Zeruda no Densetsu: Towairaito Purinsesu?) is an action-adventure game developed and published by Nintendo for the GameCube and Wii home video game consoles. It is the thirteenth installment in the The Legend of Zelda series. Originally planned for release on the GameCube in November 2005, Twilight Princess was delayed by Nintendo to allow its developers to refine the game, add more content, and port it to the Wii. The Wii version was released alongside the console in North America in November 2006, and in Japan, Europe, and Australia the following month. The GameCube version was released worldwide in December 2006.[b]",When was Australia Twilight launched in North America?,"{'text': [], 'answer_start': []}"
2078,5a8d7bf7df8bba001a0f9ab4,The_Legend_of_Zelda:_Twilight_Princess,"The Legend of Zelda: Twilight Princess (Japanese: ゼルダの伝説 トワイライトプリンセス, Hepburn: Zeruda no Densetsu: Towairaito Purinsesu?) is an action-adventure game developed and published by Nintendo for the GameCube and Wii home video game consoles. It is the thirteenth installment in the The Legend of Zelda series. Originally planned for release on the GameCube in November 2005, Twilight Princess was delayed by Nintendo to allow its developers to refine the game, add more content, and port it to the Wii. The Wii version was released alongside the console in North America in November 2006, and in Japan, Europe, and Australia the following month. The GameCube version was released worldwide in December 2006.[b]",When could GameCube owners purchase Australian Princess?,"{'text': [], 'answer_start': []}"
2079,5a8d7bf7df8bba001a0f9ab5,The_Legend_of_Zelda:_Twilight_Princess,"The Legend of Zelda: Twilight Princess (Japanese: ゼルダの伝説 トワイライトプリンセス, Hepburn: Zeruda no Densetsu: Towairaito Purinsesu?) is an action-adventure game developed and published by Nintendo for the GameCube and Wii home video game consoles. It is the thirteenth installment in the The Legend of Zelda series. Originally planned for release on the GameCube in November 2005, Twilight Princess was delayed by Nintendo to allow its developers to refine the game, add more content, and port it to the Wii. The Wii version was released alongside the console in North America in November 2006, and in Japan, Europe, and Australia the following month. The GameCube version was released worldwide in December 2006.[b]",What year was the Legend of Zelda: Australian Princess originally planned for release?,"{'text': [], 'answer_start': []}"
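
The rows above are examples whose `answers['text']` list is empty. As a rough sketch, the share of such unanswerable questions can be computed with a small helper. The names `is_unanswerable` and `unanswerable_fraction` and the toy records are illustrative assumptions, not part of the original notebook.

```python
def is_unanswerable(answers):
    """True when a SQuAD v2 `answers` dict carries no answer text."""
    return len(answers["text"]) == 0


def unanswerable_fraction(answer_column):
    """Fraction of examples with no answer (hypothetical helper)."""
    answer_column = list(answer_column)
    if not answer_column:
        return 0.0
    return sum(is_unanswerable(a) for a in answer_column) / len(answer_column)


# Toy records in the same shape as the `answers` column shown above
toy_answers = [
    {"text": ["in the late 1990s"], "answer_start": [269]},
    {"text": [], "answer_start": []},
    {"text": ["Houston, Texas"], "answer_start": [166]},
    {"text": [], "answer_start": []},
]
print(unanswerable_fraction(toy_answers))  # 0.5
```

In the notebook, the same helper could presumably be called as `unanswerable_fraction(df['answers'])`.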


#### Loading the pre-trained T5-small Model and the Tokenizer

In [None]:
# Load the pre-trained T5 model and tokenizer
# T5Tokenizer requires the sentencepiece package: pip install sentencepiece
model_name = "t5-small"  # Can be changed to "t5-base" or "t5-large" for better performance
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)



You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


#### Preprocess and tokenize Dataset

In [6]:
# Define a function to preprocess the dataset for T5
def preprocess_function(example):
    input_text = f"question: {example['question']} context: {example['context']}"
    target_text = example["answers"]["text"][0] if example["answers"]["text"] else ""
    
    return {
        "input_ids": tokenizer(input_text, truncation=True, padding="max_length", max_length=512)["input_ids"],
        "labels": tokenizer(target_text, truncation=True, padding="max_length", max_length=50)["input_ids"],
    }

In [7]:
# Tokenize the dataset
tokenized_datasets = dataset.map(preprocess_function, remove_columns=["id", "title", "context", "question", "answers"])
print("\nExample Tokenized Data:")
df = pd.DataFrame(tokenized_datasets['train'])
print(len(df))
df.head(1)


Map:   0%|          | 0/130319 [00:00<?, ? examples/s]

Map: 100%|██████████| 130319/130319 [05:34<00:00, 389.38 examples/s]
Map: 100%|██████████| 11873/11873 [00:29<00:00, 409.33 examples/s]



Example Tokenized Data:
130319


Unnamed: 0,input_ids,labels
0,"[822, 10, 366, 410, 493, 63, 14549, 456, 2852, 1012, 58, 2625, 10, 493, 63, 106, 75, 154, 3156, 7, 693, 8900, 965, 18, 6936, 449, 41, 87, 115, 23, 2, 354, 2, 29, 7, 15, 2, 87, 36, 15, 18, 476, 4170, 18, 8735, 61, 41, 7473, 1600, 6464, 15465, 61, 19, 46, 797, 7634, 6, 3, 21101, 6, 1368, 8211, 11, 15676, 5, 12896, 11, 3279, 16, 8018, 6, 2514, 6, 255, 3032, 16, 796, 8782, 11, 10410, 2259, 7, 38, 3, 9, 861, 6, 11, 4659, 12, 10393, 16, 8, 1480, 5541, 7, 38, 991, 7634, 13, ...]","[16, 8, 1480, 5541, 7, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"
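
The `labels` row above ends in a long run of `0`s, which is T5's pad token id. The loss used by `T5ForConditionalGeneration` skips label positions set to `-100`, so a common variant replaces pad ids in the labels with `-100` before training. The sketch below shows that replacement on a shortened copy of the row above. It is not what this notebook did, so the losses reported later were computed with the padding included. The helper name `mask_padding` is an assumption.

```python
PAD_TOKEN_ID = 0      # T5's pad token id
IGNORE_INDEX = -100   # label value skipped by the cross-entropy loss


def mask_padding(label_ids, pad_token_id=PAD_TOKEN_ID, ignore_index=IGNORE_INDEX):
    """Replace pad ids with the ignore index so padding does not contribute to the loss."""
    return [ignore_index if token_id == pad_token_id else token_id for token_id in label_ids]


# Shortened version of the label row shown above
labels = [16, 8, 1480, 5541, 7, 1, 0, 0, 0, 0]
print(mask_padding(labels))
# [16, 8, 1480, 5541, 7, 1, -100, -100, -100, -100]
```

One way to use it would be to wrap the `labels` entry returned by `preprocess_function` in `mask_padding(...)`.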


#### Evaluate pre-trained T5-small Model

In [8]:
# Define evaluation functions (Exact Match & F1 Score)
def exact_match(prediction, ground_truth):
    return int(prediction.strip().lower() == ground_truth.strip().lower())

def f1_score(prediction, ground_truth):
    pred_tokens = prediction.lower().split()
    gt_tokens = ground_truth.lower().split()
    common = set(pred_tokens) & set(gt_tokens)
    if not common:
        return 0.0
    precision = len(common) / len(pred_tokens)
    recall = len(common) / len(gt_tokens)
    return 2 * (precision * recall) / (precision + recall)
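
The `f1_score` above differs from the official SQuAD v2 scoring in two ways. It counts overlap with sets, so repeated tokens are undercounted. It also returns 0.0 when both the prediction and the reference are empty, which is the correct outcome for an unanswerable question. A minimal sketch closer to the official script follows: it normalizes text, counts token overlap with a multiset, and scores empty-vs-empty as 1.0. The function names are assumptions, and this is not the metric used for the results reported in this notebook.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def squad_f1(prediction, ground_truth):
    """Token-level F1 with multiset overlap; empty vs. empty scores 1.0."""
    pred_tokens = normalize_answer(prediction).split()
    gt_tokens = normalize_answer(ground_truth).split()
    if not pred_tokens or not gt_tokens:
        # Unanswerable case: full credit only if both sides are empty
        return float(pred_tokens == gt_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gt_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)


print(squad_f1("", ""))        # 1.0
print(squad_f1("France", ""))  # 0.0
```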


In [9]:
# Function to evaluate model on SQuAD v2
def evaluate_model(model, dataset, num_samples=100):
    predictions, references, em_scores, f1_scores = [], [], [], []
    
    for i in range(num_samples):
        sample = dataset[i]
        input_text = f"question: {sample['question']} context: {sample['context']}"
        input_ids = tokenizer(input_text, return_tensors="pt").input_ids
        
        output_ids = model.generate(input_ids, max_length=50)
        generated_answer = tokenizer.decode(output_ids[0], skip_special_tokens=True)
        
        actual_answers = sample["answers"]["text"]
        reference_answer = actual_answers[0] if actual_answers else ""
        
        em = exact_match(generated_answer, reference_answer)
        f1 = f1_score(generated_answer, reference_answer)
        
        em_scores.append(em)
        f1_scores.append(f1)

        print(f"\nContext: {sample['context']}")
        print(f"\nQuestion: {sample['question']}")
        print(f"Predicted Answer: {generated_answer}")
        print(f"Actual Answer: {reference_answer}")
        print(f"EM: {em}, F1: {f1:.2f}")

    avg_em = np.mean(em_scores) * 100
    avg_f1 = np.mean(f1_scores) * 100

    print("\nFinal Evaluation Results:")
    print(f"Exact Match (EM): {avg_em:.2f}%")
    print(f"F1 Score: {avg_f1:.2f}%")
    
    return {"EM": avg_em, "F1": avg_f1}
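
`evaluate_model` scores each prediction against the first reference answer only. The SQuAD validation split usually lists several references per answerable question, and the official scoring takes the best score over all of them. A small sketch of that idea is shown below. The helper `best_over_references` is a hypothetical name, and `exact_match` is repeated from the cell above only to keep the example self-contained.

```python
def exact_match(prediction, ground_truth):
    return int(prediction.strip().lower() == ground_truth.strip().lower())


def best_over_references(metric, prediction, references):
    """Best metric value over all reference answers.

    An empty reference list (unanswerable question) is scored against "".
    """
    if not references:
        references = [""]
    return max(metric(prediction, ref) for ref in references)


refs = ["the 10th century", "10th century"]
print(best_over_references(exact_match, "10th century", refs))  # 1
print(best_over_references(exact_match, "", []))                # 1
```

Inside `evaluate_model`, this would replace the `reference_answer = actual_answers[0] ...` step by passing `sample["answers"]["text"]` as `references`.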

In [9]:
# Evaluate pre-trained T5 model on SQuAD v2
print("\nEvaluating Pre-trained T5 Model...")
evaluate_model(model, dataset["validation"], num_samples=100)


Evaluating Pre-trained T5 Model...



Context: The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse ("Norman" comes from "Norseman") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants would gradually merge with the Carolingian-based cultures of West Francia. The distinct cultural and ethnic identity of the Normans emerged initially in the first half of the 10th century, and it continued to evolve over the succeeding centuries.

Question: In what country is Normandy located?
Predicted Answer: France
Actual Answer: France
EM: 1, F1: 1.00

Context: The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave the

{'EM': np.float64(28.000000000000004), 'F1': np.float64(34.04446078431372)}

#### Finetuning T5-small Model on SQuAD v2 Train Split

In [None]:
## Define training arguments for fine-tuning
training_args = TrainingArguments(
    output_dir="./results_squadv2_t5",
    evaluation_strategy="steps",
    eval_steps=500,
    learning_rate=3e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    save_total_limit=3,
    save_strategy="steps",
    save_steps=500,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

## Initialize Trainer for fine-tuning
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
)

## Fine-tune the T5 model
print("\nFine-tuning the T5 Model...")
trainer.train()
trainer.save_model("./results_squadv2_ft/best_model")




Fine-tuning the T5 Model...


Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.


Step,Training Loss,Validation Loss
500,0.0946,0.045541
1000,0.0497,0.045651
1500,0.0479,0.043212
2000,0.0453,0.042592
2500,0.0449,0.039167
3000,0.0451,0.038881
3500,0.0435,0.037389
4000,0.0429,0.038354
4500,0.0379,0.037036
5000,0.0364,0.03847


#### Evaluate fine-tuned T5-small model on SQuAD v2

In [10]:
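# Evaluate the fine-tuned T5 model on SQuAD v2
# NOTE: this checkpoint path differs from the output_dir and save path used in the training cell above;
# point it at the checkpoint directory your own run produced.
model = T5ForConditionalGeneration.from_pretrained("T5_model/checkpoint-12000")
print("\nEvaluating Fine-tuned T5 Model...")
evaluate_model(model, dataset["validation"], num_samples=100)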



Evaluating Fine-tuned T5 Model...

Context: The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse ("Norman" comes from "Norseman") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants would gradually merge with the Carolingian-based cultures of West Francia. The distinct cultural and ethnic identity of the Normans emerged initially in the first half of the 10th century, and it continued to evolve over the succeeding centuries.

Question: In what country is Normandy located?
Predicted Answer: France
Actual Answer: France
EM: 1, F1: 1.00

Context: The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in t

{'EM': 57.99999999999999, 'F1': 34.45331871345029}

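#### Exact Match score improved from 28% to 58% with fine-tuning!

F1 barely moved (34.04% to 34.45%). Both results cover only the first 100 validation examples. The `f1_score` function above also returns 0.0 whenever the prediction and the reference share no tokens, including when both are empty, so a correctly rejected unanswerable question scores EM = 1 but F1 = 0.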