To fine-tune and run the models efficiently in a Google Colab notebook with an **A100 GPU**, follow these steps.

Load and explore the `SciQ` dataset, a question-answering dataset of science multiple-choice questions.

In [None]:


from datasets import load_dataset


print("Loading the SciQ dataset...")
dataset = load_dataset("sciq")


print(f"Number of training examples: {len(dataset['train'])}")
print(f"Number of validation examples: {len(dataset['validation'])}")

print("\nSample training example:")
print(dataset['train'][0])

Loading the SciQ dataset...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Number of training examples: 11679
Number of validation examples: 1000

Sample training example:
{'question': 'What type of organism is commonly used in preparation of foods such as cheese and yogurt?', 'distractor3': 'viruses', 'distractor1': 'protozoa', 'distractor2': 'gymnosperms', 'correct_answer': 'mesophilic organisms', 'support': 'Mesophiles grow best in moderate temperature, typically between 25°C and 40°C (77°F and 104°F). Mesophiles are often found living in or on the bodies of humans or other animals. The optimal growth temperature of many pathogenic mesophiles is 37°C (98°F), the normal human body temperature. Mesophilic organisms have important uses in food preparation, including cheese, yogurt, beer and wine.'}


Format the `SciQ` dataset so it is easier to use in machine-learning tasks such as training a question-answering (QA) model. The code converts each split into a list of dictionaries, each with an `input` key (the question followed by its supporting context) and an `output` key (the correct answer).

In [None]:


def format_data(split):
    """
    Formats the dataset into a list of dictionaries with 'input' and 'output' keys.
    The 'input' combines the question and context.
    """
    formatted = []
    for ex in split:
        # Some examples might not have 'support'; handle such cases
        context = ex.get("support", "")
        formatted.append({
            "input": f"Question: {ex['question']} Context: {context}",
            "output": ex["correct_answer"]
        })
    return formatted


print("Formatting the data...")
train_data = format_data(dataset["train"])
valid_data = format_data(dataset["validation"])


print(f"Number of formatted training examples: {len(train_data)}")
print(f"Number of formatted validation examples: {len(valid_data)}")

# show a formatted training example
print("\nSample formatted training example:")
print(train_data[0])

Formatting the data...
Number of formatted training examples: 11679
Number of formatted validation examples: 1000

Sample formatted training example:
{'input': 'Question: What type of organism is commonly used in preparation of foods such as cheese and yogurt? Context: Mesophiles grow best in moderate temperature, typically between 25°C and 40°C (77°F and 104°F). Mesophiles are often found living in or on the bodies of humans or other animals. The optimal growth temperature of many pathogenic mesophiles is 37°C (98°F), the normal human body temperature. Mesophilic organisms have important uses in food preparation, including cheese, yogurt, beer and wine.', 'output': 'mesophilic organisms'}


Convert the formatted training and validation data from the previous step into Hugging Face `Dataset` objects. The `datasets` library provides a convenient interface for mapping, batching, and removing columns, and its datasets can be passed directly to the `Trainer` for training.

In [None]:

from datasets import Dataset

print("Converting formatted data to Hugging Face Dataset format...")
train_dataset = Dataset.from_list(train_data)
valid_dataset = Dataset.from_list(valid_data)


print(f"Converted training dataset has {len(train_dataset)} examples.")
print(f"Converted validation dataset has {len(valid_dataset)} examples.")


print("\nSample from the converted training dataset:")
print(train_dataset[0])

Converting formatted data to Hugging Face Dataset format...
Converted training dataset has 11679 examples.
Converted validation dataset has 1000 examples.

Sample from the converted training dataset:
{'input': 'Question: What type of organism is commonly used in preparation of foods such as cheese and yogurt? Context: Mesophiles grow best in moderate temperature, typically between 25°C and 40°C (77°F and 104°F). Mesophiles are often found living in or on the bodies of humans or other animals. The optimal growth temperature of many pathogenic mesophiles is 37°C (98°F), the normal human body temperature. Mesophilic organisms have important uses in food preparation, including cheese, yogurt, beer and wine.', 'output': 'mesophilic organisms'}


Load the T5 model (`t5-base`) and its tokenizer from the Hugging Face Hub. The code detects whether a CUDA GPU is available and moves the model to it, falling back to the CPU otherwise. The model and tokenizer are then ready for sequence-to-sequence tasks such as question answering or summarization.

In [None]:


from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

# Check device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")


model_name = "t5-base"  
print(f"Loading tokenizer and model: {model_name}")
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)


model.to(device)
print("Tokenizer and model loaded and moved to device successfully.")

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.



Using device: cuda
Loading tokenizer and model: t5-base



You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565



Tokenizer and model loaded and moved to device successfully.


Preprocess the training and validation datasets by tokenizing the input (question + context) and the output (correct answer) with the T5 tokenizer. Inputs are truncated and padded to 512 tokens, and outputs to 64 tokens. The tokenized outputs become the `labels` for each input. Finally, the original `"input"` and `"output"` columns are removed, leaving only the model-ready columns for training.
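Hugging Face's T5 computes its training loss with `CrossEntropyLoss(ignore_index=-100)`, so label positions set to `-100` are skipped, whereas padding ids left in the labels (T5's pad id is `0`) are scored like real tokens. A minimal pure-Python sketch of that masking step, using made-up token ids:

```python
def mask_pad_tokens(label_ids, pad_token_id=0, ignore_index=-100):
    """Replace padding ids in each label sequence with ignore_index.

    pad_token_id=0 matches T5's tokenizer; ignore_index=-100 is the value
    PyTorch's CrossEntropyLoss skips by default.
    """
    return [
        [ignore_index if tok == pad_token_id else tok for tok in seq]
        for seq in label_ids
    ]


# Toy label batch: arbitrary ids, then T5's end-of-sequence id (1), then padding (0)
labels = [[3, 17, 1, 0, 0], [42, 1, 0, 0, 0]]
masked = mask_pad_tokens(labels)
print(masked)  # [[3, 17, 1, -100, -100], [42, 1, -100, -100, -100]]
```

Inside `preprocess_data`, the same comprehension can be applied to `targets["input_ids"]` with `tokenizer.pad_token_id`. T5 maps `-100` back to the pad id when it shifts the labels right to build the decoder inputs, so nothing else needs to change.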

In [None]:


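def preprocess_data(examples):
    """
    Tokenizes the inputs and outputs and builds the training labels.
    """
    # Encode the "Question: ... Context: ..." strings, truncated/padded to 512 tokens
    inputs = tokenizer(
        examples["input"],
        max_length=512,
        truncation=True,
        padding="max_length"
    )

    # Encode the target answers, truncated/padded to 64 tokens
    targets = tokenizer(
        examples["output"],
        max_length=64,
        truncation=True,
        padding="max_length"
    )

    # Use the target ids as labels, replacing padding ids with -100 so the
    # loss ignores padded positions (T5 maps -100 back to the pad id when
    # it builds the decoder inputs)
    inputs["labels"] = [
        [(tok if tok != tokenizer.pad_token_id else -100) for tok in seq]
        for seq in targets["input_ids"]
    ]

    return inputs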

print("Preprocessing the training dataset...")
train_dataset = train_dataset.map(preprocess_data, batched=True, remove_columns=["input", "output"])
print("Preprocessing the validation dataset...")
valid_dataset = valid_dataset.map(preprocess_data, batched=True, remove_columns=["input", "output"])


print("\nSample preprocessed training example:")
print(train_dataset[0])

Preprocessing the training dataset...


Map:   0%|          | 0/11679 [00:00<?, ? examples/s]

Preprocessing the validation dataset...


Map:   0%|          | 0/1000 [00:00<?, ? examples/s]


Sample preprocessed training example:
{'input_ids': [11860, 10, 363, 686, 13, 9329, 19, 5871, 261, 16, 4537, 13, 4371, 224, 38, 3285, 11, 19168, 58, 1193, 6327, 10, 10162, 21144, 15, 7, 1604, 200, 16, 8107, 2912, 6, 3115, 344, 944, 1956, 254, 11, 1283, 1956, 254, 41, 4013, 1956, 371, 11, 3, 15442, 1956, 371, 137, 10162, 21144, 15, 7, 33, 557, 435, 840, 16, 42, 30, 8, 5678, 13, 6917, 42, 119, 3127, 5, 37, 6624, 1170, 2912, 13, 186, 2071, 20853, 140, 7, 21144, 15, 7, 19, 6862, 1956, 254, 41, 3916, 1956, 371, 201, 8, 1389, 936, 643, 2912, 5, 10162, 21144, 447, 9329, 7, 43, 359, 2284, 16, 542, 4537, 6, 379, 3285, 6, 19168, 6, 6061, 11, 2013, 5, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

Set up the training arguments with Hugging Face's `TrainingArguments` class. They specify the output directory, per-epoch evaluation and checkpointing, learning rate, batch sizes, number of epochs, weight decay, and logging settings. They also keep at most two checkpoints on disk and reload the checkpoint with the lowest validation loss when training finishes.
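With these settings, the number of optimizer steps follows from simple arithmetic. A sketch assuming a single GPU, no gradient accumulation, and that the last partial batch is kept (the `DataLoader` default); `training_schedule` is a hypothetical helper, not a Trainer API:

```python
import math


def training_schedule(num_examples, batch_size, num_epochs, logging_steps):
    """Estimate step counts for a single-device run without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)  # last partial batch still counts
    total_steps = steps_per_epoch * num_epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "log_events": total_steps // logging_steps,  # one training-loss log every `logging_steps`
    }


# Values from the SciQ training split and the TrainingArguments below
schedule = training_schedule(num_examples=11679, batch_size=8, num_epochs=3, logging_steps=10)
print(schedule)  # {'steps_per_epoch': 1460, 'total_steps': 4380, 'log_events': 438}
```

The default linear learning-rate schedule decays over `total_steps`. Doubling `per_device_train_batch_size` to 16 would cut this to 730 steps per epoch.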

In [None]:
# Set up training arguments

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",              # Output directory for checkpoints
    evaluation_strategy="epoch",         # Evaluate at the end of each epoch (named `eval_strategy` in newer transformers releases)
    save_strategy="epoch",               # Save a checkpoint at the end of each epoch
    learning_rate=5e-5,                  # Learning rate
    per_device_train_batch_size=8,       # Batch size for training
    per_device_eval_batch_size=8,        # Batch size for evaluation
    num_train_epochs=3,                  # Total number of training epochs
    weight_decay=0.01,                   # Weight decay
    save_total_limit=2,                  # Keep at most two checkpoints on disk
    logging_dir="./logs",                # Directory for storing logs
    logging_steps=10,                    # Log the training loss every 10 steps
    load_best_model_at_end=True,         # Reload the best checkpoint when training finishes
    metric_for_best_model="loss",        # Select the best checkpoint by validation loss ("eval_loss")
    # fp16=True,                         # fp16 mixed precision can overflow with T5
    # bf16=True,                         # On an A100, bfloat16 is usually the safer mixed-precision choice
)

print("Training arguments set up successfully.")



Training arguments set up successfully.


Initialize a `Trainer` object from Hugging Face's `transformers` library, which wraps the training loop, evaluation, and checkpoint saving. It receives the model, the training arguments, and the training and validation datasets. A print statement confirms that the trainer was created.

In [None]:
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
)

print("Trainer initialized successfully.")

Trainer initialized successfully.


Start fine-tuning by calling `train()` on the `Trainer` object. This runs the training loop with the configured arguments, datasets, and model, evaluating and checkpointing after each epoch, and prints a confirmation when it finishes.

In [None]:

print("Starting training...")
trainer.train()
print("Training completed successfully.")



Starting training...


[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ··········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 0.0319        | 0.032708        |
| 2     | 0.0199        | 0.031204        |
| 3     | 0.0362        | 0.031495        |


There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight'].


Training completed successfully.


Save the fine-tuned model and its tokenizer to a directory (`fine_tuned_t5_sciq`) with `save_pretrained()`. This lets you reload the model later with `from_pretrained()` instead of fine-tuning it again. A message confirms when saving is complete.

In [None]:
# Save the fine-tuned model and tokenizer
fine_tuned_model_dir = "fine_tuned_t5_sciq"
print(f"Saving the fine-tuned model to {fine_tuned_model_dir}...")
model.save_pretrained(fine_tuned_model_dir)
tokenizer.save_pretrained(fine_tuned_model_dir)
print("Fine-tuned model and tokenizer saved successfully.")

Saving the fine-tuned model to fine_tuned_t5_sciq...
Fine-tuned model and tokenizer saved successfully.


Set up Dense Passage Retrieval (DPR) with Hugging Face's DPR question and context encoders, plus FAISS for efficient similarity search. The code initializes both DPR models, encodes the training contexts in batches, and stores the embeddings in a FAISS index for fast retrieval by cosine similarity.


*   **DPR Models Initialization**: The DPR question and context encoders are loaded from the Hugging Face Hub and moved to the appropriate device (GPU/CPU).
*   **Context Preparation**: The training contexts (the `support` field of each example) are collected for encoding.
*   **Context Encoding**: A function `encode_contexts` processes the contexts in batches to limit GPU memory usage and encodes them into L2-normalized embeddings with the DPR context encoder.
*   **FAISS Setup**: A FAISS `IndexFlatIP` (inner-product) index is created and the context embeddings are added to it. Because the embeddings are normalized, the inner product equals cosine similarity.
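A small NumPy sketch of what this retrieval step computes, assuming vectors are L2-normalized as in `encode_contexts`: the inner product of normalized vectors equals cosine similarity, and a flat inner-product index such as `IndexFlatIP` performs exact (brute-force) search over every stored vector. The random vectors stand in for DPR embeddings, and `flat_ip_search` is a hypothetical helper, not a FAISS API.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Row-wise L2 normalization (same idea as torch.nn.functional.normalize(p=2, dim=1))."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def flat_ip_search(corpus, queries, k):
    """Exhaustive inner-product search returning (scores, indices), each shaped (num_queries, k)."""
    scores = queries @ corpus.T                   # (num_queries, num_docs)
    indices = np.argsort(-scores, axis=1)[:, :k]  # best-first document ids
    top_scores = np.take_along_axis(scores, indices, axis=1)
    return top_scores, indices


rng = np.random.default_rng(0)
corpus = l2_normalize(rng.normal(size=(5, 4))).astype(np.float32)  # 5 "contexts", dimension 4
query = corpus[2:3].copy()                                         # a query identical to context 2

D, I = flat_ip_search(corpus, query, k=3)
print(I)  # context 2 ranks first, with a score of about 1.0

# Inner product of normalized vectors == cosine similarity of the raw vectors
a, b = rng.normal(size=4), rng.normal(size=4)
cos_direct = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
cos_via_normalized = l2_normalize(a[None])[0] @ l2_normalize(b[None])[0]
```

`faiss_index.search()` returns results in the same `(D, I)` layout, which is why the pipeline reads `I[0]` to look up entries in `contexts`.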





In [None]:
from transformers import DPRQuestionEncoder, DPRContextEncoder, DPRQuestionEncoderTokenizer, DPRContextEncoderTokenizer
import faiss
import numpy as np
import torch
import gc


print("Initializing DPR models using transformers...")


q_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_model = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base").to(device)


c_tokenizer = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
c_model = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base").to(device)

print("DPR models initialized and moved to device.")

print("Preparing contexts for encoding...")
contexts = [ex.get("support", "") for ex in dataset["train"]]
print(f"Number of contexts to embed: {len(contexts)}")


def encode_contexts(encoder, tokenizer, texts, batch_size=64):
    """
    Encodes texts in batches to manage GPU memory.

    Args:
        encoder: The DPR context encoder.
        tokenizer: The corresponding tokenizer.
        texts (list): List of context strings to encode.
        batch_size (int): Number of contexts per batch.

    Returns:
        numpy.ndarray: Array of normalized context embeddings.
    """
    embeddings = []
    num_texts = len(texts)
    for start_idx in range(0, num_texts, batch_size):
        end_idx = min(start_idx + batch_size, num_texts)
        batch_texts = texts[start_idx:end_idx]
        inputs = tokenizer(
            batch_texts,
            return_tensors='pt',
            padding=True,
            truncation=True,
            max_length=512
        ).to(device)
        with torch.no_grad():
            batch_embeddings = encoder(**inputs).pooler_output  
            batch_embeddings = torch.nn.functional.normalize(batch_embeddings, p=2, dim=1)  
            embeddings.append(batch_embeddings.cpu().numpy())
        print(f"Encoded contexts {start_idx} to {end_idx} (Batch size: {len(batch_texts)})")
    return np.vstack(embeddings)


print("Clearing GPU memory...")
torch.cuda.empty_cache()
gc.collect()


print("Encoding all contexts with reduced batch size...")
context_embeddings_np = encode_contexts(c_model, c_tokenizer, contexts, batch_size=64)
print("All contexts encoded successfully.")


embedding_dim = context_embeddings_np.shape[1]
faiss_index = faiss.IndexFlatIP(embedding_dim)
print("Adding embeddings to the FAISS index...")
faiss_index.add(context_embeddings_np)
print(f"FAISS index contains {faiss_index.ntotal} vectors.")

Initializing DPR models using transformers...



Some weights of the model checkpoint at facebook/dpr-question_encoder-single-nq-base were not used when initializing DPRQuestionEncoder: ['question_encoder.bert_model.pooler.dense.bias', 'question_encoder.bert_model.pooler.dense.weight']
- This IS expected if you are initializing DPRQuestionEncoder from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DPRQuestionEncoder from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'DPRQuestionEncoderTokenizer'. 
The class this function is called from is 'DPRContextEncoderTokenizer'.



Some weights of the model checkpoint at facebook/dpr-ctx_encoder-single-nq-base were not used when initializing DPRContextEncoder: ['ctx_encoder.bert_model.pooler.dense.bias', 'ctx_encoder.bert_model.pooler.dense.weight']
- This IS expected if you are initializing DPRContextEncoder from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DPRContextEncoder from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


DPR models initialized and moved to device.
Preparing contexts for encoding...
Number of contexts to embed: 11679
Clearing GPU memory...
Encoding all contexts with reduced batch size...
Encoded contexts 0 to 64 (Batch size: 64)
Encoded contexts 64 to 128 (Batch size: 64)
Encoded contexts 128 to 192 (Batch size: 64)
Encoded contexts 192 to 256 (Batch size: 64)
Encoded contexts 256 to 320 (Batch size: 64)
Encoded contexts 320 to 384 (Batch size: 64)
Encoded contexts 384 to 448 (Batch size: 64)
Encoded contexts 448 to 512 (Batch size: 64)
Encoded contexts 512 to 576 (Batch size: 64)
Encoded contexts 576 to 640 (Batch size: 64)
Encoded contexts 640 to 704 (Batch size: 64)
Encoded contexts 704 to 768 (Batch size: 64)
Encoded contexts 768 to 832 (Batch size: 64)
Encoded contexts 832 to 896 (Batch size: 64)
Encoded contexts 896 to 960 (Batch size: 64)
Encoded contexts 960 to 1024 (Batch size: 64)
Encoded contexts 1024 to 1088 (Batch size: 64)
Encoded contexts 1088 to 1152 (Batch size: 64)
Enc

Define a RAG (Retrieval-Augmented Generation) + ReAct pipeline that combines retrieval and generation to answer questions. The pipeline first generates an initial answer from the question alone with the fine-tuned T5 model. If that answer looks insufficient (too short, or containing an uncertainty phrase), it retrieves relevant contexts with the DPR question encoder and the FAISS index, then generates a refined answer conditioned on the retrieved context.

Steps in the pipeline:


*   **Initial Answer Generation**: The question is passed through the T5 model to generate an initial answer.
*   **Answer Quality Assessment**: The answer is flagged as insufficient if it is shorter than 10 characters or contains "i don't know" or "cannot" (case-insensitive); only then are the remaining steps taken.
*   **Context Retrieval**: The question is encoded with the DPR question encoder, and the FAISS index returns the `top_k` most similar contexts.
*   **Refined Answer Generation**: The retrieved contexts are concatenated with the question, and the T5 model generates a refined answer from this combined input.
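The answer-quality check is a simple heuristic. Pulled out as a standalone function (a hypothetical refactor that mirrors the condition inside `rag_react_pipeline`), it can be tested without loading any model:

```python
UNCERTAINTY_PHRASES = ("i don't know", "cannot")


def is_insufficient(answer, min_chars=10, phrases=UNCERTAINTY_PHRASES):
    """Return True when an answer is too short or contains an uncertainty phrase.

    Mirrors: len(answer.strip()) < 10 or "i don't know" in ... or "cannot" in ...
    """
    lowered = answer.lower()
    return len(answer.strip()) < min_chars or any(p in lowered for p in phrases)


# Answers from the sample pipeline run, plus two extra cases
for ans in ["f/cm", "the sun", "different in different media", "mitosis", "I cannot tell"]:
    print(f"{ans!r}: {is_insufficient(ans)}")
```

Because SciQ answers are often short, a correct one-word answer such as "mitosis" is also flagged; this simply triggers the retrieval branch. Adjusting `min_chars` trades extra retrieval calls against letting weak answers through.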

In [None]:
def rag_react_pipeline(question, model, tokenizer, q_model, q_tokenizer, faiss_index, contexts, top_k=3):
    """
    Combines RAG with ReAct to answer questions using retrieval and generation.

    Args:
        question (str): The input question.
        model: The fine-tuned T5 model.
        tokenizer: The T5 tokenizer.
        q_model: The DPR question encoder.
        q_tokenizer: The DPR question encoder tokenizer.
        faiss_index: The FAISS index containing context embeddings.
        contexts (list): List of context strings.
        top_k (int): Number of top contexts to retrieve.

    Returns:
        str: The final answer.
    """
    print(f"\nQuestion: {question}")

   
    input_text = f"Question: {question}"
    input_ids = tokenizer.encode(
        input_text,
        return_tensors="pt",
        truncation=True,
        max_length=512
    ).to(device)

    with torch.no_grad():
        outputs = model.generate(input_ids, max_length=128)

    initial_answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Initial Answer: {initial_answer}")

    
    insufficient = False
    
    if len(initial_answer.strip()) < 10 or "i don't know" in initial_answer.lower() or "cannot" in initial_answer.lower():
        insufficient = True

    if insufficient:
        print("Initial answer is insufficient. Retrieving relevant context...")


        q_inputs = q_tokenizer(
            question,
            return_tensors='pt',
            truncation=True,
            max_length=512
        ).to(device)
        with torch.no_grad():
            question_embedding = q_model(**q_inputs).pooler_output  # Shape: [1, hidden_size]
            question_embedding = torch.nn.functional.normalize(question_embedding, p=2, dim=1)


        question_embedding_np = question_embedding.cpu().numpy()

 
        D, I = faiss_index.search(question_embedding_np, top_k)
        retrieved_contexts = [contexts[idx] for idx in I[0]]
        retrieved_context = " ".join(retrieved_contexts)
        print(f"Retrieved Context: {retrieved_context}")


        refined_input = f"Question: {question} Context: {retrieved_context}"
        refined_input_ids = tokenizer.encode(
            refined_input,
            return_tensors="pt",
            truncation=True,
            max_length=512
        ).to(device)

        with torch.no_grad():
            refined_outputs = model.generate(refined_input_ids, max_length=128)

        refined_answer = tokenizer.decode(refined_outputs[0], skip_special_tokens=True)
        print(f"Refined Answer: {refined_answer}")
        return refined_answer
    else:
        return initial_answer

Test the RAG + ReAct pipeline on a set of sample questions. The loop applies the pipeline to each question: it generates an initial answer and, if needed, retrieves relevant context with DPR and FAISS to refine it. The final answer is printed for each question.


*   Sample Questions: A list of test questions is defined.
*   Pipeline Testing: The `rag_react_pipeline` function is called for each question, passing in the model, tokenizer, DPR encoders, FAISS index, and contexts.
*   Final Answer Output: The final answer (either initial or refined) for each question is printed.






In [None]:
test_questions = [
    "What is the speed of light?",
    "Why is the sky blue?",
    "Explain Newton's first law of motion."
]


for q in test_questions:
    answer = rag_react_pipeline(
        question=q,
        model=model,
        tokenizer=tokenizer,
        q_model=q_model,
        q_tokenizer=q_tokenizer,
        faiss_index=faiss_index,
        contexts=contexts,
        top_k=3 
    )
    print(f"Final Answer: {answer}\n")


Question: What is the speed of light?
Initial Answer: f/cm
Initial answer is insufficient. Retrieving relevant context...
Retrieved Context: The speed of light is different in different media. No physical object can travel faster than the speed of light in a vacuum. (Maximum speed is finite). Speed may be constant, but often it varies from moment to moment. Speed at any given instant is called instantaneous speed. It is much more difficult to calculate than average speed.
Refined Answer: different in different media
Final Answer: different in different media


Question: Why is the sky blue?
Initial Answer: the sun
Initial answer is insufficient. Retrieving relevant context...
Retrieved Context: The human eye can distinguish only red, green, and blue light. These three colors are the primary colors of light. All other colors of light can be created by combining the primary colors. Secondary colors of light—cyan, yellow, and magenta—form when two primary colors combine equally. Clouds b

Create a ZIP archive of the fine-tuned model directory (`fine_tuned_t5_sciq`) with Python's `shutil` module, then print a confirmation once the archive has been written.


*   Directory Specification: The model directory to be zipped is defined (`model_dir = "fine_tuned_t5_sciq"`).
*   Creating the Archive: `shutil.make_archive()` creates a ZIP file of the directory.
*   Confirmation: A message is printed indicating that the model directory has been zipped successfully.
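To reuse the model in a later session, the archive can be unpacked with `shutil.unpack_archive()`. A self-contained round-trip sketch; the temporary directory and the placeholder `config.json` are assumptions standing in for the real `fine_tuned_t5_sciq` folder:

```python
import os
import shutil
import tempfile


def zip_and_restore(model_dir, work_dir):
    """Zip model_dir, then extract the archive into work_dir/restored."""
    archive_path = shutil.make_archive(os.path.join(work_dir, "model_backup"), "zip", model_dir)
    restore_dir = os.path.join(work_dir, "restored")
    shutil.unpack_archive(archive_path, restore_dir)  # format inferred from the .zip extension
    return archive_path, restore_dir


work_dir = tempfile.mkdtemp()
model_dir = os.path.join(work_dir, "fine_tuned_t5_sciq")
os.makedirs(model_dir)
with open(os.path.join(model_dir, "config.json"), "w") as f:
    f.write('{"model_type": "t5"}')  # placeholder file, not a real T5 config

archive_path, restore_dir = zip_and_restore(model_dir, work_dir)
print(archive_path)
print(sorted(os.listdir(restore_dir)))
```

In Colab, `files.download("fine_tuned_t5_sciq.zip")` (after `from google.colab import files`) can then download the archive to your machine.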





In [None]:
import shutil


model_dir = "fine_tuned_t5_sciq"


shutil.make_archive(model_dir, 'zip', model_dir)

print(f"Model directory '{model_dir}' zipped successfully.")

Model directory 'fine_tuned_t5_sciq' zipped successfully.


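This cell encodes contexts into embeddings with the DPR question encoder and prepares them for efficient similarity search with FAISS. Note that DPR is a dual encoder: passages are normally embedded with the separate context encoder (`DPRContextEncoder`, as in the earlier cell), and only queries with the question encoder. Using the question encoder for both, as done here, is a simplification that may reduce retrieval quality.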

*   Device Configuration: The code checks if a GPU is available and sets the device to GPU if possible, otherwise falls back to CPU.
*   Loading the DPR Question Encoder: The DPRQuestionEncoder and its tokenizer are loaded from the Hugging Face model hub. This model is used to encode questions into fixed-size embeddings.
*   Encoding Contexts: A function `encode_contexts` is defined to process the input texts (contexts) in batches. It tokenizes the texts and uses the DPR question encoder to generate embeddings. The embeddings are then normalized and stored in a numpy array.
*   Loading Contexts: The contexts (from a JSONL file) are loaded into a list. Each context is read from the file, and the "text" field is extracted for encoding.
*   Encoding the Loaded Contexts: The function `encode_contexts` is called to generate the embeddings for all loaded contexts in batches, which are then stored in a numpy array.
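This cell reads contexts from `/content/sciq_faiss_index/sciq_contexts.jsonl`, but the notebook does not show how that file was written. A sketch of a compatible writer and reader, assuming one JSON object per line with a `"text"` key, stored in the same order as the FAISS rows (the `"id"` field and both function names are hypothetical), demonstrated with a temporary file:

```python
import json
import os
import tempfile


def write_contexts_jsonl(contexts, path):
    """Write one {"id": ..., "text": ...} object per line; line order = FAISS row order."""
    with open(path, "w", encoding="utf-8") as f:
        for i, text in enumerate(contexts):
            f.write(json.dumps({"id": i, "text": text}) + "\n")


def read_contexts_jsonl(path):
    """Read the "text" field of each line, mirroring the notebook's loading code."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line)["text"] for line in f]


sample_contexts = [
    "Mesophiles grow best in moderate temperature, typically between 25°C and 40°C.",
    "",                      # SciQ examples can have an empty support field
    "Line one.\nLine two.",  # embedded newlines are escaped by json.dumps
]
path = os.path.join(tempfile.mkdtemp(), "sciq_contexts.jsonl")
write_contexts_jsonl(sample_contexts, path)
loaded = read_contexts_jsonl(path)
print(len(loaded))
```

Keeping the line order identical to the order in which embeddings were added is what lets `contexts[idx]` map a FAISS result index back to its text.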

In [None]:
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer
import faiss
import numpy as np
import torch


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")


q_model = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base").to(device)
q_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
print("DPR Question Encoder and tokenizer loaded successfully.")


def encode_contexts(encoder, tokenizer, texts, batch_size=64):
    """
    Encodes texts into embeddings using the question encoder.
    """
    embeddings = []
    num_texts = len(texts)
    for start_idx in range(0, num_texts, batch_size):
        end_idx = min(start_idx + batch_size, num_texts)
        batch_texts = texts[start_idx:end_idx]
        inputs = tokenizer(batch_texts, return_tensors="pt", padding=True, truncation=True, max_length=512).to(device)
        with torch.no_grad():
            batch_embeddings = encoder(**inputs).pooler_output  # Shape: [batch_size, hidden_size]
            batch_embeddings = torch.nn.functional.normalize(batch_embeddings, p=2, dim=1)  # Normalize
            embeddings.append(batch_embeddings.cpu().numpy())
        print(f"Encoded contexts {start_idx} to {end_idx}")
    return np.vstack(embeddings)


import json
context_path = "/content/sciq_faiss_index/sciq_contexts.jsonl"
with open(context_path, "r") as f:
    contexts = [json.loads(line)["text"] for line in f]
print(f"Loaded {len(contexts)} contexts for encoding.")


print("Encoding contexts with the DPR question encoder...")
context_embeddings_np = encode_contexts(q_model, q_tokenizer, contexts, batch_size=64)
print("Context embeddings generated successfully.")

Using device: cuda


Some weights of the model checkpoint at facebook/dpr-question_encoder-single-nq-base were not used when initializing DPRQuestionEncoder: ['question_encoder.bert_model.pooler.dense.bias', 'question_encoder.bert_model.pooler.dense.weight']
- This IS expected if you are initializing DPRQuestionEncoder from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DPRQuestionEncoder from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


DPR Question Encoder and tokenizer loaded successfully.
Loaded 11679 contexts for encoding.
Encoding contexts with the DPR question encoder...
Encoded contexts 0 to 64
Encoded contexts 64 to 128
Encoded contexts 128 to 192
Encoded contexts 192 to 256
Encoded contexts 256 to 320
Encoded contexts 320 to 384
Encoded contexts 384 to 448
Encoded contexts 448 to 512
Encoded contexts 512 to 576
Encoded contexts 576 to 640
Encoded contexts 640 to 704
Encoded contexts 704 to 768
Encoded contexts 768 to 832
Encoded contexts 832 to 896
Encoded contexts 896 to 960
Encoded contexts 960 to 1024
Encoded contexts 1024 to 1088
Encoded contexts 1088 to 1152
Encoded contexts 1152 to 1216
Encoded contexts 1216 to 1280
Encoded contexts 1280 to 1344
Encoded contexts 1344 to 1408
Encoded contexts 1408 to 1472
Encoded contexts 1472 to 1536
Encoded contexts 1536 to 1600
Encoded contexts 1600 to 1664
Encoded contexts 1664 to 1728
Encoded contexts 1728 to 1792
Encoded contexts 1792 to 1856
Encoded contexts 1856 

This cell reinitializes a FAISS index for efficient similarity search and saves it to disk.
Breakdown of the steps:


1.   Reinitialize the FAISS index:
*    The index is created with `IndexFlatIP`, which performs exact (brute-force) search using the inner product between query vectors and the stored vectors. Because the embeddings were L2-normalized, the inner product equals cosine similarity.
*    `embedding_dim` is taken from the shape of `context_embeddings_np`, the array holding the context embeddings.

2.   Add the context embeddings to the index:
*    `faiss_index.add()` stores the previously generated `context_embeddings_np` vectors in the index for later retrieval.
3.   Save the FAISS index:
*   `faiss.write_index()` writes the index to disk in binary format at `index_save_path`, so it can be reloaded and reused later without re-encoding the contexts.
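The exact search that `IndexFlatIP` performs can be sketched with NumPy alone. This is an illustration, not FAISS itself: `l2_normalize` and `flat_ip_search` are hypothetical helper names, and the random vectors stand in for real DPR embeddings. The last check confirms that, for unit-length vectors, the inner-product scores equal cosine similarity.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm (like torch.nn.functional.normalize(p=2, dim=1))."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def flat_ip_search(index_vectors: np.ndarray, queries: np.ndarray, k: int):
    """Brute-force inner-product top-k search over every stored vector."""
    scores = queries @ index_vectors.T                     # [num_queries, num_vectors]
    top = np.argsort(-scores, axis=1)[:, :k]               # indices of the k largest scores
    top_scores = np.take_along_axis(scores, top, axis=1)   # matching scores, descending
    return top_scores, top                                 # same (D, I) order as faiss search()


rng = np.random.default_rng(0)
contexts = l2_normalize(rng.normal(size=(100, 8)).astype("float32"))  # stand-in embeddings
query = l2_normalize(rng.normal(size=(1, 8)).astype("float32"))

scores, ids = flat_ip_search(contexts, query, k=3)

# Cosine similarity computed explicitly for the retrieved vectors.
cos = (contexts[ids[0]] @ query[0]) / (
    np.linalg.norm(contexts[ids[0]], axis=1) * np.linalg.norm(query[0])
)
print(ids, scores)
print(np.allclose(scores[0], cos, atol=1e-5))
```

A flat index scales linearly with corpus size, which is fine for the roughly 11.7k SciQ contexts; approximate FAISS indexes trade some recall for speed on much larger corpora.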

In [None]:

embedding_dim = context_embeddings_np.shape[1]
faiss_index = faiss.IndexFlatIP(embedding_dim)  # Inner product for cosine similarity
faiss_index.add(context_embeddings_np)  # Add the context embeddings
print(f"FAISS index created with {faiss_index.ntotal} vectors.")


index_save_path = "/content/sciq_faiss_index/faiss_index.bin"
faiss.write_index(faiss_index, index_save_path)
print(f"FAISS index saved to: {index_save_path}")

FAISS index created with 11679 vectors.
FAISS index saved to: /content/sciq_faiss_index/faiss_index.bin


This cell loads the fine-tuned T5 model and its tokenizer from a local directory and prepares them for inference.

**Steps:**


1.   Model path: The fine-tuned model and its tokenizer are both stored in `"/content/fine_tuned_t5_sciq"`.
2.   Load tokenizer and model:
*   `T5Tokenizer` and `T5ForConditionalGeneration` load the tokenizer and the model, respectively, from that directory.
*   The model is moved to `device` (GPU or CPU) with `.to(device)` for faster computation.

3.   Confirmation: A success message is printed once both are loaded.

In [None]:
from transformers import T5ForConditionalGeneration, T5Tokenizer


model_path = "/content/fine_tuned_t5_sciq"
print(f"Loading the fine-tuned model from: {model_path}")

tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path).to(device)

print("Fine-tuned model and tokenizer loaded successfully.")

Loading the fine-tuned model from: /content/fine_tuned_t5_sciq
Fine-tuned model and tokenizer loaded successfully.


This cell defines `retrieve_and_generate_answer()`, which retrieves relevant contexts with FAISS and generates an answer with the fine-tuned T5 model.

Steps:
*   Question encoding: The question is tokenized with the DPR question-encoder tokenizer, embedded with the DPR question encoder, and L2-normalized.
*   Context retrieval: FAISS compares the question embedding against the indexed context embeddings and returns the top-k most similar contexts; empty contexts are then filtered out.
*   Answer generation: The question and the retrieved contexts are combined into a single `Question: ... Context: ...` string, tokenized (truncated to 512 tokens), and passed to the fine-tuned T5 model, which decodes with beam search (`num_beams=5`).
*   Error handling: If no valid context is retrieved, or generation raises an exception, the function returns a fixed fallback message instead of an answer.
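The context-filtering and prompt-assembly steps can be isolated into a small pure-Python helper. `build_t5_input` is a hypothetical name, not part of the notebook. One deliberate difference from `retrieve_and_generate_answer()`: it also skips the `-1` indices that FAISS returns when fewer than `top_k` vectors are available, which plain list indexing would otherwise silently map to the last context.

```python
def build_t5_input(question, contexts, indices):
    """Assemble the 'Question: ... Context: ...' string from retrieved context indices.

    Skips FAISS padding indices (-1) and empty or whitespace-only contexts.
    Returns None when nothing usable was retrieved.
    """
    retrieved = [contexts[i] for i in indices if i != -1 and contexts[i].strip()]
    if not retrieved:
        return None
    return f"Question: {question} Context: {' '.join(retrieved)}"


corpus = ["Light is fast.", "   ", "Sound is slower than light."]
print(build_t5_input("What is fast?", corpus, [0, 1, 2, -1]))
# Question: What is fast? Context: Light is fast. Sound is slower than light.
print(build_t5_input("Q?", corpus, [1, -1]))
# None
```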

In [None]:
def retrieve_and_generate_answer(question, model, tokenizer, q_model, q_tokenizer, faiss_index, contexts, top_k=3):
    """
    Retrieves relevant contexts using FAISS and generates an answer with the T5 model.
    """
    print(f"\nProcessing question: {question}")


    q_inputs = q_tokenizer(question, return_tensors="pt", truncation=True, max_length=512).to(device)
    with torch.no_grad():
        q_embedding = q_model(**q_inputs).pooler_output  # Shape: [1, embedding_dim]
        q_embedding = torch.nn.functional.normalize(q_embedding, p=2, dim=1)


    q_embedding_np = q_embedding.cpu().numpy()
    distances, indices = faiss_index.search(q_embedding_np, top_k)
    retrieved_contexts = [contexts[idx] for idx in indices[0] if contexts[idx].strip()]  # Ensure no empty contexts


    if not retrieved_contexts:
        print("No valid contexts retrieved. Returning a generic response.")
        return "I'm sorry, I could not find relevant information to answer your question."

    retrieved_context = " ".join(retrieved_contexts)
    print(f"Retrieved Contexts: {retrieved_contexts}")


    input_text = f"Question: {question} Context: {retrieved_context}"


    tokenized_inputs = tokenizer(
        input_text,
        return_tensors="pt",
        truncation=True,
        max_length=512,
        padding="max_length"
    ).to(device)


    try:
        with torch.no_grad():
            outputs = model.generate(
                input_ids=tokenized_inputs["input_ids"],
                attention_mask=tokenized_inputs["attention_mask"],
                max_length=128,
                num_beams=5,
                early_stopping=True
            )
        answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
        print(f"Generated Answer: {answer}")
        return answer
    except Exception as e:
        print(f"Error during generation: {e}")
        return "I'm sorry, I encountered an error while generating the answer."

This cell runs `retrieve_and_generate_answer()` on a list of example questions. For each question:

*   Context retrieval: The function retrieves the 20 most similar contexts from the FAISS index (`top_k=20`).
*   Answer generation: The question and the retrieved contexts are passed to the fine-tuned T5 model.
*   Output: The generated answer is printed.

This makes it easy to check, on several questions at once, whether retrieval surfaces relevant contexts and whether the generated answers make sense. Note that a larger `top_k` does not automatically give the model more information: the combined input is truncated to 512 tokens, so with `top_k=20` only the highest-ranked contexts that fit within that budget actually reach the model.
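To see how many retrieved contexts actually fit, a greedy packing step can be sketched as follows. `pack_contexts` is a hypothetical helper, and the whitespace counter is only a rough stand-in for the T5 tokenizer (subword counts are usually higher). In practice the budget should also leave room for the `Question: ...` prefix.

```python
def pack_contexts(contexts, budget, count_tokens):
    """Keep whole contexts in rank order until the next one would exceed the budget.

    count_tokens is any callable returning a token count for a string; with a real
    Hugging Face tokenizer it could be: lambda s: len(tokenizer(s).input_ids)
    """
    kept, used = [], 0
    for ctx in contexts:
        n = count_tokens(ctx)
        if used + n > budget:
            break
        kept.append(ctx)
        used += n
    return kept, used


def approx_tokens(text):
    """Hypothetical stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


ranked = [" ".join(["word"] * 120)] * 20  # 20 retrieved contexts of ~120 tokens each
kept, used = pack_contexts(ranked, budget=512, count_tokens=approx_tokens)
print(len(kept), used)  # 4 480
```

Under these assumptions only 4 of the 20 contexts fit, which is why raising `top_k` past a handful of passages mostly adds text that truncation discards.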

In [None]:
example_questions = [
    "What is the speed of light?",
    "Why is the sky blue?",
    "Explain Newton's first law of motion."
]

for question in example_questions:
    answer = retrieve_and_generate_answer(
        question=question,
        model=model,
        tokenizer=tokenizer,
        q_model=q_model,
        q_tokenizer=q_tokenizer,
        faiss_index=faiss_index,
        contexts=contexts,
        top_k=20
    )
    print(f"Question: {question}")
    print(f"Answer: {answer}\n")


Processing question: What is the speed of light?
Retrieved Contexts: ['No physical object can travel faster than the speed of light in a vacuum. (Maximum speed is finite).', 'The variable is the speed of light. For the relationship to hold mathematically, if the speed of light is used in m/s, the wavelength must be in meters and the frequency in Hertz.', 'The speed of a wave is a product of its wavelength and frequency. Because the speed of electromagnetic waves through space is constant, the wavelength or frequency of an electromagnetic wave can be calculated if the other value is known.', 'the distance light can travel in one year, 9.5 trillion kilometers.', 'Visible light is one type of electromagnetic radiation , which is a form of energy that exhibits wavelike behavior as it moves through space. Other types of electromagnetic radiation include gamma rays, x-rays, ultraviolet light, infrared light, microwaves, and radio waves. The figure below shows the electromagnetic spectrum , 

This cell shows how to use a pre-trained extractive question answering (QA) model, `deepset/roberta-base-squad2`, to answer questions from a given context. Here's a breakdown of the key parts:

*   Model loading: The `deepset/roberta-base-squad2` model and its tokenizer are downloaded from the Hugging Face Hub. This is a RoBERTa model fine-tuned on SQuAD 2.0 for extractive question answering.

*   `extractive_qa` function: In extractive QA, the model selects a span of the context as the answer. The function:

    *   tokenizes the question and context as a pair (truncated to 512 tokens);
    *   passes the tokenized input to the QA model;
    *   takes the argmax of the start logits and of the end logits, independently, to locate the answer span (so the end can occasionally land before the start, giving an empty answer);
    *   decodes the tokens in that span back to text.
*    Device handling: The model is moved to `device` (GPU if available, otherwise CPU) for efficient computation.
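A common way to avoid reversed or empty spans, similar in spirit to the span decoding in the Transformers `question-answering` pipeline, is to score every pair as `start_logit + end_logit` and keep only valid spans. The NumPy sketch below assumes the logits are already available as 1-D arrays; `best_span` and `max_answer_len=30` are illustrative choices, and a real implementation would also mask out question and special tokens.

```python
import numpy as np


def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end_exclusive, score) maximizing start_logit + end_logit,
    restricted to spans with start <= end and length <= max_answer_len."""
    start_logits = np.asarray(start_logits, dtype=float)
    end_logits = np.asarray(end_logits, dtype=float)
    n = len(start_logits)
    scores = start_logits[:, None] + end_logits[None, :]  # scores[i, j] = start[i] + end[j]
    i, j = np.indices((n, n))
    valid = (j >= i) & (j - i < max_answer_len)
    scores = np.where(valid, scores, -np.inf)
    s, e = np.unravel_index(np.argmax(scores), scores.shape)
    return int(s), int(e) + 1, float(scores[s, e])


start = [0.1, 0.2, 5.0, 0.3]
end = [4.0, 0.1, 0.2, 3.0]
# Independent argmax would give start=2, end=0, i.e. an empty slice [2:1].
print(best_span(start, end))  # (2, 4, 8.0)
```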

In [None]:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch


qa_model_name = "deepset/roberta-base-squad2"
print(f"Loading QA model: {qa_model_name}")
qa_model = AutoModelForQuestionAnswering.from_pretrained(qa_model_name).to(device)
qa_tokenizer = AutoTokenizer.from_pretrained(qa_model_name)
print("QA model loaded successfully.")

def extractive_qa(question, context, qa_model, qa_tokenizer):
    """
    Performs extractive question answering using the QA model.

    Args:
        question (str): The input question.
        context (str): The context passage.
        qa_model: The extractive QA model.
        qa_tokenizer: The tokenizer for the QA model.

    Returns:
        str: The extracted answer.
    """
    inputs = qa_tokenizer(question, context, return_tensors="pt", truncation=True, max_length=512).to(device)
    with torch.no_grad():
        outputs = qa_model(**inputs)


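    # Start and end are chosen independently; if the end lands before the start,
    # the slice below is empty and an empty answer is returned.
    start_index = torch.argmax(outputs.start_logits)
    end_index = torch.argmax(outputs.end_logits) + 1

    # skip_special_tokens drops markers such as <s>; strip() removes RoBERTa's leading space.
    answer = qa_tokenizer.decode(inputs["input_ids"][0][start_index:end_index], skip_special_tokens=True)
    return answer.strip()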

Loading QA model: deepset/roberta-base-squad2


config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/496M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/79.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/772 [00:00<?, ?B/s]

QA model loaded successfully.


The `retrieve_and_answer` function combines dense retrieval with extractive question answering: FAISS (over DPR embeddings) finds candidate contexts, and an extractive QA model (RoBERTa) selects the answer span from them. Here's a breakdown of the function:

*   Key Steps:
  *   **Encode the question**: The question is encoded with the DPR (Dense Passage Retrieval) question encoder into a dense, L2-normalized vector.
  *   **Retrieve relevant contexts**: FAISS returns the top-k contexts whose embeddings have the highest inner product with the question embedding.
  *   **Extract the answer**: The retrieved contexts are concatenated, and the extractive QA model picks the span of the combined context that best answers the question. The question-context pair is truncated to 512 tokens, so with a large `top_k` the lower-ranked contexts may be cut off.
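Concatenation plus truncation can drop lower-ranked passages. One alternative, sketched here as an assumption rather than what `retrieve_and_answer` does, is to run the QA model on each (question, context) pair separately and keep the highest-scoring span. The per-context tokens and logits below are made-up placeholders for real model outputs, `answer_across_contexts` is a hypothetical helper, and raw logits from separate forward passes are only roughly comparable.

```python
import numpy as np


def best_span(start_logits, end_logits, max_answer_len=30):
    """Best (start, end_exclusive, score) with start <= end and bounded length."""
    scores = np.asarray(start_logits, dtype=float)[:, None] + np.asarray(end_logits, dtype=float)[None, :]
    n = scores.shape[0]
    i, j = np.indices((n, n))
    scores[(j < i) | (j - i >= max_answer_len)] = -np.inf
    a, b = np.unravel_index(np.argmax(scores), scores.shape)
    return int(a), int(b) + 1, float(scores[a, b])


def answer_across_contexts(per_context_outputs):
    """per_context_outputs: list of (tokens, start_logits, end_logits), one per context.
    Returns (answer_text, score) for the highest-scoring span over all contexts."""
    best = None
    for tokens, start_logits, end_logits in per_context_outputs:
        s, e, score = best_span(start_logits, end_logits)
        if best is None or score > best[1]:
            best = (" ".join(tokens[s:e]), score)
    return best


outputs = [
    (["light", "is", "fast"], [0.1, 0.2, 1.0], [0.0, 0.1, 1.5]),
    (["about", "300,000", "km/s"], [0.2, 4.0, 0.1], [0.1, 0.3, 4.5]),
]
print(answer_across_contexts(outputs))  # ('300,000 km/s', 8.5)
```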

In [None]:
def retrieve_and_answer(question, faiss_index, contexts, q_model, q_tokenizer, qa_model, qa_tokenizer, top_k=3):
    """
    Retrieves relevant contexts using FAISS and answers using the extractive QA model.

    Args:
        question (str): The input question.
        faiss_index: The FAISS index containing context embeddings.
        contexts (list): List of context passages.
        q_model: The DPR question encoder.
        q_tokenizer: The tokenizer for the DPR question encoder.
        qa_model: The extractive QA model.
        qa_tokenizer: The tokenizer for the extractive QA model.
        top_k (int): Number of top contexts to retrieve.

    Returns:
        str: The extracted answer.
    """
    print(f"\nProcessing question: {question}")


    q_inputs = q_tokenizer(question, return_tensors="pt", truncation=True, max_length=512).to(device)
    with torch.no_grad():
        q_embedding = q_model(**q_inputs).pooler_output
        q_embedding = torch.nn.functional.normalize(q_embedding, p=2, dim=1)


    q_embedding_np = q_embedding.cpu().numpy()
    distances, indices = faiss_index.search(q_embedding_np, top_k)
    retrieved_contexts = [contexts[idx] for idx in indices[0] if contexts[idx].strip()]  # Ensure non-empty contexts

    if not retrieved_contexts:
        print("No valid contexts retrieved.")
        return "I'm sorry, I couldn't find relevant information to answer your question."

    print(f"Retrieved Contexts: {retrieved_contexts}")


    combined_context = " ".join(retrieved_contexts)
    try:
        answer = extractive_qa(question, combined_context, qa_model, qa_tokenizer)
        print(f"Generated Answer: {answer}")
        return answer
    except Exception as e:
        print(f"Error during QA: {e}")
        return "I'm sorry, I encountered an error while answering your question."

In [None]:
example_questions = [
    "What is the speed of light?",
    "Why is the sky blue?",
    "Explain Newton's first law of motion."
]

for question in example_questions:
    answer = retrieve_and_answer(
        question=question,
        faiss_index=faiss_index,
        contexts=contexts,
        q_model=q_model,
        q_tokenizer=q_tokenizer,
        qa_model=qa_model,
        qa_tokenizer=qa_tokenizer,
        top_k=3
    )
    print(f"Question: {question}")
    print(f"Answer: {answer}\n")


Processing question: What is the speed of light?
Retrieved Contexts: ['No physical object can travel faster than the speed of light in a vacuum. (Maximum speed is finite).', 'The variable is the speed of light. For the relationship to hold mathematically, if the speed of light is used in m/s, the wavelength must be in meters and the frequency in Hertz.', 'The speed of a wave is a product of its wavelength and frequency. Because the speed of electromagnetic waves through space is constant, the wavelength or frequency of an electromagnetic wave can be calculated if the other value is known.']
Generated Answer:  The variable
Question: What is the speed of light?
Answer:  The variable


Processing question: Why is the sky blue?
Retrieved Contexts: ["The photosphere is the visible surface of the Sun ( Figure below ). It's the part that we see shining. Surprisingly, the photosphere is also one of the coolest layers of the Sun. It is only about 6000 degrees C.", "The star constellation “Orio

In [None]:
from datasets import load_dataset, Dataset


print("Loading the SciQ dataset...")
dataset = load_dataset("sciq")

def format_data(split):
    formatted = []
    for example in split:
        context = example["support"]
        question = example["question"]
        answer = example["correct_answer"]
        formatted.append({
            "input": f"Question: {question} Context: {context}",
            "output": answer
        })
    return formatted

print("Formatting the dataset...")
train_data = format_data(dataset["train"])
valid_data = format_data(dataset["validation"])


train_dataset = Dataset.from_list(train_data)
valid_dataset = Dataset.from_list(valid_data)


print("Sample formatted data:", train_dataset[0])

Loading the SciQ dataset...
Formatting the dataset...
Sample formatted data: {'input': 'Question: What type of organism is commonly used in preparation of foods such as cheese and yogurt? Context: Mesophiles grow best in moderate temperature, typically between 25°C and 40°C (77°F and 104°F). Mesophiles are often found living in or on the bodies of humans or other animals. The optimal growth temperature of many pathogenic mesophiles is 37°C (98°F), the normal human body temperature. Mesophilic organisms have important uses in food preparation, including cheese, yogurt, beer and wine.', 'output': 'mesophilic organisms'}


This cell preprocesses and tokenizes the dataset with the T5 tokenizer for a sequence-to-sequence task. Here's a breakdown of the steps:
*   **Loading the tokenizer**: `T5Tokenizer` is loaded for the pre-trained T5 checkpoint (`t5-large` in this case). You can replace `"t5-large"` with another T5 variant if needed.
*   **Preprocessing function**: `preprocess_data` tokenizes both the inputs and the outputs of the dataset:
    *   The input text (`example["input"]`) is truncated to 512 tokens, and `padding="max_length"` pads every input sequence to that length.
    *   The output text (`example["output"]`) is tokenized the same way, with a maximum length of 64 tokens.
    *   The target token IDs are stored under the `"labels"` key, which `T5ForConditionalGeneration` uses to compute the training loss.
*   **Applying preprocessing**: `map` runs `preprocess_data` over the training and validation datasets in batches (`batched=True`), and `remove_columns` drops the original `"input"` and `"output"` columns after tokenization.
*   **Displaying a sample**: A sample from the tokenized training dataset is printed to verify the result.
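With `padding="max_length"`, the padded positions in `labels` hold the pad id (0 for T5) and are counted as prediction targets unless they are replaced by `-100`, the index that PyTorch's `CrossEntropyLoss` ignores by default and that `T5ForConditionalGeneration` uses when computing its loss. A minimal pure-Python sketch of that masking on a batch of label lists (the helper name and the id sequences are illustrative):

```python
def mask_pad_labels(label_batch, pad_token_id):
    """Replace padding ids in a batch of tokenized targets with -100 so the loss ignores them."""
    return [[tok if tok != pad_token_id else -100 for tok in seq] for seq in label_batch]


# Made-up id sequences; 1 is T5's end-of-sequence id and 0 its pad id.
labels = [[8166, 1, 0, 0], [3, 9, 1, 0]]
print(mask_pad_labels(labels, pad_token_id=0))
# [[8166, 1, -100, -100], [3, 9, 1, -100]]
```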

In [None]:
from transformers import T5Tokenizer


model_name = "t5-large"  # Replace with the chosen model
tokenizer = T5Tokenizer.from_pretrained(model_name)


def preprocess_data(example):
    inputs = tokenizer(
        example["input"],
        max_length=512,
        truncation=True,
        padding="max_length"
    )
    targets = tokenizer(
        example["output"],
        max_length=64,
        truncation=True,
        padding="max_length"
    )
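    # Replace padding ids in the targets with -100 so the loss ignores them
    # (T5ForConditionalGeneration computes cross-entropy with ignore_index=-100).
    inputs["labels"] = [
        [tok if tok != tokenizer.pad_token_id else -100 for tok in seq]
        for seq in targets["input_ids"]
    ]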
    return inputs


print("Tokenizing the datasets...")
train_dataset = train_dataset.map(preprocess_data, batched=True, remove_columns=["input", "output"])
valid_dataset = valid_dataset.map(preprocess_data, batched=True, remove_columns=["input", "output"])

print("Tokenized sample:", train_dataset[0])

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.39M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


Tokenizing the datasets...


Map:   0%|          | 0/11679 [00:00<?, ? examples/s]

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

Tokenized sample: {'input_ids': [11860, 10, 363, 686, 13, 9329, 19, 5871, 261, 16, 4537, 13, 4371, 224, 38, 3285, 11, 19168, 58, 1193, 6327, 10, 10162, 21144, 15, 7, 1604, 200, 16, 8107, 2912, 6, 3115, 344, 944, 1956, 254, 11, 1283, 1956, 254, 41, 4013, 1956, 371, 11, 3, 15442, 1956, 371, 137, 10162, 21144, 15, 7, 33, 557, 435, 840, 16, 42, 30, 8, 5678, 13, 6917, 42, 119, 3127, 5, 37, 6624, 1170, 2912, 13, 186, 2071, 20853, 140, 7, 21144, 15, 7, 19, 6862, 1956, 254, 41, 3916, 1956, 371, 201, 8, 1389, 936, 643, 2912, 5, 10162, 21144, 447, 9329, 7, 43, 359, 2284, 16, 542, 4537, 6, 379, 3285, 6, 19168, 6, 6061, 11, 2013, 5, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

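This cell defines the training arguments for fine-tuning T5 with the `TrainingArguments` class from Hugging Face's `transformers` library. Here's a breakdown of the arguments used:

*   `output_dir`: directory where checkpoints are written (`./t5_large_finetuned_sciq`).
*   `evaluation_strategy="epoch"` and `save_strategy="epoch"`: evaluate on the validation set and save a checkpoint at the end of every epoch.
*   `learning_rate=3e-5` and `weight_decay=0.01`: optimizer learning rate and weight-decay coefficient.
*   `per_device_train_batch_size=8` and `per_device_eval_batch_size=8`: batch size per GPU for training and evaluation.
*   `num_train_epochs=3`: number of passes over the training set.
*   `save_total_limit=2`: keep at most two checkpoints on disk.
*   `logging_dir` and `logging_steps=10`: where logs are written and how often (in optimizer steps) the training loss is logged.
*   `load_best_model_at_end=True`: after training, reload the checkpoint with the best validation metric (validation loss by default).
*   `fp16` / `bf16`: mixed-precision training for speed on GPUs. T5 checkpoints are prone to overflowing to NaN in float16, so bfloat16, which the A100 supports, is the safer choice.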
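These settings determine how many optimizer steps the run takes. The arithmetic below assumes a single GPU and no gradient accumulation (neither is configured above). The 125 evaluation steps are consistent with the evaluation output further down (about 10.37 steps/s over about 12.05 s).

```python
import math

train_examples = 11679   # size of the SciQ train split (from the dataset output)
eval_examples = 1000     # size of the SciQ validation split
batch_size = 8           # per_device_train_batch_size / per_device_eval_batch_size
num_epochs = 3
num_devices = 1          # assumption: one A100, no gradient accumulation

# The last partial batch is kept by default, hence the ceiling.
steps_per_epoch = math.ceil(train_examples / (batch_size * num_devices))
total_steps = steps_per_epoch * num_epochs
eval_steps = math.ceil(eval_examples / (batch_size * num_devices))
print(steps_per_epoch, total_steps, eval_steps)  # 1460 4380 125
```

With `logging_steps=10`, that is roughly 438 training-loss log entries over the whole run.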

In [None]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./t5_large_finetuned_sciq",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    save_total_limit=2,
    logging_dir="./logs",
    logging_steps=10,
    load_best_model_at_end=True,
    bf16=True,  # bfloat16 mixed precision (supported on A100); T5 tends to overflow to NaN under fp16
)



This cell fine-tunes the T5 model on the tokenized SciQ data with Hugging Face's `Trainer` class, then saves the fine-tuned model and tokenizer.

In [None]:
from transformers import T5ForConditionalGeneration, Trainer


model = T5ForConditionalGeneration.from_pretrained(model_name).to("cuda")


trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
)

print("Starting fine-tuning...")
trainer.train()


print("Saving the fine-tuned model...")
model.save_pretrained("./t5_large_finetuned_sciq")
tokenizer.save_pretrained("./t5_large_finetuned_sciq")
print("Model saved successfully!")

model.safetensors:   0%|          | 0.00/2.95G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]



Starting fine-tuning...


[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33mlearningpuria[0m ([33mlearningpuria-politecnico-di-torino[0m). Use [1m`wandb login --relogin`[0m to force relogin


Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 0.0           |                 |
| 2     | 0.0           |                 |
| 3     | 0.0           |                 |


There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight'].


Saving the fine-tuned model...
Model saved successfully!


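This cell evaluates the fine-tuned model with the Hugging Face `Trainer`. `trainer.evaluate()` runs the model on the validation dataset and returns the evaluation loss, runtime statistics, and any metrics computed by a `compute_metrics` function passed to the `Trainer` (none is passed here, so only loss and runtime are reported). A `nan` evaluation loss, as in the output below, together with the logged training loss of 0.0, indicates that the loss computation broke down during this run; with T5, float16 overflow is a common cause, so these numbers should not be read as a measure of model quality.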
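Since a `nan` loss does not raise an error by itself, a small guard can stop a notebook from silently continuing after a broken run. `check_eval_results` is a hypothetical helper that only assumes the metrics come back as a plain dict, as `trainer.evaluate()` returns them.

```python
import math


def check_eval_results(results, key="eval_loss"):
    """Return the loss, or raise if it is missing, NaN, or infinite."""
    loss = results.get(key)
    if loss is None or math.isnan(loss) or math.isinf(loss):
        raise ValueError(f"{key} is {loss!r}; training likely diverged (e.g. fp16 overflow)")
    return loss


print(check_eval_results({"eval_loss": 0.42}))  # 0.42
try:
    check_eval_results({"eval_loss": float("nan")})
except ValueError as err:
    print(err)  # eval_loss is nan; training likely diverged (e.g. fp16 overflow)
```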

In [None]:

results = trainer.evaluate()
print("Evaluation Results:", results)

Evaluation Results: {'eval_loss': nan, 'eval_runtime': 12.0489, 'eval_samples_per_second': 82.995, 'eval_steps_per_second': 10.374, 'epoch': 3.0}


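This cell loads the fine-tuned T5 model with Hugging Face's `pipeline` API (task `text2text-generation`) and uses it to answer questions: each question is combined with a short hand-written context in the same `Question: ... Context: ...` format used during fine-tuning, and the model generates the answer. Generation length is capped by default (the third answer below is cut off mid-sentence, likely for this reason); passing `max_new_tokens` in the pipeline call raises the cap.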

In [None]:

from transformers import pipeline

qa_pipeline = pipeline(
    "text2text-generation",
    model="./t5_large_finetuned_sciq",
    tokenizer="./t5_large_finetuned_sciq",
    device=0  # First CUDA GPU; use device=-1 to run on CPU
)

questions = [
    "What is the speed of light?",
    "Why is the sky blue?",
    "Explain Newton's first law of motion."
]
# Named sample_contexts so the retrieval corpus `contexts` used earlier is not overwritten.
sample_contexts = [
    "Light travels at a speed of 299,792,458 meters per second in a vacuum.",
    "The sky appears blue due to Rayleigh scattering of sunlight in the atmosphere.",
    "Newton's first law states that an object will remain at rest or in uniform motion unless acted upon by an external force."
]

for question, context in zip(questions, sample_contexts):
    input_text = f"Question: {question} Context: {context}"
    print(f"Question: {question}")
    print(f"Answer: {qa_pipeline(input_text)[0]['generated_text']}\n")

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers


Question: What is the speed of light?
Answer: 299,792,458 meters per second

Question: Why is the sky blue?
Answer: Rayleigh scattering of sunlight

Question: Explain Newton's first law of motion.
Answer: Newton's first law states that an object will remain at rest or in uniform motion unless



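This cell creates a zip archive of the `t5_large_finetuned_sciq` model directory. `shutil.make_archive` appends `.zip` to the base name, so the archive is written to `/content/t5_large_finetuned_sciq.zip`.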

In [None]:
import shutil

model_dir = "/content/t5_large_finetuned_sciq"


shutil.make_archive(model_dir, 'zip', model_dir)

print(f"Model directory '{model_dir}' zipped successfully.")
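`shutil.make_archive` appends the format's extension to the base name and returns the archive path. The round trip below demonstrates this in a temporary directory; the `config.json` file is a placeholder, not the real model files.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder model directory with a single dummy file.
    model_dir = os.path.join(tmp, "t5_large_finetuned_sciq")
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, "config.json"), "w") as f:
        f.write("{}")

    # make_archive(base_name, format, root_dir) appends ".zip" and returns the archive path.
    archive_path = shutil.make_archive(model_dir, "zip", model_dir)
    archive_name = os.path.basename(archive_path)
    print(archive_name)  # t5_large_finetuned_sciq.zip

    # Unpack into a fresh directory to confirm the archive holds the directory's contents.
    restored = os.path.join(tmp, "restored")
    shutil.unpack_archive(archive_path, restored)
    restored_files = sorted(os.listdir(restored))
    print(restored_files)  # ['config.json']
```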