# Question Answering Model Comparison

This notebook compares three encoder-decoder architectures for question answering on the SQuAD dataset.

Three models are implemented:
1. Encoder-Decoder without Attention
2. Encoder-Decoder with Bahdanau Attention
3. Transformer-based Encoder-Decoder

## Setup

First, let's clone the repository and install the required dependencies:

In [None]:
!git clone https://github.com/vedant7001/DL_Project.git
%cd DL_Project
!pip install -r requirements.txt

## Train Models

To keep the demonstration fast, we train a small version of each model for 5 epochs on a 200-sample subset (`--max_samples 200`).

In [None]:
# Train a small base model (5-10 minutes)
!python train.py --model_type base --embedding_dim 128 --hidden_dim 64 --num_epochs 5 --batch_size 16 --max_samples 200

In [None]:
# Train a small attention model (5-10 minutes)
!python train.py --model_type attention --embedding_dim 128 --hidden_dim 64 --num_epochs 5 --batch_size 16 --max_samples 200

In [None]:
# Train a small transformer model (5-10 minutes)
!python train.py --model_type transformer --embedding_dim 128 --num_heads 4 --num_layers 2 --num_epochs 5 --batch_size 16 --max_samples 200
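
The three training commands above share most flags and differ only in the model-specific ones. As a convenience, a sketch like the following builds them from one table so the shared settings stay in sync. The flag names are copied from the commands above; the helper `build_train_command` and the `SHARED` and `MODEL_FLAGS` tables are hypothetical and not part of the repository.

```python
# Hypothetical helper: assembles the train.py invocations shown above.
SHARED = {"embedding_dim": 128, "num_epochs": 5, "batch_size": 16, "max_samples": 200}
MODEL_FLAGS = {
    "base": {"hidden_dim": 64},
    "attention": {"hidden_dim": 64},
    "transformer": {"num_heads": 4, "num_layers": 2},
}

def build_train_command(model_type):
    """Return the argument list for one train.py run."""
    flags = {"model_type": model_type, **SHARED, **MODEL_FLAGS[model_type]}
    cmd = ["python", "train.py"]
    for name, value in flags.items():
        cmd += [f"--{name}", str(value)]
    return cmd

for model_type in MODEL_FLAGS:
    print(" ".join(build_train_command(model_type)))
```

Each list could be passed to `subprocess.run(cmd, check=True)` to launch the run from Python instead of a shell cell.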

## Evaluate and Compare Models

Let's compare the three trained models using the lightweight evaluation script.

In [None]:
# Run simple evaluation that doesn't require full validation data
!python evaluate_simple.py
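
What `evaluate_simple.py` computes is not shown in this notebook. For reference, SQuAD-style evaluation conventionally reports exact match (EM) and token-level F1; the sketch below shows those two metrics on plain strings. The function names and the simplified normalization (lowercasing, stripping punctuation and English articles) are assumptions for illustration, not the repository's code.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Linux Foundation", "linux foundation"))
print(f1_score("Meta AI and the Linux Foundation", "Meta AI"))
```

EM rewards only a perfect span, while F1 gives partial credit when the predicted span overlaps the reference, which is why the two are usually reported together.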

## Visualize Attention

Let's visualize the attention weights in the attention-based models.

In [None]:
# Run example usage script to visualize attention on sample questions
!python example_usage.py
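
For intuition about what is being plotted: in Bahdanau (additive) attention, each encoder state `h_i` receives a score `v · tanh(W_h h_i + W_s s)` against the current decoder state `s`, and a softmax turns the scores into weights that sum to 1. The sketch below computes this with plain Python lists. The tiny dimensions, the parameter values, and all names are made up for illustration and do not come from the repository's model.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def additive_attention(encoder_states, decoder_state, W_h, W_s, v):
    """Return one attention weight per encoder state."""
    projected_query = matvec(W_s, decoder_state)
    scores = []
    for h in encoder_states:
        projected_key = matvec(W_h, h)
        hidden = [math.tanh(k + q) for k, q in zip(projected_key, projected_query)]
        scores.append(sum(v_j * a for v_j, a in zip(v, hidden)))
    return softmax(scores)

# Toy example: three encoder states of size 2, attention size 2
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.5, -0.5]
W_h = [[1.0, 0.0], [0.0, 1.0]]
W_s = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]

weights = additive_attention(encoder_states, decoder_state, W_h, W_s, v)
print([round(w, 3) for w in weights])
```

Each row of an attention heat map is one such weight vector, so bright cells mark the context positions the model weighted most heavily.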

## Try Your Own Questions

Now let's try our own questions with one of the trained models.

In [None]:
import torch
import os
import matplotlib.pyplot as plt
from evaluate import load_model
from data_utils import simple_tokenize

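def find_latest_model(model_type):
    """Return the path to the newest best_model.pt for model_type, or None."""
    runs_dir = 'runs'
    # No runs directory yet means nothing has been trained
    if not os.path.isdir(runs_dir):
        return None
    matching_dirs = [d for d in os.listdir(runs_dir) if d.startswith(model_type)]
    if not matching_dirs:
        return None
    
    # Take the last name in sorted order (the newest run, assuming names end in a sortable timestamp)
    latest_dir = sorted(matching_dirs)[-1]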
    model_path = os.path.join(runs_dir, latest_dir, 'best_model.pt')
    
    if os.path.exists(model_path):
        return model_path
    return None

def get_answer(model, model_type, context, question, device):
    # Tokenize the inputs
    context_tokens = simple_tokenize(context)
    question_tokens = simple_tokenize(question)
    
    # Truncate if needed
    max_context_len = 400
    max_question_len = 50
    
    if len(context_tokens) > max_context_len:
        context_tokens = context_tokens[:max_context_len]
    if len(question_tokens) > max_question_len:
        question_tokens = question_tokens[:max_question_len]
    
    # For demonstration, every token is replaced by index 1 (UNK), so the model
    # sees only the sequence lengths, not the actual words
    context_tensor = torch.ones(1, len(context_tokens), dtype=torch.long).to(device)  # All UNK tokens
    question_tensor = torch.ones(1, len(question_tokens), dtype=torch.long).to(device)  # All UNK tokens
    
    context_len = torch.tensor([len(context_tokens)]).to(device)
    question_len = torch.tensor([len(question_tokens)]).to(device)
    
    # Get predictions
    with torch.no_grad():
        if model_type == 'base':
            start_idx, end_idx = model.predict(context_tensor, context_len, question_tensor, question_len)
            attention = None
        else:
            start_idx, end_idx, attention = model.predict(context_tensor, context_len, question_tensor, question_len)
    
    # Get the predicted span
    start = start_idx.item()
    end = end_idx.item()
    
    # Ensure start <= end
    if start > end:
        start, end = end, start
    
    # Extract the answer
    answer_tokens = context_tokens[start:end+1]
    answer = ' '.join(answer_tokens)
    
    result = {
        'question': question,
        'context': context,
        'answer': answer,
        'start': start,
        'end': end,
        'context_tokens': context_tokens
    }
    
    if attention is not None:
        result['attention'] = attention[0].cpu().numpy()
    
    return result

def highlight_answer(context, start, end):
    tokens = simple_tokenize(context)
    highlighted = []
    
    for i, token in enumerate(tokens):
        if start <= i <= end:
            highlighted.append(f"[{token}]")
        else:
            highlighted.append(token)
    
    return ' '.join(highlighted)

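def visualize_attention(context_tokens, attention_weights, title="Attention Weights"):
    """Plot attention weights over context positions as a heat map."""
    # imshow needs 2-D data; promote a 1-D weight vector to a single row
    if attention_weights.ndim == 1:
        attention_weights = attention_weights[None, :]
    plt.figure(figsize=(12, 4))
    plt.imshow(attention_weights, cmap='viridis', aspect='auto')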
    plt.title(title)
    plt.xlabel('Context Position')
    
    # Show tokens on x-axis
    if len(context_tokens) <= 50:  # Only show tokens if not too many
        plt.xticks(range(len(context_tokens)), context_tokens, rotation=90)
    
    plt.colorbar()
    plt.tight_layout()
    plt.show()

# Load the attention model (its predict() also returns attention weights to plot)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_path = find_latest_model('attention')
if model_path:
    print(f"Using model: {model_path}")
    model, _ = load_model(model_path)
    model = model.to(device)
    model.eval()
else:
    print("No attention model found. Please run the training cell first.")
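
`get_answer` above feeds index 1 for every token, so the model never sees the actual words. Real inference needs the token-to-index mapping used during training. How this repository stores its vocabulary is not shown here, so the sketch below assumes a plain `dict` named `word2idx` with `<pad>` at 0 and `<unk>` at 1; those names and indices are hypothetical.

```python
# Hypothetical vocabulary; a real one would be loaded from the training run.
word2idx = {"<pad>": 0, "<unk>": 1, "deep": 2, "learning": 3, "is": 4, "what": 5}
UNK_IDX = word2idx["<unk>"]

def tokens_to_indices(tokens, vocab, unk_idx=UNK_IDX):
    """Map tokens to vocabulary indices, falling back to the UNK index."""
    return [vocab.get(token.lower(), unk_idx) for token in tokens]

indices = tokens_to_indices(["What", "is", "deep", "learning", "?"], word2idx)
print(indices)  # [5, 4, 2, 3, 1]
```

The resulting list could replace the `torch.ones(...)` placeholders via `torch.tensor([indices], dtype=torch.long)`, provided the mapping matches the one the model was trained with.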

In [None]:
# Define context and question
context = """
Deep learning is a subset of machine learning that uses artificial neural networks with multiple layers. 
These deep neural networks can learn representations of data with multiple levels of abstraction. 
The architecture of deep neural networks consists of an input layer, multiple hidden layers, and an output layer. 
Each layer contains nodes or neurons that perform computations on the input data.
Deep learning has achieved remarkable results in various fields, including computer vision, 
natural language processing, speech recognition, and game playing.
"""

question = "What is deep learning?"

# Get and display answer
if model_path:
    result = get_answer(model, 'attention', context, question, device)
    print(f"Question: {result['question']}")
    print(f"\nAnswer: {result['answer']}")
    print("\nContext with highlighted answer:")
    print(highlight_answer(context, result['start'], result['end']))
    
    # Visualize attention if available
    if 'attention' in result:
        visualize_attention(result['context_tokens'], result['attention'], "Attention Weights")
else:
    print("Model not loaded, cannot answer question.")
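
A heat map can be hard to read for long contexts. As a complement, the sketch below lists the context tokens with the largest attention weights. It assumes the weights form a single flat sequence with one value per context token, which may not match the shape returned by this repository's `predict`; `top_attended_tokens` is a hypothetical helper, and the example weights are invented.

```python
def top_attended_tokens(tokens, weights, k=3):
    """Return the k (token, weight) pairs with the highest weights."""
    if len(tokens) != len(weights):
        raise ValueError("expected one weight per token")
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Invented weights for illustration only
tokens = ["deep", "learning", "is", "a", "subset", "of", "machine", "learning"]
weights = [0.05, 0.10, 0.02, 0.03, 0.40, 0.05, 0.25, 0.10]

for token, weight in top_attended_tokens(tokens, weights):
    print(f"{token:>10s}  {weight:.2f}")
```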

## Additional Example: Deep Learning and Transformers

In [None]:
# Enhanced context and questions about deep learning
context1 = """
Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. 
Learning can be supervised, semi-supervised or unsupervised. Deep learning architectures such as deep neural networks, 
deep belief networks, recurrent neural networks and convolutional neural networks have been applied to fields including 
computer vision, speech recognition, natural language processing, audio recognition, social network filtering, 
machine translation, bioinformatics, drug design, medical image analysis, material inspection and board game programs, 
where they have produced results comparable to and in some cases surpassing human expert performance.
Transformer architectures, a type of deep learning model, have become particularly important for NLP tasks 
since their introduction in 2017. Transformers use self-attention mechanisms to process input sequences in parallel, 
which has proven especially effective for tasks like machine translation, text summarization, and question answering. 
Models like BERT, GPT, and T5 are all based on the transformer architecture and have set new performance benchmarks 
across numerous language understanding tasks.
"""

questions1 = [
    "What is deep learning part of?",
    "What types of learning can be used in deep learning?",
    "What are transformer architectures used for?"
]

# Only run if we have a model loaded
if model_path and 'model' in locals():
    for question in questions1:
        print(f"\n{'='*80}\nQuestion: {question}\n{'='*80}\n")
        
        result = get_answer(model, 'attention', context1, question, device)
        print(f"Answer: {result['answer']}\n")
        print("Context with highlighted answer:")
        print(highlight_answer(context1, result['start'], result['end']))
        
        # Visualize attention if available
        if 'attention' in result:
            visualize_attention(result['context_tokens'], result['attention'], "Attention Weights")
        
        print('\n' + '-'*80)
else:
    print("No model loaded, cannot run deep learning examples.")

## Additional Example: Natural Language Processing and SQuAD

In [None]:
# Example context about NLP and Question Answering
context2 = """
Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence 
concerned with the interactions between computers and human language, in particular how to program computers 
to process and analyze large amounts of natural language data. The goal is a computer capable of "understanding" 
the contents of documents, including the contextual nuances of the language within them. The technology can then 
accurately extract information and insights contained in the documents as well as categorize and organize the 
documents themselves. Challenges in natural language processing frequently involve speech recognition, natural 
language understanding, and natural language generation. Modern NLP approaches are based on machine learning, 
especially statistical methods and neural networks. As of 2020, deep learning approaches such as transformers 
have achieved state-of-the-art results on many NLP tasks.
Question answering (QA) is an important NLP task that involves automatically answering questions posed in natural language. 
Machine reading comprehension, a subset of QA, focuses on answering questions based on a given context passage. 
The Stanford Question Answering Dataset (SQuAD) has become a benchmark dataset for this task, consisting of questions 
posed by crowdworkers on a set of Wikipedia articles. In SQuAD, the answer to every question is a segment of text 
from the corresponding reading passage. Models are evaluated based on exact match and F1 scores, comparing their 
predicted answers against human-provided reference answers.
"""

questions2 = [
    "What is NLP?",
    "What is SQuAD used for?",
    "How are QA models evaluated?"
]

# Only run if we have a model loaded
if model_path and 'model' in locals():
    for question in questions2:
        print(f"\n{'='*80}\nQuestion: {question}\n{'='*80}\n")
        
        result = get_answer(model, 'attention', context2, question, device)
        print(f"Answer: {result['answer']}\n")
        print("Context with highlighted answer:")
        print(highlight_answer(context2, result['start'], result['end']))
        
        # Visualize attention if available
        if 'attention' in result:
            visualize_attention(result['context_tokens'], result['attention'], "Attention Weights")
        
        print('\n' + '-'*80)
else:
    print("No model loaded, cannot run NLP examples.")

## PyTorch Example: Try It Yourself

In [None]:
# Define your own context and question about PyTorch
my_context = """
PyTorch is an open source machine learning framework based on the Torch library, 
used for applications such as computer vision and natural language processing, 
originally developed by Meta AI and now part of the Linux Foundation umbrella. 
It is free and open-source software released under the Modified BSD license. 
Although the Python interface is more polished and the primary focus of development, 
PyTorch also has a C++ interface. PyTorch provides two high-level features: 
Tensor computing (like NumPy) with strong acceleration via graphics processing units (GPU) 
and Deep neural networks built on a tape-based automatic differentiation system.
PyTorch is distinctive in its implementation of dynamic computational graphs, which allow for 
more flexible model building compared to static graph frameworks. This 'define-by-run' approach 
enables developers to modify neural networks on the fly, making debugging and experimentation easier. 
The framework includes modules for building complex neural network architectures, optimizers for 
training, data loading utilities, and seamless GPU integration. Its ecosystem has expanded 
to include libraries like torchvision for computer vision, torchaudio for audio processing, 
torchtext for NLP, and PyTorch Lightning for organizing research code. With its intuitive design 
and Python-native flow, PyTorch has become especially popular in research communities.
"""

my_question = "Who developed PyTorch?"

# Only run if we have a model loaded
if model_path and 'model' in locals():
    result = get_answer(model, 'attention', my_context, my_question, device)
    
    print(f"Question: {result['question']}\n")
    print(f"Answer: {result['answer']}\n")
    print("Context with highlighted answer:")
    print(highlight_answer(my_context, result['start'], result['end']))
    
    # Visualize attention if available
    if 'attention' in result:
        visualize_attention(result['context_tokens'], result['attention'], "Attention Weights")
else:
    print("No model loaded, cannot run custom example.")

## Your Turn: Try Your Own Context and Question

Modify the cells below to try your own context and question.

In [None]:
# Define your own context and question
custom_context = """
Replace this text with your own context paragraph. It should be several sentences long.
Make sure to include factual information that can be used to answer questions.
Contexts are truncated to 400 tokens, so put the facts you want to ask about early.
"""

custom_question = "Write your question here?"

In [None]:
# Run your custom example
if model_path and 'model' in locals():
    result = get_answer(model, 'attention', custom_context, custom_question, device)
    
    print(f"Question: {result['question']}\n")
    print(f"Answer: {result['answer']}\n")
    print("Context with highlighted answer:")
    print(highlight_answer(custom_context, result['start'], result['end']))
    
    # Visualize attention if available
    if 'attention' in result:
        visualize_attention(result['context_tokens'], result['attention'], "Attention Weights")
else:
    print("No model loaded, cannot run custom example.")

## Conclusion

In this notebook, we've demonstrated:
1. Training three different encoder-decoder models for question answering
2. Evaluating and comparing their performance
3. Visualizing attention weights
4. Using a trained model for inference on custom questions

The full code is available on GitHub at: https://github.com/vedant7001/DL_Project