[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vuhung16au/hf-transformer-trove/blob/main/examples/08_question_answering.ipynb)
[![View on GitHub](https://img.shields.io/badge/View_on-GitHub-blue?logo=github)](https://github.com/vuhung16au/hf-transformer-trove/blob/main/examples/08_question_answering.ipynb)

# 08 - Question Answering Model: From Context to Answers

## 🎯 Learning Objectives
By the end of this notebook, you will understand:
- Question answering concepts and architectures
- Extractive vs generative QA approaches
- Using pre-trained QA models with HuggingFace
- Fine-tuning QA models on custom datasets
- Evaluation metrics for QA systems
- Building production-ready QA applications

## 📋 Prerequisites
- Basic understanding of machine learning concepts
- Familiarity with Python and PyTorch
- Knowledge of transformers (refer to [Notebook 01](01_intro_hf_transformers.ipynb))
- Understanding of tokenization (refer to [Notebook 02](02_tokenizers.ipynb))

## 📚 What We'll Cover
1. **Introduction**: QA concepts and model architectures
2. **Pipeline Usage**: High-level QA with pipelines
3. **Manual Implementation**: Low-level QA model usage
4. **Dataset Processing**: Working with SQuAD and custom QA data
5. **Model Comparison**: Different QA architectures
6. **Fine-tuning**: Custom QA model training
7. **Evaluation**: QA metrics and benchmarking
8. **Production System**: Building robust QA applications

## Introduction to Question Answering

Question Answering (QA) is the task of automatically answering questions posed in natural language. Given a context (passage of text) and a question, the model extracts or generates an appropriate answer.

### Types of Question Answering:
- **Extractive QA**: Extracts answer spans directly from the given context
- **Generative QA**: Generates answers that may not appear verbatim in the context
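- **Open-domain QA**: Answers questions without a supplied context, typically by first retrieving relevant passages from a large corpus
- **Closed-domain QA**: Answers questions restricted to a specific domain (for example, medicine or law) or a fixed set of documents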
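
To make the extractive case concrete, here is a minimal sketch of how an answer span can be chosen once a model has produced one start score and one end score per token. The tokens and logits below are made-up toy values, and `best_span` is a hypothetical helper written for illustration, not a function from the `transformers` library.

```python
import math
from typing import List, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 15,
) -> Tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring valid span.

    A span is valid when end >= start and it covers at most
    max_answer_len tokens. The end index is inclusive.
    """
    best = (0, 0, -math.inf)
    for s, s_logit in enumerate(start_logits):
        last = min(s + max_answer_len, len(end_logits))
        for e in range(s, last):
            score = s_logit + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Toy example: hand-written logits, not real model output
tokens = ["Paris", "is", "the", "capital", "of", "France"]
start_logits = [0.1, 0.0, 0.2, 2.5, 0.1, 1.0]
end_logits = [0.3, 0.0, 0.1, 0.4, 0.2, 3.0]

start, end, score = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(f"Span: tokens {start}-{end}, score {score:.1f}, answer: {answer}")
```

Real implementations usually also mask out question tokens and special tokens before searching, which this sketch omits for brevity.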

### Popular QA Datasets:
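- **SQuAD (Stanford Question Answering Dataset)**: Reading comprehension dataset of crowdsourced questions on Wikipedia articles, where each answer is a span of the passage
- **Natural Questions**: Real questions issued to Google Search, paired with Wikipedia pages
- **MS MARCO**: Large-scale reading comprehension dataset built from real Bing search queries
- **QuAC (Question Answering in Context)**: Multi-turn, dialogue-style QA where each question depends on the conversation so far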

In [None]:
# Import necessary libraries
from transformers import (
    AutoTokenizer, AutoModelForQuestionAnswering,
    pipeline, Trainer, TrainingArguments,
    DefaultDataCollator
)
from datasets import load_dataset, Dataset, DatasetDict
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.auto import tqdm
import time
import json
import re
from typing import Dict, List, Tuple, Optional
import warnings
warnings.filterwarnings('ignore')

# Device detection with educational output
def get_device() -> torch.device:
    """
    Automatically detect and return the best available device.
    
    Priority: CUDA > MPS (Apple Silicon) > CPU
    
    Returns:
        torch.device: The optimal device for current hardware
    """
    if torch.cuda.is_available():
        device = torch.device("cuda")
        print(f"🚀 Using CUDA GPU: {torch.cuda.get_device_name()}")
    elif torch.backends.mps.is_available():
        device = torch.device("mps") 
        print("🍎 Using Apple MPS (Apple Silicon)")
    else:
        device = torch.device("cpu")
        print("💻 Using CPU (consider GPU for better performance)")
    
    return device

# Set up device
device = get_device()

print("\n📚 Libraries loaded successfully!")
print(f"PyTorch version: {torch.__version__}")

## Part 1: Question Answering with Pipelines

Let's start with the simplest approach: using HuggingFace pipelines for question answering.

In [None]:
# Create a question-answering pipeline
print("🔧 Loading question-answering pipeline...")
qa_pipeline = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",  # Efficient model fine-tuned on SQuAD
    device=device  # Reuse the detected device (CUDA, MPS, or CPU)
)

print("✅ Pipeline loaded successfully!")

# Example context about machine learning
context = """
Machine learning is a method of data analysis that automates analytical model building. 
It is a branch of artificial intelligence (AI) based on the idea that systems can learn 
from data, identify patterns and make decisions with minimal human intervention. 
The process typically involves training algorithms on large datasets to create models 
that can make predictions or decisions on new, unseen data. Popular machine learning 
techniques include supervised learning, unsupervised learning, and reinforcement learning.
"""

# Test questions
questions = [
    "What is machine learning?",
    "What does ML automate?",
    "What are three types of machine learning?",
    "How do systems learn in machine learning?",
    "What is machine learning based on?"
]

print("\n🎯 Testing Question Answering:")
print("=" * 50)

for i, question in enumerate(questions, 1):
    print(f"\n❓ Question {i}: {question}")
    
    # Get answer from pipeline
    result = qa_pipeline(question=question, context=context)
    
    print(f"💡 Answer: {result['answer']}")
    print(f"🎯 Confidence: {result['score']:.4f}")
    print(f"📍 Position: {result['start']}-{result['end']}")
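
The `start` and `end` values printed above are character offsets into `context`, so slicing `context[start:end]` should reproduce the answer text. The sketch below shows one way to display an answer in its surrounding text. `highlight_answer` is a hypothetical helper, and `sample_result` is a hand-built dictionary that only mimics the shape of a pipeline result, so the example runs without downloading a model.

```python
def highlight_answer(context: str, result: dict, window: int = 30) -> str:
    """Show the answer in brackets with up to `window` characters on each side."""
    start, end = result["start"], result["end"]
    left = context[max(0, start - window):start]
    right = context[end:end + window]
    return f"...{left}[{context[start:end]}]{right}..."


# Hand-built stand-in for a pipeline result (assumed shape, made-up score)
sample_context = "Machine learning is a branch of artificial intelligence."
sample_answer = "artificial intelligence"
sample_start = sample_context.index(sample_answer)
sample_result = {
    "answer": sample_answer,
    "start": sample_start,
    "end": sample_start + len(sample_answer),
    "score": 0.9,
}

print(highlight_answer(sample_context, sample_result))
```

With a real pipeline, the same call would be `highlight_answer(context, result)` inside the loop above.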

## Summary

In this comprehensive notebook, we explored question answering from multiple perspectives:

### 🎯 **What We Accomplished**
1. **QA Fundamentals**: Understanding extractive vs generative approaches
2. **Pipeline Usage**: High-level QA with HuggingFace pipelines
3. **Manual Implementation**: Low-level model usage and tokenization
4. **Dataset Integration**: Working with SQuAD and custom QA data
5. **Advanced Techniques**: Confidence thresholding and multi-context QA
6. **Evaluation Metrics**: Implementing EM and F1 scores
7. **Production System**: Building robust, scalable QA applications

### 🔑 **Key Concepts Mastered**
- **Extractive QA**: Models find answer spans within given context
- **Confidence Scoring**: Using start/end logits to assess answer quality
- **Context Processing**: Preprocessing techniques for better performance
- **Evaluation**: Standard metrics (Exact Match, F1) for QA assessment
- **Production Considerations**: Error handling, statistics, batch processing
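
As a reminder of how the two metrics work, here is a minimal sketch of Exact Match and token-level F1 in the style of the SQuAD evaluation: both strings are lowercased and stripped of punctuation and English articles before comparison. The function names are chosen for this example and are not taken from a specific library.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, remove punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(round(f1_score("supervised learning",
                     "supervised learning and reinforcement learning"), 3))
```

When a question has several reference answers, the usual convention is to take the maximum score over the references.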

### 📈 **Best Practices Learned**
- **Model Selection**: Choose appropriate models for your use case (speed vs accuracy)
- **Confidence Thresholding**: Reject low-confidence answers to maintain quality
- **Input Preprocessing**: Clean and normalize text for consistent results
- **Comprehensive Evaluation**: Use multiple metrics to assess performance
- **Error Handling**: Graceful failure handling in production systems
- **Monitoring**: Track system performance and usage statistics
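
Confidence thresholding can be sketched as a thin wrapper around any QA callable that returns a dictionary with `answer` and `score` keys, as the pipeline does. The `min_score` value of 0.3 is an arbitrary placeholder that would need tuning on validation data, and `fake_qa` is a stand-in used only so the example runs without a model.

```python
from typing import Callable, Dict, Optional


def answer_with_threshold(
    qa_fn: Callable[..., Dict],
    question: str,
    context: str,
    min_score: float = 0.3,  # Placeholder threshold, tune on held-out data
) -> Optional[str]:
    """Return the answer, or None when the model's score is below min_score."""
    result = qa_fn(question=question, context=context)
    if result["score"] < min_score:
        return None
    return result["answer"]


def fake_qa(question: str, context: str) -> Dict:
    """Stand-in for a QA pipeline: confident only about 'learning' questions."""
    score = 0.9 if "learning" in question else 0.05
    return {"answer": "data analysis", "score": score, "start": 0, "end": 13}


print(answer_with_threshold(fake_qa, "What is machine learning?", "some context"))
print(answer_with_threshold(fake_qa, "Who won the match?", "some context"))
```

Passing the real `qa_pipeline` as `qa_fn` should work the same way, since it accepts `question` and `context` keyword arguments.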

### 🚀 **Next Steps**
- **Notebook 09**: Advanced fine-tuning with LoRA and QLoRA
- **Notebook 10**: LLMs and Reinforcement Learning from Human Feedback
- **Documentation**: [Question Answering Best Practices](../docs/qa-best-practices.md)
- **External Resources**: [HuggingFace QA Guide](https://huggingface.co/transformers/task_summary.html#question-answering)

Question answering is a fundamental NLP task that demonstrates the power of transformer models for understanding and extracting information from text. The techniques learned here form the foundation for many real-world applications!

---

*Ready to continue? Head to **Notebook 09: PEFT LoRA QLoRA** to learn advanced fine-tuning techniques!*

---

## About the Author

**Vu Hung Nguyen** - AI Engineer & Researcher

Connect with me:
- 🌐 **Website**: [vuhung16au.github.io](https://vuhung16au.github.io/)
- 💼 **LinkedIn**: [linkedin.com/in/nguyenvuhung](https://www.linkedin.com/in/nguyenvuhung/)
- 💻 **GitHub**: [github.com/vuhung16au](https://github.com/vuhung16au/)

*This notebook is part of the [HF Transformer Trove](https://github.com/vuhung16au/hf-transformer-trove) educational series.*