# 12. Question Answering Systems
Question Answering (QA) systems are Natural Language Processing (NLP) applications that automatically answer questions posed in natural language. They combine information retrieval with machine learning and deep learning models to interpret the question and return an accurate, relevant answer from a given document collection or knowledge base.

### What You'll Learn:
- How QA systems work
- BERT for extractive QA
- Building a QA pipeline
- Evaluation metrics
- Real applications

## QA System Architecture

```
Question Input
    ↓
Context Retrieval (Find relevant documents)
    ↓
Passage Ranking (Find best passages)
    ↓
Answer Extraction (Find answer span)
    ↓
Answer Output
```

## BERT-based QA

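**Task**: Given a context passage and a question, find the span of the context that answers the question

**Process**:
1. Tokenize the question and context together as one input sequence
2. Encode the sequence with BERT
3. Predict the start and end token positions of the answer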
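
Step 3 can be made concrete with a small sketch of how a span is chosen once the model has produced one start score and one end score per token. The function name `best_span`, the `max_answer_len` cap, and the toy logits are assumptions for illustration, not output from a real model.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) maximizing start_logits[s] + end_logits[e] with s <= e."""
    best = (0, 0, float('-inf'))
    for s in range(len(start_logits)):
        # Only consider end positions at or after the start, within the length cap
        last = min(s + max_answer_len, len(end_logits))
        for e in range(s, last):
            score = start_logits[s] + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best

# Toy input: made-up tokens and logits, not produced by a model
tokens = ['[CLS]', 'where', '?', '[SEP]', 'in', 'paris', ',', 'france', '[SEP]']
start_logits = [0.1, 0.0, 0.0, 0.0, 1.0, 4.0, 0.2, 1.2, 0.0]
end_logits = [0.1, 0.0, 0.0, 0.0, 0.2, 2.5, 0.3, 3.0, 0.0]

start, end, score = best_span(start_logits, end_logits)
answer_tokens = tokens[start:end + 1]
print(answer_tokens, score)
```

A real implementation would typically also restrict candidate spans to context tokens (excluding the question and special tokens) and map token positions back to character offsets.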

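**Advantages**:
- Pre-trained on a large text corpus, so little task-specific data is needed for fine-tuning
- Contextual understanding of each token based on the surrounding words
- Strong accuracy on extractive QA benchmarks such as SQuAD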

```python
print('='*60)
print('QUESTION ANSWERING WITH BERT')
print('='*60)

try:
    # Imported inside the try block so a missing library triggers the fallback below
    from transformers import pipeline

    # DistilBERT (a distilled, smaller BERT) fine-tuned on SQuAD for extractive QA
    qa_pipeline = pipeline('question-answering', model='distilbert-base-cased-distilled-squad')

    context = (
        'Natural Language Processing (NLP) is a subfield of linguistics, computer science, '
        'and artificial intelligence concerned with the interactions between computers and human language. '
        'NLP is used to apply machine learning algorithms to text and speech. '
        'Some examples include sentiment analysis, topic modeling, and machine translation.'
    )

    questions = [
        'What is NLP?',
        'What are applications of NLP?',
        'Which field does NLP belong to?'
    ]

    print(f'\nContext:\n{context}\n')
    print('='*60)

    for question in questions:
        # The pipeline returns the answer text, its character offsets, and a score
        result = qa_pipeline(question=question, context=context)
        print(f'\nQ: {question}')
        print(f'A: {result["answer"]}')
        print(f'   Confidence: {result["score"]:.2%}')

except ImportError:
    # Fallback when transformers is not installed: describe the process instead
    print('(Example requires transformers library)')
    print('\nBERT QA Process:')
    print('1. Load pre-trained model on SQuAD dataset')
    print('2. Tokenize question and context together')
    print('3. Get logits for token positions')
    print('4. Extract answer span with highest score')
```

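```
============================================================
QUESTION ANSWERING WITH BERT
============================================================
(Example requires transformers library)

BERT QA Process:
1. Load pre-trained model on SQuAD dataset
2. Tokenize question and context together
3. Get logits for token positions
4. Extract answer span with highest score
```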




## Building QA System Components

### 1. Question Understanding
- Identify question type
- Extract keywords
- Determine answer type (person, place, date, etc.)
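
As a rough illustration of these three steps, the sketch below uses the leading question word to guess the answer type and a stopword filter to pick keywords. The `ANSWER_TYPES` mapping, the `STOPWORDS` set, and the function name `analyze_question` are hypothetical; production systems usually rely on a trained classifier instead of prefix rules.

```python
import re

# Hypothetical mapping from the leading question word(s) to an expected answer type
ANSWER_TYPES = {
    'who': 'PERSON',
    'where': 'PLACE',
    'when': 'DATE',
    'how many': 'NUMBER',
    'how much': 'NUMBER',
}

# Small illustrative stopword list; a real system would use a fuller one
STOPWORDS = {'a', 'an', 'the', 'is', 'was', 'were', 'are', 'did', 'do', 'does',
             'of', 'in', 'to', 'what', 'who', 'when', 'where', 'which', 'how',
             'many', 'much'}

def analyze_question(question):
    text = question.lower().strip()
    answer_type = 'OTHER'
    # Check longer prefixes first so "how many" is tried before shorter ones
    for prefix in sorted(ANSWER_TYPES, key=len, reverse=True):
        if text.startswith(prefix + ' '):
            answer_type = ANSWER_TYPES[prefix]
            break
    words = re.findall(r'[a-z0-9]+', text)
    keywords = [w for w in words if w not in STOPWORDS]
    return {'answer_type': answer_type, 'keywords': keywords}

info = analyze_question('When was BERT released?')
print(info)
```

The keywords feed the retrieval stage, and the answer type narrows what the extraction stage should look for.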

### 2. Document Retrieval
- Find relevant documents
- Score by relevance
- Return top candidates
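
One common way to score relevance is TF-IDF. The following is a minimal pure-Python sketch, assuming a tiny in-memory corpus; the document names, texts, and the `rank_documents` helper are made up for illustration, and larger systems would normally use an inverted index or a dense retriever.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r'[a-z0-9]+', text.lower())

def rank_documents(query, docs, top_k=2):
    """Score each document by the summed TF-IDF weight of the query terms."""
    doc_tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for toks in doc_tokens.values() for term in set(toks))
    scores = {}
    for name, toks in doc_tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if tf[term]:
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
                score += (tf[term] / len(toks)) * idf
        scores[name] = score
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]

# Made-up mini corpus for illustration
docs = {
    'history_of_ml': 'Machine learning history: early machine learning programs learned to play checkers.',
    'ai_overview': 'Artificial intelligence covers search, planning, and machine learning.',
    'cooking': 'Boil the pasta and add salt to the water.',
}
top = rank_documents('history of machine learning', docs)
print(top)
```

Here `history_of_ml` ranks first because it contains the rarer term "history" and repeats "machine learning", while `cooking` shares no query terms and scores zero.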

### 3. Passage Selection
- Identify relevant passages
- Rank passages
- Select best passage
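
A simple way to sketch passage selection is to treat each sentence as a passage and rank by how many distinct question keywords it contains. The helper names and the sample text are assumptions for illustration; real systems often use overlapping token windows and a learned ranker.

```python
import re

def split_passages(text):
    """Split text into sentence-level passages at ., ! or ? followed by whitespace."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def select_passage(keywords, passages):
    """Return the passage containing the most distinct keywords, with its score."""
    def score(passage):
        tokens = set(re.findall(r'[a-z0-9]+', passage.lower()))
        return sum(1 for k in keywords if k in tokens)
    best = max(passages, key=score)
    return best, score(best)

document = ('BERT is a Transformer encoder. '
            'It is pre-trained with masked language modeling. '
            'Fine-tuning on SQuAD teaches BERT to predict answer spans.')
passages = split_passages(document)
best, best_score = select_passage(['bert', 'answer', 'spans'], passages)
print(best_score, best)
```

Only the selected passage is then passed to the answer extraction stage, which keeps the reader model's input short.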

### 4. Answer Extraction
- Use NER or span extraction
- Generate answer
- Score confidence
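
The sketch below shows type-driven extraction in its simplest form, with regular expressions standing in for an NER model. The `PATTERNS` table, the `extract_answer` function, and the confidence heuristic are all hypothetical choices made for this example.

```python
import re

# Hypothetical patterns standing in for an NER model
PATTERNS = {
    'DATE': re.compile(r'\b(?:1[0-9]{3}|20[0-9]{2})\b'),  # four-digit years 1000-2099
    'NUMBER': re.compile(r'\b\d+(?:\.\d+)?\b'),
}

def extract_answer(passage, answer_type):
    """Return the first candidate of the expected type and a naive confidence."""
    pattern = PATTERNS.get(answer_type)
    if pattern is None:
        return None, 0.0
    candidates = pattern.findall(passage)
    if not candidates:
        return None, 0.0
    # Naive confidence: one candidate is unambiguous, several candidates are not
    return candidates[0], 1.0 / len(candidates)

passage = 'SQuAD 1.1 was released in 2016 and contains over 100000 questions.'
answer, confidence = extract_answer(passage, 'DATE')
print(answer, confidence)
```

With a span-extraction model, the confidence would instead come from the model's start and end scores.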

```python
# Walk-through of the four stages using hard-coded, illustrative values
# (no retrieval or model inference is performed here)
print('\nFULL QA SYSTEM EXAMPLE')
print('='*60)

print('\nStep 1: Question Understanding')
question = 'When was the term machine learning coined?'
print(f'  Question: {question}')
print('  Question Type: When (temporal)')
print('  Expected Answer: Date')

print('\nStep 2: Document Retrieval')
documents = [
    ('history_of_ml.pdf', 0.95),
    ('ai_overview.pdf', 0.87),
    ('neural_networks.pdf', 0.62)
]
top_k = 2
print(f'  Retrieved documents (top {top_k}):')
for doc, score in documents[:top_k]:
    print(f'    - {doc}: {score:.0%}')

print('\nStep 3: Passage Selection')
print('  Best passage: "Arthur Samuel coined the term machine learning in 1959"')

print('\nStep 4: Answer Extraction')
print('  Answer: 1959')
print('  Confidence: 92%')

print('\nFinal Answer: 1959')
```

```
FULL QA SYSTEM EXAMPLE
============================================================

Step 1: Question Understanding
  Question: When was the term machine learning coined?
  Question Type: When (temporal)
  Expected Answer: Date

Step 2: Document Retrieval
  Retrieved documents (top 2):
    - history_of_ml.pdf: 95%
    - ai_overview.pdf: 87%

Step 3: Passage Selection
  Best passage: "Arthur Samuel coined the term machine learning in 1959"

Step 4: Answer Extraction
  Answer: 1959
  Confidence: 92%

Final Answer: 1959
```


## Evaluation Metrics

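### EM (Exact Match)
- 1 if the predicted answer matches the reference exactly (typically after normalizing case and punctuation), otherwise 0
- Averaged over all questions to give an overall score

### F1 Score
- Token-level overlap between the predicted and reference answers
- Harmonic mean of precision and recall, so partial matches earn partial credit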

### Example:
- Reference: 'Paris, France'
- Predicted: 'Paris'
- EM: 0 (doesn't match exactly)
- F1: 0.67 (precision 1/1, recall 1/2, so F1 = 2 × 1.0 × 0.5 / 1.5 ≈ 0.67)
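
Both metrics can be reproduced with a few lines of Python. This is a minimal sketch modeled on the normalization commonly used for SQuAD-style evaluation (lowercasing, stripping punctuation and English articles); the function names are chosen here for illustration and are not a library API.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = ''.join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r'\b(a|an|the)\b', ' ', text)
    return ' '.join(text.split())

def exact_match(prediction, reference):
    return int(normalize(prediction) == normalize(reference))

def f1_score(prediction, reference):
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # Multiset intersection: a shared token counts as often as it appears in both
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

em = exact_match('Paris', 'Paris, France')
f1 = f1_score('Paris', 'Paris, France')
print(em, round(f1, 2))  # prints: 0 0.67
```

When a question has several reference answers, the usual convention is to take the maximum EM and F1 over the references before averaging across questions.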