# 10. Transformers & BERT
Transformers and BERT (Bidirectional Encoder Representations from Transformers) changed Natural Language Processing (NLP) by letting models capture context and relationships in text more effectively than earlier recurrent architectures. Transformers use self-attention to process all tokens of a sequence in parallel, which helps with long-range dependencies. BERT is built on the Transformer encoder, is pre-trained on large text corpora, and can be fine-tuned for many NLP tasks, where it achieved state-of-the-art results at the time of its release.

### What You'll Learn:
- Transformer architecture
- Attention mechanism
- BERT explained
- Fine-tuning BERT
- Applications

## Problem with RNN/LSTM

- Process tokens one at a time (slow)
- Hard to parallelize across a sequence
- Still suffer from vanishing or exploding gradients on long sequences

**Solution: Transformers**
- Process entire sequence at once (parallel)
- Attention mechanism for relationships
- State-of-the-art performance

## Attention Mechanism

**Core Idea**: Focus on relevant parts of input

**Query, Key, Value**:
- Query: What am I looking for?
- Key: What does this contain?
- Value: What information is here?

**Result**: Context-aware representations
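The Query/Key/Value idea above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the scaled dot-product form of attention used in the Transformer; the random embeddings `X` and the projection matrices `W_q`, `W_k`, `W_v` are hypothetical stand-ins for learned values.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return (context, weights) for single-head attention without masking."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled to keep softmax stable
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax (subtract the row max for numerical stability)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8

# Hypothetical token embeddings and projection matrices (random, not learned)
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

context, weights = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(weights.shape)  # (4, 4): one row of attention weights per token
print(context.shape)  # (4, 8): one context-aware vector per token
```

Each row of `weights` sums to 1, so every output vector is a mixture of all value vectors. All rows are computed in one matrix product, which is where the parallelism comes from.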

## BERT (Bidirectional Encoder Representations from Transformers)

**Why BERT is powerful**:
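- **Bidirectional**: Each token attends to context on both its left and its right at the same time
- **Pre-trained**: Already trained on large amounts of unlabeled text (English, for the original models)
- **Transfer Learning**: Fine-tune for your task
- **Contextual**: The same word gets a different representation depending on its context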
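One way to picture the "bidirectional" property is through the attention mask. The sketch below is a toy illustration, not BERT's actual code: it compares a causal (left-to-right) mask with the full mask an encoder like BERT uses. The helper `attention_weights` and the all-zero scores are assumptions chosen so that only the mask affects the result.

```python
import numpy as np

def attention_weights(scores, mask):
    """Row-wise softmax over scores, ignoring positions where mask is False."""
    masked = np.where(mask, scores, -np.inf)
    masked = masked - masked.max(axis=-1, keepdims=True)
    w = np.exp(masked)
    return w / w.sum(axis=-1, keepdims=True)

seq_len = 4
# Identical scores everywhere, so differences come from the mask alone
scores = np.zeros((seq_len, seq_len))

# Causal mask: token i may only look at tokens 0..i (left-to-right model)
causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
# Full mask: every token may look at every other token (BERT-style encoder)
full = np.ones((seq_len, seq_len), dtype=bool)

left_to_right = attention_weights(scores, causal)
bidirectional = attention_weights(scores, full)

print(left_to_right[0])  # [1. 0. 0. 0.]
print(bidirectional[0])  # [0.25 0.25 0.25 0.25]
```

With the causal mask the first token sees only itself. With the full mask it also draws on the tokens to its right, which is what lets BERT use context from both sides.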

In [1]:
from transformers import pipeline

print('='*60)
print('BERT FOR SENTIMENT CLASSIFICATION')
print('='*60)

try:
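    # Use a pre-trained sentiment classifier: DistilBERT (a smaller, distilled
    # BERT variant) already fine-tuned on the SST-2 sentiment dataset
    classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')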
    
    texts = [
        'I absolutely love this product!',
        'This is terrible',
        'It is ok'
    ]
    
    print('\nBERT Predictions:')
    for text in texts:
        result = classifier(text)[0]
        print(f'  "{text}"')
        print(f'    -> {result["label"]}: {result["score"]:.2%}\n')
        
except Exception as e:
    print('(Transformers example - requires internet for first download)')
    print(f'Error: {e}')

BERT FOR SENTIMENT CLASSIFICATION
(Transformers example - requires internet for first download)
Error: Your currently installed version of Keras is Keras 3, but this is not yet supported in Transformers. Please install the backwards-compatible tf-keras package with `pip install tf-keras`.



## Fine-tuning BERT

Adapt pre-trained BERT for your specific task.

**Fine-tuning process**:
1. Load the pre-trained BERT model
2. Add a task-specific head (for example, a classification layer on top of the encoder output)
3. Train on your labeled data
4. Evaluate on a held-out test set

**Advantages**:
- Faster training, because the model starts from pre-trained weights (transfer learning)
- Needs less labeled data
- Usually better results than training from scratch
- Can be adapted to specific domains
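The four steps above can be sketched with PyTorch and the `transformers` library. This is a minimal sketch under stated assumptions, not a full recipe. To stay offline and fast it builds a tiny, randomly initialised BERT from a `BertConfig` instead of loading pre-trained weights, and it uses random token ids as hypothetical stand-ins for tokenized text. For real fine-tuning, step 1 would load a checkpoint such as `bert-base-uncased` with `from_pretrained`.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

torch.manual_seed(0)

# Steps 1 + 2: a BERT encoder with a classification head on top.
# ASSUMPTION: tiny random config so the sketch needs no download. In practice:
#   BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
config = BertConfig(
    vocab_size=100, hidden_size=32, num_hidden_layers=2,
    num_attention_heads=2, intermediate_size=64, num_labels=2,
)
model = BertForSequenceClassification(config)

# Hypothetical toy data: random token ids in place of real tokenized sentences
train_ids = torch.randint(1, 100, (8, 12))
train_labels = torch.tensor([0, 1, 0, 1, 0, 1, 0, 1])
test_ids = torch.randint(1, 100, (4, 12))
test_labels = torch.tensor([0, 1, 0, 1])

# Step 3: train on the labeled data (all weights are updated)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model.train()
for epoch in range(3):
    optimizer.zero_grad()
    out = model(
        input_ids=train_ids,
        attention_mask=torch.ones_like(train_ids),
        labels=train_labels,  # passing labels makes the model return a loss
    )
    out.loss.backward()
    optimizer.step()
    print(f'epoch {epoch}: loss = {out.loss.item():.4f}')

# Step 4: evaluate on the held-out test set
model.eval()
with torch.no_grad():
    logits = model(
        input_ids=test_ids,
        attention_mask=torch.ones_like(test_ids),
    ).logits
preds = logits.argmax(dim=-1)
accuracy = (preds == test_labels).float().mean().item()
print(f'test accuracy: {accuracy:.2f}')
```

Because the data and weights here are random, the accuracy is meaningless. The point is the shape of the loop: forward pass with labels, backward pass, optimizer step, then evaluation in `eval()` mode without gradients.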
