[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vuhung16au/hf-transformer-trove/blob/main/examples/basic1.4/text-classification.ipynb)
[![View on GitHub](https://img.shields.io/badge/View_on-GitHub-blue?logo=github)](https://github.com/vuhung16au/hf-transformer-trove/blob/main/examples/basic1.4/text-classification.ipynb)

# Text Classification with BERT

## 🎯 Learning Objectives
By the end of this notebook, you will understand:
- What text classification is and its applications
- How BERT works for text classification tasks
- How to use pre-trained BERT models for text classification
- The structure of BERT's classification outputs
- How to select the best available device (CUDA, MPS, or CPU) for inference

## 📋 Prerequisites
- Basic understanding of machine learning concepts
- Familiarity with Python and PyTorch
- Knowledge of NLP fundamentals (refer to [NLP Learning Journey](https://github.com/vuhung16au/nlp-learning-journey))

## 📚 What We'll Cover
1. **Introduction**: Understanding text classification and BERT
2. **Setup**: Installing and importing required libraries
3. **Device Detection**: Selecting CUDA, MPS, or CPU
4. **Loading the Model**: Loading a pre-trained classifier and its tokenizer
5. **Simple Classification**: Basic text classification examples
6. **Classification Process**: Examining tokenization and special tokens
7. **Visualization**: Plotting confidence scores and inference times
8. **Summary**: Key takeaways and next steps

## 1. Introduction to Text Classification with BERT

**Text Classification** assigns predefined categories to text documents. Common applications include sentiment analysis, topic classification, and spam detection.

### What is BERT?

[**BERT (Bidirectional Encoder Representations from Transformers)**](https://huggingface.co/docs/transformers/model_doc/bert) is an encoder-only Transformer model. It was the first model to implement deep bidirectionality effectively: each token attends to the words on both its left and its right, which yields richer representations of the text.

### Key BERT Features for Classification:
- **Bidirectional Context**: Unlike left-to-right language models, BERT conditions each token on both its left and right context
- **WordPiece Tokenization**: Handles out-of-vocabulary words by breaking them into subword tokens
- **Special Tokens**: Uses the [CLS] token for classification tasks
- **Pre-trained**: Already trained on massive text corpora
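
To make the WordPiece idea concrete, here is a minimal pure-Python sketch of greedy longest-match-first subword splitting. The function name `wordpiece_tokenize` and the tiny `toy_vocab` are hypothetical and exist only for illustration; the real tokenizer uses a vocabulary of roughly 30,000 entries and applies extra normalization steps that are left out here.

```python
from typing import List, Set


def wordpiece_tokenize(word: str, vocab: Set[str], unk_token: str = "[UNK]") -> List[str]:
    """Split one word into subword pieces using greedy longest-match-first."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only
toy_vocab = {"token", "##ization", "class", "##ification"}

print(wordpiece_tokenize("tokenization", toy_vocab))
print(wordpiece_tokenize("classification", toy_vocab))
print(wordpiece_tokenize("xyz", toy_vocab))
```

With this toy vocabulary, a word that is not stored whole is still represented by known pieces, and only a word with no matching pieces falls back to `[UNK]`.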

### How BERT Performs Classification:
1. **[CLS] Token**: Added at the beginning of every sequence
2. **[SEP] Token**: Separates sentences when needed
3. **Classification Head**: Linear layer that takes [CLS] output for classification
4. **Softmax**: Converts logits to probability scores
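
The last two steps above can be sketched in a few lines of pure Python. Everything numeric here is made up: `cls_vector` stands in for the [CLS] output (768 dimensions in BERT-base, 4 here), and `weights` and `bias` stand in for a trained classification head. Real implementations typically add a pooling layer and dropout before the final linear layer, so treat this as a simplified illustration rather than the library's actual code.

```python
import math
from typing import Dict, List


def linear(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Apply a linear layer: one output (logit) per row of weights."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities; subtracting the max keeps exp() stable."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values, for illustration only
cls_vector = [0.5, -1.0, 2.0, 0.0]
weights = [[0.1, 0.4, -0.3, 0.2],    # row for class 0
           [-0.2, -0.5, 0.6, 0.1]]   # row for class 1
bias = [0.0, 0.1]
id2label: Dict[int, str] = {0: "NEGATIVE", 1: "POSITIVE"}

logits = linear(cls_vector, weights, bias)
probs = softmax(logits)
best = max(range(len(probs)), key=lambda i: probs[i])

# Same shape as one entry of a sentiment pipeline's output
prediction = {"label": id2label[best], "score": probs[best]}
print(f"Logits: {logits}")
print(f"Prediction: {prediction}")
```

The probabilities always sum to 1, and the reported score is simply the probability of the highest-scoring class.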

## 2. Setup and Imports

In [None]:
# Install required packages (uncomment and run if needed)
# !pip install transformers torch datasets tokenizers matplotlib seaborn

# Import essential libraries
import torch
import numpy as np
import matplotlib.pyplot as plt
import time
import warnings
from typing import List, Dict, Optional

# Hugging Face imports
import transformers
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    pipeline
)

warnings.filterwarnings('ignore')

print("✅ Libraries imported successfully!")
print(f"🔥 PyTorch version: {torch.__version__}")
print(f"🤗 Transformers version: {transformers.__version__}")

## 3. Device Detection and Setup

Let's detect the best available device for optimal performance:

In [None]:
def get_device() -> torch.device:
    """
    Get the best available device for PyTorch operations.
    
    Priority order: CUDA > MPS (Apple Silicon) > CPU
    
    Returns:
        torch.device: The optimal device for current hardware
    """
    if torch.cuda.is_available():
        device = torch.device("cuda")
        print(f"🚀 Using CUDA GPU: {torch.cuda.get_device_name()}")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
        print("🍎 Using Apple MPS for Apple Silicon optimization")
    else:
        device = torch.device("cpu")
        print("💻 Using CPU - consider GPU for better performance")
    
    return device

# Get optimal device
device = get_device()
print(f"📱 Selected device: {device}")

## 4. Loading BERT for Text Classification

We'll use DistilBERT, a smaller distilled version of BERT, that has been fine-tuned on SST-2 for sentiment analysis. The tokenizer and the pipeline load the same checkpoint, so the tokens we inspect later match what the classifier sees:

In [None]:
# Choose a checkpoint that is already fine-tuned for sentiment analysis.
# Note: "bert-base-uncased" has no trained classification head, so it cannot
# classify sentiment without being fine-tuned first.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"

print(f"📥 Loading model: {model_name}")
print("⏱️ This may take a moment for first-time download...")

# Load tokenizer and pipeline with error handling
try:
    # Load the tokenizer that belongs to the checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    print("✅ Tokenizer loaded successfully")
    
    # The pipeline loads the matching model architecture automatically.
    # Passing the torch.device lets it run on CUDA, MPS, or CPU as detected above.
    classifier = pipeline(
        "sentiment-analysis",
        model=model_name,
        tokenizer=tokenizer,
        device=device
    )
    
    print("✅ Classification pipeline loaded successfully")
    print(f"📊 Model ready for inference on {device}")
    
except Exception as e:
    print(f"❌ Error loading model: {e}")
    print("💡 Try checking your internet connection or model name")
    raise  # Stop here: later cells need `tokenizer` and `classifier`

## 5. Simple Text Classification Examples

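Let's test the classifier on a few examples. This checkpoint predicts only POSITIVE or NEGATIVE, so a neutral sentence is still forced into one of the two labels: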

In [None]:
# Test texts for classification
test_texts = [
    "I love using BERT for text classification!",
    "This movie was terrible and boring.",
    "The weather today is okay, nothing special.",
    "BERT is a powerful transformer model for NLP.",
    "I hate waiting in long queues."
]

print("🔍 Performing Text Classification with BERT")
print("=" * 60)

results = []

# Process each text with timing
for i, text in enumerate(test_texts, 1):
    print(f"\n📝 Example {i}:")
    print(f"Text: '{text}'")
    
    # Perform classification with timing (perf_counter is a monotonic, high-resolution clock)
    start_time = time.perf_counter()
    result = classifier(text)
    inference_time = time.perf_counter() - start_time
    
    # Extract results
    label = result[0]['label']
    confidence = result[0]['score']
    
    # Display results
    print(f"🎯 Prediction: {label}")
    print(f"📊 Confidence: {confidence:.4f} ({confidence*100:.2f}%)")
    print(f"⏱️ Inference time: {inference_time:.3f} seconds")
    
    # Store for analysis
    results.append({
        'text': text,
        'label': label,
        'confidence': confidence,
        'inference_time': inference_time
    })

print(f"\n✅ Classified {len(test_texts)} texts successfully!")
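
Because a binary sentiment model must pick a label even for neutral text, it can help to flag low-confidence predictions for review. The sketch below assumes dictionaries shaped like the entries of `results` above. The helper name `split_by_confidence`, the 0.9 threshold, and the sample scores are all hypothetical and are not real model output.

```python
from typing import Dict, List, Tuple, Union

Prediction = Dict[str, Union[str, float]]


def split_by_confidence(
    predictions: List[Prediction], threshold: float = 0.9
) -> Tuple[List[Prediction], List[Prediction]]:
    """Separate predictions at or above the threshold from those below it."""
    confident = [p for p in predictions if p["confidence"] >= threshold]
    uncertain = [p for p in predictions if p["confidence"] < threshold]
    return confident, uncertain


# Made-up scores for illustration, not real model output
sample_results: List[Prediction] = [
    {"text": "I love this!", "label": "POSITIVE", "confidence": 0.99},
    {"text": "It is okay, I guess.", "label": "NEGATIVE", "confidence": 0.62},
]

confident, uncertain = split_by_confidence(sample_results, threshold=0.9)
for p in uncertain:
    print(f"Needs review: '{p['text']}' -> {p['label']} ({p['confidence']:.2f})")
```

To try it on the real predictions, pass `results` instead of `sample_results`. The right threshold depends on the application, so 0.9 is only a starting point.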

## 6. Understanding BERT's Classification Process

Let's examine what happens under the hood when BERT classifies text:

In [None]:
# Let's examine the tokenization process
example_text = "BERT is amazing for text classification!"

print("🔍 BERT Tokenization Process")
print("=" * 40)
print(f"Original text: '{example_text}'")

# Tokenize the text
tokens = tokenizer.tokenize(example_text)
print(f"\n🔤 Tokens: {tokens}")

# Convert to input IDs
input_ids = tokenizer.encode(example_text)
print(f"\n🔢 Input IDs: {input_ids}")

# Show special tokens
print(f"\n🏷️ Special Tokens:")
print(f"   [CLS] token ID: {tokenizer.cls_token_id} ('{tokenizer.cls_token}')")
print(f"   [SEP] token ID: {tokenizer.sep_token_id} ('{tokenizer.sep_token}')")
print(f"   [PAD] token ID: {tokenizer.pad_token_id} ('{tokenizer.pad_token}')")

# Decode back to verify
decoded = tokenizer.decode(input_ids)
print(f"\n🔄 Decoded: '{decoded}'")

print(f"\n💡 Notice how BERT adds [CLS] at the start and [SEP] at the end!")
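
When several texts are processed together, shorter sequences are padded with [PAD] and an attention mask tells the model which positions are real. The following pure-Python sketch mimics that layout. The helper `build_inputs` is hypothetical, the special-token IDs 101, 102, and 0 are assumed to match the values printed by the cell above, and the two word IDs are placeholders.

```python
from typing import Dict, List

# Assumed special-token IDs; compare with the values printed by the tokenizer
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def build_inputs(token_ids: List[int], max_length: int) -> Dict[str, List[int]]:
    """Wrap token IDs in [CLS] ... [SEP], then pad to max_length."""
    # Reserve two positions for [CLS] and [SEP], truncating if needed
    body = token_ids[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)  # 1 marks a real token
    padding = max_length - len(input_ids)
    return {
        "input_ids": input_ids + [PAD_ID] * padding,
        "attention_mask": attention_mask + [0] * padding,  # 0 marks padding
    }


# Placeholder word-piece IDs, for illustration only
example = build_inputs([7592, 2088], max_length=6)
print(example["input_ids"])       # [101, 7592, 2088, 102, 0, 0]
print(example["attention_mask"])  # [1, 1, 1, 1, 0, 0]
```

In practice the tokenizer builds both lists for you when it is called with padding enabled; the sketch only shows what those lists contain.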

## 7. Visualization of Results

Let's visualize our classification results:

In [None]:
# Create visualizations of our results
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Plot 1: Confidence scores
texts_short = [r['text'][:30] + '...' if len(r['text']) > 30 else r['text'] for r in results]
confidences = [r['confidence'] for r in results]
colors = ['green' if r['label'] == 'POSITIVE' else 'red' for r in results]

ax1.barh(range(len(texts_short)), confidences, color=colors, alpha=0.7)
ax1.set_yticks(range(len(texts_short)))
ax1.set_yticklabels(texts_short)
ax1.set_xlabel('Confidence Score')
ax1.set_title('BERT Classification Confidence Scores')
ax1.set_xlim(0, 1)

# Add confidence values as text
for i, conf in enumerate(confidences):
    ax1.text(conf + 0.01, i, f'{conf:.3f}', va='center')

# Plot 2: Inference times
inference_times = [r['inference_time'] for r in results]
ax2.bar(range(len(texts_short)), inference_times, color='blue', alpha=0.7)
ax2.set_xticks(range(len(texts_short)))
ax2.set_xticklabels([f'Text {i+1}' for i in range(len(texts_short))], rotation=45)
ax2.set_ylabel('Inference Time (seconds)')
ax2.set_title('BERT Inference Performance')

# Add time values as text
for i, time_val in enumerate(inference_times):
    ax2.text(i, time_val + max(inference_times)*0.01, f'{time_val:.3f}s', ha='center')

plt.tight_layout()
plt.show()

# Summary statistics
avg_confidence = np.mean(confidences)
avg_time = np.mean(inference_times)

print(f"📊 Summary Statistics:")
print(f"   Average Confidence: {avg_confidence:.4f}")
print(f"   Average Inference Time: {avg_time:.4f} seconds")
print(f"   Device: {device}")
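
A single timed call can be misleading, since the first call often includes one-time setup cost. A more stable measurement runs a warm-up call first and reports the median of several repeats. The helper `median_latency` and the stand-in `fake_classifier` below are hypothetical; the stand-in keeps the sketch runnable without a model.

```python
import statistics
import time
from typing import Any, Callable


def median_latency(fn: Callable[..., Any], *args: Any, warmup: int = 1, repeats: int = 5) -> float:
    """Return the median wall-clock time of fn(*args) over several runs."""
    for _ in range(warmup):
        fn(*args)  # warm-up runs are not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def fake_classifier(text: str) -> int:
    """Stand-in workload so the example runs without downloading a model."""
    return sum(ord(ch) for ch in text)


latency = median_latency(fake_classifier, "BERT is a powerful transformer model for NLP.")
print(f"Median latency: {latency:.6f} seconds")
```

To measure the real model, replace `fake_classifier` with `classifier` from the earlier cell.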

## 8. Key Concepts Summary

### 🧠 What We Learned About BERT:

1. **Bidirectional Context**: BERT attends to text in both directions, so each token's representation reflects both its left and right context
2. **WordPiece Tokenization**: Breaks down unknown words into manageable subword tokens
3. **Special Tokens**: Uses [CLS] token for classification and [SEP] to separate sentences
4. **Pre-trained Power**: Leverages massive pre-training on text corpora

### 🔧 Technical Implementation:
- **Pipeline API**: Simplifies model usage for common tasks
- **Device Awareness**: Detects the best available hardware (CUDA GPU, Apple MPS, or CPU)
- **Confidence Scores**: Provides probability estimates for predictions

### 📈 Performance Insights:
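- The classifier tends to assign high confidence to clearly positive or negative text, while neutral text is still forced into one of the two labels
- Inference time varies between calls, and the first call often includes one-time setup cost
- GPU acceleration tends to pay off most for larger batches and longer inputs; for single short texts the gain can be small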

---

## 📋 Summary

### 🔑 Key Concepts Mastered
- **Text Classification**: Understanding how to assign categories to text using BERT
- **BERT Architecture**: How bidirectional encoding works for classification tasks
- **Tokenization**: WordPiece tokenization and special tokens ([CLS], [SEP])
- **Pipeline Usage**: Using Hugging Face pipelines for simple classification tasks
- **Device Optimization**: Leveraging GPU/MPS for faster inference

### 📈 Best Practices Learned
- Use device detection for optimal performance across different hardware
- Fine-tuned checkpoints can perform well out of the box on tasks close to their training data
- Pipeline API simplifies complex model operations
- Always examine confidence scores to assess prediction reliability
- Visualization helps understand model behavior and performance

### 🚀 Next Steps
- **Fine-tuning**: Learn to adapt BERT for custom classification tasks
- **Advanced Models**: Explore RoBERTa, DeBERTa, and other BERT variants
- **Multi-class Classification**: Handle more than two categories
- **Documentation**: [Hugging Face Transformers Docs](https://huggingface.co/docs/transformers/)
- **BERT Paper**: [Original BERT Research Paper](https://arxiv.org/abs/1810.04805)

---

## About the Author

**Vu Hung Nguyen** - AI Engineer & Researcher

Connect with me:
- 🌐 **Website**: [vuhung16au.github.io](https://vuhung16au.github.io/)
- 💼 **LinkedIn**: [linkedin.com/in/nguyenvuhung](https://www.linkedin.com/in/nguyenvuhung/)
- 💻 **GitHub**: [github.com/vuhung16au](https://github.com/vuhung16au/)

*This notebook is part of the [HF Transformer Trove](https://github.com/vuhung16au/hf-transformer-trove) educational series.*