[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vuhung16au/hf-transformer-trove/blob/main/examples/basic1.2/01-sentiment-analysis.ipynb)
[![View on GitHub](https://img.shields.io/badge/View_on-GitHub-blue?logo=github)](https://github.com/vuhung16au/hf-transformer-trove/blob/main/examples/basic1.2/01-sentiment-analysis.ipynb)

# Basic Sentiment Analysis with Hugging Face Transformers

## 🎯 Learning Objectives
By the end of this notebook, you will understand:
- What sentiment analysis is and why it's important
- How to use the Hugging Face Transformers `pipeline` for sentiment analysis
- The structure of sentiment analysis outputs
- Basic text preprocessing considerations
- How transformers perform sentiment classification

## 📋 Prerequisites
- Basic understanding of machine learning concepts
- Familiarity with Python programming
- Basic knowledge of Natural Language Processing (NLP) concepts

## 📚 What We'll Cover
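1. **Introduction**: What is sentiment analysis?
2. **Setup**: Installing and importing required libraries
3. **Core Implementation**: Using the transformers pipeline
4. **First Example**: Classifying a single sentence
5. **Understanding Output**: Interpreting results and confidence scores
6. **Advanced Examples**: Testing different text inputs
7. **Batch Processing**: Classifying several texts in one call
8. **Model Details**: Inspecting the model configuration and size
9. **Performance**: Timing single and batched inference
10. **Summary**: Key takeaways and next steps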

## 1. Introduction to Sentiment Analysis

**Sentiment Analysis** is a fundamental Natural Language Processing task that determines the emotional tone or attitude expressed in text. It's widely used in:

- **Social Media Monitoring**: Understanding public opinion about brands or products
- **Customer Feedback**: Analyzing reviews and support tickets
- **Market Research**: Gauging consumer sentiment
- **Content Moderation**: Identifying negative or harmful content

### How Does It Work?

Modern sentiment analysis uses transformer models that:
1. **Tokenize** the input text into smaller pieces
2. **Encode** these tokens into numerical representations
3. **Process** the sequence using attention mechanisms
4. **Classify** the overall sentiment with a confidence score
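
Step 4 can be made concrete in a few lines of plain Python. The sketch below assumes the model has already produced one raw score (a logit) per class. The logit values are made up for illustration, and the label mapping simply mirrors a two-class sentiment model. It shows how softmax turns logits into probabilities and how the largest probability becomes the reported score.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values: a real model computes the logits from the input text.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
logits = [-2.0, 3.0]

probs = softmax(logits)
best = max(range(len(probs)), key=lambda i: probs[i])  # index of the top class
prediction = {"label": id2label[best], "score": probs[best]}

print(prediction)
```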

### Common Sentiment Categories
- **POSITIVE**: Expresses happiness, satisfaction, or approval
- **NEGATIVE**: Expresses sadness, dissatisfaction, or disapproval
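- **NEUTRAL**: Expresses neither positive nor negative sentiment (only available from models trained with a neutral class; the default model used in this notebook predicts only POSITIVE or NEGATIVE)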
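
A two-label model has no way to answer NEUTRAL on its own. One possible workaround, shown here as an assumption rather than a recommendation, is to treat low-confidence predictions as neutral. The helper name `to_three_way` and the 0.75 cutoff are hypothetical, and the input dictionaries only imitate the `label`/`score` shape that the pipeline returns. Low confidence is a rough proxy for neutrality, so a model trained with a real neutral class is generally the more principled choice.

```python
def to_three_way(prediction, neutral_below=0.75):
    """Map a two-class prediction to POSITIVE / NEGATIVE / NEUTRAL.

    `neutral_below` is an illustrative threshold, not a tuned value.
    """
    if prediction["score"] < neutral_below:
        return "NEUTRAL"
    return prediction["label"]


# Hand-written predictions in the pipeline's output shape (not real model output).
examples = [
    {"label": "POSITIVE", "score": 0.98},
    {"label": "NEGATIVE", "score": 0.61},
]

labels = [to_three_way(p) for p in examples]
print(labels)
```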

## 2. Setup and Installation

Let's start by importing the necessary libraries and setting up our environment.

In [None]:
# Install required packages (uncomment if running for the first time)
# !pip install transformers torch datasets

# Import essential libraries
import torch
from transformers import pipeline
import warnings

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Device detection for optimal performance
def get_device() -> torch.device:
    """
    Automatically detect and return the best available device.
    
    Priority: CUDA > MPS (Apple Silicon) > CPU
    
    Returns:
        torch.device: The optimal device for current hardware
    """
    if torch.cuda.is_available():
        device = torch.device("cuda")
        print(f"🚀 Using CUDA GPU: {torch.cuda.get_device_name()}")
    elif torch.backends.mps.is_available():
        device = torch.device("mps") 
        print("🍎 Using Apple MPS (Apple Silicon)")
    else:
        device = torch.device("cpu")
        print("💻 Using CPU (consider GPU for better performance)")
    
    return device

# Detect the best available device
device = get_device()

print("\n📚 Setup completed successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"Device: {device}")

## 3. Core Implementation: Using Transformers Pipeline

The Hugging Face `pipeline` function provides the easiest way to perform sentiment analysis. It handles:
- Model loading and configuration
- Text preprocessing and tokenization
- Model inference and postprocessing
- Result formatting

In [None]:
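# Create a sentiment analysis pipeline.
# The model is named explicitly so results do not change if the library's default does;
# distilbert-base-uncased-finetuned-sst-2-english is the default for this task.
print("🔄 Loading sentiment analysis model...")

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=device,  # run on the device detected in the setup cell
)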

print("✅ Model loaded successfully!")
print(f"📊 Model: {classifier.model.config.name_or_path}")
print(f"🎯 Task: {classifier.task}")
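
The responsibilities listed above can also be carried out by hand. The following is a sketch of a rough manual equivalent, not the pipeline's actual source code: it tokenizes the texts, runs the model, and applies softmax to the logits. The function name `classify_manually` is hypothetical, and the checkpoint name is assumed to be the one mentioned in the cell above. Calling the function downloads the model on first use.

```python
def classify_manually(texts, model_name="distilbert-base-uncased-finetuned-sst-2-english"):
    """Approximate what the sentiment pipeline does, one explicit step at a time."""
    # Imported locally so the sketch can be pasted anywhere without extra setup.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # 1. Load the tokenizer and the classification model.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    # 2. Preprocess: tokenize, pad to a common length, and build PyTorch tensors.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    # 3. Inference: no gradients are needed for prediction.
    with torch.no_grad():
        logits = model(**inputs).logits

    # 4. Postprocess: softmax over classes, then keep the top class per text.
    probs = torch.softmax(logits, dim=-1)
    scores, class_ids = probs.max(dim=-1)
    return [
        {"label": model.config.id2label[class_id.item()], "score": score.item()}
        for class_id, score in zip(class_ids, scores)
    ]
```

A call such as `classify_manually(["I love this!", "This is awful."])` should return a list of dictionaries in the same `label`/`score` shape as the pipeline.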

## 4. Testing a First Example

Let's test the pipeline on a single sentence, the example used in the Hugging Face course:

In [None]:
# Test with a single example sentence
test_text = "I've been waiting for a HuggingFace course my whole life."

print(f"📝 Input text: '{test_text}'")
print("\n🔍 Performing sentiment analysis...")

# Get the sentiment prediction
result = classifier(test_text)

print(f"\n📊 Result: {result}")

# Extract and display details
sentiment = result[0]
label = sentiment['label']
score = sentiment['score']

print(f"\n✨ Analysis:")
print(f"   Sentiment: {label}")
print(f"   Confidence: {score:.4f} ({score*100:.1f}%)")
print(f"   Interpretation: The model is {score*100:.1f}% confident this text expresses {label.lower()} sentiment.")

## 5. Understanding the Output Structure

Let's break down what the sentiment analysis pipeline returns:

In [None]:
# Analyze the output structure
print("🔍 Understanding the output structure:")
print(f"   Type: {type(result)}")
print(f"   Length: {len(result)}")
print(f"   First item type: {type(result[0])}")
print(f"   Keys: {result[0].keys()}")

print("\n📋 Output Format Explanation:")
print("   • Returns a list containing one dictionary per input text")
print("   • Each dictionary has two keys:")
print("     - 'label': The predicted sentiment class (POSITIVE/NEGATIVE)")
print("     - 'score': The confidence score (0.0 to 1.0)")
print("   • Higher scores indicate higher confidence in the prediction")
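
Because the pipeline reports only the winning label and its score, downstream code sometimes wants a single signed number instead. The sketch below assumes a two-class model whose probabilities sum to 1, so the losing class has probability `1 - score`. The helper names are hypothetical and the sample predictions are hand-written, not model output.

```python
def positive_probability(prediction):
    """Probability of the POSITIVE class, assuming a two-class model."""
    score = prediction["score"]
    return score if prediction["label"] == "POSITIVE" else 1.0 - score


def polarity(prediction):
    """Rescale to [-1, 1]: -1 is confidently negative, +1 is confidently positive."""
    return 2.0 * positive_probability(prediction) - 1.0


# Hand-written predictions in the pipeline's output shape.
sample = [
    {"label": "POSITIVE", "score": 0.9},
    {"label": "NEGATIVE", "score": 0.8},
]

polarities = [polarity(p) for p in sample]
for pred, value in zip(sample, polarities):
    print(f"{pred['label']:<8} score={pred['score']:.2f} polarity={value:+.2f}")
```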

## 6. Advanced Examples: Testing Different Text Types

Let's explore how the model handles various types of text to better understand its capabilities:

In [None]:
# Test with various text examples
test_examples = [
    "This movie is absolutely amazing! I loved every minute of it.",
    "The service at this restaurant was terrible and the food was cold.",
    "The weather today is okay, nothing special.",
    "I can't believe how wonderful this product is!",
    "This is the worst experience I've ever had.",
    "The meeting is scheduled for 3 PM tomorrow.",
]

print("🧪 Testing multiple examples:")
print("=" * 60)

for i, text in enumerate(test_examples, 1):
    result = classifier(text)
    sentiment = result[0]
    
    # Use emoji for visual representation
    emoji = "😊" if sentiment['label'] == 'POSITIVE' else "😞"
    
    print(f"\n{i}. Text: '{text}'")
    print(f"   Result: {sentiment['label']} {emoji} (confidence: {sentiment['score']:.3f})")
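
When testing many inputs, a short summary is often easier to read than the full printout. The sketch below counts labels and surfaces the least confident prediction, which is where neutral or purely factual sentences tend to show up. The texts and scores are placeholders invented for illustration, not results from the model.

```python
from collections import Counter

# Placeholder (text, prediction) pairs in the pipeline's output shape.
predictions = [
    ("example A", {"label": "POSITIVE", "score": 0.99}),
    ("example B", {"label": "NEGATIVE", "score": 0.62}),
    ("example C", {"label": "NEGATIVE", "score": 0.98}),
]

# How many texts received each label.
label_counts = Counter(pred["label"] for _, pred in predictions)

# The prediction the model was least sure about; a good candidate for manual review.
least_confident = min(predictions, key=lambda item: item[1]["score"])

print(dict(label_counts))
print(f"Least confident: {least_confident[0]!r} ({least_confident[1]['score']:.2f})")
```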

## 7. Batch Processing

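The pipeline also accepts a list of texts and returns one result per input, in the same order. This is more convenient than looping, but it is not automatically faster: unless a `batch_size` is passed, the pipeline still feeds the texts to the model one at a time.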

In [None]:
# Batch processing example
batch_texts = [
    "I love this new feature!",
    "This is disappointing.",
    "Not sure how I feel about this."
]

print("🚀 Batch processing example:")
print("Input texts:", batch_texts)

# Process all texts at once
batch_results = classifier(batch_texts)

print("\nResults:")
for text, result in zip(batch_texts, batch_results):
    emoji = "😊" if result['label'] == 'POSITIVE' else "😞"
    print(f"  '{text}' → {result['label']} {emoji} ({result['score']:.3f})")
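
For long lists it can help to bound how many texts go into each call, for example to keep memory use predictable. Here is a minimal sketch with hypothetical helper names, `chunked` and `classify_in_chunks`. A stand-in function replaces the real pipeline so the example runs without downloading a model.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def classify_in_chunks(classify, texts, size=32):
    """Call `classify` once per chunk and return the results in input order."""
    results = []
    for chunk in chunked(texts, size):
        results.extend(classify(chunk))
    return results


def fake_classifier(batch):
    """Stand-in for the real pipeline: one dictionary per input text."""
    return [{"label": "POSITIVE", "score": 1.0} for _ in batch]


texts = [f"text {i}" for i in range(7)]
chunks = list(chunked(texts, 3))
results = classify_in_chunks(fake_classifier, texts, size=3)

print([len(c) for c in chunks], len(results))
```

With the real pipeline, the call would be `classify_in_chunks(classifier, batch_texts)`. The pipeline call itself also accepts a `batch_size` argument, for example `classifier(batch_texts, batch_size=8)`, which controls how many texts are run through the model together. Whether that helps depends on the hardware and on how much the text lengths vary, so it is worth measuring.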

## 8. Model Information and Technical Details

Let's explore some technical details about the model being used:

In [None]:
# Explore model details
print("🔧 Model Technical Details:")
print(f"   Model Name: {classifier.model.config.name_or_path}")
print(f"   Model Type: {classifier.model.config.model_type}")
print(f"   Number of Labels: {classifier.model.config.num_labels}")
print(f"   Label Mapping: {classifier.model.config.id2label}")

# Model size information
total_params = sum(p.numel() for p in classifier.model.parameters())
trainable_params = sum(p.numel() for p in classifier.model.parameters() if p.requires_grad)

print(f"\n📊 Model Size:")
print(f"   Total Parameters: {total_params:,}")
print(f"   Trainable Parameters: {trainable_params:,}")
print(f"   Model Size: ~{total_params/1_000_000:.1f}M parameters")
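
The parameter count translates directly into a rough memory footprint for the weights: parameters multiplied by bytes per parameter. The sketch below is a back-of-the-envelope estimate only. It ignores activations, buffers, and framework overhead, and the 67 million figure is an assumed round number; substitute `total_params` from the cell above for a real value.

```python
# Storage cost per parameter for a few common numeric formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weights_size_mb(num_params, dtype="float32"):
    """Approximate size of the model weights in megabytes (1 MB = 1,000,000 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


example_params = 67_000_000  # assumed round figure, not read from a model

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weights_size_mb(example_params, dtype):.0f} MB")
```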

## 9. Performance Considerations

Measuring inference time on your own hardware helps when planning a production deployment:

In [None]:
import time

# Performance timing
def time_inference(texts, num_runs=5):
    """Time the inference process for performance analysis."""
    times = []
    
    for _ in range(num_runs):
        # perf_counter is a monotonic, high-resolution clock meant for timing
        start_time = time.perf_counter()
        _ = classifier(texts)
        end_time = time.perf_counter()
        times.append(end_time - start_time)
    
    return times

# Test with single text
single_text = "This is a test for performance measurement."
single_times = time_inference(single_text)

# Test with batch
batch_text = [single_text] * 10
batch_times = time_inference(batch_text)

single_avg = sum(single_times) / len(single_times)
batch_avg = sum(batch_times) / len(batch_times)

print("⏱️ Performance Analysis:")
print(f"   Single text (avg): {single_avg:.3f}s")
print(f"   Batch of 10 (avg): {batch_avg:.3f}s")
# Speedup = time for 10 separate calls / time for one call with 10 texts
print(f"   Speedup factor: {(single_avg * 10) / batch_avg:.1f}x")
print("\n💡 Tip: A speedup above 1.0x means one call with a list beat separate calls on this hardware.")
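
Timing numbers are easier to trust with a warm-up call and a statistic that resists outliers. The first call to a function can include one-time costs such as lazy initialization, and the median is less sensitive to a single slow run than the mean. The helper below is a sketch with a hypothetical name, `benchmark`. It works with any callable, and a sleeping stand-in is used here so the example runs without a model.

```python
import statistics
import time


def benchmark(fn, *args, warmup=1, runs=5):
    """Time `fn(*args)` over several runs after discarding warm-up calls."""
    for _ in range(warmup):
        fn(*args)  # warm-up results are not measured

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)

    return {
        "median": statistics.median(timings),
        "min": min(timings),
        "max": max(timings),
    }


def slow_identity(value):
    """Stand-in workload: waits about 10 ms, then returns its input."""
    time.sleep(0.01)
    return value


stats = benchmark(slow_identity, "hello", warmup=1, runs=3)
print(f"median: {stats['median']:.4f}s (min {stats['min']:.4f}s, max {stats['max']:.4f}s)")
```

With the real pipeline, `benchmark(classifier, single_text)` and `benchmark(classifier, batch_text)` would give comparable numbers for the two cases timed above.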

---

## 📋 Summary

### 🔑 Key Concepts Mastered
- **Sentiment Analysis**: Classifying the emotional tone of text with transformer models
- **Hugging Face Pipeline**: Using the high-level pipeline API for quick sentiment analysis
- **Output Interpretation**: Understanding confidence scores and label predictions
- **Batch Processing**: Efficiently processing multiple texts at once
- **Performance Considerations**: Understanding timing and efficiency trade-offs

### 📈 Best Practices Learned
- Pass lists of texts to the pipeline, and measure whether batching actually improves throughput on your hardware
- Always check confidence scores to understand prediction reliability
- Consider the model's training data and potential biases in real-world applications
- Monitor performance characteristics for production deployment

### 🚀 Next Steps
- **Advanced Notebooks**: Explore custom model fine-tuning for specific domains
- **Model Comparison**: Compare different sentiment analysis models
- **Documentation**: Review [Hugging Face Transformers documentation](https://huggingface.co/docs/transformers/) for advanced usage
- **External Resources**: Explore the [Model Hub](https://huggingface.co/models?pipeline_tag=text-classification) for specialized models

---

## About the Author

**Vu Hung Nguyen** - AI Engineer & Researcher

Connect with me:
- 🌐 **Website**: [vuhung16au.github.io](https://vuhung16au.github.io/)
- 💼 **LinkedIn**: [linkedin.com/in/nguyenvuhung](https://www.linkedin.com/in/nguyenvuhung/)
- 💻 **GitHub**: [github.com/vuhung16au](https://github.com/vuhung16au/)

*This notebook is part of the [HF Transformer Trove](https://github.com/vuhung16au/hf-transformer-trove) educational series.*