# Sentiment Analysis in Natural Language Processing

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vuhung16au/nlp-learning-journey/blob/main/examples/sentiment-analysis.ipynb)

## Overview

Sentiment Analysis is the process of determining the emotional tone or attitude expressed in text. It's used to identify whether text expresses positive, negative, or neutral sentiment, and can be extended to detect specific emotions.

## What You'll Learn

- Understanding sentiment analysis concepts
- Rule-based sentiment analysis
- Machine learning approaches
- Deep learning for sentiment analysis
- Handling different types of text
- Evaluation metrics
- Real-world applications

## Prerequisites

Basic understanding of Python, NLP preprocessing, and machine learning concepts.

## Setup and Installation

In [None]:
# Install required libraries
!pip install nltk textblob vaderSentiment transformers scikit-learn pandas matplotlib seaborn wordcloud

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
import re

# NLP libraries
import nltk
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from transformers import pipeline

# Machine learning
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Download NLTK data
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('vader_lexicon')

# Set up plotting
plt.style.use('default')
sns.set_palette("husl")

# Initialize sentiment analyzers
vader_analyzer = SentimentIntensityAnalyzer()

## Understanding Sentiment Analysis

Sentiment analysis can be performed at different levels:
- **Document-level**: Overall sentiment of entire document
- **Sentence-level**: Sentiment of individual sentences
- **Aspect-level**: Sentiment toward specific aspects/features

Common approaches:
- **Rule-based**: Using lexicons and linguistic rules
- **Machine Learning**: Training models on labeled data
- **Deep Learning**: Neural networks and transformers
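
To make the difference between document-level and sentence-level analysis concrete, here is a minimal rule-based sketch using only the standard library. The lexicon `TOY_LEXICON`, its weights, and the helper names `score_text` and `split_sentences` are illustrative assumptions, not part of VADER, TextBlob, or any published resource.

```python
import re

# Toy lexicon: words and weights are made up for illustration only.
TOY_LEXICON = {
    "love": 2.0, "great": 1.5, "excellent": 2.0,
    "poor": -1.5, "terrible": -2.0, "disappointing": -1.5,
}

def score_text(text):
    """Sum the lexicon weights of all words found in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(TOY_LEXICON.get(word, 0.0) for word in words)

def split_sentences(text):
    """Naive split on '.', '!' and '?'; a real pipeline would use a tokenizer."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

review = "The camera is excellent. The battery is terrible. I love the design!"

# Document level: one score for the whole review
document_score = score_text(review)

# Sentence level: one score per sentence
sentence_scores = [(s, score_text(s)) for s in split_sentences(review)]

print("Document score:", document_score)
for sentence, score in sentence_scores:
    print(f"  {score:+.1f}  {sentence}")
```

The document-level score hides the negative remark about the battery, while the sentence-level scores expose it. Aspect-level analysis goes one step further by tying each score to a named feature.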

In [None]:
# Sample texts for sentiment analysis
sample_texts = {
    "Positive": [
        "I absolutely love this product! It's amazing and works perfectly.",
        "What a fantastic experience! Highly recommend to everyone.",
        "Brilliant service and excellent quality. Very satisfied!"
    ],
    "Negative": [
        "This is the worst product I've ever bought. Complete waste of money.",
        "Terrible service and poor quality. Very disappointed.",
        "I hate this! It doesn't work at all and broke after one day."
    ],
    "Neutral": [
        "The product arrived on time and matches the description.",
        "It's an okay product. Nothing special but does the job.",
        "The item is as expected. Standard quality and features."
    ],
    "Mixed": [
        "Great design but poor functionality. Love the look but it doesn't work well.",
        "Good customer service but the product quality is disappointing.",
        "Fast delivery which is great, but the item arrived damaged unfortunately."
    ]
}

print("Sample Texts for Sentiment Analysis:")
for category, texts in sample_texts.items():
    print(f"\n{category}:")
    for i, text in enumerate(texts, 1):
        print(f"  {i}. {text}")

## TextBlob Sentiment Analysis

TextBlob provides simple sentiment analysis with polarity (-1 to 1) and subjectivity (0 to 1) scores.

In [None]:
def analyze_sentiment_textblob(text):
    """Analyze sentiment using TextBlob"""
    blob = TextBlob(text)
    polarity = blob.sentiment.polarity
    subjectivity = blob.sentiment.subjectivity
    
    # Classify sentiment
    if polarity > 0.1:
        sentiment = "Positive"
    elif polarity < -0.1:
        sentiment = "Negative"
    else:
        sentiment = "Neutral"
    
    return {
        'sentiment': sentiment,
        'polarity': polarity,
        'subjectivity': subjectivity
    }

# Test TextBlob on sample texts
print("TextBlob Sentiment Analysis Results:")
print("=" * 85)
print(f"{'Text':<50} {'Sentiment':<10} {'Polarity':<10} {'Subjectivity':<12}")
print("-" * 85)

for category, texts in sample_texts.items():
    for text in texts[:1]:  # One example per category
        result = analyze_sentiment_textblob(text)
        text_short = text[:47] + "..." if len(text) > 50 else text
        print(f"{text_short:<50} {result['sentiment']:<10} {result['polarity']:<10.2f} {result['subjectivity']:<12.2f}")
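
The thresholding step in `analyze_sentiment_textblob` is a generic pattern: a signed score is mapped to a label, with a neutral band around zero. The sketch below isolates that step so the effect of the band width can be inspected. The helper name `label_from_score` and the `neutral_band` parameter are assumptions introduced here, and the band of 0.1 mirrors this notebook's choice rather than a TextBlob default.

```python
def label_from_score(score, neutral_band=0.1):
    """Map a signed score to a label; scores within the band count as Neutral."""
    if score > neutral_band:
        return "Positive"
    if score < -neutral_band:
        return "Negative"
    return "Neutral"

# How the label of one fixed score changes as the neutral band widens
band_sweep = {band: label_from_score(0.08, band) for band in (0.0, 0.05, 0.1)}

for band, label in band_sweep.items():
    print(f"neutral_band={band:<5} -> {label}")
```

A weakly positive score flips from Positive to Neutral as the band widens, so the band is worth tuning on labeled data from the target domain rather than fixing by convention.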

## VADER Sentiment Analysis

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based analyzer tuned for social media text. Its rules account for emoticons, slang, capitalization, punctuation emphasis, and intensifiers.

In [None]:
def analyze_sentiment_vader(text):
    """Analyze sentiment using VADER"""
    scores = vader_analyzer.polarity_scores(text)
    
    # Determine sentiment based on compound score
    compound = scores['compound']
    if compound >= 0.05:
        sentiment = "Positive"
    elif compound <= -0.05:
        sentiment = "Negative"
    else:
        sentiment = "Neutral"
    
    return {
        'sentiment': sentiment,
        'compound': compound,
        'positive': scores['pos'],
        'negative': scores['neg'],
        'neutral': scores['neu']
    }

# Test VADER on sample texts including social media style
social_media_texts = [
    "OMG this is AMAZING!!! 😍😍😍 #love",
    "This sucks 😞 worst day ever :(",
    "It's okay I guess... not bad but not great either",
    "BEST PRODUCT EVER!!!! 5 stars ⭐⭐⭐⭐⭐",
    "Meh... could be better 🤷‍♀️"
]

print("VADER Sentiment Analysis Results:")
print("=" * 80)
print(f"{'Text':<40} {'Sentiment':<10} {'Compound':<10} {'Pos':<6} {'Neg':<6} {'Neu':<6}")
print("-" * 80)

for text in social_media_texts:
    result = analyze_sentiment_vader(text)
    text_short = text[:37] + "..." if len(text) > 40 else text
    print(f"{text_short:<40} {result['sentiment']:<10} {result['compound']:<10.2f} "
          f"{result['positive']:<6.2f} {result['negative']:<6.2f} {result['neutral']:<6.2f}")
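
The `compound` value used above is a normalized version of the summed word valences. A minimal sketch of that normalization step follows, assuming the form x / sqrt(x² + α) with α = 15 as found in the reference vaderSentiment source. The function name `normalize_compound` is introduced here for illustration and is not part of the library's public API.

```python
import math

def normalize_compound(raw_sum, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return raw_sum / math.sqrt(raw_sum * raw_sum + alpha)

# Larger absolute sums move toward -1 or 1 but never reach them
for raw in (-8.0, -1.0, 0.0, 1.0, 4.0, 8.0):
    print(f"raw sum {raw:+5.1f} -> compound {normalize_compound(raw):+.4f}")
```

Because of this squashing, the ±0.05 cutoffs applied to `compound` correspond to very small raw valence sums, which is why a single mildly emotional word is often enough to move a short text out of the neutral band.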

## Transformer-based Sentiment Analysis

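Pre-trained transformer models, loaded here through the Hugging Face `pipeline` API, typically outperform lexicon-based methods on benchmark data, at the cost of a model download and slower inference.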

In [None]:
# Initialize transformer-based sentiment analysis pipeline.
# Fall back to the library's default model if the Twitter-specific one cannot be loaded.
try:
    sentiment_pipeline = pipeline("sentiment-analysis", 
                                 model="cardiffnlp/twitter-roberta-base-sentiment-latest")
    print("Loaded Twitter-specific RoBERTa model")
except Exception:
    try:
        sentiment_pipeline = pipeline("sentiment-analysis")
        print("Loaded default sentiment analysis model")
    except Exception:
        sentiment_pipeline = None
        print("Could not load transformer model")

def analyze_sentiment_transformer(text):
    """Analyze sentiment using transformer model"""
    if sentiment_pipeline is None:
        return {"sentiment": "N/A", "confidence": 0.0}
    
    try:
        result = sentiment_pipeline(text)[0]
        return {
            # Capitalize so labels such as "positive" or "POSITIVE" match the
            # "Positive"/"Negative"/"Neutral" labels used by the other analyzers
            'sentiment': result['label'].capitalize(),
            'confidence': result['score']
        }
    except Exception:
        return {"sentiment": "Error", "confidence": 0.0}

# Test transformer model
if sentiment_pipeline:
    print("\nTransformer Sentiment Analysis Results:")
    print("=" * 65)
    print(f"{'Text':<40} {'Sentiment':<15} {'Confidence':<10}")
    print("-" * 65)
    
    test_texts = [
        "I love this product so much!",
        "This is terrible and doesn't work",
        "It's an average product, nothing special",
        "Mixed feelings about this purchase"
    ]
    
    for text in test_texts:
        result = analyze_sentiment_transformer(text)
        text_short = text[:37] + "..." if len(text) > 40 else text
        print(f"{text_short:<40} {result['sentiment']:<15} {result['confidence']:<10.3f}")
else:
    print("Transformer model not available")
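
Different pre-trained models name their output classes differently, which matters as soon as transformer output is compared with TextBlob or VADER labels. The sketch below shows one way to map raw labels to a shared vocabulary. The helper `normalize_label`, the `extra_map` parameter, and the opaque `label_0`/`label_1`/`label_2` mapping are hypothetical; the actual label names should be checked on the model card of whichever model is loaded.

```python
CANONICAL = {"positive": "Positive", "negative": "Negative", "neutral": "Neutral"}

def normalize_label(raw_label, extra_map=None):
    """Map a model-specific label to Positive/Negative/Neutral, else 'Unknown'."""
    key = raw_label.strip().lower()
    # Optional per-model translation for opaque labels such as "LABEL_0"
    if extra_map and key in extra_map:
        key = extra_map[key]
    return CANONICAL.get(key, "Unknown")

# Hypothetical mapping for a model that emits opaque class names
opaque_labels = {"label_0": "negative", "label_1": "neutral", "label_2": "positive"}

print(normalize_label("POSITIVE"))
print(normalize_label("neutral"))
print(normalize_label("LABEL_0", opaque_labels))
print(normalize_label("LABEL_0"))
```

Returning "Unknown" for unmapped labels, instead of guessing, makes a mismatched model visible in the results table rather than silently skewing agreement statistics.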

## Comparing Sentiment Analysis Methods

In [None]:
def compare_sentiment_methods(texts):
    """Compare different sentiment analysis methods"""
    results = []
    
    for text in texts:
        textblob_result = analyze_sentiment_textblob(text)
        vader_result = analyze_sentiment_vader(text)
        transformer_result = analyze_sentiment_transformer(text)
        
        results.append({
            'text': text,
            'textblob': textblob_result['sentiment'],
            'vader': vader_result['sentiment'],
            'transformer': transformer_result['sentiment']
        })
    
    return results

# Compare methods on diverse texts
comparison_texts = [
    "I absolutely love this! Best purchase ever! 😍",
    "This is horrible and I hate it completely",
    "It's okay, nothing special but does the job",
    "Great quality but too expensive for what it is",
    "Not bad, could be better though",
    "AMAZING!!! 5 stars ⭐⭐⭐⭐⭐ #recommended"
]

comparison_results = compare_sentiment_methods(comparison_texts)

print("Sentiment Analysis Method Comparison:")
print("=" * 80)
print(f"{'Text':<35} {'TextBlob':<12} {'VADER':<12} {'Transformer':<12}")
print("-" * 80)

for result in comparison_results:
    text_short = result['text'][:32] + "..." if len(result['text']) > 35 else result['text']
    print(f"{text_short:<35} {result['textblob']:<12} {result['vader']:<12} {result['transformer']:<12}")

# Calculate agreement between methods
textblob_vader_agreement = sum(1 for r in comparison_results 
                              if r['textblob'] == r['vader']) / len(comparison_results)
textblob_transformer_agreement = sum(1 for r in comparison_results 
                                    if r['textblob'] == r['transformer']) / len(comparison_results)
vader_transformer_agreement = sum(1 for r in comparison_results 
                                 if r['vader'] == r['transformer']) / len(comparison_results)

print(f"\nMethod Agreement:")
print(f"TextBlob vs VADER: {textblob_vader_agreement:.2%}")
print(f"TextBlob vs Transformer: {textblob_transformer_agreement:.2%}")
print(f"VADER vs Transformer: {vader_transformer_agreement:.2%}")
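
Raw agreement percentages can look high purely by chance, especially with only three labels. One common chance-corrected alternative is Cohen's kappa, sketched below in plain Python. The function name `cohen_kappa` and the two label lists are made up for illustration and are not output from the methods above.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of positions where both methods agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both methods labeled independently at their own rates
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both methods always emit the same single label
    return (observed - expected) / (1 - expected)

# Hypothetical label sequences from two methods
method_a = ["Positive", "Negative", "Neutral", "Positive", "Neutral", "Positive"]
method_b = ["Positive", "Negative", "Neutral", "Negative", "Positive", "Positive"]

kappa = cohen_kappa(method_a, method_b)
print(f"Cohen's kappa: {kappa:.3f}")
```

Here the two lists agree on 4 of 6 items (about 67%), yet kappa is noticeably lower because part of that agreement is expected by chance.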

## Machine Learning Approach

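Building a custom sentiment classifier using traditional ML algorithms: TF-IDF features fed to Logistic Regression and Naive Bayes. The 30-example dataset below is only large enough to demonstrate the workflow, so the reported accuracy should not be read as a real performance estimate.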

In [None]:
# Create a sample dataset for training
def create_sample_dataset():
    """Create a sample dataset for sentiment classification"""
    data = []
    
    # Positive examples
    positive_texts = [
        "I love this product, it's amazing!",
        "Excellent quality and fast delivery",
        "Best purchase I've made this year",
        "Fantastic service and great results",
        "Highly recommend to everyone",
        "Perfect solution to my problem",
        "Outstanding quality and value",
        "Exceeded my expectations completely",
        "Brilliant design and functionality",
        "Absolutely wonderful experience"
    ]
    
    # Negative examples
    negative_texts = [
        "Terrible product, complete waste of money",
        "Poor quality and doesn't work properly",
        "Worst purchase ever, very disappointed",
        "Broke after one day, terrible quality",
        "Don't buy this, it's awful",
        "Completely useless and overpriced",
        "Horrible experience and poor service",
        "Failed to meet any expectations",
        "Waste of time and money",
        "Extremely poor quality control"
    ]
    
    # Neutral examples
    neutral_texts = [
        "It's an okay product, nothing special",
        "Average quality, does what it says",
        "Standard features and decent price",
        "Neither good nor bad, just average",
        "Meets basic requirements adequately",
        "Fair quality for the price point",
        "Acceptable performance overall",
        "Regular product with normal features",
        "Decent but not outstanding",
        "Works as expected, nothing more"
    ]
    
    # Combine data
    for text in positive_texts:
        data.append({'text': text, 'sentiment': 'positive'})
    for text in negative_texts:
        data.append({'text': text, 'sentiment': 'negative'})
    for text in neutral_texts:
        data.append({'text': text, 'sentiment': 'neutral'})
    
    return pd.DataFrame(data)

# Create and prepare dataset
df = create_sample_dataset()
print(f"Dataset shape: {df.shape}")
print(f"\nSentiment distribution:")
print(df['sentiment'].value_counts())

# Text preprocessing
def preprocess_text(text):
    """Simple text preprocessing"""
    # Convert to lowercase
    text = text.lower()
    # Remove special characters
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    return text

df['text_processed'] = df['text'].apply(preprocess_text)

# Split data
X = df['text_processed']
y = df['sentiment']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# Feature extraction
vectorizer = TfidfVectorizer(max_features=1000, stop_words='english')
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)

# Train models
models = {
    'Logistic Regression': LogisticRegression(random_state=42),
    'Naive Bayes': MultinomialNB()
}

trained_models = {}
for name, model in models.items():
    model.fit(X_train_vectorized, y_train)
    trained_models[name] = model
    
    # Evaluate
    y_pred = model.predict(X_test_vectorized)
    accuracy = accuracy_score(y_test, y_pred)
    
    print(f"\n{name} Accuracy: {accuracy:.3f}")
    print("Classification Report:")
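    # zero_division=0 avoids warnings when a class is never predicted on this tiny test set
    print(classification_report(y_test, y_pred, zero_division=0))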
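
The classification report lists precision, recall, and F1 for each class. To show what those numbers mean, the sketch below computes them from scratch for a small set of labels. The helper names `per_class_metrics` and `macro_f1` and the example labels are assumptions made for illustration, not results from the models trained above.

```python
def per_class_metrics(y_true, y_pred):
    """Return precision, recall and F1 for every label seen in either list."""
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # correct hits
        fp = sum(t != label and p == label for t, p in pairs)  # false alarms
        fn = sum(t == label and p != label for t, p in pairs)  # misses
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics

def macro_f1(metrics):
    """Unweighted mean of per-class F1, so every class counts equally."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)

# Hypothetical gold labels and predictions
y_true_demo = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred_demo = ["positive", "neutral", "negative", "negative", "neutral", "positive"]

demo_metrics = per_class_metrics(y_true_demo, y_pred_demo)
for label, m in demo_metrics.items():
    print(f"{label:<9} P={m['precision']:.2f} R={m['recall']:.2f} F1={m['f1']:.2f}")
print(f"Macro F1: {macro_f1(demo_metrics):.3f}")
```

Macro-averaged F1 is often preferred over accuracy when sentiment classes are imbalanced, because a model that ignores a rare class is penalized for it.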

## Aspect-Based Sentiment Analysis

Analyzing sentiment toward specific aspects or features.

In [None]:
def aspect_based_sentiment_analysis(text, aspects):
    """Perform aspect-based sentiment analysis"""
    results = {}
    
    # Split text into sentences
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    
    for aspect in aspects:
        aspect_sentiments = []
        
        # Find sentences mentioning the aspect
        for sentence in sentences:
            if aspect.lower() in sentence.lower():
                # Analyze sentiment of this sentence
                sentiment = analyze_sentiment_vader(sentence)
                aspect_sentiments.append(sentiment['compound'])
        
        if aspect_sentiments:
            avg_sentiment = np.mean(aspect_sentiments)
            if avg_sentiment >= 0.05:
                sentiment_label = "Positive"
            elif avg_sentiment <= -0.05:
                sentiment_label = "Negative"
            else:
                sentiment_label = "Neutral"
            
            results[aspect] = {
                'sentiment': sentiment_label,
                'score': avg_sentiment,
                'mentions': len(aspect_sentiments)
            }
        else:
            results[aspect] = {
                'sentiment': 'Not Mentioned',
                'score': 0.0,
                'mentions': 0
            }
    
    return results

# Test aspect-based sentiment analysis
review_text = """
The quality of this phone is excellent and I love the camera features. 
However, the battery life is disappointing and doesn't last long. 
The design is beautiful and feels premium in hand. 
The price is a bit high but the performance makes it worth it. 
Customer service was helpful when I had questions.
"""

aspects = ['quality', 'camera', 'battery', 'design', 'price', 'performance', 'service']

aspect_results = aspect_based_sentiment_analysis(review_text, aspects)

print("Aspect-Based Sentiment Analysis Results:")
print("=" * 50)
print(f"{'Aspect':<12} {'Sentiment':<12} {'Score':<8} {'Mentions':<8}")
print("-" * 50)

for aspect, result in aspect_results.items():
    print(f"{aspect:<12} {result['sentiment']:<12} {result['score']:<8.2f} {result['mentions']:<8}")

# Visualize aspect sentiments
aspects_mentioned = [aspect for aspect, result in aspect_results.items() 
                    if result['mentions'] > 0]
scores = [aspect_results[aspect]['score'] for aspect in aspects_mentioned]

plt.figure(figsize=(10, 6))
colors = ['green' if score > 0 else 'red' if score < 0 else 'gray' for score in scores]
plt.bar(aspects_mentioned, scores, color=colors, alpha=0.7)
plt.axhline(y=0, color='black', linestyle='-', alpha=0.3)
plt.title('Aspect-Based Sentiment Analysis')
plt.xlabel('Aspects')
plt.ylabel('Sentiment Score')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
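
Sentence-level matching assigns the same score to every aspect in a sentence, so in "The price is a bit high but the performance makes it worth it" both `price` and `performance` receive one shared score. A possible refinement is to split sentences at contrast markers before scoring. The sketch below shows only the splitting step. The marker list and the helper names `split_clauses` and `clauses_for_aspect` are illustrative assumptions.

```python
import re

# Contrast markers that often separate clauses with opposing sentiment.
# This short list is an assumption and far from exhaustive.
CONTRAST_PATTERN = re.compile(
    r"\b(?:but|however|although|though|whereas)\b", re.IGNORECASE
)

def split_clauses(sentence):
    """Split a sentence at contrast markers and drop empty fragments."""
    parts = CONTRAST_PATTERN.split(sentence)
    return [part.strip(" ,;") for part in parts if part.strip(" ,;")]

def clauses_for_aspect(sentence, aspect):
    """Return only the clauses that mention the aspect (case-insensitive)."""
    return [c for c in split_clauses(sentence) if aspect.lower() in c.lower()]

demo_sentence = "The price is a bit high but the performance makes it worth it"

print(split_clauses(demo_sentence))
print(clauses_for_aspect(demo_sentence, "price"))
print(clauses_for_aspect(demo_sentence, "performance"))
```

Each returned clause could then be passed to `analyze_sentiment_vader` in place of the full sentence, so that each aspect is scored only on the clause that mentions it.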

## Real-World Applications

In [None]:
# Application 1: Social Media Monitoring
def monitor_brand_sentiment(posts):
    """Monitor brand sentiment from social media posts"""
    results = []
    
    for post in posts:
        vader_result = analyze_sentiment_vader(post)
        results.append({
            'post': post,
            'sentiment': vader_result['sentiment'],
            'score': vader_result['compound']
        })
    
    # Calculate overall sentiment distribution
    sentiments = [r['sentiment'] for r in results]
    sentiment_counts = Counter(sentiments)
    
    return results, sentiment_counts

# Application 2: Customer Feedback Analysis
def analyze_customer_feedback(feedbacks):
    """Analyze customer feedback and identify key issues"""
    sentiment_summary = {
        'total_reviews': len(feedbacks),
        'positive': 0,
        'negative': 0,
        'neutral': 0,
        'avg_score': 0,
        'negative_keywords': []
    }
    
    all_scores = []
    negative_texts = []
    
    for feedback in feedbacks:
        result = analyze_sentiment_vader(feedback)
        score = result['compound']
        all_scores.append(score)
        
        if result['sentiment'] == 'Positive':
            sentiment_summary['positive'] += 1
        elif result['sentiment'] == 'Negative':
            sentiment_summary['negative'] += 1
            negative_texts.append(feedback.lower())
        else:
            sentiment_summary['neutral'] += 1
    
    sentiment_summary['avg_score'] = np.mean(all_scores)
    
    # Extract keywords from negative feedback
    if negative_texts:
        # Simple keyword extraction (in practice, use more sophisticated methods)
        negative_words = []
        common_negative_keywords = ['bad', 'terrible', 'awful', 'poor', 'hate', 
                                   'broken', 'defective', 'slow', 'expensive']
        
        for text in negative_texts:
            for keyword in common_negative_keywords:
                if keyword in text:
                    negative_words.append(keyword)
        
        sentiment_summary['negative_keywords'] = list(set(negative_words))
    
    return sentiment_summary

# Test applications
social_media_posts = [
    "Just tried the new product from @brand - absolutely love it! 😍 #amazing",
    "Disappointed with my recent purchase from @brand. Poor quality 😞",
    "@brand has excellent customer service! Quick response and helpful.",
    "Not impressed with @brand lately. Quality has gone down.",
    "Great experience with @brand! Will definitely buy again 👍",
    "@brand products are overpriced for what you get 💸"
]

customer_feedbacks = [
    "Great product, excellent quality and fast delivery!",
    "The item broke after just one week. Very poor quality.",
    "Average product, nothing special but does the job.",
    "Terrible customer service and defective product.",
    "Love this purchase! Exceeded expectations.",
    "Too expensive for what it offers. Not worth the money.",
    "Perfect solution to my needs. Highly recommend!",
    "Product arrived damaged and return process was slow."
]

# Social media monitoring
print("Social Media Brand Monitoring:")
print("=" * 40)
social_results, social_counts = monitor_brand_sentiment(social_media_posts)

for sentiment, count in social_counts.items():
    percentage = (count / len(social_media_posts)) * 100
    print(f"{sentiment}: {count} posts ({percentage:.1f}%)")

# Customer feedback analysis
print("\nCustomer Feedback Analysis:")
print("=" * 40)
feedback_summary = analyze_customer_feedback(customer_feedbacks)

print(f"Total Reviews: {feedback_summary['total_reviews']}")
print(f"Positive: {feedback_summary['positive']} ({feedback_summary['positive']/feedback_summary['total_reviews']*100:.1f}%)")
print(f"Negative: {feedback_summary['negative']} ({feedback_summary['negative']/feedback_summary['total_reviews']*100:.1f}%)")
print(f"Neutral: {feedback_summary['neutral']} ({feedback_summary['neutral']/feedback_summary['total_reviews']*100:.1f}%)")
print(f"Average Sentiment Score: {feedback_summary['avg_score']:.3f}")
print(f"Negative Keywords Found: {feedback_summary['negative_keywords']}")

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Social media sentiment distribution
# Convert dict views to lists so matplotlib can turn them into arrays
ax1.pie(list(social_counts.values()), labels=list(social_counts.keys()), autopct='%1.1f%%')
ax1.set_title('Social Media Sentiment Distribution')

# Customer feedback sentiment distribution
feedback_labels = ['Positive', 'Negative', 'Neutral']
feedback_values = [feedback_summary['positive'], feedback_summary['negative'], feedback_summary['neutral']]
ax2.pie(feedback_values, labels=feedback_labels, autopct='%1.1f%%')
ax2.set_title('Customer Feedback Sentiment Distribution')

plt.tight_layout()
plt.show()
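
The keyword check in `analyze_customer_feedback` uses substring matching, so a keyword such as `bad` would also match `badge`, and each keyword is recorded at most once. A hedged alternative is to match on word boundaries and count occurrences, as sketched below. The function name `count_keywords` and the sample sentences are made up for illustration.

```python
import re
from collections import Counter

def count_keywords(texts, keywords):
    """Count whole-word, case-insensitive keyword occurrences across texts."""
    if not keywords:
        return Counter()
    # \b anchors ensure 'bad' does not match inside 'badge'
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE
    )
    counts = Counter()
    for text in texts:
        counts.update(match.lower() for match in pattern.findall(text))
    return counts

sample_feedback = [
    "Bad packaging and a bad manual.",
    "The badge fell off; poor glue.",
]
keyword_counts = count_keywords(sample_feedback, ["bad", "poor", "slow"])
print(keyword_counts.most_common())
```

Frequencies make it possible to rank complaints by how often they occur instead of only listing which keywords appeared.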

## Exercises

1. **Multi-class Emotion Detection**: Extend sentiment analysis to detect specific emotions (joy, anger, fear, sadness, etc.)
2. **Domain-Specific Sentiment**: Build sentiment classifiers for specific domains (finance, healthcare, etc.)
3. **Sentiment Timeline Analysis**: Track sentiment changes over time
4. **Comparative Sentiment**: Compare sentiment between different products/brands

## Key Takeaways

- **Multiple approaches available**: Lexicon- and rule-based (VADER, TextBlob's default analyzer), classical machine learning (TF-IDF with Logistic Regression or Naive Bayes), and neural (transformers)
- **Context matters**: Social media text needs different handling than formal reviews
- **VADER excels at social media**: Handles emoticons, caps, and slang well
- **Transformers often provide the best accuracy**: But they require more computational resources
- **Aspect-based analysis**: Provides more granular insights than document-level sentiment
- **Consider domain adaptation**: Generic models may not work well for specialized domains

## Best Practices

1. **Choose the right tool**: VADER for social media, transformers for accuracy, TextBlob for simplicity
2. **Preprocess appropriately**: Handle negations, intensifiers, and domain-specific language
3. **Validate on your data**: Different domains may need different approaches
4. **Consider aspect-level analysis**: More informative than document-level for complex text
5. **Handle class imbalance**: Real-world data often has uneven sentiment distribution
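
As one example of the negation handling mentioned in the list above, a classic preprocessing trick for bag-of-words models is to prefix every token that follows a negation word, up to the next punctuation mark. The sketch below is a simplified version of that idea. The negation list, the `NOT_` prefix, and the function name `mark_negation` are assumptions chosen for illustration.

```python
import re

NEGATION_WORDS = {"not", "no", "never", "cannot"}
CLAUSE_END = {".", ",", "!", "?", ";"}

def mark_negation(text):
    """Prefix tokens after a negation with NOT_ until the next punctuation mark."""
    tokens = re.findall(r"[a-z']+|[.,!?;]", text.lower())
    marked, negating = [], False
    for token in tokens:
        if token in CLAUSE_END:
            negating = False  # punctuation ends the negation scope
            continue          # punctuation itself is dropped
        if token in NEGATION_WORDS or token.endswith("n't"):
            negating = True
            marked.append(token)
        elif negating:
            marked.append("NOT_" + token)
        else:
            marked.append(token)
    return marked

print(mark_negation("I don't like this phone, but the camera is good."))
```

With this marking, "like" and "NOT_like" become separate features, so a linear classifier can learn opposite weights for them.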

## Applications

- **Brand monitoring**: Track public sentiment about brands/products
- **Customer feedback analysis**: Understand customer satisfaction
- **Market research**: Analyze public opinion on topics
- **Content moderation**: Detect toxic or negative content
- **Recommendation systems**: Factor sentiment into recommendations

## Next Steps

- Learn about emotion detection and fine-grained sentiment analysis
- Explore multimodal sentiment analysis (text + images/audio)
- Study temporal sentiment analysis and trend detection
- Practice with real-world datasets from your domain of interest

## Resources

- [VADER Sentiment Analysis](https://github.com/cjhutto/vaderSentiment)
- [TextBlob Documentation](https://textblob.readthedocs.io/)
- [Hugging Face Sentiment Models](https://huggingface.co/models?pipeline_tag=text-classification)
- [Stanford Sentiment Treebank](https://nlp.stanford.edu/sentiment/)
- [SemEval Sentiment Analysis Tasks](http://alt.qcri.org/semeval2017/task4/)