# Sentiment Analysis

This notebook compares several sentiment analysis approaches on a small news dataset.

## Objectives
- Compare different sentiment analysis methods
- Train traditional ML models for sentiment classification
- Apply a pretrained transformer model for sentiment classification
- Evaluate and visualize model performance
- Analyze sentiment confidence and distributions

In [2]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Set up plotting
plt.style.use('seaborn-v0_8')
sns.set_palette('viridis')

# Import our custom modules
import sys
sys.path.append('../src')

from sentiment_analysis import SentimentAnalyzer
from evaluation import ModelEvaluator
from visualization import NLPVisualizer



## 1. Load and Prepare Data

In [3]:
# Load the processed dataset
df = pd.read_csv('../data/processed/processed_news_data.csv')

print("Dataset shape:", df.shape)
print("\nSentiment distribution:")
print(df['sentiment'].value_counts())

# Display sample texts
print("\nSample texts:")
for i, row in df.head(3).iterrows():
    print(f"\nText {i+1}: {row['text'][:100]}...")
    print(f"Sentiment: {row['sentiment']}")

Dataset shape: (10, 9)

Sentiment distribution:
sentiment
positive    6
negative    2
neutral     2
Name: count, dtype: int64

Sample texts:

Text 1: A major technology company has announced a groundbreaking advancement in artificial intelligence tha...
Sentiment: positive

Text 2: World leaders have reached a historic agreement at the latest climate change summit, committing to a...
Sentiment: positive

Text 3: Financial markets continue to experience volatility amid ongoing economic uncertainty. Investors rem...
Sentiment: negative
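The labels are imbalanced (6 positive, 2 negative, 2 neutral out of 10 rows), so accuracy figures later in the notebook can look better than they are. Here is a minimal standard-library sketch for reporting class proportions. `class_proportions` is a hypothetical helper and is not part of the project's `src` modules.

```python
from collections import Counter

def class_proportions(labels):
    """Return each label's share of the total, most frequent label first."""
    counts = Counter(labels)
    total = sum(counts.values())
    # most_common() orders by count, so the dict preserves that order
    return {label: count / total for label, count in counts.most_common()}

# Usage with the notebook's DataFrame (not run here):
# class_proportions(df['sentiment'].tolist())
```

In pandas, `df['sentiment'].value_counts(normalize=True)` gives the same proportions directly.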


In [None]:
# Additional preprocessing imports (../src is already on sys.path from the setup cell)
from data_preprocessing import TextPreprocessor, create_sample_dataset


## 2. Initialize Sentiment Analyzer

In [4]:
# Initialize the sentiment analyzer, evaluator, and visualizer
analyzer = SentimentAnalyzer()
evaluator = ModelEvaluator()
visualizer = NLPVisualizer()

print("Sentiment analyzer initialized successfully!")

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


Loaded transformer model: cardiffnlp/twitter-roberta-base-sentiment-latest
Sentiment analyzer initialized successfully!
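`SentimentAnalyzer` loads the Hugging Face checkpoint `cardiffnlp/twitter-roberta-base-sentiment-latest`. A `transformers` text-classification pipeline returns predictions as `{'label': ..., 'score': ...}` dicts. The sketch below adapts that shape to the `{'sentiment', 'confidence'}` dicts used in the following cells. It assumes, without confirmation from the source, that `transformer_sentiment` does something similar internally. The `LABEL_n` fallback order is also an assumption and should be checked against `model.config.id2label`.

```python
# ASSUMPTION: fallback for checkpoints whose config only exposes generic names;
# the 0/1/2 -> negative/neutral/positive order follows the cardiffnlp model cards.
GENERIC_LABELS = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}

def pipeline_output_to_result(output):
    """Convert one pipeline prediction ({'label', 'score'}) to the notebook's result shape."""
    # Use the generic-name mapping if it applies, else lowercase the model's own label
    label = GENERIC_LABELS.get(output["label"], output["label"].lower())
    return {"sentiment": label, "confidence": float(output["score"])}

# Usage (requires transformers and a model download; not run here):
# from transformers import pipeline
# clf = pipeline("sentiment-analysis",
#                model="cardiffnlp/twitter-roberta-base-sentiment-latest")
# pipeline_output_to_result(clf(text, truncation=True)[0])
```

The pooler-weight warning above is expected. The sequence-classification head does not use the checkpoint's pooler layer, so those weights are simply dropped.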


## 3. TextBlob Sentiment Analysis

In [5]:
# Analyze sentiment using TextBlob
print("Analyzing sentiment with TextBlob...")

textblob_results = []
for text in df['text']:
    result = analyzer.textblob_sentiment(text)
    textblob_results.append(result)

# Add results to dataframe
df['textblob_sentiment'] = [r['sentiment'] for r in textblob_results]
df['textblob_polarity'] = [r['polarity'] for r in textblob_results]
df['textblob_confidence'] = [r['confidence'] for r in textblob_results]

print("\nTextBlob Results:")
print(df['textblob_sentiment'].value_counts())

Analyzing sentiment with TextBlob...

TextBlob Results:
textblob_sentiment
neutral     5
positive    5
Name: count, dtype: int64
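`analyzer.textblob_sentiment` returns a label, a polarity, and a confidence, but the thresholds it uses are not shown here. TextBlob's polarity score lies in [-1.0, 1.0]. The sketch below shows one common way to map that score to a label. It assumes a symmetric neutral band of ±0.1 and uses |polarity| as a confidence proxy. Both are illustrative choices, not necessarily what `SentimentAnalyzer` does.

```python
def polarity_to_label(polarity, neutral_band=0.1):
    """Map a polarity score in [-1, 1] to a label plus a confidence proxy.

    ASSUMPTION: the +/-neutral_band threshold and abs(polarity) confidence are
    illustrative, not taken from the project's SentimentAnalyzer.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity must be in [-1, 1], got {polarity}")
    if polarity > neutral_band:
        label = "positive"
    elif polarity < -neutral_band:
        label = "negative"
    else:
        label = "neutral"
    return {"sentiment": label, "polarity": polarity, "confidence": abs(polarity)}

# Usage with TextBlob (not run here):
# from textblob import TextBlob
# polarity_to_label(TextBlob(text).sentiment.polarity)
```

TextBlob labels none of the texts as negative here, even though two are labeled negative in the data. A neutral band that is too wide for measured news prose is one possible explanation. Re-running with a smaller `neutral_band` would test that hypothesis.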


## 4. Transformer-based Sentiment Analysis

In [6]:
# Analyze sentiment using transformer model
print("Analyzing sentiment with Transformer model...")

transformer_results = []
for text in df['text']:
    result = analyzer.transformer_sentiment(text)
    transformer_results.append(result)

# Add results to dataframe
df['transformer_sentiment'] = [r['sentiment'] for r in transformer_results]
df['transformer_confidence'] = [r['confidence'] for r in transformer_results]

print("\nTransformer Results:")
print(df['transformer_sentiment'].value_counts())

Analyzing sentiment with Transformer model...

Transformer Results:
transformer_sentiment
positive    6
negative    3
neutral     1
Name: count, dtype: int64
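Both methods can now be scored against the original `sentiment` column. The sketch below computes accuracy and a (gold, predicted) confusion tally using only the standard library. `score_predictions` is a hypothetical helper. The project's `ModelEvaluator` may provide something similar, but its API is not shown in this notebook. With only 10 rows, each misclassification shifts accuracy by 10 percentage points, so treat these scores as illustrative.

```python
from collections import Counter

def score_predictions(gold, predicted):
    """Return (accuracy, Counter mapping (gold, predicted) pairs to counts)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if len(gold) == 0:
        raise ValueError("need at least one example")
    correct = sum(g == p for g, p in zip(gold, predicted))
    confusion = Counter(zip(gold, predicted))
    return correct / len(gold), confusion

# Usage with the notebook's DataFrame (not run here):
# for method in ['textblob_sentiment', 'transformer_sentiment']:
#     acc, confusion = score_predictions(df['sentiment'].tolist(), df[method].tolist())
#     print(f"{method}: accuracy = {acc:.2f}")
```

In pandas, `pd.crosstab(df['sentiment'], df['transformer_sentiment'])` produces the same confusion counts as a table.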
