# Sentiment Analysis on Twitter Data

This notebook walks through a sentiment-analysis workflow on Twitter data, classifying each tweet as positive, negative, or neutral. It covers the following steps:

1. Data Loading
2. Data Preprocessing
3. Vectorization
4. Model Training
5. Evaluation


In [34]:
# Step 1: Data Loading

import pandas as pd

data_path = '../data/raw/tweets.csv'
tweets_df = pd.read_csv(data_path)
tweets_df.head()

Unnamed: 0,tweet_text,sentiment
0,I love this product! It's amazing.,positive
1,This is the worst experience I've ever had.,negative
2,"Just okay, nothing special.",neutral
3,Absolutely fantastic service!,positive
4,Terrible customer support.,negative
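
The `Unnamed: 0` column in the output above usually appears when a CSV was written with its row index and then read back without declaring it. The sketch below shows one way to absorb that column at load time. It uses an in-memory CSV built from the first three rows shown above; the assumption that `tweets.csv` has this exact layout (an unnamed first column) is mine, not something the notebook confirms.

```python
import io

import pandas as pd

# In-memory stand-in for tweets.csv (assumed layout: unnamed index column first).
csv_text = (
    ",tweet_text,sentiment\n"
    "0,I love this product! It's amazing.,positive\n"
    "1,This is the worst experience I've ever had.,negative\n"
    '2,"Just okay, nothing special.",neutral\n'
)

# index_col=0 turns the unnamed first column into the DataFrame index
# instead of keeping it as a data column called "Unnamed: 0".
demo_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(demo_df.columns.tolist())
# Quick look at the class balance before any modelling.
print(demo_df["sentiment"].value_counts())
```

If the real file matches this layout, passing `index_col=0` to the `pd.read_csv(data_path)` call in Step 1 would have the same effect.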


In [35]:
# Step 2: Data Preprocessing

import sys
import os

# Make the project's local modules in ../src importable from this notebook.
sys.path.append(os.path.abspath('../src'))

from data_preprocessing import clean_text, remove_stopwords

tweets_df['cleaned_text'] = tweets_df['tweet_text'].apply(clean_text)
tweets_df['cleaned_text'] = tweets_df['cleaned_text'].apply(remove_stopwords)
tweets_df.head()

Unnamed: 0,tweet_text,sentiment,cleaned_text
0,I love this product! It's amazing.,positive,love product amaz
1,This is the worst experience I've ever had.,negative,worst experi ive ever
2,"Just okay, nothing special.",neutral,okay noth special
3,Absolutely fantastic service!,positive,absolut fantast servic
4,Terrible customer support.,negative,terribl custom support
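
The source of `data_preprocessing` is not shown in this notebook, so the following is only a minimal sketch of what `clean_text` and `remove_stopwords` might do. The function names with the `_sketch` suffix, the tiny stopword list, and the regular expressions are all hypothetical. The stemmed tokens in the output above (`amaz`, `experi`) suggest the real module also applies a stemmer, which this sketch deliberately omits.

```python
import re

# Hypothetical, deliberately tiny stopword list; the real module likely uses a fuller one.
STOPWORDS = {"i", "this", "is", "the", "it", "its", "had", "just", "a", "an", "and", "to", "of"}


def clean_text_sketch(text: str) -> str:
    """Lowercase, drop URLs and @mentions, strip non-letters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # remove URLs and @mentions
    text = text.replace("'", "")                    # "i've" -> "ive", as in the output above
    text = re.sub(r"[^a-z\s]", " ", text)           # keep ASCII letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()


def remove_stopwords_sketch(text: str) -> str:
    """Drop tokens that appear in the (assumed) stopword list."""
    return " ".join(tok for tok in text.split() if tok not in STOPWORDS)


cleaned = remove_stopwords_sketch(clean_text_sketch("I love this product! It's amazing."))
print(cleaned)  # love product amazing
```

Without stemming the result is `love product amazing` rather than the `love product amaz` shown above; adding a stemming pass after stopword removal would close that gap.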


In [36]:
# Step 3: Vectorization

from vectorization import tfidf_vectorize

X, vectorizer = tfidf_vectorize(tweets_df['cleaned_text'])
y = tweets_df['sentiment']  # Target labels: positive, negative, or neutral
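
The `vectorization` module is not shown either. A plausible reading of `tfidf_vectorize` is a thin wrapper around scikit-learn's `TfidfVectorizer` that returns both the sparse matrix and the fitted vectorizer. The sketch below is written under that assumption; the `_sketch` name and the `max_features` parameter are hypothetical. The demo documents are the first three `cleaned_text` values from Step 2.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def tfidf_vectorize_sketch(texts, max_features=5000):
    """Fit a TF-IDF vectorizer on `texts`; return (sparse matrix, fitted vectorizer)."""
    vectorizer = TfidfVectorizer(max_features=max_features)
    X = vectorizer.fit_transform(texts)
    return X, vectorizer


docs = ["love product amaz", "worst experi ive ever", "okay noth special"]
X_demo, vec_demo = tfidf_vectorize_sketch(docs)

print(X_demo.shape)               # (documents, vocabulary size)
print(sorted(vec_demo.vocabulary_))

# New text must go through transform(), not fit_transform(), so that it is
# mapped onto the vocabulary learned above.
X_new = vec_demo.transform(["love servic"])
print(X_new.shape)
```

Returning the fitted vectorizer matters because unseen tweets have to be encoded with the same vocabulary and IDF weights. Fitting on the full dataset before any train/validation split also lets validation documents influence those weights, which is worth keeping in mind when reading the scores in later steps.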

In [37]:
# Step 4: Model Training

from model_training import train_logistic_regression

model, accuracy = train_logistic_regression(X, y)
print(f'Validation Accuracy: {accuracy:.2f}')

Validation Accuracy: 0.50
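
The notebook does not show how `train_logistic_regression` produces its validation accuracy. The sketch below is one common arrangement, assuming a stratified hold-out split inside the helper; the `_sketch` name, `test_size`, and `random_state` are hypothetical. Because the tweet data is not available here, the demo runs on a synthetic stand-in generated with `make_classification`, so its numbers say nothing about the tweets.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_logistic_regression_sketch(X, y, test_size=0.2, random_state=42):
    """Hold out a stratified validation split, fit on the rest, return (model, accuracy)."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    # Accuracy is measured only on rows the model did not see during fitting.
    accuracy = accuracy_score(y_val, model.predict(X_val))
    return model, accuracy


# Synthetic three-class stand-in for the TF-IDF features (not the tweet data).
X_syn, y_syn = make_classification(
    n_samples=200, n_features=20, n_informative=5, n_classes=3, random_state=0
)
demo_model, demo_acc = train_logistic_regression_sketch(X_syn, y_syn)
print(f"Validation Accuracy: {demo_acc:.2f}")
```

With only 10 labelled rows (the total support in the Step 5 report), any hold-out split contains just a handful of tweets, so validation accuracy moves in large steps and a value such as 0.50 can hinge on a single prediction.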


In [38]:
# Step 5: Evaluation

from sklearn.metrics import classification_report

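# Note: X includes the rows the model was trained on, so this report is
# optimistic compared with the validation accuracy printed in Step 4.
y_pred = model.predict(X)
print(classification_report(y, y_pred))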

              precision    recall  f1-score   support

    negative       0.80      1.00      0.89         4
     neutral       1.00      1.00      1.00         2
    positive       1.00      0.75      0.86         4

    accuracy                           0.90        10
   macro avg       0.93      0.92      0.92        10
weighted avg       0.92      0.90      0.90        10
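
The report above scores the model on all 10 rows, including those it was fitted on, which explains the gap between 0.90 here and the 0.50 validation accuracy in Step 4. One alternative for small datasets, not used in this notebook, is cross-validated prediction: every row is predicted by a model that never saw it during fitting. The sketch below illustrates the idea on synthetic data from `make_classification`; the fold count and model settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic three-class stand-in (not the tweet data).
X_cv, y_cv = make_classification(
    n_samples=150, n_features=20, n_informative=5, n_classes=3, random_state=0
)

# Each fold is held out once; its predictions come from a model fitted on the other folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_oos = cross_val_predict(LogisticRegression(max_iter=1000), X_cv, y_cv, cv=cv)

# Same report as above, but built entirely from out-of-sample predictions.
report = classification_report(y_cv, y_oos)
print(report)
```

On the tweet data the number of folds would have to stay very small, since the `neutral` class has only two examples and a stratified split cannot place it in more folds than it has rows.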

