# 📘 Milestone 3 – CS209B Final Project
**Team**: Amar Boparai, Andrew Lobo, Conrad Kaminski, Xiaoxuan Zhang, Xuanthe Nguyen
**Title**: Sentiment Analysis and Bias Detection in Toxic Comments

## 🔍 Data Description
We use the Jigsaw Unintended Bias in Toxicity Classification dataset, which contains roughly 2 million comments annotated with fractional labels (the share of raters who applied each label), including:

- `toxicity`, `severe_toxicity`, `insult`, `threat`, `obscene`, `identity_attack`
- Identity mentions: `male`, `female`, `black`, `white`, etc.
- Text: `comment_text`

We work with a cleaned and filtered version (`df_cleaned`), downsampled to 50% of the original rows to speed up development.
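The exact cleaning steps behind `df_cleaned` are not listed here. The sketch below shows one plausible version, assuming that "cleaning" means dropping rows whose `comment_text` is missing or blank and then drawing a reproducible 50% sample. The function name `make_dev_sample` and the seed are illustrative assumptions, not the project's actual pipeline.

```python
import pandas as pd


def make_dev_sample(df: pd.DataFrame, frac: float = 0.5, seed: int = 42) -> pd.DataFrame:
    """Hypothetical cleaning + sampling step (an assumption, not the team's exact code)."""
    text = df["comment_text"]
    # Keep rows whose comment text is present and not just whitespace
    mask = text.notna() & text.astype(str).str.strip().ne("")
    cleaned = df.loc[mask]
    # Reproducible random subsample so every team member sees the same rows
    return cleaned.sample(frac=frac, random_state=seed).reset_index(drop=True)


if __name__ == "__main__":
    demo = pd.DataFrame({
        "comment_text": ["you are great", None, "   ", "awful take", "ok", "fine"],
        "toxicity": [0.0, 0.2, 0.1, 0.7, 0.0, 0.0],
    })
    print(make_dev_sample(demo).shape)
```

Fixing `random_state` keeps EDA numbers and baseline scores comparable across reruns.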

## 🧾 Summary of Data

```python
print("Shape:", df_cleaned.shape)
# Only the last expression in a notebook cell is displayed, so print dtypes explicitly
print(df_cleaned.dtypes)
df_cleaned.describe(include='all')
```

### 🔢 Class Distributions

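```python
import matplotlib.pyplot as plt

# Labels are rater fractions in [0, 1]; a log-scaled y-axis keeps the rare high scores visible
axes = df_cleaned[['toxicity', 'severe_toxicity', 'insult', 'threat']].hist(bins=50, figsize=(12, 8), log=True)
for ax in axes.ravel():
    ax.set_xlabel("Label score (fraction of raters)")
    ax.set_ylabel("Number of comments")
plt.suptitle("Distribution of Toxicity Labels")
plt.show()
```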

## 📊 Data Analysis
### 🔍 Correlation Heatmap

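```python
import matplotlib.pyplot as plt
import seaborn as sns

# Include `obscene`, since the insights below discuss its correlation with `toxicity`
labels = ['toxicity', 'severe_toxicity', 'obscene', 'insult', 'threat', 'identity_attack']
plt.figure(figsize=(10, 8))
sns.heatmap(df_cleaned[labels].corr(), annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
plt.title("Correlation Between Labels")
plt.show()
```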

### 🧠 Text Length and Toxicity

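```python
# .str.len() yields NaN for missing text, whereas .apply(len) would raise on a NaN float
df_cleaned['text_length'] = df_cleaned['comment_text'].str.len()
ax = sns.scatterplot(data=df_cleaned, x='text_length', y='toxicity', alpha=0.2)
ax.set_xlabel("Comment length (characters)")
ax.set_ylabel("Toxicity score")
plt.title("Toxicity vs. Comment Length")
plt.show()
```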
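With a million-plus overlapping points, the scatterplot alone cannot show how strong the length/toxicity relationship is. A single rank correlation quantifies it. The sketch below computes Spearman correlation as the Pearson correlation of average ranks; we chose Spearman (an assumption, not a project decision) because toxicity is bounded and heavily skewed, so a rank-based measure is less sensitive to outliers than Pearson. The helper name `spearman_corr` is hypothetical.

```python
import pandas as pd


def spearman_corr(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks (ties get average ranks)."""
    # Series.corr drops pairs where either rank is NaN (e.g. missing text length)
    return x.rank().corr(y.rank())


if __name__ == "__main__":
    demo = pd.DataFrame({
        "comment_text": ["hi", "you are an idiot", "thanks for sharing this", "no"],
        "toxicity": [0.0, 0.9, 0.1, 0.2],
    })
    demo["text_length"] = demo["comment_text"].str.len()
    print(spearman_corr(demo["text_length"], demo["toxicity"]))
```

On the real data this would be called as `spearman_corr(df_cleaned['text_length'], df_cleaned['toxicity'])`, and the resulting value would back up (or refute) the "moderately correlated" insight.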

## 💡 Meaningful Insights
- **High correlation** between `toxicity` and `insult`, `obscene`, and `identity_attack` suggests label redundancy; PCA or another form of label compression may be worth trying.
- **Comment length** is moderately correlated with toxicity: longer comments tend to be more toxic.
- **Identity bias**: comments that mention certain identity groups (e.g., `black`, `transgender`) have higher average toxicity scores, even when the identity term is not used in a toxic way.

→ These insights guide how we treat labels, sampling, and debiasing in modeling.
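One way to test the label-redundancy idea is to fit PCA on the label columns and check how much variance the first component captures. The sketch below is an assumption about how we might do this, not an analysis we have run: it standardizes the labels first so PCA reflects their correlation structure (matching the heatmap) rather than just the highest-variance label. The column list `LABELS` and the helper `label_pca_variance` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

LABELS = ["toxicity", "severe_toxicity", "obscene", "insult", "threat", "identity_attack"]


def label_pca_variance(df: pd.DataFrame, labels=LABELS) -> np.ndarray:
    """Cumulative explained-variance ratio of PCA fitted on standardized label columns."""
    X = df[labels].dropna().to_numpy(dtype=float)
    # Standardize so each label contributes on an equal footing (PCA on the correlation matrix)
    X = StandardScaler().fit_transform(X)
    # Keep all components so the cumulative curve ends at 1.0
    pca = PCA().fit(X)
    return np.cumsum(pca.explained_variance_ratio_)
```

If the first one or two entries of `label_pca_variance(df_cleaned)` are already close to 1.0, the labels largely move together and a compressed target may be reasonable. If the curve rises slowly, rarer labels such as `threat` carry independent signal worth keeping.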

## 📈 Clean and Labeled Visualizations
(Reuse the plots above, and make sure each has axis labels, a title, and a legend wherever more than one series is plotted.)

## 📌 Summary of Findings
- Label distributions are heavily skewed toward 0: most comments are rated non-toxic by most annotators.
- Strong inter-label correlation suggests that the multi-label prediction task may benefit from shared features or reduced dimensionality.
- Potential for unintended bias in identity references (e.g., `female`, `black`).
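A simple first check of identity-related bias is to compare the mean toxicity of comments that mention each identity with the overall mean. The sketch below assumes that a comment "mentions" an identity when its identity score is at least 0.5 (the threshold the Jigsaw competition used for subgroups) and treats missing identity annotations as "no mention". `IDENTITY_COLS` and `toxicity_by_identity` are illustrative names. This is purely descriptive: it does not separate toxic from non-toxic uses of the identity term.

```python
import pandas as pd

IDENTITY_COLS = ["male", "female", "black", "white", "transgender"]


def toxicity_by_identity(df: pd.DataFrame, identity_cols=IDENTITY_COLS, threshold=0.5) -> pd.DataFrame:
    """Mean toxicity of comments mentioning each identity, compared with the overall mean."""
    overall = df["toxicity"].mean()
    rows = []
    for col in identity_cols:
        # Identity columns are only annotated for some comments; NaN >= threshold is False
        mentions = df[col] >= threshold
        mean_tox = df.loc[mentions, "toxicity"].mean()  # NaN when no comment mentions it
        rows.append({
            "identity": col,
            "n_mentions": int(mentions.sum()),
            "mean_toxicity": mean_tox,
            "ratio_to_overall": mean_tox / overall,
        })
    # Highest-toxicity identities first; identities with no mentions (NaN) sort last
    return pd.DataFrame(rows).sort_values("mean_toxicity", ascending=False, ignore_index=True)
```

A `ratio_to_overall` well above 1 for a group matches the pattern described in the findings, but only label-aware metrics on model scores can show whether a classifier inherits it.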

## ❓ Clear Research Question
> Can we build a sentiment analysis model that **accurately predicts toxicity** in comments while **mitigating bias** across demographic identities?

Sub-questions:
- How do different model architectures handle overlapping labels?
- Can we use debiasing techniques (e.g., adversarial training) to reduce disparity across identity groups?
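To measure "disparity across identity groups" for any model, we can compute the three per-identity AUCs defined for the Jigsaw competition. Subgroup AUC uses only comments that mention the identity. BPSN (Background Positive, Subgroup Negative) mixes toxic comments that do not mention the identity with non-toxic comments that do. BNSP is the reverse. The sketch below assumes boolean toxic labels, model scores, and a boolean mask for the subgroup. The helper name `bias_aucs` is ours.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bias_aucs(y_true, y_score, in_subgroup) -> dict:
    """Subgroup, BPSN and BNSP AUCs for one identity subgroup."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    sub = np.asarray(in_subgroup, dtype=bool)
    bg = ~sub  # "background": comments that do not mention the identity

    def auc(mask):
        # AUC is undefined unless both classes appear in the selected rows
        if len(np.unique(y_true[mask])) < 2:
            return float("nan")
        return roc_auc_score(y_true[mask], y_score[mask])

    return {
        # Ranking quality within the subgroup only
        "subgroup_auc": auc(sub),
        # Low when non-toxic identity mentions score above toxic background comments
        "bpsn_auc": auc((bg & y_true) | (sub & ~y_true)),
        # Low when toxic identity mentions score below non-toxic background comments
        "bnsp_auc": auc((bg & ~y_true) | (sub & y_true)),
    }
```

A low BPSN AUC is the typical signature of unintended bias: the model assigns high toxicity scores to benign comments simply because they mention an identity. Comparing these numbers before and after a debiasing method is one way to answer the second sub-question.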

## 🧪 Baseline Model Plan

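```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import roc_auc_score

# Binarize the fractional score; the Jigsaw competition treats toxicity >= 0.5 as toxic
y = df_cleaned['toxicity'] >= 0.5

# Stratify so the rare toxic class has the same share in train and test
X_train, X_test, y_train, y_test = train_test_split(
    df_cleaned['comment_text'], y, test_size=0.2, random_state=42, stratify=y
)

# Unigram + bigram TF-IDF capped at 10k features; fit on train only to avoid leakage
vectorizer = TfidfVectorizer(max_features=10000, ngram_range=(1, 2), stop_words='english')
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# The default max_iter=100 often stops lbfgs before convergence on large sparse inputs
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train_vec, y_train)
preds = lr.predict_proba(X_test_vec)[:, 1]
print("Baseline AUC:", roc_auc_score(y_test, preds))
```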
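Overall AUC alone says nothing about the "mitigating bias" half of the research question. To report one number covering both, one option is the Jigsaw competition's final metric, which combines the overall AUC with generalized (power) means over identities of the subgroup, BPSN and BNSP AUCs: score = 0.25 · AUC_overall + Σ_a 0.25 · M_p(m_a), with M_p(m) = (mean of m^p)^(1/p) and p = -5, so the worst-served identities dominate. The sketch below assumes the per-identity AUCs have already been computed; the function names are ours.

```python
import numpy as np


def power_mean(values, p: float = -5.0) -> float:
    """Generalized mean; a strongly negative p pulls the result toward the smallest value."""
    v = np.asarray(values, dtype=float)
    return float(np.mean(v ** p) ** (1.0 / p))


def jigsaw_final_metric(overall_auc, subgroup_aucs, bpsn_aucs, bnsp_aucs,
                        p: float = -5.0, w: float = 0.25) -> float:
    """Weighted sum of the overall AUC and the power means of the three bias-AUC families."""
    bias_terms = [power_mean(m, p) for m in (subgroup_aucs, bpsn_aucs, bnsp_aucs)]
    return w * overall_auc + w * sum(bias_terms)


if __name__ == "__main__":
    # Illustrative numbers only, not results from this project
    print(jigsaw_final_metric(0.9, [0.85, 0.80], [0.88, 0.75], [0.90, 0.86]))
```

Because of the negative exponent, improving the single worst identity usually moves this score more than a small gain in overall AUC, which aligns with the project's debiasing goal.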