# Technical / Data Engineering Sentiment Analytics Dashboard

This notebook is aimed at **data engineers / ML engineers**.

It uses the same fact table with sentiment to analyse:
- Sentiment distribution
- Relationship between sentiment, review_score, freight_value, and delivery performance
- Model error by sentiment bucket (optional, if you have prediction columns)


In [ ]:
import pandas as pd
import plotly.express as px
from pathlib import Path

# 👇 Adjust this path if needed
DATA_PATH = Path('../dashboard/fact_orders_with_sentiment.csv')

df = pd.read_csv(DATA_PATH, parse_dates=['order_purchase_timestamp'])
df['order_date'] = df['order_purchase_timestamp'].dt.date
df['year_month'] = df['order_purchase_timestamp'].dt.to_period('M').astype(str)

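# Derive sentiment from review_score when no sentiment_label column exists
if 'sentiment_label' not in df.columns and 'review_score' in df.columns:
    def score_to_sentiment(s):
        # Missing scores stay missing instead of falling through to 'negative'
        if pd.isna(s):
            return None
        if s >= 4:
            return 'positive'
        elif s >= 3:
            # Also covers fractional (e.g. averaged) scores between 3 and 4
            return 'neutral'
        else:
            return 'negative'
    df['sentiment_label'] = df['review_score'].apply(score_to_sentiment)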

df.head()

In [ ]:
# === Sentiment distribution ===
sentiment_counts = df['sentiment_label'].value_counts().reset_index()
sentiment_counts.columns = ['sentiment_label', 'count']
fig_sent_dist = px.bar(sentiment_counts, x='sentiment_label', y='count',
                        title='Sentiment Distribution',
                        labels={'sentiment_label': 'Sentiment', 'count': 'Number of Reviews'})
fig_sent_dist.show()

# === Review score vs sentiment ===
if 'review_score' in df.columns:
    fig_box = px.box(df, x='sentiment_label', y='review_score',
                     title='Review Score by Sentiment')
    fig_box.show()
else:
    print('review_score not found; skipping boxplot.')

In [ ]:
# === Freight cost by sentiment ===
fig_freight = px.box(df, x='sentiment_label', y='freight_value',
                     title='Freight Cost by Sentiment',
                     labels={'sentiment_label': 'Sentiment', 'freight_value': 'Freight Value'})
fig_freight.show()

# === Revenue by sentiment ===
sent_rev = df.groupby('sentiment_label')['payment_value'].agg(['mean', 'sum', 'count']).reset_index()
fig_sent_rev = px.bar(sent_rev, x='sentiment_label', y='mean',
                      title='Average Revenue per Order by Sentiment',
                      labels={'sentiment_label': 'Sentiment', 'mean': 'Avg Revenue'})
fig_sent_rev.show()
sent_rev

In [ ]:
# === Optional: Model error by sentiment (if you have predictions) ===
if {'actual_freight', 'predicted_freight'}.issubset(df.columns):
    df['residual'] = df['actual_freight'] - df['predicted_freight']
    fig_resid = px.box(df, x='sentiment_label', y='residual',
                       title='Model Residuals by Sentiment',
                       labels={'sentiment_label': 'Sentiment', 'residual': 'Error (Actual - Predicted)'})
    fig_resid.show()
else:
    print('No actual_freight / predicted_freight columns – skipping model error analysis.')
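
The box plot above shows the spread of residuals, but a small table of per-group bias and error can be easier to compare. The following is a minimal sketch, assuming the same optional `actual_freight` and `predicted_freight` columns; the helper name `residual_summary` and the toy data are hypothetical and only for illustration.

```python
import pandas as pd

def residual_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-sentiment bias (mean residual) and mean absolute error."""
    resid = frame['actual_freight'] - frame['predicted_freight']
    return (
        frame.assign(residual=resid, abs_error=resid.abs())
             .groupby('sentiment_label')
             .agg(n=('residual', 'count'),
                  bias=('residual', 'mean'),
                  mae=('abs_error', 'mean'))
    )

# Toy data for illustration only (not taken from the fact table)
toy = pd.DataFrame({
    'sentiment_label': ['positive', 'positive', 'negative', 'negative'],
    'actual_freight': [10.0, 12.0, 20.0, 30.0],
    'predicted_freight': [11.0, 11.0, 15.0, 25.0],
})
summary = residual_summary(toy)
print(summary)
```

A `bias` well above zero for one group means the model tends to under-predict freight there (actual exceeds predicted), while `mae` captures the typical error size regardless of sign.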

In [ ]:
# === Correlation heatmap for numeric features ===
numeric_cols = df.select_dtypes(include='number').columns
corr = df[numeric_cols].corr()
fig_corr = px.imshow(corr, text_auto=True, aspect='auto',
                     title='Correlation Heatmap (Numeric Features)')
fig_corr.show()
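
When `sentiment_label` holds strings, `select_dtypes(include='number')` drops it, so sentiment itself does not appear in the heatmap. One possible workaround, sketched below, is to encode the label as an ordinal value and compute rank (Spearman) correlations against the numeric columns. The `SENTIMENT_ORDER` mapping, the `sentiment_ordinal` column, and the helper name are assumptions made for this example, not part of the original notebook.

```python
import pandas as pd

# Assumed ordinal encoding; adjust if the labels differ
SENTIMENT_ORDER = {'negative': -1, 'neutral': 0, 'positive': 1}

def sentiment_correlations(frame: pd.DataFrame,
                           label_col: str = 'sentiment_label') -> pd.Series:
    """Spearman correlation of each numeric column with ordinal sentiment."""
    numeric = frame.select_dtypes(include='number').copy()
    # Labels missing from the mapping become NaN and are ignored pairwise
    numeric['sentiment_ordinal'] = frame[label_col].map(SENTIMENT_ORDER)
    corr = numeric.corr(method='spearman')['sentiment_ordinal']
    return corr.drop('sentiment_ordinal').sort_values()

# Toy data for illustration only
toy = pd.DataFrame({
    'sentiment_label': ['negative', 'neutral', 'positive', 'positive'],
    'review_score': [1, 3, 5, 4],
    'freight_value': [30.0, 20.0, 10.0, 12.0],
})
sent_corr = sentiment_correlations(toy)
print(sent_corr)
```

Spearman is used here because the encoding is only ordinal: the spacing between -1, 0, and 1 is arbitrary, and a rank correlation does not depend on it.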

## Technical Interpretation Notes

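- Compare freight cost across sentiment groups to check whether **logistics issues** are associated with negative reviews. The box plot shows association, not causation.
- Compare revenue across sentiment groups to see whether **unhappy customers** are also **high value**.
- If prediction columns are available, use the residual plot to inspect whether the model is **biased for certain sentiment groups**: a median residual far from zero in one group indicates systematic under- or over-prediction there.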
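
The first two notes can be backed by a single per-sentiment table. The sketch below assumes the `freight_value` and `payment_value` columns used earlier in the notebook; the helper name `sentiment_value_summary`, the `revenue_share` column, and the toy data are hypothetical.

```python
import pandas as pd

def sentiment_value_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Median freight, mean revenue, and revenue share per sentiment group."""
    grouped = frame.groupby('sentiment_label').agg(
        orders=('payment_value', 'count'),
        median_freight=('freight_value', 'median'),
        mean_revenue=('payment_value', 'mean'),
        total_revenue=('payment_value', 'sum'),
    )
    # Share of all revenue that comes from each sentiment group
    grouped['revenue_share'] = (
        grouped['total_revenue'] / grouped['total_revenue'].sum()
    )
    return grouped

# Toy data for illustration only
toy = pd.DataFrame({
    'sentiment_label': ['negative', 'negative', 'positive', 'positive'],
    'freight_value': [30.0, 20.0, 10.0, 12.0],
    'payment_value': [200.0, 100.0, 50.0, 50.0],
})
value_summary = sentiment_value_summary(toy)
print(value_summary)
```

A high `revenue_share` for the negative group alongside a high `median_freight` would suggest that unhappy customers are both valuable and paying more for shipping, which is worth a closer look before drawing conclusions.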
