# Employee Sentiment & Engagement Analysis

This notebook performs the full workflow: data loading, sentiment labeling (VADER), EDA, monthly scoring, ranking, flight-risk detection, and a simple linear regression model.

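If `test.csv` is not present, place it in the repository root (the notebook's working directory) before running, or run `runner.py`. If your file uses different column names, update the `COLUMN_MAP` mapping in `runner.py`, or the `EMP_COL`, `MSG_COL`, and `DATE_COL` variables in the sentiment-labeling cell below.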
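A minimal sketch of how such a mapping can be applied with pandas, assuming `COLUMN_MAP` maps your file's column names to the names this notebook expects (`employee`, `message`, `date`). The source names `from`, `body`, and `sent_at`, the helper `apply_column_map`, and the sample row are hypothetical; the actual contents of `COLUMN_MAP` in `runner.py` may differ.

```python
import pandas as pd

# Hypothetical mapping: your CSV's column names -> the names this notebook expects.
COLUMN_MAP = {'from': 'employee', 'body': 'message', 'sent_at': 'date'}

def apply_column_map(frame: pd.DataFrame, column_map: dict) -> pd.DataFrame:
    """Rename columns per column_map, then check the expected names are present."""
    renamed = frame.rename(columns=column_map)  # keys absent from frame are ignored
    missing = set(column_map.values()) - set(renamed.columns)
    if missing:
        raise KeyError(f'Expected columns not found after renaming: {sorted(missing)}')
    return renamed

# Made-up one-row frame, for illustration only (kept separate from the real `df`).
raw = pd.DataFrame({'from': ['a@example.com'],
                    'body': ['Thanks, great work!'],
                    'sent_at': ['2011-05-10']})
demo = apply_column_map(raw, COLUMN_MAP)
print(list(demo.columns))  # ['employee', 'message', 'date']
```

Failing loudly on missing columns is a design choice here: it surfaces a wrong mapping immediately instead of as a `KeyError` several cells later.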

In [None]:
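# Load the dataset from test.csv in the current working directory.
# Column names are not auto-detected: set EMP_COL/MSG_COL/DATE_COL in the next cell.
import pandas as pd
from pathlib import Path
from IPython.display import display

ROOT = Path('.')
CSV = ROOT / 'test.csv'
if not CSV.exists():
    # Stop here: every later cell depends on `df`.
    raise FileNotFoundError('test.csv not found in the current folder. Place it here and rerun.')
df = pd.read_csv(CSV)
print('Loaded', len(df), 'rows')
display(df.head())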

## Sentiment Labeling (VADER)
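We use NLTK's VADER sentiment analyzer to label each message as Positive, Neutral, or Negative from its compound score: a score `>= 0.05` is Positive, a score `<= -0.05` is Negative, and anything in between is Neutral. VADER is lexicon- and rule-based and runs locally, so labeling is fast, reproducible, and needs no API calls.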
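For reference, the rule can be written as a function of the compound score $c$. In VADER's reference implementation (and, as far as we know, NLTK's port), $c$ comes from roughly summing the valence scores of the message's tokens into $s$ (after VADER's heuristic adjustments for negation, intensifiers, and punctuation) and normalizing with $\alpha = 15$, which maps it into $(-1, 1)$:

$$
c = \frac{s}{\sqrt{s^2 + \alpha}}, \qquad
\text{label}(c) =
\begin{cases}
\text{Positive} & c \ge 0.05 \\
\text{Negative} & c \le -0.05 \\
\text{Neutral} & \text{otherwise}
\end{cases}
$$

For example, a summed valence of $s = 2$ gives $c = 2/\sqrt{19} \approx 0.46$, which is labeled Positive.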

In [None]:
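import nltk
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon', quiet=True)  # skipped if already up to date
sid = SentimentIntensityAnalyzer()

# Replace these names if your CSV uses different columns.
EMP_COL = 'employee'
MSG_COL = 'message'
DATE_COL = 'date'

# Fail early with a clear message if an expected column is missing.
missing = {EMP_COL, MSG_COL, DATE_COL} - set(df.columns)
if missing:
    raise KeyError(f'Missing columns: {sorted(missing)}; available: {list(df.columns)}')

def label_vader(text):
    """Map VADER's compound score to Positive/Neutral/Negative (+/-0.05 thresholds)."""
    if pd.isna(text):
        text = ''  # missing message -> empty text, which scores Neutral
    c = sid.polarity_scores(str(text))['compound']
    if c >= 0.05:
        return 'Positive'
    elif c <= -0.05:
        return 'Negative'
    return 'Neutral'

df['sentiment'] = df[MSG_COL].apply(label_vader)
display(df['sentiment'].value_counts())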
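If later steps such as monthly scoring need the numeric score rather than only the label, one option is to keep VADER's compound value in its own column and derive the label from it in a vectorized way. This sketch assumes a hypothetical `compound` column holding `sid.polarity_scores(text)['compound']`; the scores below are made up so the snippet runs without NLTK. It applies the same ±0.05 thresholds as `label_vader`.

```python
import numpy as np
import pandas as pd

def label_from_compound(compound: pd.Series, threshold: float = 0.05) -> pd.Series:
    """Vectorized thresholds: >= threshold Positive, <= -threshold Negative, else Neutral."""
    labels = np.select(
        [compound >= threshold, compound <= -threshold],
        ['Positive', 'Negative'],
        default='Neutral',
    )
    return pd.Series(labels, index=compound.index)

# Made-up compound scores standing in for sid.polarity_scores(text)['compound'].
scores = pd.DataFrame({'compound': [0.62, 0.05, 0.0, -0.04, -0.05, -0.71]})
scores['sentiment'] = label_from_compound(scores['compound'])
print(scores['sentiment'].tolist())
# ['Positive', 'Positive', 'Neutral', 'Neutral', 'Negative', 'Negative']
```

Note that the boundary values `0.05` and `-0.05` are inclusive, matching the `>=` and `<=` comparisons in `label_vader`.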

## EDA: sentiment distribution, time trends, and message lengths
Visualizations will be saved to `visualizations/` when run via `runner.py`.
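Before plotting, the three views named in the heading can be computed as plain tables. A minimal sketch, assuming the `employee`, `message`, `date`, and `sentiment` columns used above; the rows are invented, the frame is called `demo` so it does not overwrite the real `df`, and grouping by calendar month and measuring length in characters and whitespace-separated words are our own choices.

```python
import pandas as pd

# Invented rows with the notebook's expected columns, for illustration only.
demo = pd.DataFrame({
    'employee': ['a', 'a', 'b', 'b'],
    'message': ['Great job team!', 'This is late again.', 'ok', 'Meeting at 3.'],
    'date': ['2011-01-05', '2011-01-20', '2011-02-02', '2011-02-15'],
    'sentiment': ['Positive', 'Negative', 'Neutral', 'Neutral'],
})
demo['date'] = pd.to_datetime(demo['date'])

# 1) Overall sentiment distribution.
dist = demo['sentiment'].value_counts()

# 2) Time trend: number of messages per sentiment label per calendar month.
demo['month'] = demo['date'].dt.to_period('M')
trend = demo.groupby(['month', 'sentiment']).size().unstack(fill_value=0)

# 3) Mean message length (characters and whitespace-separated words) by label.
demo['n_chars'] = demo['message'].str.len()
demo['n_words'] = demo['message'].str.split().str.len()
lengths = demo.groupby('sentiment')[['n_chars', 'n_words']].mean()

print(dist, trend, lengths, sep='\n\n')
```

To run this on the real data, drop the invented rows and use `df` in place of `demo`; each table can then be passed to a plotting call.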