# 📊 Employee Engagement Classification using AutoGluon

This notebook demonstrates how to use [AutoGluon](https://auto.gluon.ai) to automatically build and evaluate machine learning models for classifying employee engagement levels. The goal is to predict whether an employee is likely to show **High** or **Medium** engagement based on their performance rating, overtime, salary, and leave behavior.

---

### 📁 Dataset Overview

The dataset contains anonymized employee records with performance metrics, absenteeism, salary, and organizational information. The target variable, `EngagementLevel`, is a derived field: it is computed from `JobRate`, `OvertimeHours`, `SickLeaves`, and `UnpaidLeaves` using fixed thresholds, and the `Low` category is merged into `Medium`, leaving two classes.


### 📚 Data Dictionary

| Feature             | Type        | Description                  |
|---------------------|-------------|------------------------------|
| Gender              | Categorical | Employee's gender            |
| YearsWorked         | Numeric     | Number of years worked       |
| Department          | Categorical | Department name              |
| Country             | Categorical | Country of work              |
| MonthlySalary       | Numeric     | Monthly salary               |
| AnnualSalary        | Numeric     | Annual salary                |
| JobRate             | Numeric     | Job performance rating (1–5) |
| SickLeaves          | Numeric     | Number of sick leave days    |
| UnpaidLeaves        | Numeric     | Number of unpaid leave days  |
| OvertimeHours       | Numeric     | Number of overtime hours     |
| Location_ID         | Numeric     | Encoded office location      |
| Department_ID       | Numeric     | Encoded department           |
| **EngagementLevel** | **Target**  | Categorical (High, Medium)   |

In [1]:
# Load libraries
import pandas as pd
from autogluon.tabular import TabularPredictor

# Load dataset
df = pd.read_csv(r"C:\Users\19024\DataScience\Employees_clean.csv")

In [2]:
# Derive the target variable from rating, overtime, and leave columns
def get_engagement_level(row):
    # Low: poor rating, little overtime, frequent sick and unpaid leave
    if (row['JobRate'] <= 2) and (row['OvertimeHours'] < 10) and (row['SickLeaves'] >= 5) and (row['UnpaidLeaves'] >= 5):
        return 'Low'
    # High: strong rating, heavy overtime, almost no leave
    elif (row['JobRate'] >= 4) and (row['OvertimeHours'] > 50) and (row['SickLeaves'] <= 1) and (row['UnpaidLeaves'] <= 1):
        return 'High'
    # Everything else
    else:
        return 'Medium'

df['EngagementLevel'] = df.apply(get_engagement_level, axis=1)
df['EngagementLevel'] = df['EngagementLevel'].replace({'Low': 'Medium'})  # Combine Low with Medium

# Drop identifier, name, and date columns that are not used as predictors
df_model = df.drop(['Performance_ID', 'Employee_ID', 'FirstName', 'LastName', 'StartDate'], axis=1)
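
The row-wise rule in this cell can also be written with boolean masks, which avoids `DataFrame.apply` on larger tables. The following is a minimal sketch, assuming the same four columns; the helper name `derive_engagement_level` and the toy rows are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd


def derive_engagement_level(df: pd.DataFrame) -> pd.Series:
    """Vectorized High/Medium rule (hypothetical helper, not from the notebook)."""
    high = (
        (df["JobRate"] >= 4)
        & (df["OvertimeHours"] > 50)
        & (df["SickLeaves"] <= 1)
        & (df["UnpaidLeaves"] <= 1)
    )
    # The notebook merges 'Low' into 'Medium', so every row that is
    # not 'High' ends up as 'Medium'.
    labels = np.where(high, "High", "Medium")
    return pd.Series(labels, index=df.index, name="EngagementLevel")


# Toy rows for illustration only (not taken from the dataset).
toy = pd.DataFrame({
    "JobRate": [5, 1, 3, 4],
    "OvertimeHours": [60, 5, 20, 50],
    "SickLeaves": [0, 6, 2, 1],
    "UnpaidLeaves": [1, 7, 0, 0],
})
toy["EngagementLevel"] = derive_engagement_level(toy)
print(toy["EngagementLevel"].value_counts())
```

Printing `value_counts()` right after deriving the label is a quick way to see how the two classes are distributed before training.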

In [3]:
# Train AutoGluon classification model
predictor = TabularPredictor(label='EngagementLevel').fit(df_model)

# Leaderboard
predictor.leaderboard()

No path specified. Models will be saved in: "AutogluonModels\ag-20250416_031523"
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.2
Python Version:     3.11.9
Operating System:   Windows
Platform Machine:   AMD64
Platform Version:   10.0.26100
CPU Count:          8
Memory Avail:       3.03 GB / 11.78 GB (25.7%)
Disk Space Avail:   41.58 GB / 237.36 GB (17.5%)
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets=

|   | model               | score_val | eval_metric | pred_time_val | fit_time  | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---------------------|-----------|-------------|---------------|-----------|------------------------|-------------------|-------------|-----------|-----------|
| 0 | LightGBM            | 1.0       | accuracy    | 0.0           | 0.352504  | 0.0                    | 0.352504          | 1           | True      | 4         |
| 1 | LightGBMLarge       | 1.0       | accuracy    | 0.004017      | 0.579958  | 0.004017               | 0.579958          | 1           | True      | 12        |
| 2 | LightGBMXT          | 1.0       | accuracy    | 0.007724      | 1.043844  | 0.007724               | 1.043844          | 1           | True      | 3         |
| 3 | XGBoost             | 1.0       | accuracy    | 0.007818      | 0.430008  | 0.007818               | 0.430008          | 1           | True      | 11        |
| 4 | WeightedEnsemble_L2 | 1.0       | accuracy    | 0.007818      | 0.550418  | 0.0                    | 0.12041           | 2           | True      | 13        |
| 5 | CatBoost            | 1.0       | accuracy    | 0.0158        | 11.61622  | 0.0158                 | 11.61622          | 1           | True      | 7         |
| 6 | NeuralNetFastAI     | 1.0       | accuracy    | 0.024428      | 2.694687  | 0.024428               | 2.694687          | 1           | True      | 10        |
| 7 | RandomForestEntr    | 1.0       | accuracy    | 0.06529       | 1.464547  | 0.06529                | 1.464547          | 1           | True      | 6         |
| 8 | RandomForestGini    | 0.992754  | accuracy    | 0.212584      | 1.298331  | 0.212584               | 1.298331          | 1           | True      | 5         |
| 9 | ExtraTreesGini      | 0.978261  | accuracy    | 0.056539      | 0.832435  | 0.056539               | 0.832435          | 1           | True      | 8         |
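
The `score_val` column comes from a validation split that AutoGluon carves out of the training data internally. For an independent estimate, one option is to set aside a hold-out set before calling `fit` and score the predictor on it afterwards. The sketch below assumes that approach; `split_holdout` and `fit_and_score` are hypothetical helpers, and the toy frame stands in for `df_model`. `TabularPredictor.evaluate` and passing data to `leaderboard` are existing AutoGluon calls.

```python
import pandas as pd


def split_holdout(df: pd.DataFrame, test_frac: float = 0.2, seed: int = 0):
    """Randomly split df into (train, test). Hypothetical helper, not stratified."""
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(test.index)  # assumes the index has no duplicates
    return train, test


def fit_and_score(train: pd.DataFrame, test: pd.DataFrame,
                  label: str = "EngagementLevel"):
    """Fit on the training rows only, then score on the unseen test rows."""
    # Imported lazily so split_holdout works without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=label).fit(train)
    return predictor.evaluate(test), predictor.leaderboard(test)


# Toy frame for illustration only; in the notebook this would be df_model.
toy = pd.DataFrame({"x": range(10), "EngagementLevel": ["High", "Medium"] * 5})
train, test = split_holdout(toy, test_frac=0.2, seed=0)
print(len(train), len(test))
```

In the notebook, this would look like `train, test = split_holdout(df_model)` followed by `fit_and_score(train, test)`. With a rare `High` class, a stratified split may be a better choice than the plain random sample used here.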


### ✅ Conclusion Summary

AutoGluon provided a fully automated training process that evaluated multiple models and selected the best-performing one. This approach is well suited to rapid prototyping, especially when model interpretability is less critical than performance.

### 📊 Model Insights & Evaluation Summary

AutoGluon tested multiple classification models to predict employee `EngagementLevel` (High vs Medium). Eight of the ten models on the leaderboard achieved **perfect validation accuracy (1.00)**, including:

- `LightGBMXT`
- `CatBoost`
- `LightGBM`
- `WeightedEnsemble_L2`
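
Accuracy is easier to judge next to a majority-class baseline, that is, the score obtained by always predicting the most frequent label. The notebook does not report the class distribution, so the labels below are made up for illustration, and `majority_baseline_accuracy` is a hypothetical helper.

```python
import pandas as pd


def majority_baseline_accuracy(labels: pd.Series) -> float:
    """Accuracy of always predicting the most frequent class."""
    # value_counts sorts by frequency, so the first entry is the majority share.
    return float(labels.value_counts(normalize=True).iloc[0])


# Made-up labels for illustration; in the notebook: df_model["EngagementLevel"].
labels = pd.Series(["Medium"] * 9 + ["High"])
print(majority_baseline_accuracy(labels))
```

If the baseline is already close to 1.0, accuracy alone says little, and a metric such as balanced accuracy or F1 may be more informative.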

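The models separate the two engagement levels cleanly, but this result should be read with caution. `EngagementLevel` is computed directly from `JobRate`, `OvertimeHours`, `SickLeaves`, and `UnpaidLeaves`, and all four columns remain in the training data. The models are therefore recovering the labeling rule rather than finding an independent signal, and near-perfect accuracy is the expected outcome.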
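
One way to check how much signal the remaining features carry is to retrain without the four columns that the labeling rule reads (`JobRate`, `OvertimeHours`, `SickLeaves`, `UnpaidLeaves`). The following is a minimal sketch of that preprocessing step; `RULE_COLUMNS` and `drop_rule_columns` are hypothetical names, and the toy frame is illustrative.

```python
import pandas as pd

# Columns read by the notebook's get_engagement_level rule.
RULE_COLUMNS = ["JobRate", "OvertimeHours", "SickLeaves", "UnpaidLeaves"]


def drop_rule_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without the columns used to derive the label."""
    present = [col for col in RULE_COLUMNS if col in df.columns]
    return df.drop(columns=present)


# Toy frame for illustration only; in the notebook this would be df_model.
toy = pd.DataFrame({
    "JobRate": [5, 2],
    "OvertimeHours": [60, 5],
    "SickLeaves": [0, 6],
    "UnpaidLeaves": [1, 7],
    "MonthlySalary": [4000, 3000],
    "EngagementLevel": ["High", "Medium"],
})
reduced = drop_rule_columns(toy)
print(list(reduced.columns))
```

Fitting `TabularPredictor(label='EngagementLevel')` on the reduced frame would then show whether salary, tenure, department, and the other features predict engagement on their own.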



