# Celonis ML Workbench: SLA Breach Prediction
**Objective:** Predict the likelihood of an HR Case breaching its SLA using historical ServiceNow and Oracle HCM data.

**Context:** Mund Consulting ID02400 (EVU/EIU HR Transformation)

---

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
import os

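# Placeholder for the Celonis EMS connection (this demo uses no credentials).
# In a real Celonis ML Workbench, you would load the data with something like
# the following (exact calls depend on the installed PyCelonis version):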
# from pycelonis import get_celonis
# c = get_celonis()
# df = c.datamodel.get_data_models().find("HR Process Mining").get_data_frame("HRSD_ORACLE_HARMONIZED_LOG")

## 1. Load Data
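For this demo, we read the local harmonized CSV file. If the file is missing, we fall back to an empty DataFrame that only shows the expected column structure.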

In [None]:
data_path = '../../data/harmonized_eventlog.csv'
if os.path.exists(data_path):
    df = pd.read_csv(data_path)
    print(f"Loaded {len(df)} records.")
else:
    print("Harmonized log not found. Using dummy data for structure demo.")
    # 'Location' is included because the feature list in section 2 selects it.
    df = pd.DataFrame(columns=['Case ID', 'Priority', 'Department', 'Entity_Type', 'Job_Role', 'Location', 'SLA_Status'])
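
Because the later cells select columns by name, a quick schema check after loading can surface a missing column before feature engineering fails with a `KeyError`. The following is a minimal sketch; `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names introduced for illustration, and the column list is assumed from the feature and target logic used in this notebook.

```python
import pandas as pd

# Columns the later cells read (assumed from the feature list and target logic).
REQUIRED_COLUMNS = ['Case ID', 'Priority', 'Department', 'Entity_Type',
                    'Job_Role', 'Location', 'SLA_Status']

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `frame` (hypothetical helper)."""
    return [col for col in required if col not in frame.columns]

# Toy frame standing in for the harmonized log; it deliberately omits 'Location'.
toy = pd.DataFrame(columns=['Case ID', 'Priority', 'Department',
                            'Entity_Type', 'Job_Role', 'SLA_Status'])
missing = missing_columns(toy)
print(f"Missing columns: {missing}")
```

Running the same check on `df` right after loading would show whether the CSV carries every column the model needs.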

## 2. Feature Engineering
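We collapse the event log to one row per case, derive a binary breach target from `SLA_Status`, one-hot encode the categorical attributes, and hold out 20% of the cases for testing.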

In [None]:
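# Keep one row per Case ID. keep='last' selects the last row of each case, which is
# the final state only if the log is sorted chronologically within each case.
# Caution: attributes that are only known at case close can leak the outcome.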
df_cases = df.drop_duplicates(subset='Case ID', keep='last').copy()

# Create Target Variable: 1 if SLA Breached, 0 otherwise
df_cases['target'] = (df_cases['SLA_Status'] == 'SLA Breached').astype(int)

# Encode Categorical Features
features = ['Priority', 'Department', 'Entity_Type', 'Job_Role', 'Location']
X = pd.get_dummies(df_cases[features])
y = df_cases['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
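
SLA breaches are often a minority class, so a stratified split may be worth considering to keep the breach rate similar in the training and test sets. The sketch below walks through the same steps on a hypothetical event log; the case IDs and the `'SLA Met'` and `'In Progress'` status values are invented for illustration, and `stratify=y` is a suggested variation rather than what the cell above does.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical event log: some cases have several rows; all values are invented.
events = pd.DataFrame({
    'Case ID':    ['C1', 'C1', 'C2', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8'],
    'Priority':   ['High', 'High', 'Low', 'Low', 'High', 'Low', 'Low', 'High', 'Low', 'High'],
    'SLA_Status': ['In Progress', 'SLA Breached', 'In Progress', 'SLA Met',
                   'SLA Breached', 'SLA Met', 'SLA Met', 'SLA Breached',
                   'SLA Met', 'SLA Breached'],
})

# One row per case: the last recorded row of each Case ID.
cases = events.drop_duplicates(subset='Case ID', keep='last').copy()

# Binary target: 1 if the case ended in a breach, 0 otherwise.
cases['target'] = (cases['SLA_Status'] == 'SLA Breached').astype(int)

# One-hot encode the categorical feature.
X_demo = pd.get_dummies(cases[['Priority']])
y_demo = cases['target']

# stratify=y_demo keeps the class proportions equal across both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)
print(y_tr.value_counts().to_dict(), y_te.value_counts().to_dict())
```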

## 3. Model Training
We fit a Random Forest classifier on the one-hot encoded case attributes and report precision, recall, and F1 on the held-out test set.

In [None]:
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

y_pred = rf.predict(X_test)
print(classification_report(y_test, y_pred))
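
The stated objective is a breach likelihood, while `predict` returns only hard 0/1 labels. As a minimal sketch, `predict_proba` can supply a per-case probability and `confusion_matrix` a breakdown of the errors. The data below is synthetic, and the department names and breach rates are invented so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic cases (invented): 'High' priority cases breach more often.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    'Priority': rng.choice(['High', 'Low'], size=n),
    'Department': rng.choice(['Payroll', 'Benefits'], size=n),
})
breach_prob = np.where(toy['Priority'] == 'High', 0.7, 0.2)
toy['target'] = (rng.random(n) < breach_prob).astype(int)

X_toy = pd.get_dummies(toy[['Priority', 'Department']])
y_toy = toy['target']
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42, stratify=y_toy
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_tr, y_tr)

# predict_proba returns one column per class in model.classes_;
# pick the column that corresponds to class 1 (breach).
breach_col = list(model.classes_).index(1)
breach_likelihood = model.predict_proba(X_te)[:, breach_col]

# Rows are actual classes, columns are predicted classes, ordered [0, 1].
cm = confusion_matrix(y_te, model.predict(X_te), labels=[0, 1])
print(cm)
print(breach_likelihood[:5])
```

In the notebook itself, the equivalent calls would be `rf.predict_proba(X_test)` and `confusion_matrix(y_test, y_pred)`, the latter already being imported in the first cell.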

## 4. Feature Importance
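We plot the ten one-hot encoded features with the highest impurity-based importance. These show which attribute values the model relies on most; they indicate association with SLA breaches, not causation.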

In [None]:
importance = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False).head(10)
plt.figure(figsize=(10,6))
importance.plot(kind='barh')
plt.title("Top Factors Contributing to SLA Breaches")
plt.show()
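
Because `pd.get_dummies` spreads each attribute across several columns, the chart ranks individual values such as a single priority level rather than whole attributes. One possible way to get an attribute-level view is to sum the dummy importances per source feature. This is a sketch with invented importance values; `source_of` is a hypothetical helper, and it assumes the default `get_dummies` naming of `<feature>_<value>`.

```python
import pandas as pd

# Invented importances for one-hot columns, so the example runs standalone.
dummy_importance = pd.Series({
    'Priority_High': 0.30,
    'Priority_Low': 0.10,
    'Entity_Type_A': 0.25,
    'Entity_Type_B': 0.15,
    'Job_Role_Analyst': 0.20,
})
source_features = ['Priority', 'Entity_Type', 'Job_Role']

def source_of(column, features):
    """Map a dummy column back to the feature it was derived from (hypothetical helper)."""
    # Try longer names first so a feature that prefixes another is not matched by mistake.
    for feat in sorted(features, key=len, reverse=True):
        if column.startswith(feat + '_'):
            return feat
    return column  # leave unmatched columns as they are

# groupby with a function applies it to each index label.
grouped = (
    dummy_importance
    .groupby(lambda col: source_of(col, source_features))
    .sum()
    .sort_values(ascending=False)
)
print(grouped)
```

With the notebook's objects, the same idea would apply to `pd.Series(rf.feature_importances_, index=X.columns)` and the `features` list.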