# 📌 Customer Churn Prediction (SME Corporate)
This notebook trains a **Decision Tree** model to predict customer churn based on SME customer data.
- **Churn (1)** → Customer left
- **No Churn (0)** → Customer stayed

**Steps Covered:**
1. Load & Explore Data 📊
2. Preprocessing 🛠️
3. Train Decision Tree 🌳
4. Evaluate Performance 📈
5. Save Model for FastAPI Integration 🚀

In [12]:
# ✅ Import Required Libraries
import pandas as pd
import numpy as np
import pickle
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from imblearn.over_sampling import SMOTE 
# Load Dataset
df = pd.read_csv('sme_customer_churn.csv')

# Display first few rows
df.head()

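| Unnamed: 0 | Customer_ID | Company_Size | Contract_Length | Monthly_Bill | Payment_History | Support_Tickets | Product_Usage | Churn |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Small | 17 | 142.63 | Good | 3 | 0.59 | 0 |
| 1 | 2 | Medium | 9 | 100.36 | Good | 1 | 0.35 | 0 |
| 2 | 3 | Small | 33 | 171.39 | Average | 3 | 0.91 | 1 |
| 3 | 4 | Small | 20 | 75.69 | Good | 0 | 0.29 | 0 |
| 4 | 5 | Small | 13 | 289.03 | Average | 2 | 0.9 | 0 |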


## 🛠️ Data Preprocessing
- Encode categorical variables (`Company_Size`, `Payment_History`) as integers
- Split data into training & testing sets (80/20)
- Balance the training classes with SMOTE
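
The categorical conversion listed above can be pictured with a small pure-Python sketch. It mirrors how scikit-learn's `LabelEncoder` assigns integer codes (distinct values in sorted order). The helper name `fit_label_mapping` is hypothetical, and the sample values are only the five rows shown by `df.head()`, so the codes in the full dataset may differ if more categories exist.

```python
def fit_label_mapping(values):
    """Map each distinct category to an integer code in sorted order,
    which is the order LabelEncoder uses for its classes."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


# Values copied from the five rows displayed by df.head(); the full dataset
# may contain further categories, which would shift these codes.
company_size = ["Small", "Medium", "Small", "Small", "Small"]
payment_history = ["Good", "Good", "Average", "Good", "Average"]

# One mapping per column, so each can be reused when encoding new customers.
size_mapping = fit_label_mapping(company_size)
payment_mapping = fit_label_mapping(payment_history)

encoded_sizes = [size_mapping[value] for value in company_size]

print(size_mapping)
print(payment_mapping)
print(encoded_sizes)
```

The codes follow alphabetical order rather than any business ordering, so a split such as `Payment_History <= 0.5` separates categories by name order only. Keeping one mapping per column also matters for serving, since the same codes have to be applied to incoming requests.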

In [22]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

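# Encode categorical variables (one encoder per column so each fitted
# mapping stays available for encoding new data at inference time)
size_encoder = LabelEncoder()
payment_encoder = LabelEncoder()
df['Company_Size'] = size_encoder.fit_transform(df['Company_Size'])
df['Payment_History'] = payment_encoder.fit_transform(df['Payment_History'])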

# ✅ 3. Define Features (X) & Target (y)
X = df.drop(columns=['Customer_ID', 'Churn'])  # Drop the identifier and the target column
y = df['Churn']

# ✅ 4. Split Data into Training & Testing Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Original Class Distribution:\n", y_train.value_counts())

# ✅ 5. Apply SMOTE to Balance the Classes
smote = SMOTE(sampling_strategy='auto', random_state=42)
X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)

print("After SMOTE Class Distribution:\n", y_train_resampled.value_counts())

Original Class Distribution:
 Churn
0    561
1    239
Name: count, dtype: int64
After SMOTE Class Distribution:
 Churn
0    561
1    561
Name: count, dtype: int64
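
The jump from 239 to 561 churn rows means SMOTE generated 561 − 239 = 322 synthetic minority samples. The sketch below illustrates the core idea only: each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. It is a simplified illustration, not imbalanced-learn's implementation; the function name `smote_like` and the toy points are hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, seed=42):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        # lam in [0, 1) picks a position on the segment base -> neighbour.
        lam = rng.random()
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Number of synthetic rows needed to match the class counts printed above.
n_needed = 561 - 239

# Toy 2-D minority points (made-up values, not taken from the dataset).
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.5), (8.0, 9.0)]
new_points = smote_like(minority, n_new=5)

print(n_needed)
print(new_points)
```

Because interpolation treats every column as continuous, label-encoded columns such as `Company_Size` can receive fractional values like 0.4 in the resampled training set. The tree can still split on them, but the values no longer correspond to a real category.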


## 🌳 Train Decision Tree Model

In [25]:
# ✅ 6. Train Decision Tree Model
model = DecisionTreeClassifier(max_depth=5, random_state=42)
model.fit(X_train_resampled, y_train_resampled)

# ✅ 7. Make Predictions
y_pred = model.predict(X_test)

# ✅ 8. Evaluate Model Performance
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print(f"\n✅ Accuracy after SMOTE: {accuracy:.2f}")
print("\n🔹 Classification Report:\n", report)
print("\n🔹 Confusion Matrix:\n", conf_matrix)

# ✅ 9. Save Model Using Pickle
with open("decision_tree_model.pkl", "wb") as file:
    pickle.dump(model, file)

print("\n✅ Model saved as 'decision_tree_model.pkl'")




✅ Accuracy after SMOTE: 0.54

🔹 Classification Report:
               precision    recall  f1-score   support

           0       0.78      0.53      0.63       150
           1       0.28      0.54      0.37        50

    accuracy                           0.54       200
   macro avg       0.53      0.54      0.50       200
weighted avg       0.65      0.54      0.57       200


🔹 Confusion Matrix:
 [[80 70]
 [23 27]]

✅ Model saved as 'decision_tree_model.pkl'
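
The figures in the classification report can be re-derived from the confusion matrix by hand. The sketch below does that arithmetic for the churn class, using scikit-learn's layout (rows are true labels, columns are predicted labels), and compares the result with a baseline that always predicts "No Churn". The variable names are illustrative.

```python
# Confusion matrix printed above: rows = true class, columns = predicted class.
conf_matrix = [[80, 70],
               [23, 27]]
(tn, fp), (fn, tp) = conf_matrix

total = tn + fp + fn + tp
accuracy = (tp + tn) / total                 # share of all predictions that are right
precision_churn = tp / (tp + fp)             # predicted churners who really churned
recall_churn = tp / (tp + fn)                # actual churners the model caught
f1_churn = (2 * precision_churn * recall_churn
            / (precision_churn + recall_churn))

# Baseline: always predict the majority class (0, "No Churn") on this test set.
baseline_accuracy = (tn + fp) / total

print(f"accuracy={accuracy:.3f}")
print(f"precision(churn)={precision_churn:.3f}")
print(f"recall(churn)={recall_churn:.3f}")
print(f"f1(churn)={f1_churn:.3f}")
print(f"majority baseline accuracy={baseline_accuracy:.3f}")
```

On this test set the always-"No Churn" baseline reaches 0.75 accuracy, above the tree's 0.54, although the baseline would catch no churners at all. That gap suggests recall and precision for class 1 are more informative here than accuracy alone.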


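## 📈 Model Evaluation
The output in this section reports 285 test samples, whereas the training cell above evaluates on 200, so it appears to come from an earlier run (note the lower execution count) with a different split. The two sets of figures are therefore not directly comparable.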

In [16]:

# ✅ Evaluate Model
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")
print("\nClassification Report:\n", report)
print("\nConfusion Matrix:\n", conf_matrix)

Accuracy: 0.58

Classification Report:
               precision    recall  f1-score   support

           0       0.68      0.37      0.48       149
           1       0.54      0.81      0.65       136

    accuracy                           0.58       285
   macro avg       0.61      0.59      0.56       285
weighted avg       0.61      0.58      0.56       285


Confusion Matrix:
 [[ 55  94]
 [ 26 110]]


In [10]:
# X_train is a pandas DataFrame, so its columns give the feature order the model was trained on
print("Feature Names:", list(X_train.columns))

Feature Names: ['Company_Size', 'Contract_Length', 'Monthly_Bill', 'Payment_History', 'Support_Tickets', 'Product_Usage']
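
The model expects inputs in exactly this column order, so a serving layer has to arrange incoming fields accordingly before calling `predict`. A minimal sketch, assuming the request arrives as a dictionary of already-encoded values; the helper `to_feature_row` and the sample record are hypothetical.

```python
FEATURE_NAMES = ['Company_Size', 'Contract_Length', 'Monthly_Bill',
                 'Payment_History', 'Support_Tickets', 'Product_Usage']


def to_feature_row(record):
    """Return the record's values as a list in the training feature order."""
    missing = [name for name in FEATURE_NAMES if name not in record]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [record[name] for name in FEATURE_NAMES]


# Hypothetical customer; the categorical codes (1, 1) are assumed values,
# not the codes produced by the notebook's fitted encoders.
record = {
    'Monthly_Bill': 142.63,
    'Product_Usage': 0.59,
    'Company_Size': 1,
    'Support_Tickets': 3,
    'Payment_History': 1,
    'Contract_Length': 17,
}

row = to_feature_row(record)
print(row)
```

The resulting list can be wrapped in an outer list (`[row]`) to form the single-row, two-dimensional input that `model.predict` expects.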


## 💾 Save Model for FastAPI

In [9]:
import pickle
# ✅ Save Model Using pickle
with open("decision_tree_model.pkl", "wb") as file:
    pickle.dump(model, file)

print("Model saved using pickle!")

Model saved using pickle!
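
On the consuming side, the pickled model is loaded back with `pickle.load` and should return the same predictions as the in-memory object. The sketch below checks that round trip on a tiny stand-in model. The toy rows reuse values from `df.head()` with assumed category codes, and the file is written to a temporary directory rather than the notebook's working directory.

```python
import os
import pickle
import tempfile

from sklearn.tree import DecisionTreeClassifier

# Toy rows in the notebook's feature order; the categorical codes in
# columns 0 and 3 are assumptions for illustration only.
X_toy = [[1, 17, 142.63, 1, 3, 0.59],
         [0, 9, 100.36, 1, 1, 0.35],
         [1, 33, 171.39, 0, 3, 0.91],
         [1, 20, 75.69, 1, 0, 0.29]]
y_toy = [0, 0, 1, 0]

model = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_toy, y_toy)

# Save to a temporary location, then load the file back.
path = os.path.join(tempfile.mkdtemp(), "decision_tree_model.pkl")
with open(path, "wb") as file:
    pickle.dump(model, file)

with open(path, "rb") as file:
    loaded_model = pickle.load(file)

original_preds = model.predict(X_toy).tolist()
loaded_preds = loaded_model.predict(X_toy).tolist()
print(loaded_preds == original_preds)
```

Two practical caveats apply when the file is consumed by another service: unpickling can execute arbitrary code, so only load files from a trusted source, and the loading environment should use the same scikit-learn version that produced the file.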
