# Introduction

Beta Bank is facing a concerning trend: a slow, steady loss of customers. Retaining existing clients is far more cost-efficient than finding new ones, so our goal is to predict whether a customer is likely to leave the bank in the near future. Using historical data on client behaviour and contract termination, we need to build a predictive model with the highest possible F1 score, aiming for at least 0.59. We will also measure the AUC-ROC metric and compare it with the F1 score.

In [1]:
import pandas as pd

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, mean_squared_error, confusion_matrix, classification_report, roc_auc_score, f1_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.utils import resample

## Examining Data & Preparing Data

In [2]:
df = pd.read_csv('Churn.csv')

In [4]:
df.head(20)

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2.0,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1.0,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8.0,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1.0,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2.0,125510.82,1,1,1,79084.1,0
5,6,15574012,Chu,645,Spain,Male,44,8.0,113755.78,2,1,0,149756.71,1
6,7,15592531,Bartlett,822,France,Male,50,7.0,0.0,2,1,1,10062.8,0
7,8,15656148,Obinna,376,Germany,Female,29,4.0,115046.74,4,1,0,119346.88,1
8,9,15792365,He,501,France,Male,44,4.0,142051.07,2,0,1,74940.5,0
9,10,15592389,H?,684,France,Male,27,2.0,134603.88,1,1,1,71725.73,0


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           9091 non-null   float64
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(3), int64(8), object(3)
memory usage: 1.1+ MB


**Here I took a quick first look at the data and noticed that a few numeric columns are stored as floats where I wanted ints. `Tenure` also has only 9,091 non-null values out of 10,000.**

In [24]:
df.isnull().sum()

RowNumber            0
CustomerId           0
Surname              0
CreditScore          0
Geography            0
Gender               0
Age                  0
Tenure             909
Balance              0
NumOfProducts        0
HasCrCard            0
IsActiveMember       0
EstimatedSalary      0
Exited               0
dtype: int64

In [25]:
df.fillna(0,inplace=True)

**Here I checked for and rectified any null values in the data. Only `Tenure` had nulls (909), which I filled with 0.**
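Filling with 0 treats a missing tenure as a zero-year tenure, which is a modelling choice rather than a neutral default. The sketch below contrasts it with a median fill on a small made-up frame. The `toy` data and the median alternative are assumptions for illustration; every result in this notebook was produced with the zero fill.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data (hypothetical values, not rows from Churn.csv)
toy = pd.DataFrame({
    "Tenure": [2.0, np.nan, 8.0, 1.0, np.nan],
    "Balance": [0.0, 83807.86, 159660.80, 0.0, 125510.82],
})

# Approach used in this notebook: a missing tenure becomes 0 years
filled_zero = toy["Tenure"].fillna(0)

# Alternative (an assumption, not what the notebook does): fill with the column median
filled_median = toy["Tenure"].fillna(toy["Tenure"].median())

print(filled_zero.tolist())
print(filled_median.tolist())
```

Targeting the column by name also makes it explicit that only `Tenure` is being changed, whereas `df.fillna(0)` would silently fill nulls in any column.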

In [26]:
df['Tenure'] = df['Tenure'].astype(int)
df['Balance'] = df['Balance'].astype(int)
df['EstimatedSalary'] = df['EstimatedSalary'].astype(int)

In [27]:
df['Surname'] = df['Surname'].astype(str)
df['Geography'] = df['Geography'].astype(str)
df['Gender'] = df['Gender'].astype(str)

**Finally, I changed the floats to ints now that the data is cleared of any null values.**
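One detail worth knowing about this cast: `astype(int)` drops the fractional part rather than rounding. A minimal sketch, using three salary values taken from the table output earlier in this section (the `rounded` variant is shown only for comparison and is not used in the notebook):

```python
import pandas as pd

# Salary values as they appear before the cast
salary = pd.Series([101348.88, 112542.58, 10062.80])

# What the notebook does: the fractional part is discarded
truncated = salary.astype(int)

# For comparison only: round to the nearest whole number first
rounded = salary.round().astype(int)

print(truncated.tolist())
print(rounded.tolist())
```

For balances and salaries of this size the difference is under one currency unit, so it should have little effect on the tree-based models used later.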

In [8]:
df.head(20)

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0,1,1,1,101348,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807,1,0,1,112542,0
2,3,15619304,Onio,502,France,Female,42,8,159660,3,1,0,113931,1
3,4,15701354,Boni,699,France,Female,39,1,0,2,0,0,93826,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510,1,1,1,79084,0
5,6,15574012,Chu,645,Spain,Male,44,8,113755,2,1,0,149756,1
6,7,15592531,Bartlett,822,France,Male,50,7,0,2,1,1,10062,0
7,8,15656148,Obinna,376,Germany,Female,29,4,115046,4,1,0,119346,1
8,9,15792365,He,501,France,Male,44,4,142051,2,0,1,74940,0
9,10,15592389,H?,684,France,Male,27,2,134603,1,1,1,71725,0


Thankfully, this data didn't have too many things that needed correction.

--- 

## Examining Class Balances & Preliminary Training

In [28]:
df['Exited'].value_counts()

0    7963
1    2037
Name: Exited, dtype: int64
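To put these counts in perspective, the short calculation below turns them into shares. It uses only the two numbers printed above; the variable names are illustrative.

```python
# Class counts copied from the value_counts() output
stayed, exited = 7963, 2037
total = stayed + exited

exit_share = exited / total         # fraction of customers who left
majority_baseline = stayed / total  # accuracy of always predicting "stays"
imbalance_ratio = stayed / exited   # non-exits per exit

print(f"Exit share: {exit_share:.2%}")
print(f"Always-predict-0 accuracy: {majority_baseline:.2%}")
print(f"Imbalance ratio: {imbalance_ratio:.2f} to 1")
```

A model that never predicts an exit would already be right about 80% of the time, which is why accuracy alone is a weak yardstick here and the F1 score is the target metric.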

In [29]:
# Split the data into features and target
features = df.drop(['RowNumber', 'CustomerId', 'Surname', 'Exited'], axis=1)
target = df['Exited']

# One-hot encode categorical features
categorical_features = ['Geography', 'Gender']  # Add other categorical features if any
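# sparse_output and get_feature_names_out need scikit-learn >= 1.2; older versions used sparse=False and get_feature_names
encoder = OneHotEncoder(drop='first', sparse_output=False)
encoded_features = pd.DataFrame(encoder.fit_transform(features[categorical_features]), columns=encoder.get_feature_names_out(categorical_features))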

# Drop original categorical features and concatenate encoded features
features = features.drop(categorical_features, axis=1)
features = pd.concat([features.reset_index(drop=True), encoded_features.reset_index(drop=True)], axis=1)

# Split the data into training and testing sets
features_train, features_test, target_train, target_test = train_test_split(features, target, test_size=0.2, random_state=42)

# Train a RandomForestClassifier
model = RandomForestClassifier(random_state=12345, n_estimators=50)
model.fit(features_train, target_train)

# Make predictions
prediction_test = model.predict(features_test)

# Evaluate the model
print(confusion_matrix(target_test, prediction_test))
print(classification_report(target_test, prediction_test))

[[1538   69]
 [ 203  190]]
              precision    recall  f1-score   support

           0       0.88      0.96      0.92      1607
           1       0.73      0.48      0.58       393

    accuracy                           0.86      2000
   macro avg       0.81      0.72      0.75      2000
weighted avg       0.85      0.86      0.85      2000



* **Class Distribution**: The dataset is imbalanced with 7963 customers who did not exit (0) and 2037 customers who exited (1).  

* **Model Performance**: The RandomForestClassifier achieved an overall accuracy of 86%, but its F1 score for the exit class is 0.58, just below the 0.59 target.  

* **Confusion Matrix**:  
   True Negatives (correctly predicted non-exits): 1538  
   False Positives (incorrectly predicted exits): 69  
   False Negatives (incorrectly predicted non-exits): 203  
   True Positives (correctly predicted exits): 190  
   
* **Classification Report**:  
   For class 0 (non-exits): Precision = 0.88, Recall = 0.96, F1-score = 0.92  
   For class 1 (exits): Precision = 0.73, Recall = 0.48, F1-score = 0.58  
   Macro average F1-score = 0.75  
   Weighted average F1-score = 0.85  
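The class 1 figures in the report follow directly from the four confusion-matrix counts. As a quick check, this sketch recomputes them by hand using the standard definitions of precision, recall, and F1:

```python
# Counts from the baseline confusion matrix: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = 1538, 69, 203, 190

precision = tp / (tp + fp)   # share of predicted exits that were real exits
recall = tp / (tp + fn)      # share of real exits that were caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
```

The low recall is what holds the F1 score down: the baseline misses 203 of the 393 customers who actually left.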

--- 

## Improving Model Quality

In [31]:
# Address class imbalance using undersampling
majority_class = features_train[target_train == 0]
minority_class = features_train[target_train == 1]

# Downsample majority class
majority_class_downsampled = resample(majority_class, 
                                      replace=False,    # sample without replacement
                                      n_samples=len(minority_class), # to match minority class
                                      random_state=42)  # reproducible results

# Combine minority class with downsampled majority class
features_train_balanced = pd.concat([majority_class_downsampled, minority_class])
target_train_balanced = target_train.loc[features_train_balanced.index]

# Define the models
models = {
    'RandomForest': RandomForestClassifier(random_state=12345),
    'GradientBoosting': GradientBoostingClassifier(random_state=12345)
}

# Define the parameter grid for GridSearchCV
param_grid = {
    'RandomForest': {
        'n_estimators': [50, 100, 200],
        'max_depth': [None, 10, 20],
        'class_weight': ['balanced', 'balanced_subsample']
    },
    'GradientBoosting': {
        'n_estimators': [50, 100, 200],
        'learning_rate': [0.01, 0.1, 0.2],
        'max_depth': [3, 5, 7]
    }
}

# Perform GridSearchCV for each model
best_models = {}
for model_name, model in models.items():
    grid_search = GridSearchCV(model, param_grid[model_name], cv=5, scoring='f1')
    grid_search.fit(features_train_balanced, target_train_balanced)
    best_models[model_name] = grid_search.best_estimator_

# Evaluate the best models on the test set
for model_name, model in best_models.items():
    predictions = model.predict(features_test)
    print(f"Model: {model_name}")
    print(confusion_matrix(target_test, predictions))
    print(classification_report(target_test, predictions))

Model: RandomForest
[[1310  297]
 [  94  299]]
              precision    recall  f1-score   support

           0       0.93      0.82      0.87      1607
           1       0.50      0.76      0.60       393

    accuracy                           0.80      2000
   macro avg       0.72      0.79      0.74      2000
weighted avg       0.85      0.80      0.82      2000

Model: GradientBoosting
[[1293  314]
 [  82  311]]
              precision    recall  f1-score   support

           0       0.94      0.80      0.87      1607
           1       0.50      0.79      0.61       393

    accuracy                           0.80      2000
   macro avg       0.72      0.80      0.74      2000
weighted avg       0.85      0.80      0.82      2000



**RandomForest Model:**  

**Precision:** 0.50 for class 1 (minority class), indicating that 50% of the predicted positive cases are true positives.  
**Recall:** 0.76 for class 1, meaning that 76% of the actual positive cases are correctly identified.  
**F1-Score:** 0.60 for class 1, which is a balance between precision and recall.  
**Accuracy:** 0.80, indicating that 80% of the total predictions are correct.  
**Macro Avg F1-Score:** 0.74, which is the average F1-score for both classes.  
**Weighted Avg F1-Score:** 0.82, which takes into account the support (number of true instances) for each class.  

**GradientBoosting Model:**

**Precision:** 0.50 for class 1, similar to the RandomForest model.  
**Recall:** 0.79 for class 1, slightly higher than the RandomForest model.  
**F1-Score:** 0.61 for class 1, slightly better than the RandomForest model.  
**Accuracy:** 0.80, same as the RandomForest model.  
**Macro Avg F1-Score:** 0.74, same as the RandomForest model.  
**Weighted Avg F1-Score:** 0.82, same as the RandomForest model.  

Both models achieved an accuracy of 80%. The GradientBoosting model shows slightly better recall (0.79 vs 0.76) and F1-score (0.61 vs 0.60) for the minority class (class 1), so it is slightly better at identifying customers who leave, which is the purpose of this model.
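The undersampling step earlier in this section relies on `resample` keeping the original row labels, so that the matching targets can be looked up with `.loc`. The sketch below shows that mechanism on a tiny made-up dataset; `X`, `y`, and their values are hypothetical and only mirror the structure of `features_train` and `target_train`.

```python
import pandas as pd
from sklearn.utils import resample

# Toy features and target with a non-default index (hypothetical values)
X = pd.DataFrame({"Age": [42, 41, 39, 43, 44, 50, 29, 27]},
                 index=[10, 11, 12, 13, 14, 15, 16, 17])
y = pd.Series([0, 0, 0, 0, 0, 0, 1, 1], index=X.index)

majority = X[y == 0]
minority = X[y == 1]

# Draw as many majority rows as there are minority rows, without replacement
majority_down = resample(majority, replace=False,
                         n_samples=len(minority), random_state=42)

# Row labels survive the resampling, so the targets can be matched by index
X_bal = pd.concat([majority_down, minority])
y_bal = y.loc[X_bal.index]

print(y_bal.value_counts().to_dict())
```

Only the training split is balanced this way; the test set keeps its original class distribution so that the reported scores reflect realistic conditions.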

## Final Test

In [38]:
# Re-evaluate the best models on the test set, adding AUC-ROC and F1 scores
for model_name, model in best_models.items():
    predictions = model.predict(features_test)
    probabilities = model.predict_proba(features_test)[:, 1]
    
    print(f"Model: {model_name}")
    print("Confusion Matrix:")
    print(confusion_matrix(target_test, predictions))
    print("Classification Report:")
    print(classification_report(target_test, predictions))
    print("AUC-ROC Score:")
    auc_roc = roc_auc_score(target_test, probabilities)
    print(auc_roc)
    print("F1 Score:")
    f1 = f1_score(target_test, predictions)
    print(f1)
    print("\n")

Model: RandomForest
Confusion Matrix:
[[1310  297]
 [  94  299]]
Classification Report:
              precision    recall  f1-score   support

           0       0.93      0.82      0.87      1607
           1       0.50      0.76      0.60       393

    accuracy                           0.80      2000
   macro avg       0.72      0.79      0.74      2000
weighted avg       0.85      0.80      0.82      2000

AUC-ROC Score:
0.8636848013857947
F1 Score:
0.6046511627906976


Model: GradientBoosting
Confusion Matrix:
[[1293  314]
 [  82  311]]
Classification Report:
              precision    recall  f1-score   support

           0       0.94      0.80      0.87      1607
           1       0.50      0.79      0.61       393

    accuracy                           0.80      2000
   macro avg       0.72      0.80      0.74      2000
weighted avg       0.85      0.80      0.82      2000

AUC-ROC Score:
0.8674588433871532
F1 Score:
0.6110019646365422
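The two scores printed above measure different things. AUC-ROC is computed from the predicted probabilities and does not depend on a decision threshold, while F1 is computed from hard labels and changes with the cut-off. A minimal sketch on made-up values (the labels, probabilities, and the 0.58 threshold are assumptions chosen for illustration):

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

# Hypothetical true labels and predicted exit probabilities
y_true = np.array([0, 0, 0, 0, 1, 1])
proba = np.array([0.10, 0.20, 0.30, 0.60, 0.55, 0.90])

# AUC-ROC ranks the probabilities; no threshold is involved
auc = roc_auc_score(y_true, proba)

# F1 needs hard labels, so it depends on where the cut-off sits
f1_default = f1_score(y_true, (proba >= 0.5).astype(int))
f1_strict = f1_score(y_true, (proba >= 0.58).astype(int))

print(f"AUC-ROC: {auc:.3f}")
print(f"F1 at 0.50: {f1_default:.2f}, F1 at 0.58: {f1_strict:.2f}")
```

This is why a model can have a solid AUC-ROC yet a modest F1: the ranking of customers may be good even when the default cut-off is not the best one for the F1 score.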




---

## Conclusion


Both tuned models, RandomForest and GradientBoosting, reach 80% accuracy on the test set and clear the F1 target of 0.59 (0.605 and 0.611 respectively). That accuracy is lower than the 86% of the untuned baseline, but the baseline's F1 score for the exit class was only 0.58 because it missed more than half of the customers who left (recall 0.48). On the measures that matter under class imbalance, the GradientBoosting model has a slight edge, with higher recall (0.79 vs 0.76) and F1-score for the minority class (class 1).  

In the confusion matrices, both models predict the majority class (class 0) well, and the GradientBoosting model catches more of the minority class: 311 of 393 exits versus 299 for RandomForest, at the cost of slightly more false positives (314 versus 297).
The AUC-ROC score is also slightly higher for the GradientBoosting model (0.867 versus 0.864), though the gap is small.  

Both models have their strengths, but across all key metrics the GradientBoosting model performs slightly better, making it the preferred choice here by a narrow margin.