# Project Description

Beta Bank customers are leaving: little by little, chipping away every month. The bankers figured out it’s cheaper to save the existing customers rather than to attract new ones. We need to predict whether a customer will leave the bank soon. You have the data on clients’ past behavior and termination of contracts with the bank. We will build a model with maximum possible F1 score and measure the AUC-ROC metric and compare it to F1. We are targetting for model score of F1 of 0.59.

## Import and Review Data

### Importing Libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.metrics import recall_score, precision_score, f1_score, roc_auc_score, r2_score, mean_absolute_error, confusion_matrix, accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
from sklearn.utils import shuffle
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.datasets import make_hastie_10_2
from sklearn.tree import DecisionTreeClassifier

### Import and Review Data

In [2]:
df = pd.read_csv('/datasets/Churn.csv')
display(df.head(5))
print()
print(df.shape)
print()
display(df.describe())
print()
print(df.duplicated().sum())
print()
print(df.info())

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2.0,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1.0,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8.0,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1.0,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2.0,125510.82,1,1,1,79084.1,0



(10000, 14)



Unnamed: 0,RowNumber,CustomerId,CreditScore,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
count,10000.0,10000.0,10000.0,10000.0,9091.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,5000.5,15690940.0,650.5288,38.9218,4.99769,76485.889288,1.5302,0.7055,0.5151,100090.239881,0.2037
std,2886.89568,71936.19,96.653299,10.487806,2.894723,62397.405202,0.581654,0.45584,0.499797,57510.492818,0.402769
min,1.0,15565700.0,350.0,18.0,0.0,0.0,1.0,0.0,0.0,11.58,0.0
25%,2500.75,15628530.0,584.0,32.0,2.0,0.0,1.0,0.0,0.0,51002.11,0.0
50%,5000.5,15690740.0,652.0,37.0,5.0,97198.54,1.0,1.0,1.0,100193.915,0.0
75%,7500.25,15753230.0,718.0,44.0,7.0,127644.24,2.0,1.0,1.0,149388.2475,0.0
max,10000.0,15815690.0,850.0,92.0,10.0,250898.09,4.0,1.0,1.0,199992.48,1.0



0

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           9091 non-null   float64
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(3), int64(8), object(3)
memory usage: 1.1+ MB
None


<div class="alert alert-info"> <b> Data Review </b>:
    <li> There are 14 columns and 10000 rows in this data.</li>
    <li> There are 11 numeric columns and 3 categorical columns.</li>
    <li> There are no duplicates in the dataset.</li>
    <li> Column 'Tenure' has missing values. </li>

## Preprocessing

Let's change categorical information; drop those that are non-behavioral information and transform those that are behavioral information using encoding

### Replace missing value
There are missing values in Tenure. Let's replace this with median value.

In [3]:
df['Tenure'].fillna(5, inplace=True)

### Drop non-behavioral information
RowNumber, CustomerId, Surname are not behavioral data. Let's drop these

In [4]:
NonBehave = ['CustomerId', 'Surname', 'RowNumber']
df.drop(columns=NonBehave, inplace=True)

### Transform categorical behavioral data 
Gender and Gerography are information that may impact customer behavior. Let's transform these using OHE and avoid dummy trap

In [5]:
le = LabelEncoder()
df_ohe = pd.get_dummies(df, columns=['Gender'], drop_first=True)
df_ohe['Geography'] = le.fit_transform(df_ohe['Geography'])
display(df_ohe.head(3))

Unnamed: 0,CreditScore,Geography,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited,Gender_Male
0,619,0,42,2.0,0.0,1,1,1,101348.88,1,0
1,608,2,41,1.0,83807.86,1,0,1,112542.58,0,0
2,502,0,42,8.0,159660.8,3,1,0,113931.57,1,0


## Balance of Classes
Let's examine train models and examine their balance of classes.

In [6]:
# Let's split the data set in to train, validation, and test sets.

features = df_ohe.drop('Exited', axis=1)
target = df_ohe['Exited']

features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.2, random_state=1234)
features_train, features_valid, target_train, target_valid = train_test_split(
    features_train, target_train, test_size=0.25, random_state=1234)

print(features_train.shape)
print(features_valid.shape)
print(features_test.shape)

(6000, 10)
(2000, 10)
(2000, 10)


In [7]:
# Let's train a Decision Tree Classifier
best_score = 0
best_f1 = 0
for depth in range(10, 101, 10):
    model = DecisionTreeClassifier(random_state=1234, max_depth=depth)
    model.fit(features_train, target_train)
    predictions = model.predict(features_valid)
    f1 = f1_score(target_valid, predictions)
    if f1 > best_f1:
        best_f1 = f1
        best_max_depth = depth

print('Best max depth:', depth)

DCmodel = DecisionTreeClassifier(random_state=1234, max_depth=100)
DCmodel.fit(features_train, target_train)
DCpredic_valid = DCmodel.predict(features_valid)
DCprob_valid = DCmodel.predict_proba(features_valid)

print('F1 Score:', f1_score(target_valid, DCpredic_valid))

Best max depth: 100
F1 Score: 0.4381188118811882


In [9]:
# Let's a Random Forest Classifier model

RFmodel = RandomForestClassifier(random_state=1234, n_estimators=100)
RFmodel.fit(features_train, target_train)
RFmodel.fit(features_train, target_train)
RFpredic_valid = RFmodel.predict(features_valid)
RFprob_valid = RFmodel.predict_proba(features_valid)
RFprob_one_valid = RFprob_valid[:, 1]

### Precision, Recall, and F1 Score

In [10]:
print('Decision Tree Classifier Precision:',
      precision_score(target_valid, DCpredic_valid))
print('Decision Tree Classifier Recall:',
      recall_score(target_valid, DCpredic_valid))
print('Decision Tree Classifier F1 Score:',
      f1_score(target_valid, DCpredic_valid))

Decision Tree Classifier Precision: 0.4338235294117647
Decision Tree Classifier Recall: 0.4425
Decision Tree Classifier F1 Score: 0.4381188118811882


In [11]:
print('Random Forest Classifier Precision:',
      precision_score(target_valid, RFpredic_valid))
print('Random Forest Classifier Recall:',
      recall_score(target_valid, RFpredic_valid))
print('Random Forest Classifier F1 Score:',
      f1_score(target_valid, RFpredic_valid))

Random Forest Classifier Precision: 0.7325581395348837
Random Forest Classifier Recall: 0.4725
Random Forest Classifier F1 Score: 0.574468085106383


### AUC-ROC Score

In [12]:
DCprobabilities_valid = DCmodel.predict_proba(features_valid)
DCprobabilities_one_valid = DCprobabilities_valid[:, 1]
DCauc_roc = roc_auc_score(target_valid, DCprobabilities_one_valid)

RFprobabilities_valid = RFmodel.predict_proba(features_valid)
RFprobabilities_one_valid = RFprobabilities_valid[:, 1]
RFauc_roc = roc_auc_score(target_valid, RFprobabilities_one_valid)
print("Decision Tree Classifier Precision AUC_ROC:", DCauc_roc)
print("Random Forest Classifier Precision AUC_ROC:", RFauc_roc)

Decision Tree Classifier Precision AUC_ROC: 0.6490625
Random Forest Classifier Precision AUC_ROC: 0.8467171875


In [13]:
# Let's examine porportion of the exit data
exited = df_ohe[df_ohe['Exited'] == 1].shape[0]
stayed = df_ohe[df_ohe['Exited'] == 0].shape[0]

print('Customers Exited:', exited)
print('Customers Stayed:', stayed)
print('Customer Exited/Stayed:', exited/stayed)

Customers Exited: 2037
Customers Stayed: 7963
Customer Exited/Stayed: 0.25580811252040686


<div class="alert alert-info"> <b> Review </b>:
    <li> Between Decision Tree Classifier and Gradient Booster Classifier, Random Forest Classifier has better F1 and AUC-ROC score. Let's choose Random Forest Classifier and further examine.</li>
    <li> Current model presents with Recall score of 0.442, meaning model is correctly identifies 44.2% of exits.</li>
    <li> Current model presents with Precision score of 0.767, indicating that when the model predicts a customer will exit, it's correct about 76.7% of the time.</li>
    <li> F1 score of 0.561 indicate fairly harmony between recall and precision, but from raw data we can see the class imbalance.</li>
    <li> AUC-ROC of 0.704 signifies the model's ability to distinguish between classes. An AUC-ROC of 0.5 suggests no discrimitation, while higher value indicate better discrimination. </li>
    <li> Discrepancy between recall vs precision and there is distribution of exit vs stayed customers suggests there is class imbalance. This suggests that the model isn't capturing all the positive instances effectively. </li>

## Model Improvements

### Class Weight Adjustment

In [14]:
RFCWmodel = RandomForestClassifier(
    random_state=1234, n_estimators=100, class_weight='balanced')
RFCWmodel.fit(features_train, target_train)
RFpredic_valid = RFCWmodel.predict(features_valid)
RFprob_valid = RFCWmodel.predict_proba(features_valid)

print('Precision:', precision_score(target_valid, RFpredic_valid))
print('Recall:', recall_score(target_valid, RFpredic_valid))
print('F1 Score:', f1_score(target_valid, RFpredic_valid))

Precision: 0.7222222222222222
Recall: 0.4225
F1 Score: 0.5331230283911671


<div class="alert alert-info"> <b> Review </b>:
    <li> We can observe through class weight adjustment (class_weight='balanced') F1 Score has decreased! Let's try other methods.</li>

### Upsampling
There is quite a difference between sheer number of customers who exited vs who stayed. Let's use upsampling to make observations of exits less rare in data. 

In [15]:
def upsample(features, target, repeat):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_upsampled = pd.concat([features_zeros] + [features_ones] * repeat)
    target_upsampled = pd.concat([target_zeros] + [target_ones] * repeat)

    features_upsampled, target_upsampled = shuffle(
        features_upsampled, target_upsampled, random_state=1234
    )

    return features_upsampled, target_upsampled


features_upsampled, target_upsampled = upsample(
    features_train, target_train, 11
)

In [16]:
# Train model based on the upsampling

RFUPmodel = RandomForestClassifier(random_state=1234, n_estimators=10)
RFUPmodel.fit(features_upsampled, target_upsampled)
RFUPpredic_valid = RFUPmodel.predict(features_valid)
RFUPprob_valid = RFUPmodel.predict_proba(features_valid)

print('Precision:', precision_score(target_valid, RFUPpredic_valid))
print('Recall:', recall_score(target_valid, RFUPpredic_valid))
print('F1 Score:', f1_score(target_valid, RFUPpredic_valid))

Precision: 0.628125
Recall: 0.5025
F1 Score: 0.5583333333333332


In [17]:
# Downsampling may need further tuning on hyperparameter. Let's start with finding best n_estimator
best_n_estimators = None
best_f1_score = 0

for n in range(1, 101, 10):  # Try different values for n_estimators
    RFUPmodel = RandomForestClassifier(random_state=1234, n_estimators=n)
    RFUPmodel.fit(features_upsampled, target_upsampled)
    RFUPpredic_valid = RFUPmodel.predict(features_valid)
    f1 = f1_score(target_valid, RFUPpredic_valid)

    if f1 > best_f1_score:
        best_f1_score = f1
        best_n_estimators = n

print(
    f'Best n_estimators: {best_n_estimators}, Best F1 Score: {best_f1_score}')

Best n_estimators: 71, Best F1 Score: 0.5956284153005463


In [18]:
# Let's retrain the model with best n_estimator = 71

def upsample(features, target, repeat):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_upsampled = pd.concat([features_zeros] + [features_ones] * repeat)
    target_upsampled = pd.concat([target_zeros] + [target_ones] * repeat)

    features_upsampled, target_upsampled = shuffle(
        features_upsampled, target_upsampled, random_state=1234
    )

    return features_upsampled, target_upsampled


features_upsampled, target_upsampled = upsample(
    features_train, target_train, 11

)

RFUPmodel = RandomForestClassifier(random_state=1234, n_estimators=71)
RFUPmodel.fit(features_upsampled, target_upsampled)
RFUPpredic_valid = RFUPmodel.predict(features_valid)
RFUPprob_valid = RFUPmodel.predict_proba(features_valid)

print('Precision:', precision_score(target_valid, RFUPpredic_valid))
print('Recall:', recall_score(target_valid, RFUPpredic_valid))
print('F1 Score:', f1_score(target_valid, RFUPpredic_valid))

Precision: 0.6566265060240963
Recall: 0.545
F1 Score: 0.5956284153005463


<div class="alert alert-info"> <b> Review </b>:
    <li> We can observe through upsampling and n_estimator adjustment that F1 score has improved.</li>

### Downsampling
Let's use downsampling to make observations of stays less frequent in the data.

In [19]:
def downsample(features, target, fraction):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_downsampled = pd.concat([
        features_zeros.sample(frac=fraction, random_state=1234),
        features_ones
    ])

    target_downsampled = pd.concat([
        target_zeros.sample(frac=fraction, random_state=1234),
        target_ones
    ])

    features_downsampled, target_downsampled = shuffle(
        features_downsampled, target_downsampled, random_state=1234
    )

    return features_downsampled, target_downsampled


features_downsampled, target_downsampled = downsample(
    features_train, target_train, 0.6
)

In [20]:
# Let's train the model

RFDNmodel = RandomForestClassifier(random_state=1234, n_estimators=10)
RFDNmodel.fit(features_downsampled, target_downsampled)
RFDNpredic_valid = RFDNmodel.predict(features_valid)
RFDNprob_valid = RFDNmodel.predict_proba(features_valid)

print('Precision:', precision_score(target_valid, RFDNpredic_valid))
print('Recall:', recall_score(target_valid, RFDNpredic_valid))
print('F1 Score:', f1_score(target_valid, RFDNpredic_valid))

Precision: 0.6371681415929203
Recall: 0.54
F1 Score: 0.584573748308525


In [21]:
# We are close to the threshold, but not quite. Let's adjust the n_estimator
best_n_estimators = None
best_f1_score = 0

for n in range(1, 101, 11):
    RFDNmodel = RandomForestClassifier(random_state=1234, n_estimators=n)
    RFDNmodel.fit(features_downsampled, target_downsampled)
    RFDNpredic_valid = RFDNmodel.predict(features_valid)
    f1 = f1_score(target_valid, RFDNpredic_valid)

    if f1 > best_f1_score:
        best_f1_score = f1
        best_n_estimators = n

print(f'Best n_estimators: {best_n_estimators}')

Best n_estimators: 78


In [22]:
# Let's retrain the model with best n_estimator = 78

def downsample(features, target, fraction):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_downsampled = pd.concat([
        features_zeros.sample(frac=fraction, random_state=1234),
        features_ones
    ])

    target_downsampled = pd.concat([
        target_zeros.sample(frac=fraction, random_state=1234),
        target_ones
    ])

    features_downsampled, target_downsampled = shuffle(
        features_downsampled, target_downsampled, random_state=1234
    )

    return features_downsampled, target_downsampled


features_downsampled, target_downsampled = downsample(
    features_train, target_train, 0.6
)


RFDNmodel = RandomForestClassifier(random_state=1234, n_estimators=78)
RFDNmodel.fit(features_downsampled, target_downsampled)
RFDNpredic_valid = RFDNmodel.predict(features_valid)
RFDNprob_valid = RFDNmodel.predict_proba(features_valid)

print('Precision:', precision_score(target_valid, RFDNpredic_valid))
print('Recall:', recall_score(target_valid, RFDNpredic_valid))
print('F1 Score:', f1_score(target_valid, RFDNpredic_valid))

Precision: 0.6361031518624641
Recall: 0.555
F1 Score: 0.5927903871829105


<div class="alert alert-info"> <b> Review </b>:
    <li> Downsampling also has reached threshold of 0.59</li>

## Final Testing 

In [23]:
# Final model with upsampling

RFUPmodel = RandomForestClassifier(random_state=1234, n_estimators=78)
RFUPmodel.fit(features_upsampled, target_upsampled)
RFUPpredic_test = RFUPmodel.predict(features_test)
RFUPprob_test = RFUPmodel.predict_proba(features_test)
RFUPprob_one_test = RFUPprob_test[:, 1]
RFUPauc_roc = roc_auc_score(target_test, RFUPprob_one_test)

print('Precision:', precision_score(target_test, RFUPpredic_test))
print('Recall:', recall_score(target_test, RFUPpredic_test))
print('F1 Score:', f1_score(target_test, RFUPpredic_test))
print('AUC_ROC: ', RFUPauc_roc)

Precision: 0.6655629139072847
Recall: 0.4878640776699029
F1 Score: 0.5630252100840336
AUC_ROC:  0.8355872930473699


In [24]:
# Final model with downsampling

RFDNmodel = RandomForestClassifier(random_state=1234, n_estimators=78)
RFDNmodel.fit(features_downsampled, target_downsampled)
RFDNpredic_test = RFDNmodel.predict(features_test)
RFDNprob_test = RFDNmodel.predict_proba(features_test)
RFDNprob_one_test = RFDNprob_test[:, 1]
RFDNauc_roc = roc_auc_score(target_test, RFDNprob_one_test)

print('Precision:', precision_score(target_test, RFDNpredic_test))
print('Recall:', recall_score(target_test, RFDNpredic_test))
print('F1 Score:', f1_score(target_test, RFDNpredic_test))
print('AUC_ROC: ', RFDNauc_roc)

Precision: 0.6616766467065869
Recall: 0.5364077669902912
F1 Score: 0.5924932975871313
AUC_ROC:  0.8371149825144898


<div class="alert alert-info"> <b> Final Review </b>:
    <li> We have replaced missing Tenure data with median since median would be less senitive to outliers than the mean. </li>
    <li> We have dropped non-behavioral categorical columns from model training. </li>
    <li> We have edited so Geography information is categorical to numeric. </li>
    <li> We have compared 2 models: Decision Tree Classifier and Random Forest Classifier.<ul>
        <li> We have adjusted both model's hyperparameter </li>
        <li>By comparing the two model's AUC_ROC and F1 score, we have decided to pursue further model training with Random Froest Classifier.</li></ul></li>
    <li> Model Improvements<ul>
        <li> We have tried class weight adjustment, but only worsend our F1 score.</li>
        <li> We have also tried upsampling and downsampling. Both methods improved the model beyond its target F1 score of 0.59. </li></ul></li>
    <li> Final model result:<ul>
        <li> Although both the upsampling and downsampling reached threshold of 0.59 with valid set, when tested with test set, only downsampling reached the threshold of 0.59.</li>
        <li> Precision 66.2%: Model's prediction of an exit is accurate around 66.2% of the time.</li>
        <li> Recall 53.6%: Out of all the exits, the model captures about 53.6% of them. </li> 
        <li> F1 0.592: Precision and Recall being not far off from each other and with F1 Score of 0.592, we can say that there is balance between the model's ability to accurately predict exit cases and capture the most of the actual exit cases.</li>
        <li> AUC_ROC 0.837: AUC_ROC increased from 0.704 indicating improvement in model's ability. </li>