# Credit Card Customers: Hyperparameter Tuning

This series of notebooks walks through the implementation of each step in the machine learning pipeline.

The steps are:

   - 1. Cleaning, EDA, and Visualization
   - 2. Feature Engineering and Feature Scaling
   - 3. Oversampling 
   - __4. Hyperparameter Tuning for Gradient Boosting Model__
   - 5. Building Model Pipeline
   
==========================================================================================

### Introduction

In the notebook (Building_Model_Pipeline.ipynb), we found that GradientBoostingClassifier shows the best performance on this dataset.

In this notebook, I will search for the best hyperparameters for that model using Optuna.



### Import Packages

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

from collections import Counter

pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 100)

from pathlib import Path
import os
os.getcwd()

'/Users/yejiseoung/Dropbox/My Mac (Yejis-MacBook-Pro.local)/Documents/Projects/CreditCard'

In [2]:
# set up path for data
path = Path('/Users/yejiseoung/Dropbox/My Mac (Yejis-MacBook-Pro.local)/Documents/Projects/CreditCard/Data/')

In [3]:
# Data pre-processing 
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split


# Modelling 
from sklearn.metrics import (
    roc_auc_score,
    precision_score, 
    accuracy_score, 
    recall_score,
)

from sklearn.ensemble import GradientBoostingClassifier


# for feature engineering
from feature_engine import encoding as ce

# Evaluation & CV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import KFold, cross_val_score


# pipeline
from sklearn.pipeline import Pipeline

# for oversampling
from imblearn.over_sampling import ADASYN, SMOTE

# for cross-validation
from imblearn.pipeline import make_pipeline

# for hyperparameter tuning
import optuna

###  Load Data

In [4]:
df = pd.read_csv(path/'BankChurners.csv')

In [5]:
# drop columns that are not useful for modelling: the client ID and the two precomputed Naive Bayes columns
df.drop(['CLIENTNUM',
        'Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1',
     'Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2'],
        axis=1, inplace=True)

df.shape

(10127, 20)

In [6]:
# create lists for categorical and numerical variables
cat_vars = [var for var in df.columns if df[var].dtype=='O' and var != 'Attrition_Flag']
num_vars = [var for var in df.columns if df[var].dtype!='O']

print('The number of categorical variables: {}'.format(len(cat_vars)))
print('The number of numerical variables: {}'.format(len(num_vars)))

The number of categorical variables: 5
The number of numerical variables: 14


In [7]:
df.head(2)

Unnamed: 0,Attrition_Flag,Customer_Age,Gender,Dependent_count,Education_Level,Marital_Status,Income_Category,Card_Category,Months_on_book,Total_Relationship_Count,Months_Inactive_12_mon,Contacts_Count_12_mon,Credit_Limit,Total_Revolving_Bal,Avg_Open_To_Buy,Total_Amt_Chng_Q4_Q1,Total_Trans_Amt,Total_Trans_Ct,Total_Ct_Chng_Q4_Q1,Avg_Utilization_Ratio
0,Existing Customer,45,M,3,High School,Married,$60K - $80K,Blue,39,5,1,3,12691.0,777,11914.0,1.335,1144,42,1.625,0.061
1,Existing Customer,49,F,5,Graduate,Single,Less than $40K,Blue,44,6,1,2,8256.0,864,7392.0,1.541,1291,33,3.714,0.105


## Separate into train and test set

In [8]:
df['Attrition_Flag'].value_counts()

Existing Customer    8500
Attrited Customer    1627
Name: Attrition_Flag, dtype: int64

In [9]:
# change the string values of target to integer value
churn_map = {'Existing Customer': 0,
            'Attrited Customer': 1}
df['Attrition_Flag'] = df['Attrition_Flag'].map(churn_map)

In [10]:
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(['Attrition_Flag'], axis=1),
    df['Attrition_Flag'],
    test_size=0.2, 
    random_state=0)

X_train.shape, X_test.shape

((8101, 19), (2026, 19))
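
Because the target is imbalanced (8,500 existing vs. 1,627 attrited customers), a purely random split can leave the train and test sets with slightly different churn rates. One option is to pass `stratify` to `train_test_split`. The sketch below is only an illustration on synthetic data: `X_demo` and `y_demo` are placeholder names, not objects from this notebook, and the 16% positive rate is chosen to roughly mimic the dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the features and the binary target (assumption).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 160 + [0] * 840)  # about 16% positives

# stratify=y_demo keeps the class proportions the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.2,
    random_state=0,
    stratify=y_demo)

# Share of positives in each split.
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(train_rate, test_rate)
```

In this notebook the equivalent change would be adding `stratify=df['Attrition_Flag']` to the call above; the shapes of the resulting sets stay the same.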

## Data Preparation

I will perform categorical encoding, feature scaling and oversampling here. 

In [11]:
# categorical encoding
cat_encoding = ce.OrdinalEncoder(
    encoding_method='arbitrary', variables=cat_vars)

X_train = cat_encoding.fit_transform(X_train)
X_test = cat_encoding.transform(X_test)

# feature scaling
scaler = StandardScaler()

X_train = pd.DataFrame(
    scaler.fit_transform(X_train),
    columns=X_train.columns)

X_test = pd.DataFrame(
    scaler.transform(X_test),
    columns=X_train.columns)


# oversampling
oversampling = ADASYN(
    sampling_strategy='auto',  # resample every class except the majority (here: the minority class only)
    random_state=0,
    n_neighbors=5,
    n_jobs=1)

X_resampled, y_resampled = oversampling.fit_resample(X_train, y_train)

In [12]:
# new resampled X_train and y_train
X_resampled.shape, y_resampled.shape

((13689, 19), (13689,))

## Hyperparameter tuning with Optuna

In [19]:
def objective(trial):
    # search space sampled by Optuna on each trial
    n_estimators = trial.suggest_int("n_estimators", 100, 1000)
    # NOTE: 'mse' was deprecated in scikit-learn 1.0 and removed in 1.2;
    # on newer versions use 'squared_error' instead
    criterion = trial.suggest_categorical("criterion", ['mse', 'friedman_mse'])
    max_depth = trial.suggest_int("max_depth", 1, 11)
    # a float is interpreted as a fraction of the training samples
    min_samples_split = trial.suggest_float("min_samples_split", 0.01, 1)
    # this parameter is recorded under the name "features" in study.best_params
    max_features = trial.suggest_categorical("features", ['sqrt', 'log2'])
    learning_rate = trial.suggest_float("learning_rate", 0.0001, 0.1)
    
    model = GradientBoostingClassifier(
        n_estimators=n_estimators,
        criterion=criterion,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        max_features=max_features,
        learning_rate=learning_rate
    )
    
    # mean ROC AUC over 3 cross-validation folds (the value Optuna maximizes)
    scores = cross_val_score(model, X_resampled, y_resampled, cv=3, scoring='roc_auc')
    return scores.mean()

In [21]:
study = optuna.create_study(
    direction='maximize',
    sampler=optuna.samplers.RandomSampler()
)

study.optimize(objective, n_trials=100)

[I 2022-03-02 09:34:22,769] A new study created in memory with name: no-name-29eab686-5dd8-4ea2-975b-eeaa2ec25afa
[I 2022-03-02 09:34:29,604] Trial 0 finished with value: 0.9918070204898458 and parameters: {'n_estimators': 617, 'criterion': 'mse', 'max_depth': 5, 'min_samples_split': 0.7099803605986117, 'features': 'log2', 'learning_rate': 0.049744854977002914}. Best is trial 0 with value: 0.9918070204898458.
[I 2022-03-02 09:34:30,958] Trial 1 finished with value: 0.9580176485061993 and parameters: {'n_estimators': 180, 'criterion': 'friedman_mse', 'max_depth': 5, 'min_samples_split': 0.9488702774633049, 'features': 'log2', 'learning_rate': 0.05064986015555556}. Best is trial 0 with value: 0.9918070204898458.
[I 2022-03-02 09:34:37,771] Trial 2 finished with value: 0.9910296644150366 and parameters: {'n_estimators': 950, 'criterion': 'friedman_mse', 'max_depth': 1, 'min_samples_split': 0.8762109752127044, 'features': 'sqrt', 'learning_ra

[I 2022-03-02 09:38:24,584] Trial 26 finished with value: 0.9978477650148266 and parameters: {'n_estimators': 310, 'criterion': 'friedman_mse', 'max_depth': 10, 'min_samples_split': 0.19399784478179977, 'features': 'sqrt', 'learning_rate': 0.08171679676322756}. Best is trial 15 with value: 0.998131348199804.
[I 2022-03-02 09:38:34,198] Trial 27 finished with value: 0.9937855063375206 and parameters: {'n_estimators': 927, 'criterion': 'mse', 'max_depth': 3, 'min_samples_split': 0.7035853885676934, 'features': 'log2', 'learning_rate': 0.05075301213282573}. Best is trial 15 with value: 0.998131348199804.
[I 2022-03-02 09:38:44,298] Trial 28 finished with value: 0.9924194013423194 and parameters: {'n_estimators': 857, 'criterion': 'friedman_mse', 'max_depth': 10, 'min_samples_split': 0.6469010815833495, 'features': 'sqrt', 'learning_rate': 0.03137339801022523}. Best is trial 15 with value: 0.998131348199804.
[I 2022-03-02 09:38:51,924] Trial 

[I 2022-03-02 09:42:17,463] Trial 52 finished with value: 0.874166622156659 and parameters: {'n_estimators': 101, 'criterion': 'mse', 'max_depth': 1, 'min_samples_split': 0.5037439442238232, 'features': 'log2', 'learning_rate': 0.0034683665266914687}. Best is trial 15 with value: 0.998131348199804.
[I 2022-03-02 09:42:36,843] Trial 53 finished with value: 0.9965451908505868 and parameters: {'n_estimators': 864, 'criterion': 'friedman_mse', 'max_depth': 5, 'min_samples_split': 0.21078949598096727, 'features': 'sqrt', 'learning_rate': 0.018817868415075645}. Best is trial 15 with value: 0.998131348199804.
[I 2022-03-02 09:42:40,248] Trial 54 finished with value: 0.9898584428055369 and parameters: {'n_estimators': 282, 'criterion': 'mse', 'max_depth': 11, 'min_samples_split': 0.6369023237600919, 'features': 'sqrt', 'learning_rate': 0.06753314411309945}. Best is trial 15 with value: 0.998131348199804.
[I 2022-03-02 09:42:41,348] Trial 55 finis

[I 2022-03-02 09:48:26,734] Trial 78 finished with value: 0.9937428535820566 and parameters: {'n_estimators': 782, 'criterion': 'friedman_mse', 'max_depth': 2, 'min_samples_split': 0.22194599622800337, 'features': 'sqrt', 'learning_rate': 0.03484940233508202}. Best is trial 67 with value: 0.9983726628883755.
[I 2022-03-02 09:48:31,707] Trial 79 finished with value: 0.9874128900725777 and parameters: {'n_estimators': 700, 'criterion': 'friedman_mse', 'max_depth': 1, 'min_samples_split': 0.2780274659341932, 'features': 'log2', 'learning_rate': 0.07136945245216535}. Best is trial 67 with value: 0.9983726628883755.
[I 2022-03-02 09:48:35,926] Trial 80 finished with value: 0.987178620133407 and parameters: {'n_estimators': 523, 'criterion': 'mse', 'max_depth': 10, 'min_samples_split': 0.9326278601221986, 'features': 'log2', 'learning_rate': 0.0728879253274146}. Best is trial 67 with value: 0.9983726628883755.
[I 2022-03-02 09:48:38,681] Trial 

In [22]:
study.best_params

{'n_estimators': 658,
 'criterion': 'friedman_mse',
 'max_depth': 5,
 'min_samples_split': 0.017209668061883378,
 'features': 'sqrt',
 'learning_rate': 0.050497537295197764}

In [24]:
study.best_value

0.9983726628883755

In [29]:
study.trials_dataframe().sort_values(by='value', ascending=False).head(10)

Unnamed: 0,number,value,datetime_start,datetime_complete,duration,params_criterion,params_features,params_learning_rate,params_max_depth,params_min_samples_split,params_n_estimators,state
67,67,0.998373,2022-03-02 09:44:08.940111,2022-03-02 09:44:26.918925,0 days 00:00:17.978814,friedman_mse,sqrt,0.050498,5,0.01721,658,COMPLETE
99,99,0.998345,2022-03-02 09:51:47.833389,2022-03-02 09:52:20.422881,0 days 00:00:32.589492,friedman_mse,sqrt,0.035233,8,0.101857,974,COMPLETE
84,84,0.998217,2022-03-02 09:48:45.561024,2022-03-02 09:49:15.310015,0 days 00:00:29.748991,friedman_mse,log2,0.044219,11,0.166233,899,COMPLETE
15,15,0.998131,2022-03-02 09:36:23.270543,2022-03-02 09:36:50.483685,0 days 00:00:27.213142,mse,sqrt,0.06289,8,0.180719,936,COMPLETE
89,89,0.998088,2022-03-02 09:49:53.635857,2022-03-02 09:50:12.836045,0 days 00:00:19.200188,mse,log2,0.073303,6,0.174288,744,COMPLETE
11,11,0.998061,2022-03-02 09:35:38.113061,2022-03-02 09:35:53.776718,0 days 00:00:15.663657,mse,sqrt,0.085086,11,0.253061,570,COMPLETE
25,25,0.99805,2022-03-02 09:37:59.976088,2022-03-02 09:38:15.355735,0 days 00:00:15.379647,mse,log2,0.047085,10,0.167838,503,COMPLETE
33,33,0.998048,2022-03-02 09:39:10.578324,2022-03-02 09:39:30.479177,0 days 00:00:19.900853,friedman_mse,sqrt,0.085175,7,0.23858,766,COMPLETE
63,63,0.997975,2022-03-02 09:43:28.278729,2022-03-02 09:43:48.314905,0 days 00:00:20.036176,friedman_mse,sqrt,0.022399,10,0.06254,510,COMPLETE
90,90,0.997938,2022-03-02 09:50:12.836737,2022-03-02 09:50:39.107006,0 days 00:00:26.270269,friedman_mse,log2,0.065874,10,0.27115,976,COMPLETE


## Conclusion

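Using Optuna's random sampler over 100 trials, we found the hyperparameters that gave the highest mean 3-fold cross-validated ROC AUC (0.9984) for the gradient boosting model. I will use these parameters when building the final model.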
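
One detail to watch when reusing the result: the study stored `max_features` under the name `features`, which is not a valid `GradientBoostingClassifier` argument. A minimal sketch of how the dictionary could be adapted is shown below. The values are copied from `study.best_params` above, while `final_params`, `final_model`, and `random_state=0` are my own illustrative choices.

```python
from sklearn.ensemble import GradientBoostingClassifier

# Best parameters as reported by study.best_params in this notebook.
best_params = {
    'n_estimators': 658,
    'criterion': 'friedman_mse',
    'max_depth': 5,
    'min_samples_split': 0.017209668061883378,
    'features': 'sqrt',
    'learning_rate': 0.050497537295197764,
}

# Rename the trial parameter "features" to the estimator argument "max_features".
final_params = dict(best_params)
final_params['max_features'] = final_params.pop('features')

# random_state is an added assumption to make the fitted model repeatable.
final_model = GradientBoostingClassifier(random_state=0, **final_params)
print(final_model.get_params()['max_features'])
```

The model can then be fitted on the resampled training data and evaluated on the held-out test set in the final notebook.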