# Kaggle Playground - Season 4 Episode 7
## Binary Classification of Insurance Cross Selling

Competition link - https://www.kaggle.com/competitions/playground-series-s4e7

### Steps
- Import the necessary libraries, packages and modules
- Unzip the zipped files
- Read the datasets as data frames
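
The unzip step is not shown in the cells that follow, which read `train.csv` and `test.csv` directly. A minimal sketch of the unzip-and-read steps, assuming the data arrives as a single zip archive; the archive name in the usage comment and the helper `extract_and_read` are hypothetical:

```python
import zipfile
from pathlib import Path

import pandas as pd


def extract_and_read(zip_path, member, out_dir="."):
    """Extract one CSV member from a zip archive and load it as a DataFrame.

    `zip_path` and `member` depend on what was actually downloaded;
    the names in the usage comment below are assumptions.
    """
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as archive:
        # extract() returns the path of the file it wrote
        extracted = archive.extract(member, path=out_dir)
    return pd.read_csv(extracted)


# Hypothetical usage (archive name assumed):
# train_df = extract_and_read("playground-series-s4e7.zip", "train.csv")
# test_df = extract_and_read("playground-series-s4e7.zip", "test.csv")
```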

### Understand the problem

The objective of this competition is to predict which customers respond positively to an automobile insurance offer.

In [1]:
# Import the necessary libraries, packages and modules

import warnings
warnings.filterwarnings('ignore')

import dtale    # Web-based tool for analysing the data in depth
import keras_tuner as kt
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import pickle
import seaborn as sns
import statsmodels.api as sm
import tensorflow as tf
import zipfile

from keras import Sequential, Model
from keras.callbacks import EarlyStopping
from keras.layers import Dense, Dropout, BatchNormalization, Input, concatenate
from keras.metrics import AUC
from imblearn.over_sampling import RandomOverSampler
#from pandas_profiling import ProfileReport
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from tensorflow import keras
from xgboost import XGBClassifier

sns.set()
%matplotlib inline

In [2]:
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')

train_df.head()

Unnamed: 0,id,Gender,Age,Driving_License,Region_Code,Previously_Insured,Vehicle_Age,Vehicle_Damage,Annual_Premium,Policy_Sales_Channel,Vintage,Response
0,0,Male,21,1,35.0,0,1-2 Year,Yes,65101.0,124.0,187,0
1,1,Male,43,1,28.0,0,> 2 Years,Yes,58911.0,26.0,288,1
2,2,Female,25,1,14.0,1,< 1 Year,No,38043.0,152.0,254,0
3,3,Female,35,1,1.0,0,1-2 Year,Yes,2630.0,156.0,76,0
4,4,Female,36,1,15.0,1,1-2 Year,No,31951.0,152.0,294,0


In [3]:
test_df.head()

Unnamed: 0,id,Gender,Age,Driving_License,Region_Code,Previously_Insured,Vehicle_Age,Vehicle_Damage,Annual_Premium,Policy_Sales_Channel,Vintage
0,11504798,Female,20,1,47.0,0,< 1 Year,No,2630.0,160.0,228
1,11504799,Male,47,1,28.0,0,1-2 Year,Yes,37483.0,124.0,123
2,11504800,Male,47,1,43.0,0,1-2 Year,Yes,2630.0,26.0,271
3,11504801,Female,22,1,47.0,1,< 1 Year,No,24502.0,152.0,115
4,11504802,Male,51,1,19.0,0,1-2 Year,No,34115.0,124.0,148


### Checking for incorrect datatypes

- There are no incorrect datatypes 
- The column types in train and test are the same
- Below are the observations
     0.   id                   - int64      - insignificant
     1.   Gender               - object     - categorical - change to numeric
     2.   Age                  - int64      - categorical - numeric
     3.   Driving_License      - int64      - categorical - numeric
     4.   Region_Code          - float64    - categorical - numeric
     5.   Previously_Insured   - int64      - categorical - numeric
     6.   Vehicle_Age          - object     - categorical - change to numeric
     7.   Vehicle_Damage       - object     - categorical - change to numeric
     8.   Annual_Premium       - float64    - numeric
     9.   Policy_Sales_Channel - float64    - not sure
     10.  Vintage              - int64      - not sure
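
The claim that train and test share the same column types can also be checked programmatically rather than by comparing the two `info()` outputs by eye. A small sketch, using tiny stand-in frames; the helper name `dtype_mismatches` is hypothetical:

```python
import pandas as pd


def dtype_mismatches(train, test, target="Response"):
    """Return the columns whose dtype differs between train and test.

    Columns that exist only in train (such as the target) are ignored.
    """
    shared = [c for c in train.columns if c in test.columns and c != target]
    comparison = pd.DataFrame({
        "train": train[shared].dtypes,
        "test": test[shared].dtypes,
    })
    return comparison[comparison["train"] != comparison["test"]]


# Tiny stand-in frames; with the real data, pass train_df and test_df.
train_demo = pd.DataFrame({"id": [0, 1], "Gender": ["Male", "Female"], "Response": [0, 1]})
test_demo = pd.DataFrame({"id": [2, 3], "Gender": ["Female", "Male"]})
print(dtype_mismatches(train_demo, test_demo))  # empty frame when the dtypes agree
```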

In [4]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11504798 entries, 0 to 11504797
Data columns (total 12 columns):
 #   Column                Dtype  
---  ------                -----  
 0   id                    int64  
 1   Gender                object 
 2   Age                   int64  
 3   Driving_License       int64  
 4   Region_Code           float64
 5   Previously_Insured    int64  
 6   Vehicle_Age           object 
 7   Vehicle_Damage        object 
 8   Annual_Premium        float64
 9   Policy_Sales_Channel  float64
 10  Vintage               int64  
 11  Response              int64  
dtypes: float64(3), int64(6), object(3)
memory usage: 1.0+ GB


In [5]:
test_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7669866 entries, 0 to 7669865
Data columns (total 11 columns):
 #   Column                Dtype  
---  ------                -----  
 0   id                    int64  
 1   Gender                object 
 2   Age                   int64  
 3   Driving_License       int64  
 4   Region_Code           float64
 5   Previously_Insured    int64  
 6   Vehicle_Age           object 
 7   Vehicle_Damage        object 
 8   Annual_Premium        float64
 9   Policy_Sales_Channel  float64
 10  Vintage               int64  
dtypes: float64(3), int64(5), object(3)
memory usage: 643.7+ MB


In [6]:
column_names = train_df.columns.tolist()

for i in column_names:
    print(i, train_df[i].nunique(), 'unique values')

id 11504798 unique values
Gender 2 unique values
Age 66 unique values
Driving_License 2 unique values
Region_Code 54 unique values
Previously_Insured 2 unique values
Vehicle_Age 3 unique values
Vehicle_Damage 2 unique values
Annual_Premium 51728 unique values
Policy_Sales_Channel 152 unique values
Vintage 290 unique values
Response 2 unique values


### Encoding categorical variables

- Columns needing encoding
    - Gender - Label encoder
    - Vehicle_Age - Mapped encoder
    - Vehicle_Damage - Label encoder
- Once these three columns are encoded, every column is numeric and we can proceed with building the models
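
One caveat with label encoding through `cat.codes`: the codes are assigned per data frame from the sorted set of labels, so train and test only agree if both contain exactly the same labels. A sketch of an alternative that uses explicit mappings for all three columns and fails loudly on unseen labels; the helper `encode_with_mapping` is hypothetical, and the Gender and Vehicle_Damage codes are chosen to match the encoded outputs in this notebook:

```python
import pandas as pd

# Explicit label -> code tables. The Gender and Vehicle_Damage codes match
# what alphabetical category codes give (Female=0, Male=1; No=0, Yes=1).
MAPPINGS = {
    "Gender": {"Female": 0, "Male": 1},
    "Vehicle_Age": {"< 1 Year": 0, "1-2 Year": 1, "> 2 Years": 2},
    "Vehicle_Damage": {"No": 0, "Yes": 1},
}


def encode_with_mapping(df, mappings=MAPPINGS):
    """Return a copy of df with each mapped column replaced by integer codes."""
    encoded = df.copy()
    for column, mapping in mappings.items():
        codes = encoded[column].map(mapping)
        if codes.isna().any():
            # Labels missing from the mapping become NaN; report them instead
            unknown = encoded.loc[codes.isna(), column].unique().tolist()
            raise ValueError(f"Unmapped values in {column}: {unknown}")
        encoded[column] = codes.astype("int8")
    return encoded


# The first three training rows, used as a stand-in for train_df.
demo = pd.DataFrame({
    "Gender": ["Male", "Male", "Female"],
    "Vehicle_Age": ["1-2 Year", "> 2 Years", "< 1 Year"],
    "Vehicle_Damage": ["Yes", "Yes", "No"],
})
print(encode_with_mapping(demo))
```

Applying the same function to both frames guarantees that a given label always receives the same code.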

In [7]:
# Proceeding with encoding
# Label encoder on gender column

train_df['Gender'] = train_df['Gender'].astype('category')
train_df['Gender'] = train_df['Gender'].cat.codes

test_df['Gender'] = test_df['Gender'].astype('category')
test_df['Gender'] = test_df['Gender'].cat.codes

train_df.head()

Unnamed: 0,id,Gender,Age,Driving_License,Region_Code,Previously_Insured,Vehicle_Age,Vehicle_Damage,Annual_Premium,Policy_Sales_Channel,Vintage,Response
0,0,1,21,1,35.0,0,1-2 Year,Yes,65101.0,124.0,187,0
1,1,1,43,1,28.0,0,> 2 Years,Yes,58911.0,26.0,288,1
2,2,0,25,1,14.0,1,< 1 Year,No,38043.0,152.0,254,0
3,3,0,35,1,1.0,0,1-2 Year,Yes,2630.0,156.0,76,0
4,4,0,36,1,15.0,1,1-2 Year,No,31951.0,152.0,294,0


In [8]:
unique_veh_age = train_df['Vehicle_Age'].unique()
print(unique_veh_age)

['1-2 Year' '> 2 Years' '< 1 Year']


In [9]:
# Define the mapping for encoding

veh_age_mapping = {
    '< 1 Year': 0,
    '1-2 Year': 1,
    '> 2 Years': 2
}

# Encode the 'Vehicle_Age' column

train_df['Vehicle_Age'] = train_df['Vehicle_Age'].map(veh_age_mapping)
test_df['Vehicle_Age'] = test_df['Vehicle_Age'].map(veh_age_mapping)

train_df.head()

Unnamed: 0,id,Gender,Age,Driving_License,Region_Code,Previously_Insured,Vehicle_Age,Vehicle_Damage,Annual_Premium,Policy_Sales_Channel,Vintage,Response
0,0,1,21,1,35.0,0,1,Yes,65101.0,124.0,187,0
1,1,1,43,1,28.0,0,2,Yes,58911.0,26.0,288,1
2,2,0,25,1,14.0,1,0,No,38043.0,152.0,254,0
3,3,0,35,1,1.0,0,1,Yes,2630.0,156.0,76,0
4,4,0,36,1,15.0,1,1,No,31951.0,152.0,294,0


In [10]:
# Encoding 'Vehicle_Damage' column - using label encoding

train_df['Vehicle_Damage'] = train_df['Vehicle_Damage'].astype('category')
train_df['Vehicle_Damage'] = train_df['Vehicle_Damage'].cat.codes

test_df['Vehicle_Damage'] = test_df['Vehicle_Damage'].astype('category')
test_df['Vehicle_Damage'] = test_df['Vehicle_Damage'].cat.codes

train_df.head()

Unnamed: 0,id,Gender,Age,Driving_License,Region_Code,Previously_Insured,Vehicle_Age,Vehicle_Damage,Annual_Premium,Policy_Sales_Channel,Vintage,Response
0,0,1,21,1,35.0,0,1,1,65101.0,124.0,187,0
1,1,1,43,1,28.0,0,2,1,58911.0,26.0,288,1
2,2,0,25,1,14.0,1,0,0,38043.0,152.0,254,0
3,3,0,35,1,1.0,0,1,1,2630.0,156.0,76,0
4,4,0,36,1,15.0,1,1,0,31951.0,152.0,294,0


In [11]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11504798 entries, 0 to 11504797
Data columns (total 12 columns):
 #   Column                Dtype  
---  ------                -----  
 0   id                    int64  
 1   Gender                int8   
 2   Age                   int64  
 3   Driving_License       int64  
 4   Region_Code           float64
 5   Previously_Insured    int64  
 6   Vehicle_Age           int64  
 7   Vehicle_Damage        int8   
 8   Annual_Premium        float64
 9   Policy_Sales_Channel  float64
 10  Vintage               int64  
 11  Response              int64  
dtypes: float64(3), int64(7), int8(2)
memory usage: 899.7 MB


In [12]:
test_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7669866 entries, 0 to 7669865
Data columns (total 11 columns):
 #   Column                Dtype  
---  ------                -----  
 0   id                    int64  
 1   Gender                int8   
 2   Age                   int64  
 3   Driving_License       int64  
 4   Region_Code           float64
 5   Previously_Insured    int64  
 6   Vehicle_Age           int64  
 7   Vehicle_Damage        int8   
 8   Annual_Premium        float64
 9   Policy_Sales_Channel  float64
 10  Vintage               int64  
dtypes: float64(3), int64(6), int8(2)
memory usage: 541.3 MB


### Dropping the insignificant columns

- Since id is insignificant we can drop that column from both test and train.
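
Dropping `id` from `test_df` also discards the row identifiers that a submission file usually needs. A minimal sketch that sets them aside first, assuming the submission consists of an `id` column and a `Response` column (an assumption about the competition format, to be checked against the sample submission); the frame and the prediction values are placeholders:

```python
import pandas as pd

# Placeholder frame standing in for test_df.
test_demo = pd.DataFrame({"id": [11504798, 11504799], "Age": [20, 47]})

# pop() removes the column in place and returns it, so the ids survive.
test_ids = test_demo.pop("id")

# Placeholder scores; in practice these would come from a fitted model.
predictions = [0.1, 0.7]

submission = pd.DataFrame({"id": test_ids, "Response": predictions})
print(submission)
```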


In [13]:
train_df = train_df.drop(['id'], axis = 1)
test_df = test_df.drop(['id'], axis = 1)

train_df.head(2)

Unnamed: 0,Gender,Age,Driving_License,Region_Code,Previously_Insured,Vehicle_Age,Vehicle_Damage,Annual_Premium,Policy_Sales_Channel,Vintage,Response
0,1,21,1,35.0,0,1,1,65101.0,124.0,187,0
1,1,43,1,28.0,0,2,1,58911.0,26.0,288,1


In [14]:
test_df.head(2)

Unnamed: 0,Gender,Age,Driving_License,Region_Code,Previously_Insured,Vehicle_Age,Vehicle_Damage,Annual_Premium,Policy_Sales_Channel,Vintage
0,0,20,1,47.0,0,0,0,2630.0,160.0,228
1,1,47,1,28.0,0,1,1,37483.0,124.0,123


### Train/validation split of the train df

In [15]:
# Only the training set is labelled, so split it into train and validation sets

raw_train_df, validation_df = train_test_split(train_df, train_size = 0.75, random_state = 1, stratify = train_df['Response'])
raw_train_df.head(2)

Unnamed: 0,Gender,Age,Driving_License,Region_Code,Previously_Insured,Vehicle_Age,Vehicle_Damage,Annual_Premium,Policy_Sales_Channel,Vintage,Response
6400262,0,26,1,28.0,0,0,0,54497.0,26.0,234,0
8095698,0,25,1,30.0,1,0,0,38748.0,152.0,131,0


In [16]:
validation_df.head(2)

Unnamed: 0,Gender,Age,Driving_License,Region_Code,Previously_Insured,Vehicle_Age,Vehicle_Damage,Annual_Premium,Policy_Sales_Channel,Vintage,Response
6517611,1,44,1,28.0,0,1,1,2630.0,157.0,91,0
1591313,0,23,1,14.0,1,0,0,35345.0,152.0,272,0


In [17]:
raw_train_df.shape

(8628598, 11)

In [18]:
validation_df.shape

(2876200, 11)

In [19]:
raw_train_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 8628598 entries, 6400262 to 8402201
Data columns (total 11 columns):
 #   Column                Dtype  
---  ------                -----  
 0   Gender                int8   
 1   Age                   int64  
 2   Driving_License       int64  
 3   Region_Code           float64
 4   Previously_Insured    int64  
 5   Vehicle_Age           int64  
 6   Vehicle_Damage        int8   
 7   Annual_Premium        float64
 8   Policy_Sales_Channel  float64
 9   Vintage               int64  
 10  Response              int64  
dtypes: float64(3), int64(6), int8(2)
memory usage: 674.8 MB


In [20]:
validation_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2876200 entries, 6517611 to 326523
Data columns (total 11 columns):
 #   Column                Dtype  
---  ------                -----  
 0   Gender                int8   
 1   Age                   int64  
 2   Driving_License       int64  
 3   Region_Code           float64
 4   Previously_Insured    int64  
 5   Vehicle_Age           int64  
 6   Vehicle_Damage        int8   
 7   Annual_Premium        float64
 8   Policy_Sales_Channel  float64
 9   Vintage               int64  
 10  Response              int64  
dtypes: float64(3), int64(6), int8(2)
memory usage: 224.9 MB


In [21]:
raw_train_df.describe()

Unnamed: 0,Gender,Age,Driving_License,Region_Code,Previously_Insured,Vehicle_Age,Vehicle_Damage,Annual_Premium,Policy_Sales_Channel,Vintage,Response
count,8628598.0,8628598.0,8628598.0,8628598.0,8628598.0,8628598.0,8628598.0,8628598.0,8628598.0,8628598.0,8628598.0
mean,0.5412746,38.389,0.9980113,26.41771,0.4630153,0.6032037,0.5027108,30461.89,112.4161,163.8887,0.1229973
std,0.4982935,14.99678,0.04455088,12.99227,0.4986303,0.5678678,0.4999927,16444.75,54.03797,79.97808,0.3284341
min,0.0,20.0,0.0,0.0,0.0,0.0,0.0,2630.0,1.0,10.0,0.0
25%,0.0,24.0,1.0,15.0,0.0,0.0,0.0,25279.0,29.0,99.0,0.0
50%,1.0,36.0,1.0,28.0,0.0,1.0,1.0,31826.0,151.0,166.0,0.0
75%,1.0,49.0,1.0,35.0,1.0,1.0,1.0,39454.0,152.0,232.0,0.0
max,1.0,85.0,1.0,52.0,1.0,2.0,1.0,540165.0,163.0,299.0,1.0


In [22]:
validation_df.describe()

Unnamed: 0,Gender,Age,Driving_License,Region_Code,Previously_Insured,Vehicle_Age,Vehicle_Damage,Annual_Premium,Policy_Sales_Channel,Vintage,Response
count,2876200.0,2876200.0,2876200.0,2876200.0,2876200.0,2876200.0,2876200.0,2876200.0,2876200.0,2876200.0,2876200.0
mean,0.5415802,38.36725,0.998054,26.42163,0.4629403,0.6028183,0.5025867,30459.83,112.4533,163.9249,0.1229974
std,0.4982682,14.98347,0.04407022,12.98954,0.4986248,0.5678204,0.4999934,16484.7,54.02893,79.98389,0.3284342
min,0.0,20.0,0.0,0.0,0.0,0.0,0.0,2630.0,1.0,10.0,0.0
25%,0.0,24.0,1.0,15.0,0.0,0.0,0.0,25272.0,29.0,99.0,0.0
50%,1.0,36.0,1.0,28.0,0.0,1.0,1.0,31817.0,151.0,166.0,0.0
75%,1.0,49.0,1.0,35.0,1.0,1.0,1.0,39443.0,152.0,232.0,0.0
max,1.0,85.0,1.0,52.0,1.0,2.0,1.0,540165.0,163.0,299.0,1.0
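
The `Response` mean is 0.1229973 in the training split and 0.1229974 in the validation split. That near-identical positive rate is the effect of `stratify`, which preserves the class ratio in both parts. A small sketch on synthetic labels (the data is made up for illustration) shows the same behaviour:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 120 positives out of 1000 rows (12%).
demo = pd.DataFrame({
    "feature": np.arange(1000),
    "Response": np.r_[np.ones(120, dtype=int), np.zeros(880, dtype=int)],
})

train_part, val_part = train_test_split(
    demo, train_size=0.75, random_state=1, stratify=demo["Response"]
)

# With stratification both parts keep (approximately) the 12% positive rate.
print(train_part["Response"].mean(), val_part["Response"].mean())
```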


### Splitting dependent and independent variable

In [23]:
# Splitting dependent and independent variable

raw_x_train = raw_train_df.drop(['Response'], axis = 1)
raw_y_train = raw_train_df['Response']

raw_x_val = validation_df.drop(['Response'], axis = 1)
raw_y_val = validation_df['Response']

raw_x_train.head()

Unnamed: 0,Gender,Age,Driving_License,Region_Code,Previously_Insured,Vehicle_Age,Vehicle_Damage,Annual_Premium,Policy_Sales_Channel,Vintage
6400262,0,26,1,28.0,0,0,0,54497.0,26.0,234
8095698,0,25,1,30.0,1,0,0,38748.0,152.0,131
5898936,1,58,1,8.0,1,1,0,2630.0,26.0,142
3958879,0,54,1,28.0,0,1,1,46156.0,26.0,24
2335270,1,45,1,10.0,0,1,1,2630.0,124.0,257


### Standardisation of raw data 

In [24]:
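# Standardise the features (zero mean, unit variance per column)

ssc = StandardScaler()
scaled_x_train = pd.DataFrame(ssc.fit_transform(raw_x_train))
scaled_y_train = raw_y_train
# Note: fit_transform refits the scaler on the validation set, so the two splits are scaled with slightly different statistics
scaled_x_val = pd.DataFrame(ssc.fit_transform(raw_x_val))
scaled_y_val = raw_y_val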

scaled_x_train.head(2)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,-1.086257,-0.826111,0.04464,0.121787,-0.928574,-1.062226,-1.005436,1.461568,-1.599175,0.876632
1,-1.086257,-0.892792,0.04464,0.275725,1.07692,-1.062226,-1.005436,0.503876,0.732519,-0.411221


In [25]:
scaled_x_val.head(2)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,0.920026,0.375931,0.044156,0.121511,-0.928434,0.699485,0.99484,-1.688222,0.824496,-0.911745
1,-1.086925,-1.025614,0.044156,-0.95628,1.077082,-1.061636,-1.005187,0.296346,0.731953,1.351211
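
The scaling cell calls `fit_transform` on the validation features, which re-estimates the mean and standard deviation from the validation set, and wrapping the result in `pd.DataFrame` discards the column names. A common alternative, sketched here on made-up numbers, fits the scaler on the training features only and reuses it for every other split; the helper name `scale_like_train` is hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_like_train(scaler, frame):
    """Apply an already fitted scaler, keeping column names and row index."""
    return pd.DataFrame(
        scaler.transform(frame), columns=frame.columns, index=frame.index
    )


# Made-up stand-ins for raw_x_train and raw_x_val.
x_train_demo = pd.DataFrame({"Age": [20.0, 40.0, 60.0], "Vintage": [10.0, 100.0, 190.0]})
x_val_demo = pd.DataFrame({"Age": [40.0], "Vintage": [280.0]}, index=[7])

scaler = StandardScaler().fit(x_train_demo)  # statistics come from train only
x_train_scaled = scale_like_train(scaler, x_train_demo)
x_val_scaled = scale_like_train(scaler, x_val_demo)
print(x_val_scaled)
```

The same fitted scaler would also be applied to `test_df` before predicting, so that all three sets share one scale.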


In [26]:
raw_x_train.shape

(8628598, 10)

In [27]:
raw_inputs = raw_x_train.shape[1]
raw_inputs

10

In [28]:
scaled_inputs = scaled_x_train.shape[1]
early_stopping = EarlyStopping(monitor = 'val_loss', patience = 10, min_delta = 0.0001, verbose = 1)
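
To make the `patience` and `min_delta` settings concrete, here is a simplified pure-Python model of patience-based stopping on a loss. It is an illustration of the rule, not Keras' own implementation; the function name `early_stop_epoch` is hypothetical, and the loss values are the six `val_loss` figures from the `scaled_model` training log in this notebook:

```python
def early_stop_epoch(val_losses, patience, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None.

    An epoch counts as an improvement only if it beats the best loss
    seen so far by more than min_delta; training stops once `patience`
    consecutive epochs pass without such an improvement.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


val_losses = [0.2644, 0.2683, 0.2636, 0.2714, 0.2663, 0.3041]
print(early_stop_epoch(val_losses, patience=10, min_delta=0.0001))  # None
print(early_stop_epoch(val_losses, patience=3, min_delta=0.0001))   # 6
```

With `patience = 10` none of the six logged epochs triggers a stop. Keras' `EarlyStopping` also accepts `restore_best_weights=True`, which rolls the model back to the best epoch's weights when training is stopped early; by default the weights from the last epoch are kept.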

In [29]:
# Designing the Model
scaled_model = Sequential()

scaled_model.add(Dense(input_dim = scaled_inputs, activation = 'relu', units = 128))
scaled_model.add(BatchNormalization())
scaled_model.add(Dense(activation = 'relu', units = 128))
scaled_model.add(BatchNormalization())
scaled_model.add(Dense(activation = 'relu', units = 64))
scaled_model.add(BatchNormalization())
scaled_model.add(Dense(activation = 'relu', units = 64))
scaled_model.add(BatchNormalization())
scaled_model.add(Dense(activation = 'relu', units = 32))
scaled_model.add(BatchNormalization())
scaled_model.add(Dense(activation = 'relu', units = 32))
scaled_model.add(BatchNormalization())
scaled_model.add(Dense(activation = 'sigmoid', units = 1))

scaled_model.summary()
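
The `summary()` output is not included here, but the parameter count can be worked out from the layer definitions. A sketch of that arithmetic, assuming the default `BatchNormalization` settings (scale and centre both enabled); the helper `mlp_param_counts` is hypothetical:

```python
def mlp_param_counts(n_inputs, hidden_units, n_outputs=1):
    """Count parameters of a Dense + BatchNormalization stack.

    Each Dense layer has in*out weights plus out biases. Each
    BatchNormalization layer has 4 values per unit: gamma and beta
    (trainable) and the moving mean and variance (non-trainable).
    """
    trainable = 0
    non_trainable = 0
    previous = n_inputs
    for units in hidden_units:
        trainable += previous * units + units  # Dense weights and biases
        trainable += 2 * units                 # BatchNormalization gamma, beta
        non_trainable += 2 * units             # BatchNormalization moving stats
        previous = units
    trainable += previous * n_outputs + n_outputs  # output Dense layer
    return trainable, non_trainable


# 10 inputs and the hidden widths used by scaled_model.
print(mlp_param_counts(10, [128, 128, 64, 64, 32, 32]))  # (34401, 896)
```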

In [30]:
# Compiling the model

scaled_model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = [AUC (name = 'auroc')])

# Training the model

history_scaled = scaled_model.fit(scaled_x_train, scaled_y_train, 
                                  validation_data = (scaled_x_val, scaled_y_val), 
                                  epochs = 100, 
                                  callbacks = [early_stopping])

Epoch 1/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 668s 2ms/step - auroc: 0.8480 - loss: 0.2705 - val_auroc: 0.8580 - val_loss: 0.2644
Epoch 2/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 702s 3ms/step - auroc: 0.8562 - loss: 0.2653 - val_auroc: 0.8595 - val_loss: 0.2683
Epoch 3/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 705s 3ms/step - auroc: 0.8577 - loss: 0.2646 - val_auroc: 0.8602 - val_loss: 0.2636
Epoch 4/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 726s 3ms/step - auroc: 0.8591 - loss: 0.2638 - val_auroc: 0.8608 - val_loss: 0.2714
Epoch 5/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 767s 3ms/step - auroc: 0.8596 - loss: 0.2635 - val_auroc: 0.8613 - val_loss: 0.2663
Epoch 6/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 714s 3ms/step - auroc: 0.8598 - loss: 0.2634 - val_auroc: 0.8610 - val_loss: 0.3041
Epoch 7/10
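
The 269644 steps per epoch in the progress bar follow from the training set size and the Keras default batch size of 32, since `fit` was called without a `batch_size` argument. The arithmetic, with a larger batch size shown purely for comparison (1024 is an illustrative value, not a recommendation from this notebook):

```python
import math

n_train_rows = 8_628_598   # raw_x_train.shape[0]
default_batch_size = 32    # Keras default when batch_size is not passed to fit()

steps = math.ceil(n_train_rows / default_batch_size)
print(steps)  # 269644, matching the progress bar

# Hypothetical larger batch size, for comparison only.
print(math.ceil(n_train_rows / 1024))  # 8427
```

Fewer steps per epoch usually means shorter epochs, though a different batch size can change how the optimiser behaves and may need retuning.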

In [31]:
# Designing the Model
scaled_model2 = Sequential()

scaled_model2.add(Dense(input_dim = scaled_inputs, activation = 'relu', units = 128))
scaled_model2.add(BatchNormalization())
scaled_model2.add(Dense(activation = 'relu', units = 128))
scaled_model2.add(BatchNormalization())
scaled_model2.add(Dense(activation = 'relu', units = 128))
scaled_model2.add(BatchNormalization())
scaled_model2.add(Dense(activation = 'relu', units = 64))
scaled_model2.add(BatchNormalization())
scaled_model2.add(Dense(activation = 'relu', units = 64))
scaled_model2.add(BatchNormalization())
scaled_model2.add(Dense(activation = 'relu', units = 64))
scaled_model2.add(BatchNormalization())
scaled_model2.add(Dense(activation = 'relu', units = 32))
scaled_model2.add(BatchNormalization())
scaled_model2.add(Dense(activation = 'relu', units = 32))
scaled_model2.add(BatchNormalization())
scaled_model2.add(Dense(activation = 'relu', units = 32))
scaled_model2.add(BatchNormalization())
scaled_model2.add(Dense(activation = 'sigmoid', units = 1))

scaled_model2.summary()

In [32]:
# Compiling the model

scaled_model2.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = [AUC (name = 'auroc')])

# Training the model

history_scaled2 = scaled_model2.fit(scaled_x_train, scaled_y_train, 
                                    validation_data = (scaled_x_val, scaled_y_val), 
                                    epochs = 100, 
                                    callbacks = [early_stopping])

Epoch 1/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 973s 4ms/step - auroc: 0.8469 - loss: 0.2713 - val_auroc: 0.8580 - val_loss: 0.2645
Epoch 2/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 981s 4ms/step - auroc: 0.8559 - loss: 0.2657 - val_auroc: 0.8591 - val_loss: 0.2640
Epoch 3/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 960s 4ms/step - auroc: 0.8577 - loss: 0.2645 - val_auroc: 0.8604 - val_loss: 0.2647
Epoch 4/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 970s 4ms/step - auroc: 0.8584 - loss: 0.2643 - val_auroc: 0.8601 - val_loss: 0.2641
Epoch 5/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 950s 4ms/step - auroc: 0.8590 - loss: 0.2641 - val_auroc: 0.8595 - val_loss: 0.2761
Epoch 6/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 973s 4ms/step - auroc: 0.8588 - loss: 0.2640 - val_auroc: 0.8604 - val_loss: 0.2639
Epoch 7/10

In [33]:
# Designing the Model
scaled_model3 = Sequential()

scaled_model3.add(Dense(input_dim = scaled_inputs, activation = 'relu', units = 128))
scaled_model3.add(BatchNormalization())
scaled_model3.add(Dense(activation = 'relu', units = 128))
scaled_model3.add(BatchNormalization())
scaled_model3.add(Dense(activation = 'relu', units = 64))
scaled_model3.add(BatchNormalization())
scaled_model3.add(Dense(activation = 'relu', units = 32))
scaled_model3.add(BatchNormalization())
scaled_model3.add(Dense(activation = 'sigmoid', units = 1))

scaled_model3.summary()

In [34]:
# Compiling the model

scaled_model3.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = [AUC (name = 'auroc')])

# Training the model

history_scaled3 = scaled_model3.fit(scaled_x_train, scaled_y_train, 
                                    validation_data = (scaled_x_val, scaled_y_val), 
                                    epochs = 100, 
                                    callbacks = [early_stopping])

Epoch 1/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 602s 2ms/step - auroc: 0.8496 - loss: 0.2701 - val_auroc: 0.8590 - val_loss: 0.2640
Epoch 2/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 592s 2ms/step - auroc: 0.8572 - loss: 0.2649 - val_auroc: 0.8597 - val_loss: 0.2652
Epoch 3/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 610s 2ms/step - auroc: 0.8586 - loss: 0.2642 - val_auroc: 0.8606 - val_loss: 0.2632
Epoch 4/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 612s 2ms/step - auroc: 0.8591 - loss: 0.2637 - val_auroc: 0.8613 - val_loss: 0.2633
Epoch 5/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 603s 2ms/step - auroc: 0.8600 - loss: 0.2631 - val_auroc: 0.8615 - val_loss: 0.2630
Epoch 6/100
269644/269644 ━━━━━━━━━━━━━━━━━━━━ 628s 2ms/step - auroc: 0.8601 - loss: 0.2634 - val_auroc: 0.8622 - val_loss: 0.2625
Epoch 7/10
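
Keras' `fit` returns a `History` object whose `.history` attribute is a dict of per-epoch metric lists, so the three runs can be compared by their best validation AUC. A sketch using the `val_auroc` values copied from the logs above; only the six completed epochs of each run are available because the output is cut off at epoch 7, and the helper `best_epoch` is hypothetical:

```python
def best_epoch(history, metric="val_auroc", mode="max"):
    """Return (1-based epoch, value) of the best entry for a metric.

    Ties are resolved in favour of the earliest epoch.
    """
    values = history[metric]
    pick = max if mode == "max" else min
    index = pick(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# Stand-ins for history_scaled.history and friends, transcribed from the logs.
logged = {
    "scaled_model": {"val_auroc": [0.8580, 0.8595, 0.8602, 0.8608, 0.8613, 0.8610]},
    "scaled_model2": {"val_auroc": [0.8580, 0.8591, 0.8604, 0.8601, 0.8595, 0.8604]},
    "scaled_model3": {"val_auroc": [0.8590, 0.8597, 0.8606, 0.8613, 0.8615, 0.8622]},
}

for name, history in logged.items():
    print(name, best_epoch(history))
```

On these six epochs the smallest network, `scaled_model3`, has the highest validation AUC.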