# Building a neural network in Keras

In the lecture we discussed artificial neural networks (ANNs). Today we will use Keras to build a few different networks that reflect the concepts from that lecture.

Keras is a Python library that focuses on neural networks and especially shines for deep learning implementations. It is a very popular choice because it is easy to use while still offering many customization options. You'll remember from the lecture how many choices can be made about the structure and the tuning of a neural network; Keras helps us manage them. Its extensive documentation is available at https://keras.io/guides/. There are tutorials both for simple implementations like the ones we build in class and for more advanced topics, such as parallel processing and recurrent neural networks, which are used, for example, in Natural Language Processing (NLP), for those of you interested in diving deeper after the course.

This notebook constructs a simple neural network for our churn data. Remember that churn indicates whether a customer stops buying a product from a company.

In [1]:
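# Install TensorFlow first (this may take a few minutes); Keras is installed automatically as a TensorFlow dependency,
# so a separate, unpinned `pip install keras` is unnecessary and could pull in a Keras version that doesn't match TensorFlow
!pip install tensorflow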

Collecting tensorflow
  Downloading tensorflow-2.11.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (588.3 MB)
Collecting google-pasta>=0.1.1
  Downloading google_pasta-0.2.0-py3-none-any.whl (57 kB)
Collecting astunparse>=1.6.0
  Downloading astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting gast<=0.4.0,>=0.2.1
  Downloading gast-0.4.0-py3-none-any.whl (9.8 kB)
Collecting tensorboard<2.12,>=2.11
  Downloading tensorboard-2.11.0-py3-none-any.whl (6.0 MB)
Collecting grpcio<2.0,>=1.24.3
  Downloading grpcio-1.51.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (

## Dataset

We use the churn dataset:

In [5]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('churn_ibm.csv')
df.head()

|   | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


You already know the pre-processing steps from before:

In [6]:
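y = df['Churn']
X = df.drop(['Churn','customerID'],axis=1)

# One-hot encode every categorical (object) column; drop_first=True drops one redundant dummy per column
for column in X.columns:
    if X[column].dtype == object:
        X = pd.concat([X,pd.get_dummies(X[column], prefix=column, drop_first=True)],axis=1).drop([column],axis=1)

# Neural networks also work best with scaled input variables. In contrast to decision trees, which we
# used to analyse the churn data in the past, ANNs are sensitive to the scale of their inputs.

X = StandardScaler().fit_transform(X)
# Encode the target as a single 0/1 column (churn_Yes); dtype=int keeps it numeric,
# since newer pandas versions return boolean dummy columns by default
y = pd.get_dummies(y, prefix='churn', drop_first=True, dtype=int)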
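
Note that the cell above fits the scaler on all rows before splitting, so the test rows slightly influence the scaling statistics. A stricter variant estimates the mean and standard deviation on the training rows only and reuses them for the test rows. The sketch below shows this on a hypothetical toy matrix (`X_toy`, `X_tr`, `X_te` are illustrative names, not from the notebook), first by hand and then with scikit-learn's `StandardScaler`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature matrix: 100 rows, 3 numeric features
rng = np.random.default_rng(0)
X_toy = rng.normal(loc=50.0, scale=10.0, size=(100, 3))

# Split first (here simply the first 70 rows for training, the rest for testing)
X_tr, X_te = X_toy[:70], X_toy[70:]

# Estimate mean and standard deviation on the training rows only
# (StandardScaler uses the population standard deviation, ddof=0)
mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)

# Apply the training statistics to both sets
X_tr_scaled = (X_tr - mu) / sigma
X_te_scaled = (X_te - mu) / sigma

# The same thing with scikit-learn: fit on training data only, transform both sets
scaler = StandardScaler().fit(X_tr)
print(np.allclose(scaler.transform(X_tr), X_tr_scaled))  # True
print(np.allclose(scaler.transform(X_te), X_te_scaled))  # True
```

In the notebook this would mean calling `train_test_split` on the unscaled `X` first, then `scaler = StandardScaler().fit(X_train)` and `scaler.transform(...)` on both parts. With a dataset of this size the difference is likely small, but the stricter version keeps the test set truly unseen.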


## Build a 2-layer neural network

We'll now build a neural network with two hidden layers using Keras. Keras runs on top of TensorFlow, a large Python library for machine learning; as mentioned before, Keras itself focuses specifically on neural networks. As always, we start by splitting our data into a training set and a test set.

In [7]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
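
The split above is random, so results change slightly from run to run, and the churn share can differ between the two parts. A common variant passes `random_state` (for reproducibility) and `stratify` (to keep the class proportions equal in both parts). A small sketch on hypothetical toy data, where 30 of 100 customers churn:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 customers, 30 of whom churn (label 1)
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([1] * 30 + [0] * 70)

# stratify keeps the churn share equal in both parts;
# random_state makes the split reproducible across runs
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, stratify=y_toy, random_state=42
)

print(len(y_tr), len(y_te))      # 70 30
print(y_tr.mean(), y_te.mean())  # 0.3 0.3
```

For the churn data this would be `train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)`, where the seed value 42 is an arbitrary choice.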

We now import the Keras modules that we need. We will use Keras's 'Sequential' model, which lets us stack multiple hidden layers one after another very easily.

https://keras.io/guides/sequential_model/

Keras's sequential model needs to know the number of features (the input dimension) as well as the output dimension. Because our binary target is encoded as a single 0/1 column, the output dimension is 1 (number of classes minus 1). We'll store both as variables to make the model definition easier to read.
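
To see where these two numbers come from, here is a small sketch on hypothetical toy data: `pd.get_dummies` with `drop_first=True` turns a two-class target into a single indicator column, so the output dimension is 1, while the input dimension is simply the number of columns of the feature matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical binary target with two classes
churn = pd.Series(['No', 'Yes', 'No', 'Yes', 'No'])

# drop_first=True keeps k - 1 = 1 indicator column for k = 2 classes
y_toy = pd.get_dummies(churn, prefix='churn', drop_first=True, dtype=int)
print(list(y_toy.columns))  # ['churn_Yes']
print(y_toy.shape[1])       # output dimension: 1

# Hypothetical feature matrix with 5 rows and 4 features
X_toy = np.zeros((5, 4))
print(X_toy.shape[1])       # input dimension: 4
```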

In [12]:
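from keras.models import Sequential
# Import the layers from the public keras.layers module
# (the internal path keras.layers.core may not exist in newer Keras versions)
from keras.layers import Dense, Activation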

input_dim = X_train.shape[1]
output_dim = y_train.shape[1]

print('Input dimension: ', input_dim)
print('Output dimension: ', output_dim)

Input dimension:  30
Output dimension:  1


Now we create the model. Because it is a sequential model, we can add layers to it one after another:

In [13]:
# Create a model instance (Sequential, as we add each layer in order of appearance)
model = Sequential()

# Add the input layer and connect to 50 hidden neurons
model.add(Dense(50,input_dim=input_dim))
# 'Dense' refers to a fully connected layer: each node is connected to every node of the next layer
# Note that for this first layer we specify both the size of the input vector
# and the number of nodes we'd like in the first hidden layer

# Connect the neurons to the next 50 neurons; this is your second hidden layer
model.add(Dense(50))

# Connect the previous layer to the output layer; note we specify the output dimensions here
model.add(Dense(output_dim))

# Add a final sigmoid activation, which turns the output into a probability (since this is binary classification)
# Activation functions can also be added after each hidden layer (more on this later)
model.add(Activation('sigmoid'))

# Compile the model with an optimizer, a loss function, and an evaluation metric
model.compile(optimizer='sgd',loss='binary_crossentropy',metrics=['accuracy'])

# We'll change some of these settings later, but for now we use Stochastic Gradient Descent (SGD) as the optimizer,
# binary cross-entropy as the cost function (we are classifying, so cross-entropy is appropriate here;
# in a regression problem you would use, for example, mean squared error)
# and accuracy as the evaluation metric (other choices would be, for example, recall or
# precision)
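
The binary cross-entropy loss chosen above compares each predicted probability p with the true label y (0 or 1) via -[y log(p) + (1 - y) log(1 - p)] and averages this over the samples. The following is a plain NumPy sketch for illustration, not Keras's own implementation; the `eps` clipping imitates the small constant Keras uses to avoid log(0), and the value 1e-7 is an assumption here.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy; clipping p to [eps, 1 - eps] avoids log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return losses.mean()

# Confident, correct predictions give a small loss ...
print(binary_cross_entropy([1, 0], [0.9, 0.2]))  # about 0.164
# ... confident, wrong predictions give a large one
print(binary_cross_entropy([1, 0], [0.1, 0.8]))  # about 1.956
```

SGD adjusts the network's weights step by step to lower this average loss on the training data.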


Now we can summarise the model, train it on the training data, and obtain predictions for the test set:

In [10]:
# Print the structure of the model specified above
model.summary()
# Fit the model to the training data (without arguments Keras uses epochs=1 and batch_size=32)
model.fit(X_train,y_train)

# Test the model on our test data and obtain the results as predicted probabilities
prediction_prob = model.predict(X_test)
print(prediction_prob)

# Convert the probabilities into class labels (0/1) using a threshold of 0.5
classes=(prediction_prob > 0.5).astype("int32")
print(classes)


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 50)                1550      
                                                                 
 dense_1 (Dense)             (None, 50)                2550      
                                                                 
 dense_2 (Dense)             (None, 1)                 51        
                                                                 
 activation (Activation)     (None, 1)                 0         
                                                                 
Total params: 4,151
Trainable params: 4,151
Non-trainable params: 0
_________________________________________________________________
[[0.07582566]
 [0.271753  ]
 [0.3009887 ]
 ...
 [0.32117188]
 [0.32301703]
 [0.3397907 ]]
[[0]
 [0]
 [0]
 ...
 [0]
 [0]
 [0]]


The first part of the output shows the structure of the network: two dense hidden layers with 50 nodes each, an output layer of size 1, and a final activation layer that transforms the output into a probability. The remaining output shows the predicted probabilities for the test set and the corresponding 0/1 classes.
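
Because no activation function sits between the three Dense layers, this particular network is mathematically a single linear map followed by a sigmoid: `(x W1 + b1) W2 + b2 = x (W1 W2) + (b1 W2 + b2)`, and likewise for the third layer. It can therefore represent the same functions as a logistic regression on the 30 inputs. The NumPy sketch below checks this with random weights of the same shapes (hypothetical values, not the trained ones):

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, n_hidden = 30, 50

# Random weights with the same shapes as the three Dense layers above (small scale avoids saturation)
W1, b1 = rng.normal(scale=0.1, size=(n_features, n_hidden)), rng.normal(scale=0.1, size=n_hidden)
W2, b2 = rng.normal(scale=0.1, size=(n_hidden, n_hidden)), rng.normal(scale=0.1, size=n_hidden)
W3, b3 = rng.normal(scale=0.1, size=(n_hidden, 1)), rng.normal(scale=0.1, size=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(5, n_features))  # five hypothetical customers

# Forward pass: three Dense layers without activation, then a sigmoid
deep = sigmoid(((x @ W1 + b1) @ W2 + b2) @ W3 + b3)

# The same mapping collapsed into one linear layer (logistic regression)
W = W1 @ W2 @ W3
b = (b1 @ W2 + b2) @ W3 + b3
shallow = sigmoid(x @ W + b)

print(np.allclose(deep, shallow))  # True
```

This is why hidden layers are usually given a nonlinear activation, for example `Dense(50, activation='relu')`, which breaks this equivalence.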

The total number of parameters that the model has to estimate is 4,151.
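
Each Dense layer has one weight per input-output connection plus one bias per output neuron, and the Activation layer has no parameters. A short sketch reproduces the numbers from the summary:

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out

layers = [(30, 50), (50, 50), (50, 1)]  # (inputs, neurons) for each Dense layer above
counts = [dense_params(n_in, n_out) for n_in, n_out in layers]
print(counts)       # [1550, 2550, 51]
print(sum(counts))  # 4151
```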

We use the predicted classes and probabilities to calculate the final evaluation metrics:

In [11]:
print('Accuracy:', accuracy_score(y_test,classes))
print('AUC:',roc_auc_score(y_test,prediction_prob))

Accuracy: 0.7834123222748816
AUC: 0.8199310221717716
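
How good an accuracy of about 0.78 is depends on the class balance: a model that always predicts "no churn" already reaches an accuracy equal to the share of non-churners in the test set (in the notebook, `1 - y_test.to_numpy().mean()`). AUC, in contrast, is 0.5 for any uninformative constant score. A sketch with hypothetical labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical test labels: 1 = churn, 0 = no churn (3 of 12 churn)
y_true = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1])

# Baseline 1: always predict the majority class ("no churn")
majority = np.zeros_like(y_true)
print(accuracy_score(y_true, majority))  # 0.75

# Baseline 2: a constant score cannot rank customers, so AUC = 0.5
constant_scores = np.full(y_true.shape, 0.5)
print(roc_auc_score(y_true, constant_scores))  # 0.5
```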


That's already a reasonable first result, but we will learn how to improve it later on.
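
One setting worth knowing about here: `model.fit` was called without further arguments, so Keras used its defaults of `epochs=1` and `batch_size=32`, meaning the network saw each training example only once. The helper below (a hypothetical illustration, not a Keras function) counts how many SGD updates one epoch corresponds to; training for more epochs, for example `model.fit(X_train, y_train, epochs=20, batch_size=32)`, multiplies this number.

```python
import math

def steps_per_epoch(n_train, batch_size=32):
    """Number of mini-batch gradient updates in one pass over the training data."""
    return math.ceil(n_train / batch_size)

# Hypothetical training-set size, for illustration only
n_train = 5000
print(steps_per_epoch(n_train))       # 157 updates with the default batch size
print(steps_per_epoch(n_train) * 20)  # 3140 updates over 20 epochs
```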