# Week 6: Keras + TensorFlow for Multilayer Perceptrons (MLP)

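```python
from keras.models import Sequential
from keras.layers import Dense
import numpy as np

# initialise NumPy's random number generator by setting the seed value (reproducibility)
np.random.seed(7)

# load the Pima Indians diabetes dataset (comma-separated, no header row)
dataset = np.loadtxt('pima-indians-diabetes.data.csv', delimiter=',')
# split into input (X: first 8 columns) and output (Y: class label) variables
X = dataset[:, 0:8]
Y = dataset[:, 8]
```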

## Define Model with `Sequential()`

Models in Keras are defined as a sequence of layers. We create a `Sequential` model and add layers one at a time until we are happy with our network topology. The first thing to get right is to ensure the input layer has the right number of inputs. This is specified when creating the first layer with the `input_dim` argument, which we set to 8 for the 8 input variables.


**Fully connected layers** are defined using the `Dense` class. The first argument is the number of neurons in the layer, and the `activation` argument sets the activation function. We use the rectifier (`relu`) activation function on the first two layers and the sigmoid activation function on the output layer.

It used to be the case that `sigmoid` and `tanh` activation functions were preferred for all layers. These days, better performance is often seen with the `relu` activation function in the hidden layers. We use a `sigmoid` activation on the output layer to ensure the network output lies between 0 and 1, so it can be read as the probability of class 1 or snapped to a hard classification of either class with a default threshold of 0.5.

We piece it all together by adding each layer. The first hidden layer has 12 neurons and expects 8 input variables (i.e. `input_dim=8`). The second hidden layer has 8 neurons, and the output layer has 1 neuron to predict the class (onset of diabetes or not).
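To make the layer description concrete, here is a minimal NumPy sketch of what the 8-12-8-1 network computes for a batch of inputs. It assumes each `Dense` layer computes `activation(x @ W + b)`; the weights are random placeholders (scaled by `1/sqrt(fan_in)` so activations stay moderate), not trained values, and the helper names `relu`, `sigmoid` and `forward` are illustrative, not part of Keras.

```python
import numpy as np

def relu(z):
    # rectifier: element-wise max(0, z)
    return np.maximum(0.0, z)

def sigmoid(z):
    # squashes any real value into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(7)

# layer sizes matching the model: 8 inputs, 12 and 8 hidden neurons, 1 output
sizes = [8, 12, 8, 1]
# one (weights, bias) pair per Dense layer; random placeholders, not trained values
layer_params = [(rng.normal(scale=np.sqrt(1.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
                for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x):
    # hidden layers use ReLU, the output layer uses sigmoid
    (W1, b1), (W2, b2), (W3, b3) = layer_params
    h1 = relu(x @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return sigmoid(h2 @ W3 + b3)

x = rng.normal(size=(5, 8))          # a batch of 5 samples with 8 features each
probs = forward(x)                   # shape (5, 1), one probability per sample
labels = (probs > 0.5).astype(int)   # hard class predictions at threshold 0.5

# trainable parameters = weights + biases of every layer
n_params = sum(W.size + b.size for W, b in layer_params)
print(probs.shape, n_params)         # (5, 1) 221
```

The total of 221 (108 + 104 + 9) is the trainable-parameter count that `model.summary()` should report for the Keras model defined below.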

```python
# create model: 8 inputs -> 12 ReLU neurons -> 8 ReLU neurons -> 1 sigmoid output
model = Sequential()
model.add(Dense(12, input_dim=8, activation="relu"))
model.add(Dense(8, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
```


## Compile Model with `compile()`

We must specify the **loss function** used to evaluate a set of weights, the **optimizer** used to search through different weights for the network, and any optional metrics we would like to collect and report during training. In this case we use logarithmic loss, which for a binary classification problem is called `binary_crossentropy` in Keras. We also use `adam`, an efficient variant of stochastic gradient descent, for no other reason than that it is a sensible default.

* loss function = `binary_crossentropy`
* optimizer = `adam`
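As a worked example of the loss being minimized, the sketch below computes binary cross-entropy (log loss) by hand with NumPy: the mean over samples of `-(y*log(p) + (1-y)*log(1-p))`. The clipping constant `eps` is an assumption added here to avoid `log(0)`; Keras guards against this internally in its own way, so its reported values may differ very slightly.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # clip probabilities away from exactly 0 and 1 so the logs stay finite
    p = np.clip(y_prob, eps, 1.0 - eps)
    # per-sample log loss, then the mean over all samples
    losses = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return losses.mean()

y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_prob = np.array([0.9, 0.1, 0.6, 0.4])   # hypothetical sigmoid outputs

loss = binary_crossentropy(y_true, y_prob)
print(round(loss, 4))   # 0.3081
```

Confident, correct predictions (0.9 for a positive) cost about 0.105 each, while hesitant ones (0.6) cost about 0.511, roughly five times more; this is what pushes the optimizer toward confident, correct outputs.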

```python
# compile model: log loss, Adam optimizer, report accuracy during training
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```

## Fit Model with `fit()`

The training process runs for a fixed number of passes through the dataset, called epochs, which we specify with the `epochs` argument. We can also set the number of instances evaluated before each weight update, called the batch size, with the `batch_size` argument. For this problem we run for a small number of epochs (50) and use a relatively small batch size of 10.
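To see how `epochs` and `batch_size` interact, the sketch below counts the weight updates a single call to `fit()` performs on the 768 rows of this dataset. It assumes, as Keras does, that a final partial batch at the end of each epoch still triggers an update; the helper name `count_weight_updates` is illustrative only.

```python
import math

def count_weight_updates(n_samples, batch_size, epochs):
    # one update per mini-batch; a final partial batch still counts as one step
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

steps, total = count_weight_updates(n_samples=768, batch_size=10, epochs=50)
print(steps, total)   # 77 3850
```

Halving the batch size to 5 roughly doubles the number of updates to 154 per epoch, which usually means slower epochs but noisier, more frequent weight updates.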

```python
# fit the model: 50 passes over the data, one weight update every 10 samples
model.fit(X, Y, epochs=50, batch_size=10)
```


## Evaluate Model with `evaluate()`

You can evaluate your model on the training dataset by calling the model's `evaluate()` method with the same inputs and outputs used to train it. This generates a prediction for each input and output pair and collects scores, including the average loss and any metrics you configured, such as accuracy. Because the model has already seen these samples, this score is an optimistic estimate of how it will perform on new data; the data-splitting approaches below give a fairer estimate.
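For a sigmoid output, the accuracy figure can be reproduced by thresholding the predicted probabilities at 0.5 and comparing them with the labels. The NumPy sketch below does this on made-up values; in the notebook the probabilities would come from `model.predict(X)`. The helper `accuracy_from_probs` is hypothetical, not a Keras function.

```python
import numpy as np

def accuracy_from_probs(y_true, y_prob, threshold=0.5):
    # turn probabilities into hard 0/1 predictions
    y_pred = (y_prob > threshold).astype(y_true.dtype)
    # fraction of samples where the prediction matches the label
    return np.mean(y_pred == y_true)

y_true = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
y_prob = np.array([0.8, 0.3, 0.4, 0.7, 0.6])   # hypothetical model outputs

acc = accuracy_from_probs(y_true, y_prob)
print("acc: %.2f%%" % (acc * 100))   # acc: 60.00%
```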

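```python
# evaluate the model using the training dataset
scores = model.evaluate(X, Y)
# scores[0] is the loss; scores[1] is the first metric configured in compile() (accuracy)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1] * 100))
```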


```
acc: 76.04%
```


## Data Splitting

There are three common ways to hold back data for validation:
1. Automatic verification with the `validation_split` argument of `fit()`
2. Manual verification with scikit-learn's `train_test_split()`
3. k-fold verification with scikit-learn's `StratifiedKFold()`

**Automatic Verification Dataset**

Keras can set aside a portion of your training data as a validation dataset and evaluate the performance of your model on it after each epoch. You do this by setting the `validation_split` argument of `fit()` to a fraction of the size of your training dataset. For example, a reasonable value might be 0.2 or 0.33, holding back 20% or 33% of your training data for validation. Keras takes the validation samples from the end of the arrays, before any shuffling, so the order of the rows matters.
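The sketch below mimics how `validation_split` divides the arrays: the last fraction of rows, taken before shuffling, becomes the validation set. It assumes the split index is computed as `int(n * (1 - validation_split))`, which reproduces the 514/254 split Keras reports for this 768-row dataset. The helper `holdout_last_fraction` and the dummy arrays `X_demo`/`Y_demo` are illustrative and deliberately avoid overwriting the notebook's `X` and `Y`.

```python
import numpy as np

def holdout_last_fraction(X, Y, validation_split):
    # index where the validation rows start; everything after it is held out
    split_at = int(len(X) * (1.0 - validation_split))
    return (X[:split_at], Y[:split_at]), (X[split_at:], Y[split_at:])

# dummy stand-ins with the same shape as the Pima data: 768 rows, 8 features
X_demo = np.zeros((768, 8))
Y_demo = np.zeros(768)

(X_tr, Y_tr), (X_val, Y_val) = holdout_last_fraction(X_demo, Y_demo, validation_split=0.33)
print(len(X_tr), len(X_val))   # 514 254
```

Because the held-out rows are simply the last ones in the file, shuffling the data beforehand (or passing an explicit `validation_data`) matters whenever the rows are ordered in some way.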

```python
# automatic verification dataset
from keras.models import Sequential
from keras.layers import Dense
import numpy as np

np.random.seed(7)
dataset = np.loadtxt('pima-indians-diabetes.data.csv', delimiter=',')
X = dataset[:, 0:8]
Y = dataset[:, 8]

model = Sequential()
model.add(Dense(12, input_dim=8, activation="relu"))
model.add(Dense(8, activation="relu"))
model.add(Dense(1, activation="sigmoid"))

model.compile(loss='binary_crossentropy', optimizer="adam", metrics=["accuracy"])

# hold back the last 33% of the rows; validation scores are reported after every epoch
model.fit(X, Y, validation_split=0.33, epochs=50, batch_size=10)
```

```
Train on 514 samples, validate on 254 samples
```

**Manual Verification Dataset** 

Keras also allows you to specify the validation dataset manually by passing it to the `validation_data` argument of `fit()`. In this example we use the handy `train_test_split()` function from the Python scikit-learn machine learning library to split our data into a training and a test dataset.
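Unlike `validation_split`, `train_test_split()` shuffles the rows before splitting. The NumPy sketch below imitates that behaviour to show the idea; it is not scikit-learn's implementation, and its shuffled order will not match the one `random_state=7` produces. It assumes the test set size is `ceil(test_size * n)`, which gives the same 514/254 split as the training output that follows for 768 rows. `shuffled_holdout_split` and the dummy arrays are hypothetical.

```python
import math
import numpy as np

def shuffled_holdout_split(X, Y, test_size, seed):
    # shuffle row indices first, then slice off the test portion
    n = len(X)
    n_test = math.ceil(test_size * n)
    order = np.random.default_rng(seed).permutation(n)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], Y[train_idx], Y[test_idx]

X_demo = np.arange(768 * 8, dtype=float).reshape(768, 8)   # dummy data, 768 rows
Y_demo = (np.arange(768) % 3 == 0).astype(float)           # dummy 0/1 labels

X_tr, X_te, y_tr, y_te = shuffled_holdout_split(X_demo, Y_demo, test_size=0.33, seed=7)
print(len(X_tr), len(X_te))   # 514 254
```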

```python
# manual verification dataset
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
import numpy as np

np.random.seed(7)
dataset = np.loadtxt('pima-indians-diabetes.data.csv', delimiter=',')
X = dataset[:, 0:8]
Y = dataset[:, 8]

# shuffle and split: 67% for training, 33% for testing
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=7)

model = Sequential()
model.add(Dense(12, input_dim=8, activation="relu"))
model.add(Dense(8, activation="relu"))
model.add(Dense(1, activation="sigmoid"))

model.compile(loss='binary_crossentropy', optimizer="adam", metrics=["accuracy"])

# the held-out test set is evaluated after every epoch
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50, batch_size=10)
```

```
Train on 514 samples, validate on 254 samples
```

**Manual k-Fold Cross-Validation**

* **StratifiedKFold** attempts to preserve the proportion of instances of each class in every fold.
* Verbose output for each epoch is turned off by using `verbose=0`.
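To show what stratification means, the sketch below builds 10 test folds by shuffling the indices of each class separately and dealing them out evenly, so every fold keeps roughly the overall class ratio. It is a simplified stand-in for `StratifiedKFold`, not its actual algorithm; the helper `stratified_test_folds` is hypothetical, and the 500/268 class counts are an assumed class balance used only to build dummy labels.

```python
import numpy as np

def stratified_test_folds(y, n_splits, seed):
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_splits)]
    for cls in np.unique(y):
        # shuffle this class's indices, then split them into n_splits near-equal chunks
        idx = rng.permutation(np.flatnonzero(y == cls))
        for fold, chunk in zip(folds, np.array_split(idx, n_splits)):
            fold.extend(chunk.tolist())
    return [np.array(sorted(f)) for f in folds]

# dummy labels with an assumed 500/268 class balance (768 rows in total)
y_demo = np.array([0] * 500 + [1] * 268)
folds = stratified_test_folds(y_demo, n_splits=10, seed=7)

for i, test_idx in enumerate(folds):
    # each fold is the test set once; the remaining rows form the training set
    print(i, len(test_idx), int(y_demo[test_idx].sum()))
```

Every fold receives 50 negatives and 26 or 27 positives, so fold sizes are 77 or 76 and each fold's class ratio stays close to the overall one.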

```python
# StratifiedKFold verification
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import StratifiedKFold
import numpy as np

np.random.seed(7)
dataset = np.loadtxt('pima-indians-diabetes.data.csv', delimiter=',')
X = dataset[:, 0:8]
Y = dataset[:, 8]

# create a 10-fold stratified cross-validation splitter
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
cvscores = []

for train, test in kfold.split(X, Y):
    # build a fresh, untrained model for every fold
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation="relu"))
    model.add(Dense(8, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss='binary_crossentropy', optimizer="adam", metrics=["accuracy"])
    # train on 9 folds, evaluate on the held-out fold
    model.fit(X[train], Y[train], epochs=50, batch_size=10, verbose=0)
    scores = model.evaluate(X[test], Y[test], verbose=0)
    print("%s: %.2f%%" % (model.metrics_names[1], scores[1] * 100))
    cvscores.append(scores[1] * 100)

# mean accuracy across the folds, with the standard deviation as a spread estimate
print("%.2f%% (+/- %.2f%%)" % (np.mean(cvscores), np.std(cvscores)))
```

```
acc: 64.94%
acc: 67.53%
acc: 62.34%
acc: 77.92%
acc: 76.62%
acc: 72.73%
acc: 63.64%
acc: 59.74%
acc: 71.05%
acc: 61.84%
67.83% (+/- 6.09%)
```


## Use Keras with Scikit-Learn: `KerasClassifier` and `KerasRegressor`

The `KerasClassifier` and `KerasRegressor` classes in Keras take an argument `build_fn`, which is the function to call to create your model. You must define a function (named whatever you like) that defines your model, compiles it and returns it. In the example below we define a function `create_model()` that creates a simple multilayer neural network for the problem.

We pass this function to the `KerasClassifier` class through the `build_fn` argument. We also pass in the additional arguments `epochs=150` and `batch_size=10`. These are automatically bundled up and passed on to the `fit()` function, which is called internally by the `KerasClassifier` class. In this example we use the scikit-learn `StratifiedKFold` to perform 10-fold stratified cross-validation. This is a resampling technique that can provide a robust estimate of the performance of a machine learning model on unseen data. We use the scikit-learn function `cross_val_score()` to evaluate our model using the cross-validation scheme and print the results.
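The `build_fn` pattern can be pictured with the simplified wrapper below: it stores the build function and any extra keyword arguments, and each call to `fit()` builds a fresh model, passing the arguments named in the build function's signature to it and forwarding the rest to the model's own `fit()`. `TinyKerasWrapper`, `FakeModel` and `create_fake_model` are hypothetical classes written only to illustrate the idea; the real `KerasClassifier` additionally filters fit arguments against what `fit()` accepts, handles class labels and implements scikit-learn's estimator interface.

```python
import inspect

class TinyKerasWrapper:
    """Hypothetical, simplified stand-in for KerasClassifier's build_fn handling."""

    def __init__(self, build_fn, **sk_params):
        self.build_fn = build_fn
        self.sk_params = sk_params

    def fit(self, X, y):
        # arguments named in build_fn's signature go to build_fn ...
        build_names = inspect.signature(self.build_fn).parameters
        build_args = {k: v for k, v in self.sk_params.items() if k in build_names}
        # ... everything else (e.g. epochs, batch_size) is forwarded to model.fit()
        fit_args = {k: v for k, v in self.sk_params.items() if k not in build_names}
        self.model = self.build_fn(**build_args)   # a fresh model on every fit
        self.model.fit(X, y, **fit_args)
        return self

class FakeModel:
    """Records how fit() was called instead of training anything."""

    def __init__(self, optimizer):
        self.optimizer = optimizer
        self.fit_kwargs = None

    def fit(self, X, y, **kwargs):
        self.fit_kwargs = kwargs

def create_fake_model(optimizer="adam"):
    return FakeModel(optimizer)

wrapper = TinyKerasWrapper(build_fn=create_fake_model, epochs=150, batch_size=10, optimizer="rmsprop")
wrapper.fit(X=[[0.0]], y=[0])
print(wrapper.model.optimizer, wrapper.model.fit_kwargs)
# rmsprop {'epochs': 150, 'batch_size': 10}
```

Because a fresh model is built on every `fit()`, `cross_val_score()` trains an independent network for each fold rather than continuing to train one model.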


```python
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
import numpy as np

# create a function to build a model, which is required by KerasClassifier
def create_model():
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation="relu"))
    model.add(Dense(8, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss='binary_crossentropy', optimizer="adam", metrics=["accuracy"])
    return model

seed = 7
np.random.seed(seed)

# load data & split into input (X) and output (Y) variables
dataset = np.loadtxt('pima-indians-diabetes.data.csv', delimiter=',')
X = dataset[:, 0:8]
Y = dataset[:, 8]

# create model; epochs and batch_size are passed on to fit()
model = KerasClassifier(build_fn=create_model, epochs=150, batch_size=10, verbose=0)
# 10-fold stratified cross-validation
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
```

```
0.6939337005088856
```


## Grid Search Deep Learning Model Parameters: `GridSearchCV`

In this example we use a grid search to evaluate different configurations of our neural network model and report the combination that provides the best estimated performance. The `create_model()` function is defined to take two arguments, `optimizer` and `init`, both of which must have default values. This allows us to evaluate the effect of different optimization algorithms and weight initialization schemes for our network. After creating our model, we define arrays of values for the parameters we wish to search, specifically:

* **Optimizers** for searching different weight values.
* **Initializers** for preparing the network weights using different schemes.
* **Number of epochs** for training the model with different numbers of passes through the training dataset.
* **Batches** for varying the number of samples before weight updates.

The options are collected into a dictionary and passed to the `GridSearchCV` scikit-learn class. This class evaluates a version of our neural network model for each combination of parameters: 2 × 3 × 3 × 3 = 54 combinations of optimizers, initializations, epochs and batch sizes. Each combination is evaluated with 3-fold stratified cross-validation (the default in scikit-learn releases before 0.22; newer releases use 5 folds unless `cv=3` is passed), so the search trains 54 × 3 = 162 models plus a final refit of the best configuration, which can take a long time.
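The size of the search follows directly from the parameter grid. The sketch below enumerates the same grid with `itertools.product` (standing in for what `GridSearchCV` does internally) and counts the training runs, assuming 3-fold cross-validation plus one final refit of the best configuration on the full dataset (`GridSearchCV`'s default `refit=True`). The name `demo_grid` is illustrative.

```python
import itertools

# the same grid as in the grid-search code
demo_grid = {
    "optimizer": ["rmsprop", "adam"],
    "init": ["glorot_uniform", "normal", "uniform"],
    "epochs": [50, 100, 150],
    "batch_size": [5, 10, 20],
}

# every combination of one value per parameter
names = list(demo_grid)
combos = [dict(zip(names, values)) for values in itertools.product(*demo_grid.values())]

n_folds = 3                          # assumed number of cross-validation folds
n_fits = len(combos) * n_folds + 1   # +1 for the final refit on all the data
print(len(combos), n_fits)           # 54 163
```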

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
import numpy as np

# create a function to build a model, required for KerasClassifier;
# optimizer and init need defaults so the function can be called without arguments
def create_model(optimizer='rmsprop', init='glorot_uniform'):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, kernel_initializer=init, activation='relu'))
    model.add(Dense(8, kernel_initializer=init, activation='relu'))
    model.add(Dense(1, kernel_initializer=init, activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

# load data & split into input (X) and output (Y) variables
dataset = np.loadtxt("pima-indians-diabetes.data.csv", delimiter=",")
X = dataset[:, 0:8]
Y = dataset[:, 8]

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

# grid search epochs, batch size, optimizer and weight initializer
optimizers = ['rmsprop', 'adam']
inits = ['glorot_uniform', 'normal', 'uniform']
epochs = [50, 100, 150]
batches = [5, 10, 20]
param_grid = dict(optimizer=optimizers, epochs=epochs, batch_size=batches, init=inits)
# cv=3 pins 3-fold cross-validation regardless of the scikit-learn version's default
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)
grid_result = grid.fit(X, Y)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']

for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
```
