# Experiments with Keras functional API on MNIST

This notebook will guide you through the use of the `keras` functional API. You are going to use the `mnist` dataset of handwritten digits (LeCun et al., 1998).

We assume you are using TF 2. If you need to install a package, use `pip install ...`, e.g. `pip install scikit-learn` for scikit-learn.

## Loading the packages

In [None]:
# First, import TF and get its version.
import tensorflow as tf
tf_version = tf.__version__

# Check that TensorFlow 2.x is used
if not tf_version.startswith('2.'):
    print('WARNING: this course uses TensorFlow >= 2.0.0.\nYour version is {}.'.format(tf_version))
else:
    print('OK: TensorFlow >= 2.0.0.')

In [None]:
import numpy as np
from matplotlib import pyplot as pl

from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras import utils
from sklearn import metrics as me

## Loading the raw data
First, load the `mnist` dataset, flatten each 28x28 image into a vector of 784 values and normalize the pixel values to the range [0, 1].

In [None]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255.0
X_test /= 255.0
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
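The same preprocessing can be tried on a small NumPy example. This sketch uses a made-up batch of two 28x28 `uint8` images with random pixels (not real MNIST data). It also shows why the cast to `float32` comes before the division: an in-place `/=` on a `uint8` array raises an error, because the floating-point result cannot be stored back into an integer array.

```python
import numpy as np

# Two fake 28x28 grayscale images with uint8 pixels in [0, 255], like MNIST
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)

# Flatten each image into a 784-vector, cast to float32, then scale to [0, 1]
flat = images.reshape(images.shape[0], 28 * 28).astype('float32')
flat /= 255.0
print(flat.shape, flat.dtype)   # (2, 784) float32

# Dividing in place before the cast fails: the float result of the
# division cannot be stored back into a uint8 array
cast_error = None
raw = images.reshape(2, 784).copy()
try:
    raw /= 255.0
except TypeError as err:
    cast_error = err
print('in-place division on uint8 raised an error:', cast_error is not None)
```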

The targets of the network must be one-hot vectors. Currently, `y_train` is an array of scalar labels such as `[5 0 4 1 ...]`, and it has to be converted into a one-hot array `Y_train` such as:

`[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]...]`
 
Note the capital letter in `Y_train`: by convention, it denotes an array with multiple dimensions (here a 2-D array of shape (60000, 10)).
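For intuition, one-hot encoding amounts to selecting rows of an identity matrix. The NumPy sketch below, which assumes integer labels in `0..n_classes-1`, reproduces the one-hot rows for the first four labels shown above; it illustrates the encoding and is not how `utils.to_categorical` is implemented internally.

```python
import numpy as np

n_classes = 10
y = np.array([5, 0, 4, 1])  # first four MNIST training labels

# Row k of the identity matrix is the one-hot vector for class k
Y = np.eye(n_classes, dtype='float32')[y]
print(Y)

# Going back from one-hot vectors to integer labels
print(np.argmax(Y, axis=1))  # [5 0 4 1]
```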

In [None]:
n_classes = 10
Y_train = utils.to_categorical(y_train, n_classes)
Y_test = utils.to_categorical(y_test, n_classes)
print(Y_train[:10])

## MLP
Here is an example of a multi-layer perceptron (MLP), first defined with the Sequential API and then with the functional API. In both cases we must tell Keras the size of the inputs, in our case a flattened vector of size D = 784.
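Both models should report the same number of trainable parameters in `summary()`. A `Dense` layer mapping `n_in` inputs to `n_out` units has `n_in * n_out` weights plus `n_out` biases. The plain-Python sketch below (the helper name `dense_params` is ours) works out the totals for D = 784, H = 300 and 10 classes, which you can compare with the summaries.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected (Dense) layer."""
    return n_in * n_out + n_out

D, H, n_classes = 784, 300, 10
hidden = dense_params(D, H)           # 784*300 + 300 = 235500
out = dense_params(H, n_classes)      # 300*10 + 10 = 3010
print(hidden, out, hidden + out)      # 235500 3010 238510
```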

In [None]:
from tensorflow.keras.utils import plot_model
H = 300               # number of hidden neurons
D = X_train.shape[1]  # dimension of the input: 784 for MNIST

# Keras Sequential API: layers are stacked with add()
model1 = Sequential()
model1.add(Dense(H, input_shape=(D,), activation='relu'))
model1.add(Dense(n_classes, activation='softmax'))
model1.summary()

# Keras functional API: the input is declared explicitly and each layer
# is called on the output tensor of the previous one
visible = Input(shape=(D,))
hidden1 = Dense(H, activation='relu')(visible)
output = Dense(n_classes, activation='softmax')(hidden1)
model2 = Model(inputs=visible, outputs=output)
model2.summary()
# plot_model requires the pydot package and Graphviz to be installed
plot_model(model2, to_file='multilayer_perceptron_graph.png')

In [None]:
B = 128   # batch size
E = 10    # number of epochs
model2.compile(loss='categorical_crossentropy', optimizer='rmsprop',
               metrics=['accuracy'])
# For simplicity, the test set doubles as validation data here
log = model2.fit(X_train, Y_train, batch_size=B, epochs=E,
                 verbose=1, validation_data=(X_test, Y_test))
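During `fit`, the 60000 training samples are split into mini-batches of `B` samples, so each epoch takes `ceil(60000 / B)` steps (the last batch is smaller), which is the step count shown in the progress bar. A quick check of that arithmetic for B = 128 and E = 10:

```python
import math

n_train, B, E = 60000, 128, 10
steps_per_epoch = math.ceil(n_train / B)           # 469
last_batch = n_train - (steps_per_epoch - 1) * B   # size of the final, smaller batch
total_updates = steps_per_epoch * E                # weight updates over the whole run
print(steps_per_epoch, last_batch, total_updates)  # 469 96 4690
```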

In [None]:
pl.plot(log.history['loss'], label='Training')
pl.plot(log.history['val_loss'], label='Testing')
pl.xlabel('Epoch')
pl.ylabel('Categorical cross-entropy loss')
pl.legend()
pl.grid()

In [None]:
loss_test, metric_test = model2.evaluate(X_test, Y_test)
print('Test loss:', loss_test)
print('Test accuracy:', metric_test)
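`evaluate` reports a single accuracy value. To see which digits get confused, predicted probabilities can be turned into labels with `argmax` and collected in a confusion matrix. So that it runs on its own, the sketch below uses a tiny invented probability array as a stand-in for `model2.predict(X_test)`; with the real model you would use `y_test` as the true labels. `me.confusion_matrix(y_true, y_pred, labels=range(10))` from the already imported `sklearn.metrics` should give the same table.

```python
import numpy as np

n_classes = 10
# Invented softmax outputs for 4 samples (stand-in for model2.predict(X_test))
probs = np.full((4, n_classes), 0.01)
probs[0, 7] = probs[1, 2] = probs[2, 1] = probs[3, 5] = 0.91
y_true = np.array([7, 2, 1, 3])       # the last sample is actually a 3

y_pred = np.argmax(probs, axis=1)     # most probable class per sample
accuracy = np.mean(y_pred == y_true)
print('accuracy:', accuracy)          # 0.75

# confusion[i, j] = number of samples of true class i predicted as class j
confusion = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(confusion, (y_true, y_pred), 1)
print(confusion[3, 5])                # 1: one '3' misread as a '5'
```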

## Convolutional neural network - CNN
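With Keras's default `padding='valid'`, a stride-1 `Conv2D` with kernel size k shrinks each spatial side from n to n - k + 1, and `MaxPooling2D(pool_size=(2, 2))` (whose stride defaults to the pool size) halves it, rounding down. The plain-Python sketch below (helper names are ours) traces the 28x28 input through the two conv/pool blocks of the CNN defined below, giving the length of the vector produced by `Flatten`; you can compare it with the output shapes printed by `summary()`.

```python
def conv_valid(n, k):
    """Side length after a stride-1 convolution with padding='valid'."""
    return n - k + 1

def pool2(n):
    """Side length after 2x2 max pooling with stride 2 (floor division)."""
    return n // 2

n = 28
n = pool2(conv_valid(n, 3))   # 28 -> 26 -> 13
n = pool2(conv_valid(n, 3))   # 13 -> 11 -> 5
filters = 32
flat_len = n * n * filters    # input size of the Dense(100) layer
print(n, flat_len)            # 5 800
```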

In [None]:
# Reshape each flat 784-vector back into a 28x28 image with 1 channel, as Conv2D expects
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1).astype('float32')
print(X_train.shape)
print(X_test.shape)

In [None]:
# CNN - Keras functional API
from tensorflow.keras.utils import plot_model
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D

visible = Input(shape=(28,28,1))
conv1 = Conv2D(32, kernel_size=3, activation='relu')(visible)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
conv2 = Conv2D(32, kernel_size=3, activation='relu')(pool1)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
flat = Flatten()(pool2)
hidden1 = Dense(100, activation='relu')(flat)
output = Dense(10, activation='softmax')(hidden1)
model3 = Model(inputs=visible, outputs=output)
# summarize layers
model3.summary()
# plot graph
plot_model(model3, to_file='convolutional_neural_network.png')

In [None]:
B = 128
E = 10
model3.compile(loss='categorical_crossentropy', optimizer='rmsprop', 
              metrics=['accuracy'])
log = model3.fit(X_train, Y_train, batch_size=B, epochs=E,
                 verbose=1, validation_data=(X_test, Y_test))

In [None]:
pl.plot(log.history['loss'], label='Training')
pl.plot(log.history['val_loss'], label='Testing')
pl.xlabel('Epoch')
pl.ylabel('Categorical cross-entropy loss')
pl.legend()
pl.grid()

In [None]:
loss_test, metric_test = model3.evaluate(X_test, Y_test)
print('Test loss:', loss_test)
print('Test accuracy:', metric_test)

## Convolutional neural network - CNN with multiple paths and a shared input layer
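Both branches read the same `visible` input but use different kernel sizes (3 and 6), so they produce feature maps of different sizes. After `Flatten`, `concatenate` simply joins the two vectors end to end along the last axis. The sketch below computes the branch lengths with the valid-convolution arithmetic (n - k + 1, then floor halving for the 2x2 pool; `Dropout` does not change shapes) and mimics the merge with `np.concatenate` on zero arrays of those shapes. The helper `branch_len` and the batch size of 2 are our own choices for illustration.

```python
import numpy as np

def branch_len(n, k, filters=32):
    """Flattened length after Conv2D(kernel k, padding='valid') + 2x2 max pooling."""
    side = (n - k + 1) // 2
    return side * side * filters

len1 = branch_len(28, 3)   # 28 -> 26 -> 13: 13*13*32
len2 = branch_len(28, 6)   # 28 -> 23 -> 11: 11*11*32
print(len1, len2)          # 5408 3872

# concatenate() joins the flattened branch outputs along the feature axis
batch = 2
flat1 = np.zeros((batch, len1), dtype='float32')
flat2 = np.zeros((batch, len2), dtype='float32')
merged = np.concatenate([flat1, flat2], axis=-1)
print(merged.shape)        # (2, 9280)
```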

In [None]:
# Shared Input Layer
from tensorflow.keras.utils import plot_model
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.layers import Flatten, Conv2D
from tensorflow.keras.layers import MaxPooling2D, concatenate

# input layer
visible = Input(shape=(28,28,1))
# first feature extractor
conv1 = Conv2D(32, kernel_size=3, activation='relu')(visible)
drop1 = Dropout(0.2)(conv1)
pool1 = MaxPooling2D(pool_size=(2, 2))(drop1)
flat1 = Flatten()(pool1)
# second feature extractor
conv2 = Conv2D(32, kernel_size=6, activation='relu')(visible)
drop2 = Dropout(0.2)(conv2)
pool2 = MaxPooling2D(pool_size=(2, 2))(drop2)
flat2 = Flatten()(pool2)
# merge feature extractors
merge = concatenate([flat1, flat2])
# interpretation layer
hidden1 = Dense(100, activation='relu')(merge)
# prediction output
output = Dense(10, activation='softmax')(hidden1)
model4 = Model(inputs=visible, outputs=output)
# summarize layers
model4.summary()
# plot graph
plot_model(model4, to_file='shared_input_layer.png')

In [None]:
B = 128
E = 10
model4.compile(loss='categorical_crossentropy', optimizer='rmsprop', 
              metrics=['accuracy'])
log = model4.fit(X_train, Y_train, batch_size=B, epochs=E,
                 verbose=1, validation_data=(X_test, Y_test))

In [None]:
loss_test, metric_test = model4.evaluate(X_test, Y_test)
print('Test loss:', loss_test)
print('Test accuracy:', metric_test)

In [None]:
pl.plot(log.history['accuracy'], label='Training')
pl.plot(log.history['val_accuracy'], label='Testing')
pl.xlabel('Epoch')
pl.ylabel('Accuracy')
pl.legend()
pl.grid()

## Convolutional neural network - CNN with multiple paths and multi-level features
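Unlike the previous model, the three feature extractors here are stacked: each conv block reads the pooled output of the block before it, and the flattened output of every block is kept. The merged vector therefore combines features from three depths of the network. Tracing the sizes with the same valid-convolution and 2x2 pooling arithmetic (a plain-Python sketch):

```python
filters = 32
side = 28
lengths = []
for block in range(3):
    side = (side - 3 + 1) // 2      # Conv2D(k=3, 'valid') then 2x2 max pooling
    lengths.append(side * side * filters)
    print('block', block + 1, 'output side', side, 'flattened', lengths[-1])

print('merged length:', sum(lengths))  # 5408 + 800 + 32 = 6240
```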

In [None]:
# Stacked feature extractors; the flattened output of each block is kept

# input layer
visible = Input(shape=(28,28,1))
# first feature extractor
conv1 = Conv2D(32, kernel_size=3, activation='relu')(visible)
drop1 = Dropout(0.2)(conv1)
pool1 = MaxPooling2D(pool_size=(2, 2))(drop1)
flat1 = Flatten()(pool1)
# second feature extractor, stacked on the first (reads pool1)
conv2 = Conv2D(32, kernel_size=3, activation='relu')(pool1)
drop2 = Dropout(0.2)(conv2)
pool2 = MaxPooling2D(pool_size=(2, 2))(drop2)
flat2 = Flatten()(pool2)
# third feature extractor, stacked on the second (reads pool2)
conv3 = Conv2D(32, kernel_size=3, activation='relu')(pool2)
drop3 = Dropout(0.2)(conv3)
pool3 = MaxPooling2D(pool_size=(2, 2))(drop3)
flat3 = Flatten()(pool3)
# merge feature extractors
merge = concatenate([flat1, flat2, flat3])
# interpretation layer
hidden1 = Dense(100, activation='relu')(merge)
# prediction output
output = Dense(10, activation='softmax')(hidden1)
model5 = Model(inputs=visible, outputs=output)
# summarize layers
model5.summary()
# plot graph
plot_model(model5, to_file='shared_input_layer_multi_feat.png')
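With `save_best_only=True`, `ModelCheckpoint` writes a checkpoint only at the end of epochs where the monitored `val_accuracy` beats the best value seen so far, so in general not every epoch produces a file, and the last epoch is not guaranteed to. The plain-Python sketch below mimics that rule on a list of invented validation accuracies (not results from this notebook).

```python
# Invented validation accuracies for 10 epochs (for illustration only)
val_acc = [0.970, 0.978, 0.981, 0.980, 0.985, 0.984, 0.987, 0.986, 0.987, 0.986]

best = float('-inf')
saved_epochs = []
for epoch, acc in enumerate(val_acc, start=1):
    if acc > best:                  # strict improvement, as with mode='max'
        best = acc
        saved_epochs.append(epoch)  # the checkpoint file would be (re)written here
print(saved_epochs)                 # [1, 2, 3, 5, 7]
print('best epoch:', saved_epochs[-1], 'val_accuracy:', best)
```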

In [None]:
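from tensorflow.keras.callbacks import ModelCheckpoint

B = 128   # batch size
E = 10    # number of epochs
# Keep a single file with the weights of the best epoch (highest val_accuracy).
# With save_best_only=True a file is only written when val_accuracy improves,
# so an epoch-numbered name such as 'model-010.h5' may never be created.
checkpoint = ModelCheckpoint('best_model.weights.h5', verbose=1,
                             monitor='val_accuracy', save_best_only=True,
                             save_weights_only=True, mode='max')
model5.compile(loss='categorical_crossentropy', optimizer='rmsprop',
               metrics=['accuracy'])
log = model5.fit(X_train, Y_train, batch_size=B, epochs=E,
                 verbose=1, validation_data=(X_test, Y_test),
                 callbacks=[checkpoint])

In [None]:
# Restore the weights of the best epoch before evaluating
model5.load_weights('best_model.weights.h5')
loss_test, metric_test = model5.evaluate(X_test, Y_test)
print('Test loss:', loss_test)
print('Test accuracy:', metric_test)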

In [None]:
pl.plot(log.history['accuracy'], label='Training')
pl.plot(log.history['val_accuracy'], label='Testing')
pl.xlabel('Epoch')
pl.ylabel('Accuracy')
pl.legend()
pl.grid()