*This is just a **lecture** notebook: you do not have to hand this in!*

# Lecture 08 - 21.06.2022
Comparing Multi-Layer Perceptrons and Convolutional Neural Networks on the famous MNIST dataset

In [None]:
import json
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import tensorflow as tf
from IPython.display import display
from matplotlib_inline.backend_inline import set_matplotlib_formats
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Flatten, Dense, Conv2D, MaxPooling2D
from tensorflow.keras.utils import to_categorical

%matplotlib inline
# set_theme applies style and font scale together; calling sns.set() after
# sns.set_style() would reset the style back to the default 'darkgrid'
sns.set_theme(style='whitegrid', font_scale=1.32)
set_matplotlib_formats('png')

# fix TensorFlow's global random seed
tf.random.set_seed(89)

## MNIST Data
For this quick demo we use the famous MNIST dataset, which contains handwritten digits from 0 to 9. It has 60,000 training samples and 10,000 test samples. The images are very lightweight: each has a single (grayscale) channel of 28x28 pixels. Their small size is also what makes the dataset well suited for use during a lecture.

In [None]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape dataset to have a single channel
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# normalize to range [0; 1]
X_train = X_train.astype("float32") / 255.0
X_test = X_test.astype("float32") / 255.0
# one-hot encode the labels
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
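
The preprocessing above can be illustrated without TensorFlow. The sketch below assumes plain NumPy arrays shaped like MNIST (uint8 images, integer labels 0 to 9) and reproduces the three steps: adding a channel axis, scaling to [0, 1] and one-hot encoding. The helper `preprocess` is hypothetical, and `np.eye(num_classes)[labels]` is used as a stand-in for `to_categorical`, which for integer labels produces the same one-hot rows.

```python
import numpy as np

def preprocess(images, labels, num_classes=10):
    """Hypothetical helper mirroring the notebook's preprocessing with NumPy only."""
    # (n, 28, 28) -> (n, 28, 28, 1): add an explicit single channel axis
    x = images.reshape((images.shape[0], images.shape[1], images.shape[2], 1))
    # uint8 values 0..255 -> float32 values in [0, 1]
    x = x.astype("float32") / 255.0
    # one-hot encoding: row k of the identity matrix encodes class k
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y

# Synthetic stand-in for MNIST: 5 random 28x28 uint8 "images" with made-up labels
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)
fake_labels = np.array([3, 0, 9, 1, 3])

x, y = preprocess(fake_images, fake_labels)
print(x.shape, x.dtype, float(x.min()) >= 0.0, float(x.max()) <= 1.0)
print(y.shape, y.argmax(axis=1))
```

Taking `argmax` along the last axis recovers the original integer labels, which is also how a predicted class is read off the softmax output later on.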

In [None]:
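fig, axs = plt.subplots(2, 5, figsize=(20, 8))
axs = axs.ravel()
for i in range(10):
    # drop the channel axis so imshow receives a plain 28x28 grayscale image
    axs[i].imshow(X_train[i].squeeze(), cmap='gray')
    # the label is the position of the 1 in the one-hot vector
    axs[i].set_title(f"Label: {np.argmax(y_train[i])}")
    axs[i].grid(False)
plt.tight_layout()
plt.show()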

## CNN
First, we build a simple CNN with two convolutional layers (32 filters each) with kernel sizes of 5x5 and 3x3, respectively. In between, we use a max pooling layer with a pool size of 2x2.

The model summary shows that the model has only around 42k trainable parameters.
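
The roughly 42k parameters can be checked by hand. The sketch below assumes Keras' defaults of `padding='valid'` for `Conv2D` and a stride equal to the pool size for `MaxPooling2D`; the helper functions are hypothetical and only count weights and biases. Pooling and flattening add no parameters, so most of the budget ends up in the final `Dense` layer.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # each filter has kernel_h * kernel_w * in_channels weights plus one bias
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    # one weight per input-unit pair plus one bias per unit
    return (inputs + 1) * units

def valid_output_size(size, kernel):
    # with padding="valid" the feature map shrinks by kernel - 1
    return size - kernel + 1

size = 28
conv1 = conv2d_params(5, 5, 1, 32)       # Conv2D(32, (5, 5)) on 1 input channel
size = valid_output_size(size, 5)        # 28 -> 24
size = size // 2                         # MaxPooling2D((2, 2)): 24 -> 12
conv2 = conv2d_params(3, 3, 32, 32)      # Conv2D(32, (3, 3)) on 32 input channels
size = valid_output_size(size, 3)        # 12 -> 10
flat = size * size * 32                  # Flatten: 10 * 10 * 32 values
dense = dense_params(flat, 10)           # Dense(10)

total = conv1 + conv2 + dense
print(f"conv1={conv1}, conv2={conv2}, flat={flat}, dense={dense}, total={total}")
```

The counts come out as 832, 9,248 and 32,010, i.e. 42,090 in total, which should match the "Total params" line of `cnn_model.summary()`.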

In [None]:
# Creating the model
cnn_model = tf.keras.models.Sequential([
    Conv2D(32, (5, 5), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(32, (3, 3), activation='relu'),
    Flatten(),
    Dense(10, activation='softmax')
],
    name="CNN"
)

cnn_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

cnn_model.summary()

Evaluating the training history and the performance on the test data shows that the CNN does a pretty decent job, reaching a test accuracy of 99.02% even though the model is rather small.

In [None]:
# We already performed the training for you, so the training code is commented out:
# cnn_callbacks = [
#     EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=5, restore_best_weights=True)
# ]
#
# cnn_history = cnn_model.fit(X_train, y_train, batch_size=32, epochs=10,
#                             validation_split=0.2, callbacks=cnn_callbacks)

# You can simply load the trained model here
cnn_model = tf.keras.models.load_model("cnn_model")

# You can also have a look at how the training went
with open("cnn_model_history.json", 'r') as f:
    cnn_history = pd.DataFrame(json.load(f))
print("Training history:")
display(cnn_history)

# Evaluating the model
print("Evaluation")
cnn_test_performance = cnn_model.evaluate(X_test, y_test, verbose=1)
print(f"    Test loss:     {cnn_test_performance[0]}")
print(f"    Test accuracy: {cnn_test_performance[1]}")
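
`evaluate` returns the loss followed by the compiled metrics, here `[loss, accuracy]`. As a rough NumPy sketch (not Keras' actual implementation), categorical cross-entropy is the mean negative log-probability assigned to the true class, and accuracy compares the arg-max of the softmax output with the arg-max of the one-hot label. The clipping constant `eps=1e-7` is an assumption mirroring Keras' default fuzz factor, and the toy arrays are made up.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # clip to avoid log(0); eps is an assumed value
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # per-sample loss -sum_k y_k * log(p_k), averaged over the batch
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def categorical_accuracy(y_true, y_pred):
    # a prediction is correct if the most probable class is the true class
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# toy batch with 2 samples and 2 classes (made-up numbers)
y_true = np.array([[0.0, 1.0], [1.0, 0.0]])
y_pred = np.array([[0.2, 0.8], [0.3, 0.7]])  # the second sample is misclassified

print("loss:", categorical_crossentropy(y_true, y_pred))
print("accuracy:", categorical_accuracy(y_true, y_pred))
```

The loss keeps penalising a correct but unconfident prediction, while accuracy only counts hits, which is why the two curves in the training history do not always move in lockstep.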

We also plotted the training history for you:

In [None]:
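def plot_model_history(history, ax=None, metric='loss', ep_start=1, ep_stop=None, monitor='val_loss', mode='min', plttitle=None):
    # Plot the training and validation curves of `metric` from a history DataFrame.
    # `monitor` and `mode` are accepted but currently unused.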
    if ax is None:
        fig,ax = plt.subplots()
    if ep_stop is None:
        ep_stop = len(history)
    if plttitle is None:
        plttitle = metric[0].swapcase() + metric[1:] + ' During Training'
    epochs=np.arange(ep_start,ep_stop+1, dtype='int')
    sns.lineplot(x=epochs, y=history[metric][ep_start-1:ep_stop], ax=ax)
    sns.lineplot(x=epochs, y=history['val_' + metric][ep_start-1:ep_stop], ax=ax)
    ax.set(title=plttitle)
    ax.set(ylabel=metric[0].swapcase() + metric[1:])
    ax.set(xlabel='Epoch')
    ax.legend(['train', 'val'], loc='upper right')

fig, ax = plt.subplots(1, 2, figsize=(14,7))
plot_model_history(cnn_history, ax=ax[0])
plot_model_history(cnn_history, metric='accuracy',ax=ax[1])
plt.tight_layout()
plt.show()

## MLP
In the next step, we build an MLP model. We already provide you with a model that has almost the same number of parameters as our CNN (about 41k vs. 42k). Nevertheless, it only reaches a test accuracy of about 97%.
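
For a `Flatten` + `Dense` stack, the parameter count follows directly from the layer sizes, which also helps with the question of how many parameters a competitive MLP needs. The helper below is a hypothetical sketch that counts weights and biases per `Dense` layer; the budget of 42,090 is the parameter count of the CNN above. With a single hidden layer of width h, the MLP has (784 + 1) * h + (h + 1) * 10 = 795 * h + 10 parameters.

```python
def mlp_params(input_dim, hidden_sizes, num_classes):
    """Hypothetical helper: trainable parameters of a Flatten + Dense stack."""
    total = 0
    fan_in = input_dim
    for units in list(hidden_sizes) + [num_classes]:
        total += (fan_in + 1) * units  # weights + biases of one Dense layer
        fan_in = units
    return total

input_dim = 28 * 28 * 1  # Flatten turns each 28x28x1 image into 784 values
print(mlp_params(input_dim, [52], 10))

# Widest single hidden layer that stays within the CNN's parameter budget:
# 795 * h + 10 <= 42090
cnn_budget = 42090
max_h = (cnn_budget - 10) // (input_dim + 1 + 10)
print(max_h, mlp_params(input_dim, [max_h], 10))
```

The provided MLP has 41,350 parameters, and 52 hidden units is the widest single hidden layer that fits under the CNN's budget. The same helper can be used to price deeper or wider variants for the task below, e.g. `mlp_params(784, [256, 128], 10)`.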

**Task:**

Adjust the MLP. Can you come up with a model that beats the CNN? Is it even possible? How many parameters do you need for it?

You can adjust the model as well as the training procedure in the next two cells.

In [None]:
# Creating the model
mlp_model = tf.keras.models.Sequential([
    Flatten(input_shape=(28, 28, 1)),
    Dense(52, activation='relu'),
    Dense(10, activation='softmax')
])

mlp_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

mlp_model.summary()

In [None]:
# Training the model
mlp_callbacks = [
    EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=5, restore_best_weights=True)
]

mlp_history = mlp_model.fit(X_train, y_train, batch_size=64, epochs=10, validation_split=0.2, callbacks=mlp_callbacks)
mlp_history = pd.DataFrame(mlp_history.history)

# Evaluating the model
print("\nEvaluation")
mlp_test_performance = mlp_model.evaluate(X_test, y_test, verbose=1)
print(f"    Test loss:     {mlp_test_performance[0]}")
print(f"    Test accuracy: {mlp_test_performance[1]}")
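
Both training cells use `EarlyStopping(monitor='val_loss', mode='min', patience=5, restore_best_weights=True)`. The function below is a simplified, hypothetical re-implementation of that stopping rule in plain Python, not Keras' source: training stops once the validation loss has failed to improve for `patience` consecutive epochs, and with `restore_best_weights=True` Keras restores the weights from the best epoch when it stops. The validation losses are made up.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Simplified sketch of monitor='val_loss', mode='min'.

    Returns (epochs_run, best_epoch), both 1-based. Not Keras' actual code.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # improvement: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # no improvement for `patience` epochs in a row
                return epoch, best_epoch
    return len(val_losses), best_epoch

# made-up validation losses: best at epoch 3, then no further improvement
print(early_stopping_epoch([0.30, 0.20, 0.15, 0.16, 0.17, 0.16, 0.18, 0.19, 0.20, 0.21]))
```

Under this rule, with `epochs=10` and `patience=5` the callback can only trigger if the best epoch is epoch 5 or earlier, so when adjusting the training procedure it may be worth raising `epochs` together with `patience`.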

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(14,7))
plot_model_history(mlp_history, ax=ax[0])
plot_model_history(mlp_history, metric='accuracy',ax=ax[1])
plt.tight_layout()
plt.show()