# Multi-Layer Perceptron in `keras`

In this series of lab sessions, you will use a Python library called `keras`.
Visit the [`keras` website](https://keras.io/) for more information about this library, including comprehensive documentation.

## The `Sequential` model in `keras`

This library offers two ways to define neural network models.
We will start with the `Sequential` class of `keras` models.
Below is an example of how to define a `Sequential` model:

In [1]:
import os

os.environ["KERAS_BACKEND"] = "torch"

In [2]:
import keras
from keras.models import Sequential
from keras.layers import Dense, InputLayer

**1. Define layers, and add them one by one to the model**


In [3]:
input_layer = InputLayer(shape=(24,))
hidden_layer1 = Dense(units=12, activation="relu")
hidden_layer2 = Dense(units=12, activation="sigmoid")
#[...]
output_layer = Dense(units=3, activation="linear")

model = Sequential([
    input_layer,
    hidden_layer1,
    hidden_layer2,
    # ...
    output_layer
])



**2. Pick an optimization algorithm (optimizer) and a loss function to be optimized**

Usual loss functions are:
* `"mse"` for regression,
* `"categorical_crossentropy"` for multiclass classification (when the `y` array fed to `fit` is of shape $(n, n_\text{classes})$),
* `"binary_crossentropy"` for binary classification (when the `y` array fed to `fit` is of shape $(n, 1)$).
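
As a minimal sketch of the target shapes mentioned in the list above, the following `numpy` snippet builds a one-hot `y` array of shape $(n, n_\text{classes})$ and a binary `y` array of shape $(n, 1)$. The label values are hypothetical and only serve as an illustration.

```python
import numpy as np

# Hypothetical integer class labels for n = 4 samples and 3 classes.
labels = np.array([0, 2, 1, 2])
n_classes = 3

# One-hot encoding: row i has a 1 in column labels[i] and 0 elsewhere.
y_multiclass = np.eye(n_classes)[labels]

# For binary classification, a column vector of 0/1 values is enough.
binary_labels = np.array([0, 1, 1, 0])
y_binary = binary_labels.reshape((-1, 1))

print(y_multiclass.shape)  # (4, 3)
print(y_binary.shape)      # (4, 1)
```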

One can also specify additional metrics to be printed during training (correct classification rate here).

In [4]:
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

**3. Fit the model**

NB: do not try to execute the following line of code: variables `X_train` and `y_train` do not exist yet!

In [None]:
#model.fit(X_train, y_train, verbose=2, epochs=10, batch_size=200)

## Data pre-processing

Have a look at the `prepare_mnist` and `prepare_boston` functions defined below.

**Question #1.** What do these functions do? What are the shapes of the returned arrays? Does the returned data correspond to classification or regression problems?

In [7]:
from sklearn.preprocessing import MinMaxScaler
from keras.datasets import mnist, boston_housing
from keras.utils import to_categorical

def prepare_mnist():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.reshape((x_train.shape[0], -1))
    x_test = x_test.reshape((x_test.shape[0], -1))
    scaler = MinMaxScaler()
    scaler.fit(x_train)
    x_train = scaler.transform(x_train)
    x_test = scaler.transform(x_test)
    y_train = to_categorical(y_train)
    y_test = to_categorical(y_test)
    return x_train, x_test, y_train, y_test


def prepare_boston():
    (x_train, y_train), (x_test, y_test) = boston_housing.load_data()
    scaler_x = MinMaxScaler()
    scaler_x.fit(x_train)
    x_train = scaler_x.transform(x_train)
    x_test = scaler_x.transform(x_test)
    scaler_y = MinMaxScaler()
    scaler_y.fit(y_train.reshape((-1, 1)))
    y_train = scaler_y.transform(y_train.reshape((-1, 1)))
    y_test = scaler_y.transform(y_test.reshape((-1, 1)))
    return x_train, x_test, y_train, y_test

x_train, x_test, y_train, y_test = prepare_mnist()
print(x_train.shape, y_train.shape)

(60000, 784) (60000, 10)




## Building your first models

In the following, when fitting models, restrict the training to 10 epochs (which is not realistic, but training for more epochs takes time...)

**Question #2.** Following the guidelines provided above, implement a linear regression model for the `boston` dataset that would optimize on a least squares objective using Stochastic Gradient Descent and fit your model to the corresponding training data.

In [16]:
x_train, x_test, y_train, y_test = prepare_boston()
print(x_train.shape, y_train.shape)

# Linear regression is the simplest perceptron: a single linear unit,
# hence no need for any hidden layer.
model = Sequential([
    InputLayer(shape=(x_train.shape[1],)),
    Dense(units=1, activation="linear")
])
# NB: accuracy is not a meaningful metric for regression (it stays close to 0
# in the log below); the MSE loss is the quantity to monitor here.
model.compile(optimizer="sgd", loss="mse", metrics=["accuracy"])
model.fit(x_train, y_train, verbose=2, epochs=10, batch_size=100)
loss, accuracy = model.evaluate(x_test, y_test)
print('Test Loss:', loss)
print('Test Accuracy:', accuracy)

(404, 13) (404, 1)
Epoch 1/10
5/5 - 0s - 5ms/step - accuracy: 0.0050 - loss: 0.6068
Epoch 2/10
5/5 - 0s - 5ms/step - accuracy: 0.0050 - loss: 0.4072
Epoch 3/10
5/5 - 0s - 5ms/step - accuracy: 0.0025 - loss: 0.2961
Epoch 4/10
5/5 - 0s - 6ms/step - accuracy: 0.0050 - loss: 0.2503
Epoch 5/10
5/5 - 0s - 6ms/step - accuracy: 0.0074 - loss: 0.2216
Epoch 6/10
5/5 - 0s - 5ms/step - accuracy: 0.0074 - loss: 0.1994
Epoch 7/10
5/5 - 0s - 5ms/step - accuracy: 0.0099 - loss: 0.1856
Epoch 8/10
5/5 - 0s - 5ms/step - accuracy: 0.0099 - loss: 0.1747
Epoch 9/10
5/5 - 0s - 5ms/step - accuracy: 0.0099 - loss: 0.1654
Epoch 10/10
5/5 - 0s - 5ms/step - accuracy: 0.0099 - loss: 0.1575
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.0039 - loss: 0.1393
Test Loss: 0.13295860588550568
Test Accuracy: 0.009803921915590763


**Question #3.** Similarly, define a logistic regression model for the `mnist` dataset and print its training accuracy during training.

In [12]:
x_train, x_test, y_train, y_test = prepare_mnist()
model = Sequential()  # create the model before adding layers
model.add(InputLayer(shape=(x_train.shape[1],)))
# Logistic regression: a single softmax layer, no hidden layer
model.add(Dense(units=10, activation="softmax"))
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, verbose=2, epochs=10, batch_size=200)
loss, accuracy = model.evaluate(x_test, y_test)
print('Test Loss:', loss)
print('Test Accuracy:', accuracy)



Epoch 1/10
300/300 - 2s - 6ms/step - accuracy: 0.6484 - loss: 1.4552
Epoch 2/10
300/300 - 2s - 6ms/step - accuracy: 0.8248 - loss: 0.8522
Epoch 3/10
300/300 - 2s - 6ms/step - accuracy: 0.8471 - loss: 0.6845
Epoch 4/10
300/300 - 2s - 6ms/step - accuracy: 0.8582 - loss: 0.6032
Epoch 5/10
300/300 - 2s - 6ms/step - accuracy: 0.8656 - loss: 0.5538
Epoch 6/10
300/300 - 2s - 6ms/step - accuracy: 0.8706 - loss: 0.5201
Epoch 7/10
300/300 - 3s - 10ms/step - accuracy: 0.8747 - loss: 0.4953
Epoch 8/10
300/300 - 2s - 6ms/step - accuracy: 0.8781 - loss: 0.4760
Epoch 9/10
300/300 - 2s - 6ms/step - accuracy: 0.8806 - loss: 0.4606
Epoch 10/10
300/300 - 2s - 6ms/step - accuracy: 0.8832 - loss: 0.4479
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8738 - loss: 0.4784
Test Loss: 0.4206957519054413
Test Accuracy: 0.8919000029563904


**Question #4.** Compare the performance of this logistic regression model with that of neural networks with 1, 2, and 3 hidden layers of 128 neurons each, respectively. For now, compare in terms of training accuracy; we will come back to better ways to compare models afterwards.
You will use the `"relu"` activation function for hidden layers.

In [24]:

# 3 hidden layers of 128 units each; remove hidden layers to obtain
# the 1- and 2-hidden-layer variants.
model = Sequential([
    InputLayer(shape=(x_train.shape[1],)),
    Dense(units=128, activation="relu"),
    Dense(units=128, activation="relu"),
    Dense(units=128, activation="relu"),
    Dense(units=10, activation="softmax")
])
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, verbose=2, epochs=10, batch_size=200)
loss, accuracy = model.evaluate(x_test, y_test)
print('Test Loss:', loss)
print('Test Accuracy:', accuracy)

Epoch 1/10
300/300 - 3s - 12ms/step - accuracy: 0.4554 - loss: 1.9143
Epoch 2/10
300/300 - 3s - 11ms/step - accuracy: 0.8171 - loss: 0.7584
Epoch 3/10
300/300 - 3s - 10ms/step - accuracy: 0.8710 - loss: 0.4691
Epoch 4/10
300/300 - 4s - 14ms/step - accuracy: 0.8906 - loss: 0.3881
Epoch 5/10
300/300 - 3s - 12ms/step - accuracy: 0.9013 - loss: 0.3464
Epoch 6/10
300/300 - 3s - 11ms/step - accuracy: 0.9082 - loss: 0.3188
Epoch 7/10
300/300 - 3s - 11ms/step - accuracy: 0.9137 - loss: 0.2983
Epoch 8/10
300/300 - 4s - 14ms/step - accuracy: 0.9181 - loss: 0.2816
Epoch 9/10
300/300 - 3s - 11ms/step - accuracy: 0.9228 - loss: 0.2674
Epoch 10/10
300/300 - 3s - 10ms/step - accuracy: 0.9269 - loss: 0.2550
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9214 - loss: 0.2753
Test Loss: 0.2420545220375061
Test Accuracy: 0.9314000010490417


**Question #5.** `keras` models offer a `count_params()` method to get the number of parameters to be learned in the model. Use this method to get the number of parameters of your 3-hidden-layer model and build a new one-hidden-layer model with an equivalent number of parameters. Compare the performance of these two models, which have a similar number of parameters.
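
To sanity-check the value returned by `count_params()`, the number of parameters of a stack of `Dense` layers can be computed by hand: each layer has one weight per (input, unit) pair plus one bias per unit. The sketch below assumes the MNIST setting used above (784 inputs, 10 outputs); the helper name `dense_param_count` is hypothetical and not part of `keras`.

```python
def dense_param_count(layer_sizes):
    """Number of weights and biases in a stack of fully connected layers.

    `layer_sizes` lists the input dimension followed by the number of
    units of each Dense layer, e.g. [784, 128, 10].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


three_hidden = dense_param_count([784, 128, 128, 128, 10])
print(three_hidden)  # 134794

# A one-hidden-layer model with h hidden units has 795 * h + 10 parameters
# (784 * h + h for the hidden layer, 10 * h + 10 for the output layer).
h = round((three_hidden - 10) / 795)
print(h, dense_param_count([784, h, 10]))  # 170 135160
```

The two counts cannot match exactly since the hidden width must be an integer; picking the closest width is one reasonable choice.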

## A better way to compare models

Comparing models based on training accuracy (resp. loss) is a "great" way to overfit your model to the training data.
A better way to compare models is to use held-out data (a.k.a. a validation set).

To do so, `keras` lets you set aside, at `fit` time, a fraction of the training data to be used as a validation set. See [the `fit` documentation](https://keras.io/api/models/model_training_apis/#fit-method) for more details about how validation samples are selected.
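
According to the `fit` documentation, the validation samples are the last ones in the arrays passed to `fit`, taken before any shuffling. The following plain-Python sketch mimics that selection on toy data; the function name `split_train_validation` is hypothetical and this is not the actual `keras` implementation.

```python
def split_train_validation(x, y, validation_split):
    """Set aside the last fraction of the samples as validation data,
    without shuffling, and return (train, validation) pairs."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])


# Toy data: 10 samples identified by their index.
x = list(range(10))
y = [i % 2 for i in x]
(x_tr, y_tr), (x_val, y_val) = split_train_validation(x, y, validation_split=0.5)
print(x_val)  # [5, 6, 7, 8, 9]: the last samples, in their original order
```

A practical consequence is that if the training data is sorted (by class, for instance), the validation set may not be representative unless the data is shuffled beforehand.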

**Question #6.** Repeat the model comparisons above (relying on validation scores) using 30% of the training data as a validation set.

## Optimizers and learning rate

**Question #7.** Change the optimizer used for your model. Use an optimizer with momentum and adaptive learning rate.

**Question #8.** Using [the docs](https://keras.io/api/optimizers/), vary the learning rate of your optimizer from a very low value to a much larger one so as to show evidence of:
* instability when the learning rate is too large;
* slow convergence when the learning rate is too low.
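
Both behaviours can already be observed on a toy problem. The sketch below runs plain gradient descent on $f(w) = w^2$ (not on a neural network, and without `keras`), for which each step multiplies $w$ by $(1 - 2 \times \text{learning rate})$; the three learning rates are arbitrary values chosen for illustration.

```python
def gradient_descent(learning_rate, w0=1.0, n_steps=20):
    """Minimize f(w) = w**2 with plain gradient descent; return the final w."""
    w = w0
    for _ in range(n_steps):
        grad = 2.0 * w  # derivative of w**2
        w = w - learning_rate * grad
    return w


w_small = gradient_descent(0.001)  # too low: w barely moves away from w0
w_mid = gradient_descent(0.3)      # reasonable: w gets very close to 0
w_large = gradient_descent(1.1)    # too large: |w| grows at each step
for lr, w in [(0.001, w_small), (0.3, w_mid), (1.1, w_large)]:
    print(f"lr={lr}: |w| after 20 steps = {abs(w):.3g}")
```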

## Callbacks

Callbacks are tools that, in `keras`, allow one to intervene during the training process of a model.
Callbacks can be used to take actions (_e.g._ save intermediate models, stop optimization if overfitting occurs, _etc._).

A first callback one can play with is the one returned by any call to `fit` on a `keras` model.
This callback is an object with a `.history` attribute in the form of a Python dictionary whose keys are the metrics recorded during training. Each of these keys maps to a list containing the consecutive values of the considered quantity (one value per epoch).
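
To give an idea of what this dictionary looks like, the sketch below rebuilds it by hand from the training log of the logistic regression model above (values copied from that log) and reads it back. The name `history_dict` is hypothetical; with a real run, the same structure would be obtained as `model.fit(...).history`, with additional `"val_loss"` and `"val_accuracy"` keys when a validation set is used.

```python
# Training metrics copied from the logistic regression log above
# (one value per epoch).
history_dict = {
    "accuracy": [0.6484, 0.8248, 0.8471, 0.8582, 0.8656,
                 0.8706, 0.8747, 0.8781, 0.8806, 0.8832],
    "loss": [1.4552, 0.8522, 0.6845, 0.6032, 0.5538,
             0.5201, 0.4953, 0.4760, 0.4606, 0.4479],
}

for name, values in history_dict.items():
    # Higher is better for accuracy, lower is better for the loss.
    best = max(values) if name == "accuracy" else min(values)
    epoch = values.index(best) + 1  # epochs are numbered from 1
    print(f"{name}: {len(values)} values, best = {best} at epoch {epoch}")
```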

**Question #9.** Plot correct classification rates on both training and validation sets.

Setting up other callbacks must be explicit. This is done by passing a list of callbacks to the `fit` method.

When training a model takes a long time, one may want to save intermediate models (in case of a crash during training, or simply because an intermediate model may perform better than the final one).
The [`ModelCheckpoint`](https://keras.io/api/callbacks/model_checkpoint/) callback is designed for that purpose.

**Question #10.** Set up recording of intermediate models every epoch. Save the models into a dedicated file `model.keras` in your project. Only record a model if its validation loss is lower than that of all previous models.
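
The "only record models if validation loss is lower" rule can be sketched in plain Python as follows. This only illustrates the selection logic on made-up loss values; it does not use `ModelCheckpoint` itself, and the function name `epochs_saved` is hypothetical.

```python
def epochs_saved(val_losses):
    """Return the (1-based) epochs at which a checkpoint would be written
    when only models improving on the best validation loss are kept."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            saved.append(epoch)
    return saved


# Illustrative validation losses, not taken from an actual run.
val_losses = [0.90, 0.70, 0.75, 0.65, 0.66, 0.64]
print(epochs_saved(val_losses))  # [1, 2, 4, 6]
```

Since every checkpoint overwrites the same file, the file on disk always holds the best model seen so far.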

Use the code below to check that a model has been saved:

In [None]:
%ls -alh "model.keras"

In [None]:
model.evaluate(x_test, y_test)

In [None]:
model.load_weights("model.keras")
model.evaluate(x_test, y_test)

## Regularization

**Question #11.** Set up an [`EarlyStopping`](https://keras.io/api/callbacks/early_stopping/) strategy such that training the model will stop in case the validation loss does not decrease for 5 consecutive epochs.
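
The stopping rule itself can be sketched without `keras`: keep track of the best validation loss seen so far and count the consecutive epochs without improvement. The function name `stopping_epoch` and the loss values below are hypothetical; this is a simplified illustration of the patience mechanism, not the actual `EarlyStopping` implementation.

```python
def stopping_epoch(val_losses, patience=5):
    """Return the (1-based) epoch at which training would stop, i.e. the
    first epoch ending a run of `patience` epochs without improvement,
    or the last epoch if that never happens."""
    best = float("inf")
    wait = 0  # number of consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)


# Illustrative validation losses, not taken from an actual run.
val_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.6]
print(stopping_epoch(val_losses))  # 8
```

In this example the improvement at epoch 9 is never reached, which illustrates the trade-off behind the choice of the patience value.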