# Hyper-Parameter Optimisation

In this notebook we will build a simple pipeline to optimise the hyper-parameters of both our architecture and our training setup. For this purpose, we will use the hyper-parameter optimisation framework [Optuna](https://optuna.org/).

We will demonstrate how Optuna can help us select suitable hyper-parameters for training, using a subset of the classic MNIST dataset.

In [None]:
import optuna
import numpy as np

from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Conv2D, Dense, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import RMSprop

from matplotlib import pyplot as plt
plt.rcParams['figure.figsize'] = [15, 5]

### Dataset Preparation

Load the dataset, normalize it and extract a subset of it for both training and validation. We extract the subsets only for the sake of demonstration, so the training loops don't take too much time.

In [None]:
N_TRAIN_EXAMPLES = 600
N_VALID_EXAMPLES = 100
BATCHSIZE = 128
CLASSES = 10
EPOCHS = 10

# Load MNIST dataset
(x_train, y_train), (x_valid, y_valid) = mnist.load_data()
num_samples, rows, cols = x_train.shape

# Create training subset
x_train = x_train[:N_TRAIN_EXAMPLES, ..., np.newaxis]/255
y_train = y_train[:N_TRAIN_EXAMPLES]

# Create validation subset
x_valid = x_valid[:N_VALID_EXAMPLES, ..., np.newaxis]/255
y_valid = y_valid[:N_VALID_EXAMPLES]

INPUT_SHAPE = (rows, cols, 1)
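
The slicing and scaling above can be checked in isolation. The following is a minimal sketch on a synthetic array (`fake_images` is made up for illustration and stands in for the MNIST images). It shows that indexing with `..., np.newaxis` appends a channel axis and that dividing by 255 maps `uint8` pixels to the range [0, 1].

```python
import numpy as np

# Hypothetical stand-in for the MNIST images: 1000 grayscale 28x28 uint8 images
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(1000, 28, 28), dtype=np.uint8)

# Keep the first 600 images, append a channel axis and rescale to [0, 1]
subset = fake_images[:600, ..., np.newaxis] / 255

print(subset.shape)  # (600, 28, 28, 1)
print(subset.dtype)  # float64
print(subset.min() >= 0.0, subset.max() <= 1.0)  # True True
```

The trailing axis of size 1 is what lets `INPUT_SHAPE = (rows, cols, 1)` match the input expected by the `Conv2D` layer.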

In [None]:
# Show 24 random training examples with their labels
for cnt, idx in enumerate(np.random.randint(0, len(x_train), 24)):
    plt.subplot(3, 8, cnt + 1)
    plt.imshow(x_train[idx, ..., 0], cmap='gray')
    plt.axis('off')
    plt.title('Label: ' + str(y_train[idx]))

### Optuna Pipeline

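To run a hyper-parameter search, we need to define an objective function. It is a higher-level abstraction over the loss function we use to tune the model parameters: it trains a model with the hyper-parameters sampled for one trial and returns a single score for Optuna to optimise (here, the validation accuracy).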

In [None]:
def objective(trial):
    """Objective function that controls the quality of the different runs.
    
    Args:
        trial (optuna.trial.Trial): Object that suggests the hyper-parameter
            values for the current run.

    Returns:
        float: The quality metric of the current run (validation accuracy).
        
    """    
    # Build model
    model = Sequential()
    model.add(
        Conv2D(
            filters=trial.suggest_categorical("filters", [32, 64]),
            kernel_size=trial.suggest_categorical("kernel_size", [3, 5]),
            strides=trial.suggest_categorical("strides", [1, 2]),
            activation=trial.suggest_categorical("activation", ["relu", "linear"]),
            input_shape=INPUT_SHAPE,
        )
    )
    model.add(Flatten())
    model.add(Dense(CLASSES, activation="softmax"))

    # Compile model with sampled learning rate.
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
    model.compile(
        loss="sparse_categorical_crossentropy",
        optimizer=RMSprop(learning_rate=learning_rate),
        metrics=["accuracy"],
    )

    # Run training
    model.fit(x_train, y_train, shuffle=True, batch_size=BATCHSIZE, epochs=EPOCHS, verbose=0)

    # Evaluate model quality (performance on validation set)
    return model.evaluate(x_valid, y_valid, verbose=0)[1]

### Run Hyper-Parameter Optimisation

To run the hyper-parameter optimisation we need to create a `study` object. It holds the main settings of the search, such as the optimisation direction, and even the "hyper-hyper-parameters" of the pipeline (for example, which sampling algorithm to use).

In [None]:
# Create Optuna study
study = optuna.create_study(direction="maximize")

# Launch hyper-parameter search
study.optimize(objective, n_trials=100)

print('Number of finished trials:', len(study.trials))

# Show detailed info about finished trials
study.trials_dataframe()

Show the optimisation metric for the different runs.

In [None]:
metrics = [trial.value for trial in study.trials]
plt.plot(metrics, '.-')
plt.grid(True)
plt.xlabel('trial')
plt.ylabel('metric')
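
Because individual trials are noisy, it can help to look at the best value found so far in addition to the per-trial metric. A minimal sketch, using made-up metric values in place of the real trial results:

```python
import numpy as np

# Hypothetical per-trial accuracies (illustrative values, not real results)
values = np.array([0.62, 0.71, 0.55, 0.80, 0.78])

# Running maximum: best metric seen up to and including each trial
running_best = np.maximum.accumulate(values)

print(running_best.tolist())  # [0.62, 0.71, 0.71, 0.8, 0.8]
```

For a maximisation study, `values` would be the `metrics` list from the cell above, and `plt.plot(running_best)` could be drawn on the same axes.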

Let's inspect the hyper-parameters for the best run.

In [None]:
# Show best trial
trial = study.best_trial
print('Best trial:', trial.number, '\n')
print('  Metric:', trial.value)
print('  Params:')
for key, value in trial.params.items():
    print('\t', key.ljust(13), ':', value)

In [None]:
# Show overall stats
values = [trial.value for trial in study.trials]
mu = np.mean(values)
std = np.std(values)

print('Avg metric:', mu)
print('Std metric:', std)