In [1]:
from IPython.display import Image

## **Note 5**: Hyperparameter optimization
<br><br>
Hyperparameters are parameters that are set before the learning process starts. They control the algorithm that optimizes the objective function and directly influence how well the model performs. Tuning these hyperparameters (which differ from model to model) by hand is a long and tedious process. Fortunately, automated search methods can find good values without requiring more manual work from engineers.
<br><br>

#### 1. Grid search
Considered the traditional way to optimize hyperparameters. Also called brute-force or exhaustive search, it consists of choosing a set of candidate values for each hyperparameter, trying every combination, and selecting the one that performs best, typically using K-fold cross-validation. This technique suffers from the curse of dimensionality (the number of combinations grows exponentially with the number of hyperparameters) and, because it is uninformed, wastes a lot of time evaluating bad hyperparameters.
<br>image source: https://goo.gl/8JJECg
<br><br>
![title](ims/gridsearch.jpg)
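A minimal sketch of grid search in plain Python. `toy_score` is a hypothetical objective standing in for a K-fold cross-validation score; it is not a real model.

```python
import itertools


def grid_search(evaluate, grid):
    """Exhaustive search: try every combination in `grid` and keep the best.

    `evaluate` maps a dict of hyperparameters to a score (higher is better).
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical objective that peaks at lr=0.01 and dropout=0.25
    return -abs(params["lr"] - 0.01) * 100 - abs(params["dropout"] - 0.25)


grid = {"lr": [0.001, 0.01, 0.1], "dropout": [0.0, 0.25, 0.5]}
best, score = grid_search(toy_score, grid)
print(best)  # {'lr': 0.01, 'dropout': 0.25}
```

The number of evaluations is the product of the list lengths (3 × 3 = 9 here); adding hyperparameters multiplies it again, which is the curse of dimensionality mentioned above.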

<br><br>
#### 2. Random search
Randomly samples values from the search space using a specified probability distribution for each hyperparameter. This technique has been shown to outperform grid search, especially for problems with low intrinsic dimensionality, where only a few hyperparameters really influence the algorithm. Because of its random and uninformed nature, it is not guaranteed to find good values.
<br>image source: https://goo.gl/8JJECg
<br><br>
![title](ims/randomsearch.jpg)
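The same idea with random search, as a sketch: each hyperparameter gets its own sampling distribution (log-uniform for the learning rate, uniform for dropout; both are illustrative assumptions). The hypothetical `toy_score` depends almost only on the learning rate, mimicking the low-intrinsic-dimensionality case.

```python
import math
import random


def random_search(evaluate, space, n_trials, seed=0):
    """Sample each hyperparameter independently and keep the best trial."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        history.append((evaluate(params), params))
    best_score, best_params = max(history, key=lambda t: t[0])
    return best_params, best_score, history


space = {
    # Log-uniform over [1e-4, 1e-1]: every decade is equally likely
    "lr": lambda rng: 10 ** rng.uniform(-4, -1),
    # Uniform over [0, 0.5]
    "dropout": lambda rng: rng.uniform(0.0, 0.5),
}


def toy_score(params):
    # Hypothetical objective: lr matters a lot, dropout barely matters
    return -abs(math.log10(params["lr"]) + 2) - 0.01 * params["dropout"]


best, score, history = random_search(toy_score, space, n_trials=30)
print(best, score)
```

With 30 trials, random search tests 30 distinct learning rates, whereas a 6 × 5 grid with the same budget would test only 6.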
<br><br>
#### 3. Bayesian search
Balances exploration and exploitation. A probabilistic model of the objective is built from the values already evaluated and is used to try new values while focusing on the most promising ones. Bayesian optimization commonly uses a Gaussian process as this surrogate model: starting from a prior, it computes a posterior distribution over the objective function given the observations, and an acquisition function then determines the next point to sample. It usually beats the previous methods because it chooses new hyperparameter values based on the results of previous evaluations.
<br>image source: https://goo.gl/ZYEdt8
<br><br>
![title](ims/bayesian.png)
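A compact sketch of Bayesian optimization over a single hyperparameter normalized to [0, 1], using NumPy: a Gaussian process with a squared-exponential kernel as the surrogate and expected improvement as the acquisition function. The kernel length scale, the candidate grid, and the `objective` curve are illustrative assumptions. Note that hyperopt, which hyperas uses below through `tpe.suggest`, relies on a different surrogate, the Tree-structured Parzen Estimator, rather than a Gaussian process.

```python
import math

import numpy as np


def rbf_kernel(a, b, length_scale=0.1):
    """Squared-exponential covariance between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and std of a zero-mean, unit-variance GP prior."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_new)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return mean, np.sqrt(var)


def expected_improvement(mean, std, best, xi=0.01):
    """Acquisition function: expected gain over the best value seen so far."""
    gain = mean - best - xi
    z = gain / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return gain * cdf + std * pdf


def bayesian_optimize(objective, n_init=3, n_iter=10, seed=0):
    """Maximize `objective` over [0, 1] with a GP surrogate and expected improvement."""
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)
    # A few random evaluations to build the first posterior
    xs = rng.uniform(0.0, 1.0, size=n_init)
    ys = np.array([objective(x) for x in xs])
    for _ in range(n_iter):
        # Center the observations so the zero-mean prior sits at their average
        offset = ys.mean()
        mean, std = gp_posterior(xs, ys - offset, candidates)
        ei = expected_improvement(mean + offset, std, ys.max())
        x_next = candidates[np.argmax(ei)]
        xs = np.append(xs, x_next)
        ys = np.append(ys, objective(x_next))
    i = int(np.argmax(ys))
    return xs[i], ys[i], xs, ys


def objective(x):
    # Hypothetical validation accuracy as a function of one normalized hyperparameter
    return float(np.exp(-((x - 0.3) / 0.15) ** 2))


best_x, best_y, xs, ys = bayesian_optimize(objective)
print(f"best x = {best_x:.3f}, score = {best_y:.3f} after {len(xs)} evaluations")
```

Each iteration spends one evaluation where the surrogate predicts either a high score (exploitation) or high uncertainty (exploration), which is what makes the search informed.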
<br><br>
Other methods also exist, such as gradient-based optimization, evolutionary optimization, and population-based optimization; I have not investigated them yet. The architecture of the model can also be optimized using these techniques. In the next part of the notebook, I experiment with two libraries: **Hyperas and Talos**.

<br><br>
## Hyperas 
https://github.com/maxpumperla/hyperas


<br>
I tested the library on a toy dataset I created in a previous notebook. I only added a function that generates the dataset and saves it to binary .npy files, so that I can load exactly the same data for every model evaluation (**data.py**). I also reused the model but modified it to work with hyperas (**hyperas.py**). After 25 trials, the best model reached a loss value of 1.107572268210788e-07 and perfect accuracy on the test set. Note that hyperas should be run from the command line, and the script should be free of comments.
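A sketch of the generate-once, load-many pattern with NumPy's `.npy` format. The random-noise images and labels below are placeholders, not the shapes dataset from the previous notebook, and `save_dataset`/`load_dataset` are hypothetical helpers rather than the contents of **data.py**; the fixed seed makes regeneration reproducible, and loading the saved files guarantees identical data across evaluations.

```python
import os
import tempfile

import numpy as np


def save_dataset(path, n_train=100, n_test=25, seed=42):
    """Generate a placeholder dataset once and save each split as a .npy file."""
    rng = np.random.default_rng(seed)
    splits = {
        "x_train": rng.random((n_train, 64, 64, 3), dtype=np.float32),
        "y_train": rng.integers(0, 2, size=n_train),
        "x_test": rng.random((n_test, 64, 64, 3), dtype=np.float32),
        "y_test": rng.integers(0, 2, size=n_test),
    }
    os.makedirs(path, exist_ok=True)
    for name, array in splits.items():
        np.save(os.path.join(path, name + ".npy"), array)


def load_dataset(path):
    """Load the four splits in the same order as the data() function below."""
    names = ("x_train", "y_train", "x_test", "y_test")
    return tuple(np.load(os.path.join(path, n + ".npy")) for n in names)


with tempfile.TemporaryDirectory() as tmp:
    save_dataset(tmp, n_train=20, n_test=5)
    x_train, y_train, x_test, y_test = load_dataset(tmp)
    print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```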

<br>
The best model used the following hyperparameter values:
<br><br>
{'Activation': 0, 'Activation_1': 1, 'Dense': 2, 'Dense_1': 2, 'Dropout': 0.031459480186175934, 'batch_size': 0, 'lr': 0.0054328707426503095}
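In this dictionary, the integer values of `choice` hyperparameters are indices into their option lists (hyperopt's `fmin` reports indices for choice parameters), while `uniform` hyperparameters such as `Dropout` and `lr` are reported as values. The sketch below decodes the indices using the option lists from `create_model`. `decode` and `CHOICES` are hypothetical helpers; the two `Activation` entries are left as-is because the mapping of hyperas's auto-generated names to the `{{choice(...)}}` templates is not certain here.

```python
# Option lists copied from the {{choice(...)}} templates in create_model
CHOICES = {
    "Dense": [16, 32, 64, 128],
    "Dense_1": [16, 32, 64, 128],
    "batch_size": [64, 128],
}


def decode(best_run, choices):
    """Replace choice indices with the option they point to; keep other values."""
    return {name: choices[name][value] if name in choices else value
            for name, value in best_run.items()}


best_run = {'Activation': 0, 'Activation_1': 1, 'Dense': 2, 'Dense_1': 2,
            'Dropout': 0.031459480186175934, 'batch_size': 0,
            'lr': 0.0054328707426503095}
decoded = decode(best_run, CHOICES)
print(decoded)  # Dense and Dense_1 become 64, batch_size becomes 64
```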

In [None]:
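import os

import numpy as np
from hyperopt import Trials, STATUS_OK, tpe
from hyperas import optim
from hyperas.distributions import choice, uniform
import keras.layers as KL
from keras import metrics
from keras.models import Sequential
from keras.optimizers import Adam


def data():
    '''Load the dataset
    '''
    path = 'D:/DATA/Works/AI/100_Posts/hyperparameter-tunning/data'
    x_train = np.load(os.path.join(path,'x_train.npy'))
    y_train = np.load(os.path.join(path,'y_train.npy'))
    x_test = np.load(os.path.join(path,'x_test.npy'))
    y_test = np.load(os.path.join(path,'y_test.npy'))

    return x_train, y_train, x_test, y_test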



def create_model(x_train, y_train, x_test, y_test):
    '''Create the model with hyperas functions
    '''
    Inputs = (64, 64, 3)
    model = Sequential()

    model.add(KL.Conv2D(32, kernel_size=(3, 3),input_shape=Inputs))
    model.add(KL.Activation('relu'))
    model.add(KL.BatchNormalization())
    model.add(KL.MaxPooling2D((2, 2)))
    model.add(KL.Dropout({{uniform(0, 1)}}))

    model.add(KL.Flatten())
    model.add(KL.Dense({{choice([16 ,32, 64, 128])}}))
    model.add(KL.Activation({{choice(['relu', 'sigmoid'])}}))


    if {{choice(['three', 'four'])}} == 'four':
        model.add(KL.Dense({{choice([16 ,32, 64, 128])}}))
        model.add(KL.Activation('relu'))


    model.add(KL.Dense(1))
    model.add(KL.Activation('sigmoid'))


    adam=Adam(lr={{uniform(0.00005,0.01)}})
    model.compile(optimizer=adam,
                      loss='binary_crossentropy',
                      metrics=[metrics.binary_accuracy])


    H = model.fit(
        x_train,
        y_train,
        epochs=6,
        batch_size={{choice([64, 128])}})

    score = model.evaluate(x_test, y_test, verbose=0)
    accuracy = score[1]
    return {'loss': -accuracy, 'status': STATUS_OK, 'model': model}



if __name__ == '__main__':
    best_run, best_model = optim.minimize(model=create_model,
                                          data=data,
                                          algo=tpe.suggest,
                                          max_evals=25,
                                          trials=Trials())

    X_train, Y_train, X_test, Y_test = data()
    print("Evaluation of best performing model:")
    print(best_model.evaluate(X_test, Y_test))
    print("Best performing model chosen hyper-parameters:")
    print(best_run)

<br><br>
## Talos
https://github.com/autonomio/talos

<br>
Talos is another tool I tried this week. Instead of placing the candidate values inside the model architecture as hyperas does, it receives them as a dictionary of parameters, which is more readable. I got results similar to those of hyperas but found the overall experience more pleasant, thanks to many powerful helper functions for analyzing results (**talos_optim.py**).

In [None]:
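import os

import numpy as np
import talos as ta
import keras.layers as KL
from keras import metrics
from keras.activations import relu, elu
from keras.models import Sequential
from keras.optimizers import Adam, Nadam


def data():
    '''Load the dataset
    '''
    path = 'D:/DATA/Works/AI/100_Posts/hyperparameter-tunning/data'
    x_train = np.load(os.path.join(path,'x_train.npy'))
    y_train = np.load(os.path.join(path,'y_train.npy'))

    return x_train, y_train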


def create_model(x_train, y_train, x_val, y_val, params):
    '''Create the model with talos functions
    '''
    model = Sequential()

    model.add(KL.Conv2D(32, kernel_size=(3, 3),input_shape=(64, 64, 3)))
    model.add(KL.Activation('relu'))
    model.add(KL.BatchNormalization())
    model.add(KL.MaxPooling2D((2, 2)))
    model.add(KL.Dropout(params['dropout']))

    model.add(KL.Flatten())
    model.add(KL.Dense(params['first_neuron']))
    model.add(KL.Activation(params['activation']))

    model.add(KL.Dense(1))
    model.add(KL.Activation('sigmoid'))

    model.compile(optimizer=params['optimizer'](),
                  loss='binary_crossentropy',
                  metrics=[metrics.binary_accuracy])


    H = model.fit(
        x_train,
        y_train,
        validation_data=(x_val, y_val),
        epochs=params['epochs'],
        batch_size=params['batch_size'])

    return H, model



x, y = data()

p = {'dropout': [0,0.25,0.5,0.75],
    'first_neuron':[16 ,32, 64, 128, 256],
    'activation':[relu, elu],
    'optimizer': [Nadam, Adam],
    'epochs': [6],
    'batch_size': [32, 64]}


t = ta.Scan(x=x,
            y=y,
            model=create_model,
            params=p,
            dataset_name='shapes',
            grid_downsample=1,
            experiment_no='1')


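# `to` is not defined in this cell; it presumably refers to the talos_optim.py
# script imported as a module (as no_optim is below), whose main() returns the
# Talos reporting object queried here.
result = to.main()
result.high('val_binary_accuracy')
result.best_params('val_binary_accuracy')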
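Talos builds its candidate set from every combination of the values in `p`. The sketch below reproduces that expansion in plain Python to show how quickly the search space grows: the dictionary above yields 4 × 5 × 2 × 2 × 1 × 2 = 160 models. The `downsample` helper is a hypothetical stand-in for the idea behind Talos's `grid_downsample` (trying only a random fraction of the combinations), not the Talos implementation. Activations and optimizers are replaced by strings so the example runs without Keras.

```python
import itertools
import random


def expand_grid(params):
    """Cartesian product of a Talos-style parameter dictionary."""
    names = list(params)
    return [dict(zip(names, values))
            for values in itertools.product(*(params[n] for n in names))]


def downsample(grid, fraction, seed=0):
    """Keep a random fraction of the combinations (fraction=1 keeps all)."""
    k = max(1, round(len(grid) * fraction))
    return random.Random(seed).sample(grid, k)


p = {'dropout': [0, 0.25, 0.5, 0.75],
     'first_neuron': [16, 32, 64, 128, 256],
     'activation': ['relu', 'elu'],
     'optimizer': ['Nadam', 'Adam'],
     'epochs': [6],
     'batch_size': [32, 64]}

grid = expand_grid(p)
print(len(grid))                   # 160
print(len(downsample(grid, 0.1)))  # 16
```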

<br><br>

## Finally
To measure how much hyperparameter optimization is worth, its result must be compared with a non-optimized model. To do so, I trained exactly the same model with generic (non-tuned) hyperparameter values for the same number of epochs.

In [5]:
# For the notebook's clarity, the code has been moved to the no_optim.py script
import no_optim as no

no.train_model()

Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6
loss and binary accuracy on test samples [0.06147673666477203, 0.9600000023841858]


<br><br>
## Note:
1. Overall, the optimized model performs better than the non-optimized one on the test set after 6 epochs (perfect accuracy versus 0.96). Accuracy is not a perfect metric, though (TODO: also monitor ROC AUC or precision-recall metrics). Moreover, the task solved here is rather easy, so with more epochs the optimized and non-optimized models would probably perform similarly. However, when I tried hyperparameter optimization on a real project with a more complicated task, it had a real impact on the learning phase, especially on the time the network needed to reach a local minimum.
<br><br>

2. I experimented with two libraries: hyperas (a wrapper around hyperopt) and talos. In the next few weeks, I aim to study not only deep learning but also a wider range of machine learning models. Therefore, I will focus on hyperopt, which is compatible with more kinds of models. https://goo.gl/pWW7JJ 
<br><br>

3. I have not yet explored all the features and possibilities of the two libraries, but I plan to investigate them while developing current and future projects.
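Regarding the TODO in note 1: accuracy only checks predictions at a single threshold, whereas ROC AUC measures how well the scores rank positives above negatives. Below is a minimal pure-Python sketch, assuming binary labels 0/1 and predicted probabilities; in practice scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity. The two hypothetical score lists have the same accuracy but very different ROC AUC.

```python
def roc_auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative.

    Ties count as half. O(n_pos * n_neg), fine for small test sets.
    """
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    return sum((s >= threshold) == bool(y) for s, y in zip(y_score, y_true)) / len(y_true)


y_true = [0, 0, 0, 1, 1, 1]
model_a = [0.2, 0.3, 0.6, 0.7, 0.8, 0.4]  # hypothetical scores
model_b = [0.2, 0.3, 0.9, 0.7, 0.8, 0.1]  # hypothetical scores
for name, scores in (("A", model_a), ("B", model_b)):
    print(name, round(accuracy(y_true, scores), 3), round(roc_auc(y_true, scores), 3))
# A 0.667 0.889
# B 0.667 0.444
```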

<br><br>

## Resources:
* **Wikipedia** - Hyperparameter optimization https://goo.gl/tUSCFh
<br><br>
* **Prabhu** - Understanding Hyperparameters and its Optimisation techniques https://goo.gl/RjVGGt
<br><br>
* **Will Koehrsen** - Automated Machine Learning Hyperparameter Tuning in Python https://goo.gl/3P184e
<br><br>
* **Will Koehrsen** - A Conceptual Explanation of Bayesian Hyperparameter Optimization for Machine Learning https://goo.gl/DcAuQp
<br><br>
* **Siraj Raval** - Hyperparameter Optimization - The Math of Intelligence #7 https://www.youtube.com/watch?v=ttE0F7fghfk
<br><br>
* **Hvass Laboratories** - TensorFlow Tutorial #19 Hyper-Parameter Optimization https://goo.gl/bxcdZc
<br><br>
* **James Bergstra, Remi Bardenet, Yoshua Bengio, Balazs Kegl** - Algorithms for Hyper-Parameter Optimization https://goo.gl/dTCyBG
<br><br>
* **James Bergstra, Yoshua Bengio** - Random Search for Hyper-Parameter Optimization https://goo.gl/eKcLjL
<br><br>
* **J. Bergstra, D. Yamins, D. D. Cox** - Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures https://goo.gl/AUKwa4
<br><br>
* **Jasper Snoek, Hugo Larochelle, Ryan P. Adams** - Practical Bayesian Optimization of Machine Learning Algorithms https://goo.gl/jLYPCj
<br><br>
* **Ian Dewancker, Michael McCourt, Scott Clark** - Bayesian Optimization Primer https://goo.gl/YZNAVv
<br><br>
* **Jan N. van Rijn and Frank Hutter** - An Empirical Study of Hyperparameter Importance Across Datasets https://goo.gl/9tJRn4
<br><br>
* **Jason Brownlee** - How to Grid Search Hyperparameters for Deep Learning Models in Python With Keras https://goo.gl/rnDtrW
<br><br>
* **Kishan Maladkar** - Why Is Random Search Better Than Grid Search For Machine Learning https://goo.gl/TMKpq3
<br><br>
* **Miko** - Hyperparameter Optimization with Keras https://goo.gl/373LbF https://github.com/autonomio/talos
<br><br>
* **TensorFlow 2.0** - What's new in TensorBoard (TF Dev Summit '19) https://goo.gl/RLBCXT