# Hyperopt notebook
Hyperopt is a Python library for serial and parallel optimization over awkward search spaces, which may include real-valued, discrete, and conditional dimensions.
This notebook covers the basics first and then explores more advanced features. Everything is geared towards applying hyperopt to hyper-parameter tuning of Keras models (using TF 2.1), although the examples are kept general enough to carry over to other cases.

## 1.- Imports

In [None]:
# cosmetic imports
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.ticker import LinearLocator, FormatStrFormatter
# real imports
import numpy as np
import joblib
from collections import OrderedDict
from hyperopt import fmin, tpe, hp, Trials, space_eval, plotting, STATUS_OK, STATUS_FAIL

## 2.- Objective functions
Test functions for optimization:
- [Himmelblau function](https://en.wikipedia.org/wiki/Himmelblau%27s_function): Himmelblau, D. (1972). Applied Nonlinear Programming.
- [Eggholder function](https://www.researchgate.net/publication/337947149_Hybridization_of_interval_methods_and_evolutionary_algorithms_for_solving_difficult_optimization_problems): Vanaret, Charlie. (2015). Hybridization of interval methods and evolutionary algorithms for solving difficult optimization problems.

In applied mathematics, test functions, also known as artificial landscapes, are used to evaluate the characteristics of optimization algorithms. The two functions listed above are used here to try out the hyperopt basics.

In [None]:
def himmelblau_function(x, y):
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def eggholder_function(x, y):
    return -(y + 47)*np.sin(np.sqrt(np.abs((x/2) + (y + 47)))) - x*np.sin(np.sqrt(np.abs(x - y + 47)))

# Plotting function
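def plot_3d_surface(f, limits, angle=90):
    fig = plt.figure()
    # fig.gca(projection='3d') fails on recent Matplotlib releases; add_subplot works across versions
    ax = fig.add_subplot(111, projection='3d')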
    # Make data
    lower_, upper_, step_ = limits
    X = np.arange(lower_, upper_, step_)
    Y = np.arange(lower_, upper_, step_)
    X, Y = np.meshgrid(X, Y)
    Z = f(X, Y)
    # Plot the surface.
    surf = ax.plot_surface(X, Y, Z, cmap=cm.coolwarm,
                           linewidth=0, antialiased=False,
                           rstride=1, cstride=1)
    # Customize the z axis.
    ax.set_zlim(np.min(Z), np.max(Z))
    ax.zaxis.set_major_locator(LinearLocator(4))
    ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))
    # Add a color bar which maps values to colors.
    fig.colorbar(surf, shrink=0.5, aspect=5)
    ax.view_init(30, angle)
    plt.show()
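# limits = [lower, upper, step], applied to both axes
plot_3d_surface(himmelblau_function, [-5, 5, 0.25], 80)
himmelblau_function(3, 2)  # (3, 2) is one of the global minima, so this evaluates to 0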
#plot_3d_surface(eggholder_function, [-512, 512, 1], 45)
#eggholder_function(512, 404.2319)
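
As a quick sanity check of the test function above, Himmelblau's function has four known global minima, all with value 0. The sketch below re-declares the function in plain Python so that it runs on its own. The coordinates are the commonly quoted six-decimal approximations, so three of the four results are only very close to zero, not exactly zero.

```python
def himmelblau(x, y):
    """Himmelblau's function, same formula as himmelblau_function above."""
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

# Commonly quoted locations of the four global minima (rounded to 6 decimals).
KNOWN_MINIMA = [
    (3.0, 2.0),
    (-2.805118, 3.131312),
    (-3.779310, -3.283186),
    (3.584428, -1.848126),
]

values = [himmelblau(x, y) for x, y in KNOWN_MINIMA]
for (x, y), value in zip(KNOWN_MINIMA, values):
    print(f"f({x:+.6f}, {y:+.6f}) = {value:.2e}")
```

A good optimizer run over `[-5, 5]` on both axes should end up near one of these four points.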

## 3.- Hyperopt example
### _From the hyperopt GitHub page_
- Objective function
- Search space
- fmin

In [None]:
# define an objective function
def objective(args):
    case, val = args
    if case == 'case 1':
        return val
    else:
        return val ** 2

# define a search space
space = hp.choice(
    'a',
    [
        ('case 1', 1 + hp.lognormal('c1', 0, 1)),
        ('case 2', hp.uniform('c2', -10, 10))
    ])

# minimize the objective over the space
best = fmin(objective, space, algo=tpe.suggest, max_evals=2000)
print(best)
# -> {'a': 1, 'c2': 0.01420615366247227}
print(space_eval(space, best))
# -> ('case 2', 0.01420615366247227)
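
The dictionary returned by `fmin` stores, for every `hp.choice`, the index of the selected option rather than the option itself, which is why `space_eval` is needed to recover the `('case 2', ...)` tuple. The following is a hand-written illustration of that mapping for this particular space. `decode_best` is a hypothetical helper, not part of hyperopt, and `space_eval` remains the general tool.

```python
# Branch names in the same order as the options passed to hp.choice('a', ...).
BRANCHES = ['case 1', 'case 2']
# Label of the numeric parameter in each branch, as it appears in the printed result.
BRANCH_LABELS = {'case 1': 'c1', 'case 2': 'c2'}

def decode_best(best):
    """Hypothetical helper: map the index stored under 'a' back to its branch."""
    case = BRANCHES[best['a']]
    value = best[BRANCH_LABELS[case]]
    if case == 'case 1':
        value = 1 + value  # the space defines this branch as 1 + lognormal
    return case, value

example_best = {'a': 1, 'c2': 0.01420615366247227}  # the example result quoted above
print(decode_best(example_best))
```

Only the parameters of the branch that was actually taken appear in the result, so the helper looks up a single label per branch.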

## 4.- Search space
A search space is a (possibly nested) structure of `hp` expressions. In this notebook it is an OrderedDict that maps each hyper-parameter name to its range or distribution.
### 4.1.- Uniform range or probability distribution (float)
- `hp.randint(label, upper)` (the exception in this group: returns an integer in `[0, upper)`)
- `hp.uniform(label, low, high)`
- `hp.loguniform(label, low, high)`
- `hp.normal(label, mu, sigma)`
- `hp.lognormal(label, mu, sigma)`
### 4.2.- Quantized parameters (values rounded to a multiple of `q`)
- `hp.quniform(label, low, high, q)`
- `hp.qloguniform(label, low, high, q)`
- `hp.qnormal(label, mu, sigma, q)`
- `hp.qlognormal(label, mu, sigma, q)`
### 4.3.- Categorical parameters (choices)
- `hp.choice(label, ["list", "of", "potential", "choices"])`
- `hp.choice(label, [hp.uniform(sub_label_1, low, high), hp.normal(sub_label_2, mu, sigma), None, 0, 1, "anything"])`
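
To make the distribution families above more concrete, this sketch reproduces in plain Python the sampling rules that the hyperopt documentation gives for three of them: `hp.loguniform` draws `exp(uniform(low, high))`, `hp.quniform` draws `round(uniform(low, high) / q) * q`, and `hp.choice` picks one option from the list. The `sample_*` functions are hypothetical stand-ins, not hyperopt calls, and hyperopt's own random number handling differs.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def sample_loguniform(low, high):
    """exp(uniform(low, high)): low and high are bounds on the logarithm."""
    return math.exp(rng.uniform(low, high))

def sample_quniform(low, high, q):
    """round(uniform(low, high) / q) * q: a value snapped to a grid of step q."""
    return round(rng.uniform(low, high) / q) * q

def sample_choice(options):
    """Pick one option; the options may be of any type."""
    return rng.choice(options)

learning_rates = [sample_loguniform(math.log(0.01), math.log(0.5)) for _ in range(1000)]
epochs = [sample_quniform(1, 50, 1) for _ in range(1000)]
batch_sizes = [sample_choice([32, 64, 128, 256, 512]) for _ in range(1000)]

print(min(learning_rates), max(learning_rates))  # within [0.01, 0.5] up to rounding
print(min(epochs), max(epochs))
print(sorted(set(batch_sizes)))
```

Note that the bounds of `hp.loguniform` are given in log space, which is why the learning-rate range is written as `np.log(0.01)` to `np.log(0.5)` in the search space of this notebook.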

In [None]:
SEARCH_SPACE = OrderedDict([('learning_rate',
                             hp.loguniform('learning_rate', np.log(0.01), np.log(0.5))),
                            ('epochs',
                             hp.choice('epochs', range(1, 51, 1))),
                            ('batch_size',
                             hp.choice('batch_size', [32, 64, 128, 256, 512])),
                            ('l1_reg',
                             hp.choice('l1_reg', np.arange(1e-5, 2e-4, 1e-6)))
                           ])

## 5.- Objective function
`fmin` minimizes the objective, so to maximize a metric such as accuracy, return its negative.

In [None]:
def objective(params):
    all_params = {**params}
    return -1.0 * train_evaluate(all_params)
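
Besides a bare float, hyperopt also accepts an objective that returns a dictionary with a `status` key and, for successful runs, a `loss` key. This lets a failed training run be reported instead of crashing the whole search. The sketch below shows one way to wrap a scoring function that way. `safe_objective` and `fake_train_evaluate` are hypothetical names, and the fallback constants are only there so that the snippet also runs without hyperopt installed.

```python
try:
    from hyperopt import STATUS_OK, STATUS_FAIL
except ImportError:  # fall back to the string values hyperopt uses
    STATUS_OK, STATUS_FAIL = 'ok', 'fail'

def fake_train_evaluate(params):
    """Hypothetical stand-in for train_evaluate: returns an accuracy-like score."""
    if params['learning_rate'] <= 0:
        raise ValueError('learning_rate must be positive')
    return 1.0 / (1.0 + params['learning_rate'])

def safe_objective(params):
    """Return a hyperopt-style result dict; accuracy is negated so lower is better."""
    try:
        accuracy = fake_train_evaluate(params)
    except ValueError as err:
        # Failed trials carry no loss, only a status and an optional message.
        return {'status': STATUS_FAIL, 'error': str(err)}
    return {'loss': -accuracy, 'status': STATUS_OK}

print(safe_objective({'learning_rate': 0.25}))
print(safe_objective({'learning_rate': -1.0}))
```

With this shape, the sign flip described above stays in one place, and extra diagnostic fields can be stored alongside the loss in the `Trials` object.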

## 6.- Data and model declaration

- Download the MNIST data for our model
- Create a function that builds, trains and evaluates the model with the selected hyper-parameters (logistic regression model)

In [None]:
from tflite2xcore.model_generation import utils
import tensorflow as tf
utils.set_all_seeds(42)
# Data
data = utils.prepare_MNIST(False, simard=False, padding=0)
for k, v in data.items():
    print(f"Prepped data[{k}] with shape: {v.shape}")

# Model, called from the objective function
def train_evaluate(params):
    core_model = tf.keras.Sequential(
        name='logistic_regression',
        layers=[
            tf.keras.layers.Flatten(input_shape=(28, 28, 1), name='input'),
            tf.keras.layers.Dense(10,
                                  activation='softmax',
                                  kernel_regularizer=tf.keras.regularizers.l1(params['l1_reg']))
        ]
    )
    core_model.compile(
        loss='sparse_categorical_crossentropy',
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=params['learning_rate']),
        metrics=['accuracy'])
    core_model.fit(
        data['x_train'], data['y_train'],
        validation_data=(data['x_test'], data['y_test']),
        batch_size=params['batch_size'],
        verbose=0,
        epochs=params['epochs']
    )
    _, accuracy = core_model.evaluate(data['x_test'], data['y_test'])
    return accuracy

## 7.- Run trials
- Declare the constant MAX_EVALS and instantiate the Trials object
- Run fmin to find the best candidate

In [None]:
trials = Trials()
MAX_EVALS = 1000
HPO_PARAMS = {'max_evals': MAX_EVALS,
              'trials': trials
             }
best = fmin(
    fn=objective,
    space=SEARCH_SPACE,
    algo=tpe.rand.suggest,  # random search (hyperopt's rand module reached through tpe); use tpe.suggest for TPE
    **HPO_PARAMS
)

## 8.- Results

### 8.1.- Minimum

In [None]:
print(f"Found minimum after {HPO_PARAMS['max_evals']} trials:")
print(space_eval(SEARCH_SPACE, best))

### 8.2.- Best parameters

In [None]:
best_params = space_eval(SEARCH_SPACE, best)

### 8.3.- Accuracy range and median (50 trainings)

In [None]:
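# Retrain 50 times with the best parameters; objective returns minus the accuracy
accuracies = [-objective(best_params) for _ in range(50)]
minv = np.min(accuracies)
maxv = np.max(accuracies)
print(f"Acc range: ({minv}, {maxv})\nAcc median: {np.median(accuracies)}")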

## 9.- Save/load trials pickle

In [None]:
joblib.dump(trials, 'hyperopt_trials.pkl')  # imagine max_evals = 100
trials = joblib.load('./hyperopt_trials.pkl')
# Passing the loaded trials back to fmin with a larger max_evals adds more trials:
#_ = fmin(objective, SEARCH_SPACE, trials=trials, algo=tpe.rand.suggest, max_evals=200)

## 10.- Visualization
- History
- Histogram

In [None]:
plotting.main_plot_history(trials)
plotting.main_plot_histogram(trials)