# Hyperparameter Optimization

## Cross Validation

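Use a validation set held out from the training data to tune hyperparameters, and keep the test set for the final evaluation only. With k-fold cross-validation, each fold serves once as the validation set.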

![](cross-validation.png)
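
The splitting scheme can be sketched in a few lines of plain Python. This is a minimal illustration, not a replacement for a library splitter: the helper names `k_fold_indices` and `cross_validate` are hypothetical, and `score_fn` stands in for whatever routine trains a model on the training indices and returns a validation score.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Split the shuffled indices into k nearly equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx

def cross_validate(score_fn, n_samples, k=5):
    """Average a user-supplied score over the k validation folds."""
    scores = [score_fn(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

Each hyperparameter setting would be scored with `cross_validate`, and the setting with the best average validation score kept.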

TODO: Add TensorFlow example

## Random Search vs Grid Search

![](search.png)
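
One way to see the difference is to count how many distinct values of a single hyperparameter each strategy tries for the same budget. The sketch below assumes two hypothetical hyperparameters (a learning rate and a hidden-layer size) with illustrative ranges that do not come from the notebook.

```python
import itertools
import random

def grid_search_points(lr_values, hidden_values):
    """All combinations of the listed values (a full grid)."""
    return list(itertools.product(lr_values, hidden_values))

def random_search_points(n_points, seed=0):
    """Independent random draws for each hyperparameter."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_points):
        # Sample the learning rate log-uniformly in [1e-4, 1e-1].
        lr = 10 ** rng.uniform(-4, -1)
        # Sample the hidden-layer size uniformly from 16..256.
        n_hidden = rng.randint(16, 256)
        points.append((lr, n_hidden))
    return points

grid = grid_search_points([1e-4, 1e-3, 1e-2], [32, 64, 128])
rand = random_search_points(len(grid))

# The grid spends 9 trials on only 3 distinct learning rates,
# while random search draws a fresh learning rate on every trial.
distinct_grid_lrs = len({lr for lr, _ in grid})
distinct_rand_lrs = len({lr for lr, _ in rand})
```

When only one of the hyperparameters matters much, the extra distinct values are what give random search its advantage over the grid.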

In [None]:
# "Adding Gradient Noise Improves Learning for Very Deep Networks", Arvind Neelakantan et al., 2016


### Additional Notes

### How to make sure points in randomized search are farther apart?

**Low-discrepancy sequences and related space-filling samplers**
1. Sobol Sequence
2. Hammersley Set
3. Halton Sequence
4. Poisson Disk Sampling

**Note: the above techniques do not work well in high-dimensional search spaces.**
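
As an illustration of one item from the list, the Halton sequence can be generated with the radical-inverse construction. This is a minimal sketch in plain Python: the function names and the `scale` helper are hypothetical, and the bases 2 and 3 are the usual choice for two dimensions.

```python
def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` around the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton(n_points, bases=(2, 3)):
    """First `n_points` of the Halton sequence in the unit hypercube."""
    return [
        tuple(radical_inverse(i, b) for b in bases)
        for i in range(1, n_points + 1)
    ]

def scale(point, bounds):
    """Map a unit-cube point onto (low, high) ranges, one per dimension."""
    return tuple(lo + u * (hi - lo) for u, (lo, hi) in zip(point, bounds))
```

Each point from `halton` can be passed through `scale` to obtain one hyperparameter configuration; successive points fill the gaps left by earlier ones instead of clustering by chance.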

## What to monitor?

### Loss Curve

![](loss.png)

### Per layer activation
 - Magnitude, center (mean or median), breadth (standard deviation or quartiles)
 - Spatial/feature-rank variations

### Gradients
 - Magnitude, center (mean or median), breadth (standard deviation or quartiles)
 - Spatial/feature-rank variations
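
The magnitude, center, and breadth statistics listed for activations and gradients can be computed with the standard library alone. The sketch below assumes the values of one layer have already been flattened into a list of floats; the `summarize` helper and the layer names are hypothetical.

```python
import statistics

def summarize(values):
    """Summary statistics for a flat list of activations or gradients."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        # Magnitude: mean absolute value.
        "magnitude": sum(abs(v) for v in values) / len(values),
        # Center: mean and median.
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # Breadth: population standard deviation and interquartile range.
        "std": statistics.pstdev(values),
        "iqr": q3 - q1,
    }

# Hypothetical per-layer values collected during one training step.
layer_values = {
    "hidden": [-3.0, -1.0, 0.0, 1.0, 3.0],
    "output": [0.1, 0.2, 0.3, 0.4],
}
per_layer = {name: summarize(vals) for name, vals in layer_values.items()}
```

Logging these summaries every few iterations makes dead or saturated layers (near-zero magnitude or breadth) easier to spot than the loss curve alone.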
 
### Learning Trajectories
 - Plot parameter values in a low-dimensional space

## Overfitting

![](accuracy.png)

**Why do we want the ratio of weight update/weight magnitude to be around 1e-3?**
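
The ratio in question can be measured directly during training. A commonly cited rule of thumb (for example in Stanford's CS231n course notes) is that a ratio far below 1e-3 suggests the learning rate is too low, and a ratio far above it suggests the learning rate is too high. The sketch below assumes a plain SGD step on a flattened weight vector; the function names are hypothetical.

```python
import math

def l2_norm(values):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))

def update_ratio(weights, gradients, learning_rate):
    """Ratio ||update|| / ||weights|| for a plain SGD step."""
    updates = [-learning_rate * g for g in gradients]
    return l2_norm(updates) / l2_norm(weights)

# Weights with norm 5, gradients with norm 0.5, learning rate 0.01.
ratio = update_ratio([3.0, 4.0], [0.3, 0.4], learning_rate=0.01)
```

With momentum or adaptive optimizers the actual update differs from `-learning_rate * gradient`, so the update vector would have to be taken from the optimizer itself.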

### Survey of Hyperparameter Tuning Methods

In [None]:
from neupy import algorithms, layers

def train_network(n_hidden, x_train, x_test, y_train, y_test):
    network = algorithms.Momentum(
        [
            layers.Input(64),
            layers.Relu(n_hidden),
            layers.Softmax(10),
        ],

        # Randomly shuffle dataset before each
        # training epoch.
        shuffle_data=True,

        # Do not show training progress in output
        verbose=False,

        step=0.001,
        batch_size=128,
        error='categorical_crossentropy',
    )
    network.train(x_train, y_train, epochs=100)

    # Calculates categorical cross-entropy error between
    # predicted value for x_test and y_test value
    return network.prediction_error(x_test, y_test)

In [None]:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from neupy import environment

environment.reproducible()

dataset = datasets.load_digits()
n_samples = dataset.target.size
n_classes = 10

# One-hot encoder
target = np.zeros((n_samples, n_classes))
target[np.arange(n_samples), dataset.target] = 1

x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, target, train_size=0.7
)


In [None]:
import numpy as np
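from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def vector_2d(array):
    return np.array(array).reshape((-1, 1))

def gaussian_process(x_train, y_train, x_test):
    x_train = vector_2d(x_train)
    y_train = np.asarray(y_train).ravel()
    x_test = vector_2d(x_test)

    # Train the Gaussian process. The legacy `GaussianProcess` class was
    # removed in scikit-learn 0.20; `GaussianProcessRegressor` with an RBF
    # (squared-exponential) kernel replaces it. The length-scale values are
    # a rough, assumed translation of theta0=1e-1, thetaL=1e-3, thetaU=1
    # through theta = 1 / (2 * length_scale ** 2).
    kernel = RBF(length_scale=2.2, length_scale_bounds=(0.7, 22.0))
    # normalize_y centers the targets, since the errors are not zero-mean.
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(x_train, y_train)

    # Get mean and standard deviation for each possible
    # number of hidden units
    y_mean, y_std = gp.predict(x_test, return_std=True)

    return vector_2d(y_mean), vector_2d(y_std)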
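
The mean and standard deviation returned by `gaussian_process` are typically fed into an acquisition function that decides which number of hidden units to try next. Expected improvement is one common choice; the sketch below is an assumption about how the values might be used, not something stated in this notebook. It treats the objective as an error to be minimized and scores a single candidate at a time.

```python
import math

def expected_improvement(y_mean, y_std, y_best):
    """Expected improvement over `y_best` for one candidate (minimization)."""
    if y_std <= 0:
        # No uncertainty: the improvement is known exactly.
        return max(y_best - y_mean, 0.0)
    z = (y_best - y_mean) / y_std
    # Standard normal density and cumulative distribution at z.
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (y_best - y_mean) * cdf + y_std * pdf
```

The candidate with the largest expected improvement would be evaluated next with `train_network`, and the Gaussian process refitted with the new observation.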