In [4]:
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.optimizers import Adam
from keras.callbacks import LearningRateScheduler

from cifar_model import get_cifar10_cnn

## CIFARNET Convolutional Neural Network Experiments

Here I tune and train CNN models to recreate the empirical results in Section 5 of the paper.

As specified in the paper, I fix the parameter $\beta_1$ at 0.9 and tune the learning rate $\alpha$ and the hyperparameter $\beta_2$ using a grid search.

The authors further specify 100 hidden units and the ReLU activation function. I'll do the same.

## 0. Load CIFAR Dataset

I've already created train and test splits for the CIFAR dataset. They are conveniently stored as NumPy `.npy` arrays.

In [2]:
X_train = np.load("../../data/CIFAR/X_train.npy")
X_test = np.load("../../data/CIFAR/X_test.npy")
y_train = np.load("../../data/CIFAR/y_train.npy")
y_test = np.load("../../data/CIFAR/y_test.npy")
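A quick consistency check after loading can catch a mismatched split early. The helper below is a minimal sketch, not part of the original notebook: the name `check_split` is hypothetical, and it assumes the images are stored channels-last as `(n, height, width, channels)`. It is demonstrated on synthetic arrays rather than on the real files.

```python
import numpy as np

def check_split(X, y, name):
    """Raise if images and labels are inconsistent; return the number of samples."""
    # Assumption: images are 4-D, laid out as (n, height, width, channels).
    if X.ndim != 4:
        raise ValueError(f"{name}: expected 4-D image array, got shape {X.shape}")
    if len(X) != len(y):
        raise ValueError(f"{name}: {len(X)} images but {len(y)} labels")
    return len(X)

# Synthetic stand-ins with CIFAR-like image dimensions (32x32 RGB).
X_demo = np.zeros((8, 32, 32, 3), dtype=np.float32)
y_demo = np.zeros((8,), dtype=np.int64)
print(check_split(X_demo, y_demo, "demo"))
```

The same call could be applied to `X_train`/`y_train` and `X_test`/`y_test` once they are loaded.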

## 1. A framework for exhaustive grid search

The hyperparameters that I'll need to tune by grid search are:

- $\beta_2$
- $\alpha$.

To do so neatly and make use of all my cores (CPU training :( ), I'll use the `GridSearchCV` class from `sklearn` with the `KerasClassifier` wrapper.

The interface of this wrapper requires that I define a function that can be called with a set of hyperparameter options and that builds, compiles, and returns a `Sequential` model ready to be trained.

The function that does this, `get_cifar10_cnn`, is in the file `cifar_model.py`.
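To make the idea of an exhaustive grid concrete, here is a minimal sketch of how every combination of the two tuned hyperparameters turns into one call to a model-building function. Everything in it is illustrative: the candidate values, the keyword names `lr` and `beta_2`, and the stand-in `build_model_stub` are assumptions, not the actual grid or the signature of `get_cifar10_cnn`.

```python
from itertools import product

# Hypothetical candidate values; the real grid is not listed in this notebook.
param_grid = {
    "lr": [1e-2, 1e-3, 1e-4],
    "beta_2": [0.99, 0.999],
}

def build_model_stub(lr, beta_2):
    """Stand-in for a model-building function; returns the settings it received."""
    return {"lr": lr, "beta_2": beta_2}

def expand_grid(grid):
    """Yield one keyword-argument dict per combination of grid values."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

# One "model" per grid point: 3 learning rates x 2 beta_2 values.
candidates = [build_model_stub(**params) for params in expand_grid(param_grid)]
print(len(candidates))
for candidate in candidates:
    print(candidate)
```

A grid search then trains and scores each of these candidates with cross-validation and keeps the best-scoring combination, so the cost grows with the product of the list lengths.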

Note the hyperparameters that I do not tune, as they are fixed by the authors:

- $\beta_1 = 0.9$
- Learning rate decay: $\alpha_t = \frac{\alpha}{\sqrt{t}}$
- Batch size = 128
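The decay rule $\alpha_t = \frac{\alpha}{\sqrt{t}}$ can be written as a small schedule function. This is a sketch under two assumptions that are not stated above: $t$ is counted in epochs rather than individual updates, and the base rate `alpha=0.001` is only a placeholder. The name `make_sqrt_decay` is hypothetical.

```python
import math

def make_sqrt_decay(alpha):
    """Return a schedule implementing alpha_t = alpha / sqrt(t) for t = 1, 2, ..."""
    def schedule(epoch, lr=None):
        # The epoch index is assumed to start at 0, so shift by one
        # to make the first epoch use the base rate alpha.
        t = epoch + 1
        return alpha / math.sqrt(t)
    return schedule

# Placeholder base learning rate, for illustration only.
schedule = make_sqrt_decay(alpha=0.001)
print([schedule(epoch) for epoch in range(4)])
```

A function of this shape is what the `LearningRateScheduler` callback imported at the top expects: it is called with the epoch index (and, in recent Keras versions, the current learning rate) and returns the rate to use.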

In [8]:
# Example of how this works:
get_cifar10_cnn().summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_5 (Conv2D)            (None, 32, 32, 64)        6976      
_________________________________________________________________
activation_9 (Activation)    (None, 32, 32, 64)        0         
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 16, 16, 64)        0         
_________________________________________________________________
batch_normalization_3 (Batch (None, 16, 16, 64)        256       
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 16, 16, 64)        147520    
_________________________________________________________________
activation_10 (Activation)   (None, 16, 16, 64)        0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 16384)             0         
__________