# Part 2: Grid Search for Hyper Parameter Tuning in Neural Networks

To ensure the performance of neural networks, we cannot directly use them _out-of-the-box_.

In the first part of the tutorial, we discussed why we need to tune the hyper parameters in neural networks. However, tuning these hyper parameters in an _ad hoc_ fashion is both inefficient and error-prone. Thus, we need a more systematic way of tuning the hyper parameters.

### Overview
In this post, I want to show you how to use the grid search capability of `scikit-learn` and give you a suite of examples that you can copy-and-paste into your own project as a starting point.

Below is a list of the topics we are going to cover:

- How to use Keras models in scikit-learn.
- How to use grid search in scikit-learn.
- How to tune batch size and training epochs.
- How to tune optimization algorithms.
- How to tune learning rate and momentum.
- How to tune network weight initialization.
- How to tune activation functions.
- How to tune dropout regularization.
- How to tune the number of neurons in the hidden layer.

## How to Use Keras Models in scikit-learn
Keras models can be used in scikit-learn by wrapping them with the `KerasClassifier` or `KerasRegressor` class.

To use these wrappers you must define a function that creates and returns your Keras sequential model, then pass this function to the `build_fn` argument when constructing the `KerasClassifier` class.

For example:
``` python
def create_model():
	...
	return model

model = KerasClassifier(build_fn=create_model)
```

The constructor for the `KerasClassifier` class can take default arguments that are passed on to the calls to `model.fit()`, such as the number of epochs and the [batch size](https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/).

For example:

``` python
def create_model():
	...
	return model

model = KerasClassifier(build_fn=create_model, epochs=10)
```

The constructor for the `KerasClassifier` class can also take new arguments that can be passed to your custom `create_model()` function. These new arguments must also be defined in the signature of your `create_model()` function with default parameters.

For example:
``` python
def create_model(dropout_rate=0.0):
	...
	return model

model = KerasClassifier(build_fn=create_model, dropout_rate=0.2)
```
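
The routing described above can be pictured with a small standard-library sketch. It is only an illustration of the idea, not the wrapper's actual implementation: the helper name `split_kwargs` is hypothetical, and it assumes that arguments are matched against the signature of `create_model()`, with everything else treated as a training option.

```python
import inspect


def create_model(dropout_rate=0.0):
    # Stand-in for a real model-building function; it only records the value.
    return {"dropout_rate": dropout_rate}


def split_kwargs(build_fn, **kwargs):
    """Hypothetical helper: separate build-function arguments from the rest."""
    accepted = inspect.signature(build_fn).parameters
    build_args = {k: v for k, v in kwargs.items() if k in accepted}
    other_args = {k: v for k, v in kwargs.items() if k not in accepted}
    return build_args, other_args


build_args, fit_args = split_kwargs(create_model, dropout_rate=0.2, epochs=10)
print(build_args)  # {'dropout_rate': 0.2}
print(fit_args)    # {'epochs': 10}
```

This also suggests why new arguments need to appear in the signature of `create_model()`: without a matching parameter name there is nothing to route them to.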

You can learn more about the [scikit-learn wrapper in Keras API documentation](https://faroit.com/keras-docs/2.0.0/scikit-learn-api/#wrappers-for-the-scikit-learn-api).

## (Review on) How to Use Grid Search in scikit-learn
Grid search is a model hyperparameter optimization technique.

In scikit-learn this technique is provided in the `GridSearchCV` class.

When constructing this class you must provide a __dictionary of hyperparameters__ to evaluate in the `param_grid` argument. This is a map from each model parameter name to an array of values to try.

By default, `accuracy` is the score that is optimized for classifiers, but other scores can be specified in the `scoring` argument of the `GridSearchCV` constructor.

By default, the grid search will only use one thread. By setting the `n_jobs` argument in the `GridSearchCV` constructor to -1, the process will use all cores on your machine. Depending on your Keras backend, this may interfere with the main neural network training process.

The `GridSearchCV` process will then construct and evaluate one model for each combination of parameters. Cross validation is used to evaluate each individual model. The examples here use _3-fold cross validation_, which was the default in older scikit-learn releases (version 0.22 and later default to 5-fold); the number of folds is set with the `cv` argument of the `GridSearchCV` constructor.
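
To make the cost of a grid concrete, the sketch below enumerates every combination in a parameter dictionary with `itertools.product` and counts the model fits. The grid values mirror the batch size and epoch lists used later in this tutorial; the helper `expand_grid` is a hypothetical name used for illustration, not part of scikit-learn.

```python
from itertools import product


def expand_grid(param_grid):
    """Return one dict per combination of the values in param_grid."""
    names = sorted(param_grid)
    return [dict(zip(names, values))
            for values in product(*(param_grid[name] for name in names))]


param_grid = dict(batch_size=[10, 20, 40, 60, 80, 100], epochs=[10, 50, 100])
combinations = expand_grid(param_grid)
cv = 3  # number of cross-validation folds

print(len(combinations))       # 18 combinations
print(len(combinations) * cv)  # 54 model fits in total
print(combinations[0])         # {'batch_size': 10, 'epochs': 10}
```

Every value added to one list multiplies the number of fits, which is worth keeping in mind before widening a grid.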

Below is an example of defining a simple grid search:

``` python
param_grid = dict(epochs=[10,20,30])
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)
```
Once completed, you can access the outcome of the grid search in the result object returned from `grid.fit()`. The `best_score_` attribute provides access to the best score observed during the optimization procedure and the `best_params_` describes the combination of parameters that achieved the best results.

You can learn more about the [GridSearchCV class in the scikit-learn API documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).

### Problem Description
Now that we know how to use Keras models with scikit-learn and how to use grid search in scikit-learn, let’s look at a bunch of examples.

All examples will be demonstrated on a small standard machine learning dataset called the Pima Indians onset of diabetes classification dataset. This is a small dataset with all numerical attributes that is easy to work with. The dataset is available [here](https://raw.githubusercontent.com/DrJieTao/diabetesprediction/master/diabetes.csv).

As we proceed through the examples in this post, we will aggregate the best parameters. This is not the best way to grid search because parameters can interact, but it is good for demonstration purposes.
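
A quick count shows why the examples tune one group of parameters at a time. The grid sizes below are taken from the searches in this tutorial; treating them as one joint grid is purely a thought experiment, since not every combination would be meaningful.

```python
from math import prod

# Number of candidate settings in each search of this tutorial
grid_sizes = {
    "batch_size x epochs": 6 * 3,
    "optimizer": 7,
    "learn_rate x momentum": 5 * 6,
    "init_mode": 8,
    "activation": 8,
    "dropout_rate x weight_constraint": 10 * 5,
    "neurons": 7,
}

sequential = sum(grid_sizes.values())  # one search after another
joint = prod(grid_sizes.values())      # every combination at once

print(sequential)  # 128
print(joint)       # 84672000
```

Each configuration is additionally fitted once per cross-validation fold, so the joint search would be far out of reach, while the sequential approach stays cheap at the price of ignoring interactions.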



### Import Packages

Import required packages for this tutorial below.

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
# Note: this wrapper is deprecated in recent TensorFlow releases in favor of SciKeras
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.constraints import max_norm

# fix random seeds for reproducibility (NumPy and TensorFlow)
import tensorflow as tf
seed = 7
np.random.seed(seed)
tf.random.set_seed(seed)

### Define the `create_model()` Function

As discussed above, in order to link `keras` with `scikit-learn`, we need to create a function that will build and compile the model for us. Let's do that now.

In [None]:
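# Function to create model, required for KerasClassifier
from tensorflow.keras.optimizers import SGD

def create_model(optimizer = 'sgd', learn_rate=0.001, momentum=0.0, init_mode='uniform', activation = 'sigmoid'):
	# learn_rate and momentum only apply to SGD; other optimizers use their defaults
	if optimizer.lower() == 'sgd':
		optimizer = SGD(learning_rate=learn_rate, momentum=momentum)
	# create model
	model = Sequential() # a very simple MLP model
	model.add(Dense(12, input_dim=8, kernel_initializer=init_mode, activation=activation))
	model.add(Dense(1, kernel_initializer=init_mode, activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy']) # since this is a classification problem, accuracy is fine
	return model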

### Load the dataset

We will load the data as a `pandas` dataframe and take the values out as a `numpy` array. In the realm of deep learning, we prefer `numpy` over `pandas` for efficiency reasons.

In [None]:
data_url = 'https://raw.githubusercontent.com/DrJieTao/diabetesprediction/master/diabetes.csv'

# load dataset
pima = pd.read_csv(data_url,  header=0)
pima.head()

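|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |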


In [None]:
pima_data = pima.values
pima_data[:5]

array([[6.000e+00, 1.480e+02, 7.200e+01, 3.500e+01, 0.000e+00, 3.360e+01,
        6.270e-01, 5.000e+01, 1.000e+00],
       [1.000e+00, 8.500e+01, 6.600e+01, 2.900e+01, 0.000e+00, 2.660e+01,
        3.510e-01, 3.100e+01, 0.000e+00],
       [8.000e+00, 1.830e+02, 6.400e+01, 0.000e+00, 0.000e+00, 2.330e+01,
        6.720e-01, 3.200e+01, 1.000e+00],
       [1.000e+00, 8.900e+01, 6.600e+01, 2.300e+01, 9.400e+01, 2.810e+01,
        1.670e-01, 2.100e+01, 0.000e+00],
       [0.000e+00, 1.370e+02, 4.000e+01, 3.500e+01, 1.680e+02, 4.310e+01,
        2.288e+00, 3.300e+01, 1.000e+00]])

In [None]:
# split into input (X) and output (Y) variables
X = pima_data[:,:-1]
y = pima_data[:,-1]

## How to Tune Batch Size and Number of Epochs

We start by creating the model wrapper without specifying any hyper parameters.

In [None]:
# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

Then we can define the parameter set for the `GridSearchCV`.

In [None]:
# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)

We can now initialize the `GridSearchCV` class as `grid` with the above parameter set, all the cores in the system, and 3-fold CV.

#### Note on Parallelizing Grid Search
All examples are configured to use parallelism (n_jobs=-1).

If you get an error like the one below:
``` python
INFO (tf.gof.compilelock): Waiting for existing lock by process '55614' (I am process '55613')
INFO (tf.gof.compilelock): To manually release the lock, delete ...
```
Kill the process and change the code so that the grid search does not run in parallel: set `n_jobs=1`.

In [None]:
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)

Now we can print out the results to see which combination of _batch size_ and _number of epochs_ gives us the best performance.

In [None]:
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.713542 using {'batch_size': 10, 'epochs': 100}
0.625000 (0.031412) with: {'batch_size': 10, 'epochs': 10}
0.651042 (0.032106) with: {'batch_size': 10, 'epochs': 50}
0.713542 (0.001841) with: {'batch_size': 10, 'epochs': 100}
0.571615 (0.023939) with: {'batch_size': 20, 'epochs': 10}
0.690104 (0.009207) with: {'batch_size': 20, 'epochs': 50}
0.679688 (0.020915) with: {'batch_size': 20, 'epochs': 100}
0.557292 (0.039365) with: {'batch_size': 40, 'epochs': 10}
0.657552 (0.041010) with: {'batch_size': 40, 'epochs': 50}
0.630208 (0.003683) with: {'batch_size': 40, 'epochs': 100}
0.580729 (0.043420) with: {'batch_size': 60, 'epochs': 10}
0.647135 (0.028587) with: {'batch_size': 60, 'epochs': 50}
0.651042 (0.004872) with: {'batch_size': 60, 'epochs': 100}
0.471354 (0.119437) with: {'batch_size': 80, 'epochs': 10}
0.622396 (0.052634) with: {'batch_size': 80, 'epochs': 50}
0.669271 (0.027866) with: {'batch_size': 80, 'epochs': 100}
0.518229 (0.041626) with: {'batch_size': 100, 'epochs':

We can see that the batch size of 10 and 100 epochs achieved the best result of about 71% accuracy.

## How to Tune the Training Optimization Algorithm
Keras offers a suite of different state-of-the-art optimization algorithms.

In this example, we tune the optimization algorithm used to train the network, each with default parameters.

This is an odd example, because often you will choose one approach a priori and instead focus on tuning its parameters on your problem (e.g. see the next example).

In the first part of the tutorial, we used the popular `SGD` optimizer. Here we will evaluate the [suite of optimization algorithms supported by the Keras API](http://keras.io/optimizers/).



In [None]:
# create model with the tuned hyper parameters
model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=20, verbose=0)

In [None]:
# define the grid search parameters
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
param_grid = dict(optimizer=optimizer)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)

Now we can print out the results to see which _optimizer_ gives us the best performance.

In [None]:
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.700521 using {'optimizer': 'Adagrad'}
0.687500 (0.009568) with: {'optimizer': 'SGD'}
0.690104 (0.019488) with: {'optimizer': 'RMSprop'}
0.700521 (0.012890) with: {'optimizer': 'Adagrad'}
0.688802 (0.022402) with: {'optimizer': 'Adadelta'}
0.692708 (0.014731) with: {'optimizer': 'Adam'}
0.688802 (0.003683) with: {'optimizer': 'Adamax'}
0.678385 (0.003683) with: {'optimizer': 'Nadam'}


So we can see that the `Adagrad` optimizer gives us the best result at about 70% accuracy, although the gap to the other optimizers is small.
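
When scores are this close, it helps to compare the winner's margin with the spread across folds. The sketch below uses the mean and standard deviation values from the output printed above; the one-standard-deviation rule is an informal heuristic assumed here for illustration, not a statistical test.

```python
# (mean accuracy, standard deviation) copied from the output above
results = {
    "SGD": (0.687500, 0.009568),
    "RMSprop": (0.690104, 0.019488),
    "Adagrad": (0.700521, 0.012890),
    "Adadelta": (0.688802, 0.022402),
    "Adam": (0.692708, 0.014731),
    "Adamax": (0.688802, 0.003683),
    "Nadam": (0.678385, 0.003683),
}

best_name, (best_mean, best_std) = max(results.items(), key=lambda item: item[1][0])

# Assumed heuristic: a mean within one standard deviation of the best counts as close
close = [name for name, (mean, _) in results.items()
         if name != best_name and best_mean - mean <= best_std]

print(best_name)  # Adagrad
print(close)      # ['RMSprop', 'Adadelta', 'Adam', 'Adamax']
```

Under this heuristic, four of the six remaining optimizers cannot be clearly separated from the best one on this dataset.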

## How to Tune Learning Rate and Momentum
It is common to pre-select an optimization algorithm to train your network and tune its parameters.

By far the most common optimization algorithm is plain old Stochastic Gradient Descent (`SGD`, as we discussed in the previous part) because it is so well understood. In this example, we will look at optimizing the SGD learning rate and momentum parameters.

Learning rate controls how much to update the weight at the end of each batch and the momentum controls how much to let the previous update influence the current weight update.
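
The interplay of the two parameters can be sketched on a toy problem. The code below assumes the classical momentum update (a velocity that accumulates past gradients) and minimizes the made-up objective f(w) = w², so the function names and the objective are illustrative only.

```python
def sgd_momentum_step(weight, velocity, gradient, learn_rate, momentum):
    """One SGD update with classical momentum for a single weight."""
    velocity = momentum * velocity - learn_rate * gradient
    return weight + velocity, velocity


def minimize(learn_rate, momentum, steps=20):
    """Minimize f(w) = w**2 starting from w = 1.0 and return the final w."""
    weight, velocity = 1.0, 0.0
    for _ in range(steps):
        gradient = 2.0 * weight  # derivative of w**2
        weight, velocity = sgd_momentum_step(weight, velocity, gradient,
                                             learn_rate, momentum)
    return weight


for momentum in (0.0, 0.8):
    print(momentum, abs(minimize(learn_rate=0.01, momentum=momentum)))
```

On this toy problem the run with momentum ends much closer to the minimum after the same number of steps, because each update also carries part of the previous one.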

We will try a suite of small standard learning rates and momentum values from 0.2 to 0.8 in steps of 0.2, as well as 0.9 (because it can be a popular value in practice).

Generally, it is a good idea to also include the number of epochs in an optimization like this as there is a dependency between the amount of learning per batch (learning rate), the number of updates per epoch (batch size) and the number of epochs.
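
The dependency can be made tangible by counting weight updates. The sketch below assumes the dataset has 768 rows (the usual size of the Pima dataset, not stated in this tutorial), so that each 3-fold split trains on 512 rows; `total_updates` is a hypothetical helper.

```python
from math import ceil


def total_updates(n_train, batch_size, epochs):
    """Number of weight updates: batches per epoch times number of epochs."""
    return ceil(n_train / batch_size) * epochs


# Assumption: 768 rows in total, so each 3-fold CV split trains on 512 rows
n_train = 768 * 2 // 3

for batch_size in (10, 20, 100):
    print(batch_size, total_updates(n_train, batch_size, epochs=100))
# 10 5200
# 20 2600
# 100 600
```

A learning rate that works with 5200 small updates may be too timid for 600 large ones, which is why these settings are best searched together.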

We will use the fine-tuned hyper parameters from above.

In [None]:
# create model with the tuned hyper parameters
model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=20, verbose=0)

In [None]:
# define the grid search parameters and initialize the `grid` object
learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
param_grid = dict(learn_rate=learn_rate, momentum=momentum)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)

Now we can print out the results to see which combination of _learning rate_ and _momentum_ gives us the best performance.

In [None]:
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.721354 using {'learn_rate': 0.001, 'momentum': 0.8}
0.695312 (0.025516) with: {'learn_rate': 0.001, 'momentum': 0.0}
0.694010 (0.027126) with: {'learn_rate': 0.001, 'momentum': 0.2}
0.699219 (0.027251) with: {'learn_rate': 0.001, 'momentum': 0.4}
0.675781 (0.022326) with: {'learn_rate': 0.001, 'momentum': 0.6}
0.721354 (0.015073) with: {'learn_rate': 0.001, 'momentum': 0.8}
0.667969 (0.033603) with: {'learn_rate': 0.001, 'momentum': 0.9}
0.686198 (0.018136) with: {'learn_rate': 0.01, 'momentum': 0.0}
0.680990 (0.006639) with: {'learn_rate': 0.01, 'momentum': 0.2}
0.694010 (0.015733) with: {'learn_rate': 0.01, 'momentum': 0.4}
0.632812 (0.046983) with: {'learn_rate': 0.01, 'momentum': 0.6}
0.700521 (0.020752) with: {'learn_rate': 0.01, 'momentum': 0.8}
0.674479 (0.018414) with: {'learn_rate': 0.01, 'momentum': 0.9}
0.691406 (0.017758) with: {'learn_rate': 0.1, 'momentum': 0.0}
0.682292 (0.030314) with: {'learn_rate': 0.1, 'momentum': 0.2}
0.670573 (0.024360) with: {'learn_rate':

So we can see that the combination of `learn_rate = 0.001` and `momentum = 0.8` returns the best result, at about 72% accuracy.

## How to Tune Network Weight Initialization
Neural network weight initialization used to be simple: use small random values.

Now there is a suite of different techniques to choose from. Keras provides a [laundry list](http://keras.io/initializations/).

In this example, we will look at tuning the selection of network weight initialization by evaluating all of the available techniques.

We will use the same weight initialization method on each layer. Ideally, it may be better to use different weight initialization schemes according to the activation function used on each layer. In this part, we use rectifier for the hidden layer. We use sigmoid for the output layer because the predictions are binary.
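
To see how the schemes differ, the sketch below computes the sampling range of three uniform initializers for the 8-input, 12-neuron hidden layer. The formulas are the commonly documented ones and are stated here as an assumption; check them against the Keras version you use. The helper `uniform_limit` is hypothetical.

```python
from math import sqrt


def uniform_limit(scheme, fan_in, fan_out):
    """Half-width of the uniform sampling range for a few common schemes."""
    if scheme == "glorot_uniform":
        return sqrt(6.0 / (fan_in + fan_out))
    if scheme == "he_uniform":
        return sqrt(6.0 / fan_in)
    if scheme == "lecun_uniform":
        return sqrt(3.0 / fan_in)
    raise ValueError(f"unknown scheme: {scheme}")


# Hidden layer of this tutorial: 8 inputs feeding 12 neurons
for scheme in ("glorot_uniform", "he_uniform", "lecun_uniform"):
    print(scheme, round(uniform_limit(scheme, fan_in=8, fan_out=12), 3))
# glorot_uniform 0.548
# he_uniform 0.866
# lecun_uniform 0.612
```

Weights are then drawn uniformly from the interval between minus the limit and plus the limit, so the schemes mainly differ in how wide the initial weights are spread.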

In [None]:
# create model
model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=20, verbose=0)

In [None]:
# define the grid search parameters
init_mode = ['uniform', 'lecun_uniform', 'normal', 'zero',
             'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']
param_grid = dict(init_mode=init_mode)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)

In [None]:
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.695312 using {'init_mode': 'uniform'}
0.695312 (0.011500) with: {'init_mode': 'uniform'}
0.680990 (0.021236) with: {'init_mode': 'lecun_uniform'}
0.682292 (0.016367) with: {'init_mode': 'normal'}
0.695312 (0.016877) with: {'init_mode': 'zero'}
0.670573 (0.019225) with: {'init_mode': 'glorot_normal'}
0.692708 (0.022402) with: {'init_mode': 'glorot_uniform'}
0.662760 (0.013279) with: {'init_mode': 'he_normal'}
0.673177 (0.008027) with: {'init_mode': 'he_uniform'}


We can see that the best results were achieved with a __uniform__ weight initialization scheme (tied with `zero`), with an accuracy of about 70%.

## How to Tune the Neuron Activation Function
The activation function controls the __non-linearity__ of individual neurons and when to fire.

Generally, the rectifier activation function is the most popular, but it used to be the sigmoid and the tanh functions and these functions may still be more suitable for different problems.

In this part, we will evaluate the suite of [different activation functions](http://keras.io/activations/) available in Keras. We will only use these functions in the hidden layer, as we require a __sigmoid__ activation function in the output for the binary classification problem.
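
For intuition about how these functions shape a neuron's output, the sketch below evaluates a few of them with the standard library at three sample inputs. The definitions are the textbook ones and are written out here for illustration rather than taken from Keras.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softplus(x):
    return math.log1p(math.exp(x))


def softsign(x):
    return x / (1.0 + abs(x))


activations = {"relu": relu, "sigmoid": sigmoid, "tanh": math.tanh,
               "softplus": softplus, "softsign": softsign}

for name, fn in activations.items():
    print(name, [round(fn(x), 3) for x in (-2.0, 0.0, 2.0)])
```

The output ranges differ: `sigmoid` stays between 0 and 1, `tanh` and `softsign` between -1 and 1, while `relu` and `softplus` are unbounded above.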

Generally, it is a good idea to prepare data to the range of the different transfer functions, which we will not do in this case.

In [None]:
model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, verbose=0)

In [None]:
activation = ['softmax', 'softplus', 'softsign', 'relu',
              'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
param_grid = dict(activation=activation)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)


Now we can print out the results to see which _activation function_ gives us the best performance.

In [None]:
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.705729 using {'activation': 'relu'}
0.692708 (0.038976) with: {'activation': 'softmax'}
0.688802 (0.038582) with: {'activation': 'softplus'}
0.700521 (0.027126) with: {'activation': 'softsign'}
0.705729 (0.028940) with: {'activation': 'relu'}
0.701823 (0.027866) with: {'activation': 'tanh'}
0.675781 (0.027805) with: {'activation': 'sigmoid'}
0.692708 (0.004872) with: {'activation': 'hard_sigmoid'}
0.697917 (0.016053) with: {'activation': 'linear'}


We can observe that the `relu` activation function returns the best result.

## How to Tune Dropout Regularization
In this part, we will look at tuning the [dropout rate for regularization](https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/) in an effort to limit overfitting and improve the model’s ability to generalize.
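
What the dropout rate controls can be sketched in a few lines. The code below is a simplified, training-time illustration of inverted dropout on a plain list, written for this explanation rather than taken from Keras; the function name and the seed are arbitrary.

```python
import random


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(7)  # arbitrary seed so the sketch is repeatable
activations = [1.0] * 10
dropped = dropout(activations, rate=0.4, rng=rng)
print(dropped)  # on average 40% zeros; survivors are scaled by 1 / 0.6
```

Rescaling the surviving values keeps the expected sum of the layer's output unchanged, so no adjustment is needed at prediction time when dropout is switched off.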

To get good results, dropout is best combined with a weight constraint such as the max norm constraint.
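
A max norm constraint can be pictured as a rescaling step applied to a neuron's incoming weights after an update. The sketch below is a simplified illustration for a single weight vector, with the hypothetical helper `apply_max_norm`; it is not the Keras implementation.

```python
from math import sqrt


def apply_max_norm(weights, max_value):
    """Rescale a weight vector so its L2 norm does not exceed max_value."""
    norm = sqrt(sum(w * w for w in weights))
    if norm <= max_value:
        return list(weights)
    return [w * max_value / norm for w in weights]


print(apply_max_norm([3.0, 4.0], max_value=10.0))  # unchanged, norm 5 is within the limit
print(apply_max_norm([3.0, 4.0], max_value=1.0))   # rescaled to norm 1
print(apply_max_norm([3.0, 4.0], max_value=0.0))   # [0.0, 0.0]
```

The last call shows why a constraint value of 0 is not useful in practice: it would force every weight to zero.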

For more on using dropout in deep learning models with Keras see [this post](http://machinelearningmastery.com/dropout-regularization-deep-learning-models-keras/).

This involves fitting both the dropout rate and the weight constraint. We will try dropout rates between `0.0` and `0.9` (`1.0` does not make sense) and maxnorm weight constraint values between `1` and `5`.

In [None]:
# redefine the `create_model` function here
def create_model(dropout_rate=0.0, weight_constraint=0):
	# create model
	model = Sequential()
	model.add(Dense(12, input_dim=8, kernel_initializer='uniform',
                 activation='linear', kernel_constraint=max_norm(weight_constraint)))
	model.add(Dropout(dropout_rate))
	model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model

In [None]:
# create model
model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, verbose=0)

In [None]:
# define the grid search parameters
weight_constraint = [1, 2, 3, 4, 5]
dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
param_grid = dict(dropout_rate=dropout_rate, weight_constraint=weight_constraint)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)



Now we can print out the results to see which _dropout regularization_ gives us the best performance.

In [None]:
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.729167 using {'dropout_rate': 0.4, 'weight_constraint': 1}
0.697917 (0.018136) with: {'dropout_rate': 0.0, 'weight_constraint': 1}
0.697917 (0.020752) with: {'dropout_rate': 0.0, 'weight_constraint': 2}
0.705729 (0.013279) with: {'dropout_rate': 0.0, 'weight_constraint': 3}
0.708333 (0.023939) with: {'dropout_rate': 0.0, 'weight_constraint': 4}
0.718750 (0.011500) with: {'dropout_rate': 0.0, 'weight_constraint': 5}
0.705729 (0.008027) with: {'dropout_rate': 0.1, 'weight_constraint': 1}
0.721354 (0.023510) with: {'dropout_rate': 0.1, 'weight_constraint': 2}
0.723958 (0.027126) with: {'dropout_rate': 0.1, 'weight_constraint': 3}
0.721354 (0.020505) with: {'dropout_rate': 0.1, 'weight_constraint': 4}
0.703125 (0.011049) with: {'dropout_rate': 0.1, 'weight_constraint': 5}
0.707031 (0.009568) with: {'dropout_rate': 0.2, 'weight_constraint': 1}
0.700521 (0.010253) with: {'dropout_rate': 0.2, 'weight_constraint': 2}
0.708333 (0.011201) with: {'dropout_rate': 0.2, 'weight_constraint': 

We can see that the dropout rate of `40%` and the maxnorm weight constraint of `1` resulted in the best accuracy of ~73%.

## How to Tune the Number of Neurons in the Hidden Layer
The number of neurons in a layer is an important parameter to tune. Generally the number of neurons in a layer controls the representational capacity of the network, at least at that point in the topology.

Also, generally, a large enough single layer network can approximate any other neural network, at least in theory. But if the dataset is small and the number of neurons is too large, the model can overfit and generalize poorly.

In this part, we will look at tuning the number of neurons in a single hidden layer. We will try values from 1 to 30 in steps of 5.

A larger network requires more training and at least the batch size and number of epochs should ideally be optimized with the number of neurons.
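
How quickly the network grows with the number of neurons can be checked with a short count of trainable parameters. The sketch assumes the architecture used in this part (8 inputs, one hidden `Dense` layer, one output neuron); the helper `count_parameters` is hypothetical.

```python
def count_parameters(neurons, n_inputs=8, n_outputs=1):
    """Weights plus biases of a network with one hidden Dense layer."""
    hidden = (n_inputs + 1) * neurons   # +1 for each neuron's bias
    output = (neurons + 1) * n_outputs  # +1 for the output bias
    return hidden + output


for neurons in [1, 5, 10, 15, 20, 25, 30]:
    print(neurons, count_parameters(neurons))
```

For this architecture the count is `10 * neurons + 1`, so even the largest setting has only a few hundred parameters. The dropout layer adds none.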

In [None]:
# Function to create model, required for KerasClassifier
def create_model(neurons=1):
	# create model
	model = Sequential()
	model.add(Dense(neurons, input_dim=8, kernel_initializer='uniform', activation='linear', kernel_constraint=max_norm(4)))
	model.add(Dropout(0.2))
	model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model

In [None]:
# create model
model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, verbose=0)

In [None]:
# define the grid search parameters
neurons = [1, 5, 10, 15, 20, 25, 30]
param_grid = dict(neurons=neurons)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)



Now we can print out the results to see what _number of neurons_ gives us the best performance.

In [None]:
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.722656 using {'neurons': 25}
0.708333 (0.010253) with: {'neurons': 1}
0.708333 (0.038582) with: {'neurons': 5}
0.701823 (0.012890) with: {'neurons': 10}
0.714844 (0.019918) with: {'neurons': 15}
0.708333 (0.009744) with: {'neurons': 20}
0.722656 (0.019918) with: {'neurons': 25}
0.687500 (0.009568) with: {'neurons': 30}


We can see that the best results were achieved with a network with `25` neurons in the hidden layer with an accuracy of ~72%.

## Summary
In this post, you discovered how you can tune the hyperparameters of your deep learning networks in Python using Keras and scikit-learn.

Specifically, you learned:

- How to wrap Keras models for use in scikit-learn and how to use grid search.
- How to grid search a suite of different standard neural network parameters for Keras models.
- How to design your own hyperparameter optimization experiments.

This concludes both parts of the model optimization of neural networks - feel free to apply these concepts and techniques on your own project.