### How to Tune Network Activation function

Use scikit-learn to grid search the activation function

Generally, the rectifier activation function is the most popular,
but it used to be the sigmoid and the tanh functions and these functions may still be more suitable for different problems.

In this example, we will evaluate the suite of different activation functions available in Keras.  
We will only use these functions in the hidden layer,
as we require a sigmoid activation function in the output for the binary classification problem.

Generally, it is a good idea to prepare data to the range of the different transfer functions, which we will not do in this case.

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.wrappers.scikit_learn import KerasClassifier
from keras.constraints import max_norm

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

Using TensorFlow backend.


### Load dataset

In [2]:
df = pd.read_csv("../datasets/pima-indians-diabetes.csv", delimiter=",")

### Investigate data

In [3]:
df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [4]:
df.describe()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
count,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0
mean,3.845052,120.894531,69.105469,20.536458,79.799479,31.992578,0.471876,33.240885,0.348958
std,3.369578,31.972618,19.355807,15.952218,115.244002,7.88416,0.331329,11.760232,0.476951
min,0.0,0.0,0.0,0.0,0.0,0.0,0.078,21.0,0.0
25%,1.0,99.0,62.0,0.0,0.0,27.3,0.24375,24.0,0.0
50%,3.0,117.0,72.0,23.0,30.5,32.0,0.3725,29.0,0.0
75%,6.0,140.25,80.0,32.0,127.25,36.6,0.62625,41.0,1.0
max,17.0,199.0,122.0,99.0,846.0,67.1,2.42,81.0,1.0


### Split data into input (X) and output (Y) variables

In [5]:
X = df.drop('Outcome', axis=1).as_matrix()
Y = df['Outcome'].as_matrix()

### Function to define model (required for KerasClassifier)

In [6]:
def create_model(activation='relu'):
    model = Sequential()
    model.add(Dense(12, input_dim=8, kernel_initializer='uniform'))
    model.add(Activation(activation))
    
    model.add(Dense(1, kernel_initializer='uniform'))
    model.add(Activation('sigmoid'))
    
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

### Create model

In [7]:
model = KerasClassifier(build_fn=create_model, nb_epoch=100, batch_size=10, verbose=0)

### Define the grid search parameters

In [8]:
activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
param_grid = dict(activation=activation)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)

### Run GridSearch

In [9]:
grid_result = grid.fit(X, Y)

### Summarize results

In [10]:
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.657552 using {'activation': 'relu'}
0.651042 (0.024774) with: {'activation': 'softmax'}
0.654948 (0.028940) with: {'activation': 'softplus'}
0.651042 (0.024774) with: {'activation': 'softsign'}
0.657552 (0.023939) with: {'activation': 'relu'}
0.651042 (0.024774) with: {'activation': 'tanh'}
0.651042 (0.024774) with: {'activation': 'sigmoid'}
0.651042 (0.024774) with: {'activation': 'hard_sigmoid'}
0.645833 (0.023939) with: {'activation': 'linear'}
