## MNIST Digit Recognizer Kaggle Competition
###### https://www.kaggle.com/c/digit-recognizer

**Competition Description**:
MNIST ("Modified National Institute of Standards and Technology") is the de facto “hello world” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.

In this competition, your goal is to correctly identify digits from a dataset of tens of thousands of handwritten images. We’ve curated a set of tutorial-style kernels which cover everything from regression to neural networks. We encourage you to experiment with different algorithms to learn first-hand what works well and how techniques compare.

## The Approach

The approach outlined in this notebook is to first perform a grid search of the hyperparameter space, covering the dropout rate, batch size, and optimizer. The best-performing model is then expanded on in a longer, more robust training run.

### Import Required Packages

In [3]:
import pandas as pd
import numpy as np

%matplotlib inline

np.random.seed(2)

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import itertools

from keras.utils.np_utils import to_categorical # convert to one-hot-encoding
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D
from keras.optimizers import RMSprop
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ReduceLROnPlateau

Using TensorFlow backend.


In [4]:
from keras.callbacks import EarlyStopping
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier

In [5]:
%config IPCompleter.greedy=True

### Data Preprocessing

In [6]:
#Load in the data
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

In [7]:
#Identify column with the label
Y_train = train["label"]

# Drop 'label' column
X_train = train.drop(labels = ["label"],axis = 1) 

# remove train as it is no longer needed
del train 

#Count frequency of each value
Y_train.value_counts()

1    4684
7    4401
3    4351
9    4188
2    4177
6    4137
0    4132
4    4072
8    4063
5    3795
Name: label, dtype: int64

In [8]:
# Normalize the data
X_train = X_train / 255.0
test = test / 255.0

In [9]:
# Reshape each image into 3 dimensions (height = 28px, width = 28px, channels = 1)
X_train = X_train.values.reshape(-1,28,28,1)
test = test.values.reshape(-1,28,28,1)
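
The normalization and reshape steps can be checked on a small synthetic array. The sketch below uses random pixel values as a stand-in for the contents of `train.csv` (an assumption for illustration only) and shows how each flat row of 784 values becomes a 28x28x1 tensor with values in [0, 1].

```python
import numpy as np

# Synthetic stand-in for the pixel columns of train.csv (hypothetical data):
# 5 images, each flattened to 784 integer pixel values in [0, 255].
rng = np.random.default_rng(0)
flat_pixels = rng.integers(0, 256, size=(5, 784))

# Scale to [0, 1], as in the normalization cell.
scaled = flat_pixels / 255.0

# Reshape to (samples, height, width, channels); -1 lets NumPy infer the sample count.
images = scaled.reshape(-1, 28, 28, 1)

print(images.shape)
print(images.min() >= 0.0, images.max() <= 1.0)

# Row-major reshape: pixel k of a flat row lands at row k // 28, column k % 28.
k = 100
print(images[0, k // 28, k % 28, 0] == scaled[0, k])
```

The trailing channel axis of size 1 is what the `input_shape = (28,28,1)` argument of the first `Conv2D` layer expects for grayscale input.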

In [10]:
# Encode labels into categorical variables
Y_train = to_categorical(Y_train, num_classes = 10)
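
As a rough illustration of what one-hot encoding produces, the sketch below builds the same kind of matrix with plain NumPy. The `one_hot` helper is hypothetical and is not the Keras implementation; it only shows the shape and content of the encoded labels.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) float array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[labels]

encoded = one_hot([1, 0, 9])
print(encoded)
print(encoded.argmax(axis=1))  # argmax recovers the original integer labels
```

This encoding is what the `categorical_crossentropy` loss used later in the notebook expects as its target format.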

In [11]:
# Set the random seed
random_seed = 1404

In [12]:
# Split the train and the validation set for the fitting
X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train, test_size = 0.1, random_state=random_seed)
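
The idea behind a seeded hold-out split can be sketched with shuffled indices. The `holdout_split` helper below is hypothetical and is not how scikit-learn implements `train_test_split`; it only illustrates that a fixed seed yields the same disjoint training and validation sets on every run.

```python
import numpy as np

def holdout_split(n_samples, test_size=0.1, seed=1404):
    """Shuffle indices reproducibly and hold out a fraction for validation."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_val = int(round(test_size * n_samples))
    return indices[n_val:], indices[:n_val]

# Toy example with 20 samples instead of the full training set.
train_idx, val_idx = holdout_split(20)
again_train, again_val = holdout_split(20)

print(sorted(val_idx.tolist()))
print(np.array_equal(val_idx, again_val))  # same seed gives the same split
```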

### Large Grid Search

We wanted to search a large portion of the hyperparameter space: 6 dropout rates, 3 batch sizes, and 7 optimizers.
Each combination is trained for 5 epochs and evaluated with 3-fold cross-validation (the default), giving 6\*3\*7 = 126 combinations and 126\*3 = 378 separate model fits of 5 epochs each.
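
The size of the search can be confirmed with a few lines of standard-library Python. The sketch uses the parameter lists from this notebook and assumes 3 cross-validation folds, as stated above.

```python
from itertools import product

dropout_rate = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3]
batch_size = [100, 200, 250]
optimizer = ["SGD", "RMSprop", "Adagrad", "Adadelta", "Adam", "Adamax", "Nadam"]
cv_folds = 3   # assumed cross-validation default, as stated in the text
epochs = 5

# Every (dropout, batch size, optimizer) triple is one candidate configuration.
combinations = list(product(dropout_rate, batch_size, optimizer))
total_fits = len(combinations) * cv_folds
total_epochs = total_fits * epochs

print(len(combinations), total_fits, total_epochs)
```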

In [15]:
# Set the parameters for the grid search
dropout_rate = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3]
batch_size = [100, 200, 250]
epochs = [5]
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
#learn_rate = [0.001, 0.005, 0.01, 0.1, 0.2, 0.3]
#momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]

In [24]:
X = X_train
Y = Y_train

#Create the model we will use on the grid search
def create_model(dropout_rate=0.25, weight_constraint=0, optimizer = 'RMSProp'):
    # create model
    model = Sequential()

    model.add(Conv2D(filters = 32, kernel_size = (5,5),padding = 'Same', 
                 activation ='relu', input_shape = (28,28,1)))
    model.add(Conv2D(filters = 32, kernel_size = (5,5),padding = 'Same', 
                 activation ='relu'))
    model.add(MaxPool2D(pool_size=(2,2)))
    model.add(Dropout(dropout_rate))


    model.add(Conv2D(filters = 64, kernel_size = (3,3),padding = 'Same', 
                 activation ='relu'))
    model.add(Conv2D(filters = 64, kernel_size = (3,3),padding = 'Same', 
                 activation ='relu'))
    model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
    model.add(Dropout(dropout_rate))
    
    # Classifier head: flatten, dense layer, dropout, softmax output
    model.add(Flatten())
    model.add(Dense(256, activation = "relu"))
    model.add(Dropout(dropout_rate))
    model.add(Dense(10, activation = "softmax"))
    model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
    return model
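
To get a feel for the size of this architecture, the parameter counts can be worked out by hand. The sketch below is plain arithmetic and does not call Keras; the helper names are hypothetical, and it assumes 'Same' padding keeps the spatial size while each 2x2 pooling layer halves it.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit for a Dense layer."""
    return inputs * units + units

side, channels = 28, 1          # input is 28x28x1
counts = {}

# Block 1: two 5x5 convolutions with 32 filters, then 2x2 max pooling.
counts["conv1"] = conv2d_params(5, channels, 32)
counts["conv2"] = conv2d_params(5, 32, 32)
side, channels = side // 2, 32

# Block 2: two 3x3 convolutions with 64 filters, then 2x2 max pooling.
counts["conv3"] = conv2d_params(3, channels, 64)
counts["conv4"] = conv2d_params(3, 64, 64)
side, channels = side // 2, 64

# Classifier head: flatten, Dense(256), Dense(10).
flat = side * side * channels
counts["dense1"] = dense_params(flat, 256)
counts["dense2"] = dense_params(256, 10)

for name, n in counts.items():
    print(f"{name}: {n:,}")
print("flattened features:", flat)
print("total trainable parameters:", f"{sum(counts.values()):,}")
```

Most of the parameters sit in the first dense layer, which is one reason dropout is applied right after it. Dropout and pooling layers add no trainable parameters.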

In [None]:
# create model
model = KerasClassifier(build_fn=create_model, verbose=2)
# define the grid search parameters
param_grid = dict(batch_size=batch_size, epochs=epochs, dropout_rate=dropout_rate, optimizer=optimizer)
grid = GridSearchCV(estimator=model, param_grid=param_grid)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Epoch 1/5
 - 177s - loss: 1.8108 - acc: 0.4717
Epoch 2/5
 - 175s - loss: 0.4207 - acc: 0.8684
Epoch 3/5
 - 164s - loss: 0.2448 - acc: 0.9252
Epoch 4/5
 - 167s - loss: 0.1787 - acc: 0.9438
Epoch 5/5
 - 165s - loss: 0.1442 - acc: 0.9566
Epoch 1/5
 - 165s - loss: 1.5872 - acc: 0.4892
Epoch 2/5
 - 165s - loss: 0.3805 - acc: 0.8831
Epoch 3/5
 - 165s - loss: 0.2416 - acc: 0.9267
Epoch 4/5
 - 165s - loss: 0.1746 - acc: 0.9467
Epoch 5/5
 - 166s - loss: 0.1385 - acc: 0.9575
Epoch 1/5
 - 165s - loss: 1.7556 - acc: 0.4328
Epoch 2/5
 - 165s - loss: 0.3683 - acc: 0.8862
Epoch 3/5
 - 164s - loss: 0.2175 - acc: 0.9358
Epoch 4/5
 - 165s - loss: 0.1565 - acc: 0.9527
Epoch 5/5
 - 167s - loss: 0.1307 - acc: 0.9606
Epoch 1/5
 - 166s - loss: 0.2569 - acc: 0.9166
Epoch 2/5
 - 167s - loss: 0.0596 - acc: 0.9817
Epoch 3/5
 - 165s - loss: 0.0404 - acc: 0.9873
Epoch 4/5
 - 165s - loss: 0.0301 - acc: 0.9906
Epoch 5/5
 - 165s - loss: 0.0221 - acc: 0.9935
Epoch 1/5
 - 166s - loss: 0.2543 - acc: 0.9189
Epoch 2/5
 - 

Epoch 1/5
 - 173s - loss: 0.2545 - acc: 0.9193
Epoch 2/5
 - 171s - loss: 0.0642 - acc: 0.9795
Epoch 3/5
 - 171s - loss: 0.0439 - acc: 0.9865
Epoch 4/5
 - 171s - loss: 0.0341 - acc: 0.9894
Epoch 5/5
 - 171s - loss: 0.0262 - acc: 0.9910
Epoch 1/5
 - 181s - loss: 0.2424 - acc: 0.9250
Epoch 2/5
 - 173s - loss: 0.0726 - acc: 0.9780
Epoch 3/5
 - 170s - loss: 0.0492 - acc: 0.9845
Epoch 4/5
 - 171s - loss: 0.0378 - acc: 0.9876
Epoch 5/5
 - 170s - loss: 0.0278 - acc: 0.9909
Epoch 1/5
 - 172s - loss: 0.2636 - acc: 0.9162
Epoch 2/5
 - 171s - loss: 0.0697 - acc: 0.9789
Epoch 3/5
 - 170s - loss: 0.0480 - acc: 0.9849
Epoch 4/5
 - 170s - loss: 0.0367 - acc: 0.9884
Epoch 5/5
 - 172s - loss: 0.0281 - acc: 0.9910
Epoch 1/5
 - 172s - loss: 0.2684 - acc: 0.9143
Epoch 2/5
 - 171s - loss: 0.0768 - acc: 0.9765
Epoch 3/5
 - 171s - loss: 0.0514 - acc: 0.9841
Epoch 4/5
 - 170s - loss: 0.0392 - acc: 0.9885
Epoch 5/5
 - 172s - loss: 0.0300 - acc: 0.9904
Epoch 1/5
 - 175s - loss: 0.2227 - acc: 0.9292
Epoch 2/5
 - 

Epoch 1/5
 - 181s - loss: 0.4533 - acc: 0.8559
Epoch 2/5
 - 177s - loss: 0.1023 - acc: 0.9677
Epoch 3/5
 - 176s - loss: 0.0743 - acc: 0.9769
Epoch 4/5
 - 178s - loss: 0.0618 - acc: 0.9812
Epoch 5/5
 - 177s - loss: 0.0530 - acc: 0.9828
Epoch 1/5
 - 181s - loss: 0.3938 - acc: 0.8724
Epoch 2/5
 - 178s - loss: 0.0888 - acc: 0.9720
Epoch 3/5
 - 177s - loss: 0.0633 - acc: 0.9801
Epoch 4/5
 - 177s - loss: 0.0500 - acc: 0.9842
Epoch 5/5
 - 177s - loss: 0.0421 - acc: 0.9866
Epoch 1/5
 - 184s - loss: 0.3301 - acc: 0.8910
Epoch 2/5
 - 179s - loss: 0.0796 - acc: 0.9750
Epoch 3/5
 - 180s - loss: 0.0570 - acc: 0.9833
Epoch 4/5
 - 181s - loss: 0.0431 - acc: 0.9865
Epoch 5/5
 - 180s - loss: 0.0341 - acc: 0.9894
Epoch 1/5
 - 185s - loss: 0.3624 - acc: 0.8850
Epoch 2/5
 - 181s - loss: 0.0766 - acc: 0.9756
Epoch 3/5
 - 181s - loss: 0.0551 - acc: 0.9824
Epoch 4/5
 - 196s - loss: 0.0405 - acc: 0.9868
Epoch 5/5
 - 183s - loss: 0.0303 - acc: 0.9912
Epoch 1/5
 - 185s - loss: 0.3651 - acc: 0.8860
Epoch 2/5
 - 

Epoch 1/5
 - 195s - loss: 2.0524 - acc: 0.2673
Epoch 2/5
 - 195s - loss: 0.7365 - acc: 0.7610
Epoch 3/5
 - 191s - loss: 0.4224 - acc: 0.8669
Epoch 4/5
 - 189s - loss: 0.3125 - acc: 0.9014
Epoch 5/5
 - 189s - loss: 0.2478 - acc: 0.9219
Epoch 1/5
 - 197s - loss: 1.9716 - acc: 0.3019
Epoch 2/5
 - 190s - loss: 0.6863 - acc: 0.7794
Epoch 3/5
 - 190s - loss: 0.4071 - acc: 0.8724
Epoch 4/5
 - 189s - loss: 0.3002 - acc: 0.9063
Epoch 5/5
 - 190s - loss: 0.2456 - acc: 0.9238
Epoch 1/5
 - 196s - loss: 2.0745 - acc: 0.2696
Epoch 2/5
 - 190s - loss: 0.7529 - acc: 0.7498
Epoch 3/5
 - 189s - loss: 0.4377 - acc: 0.8629
Epoch 4/5
 - 189s - loss: 0.3204 - acc: 0.9015
Epoch 5/5
 - 190s - loss: 0.2522 - acc: 0.9208
Epoch 1/5
 - 199s - loss: 0.2793 - acc: 0.9104
Epoch 2/5
 - 191s - loss: 0.0825 - acc: 0.9748
Epoch 3/5
 - 191s - loss: 0.0628 - acc: 0.9819
Epoch 4/5
 - 191s - loss: 0.0526 - acc: 0.9835
Epoch 5/5
 - 191s - loss: 0.0452 - acc: 0.9866
Epoch 1/5
 - 200s - loss: 0.2994 - acc: 0.9017
Epoch 2/5
 - 

Epoch 1/5
 - 206s - loss: 0.3466 - acc: 0.8891
Epoch 2/5
 - 198s - loss: 0.0747 - acc: 0.9767
Epoch 3/5
 - 197s - loss: 0.0497 - acc: 0.9845
Epoch 4/5
 - 196s - loss: 0.0354 - acc: 0.9884
Epoch 5/5
 - 197s - loss: 0.0284 - acc: 0.9904
Epoch 1/5
 - 208s - loss: 0.3548 - acc: 0.8900
Epoch 2/5
 - 198s - loss: 0.0850 - acc: 0.9749
Epoch 3/5
 - 197s - loss: 0.0583 - acc: 0.9820
Epoch 4/5
 - 200s - loss: 0.0450 - acc: 0.9855
Epoch 5/5
 - 199s - loss: 0.0356 - acc: 0.9880
Epoch 1/5
 - 208s - loss: 0.3563 - acc: 0.8865
Epoch 2/5
 - 198s - loss: 0.0792 - acc: 0.9751
Epoch 3/5
 - 199s - loss: 0.0556 - acc: 0.9822
Epoch 4/5
 - 198s - loss: 0.0410 - acc: 0.9866
Epoch 5/5
 - 198s - loss: 0.0338 - acc: 0.9892
Epoch 1/5
 - 207s - loss: 0.3576 - acc: 0.8848
Epoch 2/5
 - 197s - loss: 0.0859 - acc: 0.9734
Epoch 3/5
 - 198s - loss: 0.0585 - acc: 0.9820
Epoch 4/5
 - 198s - loss: 0.0434 - acc: 0.9865
Epoch 5/5
 - 200s - loss: 0.0369 - acc: 0.9888
Epoch 1/5
 - 209s - loss: 0.3123 - acc: 0.9035
Epoch 2/5
 - 

Epoch 1/5
 - 218s - loss: 2.3843 - acc: 0.1123
Epoch 2/5
 - 207s - loss: 2.3017 - acc: 0.1097
Epoch 3/5
 - 206s - loss: 2.3016 - acc: 0.1097
Epoch 4/5
 - 208s - loss: 2.3017 - acc: 0.1096
Epoch 5/5
 - 205s - loss: 2.3017 - acc: 0.1096
Epoch 1/5
 - 216s - loss: 0.5643 - acc: 0.8285
Epoch 2/5
 - 206s - loss: 0.1152 - acc: 0.9643
Epoch 3/5
 - 205s - loss: 0.0834 - acc: 0.9742
Epoch 4/5
 - 204s - loss: 0.0669 - acc: 0.9787
Epoch 5/5
 - 205s - loss: 0.0550 - acc: 0.9822
Epoch 1/5
 - 226s - loss: 0.5137 - acc: 0.8384
Epoch 2/5
 - 213s - loss: 0.1038 - acc: 0.9675
Epoch 3/5
 - 212s - loss: 0.0683 - acc: 0.9781
Epoch 4/5
 - 212s - loss: 0.0520 - acc: 0.9836
Epoch 5/5
 - 213s - loss: 0.0375 - acc: 0.9879
Epoch 1/5
 - 223s - loss: 0.5025 - acc: 0.8442
Epoch 2/5
 - 209s - loss: 0.0975 - acc: 0.9696
Epoch 3/5
 - 210s - loss: 0.0656 - acc: 0.9789
Epoch 4/5
 - 210s - loss: 0.0448 - acc: 0.9858
Epoch 5/5
 - 210s - loss: 0.0380 - acc: 0.9874
Epoch 1/5
 - 224s - loss: 0.5259 - acc: 0.8345
Epoch 2/5
 - 

Epoch 1/5
 - 235s - loss: 2.2731 - acc: 0.1478
Epoch 2/5
 - 219s - loss: 1.6254 - acc: 0.4644
Epoch 3/5
 - 220s - loss: 0.7207 - acc: 0.7651
Epoch 4/5
 - 219s - loss: 0.4916 - acc: 0.8451
Epoch 5/5
 - 218s - loss: 0.3844 - acc: 0.8817
Epoch 1/5
 - 241s - loss: 2.2595 - acc: 0.1805
Epoch 2/5
 - 222s - loss: 1.4096 - acc: 0.5399
Epoch 3/5
 - 220s - loss: 0.7204 - acc: 0.7665
Epoch 4/5
 - 220s - loss: 0.5301 - acc: 0.8335
Epoch 5/5
 - 221s - loss: 0.4166 - acc: 0.8698
Epoch 1/5
 - 231s - loss: 2.2816 - acc: 0.1569
Epoch 2/5
 - 214s - loss: 1.8398 - acc: 0.4084
Epoch 3/5
 - 213s - loss: 0.7699 - acc: 0.7504
Epoch 4/5
 - 215s - loss: 0.4875 - acc: 0.8470
Epoch 5/5
 - 213s - loss: 0.3757 - acc: 0.8838
Epoch 1/5
 - 230s - loss: 0.4103 - acc: 0.8712
Epoch 2/5
 - 215s - loss: 0.0954 - acc: 0.9698
Epoch 3/5
 - 216s - loss: 0.0650 - acc: 0.9796
Epoch 4/5
 - 215s - loss: 0.0510 - acc: 0.9837
Epoch 5/5
 - 215s - loss: 0.0424 - acc: 0.9867
Epoch 1/5
 - 235s - loss: 0.4507 - acc: 0.8627
Epoch 2/5
 - 

Note: After almost 72 hours, we aborted the large grid search because it was taking too long.
However, based on how the grid search iterates through the parameter combinations, we can use the partial output to narrow down which parameters are most likely to perform well.
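
The iteration order can be sketched without running any training. The code below assumes that `GridSearchCV` enumerates candidates the way scikit-learn's `ParameterGrid` does (parameter names sorted alphabetically, with the last name varying fastest) and mimics that order with `itertools.product` rather than calling scikit-learn. The 190 seconds per epoch is a rough assumption read off the logs above, which range from about 165 s to 240 s.

```python
from itertools import product

param_grid = {
    "batch_size": [100, 200, 250],
    "epochs": [5],
    "dropout_rate": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3],
    "optimizer": ["SGD", "RMSprop", "Adagrad", "Adadelta", "Adam", "Adamax", "Nadam"],
}

# Assumed enumeration order: keys sorted by name, last key changing fastest.
keys = sorted(param_grid)
candidates = [dict(zip(keys, values))
              for values in product(*(param_grid[k] for k in keys))]

# The first candidates differ only in the optimizer.
for candidate in candidates[:3]:
    print(candidate)

cv_folds = 3
seconds_per_epoch = 190  # rough assumption taken from the training logs
total_seconds = len(candidates) * cv_folds * param_grid["epochs"][0] * seconds_per_epoch
print(f"estimated training time: {total_seconds / 3600:.0f} hours")
```

Under these assumptions the full search needs on the order of 100 hours of training alone, which is consistent with it being unfinished after 72 hours. Because the optimizer varies fastest, the early log output already compares several optimizers at a fixed batch size and dropout rate.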

We will perform a smaller grid search below:

### Smaller Grid Search

Based on the parameters explored in the larger grid search and what we learned from its partial output, we can reduce the grid to a more manageable size.

In [40]:
#Set parameters for smaller grid search
dropout_rate = [0.05,  0.2, 0.3]
batch_size = [100, 200]
epochs = [3]
optimizer = ['SGD', 'RMSprop', 'Adamax']

In [41]:
X = X_train
Y = Y_train

#Create the model we will use on the grid search
def create_model(dropout_rate=0.25, weight_constraint=0, optimizer = 'RMSProp'):
    # create model
    model = Sequential()

    model.add(Conv2D(filters = 32, kernel_size = (5,5),padding = 'Same', 
                 activation ='relu', input_shape = (28,28,1)))
    model.add(Conv2D(filters = 32, kernel_size = (5,5),padding = 'Same', 
                 activation ='relu'))
    model.add(MaxPool2D(pool_size=(2,2)))
    model.add(Dropout(dropout_rate))


    model.add(Conv2D(filters = 64, kernel_size = (3,3),padding = 'Same', 
                 activation ='relu'))
    model.add(Conv2D(filters = 64, kernel_size = (3,3),padding = 'Same', 
                 activation ='relu'))
    model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
    model.add(Dropout(dropout_rate))
    
    # Classifier head: flatten, dense layer, dropout, softmax output
    model.add(Flatten())
    model.add(Dense(256, activation = "relu"))
    model.add(Dropout(dropout_rate))
    model.add(Dense(10, activation = "softmax"))
    model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
    return model

In [42]:
# create model
model = KerasClassifier(build_fn=create_model, verbose=2)
# define the grid search parameters
param_grid = dict(batch_size=batch_size, epochs=epochs, dropout_rate=dropout_rate, optimizer=optimizer)
grid = GridSearchCV(estimator=model, param_grid=param_grid)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Epoch 1/3
 - 135s - loss: 1.8080 - acc: 0.4715
Epoch 2/3
 - 134s - loss: 0.4216 - acc: 0.8676
Epoch 3/3
 - 134s - loss: 0.2452 - acc: 0.9249
Epoch 1/3
 - 135s - loss: 1.6760 - acc: 0.5087
Epoch 2/3
 - 134s - loss: 0.3623 - acc: 0.8881
Epoch 3/3
 - 134s - loss: 0.2238 - acc: 0.9319
Epoch 1/3
 - 134s - loss: 1.7496 - acc: 0.4437
Epoch 2/3
 - 134s - loss: 0.4682 - acc: 0.8515
Epoch 3/3
 - 134s - loss: 0.2812 - acc: 0.9121
Epoch 1/3
 - 135s - loss: 0.2609 - acc: 0.9173
Epoch 2/3
 - 134s - loss: 0.0623 - acc: 0.9813
Epoch 3/3
 - 134s - loss: 0.0426 - acc: 0.9866
Epoch 1/3
 - 135s - loss: 0.2626 - acc: 0.9173
Epoch 2/3
 - 134s - loss: 0.0583 - acc: 0.9816
Epoch 3/3
 - 134s - loss: 0.0374 - acc: 0.9887
Epoch 1/3
 - 134s - loss: 0.2356 - acc: 0.9250
Epoch 2/3
 - 134s - loss: 0.0633 - acc: 0.9810
Epoch 3/3
 - 134s - loss: 0.0400 - acc: 0.9875
Epoch 1/3
 - 135s - loss: 0.2512 - acc: 0.9188
Epoch 2/3
 - 135s - loss: 0.0715 - acc: 0.9775
Epoch 3/3
 - 135s - loss: 0.0478 - acc: 0.9844
Epoch 1/3
 - 

### Final Model

Based on the outputs above, we select the best-performing hyperparameters and use them in a longer training run on a larger, augmented dataset (shown below).

In [13]:
#Final Hyperparameters
dropout_rate = 0.2
batch_size = 100
epochs = 30
optimizer = 'RMSprop'

In [14]:
# Add in a learning rate annealer
learning_rate_reduction = ReduceLROnPlateau(monitor='val_acc', 
                                            patience=3, 
                                            verbose=1, 
                                            factor=0.5, 
                                            min_lr=0.00001)
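
The plateau rule can be illustrated with a small pure-Python simulation. This is a simplified sketch, not the Keras implementation: the function name is hypothetical, the starting learning rate of 0.001 and the `min_delta` of 1e-4 are assumptions, and the validation accuracies are the first ten values logged by the final training run in this notebook.

```python
def simulate_reduce_on_plateau(val_acc, lr=0.001, patience=3, factor=0.5,
                               min_lr=0.00001, min_delta=1e-4):
    """Return (epoch, new_lr) pairs for each reduction; epochs are 1-based."""
    best = float("-inf")
    wait = 0
    reductions = []
    for epoch, acc in enumerate(val_acc, start=1):
        if acc > best + min_delta:
            # Meaningful improvement: remember it and reset the counter.
            best, wait = acc, 0
            continue
        wait += 1
        if wait >= patience and lr > min_lr:
            # No improvement for `patience` epochs: shrink the learning rate.
            lr = max(lr * factor, min_lr)
            reductions.append((epoch, lr))
            wait = 0
    return reductions

logged_val_acc = [0.9840, 0.9879, 0.9912, 0.9919, 0.9919,
                  0.9912, 0.9945, 0.9933, 0.9907, 0.9938]
reductions = simulate_reduce_on_plateau(logged_val_acc)
print(reductions)
```

With these inputs the sketch halves the learning rate after epoch 10, three epochs after the best validation accuracy at epoch 7, which agrees with the `ReduceLROnPlateau` message in the training log.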

In [15]:
#Set up early stopping
earlystop = EarlyStopping(monitor='val_acc', patience=5, verbose=2)
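
Early stopping follows a similar patience rule, except that it ends training instead of lowering the learning rate. The sketch below is a simplified stand-in for the Keras callback, with a hypothetical function name and made-up accuracy values.

```python
def early_stopping_epoch(val_acc, patience=5):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("-inf")
    wait = 0
    for epoch, acc in enumerate(val_acc, start=1):
        if acc > best:
            best, wait = acc, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation accuracies: the best value is reached at epoch 2.
stalled = early_stopping_epoch([0.90, 0.95, 0.95, 0.94, 0.95, 0.93, 0.94])
improving = early_stopping_epoch([0.90, 0.92, 0.94, 0.96])
print(stalled, improving)
```

Because the patience here (5) is longer than the learning-rate annealer's patience (3), the learning rate gets a chance to drop before training is stopped.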

In [16]:
# With data augmentation to prevent overfitting

datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
        zoom_range = 0.1, # Randomly zoom image 
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=False,  # randomly flip images
        vertical_flip=False)  # randomly flip images


datagen.fit(X_train)
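
To make the shift ranges concrete: 10% of a 28-pixel image is at most about 2 to 3 pixels in each direction. The sketch below is a simplified NumPy illustration of a random shift and is not what `ImageDataGenerator` does internally; the `random_shift` helper is hypothetical, moves the image by whole pixels only, and pads the exposed border with zeros.

```python
import numpy as np

def random_shift(image, max_fraction=0.1, rng=None):
    """Shift a (H, W) image by a random whole-pixel offset, padding with zeros."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    max_dy, max_dx = int(h * max_fraction), int(w * max_fraction)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))

    shifted = np.zeros_like(image)
    # Copy the part of the image that stays inside the frame after shifting.
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted, (dy, dx)

# Toy "digit": an 8x8 block of ones in the middle of a blank 28x28 image.
image = np.zeros((28, 28))
image[10:18, 10:18] = 1.0

shifted, (dy, dx) = random_shift(image, rng=np.random.default_rng(0))
print("offset:", (dy, dx))
print("pixels preserved:", shifted.sum() == image.sum())
```

Flips are left disabled in the generator above, which is sensible for digits: a mirrored 2 or 5 is no longer a valid example of its class.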

In [17]:
model = Sequential()

model.add(Conv2D(filters = 32, kernel_size = (5,5),padding = 'Same', 
             activation ='relu', input_shape = (28,28,1)))
model.add(Conv2D(filters = 32, kernel_size = (5,5),padding = 'Same', 
             activation ='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(dropout_rate))


model.add(Conv2D(filters = 64, kernel_size = (3,3),padding = 'Same', 
             activation ='relu'))
model.add(Conv2D(filters = 64, kernel_size = (3,3),padding = 'Same', 
             activation ='relu'))
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
model.add(Dropout(dropout_rate))

# Classifier head: flatten, dense layer, dropout, softmax output
model.add(Flatten())
model.add(Dense(256, activation = "relu"))
model.add(Dropout(dropout_rate))
model.add(Dense(10, activation = "softmax"))
model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])

In [18]:
# Fit the model
history = model.fit_generator(datagen.flow(X_train,Y_train, batch_size=batch_size),
                              epochs = epochs, validation_data = (X_val,Y_val),
                              verbose = 2, steps_per_epoch=X_train.shape[0] // batch_size
                              , callbacks=[learning_rate_reduction, earlystop])

Epoch 1/30
 - 277s - loss: 0.3559 - acc: 0.8853 - val_loss: 0.0510 - val_acc: 0.9840
Epoch 2/30
 - 264s - loss: 0.1028 - acc: 0.9682 - val_loss: 0.0405 - val_acc: 0.9879
Epoch 3/30
 - 265s - loss: 0.0768 - acc: 0.9777 - val_loss: 0.0287 - val_acc: 0.9912
Epoch 4/30
 - 265s - loss: 0.0622 - acc: 0.9810 - val_loss: 0.0284 - val_acc: 0.9919
Epoch 5/30
 - 264s - loss: 0.0550 - acc: 0.9830 - val_loss: 0.0293 - val_acc: 0.9919
Epoch 6/30
 - 264s - loss: 0.0508 - acc: 0.9847 - val_loss: 0.0305 - val_acc: 0.9912
Epoch 7/30
 - 268s - loss: 0.0479 - acc: 0.9860 - val_loss: 0.0228 - val_acc: 0.9945
Epoch 8/30
 - 269s - loss: 0.0432 - acc: 0.9864 - val_loss: 0.0251 - val_acc: 0.9933
Epoch 9/30
 - 265s - loss: 0.0427 - acc: 0.9874 - val_loss: 0.0286 - val_acc: 0.9907
Epoch 10/30
 - 265s - loss: 0.0390 - acc: 0.9881 - val_loss: 0.0226 - val_acc: 0.9938

Epoch 00010: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 11/30
 - 264s - loss: 0.0308 - acc: 0.9910 - val_loss: 0.0197 
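
For reference, the `steps_per_epoch` value passed to `fit_generator` above follows from the dataset sizes. The sketch below derives it from the label counts printed earlier in this notebook, assuming the 10% validation split rounds to a whole number of rows.

```python
# Label frequencies from Y_train.value_counts() earlier in the notebook.
class_counts = [4684, 4401, 4351, 4188, 4177, 4137, 4132, 4072, 4063, 3795]

n_total = sum(class_counts)       # rows in train.csv
n_val = round(0.1 * n_total)      # 10% held out for validation
n_train = n_total - n_val
batch_size = 100

# One epoch draws this many augmented batches from the generator.
steps_per_epoch = n_train // batch_size
print(n_total, n_train, n_val, steps_per_epoch)
```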

### Predict final results

In [19]:
# predict results
results = model.predict(test)

# select the index with the maximum probability
results = np.argmax(results,axis = 1)

results = pd.Series(results,name="Label")
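
The conversion from softmax outputs to submission rows can be seen on a tiny example. The probabilities below are made up for illustration, and the sketch builds the CSV rows as plain strings rather than using pandas.

```python
import numpy as np

# Hypothetical softmax outputs for three test images (each row sums to 1).
probabilities = np.full((3, 10), 0.02)
for row, digit in enumerate([2, 0, 9]):
    probabilities[row, digit] = 0.82   # 0.82 + 9 * 0.02 = 1.0

# The predicted digit is the column with the highest probability.
labels = np.argmax(probabilities, axis=1)
image_ids = np.arange(1, len(labels) + 1)   # ImageId starts at 1, not 0

rows = ["ImageId,Label"] + [f"{i},{label}" for i, label in zip(image_ids, labels)]
print("\n".join(rows))
```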

In [20]:
#Create Submission File
submission = pd.concat([pd.Series(range(1,28001),name = "ImageId"),results],axis = 1)

submission.to_csv("cnn_mnist_datagen_6_28.csv",index=False)