In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.optimizers import Adam
from keras.callbacks import LearningRateScheduler

from keras.utils import plot_model

Using TensorFlow backend.


## Feedforward Neural Network: Comparing *Adam* and *AMSGrad* on MNIST

Here I tune and train feedforward neural network (FFNN) models to recreate the empirical results in Section 5 of the paper.

As specified in the paper, I fix the parameter $\beta_1$ at 0.9 and tune the learning rate $\alpha$ and the hyperparameter $\beta_2$ using a grid search.

The authors further specify that the network has 100 hidden units and uses the ReLU activation function. I'll do the same.
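To make the architecture concrete, here is a minimal NumPy sketch of the forward pass of a 784-100-10 network with a ReLU hidden layer and a softmax output. The names `init_params` and `forward` and the weight initialisation are my own illustrative assumptions; they are not taken from the paper or from Keras.

```python
import numpy as np

def init_params(n_in=784, n_hidden=100, n_out=10, seed=0):
    """Randomly initialise weights (illustrative scaling, not the Keras default)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.01, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.01, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, X):
    """Return class probabilities for a batch X of shape (n_samples, 784)."""
    # Hidden layer: affine map followed by ReLU
    hidden = np.maximum(0.0, X @ params["W1"] + params["b1"])
    # Output layer: affine map followed by a numerically stable softmax
    logits = hidden @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

params = init_params()
n_params = sum(p.size for p in params.values())
probs = forward(params, np.zeros((4, 784)))
print(n_params)     # total number of trainable parameters
print(probs.shape)  # one row of 10 class probabilities per input
```

The parameter count is small enough that CPU training is feasible, which matters for the grid search later on.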

## 0. Load MNIST Dataset

I've already created train and test splits for the MNIST dataset. They are conveniently stored as NumPy `.npy` files.

In [1]:
def load_np_file(path, mode="rb"):
    # Open the file explicitly and load the array from the open handle
    with open(path, mode) as handle:
        return np.load(handle)

In [4]:
X_train = load_np_file("../data/MNIST/X_train.npy") 
X_test = load_np_file("../data/MNIST/X_test.npy") 
y_train = load_np_file("../data/MNIST/y_train.npy") 
y_test = load_np_file("../data/MNIST/y_test.npy") 

In [5]:
# sanity check - did all the shapes get preserved? 
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

(60000, 784) (60000,)
(10000, 784) (10000,)


## 1. A framework for exhaustive gridsearch

The hyperparameters that I'll need to tune by grid search are:

- $\beta_2$
- $\alpha$.

To do so neatly and make use of all my cores (CPU training :( ), I'll use the `GridSearchCV` class from `sklearn` with the `KerasClassifier` wrapper.

The interface of this wrapper requires that I define a function that takes a set of hyperparameter values and returns a compiled `Sequential` model ready to be trained. This is what I do here.

Note the hyperparameters that I do not tune, as they are fixed by the authors:

- $\beta_1 = .9$
- Step-size decay: $\alpha_t = \frac{\alpha}{\sqrt{t}}$
- Batch size = 128
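
The $\alpha/\sqrt{t}$ schedule listed above can be written out directly. The sketch below compares it with an inverse-time schedule of the form $\alpha / (1 + d\,t)$, which, as far as I know, is what the `decay` argument of the older Keras optimizers applies at each update. Treat that reading of Keras as an assumption to check against the installed version. The function names are mine.

```python
import numpy as np

def sqrt_decay(alpha, t):
    """Step size alpha / sqrt(t), for update counts t >= 1."""
    return alpha / np.sqrt(t)

def inverse_time_decay(alpha, t, decay):
    """Step size alpha / (1 + decay * t) (assumed form of Keras-style `decay`)."""
    return alpha / (1.0 + decay * t)

steps = np.array([1, 10, 100, 1000])
print(sqrt_decay(0.001, steps))
print(inverse_time_decay(0.001, steps, decay=0.14))
```

The two schedules are not the same curve: for large $t$ the inverse-time schedule shrinks the step size faster than $\alpha/\sqrt{t}$, so a fixed `decay` value only approximates the schedule listed above.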

In [4]:
# Build and compile a model for a given set of hyperparameters.
# With `amsgrad=True`, the method from the paper is used instead of plain Adam.
def create_model(lr=0.01, beta_2=0.99, amsgrad=False):
    # One hidden layer of 100 ReLU units, then a 10-way softmax output
    model = Sequential()
    model.add(Dense(100, input_dim=784, activation="relu"))
    model.add(Dense(10, activation="softmax"))

    # beta_1 is left at the Keras default (0.9); `decay` shrinks the learning rate over updates
    optimizer = Adam(lr=lr, beta_2=beta_2, amsgrad=amsgrad, decay=0.14)
    model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
    return model
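
To show what the `amsgrad` switch changes, here is a minimal NumPy sketch of a single update step. It follows my reading of the two update rules and omits bias correction for brevity. The function names, the `state` dictionary and the small `eps` term are illustrative assumptions, and the Keras implementation may differ in such details.

```python
import numpy as np

def adam_like_step(x, grad, state, alpha=0.001, beta_1=0.9, beta_2=0.99,
                   amsgrad=False, eps=1e-8):
    """Apply one update to x; `state` holds m, v and v_hat and is updated in place."""
    # Exponential moving averages of the gradient and the squared gradient
    state["m"] = beta_1 * state["m"] + (1.0 - beta_1) * grad
    state["v"] = beta_2 * state["v"] + (1.0 - beta_2) * grad ** 2
    if amsgrad:
        # AMSGrad: keep the running maximum of the second-moment estimate
        state["v_hat"] = np.maximum(state["v_hat"], state["v"])
    else:
        # Adam: use the current second-moment estimate as is
        state["v_hat"] = state["v"]
    return x - alpha * state["m"] / (np.sqrt(state["v_hat"]) + eps)

def run(grads, amsgrad):
    """Feed a gradient sequence through the optimizer and record v_hat after each step."""
    x = np.array([1.0])
    state = {"m": np.zeros(1), "v": np.zeros(1), "v_hat": np.zeros(1)}
    history = []
    for g in grads:
        x = adam_like_step(x, np.array([g]), state, amsgrad=amsgrad)
        history.append(float(state["v_hat"][0]))
    return history

grads = [10.0, 0.1, 0.1, 0.1]  # one large gradient followed by small ones
adam_v = run(grads, amsgrad=False)
ams_v = run(grads, amsgrad=True)
print(adam_v)
print(ams_v)
```

With plain Adam the denominator term starts to decay as soon as the gradients get small, so the effective step size can grow again. With AMSGrad the term never decreases, which is the property the paper relies on.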

In [9]:
# plot a model
plot_model(create_model(), "images/ffnn_model.png", show_layer_names=False, show_shapes=True)

![](images/ffnn_model.png)

## 2. Gridsearch: Adam optimizer

Here I specify the ranges I want to search over for $\alpha$ and $\beta_2$.

In [7]:
beta2_range = np.append(np.arange(.990, .999, .0025), .999)
print(beta2_range)

[0.99   0.9925 0.995  0.9975 0.999 ]


In [8]:
alpha_range = [.001*5**i for i in range(5)]
print(alpha_range)

[0.001, 0.005, 0.025, 0.125, 0.625]


In [9]:
param_grid = dict(lr=alpha_range, beta_2=beta2_range)
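
For a sense of how much work this grid implies, the candidates can be enumerated by hand. This is a small standalone sketch: the lists are re-typed here rather than taken from `param_grid`, and `candidates` and `n_folds` are names I made up for the illustration.

```python
from itertools import product

# Same values as the ranges defined above, written out literally
alpha_values = [0.001 * 5 ** i for i in range(5)]
beta2_values = [0.99, 0.9925, 0.995, 0.9975, 0.999]

# Every combination of learning rate and beta_2 is one candidate model
candidates = [{"lr": lr, "beta_2": b2} for b2, lr in product(beta2_values, alpha_values)]

n_folds = 3  # number of cross-validation folds used for the search
print(len(candidates), len(candidates) * n_folds)  # 25 75
```

Each of those fits trains for the full number of epochs, so the grid size is the main driver of the total running time.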

In [10]:
adam_model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=128, verbose=1)

In [11]:
adam_model

<keras.wrappers.scikit_learn.KerasClassifier at 0x10e4cd860>

Finally, run the gridsearch. I'll do 3-fold cross validation to choose the hyperparameters.
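
As a rough illustration of what 3-fold cross validation means for the 60,000 training rows, the sketch below builds three contiguous folds by hand. The helper `kfold_indices` is hypothetical, and scikit-learn's own splitter may shuffle or stratify differently; only the fold sizes are the point here.

```python
import numpy as np

def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs for contiguous, unshuffled folds."""
    folds = np.array_split(np.arange(n_samples), n_folds)
    for i in range(n_folds):
        val_idx = folds[i]
        # Train on every fold except the one held out for validation
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, val_idx

sizes = [(len(tr), len(va)) for tr, va in kfold_indices(60000, 3)]
print(sizes)  # [(40000, 20000), (40000, 20000), (40000, 20000)]
```

Each candidate is therefore trained three times on 40,000 rows and scored on the remaining 20,000.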

In [12]:
# cv=3 makes the 3-fold split explicit instead of relying on the library default
adam_grid = GridSearchCV(estimator=adam_model, param_grid=param_grid, cv=3, n_jobs=-1, verbose=3)

In [None]:
adam_grid_result = adam_grid.fit(X_train, y_train)

Fitting 3 folds for each of 25 candidates, totalling 75 fits
[CV] beta_2=0.99, lr=0.001 ...........................................
[CV] beta_2=0.99, lr=0.001 ...........................................
[CV] beta_2=0.99, lr=0.001 ...........................................
[CV] beta_2=0.99, lr=0.005 ...........................................
[CV] .............. beta_2=0.99, lr=0.005, score=0.6981, total= 3.5min


[CV] .............. beta_2=0.99, lr=0.025, score=0.2908, total= 3.4min
[CV] .............. beta_2=0.99, lr=0.025, score=0.2854, total= 3.5min
[CV] beta_2=0.99, lr=0.125 ...........................................
[CV] ............. beta_2=0.99, lr=0.625, score=0.09855, total= 3.6min
[CV]  beta_2=0.9924999999999999, lr=0.001, score=0.3482, total= 3.6min
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed: 21.1min


[CV]  beta_2=0.9924999999999999, lr=0.125, score=0.2728, total= 3.2min
[CV] beta_2=0.9924999999999999, lr=0.625 .............................
 5120/40000 [==>...........................] - ETA: 1s - loss: 11.6947 - acc: 0.2744Epoch 53/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 54/100
Epoch 54/100
Epoch 56/100
Epoch 55/100
Epoch 55/100
 2432/40000 [>.............................] - ETA: 1s - loss: 14.5938 - acc: 0.0946Epoch 57/100
 8320/40000 [=====>........................] - ETA: 1s - loss: 14.5431 - acc: 0.0977Epoch 57/100
Epoch 58/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
  128/40000 [...............

Epoch 90/100
Epoch 89/100
Epoch 89/100
Epoch 91/100
Epoch 91/100
Epoch 91/100
Epoch 92/100
Epoch 92/100
 5632/40000 [===>..........................] - ETA: 1s - loss: 11.9627 - acc: 0.2578Epoch 92/100
Epoch 93/100
Epoch 94/100
Epoch 93/100
Epoch 95/100
Epoch 95/100
Epoch 97/100
Epoch 96/100
Epoch 96/100
Epoch 97/100
Epoch 98/100
 4224/40000 [==>...........................] - ETA: 2s - loss: 14.5956 - acc: 0.0945Epoch 97/100
 6912/40000 [====>.........................] - ETA: 1s - loss: 14.4788 - acc: 0.1017Epoch 97/100
Epoch 100/100
 8704/40000 [=====>........................] - ETA: 1s - loss: 14.4570 - acc: 0.1031Epoch 100/100
[CV]  beta_2=0.9949999999999999, lr=0.025, score=0.09735, total= 3.8min
[CV]  beta_2=0.9949999999999999, lr=0.025, score=0.0973, total= 3.8min
[CV]  beta_2=0.9949999999999999, lr=0.125, score=0.16695, total= 3.8min
[CV] beta_2=0.9949999999999999, lr=0.625 .............................
Epoch 1/100
Epoch 1/100
Epoch 1/100
 2048/40000 [>...........................

Epoch 26/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 29/100
Epoch 29/100
Epoch 30/100
 2816/40000 [=>............................] - ETA: 3s - loss: 8.7230 - acc: 0.4588Epoch 30/100
 5632/40000 [===>..........................] - ETA: 2s - loss: 8.8139 - acc: 0.4531Epoch 30/100
Epoch 32/100
Epoch 32/100
Epoch 34/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 37/100
Epoch 37/100
Epoch 38/100
Epoch 38/100
 5632/40000 [===>..........................] - ETA: 2s - loss: 11.1184 - acc: 0.3102Epoch 38/100
Epoch 39/100
Epoch 42/100
Epoch 41/100
Epoch 41/100
Epoch 42/100
 6784/40000 [====>.........................] - ETA: 2s - loss: 8.7243 - acc: 0.4587Epoch 42/100
Epoch 43/100
Epoch 43/100
Epoch 45/100
Epoch 45/100
Epoch 45/100
Epoch 46/100
Epoch 46/100
Epoch 46/100
Epoch 48/100
Epoch 47/100
 4864/40000 [==>...........................] - ETA: 1s - loss: 12.9303 - acc: 0.1978Epoch 47/10

Epoch 50/100
Epoch 49/100
  128/40000 [..............................] - ETA: 2s - loss: 13.0960 - acc: 0.1875Epoch 49/100
Epoch 49/100
Epoch 50/100
 3200/40000 [=>............................] - ETA: 2s - loss: 8.8297 - acc: 0.4522Epoch 50/100
Epoch 52/100
  384/40000 [..............................] - ETA: 6s - loss: 8.0171 - acc: 0.5026Epoch 51/100
Epoch 51/100
Epoch 53/100
Epoch 53/100
Epoch 55/100
Epoch 54/100
Epoch 54/100
  128/40000 [..............................] - ETA: 2s - loss: 12.8441 - acc: 0.2031Epoch 56/100
Epoch 55/100
Epoch 55/100
Epoch 58/100
Epoch 58/100
Epoch 58/100
Epoch 59/100
 3584/40000 [=>............................] - ETA: 1s - loss: 13.0150 - acc: 0.1925Epoch 60/100
 6528/40000 [===>..........................] - ETA: 1s - loss: 8.7899 - acc: 0.4547Epoch 59/100
Epoch 60/100
Epoch 60/100
 9088/40000 [=====>........................] - ETA: 1s - loss: 8.7596 - acc: 0.4565Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
 8320/40000 [=====>....................

Epoch 92/100
Epoch 94/100
Epoch 93/100
Epoch 95/100
Epoch 96/100
 8448/40000 [=====>........................] - ETA: 1s - loss: 12.9605 - acc: 0.1959