# Assignment 4

**Submission deadline: last lab session before or on Wednesday, 22.11.17**

**Points: 11 + 4 bonus points**


## Downloading this notebook

This assignment is an Jupyter notebook. Download it by cloning https://github.com/janchorowski/nn_assignments. Follow the instructions in its README for instructions.

Please do not hesitate to use GitHub’s pull requests to send us corrections!

# Starter code: network for Irises in Pytorch


In the following cells a feedforward neural network has been implemented with the aid of PyTorch and its autograd mechanism. Please study the code - many network implementations follow a similar pattern.

The provided network trains to nearly 100% accuracy on Iris using Batch Gradient Descent.

In [10]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


In [11]:
import numpy as np
import torch
import torch.nn.functional as F
from torch.autograd import Variable
import os


class Layer(object):
    
    def train_mode(self):
        """Put layer into training mode.
        """
        pass
    
    def eval_mode(self):
        """Put layer into evalation mode.
        """
        pass
    
    @property
    def parameters(self):
        return []
    

class AffineLayer(Layer):
    def __init__(self, num_in, num_out):
        self.W = Variable(torch.FloatTensor(num_in, num_out),
                          requires_grad=True)
        self.W.name = 'W'
        self.b = Variable(torch.FloatTensor(1, num_out),
                          requires_grad=True)
        self.b.name = 'b'
    
    @property
    def parameters(self):
        return [self.W, self.b]
    
    def forward(self, x):
        return x.mm(self.W) + self.b

    
class TanhLayer(Layer):
    def forward(self, x):
        return F.tanh(x)

    
class  ReLULayer(Layer):
    def forward(self, x):
        return F.relu(x)


class SoftMaxLayer(Layer):
    def forward(self, x):
        return F.softmax(x)

In [12]:
class FeedforwardNet(object):
    def __init__(self, layers):
        self.layers = layers

    @property
    def parameters(self):
        params = []
        for layer in self.layers:
            params += layer.parameters
        return params

    @parameters.setter
    def parameters(self, values):
        for ownP, newP in zip(self.parameters, values):
            ownP.data = newP.data
    
    def train_mode(self):
        for layer in self.layers:
            layer.train_mode()
    
    def eval_mode(self):
        for layer in self.layers:
            layer.eval_mode()    
    
    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x
    
    def loss(self, outputs, targets):
        return torch.mean(-torch.log(torch.gather(
            outputs, 1, targets.unsqueeze(1))))

In [13]:
from sklearn import datasets
import torchvision


iris = datasets.load_iris()
IrisX = iris.data.astype(np.float32)
IrisX = (IrisX - IrisX.mean(axis=0, keepdims=True)) / IrisX.std(axis=0, keepdims=True)
IrisY = iris.target

def GD(model, x, y, alpha=1e-4, max_iters=1000000, tolerance=1e-6):
    """Simple batch gradient descent"""
    try:
        old_loss = np.inf
        x = Variable(torch.from_numpy(x), requires_grad=False)
#         print(x)
        y = Variable(torch.from_numpy(y.astype(np.int64)), requires_grad=False)
        model.train_mode()
        for i in xrange(max_iters):
            outputs = model.forward(x)
            loss = model.loss(outputs, y)

            loss.backward()
            for p in model.parameters:
                p.data -= p.grad.data * alpha
                # Zero gradients for the next iteration
                p.grad.data.zero_()

            loss = loss.data[0]
            if old_loss < loss:
                print "Iter: %d, loss increased!" % (i,)
            if (old_loss - loss) < tolerance:
                print "Tolerance level reached. Exiting."
                break
            if i % 1000 == 0:
                _, predictions = outputs.data.max(dim=1)
                err_rate = 100.0 * (predictions != y.data).sum() / outputs.size(0)
                print "Iteration {0: >6} | loss {1: >5.2f} | err rate  {2: >5.2f}%" \
                      .format(i, loss, err_rate)
            old_loss = loss
    except KeyboardInterrupt:
        pass

In [14]:
model = FeedforwardNet(
    [AffineLayer(4, 10),
     TanhLayer(),
     AffineLayer(10, 3),
     SoftMaxLayer(),
    ])

# Initialize parameters
for p in model.parameters:
    if p.name == 'W':
        # p.data.normal_(0, 0.05)
        p.data.uniform_(-0.1, 0.1)
    elif p.name == 'b':
        p.data.zero_()
    else:
        raise ValueError('Unknown parameter name "%s"' % p.name)

# Train
GD(model, IrisX, IrisY, alpha=1e-1, tolerance=1e-7)

Iteration      0 | loss  1.09 | err rate  48.00%
Iteration   1000 | loss  0.05 | err rate   2.00%
Iteration   2000 | loss  0.04 | err rate   2.00%
Iteration   3000 | loss  0.04 | err rate   1.33%
Iteration   4000 | loss  0.04 | err rate   1.33%
Iteration   5000 | loss  0.04 | err rate   1.33%
Iteration   6000 | loss  0.04 | err rate   1.33%
Iteration   7000 | loss  0.04 | err rate   1.33%
Iteration   8000 | loss  0.04 | err rate   1.33%
Iteration   9000 | loss  0.04 | err rate   1.33%
Iteration  10000 | loss  0.04 | err rate   1.33%
Iteration  11000 | loss  0.04 | err rate   1.33%
Iteration  12000 | loss  0.04 | err rate   1.33%
Iteration  13000 | loss  0.04 | err rate   1.33%
Iteration  14000 | loss  0.04 | err rate   1.33%
Iteration  15000 | loss  0.04 | err rate   1.33%
Iteration  16000 | loss  0.04 | err rate   1.33%
Iteration  17000 | loss  0.04 | err rate   1.33%
Iteration  18000 | loss  0.04 | err rate   1.33%
Iteration  19000 | loss  0.04 | err rate   1.33%
Iteration  20000 | l

# Starter code for MNIST and SGD scaffolding

In [15]:
import torch
import torchvision


batch_size = 128
data_path = os.environ.get('PYTORCH_DATA_PATH', '../data')

transform = torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(),
     torchvision.transforms.Normalize((0.1307,), (0.3081,)),
    ])

_test = torchvision.datasets.MNIST(
    data_path, train=False, download=True, transform=transform)

# Load training data, split into train and valid sets
_train = torchvision.datasets.MNIST(
    data_path, train=True, download=True, transform=transform)
_train.train_data = _train.train_data[:50000]
_train.train_labels = _train.train_labels[:50000]

_valid = torchvision.datasets.MNIST(
    data_path, train=True, download=True, transform=transform)
_valid.train_data = _valid.train_data[50000:]
_valid.train_labels = _valid.train_labels[50000:]

mnist_loaders = {
    'train': torch.utils.data.DataLoader(
        _train, batch_size=batch_size, shuffle=True),
    'valid': torch.utils.data.DataLoader(
        _valid, batch_size=batch_size, shuffle=False),
    'test': torch.utils.data.DataLoader(
        _test, batch_size=batch_size, shuffle=False)}
# print(_train.train_data[0])

In [16]:
def compute_error_rate(model, data_loader, cuda=False):
    model.eval_mode()
    num_errs = 0.0
    num_examples = 0
    for x, y in data_loader:
        if cuda:
            x = x.cuda()
            y = y.cuda()
        x = Variable(x.view(x.size(0), -1), volatile=True)
        y = Variable(y, volatile=True)
        outputs = model.forward(x)
        _, predictions = outputs.data.max(dim=1)
        num_errs += (predictions != y.data).sum()
        num_examples += x.size(0)
    return 100.0 * num_errs / num_examples

def SGD(model, data_loaders, alpha=1e-4, num_epochs=1, patience_expansion=1.5,
        log_every=100, cuda=False):
    if cuda:
        for p in model.parameters:
            p.data = p.data.cuda()
    #
    # TODO: Initialize momentum variables
    # Hint: You need one velocity matrix for each parameter
    velocities = [np.zeros(pars.data.shape) for pars in model.parameters]
#     print(model.parameters)
#     print(velocities)
    iter_ = 0
    epoch = 0
    best_params = None
    best_val_err = np.inf
    history = {'train_losses': [], 'train_errs': [], 'val_errs': []}
    print('Training the model!')
    print('Interrupt at any time to evaluate the best validation model so far.')
    
#     alphaB = 1.
#     alphaC = 1e4
    alpha0 = 1e-2
    alphaC = 0.995
    try:
        while epoch < num_epochs:
            model.train_mode()
            epoch += 1
            for x, y in data_loaders['train']:
                if cuda:
                    x = x.cuda()
                    y = y.cuda()
                iter_ += 1
                x = Variable(x.view(x.size(0), -1), requires_grad=False)
                y = Variable(y, requires_grad=False)
                
                out = model.forward(x)
                loss = model.loss(out, y)
                loss.backward()
                _, predictions = out.data.max(dim=1)
                err_rate = 100.0 * (predictions != y.data).sum() / out.size(0)

                history['train_losses'].append(loss.data[0])
                history['train_errs'].append(err_rate)

                for p, v in zip(model.parameters, velocities):
                    if p.name == 'W':
                        pass
                    #
                    # TODO: Implement weight decay addition to gradients
                    p.grad.data += p.data * 1e-4 #beta
                    
                    #
                    # TODO: Update learning rate
                    # Hint: Use the iteration counter i
#                     alpha = alpha0 * alphaC ** iter_ #based on first equation from slides
                    alpha = 70.0 / (10000.0 + iter_)
                    
                    #
                    # TODO: Set the momentum constant 
                    epsilon = 0.98
                    
                    #
                    # TODO: Implement velocity update in momentum
#                     print(v)
#                     print(alpha * p.grad.data)
                    v = (epsilon * torch.Tensor(v)) + (alpha * p.grad.data)
                    #
                    
                    #
                    # TODO: Set a more sensible learning rule here,
                    #       using your learning rate schedule and momentum
                    # 
#                     p.data -= alpha * p.grad.data
                    p.data -= v
                    # Zero gradients for the next iteration
                    p.grad.data.zero_()

                if iter_ % log_every == 0:
                    print "Minibatch {0: >6}  | loss {1: >5.2f} | err rate {2: >5.2f}%" \
                          .format(iter_, loss.data[0], err_rate)
            
            val_err_rate = compute_error_rate(model, data_loaders['valid'], cuda)
            history['val_errs'].append((iter_, val_err_rate))
            
            if val_err_rate < best_val_err:
                # Adjust num of epochs
                num_epochs = int(np.maximum(num_epochs, epoch * patience_expansion + 1))
                best_epoch = epoch
                best_val_err = val_err_rate
                best_params = [p.clone().cpu() for p in model.parameters]
            m = "After epoch {0: >2} | valid err rate: {1: >5.2f}% | doing {2: >3} epochs" \
                .format(epoch, val_err_rate, num_epochs)
            print '{0}\n{1}\n{0}'.format('-' * len(m), m)

    except KeyboardInterrupt:
        pass
    if best_params is not None:
        print "\nLoading best params on validation set (epoch %d)\n" %(best_epoch)
        model.parameters = best_params
    plot_history(history)

def plot_history(history):
    figsize(16, 4)
    subplot(1,2,1)
    train_loss = np.array(history['train_losses'])
    semilogy(np.arange(train_loss.shape[0]), train_loss, label='batch train loss')
    legend()
        
    subplot(1,2,2)
    train_errs = np.array(history['train_errs'])
    plot(np.arange(train_errs.shape[0]), train_errs, label='batch train error rate')
    val_errs = np.array(history['val_errs'])
    plot(val_errs[:,0], val_errs[:,1], label='validation error rate', color='r')
    ylim(0,20)
    legend()

# Problem 1: Stochastic Gradient Descent [3p]
Implement the following additions to the SGD code provided above:
  1. **[1p]** momentum
  2. **[1p]** learning rate schedule
  3. **[1p]** weight decay, in which we additionally minimize for each weight matrix (but typically not the bias) the sum of its elements squared. One way to implement it is to use function `model.parameters` and select all parameters whose names are "`W`" and not "`b`".

# Problem 2: Tuning the Network for MNIST [4p]

Tune the following network to reach **validation error rate below 1.9%**.
This should result in a **test error rate below 2%**. To
tune the network you will need to:
1. Choose the number of layers (more than 1, less than 5);
2. Choose the number of neurons in each layer (more than 100,
    less than 5000);
3. Pick proper weight initialization;
4. Pick proper learning rate schedule (need to decay over time,
    good range to check on MNIST is about 1e-2 ... 1e-1 at the beginning and
    half of that after 10000 batches);
5. Pick a momentum constant (probably a constant one will be OK).


In [None]:
#
# TODO: Pick a network architecture here.
#       The one below is just softmax regression.
#

model = FeedforwardNet(
    [
     AffineLayer(784, 1500),
     ReLULayer(),
     AffineLayer(1500, 10),
     SoftMaxLayer()
    ])

# Initialize parameters
for p in model.parameters:
    if p.name == 'W':
#         p.data.normal_(0, 0.2)
        p.data.uniform_(-0.1, 0.1)
    elif p.name == 'b':
        p.data.zero_()
    else:
        raise ValueError('Unknown parameter name "%s"' % p.name)

# On lab computers you can set cuda=True !
SGD(model, mnist_loaders, alpha=1e-1, cuda=False)

print "Test error rate: %.2f%%" % compute_error_rate(model, mnist_loaders['test'])

Training the model!
Interrupt at any time to evaluate the best validation model so far.
Minibatch    100  | loss  0.67 | err rate 16.41%
Minibatch    200  | loss  0.38 | err rate 12.50%
Minibatch    300  | loss  0.27 | err rate  6.25%
----------------------------------------------------------
After epoch  1 | valid err rate:  8.81% | doing   2 epochs
----------------------------------------------------------
Minibatch    400  | loss  0.34 | err rate 10.16%
Minibatch    500  | loss  0.28 | err rate 10.16%
Minibatch    600  | loss  0.33 | err rate  7.03%
Minibatch    700  | loss  0.24 | err rate  8.59%
----------------------------------------------------------
After epoch  2 | valid err rate:  7.33% | doing   4 epochs
----------------------------------------------------------
Minibatch    800  | loss  0.41 | err rate 10.94%
Minibatch    900  | loss  0.28 | err rate  9.38%
Minibatch   1000  | loss  0.21 | err rate  6.25%
Minibatch   1100  | loss  0.18 | err rate  3.12%
-------------------

Minibatch   8700  | loss  0.16 | err rate  3.91%
Minibatch   8800  | loss  0.09 | err rate  2.34%
Minibatch   8900  | loss  0.09 | err rate  1.56%
----------------------------------------------------------
After epoch 23 | valid err rate:  3.49% | doing  34 epochs
----------------------------------------------------------
Minibatch   9000  | loss  0.09 | err rate  2.34%
Minibatch   9100  | loss  0.09 | err rate  2.34%
Minibatch   9200  | loss  0.18 | err rate  3.91%
Minibatch   9300  | loss  0.04 | err rate  0.78%
----------------------------------------------------------
After epoch 24 | valid err rate:  3.46% | doing  37 epochs
----------------------------------------------------------
Minibatch   9400  | loss  0.12 | err rate  3.91%
Minibatch   9500  | loss  0.09 | err rate  2.34%
Minibatch   9600  | loss  0.13 | err rate  3.91%
Minibatch   9700  | loss  0.12 | err rate  3.12%
----------------------------------------------------------
After epoch 25 | valid err rate:  3.41% | doing 

Minibatch  17500  | loss  0.05 | err rate  0.78%
----------------------------------------------------------
After epoch 45 | valid err rate:  3.07% | doing  68 epochs
----------------------------------------------------------
Minibatch  17600  | loss  0.04 | err rate  0.00%
Minibatch  17700  | loss  0.05 | err rate  0.78%
Minibatch  17800  | loss  0.05 | err rate  0.78%
Minibatch  17900  | loss  0.05 | err rate  1.56%
----------------------------------------------------------
After epoch 46 | valid err rate:  3.08% | doing  68 epochs
----------------------------------------------------------
Minibatch  18000  | loss  0.04 | err rate  0.78%
Minibatch  18100  | loss  0.06 | err rate  1.56%
Minibatch  18200  | loss  0.06 | err rate  1.56%
Minibatch  18300  | loss  0.11 | err rate  1.56%
----------------------------------------------------------
After epoch 47 | valid err rate:  3.10% | doing  68 epochs
----------------------------------------------------------
Minibatch  18400  | loss  0.

Minibatch  26200  | loss  0.02 | err rate  0.00%
Minibatch  26300  | loss  0.04 | err rate  0.00%
Minibatch  26400  | loss  0.07 | err rate  1.56%
Minibatch  26500  | loss  0.12 | err rate  2.34%
----------------------------------------------------------
After epoch 68 | valid err rate:  3.01% | doing  92 epochs
----------------------------------------------------------
Minibatch  26600  | loss  0.06 | err rate  0.78%
Minibatch  26700  | loss  0.10 | err rate  2.34%
Minibatch  26800  | loss  0.03 | err rate  0.00%
Minibatch  26900  | loss  0.05 | err rate  0.78%
----------------------------------------------------------
After epoch 69 | valid err rate:  2.99% | doing  92 epochs
----------------------------------------------------------
Minibatch  27000  | loss  0.07 | err rate  2.34%
Minibatch  27100  | loss  0.04 | err rate  0.00%
Minibatch  27200  | loss  0.03 | err rate  0.00%
Minibatch  27300  | loss  0.06 | err rate  1.56%
----------------------------------------------------------

Minibatch  35000  | loss  0.05 | err rate  1.56%
Minibatch  35100  | loss  0.04 | err rate  0.78%
----------------------------------------------------------
After epoch 90 | valid err rate:  2.88% | doing 136 epochs
----------------------------------------------------------
Minibatch  35200  | loss  0.05 | err rate  1.56%
Minibatch  35300  | loss  0.03 | err rate  0.78%
Minibatch  35400  | loss  0.03 | err rate  0.78%
Minibatch  35500  | loss  0.02 | err rate  0.78%
----------------------------------------------------------
After epoch 91 | valid err rate:  2.92% | doing 136 epochs
----------------------------------------------------------
Minibatch  35600  | loss  0.14 | err rate  2.34%
Minibatch  35700  | loss  0.04 | err rate  1.56%
Minibatch  35800  | loss  0.03 | err rate  0.00%
Minibatch  35900  | loss  0.06 | err rate  1.56%
----------------------------------------------------------
After epoch 92 | valid err rate:  2.91% | doing 136 epochs
--------------------------------------

Minibatch  43700  | loss  0.04 | err rate  0.78%
-----------------------------------------------------------
After epoch 112 | valid err rate:  2.87% | doing 161 epochs
-----------------------------------------------------------
Minibatch  43800  | loss  0.04 | err rate  0.78%
Minibatch  43900  | loss  0.05 | err rate  0.78%
Minibatch  44000  | loss  0.05 | err rate  0.78%
Minibatch  44100  | loss  0.03 | err rate  1.56%
-----------------------------------------------------------
After epoch 113 | valid err rate:  2.87% | doing 161 epochs
-----------------------------------------------------------
Minibatch  44200  | loss  0.02 | err rate  0.00%
Minibatch  44300  | loss  0.03 | err rate  0.00%
Minibatch  44400  | loss  0.04 | err rate  0.00%
Minibatch  44500  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 114 | valid err rate:  2.88% | doing 161 epochs
-----------------------------------------------------------
Minibatch  44600  |

-----------------------------------------------------------
After epoch 134 | valid err rate:  2.84% | doing 200 epochs
-----------------------------------------------------------
Minibatch  52400  | loss  0.02 | err rate  0.00%
Minibatch  52500  | loss  0.02 | err rate  0.00%
Minibatch  52600  | loss  0.04 | err rate  0.00%
Minibatch  52700  | loss  0.06 | err rate  2.34%
-----------------------------------------------------------
After epoch 135 | valid err rate:  2.80% | doing 203 epochs
-----------------------------------------------------------
Minibatch  52800  | loss  0.03 | err rate  1.56%
Minibatch  52900  | loss  0.02 | err rate  0.00%
Minibatch  53000  | loss  0.04 | err rate  0.00%
Minibatch  53100  | loss  0.03 | err rate  0.78%
-----------------------------------------------------------
After epoch 136 | valid err rate:  2.81% | doing 203 epochs
-----------------------------------------------------------
Minibatch  53200  | loss  0.04 | err rate  0.00%
Minibatch  53300  |

Minibatch  61000  | loss  0.04 | err rate  0.00%
Minibatch  61100  | loss  0.04 | err rate  0.78%
Minibatch  61200  | loss  0.05 | err rate  0.78%
Minibatch  61300  | loss  0.04 | err rate  0.78%
-----------------------------------------------------------
After epoch 157 | valid err rate:  2.80% | doing 215 epochs
-----------------------------------------------------------
Minibatch  61400  | loss  0.03 | err rate  0.78%
Minibatch  61500  | loss  0.02 | err rate  0.00%
Minibatch  61600  | loss  0.04 | err rate  0.78%
Minibatch  61700  | loss  0.07 | err rate  1.56%
-----------------------------------------------------------
After epoch 158 | valid err rate:  2.79% | doing 215 epochs
-----------------------------------------------------------
Minibatch  61800  | loss  0.07 | err rate  0.78%
Minibatch  61900  | loss  0.03 | err rate  0.00%
Minibatch  62000  | loss  0.04 | err rate  0.78%
Minibatch  62100  | loss  0.03 | err rate  0.78%
----------------------------------------------------

Minibatch  69700  | loss  0.02 | err rate  0.00%
Minibatch  69800  | loss  0.03 | err rate  0.00%
Minibatch  69900  | loss  0.03 | err rate  0.78%
-----------------------------------------------------------
After epoch 179 | valid err rate:  2.79% | doing 260 epochs
-----------------------------------------------------------
Minibatch  70000  | loss  0.03 | err rate  0.78%
Minibatch  70100  | loss  0.02 | err rate  0.78%
Minibatch  70200  | loss  0.02 | err rate  0.00%
Minibatch  70300  | loss  0.04 | err rate  0.78%
-----------------------------------------------------------
After epoch 180 | valid err rate:  2.80% | doing 260 epochs
-----------------------------------------------------------
Minibatch  70400  | loss  0.02 | err rate  0.00%
Minibatch  70500  | loss  0.02 | err rate  0.00%
Minibatch  70600  | loss  0.02 | err rate  0.00%
Minibatch  70700  | loss  0.03 | err rate  0.78%
-----------------------------------------------------------
After epoch 181 | valid err rate:  2.79% 

Minibatch  78400  | loss  0.03 | err rate  0.00%
Minibatch  78500  | loss  0.03 | err rate  0.00%
-----------------------------------------------------------
After epoch 201 | valid err rate:  2.74% | doing 284 epochs
-----------------------------------------------------------
Minibatch  78600  | loss  0.05 | err rate  1.56%
Minibatch  78700  | loss  0.04 | err rate  1.56%
Minibatch  78800  | loss  0.04 | err rate  1.56%
Minibatch  78900  | loss  0.06 | err rate  0.78%
-----------------------------------------------------------
After epoch 202 | valid err rate:  2.76% | doing 284 epochs
-----------------------------------------------------------
Minibatch  79000  | loss  0.03 | err rate  1.56%
Minibatch  79100  | loss  0.03 | err rate  0.00%
Minibatch  79200  | loss  0.02 | err rate  0.00%
Minibatch  79300  | loss  0.01 | err rate  0.00%
-----------------------------------------------------------
After epoch 203 | valid err rate:  2.72% | doing 305 epochs
------------------------------

Minibatch  87100  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 223 | valid err rate:  2.73% | doing 305 epochs
-----------------------------------------------------------
Minibatch  87200  | loss  0.01 | err rate  0.00%
Minibatch  87300  | loss  0.03 | err rate  0.78%
Minibatch  87400  | loss  0.05 | err rate  1.56%
Minibatch  87500  | loss  0.03 | err rate  0.78%
-----------------------------------------------------------
After epoch 224 | valid err rate:  2.72% | doing 305 epochs
-----------------------------------------------------------
Minibatch  87600  | loss  0.02 | err rate  0.00%
Minibatch  87700  | loss  0.03 | err rate  0.00%
Minibatch  87800  | loss  0.03 | err rate  0.78%
Minibatch  87900  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 225 | valid err rate:  2.71% | doing 338 epochs
-----------------------------------------------------------
Minibatch  88000  |

-----------------------------------------------------------
After epoch 245 | valid err rate:  2.71% | doing 356 epochs
-----------------------------------------------------------
Minibatch  95800  | loss  0.02 | err rate  0.00%
Minibatch  95900  | loss  0.04 | err rate  0.78%
Minibatch  96000  | loss  0.01 | err rate  0.00%
Minibatch  96100  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 246 | valid err rate:  2.71% | doing 356 epochs
-----------------------------------------------------------
Minibatch  96200  | loss  0.03 | err rate  0.00%
Minibatch  96300  | loss  0.05 | err rate  1.56%
Minibatch  96400  | loss  0.02 | err rate  0.00%
Minibatch  96500  | loss  0.03 | err rate  0.78%
-----------------------------------------------------------
After epoch 247 | valid err rate:  2.71% | doing 356 epochs
-----------------------------------------------------------
Minibatch  96600  | loss  0.02 | err rate  0.00%
Minibatch  96700  |

Minibatch 104400  | loss  0.03 | err rate  0.00%
Minibatch 104500  | loss  0.02 | err rate  0.00%
Minibatch 104600  | loss  0.02 | err rate  0.00%
Minibatch 104700  | loss  0.04 | err rate  0.00%
-----------------------------------------------------------
After epoch 268 | valid err rate:  2.70% | doing 356 epochs
-----------------------------------------------------------
Minibatch 104800  | loss  0.02 | err rate  0.00%
Minibatch 104900  | loss  0.04 | err rate  0.00%
Minibatch 105000  | loss  0.02 | err rate  0.78%
Minibatch 105100  | loss  0.01 | err rate  0.00%
-----------------------------------------------------------
After epoch 269 | valid err rate:  2.70% | doing 356 epochs
-----------------------------------------------------------
Minibatch 105200  | loss  0.01 | err rate  0.00%
Minibatch 105300  | loss  0.04 | err rate  0.78%
Minibatch 105400  | loss  0.02 | err rate  0.00%
Minibatch 105500  | loss  0.05 | err rate  0.78%
----------------------------------------------------

Minibatch 113100  | loss  0.02 | err rate  0.00%
Minibatch 113200  | loss  0.03 | err rate  0.78%
Minibatch 113300  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 290 | valid err rate:  2.70% | doing 424 epochs
-----------------------------------------------------------
Minibatch 113400  | loss  0.02 | err rate  0.00%
Minibatch 113500  | loss  0.03 | err rate  0.78%
Minibatch 113600  | loss  0.02 | err rate  0.00%
Minibatch 113700  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 291 | valid err rate:  2.69% | doing 424 epochs
-----------------------------------------------------------
Minibatch 113800  | loss  0.03 | err rate  0.00%
Minibatch 113900  | loss  0.01 | err rate  0.00%
Minibatch 114000  | loss  0.02 | err rate  0.00%
Minibatch 114100  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 292 | valid err rate:  2.70% 

Minibatch 121800  | loss  0.01 | err rate  0.00%
Minibatch 121900  | loss  0.04 | err rate  0.78%
-----------------------------------------------------------
After epoch 312 | valid err rate:  2.70% | doing 424 epochs
-----------------------------------------------------------
Minibatch 122000  | loss  0.01 | err rate  0.00%
Minibatch 122100  | loss  0.02 | err rate  0.78%
Minibatch 122200  | loss  0.02 | err rate  0.00%
Minibatch 122300  | loss  0.01 | err rate  0.00%
-----------------------------------------------------------
After epoch 313 | valid err rate:  2.70% | doing 424 epochs
-----------------------------------------------------------
Minibatch 122400  | loss  0.02 | err rate  0.00%
Minibatch 122500  | loss  0.04 | err rate  1.56%
Minibatch 122600  | loss  0.03 | err rate  0.78%
Minibatch 122700  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 314 | valid err rate:  2.70% | doing 424 epochs
------------------------------

Minibatch 130500  | loss  0.05 | err rate  0.78%
-----------------------------------------------------------
After epoch 334 | valid err rate:  2.70% | doing 424 epochs
-----------------------------------------------------------
Minibatch 130600  | loss  0.03 | err rate  0.00%
Minibatch 130700  | loss  0.03 | err rate  0.00%
Minibatch 130800  | loss  0.02 | err rate  0.00%
Minibatch 130900  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 335 | valid err rate:  2.70% | doing 424 epochs
-----------------------------------------------------------
Minibatch 131000  | loss  0.02 | err rate  0.00%
Minibatch 131100  | loss  0.03 | err rate  0.00%
Minibatch 131200  | loss  0.01 | err rate  0.00%
Minibatch 131300  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 336 | valid err rate:  2.70% | doing 424 epochs
-----------------------------------------------------------
Minibatch 131400  |

-----------------------------------------------------------
After epoch 356 | valid err rate:  2.67% | doing 512 epochs
-----------------------------------------------------------
Minibatch 139200  | loss  0.04 | err rate  0.78%
Minibatch 139300  | loss  0.02 | err rate  0.00%
Minibatch 139400  | loss  0.02 | err rate  0.00%
Minibatch 139500  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 357 | valid err rate:  2.69% | doing 512 epochs
-----------------------------------------------------------
Minibatch 139600  | loss  0.03 | err rate  0.78%
Minibatch 139700  | loss  0.02 | err rate  0.00%
Minibatch 139800  | loss  0.02 | err rate  0.00%
Minibatch 139900  | loss  0.04 | err rate  0.78%
-----------------------------------------------------------
After epoch 358 | valid err rate:  2.67% | doing 512 epochs
-----------------------------------------------------------
Minibatch 140000  | loss  0.03 | err rate  0.00%
Minibatch 140100  |

Minibatch 147800  | loss  0.01 | err rate  0.00%
Minibatch 147900  | loss  0.03 | err rate  0.78%
Minibatch 148000  | loss  0.02 | err rate  0.78%
Minibatch 148100  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 379 | valid err rate:  2.67% | doing 512 epochs
-----------------------------------------------------------
Minibatch 148200  | loss  0.03 | err rate  0.78%
Minibatch 148300  | loss  0.03 | err rate  0.78%
Minibatch 148400  | loss  0.02 | err rate  0.00%
Minibatch 148500  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 380 | valid err rate:  2.66% | doing 571 epochs
-----------------------------------------------------------
Minibatch 148600  | loss  0.03 | err rate  1.56%
Minibatch 148700  | loss  0.02 | err rate  0.00%
Minibatch 148800  | loss  0.02 | err rate  0.00%
Minibatch 148900  | loss  0.04 | err rate  1.56%
----------------------------------------------------

Minibatch 156500  | loss  0.02 | err rate  0.00%
Minibatch 156600  | loss  0.02 | err rate  0.00%
Minibatch 156700  | loss  0.01 | err rate  0.00%
-----------------------------------------------------------
After epoch 401 | valid err rate:  2.64% | doing 578 epochs
-----------------------------------------------------------
Minibatch 156800  | loss  0.05 | err rate  1.56%
Minibatch 156900  | loss  0.02 | err rate  0.00%
Minibatch 157000  | loss  0.02 | err rate  0.00%
Minibatch 157100  | loss  0.01 | err rate  0.00%
-----------------------------------------------------------
After epoch 402 | valid err rate:  2.64% | doing 578 epochs
-----------------------------------------------------------
Minibatch 157200  | loss  0.02 | err rate  0.00%
Minibatch 157300  | loss  0.02 | err rate  0.00%
Minibatch 157400  | loss  0.03 | err rate  0.00%
Minibatch 157500  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 403 | valid err rate:  2.64% 

Minibatch 165200  | loss  0.02 | err rate  0.00%
Minibatch 165300  | loss  0.01 | err rate  0.00%
-----------------------------------------------------------
After epoch 423 | valid err rate:  2.65% | doing 628 epochs
-----------------------------------------------------------
Minibatch 165400  | loss  0.02 | err rate  0.00%
Minibatch 165500  | loss  0.03 | err rate  0.00%
Minibatch 165600  | loss  0.02 | err rate  0.00%
Minibatch 165700  | loss  0.02 | err rate  0.78%
-----------------------------------------------------------
After epoch 424 | valid err rate:  2.65% | doing 628 epochs
-----------------------------------------------------------
Minibatch 165800  | loss  0.04 | err rate  0.00%
Minibatch 165900  | loss  0.02 | err rate  0.00%
Minibatch 166000  | loss  0.02 | err rate  0.00%
Minibatch 166100  | loss  0.04 | err rate  0.78%
-----------------------------------------------------------
After epoch 425 | valid err rate:  2.65% | doing 628 epochs
------------------------------

Minibatch 173900  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 445 | valid err rate:  2.62% | doing 656 epochs
-----------------------------------------------------------
Minibatch 174000  | loss  0.02 | err rate  0.00%
Minibatch 174100  | loss  0.02 | err rate  0.00%
Minibatch 174200  | loss  0.02 | err rate  0.00%
Minibatch 174300  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 446 | valid err rate:  2.61% | doing 656 epochs
-----------------------------------------------------------
Minibatch 174400  | loss  0.01 | err rate  0.00%
Minibatch 174500  | loss  0.01 | err rate  0.00%
Minibatch 174600  | loss  0.04 | err rate  0.78%
Minibatch 174700  | loss  0.01 | err rate  0.00%
-----------------------------------------------------------
After epoch 447 | valid err rate:  2.62% | doing 656 epochs
-----------------------------------------------------------
Minibatch 174800  |

-----------------------------------------------------------
After epoch 467 | valid err rate:  2.61% | doing 691 epochs
-----------------------------------------------------------
Minibatch 182600  | loss  0.01 | err rate  0.00%
Minibatch 182700  | loss  0.01 | err rate  0.00%
Minibatch 182800  | loss  0.03 | err rate  0.78%
Minibatch 182900  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 468 | valid err rate:  2.60% | doing 691 epochs
-----------------------------------------------------------
Minibatch 183000  | loss  0.02 | err rate  0.00%
Minibatch 183100  | loss  0.02 | err rate  0.00%
Minibatch 183200  | loss  0.02 | err rate  0.00%
Minibatch 183300  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 469 | valid err rate:  2.60% | doing 691 epochs
-----------------------------------------------------------
Minibatch 183400  | loss  0.03 | err rate  0.00%
Minibatch 183500  |

Minibatch 191300  | loss  0.01 | err rate  0.00%
Minibatch 191400  | loss  0.02 | err rate  0.00%
Minibatch 191500  | loss  0.02 | err rate  0.78%
-----------------------------------------------------------
After epoch 490 | valid err rate:  2.59% | doing 733 epochs
-----------------------------------------------------------
Minibatch 191600  | loss  0.01 | err rate  0.00%
Minibatch 191700  | loss  0.02 | err rate  0.00%
Minibatch 191800  | loss  0.01 | err rate  0.00%
Minibatch 191900  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 491 | valid err rate:  2.59% | doing 733 epochs
-----------------------------------------------------------
Minibatch 192000  | loss  0.02 | err rate  0.00%
Minibatch 192100  | loss  0.02 | err rate  0.78%
Minibatch 192200  | loss  0.04 | err rate  0.78%
Minibatch 192300  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 492 | valid err rate:  2.59% 

Minibatch 200000  | loss  0.04 | err rate  1.56%
Minibatch 200100  | loss  0.01 | err rate  0.00%
-----------------------------------------------------------
After epoch 512 | valid err rate:  2.59% | doing 733 epochs
-----------------------------------------------------------
Minibatch 200200  | loss  0.02 | err rate  0.00%
Minibatch 200300  | loss  0.01 | err rate  0.00%
Minibatch 200400  | loss  0.02 | err rate  0.00%
Minibatch 200500  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 513 | valid err rate:  2.59% | doing 733 epochs
-----------------------------------------------------------
Minibatch 200600  | loss  0.02 | err rate  0.00%
Minibatch 200700  | loss  0.05 | err rate  1.56%
Minibatch 200800  | loss  0.03 | err rate  0.00%
Minibatch 200900  | loss  0.04 | err rate  1.56%
-----------------------------------------------------------
After epoch 514 | valid err rate:  2.60% | doing 733 epochs
------------------------------

Minibatch 208700  | loss  0.01 | err rate  0.00%
-----------------------------------------------------------
After epoch 534 | valid err rate:  2.59% | doing 733 epochs
-----------------------------------------------------------
Minibatch 208800  | loss  0.01 | err rate  0.00%
Minibatch 208900  | loss  0.03 | err rate  0.78%
Minibatch 209000  | loss  0.03 | err rate  0.78%
Minibatch 209100  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 535 | valid err rate:  2.59% | doing 733 epochs
-----------------------------------------------------------
Minibatch 209200  | loss  0.01 | err rate  0.00%
Minibatch 209300  | loss  0.02 | err rate  0.00%
Minibatch 209400  | loss  0.02 | err rate  0.00%
Minibatch 209500  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 536 | valid err rate:  2.60% | doing 733 epochs
-----------------------------------------------------------
Minibatch 209600  |

-----------------------------------------------------------
After epoch 556 | valid err rate:  2.60% | doing 733 epochs
-----------------------------------------------------------
Minibatch 217400  | loss  0.01 | err rate  0.00%
Minibatch 217500  | loss  0.02 | err rate  0.00%
Minibatch 217600  | loss  0.03 | err rate  0.00%
Minibatch 217700  | loss  0.01 | err rate  0.00%
-----------------------------------------------------------
After epoch 557 | valid err rate:  2.60% | doing 733 epochs
-----------------------------------------------------------
Minibatch 217800  | loss  0.01 | err rate  0.00%
Minibatch 217900  | loss  0.01 | err rate  0.00%
Minibatch 218000  | loss  0.01 | err rate  0.00%
Minibatch 218100  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 558 | valid err rate:  2.60% | doing 733 epochs
-----------------------------------------------------------
Minibatch 218200  | loss  0.02 | err rate  0.00%
Minibatch 218300  |

Minibatch 226000  | loss  0.02 | err rate  0.00%
Minibatch 226100  | loss  0.02 | err rate  0.78%
Minibatch 226200  | loss  0.01 | err rate  0.00%
Minibatch 226300  | loss  0.01 | err rate  0.00%
-----------------------------------------------------------
After epoch 579 | valid err rate:  2.60% | doing 733 epochs
-----------------------------------------------------------
Minibatch 226400  | loss  0.02 | err rate  0.00%
Minibatch 226500  | loss  0.02 | err rate  0.00%
Minibatch 226600  | loss  0.02 | err rate  0.00%
Minibatch 226700  | loss  0.03 | err rate  0.00%
-----------------------------------------------------------
After epoch 580 | valid err rate:  2.60% | doing 733 epochs
-----------------------------------------------------------
Minibatch 226800  | loss  0.01 | err rate  0.00%
Minibatch 226900  | loss  0.01 | err rate  0.00%
Minibatch 227000  | loss  0.03 | err rate  0.78%
Minibatch 227100  | loss  0.02 | err rate  0.78%
----------------------------------------------------

Minibatch 234700  | loss  0.02 | err rate  0.00%
Minibatch 234800  | loss  0.02 | err rate  0.00%
Minibatch 234900  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 601 | valid err rate:  2.60% | doing 733 epochs
-----------------------------------------------------------
Minibatch 235000  | loss  0.02 | err rate  0.00%
Minibatch 235100  | loss  0.01 | err rate  0.00%
Minibatch 235200  | loss  0.03 | err rate  0.78%
Minibatch 235300  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 602 | valid err rate:  2.59% | doing 733 epochs
-----------------------------------------------------------
Minibatch 235400  | loss  0.01 | err rate  0.00%
Minibatch 235500  | loss  0.01 | err rate  0.00%
Minibatch 235600  | loss  0.02 | err rate  0.00%
Minibatch 235700  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 603 | valid err rate:  2.60% 

Minibatch 243400  | loss  0.01 | err rate  0.00%
Minibatch 243500  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 623 | valid err rate:  2.60% | doing 733 epochs
-----------------------------------------------------------
Minibatch 243600  | loss  0.01 | err rate  0.00%
Minibatch 243700  | loss  0.02 | err rate  0.00%
Minibatch 243800  | loss  0.01 | err rate  0.00%
Minibatch 243900  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 624 | valid err rate:  2.60% | doing 733 epochs
-----------------------------------------------------------
Minibatch 244000  | loss  0.02 | err rate  0.78%
Minibatch 244100  | loss  0.02 | err rate  0.00%
Minibatch 244200  | loss  0.02 | err rate  0.00%
Minibatch 244300  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 625 | valid err rate:  2.60% | doing 733 epochs
------------------------------

Minibatch 252100  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 645 | valid err rate:  2.60% | doing 733 epochs
-----------------------------------------------------------
Minibatch 252200  | loss  0.03 | err rate  0.00%
Minibatch 252300  | loss  0.01 | err rate  0.00%
Minibatch 252400  | loss  0.03 | err rate  0.00%
Minibatch 252500  | loss  0.01 | err rate  0.00%
-----------------------------------------------------------
After epoch 646 | valid err rate:  2.60% | doing 733 epochs
-----------------------------------------------------------
Minibatch 252600  | loss  0.02 | err rate  0.78%
Minibatch 252700  | loss  0.01 | err rate  0.00%
Minibatch 252800  | loss  0.02 | err rate  0.00%
Minibatch 252900  | loss  0.03 | err rate  0.00%
-----------------------------------------------------------
After epoch 647 | valid err rate:  2.60% | doing 733 epochs
-----------------------------------------------------------
Minibatch 253000  |

-----------------------------------------------------------
After epoch 667 | valid err rate:  2.60% | doing 995 epochs
-----------------------------------------------------------
Minibatch 260800  | loss  0.01 | err rate  0.00%
Minibatch 260900  | loss  0.01 | err rate  0.00%
Minibatch 261000  | loss  0.02 | err rate  0.00%
Minibatch 261100  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 668 | valid err rate:  2.59% | doing 995 epochs
-----------------------------------------------------------
Minibatch 261200  | loss  0.02 | err rate  0.78%
Minibatch 261300  | loss  0.02 | err rate  0.00%
Minibatch 261400  | loss  0.01 | err rate  0.00%
Minibatch 261500  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 669 | valid err rate:  2.58% | doing 995 epochs
-----------------------------------------------------------
Minibatch 261600  | loss  0.02 | err rate  0.00%
Minibatch 261700  |

Minibatch 269500  | loss  0.01 | err rate  0.00%
Minibatch 269600  | loss  0.02 | err rate  0.00%
Minibatch 269700  | loss  0.01 | err rate  0.00%
-----------------------------------------------------------
After epoch 690 | valid err rate:  2.57% | doing 995 epochs
-----------------------------------------------------------
Minibatch 269800  | loss  0.02 | err rate  0.00%
Minibatch 269900  | loss  0.02 | err rate  0.00%
Minibatch 270000  | loss  0.02 | err rate  0.00%
Minibatch 270100  | loss  0.02 | err rate  0.00%
-----------------------------------------------------------
After epoch 691 | valid err rate:  2.57% | doing 995 epochs
-----------------------------------------------------------
Minibatch 270200  | loss  0.02 | err rate  0.00%
Minibatch 270300  | loss  0.01 | err rate  0.00%
Minibatch 270400  | loss  0.02 | err rate  0.00%
Minibatch 270500  | loss  0.04 | err rate  0.78%
-----------------------------------------------------------
After epoch 692 | valid err rate:  2.57% 

Minibatch 278100  | loss  0.01 | err rate  0.00%
Minibatch 278200  | loss  0.01 | err rate  0.00%
Minibatch 278300  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 712 | valid err rate:  2.56% | doing 1040 epochs
------------------------------------------------------------
Minibatch 278400  | loss  0.03 | err rate  0.00%
Minibatch 278500  | loss  0.02 | err rate  0.00%
Minibatch 278600  | loss  0.03 | err rate  1.56%
Minibatch 278700  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 713 | valid err rate:  2.57% | doing 1040 epochs
------------------------------------------------------------
Minibatch 278800  | loss  0.03 | err rate  0.00%
Minibatch 278900  | loss  0.02 | err rate  0.00%
Minibatch 279000  | loss  0.03 | err rate  0.78%
Minibatch 279100  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 714 | valid err rate: 

Minibatch 286700  | loss  0.02 | err rate  0.00%
Minibatch 286800  | loss  0.02 | err rate  0.78%
Minibatch 286900  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 734 | valid err rate:  2.56% | doing 1040 epochs
------------------------------------------------------------
Minibatch 287000  | loss  0.01 | err rate  0.00%
Minibatch 287100  | loss  0.02 | err rate  0.00%
Minibatch 287200  | loss  0.01 | err rate  0.00%
Minibatch 287300  | loss  0.04 | err rate  0.78%
------------------------------------------------------------
After epoch 735 | valid err rate:  2.56% | doing 1040 epochs
------------------------------------------------------------
Minibatch 287400  | loss  0.02 | err rate  0.78%
Minibatch 287500  | loss  0.02 | err rate  0.00%
Minibatch 287600  | loss  0.03 | err rate  0.78%
Minibatch 287700  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 736 | valid err rate: 

Minibatch 295300  | loss  0.01 | err rate  0.00%
Minibatch 295400  | loss  0.01 | err rate  0.00%
Minibatch 295500  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 756 | valid err rate:  2.56% | doing 1040 epochs
------------------------------------------------------------
Minibatch 295600  | loss  0.01 | err rate  0.00%
Minibatch 295700  | loss  0.01 | err rate  0.00%
Minibatch 295800  | loss  0.02 | err rate  0.00%
Minibatch 295900  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 757 | valid err rate:  2.56% | doing 1040 epochs
------------------------------------------------------------
Minibatch 296000  | loss  0.02 | err rate  0.00%
Minibatch 296100  | loss  0.02 | err rate  0.00%
Minibatch 296200  | loss  0.01 | err rate  0.00%
Minibatch 296300  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 758 | valid err rate: 

Minibatch 303900  | loss  0.04 | err rate  0.78%
Minibatch 304000  | loss  0.02 | err rate  0.00%
Minibatch 304100  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 778 | valid err rate:  2.55% | doing 1142 epochs
------------------------------------------------------------
Minibatch 304200  | loss  0.01 | err rate  0.00%
Minibatch 304300  | loss  0.01 | err rate  0.00%
Minibatch 304400  | loss  0.01 | err rate  0.00%
Minibatch 304500  | loss  0.03 | err rate  0.78%
------------------------------------------------------------
After epoch 779 | valid err rate:  2.55% | doing 1142 epochs
------------------------------------------------------------
Minibatch 304600  | loss  0.02 | err rate  0.00%
Minibatch 304700  | loss  0.02 | err rate  0.00%
Minibatch 304800  | loss  0.01 | err rate  0.00%
Minibatch 304900  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 780 | valid err rate: 

Minibatch 312500  | loss  0.02 | err rate  0.00%
Minibatch 312600  | loss  0.02 | err rate  0.00%
Minibatch 312700  | loss  0.01 | err rate  0.00%
Minibatch 312800  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 800 | valid err rate:  2.55% | doing 1142 epochs
------------------------------------------------------------
Minibatch 312900  | loss  0.01 | err rate  0.00%
Minibatch 313000  | loss  0.02 | err rate  0.00%
Minibatch 313100  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 801 | valid err rate:  2.55% | doing 1142 epochs
------------------------------------------------------------
Minibatch 313200  | loss  0.01 | err rate  0.00%
Minibatch 313300  | loss  0.01 | err rate  0.00%
Minibatch 313400  | loss  0.02 | err rate  0.78%
Minibatch 313500  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 802 | valid err rate: 

Minibatch 321100  | loss  0.01 | err rate  0.00%
Minibatch 321200  | loss  0.02 | err rate  0.00%
Minibatch 321300  | loss  0.01 | err rate  0.00%
Minibatch 321400  | loss  0.02 | err rate  0.78%
------------------------------------------------------------
After epoch 822 | valid err rate:  2.55% | doing 1142 epochs
------------------------------------------------------------
Minibatch 321500  | loss  0.01 | err rate  0.00%
Minibatch 321600  | loss  0.01 | err rate  0.00%
Minibatch 321700  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 823 | valid err rate:  2.54% | doing 1142 epochs
------------------------------------------------------------
Minibatch 321800  | loss  0.02 | err rate  0.78%
Minibatch 321900  | loss  0.02 | err rate  0.00%
Minibatch 322000  | loss  0.02 | err rate  0.00%
Minibatch 322100  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 824 | valid err rate: 

Minibatch 329700  | loss  0.03 | err rate  1.56%
Minibatch 329800  | loss  0.01 | err rate  0.00%
Minibatch 329900  | loss  0.01 | err rate  0.00%
Minibatch 330000  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 844 | valid err rate:  2.54% | doing 1244 epochs
------------------------------------------------------------
Minibatch 330100  | loss  0.02 | err rate  0.00%
Minibatch 330200  | loss  0.01 | err rate  0.00%
Minibatch 330300  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 845 | valid err rate:  2.54% | doing 1244 epochs
------------------------------------------------------------
Minibatch 330400  | loss  0.03 | err rate  0.00%
Minibatch 330500  | loss  0.03 | err rate  0.78%
Minibatch 330600  | loss  0.03 | err rate  0.78%
Minibatch 330700  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 846 | valid err rate: 

Minibatch 338300  | loss  0.01 | err rate  0.00%
Minibatch 338400  | loss  0.02 | err rate  0.00%
Minibatch 338500  | loss  0.01 | err rate  0.00%
Minibatch 338600  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 866 | valid err rate:  2.53% | doing 1244 epochs
------------------------------------------------------------
Minibatch 338700  | loss  0.01 | err rate  0.00%
Minibatch 338800  | loss  0.01 | err rate  0.00%
Minibatch 338900  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 867 | valid err rate:  2.53% | doing 1244 epochs
------------------------------------------------------------
Minibatch 339000  | loss  0.02 | err rate  0.00%
Minibatch 339100  | loss  0.01 | err rate  0.00%
Minibatch 339200  | loss  0.02 | err rate  0.00%
Minibatch 339300  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 868 | valid err rate: 

Minibatch 346900  | loss  0.02 | err rate  0.00%
Minibatch 347000  | loss  0.01 | err rate  0.00%
Minibatch 347100  | loss  0.01 | err rate  0.00%
Minibatch 347200  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 888 | valid err rate:  2.54% | doing 1244 epochs
------------------------------------------------------------
Minibatch 347300  | loss  0.01 | err rate  0.00%
Minibatch 347400  | loss  0.01 | err rate  0.00%
Minibatch 347500  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 889 | valid err rate:  2.53% | doing 1244 epochs
------------------------------------------------------------
Minibatch 347600  | loss  0.01 | err rate  0.00%
Minibatch 347700  | loss  0.01 | err rate  0.00%
Minibatch 347800  | loss  0.01 | err rate  0.00%
Minibatch 347900  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 890 | valid err rate: 

Minibatch 355500  | loss  0.05 | err rate  0.78%
Minibatch 355600  | loss  0.03 | err rate  0.00%
Minibatch 355700  | loss  0.01 | err rate  0.00%
Minibatch 355800  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 910 | valid err rate:  2.53% | doing 1244 epochs
------------------------------------------------------------
Minibatch 355900  | loss  0.02 | err rate  0.78%
Minibatch 356000  | loss  0.02 | err rate  0.00%
Minibatch 356100  | loss  0.01 | err rate  0.00%
Minibatch 356200  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 911 | valid err rate:  2.53% | doing 1244 epochs
------------------------------------------------------------
Minibatch 356300  | loss  0.01 | err rate  0.00%
Minibatch 356400  | loss  0.01 | err rate  0.00%
Minibatch 356500  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 912 | valid err rate: 

Minibatch 364100  | loss  0.01 | err rate  0.00%
Minibatch 364200  | loss  0.02 | err rate  0.00%
Minibatch 364300  | loss  0.02 | err rate  0.00%
Minibatch 364400  | loss  0.02 | err rate  0.78%
------------------------------------------------------------
After epoch 932 | valid err rate:  2.53% | doing 1244 epochs
------------------------------------------------------------
Minibatch 364500  | loss  0.01 | err rate  0.00%
Minibatch 364600  | loss  0.01 | err rate  0.00%
Minibatch 364700  | loss  0.01 | err rate  0.00%
Minibatch 364800  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 933 | valid err rate:  2.53% | doing 1244 epochs
------------------------------------------------------------
Minibatch 364900  | loss  0.02 | err rate  0.00%
Minibatch 365000  | loss  0.01 | err rate  0.00%
Minibatch 365100  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 934 | valid err rate: 

Minibatch 372700  | loss  0.01 | err rate  0.00%
Minibatch 372800  | loss  0.03 | err rate  0.00%
Minibatch 372900  | loss  0.01 | err rate  0.00%
Minibatch 373000  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 954 | valid err rate:  2.53% | doing 1244 epochs
------------------------------------------------------------
Minibatch 373100  | loss  0.02 | err rate  0.00%
Minibatch 373200  | loss  0.02 | err rate  0.78%
Minibatch 373300  | loss  0.01 | err rate  0.00%
Minibatch 373400  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 955 | valid err rate:  2.53% | doing 1244 epochs
------------------------------------------------------------
Minibatch 373500  | loss  0.02 | err rate  0.00%
Minibatch 373600  | loss  0.02 | err rate  0.00%
Minibatch 373700  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 956 | valid err rate: 

Minibatch 381300  | loss  0.02 | err rate  0.00%
Minibatch 381400  | loss  0.02 | err rate  0.00%
Minibatch 381500  | loss  0.03 | err rate  0.78%
Minibatch 381600  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 976 | valid err rate:  2.53% | doing 1244 epochs
------------------------------------------------------------
Minibatch 381700  | loss  0.02 | err rate  0.78%
Minibatch 381800  | loss  0.01 | err rate  0.00%
Minibatch 381900  | loss  0.02 | err rate  0.00%
Minibatch 382000  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 977 | valid err rate:  2.53% | doing 1244 epochs
------------------------------------------------------------
Minibatch 382100  | loss  0.01 | err rate  0.00%
Minibatch 382200  | loss  0.02 | err rate  0.00%
Minibatch 382300  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 978 | valid err rate: 

Minibatch 389900  | loss  0.01 | err rate  0.00%
Minibatch 390000  | loss  0.01 | err rate  0.00%
Minibatch 390100  | loss  0.02 | err rate  0.00%
Minibatch 390200  | loss  0.02 | err rate  0.00%
------------------------------------------------------------
After epoch 998 | valid err rate:  2.53% | doing 1244 epochs
------------------------------------------------------------
Minibatch 390300  | loss  0.01 | err rate  0.00%
Minibatch 390400  | loss  0.01 | err rate  0.00%
Minibatch 390500  | loss  0.02 | err rate  0.00%
Minibatch 390600  | loss  0.01 | err rate  0.00%
------------------------------------------------------------
After epoch 999 | valid err rate:  2.53% | doing 1244 epochs
------------------------------------------------------------
Minibatch 390700  | loss  0.01 | err rate  0.00%
Minibatch 390800  | loss  0.01 | err rate  0.00%
Minibatch 390900  | loss  0.02 | err rate  0.00%
Minibatch 391000  | loss  0.01 | err rate  0.00%
----------------------------------------------

Minibatch 398500  | loss  0.01 | err rate  0.00%
Minibatch 398600  | loss  0.01 | err rate  0.00%
Minibatch 398700  | loss  0.01 | err rate  0.00%
Minibatch 398800  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1020 | valid err rate:  2.53% | doing 1508 epochs
-------------------------------------------------------------
Minibatch 398900  | loss  0.02 | err rate  0.00%
Minibatch 399000  | loss  0.02 | err rate  0.78%
Minibatch 399100  | loss  0.01 | err rate  0.00%
Minibatch 399200  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1021 | valid err rate:  2.53% | doing 1508 epochs
-------------------------------------------------------------
Minibatch 399300  | loss  0.01 | err rate  0.00%
Minibatch 399400  | loss  0.01 | err rate  0.00%
Minibatch 399500  | loss  0.02 | err rate  0.78%
Minibatch 399600  | loss  0.02 | err rate  0.78%
----------------------------------------

Minibatch 407100  | loss  0.02 | err rate  0.00%
Minibatch 407200  | loss  0.02 | err rate  0.00%
Minibatch 407300  | loss  0.01 | err rate  0.00%
Minibatch 407400  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1042 | valid err rate:  2.53% | doing 1508 epochs
-------------------------------------------------------------
Minibatch 407500  | loss  0.01 | err rate  0.00%
Minibatch 407600  | loss  0.01 | err rate  0.00%
Minibatch 407700  | loss  0.01 | err rate  0.00%
Minibatch 407800  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1043 | valid err rate:  2.53% | doing 1508 epochs
-------------------------------------------------------------
Minibatch 407900  | loss  0.01 | err rate  0.00%
Minibatch 408000  | loss  0.02 | err rate  0.00%
Minibatch 408100  | loss  0.01 | err rate  0.00%
Minibatch 408200  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 415700  | loss  0.02 | err rate  0.00%
Minibatch 415800  | loss  0.01 | err rate  0.00%
Minibatch 415900  | loss  0.02 | err rate  0.78%
Minibatch 416000  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1064 | valid err rate:  2.52% | doing 1508 epochs
-------------------------------------------------------------
Minibatch 416100  | loss  0.01 | err rate  0.00%
Minibatch 416200  | loss  0.02 | err rate  0.00%
Minibatch 416300  | loss  0.03 | err rate  0.00%
Minibatch 416400  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1065 | valid err rate:  2.52% | doing 1508 epochs
-------------------------------------------------------------
Minibatch 416500  | loss  0.01 | err rate  0.00%
Minibatch 416600  | loss  0.01 | err rate  0.00%
Minibatch 416700  | loss  0.01 | err rate  0.00%
Minibatch 416800  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 424300  | loss  0.02 | err rate  0.00%
Minibatch 424400  | loss  0.01 | err rate  0.00%
Minibatch 424500  | loss  0.03 | err rate  0.78%
Minibatch 424600  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1086 | valid err rate:  2.52% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 424700  | loss  0.02 | err rate  0.00%
Minibatch 424800  | loss  0.01 | err rate  0.00%
Minibatch 424900  | loss  0.01 | err rate  0.00%
Minibatch 425000  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1087 | valid err rate:  2.52% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 425100  | loss  0.02 | err rate  0.00%
Minibatch 425200  | loss  0.01 | err rate  0.00%
Minibatch 425300  | loss  0.01 | err rate  0.00%
Minibatch 425400  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 432900  | loss  0.01 | err rate  0.00%
Minibatch 433000  | loss  0.02 | err rate  0.00%
Minibatch 433100  | loss  0.01 | err rate  0.00%
Minibatch 433200  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1108 | valid err rate:  2.52% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 433300  | loss  0.02 | err rate  0.00%
Minibatch 433400  | loss  0.02 | err rate  0.78%
Minibatch 433500  | loss  0.01 | err rate  0.00%
Minibatch 433600  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1109 | valid err rate:  2.52% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 433700  | loss  0.02 | err rate  0.00%
Minibatch 433800  | loss  0.04 | err rate  0.78%
Minibatch 433900  | loss  0.02 | err rate  0.00%
Minibatch 434000  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 441500  | loss  0.01 | err rate  0.00%
Minibatch 441600  | loss  0.02 | err rate  0.00%
Minibatch 441700  | loss  0.02 | err rate  0.78%
Minibatch 441800  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1130 | valid err rate:  2.52% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 441900  | loss  0.01 | err rate  0.00%
Minibatch 442000  | loss  0.02 | err rate  0.00%
Minibatch 442100  | loss  0.02 | err rate  0.78%
Minibatch 442200  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1131 | valid err rate:  2.52% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 442300  | loss  0.02 | err rate  0.00%
Minibatch 442400  | loss  0.01 | err rate  0.00%
Minibatch 442500  | loss  0.01 | err rate  0.00%
Minibatch 442600  | loss  0.02 | err rate  0.00%
----------------------------------------

Minibatch 450100  | loss  0.01 | err rate  0.00%
Minibatch 450200  | loss  0.02 | err rate  0.00%
Minibatch 450300  | loss  0.01 | err rate  0.00%
Minibatch 450400  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1152 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 450500  | loss  0.01 | err rate  0.00%
Minibatch 450600  | loss  0.01 | err rate  0.00%
Minibatch 450700  | loss  0.02 | err rate  0.00%
Minibatch 450800  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1153 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 450900  | loss  0.03 | err rate  0.00%
Minibatch 451000  | loss  0.03 | err rate  0.00%
Minibatch 451100  | loss  0.02 | err rate  0.00%
Minibatch 451200  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 458700  | loss  0.01 | err rate  0.00%
Minibatch 458800  | loss  0.03 | err rate  0.78%
Minibatch 458900  | loss  0.02 | err rate  0.78%
Minibatch 459000  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1174 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 459100  | loss  0.01 | err rate  0.00%
Minibatch 459200  | loss  0.03 | err rate  0.78%
Minibatch 459300  | loss  0.01 | err rate  0.00%
Minibatch 459400  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1175 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 459500  | loss  0.01 | err rate  0.00%
Minibatch 459600  | loss  0.01 | err rate  0.00%
Minibatch 459700  | loss  0.01 | err rate  0.00%
Minibatch 459800  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 467300  | loss  0.02 | err rate  0.00%
Minibatch 467400  | loss  0.01 | err rate  0.00%
Minibatch 467500  | loss  0.01 | err rate  0.00%
Minibatch 467600  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1196 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 467700  | loss  0.02 | err rate  0.00%
Minibatch 467800  | loss  0.01 | err rate  0.00%
Minibatch 467900  | loss  0.01 | err rate  0.00%
Minibatch 468000  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1197 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 468100  | loss  0.02 | err rate  0.00%
Minibatch 468200  | loss  0.01 | err rate  0.00%
Minibatch 468300  | loss  0.03 | err rate  0.78%
Minibatch 468400  | loss  0.02 | err rate  0.00%
----------------------------------------

Minibatch 475900  | loss  0.01 | err rate  0.00%
Minibatch 476000  | loss  0.01 | err rate  0.00%
Minibatch 476100  | loss  0.02 | err rate  0.78%
Minibatch 476200  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1218 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 476300  | loss  0.02 | err rate  0.00%
Minibatch 476400  | loss  0.02 | err rate  0.78%
Minibatch 476500  | loss  0.01 | err rate  0.00%
Minibatch 476600  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1219 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 476700  | loss  0.01 | err rate  0.00%
Minibatch 476800  | loss  0.02 | err rate  0.00%
Minibatch 476900  | loss  0.02 | err rate  0.00%
Minibatch 477000  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 484500  | loss  0.01 | err rate  0.00%
Minibatch 484600  | loss  0.01 | err rate  0.00%
Minibatch 484700  | loss  0.01 | err rate  0.00%
Minibatch 484800  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1240 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 484900  | loss  0.02 | err rate  0.00%
Minibatch 485000  | loss  0.01 | err rate  0.00%
Minibatch 485100  | loss  0.01 | err rate  0.00%
Minibatch 485200  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1241 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 485300  | loss  0.02 | err rate  0.00%
Minibatch 485400  | loss  0.01 | err rate  0.00%
Minibatch 485500  | loss  0.02 | err rate  0.00%
Minibatch 485600  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 493100  | loss  0.01 | err rate  0.00%
Minibatch 493200  | loss  0.01 | err rate  0.00%
Minibatch 493300  | loss  0.01 | err rate  0.00%
Minibatch 493400  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1262 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 493500  | loss  0.01 | err rate  0.00%
Minibatch 493600  | loss  0.01 | err rate  0.00%
Minibatch 493700  | loss  0.01 | err rate  0.00%
Minibatch 493800  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1263 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 493900  | loss  0.01 | err rate  0.00%
Minibatch 494000  | loss  0.02 | err rate  0.00%
Minibatch 494100  | loss  0.05 | err rate  0.78%
Minibatch 494200  | loss  0.02 | err rate  0.78%
----------------------------------------

Minibatch 501700  | loss  0.01 | err rate  0.00%
Minibatch 501800  | loss  0.01 | err rate  0.00%
Minibatch 501900  | loss  0.01 | err rate  0.00%
Minibatch 502000  | loss  0.00 | err rate  0.00%
-------------------------------------------------------------
After epoch 1284 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 502100  | loss  0.02 | err rate  0.00%
Minibatch 502200  | loss  0.01 | err rate  0.00%
Minibatch 502300  | loss  0.01 | err rate  0.00%
Minibatch 502400  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1285 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 502500  | loss  0.02 | err rate  0.00%
Minibatch 502600  | loss  0.02 | err rate  0.78%
Minibatch 502700  | loss  0.01 | err rate  0.00%
Minibatch 502800  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 510300  | loss  0.02 | err rate  0.00%
Minibatch 510400  | loss  0.01 | err rate  0.00%
Minibatch 510500  | loss  0.02 | err rate  0.00%
Minibatch 510600  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1306 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 510700  | loss  0.02 | err rate  0.00%
Minibatch 510800  | loss  0.01 | err rate  0.00%
Minibatch 510900  | loss  0.02 | err rate  0.00%
Minibatch 511000  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1307 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 511100  | loss  0.01 | err rate  0.00%
Minibatch 511200  | loss  0.01 | err rate  0.00%
Minibatch 511300  | loss  0.01 | err rate  0.00%
Minibatch 511400  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 518900  | loss  0.01 | err rate  0.00%
Minibatch 519000  | loss  0.01 | err rate  0.00%
Minibatch 519100  | loss  0.01 | err rate  0.00%
Minibatch 519200  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1328 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 519300  | loss  0.01 | err rate  0.00%
Minibatch 519400  | loss  0.01 | err rate  0.00%
Minibatch 519500  | loss  0.01 | err rate  0.00%
Minibatch 519600  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1329 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 519700  | loss  0.01 | err rate  0.00%
Minibatch 519800  | loss  0.01 | err rate  0.00%
Minibatch 519900  | loss  0.02 | err rate  0.00%
Minibatch 520000  | loss  0.02 | err rate  0.00%
----------------------------------------

Minibatch 527500  | loss  0.02 | err rate  0.00%
Minibatch 527600  | loss  0.03 | err rate  0.78%
Minibatch 527700  | loss  0.01 | err rate  0.00%
Minibatch 527800  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1350 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 527900  | loss  0.02 | err rate  0.78%
Minibatch 528000  | loss  0.01 | err rate  0.00%
Minibatch 528100  | loss  0.01 | err rate  0.00%
Minibatch 528200  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1351 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 528300  | loss  0.01 | err rate  0.00%
Minibatch 528400  | loss  0.02 | err rate  0.00%
Minibatch 528500  | loss  0.01 | err rate  0.00%
Minibatch 528600  | loss  0.02 | err rate  0.00%
----------------------------------------

Minibatch 536100  | loss  0.01 | err rate  0.00%
Minibatch 536200  | loss  0.01 | err rate  0.00%
Minibatch 536300  | loss  0.01 | err rate  0.00%
Minibatch 536400  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1372 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 536500  | loss  0.01 | err rate  0.00%
Minibatch 536600  | loss  0.02 | err rate  0.00%
Minibatch 536700  | loss  0.01 | err rate  0.00%
Minibatch 536800  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1373 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 536900  | loss  0.04 | err rate  1.56%
Minibatch 537000  | loss  0.02 | err rate  0.00%
Minibatch 537100  | loss  0.01 | err rate  0.00%
Minibatch 537200  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 544700  | loss  0.02 | err rate  0.00%
Minibatch 544800  | loss  0.01 | err rate  0.00%
Minibatch 544900  | loss  0.02 | err rate  0.00%
Minibatch 545000  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1394 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 545100  | loss  0.02 | err rate  0.00%
Minibatch 545200  | loss  0.01 | err rate  0.00%
Minibatch 545300  | loss  0.02 | err rate  0.00%
Minibatch 545400  | loss  0.00 | err rate  0.00%
-------------------------------------------------------------
After epoch 1395 | valid err rate:  2.51% | doing 1600 epochs
-------------------------------------------------------------
Minibatch 545500  | loss  0.01 | err rate  0.00%
Minibatch 545600  | loss  0.01 | err rate  0.00%
Minibatch 545700  | loss  0.01 | err rate  0.00%
Minibatch 545800  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 553300  | loss  0.03 | err rate  0.00%
Minibatch 553400  | loss  0.01 | err rate  0.00%
Minibatch 553500  | loss  0.02 | err rate  0.78%
Minibatch 553600  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1416 | valid err rate:  2.51% | doing 2110 epochs
-------------------------------------------------------------
Minibatch 553700  | loss  0.02 | err rate  0.78%
Minibatch 553800  | loss  0.02 | err rate  0.00%
Minibatch 553900  | loss  0.02 | err rate  0.00%
Minibatch 554000  | loss  0.03 | err rate  0.00%
-------------------------------------------------------------
After epoch 1417 | valid err rate:  2.50% | doing 2110 epochs
-------------------------------------------------------------
Minibatch 554100  | loss  0.02 | err rate  0.00%
Minibatch 554200  | loss  0.03 | err rate  0.00%
Minibatch 554300  | loss  0.02 | err rate  0.00%
Minibatch 554400  | loss  0.02 | err rate  0.00%
----------------------------------------

Minibatch 561900  | loss  0.01 | err rate  0.00%
Minibatch 562000  | loss  0.02 | err rate  0.00%
Minibatch 562100  | loss  0.01 | err rate  0.00%
Minibatch 562200  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1438 | valid err rate:  2.50% | doing 2110 epochs
-------------------------------------------------------------
Minibatch 562300  | loss  0.02 | err rate  0.00%
Minibatch 562400  | loss  0.03 | err rate  0.00%
Minibatch 562500  | loss  0.01 | err rate  0.00%
Minibatch 562600  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1439 | valid err rate:  2.50% | doing 2110 epochs
-------------------------------------------------------------
Minibatch 562700  | loss  0.01 | err rate  0.00%
Minibatch 562800  | loss  0.01 | err rate  0.00%
Minibatch 562900  | loss  0.01 | err rate  0.00%
Minibatch 563000  | loss  0.02 | err rate  0.00%
----------------------------------------

Minibatch 570500  | loss  0.02 | err rate  0.00%
Minibatch 570600  | loss  0.02 | err rate  0.00%
Minibatch 570700  | loss  0.02 | err rate  0.00%
Minibatch 570800  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1460 | valid err rate:  2.50% | doing 2110 epochs
-------------------------------------------------------------
Minibatch 570900  | loss  0.01 | err rate  0.00%
Minibatch 571000  | loss  0.02 | err rate  0.00%
Minibatch 571100  | loss  0.03 | err rate  0.78%
Minibatch 571200  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1461 | valid err rate:  2.50% | doing 2110 epochs
-------------------------------------------------------------
Minibatch 571300  | loss  0.02 | err rate  0.00%
Minibatch 571400  | loss  0.01 | err rate  0.00%
Minibatch 571500  | loss  0.01 | err rate  0.00%
Minibatch 571600  | loss  0.02 | err rate  0.00%
----------------------------------------

Minibatch 579100  | loss  0.01 | err rate  0.00%
Minibatch 579200  | loss  0.02 | err rate  0.00%
Minibatch 579300  | loss  0.02 | err rate  0.00%
Minibatch 579400  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1482 | valid err rate:  2.50% | doing 2110 epochs
-------------------------------------------------------------
Minibatch 579500  | loss  0.01 | err rate  0.00%
Minibatch 579600  | loss  0.02 | err rate  0.00%
Minibatch 579700  | loss  0.02 | err rate  0.00%
Minibatch 579800  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1483 | valid err rate:  2.50% | doing 2110 epochs
-------------------------------------------------------------
Minibatch 579900  | loss  0.02 | err rate  0.00%
Minibatch 580000  | loss  0.02 | err rate  0.00%
Minibatch 580100  | loss  0.01 | err rate  0.00%
Minibatch 580200  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 587700  | loss  0.02 | err rate  0.00%
Minibatch 587800  | loss  0.01 | err rate  0.00%
Minibatch 587900  | loss  0.01 | err rate  0.00%
Minibatch 588000  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1504 | valid err rate:  2.50% | doing 2110 epochs
-------------------------------------------------------------
Minibatch 588100  | loss  0.03 | err rate  1.56%
Minibatch 588200  | loss  0.01 | err rate  0.00%
Minibatch 588300  | loss  0.02 | err rate  0.00%
Minibatch 588400  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1505 | valid err rate:  2.50% | doing 2110 epochs
-------------------------------------------------------------
Minibatch 588500  | loss  0.01 | err rate  0.00%
Minibatch 588600  | loss  0.02 | err rate  0.00%
Minibatch 588700  | loss  0.01 | err rate  0.00%
Minibatch 588800  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 596300  | loss  0.02 | err rate  0.00%
Minibatch 596400  | loss  0.01 | err rate  0.00%
Minibatch 596500  | loss  0.01 | err rate  0.00%
Minibatch 596600  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1526 | valid err rate:  2.50% | doing 2110 epochs
-------------------------------------------------------------
Minibatch 596700  | loss  0.02 | err rate  0.00%
Minibatch 596800  | loss  0.02 | err rate  0.00%
Minibatch 596900  | loss  0.02 | err rate  0.00%
Minibatch 597000  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1527 | valid err rate:  2.50% | doing 2110 epochs
-------------------------------------------------------------
Minibatch 597100  | loss  0.02 | err rate  0.00%
Minibatch 597200  | loss  0.01 | err rate  0.00%
Minibatch 597300  | loss  0.01 | err rate  0.00%
Minibatch 597400  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 604900  | loss  0.01 | err rate  0.00%
Minibatch 605000  | loss  0.02 | err rate  0.00%
Minibatch 605100  | loss  0.02 | err rate  0.00%
Minibatch 605200  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1548 | valid err rate:  2.49% | doing 2318 epochs
-------------------------------------------------------------
Minibatch 605300  | loss  0.01 | err rate  0.00%
Minibatch 605400  | loss  0.02 | err rate  0.00%
Minibatch 605500  | loss  0.02 | err rate  0.78%
Minibatch 605600  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1549 | valid err rate:  2.49% | doing 2318 epochs
-------------------------------------------------------------
Minibatch 605700  | loss  0.01 | err rate  0.00%
Minibatch 605800  | loss  0.02 | err rate  0.78%
Minibatch 605900  | loss  0.02 | err rate  0.78%
Minibatch 606000  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 613500  | loss  0.01 | err rate  0.00%
Minibatch 613600  | loss  0.01 | err rate  0.00%
Minibatch 613700  | loss  0.02 | err rate  0.00%
Minibatch 613800  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1570 | valid err rate:  2.49% | doing 2330 epochs
-------------------------------------------------------------
Minibatch 613900  | loss  0.01 | err rate  0.00%
Minibatch 614000  | loss  0.01 | err rate  0.00%
Minibatch 614100  | loss  0.01 | err rate  0.00%
Minibatch 614200  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1571 | valid err rate:  2.49% | doing 2330 epochs
-------------------------------------------------------------
Minibatch 614300  | loss  0.01 | err rate  0.00%
Minibatch 614400  | loss  0.01 | err rate  0.00%
Minibatch 614500  | loss  0.01 | err rate  0.00%
Minibatch 614600  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 622100  | loss  0.01 | err rate  0.00%
Minibatch 622200  | loss  0.01 | err rate  0.00%
Minibatch 622300  | loss  0.01 | err rate  0.00%
Minibatch 622400  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1592 | valid err rate:  2.48% | doing 2330 epochs
-------------------------------------------------------------
Minibatch 622500  | loss  0.02 | err rate  0.00%
Minibatch 622600  | loss  0.02 | err rate  0.00%
Minibatch 622700  | loss  0.01 | err rate  0.00%
Minibatch 622800  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1593 | valid err rate:  2.48% | doing 2330 epochs
-------------------------------------------------------------
Minibatch 622900  | loss  0.01 | err rate  0.00%
Minibatch 623000  | loss  0.01 | err rate  0.00%
Minibatch 623100  | loss  0.01 | err rate  0.00%
Minibatch 623200  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 630700  | loss  0.01 | err rate  0.00%
Minibatch 630800  | loss  0.01 | err rate  0.00%
Minibatch 630900  | loss  0.01 | err rate  0.00%
Minibatch 631000  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1614 | valid err rate:  2.48% | doing 2330 epochs
-------------------------------------------------------------
Minibatch 631100  | loss  0.01 | err rate  0.00%
Minibatch 631200  | loss  0.01 | err rate  0.00%
Minibatch 631300  | loss  0.02 | err rate  0.00%
Minibatch 631400  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1615 | valid err rate:  2.49% | doing 2330 epochs
-------------------------------------------------------------
Minibatch 631500  | loss  0.01 | err rate  0.00%
Minibatch 631600  | loss  0.02 | err rate  0.00%
Minibatch 631700  | loss  0.03 | err rate  0.78%
Minibatch 631800  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 639300  | loss  0.01 | err rate  0.00%
Minibatch 639400  | loss  0.01 | err rate  0.00%
Minibatch 639500  | loss  0.01 | err rate  0.00%
Minibatch 639600  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1636 | valid err rate:  2.48% | doing 2330 epochs
-------------------------------------------------------------
Minibatch 639700  | loss  0.01 | err rate  0.00%
Minibatch 639800  | loss  0.01 | err rate  0.00%
Minibatch 639900  | loss  0.01 | err rate  0.00%
Minibatch 640000  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1637 | valid err rate:  2.48% | doing 2330 epochs
-------------------------------------------------------------
Minibatch 640100  | loss  0.01 | err rate  0.00%
Minibatch 640200  | loss  0.02 | err rate  0.00%
Minibatch 640300  | loss  0.02 | err rate  0.00%
Minibatch 640400  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 647900  | loss  0.01 | err rate  0.00%
Minibatch 648000  | loss  0.02 | err rate  0.00%
Minibatch 648100  | loss  0.02 | err rate  0.00%
Minibatch 648200  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1658 | valid err rate:  2.48% | doing 2330 epochs
-------------------------------------------------------------
Minibatch 648300  | loss  0.01 | err rate  0.00%
Minibatch 648400  | loss  0.01 | err rate  0.00%
Minibatch 648500  | loss  0.01 | err rate  0.00%
Minibatch 648600  | loss  0.03 | err rate  1.56%
-------------------------------------------------------------
After epoch 1659 | valid err rate:  2.48% | doing 2330 epochs
-------------------------------------------------------------
Minibatch 648700  | loss  0.01 | err rate  0.00%
Minibatch 648800  | loss  0.01 | err rate  0.00%
Minibatch 648900  | loss  0.01 | err rate  0.00%
Minibatch 649000  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 656500  | loss  0.01 | err rate  0.00%
Minibatch 656600  | loss  0.01 | err rate  0.00%
Minibatch 656700  | loss  0.02 | err rate  0.00%
Minibatch 656800  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1680 | valid err rate:  2.48% | doing 2330 epochs
-------------------------------------------------------------
Minibatch 656900  | loss  0.01 | err rate  0.00%
Minibatch 657000  | loss  0.00 | err rate  0.00%
Minibatch 657100  | loss  0.01 | err rate  0.00%
Minibatch 657200  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1681 | valid err rate:  2.48% | doing 2330 epochs
-------------------------------------------------------------
Minibatch 657300  | loss  0.01 | err rate  0.00%
Minibatch 657400  | loss  0.02 | err rate  0.78%
Minibatch 657500  | loss  0.01 | err rate  0.00%
Minibatch 657600  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 665100  | loss  0.01 | err rate  0.00%
Minibatch 665200  | loss  0.01 | err rate  0.00%
Minibatch 665300  | loss  0.01 | err rate  0.00%
Minibatch 665400  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1702 | valid err rate:  2.48% | doing 2330 epochs
-------------------------------------------------------------
Minibatch 665500  | loss  0.01 | err rate  0.00%
Minibatch 665600  | loss  0.01 | err rate  0.00%
Minibatch 665700  | loss  0.02 | err rate  0.00%
Minibatch 665800  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1703 | valid err rate:  2.48% | doing 2330 epochs
-------------------------------------------------------------
Minibatch 665900  | loss  0.01 | err rate  0.00%
Minibatch 666000  | loss  0.02 | err rate  0.00%
Minibatch 666100  | loss  0.01 | err rate  0.00%
Minibatch 666200  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 673700  | loss  0.01 | err rate  0.00%
Minibatch 673800  | loss  0.01 | err rate  0.00%
Minibatch 673900  | loss  0.01 | err rate  0.00%
Minibatch 674000  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1724 | valid err rate:  2.47% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 674100  | loss  0.01 | err rate  0.00%
Minibatch 674200  | loss  0.01 | err rate  0.00%
Minibatch 674300  | loss  0.01 | err rate  0.00%
Minibatch 674400  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1725 | valid err rate:  2.47% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 674500  | loss  0.01 | err rate  0.00%
Minibatch 674600  | loss  0.01 | err rate  0.00%
Minibatch 674700  | loss  0.01 | err rate  0.00%
Minibatch 674800  | loss  0.02 | err rate  0.00%
----------------------------------------

Minibatch 682300  | loss  0.01 | err rate  0.00%
Minibatch 682400  | loss  0.02 | err rate  0.00%
Minibatch 682500  | loss  0.03 | err rate  0.78%
Minibatch 682600  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1746 | valid err rate:  2.47% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 682700  | loss  0.01 | err rate  0.00%
Minibatch 682800  | loss  0.02 | err rate  0.00%
Minibatch 682900  | loss  0.01 | err rate  0.00%
Minibatch 683000  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1747 | valid err rate:  2.47% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 683100  | loss  0.01 | err rate  0.00%
Minibatch 683200  | loss  0.01 | err rate  0.00%
Minibatch 683300  | loss  0.02 | err rate  0.78%
Minibatch 683400  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 690900  | loss  0.01 | err rate  0.00%
Minibatch 691000  | loss  0.02 | err rate  0.00%
Minibatch 691100  | loss  0.01 | err rate  0.00%
Minibatch 691200  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1768 | valid err rate:  2.47% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 691300  | loss  0.01 | err rate  0.00%
Minibatch 691400  | loss  0.01 | err rate  0.00%
Minibatch 691500  | loss  0.01 | err rate  0.00%
Minibatch 691600  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1769 | valid err rate:  2.47% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 691700  | loss  0.01 | err rate  0.00%
Minibatch 691800  | loss  0.01 | err rate  0.00%
Minibatch 691900  | loss  0.01 | err rate  0.00%
Minibatch 692000  | loss  0.01 | err rate  0.00%
----------------------------------------

Minibatch 699600  | loss  0.02 | err rate  0.00%
Minibatch 699700  | loss  0.01 | err rate  0.00%
Minibatch 699800  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1790 | valid err rate:  2.47% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 699900  | loss  0.03 | err rate  0.00%
Minibatch 700000  | loss  0.01 | err rate  0.00%
Minibatch 700100  | loss  0.01 | err rate  0.00%
Minibatch 700200  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1791 | valid err rate:  2.48% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 700300  | loss  0.01 | err rate  0.00%
Minibatch 700400  | loss  0.01 | err rate  0.00%
Minibatch 700500  | loss  0.01 | err rate  0.00%
Minibatch 700600  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1792 | valid er

Minibatch 708200  | loss  0.01 | err rate  0.00%
Minibatch 708300  | loss  0.01 | err rate  0.00%
Minibatch 708400  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1812 | valid err rate:  2.48% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 708500  | loss  0.02 | err rate  0.00%
Minibatch 708600  | loss  0.01 | err rate  0.00%
Minibatch 708700  | loss  0.02 | err rate  0.00%
Minibatch 708800  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1813 | valid err rate:  2.48% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 708900  | loss  0.01 | err rate  0.00%
Minibatch 709000  | loss  0.01 | err rate  0.00%
Minibatch 709100  | loss  0.01 | err rate  0.00%
Minibatch 709200  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1814 | valid er

Minibatch 716800  | loss  0.01 | err rate  0.00%
Minibatch 716900  | loss  0.02 | err rate  0.00%
Minibatch 717000  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1834 | valid err rate:  2.48% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 717100  | loss  0.02 | err rate  0.78%
Minibatch 717200  | loss  0.01 | err rate  0.00%
Minibatch 717300  | loss  0.02 | err rate  0.00%
Minibatch 717400  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1835 | valid err rate:  2.48% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 717500  | loss  0.01 | err rate  0.00%
Minibatch 717600  | loss  0.01 | err rate  0.00%
Minibatch 717700  | loss  0.01 | err rate  0.00%
Minibatch 717800  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1836 | valid er

Minibatch 725400  | loss  0.01 | err rate  0.00%
Minibatch 725500  | loss  0.01 | err rate  0.00%
Minibatch 725600  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1856 | valid err rate:  2.48% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 725700  | loss  0.01 | err rate  0.00%
Minibatch 725800  | loss  0.01 | err rate  0.00%
Minibatch 725900  | loss  0.01 | err rate  0.00%
Minibatch 726000  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1857 | valid err rate:  2.48% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 726100  | loss  0.01 | err rate  0.00%
Minibatch 726200  | loss  0.01 | err rate  0.00%
Minibatch 726300  | loss  0.02 | err rate  0.00%
Minibatch 726400  | loss  0.02 | err rate  0.78%
-------------------------------------------------------------
After epoch 1858 | valid er

Minibatch 734000  | loss  0.01 | err rate  0.00%
Minibatch 734100  | loss  0.01 | err rate  0.00%
Minibatch 734200  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1878 | valid err rate:  2.48% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 734300  | loss  0.01 | err rate  0.00%
Minibatch 734400  | loss  0.01 | err rate  0.00%
Minibatch 734500  | loss  0.01 | err rate  0.00%
Minibatch 734600  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1879 | valid err rate:  2.48% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 734700  | loss  0.01 | err rate  0.00%
Minibatch 734800  | loss  0.03 | err rate  0.78%
Minibatch 734900  | loss  0.01 | err rate  0.00%
Minibatch 735000  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1880 | valid er

Minibatch 742600  | loss  0.01 | err rate  0.00%
Minibatch 742700  | loss  0.01 | err rate  0.00%
Minibatch 742800  | loss  0.03 | err rate  0.00%
Minibatch 742900  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1900 | valid err rate:  2.48% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 743000  | loss  0.02 | err rate  0.00%
Minibatch 743100  | loss  0.01 | err rate  0.00%
Minibatch 743200  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1901 | valid err rate:  2.48% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 743300  | loss  0.02 | err rate  0.00%
Minibatch 743400  | loss  0.03 | err rate  0.78%
Minibatch 743500  | loss  0.01 | err rate  0.00%
Minibatch 743600  | loss  0.02 | err rate  0.00%
-------------------------------------------------------------
After epoch 1902 | valid er

Minibatch 751200  | loss  0.01 | err rate  0.00%
Minibatch 751300  | loss  0.01 | err rate  0.00%
Minibatch 751400  | loss  0.01 | err rate  0.00%
Minibatch 751500  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1922 | valid err rate:  2.48% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 751600  | loss  0.01 | err rate  0.00%
Minibatch 751700  | loss  0.01 | err rate  0.00%
Minibatch 751800  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1923 | valid err rate:  2.48% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 751900  | loss  0.02 | err rate  0.00%
Minibatch 752000  | loss  0.02 | err rate  0.00%
Minibatch 752100  | loss  0.01 | err rate  0.00%
Minibatch 752200  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1924 | valid er

Minibatch 759800  | loss  0.01 | err rate  0.00%
Minibatch 759900  | loss  0.02 | err rate  0.78%
Minibatch 760000  | loss  0.03 | err rate  0.78%
Minibatch 760100  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1944 | valid err rate:  2.48% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 760200  | loss  0.01 | err rate  0.00%
Minibatch 760300  | loss  0.02 | err rate  0.00%
Minibatch 760400  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1945 | valid err rate:  2.48% | doing 2578 epochs
-------------------------------------------------------------
Minibatch 760500  | loss  0.02 | err rate  0.00%
Minibatch 760600  | loss  0.01 | err rate  0.00%
Minibatch 760700  | loss  0.01 | err rate  0.00%
Minibatch 760800  | loss  0.01 | err rate  0.00%
-------------------------------------------------------------
After epoch 1946 | valid er

# Problem 3: Dropout [2p]

Implement a **dropout** layer and try to train a
network getting below 1.5% test error rates with dropout. The best
results with dropout are below 1%!

Remember to turn off dropout during testing, using `model.train_mode()` and `model.eval_mode()`!

Hint: Use [torch.nn.functional.dropout](http://pytorch.org/docs/master/nn.html#torch.nn.functional.dropout).

Details: http://arxiv.org/pdf/1207.0580.pdf.

# Problem 4: Data Augmentation [1p]

Apply data augmentation methods (e.g. rotations, noise, crops) when training networks on MNIST, to significantly reduce test error rate for your network. You can use functions from the [torchvision.transforms](http://pytorch.org/docs/master/torchvision/transforms.html) module.

# Problem 5: Batch Normalization [1p]

*Covariate shift* is a phenomenon associated with training deep models. Simply put, weight changes in early layers cause major changes in distribution of inputs to later layers, making it difficult to train later layers.

[Batch Normalization](https://arxiv.org/abs/1502.03167) addresses this problem by normalizing distributions of inputs to layers within mini-batches. It typically allows to train networks faster and/or with higher learning rates, lessens the importance
of initialization and might eliminate the need for Dropout.

Implement Batch Normalization and compare with regular training of MNIST models.

Remember to use the batch statistics during model training and to use an average of training batch statistics during model evaluation. For details please consult the paper.

# Problem 6: Norm Constraints [1p bonus]

Implement norm constraints, i.e. instead of weight decay, that tries to set 
all weights to small values, apply a limit on the total
norm of connections incoming to a neuron. In our case, this
corresponds to clipping the norm of *rows* of weight
matrices. An easy way of implementing it is to make a gradient
step, then look at the norm of rows and scale down those that are
over the threshold (this technique is called "projected gradient descent").

Please consult the Dropout paper (http://arxiv.org/pdf/1207.0580.pdf) for details.

# Problem 6: Polyak Averaging [1p bonus]

Implement Polyak averaging. For each parameter $\theta$
keep a separate, exponentially decayed average of the past values
$$
\bar{\theta}_n = \alpha_p\bar{\theta}_{n-1} + (1-\alpha_p)\theta_n.
$$
Use that average when evaluating the model on the test set.
Validate the approach by training a model on the MNIST dataset.

# Problem 7: Convolutional Network [2p bonus]

Use convolutional and max-pooling layers (`torch.nn.functional.conv2d`, `torch.nn.functional.max_pool2d`) and (without dropout) get a test error rate below 1.5%.