# MNIST Image Classification Using a CNN with Inception Modules in PyTorch
### Toby Moreno

Traditional ANNs and classic machine learning classifiers could only push the accuracy of predicting MNIST images to around 97-98%. However, with just 12 epochs and similar hyper-parameters, a Convolutional Neural Network (CNN) built with inception modules can overcome that limit, with accuracy approaching __99%__. Test set: Average loss: 0.0415, Accuracy: [9876/10000, 9888/10000]

## CNN Architecture

CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics ...

CNNs use relatively little pre-processing compared to other image classification algorithms. This means that the network learns the filters that in traditional algorithms were hand-engineered. This independence from prior knowledge and human effort in feature design is a major advantage.

Source: Wikipedia


![](http://parse.ele.tue.nl/cluster/2/CNNArchitecture.jpg)

Credit Source:


> Unlike traditional neural networks, where all layers are fully meshed, CNNs are much more efficient because they use a variation of partially meshed multilayer perceptrons designed to require minimal preprocessing. Also, the manual, time-consuming, and very subjective feature engineering typically associated with traditional NN designs doesn't apply to CNNs, because feature extraction is built into the training process. Finally, a CNN's convolved features, the low-dimensional matrices produced as by-products by the middle layers, are largely spatially invariant. This is significant because entities, text, or objects can be tagged similarly regardless of their position in the image and, to the degree the network learns it from the data, their size and orientation.

```python
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

import numpy as np
```

## Download MNIST images

For the image dataset, PyTorch provides a toolkit to download the MNIST data for training and testing. From the torchvision package, import datasets and set the download parameter to True. It also has a built-in mechanism to transform the images into tensors (here, `transforms.ToTensor()`); the integer labels are batched into tensors later by the DataLoader.

```python
# Training settings
batch_size = 64

# MNIST Dataset
train_dataset = datasets.MNIST(root='./mnist_data/',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)


test_dataset = datasets.MNIST(root='./mnist_data/',
                              train=False,
                              transform=transforms.ToTensor())
```


See: https://github.com/vdumoulin/conv_arithmetic


![](https://tobymoreno.github.io/images/cnn_mnist.png)


CNN architectures are commonly deep (see above). But deep networks are prone to overfitting, and it is difficult to pass gradient updates through the entire network. Additionally, simply stacking large convolution operations is computationally expensive. Researchers at Google, the University of North Carolina, and the University of Michigan came up with a wider network in which filters of multiple sizes operate at the same level. For example, the image below shows the “naive” inception module. It performs convolutions on an input with three filter sizes (1x1, 3x3, 5x5), and max pooling is performed in parallel. The outputs are concatenated and sent to the next inception module.

Paper: https://arxiv.org/pdf/1409.4842v1.pdf
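The "naive" module described above can be sketched in a few lines of PyTorch. This is only an illustration: the class name `NaiveInception` and the branch widths (8 output channels each) are assumptions chosen for the example, not values from the paper or from the network used later in this post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NaiveInception(nn.Module):
    """Hypothetical naive inception block: parallel 1x1, 3x3, 5x5 convs plus max pooling."""

    def __init__(self, in_channels, out_1x1=8, out_3x3=8, out_5x5=8):
        super().__init__()
        # Padding keeps the spatial size unchanged so the outputs can be concatenated.
        self.branch1x1 = nn.Conv2d(in_channels, out_1x1, kernel_size=1)
        self.branch3x3 = nn.Conv2d(in_channels, out_3x3, kernel_size=3, padding=1)
        self.branch5x5 = nn.Conv2d(in_channels, out_5x5, kernel_size=5, padding=2)

    def forward(self, x):
        # Max pooling with stride 1 and padding 1 also preserves height and width.
        pooled = F.max_pool2d(x, kernel_size=3, stride=1, padding=1)
        branches = [self.branch1x1(x), self.branch3x3(x), self.branch5x5(x), pooled]
        # Concatenate along the channel dimension (dim=1).
        return torch.cat(branches, dim=1)


block = NaiveInception(in_channels=10)
out = block(torch.randn(2, 10, 12, 12))
print(out.shape)  # torch.Size([2, 34, 12, 12]): 8 + 8 + 8 + 10 channels
```

Because the pooling branch passes all input channels through, the output of the naive version grows with every stacked module, which is one reason the paper adds 1x1 convolutions to reduce channel counts.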

## Modified CNN with Inception Modules

![](https://tobymoreno.github.io/images/inception_modules.001.png)

Source: https://www.youtube.com/watch?v=VxhSouuSZDY

Let's look at a summary of the downloaded datasets.

```python
train_dataset
```

```
Dataset MNIST
    Number of datapoints: 60000
    Split: train
    Root Location: ./mnist_data/
    Transforms (if any): ToTensor()
    Target Transforms (if any): None
```

```python
test_dataset
```

```
Dataset MNIST
    Number of datapoints: 10000
    Split: test
    Root Location: ./mnist_data/
    Transforms (if any): ToTensor()
    Target Transforms (if any): None
```

```
ls
```

```
Classify Code Base-Copy1.ipynb         Test2.ipynb
Classify Code Base.ipynb               code_analysis-Copy1.ipynb
Image Classification using CNN.ipynb   code_analysis.ipynb
MNIST Pytorch.ipynb                    data/
Pytorch Machine Learning Primer.ipynb  file type classification.ipynb
Softmax Loss.ipynb                     mnist_data/
Test.ipynb
```

## DataLoader

DataLoader is PyTorch's utility for standardizing and scaling up the training and testing process.

As long as your dataset conforms to a standard interface, DataLoader can iterate over it, group records into batches, and shuffle them automatically before fetching the next batch.

A custom dataset needs to implement the following methods:

`__getitem__` and `__len__`
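To make the interface concrete, here is a minimal sketch of a custom dataset that DataLoader can consume. The class `SquaresDataset` and the name `toy_loader` are hypothetical and exist only for this illustration; the MNIST dataset from torchvision already implements the same two methods.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: item i is the pair (i, i * i)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # Number of records; DataLoader uses this to know when an epoch ends.
        return self.n

    def __getitem__(self, idx):
        # Return one (input, target) record for the given index.
        x = torch.tensor([float(idx)])
        y = torch.tensor([float(idx * idx)])
        return x, y


toy_loader = DataLoader(SquaresDataset(10), batch_size=4, shuffle=False)
batch_shapes = [tuple(x.shape) for x, y in toy_loader]
print(batch_shapes)  # [(4, 1), (4, 1), (2, 1)]
```

With 10 records and a batch size of 4, the last batch is smaller than the others; DataLoader keeps it unless `drop_last=True` is passed.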



```python
# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
```

### Basic CNN Module without Inception

```python
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.mp = nn.MaxPool2d(2)
        self.fc = nn.Linear(320, 10)  # 20 channels * 4 * 4 after two conv + pool stages

    def forward(self, x):
        in_size = x.size(0)  # batch size
        x = F.relu(self.mp(self.conv1(x)))
        x = F.relu(self.mp(self.conv2(x)))
        x = x.view(in_size, -1)  # flatten the tensor
        x = self.fc(x)
        # dim=1 normalizes over the class dimension; omitting dim is deprecated
        return F.log_softmax(x, dim=1)


model = Net()

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
```
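The 320 input features of the fully connected layer follow from the layer sizes above. The sketch below traces the spatial size of a 28x28 MNIST image through the two convolution and pooling stages; the helpers `conv_out` and `pool_out` are illustrative functions written for this post, not PyTorch APIs.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Output width/height of max pooling with stride equal to the kernel size."""
    return (size - kernel) // kernel + 1


size = 28                    # MNIST images are 28x28
size = conv_out(size, 5)     # conv1, kernel 5 -> 24
size = pool_out(size, 2)     # max pool 2     -> 12
size = conv_out(size, 5)     # conv2, kernel 5 -> 8
size = pool_out(size, 2)     # max pool 2     -> 4

flat_features = 20 * size * size  # 20 output channels from conv2
print(size, flat_features)        # 4 320
```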

#### Network Summary for Inception (CNN)

```
Net(
  (conv1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(88, 20, kernel_size=(5, 5), stride=(1, 1))
  (incept1): InceptionA(
    (branch1x1): Conv2d(10, 16, kernel_size=(1, 1), stride=(1, 1))
    (branch5x5_1): Conv2d(10, 16, kernel_size=(1, 1), stride=(1, 1))
    (branch5x5_2): Conv2d(16, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (branch3x3dbl_1): Conv2d(10, 16, kernel_size=(1, 1), stride=(1, 1))
    (branch3x3dbl_2): Conv2d(16, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (branch3x3dbl_3): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (branch_pool): Conv2d(10, 24, kernel_size=(1, 1), stride=(1, 1))
  )
  (incept2): InceptionA(
    (branch1x1): Conv2d(20, 16, kernel_size=(1, 1), stride=(1, 1))
    (branch5x5_1): Conv2d(20, 16, kernel_size=(1, 1), stride=(1, 1))
    (branch5x5_2): Conv2d(16, 24, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (branch3x3dbl_1): Conv2d(20, 16, kernel_size=(1, 1), stride=(1, 1))
    (branch3x3dbl_2): Conv2d(16, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (branch3x3dbl_3): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (branch_pool): Conv2d(20, 24, kernel_size=(1, 1), stride=(1, 1))
  )
  (mp): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc): Linear(in_features=1408, out_features=10, bias=True)
)
```


```python
# https://github.com/pytorch/examples/blob/master/mnist/main.py

class InceptionA(nn.Module):

    def __init__(self, in_channels):
        super(InceptionA, self).__init__()
        # Branch 1: a single 1x1 convolution
        self.branch1x1 = nn.Conv2d(in_channels, 16, kernel_size=1)

        # Branch 2: 1x1 reduction followed by a 5x5 convolution
        self.branch5x5_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch5x5_2 = nn.Conv2d(16, 24, kernel_size=5, padding=2)

        # Branch 3: 1x1 reduction followed by two stacked 3x3 convolutions
        self.branch3x3dbl_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch3x3dbl_2 = nn.Conv2d(16, 24, kernel_size=3, padding=1)
        self.branch3x3dbl_3 = nn.Conv2d(24, 24, kernel_size=3, padding=1)

        # Branch 4: average pooling (in forward) followed by a 1x1 convolution
        self.branch_pool = nn.Conv2d(in_channels, 24, kernel_size=1)

    def forward(self, x):
        branch1x1 = self.branch1x1(x)

        branch5x5 = self.branch5x5_1(x)
        branch5x5 = self.branch5x5_2(branch5x5)

        branch3x3dbl = self.branch3x3dbl_1(x)
        branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
        branch3x3dbl = self.branch3x3dbl_3(branch3x3dbl)

        # Stride 1 with padding 1 keeps the spatial size unchanged
        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)

        # Concatenate the four branches along the channel dimension
        outputs = [branch1x1, branch5x5, branch3x3dbl, branch_pool]
        return torch.cat(outputs, 1)
```
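The numbers 88 and 1408 in the network summary come from this concatenation. As a quick check, the sketch below uses zero tensors as stand-ins for the four branch outputs (the batch size of 2 is an arbitrary choice for the example) and confirms the resulting shapes.

```python
import torch

batch, height, width = 2, 12, 12  # 12x12 is the feature map size after conv1 + pooling

# Stand-ins for the four branch outputs of InceptionA: 16, 24, 24 and 24 channels.
branch1x1 = torch.zeros(batch, 16, height, width)
branch5x5 = torch.zeros(batch, 24, height, width)
branch3x3dbl = torch.zeros(batch, 24, height, width)
branch_pool = torch.zeros(batch, 24, height, width)

merged = torch.cat([branch1x1, branch5x5, branch3x3dbl, branch_pool], dim=1)
print(merged.shape)  # torch.Size([2, 88, 12, 12])

# The second inception module sees 4x4 feature maps, so flattening its
# 88-channel output gives 88 * 4 * 4 = 1408 features for the linear layer.
flat = torch.zeros(batch, 88, 4, 4).view(batch, -1)
print(flat.shape)    # torch.Size([2, 1408])
```

Since the output width is fixed at 16 + 24 + 24 + 24 = 88 regardless of `in_channels`, any layer that follows an `InceptionA` block must accept 88 input channels.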


![](https://tobymoreno.github.io/images/codemap.png)

Source: https://docs.google.com/presentation/d/1MJxGye-VZVoQfN1yFJlwwgcXxhn0jWB19ZxYYg8iKxY/edit#slide=id.g2901a5d332_174_275

```python
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(88, 20, kernel_size=5)  # 88 channels come out of incept1

        self.incept1 = InceptionA(in_channels=10)
        self.incept2 = InceptionA(in_channels=20)

        self.mp = nn.MaxPool2d(2)
        self.fc = nn.Linear(1408, 10)  # 88 channels * 4 * 4 after incept2

    def forward(self, x):
        in_size = x.size(0)  # batch size
        x = F.relu(self.mp(self.conv1(x)))
        x = self.incept1(x)
        x = F.relu(self.mp(self.conv2(x)))
        x = self.incept2(x)
        x = x.view(in_size, -1)  # flatten the tensor
        x = self.fc(x)
        # dim=1 normalizes over the class dimension; omitting dim is deprecated
        return F.log_softmax(x, dim=1)


model = Net()

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
```

```python
def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # Since PyTorch 0.4, tensors track gradients directly; no Variable wrapper is needed
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))


def test():
    model.eval()
    test_loss = 0
    correct = 0
    # torch.no_grad() replaces the removed volatile=True flag
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            # sum up batch loss (reduction='sum' replaces the deprecated size_average=False)
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
```
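The loss and accuracy bookkeeping in `train()` and `test()` can be tried on a tiny batch. The sketch below uses made-up logits for 3 samples and 4 classes (not MNIST output) to show that `log_softmax` followed by `nll_loss` matches `F.cross_entropy`, and how correct predictions are counted from the index of the maximum log-probability.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw scores for a batch of 3 samples and 4 classes.
logits = torch.tensor([[2.0, 0.5, 0.1, 0.0],
                       [0.1, 0.2, 3.0, 0.3],
                       [1.0, 1.5, 0.2, 0.1]])
target = torch.tensor([0, 2, 0])

# The network returns log-probabilities, and nll_loss consumes them.
log_probs = F.log_softmax(logits, dim=1)
loss_a = F.nll_loss(log_probs, target)

# cross_entropy fuses both steps and takes raw logits instead.
loss_b = F.cross_entropy(logits, target)
print(torch.allclose(loss_a, loss_b))  # True

# Predicted class = index of the largest log-probability in each row.
pred = log_probs.argmax(dim=1, keepdim=True)
correct = pred.eq(target.view_as(pred)).sum().item()
print(correct)  # 2 (the third sample is predicted as class 1, but its target is 0)
```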


Run the training loop below a few times (4 to 5). Sample output from the end of one run:
```
Train Epoch: 8 [56960/60000 (95%)]	Loss: 0.031828
Train Epoch: 8 [57600/60000 (96%)]	Loss: 0.005347
Train Epoch: 8 [58240/60000 (97%)]	Loss: 0.006505
Train Epoch: 8 [58880/60000 (98%)]	Loss: 0.003654
Train Epoch: 8 [59520/60000 (99%)]	Loss: 0.006652

Test set: Average loss: 0.0413, Accuracy: 9888/10000 (98.88%)
```

```python
for epoch in range(1, 10):
    train(epoch)
    test()
```

```
Test set: Average loss: 0.0426, Accuracy: 9877/10000 (98%)

Test set: Average loss: 0.0407, Accuracy: 9881/10000 (98%)

Test set: Average loss: 0.0442, Accuracy: 9874/10000 (98%)

Test set: Average loss: 0.0401, Accuracy: 9887/10000 (98%)

Test set: Average loss: 0.0422, Accuracy: 9875/10000 (98%)

Test set: Average loss: 0.0460, Accuracy: 9869/10000 (98%)

Test set: Average loss: 0.0417, Accuracy: 9886/10000 (98%)

Test set: Average loss: 0.0413, Accuracy: 9888/10000 (98%)

Test set: Average loss: 0.0453, Accuracy: 9875/10000 (98%)
```