### TRAINING A CNN ON THE MNIST DATASET

Convolutional neural networks (CNNs) are a class of neural networks that have convolutional layers. CNNs are particularly effective for data that have spatial structures and correlations (e.g. images). A multilayer perceptron (MLP) is entirely composed of fully connected layers, which are each a matrix multiply operation (and addition of a bias) followed by a non-linearity (e.g. sigmoid, ReLU). A convolutional layer is similar, except the matrix multiply operation is replaced with a convolution operation (in practice a cross-correlation). Note that a CNN need not be entirely composed of convolutional layers; in fact, many popular CNN architectures end in fully connected layers.
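
To make the "in practice a cross-correlation" remark concrete, here is a minimal NumPy sketch (not how PyTorch implements it internally). The helper names `cross_correlate2d` and `convolve2d` are made up for illustration. The only difference between the two operations is whether the kernel is flipped before it slides over the input.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Valid-mode 2-D cross-correlation: slide the kernel without flipping it."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Elementwise product of the kernel with the patch under it, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(image, kernel):
    """True convolution: cross-correlation with the kernel flipped on both axes."""
    return cross_correlate2d(image, kernel[::-1, ::-1])

# Toy 4x4 "image" and a 2x2 kernel (both hypothetical example values)
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])

xcorr = cross_correlate2d(image, kernel)
conv = convolve2d(image, kernel)
print(xcorr)
print(conv)
```

Because the kernel weights are learned, the flip makes no practical difference to what a network can represent, which is why deep learning libraries typically skip it.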

In [3]:
#import the important libraries
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from tqdm.notebook import tqdm, trange


In [16]:
#Load the Data
mnist_train = datasets.MNIST(root="/home/udit/MNIST/datasets", train=True, transform=transforms.ToTensor(), download=False)
mnist_test = datasets.MNIST(root="/home/udit/MNIST/datasets", train=False, transform=transforms.ToTensor(), download=False)
train_loader = torch.utils.data.DataLoader(mnist_train, batch_size=100, shuffle=True)
test_loader = torch.utils.data.DataLoader(mnist_test, batch_size=100, shuffle=False)

### BUILDING A CUSTOM CNN
We'll use the following CNN as our classifier: 5×5 convolution -> 2×2 max pool -> 5×5 convolution -> 2×2 max pool -> fully connected to ℝ^256 -> fully connected to ℝ^10 (prediction). ReLU activation functions will be used to impose non-linearities. Remember, convolutions produce 4-D outputs, and fully connected layers expect 2-D inputs, so tensors must be reshaped when transitioning from one to the other.
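
The size of the flattened tensor that feeds the first fully connected layer can be worked out by hand. The sketch below does that arithmetic in plain Python, assuming 28×28 MNIST inputs, a padding of 2 on each convolution, and 64 output channels on the second convolution, as in the class definition that follows. The helper names `conv_output_size` and `pool_output_size` are hypothetical.

```python
def conv_output_size(size, kernel_size, padding=0, stride=1):
    """Spatial size after a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1

def pool_output_size(size, kernel_size):
    """Spatial size after max pooling whose stride equals its kernel size."""
    return (size - kernel_size) // kernel_size + 1

size = 28                                              # MNIST images are 28x28
size = conv_output_size(size, kernel_size=5, padding=2)  # 5x5 conv, padding 2
size = pool_output_size(size, kernel_size=2)             # 2x2 max pool
size = conv_output_size(size, kernel_size=5, padding=2)  # 5x5 conv, padding 2
size = pool_output_size(size, kernel_size=2)             # 2x2 max pool

out_channels = 64
flattened = size * size * out_channels
print(size, flattened)
```

With these assumptions the padded convolutions preserve the spatial size and each pooling step halves it, so the feature map goes 28 -> 14 -> 7 and the flattened vector has 7*7*64 = 3136 entries per image.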

In [19]:
class MNIST_CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1,32, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(32,64, kernel_size=5, padding=2)
        self.fc1 = nn.Linear(7*7*64, 256)
        self.fc2 = nn.Linear(256,10)
        
    def forward(self, x):
        #convlayer1
        x = self.conv1(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2)

        #convlayer2
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2)
        
        #fclayer1
        x = x.view(-1, 7*7*64)
        x = self.fc1(x)
        x = F.relu(x)
        
        #fclayer2
        x = self.fc2(x)
        
        return x
        

In [21]:
#looking at the model parameters
model = MNIST_CNN()
print(model)


MNIST_CNN(
  (conv1): Conv2d(1, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (fc1): Linear(in_features=3136, out_features=256, bias=True)
  (fc2): Linear(in_features=256, out_features=10, bias=True)
)
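
The printed layer shapes are enough to count the trainable parameters by hand. The following is a rough sketch in plain Python; the helpers `conv2d_params` and `linear_params` are hypothetical names, and the counts assume every layer has a bias term, which matches the printout above.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights (out x in x k x k) plus one bias per output channel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels

def linear_params(in_features, out_features):
    """Weight matrix (out x in) plus one bias per output feature."""
    return in_features * out_features + out_features

param_counts = {
    "conv1": conv2d_params(1, 32, 5),
    "conv2": conv2d_params(32, 64, 5),
    "fc1": linear_params(7 * 7 * 64, 256),
    "fc2": linear_params(256, 10),
}
total_params = sum(param_counts.values())

for name, count in param_counts.items():
    print(name, count)
print("total", total_params)
```

Most of the parameters sit in `fc1`, not in the convolutions. The same total should be obtainable from the model directly with `sum(p.numel() for p in model.parameters())`.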


### TRAINING

In [22]:
# Loss and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam with a learning rate of 1e-3

In [23]:
#Train for 10 epochs, iterating through the training minibatches in each epoch
for epoch in trange(10):
    for images, labels in tqdm(train_loader):
        #Zero out the gradients
        optimizer.zero_grad()
        
        #Forward Pass
        x = images
        y = model(x)
        loss = criterion(y, labels)
        #Backward Pass
        loss.backward()
        optimizer.step()


### TESTING

In [29]:
correct = 0
total = len(mnist_test)

with torch.no_grad():
    # Iterate through test minibatches
    for images, labels in tqdm(test_loader):
        #Forward Pass
        x = images
        y = model(x)
        
        predictions = torch.argmax(y, dim=1)
        correct+=torch.sum((predictions==labels).float())
        
print("Test Accuracy is:{}".format(correct/total))


Test Accuracy is:0.9934999942779541


While you certainly can build your own custom CNNs like we did above, more often than not, it's better to use one of the popular existing architectures. The Torchvision documentation has a list of supported CNNs, as well as some performance characteristics. There are a number of reasons for using one of these CNNs instead of designing your own. First, for image datasets larger and more complex than MNIST (which is basically all of them), a fair amount of network depth and width is often necessary. For example, some of the popular CNNs can be over 100 layers deep, with several tricks and details beyond what we've covered in this notebook. Coding all of this yourself has a high potential for error, especially when you're first getting started. Instead, you can create the CNN architecture using Torchvision in a couple of lines:

In [31]:
import torchvision.models as models
resnet18 = models.resnet18()
#print(resnet18)