# PyTorch Assignment: Convolutional Neural Network (CNN)

**[Duke Community Standard](http://integrity.duke.edu/standard.html): By typing your name below, you are certifying that you have adhered to the Duke Community Standard in completing this assignment.**

Name: Tyler Feldman

### Convolutional Neural Network

Adapt the CNN example for MNIST digit classification from Notebook 3A. 
Feel free to play around with the model architecture and see how the training time/performance changes, but to begin, try the following:

Image ->  
convolution (32 3x3 filters) -> nonlinearity (ReLU) ->  
convolution (32 3x3 filters) -> nonlinearity (ReLU) -> (2x2 max pool) ->  
convolution (64 3x3 filters) -> nonlinearity (ReLU) ->  
convolution (64 3x3 filters) -> nonlinearity (ReLU) -> (2x2 max pool) -> flatten ->
fully connected (256 hidden units) -> nonlinearity (ReLU) ->  
fully connected (10 output units) -> softmax 
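
A quick way to check the flatten size used in the model is to trace the spatial dimensions through the layers. This is a minimal sketch assuming, as the model code does, 3x3 convolutions with stride 1 and padding 1 (which preserve height and width) and 2x2 max pooling with stride 2 (which halves them). The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output side length of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output side length of max pooling with no padding."""
    return (size - kernel) // stride + 1


# (layer type, output channels) for the suggested architecture
layers = [("conv", 32), ("conv", 32), ("pool", None),
          ("conv", 64), ("conv", 64), ("pool", None)]

side, channels = 28, 1  # MNIST images are 1 x 28 x 28
for kind, out_ch in layers:
    if kind == "conv":
        side, channels = conv_out(side), out_ch
    else:
        side = pool_out(side)
    print(f"{kind:>4}: {channels} x {side} x {side}")

flat_features = channels * side * side
print("flattened features:", flat_features)  # 3136 = 7 * 7 * 64
```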

Note: The CNN model might take a while to train. Depending on your machine, you might expect this to take up to half an hour. If you see your validation performance start to plateau, you can kill the training.
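
Rather than watching for the plateau by hand, one option is a simple patience rule: stop once validation accuracy has not improved for a few consecutive epochs. The training code in this notebook tracks no validation metric, so this is only a standalone sketch; `should_stop`, `patience`, and `min_delta` are hypothetical names, and the accuracy list is made-up input for illustration, not a result from this model.

```python
def should_stop(val_history, patience=2, min_delta=0.0):
    """Return True if the last `patience` epochs failed to beat the best earlier score.

    val_history: validation accuracies, one per completed epoch (higher is better).
    """
    if len(val_history) <= patience:
        return False  # not enough epochs yet to judge a plateau
    best_before = max(val_history[:-patience])
    recent_best = max(val_history[-patience:])
    return recent_best <= best_before + min_delta


# Hypothetical per-epoch validation accuracies, for illustration only
history = [0.95, 0.97, 0.975, 0.974, 0.975]
print(should_stop(history, patience=2))  # True: the last two epochs did not beat 0.975
```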

In [1]:
### YOUR CODE HERE ###
import torch.nn as nn
import torch.nn.functional as F


class MNIST_CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(7*7*64, 256)
        self.fc2 = nn.Linear(256, 10)
        

    def forward(self, x):
        # First conv layer
        x = self.conv1(x)
        x = F.relu(x)
        
        # Second conv layer
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2)
        
        # Third conv layer
        x = self.conv3(x)
        x = F.relu(x)
        
        # Fourth conv layer
        x = self.conv4(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2)
        
        # Flatten to (batch_size, 7*7*64), then fc layer 1
        x = x.view(-1, 7*7*64)
        x = self.fc1(x)
        x = F.relu(x)
        
        # fc layer 2: return raw logits (nn.CrossEntropyLoss applies softmax internally)
        x = self.fc2(x)
        return x
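
The suggested architecture ends in a softmax, but `forward` returns raw scores (logits). That is intentional: `nn.CrossEntropyLoss` combines log-softmax and negative log-likelihood, so an explicit softmax layer in the model would apply softmax twice. The pure-Python sketch below (no PyTorch needed; the function names are hypothetical) shows that cross-entropy computed directly from logits equals the negative log of the softmax probability of the true class.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(logits, target):
    """Log-sum-exp form: log(sum_j exp(z[j])) - z[target], computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


logits = [2.0, -1.0, 0.5]  # hypothetical scores for a 3-class example
target = 0                 # index of the true class

loss_direct = cross_entropy_from_logits(logits, target)
loss_via_softmax = -math.log(softmax(logits)[target])
print(loss_direct, loss_via_softmax)  # the two values agree up to rounding
```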

In [2]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from tqdm.notebook import tqdm, trange

mnist_train = datasets.MNIST(root="./datasets",train=True,transform=transforms.ToTensor(),download=True)
mnist_test = datasets.MNIST(root="./datasets", train=False,transform=transforms.ToTensor(),download=True)
train_loader = torch.utils.data.DataLoader(mnist_train,batch_size=100,shuffle=True)
test_loader = torch.utils.data.DataLoader(mnist_test,batch_size=100,shuffle=False)

# Training

model = MNIST_CNN()


criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)


for epoch in trange(3):
    for images, labels in tqdm(train_loader):
        
        # Zero gradients
        optimizer.zero_grad()
        
        # Forward Pass
        x = images
        y = model(x)
        
        loss = criterion(y, labels)
        
        # Backward pass
        loss.backward()
        optimizer.step()

In [12]:
## Testing
correct = 0
total = len(mnist_test)

with torch.no_grad():
    # Iterate through test set minibatches
    for images, labels in tqdm(test_loader):
        # Forward pass
        x = images
        y = model(x)

        predictions = torch.argmax(y, dim=1)
        correct += torch.sum((predictions == labels).float())

print('Test accuracy: {}'.format(correct/total))

Test accuracy: 0.989799976348877
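
The test cell picks the class with the largest score for each image (`torch.argmax(y, dim=1)`) and divides the number of matches by the test-set size. The same bookkeeping in plain Python, on a hypothetical batch of three examples, looks like this; `argmax` and `accuracy` here are illustrative helpers, not PyTorch functions.

```python
def argmax(scores):
    """Index of the largest score; the first index wins on ties."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(batch_scores, labels):
    """Fraction of examples whose highest-scoring class matches the label."""
    correct = sum(argmax(s) == y for s, y in zip(batch_scores, labels))
    return correct / len(labels)


# Hypothetical logits for 3 images over 3 classes, plus their true labels
scores = [[0.1, 2.3, -0.5],
          [1.7, 0.2, 0.9],
          [-0.3, 0.4, 0.1]]
labels = [1, 0, 2]

acc = accuracy(scores, labels)
print(f"accuracy: {acc:.4f} ({acc:.2%})")  # accuracy: 0.6667 (66.67%)
```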


### Short answer

1\. How does the CNN compare in accuracy with yesterday's logistic regression and MLP models? How about training time?

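The CNN reaches 98.98% test accuracy, compared with 97.22% for the MLP and 90.20% for logistic regression. Training takes much longer, mainly because each convolutional layer applies its filters at every pixel position, so a forward and backward pass through the CNN involves far more computation than the matrix multiplications in the MLP or logistic regression.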
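
To make the training-time comparison concrete, one can count multiply-accumulate operations (MACs) in a single forward pass. This sketch assumes the layer sizes of this assignment with padding 1 (so convolutions keep the spatial size), ignores biases, ReLUs, and pooling, and assumes the logistic regression model was a single 784 -> 10 linear layer; the MLP's exact size is not given here, so it is left out.

```python
def conv_macs(h_out, w_out, c_in, c_out, k=3):
    """Each output value needs c_in * k * k multiply-adds."""
    return h_out * w_out * c_out * c_in * k * k


def linear_macs(n_in, n_out):
    """A dense layer needs one multiply-add per weight."""
    return n_in * n_out


cnn_layers = {
    "conv1": conv_macs(28, 28, 1, 32),
    "conv2": conv_macs(28, 28, 32, 32),
    "conv3": conv_macs(14, 14, 32, 64),
    "conv4": conv_macs(14, 14, 64, 64),
    "fc1": linear_macs(7 * 7 * 64, 256),
    "fc2": linear_macs(256, 10),
}
cnn_total = sum(cnn_layers.values())
logreg_total = linear_macs(28 * 28, 10)  # assumed 784 -> 10 logistic regression

for name, macs in cnn_layers.items():
    print(f"{name:>5}: {macs:,}")
print(f"CNN total: {cnn_total:,} MACs per image")
print(f"Logistic regression: {logreg_total:,} MACs per image")
```

Under these assumptions the CNN needs roughly 19 million MACs per image versus under 8 thousand for logistic regression, and the four convolutions account for about 96% of the CNN's work.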

2\. How many trainable parameters are there in the CNN you built for this assignment?

*Note: The total of trainable parameters counts each element in a tensor. For example, a weight matrix that is 10x5 has 50 trainable parameters.*

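Each convolutional filter spans all of its input channels and has one bias, so a layer with C_in input channels and C_out 3x3 filters has C_out * (9 * C_in + 1) parameters. The four convolutional layers therefore have 32 * (9 * 1 + 1) + 32 * (9 * 32 + 1) + 64 * (9 * 32 + 1) + 64 * (9 * 64 + 1) = 320 + 9248 + 18496 + 36928 = 64992 parameters. For the fully connected layers, we have (7 * 7 * 64) * 256 + 256 + (256 * 10) + 10 = 805642. The total is 870634 trainable parameters.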
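
The count can be checked with a short script. It assumes every layer has a bias, as `nn.Conv2d` and `nn.Linear` do by default, and that each 3x3 filter spans all input channels; `conv_params` and `linear_params` are hypothetical helper names. With PyTorch available, `sum(p.numel() for p in model.parameters() if p.requires_grad)` on the `MNIST_CNN` model should report the same total.

```python
def conv_params(c_in, c_out, k=3):
    """Weights (c_out filters of shape c_in x k x k) plus one bias per filter."""
    return c_out * (c_in * k * k + 1)


def linear_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_out * (n_in + 1)


params = {
    "conv1": conv_params(1, 32),
    "conv2": conv_params(32, 32),
    "conv3": conv_params(32, 64),
    "conv4": conv_params(64, 64),
    "fc1": linear_params(7 * 7 * 64, 256),
    "fc2": linear_params(256, 10),
}
for name, n in params.items():
    print(f"{name:>5}: {n:,}")
print(f"total: {sum(params.values()):,}")  # total: 870,634
```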

3\. When would you use a CNN versus a logistic regression model or an MLP?

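Use a CNN when the inputs have spatial structure, such as images, where nearby pixels are strongly related and the same pattern (an edge, a stroke) can appear anywhere in the frame. Convolutional layers reuse each filter at every position, so they capture these local patterns with far fewer parameters than a fully connected layer of the same input and output size, and they usually give the best accuracy on image tasks, at the cost of more computation. Logistic regression suits simple problems where the classes are roughly linearly separable, or where a fast, interpretable baseline is enough. An MLP can learn nonlinear decision boundaries and works well for moderately complex data without spatial structure, such as tabular feature vectors; it can be applied to flattened images, but it ignores the pixel layout and typically underperforms a CNN there.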
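
Weight sharing is a large part of why CNNs are practical on images. As a rough illustration (pure arithmetic, assuming the sizes of the second convolutional layer in this assignment: a 32 x 28 x 28 input mapped to a 32 x 28 x 28 output), compare a 3x3 convolution with a fully connected layer connecting the same input and output:

```python
c, h, w, k = 32, 28, 28, 3  # channels, height, width, kernel size

# Convolution: each of the c output filters has c*k*k weights plus a bias,
# and the same filter is reused at every spatial position.
conv = c * (c * k * k + 1)

# Fully connected: every output value gets its own weight for every input value.
n = c * h * w
dense = n * n + n

print(f"3x3 conv layer: {conv:,} parameters")
print(f"dense layer:    {dense:,} parameters")
print(f"ratio: about {dense / conv:,.0f}x")
```

The convolution needs about 9 thousand parameters, while the equivalent dense layer would need roughly 629 million.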