# PyTorch Assignment: Convolutional Neural Network (CNN)

**[Duke Community Standard](http://integrity.duke.edu/standard.html): By typing your name below, you are certifying that you have adhered to the Duke Community Standard in completing this assignment.**

Name: 

### Convolutional Neural Network

Adapt the CNN example for MNIST digit classification from Notebook 3A. 
Feel free to play around with the model architecture and see how the training time/performance changes, but to begin, try the following:

Image ->  
convolution (32 3x3 filters) -> nonlinearity (ReLU) ->  
convolution (32 3x3 filters) -> nonlinearity (ReLU) -> (2x2 max pool) ->  
convolution (64 3x3 filters) -> nonlinearity (ReLU) ->  
convolution (64 3x3 filters) -> nonlinearity (ReLU) -> (2x2 max pool) -> flatten ->
fully connected (256 hidden units) -> nonlinearity (ReLU) ->  
fully connected (10 output units) -> softmax

Note: The CNN model might take a while to train. Depending on your machine, you might expect this to take up to half an hour. If you see your validation performance start to plateau, you can kill the training.
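
The size of the flattened vector feeding the first fully connected layer follows from the layer stack above. The sketch below traces the spatial size of a 28x28 MNIST image through the network in plain Python. It assumes a padding of 1 on every 3x3 convolution and non-overlapping 2x2 pooling; the assignment text does not fix the padding, so that value is an assumption, and the helper names `conv2d_out` and `maxpool_out` are made up for this illustration.

```python
def conv2d_out(size, kernel=3, padding=1, stride=1):
    """Spatial size after a convolution (standard formula, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def maxpool_out(size, kernel=2):
    """Spatial size after non-overlapping max pooling (stride == kernel)."""
    return size // kernel


size = 28                 # MNIST images are 28x28 (assumed input size)
size = conv2d_out(size)   # conv, 32 filters: stays 28 with padding 1
size = conv2d_out(size)   # conv, 32 filters: stays 28
size = maxpool_out(size)  # 2x2 max pool: 14
size = conv2d_out(size)   # conv, 64 filters: stays 14
size = conv2d_out(size)   # conv, 64 filters: stays 14
size = maxpool_out(size)  # 2x2 max pool: 7

# The last convolution has 64 channels, so flatten yields 64 * 7 * 7 features.
flat_features = 64 * size * size
print(size, flat_features)  # 7 3136
```

With no padding, each 3x3 convolution would instead shrink the image by 2 pixels per side pair, and the flattened size would be different.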

In [3]:
from torchvision import datasets, transforms
import torch.nn as nn
import torch
import numpy as np
import torch.nn.functional as F
from tqdm.notebook import tqdm, trange

# load the data
mnist_train = datasets.MNIST(root="./datasets/", train=True, 
                             transform=transforms.ToTensor(),
                             download=True)
mnist_test = datasets.MNIST(root="./datasets/", train=False,
                            transform=transforms.ToTensor(),
                            download=True)
train_loader = torch.utils.data.DataLoader(mnist_train, 
                                           batch_size=100, 
                                           shuffle=True)
test_loader = torch.utils.data.DataLoader(mnist_test,
                                          batch_size=100,
                                          shuffle=False)

## Training
# Model
class MNIST_CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(nn.Conv2d(1, 32, kernel_size=3, padding=1),
                                   nn.ReLU(),
                                   nn.Conv2d(32, 32, kernel_size=3, padding=1),
                                   nn.ReLU(),
                                   nn.MaxPool2d(kernel_size=2),
                                   nn.Conv2d(32, 64, kernel_size=3, padding=1),
                                   nn.ReLU(),
                                   nn.Conv2d(64, 64, kernel_size=3, padding=1),
                                   nn.ReLU(),
                                   nn.MaxPool2d(kernel_size=2),
                                   nn.Flatten(),
                                   nn.Linear(64*7*7, 256),
                                   nn.ReLU(),
                                   nn.Linear(256, 10))
    def forward(self, x):
        return self.model(x)

model = MNIST_CNN()

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Iterate through train set minibatches
for epoch in trange(1):
    for images, labels in tqdm(train_loader):
        # Zero out the gradients
        optimizer.zero_grad()

        # Forward pass
        x = images
        y = model(x)
        loss = criterion(y, labels)

        # backward pass
        loss.backward()
        optimizer.step()

# Testing
correct = 0
total = len(mnist_test)

with torch.no_grad():
    # Iterate through test set minibatches
    for images, labels in tqdm(test_loader):
        # Forward pass
        x = images
        y = model(x)

        predictions = torch.argmax(y, dim=1)
        correct += torch.sum((predictions == labels).float())
    
print('Test accuracy: {}'.format(correct/total))

Test accuracy: 0.9842000007629395


In [9]:
for images, labels in tqdm(train_loader):
    print(labels.size())
    y = model(images)
    print(y.size())
    print(criterion(y, labels))
    break

torch.Size([100])
torch.Size([100, 10])
tensor(0.0371, grad_fn=<NllLossBackward0>)
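
The shapes above show the loss taking raw scores of size `[100, 10]` and integer labels of size `[100]`. `nn.CrossEntropyLoss` applies a log-softmax to the scores itself, which is why the model ends in a plain linear layer, and the `NllLossBackward0` in the output reflects the negative log-likelihood step that follows. Below is a minimal pure-Python sketch of that computation, assuming the default mean reduction over the batch. The functions `cross_entropy` and `batch_cross_entropy` are illustrative helpers, not part of PyTorch.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy for one sample: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def batch_cross_entropy(batch_logits, labels):
    """Mean of the per-sample losses (assumed 'mean' reduction)."""
    losses = [cross_entropy(l, y) for l, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


# Ten equal scores: the model is maximally unsure, so the loss is log(10).
uniform = [0.0] * 10
loss_uniform = cross_entropy(uniform, 3)

# One score far above the rest on the correct class: the loss is near zero.
confident = [0.0] * 10
confident[3] = 10.0
loss_confident = cross_entropy(confident, 3)

batch_loss = batch_cross_entropy([uniform, confident], [3, 3])
print(loss_uniform, loss_confident, batch_loss)
```

A small loss such as the one printed above therefore indicates that the trained model puts most of its probability on the correct digit for that batch.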


In [12]:
def print_model_weight_size(model):
    print("Trainable parameters in the model:")
    for name, param in model.named_parameters():
        if param.requires_grad:
            print(f"Name: {name} | Shape: {param.shape}")
    trainable_params_count = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"\nTotal number of trainable parameters: {trainable_params_count:,}")
print_model_weight_size(model)

Trainable parameters in the model:
Name: model.0.weight | Shape: torch.Size([32, 1, 3, 3])
Name: model.0.bias | Shape: torch.Size([32])
Name: model.2.weight | Shape: torch.Size([32, 32, 3, 3])
Name: model.2.bias | Shape: torch.Size([32])
Name: model.5.weight | Shape: torch.Size([64, 32, 3, 3])
Name: model.5.bias | Shape: torch.Size([64])
Name: model.7.weight | Shape: torch.Size([64, 64, 3, 3])
Name: model.7.bias | Shape: torch.Size([64])
Name: model.11.weight | Shape: torch.Size([256, 3136])
Name: model.11.bias | Shape: torch.Size([256])
Name: model.13.weight | Shape: torch.Size([10, 256])
Name: model.13.bias | Shape: torch.Size([10])

Total number of trainable parameters: 870,634


### Short answer

1\. How does the CNN compare in accuracy with yesterday's logistic regression and MLP models? How about training time?

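The CNN is considerably more accurate than both the logistic regression and MLP models (about 98.4% test accuracy after a single epoch here), but it takes longer to train.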

2\. How many trainable parameters are there in the CNN you built for this assignment?

*Note: The total of trainable parameters counts each element in a tensor. For example, a weight matrix that is 10x5 has 50 trainable parameters.*

870,634
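
This total can be checked by hand from the layer shapes printed earlier. The sketch below recomputes it in plain Python, counting weights plus biases per layer. The helper names `conv_params` and `linear_params` are made up for this illustration.

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights (out_ch * in_ch * k * k) plus one bias per output channel."""
    return out_ch * in_ch * k * k + out_ch


def linear_params(in_features, out_features):
    """Weights (out_features * in_features) plus one bias per output unit."""
    return out_features * in_features + out_features


layers = {
    "conv1 (1 -> 32)": conv_params(1, 32),             # 320
    "conv2 (32 -> 32)": conv_params(32, 32),           # 9,248
    "conv3 (32 -> 64)": conv_params(32, 64),           # 18,496
    "conv4 (64 -> 64)": conv_params(64, 64),           # 36,928
    "fc1 (3136 -> 256)": linear_params(64 * 7 * 7, 256),  # 803,072
    "fc2 (256 -> 10)": linear_params(256, 10),         # 2,570
}

for name, count in layers.items():
    print(f"{name}: {count:,}")

total = sum(layers.values())
print(f"Total: {total:,}")  # Total: 870,634
```

Most of the parameters sit in the first fully connected layer; the four convolutions together account for under 8% of the total.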

3\. When would you use a CNN versus a logistic regression model or an MLP?

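CNNs suit data with spatial or grid structure, such as images and video, where nearby values are related and the same local pattern can appear anywhere in the input. MLPs work well with structured, tabular datasets whose features have no spatial ordering. Logistic regression is best used when the classes are more or less linearly separable and each example is a fixed-length feature vector.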
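
One reason convolutions fit images well is that a filter reuses the same weights at every position. As a rough illustration, the sketch below compares the parameter count of the first convolution in this assignment with a hypothetical fully connected layer that produces an output of the same size (32 feature maps of 28x28) from the same 28x28 input. The dense layer is an assumption made only for comparison; it is not part of the assignment.

```python
# First convolution from this assignment: 1 input channel, 32 filters of 3x3.
conv_params = 32 * 1 * 3 * 3 + 32  # weights + biases

# Hypothetical dense layer mapping every input pixel to every output value.
in_features = 28 * 28        # one grayscale MNIST image, flattened
out_features = 32 * 28 * 28  # same number of outputs as the convolution
dense_params = in_features * out_features + out_features

print(f"conv:  {conv_params:,}")   # conv:  320
print(f"dense: {dense_params:,}")  # dense: 19,694,080
print(f"ratio: {dense_params / conv_params:,.0f}x")
```

The dense version needs tens of thousands of times more parameters for the same output size, which helps explain why an MLP is a poorer fit for raw pixels.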