# Convolutional Neural Network (CNN) with PyTorch (on MNIST)
By [Zahra Taheri](https://github.com/zata213), September 8, 2020

## Transition from Feedforward neural networks to Convolutional neural networks

### 1 Hidden Layer Feedforward Neural Network

![alt text](feedforward-nn.png)

### Basic Convolutional neural network

- A convolutional neural network adds convolution and pooling layers in front of a feedforward neural network.
- A layer made of a linear function followed by a non-linearity is called a fully connected layer.

![alt text](convolutional-nn.png)

## One Convolutional layer: High level view

### One Convolutional Layer, Input Depth of 1


- Input_depth=1 means the input is a single-channel image, i.e., a gray-scale image.
- Picture a flashlight shining on the image: the flashlight is called a 'filter' or a 'kernel', and the region it lights up is called a 'receptive field' or a 'patch'.
- A kernel slides across the whole image and responds to the shape in each receptive field. It returns $0$ if it does not detect its shape there; if it comes across the shape it knows how to detect, it returns a non-zero number.
- A convolution is just a mapping of the input to a grid of numbers in the output. That grid of numbers is called a 'feature map' or an 'activation map'.
- The number of feature maps depends on the number of kernels (output depth = the number of kernels).
- More kernels produce more feature maps, and so more kernels give us more information about the input.
- The width of the input and the output may be different.
![alt text](one-conv-layer-depth1.png)
![alt text](one-conv-layer-ex.png)
![alt text](one-conv-layer-ex2.png)
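
The relation between kernels and feature maps can be checked with a minimal sketch. The image size (28x28), the number of kernels (4) and the kernel size (5) are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One gray-scale 28x28 image: (batch, channels, height, width)
gray_image = torch.rand(1, 1, 28, 28)

# 4 kernels of size 5x5, each with depth 1 to match the input depth
conv_gray = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=5)

feature_maps = conv_gray(gray_image)

print(conv_gray.weight.shape)  # torch.Size([4, 1, 5, 5]): 4 kernels of depth 1
print(feature_maps.shape)      # torch.Size([1, 4, 24, 24]): one feature map per kernel
```

The output is narrower than the input (24 instead of 28) because no padding is used here.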

### One Convolutional Layer, Input Depth of 3

- Input_depth=3 means the input is an 'RGB' colored image.
- The kernel must have the same depth, 3.

![alt text](one-conv-layer-depth3.png)
![alt text](one-conv-layer-depth3-ex.png)
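
A short sketch of the depth requirement, assuming a 32x32 RGB input and 8 kernels (both values are illustrative, not taken from the notebook). Each kernel spans all 3 input channels, and a layer built for depth 3 rejects a depth-1 input.

```python
import torch
import torch.nn as nn

# One RGB 32x32 image: (batch, channels, height, width)
rgb_image = torch.rand(1, 3, 32, 32)

# 8 kernels of size 5x5; each kernel has depth 3, one slice per color channel
conv_rgb = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)
rgb_features = conv_rgb(rgb_image)

print(conv_rgb.weight.shape)  # torch.Size([8, 3, 5, 5])
print(rgb_features.shape)     # torch.Size([1, 8, 28, 28])

# Feeding a gray-scale image (depth 1) to the same layer fails,
# because the kernel depth no longer matches the input depth
try:
    conv_rgb(torch.rand(1, 1, 32, 32))
    depth_mismatch_rejected = False
except RuntimeError:
    depth_mismatch_rejected = True

print(depth_mismatch_rejected)  # True
```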

### Summary
- As the kernel slides (convolves) across the image, 2 operations are performed per patch:
    1. Element-wise multiplication
    2. Summation

- More kernels mean more feature map channels, which can capture more information about the input.
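
The two operations can be written out by hand and compared with PyTorch's built-in convolution. This is a minimal sketch on a made-up 4x4 image and 2x2 kernel; the variable names are illustrative. Note that `F.conv2d` does not flip the kernel, so it matches the slide-multiply-sum description directly.

```python
import torch
import torch.nn.functional as F

# Toy 4x4 "image" with values 0..15 and a 2x2 kernel (both made up for illustration)
toy_image = torch.arange(16, dtype=torch.float32).reshape(4, 4)
toy_kernel = torch.tensor([[1.0, 0.0],
                           [0.0, -1.0]])

k = toy_kernel.shape[0]
out_size = toy_image.shape[0] - k + 1  # no padding, stride 1

manual = torch.zeros(out_size, out_size)
for r in range(out_size):
    for c in range(out_size):
        patch = toy_image[r:r + k, c:c + k]        # current receptive field
        manual[r, c] = (patch * toy_kernel).sum()  # 1. element-wise multiply, 2. sum

# Same computation with the built-in operator (expects 4D input and weight)
builtin = F.conv2d(toy_image.reshape(1, 1, 4, 4),
                   toy_kernel.reshape(1, 1, k, k)).reshape(out_size, out_size)

print(manual)                           # every entry is -5
print(torch.allclose(manual, builtin))  # True
```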

## Multiple Convolutional Layers

### Pooling Layers
The essence of pooling is downsampling, i.e., reducing the spatial size of the input.

**2 common types of pooling layers:**
   - Max pooling
   - Average pooling
   
![alt text](pooling.png)
![alt text](pooling-ex.png)
![alt text](pooling-ex2.png)
![alt text](pooling-ex3.png)
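
Both pooling types can be compared on a small hand-made input. This is a minimal sketch; the 4x4 values are chosen only so that each 2x2 window is easy to check by eye.

```python
import torch
import torch.nn as nn

# 4x4 input, reshaped to (batch, channels, height, width)
pool_input = torch.tensor([[1., 2., 5., 6.],
                           [3., 4., 7., 8.],
                           [9., 10., 13., 14.],
                           [11., 12., 15., 16.]]).reshape(1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2)  # keeps the largest value of each 2x2 window
avg_pool = nn.AvgPool2d(kernel_size=2)  # keeps the mean of each 2x2 window

max_out = max_pool(pool_input).reshape(2, 2)
avg_out = avg_pool(pool_input).reshape(2, 2)

print(max_out)  # [[4, 8], [12, 16]]
print(avg_out)  # [[2.5, 6.5], [10.5, 14.5]]
```

In both cases the 4x4 input is downsampled to 2x2.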

#### Multiple pooling layers

![alt text](multiple-pooling.png)

### Padding
- **Valid padding (no padding, $P = 0$)**
    - output size < input size
- **Same padding (zero padding around the border)**
    - output size = input size (for stride 1)

![alt text](zero-padding.png)
![alt text](same-padding.png)
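
The effect of the two padding modes on the output size can be seen in a minimal sketch, assuming a 28x28 input and a 5x5 kernel with stride 1 (the layer names are illustrative).

```python
import torch
import torch.nn as nn

pad_input = torch.rand(1, 1, 28, 28)

# Valid padding: no padding at all, so the output shrinks
valid_conv = nn.Conv2d(1, 1, kernel_size=5, stride=1, padding=0)

# Same padding: 2 rows/columns of zeros on each side keep the size unchanged
same_conv = nn.Conv2d(1, 1, kernel_size=5, stride=1, padding=2)

valid_out = valid_conv(pad_input)
same_out = same_conv(pad_input)

print(valid_out.shape)  # torch.Size([1, 1, 24, 24])
print(same_out.shape)   # torch.Size([1, 1, 28, 28])
```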

### Output size calculation

$O = \frac{W-K+2P}{S}+1$
   - O = Output height
   - W = Input height
   - K = Kernel size/height
   - P = Padding
        - For same padding with $S = 1$ and odd $K$: $P = \frac{K-1}{2}$
   - S = Stride
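
The formula is easy to turn into a small helper. This is a sketch with hypothetical function names (`conv_output_size`, `same_padding`); the integer division mirrors the rounding down that PyTorch applies when the stride does not divide evenly.

```python
def conv_output_size(w, k, p=0, s=1):
    """Output height/width for input size w, kernel size k, padding p, stride s."""
    return (w - k + 2 * p) // s + 1


def same_padding(k):
    """Padding that keeps output size = input size for stride 1 (odd k only)."""
    if k % 2 == 0:
        raise ValueError("same padding with P = (K - 1) / 2 needs an odd kernel size")
    return (k - 1) // 2


print(conv_output_size(28, 5))                     # 24 (valid padding)
print(conv_output_size(28, 5, p=same_padding(5)))  # 28 (same padding)
print(conv_output_size(28, 5, p=2, s=2))           # 14 (stride 2 halves the size)
```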

## Building Convolutional Neural Network with PyTorch

### Model A: 
- 2 Convolutional layers
    - With Same padding, i.e. with output size = input size.
- 2 Max pooling layers
- 1 Fully connected layer

**Convolution is just an operation where we do an element-wise multiplication followed by a summation.**

![alt text](model-a.png)

### Output Formula for Convolution

We know that same padding results in output size = input size. The formula gives the same result:

$O = \frac{W-K+2P}{S}+1$
   - O = Output height
   - W = Input height
   - K = Kernel size/height := 5
   - P = Padding
        - $P = \frac{K-1}{2}\rightarrow P=2$
   - S = Stride := 1
   
### Output Formula for Pooling

$O = \frac{W}{K}$
   - O = Output height
   - W = Input height
   - K = Kernel size/height := 2 (the pooling stride defaults to the kernel size)

![alt text](cnn-a.png)
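
As a worked example, both formulas can be applied layer by layer to Model A, assuming a 28x28 MNIST input and the values given above ($K = 5$, $P = 2$, $S = 1$ for convolution, $K = 2$ for pooling):

- Convolution 1: $O = \frac{28-5+2\cdot 2}{1}+1 = 28$
- Max pool 1: $O = \frac{28}{2} = 14$
- Convolution 2: $O = \frac{14-5+2\cdot 2}{1}+1 = 14$
- Max pool 2: $O = \frac{14}{2} = 7$

With 32 feature maps of size 7x7 after the second pooling layer, the fully connected layer receives $32 \times 7 \times 7 = 1568$ inputs.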

#### Steps
- Step 1: Load dataset
- Step 2: Make dataset iterable
- Step 3: Create model class
- Step 4: Instantiate model class
- Step 5: Instantiate loss class
- Step 6: Instantiate optimizer class
- Step 7: Train the model


In [1]:
# import libraries
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

In [2]:
'''
Step 1: Load dataset
'''

# './data' works on every OS and avoids the invalid '\d' escape sequence
train_dataset = dsets.MNIST(root='./data',
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

'''
Step 2: Make dataset iterable
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs) # the number of times we go through the whole dataset

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)  # reshuffle the training data at every epoch

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

'''
Step 3: Create model class
'''
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        
        # Convolution 1
        # Conv2d is a 2-dimensional convolution; in_channels=1 means a gray-scale image,
        # out_channels=16 means there are 16 kernels
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2)
        # after each convolution we need non-linearity
        self.relu1 = nn.ReLU()
        
        # Max pool 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
     
        # Convolution 2
        # in_channels=16 because we have 16 feature maps, out_channels=32 is chosen by ourselves
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2)
        self.relu2 = nn.ReLU()
        
        # Max pool 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        
        # Fully connected 1 (readout)
        self.fc1 = nn.Linear(32 * 7 * 7, 10) 
    
    def forward(self, x):
        # Convolution 1
        out = self.cnn1(x)
        out = self.relu1(out)
        
        # Max pool 1
        out = self.maxpool1(out)
        
        # Convolution 2 
        out = self.cnn2(out)
        out = self.relu2(out)
        
        # Max pool 2 
        out = self.maxpool2(out)
        
        # Resize because we want to feed a linear function
        # Original size: (100, 32, 7, 7)
        # out.size(0): 100
        # New out size: (100, 32*7*7)
        out = out.view(out.size(0), -1)

        # Linear function (readout)
        out = self.fc1(out)
        
        return out

'''
Step 4: Instantiate model class
'''

model = CNNModel()

#######################
#  USE GPU FOR MODEL  #
#######################

if torch.cuda.is_available():
    model.cuda()

'''
Step 5: Instantiate loss class
'''
criterion = nn.CrossEntropyLoss()


'''
Step 6: Instantiate optimizer class
'''
learning_rate = 0.01

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [3]:
print(model.parameters())
print(len(list(model.parameters())))

# convolution 1 weights: 16 kernels of depth 1
print(list(model.parameters())[0].size())

# convolution 1 bias: one value per kernel (16)
print(list(model.parameters())[1].size())

# convolution 2 weights: 32 kernels of depth 16
print(list(model.parameters())[2].size())

# convolution 2 bias: one value per kernel (32)
print(list(model.parameters())[3].size())

# fully connected layer weights: 10 outputs x 32*7*7 inputs
print(list(model.parameters())[4].size())

# fully connected layer bias: one value per output (10)
print(list(model.parameters())[5].size())

<generator object Module.parameters at 0x000001F2D153C820>
6
torch.Size([16, 1, 5, 5])
torch.Size([16])
torch.Size([32, 16, 5, 5])
torch.Size([32])
torch.Size([10, 1568])
torch.Size([10])
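
The shapes printed above also determine how many trainable parameters Model A has. This is a minimal sketch that rebuilds only the three parameterized layers (the ReLU and pooling layers have no parameters); the `model_a_layers` container is an illustrative name, not part of the notebook.

```python
import torch.nn as nn

# Same layer settings as in CNNModel above
model_a_layers = nn.ModuleList([
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2),
    nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2),
    nn.Linear(32 * 7 * 7, 10),
])

for name, param in model_a_layers.named_parameters():
    print(name, tuple(param.shape), param.numel())

# 16*1*5*5 + 16  +  32*16*5*5 + 32  +  10*1568 + 10
total_params = sum(p.numel() for p in model_a_layers.parameters())
print(total_params)  # 28938
```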


**Difference between a CNN and a Feedforward neural network**

- CNN input size: (1, 28, 28)
- Feedforward NN input size: (1, 28*28)

That is, a feedforward NN needs the input flattened (resized) into a vector, while a CNN takes the image as it is.
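
The difference in input shape can be shown with a minimal sketch, assuming a random batch of 100 MNIST-sized images (the layer and variable names are illustrative).

```python
import torch
import torch.nn as nn

# A batch of 100 gray-scale 28x28 images
image_batch = torch.rand(100, 1, 28, 28)

# A convolutional layer takes the images as they are
conv_layer = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2)
conv_out = conv_layer(image_batch)
print(conv_out.shape)    # torch.Size([100, 16, 28, 28])

# A linear layer needs each image flattened into a vector of 28*28 = 784 values
linear_layer = nn.Linear(28 * 28, 10)
flat_batch = image_batch.view(image_batch.size(0), -1)
linear_out = linear_layer(flat_batch)
print(flat_batch.shape)  # torch.Size([100, 784])
print(linear_out.shape)  # torch.Size([100, 10])
```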

In [4]:
'''
Step 7: Train the model
'''
# Pick the device once instead of checking for CUDA in every batch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

iteration = 0  # avoids shadowing the built-in iter()
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Move the batch to the same device as the model
        # (wrapping tensors in Variable is not needed since PyTorch 0.4)
        images = images.to(device)
        labels = labels.to(device)

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        outputs = model(images)

        # Calculate loss: CrossEntropyLoss applies softmax internally
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        iteration += 1

        if iteration % 500 == 0:
            # Calculate accuracy on the test dataset
            correct = 0
            total = 0
            # No gradients are needed for evaluation
            with torch.no_grad():
                for images, labels in test_loader:
                    images = images.to(device)
                    labels = labels.to(device)

                    # Forward pass only to get logits/output
                    outputs = model(images)

                    # Get predictions from the maximum value
                    _, predicted = torch.max(outputs, 1)

                    # Total number of labels
                    total += labels.size(0)

                    # Total correct predictions
                    correct += (predicted == labels).sum().item()

            accuracy = 100 * correct // total

            # Print the loss of the last training batch and the test accuracy
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iteration, loss.item(), accuracy))

Iteration: 500. Loss: 0.40779000520706177. Accuracy: 90
Iteration: 1000. Loss: 0.3330969512462616. Accuracy: 93
Iteration: 1500. Loss: 0.21542587876319885. Accuracy: 94
Iteration: 2000. Loss: 0.3770652711391449. Accuracy: 96
Iteration: 2500. Loss: 0.07307636737823486. Accuracy: 96
Iteration: 3000. Loss: 0.11064232140779495. Accuracy: 96


### Model B: 
- 2 Convolutional layers
    - With Same padding, i.e. with output size = input size.
- 2 Average pooling layers
- 1 Fully connected layer

![alt text](cnn-b.png)

#### Steps
- Step 1: Load dataset
- Step 2: Make dataset iterable
- Step 3: Create model class
- Step 4: Instantiate model class
- Step 5: Instantiate loss class
- Step 6: Instantiate optimizer class
- Step 7: Train the model


In [5]:
# import libraries
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

In [6]:
'''
Step 1: Load dataset
'''

# './data' works on every OS and avoids the invalid '\d' escape sequence
train_dataset = dsets.MNIST(root='./data',
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

'''
Step 2: Make dataset iterable
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs) # the number of times we go through the whole dataset

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)  # reshuffle the training data at every epoch

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

'''
Step 3: Create model class
'''
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        
        # Convolution 1
        # Conv2d is a 2-dimensional convolution; in_channels=1 means a gray-scale image,
        # out_channels=16 means there are 16 kernels
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2)
        # after each convolution we need non-linearity
        self.relu1 = nn.ReLU()
        
        # Average pool 1
        self.avgpool1 = nn.AvgPool2d(kernel_size=2)
     
        # Convolution 2
        # in_channels=16 because we have 16 feature maps, out_channels=32 is chosen by ourselves
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2)
        self.relu2 = nn.ReLU()
        
        # Average pool 2
        self.avgpool2 = nn.AvgPool2d(kernel_size=2)
        
        # Fully connected 1 (readout)
        self.fc1 = nn.Linear(32 * 7 * 7, 10) 
    
    def forward(self, x):
        # Convolution 1
        out = self.cnn1(x)
        out = self.relu1(out)
        
        # Average pool 1
        out = self.avgpool1(out)
        
        # Convolution 2 
        out = self.cnn2(out)
        out = self.relu2(out)
        
        # Average pool 2
        out = self.avgpool2(out)
        
        # Resize because we want to feed a linear function --> Flattening
        # Original size: (100, 32, 7, 7)
        # out.size(0): 100
        # New out size: (100, 32*7*7)
        out = out.view(out.size(0), -1)

        # Linear function (readout)
        out = self.fc1(out)
        
        return out

'''
Step 4: Instantiate model class
'''

model = CNNModel()

#######################
#  USE GPU FOR MODEL  #
#######################

if torch.cuda.is_available():
    model.cuda()

'''
Step 5: Instantiate loss class
'''
criterion = nn.CrossEntropyLoss()


'''
Step 6: Instantiate optimizer class
'''
learning_rate = 0.01

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [7]:
'''
Step 7: Train the model
'''
# Pick the device once instead of checking for CUDA in every batch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

iteration = 0  # avoids shadowing the built-in iter()
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Move the batch to the same device as the model
        # (wrapping tensors in Variable is not needed since PyTorch 0.4)
        images = images.to(device)
        labels = labels.to(device)

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        outputs = model(images)

        # Calculate loss: CrossEntropyLoss applies softmax internally
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        iteration += 1

        if iteration % 500 == 0:
            # Calculate accuracy on the test dataset
            correct = 0
            total = 0
            # No gradients are needed for evaluation
            with torch.no_grad():
                for images, labels in test_loader:
                    images = images.to(device)
                    labels = labels.to(device)

                    # Forward pass only to get logits/output
                    outputs = model(images)

                    # Get predictions from the maximum value
                    _, predicted = torch.max(outputs, 1)

                    # Total number of labels
                    total += labels.size(0)

                    # Total correct predictions
                    correct += (predicted == labels).sum().item()

            accuracy = 100 * correct // total

            # Print the loss of the last training batch and the test accuracy
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iteration, loss.item(), accuracy))

Iteration: 500. Loss: 0.4932715892791748. Accuracy: 85
Iteration: 1000. Loss: 0.3048873841762543. Accuracy: 88
Iteration: 1500. Loss: 0.3095475435256958. Accuracy: 90
Iteration: 2000. Loss: 0.16857901215553284. Accuracy: 91
Iteration: 2500. Loss: 0.20932739973068237. Accuracy: 92
Iteration: 3000. Loss: 0.25459539890289307. Accuracy: 93


#### In general: Average pooling test accuracy < Max pooling test accuracy (here 93% vs. 96% after 3000 iterations)

### Model C: 
- 2 Convolutional layers
    - With valid padding
- 2 Max pooling layers
- 1 Fully connected layer

![alt text](cnn-c.png)
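
With valid padding every convolution shrinks the feature maps, which is why Model C ends with 4x4 maps instead of 7x7. The sketch below traces the shapes layer by layer for a single 28x28 input; the ReLU layers are left out because they do not change the shape, and the `model_c_layers` list is an illustrative name.

```python
import torch
import torch.nn as nn

# Layer settings of Model C: 5x5 kernels, stride 1, no padding
model_c_layers = [
    ('conv1', nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=0)),
    ('maxpool1', nn.MaxPool2d(kernel_size=2)),
    ('conv2', nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=0)),
    ('maxpool2', nn.MaxPool2d(kernel_size=2)),
]

trace = torch.rand(1, 1, 28, 28)
shapes = {}
for name, layer in model_c_layers:
    trace = layer(trace)
    shapes[name] = tuple(trace.shape)
    print(name, shapes[name])

# conv1    (1, 16, 24, 24)   28 - 5 + 1 = 24
# maxpool1 (1, 16, 12, 12)   24 / 2 = 12
# conv2    (1, 32, 8, 8)     12 - 5 + 1 = 8
# maxpool2 (1, 32, 4, 4)     8 / 2 = 4
```

The final 32 maps of size 4x4 give the `32 * 4 * 4` inputs of the fully connected layer.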

#### Steps
- Step 1: Load dataset
- Step 2: Make dataset iterable
- Step 3: Create model class
- Step 4: Instantiate model class
- Step 5: Instantiate loss class
- Step 6: Instantiate optimizer class
- Step 7: Train the model


In [8]:
# import libraries
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

In [9]:
'''
Step 1: Load dataset
'''

# './data' works on every OS and avoids the invalid '\d' escape sequence
train_dataset = dsets.MNIST(root='./data',
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

'''
Step 2: Make dataset iterable
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs) # the number of times we go through the whole dataset

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)  # reshuffle the training data at every epoch

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

'''
Step 3: Create model class
'''
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        
        # Convolution 1
        # Conv2d is a 2-dimensional convolution; in_channels=1 means a gray-scale image,
        # out_channels=16 means there are 16 kernels
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=0)
        # after each convolution we need non-linearity
        self.relu1 = nn.ReLU()
        
        # Max pool 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
     
        # Convolution 2
        # in_channels=16 because we have 16 feature maps, out_channels=32 is chosen by ourselves
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=0)
        self.relu2 = nn.ReLU()
        
        # Max pool 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        
        # Fully connected 1 (readout)
        self.fc1 = nn.Linear(32 * 4 * 4, 10) 
    
    def forward(self, x):
        # Convolution 1
        out = self.cnn1(x)
        out = self.relu1(out)
        
        # Max pool 1
        out = self.maxpool1(out)
        
        # Convolution 2 
        out = self.cnn2(out)
        out = self.relu2(out)
        
        # Max pool 2 
        out = self.maxpool2(out)
        
        # Flatten
        out = out.view(out.size(0), -1)

        # Linear function (readout)
        out = self.fc1(out)
        
        return out

'''
Step 4: Instantiate model class
'''

model = CNNModel()

#######################
#  USE GPU FOR MODEL  #
#######################

if torch.cuda.is_available():
    model.cuda()

'''
Step 5: Instantiate loss class
'''
criterion = nn.CrossEntropyLoss()


'''
Step 6: Instantiate optimizer class
'''
learning_rate = 0.01

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [10]:
'''
Step 7: Train the model
'''
# Pick the device once instead of checking for CUDA in every batch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

iteration = 0  # avoids shadowing the built-in iter()
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Move the batch to the same device as the model
        # (wrapping tensors in Variable is not needed since PyTorch 0.4)
        images = images.to(device)
        labels = labels.to(device)

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        outputs = model(images)

        # Calculate loss: CrossEntropyLoss applies softmax internally
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        iteration += 1

        if iteration % 500 == 0:
            # Calculate accuracy on the test dataset
            correct = 0
            total = 0
            # No gradients are needed for evaluation
            with torch.no_grad():
                for images, labels in test_loader:
                    images = images.to(device)
                    labels = labels.to(device)

                    # Forward pass only to get logits/output
                    outputs = model(images)

                    # Get predictions from the maximum value
                    _, predicted = torch.max(outputs, 1)

                    # Total number of labels
                    total += labels.size(0)

                    # Total correct predictions
                    correct += (predicted == labels).sum().item()

            accuracy = 100 * correct // total

            # Print the loss of the last training batch and the test accuracy
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iteration, loss.item(), accuracy))

Iteration: 500. Loss: 0.3956705331802368. Accuracy: 89
Iteration: 1000. Loss: 0.46202465891838074. Accuracy: 92
Iteration: 1500. Loss: 0.31409895420074463. Accuracy: 94
Iteration: 2000. Loss: 0.14005053043365479. Accuracy: 95
Iteration: 2500. Loss: 0.09357797354459763. Accuracy: 96
Iteration: 3000. Loss: 0.09202654659748077. Accuracy: 96


## Deep learning
#### 3 ways to expand a convolutional neural network
- More convolutional layers
- Less aggressive downsampling
    - Smaller kernel size for pooling (gradually downsampling)
    - Convolutional layers have same padding
- More fully connected layers
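
The three ideas can be combined in one model. The class below is a hypothetical sketch, not one of the models trained above: it assumes 3x3 kernels with same padding, pools only after the second and third convolution, and adds a second fully connected layer. All layer sizes are illustrative choices.

```python
import torch
import torch.nn as nn


class DeeperCNN(nn.Module):
    """Hypothetical deeper variant of the models above (untrained sketch)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # More convolutional layers, all with same padding (3x3 kernel, P = 1)
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),   # 28 -> 28
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # 28 -> 28
            nn.MaxPool2d(kernel_size=2),                             # 28 -> 14
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),  # 14 -> 14
            nn.MaxPool2d(kernel_size=2),                             # 14 -> 7
        )
        self.classifier = nn.Sequential(
            # More fully connected layers
            nn.Linear(64 * 7 * 7, 128), nn.ReLU(),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)  # flatten to (batch, 64*7*7)
        return self.classifier(out)


deeper_logits = DeeperCNN()(torch.rand(4, 1, 28, 28))
print(deeper_logits.shape)  # torch.Size([4, 10])
```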

#### Cons 
- A larger network needs a larger dataset
    - Curse of dimensionality
- A larger network does not necessarily mean higher accuracy