# Introduction to Deep Learning (ECE 685D), Duke University
## Discussion Section
## Term: Fall 2021.

# Part 4: Convolutional Neural Network (CNN)

In [1]:
# Import PyTorch/NumPy modules

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

## A quick review

![Convolution Operation](https://miro.medium.com/max/720/1*ciDgQEjViWLnCbmX-EeSrA.gif)
![Convolution Operation](https://miro.medium.com/max/640/1*NsiYxt8tPDQyjyH3C08PVA@2x.png)
<div style="text-align: right"> Image from <a href="https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53">towardsdatascience.com.</a></div>

- The weight parameter (W) is a 4D tensor.
- Shape: $C_{out} \times C_{in} \times H_k \times W_k$ (the layout PyTorch uses).
- The output is referred to as the “feature maps”.
- For a single input, the output of the operation has shape $C_{out} \times H_{out} \times W_{out}$.

## Effectively, a convolution operation is a summation of element-wise products between the kernel and a patch of the image/feature map
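
As a minimal sketch of this statement, the snippet below compares one output value of `F.conv2d` with a hand-computed sum of element-wise products. The tensor sizes (a single-channel 5 x 5 input and one 3 x 3 kernel) and the chosen output position are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative sizes (assumed): one 1-channel 5x5 input and one 3x3 kernel
image = torch.randn(1, 1, 5, 5)    # N x C_in x H x W
kernel = torch.randn(1, 1, 3, 3)   # C_out x C_in x H_k x W_k

# No padding, stride 1: the output is 3x3
out = F.conv2d(image, kernel)

# Output value at row 1, column 2, computed by hand:
# multiply the matching 3x3 input patch by the kernel and sum
patch = image[0, 0, 1:4, 2:5]
manual = (patch * kernel[0, 0]).sum()

print(out.shape)
print(torch.allclose(out[0, 0, 1, 2], manual, atol=1e-5))
```

Note that `F.conv2d` does not flip the kernel, so the patch and the kernel are multiplied position by position.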

### Relevant PyTorch functions:

- Convolution operation
```torch.nn.functional.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1)```

- Convolutional layer
```torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, bias=True)```

### What attributes do those arguments control?
#### Stride: controls the number of pixels the kernel moves horizontally/vertically between applications
#### Padding: controls the number of (zero) pixels added around the input border; with suitable padding (e.g., padding = 1 for a 3 x 3 kernel at stride 1) the inputs and outputs have the same spatial dimensions
#### Dilation: controls the spacing between the input elements that are convolved with the filter
* kernel_size = 3 x 3 with (left) padding = 1, dilation = 1 and (right) padding = 0, dilation = 2
* Convolution with dilation > 1 is also called atrous convolution

![Dilation](https://miro.medium.com/max/720/1*btockft7dtKyzwXqfq70_w.gif)
<div style="text-align: right"> Image from <a href="https://towardsdatascience.com/review-drn-dilated-residual-networks-image-classification-semantic-segmentation-d527e1a8fb5">towardsdatascience.com.</a></div>
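
The effect of these three arguments on the output size can be sketched with the output-size formula given in the PyTorch `Conv2d` documentation, $H_{out} = \lfloor (H_{in} + 2 \cdot \text{padding} - \text{dilation} \cdot (H_k - 1) - 1) / \text{stride} + 1 \rfloor$. The helper name `conv_out_size` and the 28 x 28 input are assumptions made for illustration.

```python
import torch
import torch.nn as nn

def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output size of a convolution along one spatial dimension (hypothetical helper)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Three illustrative settings for a 3x3 kernel
settings = [
    dict(stride=1, padding=1, dilation=1),  # "same" output size
    dict(stride=2, padding=1, dilation=1),  # strided: halves the size
    dict(stride=1, padding=0, dilation=2),  # dilated, no padding
]

x = torch.randn(1, 1, 28, 28)
results = []
for kwargs in settings:
    layer = nn.Conv2d(1, 1, kernel_size=3, **kwargs)
    predicted = conv_out_size(28, 3, **kwargs)
    actual = layer(x).shape[-1]
    results.append((predicted, actual))
    print(kwargs, "predicted:", predicted, "actual:", actual)
```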

## An example of a convolution operation

In [2]:
# Create a random 4D tensor. Use the NCHW format, where N = 100, C = 1, H = W = 28
x_cnn = torch.randn(100, 1, 28, 28)

# Create a random convolutional kernel (C_out, C_in, H_k = W_k = 3)
W1 = torch.randn(16, 1, 3, 3)

# Create a bias variable of size C_out
b1 = torch.randn(16, requires_grad=True)

# Apply the convolutional layer with relu activation
conv1 = F.relu(F.conv2d(x_cnn, W1, bias=b1, stride=1, padding=1))

# Print input/output shape
print("Input shape: {}".format(x_cnn.shape))
print("Convolution output shape: {}".format(conv1.shape))

Input shape: torch.Size([100, 1, 28, 28])
Convolution output shape: torch.Size([100, 16, 28, 28])
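
The same computation can be sketched with the layer form, `nn.Conv2d`, which creates and stores its own weight and bias. The variable names below are illustrative; the sizes mirror the functional example above.

```python
import torch
import torch.nn as nn

# Layer equivalent of the functional call: 1 input channel, 16 output channels, 3x3 kernel
layer = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1)

x = torch.randn(100, 1, 28, 28)   # N x C x H x W
y = torch.relu(layer(x))

# The layer owns its parameters; the weight is stored as (C_out, C_in, H_k, W_k)
print("Weight shape: {}".format(layer.weight.shape))
print("Bias shape: {}".format(layer.bias.shape))
print("Output shape: {}".format(y.shape))
```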


## Pooling and striding

Almost all CNN architectures incorporate either pooling or striding. This is done for a number of reasons, including:
- **Dimensionality reduction**: pooling and striding operations reduce computational complexity by shrinking the number of values passed to the next layer.
For example, a 2x2 max pool (with stride 2) reduces the size of each feature map by a factor of 4.
- **Translational invariance**: Oftentimes in computer vision, we'd prefer that shifting the input by a few pixels doesn't change the output. Pooling and striding reduce sensitivity to exact pixel locations.
- **Increasing receptive field**: by summarizing a window with a single value, subsequent convolutional kernels see a wider swath of the original input image. For example, a 2x2 max pool on some input followed by a 3x3 convolution results in each kernel application "seeing" a 6x6 region of that input instead of 3x3.
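
The receptive-field claim can be checked with a small gradient experiment: backpropagate from one output value and count which input pixels receive a nonzero gradient. This sketch swaps in average pooling for max pooling (an assumption made so that every pixel in a window gets a gradient; with max pooling only the winning pixel in each window would), and uses an all-ones kernel and a 12 x 12 input chosen for illustration.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 12, 12, requires_grad=True)

pooled = F.avg_pool2d(x, kernel_size=2)   # 12x12 -> 6x6
kernel = torch.ones(1, 1, 3, 3)           # all-ones kernel so no weight is zero
y = F.conv2d(pooled, kernel)              # 6x6 -> 4x4

# Backpropagate from a single output value
y[0, 0, 0, 0].backward()

# Input pixels with nonzero gradient are the ones this output "sees"
influence = x.grad[0, 0] != 0
rows = influence.any(dim=1).sum().item()
cols = influence.any(dim=0).sum().item()
print("Receptive field: {} x {}".format(rows, cols))
```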

### Pooling
The two most common forms of pooling are max pooling and average pooling. 
Both reduce values within a window to a single value, on a per-feature-map basis.
Max pooling takes the maximum value of the window as the output value; average pooling takes the mean.


![Max Pooling & Average Pooling](https://miro.medium.com/max/640/0*lIcR0gOMK0xzTr6_.png)
<div style="text-align: right"> Image from <a href="https://medium.com/aiguys/pooling-layers-in-neural-nets-and-their-variants-f6129fc4628b">medium.com.</a></div>

#### Relevant PyTorch functions:

- Max pooling
```torch.nn.functional.max_pool2d(fmap_fig, kernel_size=2)```

- Average pooling
```torch.nn.functional.avg_pool2d(fmap_fig, kernel_size=2)```
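
A small worked example of the two pooling functions, using a hand-written 4 x 4 feature map (the values are made up for illustration) so the results can be checked by eye:

```python
import torch
import torch.nn.functional as F

# One feature map (N = 1, C = 1) with easy-to-check values
fmap = torch.tensor([[ 1.,  2.,  5.,  6.],
                     [ 3.,  4.,  7.,  8.],
                     [ 9., 10., 13., 14.],
                     [11., 12., 15., 16.]]).reshape(1, 1, 4, 4)

# Each non-overlapping 2x2 window is reduced to a single value
max_out = F.max_pool2d(fmap, kernel_size=2)
avg_out = F.avg_pool2d(fmap, kernel_size=2)

print(max_out[0, 0])   # maximum of each window
print(avg_out[0, 0])   # mean of each window
```

The top-left window holds 1, 2, 3, 4, so max pooling returns 4 and average pooling returns 2.5 for that position.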

### Striding
As mentioned in class, a convolution operation can also be used to reduce the spatial dimensions of the feature maps.

If we use a 3 x 3 kernel with ```stride=2, padding=1``` in ```torch.nn.functional.conv2d(...)```, the output height and width are halved (for even input sizes), which removes 3/4 of the pixels in each feature map.
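
A minimal sketch of this strided convolution, assuming a 16-channel 28 x 28 feature map and 32 output channels (both chosen for illustration):

```python
import torch
import torch.nn.functional as F

fmap = torch.randn(1, 16, 28, 28)     # N x C_in x H x W
weight = torch.randn(32, 16, 3, 3)    # C_out x C_in x H_k x W_k

# 3x3 kernel, stride 2, padding 1: height and width are halved
out = F.conv2d(fmap, weight, stride=2, padding=1)

pixels_in = fmap.shape[-2] * fmap.shape[-1]
pixels_out = out.shape[-2] * out.shape[-1]
print("Input shape: {}".format(fmap.shape))
print("Output shape: {}".format(out.shape))
print("Spatial reduction factor: {}".format(pixels_in / pixels_out))
```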

## Example of a CNN classifier: [AlexNet](https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html) is a classic CNN architecture 
![AlexNet](https://miro.medium.com/max/720/1*bD_DMBtKwveuzIkQTwjKQQ.png)

### AlexNet on MNIST and CIFAR10

### Prepare dataset loader

## MNIST

In [3]:
batch_size = 100
train_set,test_set,train_loader,test_loader = {},{},{},{}
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.1307,), (0.3081,))])
train_set['mnist'] = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader['mnist'] = torch.utils.data.DataLoader(train_set['mnist'], batch_size=batch_size, shuffle=True, num_workers=0)
test_set['mnist'] = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
test_loader['mnist'] = torch.utils.data.DataLoader(test_set['mnist'], batch_size=batch_size, shuffle=False, num_workers=0)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz

Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw



## CIFAR10

In [4]:
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
train_set['cifar10'] = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=train_transform)
train_loader['cifar10'] = torch.utils.data.DataLoader(train_set['cifar10'], batch_size=batch_size, shuffle=True, num_workers=0)
test_set['cifar10'] = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=test_transform)
test_loader['cifar10'] = torch.utils.data.DataLoader(test_set['cifar10'], batch_size=batch_size, shuffle=False, num_workers=0)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


## Model architecture

### Please write a class that implements a CNN classifier. A small example network (ExampleNet, much smaller than AlexNet) is given below:

In [None]:
# For MNIST: 1 x 28 x 28
# For CIFAR10: 3 x 32 x 32

In [15]:
class ExampleNet(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        # fc1 expects 64 * 12 * 12 = 9216 features, which holds for 1 x 28 x 28 inputs (MNIST);
        # you may need to change this number when given an input of different dimensions (e.g., CIFAR10)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, out_channels)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
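
One way to find the input size of `fc1` for a given image size is to pass a dummy tensor through the convolutional part and read off the flattened length. The sketch below rebuilds the same conv/pool stack as ExampleNet in an `nn.Sequential`; the helper name `flattened_size` is hypothetical.

```python
import torch
import torch.nn as nn

def flattened_size(in_channels, height, width):
    """Number of features left after two 3x3 convolutions and a 2x2 max pool."""
    features = nn.Sequential(
        nn.Conv2d(in_channels, 32, 3, 1),
        nn.ReLU(),
        nn.Conv2d(32, 64, 3, 1),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )
    # Only the shape matters, so a single all-zero image is enough
    with torch.no_grad():
        out = features(torch.zeros(1, in_channels, height, width))
    return out.flatten(1).shape[1]

mnist_features = flattened_size(1, 28, 28)
cifar_features = flattened_size(3, 32, 32)
print("MNIST (1 x 28 x 28):", mnist_features)
print("CIFAR10 (3 x 32 x 32):", cifar_features)
```

For MNIST this gives 64 x 12 x 12 = 9216, the value used in `fc1`; for CIFAR10 it gives 64 x 14 x 14 = 12544, so `fc1` would need that input size instead.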


## Define train() and test()

In [6]:
def train(model, device, train_loader, criterion, optimizer, epoch):
    train_loss = 0
    model.train()
    # Report the running loss twice per epoch; max() avoids a zero modulus for very short loaders
    log_interval = max(1, len(train_loader) // 2)
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        if batch_idx % log_interval == 0:
            print('Train({})[{:.0f}%]: Loss: {:.4f}'.format(
                epoch, 100. * batch_idx / len(train_loader), train_loss/(batch_idx+1)))

def test(model, device, test_loader, criterion, epoch):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # criterion returns the batch mean (default reduction), so weight it by the batch size
            test_loss += criterion(output, target).item() * data.size(0)
            pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    # Average loss per sample over the whole test set (no dependence on a global batch_size)
    test_loss /= len(test_loader.dataset)
    print('Test({}): Loss: {:.4f}, Accuracy: {:.4f}%'.format(
        epoch, test_loss, 100. * correct / len(test_loader.dataset)))
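
Since ExampleNet returns log-probabilities (`F.log_softmax`), it may help to see how the common classification losses relate. The sketch below, on random logits chosen for illustration, checks that cross-entropy on raw logits equals negative log-likelihood on log-probabilities, and that applying cross-entropy to log-probabilities gives the same value because a second `log_softmax` leaves them unchanged.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)             # 4 samples, 10 classes (illustrative)
target = torch.tensor([1, 0, 3, 9])

log_probs = F.log_softmax(logits, dim=1)

ce_on_logits = F.cross_entropy(logits, target)        # log_softmax + NLL in one call
nll_on_log_probs = F.nll_loss(log_probs, target)      # NLL on log-probabilities
ce_on_log_probs = F.cross_entropy(log_probs, target)  # log_softmax applied twice

print(ce_on_logits.item(), nll_on_log_probs.item(), ce_on_log_probs.item())
```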

## Define make_optimizer() and make_scheduler()

In [7]:
def make_optimizer(optimizer_name, model, **kwargs):
    if optimizer_name == 'Adam':
        optimizer = optim.Adam(model.parameters(), lr=kwargs['lr'])
    elif optimizer_name == 'SGD':
        optimizer = optim.SGD(model.parameters(), lr=kwargs['lr'], momentum=kwargs['momentum'],
                              weight_decay=kwargs['weight_decay'])
    else:
        raise ValueError('Not valid optimizer name')
    return optimizer

def make_scheduler(scheduler_name, optimizer, **kwargs):
    if scheduler_name == 'MultiStepLR':
        # The learning rate is multiplied by `factor` each time a milestone epoch is reached
        scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=kwargs['milestones'], gamma=kwargs['factor'])
    else:
        raise ValueError('Not valid scheduler name')
    return scheduler
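
To see what `MultiStepLR` does to the learning rate, the sketch below steps a scheduler on a single dummy parameter (an assumption, so no model or data is needed) and records the learning rate after each epoch. The settings `lr=0.05`, `milestones=[5]` and `gamma=0.1` are example values.

```python
import torch
import torch.optim as optim

# A single dummy parameter stands in for a model
param = torch.nn.Parameter(torch.zeros(1))
optimizer = optim.SGD([param], lr=0.05)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[5], gamma=0.1)

lrs = []
for epoch in range(1, 8):
    optimizer.step()      # training for this epoch would happen here
    scheduler.step()      # advance the schedule once per epoch
    lrs.append(optimizer.param_groups[0]['lr'])
    print('After epoch {}: lr = {:.4f}'.format(epoch, lrs[-1]))
```

The learning rate stays at its initial value through the first four epochs and is multiplied by `gamma` once the scheduler has been stepped five times.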

## Define main()

In [19]:
seed = 1
data_name = 'mnist'
optimizer_name = 'SGD'
scheduler_name = 'MultiStepLR'
num_epochs = 5
lr = 0.05
# Use the GPU when one is available; otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
in_channels = 1 if data_name == 'mnist' else 3
out_channels = 10
model = ExampleNet(in_channels, out_channels).to(device)
# ExampleNet already returns log-probabilities; CrossEntropyLoss applies log_softmax once more,
# which leaves log-probabilities unchanged, so the loss equals what nn.NLLLoss() would give
criterion = nn.CrossEntropyLoss()
optimizer = make_optimizer(optimizer_name, model, lr=lr, momentum=0, weight_decay=0)
scheduler = make_scheduler(scheduler_name, optimizer, milestones=[5], factor=0.1)
for epoch in range(1, num_epochs + 1):
    train(model, device, train_loader[data_name], criterion, optimizer, epoch)
    test(model, device, test_loader[data_name], criterion, epoch)
    scheduler.step()
    print('Optimizer Learning rate: {0:.4f}'.format(optimizer.param_groups[0]['lr']))

Train(1)[0%]: Loss: 2.2987
Train(1)[50%]: Loss: 0.5680
Test(1): Loss: 0.0954, Accuracy: 96.9600%
Optimizer Learning rate: 0.0500
Train(2)[0%]: Loss: 0.1442
Train(2)[50%]: Loss: 0.1510
Test(2): Loss: 0.0552, Accuracy: 98.2300%
Optimizer Learning rate: 0.0500
Train(3)[0%]: Loss: 0.0336
Train(3)[50%]: Loss: 0.0988
Test(3): Loss: 0.0457, Accuracy: 98.5500%
Optimizer Learning rate: 0.0500
Train(4)[0%]: Loss: 0.2075
Train(4)[50%]: Loss: 0.0797
Test(4): Loss: 0.0395, Accuracy: 98.6700%
Optimizer Learning rate: 0.0500
Train(5)[0%]: Loss: 0.1092
Train(5)[50%]: Loss: 0.0658
Test(5): Loss: 0.0363, Accuracy: 98.8200%
Optimizer Learning rate: 0.0050


## Major discriminative tasks in Computer Vision
![Major tasks in Computer Vision](https://www.esri.com/about/newsroom/wp-content/uploads/2019/05/computervisionusecases-center.jpg)
<div style="text-align: right"> Image from <a href="https://hackmd.io/@arkel23/computervision2021_starter">web</a></div>