# Week 2, Day 3 (Guided Project with Pytorch on Image Classification)
> Welcome to the third day of Week 2 of the McE-51069 course. We will walk you through the basic flow of building a Convolutional Neural Network.

- sticky_rank: 5
- toc: true
- badges: true
- comments: false
- categories: [Pytorch, Convolutional_Neural_Network]

# Assignment

For the Week 2 assignment, please visit this [link](http://colab.research.google.com/github/ytu-cvlab/mce-51069-week-2-day3/blob/main/Assignment_3.ipynb) and copy the notebook to your Google Drive.

After you have finished training the network on the MNIST dataset, please submit the weight file saved in the last cell to this [link].

# Import Necessary Libraries

PyTorch is an open source machine learning framework that can be used from research through to production.

If you would like to study the tutorials from the official [pytorch](https://pytorch.org/) website, please visit [this link](https://pytorch.org/tutorials/). The source code for the entire PyTorch framework can be found [here](https://github.com/pytorch).

> SideNote: PyTorch separates different tasks into different packages. e.g., there is a package called **torchaudio** that focuses on audio alone.


Today we will use `torchvision`, a companion package to torch that focuses on vision tasks. For example, for data augmentation, torchvision provides a module called **transforms**. And if you would like to use transfer learning, torchvision also provides some state-of-the-art pretrained models.

`torch.nn` contains the basic building blocks that we need to construct our model. For example, if we need a **Convolutional Layer**, we can create one with `torch.nn.Conv2d()`.

And if we want a linear layer that computes $y = xW^T + b$, we can use `torch.nn.Linear()`.

If you would like to know more about the `torch.nn` library, please visit this [link](https://pytorch.org/docs/stable/nn.html) for more information. This [post](https://pytorch.org/tutorials/beginner/nn_tutorial.html) also gives a good introduction to the `torch.nn` module.
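As a quick illustration of these two building blocks, the sketch below passes a random batch through one `nn.Conv2d` and one `nn.Linear` layer and checks the output shapes. The layer sizes and the batch size are arbitrary values chosen for this example; they are not the ones used later in this notebook.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions for this sketch).
batch = torch.randn(4, 3, 32, 32)          # 4 RGB images of 32 x 32 pixels

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
feature_map = conv(batch)                   # no padding, stride 1: 32 -> 30
print(feature_map.shape)                    # torch.Size([4, 8, 30, 30])

linear = nn.Linear(in_features=20, out_features=5)
x = torch.randn(4, 20)
y = linear(x)
print(y.shape)                              # torch.Size([4, 5])

# nn.Linear stores its weight with shape (out_features, in_features),
# so the layer output can be reproduced by hand as x @ W.T + b.
manual = x @ linear.weight.T + linear.bias
print(torch.allclose(y, manual, atol=1e-5))
```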

In [1]:
import torch
import torchvision
import torchvision.transforms as transforms

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import matplotlib.pyplot as plt
import numpy as np
import cv2

# Main Components

Let's preview what will be built in this notebook.

 1. Dataset 
  - cifar10
 2. Model Architecture
  - Simple Model
 3. Loss (Update model)
  - Cross Entropy
 4. Optimizer (Regularizer)
  - Adam Optimizer
 5. Metrics (Visual for User)
  - Loss, Accuracy
 6. Save Model
  - Model Checkpoint

If you would like to change from CPU to GPU,
select `Runtime --> Change Runtime type` and choose `GPU`.

After selecting it, we can check whether we are running on GPU or CPU with the following function:

In [2]:
# Check if GPU is available
print(torch.cuda.is_available())

True


In [3]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)

cuda:0


If the output shows `cuda:0`, then the `GPU` is in use.

# Dataset

In this notebook, we will use the `CIFAR10` dataset for classification.

[CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) is a public classification dataset that consists of 60,000 32 * 32 color images in 10 classes. There are 50,000 training images and 10,000 testing images.

In the following cell, we set up data augmentation for the dataset. When applying **transforms** to the test dataset, we only convert the images to tensors and *normalize* them, because we want to evaluate on the original, unaugmented test images.

A tensor is much like NumPy's ndarray, except that it can run computations on the GPU and can track gradients, which is what training a neural network requires.

Normalization equation:

$X = \frac{X-\mu}{\sigma}$.
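To make the equation concrete: `ToTensor()` scales pixel values to the range [0, 1], so with a mean of 0.5 and a standard deviation of 0.5 (the values used in this notebook) the normalized pixels land in [-1, 1]. The helper `normalize` below is a hypothetical stand-in written for this sketch; it only mirrors the arithmetic and is not the torchvision implementation.

```python
# Hypothetical helper that applies the normalization equation to one value.
def normalize(x, mean=0.5, std=0.5):
    return (x - mean) / std

# Pixel values after ToTensor() lie in [0, 1].
for pixel in (0.0, 0.25, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
# 0.0 -> -1.0
# 0.25 -> -0.5
# 0.5 -> 0.0
# 1.0 -> 1.0
```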

There are many augmentation methods available. If you would like to know more, please visit this [post](https://pytorch.org/docs/stable/torchvision/transforms.html).

In [None]:
# The class names for CIFAR 10
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']


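# Data Augmentation, Image Normalization
# RandomRotation comes before ToTensor so that it also works on PIL images;
# Normalize gets one mean and one std value per RGB channel.
transform = transforms.Compose([transforms.RandomRotation(30),
                                transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
                                ])

test_transform = transforms.Compose([
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])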
train_dataset = torchvision.datasets.CIFAR10(
        './data', train=True,
        transform=transform, download=True)

test_dataset = torchvision.datasets.CIFAR10(
        './data', train=False,
        transform=test_transform, download=True
)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


Extracting ./data/cifar-10-python.tar.gz to ./data


> Note: PyTorch stores a batch of images in the following format:
`batch_size * dim * height * width (B*D*H*W)`.

`batch_size` is the number of images in one training iteration.

`dim` is the number of image channels. Conventionally, color images have *3* channels and grayscale images have *1*.

The label values for the dataset range from `0` to `9`. e.g., if the label is *7*, the class is **class_names[7] = horse**.
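The layout and the label lookup described above can be sketched as follows. The batch here is a placeholder tensor of zeros, an assumption made only so the example runs without downloading the dataset.

```python
import torch

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# A placeholder batch (assumption): 32 RGB images of 32 x 32 pixels.
batch = torch.zeros(32, 3, 32, 32)
batch_size, channels, height, width = batch.shape
print(batch_size, channels, height, width)   # 32 3 32 32

# matplotlib expects height * width * channels, so move the channel axis last.
image_hwc = batch[0].permute(1, 2, 0)
print(image_hwc.shape)                        # torch.Size([32, 32, 3])

# Labels are integers that index into class_names.
label = torch.tensor(7)
print(class_names[label.item()])              # horse
```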

## Dataset and DataLoader

The dataset plays a huge role in machine learning and deep learning. For computer vision tasks, data augmentation acts as a regularizer and helps the model avoid overfitting. During training, the augmented dataset is fetched by the data loader, whose main duty is to batch (and optionally shuffle) the data before it is fed into the neural network.

A DataLoader object is an iterable, so we access its batches by iterating over it.
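As a small sketch of that iteration, the example below wraps random stand-in data in a `TensorDataset` and loops over a `DataLoader`. The data, the dataset size of 100 and the names `toy_dataset` and `toy_loader` are assumptions for illustration; the real CIFAR10 loaders are created in the next cell.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data (an assumption for this sketch): 100 random "images" and labels.
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))
toy_dataset = TensorDataset(images, labels)

toy_loader = DataLoader(toy_dataset, batch_size=32, shuffle=True)

# len() of a DataLoader is the number of batches: ceil(100 / 32) = 4
print(len(toy_loader))

batch_sizes = []
for batch_imgs, batch_labels in toy_loader:
    batch_sizes.append(len(batch_imgs))
    print(batch_imgs.shape, batch_labels.shape)

# The last batch is smaller because 100 is not a multiple of 32.
print(batch_sizes)                  # [32, 32, 32, 4]
```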

In [None]:
train_loader = torch.utils.data.DataLoader(train_dataset, 32, shuffle=True,
                                           num_workers=2)

# The test set does not need shuffling
test_loader = torch.utils.data.DataLoader(test_dataset, 8, shuffle=False,
                                           num_workers=2)

# Model

Let's build a simple Convolutional Neural Network.

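Before the first linear layer, the output of the last convolution is flattened, so we need to know its size. For an input of width $W$, kernel size $K$, padding $P$ and stride $S$, the output width $O$ of a convolution is: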

$O = \frac{W-K+2P}{S} + 1$
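Applying this equation layer by layer explains the `24*24*128` used in the model: each of the four 3 x 3 convolutions (no padding, stride 1) shrinks the 32 x 32 input by 2 pixels. The helper `conv_output_size` is a hypothetical function written for this sketch; it is not part of PyTorch.

```python
def conv_output_size(w, k, p=0, s=1):
    """Output width O = (W - K + 2P) / S + 1, using integer (floor) division."""
    return (w - k + 2 * p) // s + 1

size = 32                      # CIFAR10 images are 32 x 32
for layer in range(4):         # four 3x3 convolutions, no padding, stride 1
    size = conv_output_size(size, k=3)
    print(f"after conv{layer + 1}: {size} x {size}")
# after conv1: 30 x 30
# after conv2: 28 x 28
# after conv3: 26 x 26
# after conv4: 24 x 24

flat_features = size * size * 128   # 128 output channels in the last conv layer
print(flat_features)                # 73728
```

With a padding of 1, the same 3 x 3 kernel would keep the width unchanged (`conv_output_size(32, 3, p=1)` gives 32), which is why padding is often used when the spatial size should stay the same.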

In [None]:
class SampleModel(nn.Module):
    def __init__(self):
        super(SampleModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, 3)
        self.conv2 = nn.Conv2d(64, 256, 3)
        self.conv3 = nn.Conv2d(256, 256, 3)
        self.conv4 = nn.Conv2d(256, 128, 3)
        self.fc1 = nn.Linear(24*24*128, 512)
        self.fc2 = nn.Linear(512, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = x.view(-1, 24*24*128)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

In [None]:
model = SampleModel()

# Move model to GPU
model.to(device)


# Define Loss
criterion = nn.CrossEntropyLoss()

# Define Optimizer
optimizer = optim.Adam(model.parameters())
# optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Let's Inspect our model.

We can get the weights of the model in two ways: `model.parameters()` and `model.state_dict()`.

`model.parameters()` returns an iterator over the weight tensors. For example, for a Conv2d layer, these are the kernel values and the bias.

`model.state_dict()` is a dictionary object whose keys are the parameter names and whose values are the corresponding tensors from `model.parameters()`.

In [None]:
print(type(model.state_dict()))
print(type(model.parameters()))
print()

for para in model.parameters():
    print(para.size())
    break

for stats in model.state_dict():
    print(f"Key : {stats}, Value : {model.state_dict()[stats].size()}")
    break

In [None]:
# We could also write the above with key manually provided.
first_conv = model.state_dict()['conv1.weight']
print(first_conv.size())

In [None]:
# Helper function for convolution in 3D image
def conv2d_cv2(img, filter):
    h, w, d = img.shape
    filtered = np.zeros_like(img)
    for i in range(3):
        img_ = img[:,:,i]
        filter_ = filter[:,:,i]
        result = cv2.filter2D(img_, -1, filter_)
        filtered[:,:,i] = result

    filtered = filtered.mean(axis=2)
    return filtered

# Helper function
def visualize_filters(model=model):
    '''
    Visualize Convolutional layer (Weight of model).
    '''
    first_conv = model.state_dict()['conv1.weight']
    plt.figure(figsize=(20, 20))
    for i, filter in enumerate(first_conv):
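        # Copy so the in-place rescaling below never touches the model weights
        # (on CPU, .numpy() shares memory with the tensor).
        filter = filter.cpu().numpy().transpose(1, 2, 0).copy()

        # Rescale to [0, 1] for display
        filter -= np.min(filter)
        filter /= np.max(filter)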
        plt.subplot(8, 8, i+1)
        plt.imshow(filter)
        # print(np.max(filter), np.min(filter))
        # break
    plt.show()

# Helper function to visualize first layer output
def visualize_model(model=model, img_ind = 10):
    '''
    Visualize model for better understanding.
    Args:
        model -> model trained or not trained.
    '''
    image = train_dataset[img_ind][0].cpu().numpy().transpose(1, 2, 0)
    first_conv = model.state_dict()['conv1.weight']
    plt.figure(figsize=(20, 20))
    for i, filter in enumerate(first_conv):
        filter = filter.cpu().numpy().transpose(1, 2, 0)
        filtered = conv2d_cv2(image, filter)
        plt.subplot(8, 8, i+1)
        plt.imshow(filtered, cmap='gray')

    plt.show()


visualize_filters(model)
visualize_model(model, img_ind=10)

In [None]:
# Train the model and log its performance
EPOCH = 10

loss_log = []
acc_log = []
x_coor = []

for epoch in range(EPOCH):
    loss_epoch = 0
    total_imgs = 0
    correct_epoch = 0

    for i, data in enumerate(train_loader):
        
        # Get Image data and Label data
        imgs, labels = data[0].to(device), data[1].to(device)

        # Clear out the gradient
        optimizer.zero_grad()


        # Forward Propagation Start
        # Predict from Input : B * 10
        predicts = model(imgs)

        # Calculate Loss from batch : 1
        loss = criterion(predicts, labels)
        # Forward Propagation End


        # Backward Propagation Start
        # Calculate Gradient
        loss.backward()

        # Update Model parameters with optimizer : Adam or SGD
        optimizer.step()
        # Backward Propagation End


        # Add to Epoch Loss
        loss_epoch += loss.item()
        # Total Number of images in one batch
        total_imgs += len(imgs)
        # Count the total number of correct prediction
        correct_batch = (torch.argmax(predicts, 1)==labels).sum().item()
        correct_epoch += correct_batch
        acc_batch = correct_batch/ len(imgs)

        # Log batch metrics for plotting (x = number of images seen so far)
        x_coor.append(total_imgs + (epoch*len(train_dataset)))
        loss_log.append(loss.item())
        acc_log.append(acc_batch)


    acc_epoch = (correct_epoch/total_imgs)*100
    # loss.item() is already averaged over the batch, so average over batches
    loss_epoch = loss_epoch/len(train_loader)
    print(f"EPOCH : {epoch+1}, Acc : {acc_epoch:.2f}, Loss : {loss_epoch:.2f}")

In [None]:
# Let's visualize our model filter again
visualize_filters(model)
visualize_model(model)

# Saving Model

When saving our model, we usually save the model's `state_dict`. The saved file contains the weight values together with the parameter name that identifies each weight.

So, when we reload the model later, `load_state_dict` matches each saved weight to the parameter with the same name, which means the model class must define its layers under the same names.
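The name matching can be seen in a small sketch. `TinyNet` and `RenamedNet` are hypothetical models invented for this example, and an in-memory buffer stands in for the checkpoint file; with the default strict loading, a `state_dict` whose keys do not match the model raises a `RuntimeError`.

```python
import io
import torch
import torch.nn as nn

class TinyNet(nn.Module):            # hypothetical model for this sketch
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

class RenamedNet(nn.Module):         # same layer, different attribute name
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(4, 2)

buffer = io.BytesIO()                # in-memory stand-in for a checkpoint file
torch.save(TinyNet().state_dict(), buffer)
buffer.seek(0)
state = torch.load(buffer)
print(list(state.keys()))            # ['fc.weight', 'fc.bias']

TinyNet().load_state_dict(state)     # keys match, so the weights load

try:
    RenamedNet().load_state_dict(state)
    mismatch_detected = False
except RuntimeError:
    # 'head.weight' / 'head.bias' are missing, 'fc.*' keys are unexpected
    mismatch_detected = True
print(mismatch_detected)
```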

In [None]:
# Checkpoint
ckpt_path = 'checkpoint.pt'

# Save only the state_dict (parameter names and values), not the whole model object
torch.save(model.state_dict(), ckpt_path)

In [None]:
saved_model = SampleModel().to(device)
# map_location lets a checkpoint saved on GPU load on a CPU-only machine
checkpoint = torch.load(ckpt_path, map_location=device)
saved_model.load_state_dict(checkpoint)

In [None]:
# Test Dataset accuracy:
num_total = 0
num_correct = 0

# Switch Dropout and Batch-Normalization layers to inference behaviour.
model.eval()

# Do not store gradient info in forward propagation.
with torch.no_grad():
    for i, data in enumerate(test_loader):
        image, label = data
        image = image.to(device)
        label = label.to(device)

        predict = model(image)
        predict_class = torch.argmax(predict, dim=1)
        correct = (predict_class == label)


        num_correct += correct.sum().item()
        num_total += len(image)
        
print(num_correct, num_total)

In [None]:
print("Accuracy : ", num_correct/num_total)