# Implementing a Convolutional Neural Network (CNN) with MNIST Data

A CNN is composed of the following components:
- Convolutional layer: extracts features from an image
- Max pooling: downsamples the feature maps, keeping the strongest response in each window
- Fully connected network: takes the extracted and reduced features as input and performs the downstream task (here, digit classification).

## Import Library

In [1]:
import torch
import torch.nn as nn # neural network layers and modules
import torch.optim as optim # optimization algorithms
import torch.nn.init as init # weight initialization utilities

import torchvision.datasets as datasets # collection of image datasets
import torchvision.transforms as transforms # image transformation tools

from torch.utils.data import DataLoader # feeds the data to the model in batches

import numpy as np
import matplotlib.pyplot as plt

## Set Hyperparameter

- batch_size: the number of samples in one batch. The dataset is split into batches of this size.
- learning_rate: controls how large each parameter update is. It determines how quickly training moves toward a minimum of the loss.
- num_epoch: the number of epochs. One epoch means the model has seen, and trained on, the whole dataset once, so this sets how many full passes over the dataset are made.
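As a quick sanity check on how these settings interact, the arithmetic below counts parameter updates. It assumes the standard MNIST training split of 60,000 images; `num_train_samples`, `steps_per_epoch`, and `total_steps` are illustrative names, not part of the notebook.

```python
# Illustrative arithmetic only; num_train_samples assumes the standard MNIST training split.
num_train_samples = 60_000
batch_size = 100
num_epoch = 10

# Floor division: an incomplete final batch, if there were one, is not counted.
steps_per_epoch = num_train_samples // batch_size
total_steps = steps_per_epoch * num_epoch

print(steps_per_epoch)  # 600
print(total_steps)      # 6000
```

Under these assumptions, each epoch performs 600 updates, for 6,000 updates over the whole run.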

In [4]:
batch_size = 100
learning_rate = 0.0002
num_epoch = 10

## Load MNIST Data

In [5]:
mnist_train = datasets.MNIST(root="../Data/", train=True, transform=transforms.ToTensor(), target_transform=None, download=True)
mnist_test = datasets.MNIST(root="../Data/", train=False, transform=transforms.ToTensor(), target_transform=None, download=True)



## Define Loaders

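DataLoader loads data in batches for training or testing the model. Given a dataset, it yields one batch at a time according to the settings we choose: batch_size sets the number of samples per batch, shuffle reorders the samples every epoch, num_workers sets the number of subprocesses used for loading, and drop_last discards a final batch that is smaller than batch_size.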

In [6]:
train_loader = DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=2, drop_last=True)
test_loader = DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=2, drop_last=True)
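To see what a loader actually yields, here is a minimal sketch on synthetic data, so nothing has to be downloaded. The `fake_*` names and the dataset size of 250 are assumptions chosen to make the effect of `drop_last` visible; they are not part of the notebook.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 250 random "images" with integer labels 0-9.
fake_images = torch.rand(250, 1, 28, 28)
fake_labels = torch.randint(0, 10, (250,))
fake_dataset = TensorDataset(fake_images, fake_labels)

# drop_last=True discards the final incomplete batch (250 = 2 * 100 + 50).
loader = DataLoader(fake_dataset, batch_size=100, shuffle=True, drop_last=True)

batch_shapes = [(images.shape, labels.shape) for images, labels in loader]
print(len(batch_shapes))  # 2
print(batch_shapes[0])    # (torch.Size([100, 1, 28, 28]), torch.Size([100]))
```

Each iteration returns an image tensor of shape [batch, channels, height, width] together with a 1-D label tensor, which is the layout the model below expects.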

## Define CNN (Base) Model

First, define the CNN class as a subclass of torch.nn.Module.

In [8]:
class CNN(nn.Module):
  def __init__(self):
    super(CNN, self).__init__() #initialize nn.Module

    self.layer = nn.Sequential(
        # extract features from the image using learnable filters
        # [100,1,28,28] -> [100,16,24,24]
        nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5),

        #activation function that introduces non-linearity into the model
        nn.ReLU(),

        # [100,16,24,24]->[100,32,20,20]
        nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5),
        nn.ReLU(),

        #[100,32,20,20]->[100,32,10,10]
        #reduces the spatial dimensions of the feature maps, helping to make the model more robust to small variations in the input
        nn.MaxPool2d(kernel_size=2, stride=2),

        #[100,32,10,10]->[100,64,6,6]
        nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5),
        nn.ReLU(),

        # [100,64,6,6]->[100,64,3,3]
        nn.MaxPool2d(kernel_size=2, stride=2)
    )

    # fully connected layers
    self.fc_layer = nn.Sequential(
        # each neuron is connected to every neuron in the previous layer
        #[100,64*3*3] -> [100,100]
        nn.Linear(64*3*3, 100),
        nn.ReLU(),
        #[100,100] -> [100,10]
        nn.Linear(100,10)
    )

  #defines how the input data (x) flows through the network
  def forward(self,x):
    out = self.layer(x)
    out = out.view(x.size(0), -1) # flatten to [batch, 64*3*3]; x.size(0) avoids relying on the global batch_size
    out = self.fc_layer(out)
    return out
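The shape comments in the model can be checked by hand. With no padding and no dilation, a convolution or pooling layer maps a side length n to floor((n + 2p - k) / s) + 1 for kernel size k, stride s, and padding p. The helper `conv_out` below is a hypothetical name introduced only for this sketch.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Output side length of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28
sizes = [size]
# (kernel_size, stride) for conv5, conv5, pool2, conv5, pool2, as in the CNN class
for kernel_size, stride in [(5, 1), (5, 1), (2, 2), (5, 1), (2, 2)]:
    size = conv_out(size, kernel_size, stride)
    sizes.append(size)

print(sizes)  # [28, 24, 20, 10, 6, 3]

# 64 output channels times the final 3x3 feature map
flattened = 64 * size * size
print(flattened)  # 576
```

The final value, 576, matches the `64*3*3` input size of the first `nn.Linear` layer.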



In [10]:
# GPU or CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [11]:
model = CNN().to(device)

## Loss Function and Optimizer

In [12]:
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
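`nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and integer class indices, which is why the model ends in a plain `nn.Linear` layer with no softmax. The sketch below, using random illustrative inputs, shows that it is equivalent to applying `log_softmax` followed by the negative log-likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw scores for 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # class indices, not one-hot vectors

loss_a = nn.CrossEntropyLoss()(logits, targets)
loss_b = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_a, loss_b))  # True
```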

## Train Model

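From train_loader we get a batch of image and label pairs and send the images to the model. The loss is computed from the model's output and the labels, and the optimizer then updates the model's parameters by gradient descent. Whenever the batch index j is a multiple of 1000, we print the loss and save it to loss_arr. With 60,000 training images and a batch size of 100 there are only 600 batches per epoch, so this happens once per epoch, at j = 0.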

In [13]:
loss_arr = []

for i in range(num_epoch):

  # enumerate(): to get the batch index j and the image,label-pair
  for j,[image, label] in enumerate(train_loader):
    x = image.to(device)
    y = label.to(device)

    # Gradient Calculation and Update
    optimizer.zero_grad() # reset gradients to 0

    # pass the input images (x) through the CNN to get the model's predictions (output);
    # calling model(x) invokes forward() and also runs any registered hooks
    output = model(x)

    loss = loss_func(output, y)

    loss.backward() # calculate the gradients of the loss

    # update the model's parameters based on the calculated gradients to minimize the loss.
    optimizer.step()

    # Logging and Monitoring
    if j % 1000 == 0:
      print(loss)

      # convert the loss tensor to a NumPy array and append it to loss_arr.
      # cpu() moves the tensor to the CPU, detach() removes it from the computation graph, and numpy() converts it to a NumPy array.
      loss_arr.append(loss.cpu().detach().numpy())

tensor(2.3084, grad_fn=<NllLossBackward0>)
tensor(2.2982, grad_fn=<NllLossBackward0>)
tensor(2.2985, grad_fn=<NllLossBackward0>)
tensor(2.3080, grad_fn=<NllLossBackward0>)
tensor(2.3011, grad_fn=<NllLossBackward0>)
tensor(2.2998, grad_fn=<NllLossBackward0>)
tensor(2.3009, grad_fn=<NllLossBackward0>)
tensor(2.3007, grad_fn=<NllLossBackward0>)
tensor(2.2961, grad_fn=<NllLossBackward0>)
tensor(2.2873, grad_fn=<NllLossBackward0>)
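For context when reading the values above, it helps to know what cross-entropy gives for a model that assigns equal probability to every class. The short calculation below is a reference point, not part of the original notebook; `chance_loss` is an illustrative name.

```python
import math

num_classes = 10
# Cross-entropy when every class receives probability 1 / num_classes
chance_loss = -math.log(1.0 / num_classes)
print(round(chance_loss, 4))  # 2.3026
```

The printed losses stay close to this value, which suggests that with plain SGD at this learning rate the predictions are still nearly uniform after 10 epochs.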


## Test Model

In [14]:
correct = 0
total = 0

# Setting the Model to Evaluation Mode
model.eval()

# torch.no_grad(): Disabling Gradient Calculation
# make the code faster and use less memory
with torch.no_grad():
  for image, label in test_loader:
    x = image.to(device)
    y = label.to(device)

    output = model(x)

    # torch.max along dim 1 returns (max value, index of max); the index is the predicted class
    _, output_index = torch.max(output, 1)

    total += label.size(0)
    correct += (output_index == y).sum().float() # count the predictions that match the labels

  print("Accuracy of Test Data: {}".format(100*correct/total))


Accuracy of Test Data: 19.469999313354492
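The accuracy computation can be traced on a tiny example. The scores and labels below are hypothetical values chosen for illustration; only `torch.max` and the comparison mirror the test loop.

```python
import torch

# Hypothetical scores for 3 samples and 4 classes (illustrative values)
output = torch.tensor([[0.1, 2.0, 0.3, 0.0],
                       [1.5, 0.2, 0.1, 0.4],
                       [0.0, 0.1, 0.2, 3.0]])
labels = torch.tensor([1, 2, 3])

# torch.max along dim=1 returns (max values, indices of the max values)
_, predicted = torch.max(output, 1)
correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)

print(predicted.tolist())  # [1, 0, 3]
print(round(accuracy, 2))  # 66.67
```

Two of the three predictions match their labels, giving 66.67%. For comparison, random guessing over 10 balanced classes would score about 10%.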
