<a href="https://colab.research.google.com/github/woncoh1/era1/blob/main/S6/P2/A6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Summary
Summary of problem, goals, methods, and results

## Problem
- Type: classification
- Task: handwritten digit recognition
- Data: MNIST dataset

## Goals
- Test accuracy > 99.4 %
- Number of parameters < 20,000
- Number of epochs < 20
- Use batch normalization and dropout
- Optionally use a fully connected layer or GAP (global average pooling)

## Methods
- Data augmentation (image transformation)
- Model architecture: fully convolutional neural network
- Loss function: cross entropy (= `nn.LogSoftmax` as the last model layer + `F.nll_loss` as the criterion)
- Weight optimizer: stochastic gradient descent with momentum (`optim.SGD`)
- Learning rate scheduler: `optim.lr_scheduler.StepLR`
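
The loss entry above relies on cross entropy being the same as log-softmax followed by negative log-likelihood. The following is a minimal pure-Python sketch of that identity for a single sample; the helper names and the `logits` values are made up for illustration and are not part of the notebook.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax of a list of raw class scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]

def cross_entropy(logits, target):
    """Cross entropy computed directly from the softmax probability."""
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[target] / sum(exps))

logits = [2.0, -1.0, 0.5]  # hypothetical raw scores for 3 classes
target = 0

two_step = nll_loss(log_softmax(logits), target)  # what the notebook does
one_step = cross_entropy(logits, target)          # the textbook definition
print(f"two-step: {two_step:.6f}, one-step: {one_step:.6f}")
```

Both numbers agree up to floating-point error, which is why the model can end in a log-softmax layer while the training loop uses an NLL criterion.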

## Results
- Test accuracy
    - Final: 99.57 % 
    - Maximum: 99.60 %
- Number of parameters: 9,768
- Number of epochs: 19
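
The accuracy figures above can be re-derived from the per-epoch test accuracies printed in the training log later in this notebook. The sketch below copies those 19 values by hand; the variable names are illustrative only.

```python
# Test accuracy (%) after each epoch, copied from the training log
test_acc = [
    98.46, 98.95, 99.02, 99.20, 99.25, 98.94, 99.28, 99.26, 99.16, 99.31,
    99.46, 99.49, 99.57, 99.57, 99.57, 99.57, 99.57, 99.60, 99.57,
]
GOAL = 99.4  # target from the Goals section

final_acc = test_acc[-1]
best_acc = max(test_acc)
best_epoch = test_acc.index(best_acc) + 1  # epochs are 1-indexed
first_above_goal = next(
    epoch for epoch, acc in enumerate(test_acc, start=1) if acc > GOAL
)

print(f"final = {final_acc:.2f} %, best = {best_acc:.2f} % (epoch {best_epoch})")
print(f"goal of {GOAL} % first exceeded at epoch {first_above_goal}")
```

In this run the goal is first exceeded right after the learning rate drops, and every later epoch stays above it.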

In [None]:
import toolz
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torchsummary import summary
from tqdm.auto import tqdm

In [None]:
torch.manual_seed(1)

<torch._C.Generator at 0x7f97b80f7bf0>

# Device

## Colab
Runtime
- Hardware accelerator: GPU
- GPU type: T4
- Runtime shape: Standard

## PyTorch

In [None]:
use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

# Data
- Transforms
- Batch size

## Transform
https://pytorch.org/vision/stable/auto_examples/plot_transforms.html

In [None]:
transform = {
    'train': transforms.Compose([
        transforms.RandomApply([transforms.CenterCrop(22)], p=0.1),
        transforms.Resize((28, 28)),
        transforms.RandomRotation((-15., 15.)),
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
    ]),
    'test': transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
    ]),
}
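
As a rough illustration of what `transforms.Normalize((0.1307,), (0.3081,))` does to each pixel after `ToTensor` has scaled it to [0, 1], here is a pure-Python sketch of the same arithmetic. The `normalize` helper is hypothetical and only mirrors the formula `(x - mean) / std`.

```python
MEAN, STD = 0.1307, 0.3081  # values passed to Normalize in the cell above

def normalize(pixel: float) -> float:
    """Standardize one pixel value that ToTensor has scaled to [0, 1]."""
    return (pixel - MEAN) / STD

black = normalize(0.0)  # background pixel
white = normalize(1.0)  # fully inked pixel
print(f"black -> {black:.4f}, white -> {white:.4f}")
```

Because most MNIST pixels are background, the mean is small, so background pixels land slightly below zero and ink pixels become clearly positive.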

## Dataset

In [None]:
dataset = {
    'train': datasets.MNIST(
        '../data',
        train=True,
        download=True,
        transform=transform['train'],
    ),
    'test': datasets.MNIST(
        '../data',
        train=False,
        download=False,
        transform=transform['test'],
    ),
}

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 100973190.64it/s]


Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 22092958.93it/s]


Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 26539358.37it/s]


Extracting ../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 23403598.00it/s]


Extracting ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw



## DataLoader

In [None]:
params_dataloader = {
    'batch_size': 128,
    'shuffle': True,
}
params_dataloader |= {
    'num_workers': 0,
    'pin_memory': True,
} if use_cuda else {}

In [None]:
dataloader = {
    'train': torch.utils.data.DataLoader(
        dataset['train'],
        **params_dataloader
    ),
    'test': torch.utils.data.DataLoader(
        dataset['test'],
        **params_dataloader
    ),
}
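
With a batch size of 128 and the default `drop_last=False`, the number of batches per epoch follows from a ceiling division. A small sketch, assuming the standard MNIST split sizes of 60,000 training and 10,000 test images; `num_batches` is an illustrative helper, not a PyTorch function.

```python
import math

def num_batches(num_samples: int, batch_size: int) -> int:
    """Batches per epoch when the final partial batch is kept."""
    return math.ceil(num_samples / batch_size)

BATCH_SIZE = 128
train_batches = num_batches(60_000, BATCH_SIZE)
test_batches = num_batches(10_000, BATCH_SIZE)

# Size of the final, smaller training batch
last_train_batch = 60_000 - (train_batches - 1) * BATCH_SIZE

print(train_batches, test_batches, last_train_batch)
```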

# Model
- MaxPooling: once or twice?
- Padding: 0 or 1 (replicate)?
- How many conv layers?
- GAP: 1x1 conv-GAP or GAP-FC?
- Number of channels for each layer
- Dropout rate

## Define

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        
        def conv( # Convolution layer
            i: int, # in_channels
            o: int, # out_channels
            p: int = 0, # padding
        ) -> nn.Sequential:
            return nn.Sequential(
                # 3x3 convolution to extract features
                nn.Conv2d(
                    i, o, 3,
                    stride=1,
                    padding=p, padding_mode='replicate',
                    bias=False,
                ),
                nn.ReLU(),
                nn.BatchNorm2d(o),
                nn.Dropout2d(0),
            )
        
        def tran( # Transition layer
            i: int, # in_channels
            o: int, # out_channels
        ) -> nn.Sequential:
            return nn.Sequential(
                # Max pooling to halve the spatial size of the feature maps
                nn.MaxPool2d(2, stride=2),
                # 1x1 convolution to reduce the number of channels
                nn.Conv2d(i, o, 1, stride=1, padding=0, bias=False),
                nn.ReLU(),
                nn.BatchNorm2d(o),
                nn.Dropout2d(0),
            )
        
        def last( # 1x1 conv + GAP + log-softmax layer
            i: int, # in_channels
            o: int, # out_channels
            s: int, # kernel_size
        ) -> nn.Sequential:
            return nn.Sequential(
                # [-1, i, s, s]
                nn.Conv2d(i, o, 1, stride=1, padding=0, bias=False),
                # [-1, o, s, s]
                nn.AvgPool2d(s),
                # [-1, o, 1, 1]
                nn.Flatten(),
                # [-1, o]
                nn.LogSoftmax(dim=1),
            )

        self.conv1 = conv(1, 8) # n=26, r=3, j=1
        self.conv2 = conv(8, 16) # n=24, r=5, j=1
        self.conv3 = conv(16, 16) # n=22, r=7, j=1
        self.tran1 = tran(16, 8) # n=11, r=8, j=2
        self.conv4 = conv(8, 16) # n=9, r=12, j=2
        self.conv5 = conv(16, 16) # n=7, r=16, j=2
        self.conv6 = conv(16, 16) # n=5, r=20, j=2
        self.tran2 = last(16, 10, 5) # n=1, r=28, j=2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return toolz.pipe(x,
            # Block 1: edges and gradients
            self.conv1,
            self.conv2,
            self.conv3,
            self.tran1,
            # Block 2: textures and patterns
            self.conv4,
            self.conv5,
            self.conv6,
            self.tran2,
        )
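
The `n`, `r`, `j` comments next to each layer (output size, receptive field, jump) can be reproduced with the usual recurrences `n_out = (n + 2p - k) // s + 1`, `r_out = r + (k - 1) * j`, `j_out = j * s`. The sketch below is an assumption-labelled helper, not part of the model: the 1x1 convolution inside `tran` is omitted because it changes none of the three values, and the final average pooling is modelled with stride 1 so that `j` stays at 2 as in the comment (the stride no longer matters once the output is 1x1).

```python
def trace(layers, n=28, r=1, j=1):
    """Return (name, n, r, j) after each (name, kernel, stride, padding) layer."""
    rows = []
    for name, k, s, p in layers:
        n = (n + 2 * p - k) // s + 1  # output feature-map size
        r = r + (k - 1) * j           # receptive field grows by the input jump
        j = j * s                     # jump accumulates the strides
        rows.append((name, n, r, j))
    return rows

layers = [
    ("conv1", 3, 1, 0),
    ("conv2", 3, 1, 0),
    ("conv3", 3, 1, 0),
    ("tran1", 2, 2, 0),  # 2x2 max pooling with stride 2
    ("conv4", 3, 1, 0),
    ("conv5", 3, 1, 0),
    ("conv6", 3, 1, 0),
    ("tran2", 5, 1, 0),  # 5x5 average pooling over the whole map
]

rows = trace(layers)
for name, n, r, j in rows:
    print(f"{name}: n={n}, r={r}, j={j}")
```

The final receptive field equals the 28x28 input size, so each class score can depend on every pixel of the image.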

## Build

In [None]:
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 8, 26, 26]              72
              ReLU-2            [-1, 8, 26, 26]               0
       BatchNorm2d-3            [-1, 8, 26, 26]              16
         Dropout2d-4            [-1, 8, 26, 26]               0
            Conv2d-5           [-1, 16, 24, 24]           1,152
              ReLU-6           [-1, 16, 24, 24]               0
       BatchNorm2d-7           [-1, 16, 24, 24]              32
         Dropout2d-8           [-1, 16, 24, 24]               0
            Conv2d-9           [-1, 16, 22, 22]           2,304
             ReLU-10           [-1, 16, 22, 22]               0
      BatchNorm2d-11           [-1, 16, 22, 22]              32
        Dropout2d-12           [-1, 16, 22, 22]               0
        MaxPool2d-13           [-1, 16, 11, 11]               0
           Conv2d-14            [-1, 8,
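
The parameter total of 9,768 reported in the Results can be cross-checked by hand from the layer definitions. A minimal sketch, assuming bias-free convolutions (`bias=False`, as in `Net`) and two learnable parameters per channel in each `BatchNorm2d`; the helper names are illustrative.

```python
def conv_params(i: int, o: int, k: int) -> int:
    """Weights of a k x k convolution without bias."""
    return i * o * k * k

def bn_params(c: int) -> int:
    """Learnable scale and shift of BatchNorm2d."""
    return 2 * c

blocks = {
    "conv1": conv_params(1, 8, 3) + bn_params(8),
    "conv2": conv_params(8, 16, 3) + bn_params(16),
    "conv3": conv_params(16, 16, 3) + bn_params(16),
    "tran1": conv_params(16, 8, 1) + bn_params(8),
    "conv4": conv_params(8, 16, 3) + bn_params(16),
    "conv5": conv_params(16, 16, 3) + bn_params(16),
    "conv6": conv_params(16, 16, 3) + bn_params(16),
    "tran2": conv_params(16, 10, 1),  # final 1x1 conv, no batch norm
}
total = sum(blocks.values())

for name, count in blocks.items():
    print(f"{name}: {count:,}")
print(f"total: {total:,}")
```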

# Train

## Define

In [None]:
def count_correct_predictions(pPrediction, pLabels):
    return pPrediction.argmax(dim=1).eq(pLabels).sum().item()

def train(device, train_loader, model, criterion, optimizer, scheduler, epoch):
    model.train()
    pbar = tqdm(train_loader)
    correct = 0
    processed = 0
    for batch_idx, (data, target) in enumerate(pbar):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        pred = model(data)
        loss = criterion(pred, target)
        loss.backward()
        optimizer.step()
        correct += count_correct_predictions(pred, target)
        processed += len(data)
        pbar.set_description(desc=(
            f"Epoch = {epoch}, "
            f"Batch = {batch_idx}, "
            f"Loss = {loss.item():0.4f}, "
            f"Accuracy = {correct/processed:0.2%}"
        ))
    print(
        f"Train: "
        f"Loss = {loss.item():0.4f}, "
        f"Accuracy = {correct/processed:0.2%}, "
        f"Epoch = {epoch}"
    )

def test(device, test_loader, model, criterion):
    model.eval()
    loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            pred = model(data)
            loss += criterion(pred, target, reduction='sum').item()
            correct += count_correct_predictions(pred, target)
    n = len(test_loader.dataset)
    loss /= n
    accuracy = correct / n
    print(
        f"Test : "
        f"Loss = {loss:.4f}, "
        f"Accuracy = {accuracy:.2%}\n"
    )
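
`test` sums the loss over each batch (`reduction='sum'`) and divides by the dataset size at the end. The sketch below, using made-up per-sample losses, shows why this is preferable to averaging per-batch means when the last batch is smaller than the others.

```python
def batches(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]

# Hypothetical per-sample losses; 10 samples with batch size 4 give sizes 4, 4, 2
per_sample_loss = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 3.0]
chunks = batches(per_sample_loss, 4)

# Sum per batch, divide once by the number of samples (as in `test`)
exact = sum(sum(c) for c in chunks) / len(per_sample_loss)

# Mean per batch, then mean of means: over-weights the small last batch
mean_of_means = sum(sum(c) / len(c) for c in chunks) / len(chunks)

print(f"exact = {exact:.4f}, mean of batch means = {mean_of_means:.4f}")
```

The two results differ whenever batch sizes are unequal, which is the case for the 10,000-image test set with a batch size of 128.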

## Train

In [None]:
params_trainer = {
    'num_epochs': 19,
}
params_optimizer = {
    'lr': 0.25,
    'momentum': 0.9
}
params_scheduler = {
    'step_size': 10,
    'gamma': 0.1,
    'verbose': True,
}

criterion = F.nll_loss
optimizer = optim.SGD(model.parameters(), **params_optimizer)
scheduler = optim.lr_scheduler.StepLR(optimizer, **params_scheduler)

for epoch in range(1, params_trainer['num_epochs']+1):
    train(
        device,
        dataloader['train'],
        model, criterion, optimizer, scheduler, epoch,
    )
    test(device, dataloader['test'], model, criterion)
    scheduler.step()

Adjusting learning rate of group 0 to 2.5000e-01.



Train: Loss = 0.0358, Accuracy = 94.01%, Epoch = 1
Test : Loss = 0.0471, Accuracy = 98.46%

Adjusting learning rate of group 0 to 2.5000e-01.



Train: Loss = 0.0679, Accuracy = 97.78%, Epoch = 2
Test : Loss = 0.0330, Accuracy = 98.95%

Adjusting learning rate of group 0 to 2.5000e-01.



Train: Loss = 0.0104, Accuracy = 98.25%, Epoch = 3
Test : Loss = 0.0313, Accuracy = 99.02%

Adjusting learning rate of group 0 to 2.5000e-01.



Train: Loss = 0.0577, Accuracy = 98.38%, Epoch = 4
Test : Loss = 0.0232, Accuracy = 99.20%

Adjusting learning rate of group 0 to 2.5000e-01.



Train: Loss = 0.0312, Accuracy = 98.58%, Epoch = 5
Test : Loss = 0.0243, Accuracy = 99.25%

Adjusting learning rate of group 0 to 2.5000e-01.



Train: Loss = 0.0097, Accuracy = 98.55%, Epoch = 6
Test : Loss = 0.0309, Accuracy = 98.94%

Adjusting learning rate of group 0 to 2.5000e-01.



Train: Loss = 0.0276, Accuracy = 98.69%, Epoch = 7
Test : Loss = 0.0239, Accuracy = 99.28%

Adjusting learning rate of group 0 to 2.5000e-01.



Train: Loss = 0.0669, Accuracy = 98.70%, Epoch = 8
Test : Loss = 0.0226, Accuracy = 99.26%

Adjusting learning rate of group 0 to 2.5000e-01.



Train: Loss = 0.1256, Accuracy = 98.81%, Epoch = 9
Test : Loss = 0.0263, Accuracy = 99.16%

Adjusting learning rate of group 0 to 2.5000e-01.



Train: Loss = 0.0145, Accuracy = 98.84%, Epoch = 10
Test : Loss = 0.0213, Accuracy = 99.31%

Adjusting learning rate of group 0 to 2.5000e-02.



Train: Loss = 0.0290, Accuracy = 99.17%, Epoch = 11
Test : Loss = 0.0160, Accuracy = 99.46%

Adjusting learning rate of group 0 to 2.5000e-02.



Train: Loss = 0.0135, Accuracy = 99.22%, Epoch = 12
Test : Loss = 0.0155, Accuracy = 99.49%

Adjusting learning rate of group 0 to 2.5000e-02.



Train: Loss = 0.0843, Accuracy = 99.25%, Epoch = 13
Test : Loss = 0.0151, Accuracy = 99.57%

Adjusting learning rate of group 0 to 2.5000e-02.



Train: Loss = 0.0030, Accuracy = 99.24%, Epoch = 14
Test : Loss = 0.0150, Accuracy = 99.57%

Adjusting learning rate of group 0 to 2.5000e-02.



Train: Loss = 0.0585, Accuracy = 99.25%, Epoch = 15
Test : Loss = 0.0154, Accuracy = 99.57%

Adjusting learning rate of group 0 to 2.5000e-02.



Train: Loss = 0.0255, Accuracy = 99.30%, Epoch = 16
Test : Loss = 0.0151, Accuracy = 99.57%

Adjusting learning rate of group 0 to 2.5000e-02.



Train: Loss = 0.0456, Accuracy = 99.26%, Epoch = 17
Test : Loss = 0.0148, Accuracy = 99.57%

Adjusting learning rate of group 0 to 2.5000e-02.



Train: Loss = 0.0024, Accuracy = 99.30%, Epoch = 18
Test : Loss = 0.0142, Accuracy = 99.60%

Adjusting learning rate of group 0 to 2.5000e-02.



Train: Loss = 0.0079, Accuracy = 99.31%, Epoch = 19
Test : Loss = 0.0146, Accuracy = 99.57%

Adjusting learning rate of group 0 to 2.5000e-02.
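
The learning-rate messages in the log follow the step-decay rule of `StepLR`: the rate is multiplied by `gamma` once every `step_size` scheduler steps. A pure-Python sketch of that rule with this notebook's settings (`lr=0.25`, `step_size=10`, `gamma=0.1`) and one scheduler step per epoch; `step_lr` is an illustrative helper, not the PyTorch implementation.

```python
def step_lr(
    epoch: int,
    base_lr: float = 0.25,
    step_size: int = 10,
    gamma: float = 0.1,
) -> float:
    """Learning rate in effect during the given 1-indexed epoch."""
    completed_steps = epoch - 1  # scheduler.step() runs at the end of each epoch
    return base_lr * gamma ** (completed_steps // step_size)

schedule = {epoch: step_lr(epoch) for epoch in range(1, 20)}
for epoch, lr in schedule.items():
    print(f"epoch {epoch:2d}: lr = {lr:.4e}")
```

Epochs 1 to 10 train at 0.25 and epochs 11 to 19 at 0.025, which lines up with the jump in test accuracy from 99.31 % to 99.46 % between epochs 10 and 11.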


# Evaluate

In [None]:
# Training loss and accuracy curves

In [None]:
# Example predictions