
# 1. Rewrite MLP from last week using PyTorch

In [2]:
## All Imports
import torch

from torch import nn
from torch.optim import SGD
from torch.utils.data import DataLoader

from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor, Lambda

In [3]:
# Use Nvidia CUDA if available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using {device} device')

Using cuda device


## Class for the Neural Network with 3 layers and size [784, 30, 10]

In [4]:
class TorchMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.mlp = nn.Sequential(
            nn.Linear(784, 30),
            nn.Sigmoid(),
            nn.Linear(30, 10),
            nn.Sigmoid()
        )

    def forward(self, data: torch.Tensor) -> torch.Tensor:
        data = self.flatten(data)
        logits = self.mlp(data)
        return logits
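
As a quick sanity check on the `[784, 30, 10]` layout, the number of trainable parameters in the two `nn.Linear` layers can be worked out by hand. The sketch below is plain Python and does not touch PyTorch; `count_parameters` and `layer_sizes` are illustrative names, not part of the notebook.

```python
from typing import List


def count_parameters(layer_sizes: List[int]) -> int:
    # Each fully connected layer has (inputs * outputs) weights plus one bias per output
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


layer_sizes = [784, 30, 10]
total = count_parameters(layer_sizes)
print(total)  # 784*30 + 30 + 30*10 + 10 = 23860
```

Nearly all of the parameters sit in the first layer, which is typical for a small MLP on flattened images.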

## Initialize the same hyperparameters as week1 

In [5]:
class Hyperparameters:
    LEARNING_RATE: float = 3
    EPOCHS: int = 10
    MINI_BATCH_SIZE: int = 10

## Write out the training and testing methods

In [6]:
def _train(model: TorchMLP, training_loader: DataLoader, learning_rate: float):
    optimizer = SGD(model.parameters(), learning_rate)
    loss_function = nn.MSELoss()

    # Note: iterating over `.dataset` yields one sample at a time, so every step is a
    # single-sample update and the loader's batch_size and shuffle settings are not used
    for image, expected_output in training_loader.dataset:
        prediction = model(image.to(device))
        loss = loss_function(prediction, expected_output.to(device))

        # Backpropagation steps
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


def _test_accuracy(model: TorchMLP, testing_loader: DataLoader, epoch: int):
    total_size = len(testing_loader.dataset)
    correct_classifications = 0
    # Gradients are not needed for evaluation
    with torch.no_grad():
        for image, expected_output in testing_loader.dataset:
            prediction = model(image.to(device))
            predicted_digit = prediction.argmax().item()
            expected_digit = expected_output.argmax().item()
            if expected_digit == predicted_digit:
                correct_classifications += 1
    print(f'Accuracy on testing data for epoch {epoch} is: {round((correct_classifications / total_size * 100), 2)}%')
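
The loops above iterate over `training_loader.dataset`, which yields one sample at a time, so the `batch_size` passed to the `DataLoader` does not affect the updates. A minimal sketch of iterating the loader itself, so that each step uses a mini-batch, might look like the following. It assumes random tensors as a stand-in for MNIST and targets of shape `(batch, 10)` rather than `(1, 10)`; the names `images`, `targets` and `batch_shapes` are illustrative only.

```python
import torch
from torch import nn
from torch.optim import SGD
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for MNIST: 40 random "images" with one-hot float targets
images = torch.rand(40, 1, 28, 28)
labels = torch.randint(0, 10, (40,))
targets = nn.functional.one_hot(labels, num_classes=10).float()
loader = DataLoader(TensorDataset(images, targets), batch_size=10, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 30), nn.Sigmoid(), nn.Linear(30, 10), nn.Sigmoid())
optimizer = SGD(model.parameters(), lr=3.0)
loss_function = nn.MSELoss()

batch_shapes = []
for batch_images, batch_targets in loader:
    # One forward/backward pass per mini-batch of 10 samples
    prediction = model(batch_images)
    loss = loss_function(prediction, batch_targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    batch_shapes.append(tuple(prediction.shape))

print(batch_shapes)
```

With this loop the loss is averaged over the mini-batch, so results are not directly comparable to the per-sample updates recorded in this notebook.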


## Finally, write out the driver for training and evaluating the network

In [7]:
def train_and_eval_torch_mlp():
    # One-hot encode the integer label into a (1, 10) float tensor to match the model output
    one_hot = Lambda(
        lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1).reshape(1, 10)
    )
    train_data = MNIST(root='mnist_torch_data', train=True, download=True, transform=ToTensor(),
                       target_transform=one_hot)
    test_data = MNIST(root='mnist_torch_data', train=False, download=True, transform=ToTensor(),
                      target_transform=one_hot)

    params = Hyperparameters
    training_loader = DataLoader(train_data, batch_size=params.MINI_BATCH_SIZE, shuffle=True)
    testing_loader = DataLoader(test_data, batch_size=params.MINI_BATCH_SIZE, shuffle=True)

    model = TorchMLP().to(device)
    for epoch in range(params.EPOCHS):
        print(f"Training for epoch: {epoch}")
        _train(model, training_loader, params.LEARNING_RATE)
        _test_accuracy(model, testing_loader, epoch)
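
The `target_transform` in the driver turns an integer label into a one-hot row vector so that it can be compared with the 10 sigmoid outputs under `MSELoss`. The sketch below isolates that expression and compares it with `torch.nn.functional.one_hot`, which is one possible alternative; the helper names `one_hot_scatter` and `one_hot_functional` are hypothetical.

```python
import torch
import torch.nn.functional as F


def one_hot_scatter(y: int) -> torch.Tensor:
    # Same expression as the target_transform used in the driver
    return torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1).reshape(1, 10)


def one_hot_functional(y: int) -> torch.Tensor:
    # Alternative using the built-in helper; one_hot returns int64, so cast to float
    return F.one_hot(torch.tensor(y), num_classes=10).float().reshape(1, 10)


# Both encodings should agree for every digit
for digit in range(10):
    assert torch.equal(one_hot_scatter(digit), one_hot_functional(digit))

print(one_hot_scatter(3))
```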

In [8]:
train_and_eval_torch_mlp()

Training for epoch: 0
Accuracy on testing data for epoch 0 is: 92.51%
Training for epoch: 1
Accuracy on testing data for epoch 1 is: 94.06%
Training for epoch: 2
Accuracy on testing data for epoch 2 is: 94.36%
Training for epoch: 3
Accuracy on testing data for epoch 3 is: 94.59%
Training for epoch: 4
Accuracy on testing data for epoch 4 is: 94.92%
Training for epoch: 5
Accuracy on testing data for epoch 5 is: 95.11%
Training for epoch: 6
Accuracy on testing data for epoch 6 is: 95.01%
Training for epoch: 7
Accuracy on testing data for epoch 7 is: 94.89%
Training for epoch: 8
Accuracy on testing data for epoch 8 is: 95.06%
Training for epoch: 9
Accuracy on testing data for epoch 9 is: 94.85%


# 2. Improve week1 MLP
## Improvements made
- Better weight initialization for the weight matrices
- L2 regularization (adds the `lmda` hyperparameter)
- ReLU activation for the hidden layers
- Softmax for the output layer
- Cross-entropy cost function
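
To illustrate why the weight initialization matters, the sketch below compares the spread of a neuron's weighted input `z` under unscaled `N(0, 1)` weights and under weights scaled by `1 / sqrt(n_in)`. The all-ones input vector, the seed and the number of sampled neurons are assumptions chosen only for this demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in = 784         # inputs per neuron, as in the first layer of the network
n_neurons = 2000   # number of sampled neurons used to estimate the spread
x = np.ones((n_in, 1))  # assumed input: every pixel fully on

# Unscaled weights: z sums 784 unit-variance terms, so std(z) is about sqrt(784) = 28
z_unscaled = rng.standard_normal((n_neurons, n_in)) @ x

# Scaled weights: dividing by sqrt(n_in) keeps std(z) close to 1
z_scaled = (rng.standard_normal((n_neurons, n_in)) / np.sqrt(n_in)) @ x

print(f"std without scaling: {z_unscaled.std():.2f}")
print(f"std with scaling:    {z_scaled.std():.2f}")
```

A smaller spread of `z` keeps the early activations in a range where gradients are informative, which tends to make the first epochs of training more productive.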

Changes made to the previous week's methods are highlighted using comments

The hyperparameters are also updated to produce a deeper network

In [1]:
!pip install gdown==3.13.0



## Init all imports

In [2]:
import struct
import gzip
import shutil
from typing import Tuple, List

# Third-party imports
import numpy as np
import gdown



## Use same week1 MNIST dataloader

In [3]:
class MNISTDataLoader:
    # explanation of idx file formats: http://yann.lecun.com/exdb/mnist/
    # help wrt parsing data: https://stackoverflow.com/a/53181925

    TRAINING_DATA_URL: str = 'https://drive.google.com/uc?id=1pmI9wAdNtJkOvkJpdTqM9bmIAwPkGyMU'
    TRAINING_DATA_LABELS_URL: str = 'https://drive.google.com/uc?id=1R8BZL67U1N0GUGnf6AQIBZNVDCWO9QLS'
    TESTING_DATA_URL: str = 'https://drive.google.com/uc?id=10FdcUHw3BcQAU6keKaUwtDwJm4sC00Hu'
    TESTING_DATA_LABELS_URL: str = 'https://drive.google.com/uc?id=1GvsacEnI1eQ1vYZM-oYdERvaE2SPh0Lj'

    def load_data_wrapper(self):
        testing_data_tuple = self.load_data_as_ndarray(self.TESTING_DATA_URL, self.TESTING_DATA_LABELS_URL, False)
        training_data_tuple = self.load_data_as_ndarray(self.TRAINING_DATA_URL, self.TRAINING_DATA_LABELS_URL, True)
        return training_data_tuple, testing_data_tuple

    def load_data_as_ndarray(self, data_file_url: str, data_labels_file_url: str, train: bool) -> List[
        Tuple[np.ndarray, int]]:
        uncompressed_dataset = self._download_and_uncompressed_file(data_file_url)
        uncompressed_labels = self._download_and_uncompressed_file(data_labels_file_url)
        pixel_data = self._get_pixel_data(uncompressed_dataset)
        label_data = self._get_labels(uncompressed_labels)
        zipped_data = [
            (x.reshape(784, 1), self._one_hot_enc(y) if train else y[0]) for x, y in zip(pixel_data, label_data)
        ]
        return zipped_data

    def _one_hot_enc(self, y: np.ndarray):
        one_hot_vector = np.zeros((10, 1))
        one_hot_vector[y[0]][0] = 1
        return one_hot_vector

    def _download_and_uncompressed_file(self, url: str) -> str:
        downloaded_gzip = gdown.download(url, quiet=True)
        decompressed_data_file = self._write_decompressed_data(downloaded_gzip)
        return decompressed_data_file

    def _write_decompressed_data(self, downloaded_gzip: str) -> str:
        with gzip.open(downloaded_gzip, 'rb') as compressed:
            uncompressed_dataset = downloaded_gzip.replace('.gz', '')
            with open(uncompressed_dataset, 'wb') as decompressed:
                shutil.copyfileobj(compressed, decompressed)
        return uncompressed_dataset

    def _get_pixel_data(self, data_file: str) -> np.ndarray:
        with open(data_file, "rb") as dataset:
            _, num_data = struct.unpack(">II", dataset.read(8))
            num_rows, num_colums = struct.unpack(">II", dataset.read(8))
            pixel_data = np.fromfile(dataset, dtype=np.uint8) / 255
            pixel_data = pixel_data.reshape((num_data, num_rows * num_colums))
        return pixel_data

    def _get_labels(self, data_labels_file: str) -> np.ndarray:
        with open(data_labels_file, "rb") as labels:
            _, num_data = struct.unpack(">II", labels.read(8))
            label_data = np.fromfile(labels, dtype=np.uint8)
            label_data = label_data.reshape((num_data, -1))
        return label_data
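
The IDX layout parsed by `_get_pixel_data` is a big-endian header followed by raw `uint8` bytes. A minimal sketch on an in-memory buffer, assuming a made-up file with 2 images of 2x3 pixels rather than the real MNIST download (2051 is the magic number of the MNIST image files):

```python
import io
import struct

import numpy as np

# Build a tiny fake IDX image file: magic, image count, rows, columns, then pixel bytes
num_images, num_rows, num_cols = 2, 2, 3
pixels = bytes(range(0, 240, 20))  # 12 pixel values: 0, 20, ..., 220
raw = struct.pack(">IIII", 2051, num_images, num_rows, num_cols) + pixels

with io.BytesIO(raw) as dataset:
    # ">II" reads two big-endian unsigned 32-bit integers (8 bytes)
    magic, count = struct.unpack(">II", dataset.read(8))
    rows, cols = struct.unpack(">II", dataset.read(8))
    # The rest of the buffer is pixel data; dividing by 255 scales it to [0, 1]
    data = np.frombuffer(dataset.read(), dtype=np.uint8) / 255
    data = data.reshape((count, rows * cols))

print(magic, count, rows, cols)
print(data.shape)
```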

## Use ReLU and softmax instead of sigmoid as activation functions

In [4]:
class NetworkUtils:
    # Replace sigmoid with relu for hidden and softmax for output layer
    @staticmethod
    def relu(z: np.ndarray) -> np.ndarray:
        return np.maximum(z, 0.0)

    @staticmethod
    def relu_prime(z: np.ndarray) -> np.ndarray:
        return (z > 0.0) * 1

    @staticmethod
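    def softmax(z: np.ndarray) -> np.ndarray:
        # Subtract the max before exponentiating to avoid overflow; the result is unchanged
        exp_z = np.exp(z - np.max(z))
        return exp_z / np.sum(exp_z)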
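
Exponentiating large scores directly can overflow in floating point. Below is a sketch of the common max-subtraction trick, which leaves the softmax output mathematically unchanged; `softmax_naive` and `softmax_stable` are illustrative names, and the input vectors are arbitrary.

```python
import numpy as np


def softmax_naive(z: np.ndarray) -> np.ndarray:
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)


def softmax_stable(z: np.ndarray) -> np.ndarray:
    # Shifting by the max changes nothing mathematically but keeps exp() in range
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)


small = np.array([[1.0], [2.0], [3.0]])
large = np.array([[1000.0], [1001.0], [1002.0]])

# On moderate inputs both versions agree
assert np.allclose(softmax_naive(small), softmax_stable(small))

# On large inputs the naive version overflows to inf / inf = nan
with np.errstate(over="ignore", invalid="ignore"):
    print(np.isnan(softmax_naive(large)).any())

# The stable version still returns a valid probability distribution
print(softmax_stable(large).sum())
```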


## The improved neural net
All changes made to last week's net are highlighted through Python comments
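
The L2 term shows up in `_update_b_w` as a shrink factor applied to the weights before the gradient step: `w <- (1 - learning_rate * lmda / n) * w - (learning_rate / m) * nabla_w`, where `n` is the training set size and `m` the mini-batch size. A minimal numeric sketch of that rule, assuming the 60,000-sample MNIST training set and made-up weights and gradients:

```python
import numpy as np

learning_rate = 0.1
lmda = 5.0
n_training = 60_000    # size of the MNIST training set
mini_batch_size = 100

w = np.array([[0.5, -0.2], [0.1, 0.3]])        # made-up weights
nabla_w = np.array([[1.0, 0.0], [-2.0, 4.0]])  # made-up summed mini-batch gradient

# Weight decay factor: slightly below 1, so every update shrinks the weights a little
decay = 1 - (learning_rate * lmda) / n_training
w_new = decay * w - (learning_rate / mini_batch_size) * nabla_w

print(decay)
print(w_new)
```

The biases are left out of the decay on purpose, matching the class below, where only `self.weights` is multiplied by the shrink factor.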

In [5]:
class Network:
    def __init__(self, training_data: List[Tuple[np.ndarray, np.ndarray]],
                 testing_data: List[Tuple[np.ndarray, int]],
                 size: List[int], learning_rate: float, epochs: int,
                 mini_batch_size: int, lmda: int):
        self.training_data = training_data
        self.testing_data = testing_data
        self.size = size
        self.num_layers = len(size)
        self.learning_rate = learning_rate
        self.biases = []
        self.weights = []
        self._init_biases()
        self._init_weights()
        self.epochs = epochs
        self.mini_batch_size = mini_batch_size
        self.lmda = lmda

    def _init_biases(self):
        for i in range(1, self.num_layers):
            self.biases.append(np.random.randn(self.size[i], 1))

    def _init_weights(self):
        weight_matrix_sizes = [(self.size[x + 1], self.size[x]) for x in range(self.num_layers - 1)]
        # Scale each weight by 1 / sqrt(number of inputs to the neuron)
        for n_out, n_in in weight_matrix_sizes:
            std_dev = 1 / np.sqrt(n_in)
            self.weights.append(np.random.randn(n_out, n_in) * std_dev)

    def train(self):
        for epoch in range(self.epochs):
            np.random.shuffle(self.training_data)
            print(f"Start training for epoch: {epoch + 1} of {self.epochs}")

            mini_batches = self._create_mini_batches()

            for mini_batch in mini_batches:
                self._update_b_w(mini_batch)

            self._calc_accuracy(epoch + 1)

    def _create_mini_batches(self) -> List[List[Tuple[np.ndarray, np.ndarray]]]:
        mini_batches = [
            self.training_data[multiple:multiple + self.mini_batch_size] for multiple in
            range(0, len(self.training_data), self.mini_batch_size)
        ]
        return mini_batches

    def _update_b_w(self, mini_batch: List[Tuple[np.ndarray, np.ndarray]]):
        nabla_bias = self._get_nabla_bias_zeroes()
        nabla_wt = self._get_nabla_wt_zeroes()

        for x, y in mini_batch:
            del_bias, del_wt = self._run_back_propagation(x, y)

            nabla_bias = [curr_b + del_b for curr_b, del_b in zip(nabla_bias, del_bias)]
            nabla_wt = [curr_wt + del_w for curr_wt, del_w in zip(nabla_wt, del_wt)]

        self.biases = [
            b - ((self.learning_rate / self.mini_batch_size) * nb) for b, nb in zip(self.biases, nabla_bias)
        ]
        # Add L2 regularization: shrink the weights (weight decay) before the gradient step
        self.weights = [
            w * (1 - (self.learning_rate * self.lmda) / len(self.training_data)) -
            ((self.learning_rate / self.mini_batch_size) * nw) for w, nw in zip(self.weights, nabla_wt)
        ]

    def _get_nabla_bias_zeroes(self) -> List[np.ndarray]:
        return [np.zeros(np.shape(bias)) for bias in self.biases]

    def _get_nabla_wt_zeroes(self) -> List[np.ndarray]:
        return [np.zeros(np.shape(wt)) for wt in self.weights]

    def _run_back_propagation(self, x: np.ndarray, y: np.ndarray) -> Tuple[List[np.ndarray], List[np.ndarray]]:
        nabla_bias = self._get_nabla_bias_zeroes()
        nabla_wt = self._get_nabla_wt_zeroes()

        activations, z_list = self.feedforward(x)
        # Delta for cross entropy
        error_l = self._delta_cross_entropy(activations[-1], y)

        nabla_bias[-1] = error_l
        nabla_wt[-1] = np.dot(error_l, np.transpose(activations[-2]))

        for layer in range(self.num_layers - 2, 0, -1):
            error_l = np.multiply(
                np.dot(np.transpose(self.weights[layer]), error_l), NetworkUtils.relu_prime(z_list[layer - 1])
            )

            nabla_bias[layer - 1] = error_l
            nabla_wt[layer - 1] = np.dot(error_l, activations[layer - 1].transpose())

        return nabla_bias, nabla_wt

    def _delta_cross_entropy(self, a_l: np.ndarray, y: np.ndarray) -> np.ndarray:
        return a_l - y

    def _calc_accuracy(self, epoch: int):
        correct_results = 0
        total_results = len(self.testing_data)
        for x, y in self.testing_data:
            activations, _ = self.feedforward(x)
            logit = activations[-1]
            if np.argmax(logit) == y:
                correct_results += 1
        print(
            f"Accuracy on testing data for epoch {epoch}: {round((correct_results / total_results) * 100, 2)}"
        )

    def feedforward(self, x: np.ndarray) -> Tuple[List[np.ndarray], List[np.ndarray]]:
        a = x
        activations, z_list = list(), list()
        activations.append(x)
        self._set_relu_activations(a, z_list, activations)
        self._set_softmax_activation(activations[-1], z_list, activations)
        return activations, z_list

    def _set_relu_activations(self, a: np.ndarray, z_list: List[np.ndarray], activations: List[np.ndarray]):
        for layer in range(self.num_layers - 2):
            # hidden layers(relu activation)
            z = np.dot(self.weights[layer], a) + self.biases[layer]
            z_list.append(z)
            a = NetworkUtils.relu(z)
            activations.append(a)

    def _set_softmax_activation(self, a: np.ndarray, z_list: List[np.ndarray], activations: List[np.ndarray]):
        # output layer(softmax activation)
        z = np.dot(self.weights[-1], a) + self.biases[-1]
        z_list.append(z)
        a = NetworkUtils.softmax(z)
        activations.append(a)
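
`_delta_cross_entropy` returns `a_l - y`, which is the gradient of the cross-entropy cost with respect to the output layer's weighted input when the output activation is softmax. This can be checked numerically with central finite differences. In the sketch below, the vector `z`, the target class and the step size `eps` are arbitrary choices made for the check.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)


def cross_entropy(z: np.ndarray, y: np.ndarray) -> float:
    # Cost for a one-hot target y, written in terms of the pre-activation z
    return float(-np.sum(y * np.log(softmax(z))))


rng = np.random.default_rng(0)
z = rng.standard_normal((10, 1))
y = np.zeros((10, 1))
y[3, 0] = 1.0

analytic = softmax(z) - y  # the a_l - y delta

# Central finite differences, one coordinate of z at a time
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(z.shape[0]):
    step = np.zeros_like(z)
    step[i, 0] = eps
    numeric[i, 0] = (cross_entropy(z + step, y) - cross_entropy(z - step, y)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))
```

The printed value is the largest disagreement between the two gradients; it should be tiny if the `a_l - y` formula is right.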

## Driver method for training, with evaluation after every epoch

In [6]:
def train_and_eval():
    training, testing = MNISTDataLoader().load_data_wrapper()
    params = Hyperparameters()
    print(params)
    mlp = Network(training, testing, params.SIZE, params.LEARNING_RATE, params.EPOCHS, params.MINI_BATCH_SIZE,
                  params.LMDA)
    mlp.train()

## Init the hyperparameters and call the driver method
For each experiment, the hyperparameter class is printed to show the values used.
<br> First, try a 3-layer network (setting 1).

In [7]:
class Hyperparameters:
    SIZE: List[int] = [28 * 28, 100, 10]
    LEARNING_RATE: float = 0.1
    EPOCHS: int = 30
    MINI_BATCH_SIZE: int = 100
    # Add lambda hyperparameter
    LMDA: int = 5

    def __str__(self) -> str:
        str_rep = ""
        str_rep += "Hyperparameters set are as follows"
        for hyper_param in self.__annotations__:
            str_rep += f' \n {hyper_param}: {getattr(self, hyper_param)}'
        return str_rep

train_and_eval()

Hyperparameters set are as follows 
 SIZE: [784, 100, 10] 
 LEARNING_RATE: 0.1 
 EPOCHS: 30 
 MINI_BATCH_SIZE: 100 
 LMDA: 5
Start training for epoch: 1 of 30
Accuracy on testing data for epoch 1: 91.75
Start training for epoch: 2 of 30
Accuracy on testing data for epoch 2: 93.33
Start training for epoch: 3 of 30
Accuracy on testing data for epoch 3: 94.35
Start training for epoch: 4 of 30
Accuracy on testing data for epoch 4: 95.15
Start training for epoch: 5 of 30
Accuracy on testing data for epoch 5: 95.57
Start training for epoch: 6 of 30
Accuracy on testing data for epoch 6: 96.02
Start training for epoch: 7 of 30
Accuracy on testing data for epoch 7: 96.18
Start training for epoch: 8 of 30
Accuracy on testing data for epoch 8: 96.43
Start training for epoch: 9 of 30
Accuracy on testing data for epoch 9: 96.64
Start training for epoch: 10 of 30
Accuracy on testing data for epoch 10: 96.71
Start training for epoch: 11 of 30
Accuracy on testing data for epoch 11: 96.89
Start trainin

Next, try a 4-layer network (setting 2).

In [8]:
class Hyperparameters:
    SIZE: List[int] = [28 * 28, 100, 100, 10]
    LEARNING_RATE: float = 0.01
    EPOCHS: int = 45
    MINI_BATCH_SIZE: int = 10
    # Add lambda hyperparameter
    LMDA: int = 5

    def __str__(self) -> str:
        str_rep = ""
        str_rep += "Hyperparameters set are as follows"
        for hyper_param in self.__annotations__:
            str_rep += f' \n {hyper_param}: {getattr(self, hyper_param)}'
        return str_rep

train_and_eval()

Hyperparameters set are as follows 
 SIZE: [784, 100, 100, 10] 
 LEARNING_RATE: 0.01 
 EPOCHS: 45 
 MINI_BATCH_SIZE: 10 
 LMDA: 5
Start training for epoch: 1 of 45
Accuracy on testing data for epoch 1: 92.28
Start training for epoch: 2 of 45
Accuracy on testing data for epoch 2: 94.24
Start training for epoch: 3 of 45
Accuracy on testing data for epoch 3: 95.38
Start training for epoch: 4 of 45
Accuracy on testing data for epoch 4: 96.0
Start training for epoch: 5 of 45
Accuracy on testing data for epoch 5: 96.59
Start training for epoch: 6 of 45
Accuracy on testing data for epoch 6: 96.8
Start training for epoch: 7 of 45
Accuracy on testing data for epoch 7: 96.96
Start training for epoch: 8 of 45
Accuracy on testing data for epoch 8: 97.07
Start training for epoch: 9 of 45
Accuracy on testing data for epoch 9: 97.27
Start training for epoch: 10 of 45
Accuracy on testing data for epoch 10: 97.34
Start training for epoch: 11 of 45
Accuracy on testing data for epoch 11: 97.14
Start trai

Finally, try an even deeper 5-layer network (setting 3).

This setting reached a peak accuracy of **97.93%** at epoch 66.

In [10]:
class Hyperparameters:
    SIZE: List[int] = [28 * 28, 100, 100, 100, 10]
    LEARNING_RATE: float = 0.01
    EPOCHS: int = 70
    MINI_BATCH_SIZE: int = 10
    # Add lambda hyperparameter
    LMDA: int = 5

    def __str__(self) -> str:
        str_rep = ""
        str_rep += "Hyperparameters set are as follows"
        for hyper_param in self.__annotations__:
            str_rep += f' \n {hyper_param}: {getattr(self, hyper_param)}'
        return str_rep

train_and_eval()

Hyperparameters set are as follows 
 SIZE: [784, 100, 100, 100, 10] 
 LEARNING_RATE: 0.01 
 EPOCHS: 70 
 MINI_BATCH_SIZE: 10 
 LMDA: 5
Start training for epoch: 1 of 70
Accuracy on testing data for epoch 1: 92.88
Start training for epoch: 2 of 70
Accuracy on testing data for epoch 2: 94.63
Start training for epoch: 3 of 70
Accuracy on testing data for epoch 3: 95.46
Start training for epoch: 4 of 70
Accuracy on testing data for epoch 4: 96.31
Start training for epoch: 5 of 70
Accuracy on testing data for epoch 5: 96.59
Start training for epoch: 6 of 70
Accuracy on testing data for epoch 6: 96.76
Start training for epoch: 7 of 70
Accuracy on testing data for epoch 7: 96.59
Start training for epoch: 8 of 70
Accuracy on testing data for epoch 8: 97.22
Start training for epoch: 9 of 70
Accuracy on testing data for epoch 9: 97.03
Start training for epoch: 10 of 70
Accuracy on testing data for epoch 10: 97.04
Start training for epoch: 11 of 70
Accuracy on testing data for epoch 11: 97.2
Star