# MNIST v2

Welcome to the programming exercise portion of Topic 5! We will continue to use the same dataset and the same basic neural network architecture as in Topic 4, but this time we will apply the new optimization techniques from Topic 5. The first portion of the notebook is already written for you, since it is identical to Topic 4.

### Importing Modules

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn
import torch
import tensorboard

## Preprocessing Data

In [None]:
trainset = pd.read_csv("datasets/train.csv")
testset = pd.read_csv("datasets/test.csv")

trainset

### Preparing the Training Examples

In [None]:
# gets all the rows, and all columns AFTER the first one
features = trainset.iloc[:, 1:].to_numpy()

print(f"features shape: {features.shape}")


In [None]:
plt.imshow(features[8].reshape(28, 28))
plt.show()

In [None]:
# creates one-hot encoding out of the label-encoded classes
labels_dummy = pd.get_dummies(trainset['label'])
labels_dummy

# recent pandas versions return boolean dummy columns, so cast to float32
labels = labels_dummy.to_numpy(dtype=np.float32)
print(f"labels shape: {labels.shape}")
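`pd.get_dummies` creates one column per distinct label value, sorted by value. For integer digit labels the same matrix can be built with NumPy by indexing rows of an identity matrix. A minimal sketch (the helper `one_hot` is hypothetical, not part of the notebook):

```python
import numpy as np

def one_hot(y, num_classes=10):
    """Return a (len(y), num_classes) float32 one-hot matrix for integer labels y."""
    y = np.asarray(y, dtype=np.int64)
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes, dtype=np.float32)[y]

print(one_hot([3, 0, 9]))
```

One design difference: `get_dummies` only creates columns for labels that actually occur, while `np.eye(num_classes)` always yields `num_classes` columns, so a digit missing from the data cannot silently shrink the output dimension.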

### Preparing the Test Set

In [None]:
testset

In [None]:
features_test = testset.to_numpy()
plt.imshow(features_test[2].reshape(28, 28))
plt.show()

### Checking for Imbalanced Classes

In [None]:
# count the number of items in each class
print(np.sum(labels, axis=0))
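The column sums above are raw counts. Converting them to proportions and comparing the largest class with the smallest makes imbalance easier to judge. A sketch on a toy one-hot array (the data is illustrative only, and `class_balance` is a hypothetical helper):

```python
import numpy as np

def class_balance(one_hot_labels):
    """Return per-class counts, per-class proportions, and the max/min count ratio."""
    counts = one_hot_labels.sum(axis=0)
    proportions = counts / counts.sum()
    # A ratio close to 1 means the classes are roughly balanced
    ratio = counts.max() / counts.min()
    return counts, proportions, ratio

# toy labels: two examples of class 0, two of class 1, four of class 2
toy = np.eye(3)[[0, 0, 1, 1, 2, 2, 2, 2]]
counts, proportions, ratio = class_balance(toy)
print(counts, proportions, ratio)
```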

### Train-CV Split

In [None]:
from sklearn.model_selection import train_test_split

# Train/CV split using the training data.
# The test data contains no labels, so we cannot create a CV set from it
X_train, X_cv, Y_train, Y_cv = train_test_split(features, labels, test_size=0.1)
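`train_test_split` shuffles the rows and holds out a `test_size` fraction of them; by default the split changes on every run, and passing `random_state` makes it reproducible. The NumPy sketch below shows the idea: one shared permutation keeps each image paired with its label. `shuffle_split` is a hypothetical helper, not scikit-learn's implementation.

```python
import numpy as np

def shuffle_split(X, Y, test_size=0.1, seed=0):
    """Shuffle X and Y with one shared permutation, then hold out a test_size fraction."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    # Round the held-out count up so a small dataset still gets a CV example
    n_cv = int(np.ceil(test_size * X.shape[0]))
    cv_idx, train_idx = idx[:n_cv], idx[n_cv:]
    return X[train_idx], X[cv_idx], Y[train_idx], Y[cv_idx]

X = np.arange(20).reshape(10, 2)   # row i is [2i, 2i + 1]
Y = np.arange(10)                  # label i belongs to row i
X_tr, X_va, Y_tr, Y_va = shuffle_split(X, Y, test_size=0.1)
print(X_tr.shape, X_va.shape)  # (9, 2) (1, 2)
```

If each digit should appear in the same proportion in both splits, `train_test_split` also accepts a `stratify` argument, for example `stratify=labels.argmax(axis=1)`.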

### Data Generator

This is where we implement a data generator that breaks our dataset into mini-batches. Although the MNIST digits dataset is small enough to fit into memory, we will build a data generator anyway for practice.

In [None]:
def gen(X, Y, batch_size=32):
    pass
            
        

In [None]:
pass
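One possible `gen`, sketched under an assumption taken from the hyperparameter-search cell: each generator is created once and `fit` pulls `train_steps` batches from it in every epoch, so it has to loop forever. This version reshuffles at the start of each pass, yields only full batches (matching the integer step counts `T_STEPS` and `V_STEPS`), and returns float32 arrays. The `seed` parameter is an addition, not part of the original signature.

```python
import numpy as np

def gen(X, Y, batch_size=32, seed=None):
    """Yield (X_batch, Y_batch) mini-batches forever, reshuffling on every pass."""
    n = X.shape[0]
    if n < batch_size:
        raise ValueError("batch_size is larger than the dataset")
    rng = np.random.default_rng(seed)
    while True:
        idx = rng.permutation(n)
        # Walk through the shuffled indices one full batch at a time;
        # leftover examples can land in a batch on a later pass
        for start in range(0, n - batch_size + 1, batch_size):
            batch = idx[start:start + batch_size]
            yield X[batch].astype(np.float32), Y[batch].astype(np.float32)

# usage: pull one batch to check the shapes
g = gen(np.zeros((100, 784)), np.zeros((100, 10)), batch_size=32)
xb, yb = next(g)
print(xb.shape, yb.shape)  # (32, 784) (32, 10)
```

Skipping partial batches also avoids ever producing a batch of one example, which `BatchNorm1d` cannot normalize in training mode.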

## Neural Network

This neural network uses a **3-layer** design with **900**, **900**, and **10** nodes per layer. Each **28 x 28** image is stored in the CSV as a flattened vector of **784** pixel values, which is the input size of the network. This network is identical to the one from Topic 4; we will add the techniques learned in `Topic 5 -- Advanced Optimization` to this model.

### Defining Our Model
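Before writing the model it helps to know its size. A `Linear(n_in, n_out)` layer holds `n_in * n_out` weights plus `n_out` biases, and a `BatchNorm1d(n)` layer adds `n` learnable scales and `n` shifts (its running mean and variance are buffers, not trainable parameters). Applied to the layer layout of `NNv2` in the next cell:

```python
def linear_params(n_in, n_out):
    # weight matrix plus bias vector
    return n_in * n_out + n_out

def batchnorm_params(n):
    # learnable scale (gamma) and shift (beta) per feature
    return 2 * n

dense = linear_params(784, 900) + linear_params(900, 900) + linear_params(900, 10)
norm = (batchnorm_params(784) + batchnorm_params(900)
        + batchnorm_params(900) + batchnorm_params(10))
print(dense, norm, dense + norm)  # 1526410 5188 1531598
```

Once the model exists, `sum(p.numel() for p in model.parameters())` should report the same total.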

In [None]:
import torch 
from torch.nn import Module, Linear, Softmax, BatchNorm1d, ReLU, Dropout
from torch.optim import Adam
from torch.nn.init import xavier_normal_, kaiming_normal_
from torchmetrics import Accuracy
from torch.utils.tensorboard import SummaryWriter
from datetime import datetime

class NNv2(Module):
    def __init__(self, input_dim, output_dim, drop=0.2):
        super().__init__()
        # normalize the raw pixel inputs
        self.bn_in = BatchNorm1d(input_dim)

        # hidden layer 1: Kaiming (He) initialization suits ReLU activations
        self.z1 = Linear(input_dim, 900)
        kaiming_normal_(self.z1.weight, nonlinearity='relu')
        self.bn1 = BatchNorm1d(900)
        self.a1 = ReLU()
        self.d1 = Dropout(drop)

        # hidden layer 2
        self.z2 = Linear(900, 900)
        kaiming_normal_(self.z2.weight, nonlinearity='relu')
        self.bn2 = BatchNorm1d(900)
        self.a2 = ReLU()
        self.d2 = Dropout(drop)

        # output layer: Xavier (Glorot) initialization, softmax over the classes
        self.z3 = Linear(900, output_dim)
        xavier_normal_(self.z3.weight)
        self.bn3 = BatchNorm1d(output_dim)
        self.a3 = Softmax(dim=1)

        # recent torchmetrics versions require the task type and number of classes
        self.acc = Accuracy(task="multiclass", num_classes=output_dim)
        
        self.val_loss = None
        self.val_acc = None
        self.loss = None
        self.accuracy = None
        
        
    def forward(self, x):
        x = self.bn_in(x)
        
        x = self.z1(x)
        x = self.bn1(x)
        x = self.a1(x)
        x = self.d1(x)
        
        x = self.z2(x)
        x = self.bn2(x)
        x = self.a2(x)
        x = self.d2(x)
        
        x = self.z3(x)
        x = self.bn3(x)
        x = self.a3(x)
        
        return x
    
    def fit(self, t_gen, loss_fn, opt, cv_gen=None, epochs=1, train_steps=1, val_steps=1):
        
        # one shared timestamp so the train and val runs pair up in TensorBoard
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        writer_train = SummaryWriter("runs/" + stamp + "-train")
        writer_val = SummaryWriter("runs/" + stamp + "-val")

        pass

            # TensorBoard logging
            # writer_train.add_scalar("loss", self.loss, i)
            # writer_train.add_scalar("accuracy", self.accuracy, i)
            # writer_val.add_scalar("loss", self.val_loss, i)
            # writer_val.add_scalar("accuracy", self.val_acc, i)

    
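# Custom categorical cross-entropy (CCE) loss. torch.nn.CrossEntropyLoss expects
# raw logits, but this model already applies Softmax, so CCE is computed here
# directly on the predicted probabilities.
def CCE(Y_pred, Y_true):
    # squeeze probabilities away from 0 so that log() stays finite
    Y_pred = 0.9999999*Y_pred + (1-0.9999999)/2
    ylogy = -Y_true * torch.log(Y_pred)
    # sum over the classes, then average over the batch
    sum_across = torch.sum(ylogy, dim=1)
    sum_down = torch.sum(sum_across, dim=0)/Y_true.shape[0]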
    return sum_down
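
The `CCE` function computes the batch-averaged categorical cross-entropy $L = -\frac{1}{N}\sum_{i}\sum_{k} y_{ik}\log p_{ik}$. The affine squeeze `0.9999999*Y_pred + (1-0.9999999)/2` keeps every probability strictly positive, so `log` stays finite even when the network assigns probability 0 to the true class. A NumPy transcription for checking the arithmetic by hand (`cce_numpy` is a hypothetical name):

```python
import numpy as np

def cce_numpy(Y_pred, Y_true, keep=0.9999999):
    """Mean categorical cross-entropy, mirroring the CCE function above."""
    # Squeeze probabilities away from 0 so that log() stays finite
    Y_pred = keep * Y_pred + (1 - keep) / 2
    per_example = -np.sum(Y_true * np.log(Y_pred), axis=1)
    return per_example.mean()

Y_true = np.array([[0.0, 1.0], [1.0, 0.0]])
Y_pred = np.array([[0.2, 0.8], [0.5, 0.5]])
print(cce_numpy(Y_pred, Y_true))  # about 0.458, i.e. -(log(0.8) + log(0.5)) / 2
```

Because `CCE` expects probabilities, it must stay paired with the `Softmax` output layer. If the final `Softmax` were removed, `torch.nn.CrossEntropyLoss` (which applies log-softmax internally) could be used instead, with class-index targets such as `Y.argmax(dim=1)`.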
                
                

### Test Train

Before starting the hyperparameter search, let's train the model once to make sure that everything works.

In [None]:
pass
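For the test train to do anything, `fit` needs a training loop. The sketch below shows only the core of one, written as a free function so it can run on its own. It assumes the generator yields NumPy `(X_batch, Y_batch)` pairs, and it leaves out the validation pass, accuracy tracking, and the `SummaryWriter` logging that `NNv2.fit` is meant to include; a validation pass would follow the same pattern inside `model.eval()` and `torch.no_grad()`. `train_epochs` is a hypothetical name.

```python
import torch

def train_epochs(model, t_gen, loss_fn, opt, epochs=1, train_steps=1, device="cpu"):
    """Minimal mini-batch training loop; returns the mean training loss per epoch."""
    history = []
    for epoch in range(epochs):
        model.train()  # Dropout active, BatchNorm uses batch statistics
        total = 0.0
        for _ in range(train_steps):
            xb, yb = next(t_gen)
            xb = torch.as_tensor(xb, dtype=torch.float32, device=device)
            yb = torch.as_tensor(yb, dtype=torch.float32, device=device)
            opt.zero_grad()                # clear gradients from the previous step
            loss = loss_fn(model(xb), yb)  # forward pass and loss
            loss.backward()                # backpropagate
            opt.step()                     # update the weights
            total += loss.item()
        history.append(total / train_steps)
        print(f"epoch {epoch + 1}/{epochs}: train loss {history[-1]:.4f}")
    return history
```

In this notebook a call would look like `train_epochs(model, t_gen, CCE, optimizer, epochs=1, train_steps=T_STEPS, device="cuda")`.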

## Hyperparameter Search

The final task in this notebook is the hyperparameter search. We will implement random search: each model is trained with a learning rate, L2 penalty, and dropout rate drawn at random from fixed ranges, and then evaluated on the CV set.

In [None]:
input_dim, output_dim = (X_train.shape[1], 10)

# Feel free to change the EPOCHS and NUM_MODELS
EPOCHS = 100
NUM_MODELS = 1
BATCH_SIZE = 1024

### Hyperparameters ###
LEARNING_RATE = np.random.uniform(2e-3, 5e-3, NUM_MODELS)
L2 = np.random.uniform(5e-5, 9e-5, NUM_MODELS)
DROP = np.random.uniform(0.4, 0.6, NUM_MODELS)
#######################

# number of full mini-batches per epoch
T_STEPS = X_train.shape[0] // BATCH_SIZE
V_STEPS = X_cv.shape[0] // BATCH_SIZE
t_gen = gen(X_train, Y_train, BATCH_SIZE)
v_gen = gen(X_cv, Y_cv, BATCH_SIZE)

results = []
model = None

for i in range(NUM_MODELS):
    print(f"##### MODEL {i+1}/{NUM_MODELS} #####")
    model = NNv2(input_dim, output_dim, drop=DROP[i]).cuda()
    # weight_decay adds an L2 penalty term to the gradient of every parameter
    optimizer = Adam(model.parameters(), weight_decay=L2[i], lr=LEARNING_RATE[i])
    criterion = CCE

    model.fit(t_gen, criterion, optimizer, epochs=EPOCHS, train_steps=T_STEPS, cv_gen=v_gen, val_steps=V_STEPS)

    results.append([model.val_acc.cpu().item(), model.val_loss.cpu().item(), L2[i], LEARNING_RATE[i], DROP[i]])

    print('\n\n')

#     del model
#     del optimizer
#     del criterion

# DataFrame.append was removed in pandas 2.0, so build the results table once
hparams = pd.DataFrame(results, columns=['val_acc', 'val_loss', 'L2', 'Learning Rate', 'Dropout'])

In [None]:
hparams = hparams.sort_values(by=['val_acc'], ascending=False)
hparams
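The search cell draws every hyperparameter uniformly from its range with `np.random.uniform`. A common alternative for the learning rate, not used in the original, is log-uniform sampling: draw the exponent uniformly, so each order of magnitude is equally likely to be explored. A sketch with a hypothetical `sample_configs` helper; the 1e-4 to 1e-2 learning-rate range is an assumption, while the L2 and dropout ranges are copied from the notebook:

```python
import numpy as np

def sample_configs(num_models, seed=None):
    """Draw random-search configurations, sampling the learning rate log-uniformly."""
    rng = np.random.default_rng(seed)
    return {
        # exponent uniform in [-4, -2), so lr lies in [1e-4, 1e-2)
        "lr": 10.0 ** rng.uniform(-4, -2, num_models),
        # same uniform ranges the notebook uses for L2 and dropout
        "l2": rng.uniform(5e-5, 9e-5, num_models),
        "drop": rng.uniform(0.4, 0.6, num_models),
    }

configs = sample_configs(5, seed=0)
for lr, l2, drop in zip(configs["lr"], configs["l2"], configs["drop"]):
    print(f"lr={lr:.2e}  L2={l2:.2e}  dropout={drop:.2f}")
```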

In [None]:
%load_ext tensorboard
%tensorboard --logdir runs

In [None]:
model.cpu().eval()

# make predictions on the first 20 examples of the test set
X_test = features_test[:20, :]
with torch.no_grad():
    Y_test_t = model(torch.from_numpy(X_test).float())

# convert the predictions from torch.Tensor to np.ndarray so we can plot them
Y_test = Y_test_t.numpy()
predicted_digits = np.argmax(Y_test, axis=1)

# create a 4 x 5 grid of subplots, one per test image
fig, axes = plt.subplots(4, 5, figsize=(12, 10))

# each subplot shows one image, titled with the predicted digit
for i, ax in enumerate(axes.flat):
    ax.imshow(X_test[i].reshape((28, 28)))
    ax.set_title(f'My Prediction: {predicted_digits[i]}')
    ax.axis('off')

plt.show()
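Because the last layer is a `Softmax`, each row of `Y_test` is a probability distribution over the ten digits: `argmax` gives the predicted digit, and the maximum value gives the model's confidence in it. Showing the confidence next to each prediction makes uncertain digits easy to spot. A small sketch (the helper `decode_predictions` is hypothetical):

```python
import numpy as np

def decode_predictions(probs):
    """Return predicted class indices and their probabilities for each row of probs."""
    digits = np.argmax(probs, axis=1)
    # pick each row's probability at its predicted class
    confidence = probs[np.arange(probs.shape[0]), digits]
    return digits, confidence

probs = np.array([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]])
digits, confidence = decode_predictions(probs)
print(digits, confidence)  # [1 0] [0.7 0.6]
```

In the plotting loop, a title such as `f'{digits[i]} ({confidence[i]:.0%})'` would show both.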

In [None]:
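# trace in eval mode: Dropout is disabled and BatchNorm uses its running statistics
# (in training mode BatchNorm1d cannot normalize a batch of a single example)
tscript = torch.jit.trace(model.cpu().eval(), torch.rand((1, 784)))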
tscript.save('mnist_predictor.pt')
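The saved TorchScript file can later be loaded with `torch.jit.load` without the `NNv2` class definition being available. The round trip below uses a small stand-in network (hypothetical, chosen only so the example runs on its own) and a temporary directory instead of `mnist_predictor.pt`:

```python
import os
import tempfile
import torch

# stand-in model with the same input and output sizes as the MNIST network
net = torch.nn.Sequential(torch.nn.Linear(784, 10), torch.nn.Softmax(dim=1)).eval()
example = torch.rand((1, 784))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "predictor.pt")
    torch.jit.trace(net, example).save(path)
    loaded = torch.jit.load(path)

batch = torch.rand((5, 784))
with torch.no_grad():
    same = torch.allclose(net(batch), loaded(batch))
print(same)  # True
```

Tracing records the operations executed for the example input, so a module whose behavior differs between training and evaluation (Dropout, BatchNorm) keeps whichever mode it was in when traced; that is why the model is switched to `eval()` first.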