# Implementing a Simple Neural Network in PyTorch

The code below implements a basic neural network using the PyTorch framework, structured for hyperparameter testing and performance comparison. The program uses the Fashion-MNIST (F-MNIST) dataset, available through the torchvision library.

Cells <a id ='imports'>[1]</a> and <a id ='version'>[2]</a> contain the necessary imports and the version information for torch and torchvision, respectively.

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import torchvision
import torchvision.transforms as transforms

import time
import pandas as pd
import json

torch.set_printoptions(linewidth=120)
torch.set_grad_enabled(True)

from torch.utils.tensorboard import SummaryWriter
from collections import OrderedDict
from collections import namedtuple
from itertools import product
from IPython import display


In [2]:
print(torch.__version__)
print(torchvision.__version__)

1.4.0
0.5.0


### Network

The neural network is a simple five-layer network with two convolutional layers and three linear layers, the last of which is the output layer.

The Network class extends the <code>nn.Module</code> class. This base class keeps track of all the weights in the layers, which allows them to be updated during training. The Network class is composed of an <code>\__init__</code> constructor and a <code>forward</code> function. The <code>super()</code> call in the constructor runs the <code>nn.Module</code> initializer, so that Network inherits this functionality from its parent class.

The two convolutional layers each perform a convolution operation. <code>in_channels</code> is a data-dependent hyperparameter: for the first layer it is set by the number of color channels in the image (1 for grayscale), and for the second layer by the number of <code>out_channels</code> of the previous layer. The <code>kernel_size</code> and <code>out_channels</code> are manually set hyperparameters.

Note that when passing from a convolutional layer to a linear layer, the data is flattened (hence the $12*4*4$, where 12 is the number of output channels from the previous layer and $4*4$ is the height and width of each feature map at that point). The <code>out_features</code> are hyperparameters, whereas the <code>in_features</code> depend on the number of output features of the previous layer. The <code>out_features</code> of the output layer correspond to the number of label classes in the data.
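
The $4*4$ in the flattened size can be checked by hand. The sketch below is plain Python rather than PyTorch, and the helper name `conv_out` is made up for illustration. It applies the usual output-size formula for an unpadded convolution or pooling window to the layer settings used in this network.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution or pooling window (hypothetical helper)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28                                        # Fashion-MNIST images are 28x28
size = conv_out(size, kernel_size=5)             # conv1: 28 -> 24
size = conv_out(size, kernel_size=2, stride=2)   # max pooling: 24 -> 12
size = conv_out(size, kernel_size=5)             # conv2: 12 -> 8
size = conv_out(size, kernel_size=2, stride=2)   # max pooling: 8 -> 4

out_channels = 12                                # out_channels of conv2
in_features = out_channels * size * size         # flattened size fed to fc1
print(size, in_features)                         # 4 192
```

Changing a <code>kernel_size</code> or the input image size changes this number, so the <code>in_features</code> of the first linear layer would need to change with it.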

The <code>forward</code> method describes the propagation of the data through the network. The convolutional layers take a tensor and run it through a <b>convolution</b> operation, followed by a <b>relu activation</b> operation and a <b>max-pooling</b> operation. The linear layers use a <b>relu activation</b>. The first linear layer reshapes (flattens) the data coming from the convolutional layers before the linear transformation and the activation function are applied. The output layer would typically use a <b>softmax</b> function to return a probability for each class of a single-label problem. However, softmax isn't applied here because the loss function used later in the implementation performs the softmax implicitly.
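
To illustrate why the explicit softmax can be left out, here is a small sketch in plain Python. It is not PyTorch's implementation, and the function names and logit values are made up for illustration. It shows that a cross-entropy loss computed directly from raw scores (logits) equals the negative log of the softmax probability of the correct class.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Cross-entropy from raw scores: log-sum-exp minus the correct-class score."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]

logits = [2.0, 0.5, -1.0]   # hypothetical raw outputs for a 3-class problem
label = 0                   # hypothetical correct class

probs = softmax(logits)
loss_from_logits = cross_entropy(logits, label)
loss_from_probs = -math.log(probs[label])
print(loss_from_logits, loss_from_probs)   # the two values agree
```

Softmax also preserves the ordering of the scores, so taking the <code>argmax</code> of the raw outputs gives the same predicted class as taking it over the probabilities.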

In [3]:
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        
        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)
    
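    def forward(self, t):
        # Layer (1): convolution -> relu -> max pooling
        t = F.relu(self.conv1(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        # Layer (2): convolution -> relu -> max pooling
        t = F.relu(self.conv2(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        # Layer (3): flatten, then linear -> relu
        t = F.relu(self.fc1(t.reshape(-1, 12*4*4)))

        # Layer (4): linear -> relu
        t = F.relu(self.fc2(t))

        # Layer (5): output layer, returns raw scores (no softmax)
        t = self.out(t)

        return t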

### Dataset

The training dataset is initialized using torchvision utilities. F-MNIST is a dataset of article images from the German retailer Zalando, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a $28*28$ grayscale image, associated with a label from one of 10 classes. Zalando's intention was to create a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms.

Link to F-MNIST github: https://github.com/zalandoresearch/fashion-mnist

In [4]:
train_set = torchvision.datasets.FashionMNIST(
    root='./data/FashionMNIST'
    ,train=True
    ,download=True
    ,transform=transforms.Compose([
        transforms.ToTensor()
    ])
)

### RunBuilder
The <b>RunBuilder</b> class builds sets of parameters that define our runs. The class contains a static method <code>get_runs(params)</code>, which returns the runs that the class builds from the given parameters.

Some basic terminology:

An <b>epoch</b> is one complete pass of the entire dataset, both forward and backward, through the network. The number of epochs is a hyperparameter.
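
As a rough illustration of what one epoch involves, the sketch below counts the weight updates (one per batch) in a single pass over the 60,000 training examples for the batch sizes tried later. It assumes the last, possibly smaller, batch is kept, which is the default behavior of <code>DataLoader</code>.

```python
import math

dataset_size = 60_000                 # size of the F-MNIST training set
batch_sizes = (100, 1000, 10000)      # batch sizes varied later in the runs

# Number of batches, and therefore optimizer steps, in one epoch
iterations_per_epoch = {
    batch_size: math.ceil(dataset_size / batch_size)
    for batch_size in batch_sizes
}
print(iterations_per_epoch)           # {100: 600, 1000: 60, 10000: 6}
```

Larger batches mean fewer weight updates per epoch, which is worth keeping in mind when comparing accuracy across batch sizes after a single epoch.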

In [5]:
class RunBuilder():
    @staticmethod
    def get_runs(params):
        
        Run = namedtuple('Run', params.keys()) #create tuple subclass which encapsulates data for each run
        
        runs = [] #list of runs
        for v in product(*params.values()): #creates Cartesian product using parameter values
            runs.append(Run(*v)) #appends obtained set of ordered pairs, which define each run, to runs list 
        
        return runs
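
A small usage sketch of the same idea follows. The class is repeated in condensed form so that the snippet runs on its own, and the parameter values are chosen purely for illustration.

```python
from collections import OrderedDict, namedtuple
from itertools import product

class RunBuilder:
    @staticmethod
    def get_runs(params):
        Run = namedtuple('Run', params.keys())   # one field per parameter name
        return [Run(*v) for v in product(*params.values())]

# Illustrative values only: 2 learning rates x 2 batch sizes = 4 runs
params = OrderedDict(lr=[.01, .001], batch_size=[100, 1000])

runs = RunBuilder.get_runs(params)
for run in runs:
    print(run)
# Run(lr=0.01, batch_size=100)
# Run(lr=0.01, batch_size=1000)
# Run(lr=0.001, batch_size=100)
# Run(lr=0.001, batch_size=1000)
```

Because each run is a namedtuple, its values can be read by name (for example <code>run.lr</code>), and its string form can be used to label the run.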

### RunManager
The <b>RunManager</b> class builds the training loop and manages each of our runs (as produced by the RunBuilder class) inside the loop. The <code>\__init__</code> constructor initializes the attributes we will need to keep track of data across epochs, including the number of epochs, the running loss, the number of correct predictions and the start time.

The <code>begin_epoch</code> and <code>end_epoch</code> methods allow us to manage these values across the epoch lifetime. The same is true for <code>begin_run</code> and <code>end_run</code> across a run.  

In [6]:
class RunManager():
    
    def __init__(self):

        self.epoch_count = 0 
        self.epoch_loss = 0
        self.epoch_num_correct = 0
        self.epoch_start_time = None

        self.run_params = None
        self.run_count = 0
        self.run_data = []
        self.run_start_time = None

        self.network = None #network initialization
        self.loader = None #loader initialization
        self.tb = None #tensorboard initialization
    
    def begin_run(self, run, network, loader): #start a run

        self.run_start_time = time.time() #Capture run start time

        self.run_params = run #pass in run parameters
        self.run_count += 1 #increment run count

        self.network = network #save network to run 
        self.loader = loader #save data loader
        self.tb = SummaryWriter(comment=f'-{run}') #allows to uniquely identify run in Tensorboard (tb)

        images, labels = next(iter(self.loader)) 
        grid = torchvision.utils.make_grid(images)

        self.tb.add_image('images', grid) #add images to tb
        self.tb.add_graph(self.network, images) #add network to tb
    
    def end_run(self): #end a run
        
        self.tb.close() #close tb run
        self.epoch_count = 0 #reset epoch count back to zero
    
    def begin_epoch(self): #start an epoch
        
        self.epoch_start_time = time.time() #instantiate epoch start time

        self.epoch_count += 1 #increment epoch number 
        self.epoch_loss = 0 #set running loss to zero
        self.epoch_num_correct = 0 #set number of correct predictions to zero
    
    def end_epoch(self): #end an epoch

        epoch_duration = time.time() - self.epoch_start_time #estimate epoch duration
        run_duration = time.time() - self.run_start_time #estimate run duration

        loss = self.epoch_loss / len(self.loader.dataset) #estimate loss w.r.t number of items in dataset 
        accuracy = self.epoch_num_correct / len(self.loader.dataset) #estimate correct predictions w.r.t number of items in dataset 

        self.tb.add_scalar('Loss', loss, self.epoch_count)
        self.tb.add_scalar('Accuracy', accuracy, self.epoch_count)

        for name, param in self.network.named_parameters(): #we iterate over and pass network parameters to tensorboard
            self.tb.add_histogram(name, param, self.epoch_count)
            self.tb.add_histogram(f'{name}.grad', param.grad, self.epoch_count)
        
        results = OrderedDict() #a dictionary 'results' is built, which contains all values we want to track and display
        results["run"] = self.run_count
        results["epoch"] = self.epoch_count
        results["loss"] = loss
        results["accuracy"] = accuracy
        results["epoch duration"] = epoch_duration
        results["run duration"] = run_duration
        
        for k,v in self.run_params._asdict().items(): #iterate over key-value pairs in parameters and add values to results dictionary 
            results[k] = v
        self.run_data.append(results) #append results to run_data list 
        df = pd.DataFrame.from_dict(self.run_data, orient='columns') #convert list to pandas dataframe to obtain formatted output
        
        display.clear_output(wait=True) #clear current output
        display.display(df) #display new data frame
        
    def track_loss(self, loss): #function to track loss across an epoch
        self.epoch_loss += loss.item() * self.loader.batch_size #scale the mean batch loss by the batch size to recover the batch's summed loss

    def track_num_correct(self, preds, labels): #track correct predictions across an epoch
        self.epoch_num_correct += self._get_num_correct(preds, labels) #add the number of correct predictions in this batch
    
    @torch.no_grad() #disable gradient tracking for this computation
    def _get_num_correct(self, preds, labels): #private function to obtain number of correct predictions
        return preds.argmax(dim=1).eq(labels).sum().item() #returns number of correct predictions
    
    def save(self, fileName): #save results to csv and json files 
        
        pd.DataFrame.from_dict(
            self.run_data
            ,orient='columns'
        ).to_csv(f'{fileName}.csv')
        
        with open(f'{fileName}.json', 'w', encoding = 'utf-8') as f:
            json.dump(self.run_data, f, ensure_ascii=False, indent=4) 
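
One detail of <code>track_loss</code> is worth spelling out. With the default mean reduction of <code>F.cross_entropy</code>, the loss of a batch is an average, so multiplying it by <code>loader.batch_size</code> recovers the summed loss only when every batch is full. The sketch below uses made-up per-sample losses in plain Python to show the difference when the last batch is smaller. In this notebook the 60,000 training images divide evenly by all batch sizes used, so the two totals coincide.

```python
# Hypothetical per-sample losses for a 5-item dataset, split into batches of 2
sample_losses = [1.0, 3.0, 2.0, 4.0, 5.0]
batch_size = 2
batches = [sample_losses[i:i + batch_size]
           for i in range(0, len(sample_losses), batch_size)]

fixed_total = 0.0    # scaling by the configured batch size, as in track_loss
actual_total = 0.0   # scaling by the actual number of items in each batch
for batch in batches:
    mean_loss = sum(batch) / len(batch)      # what a mean-reduced loss returns
    fixed_total += mean_loss * batch_size
    actual_total += mean_loss * len(batch)

print(fixed_total, actual_total, sum(sample_losses))   # 20.0 15.0 15.0
```

If the dataset size were not a multiple of the batch size, one option would be to scale by the number of items in each batch instead of the configured batch size.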

We then define the set of parameters we would like to vary across runs, including the <b>learning rate</b> <code>lr</code> and the <b>batch size</b>. The <code>num_workers</code> parameter is for performance optimization and comes from the <code>torch.utils.data.DataLoader</code> class. It sets the number of subprocesses used to load data, which lets multi-core CPUs prepare batches in parallel and can reduce training time.

In [7]:
params = OrderedDict(
    lr = [.01]
    ,batch_size = [100, 1000, 10000]
    ,num_workers = [0, 1, 2, 4, 8, 12]
)

m = RunManager() 
for run in RunBuilder.get_runs(params): 
    
    network = Network()
    loader = torch.utils.data.DataLoader(train_set, batch_size=run.batch_size, num_workers=run.num_workers) #data loader
    optimizer = optim.Adam(network.parameters(), lr=run.lr) #using Adam optimizer

    m.begin_run(run, network, loader) #run begins

    for epoch in range(1):
        m.begin_epoch()
        for batch in loader:
            images, labels = batch 
            preds = network(images) 
            loss = F.cross_entropy(preds, labels) #cross_entropy loss function performs softmax implicitly
            optimizer.zero_grad() #clears old gradients, which would otherwise accumulate on every backward pass
            loss.backward() #calculates the gradient of the loss w.r.t. every parameter
            optimizer.step() #updates weight tensors of network

            m.track_loss(loss) 
            m.track_num_correct(preds, labels)
        
        m.end_epoch()
    m.end_run()
m.save('results')

| | run | epoch | loss | accuracy | epoch duration | run duration | lr | batch_size | num_workers |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0.584497 | 0.779967 | 24.063153 | 24.263617 | 0.01 | 100 | 0 |
| 1 | 2 | 1 | 0.565245 | 0.789283 | 17.056375 | 17.920956 | 0.01 | 100 | 1 |
| 2 | 3 | 1 | 0.587211 | 0.77805 | 17.49919 | 18.358734 | 0.01 | 100 | 2 |
| 3 | 4 | 1 | 0.566234 | 0.789517 | 17.562022 | 18.588276 | 0.01 | 100 | 4 |
| 4 | 5 | 1 | 0.588382 | 0.77335 | 17.729574 | 19.31633 | 0.01 | 100 | 8 |
| 5 | 6 | 1 | 0.603205 | 0.772783 | 19.03508 | 21.723888 | 0.01 | 100 | 12 |
| 6 | 7 | 1 | 1.022862 | 0.608183 | 17.955968 | 19.001172 | 0.01 | 1000 | 0 |
| 7 | 8 | 1 | 1.030616 | 0.601267 | 12.346971 | 13.905802 | 0.01 | 1000 | 1 |
| 8 | 9 | 1 | 0.976413 | 0.629183 | 12.92742 | 14.594959 | 0.01 | 1000 | 2 |
| 9 | 10 | 1 | 1.020133 | 0.616567 | 12.92343 | 14.8413 | 0.01 | 1000 | 4 |
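
One way to read these results is to look for the fastest epoch per batch size. The sketch below does this in plain Python, with the batch size, worker count and epoch duration copied from the rows listed above. It is only a reading aid: each run covers a single epoch on one machine, so the timings should be treated as indicative rather than as a benchmark.

```python
# (batch_size, num_workers, epoch duration in seconds), copied from the results above
rows = [
    (100, 0, 24.063153), (100, 1, 17.056375), (100, 2, 17.49919),
    (100, 4, 17.562022), (100, 8, 17.729574), (100, 12, 19.03508),
    (1000, 0, 17.955968), (1000, 1, 12.346971), (1000, 2, 12.92742),
    (1000, 4, 12.92343),
]

fastest = {}  # batch_size -> (num_workers, epoch duration) of the fastest run
for batch_size, num_workers, duration in rows:
    if batch_size not in fastest or duration < fastest[batch_size][1]:
        fastest[batch_size] = (num_workers, duration)

for batch_size, (num_workers, duration) in fastest.items():
    print(f'batch_size={batch_size}: fastest with num_workers={num_workers} ({duration:.2f}s)')
```

In the listed runs, a single worker subprocess gives the shortest epoch for both batch sizes, and adding more workers does not improve on it.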
