# CS331 - Spring 2022 - Phase 1 [10%]

*__Submission Guidelines:__* 
- Naming convention for submission of this notebook is `groupXX_phase1.ipynb` where XX needs to be replaced by your group number. For example: group 1 would rename their notebook to `group01_phase1.ipynb`
- Only the group lead is supposed to make the submission
- All the cells <b>must</b> be run once before submission. If your submission's cells are not showing the results (plots etc.), marks will be deducted
- Only the code written within this notebook will be considered while grading. No other files will be entertained
- You are advised to follow good programming practices, including appropriate variable naming and making use of logical comments

Please note that your notebooks will be checked against submissions from last year's course offering for plagiarism. The university honor code should be maintained. Any violation, if found, will result in disciplinary action.


#### <b>Introduction</b> 
This is the first of the three phases of this offering's project. To give an overview of this phase, we will essentially be building everything from scratch. The datasets that we will be using for this project are the MNIST and Fashion_MNIST datasets. <b> This notebook will focus on the MNIST dataset. </b> 

The MNIST dataset has a training set of 60,000 examples and a test set of 10,000 examples. These examples consist of hand-written digits that belong to ten different classes (digits 0 through 9). The given images have been size-normalized and centered in a fixed-size image. It is highly advisable to go through the information provided at [this link](http://yann.lecun.com/exdb/mnist/) to fully understand this dataset.

You will begin by pre-processing the already-loaded dataset in this notebook, followed by a from-scratch implementation of a Neural Network (NN). Once done, you will have to tweak the hyperparameters (such as learning rate, number of epochs etc.) to get the best results for your NN's implementation.

###### <b>You will strictly be using for-loops for this phase's implementation of NN (unless specified otherwise in the sub-section)

###### Modification of the provided code without prior discussion with the TAs will result in a grade deduction </b>
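
To make the for-loop requirement concrete, here is a minimal sketch of a matrix product written with explicit loops and checked against NumPy's built-in product. The function name `matmul_loops` and the example arrays are illustrative assumptions, not part of the provided skeleton.

```python
import numpy as np

def matmul_loops(a, b):
    """Multiply two 2-D arrays using explicit for-loops (hypothetical helper)."""
    rows_a, inner = a.shape
    inner_b, cols_b = b.shape
    if inner != inner_b:
        raise ValueError("inner dimensions do not match")
    out = np.zeros((rows_a, cols_b))
    for i in range(rows_a):          # rows of a
        for j in range(cols_b):      # columns of b
            for k in range(inner):   # shared inner dimension
                out[i, j] += a[i, k] * b[k, j]
    return out

a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # shape (3, 2)
b = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]])    # shape (2, 3)
product = matmul_loops(a, b)

# The loop version should agree with NumPy's vectorized product.
print(np.allclose(product, a @ b))
```

Comparing against `a @ b` on small inputs is a cheap way to validate a loop-based helper before using it inside the network.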

---

###### <b>Side note</b>
The `plot_model` method will only work if you have the `pydot` python package installed along with [Graphviz](https://graphviz.gitlab.io/download/). If you do not wish to use this then simply comment out the import for `pydot`

###### <b>Need Help?</b>
If you need help, please refer to the course staff ASAP and do not wait till the last moment as they might not be available on very short notice close to deadlines

#### <b>Before You Begin</b>

Skeleton code is provided to get you started. The main methods that you need to implement correspond to the four steps of the training process of a NN which are as follows:
1. Initialize variables and initialize weights
2. Forward pass
3. Backward pass AKA Backpropagation
4. Weight Update AKA Gradient Descent

__Look for comments in the code to see where you are supposed to write your code__ 

A `fit` function is what combines the previous four steps and trains the network to __fit__ to the provided training examples. The provided `fit` method requires all four steps of the training process to be working correctly. The function has been set up so that it expects the above four methods to take particular inputs and return particular outputs. __You are supposed to work within this restriction__ 



__To see if your model is working correctly, you need to make sure that your model loss is going down during training__
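
As a rough illustration of the four steps and of the "loss should go down" check, the sketch below trains a tiny network on synthetic data. It uses vectorized NumPy operations for brevity, which is an assumption made only for this example (the assignment itself requires for-loops), and every name and size in it (`forward`, `mean_loss`, the 4-5-2 layout, the toy data) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 20 samples, 4 features, 2 classes decided by the sign of feature 0.
X = rng.normal(size=(20, 4))
labels = (X[:, 0] > 0).astype(int)
Y = np.eye(2)[labels]  # one-hot targets, shape (20, 2)

# Step 1: initialize weights from a standard normal and biases to zero.
W_h, b_h = rng.normal(size=(5, 4)), np.zeros((5, 1))
W_o, b_o = rng.normal(size=(2, 5)), np.zeros((2, 1))

def forward(x):
    """Step 2: forward pass for one column vector x of shape (4, 1)."""
    a_h = 1.0 / (1.0 + np.exp(-(W_h @ x + b_h)))  # sigmoid hidden layer
    z_o = W_o @ a_h + b_o
    e = np.exp(z_o - z_o.max())                   # softmax output layer
    return a_h, e / e.sum()

def mean_loss():
    """Mean cross-entropy over the toy dataset."""
    total = 0.0
    for x, y in zip(X, Y):
        _, a_o = forward(x.reshape(-1, 1))
        total += -np.sum(y.reshape(-1, 1) * np.log(a_o + 1e-12))
    return total / len(X)

loss_before = mean_loss()
lr = 0.1
for epoch in range(50):
    for x, y in zip(X, Y):
        x, y = x.reshape(-1, 1), y.reshape(-1, 1)
        a_h, a_o = forward(x)
        # Step 3: backward pass (softmax with cross-entropy gives a_o - y).
        d_o = a_o - y
        d_h = (W_o.T @ d_o) * a_h * (1 - a_h)
        # Step 4: gradient-descent weight update.
        W_o -= lr * (d_o @ a_h.T)
        b_o -= lr * d_o
        W_h -= lr * (d_h @ x.T)
        b_h -= lr * d_h
loss_after = mean_loss()

print(loss_after < loss_before)
```

If the loss does not decrease on a problem this small, the most likely culprits are a sign error in the update or mismatched shapes in the backward pass.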


In [36]:
# Making all the necessary imports here
import keras
import numpy as np
import pandas as pd
import time
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn')
from IPython.display import Image
import pydot
from tqdm.notebook import tqdm as tqdm_notebook  # tqdm.tqdm_notebook is deprecated in favor of tqdm.notebook.tqdm
import seaborn as sns
from keras.datasets import mnist
from sklearn.model_selection import train_test_split
from keras.utils import np_utils
from sklearn.datasets import make_moons
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report
from google.colab import drive
import cv2


In [37]:
# This function will be used to plot the confusion matrix at the end of this notebook

def plot_confusion_matrix(conf_mat):
    # MNIST labels are the digits 0 through 9
    classes = ['0','1','2','3','4','5','6','7','8','9']
    df_cm = pd.DataFrame(conf_mat,classes,classes)
    plt.figure(figsize=(15,9))
    sns.set(font_scale=1.4)
    sns.heatmap(df_cm, annot=True,annot_kws={"size": 16})
    plt.show()

class_labels = ['0','1','2','3','4','5','6','7','8','9']

In [38]:
# Enter group lead's roll number here. This will be used for plotting purposes

rollnumber = 24100043

#### __Loading and Pre-processing the MNIST dataset__

The MNIST dataset has been already loaded using the imported keras APIs. You have to pre-process the given training and testing samples according to the implementation of your NN. It is recommended that you print the given matrices to figure out how to go about pre-processing. 
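
The sketch below shows one possible shape for this pre-processing, using a small synthetic array in place of the real data so that it runs without downloading anything. The variable names (`images`, `labels`, `flat`, `one_hot`) are assumptions for illustration, and `np.eye` is used here as a stand-in for the Keras one-hot helper.

```python
import numpy as np

# Synthetic stand-in for MNIST: 5 grayscale 28x28 images with uint8 pixels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 1, 3])
num_classes = 10

# Normalize pixel intensities from [0, 255] to [0, 1].
scaled = images / 255.0

# Flatten each 28x28 image into a 784-element row vector.
flat = scaled.reshape(images.shape[0], -1)

# One-hot encode the labels: row i has a 1 in column labels[i].
one_hot = np.eye(num_classes)[labels]

print(flat.shape, one_hot.shape)
```

Printing the shapes before and after each step, as recommended above, makes it easy to confirm that the arrays match what the network expects.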

In [39]:
# Loading the dataset using keras APIs
''' 
x_train = variable to store training images
y_train = variable to store training images' labels
x_test = variable to store test images
y_test = variable to store test images' labels
'''

classes = 10  # do not change this
(x_train, y_train),(x_test, y_test) = mnist.load_data()

# Sizes of test and training samples
print("Size of training samples matrix:", x_train.shape)
print("Size of testing samples matrix:", x_test.shape)

###### Code Here ######
'''Hint: You will have to normalize the given matrices'''

x_train = x_train/255.0
x_test = x_test/255.0






#x_test = x_test.reshape(x_test.shape[0], image_vector_size) # not sure if test flatten 


# One-hot encode y_train and y_test
y_train = np_utils.to_categorical(y_train, classes) # shape (60000, 10): 60000 observations, 10 possible output labels
y_test = np_utils.to_categorical(y_test, classes)   # shape (10000, 10)

# Each input is a 28x28 image that the skeleton code flattens to a 784-element vector (60000 such observations for training)
# 10 output classes, so the output layer has 10 nodes




Size of training samples matrix: (60000, 28, 28)
Size of testing samples matrix: (10000, 28, 28)


In [40]:
'''
a = np.array([[1, 3]])
a=a.reshape(2,)
print(a.shape)


print(c*c)
b = [[4, 1], [2, 2],[1,1]]
x=np.matmul(b,a)
print(x)

n2=np.array([1,2,3])
n1=np.array([5,6])
n1=n1.reshape(1,2)
x=np.dot(n2,n1)
'''




#### __NN Implementation__
Your implementation of NN needs to use the `sigmoid` activation function for the hidden layer(s) and the `softmax` activation function for the output layer. The NN model you will be creating here will consist of only three layers: 1 input layer, 1 hidden layer and 1 output layer.
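
For reference, here is a minimal sketch of the two activation functions. The softmax subtracts the maximum input before exponentiating, a common trick that leaves the result unchanged but avoids overflow for large inputs. The name `stable_softmax` and the test vector are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def stable_softmax(z):
    """Softmax over the last axis, shifted by the max for numerical stability."""
    shifted = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

# A naive np.exp(1000.0) overflows to inf; the shifted version stays finite.
z = np.array([[1000.0, 1001.0, 1002.0]])
probs = stable_softmax(z)

print(np.isclose(probs.sum(), 1.0))
print(sigmoid(0.0))
```

Softmax outputs sum to 1 across the classes, which is why it suits the output layer of a classifier, while sigmoid squashes each hidden unit independently.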

In [41]:
class NeuralNetwork():
    
    def matrix_mult(self,a,b):
        row_a=a.shape[0]
        row_b=b.shape[0]
        col_b=b.shape[1]
        prod= np.zeros((row_a, col_b))
        for i in range(row_a): #rows of a
   
          for j in range(col_b): #col of b
            for k in range(row_b): #row of b
              prod[i][j]+= a[i][k]*b[k][j]
        return prod

    def transpose_mat(self,a):
        row_a= a.shape[0]
        col_a= a.shape[1]
        trans= np.zeros((col_a,row_a))
        for i in range(row_a):
          for j in range(col_a):
            trans[j][i]= a[i][j]
        return trans
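
    @staticmethod
    def cross_entropy_loss(y_pred, y_true):
        ###### Code Here ######
        # y_pred and y_true both have shape (num_samples, num_classes)
        eps = 1e-12  # avoids log(0) when a predicted probability underflows to zero
        # Mean over samples of the per-sample cross-entropy
        total_loss= -(1/y_pred.shape[0])*np.sum(y_true*np.log(y_pred + eps))
        return total_loss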
    
    @staticmethod
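    def accuracy(y_pred, y_true):
        ###### Code Here ######
        # evaluate() passes class indices (argmax over the class axis) for both arguments,
        # so accuracy is the fraction of positions where the two index arrays agree
        return np.mean(y_pred == y_true)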
    
    @staticmethod
    def softmax(x):
        ###### Code Here ######
        # x is the pre-activation z with shape (1, n), where n is the number of nodes in the layer (one observation)
        e=np.exp(x - np.max(x))  # subtracting the max leaves the result unchanged but prevents overflow
        act_x= e/np.sum(e)

        return act_x
    
    @staticmethod
    def sigmoid(x):
        ###### Code Here ######
        act_x= 1/(1+np.exp(-x))
        # x is the pre-activation z with shape (1, n), where n is the number of nodes in the layer (one observation)
        return act_x

    def sigmoid_derivative(self,act):
        # act is the sigmoid output with shape (1, n); d(sigmoid)/dz = act*(1-act)
        derv= act*(1-act)
        return derv

    def __init__(self, input_size, hidden_nodes, output_size):
        '''Creates a Feed-Forward Neural Network.
        The parameters represent the number of nodes in each layer (total 3). 
        Look at the inputs to the function'''
        
        self.num_layers = 3
        self.input_shape = input_size
        self.hidden_shape = hidden_nodes
        self.output_shape = output_size
        
        self.weights_ = []
        self.biases_ = []
        self.__init_weights()
    
    def __init_weights(self):
        '''Initializes all weights based on standard normal distribution and all biases to 0.'''
        
        ###### Code Here (Replace 'None' by appropriate values/variables) ######
        
        W_h = np.random.normal(size=(self.hidden_shape,self.input_shape))
        b_h = np.zeros(shape=(self.hidden_shape,1))
        #shape w_h=(n_h,x_input)
        #shape b_h=(n_h,1)

        W_o = np.random.normal(size=(self.output_shape,self.hidden_shape))
        b_o = np.zeros(shape=(self.output_shape,1))
        # shape w_o=(n_o,n_h)

        #n_h is node in hidden layer and n_o is nodes in output layer

        # self.weights_ becomes a list of np.arrays. 0th index has W_h and 1st index has W_o
        self.weights_.append(W_h)  
        self.weights_.append(W_o)  

        # self.biases_ becomes a list of np.arrays. 0th index has b_h and 1st index has b_o
        self.biases_.append(b_h)
        self.biases_.append(b_o)

    def forward_pass(self, input_data):
        '''Executes the feed forward algorithm.
        "input_data" is the input to the network in row-major form
        Returns "activations", which is a list of all layer outputs (excluding input layer of course)'''
        

        #z_h= w_h*x + b_h
        z_h= self.matrix_mult(self.weights_[0],self.transpose_mat(input_data))+self.biases_[0] #2d (n_h,1)
        z_h= self.transpose_mat(z_h)# (1,n_h)
        act_h= self.sigmoid(z_h) #(1,n_h)
        
        #z_o= w_o*act_h +b_o
        z_o= self.matrix_mult(self.weights_[1],self.transpose_mat(act_h)) +self.biases_[1] #(n_o,1)
        z_o= self.transpose_mat(z_o) #(1,n_o)
        act_o= self.softmax(z_o) #(1,n_o)
        activations= []
        activations.append(act_h)
        activations.append(act_o)
        ###### Code Here ######
        

        return activations

    def backward_pass(self, input_data, targets, layer_activations):
        '''Executes the backpropagation algorithm.
        "input_data" is the sample fed to the network, with shape (1, input_size)
        "targets" is the ground truth/labels
        "layer_activations" are the return value of the forward pass step
        Returns "deltas", which is a list containing weight update values for all layers (excluding the input layer of course)'''
        
        ###### Code Here ######
        w_o=self.weights_[1]
        a_h=layer_activations[0]
        a_o=layer_activations[1]

        # Output layer: dcost/dw_o = dcost/dz_o * dz_o/dw_o
        # For softmax with cross-entropy, dcost/dz_o simplifies to (a_o - targets)
        dcost_dz_o= a_o-targets # (1,n_o)
        dcost_dw_o= self.matrix_mult(self.transpose_mat(dcost_dz_o),a_h) #(n_o,n_h)
        dcost_db_o= dcost_dz_o #(1,n_o); transposed to (n_o,1) in weight_update

        # Hidden layer: dcost/dw_h = dcost/dz_h * dz_h/dw_h
        # dcost/dz_h = dcost/dz_o * dz_o/da_h * da_h/dz_h, where dz_o/da_h = w_o and da_h/dz_h = a_h*(1-a_h)
        # dz_h/dw_h = x
        dcost_dz_h=(self.matrix_mult(dcost_dz_o,w_o)) * self.sigmoid_derivative(a_h) # both are (1,n_h)
        dcost_dw_h= self.matrix_mult(self.transpose_mat(dcost_dz_h),input_data) #(n_h,input_size)
        dcost_db_h= dcost_dz_h #(1,n_h); transposed to (n_h,1) in weight_update

        deltas=[]
        deltas.append(dcost_dw_h)
        deltas.append(dcost_db_h)
        deltas.append(dcost_dw_o)
        deltas.append(dcost_db_o)

        return deltas
    
    def weight_update(self, deltas, layer_inputs, lr):
        '''Executes the gradient descent algorithm.
        "deltas" is return value of the backward pass step
        "layer_inputs" is a list containing the inputs for all layers (including the input layer)
        "lr" is the learning rate'''
        w_h=self.weights_[0]
        w_o=self.weights_[1]
        b_h=self.biases_[0]
        b_o=self.biases_[1]
        dcost_dw_h=deltas[0]
        dcost_db_h=deltas[1]
        dcost_dw_o=deltas[2]
        dcost_db_o=deltas[3]
        
        w_h=w_h - lr*dcost_dw_h
        b_h=b_h - lr*self.transpose_mat(dcost_db_h)
        w_o=w_o - lr*dcost_dw_o
        b_o=b_o - lr*self.transpose_mat(dcost_db_o)

        self.weights_[0]=w_h
        self.weights_[1]=w_o
        self.biases_[0]=b_h
        self.biases_[1]=b_o

        ###### Code Here ######

    
    
    
    ###### Do Not Change Anything Below this line in This Cell ######
    
    def fit(self, Xs, Ys, epochs, lr=1e-3):
            history = []
            for epoch in tqdm_notebook(range(epochs)):
                num_samples = Xs.shape[0]
                for i in range(num_samples):

                    sample_input = Xs[i,:].reshape((1,self.input_shape))
                    sample_target = Ys[i,:].reshape((1,self.output_shape))
                    
                    activations = self.forward_pass(sample_input)   # Call forward_pass function 
                    deltas = self.backward_pass(sample_input,sample_target, activations)    # Call backward_pass function  #edit adding sampleinput
                    layer_inputs = [sample_input] + activations[:-1]
                    
                    # Call weight_update function 
                    self.weight_update(deltas, layer_inputs, lr)
                
                preds = self.predict(Xs)   # Call predict function 
                
                current_loss = self.cross_entropy_loss(preds, Ys)
                
                if  epoch==epochs-1:
                  confusion_mat=confusion_matrix(Ys.argmax(axis=1), preds.argmax(axis=1),labels=np.arange(10))  
                  plot_confusion_matrix(confusion_mat)
                  report = classification_report(Ys, np_utils.to_categorical(preds.argmax(axis=1),num_classes=classes), target_names=class_labels)
                  print(report)
                history.append(current_loss)
            return history
    
    def predict(self, Xs):
        '''Returns the model predictions (output of the last layer) for the given "Xs".'''
        predictions = []
        num_samples = Xs.shape[0]
        for i in range(num_samples):
            sample = Xs[i,:].reshape((1,self.input_shape))
            sample_prediction = self.forward_pass(sample)[-1]
            predictions.append(sample_prediction.reshape((self.output_shape,)))
        return np.array(predictions)
    
    def evaluate(self, Xs, Ys):
        '''Returns appropriate metrics for the task, calculated on the dataset passed to this method.'''
        pred = self.predict(Xs)
        return self.cross_entropy_loss(pred, Ys), self.accuracy(pred.argmax(axis=1), Ys.argmax(axis=1))
    
    def plot_model(self, filename):
        '''Provide the "filename" as a string including file extension. Creates an image showing the model as a graph.'''
        graph = pydot.Dot(graph_type='digraph')
        graph.set_rankdir('LR')
        graph.set_node_defaults(shape='circle', fontsize=0)
        nodes_per_layer = [self.input_shape, self.hidden_shape, self.output_shape]
        for i in range(self.num_layers-1):
            for n1 in range(nodes_per_layer[i]):
                for n2 in range(nodes_per_layer[i+1]):
                    edge = pydot.Edge(f'l{i}n{n1}', f'l{i+1}n{n2}')
                    graph.add_edge(edge)
        graph.write_png(filename)

In [42]:
# These are what we call the hyperparameters (a.k.a Black Magic). You need to research on them and tweak them to see what generates the best result for you 

INPUT_SIZE = 28*28       # must be an int, this number represents the number of nodes/neurons in the input layer of the network
HIDDEN_NODES = 32     # must be an int, this number represents the number of nodes/neurons in the only hidden layer of the network
OUTPUT_SIZE = 10      # must be an int, this number represents the number of nodes/neurons in the output layer of the network
EPOCH = 1      # must be an int
LEARNING_RATE = 0.02
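
When choosing these values, it helps to have a rough idea of how much work one epoch involves with a pure for-loop implementation. The sketch below is a back-of-the-envelope estimate only: it counts the scalar multiplications inside the loop-based matrix products for the layer sizes above and ignores transposes, activations and Python overhead. The function name `multiplications_per_sample` is a hypothetical helper.

```python
def multiplications_per_sample(n_in, n_h, n_out):
    """Count scalar multiplications in the loop-based matrix products for one sample."""
    # Forward pass: W_h (n_h x n_in) times x, then W_o (n_out x n_h) times a_h.
    forward = n_h * n_in + n_out * n_h
    # Backward pass: gradient of W_o, error propagated through W_o, gradient of W_h.
    backward = n_out * n_h + n_out * n_h + n_h * n_in
    return forward, backward

forward, backward = multiplications_per_sample(784, 32, 10)
num_samples = 60_000

# One epoch trains on every sample, then runs a forward pass on every sample again to compute the loss.
train_total = (forward + backward) * num_samples
predict_total = forward * num_samples

print(forward, backward)            # 25408 25728
print(train_total + predict_total)  # 4592640000
```

With several billion interpreted loop iterations per epoch, even a single epoch can take a very long time, so trying hyperparameters on a small subset of the training data first may be a practical way to iterate.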

In [43]:
start = time.time()

nn = NeuralNetwork(input_size = INPUT_SIZE, hidden_nodes = HIDDEN_NODES, output_size = OUTPUT_SIZE)
history = nn.fit(x_train, y_train, epochs=EPOCH, lr=LEARNING_RATE)
plt.plot(history);
plt.gca().set(xlabel='Epoch', ylabel='Cross-entropy', title='Training Plot {}'.format(rollnumber));
end = time.time()

print("Runtime of the algorithm is ", round((end - start),3)," seconds")

Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`


  0%|          | 0/1 [00:00<?, ?it/s]

KeyboardInterrupt: ignored