# 3.	Multiclass neural network classifier
### a)	Implementation and convergence criterion:
The multiclass neural network classifier with one hidden layer is implemented in Python. The implementation has two parts: <br> 
*(1) a module named MulticlassNN.py that defines the training model, and* <br>
*(2) a train-test notebook named TrainTestNN.ipynb that imports MulticlassNN.py and runs it on the training and test (dev) data with different configurations.* <br><br>
**In the MulticlassNN neural network model,**
- The number of layers (input, hidden and output), the number of neurons per layer, the activation functions and the cost function are initialized. <br>
- Consecutive layers are connected through activation functions: the output of one layer, after its activation function is applied, becomes the input to the next layer, until the final output layer is reached. Three activation functions are supported (sigmoid, tanh and softmax), implemented with NumPy's built-in functions tanh, exp, sum and divide. <br>
- Training has three steps: forward pass, error computation and backward pass. Each instance passes through the activation functions during the forward pass, and the error is computed at the output. The error is then back-propagated: the deltas are obtained by differentiating with respect to the activations, and they are multiplied by the learning rate, which sets the size of each gradient-descent step.
- The error is computed with one of two cost functions: cross entropy or mean squared error. <br>
- The accuracy percentage is the number of correctly classified labels divided by the total number of labels in the input data, multiplied by 100.
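
The three training steps can be written out for the baseline configuration (tanh hidden layer, softmax output, cross-entropy cost). The symbols below are notation assumed here for illustration and are not names used in the code: $x$ is one input row vector, $t$ its one-hot target, $W_1, b_1, W_2, b_2$ the weights and biases of the two connections, $\eta$ the learning rate and $\odot$ the element-wise product.

$$
h = \tanh(x W_1 + b_1), \qquad \hat{y} = \mathrm{softmax}(h W_2 + b_2), \qquad E = -\sum_{k} t_k \log \hat{y}_k
$$

$$
\delta_2 = \hat{y} - t, \qquad \frac{\partial E}{\partial W_2} = h^{\top}\delta_2, \qquad \frac{\partial E}{\partial b_2} = \delta_2
$$

$$
\delta_1 = (\delta_2 W_2^{\top}) \odot (1 - h^{2}), \qquad \frac{\partial E}{\partial W_1} = x^{\top}\delta_1, \qquad \frac{\partial E}{\partial b_1} = \delta_1
$$

$$
W_l \leftarrow W_l - \eta \frac{\partial E}{\partial W_l}, \qquad b_l \leftarrow b_l - \eta \frac{\partial E}{\partial b_l}, \qquad l \in \{1, 2\}
$$

The simple form $\delta_2 = \hat{y} - t$ holds only for the softmax and cross-entropy pairing; other pairings need the derivative of the output activation as an extra factor.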

In [1]:
# Created By: Ali, Zeenat
# Homework 4: CMSC 678 Spring 2020
import numpy as np
import dill as dill

class MulticlassNN:
    # Initialize no of layers( input, hidden and output), no. of neurons, activation and cost functions
    def __init__(self, noOfLayers, neurons, activation, costFunction):
        self.layers = []
        self.noOfLayers = noOfLayers
        self.neurons = neurons
        self.cost_function = costFunction

        # Declare vector size for each neuron in each layer, the elements in neuron list should match no. of layers
        if not noOfLayers == len(neurons):
            raise ValueError("Layers and neuron count mismatch")

        # Include next layer neurons for input layer and all hidden layers (excluding output layer)
        for x in range(noOfLayers):
            if x != noOfLayers-1:
                layer_x = layer(neurons[x], neurons[x+1], activation[x])
            else:
                layer_x = layer(neurons[x], 0, activation[x])
            self.layers.append(layer_x)

    # Each instance will go through activation function while forward pass, then error is computed at output;
    # based on error, it will be back propagated, weights are updated and again passed forward until the classifier
    # converges and a decent accuracy is attained.
    def trainNeuralNetwork(self, batch, trainingDataX, trainingLabelY, epochs, learningRate, filename):
        self.batch = batch
        self.learningRate = learningRate
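        for j in range(epochs):
            i = 0
            self.error = 0  # accumulate the error over the whole epoch, not only the last batch
            print("***** EPOCH#: ", j+1, "out of", epochs, " *******")
            # "<=" includes the final full batch and terminates even if batch does not divide the data size
            while i+batch <= len(trainingDataX):
                print("Training with ", i+batch, "/", len(trainingDataX), end="\r")
                self.forwardPass(trainingDataX[i:i+batch])
                self.computeError(trainingLabelY[i:i+batch])
                self.backPropagate(trainingLabelY[i:i+batch])
                i += batch
            self.error /= max(i // batch, 1)  # mean error per batch in this epoch
            print("***** \nError : ", self.error, "*****")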
        dill.dump_session(filename)

    # Forward pass: propagate the inputs through every layer using its weights, bias and activation function
    def forwardPass(self, TrainingDataX):
        self.layers[0].activations = TrainingDataX
        for i in range(self.noOfLayers-1):
            tempMat = np.add(np.matmul(self.layers[i].activations, self.layers[i].currentLayerWeights), self.layers[i].currentLayerBias)
            if self.layers[i+1].activation == "sigmoid":
                self.layers[i+1].activations = self.sigmoid(tempMat)
            elif self.layers[i+1].activation == "tanh":
                self.layers[i+1].activations = self.tanh(tempMat)
            elif self.layers[i+1].activation == "softmax":
                self.layers[i+1].activations = self.softmax(tempMat)
            else:
                self.layers[i+1].activations = tempMat

    # Activation function = sigmoid
    def sigmoid(self, layer):
        return np.divide(1, np.add(1, np.exp(np.negative(layer))))

    # Activation function = tanh
    def tanh(self, layer):
        return np.tanh(layer)
    
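    # Activation function = softmax (the maximum is subtracted first so that exp cannot overflow;
    # this does not change the result and works for a single vector or a batch of rows)
    def softmax(self, layer):
        layer = np.asarray(layer)
        exp = np.exp(layer - np.max(layer, axis=-1, keepdims=True))
        return exp/np.sum(exp, axis=-1, keepdims=True)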

    # Error is calculated with the chosen cost function: cross entropy or mean squared error
    def computeError(self, trainingLabelY):
        if len(trainingLabelY[0]) != self.layers[self.noOfLayers-1].currentLayerNeurons:
            print ("Error: Label Y and output layer matrix dimension mismatch.")
            return
        if self.cost_function == "meanSquared":
            self.error += np.mean(np.divide(np.square(np.subtract(trainingLabelY, self.layers[self.noOfLayers-1].activations)), 2))
        elif self.cost_function == "crossEntropy":
            self.error += np.negative(np.sum(np.multiply(trainingLabelY, np.log(self.layers[self.noOfLayers-1].activations))))

    # Back propagation: differentiate the error w.r.t. each layer's weights and bias, then take one
    # gradient-descent step. Deltas are propagated with the weights as they were before the update.
    def backPropagate(self, trainingLabelY):
        last = self.noOfLayers-1
        y = self.layers[last].activations
        if self.layers[last].activation == "softmax" and self.cost_function == "crossEntropy":
            delta = y - trainingLabelY  # exact gradient w.r.t. the softmax inputs
        else:
            # squared-error form; for softmax only the diagonal term y*(1-y) of its derivative is used
            delta = np.multiply(y - trainingLabelY, self.activationDerivative(last))
        for i in range(last-1, -1, -1):
            deltaw = np.matmul(np.asarray(self.layers[i].activations).T, delta)
            deltab = np.sum(delta, axis=0, keepdims=True)
            if i > 0:
                delta = np.multiply(np.matmul(delta, self.layers[i].currentLayerWeights.T), self.activationDerivative(i))
            self.layers[i].currentLayerWeights = self.layers[i].currentLayerWeights - self.learningRate * deltaw
            self.layers[i].currentLayerBias = self.layers[i].currentLayerBias - self.learningRate * deltab

    # Element-wise derivative of layer i's activation function, expressed through its activations
    def activationDerivative(self, i):
        a = self.layers[i].activations
        if self.layers[i].activation == "tanh":
            return 1 - np.square(a)
        if self.layers[i].activation in ("sigmoid", "softmax"):
            return np.multiply(a, 1 - a)
        return np.ones_like(a)

    
    # Accuracy = correctly classified samples / total samples * 100
    def computeAccuracy(self, filename, inputDataX, labelY):
        dill.load_session(filename)
        self.batch = len(inputDataX)
        self.forwardPass(inputDataX)
        a = self.layers[self.noOfLayers-1].activations
        predicted = np.argmax(a, axis=1)  # most probable class for each sample (row)
        labelY = np.asarray(labelY)
        # accept either one-hot rows or plain integer class labels
        actual = np.argmax(labelY, axis=1) if labelY.ndim > 1 else labelY
        correct = int(np.sum(predicted == actual))
        total = len(predicted)
        print("Accuracy percentage: ", correct*100/total)
        print(correct)
        print(total)

class layer:
    def __init__(self, currentLayerNeurons, nextLayerNeurons, activation):
        self.currentLayerNeurons = currentLayerNeurons
        self.activation = activation
        self.activations = np.zeros([currentLayerNeurons,1])
        #Random distribution of weights for hidden layers and adding bias element
        if nextLayerNeurons != 0:
            self.currentLayerWeights = np.random.normal(0, 0.001, size=(currentLayerNeurons, nextLayerNeurons))
            self.currentLayerBias = np.random.normal(0, 0.001, size=(1, nextLayerNeurons))
        else:
            self.currentLayerWeights = None
            self.currentLayerBias = None
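
For comparison with the class above, here is a minimal functional sketch of one training step for the baseline configuration (tanh hidden layer, softmax output, cross-entropy cost). The names `softmax`, `train_step` and `params`, the full-batch averaging and the random toy data are all assumptions made for illustration; none of them is part of MulticlassNN.py.

```python
import numpy as np

def softmax(z):
    # Subtracting the row maximum leaves the result unchanged but avoids overflow in exp.
    shifted = z - z.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def train_step(X, T, params, lr):
    """Run one full-batch gradient-descent step.

    Updates the arrays in `params` in place and returns the mean
    cross-entropy measured before the update.
    """
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)                      # hidden activations
    Y = softmax(H @ W2 + b2)                      # class probabilities
    loss = -np.mean(np.sum(T * np.log(Y + 1e-12), axis=1))

    n = X.shape[0]
    delta2 = (Y - T) / n                          # gradient w.r.t. the softmax inputs
    delta1 = (delta2 @ W2.T) * (1.0 - H ** 2)     # tanh'(a) = 1 - tanh(a)^2

    # Both deltas were computed with the old weights; now apply the updates.
    W2 -= lr * (H.T @ delta2)
    b2 -= lr * delta2.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ delta1)
    b1 -= lr * delta1.sum(axis=0, keepdims=True)
    return loss

# Toy problem (assumed data): 6 random samples, 3 features, 2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
T = np.eye(2)[[0, 1, 0, 1, 0, 1]]
params = [rng.normal(0, 0.5, (3, 5)), np.zeros((1, 5)),
          rng.normal(0, 0.5, (5, 2)), np.zeros((1, 2))]

losses = [train_step(X, T, params, lr=0.1) for _ in range(500)]
print("first loss:", losses[0], "last loss:", losses[-1])
```

A falling loss on such a toy problem is only a sanity check of the update rule, not evidence about accuracy on MNIST.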

**In the train-test implementation,**<br>
- The MNIST data and MulticlassNN are imported. 60,000 samples are used as training data and 10,000 samples as test (dev) data.
- The data is then converted to NumPy arrays so that NumPy operations can be applied.
- A network is created from the MulticlassNN model by plugging in the appropriate configuration. The baseline model has 3 layers: an input layer, one hidden layer and an output layer. The input vector has 784 elements, the hidden layer has 100 neurons and the output layer has 10 elements corresponding to the class labels 0 to 9.
- The activation for the input layer is set to None because no activation is needed there. tanh is the activation function for the hidden layer and softmax for the output layer.
- The error is calculated with cross entropy as the cost function.
- Once the configuration is set, the training data is passed to the network, which is trained for 1 iteration (epoch) with a learning rate of 0.1 and a batch size of 1, as in the cell below. The file 'Result.pkl' is used for pickling, i.e., the session is saved with dill so that it can be reloaded before the accuracy is computed.
- Once the model is trained, the accuracy is calculated on the training data.
- The trained model is then applied to the test data to measure its accuracy.
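
The label encoding and the accuracy measure used in these steps can be sketched in isolation. The helper names `one_hot` and `accuracy_percent` and the toy arrays are hypothetical; the notebook itself does the encoding with `np.eye(NumberOfLabels)[labels]` and computes the accuracy inside `computeAccuracy`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Row i is all zeros except for a 1 in column labels[i]."""
    return np.eye(num_classes)[np.asarray(labels)]

def accuracy_percent(probabilities, one_hot_targets):
    """Share of rows whose most probable class equals the target class, times 100."""
    predicted = np.argmax(probabilities, axis=1)
    actual = np.argmax(one_hot_targets, axis=1)
    return 100.0 * np.mean(predicted == actual)

# Toy example: three samples, three classes.
toy_targets = one_hot([1, 0, 1], num_classes=3)
toy_probabilities = np.array([[0.1, 0.7, 0.2],   # predicts class 1 (correct)
                              [0.8, 0.1, 0.1],   # predicts class 0 (correct)
                              [0.3, 0.3, 0.4]])  # predicts class 2 (wrong)
toy_accuracy = accuracy_percent(toy_probabilities, toy_targets)
print(toy_accuracy)  # two of the three rows are classified correctly
```

Taking the arg-max per row, rather than one maximum over the whole output matrix, is what lets the same function score a full batch at once.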

In [None]:
%pip install python-mnist
%pip install import_ipynb

import numpy as np
import MulticlassNN as nn
from mnist import MNIST

#Import MNIST data and load training and Dev/Test data 
mnist = MNIST('mnist-dataset')
TrainingDataX, TrainingLabelY = mnist.load_training() # 60000 training samples
TestDataX, TestLabelY = mnist.load_testing()          # 10000 test samples

#Converting data to numpy array format
TrainingDataX = np.asarray(TrainingDataX).astype(np.float32)
TrainingLabelY = np.asarray(TrainingLabelY).astype(np.int32)
TestDataX = np.asarray(TestDataX).astype(np.float32)
TestLabelY = np.asarray(TestLabelY).astype(np.int32)

# Baseline: Train Neural Network - Single hidden layer using tanh activation and output layer using softmax activation
NumberOfLabels = 10
Trainingclass = np.eye(NumberOfLabels)[TrainingLabelY] # one-hot encoding: row i is all zeroes except a one at column TrainingLabelY[i]
Network = nn.MulticlassNN(3, [784, 100, 10], [None, "tanh", "softmax"], costFunction="crossEntropy")
Network.trainNeuralNetwork(1, trainingDataX=TrainingDataX, trainingLabelY=Trainingclass, epochs=1, learningRate=0.1, filename="Result.pkl")
print("Training Accuracy")
Network.computeAccuracy("Result.pkl", TrainingDataX, Trainingclass)

# Test Baseline Neural Network
TestClass = np.eye(NumberOfLabels)[TestLabelY]
print("Testing Accuracy")
Network.computeAccuracy("Result.pkl", TestDataX, TestClass)

***** EPOCH#:  1 out of 1  *******

**Convergence criteria:**<br>
- Weights and biases are initialized from a random normal distribution.
- During back propagation, the gradients are obtained from the derivatives with respect to the activations; each weight is updated by subtracting its gradient scaled by the learning rate, and the data is passed forward again to recompute the error. <br>
- The process continues until the error, as measured by the cost function, stabilizes and stops decreasing.
- Eventually the updated weights start classifying the labels correctly, and the model converges to an accuracy of about 87%.
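
In the code shown, training runs for a fixed number of epochs, so the "error stops decreasing" rule is applied by inspection. One way to make it explicit is sketched below. The function `has_converged` and its `tolerance` and `patience` parameters are hypothetical and are not part of MulticlassNN.py.

```python
def has_converged(errors, tolerance=1e-4, patience=3):
    """Return True when each of the last `patience` epoch-to-epoch
    improvements in the error is smaller than `tolerance`."""
    if len(errors) <= patience:
        return False  # not enough history yet
    recent = errors[-(patience + 1):]
    improvements = [recent[k] - recent[k + 1] for k in range(patience)]
    return all(improvement < tolerance for improvement in improvements)

# Usage: collect one error value per epoch and stop once the curve flattens.
still_improving = [1.0, 0.8, 0.6, 0.4, 0.2]
flattened = [1.0, 0.5, 0.3, 0.29999, 0.29998, 0.29997]
print(has_converged(still_improving), has_converged(flattened))
```

Requiring several consecutive small improvements, rather than one, avoids stopping on a single noisy epoch.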

### b)	Validation using XOR input data:


In [2]:
import numpy as np
import MulticlassNN as nn

# Build the XOR training data: each label is the XOR of the three input bits
datum_1 = [0, 1, 1]
datum_2 = [0, 1, 0]
datum_3 = [0, 0, 1]
datum_4 = [0, 0, 0]
datum_5 = [1, 0, 1]
datum_6 = [1, 1, 1]
datum_7 = [1, 1, 0]
datum_8 = [1, 0, 0]

training_dataX = [datum_1, datum_2, datum_3, datum_4, datum_5, datum_6, datum_7, datum_8]
training_labelY = [0, 1, 1, 0, 0, 1, 0, 1]

# Converting data to numpy array format
TrainingDataX = np.array(training_dataX).astype(np.float32)
TrainingLabelY = np.asarray(training_labelY).astype(np.int32)

# Baseline: XOR data Train Neural Network - Single hidden layer using tanh activation and output layer using softmax activation
NumberOfLabels = 2
Trainingclass = np.eye(NumberOfLabels)[TrainingLabelY] # one-hot encoding: row i is all zeroes except a one at column TrainingLabelY[i]
print(Trainingclass)
Network = nn.MulticlassNN(3, [3, 4, 2], [None, "tanh", "softmax"], costFunction="crossEntropy")
Network.trainNeuralNetwork(1, trainingDataX=TrainingDataX, trainingLabelY=Trainingclass, epochs=20000, learningRate=0.2, filename="Result.pkl")
print("Training Accuracy")
Network.computeAccuracy("Result.pkl", TrainingDataX, Trainingclass)  # compare against the one-hot labels

[[1. 0.]
 [0. 1.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [0. 1.]]
Training Accuracy
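
As a quick check of the validation data itself, the eight labels can be compared with the XOR of the three input bits. This is a small sketch that copies the data from the cell above; the variable names `xor_inputs`, `xor_labels` and `parity` are illustrative.

```python
import numpy as np

# The eight 3-bit inputs and their labels, copied from the cell above.
xor_inputs = np.array([[0, 1, 1], [0, 1, 0], [0, 0, 1], [0, 0, 0],
                       [1, 0, 1], [1, 1, 1], [1, 1, 0], [1, 0, 0]])
xor_labels = np.array([0, 1, 1, 0, 0, 1, 0, 1])

# XOR across each row: 1 when an odd number of bits is set, else 0.
parity = np.bitwise_xor.reduce(xor_inputs, axis=1)
print(np.array_equal(parity, xor_labels))
```

Because the two classes are not linearly separable, this data set can only be fitted when the hidden layer contributes, which is what makes it a useful validation case.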