**`Simple Neural Network Implementations`**

**Model 1:** We are given a set of instances with numerical attributes and a numerical label/target value (i.e. ground truth) for each instance. Our goal is to create a neural network that can predict the label for any given instance. In our simplest neural network model, the prediction is just a linear combination of the attributes. The network is trained by optimizing the weights (i.e. the constant coefficients) of this linear combination using gradient descent.
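
In symbols, one training step for a single instance can be sketched as follows. The symbols are chosen here for illustration: $x$ is the attribute vector, $t$ the target, $w$ the weight vector, $p$ the prediction, $E$ the squared error and $\alpha$ the gradient descent step-size.

$$
\begin{aligned}
p &= w \cdot x = \sum_k w_k x_k \\
E &= (p - t)^2 \\
\frac{\partial E}{\partial w_k} &= 2\,(p - t)\,x_k \\
w_k &\leftarrow w_k - \alpha\,\frac{\partial E}{\partial w_k}
\end{aligned}
$$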

To demonstrate this model, we will use the "traffic lights" example: there are three traffic lights, each of which is either `on` or `off` (i.e. 1 or 0), and the corresponding label is either walk or stop (1 or 0). The training dataset is contrived so that the second attribute/light is perfectly correlated with the target (in this dataset the two are identical). We would therefore expect the second weight to be much larger than the other two weights after the model has been trained sufficiently.
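
As a quick sanity check on that expectation, the sketch below solves the same least-squares problem in closed form with `np.linalg.lstsq` instead of gradient descent. This is a swapped-in technique for comparison only, not the training method used in this notebook. Because the label column is identical to the second attribute column, the exact minimizer should put all of the weight on the second light.

```python
import numpy as np

# same traffic lights dataset as in the notebook
traffic_lights = np.array([[1, 0, 1, 0],
                           [0, 1, 1, 1],
                           [0, 0, 1, 0],
                           [1, 1, 1, 1],
                           [0, 1, 1, 1],
                           [1, 0, 1, 0]])

X = traffic_lights[:, :-1].astype(float)  # attributes
y = traffic_lights[:, -1].astype(float)   # labels

# closed-form least-squares solution of X @ w = y
w_opt, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)

print(f"Least-squares weights: {np.round(w_opt, 6)}")
```

Gradient descent on the same data should move the weights toward this solution, so the first and third weights are expected to shrink toward zero as training proceeds.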

In [131]:
import numpy as np

# traffic lights dataset (each row is an instance, the first three columns are the attributes and the last column is the label)
traffic_lights = np.array([ [1, 0, 1, 0], 
                            [0, 1, 1, 1],
                            [0, 0, 1, 0],
                            [1, 1, 1, 1],
                            [0, 1, 1, 1],
                            [1, 0, 1, 0]] )

# number of gradient descent iterations
niters = 30

# learning rate (i.e. gradient descent step-size)
alpha = 0.1

# initialize random weights
weights = np.random.randn(3) 
print(f"Initial weights: {weights}")

# train the network
for i in range(niters):

    total_error = 0.0
    for j in range(traffic_lights.shape[0]):
        
        # `inputs` avoids shadowing the Python built-in `input`
        inputs = traffic_lights[j, :-1]
        target = traffic_lights[j, -1]

        # compute prediction
        prediction = np.dot(weights, inputs)

        # compute squared error
        error = (prediction - target)**2
        total_error += error

        # compute gradient of error w.r.t. weights
        grad = 2 * (prediction - target) * inputs

        # optimize weights using gradient descent
        weights -= alpha * grad

    print(f"Iteration# {i+1}, Updated weights: {weights}, Total error: {total_error}")



Initial weights: [ 0.62151601  1.16478045 -0.87023502]
Iteration# 1, Updated weights: [ 0.5805458   1.29714955 -0.55782568], Total error: 1.319210129742167
Iteration# 2, Updated weights: [ 0.46945491  1.2993988  -0.46105522], Total error: 0.6696506977513328
Iteration# 3, Updated weights: [ 0.38056432  1.27976931 -0.39240438], Total error: 0.457592298524496
Iteration# 4, Updated weights: [ 0.31243045  1.25349121 -0.33514602], Total error: 0.3236465018257741
Iteration# 5, Updated weights: [ 0.2592895   1.22551061 -0.28648292], Total error: 0.2328646988599997
Iteration# 6, Updated weights: [ 0.21698546  1.19823641 -0.24499457], Total error: 0.16906006496928797
Iteration# 7, Updated weights: [ 0.18271609  1.17286488 -0.209578  ], Total error: 0.12330872270172383
Iteration# 8, Updated weights: [ 0.15456557  1.14991184 -0.17931977], Total error: 0.09015201816516408
Iteration# 9, Updated weights: [ 0.13119006  1.12951049 -0.1534536 ], Total error: 0.06599009691179981
Iteration# 10, Updated we

**Model 2:** We will now build a neural network with one hidden layer between the input and output layers, and introduce non-linearity via a ReLU activation function. This three-layer network has two sets of weights, both of which are optimized during training. Each training step has two stages: forward propagation and backward propagation. Forward propagation involves computing the output of each layer and sending it forward as the input to the next layer. Backward propagation involves computing, at each layer, the derivatives of that layer's operation w.r.t. its inputs, composing these derivatives with those received from the next layer, and then sending the result back to the previous layer. This composition of derivatives from the current layer with derivatives from the next layer is simply an application of the chain rule for derivatives.
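
Written out for this network, the two passes can be sketched as follows. The notation follows the comments in the code cell below: $L_0$ is the batch of inputs with $n$ rows, $W_0$ and $W_1$ are the two weight matrices, $Y$ holds the targets, $\odot$ is the element-wise product, and $\mathrm{relu}'$ is the element-wise derivative of the activation.

$$
\begin{aligned}
\text{forward:}\quad Z &= L_0 W_0, \qquad L_1 = \mathrm{relu}(Z), \qquad P = L_1 W_1, \qquad E = \frac{1}{n} \sum_{i,j} (P_{ij} - Y_{ij})^2 \\
\text{backward:}\quad \frac{\partial E}{\partial P} &= \frac{2}{n}\,(P - Y) \\
\frac{\partial E}{\partial W_1} &= L_1^{T}\,\frac{\partial E}{\partial P} \\
\frac{\partial E}{\partial L_1} &= \frac{\partial E}{\partial P}\,W_1^{T} \\
\frac{\partial E}{\partial Z} &= \frac{\partial E}{\partial L_1} \odot \mathrm{relu}'(Z) \\
\frac{\partial E}{\partial W_0} &= L_0^{T}\,\frac{\partial E}{\partial Z}
\end{aligned}
$$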

In [132]:
import numpy as np

'''
    Input layer class: Input layer does not perform any operations
'''
class input_layer(object):
    '''
        class constructor
    '''
    def __init__(self) -> None:
        pass

    ''' 
        Input layer forward pass
    '''
    def forward(self, L_0):
        self.L_0 = L_0
        return self.L_0
    
''' 
    Hidden layer class: Hidden layer performs 2 operations. First it performs matrix multiplication
                        of inputs L_0 with weights W_0. Then it operates on this result with the Relu
                        function.
'''    
class hidden_layer(object):
    '''
        class constructor
    '''
    def __init__(self, W) -> None:
        self.W = W
        self.W_grad = np.zeros_like(W)

    ''' 
        Hidden layer forward propagation
    '''
    def forward(self, L): 
        self.L = L
        return self.forward_matrix_mult()

    def forward_matrix_mult(self):
        self.Z =  np.dot(self.L, self.W)
        return self.forward_relu()
    
    def forward_relu(self):
        return Relu(self.Z)
    
    ''' 
        Hidden layer backpropagation of derivatives
    '''
    def backward(self, D):
        self.backward_relu(D)

    def backward_relu(self, D):
        # dE/dZ
        dE_dZ = D * Relu_deriv(self.Z) 
        self.backward_matrix_mult(dE_dZ)
    
    def backward_matrix_mult(self, D):
        # dE/dW0
        self.W_grad = np.dot((self.L).T, D)

    ''' 
        Gradient descent optimization of hidden layer weights
    '''
    def update_weights(self, alpha):
        self.W -= alpha * self.W_grad

       

''' 
    Output layer class: Performs two operations. First, matrix multiplication of inputs L_1 with weights
                        W_1. This result is then operated on by the squared error function.
'''
class output_layer(object):
    
    ''' 
        class constructor
    '''

    def __init__(self, W) -> None:
        self.W = W
        self.W_grad = np.zeros_like(W)

    ''' 
        Output layer forward propagation
    '''
    def forward(self, L, Y):
        self.L = L
        self.Y = Y
        return self.forward_matrix_mult()

    def forward_matrix_mult(self):
        self.P = np.dot(self.L, self.W) 
        return self.P, self.forward_error()
 
    def forward_error(self):
        return np.sum((self.P - self.Y)**2) / self.P.shape[0]

    '''     
        Output layer backpropagation of derivatives
    '''
    def backward(self):
        return self.backward_error()

    def backward_error(self):
        # dE/dP
        dE_dP = 2*(self.P - self.Y) / self.P.shape[0]
        return self.backward_matrix_mult(dE_dP)

    def backward_matrix_mult(self, D):
        # dE/dW1
        self.W_grad = np.dot((self.L).T, D)
        # dE/dL1
        dE_dL = np.dot(D, (self.W).T)
        return dE_dL
    
    ''' 
        Gradient descent optimization of output layer weights
    '''
    def update_weights(self, alpha):
        self.W -= alpha * self.W_grad

'''
    A 3-layer neural network class
'''
class three_layer_net(object):
    ''' 
        class constructor: Takes the following parameters: number of neurons in the input layer
        (the number of feature attributes for each instance), number of neurons in the hidden layer
        (has to be at least 1 and can be arbitrarily large), number of neurons in the output layer
        (the number of target attributes) and gradient descent step-size (alpha)
    '''
    def __init__(self, input_neurons, hidden_neurons, output_neurons, alpha) -> None:
        self.input_neurons  = input_neurons
        self.hidden_neurons = hidden_neurons
        self.output_neurons = output_neurons
        self.alpha = alpha
        
        np.random.seed(1)
        # initialize weights W0 between input layer and hidden layer 
        W0 = 0.2*np.random.random(size=(input_neurons, hidden_neurons)) - 0.1
        # initialize weights W1 between hidden layer and output layer
        W1 = 0.2*np.random.random(size=(hidden_neurons, output_neurons)) - 0.1 

        # initialize layer objects
        self.layer_0 = input_layer()
        self.layer_1 = hidden_layer(W0)
        self.layer_2 = output_layer(W1)

    ''' 
        neural network forward pass
    '''
    def forward_net(self, L0, Y):
        # input layer forward pass
        self.L0 = self.layer_0.forward(L0) 
        # hidden layer forward pass
        self.L1 = self.layer_1.forward(self.L0) 
        # output layer forward pass
        self.L2, error = self.layer_2.forward(self.L1, Y) 

        return self.L2, error

    ''' 
        neural network backward pass
    ''' 
    def backward_net(self):
       # output layer backpropagation
       D = self.layer_2.backward() 
       # hidden layer backpropagation
       self.layer_1.backward(D) 

    '''     
        weight optimization
    '''
    def optimize(self):
        # update output layer weights
        self.layer_2.update_weights(self.alpha)
        # update hidden layer weights
        self.layer_1.update_weights(self.alpha)

    '''     
        train the network
    ''' 
    def train(self, X_train, y_train, batch_size, niters):

        #training iterations
        for i in range(niters):
            total_error = 0.0
            correct_count = 0
            # train using batches of instances (round up so that a final partial batch is not skipped)
            for j in range(int(np.ceil(X_train.shape[0] / batch_size))):

                lo = j * batch_size
                hi = min((j+1) * batch_size, X_train.shape[0])

                X = X_train[lo:hi]
                y = y_train[lo:hi]

                # forward propagation
                prediction, error = self.forward_net(X, y)
                total_error += error
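                # NOTE: argmax runs over the flattened batch, so this counts at most one match per batch;
                # it is a per-instance accuracy only when batch_size == 1 and the labels are one-hot encoded
                correct_count += int(np.argmax(prediction) == np.argmax(y))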
                
                #if(i == (niters-1)):
                #    print(f"Instance# {j+1}, Target: {y}, Prediction: {prediction}")

                # backpropagation
                self.backward_net()

                # weight optimization
                self.optimize()

            print(f"Iteration# {i+1}, Total error: {total_error}, Training accuracy: {correct_count/len(y_train)}")

# Relu function
def Relu(x):
    return x*(x > 0)

# Relu derivative function
def Relu_deriv(x):
    return (x > 0)
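
One common way to gain confidence in hand-written backpropagation is a numerical gradient check. The sketch below is not part of the notebook's classes: it re-implements the same forward pass and chain-rule steps as standalone functions (all function names here are hypothetical) and compares the analytic gradients against central finite differences on small random data.

```python
import numpy as np

def relu(x):
    return x * (x > 0)

def loss(L0, Y, W0, W1):
    # forward pass: matrix multiply, relu, matrix multiply, squared error averaged over rows
    P = relu(L0 @ W0) @ W1
    return np.sum((P - Y) ** 2) / P.shape[0]

def analytic_grads(L0, Y, W0, W1):
    # same chain-rule steps as the hidden and output layer classes
    Z = L0 @ W0
    L1 = relu(Z)
    P = L1 @ W1
    dE_dP = 2 * (P - Y) / P.shape[0]
    dE_dW1 = L1.T @ dE_dP
    dE_dL1 = dE_dP @ W1.T
    dE_dZ = dE_dL1 * (Z > 0)
    dE_dW0 = L0.T @ dE_dZ
    return dE_dW0, dE_dW1

def numeric_grad(f, W, eps=1e-6):
    # central finite differences, perturbing one weight at a time in place
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + eps
        f_plus = f()
        W[idx] = old - eps
        f_minus = f()
        W[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

# small random problem (sizes and seed chosen arbitrarily for illustration)
rng = np.random.default_rng(0)
L0 = rng.normal(size=(5, 3))
Y = rng.normal(size=(5, 2))
W0 = rng.normal(size=(3, 4))
W1 = rng.normal(size=(4, 2))

g0, g1 = analytic_grads(L0, Y, W0, W1)
n0 = numeric_grad(lambda: loss(L0, Y, W0, W1), W0)
n1 = numeric_grad(lambda: loss(L0, Y, W0, W1), W1)

print("max |analytic - numeric| for W0:", np.max(np.abs(g0 - n0)))
print("max |analytic - numeric| for W1:", np.max(np.abs(g1 - n1)))
```

The check can be unreliable if some entry of `Z` lies very close to zero, because ReLU is not differentiable there; with random data this is unlikely but possible.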

We can test this three-layer network using the same traffic lights dataset.

In [133]:
# traffic lights dataset (each row is an instance, the first three columns are the attributes and the last column is the label)
traffic_lights = np.array([ [1, 0, 1, 0], 
                            [0, 1, 1, 1],
                            [0, 0, 1, 0],
                            [1, 1, 1, 1],
                            [0, 1, 1, 1],
                            [1, 0, 1, 0]] )

# dataset preprocessing
X_train = traffic_lights[:,:-1]
y_train = traffic_lights[:,-1:]

# initialize a three layer neural net object
three_net = three_layer_net(input_neurons=X_train.shape[1], hidden_neurons=4, output_neurons=1, alpha=0.5)

# train the net
three_net.train(X_train, y_train, batch_size=2, niters=30)

Iteration# 1, Total error: 1.4968982369314432, Training accuracy: 0.0
Iteration# 2, Total error: 1.4054938161457935, Training accuracy: 0.3333333333333333
Iteration# 3, Total error: 0.6799233509512006, Training accuracy: 0.5
Iteration# 4, Total error: 0.19354392147471092, Training accuracy: 0.5
Iteration# 5, Total error: 0.06960693584863642, Training accuracy: 0.5
Iteration# 6, Total error: 0.03536133055781871, Training accuracy: 0.5
Iteration# 7, Total error: 0.022186055130499184, Training accuracy: 0.5
Iteration# 8, Total error: 0.012995570009013491, Training accuracy: 0.5
Iteration# 9, Total error: 0.006853192664528585, Training accuracy: 0.5
Iteration# 10, Total error: 0.0034041452811268925, Training accuracy: 0.5
Iteration# 11, Total error: 0.0016247809501076357, Training accuracy: 0.5
Iteration# 12, Total error: 0.0007632862148341745, Training accuracy: 0.5
Iteration# 13, Total error: 0.0003596241751247776, Training accuracy: 0.5
Iteration# 14, Total error: 0.00017018676391926835
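
A note on reading the accuracy column above: `train` compares `np.argmax(prediction)` with `np.argmax(y)` once per batch, so with `batch_size=2` it can count at most 3 matches across the 6 instances, which caps the reported value at 0.5. For a single 0/1 output, a per-instance alternative is to threshold each prediction. The helper below is a hypothetical sketch: the name `binary_accuracy`, the 0.5 threshold and the example predictions are all assumptions made for illustration, not part of the notebook.

```python
import numpy as np

def binary_accuracy(predictions, targets, threshold=0.5):
    """Fraction of instances whose thresholded prediction equals the 0/1 target."""
    predicted_labels = (np.asarray(predictions) > threshold).astype(int)
    return float(np.mean(predicted_labels == np.asarray(targets)))

# made-up predictions for six instances, shaped like the network output (n, 1)
example_predictions = np.array([[0.02], [0.97], [0.01], [1.03], [0.98], [0.04]])
example_targets = np.array([[0], [1], [0], [1], [1], [0]])

print(binary_accuracy(example_predictions, example_targets))
```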

We now test our three-layer model on the `MNIST dataset` of handwritten digits.
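
Because the output layer computes a squared error against a vector of 10 outputs, each digit label has to be turned into a one-hot row before training. The preprocessing cell further below does this with explicit loops. A vectorized sketch of the same encoding follows; it assumes integer labels in the range 0 to 9, and the helper name `one_hot` and the small label array are made up for illustration.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return an array with a 1 in column labels[i] of row i and 0 elsewhere."""
    labels = np.asarray(labels)
    # row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes)[labels]

example_labels = np.array([3, 0, 7, 1, 9])  # illustrative labels, not loaded from MNIST
encoded = one_hot(example_labels)

print(encoded.shape)
print(encoded[0])
```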

In [134]:
from keras.datasets import mnist

In [136]:
'''
    MNIST dataset of handwritten digits:

    Each observation is an image. The input values per image are 28x28 pixels (i.e. 784 features/inputs per observation).
'''

(X_train, y_train), (X_test, y_test) = mnist.load_data() # x values contain image pixels (i.e. features) and y values are the corresponding labels

print(f"X_train shape = {X_train.shape}")
print(f"y_train shape = {y_train.shape}")
print(f"X_test shape = {X_test.shape}")
print(f"y_test shape = {y_test.shape}")

# flatten image pixels array
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1]*X_train.shape[2]) 
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1]*X_test.shape[2]) 

# normalize pixel values from the range [0, 255] to [0, 1]
X_train = X_train / 255
X_test = X_test / 255

# one-hot encode the labels
y_train_onehot = np.zeros(shape=(y_train.shape[0], 10))
y_test_onehot = np.zeros(shape=(y_test.shape[0], 10))

for i in range(y_train.shape[0]):
    y_train_onehot[i, y_train[i]] = 1

for i in range(y_test.shape[0]):
    y_test_onehot[i, y_test[i]] = 1

# training dataset preparation
training_images = X_train[0:1000]
training_labels = y_train_onehot[0:1000]

# initialize a three layer neural net object
three_net = three_layer_net(input_neurons=training_images.shape[1], hidden_neurons=50, output_neurons=training_labels.shape[1], alpha=0.005)

# train the net
three_net.train(training_images, training_labels, batch_size=1, niters=100)    

X_train shape = (60000, 28, 28)
y_train shape = (60000,)
X_test shape = (10000, 28, 28)
y_test shape = (10000,)
Iteration# 1, Total error: 620.7554327195994, Training accuracy: 0.627
Iteration# 2, Total error: 399.9212829858161, Training accuracy: 0.812
Iteration# 3, Total error: 313.6991711596335, Training accuracy: 0.872
Iteration# 4, Total error: 259.55156583439305, Training accuracy: 0.907
Iteration# 5, Total error: 221.53570257875884, Training accuracy: 0.928
Iteration# 6, Total error: 193.46117176606424, Training accuracy: 0.943
Iteration# 7, Total error: 172.91822663462085, Training accuracy: 0.956
Iteration# 8, Total error: 157.1053729758403, Training accuracy: 0.962
Iteration# 9, Total error: 144.67181167065124, Training accuracy: 0.966
Iteration# 10, Total error: 133.9582386319036, Training accuracy: 0.973
Iteration# 11, Total error: 124.91255666566657, Training accuracy: 0.979
Iteration# 12, Total error: 116.99675394038027, Training accuracy: 0.98
Iteration# 13, Total error: