**`Simple Neural Network Implementations`**

**Model 1:** We are given a set of instances with numerical attributes, and a numerical label/target value (i.e. ground truth) for each instance. Our goal is to create a neural network that can predict the label for any given instance. In our simplest neural network model, the prediction is just a linear combination of the attributes. The network is trained by optimizing the weights (i.e. the constant coefficients of this linear combination) using gradient descent.
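
For reference, the training rule described above can be written out for a single instance. The symbols are notation assumed here for illustration: $\mathbf{x}$ is the attribute vector, $y$ the target, $\mathbf{w}$ the weights, $p$ the prediction, $E$ the squared error and $\alpha$ the learning rate.

```latex
\begin{aligned}
p &= \mathbf{w} \cdot \mathbf{x} = \sum_{k} w_k x_k \\
E &= (p - y)^2 \\
\frac{\partial E}{\partial w_k} &= 2\,(p - y)\,x_k \\
w_k &\leftarrow w_k - \alpha\,\frac{\partial E}{\partial w_k}
\end{aligned}
```

The third line follows from the chain rule: $\partial E/\partial p = 2(p - y)$ and $\partial p/\partial w_k = x_k$.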

To demonstrate this model, we will use the "traffic lights" example. There are three traffic lights, each of which is either `on` or `off` (i.e. 1 or 0), and the corresponding label is either walk or stop (1 or 0). The training dataset is contrived so that the target is strongly correlated with the second attribute/light. We would therefore expect the second weight to be much larger than the other two weights after the model has been trained sufficiently.
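
As a rough cross-check on that expectation, the weights that minimize the total squared error can also be computed directly with a least-squares solve. This is only a sanity check under the assumption that the six-row dataset is the one used in this example; it is not the gradient descent procedure used to train the model, and the names `X`, `y` and `w_star` are introduced here for illustration.

```python
import numpy as np

# Traffic lights dataset: three attribute columns followed by one label column
traffic_lights = np.array([[1, 0, 1, 0],
                           [0, 1, 1, 1],
                           [0, 0, 1, 0],
                           [1, 1, 1, 1],
                           [0, 1, 1, 1],
                           [1, 0, 1, 0]])

X = traffic_lights[:, :-1].astype(float)  # attributes
y = traffic_lights[:, -1].astype(float)   # labels

# Least-squares weights for the linear model X @ w ~ y
w_star, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(f"Least-squares weights: {w_star}")
```

In these six rows the label equals the state of the second light, so a weight vector close to `[0, 1, 0]` fits the data exactly, and gradient descent should drift towards it.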

In [29]:
import numpy as np

# traffic lights dataset (each row is an instance, the first three columns are the attributes and the last column is the label)
traffic_lights = np.array([ [1, 0, 1, 0], 
                            [0, 1, 1, 1],
                            [0, 0, 1, 0],
                            [1, 1, 1, 1],
                            [0, 1, 1, 1],
                            [1, 0, 1, 0]] )

# number of gradient descent iterations
niters = 30

# learning rate (i.e. gradient descent step-size)
alpha = 0.1

# initialize random weights
weights = np.random.randn(3) 
print(f"Initial weights: {weights}")

# train the network
for i in range(niters):

    total_error = 0.0
    for j in range(traffic_lights.shape[0]):
        
        # attributes and label of the current instance
        # (named x rather than input to avoid shadowing the Python builtin)
        x = traffic_lights[j, :-1]
        target = traffic_lights[j, -1]

        # compute prediction
        prediction = np.dot(weights, x)

        # compute squared error
        error = (prediction - target)**2
        total_error += error

        # compute gradient of error w.r.t. weights
        grad = 2 * (prediction - target) * x

        # optimize weights using gradient descent
        weights -= alpha * grad

    print(f"Iteration# {i+1}, Updated weights: {weights}, Total error: {total_error}")



Initial weights: [-0.03840893 -0.41475276 -1.04493862]
Iteration# 1, Updated weights: [ 0.30101988  0.418822   -0.01432602], Total error: 8.463158685880734
Iteration# 2, Updated weights: [0.19111435 0.64926449 0.06246911], Total error: 0.8313322046188765
Iteration# 3, Updated weights: [0.10154897 0.77038928 0.06233831], Total error: 0.30162201445951053
Iteration# 4, Updated weights: [0.04739158 0.84500878 0.05585175], Total error: 0.11721242910424626
Iteration# 5, Updated weights: [0.01632727 0.89254286 0.04909695], Total error: 0.04735412555066993
Iteration# 6, Updated weights: [-8.14194520e-04  9.23379496e-01  4.27949363e-02], Total error: 0.020371532238372722
Iteration# 7, Updated weights: [-0.00971601  0.94377761  0.03710021], Total error: 0.009599725237706198
Iteration# 8, Updated weights: [-0.01381643  0.95758233  0.0320437 ], Total error: 0.005052973209135628
Iteration# 9, Updated weights: [-0.01518043  0.96717031  0.02760445], Total error: 0.0029656882212010143
Iteration# 10, U

**Model 2:** We will now build a neural network with one hidden layer between the input and output layers, and introduce non-linearity via a ReLU activation function. This three-layer network has two sets of weights, both of which are optimized during training. Each training step has two separate stages: forward propagation and backward propagation. Forward propagation involves computing the output of each layer and sending it forward as the input to the next layer. Backward propagation involves computing the derivatives of the operation performed at each layer w.r.t. its inputs, composing these derivatives with those obtained from the next layer, and then sending the result back to the previous layer. This composition of derivatives from the current layer with derivatives from the next layer is simply an application of the chain rule for derivatives.
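
One way to convince yourself that the chain-rule composition is right is to compare the backpropagated gradients with finite-difference estimates. The following is a minimal sketch, not the class-based implementation in this notebook: the function names `forward`, `backward` and `numerical_grad`, the random weights and the single test instance are all assumptions made for illustration.

```python
import numpy as np

def relu(z):
    return z * (z > 0)

def forward(x, y, W0, W1):
    """Squared error of an input -> ReLU hidden -> linear output network."""
    z = x @ W0   # hidden layer pre-activation
    h = relu(z)  # hidden layer output
    p = h @ W1   # prediction
    return ((p - y) ** 2).sum(), (z, h, p)

def backward(x, y, W0, W1):
    """Gradients of the error w.r.t. W0 and W1 via the chain rule."""
    _, (z, h, p) = forward(x, y, W0, W1)
    dE_dp = 2 * (p - y)      # derivative of the squared error
    dE_dW1 = h.T @ dE_dp     # gradient for the output layer weights
    dE_dh = dE_dp @ W1.T     # derivative sent back to the hidden layer
    dE_dz = dE_dh * (z > 0)  # composed with the ReLU derivative
    dE_dW0 = x.T @ dE_dz     # gradient for the hidden layer weights
    return dE_dW0, dE_dW1

def numerical_grad(f, W, eps=1e-6):
    """Central finite differences, perturbing one weight at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + eps
        f_plus = f()
        W[idx] = old - eps
        f_minus = f()
        W[idx] = old  # restore the original weight
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = np.array([[1.0, 0.0, 1.0]])  # one instance with three attributes
y = np.array([[0.0]])            # its target
W0 = rng.uniform(-1, 1, size=(3, 4))
W1 = rng.uniform(-1, 1, size=(4, 1))

g0, g1 = backward(x, y, W0, W1)
n0 = numerical_grad(lambda: forward(x, y, W0, W1)[0], W0)
n1 = numerical_grad(lambda: forward(x, y, W0, W1)[0], W1)
max_diff = max(np.abs(g0 - n0).max(), np.abs(g1 - n1).max())
print(f"Largest analytic vs. numerical gradient difference: {max_diff}")
```

If the two sets of gradients agree to within a small tolerance, the derivative of each operation has been composed correctly. The check can be unreliable when a hidden pre-activation sits very close to zero, where ReLU is not differentiable.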

In [51]:
import numpy as np

'''
    Input layer class: Input layer does not perform any operations
'''
class input_layer(object):
    '''
        class constructor
    '''
    def __init__(self) -> None:
        pass

    ''' 
        Input layer forward pass
    '''
    def forward(self, L_0):
        self.L_0 = L_0
        return self.L_0
    
''' 
    Hidden layer class: Hidden layer performs 2 operations. First it performs matrix multiplication
                        of inputs L_0 with weights W_0. Then it operates on this result with the Relu
                        function.
'''    
class hidden_layer(object):
    '''
        class constructor
    '''
    def __init__(self, W) -> None:
        self.W = W
        self.W_grad = np.zeros_like(W)

    ''' 
        Hidden layer forward propagation
    '''
    def forward(self, L): 
        self.L = L
        return self.forward_matrix_mult()

    def forward_matrix_mult(self):
        self.Z =  np.dot(self.L, self.W)
        return self.forward_relu()
    
    def forward_relu(self):
        return Relu(self.Z)
    
    ''' 
        Hidden layer backpropagation of derivatives
    '''
    def backward(self, D):
        self.backward_relu(D)

    def backward_relu(self, D):
        # dE/dZ
        dE_dZ = D * Relu_deriv(self.Z) 
        self.backward_matrix_mult(dE_dZ)
    
    def backward_matrix_mult(self, D):
        # dE/dW0
        self.W_grad = np.dot((self.L).T, D)

    ''' 
        Gradient descent optimization of hidden layer weights
    '''
    def update_weights(self, alpha):
        self.W -= alpha * self.W_grad

       

''' 
    Output layer class: Performs two operations. First, matrix multiplication of inputs L_1 with weights
                        W_1. This result is then operated on by the squared error function.
'''
class output_layer(object):
    
    ''' 
        class constructor
    '''

    def __init__(self, W) -> None:
        self.W = W
        self.W_grad = np.zeros_like(W)

    ''' 
        Output layer forward propagation
    '''
    def forward(self, L, Y):
        self.L = L
        self.Y = Y
        return self.forward_matrix_mult()

    def forward_matrix_mult(self):
        self.P = np.dot(self.L, self.W) 
        return self.P, self.forward_error()
 
    def forward_error(self):
        return (self.P - self.Y)**2

    '''     
        Output layer backpropagation of derivatives
    '''
    def backward(self):
        return self.backward_error()

    def backward_error(self):
        # dE/dP
        dE_dP = 2*(self.P - self.Y)
        return self.backward_matrix_mult(dE_dP)

    def backward_matrix_mult(self, D):
        # dE/dW1
        self.W_grad = np.dot((self.L).T, D)
        # dE/dL1
        dE_dL = np.dot(D, (self.W).T)
        return dE_dL
    
    ''' 
        Gradient descent optimization of output layer weights
    '''
    def update_weights(self, alpha):
        self.W -= alpha * self.W_grad

'''
    A 3-layer neural network class
'''
class three_layer_net(object):
    ''' 
        class constructor: Takes the following parameters: number of neurons in the input layer (the number of feature attributes for each instance), number of neurons in the hidden layer (at least 1, otherwise arbitrary), number of neurons in the output layer (the number of target attributes) and the gradient descent step-size (alpha)
    '''
    def __init__(self, input_neurons, hidden_neurons, output_neurons, alpha) -> None:
        self.input_neurons  = input_neurons
        self.hidden_neurons = hidden_neurons
        self.output_neurons = output_neurons
        self.alpha = alpha
        
        np.random.seed(1)
        # initialize weights W0 between input layer and hidden layer 
        W0 = 2*np.random.random(size=(input_neurons, hidden_neurons)) - 1
        # initialize weights W1 between hidden layer and output layer
        W1 = 2*np.random.random(size=(hidden_neurons, output_neurons)) - 1

        # initialize layer objects
        self.layer_0 = input_layer()
        self.layer_1 = hidden_layer(W0)
        self.layer_2 = output_layer(W1)

    ''' 
        neural network forward pass
    '''
    def forward_net(self, L0, Y):
        # input layer forward pass
        self.L0 = self.layer_0.forward(L0) 
        # hidden layer forward pass
        self.L1 = self.layer_1.forward(self.L0) 
        # output layer forward pass
        self.L2, error = self.layer_2.forward(self.L1, Y) 

        return self.L2, error

    ''' 
        neural network backward pass
    ''' 
    def backward_net(self):
        # output layer backpropagation
        D = self.layer_2.backward()
        # hidden layer backpropagation
        self.layer_1.backward(D)

    '''     
        weight optimization
    '''
    def optimize(self):
        # update output layer weights
        self.layer_2.update_weights(self.alpha)
        # update hidden layer weights
        self.layer_1.update_weights(self.alpha)

    '''     
        train the network
    ''' 
    def train(self, X_train, y_train, niters):

        #training iterations
        for i in range(niters):
            total_error = 0.0
            # train using each instance one by one
            for j in range(X_train.shape[0]):

                X = X_train[j:j+1]
                y = y_train[j:j+1]

                # forward propagation
                prediction, error = self.forward_net(X, y)
                total_error += error
                print(f"Instance# {j+1}, Target: {y[0]}, Prediction: {prediction[0]}")

                # backpropagation
                self.backward_net()

                # weight optimization
                self.optimize()

                #if(i == (niters-1)):
                #    print(f"Instance# {j+1}, Target: {y[0]}, Prediction: {prediction[0]}")


            print(f"Iteration# {i+1}, Total error: {total_error[0][0]}")

# Relu function
def Relu(x):
    return x*(x > 0)

# Relu derivative function
def Relu_deriv(x):
    return (x > 0)

We can test this three-layer network using the same traffic lights dataset.

In [58]:
# traffic lights dataset (each row is an instance, the first three columns are the attributes and the last column is the label)
traffic_lights = np.array([ [1, 0, 1, 0], 
                            [0, 1, 1, 1],
                            [0, 0, 1, 0],
                            [1, 1, 1, 1],
                            [0, 1, 1, 1],
                            [1, 0, 1, 0]] )

# dataset preprocessing
X_train = traffic_lights[:,:-1]
y_train = traffic_lights[:,-1]

# initialize a three layer neural net object
three_net = three_layer_net(input_neurons=X_train.shape[1], hidden_neurons=4, output_neurons=1, alpha=0.2)

# train the net
three_net.train(X_train, y_train, niters=30)

Instance# 1, Target: 0, Prediction: [0.39194327]
Instance# 2, Target: 1, Prediction: [0.02098811]
Instance# 3, Target: 0, Prediction: [0.18396339]
Instance# 4, Target: 1, Prediction: [0.]
Instance# 5, Target: 1, Prediction: [0.09895252]
Instance# 6, Target: 0, Prediction: [0.2771006]
Iteration# 1, Total error: 3.0345976354882427
Instance# 1, Target: 0, Prediction: [0.12744125]
Instance# 2, Target: 1, Prediction: [0.19178685]
Instance# 3, Target: 0, Prediction: [0.36902804]
Instance# 4, Target: 1, Prediction: [0.08603863]
Instance# 5, Target: 1, Prediction: [0.52586509]
Instance# 6, Target: 0, Prediction: [0.43765601]
Iteration# 2, Total error: 2.0573035366441093
Instance# 1, Target: 0, Prediction: [0.18586415]
Instance# 2, Target: 1, Prediction: [0.61193456]
Instance# 3, Target: 0, Prediction: [0.58169219]
Instance# 4, Target: 1, Prediction: [0.32570855]
Instance# 5, Target: 1, Prediction: [0.98189686]
Instance# 6, Target: 0, Prediction: [0.32690769]
Iteration# 3, Total error: 1.085371