# Implementation of a simple neural network

This is an example of implementating a simple neural network manually without dl frameworks (such as tensorflow and pytorch, etc). The network structure is as below:
![jupyter](./images/network.png)
The caculation in the neural network:
![jupyter](./images/network_cal.png)

## Import the necessary packages 
All we need is the math and numpy module

In [92]:
import math
from math import e
import numpy.random as rnd

## Define the network parameters
To make it simple, we ignore the bias (b=0) and only focus on the weight parameters. So this neural network has 8 parameters, which is initialized randomly.
- w0 defines the weights of the input of the hidden layer, it contains 2*3 parameters 
- w1 defines weights of the output of the hidden layer, it contains 2 parameters
- h is the hidden layer's output

In [93]:
# initialize weights randomly
w0 = [[rnd.rand()] * 3 for i in range(2)]
w1 = [rnd.rand() for i in range(2)]
h = [rnd.rand() for i in range(2)]
print(w0)
print(w1)
print(h)

[[0.6611529751056572, 0.6611529751056572, 0.6611529751056572], [0.5374184281561075, 0.5374184281561075, 0.5374184281561075]]
[0.45861134553660576, 0.8206650584684276]
[0.3467988090138425, 0.8889623486404796]


## Define activation function: sigmoid
![jupyter](./images/sigmoid.png)

In [94]:
# define sigmoid
def sigmoid(x):
    return 1/(1+math.pow(e, -1 * 2* x))

# define sigmoid derivative
def d_sigmoid(x):
    return sigmoid(2*x) * (1 - sigmoid(2*x)) * 2

print(sigmoid(0))
print(d_sigmoid(0))

0.5
0.5


## Define a single neuron 
x and w are input vector and weight vector of the neron, so the calculation can be coded as below:  

- p = f(h, w1, activate=False)  
- y^ = sigmoid(p) = f(h, w1, activate=True)  
- q = f(xi, w0[0], activate=False)  
- h[0] = sigmoid(q) = f(xi, w0[0], activate=False)


In [95]:
# define a single neuron
def f(x, w, activate=True):
    ret = 0
    for i in range(len(x)):
        ret += x[i] * w[i]
    if activate:
        return sigmoid(ret)
    else:
        return ret

#print(f([1.0, 0.9, 0.8], w0[0]))

## Define loss function: logloss
Here add a small variable epsilon to y^ incase y^ equals 0.   
Parameter `label` is the ground truth and `prob` is the predicted y^.
![jupyter](./images/logloss.png)

In [96]:
# define logloss
epsilon = 0.0000001
def logloss(label, prob):
    return -1 * (label * math.log(prob + epsilon, e) + (1 - label) * math.log(1 - prob + epsilon, e))

# define losloss derivative
def d_logloss(label, prob):
    return -1 * label / (prob+epsilon) + (1 - label)/(1 - prob + epsilon)

print(logloss(1, 0.5))
print(d_logloss(0, 0.01))
print(d_logloss(1, 0.99))

0.6931469805599654
1.0101009080706154
-1.0101009080706154


## Define the sample data
Let's define the dataset based on below rules:
- If the value of xi in all dimentions > 0.5, y=1  
- If any of the xi <=0.5, y=0

In [97]:
# define training data set x and y
x = [[0.1,0.1, 0.1],
     [0.9, 0.9, 0.9],
     [0.2,0.1,0.1],
     [0.9,0.97,0.89],
     [0.1,0.2,0.1],
     [0.8,0.9,0.9],
     [0.3,0.1,0.4],
     [0.9,0.8,0.7],
     [0.11,0.22,0.15],
     [0.88,0.9,0.9]
   ]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

## Train the neural network
### Define feed forward

In [98]:
# define feed forward 
def feed_forward(x):
    h[0] = f(x, w0[0])
    h[1] = f(x, w0[1])
    prob = f(h, w1)
    return prob

### Define backward propagation
![jupyter](./images/backward.png)
To make it simple, use batchsize=1. If batchsize > 1, need to sum up d_w for all batch data records.

In [99]:
# define backward propagation
# w = w - lr * d_w
def back_propagation(iteration, x, prob, label, LR):
    
    w00_tmp = [w0[0][i] for i in range(len(w0[0]))]
    w01_tmp = [w0[1][i] for i in range(len(w0[1]))]
    w1_tmp = [w1[i] for i in range(len(w1))]
    # update w0[0]
    for i in range(len(w0[0])):
        w0[0][i] = w0[0][i] - LR * d_logloss(label, prob) * ( d_sigmoid(f(h, w1_tmp, False))*w1_tmp[0] ) * ( d_sigmoid(f(x, w00_tmp, False))*x[i] )
        #print("1", label, d_logloss(label, prob) * ( d_sigmoid(f(h, w1_tmp, False))*w1_tmp[0] ) * ( d_sigmoid(f(x, w00_tmp, False))*x[i] ))
    
    # update w0[1]
    for i in range(len(w0[1])):
        w0[1][i] = w0[1][i] - LR * d_logloss(label, prob) * ( d_sigmoid(f(h, w1_tmp, False))*w1_tmp[1] ) * ( d_sigmoid(f(x, w01_tmp, False))*x[i] )
        #print("2", label, d_logloss(label, prob) * ( d_sigmoid(f(h, w1_tmp, False))*w1_tmp[1] ) * ( d_sigmoid(f(x, w01_tmp, False))*x[i] ))

    # update w[1]
    for i in range(len(w1)):
        w1[i] = w1[i] - LR * d_logloss(label, prob) * d_sigmoid(f(h, w1_tmp, False)) * h[i]
        #print("3", label, d_logloss(label, prob) * d_sigmoid(f(h, w1_tmp, False)) * h[i])

### Start the training
Run 100 epoch and print the loss. the loss is reduced during the training and hence we got a trained model after training.

In [100]:
# start training
EPOCH = 100
for epo in range(EPOCH):
    loss = 0
    for i in range(len(x)):  # batch size=1
        prob = feed_forward(x[i])
        #print(prob, y[i])
        back_propagation(i, x[i], prob, y[i], 0.4)
        loss += logloss(y[i], prob)
    
    if (epo % 10 == 0):
        print("\nEpoch {}:".format(epo))
        print("w0: " + str(w0))
        print("w1: " + str(w1))
        print("loss:", loss/len(x))
        #print(feed_forward([0.1, 0.1, 0.1]))
        #print(feed_forward([0.9, 0.9, 0.9]))



Epoch 0:
w0: [[0.6423246656512983, 0.6425862575905961, 0.6436168409544221], [0.488754480452788, 0.48908115207036446, 0.48819181249045596]]
w1: [0.1811927657376681, 0.560779820435523]
loss: 0.8769063993807341

Epoch 10:
w0: [[0.38357211213971754, 0.3897357260739514, 0.3832595451841559], [0.1763071217050511, 0.2182758441658364, 0.1567285414667283]]
w1: [0.45462153789899573, 0.24526385152143668]
loss: 0.8266556537312943

Epoch 20:
w0: [[0.26763635640782163, 0.3805026795049878, 0.17308846566115746], [-0.34516348277751985, -0.4115679812866288, -0.20130057901126192]]
w1: [1.254031925213235, -2.6045413407748526]
loss: 0.3815644337165899

Epoch 30:
w0: [[0.28103721268284404, 0.4763106703940957, 0.07315480374010036], [-0.34226887563116737, -0.575503185141438, 0.04914811610567399]]
w1: [1.839289811071246, -3.934247333002802]
loss: 0.23111074577590185

Epoch 40:
w0: [[0.2902508603697472, 0.5043964486216846, 0.03251306107526391], [-0.35282281240372715, -0.6368249780711157, 0.14651047174195442]]
w

## Let's do some prediction :)
Now the neural network is trained, we could use this network to predict some new coming data records.

In [101]:
# predict 
print(feed_forward([0.9, 0.8, 0.92]))
print(feed_forward([0.11, 0.2, 0.18]))

0.8802025264184699
0.1065727392355342


In [102]:
print(feed_forward([0.9, 0.1, 0.92]))

0.24739257024682754
