<a href="https://colab.research.google.com/github/tharina11/Deep-Learning-Exercises/blob/main/Simple_Neural_Network_from_scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **A Simple Neural Network**

A neural network consists of multiple layers (input, hidden, and output). Because they can learn complex patterns in input data, neural networks often outperform other algorithms in predictive modeling tasks. This notebook builds a neural network from scratch and explores what happens inside a simple network; the same ideas carry over to more complex neural network architectures. The code was written following this [video](https://www.youtube.com/watch?v=PQCE9ChuIDY&list=PLeo1K3hjS3uu7CxAacxVndI4bE_o3BDtO&index=13) using Google Colab.

In a neural network, each node extracts features from the input data or from the preceding neurons, and these features are used to predict the expected output with minimum error. In multilayer neural networks, the first layers extract low-level features (fine details) and the last layers extract high-level features (coarse details).

The NumPy and pandas libraries are used to process and explore the data, Matplotlib is used to plot it, scikit-learn is used to split it into training and test sets, and TensorFlow with Keras (its high-level wrapper) is used to build and run the neural network.

Explanations are included between the code blocks where necessary, and almost every code block carries comments that explain what it does.

In [None]:
# Import libraries
import numpy as np
import tensorflow as tf
from tensorflow import keras
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

Google Colab is used to build this neural network because it makes Python packages easy to manage and displays output directly below the code.

The data file is uploaded to Google Drive. The drive must be mounted, which requires authentication, before the data file can be imported.

In [None]:
# Mount google drive
from google.colab import drive

In [None]:
drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [None]:
# import data into a dataframe
df = pd.read_csv('/content/drive/MyDrive/Deep Learning/Codebasics/insurance_data.csv')

In [None]:
# Visualize the first five rows of the dataset
df.head()

Unnamed: 0,age,affordibility,bought_insurance
0,22,1,0
1,25,0,0
2,47,1,1
3,52,0,0
4,46,1,1


The dataset contains the age and insurance affordability of a group of people. The model output is binary: the customer either buys the insurance or does not.

In [None]:
# Split data into training and testing sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(df[['age', 'affordibility']], df.bought_insurance, test_size=0.2, random_state=25)

In [None]:
# View the lengths of training and testing data sets
print(len(x_train), len(x_test))

22 6


The input features are on different scales. They should be brought into a common range to speed up the convergence of gradient descent.

For example, for the third row from the top, X1 = 47 and X2 = 1. The weighted sum **(w1 * X1 + w2 * X2)** is then dominated by the **w1 * X1** term (assuming w1 and w2 have similar values), which can slow down the convergence of gradient descent.
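
To make the imbalance concrete, the short sketch below evaluates both terms of the weighted sum for that row, first with the raw age and then with the age divided by 100. The weights w1 = w2 = 1 are an assumption, chosen to mirror the initial weights used later in this notebook.

```python
# Third row of the dataset: age = 47, affordibility = 1.
# w1 = w2 = 1 is an assumption that mirrors the 'ones' initializer used later.
w1, w2 = 1.0, 1.0
age, affordability = 47, 1

# Contribution of each term to the weighted sum with the raw age
raw_terms = (w1 * age, w2 * affordability)
raw_share = raw_terms[0] / sum(raw_terms)

# Same computation after dividing the age by 100
scaled_terms = (w1 * age / 100, w2 * affordability)
scaled_share = scaled_terms[0] / sum(scaled_terms)

print(f"raw:    age term = {raw_terms[0]}, share of weighted sum = {raw_share:.2f}")
print(f"scaled: age term = {scaled_terms[0]}, share of weighted sum = {scaled_share:.2f}")
```

With the raw age, almost the entire weighted sum comes from the age term; after scaling, the two features contribute on a comparable scale.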

In [None]:
# Make copies of the original data and scale age to the range between 0 and 1
x_train_scaled = x_train.copy()
x_train_scaled['age'] = x_train_scaled['age']/100

x_test_scaled = x_test.copy()
x_test_scaled['age'] = x_test_scaled['age']/100

In [None]:
# View scaled data
x_train_scaled

Unnamed: 0,age,affordibility
0,0.22,1
13,0.29,0
6,0.55,0
17,0.58,1
24,0.5,1
19,0.18,1
25,0.54,1
16,0.25,0
20,0.21,1
3,0.52,0


Now the features are in the range between 0 and 1.

Next, we can create a neural network with one input layer and one output layer using the Keras Sequential model, with the following settings:
- The output layer has one neuron, because the output is 1 or 0, indicating whether or not the person will buy insurance
- The input shape is set to 2, because there are two input features
- The activation function is set to sigmoid, because the output is binary
- The kernel initializer is set to 'ones', so every connection weight starts at 1
- The bias initializer is set to 'zeros', so the bias starts at 0

Once the layer is defined, the model is compiled with the following settings:
- The optimizer is set to 'adam'. Adam is an extension of stochastic gradient descent, and it updates the weights during training
- The loss is set to binary cross-entropy, because this is a binary classification problem
- The metric is set to accuracy (the fraction of predictions that match the labels)
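
With these settings the model is a single sigmoid neuron. As a sketch, write $x_1$ for the scaled age, $x_2$ for affordability, $w_1, w_2$ for the weights, $b$ for the bias, and $n$ for the number of samples (these symbols are introduced here for illustration and correspond to the w1, w2, and bias names used later). The prediction and the binary cross-entropy loss are then:

$$
\hat{y} = \sigma(w_1 x_1 + w_2 x_2 + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

$$
L = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right]
$$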




In [None]:
# Simple neural network
model = keras.Sequential([
    keras.layers.Dense(1, input_shape=(2,), activation= 'sigmoid', kernel_initializer='ones',
                       bias_initializer= 'zeros'
                       )
                        ])
model.compile(optimizer = 'adam',
              loss = 'binary_crossentropy', 
              metrics = ['accuracy'])

model.fit(x_train_scaled, y_train, epochs = 5000) # the number of epochs was chosen by trial and error

Streaming output truncated to the last 5000 lines.

<keras.callbacks.History at 0x7f72781411c0>

The model has been fitted to the training dataset and its weights are optimized. Training ended with a final accuracy of 0.91 and a loss of 0.46. Let's evaluate the model on the test set, which the model did not see during training.

In [None]:
# Evaluate the model using testing data
model.evaluate(x_test_scaled, y_test)



[0.3549776077270508, 1.0]

`model.evaluate` returns the loss (0.355) followed by the accuracy. The accuracy of the model is 1.0, which is perfect. This is most likely a consequence of the small number of data points (six) used for evaluation; with a larger dataset, we would not expect to see an accuracy of 1.0.

In [None]:
# View x_test scaled data
x_test_scaled

Unnamed: 0,age,affordibility
2,0.47,1
10,0.18,1
21,0.26,0
11,0.28,1
14,0.49,1
9,0.61,1


Now we can use the model to predict the probability that each customer buys insurance.

In [None]:
# Predict with test data using model
model.predict(x_test_scaled)



array([[0.7054848 ],
       [0.3556957 ],
       [0.16827832],
       [0.47801185],
       [0.7260696 ],
       [0.8294983 ]], dtype=float32)

These predicted outputs should be rounded to the nearest integer to get a binary result, so the rounded output is [1, 0, 0, 0, 1, 1].

That is exactly what we see in our test dataset (below).

In [None]:
#View the values of y_test
y_test

2     1
10    0
21    0
11    0
14    1
9     1
Name: bought_insurance, dtype: int64
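
The rounding step and the resulting accuracy can be sketched as follows. The arrays are copied from the outputs shown in this notebook; thresholding at 0.5 is used here as a stand-in for rounding, which gives the same classes for these values.

```python
import numpy as np

# Probabilities and labels copied from the notebook outputs
probabilities = np.array([0.7054848, 0.3556957, 0.16827832,
                          0.47801185, 0.7260696, 0.8294983])
y_true = np.array([1, 0, 0, 0, 1, 1])

# Threshold at 0.5: probabilities above 0.5 become class 1, the rest class 0
predicted_labels = (probabilities > 0.5).astype(int)

# Accuracy is the fraction of predictions that match the labels
accuracy = np.mean(predicted_labels == y_true)

print(predicted_labels)
print(accuracy)
```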

In [None]:
# View w1, w2, and bias from the Tensorflow model
coef, intercept = model.get_weights()
coef, intercept

(array([[5.060863 ],
        [1.4086521]], dtype=float32), array([-2.913703], dtype=float32))

We can implement this neural network from scratch, without using TensorFlow. First, we will write the functions of a neural network individually and test them on samples.

In [None]:
# Sigmoid function
import math

def sigmoid(x):
  return 1 / (1 + math.exp(-x))

sigmoid(18)


0.9999999847700205

In [None]:
# Function to calculate weighted sum of input variables and apply sigmoid function
def prediction_function(age, affordability):
  weighted_sum = coef[0]*age +coef[1]*affordability + intercept
  return sigmoid(weighted_sum)

Let's predict the first sample of our testing set.

In [None]:
# Predict first data point of test data
prediction_function(0.47 ,1)

0.705484819775958

The value predicted by TensorFlow for the first sample in the test set is 0.7054848, so the prediction function works.
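
The same check can be extended to all six test samples at once. The sketch below is self-contained: the weights, bias, features, and Keras outputs are copied from the outputs printed earlier in this notebook, and the variable names are chosen here for illustration.

```python
import numpy as np

# Weights and bias as printed by model.get_weights()
weights = np.array([5.060863, 1.4086521])
bias = -2.913703

# Scaled test features (age / 100, affordibility) from x_test_scaled
x_test = np.array([[0.47, 1], [0.18, 1], [0.26, 0],
                   [0.28, 1], [0.49, 1], [0.61, 1]])

# Outputs reported by model.predict(x_test_scaled)
keras_output = np.array([0.7054848, 0.3556957, 0.16827832,
                         0.47801185, 0.7260696, 0.8294983])

# Weighted sum for every row at once, followed by the sigmoid
manual_output = 1 / (1 + np.exp(-(x_test @ weights + bias)))

print(np.round(manual_output, 4))
print(np.allclose(manual_output, keras_output, atol=1e-4))
```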

In [None]:
# log loss function
def log_loss(actual, predicted):
  epsilon = 1e-15
  predicted_new = [max(i, epsilon) for i in predicted]
  predicted_new = [min(i, 1- epsilon) for i in predicted_new]
  predicted_new = np.array(predicted_new)
  return -np.mean(actual*np.log(predicted_new)+(1-actual)*np.log(1-predicted_new))

In [None]:
# Sigmoid function using numpy
def sigmoid_numpy(x):
  return 1/(1+np.exp(-x))

The sigmoid function is redefined with NumPy because `np.exp` operates element-wise on whole arrays, whereas the `math.exp` call in the earlier version accepts only a single value.
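
A small sketch of why an array-aware version is needed, using an arbitrary three-element array chosen for illustration:

```python
import math
import numpy as np

values = np.array([-1.0, 0.0, 1.0])

# math.exp expects a single number, so a multi-element array is rejected
try:
    math.exp(-values)
    math_accepts_arrays = True
except TypeError:
    math_accepts_arrays = False

# np.exp is applied element-wise, so the whole array is handled in one call
sigmoid_values = 1 / (1 + np.exp(-values))

print(math_accepts_arrays)
print(sigmoid_values)
```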

Let's put all the functions together and write the gradient descent function.

In [None]:
# Gradient Descent function
def gradient_descent(age, affordability, y_true, epochs, loss_threshold):
  # w1, w2, bias
  w1 = w2 = 1
  bias = 0
  rate = 0.5
  n = len(age)

  for i in range(epochs):
    weighted_sum = w1 * age + w2 *affordability + bias
    y_predicted = sigmoid_numpy(weighted_sum)

    loss = log_loss(y_true, y_predicted)

    w1d = (1/n)*np.dot(np.transpose(age), (y_predicted - y_true))
    w2d = (1/n)*np.dot(np.transpose(affordability), (y_predicted - y_true))

    bias_d = np.mean(y_predicted- y_true)

    w1 = w1 - rate *w1d
    w2 = w2 - rate *w2d
    bias = bias - rate * bias_d

    print(f'Epoch:{i}, w1:{w1}, bias:{bias}, loss:{loss}')
    
    if loss < loss_threshold:
      break

  return w1, w2, bias
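
The derivative lines in the function above follow from differentiating the log loss through the sigmoid. As a sketch, write $\hat{y}_i$ for the prediction on sample $i$, $x_{i1}$ for the scaled age, $x_{i2}$ for affordability, and $\eta$ for the learning rate (`rate` in the code); these symbols are introduced here for illustration. The gradients, which correspond to `w1d`, `w2d`, and `bias_d`, are:

$$
\frac{\partial L}{\partial w_j} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right) x_{ij}, \qquad
\frac{\partial L}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)
$$

Each epoch then moves the parameters a step against the gradient:

$$
w_j \leftarrow w_j - \eta \, \frac{\partial L}{\partial w_j}, \qquad
b \leftarrow b - \eta \, \frac{\partial L}{\partial b}
$$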


In [None]:
gradient_descent(x_train_scaled['age'], x_train_scaled['affordibility'], y_train, 1000, 0.4631)

Epoch:0, w1:0.974907633470177, bias:-0.11341867736368583, loss:0.7113403233723417
Epoch:1, w1:0.9556229728273669, bias:-0.2122349122718517, loss:0.681264778737757
Epoch:2, w1:0.9416488476693794, bias:-0.2977578997796538, loss:0.6591474252715025
Epoch:3, w1:0.9323916996249162, bias:-0.3715094724003511, loss:0.6431523291301917
Epoch:4, w1:0.9272267472726993, bias:-0.43506643026891584, loss:0.6316873063379158
Epoch:5, w1:0.9255469396815343, bias:-0.48994490058938817, loss:0.623471707997592
Epoch:6, w1:0.9267936114129968, bias:-0.5375299543522853, loss:0.6175321183044205
Epoch:7, w1:0.93047170420295, bias:-0.5790424270894963, loss:0.6131591858705934
Epoch:8, w1:0.9361540784567942, bias:-0.6155315088627655, loss:0.6098518179750948
Epoch:9, w1:0.9434791243557357, bias:-0.6478828179413606, loss:0.6072639970231438
Epoch:10, w1:0.9521448361628082, bias:-0.6768343869109611, loss:0.6051606942838051
Epoch:11, w1:0.9619014360798376, bias:-0.7029956527236098, loss:0.6033841405177724
Epoch:12, w1:0.9

(5.051047623653049, 1.4569794548473887, -2.9596534546250037)

The gradient descent run is now complete. Here is the final output for w1, w2, and bias, respectively.

(5.051047623653049, 1.4569794548473887, -2.9596534546250037)


Outputs from the TensorFlow model:

w1 = 5.060863

w2 = 1.4086521

bias = -2.913703

The values are very close!

We can create a neural network from scratch using the functions we defined above. 

In [None]:
class myNN:
  def __init__(self):
    self.w1 = 1
    self.w2 = 1
    self.bias = 0

  def fit(self, X, y, epochs, loss_threshold):
    self.w1, self.w2, self.bias = self.gradient_descent(X['age'], X['affordibility'], y, epochs, loss_threshold)
 
  def predict(self, x_test):
    weighted_sum = self.w1 * x_test['age'] + self.w2 * x_test['affordibility'] + self.bias
    return sigmoid_numpy(weighted_sum)

  # Gradient Descent function
  def gradient_descent(self, age, affordability, y_true, epochs, loss_threshold):
    # w1, w2, bias
    w1 = w2 = 1
    bias = 0
    rate = 0.5
    n = len(age)

    for i in range(epochs):
      weighted_sum = w1 * age + w2 *affordability + bias
      y_predicted = sigmoid_numpy(weighted_sum)

      loss = log_loss(y_true, y_predicted)

      w1d = (1/n)*np.dot(np.transpose(age), (y_predicted - y_true))
      w2d = (1/n)*np.dot(np.transpose(affordability), (y_predicted - y_true))

      bias_d = np.mean(y_predicted- y_true)

      w1 = w1 - rate *w1d
      w2 = w2 - rate *w2d
      bias = bias - rate * bias_d
      if i%50==0:
        print(f'Epoch:{i}, w1:{w1}, bias:{bias}, loss:{loss}')
      
      if loss < loss_threshold:
        print(f'Epoch:{i}, w1:{w1}, bias:{bias}, loss:{loss}')
        break

    return w1, w2, bias

Let's fit the model to the training data and evaluate it with the test data.

In [None]:
customModel = myNN()
customModel.fit(x_train_scaled, y_train, epochs=500, loss_threshold= 0.4631)

Epoch:0, w1:0.974907633470177, bias:-0.11341867736368583, loss:0.7113403233723417
Epoch:50, w1:1.503319554173139, bias:-1.2319047301235464, loss:0.5675865113475955
Epoch:100, w1:2.200713131760032, bias:-1.6607009122062801, loss:0.5390680417774752
Epoch:150, w1:2.8495727769689085, bias:-1.986105845859897, loss:0.5176462164249294
Epoch:200, w1:3.443016970881803, bias:-2.2571369883752723, loss:0.5005011269691375
Epoch:250, w1:3.982450494649576, bias:-2.494377365971801, loss:0.48654089537617085
Epoch:300, w1:4.472179522095915, bias:-2.707387811922373, loss:0.4750814640632793
Epoch:350, w1:4.917245868007634, bias:-2.901176333556766, loss:0.46561475306999006
Epoch:366, w1:5.051047623653049, bias:-2.9596534546250037, loss:0.46293944095888917


In [None]:
customModel.predict(x_test_scaled)

2     0.705020
10    0.355836
21    0.161599
11    0.477919
14    0.725586
9     0.828987
dtype: float64

The weights, the bias, and the predicted values are very close to those of the TensorFlow model. Also, after rounding, the predictions match the y_test values: [1, 0, 0, 0, 1, 1].
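
As a rough way to quantify how close the two sets of predictions are, the sketch below compares them directly. Both arrays are copied from the outputs printed in this notebook; the variable names are chosen here for illustration.

```python
import numpy as np

# Predictions from the Keras model and from the custom myNN model
keras_predictions = np.array([0.7054848, 0.3556957, 0.16827832,
                              0.47801185, 0.7260696, 0.8294983])
custom_predictions = np.array([0.705020, 0.355836, 0.161599,
                               0.477919, 0.725586, 0.828987])

# Largest absolute gap between the two models across the six test samples
max_gap = np.max(np.abs(keras_predictions - custom_predictions))

# Check that both models assign the same class to every sample at a 0.5 threshold
same_classes = np.array_equal(keras_predictions > 0.5, custom_predictions > 0.5)

print(f"largest gap: {max_gap:.4f}")
print(f"same classes: {same_classes}")
```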
