<a href="https://colab.research.google.com/github/vivekpatel99/ML-Algorithms/blob/main/Gradient_Descent_For_Neural_Network_from_scrach.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Implement Gradient Descent For Neural Network (or Logistic Regression)

This dataset contains details of 1000 customers and their decision to buy a car, given attributes such as annual salary.

Columns:
* User ID
* Gender
* Age
* Annual Salary
* Purchase Decision (No = 0; Yes = 1) (PREDICT)

It is a binary logistic regression problem as there are only two possible outcomes (i.e. Yes/No).

In [75]:
!pip install opendatasets --upgrade --quiet

In [76]:
import os

import numpy as np
import opendatasets as od
import pandas as pd
import tensorflow as tf
from matplotlib import pyplot as plt
from tensorflow import keras

In [77]:
dataset_url = 'https://www.kaggle.com/datasets/gabrielsantello/cars-purchase-decision-dataset?datasetId=2329085&sortBy=voteCount'
data_dir = './cars-purchase-decision-dataset'
if not os.path.isdir(data_dir):
    od.download(dataset_url)

In [78]:
car_data_csv = os.path.join(data_dir, 'car_data.csv')
car_purchase_df = pd.read_csv(car_data_csv)
car_purchase_df.head()

Unnamed: 0,User ID,Gender,Age,AnnualSalary,Purchased
0,385,Male,35,20000,0
1,681,Male,40,43500,0
2,353,Male,49,74000,0
3,895,Male,40,107500,1
4,661,Male,25,79000,0


Check for missing values

In [79]:
car_purchase_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   User ID       1000 non-null   int64 
 1   Gender        1000 non-null   object
 2   Age           1000 non-null   int64 
 3   AnnualSalary  1000 non-null   int64 
 4   Purchased     1000 non-null   int64 
dtypes: int64(4), object(1)
memory usage: 39.2+ KB


## Gradient Descent

Equation for updating each weight $w_j$ \
$$w_j = w_j - \text{learning rate} \cdot \frac{\partial L}{\partial w_j}$$

where $L$ is the log loss, $n$ is the number of samples, $x_{i,j}$ is feature $j$ of sample $i$, and
$$ \frac{\partial L}{\partial w_j}  = \frac{1}{n}\sum_{i=1}^{n}x_{i,j} (\hat y_i - y_i)$$

Equation for updating the bias \
$$b = b - \text{learning rate} \cdot \frac{\partial L}{\partial b}$$


## Separate dataset into train and test

In [80]:
from sklearn.model_selection import train_test_split

In [81]:
X =  car_purchase_df.drop(['User ID', 'Purchased'], axis=1)
X.head()

Unnamed: 0,Gender,Age,AnnualSalary
0,Male,35,20000
1,Male,40,43500
2,Male,49,74000
3,Male,40,107500
4,Male,25,79000


In [82]:
X.Gender = X.Gender.map({'Male':1, 'Female': 0})

In [83]:
y = car_purchase_df.Purchased
y

0      0
1      0
2      0
3      1
4      0
      ..
995    0
996    0
997    1
998    1
999    0
Name: Purchased, Length: 1000, dtype: int64

In [84]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=X.Gender)

## Preprocessing: convert categorical columns to numerical and scale the data

In [85]:
from sklearn.preprocessing import StandardScaler

In [86]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
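
As a rough illustration of what `StandardScaler` does, the sketch below standardizes columns by hand with NumPy: it subtracts each column's training mean and divides by its training standard deviation (population form, `ddof=0`). The arrays `toy_train` and `toy_test` are illustrative values only, not the real train/test split.

```python
import numpy as np

# Illustrative toy rows (gender, age, salary); not the real train/test split
toy_train = np.array([[1, 35, 20000],
                      [0, 40, 43500],
                      [1, 49, 74000],
                      [0, 25, 79000]], dtype=float)
toy_test = np.array([[1, 40, 107500]], dtype=float)

# Statistics are computed on the training split only
train_mean = toy_train.mean(axis=0)
train_std = toy_train.std(axis=0)  # ddof=0, the population standard deviation

toy_train_scaled = (toy_train - train_mean) / train_std
# The test split reuses the training statistics, mirroring scaler.transform
toy_test_scaled = (toy_test - train_mean) / train_std

print(toy_train_scaled.mean(axis=0))  # approximately 0 for every column
print(toy_train_scaled.std(axis=0))   # approximately 1 for every column
```

Reusing the training statistics for the test split keeps information about the test data out of the preprocessing step.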

## Using keras

In [87]:
model = keras.Sequential([
    keras.layers.Dense(1, input_shape=(3,), activation='sigmoid', kernel_initializer='ones', bias_initializer='zeros')
])

In [88]:
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(X_train_scaled, y_train, epochs=500)

Epoch 1/500
...

<keras.callbacks.History at 0x7ffa845df5e0>

In [89]:
model.evaluate(X_test_scaled, y_test)



[0.40394359827041626, 0.828000009059906]

Get the value of weights and bias from the model

In [90]:
coef, intercept = model.get_weights()
coef, intercept

(array([[0.23373315],
        [2.5493822 ],
        [1.2048651 ]], dtype=float32), array([-0.66569364], dtype=float32))

## Let's write our own Gradient Descent now

In [91]:
def sigmoid(x):
    return 1/(1 + np.exp(-x))

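Let's write our own prediction function, using the weights and bias learned by Keras. Because the model was trained on scaled data, the inputs must be scaled the same way.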

In [92]:
def prediction_function(gender, age, annual_salary):
    weighted_sum = coef[0]*gender + coef[1]*age + coef[2]*annual_salary + intercept
    return sigmoid(weighted_sum)
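
A minimal usage sketch for the prediction function follows. It copies the weights and bias from the `model.get_weights()` output shown earlier so that it runs on its own; the scaled input values are hypothetical and do not correspond to any particular customer.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Weights and bias copied from the model.get_weights() output above
coef = np.array([[0.23373315], [2.5493822], [1.2048651]])
intercept = np.array([-0.66569364])

def prediction_function(gender, age, annual_salary):
    weighted_sum = coef[0]*gender + coef[1]*age + coef[2]*annual_salary + intercept
    return sigmoid(weighted_sum)

# Hypothetical, already-scaled feature values (assumed for illustration)
scaled_gender, scaled_age, scaled_salary = 1.0, 1.0, 0.5
probability = prediction_function(scaled_gender, scaled_age, scaled_salary)
print(probability)  # a one-element array holding a probability between 0 and 1
```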


Let's define the loss function used to track the loss during gradient descent.

$$ \text{log loss (binary cross-entropy)} = - \frac{1}{n}\sum_{i=1}^{n}\left[ y_i \log(\hat y_i) + (1 - y_i) \log(1-\hat y_i) \right] $$

In [93]:
def log_loss(y_true, y_predicted):
    epsilon = 1e-15
    y_predicted_new = [max(i,epsilon) for i in y_predicted]
    y_predicted_new = [min(i,1-epsilon) for i in y_predicted_new]
    y_predicted_new = np.array(y_predicted_new)
    return -np.mean(y_true*np.log(y_predicted_new)+(1-y_true)*np.log(1-y_predicted_new))
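
The clamping in `log_loss` can also be written with `np.clip`. The sketch below is one possible vectorized variant (the name `log_loss_clipped` is introduced here for illustration), followed by a small worked example.

```python
import numpy as np

def log_loss_clipped(y_true, y_predicted, epsilon=1e-15):
    # np.clip bounds every prediction to [epsilon, 1 - epsilon] in one call
    y_predicted = np.clip(np.asarray(y_predicted, dtype=float), epsilon, 1 - epsilon)
    y_true = np.asarray(y_true, dtype=float)
    return -np.mean(y_true * np.log(y_predicted) + (1 - y_true) * np.log(1 - y_predicted))

# Both predictions give probability 0.9 to the correct class, so the loss is -ln(0.9)
print(log_loss_clipped([1, 0], [0.9, 0.1]))  # about 0.105
# A confidently wrong prediction is clipped, so the loss stays finite
print(log_loss_clipped([1], [0.0]))
```

Clipping matters because `log(0)` is negative infinity; without it, a single prediction of exactly 0 or 1 could make the whole loss infinite.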

Define Gradient descent

Here we use only 3 features (Gender, Age, AnnualSalary); User ID is not useful.
Writing $L$ for the log loss and $x_{i,j}$ for feature $j$ of sample $i$, our equations look like the following. \
For Gender:
$$w_0 = w_0 - \text{learning rate} \cdot \frac{\partial L}{\partial w_0}$$

where \
$$ \frac{\partial L}{\partial w_0}  = \frac{1}{n}\sum_{i=1}^{n}x_{i,0} (\hat y_i - y_i)$$

For Age:
$$w_1 = w_1 - \text{learning rate} \cdot \frac{\partial L}{\partial w_1}$$

where \
$$ \frac{\partial L}{\partial w_1}  = \frac{1}{n}\sum_{i=1}^{n}x_{i,1} (\hat y_i - y_i)$$

For AnnualSalary:
$$w_2 = w_2 - \text{learning rate} \cdot \frac{\partial L}{\partial w_2}$$

where \
$$ \frac{\partial L}{\partial w_2}  = \frac{1}{n}\sum_{i=1}^{n}x_{i,2} (\hat y_i - y_i)$$

For the bias, the update has the same form:
$$b = b - \text{learning rate} \cdot \frac{\partial L}{\partial b}$$

where \
$$ \frac{\partial L}{\partial b}  = \frac{1}{n}\sum_{i=1}^{n} (\hat y_i - y_i)$$
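
For reference, here is a sketch of where these derivatives come from, assuming the model is $\hat y_i = \sigma(z_i)$ with $z_i = \sum_j w_j x_{i,j} + b$ and that $L$ is the log loss defined earlier. The symbols $z_i$, $x_{i,j}$ and $L$ are notation introduced for this derivation.

$$ \frac{\partial L}{\partial \hat y_i} = -\frac{1}{n}\left( \frac{y_i}{\hat y_i} - \frac{1 - y_i}{1 - \hat y_i} \right) = \frac{1}{n} \cdot \frac{\hat y_i - y_i}{\hat y_i (1 - \hat y_i)} $$

$$ \frac{\partial \hat y_i}{\partial z_i} = \hat y_i (1 - \hat y_i), \qquad \frac{\partial z_i}{\partial w_j} = x_{i,j}, \qquad \frac{\partial z_i}{\partial b} = 1 $$

Multiplying the factors with the chain rule, the $\hat y_i (1 - \hat y_i)$ terms cancel:

$$ \frac{\partial L}{\partial w_j} = \sum_{i=1}^{n} \frac{\partial L}{\partial \hat y_i} \cdot \frac{\partial \hat y_i}{\partial z_i} \cdot \frac{\partial z_i}{\partial w_j} = \frac{1}{n}\sum_{i=1}^{n} x_{i,j} (\hat y_i - y_i), \qquad \frac{\partial L}{\partial b} = \frac{1}{n}\sum_{i=1}^{n} (\hat y_i - y_i) $$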

In [100]:
def gradient_descent(gender, age, annual_salary, y_true, epochs, loss_threshold):
    # initialize weights and bias (same starting values as the Keras model)
    w0 = w1 = w2 = 1
    bias = 0
    learning_rate = 0.5
    n = len(gender)
    for i in range(epochs):

        # forward pass: predictions and loss with the current weights
        weighted_sum = w0*gender + w1*age + w2*annual_salary + bias
        y_pred = sigmoid(weighted_sum)
        loss = log_loss(y_true, y_pred)

        # calculate the partial derivatives of the loss
        w0d = 1/n * (gender.T @ (y_pred - y_true))
        w1d = 1/n * (age.T @ (y_pred - y_true))
        w2d = 1/n * (annual_salary.T @ (y_pred - y_true))

        bias_d = np.mean(y_pred - y_true)

        # update each weight and the bias by stepping against the gradient
        w0 = w0 - learning_rate * w0d
        w1 = w1 - learning_rate * w1d
        w2 = w2 - learning_rate * w2d

        bias = bias - learning_rate * bias_d

        print(f'Epoch:{i}, w0:{w0}, w1:{w1}, w2:{w2}, bias:{bias}, loss:{loss}')
        # stop early once the loss reaches the threshold
        if loss <= loss_threshold:
            break
    return w0, w1, w2, bias
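
One way to sanity-check the analytic derivatives used in `gradient_descent` is to compare them against central finite differences. The sketch below does this on synthetic stand-in data rather than the car purchase dataset; the helper names `loss_fn` and `analytic_gradients` are introduced here for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss_fn(weights, bias, X, y):
    # Binary cross-entropy for a single sigmoid unit
    y_pred = sigmoid(X @ weights + bias)
    return -np.mean(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))

def analytic_gradients(weights, bias, X, y):
    # Same formulas as in gradient_descent: mean of x * (y_pred - y_true)
    error = sigmoid(X @ weights + bias) - y
    return X.T @ error / len(y), np.mean(error)

# Synthetic stand-in data with 3 features (an assumption, not the real dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)
weights, bias = np.ones(3), 0.0

grad_w, grad_b = analytic_gradients(weights, bias, X, y)

# Central finite differences approximate each partial derivative
h = 1e-6
numeric_w = np.zeros(3)
for j in range(3):
    step = np.zeros(3)
    step[j] = h
    numeric_w[j] = (loss_fn(weights + step, bias, X, y)
                    - loss_fn(weights - step, bias, X, y)) / (2 * h)
numeric_b = (loss_fn(weights, bias + h, X, y)
             - loss_fn(weights, bias - h, X, y)) / (2 * h)

weights_match = np.allclose(grad_w, numeric_w, atol=1e-6)
bias_matches = np.isclose(grad_b, numeric_b, atol=1e-6)
print(weights_match, bias_matches)
```

If the two disagree, the usual suspects are a sign error or a missing `1/n` factor in the analytic formula.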


Run gradient descent on the scaled training data, passing each feature column of the NumPy array separately

In [103]:
gradient_descent(X_train_scaled[:, 0], # all rows, column 0 (Gender)
                 X_train_scaled[:, 1], # all rows, column 1 (Age)
                 X_train_scaled[:, 2], # all rows, column 2 (AnnualSalary)
                 y_train,
                 1000,
                 0.3626)

Epoch:0, w0:0.9161867405605904, w1:1.071890755288459, w2:0.9952660208681701, bias:-0.04025501819616826, loss:0.5280521057753129
Epoch:1, w0:0.8411830674311904, w1:1.1368132944421787, w2:0.9897514458182458, bias:-0.07703539719304175, loss:0.5017595012111972
Epoch:2, w0:0.7740178925118222, w1:1.1957128162644375, w2:0.9839223802053491, bias:-0.11067121308821626, loss:0.4804085252341044
Epoch:3, w0:0.7138180865503557, w1:1.2494005668408645, w2:0.9781010411558307, bias:-0.14147587623126287, loss:0.4629904365627866
Epoch:4, w0:0.6598069740956147, w1:1.2985658725283915, w2:0.9725047730792751, bias:-0.16973683526869002, loss:0.4487097143439152
Epoch:5, w0:0.6112978666722705, w1:1.3437917609727807, w2:0.9672752211331614, bias:-0.1957128070684041, loss:0.4369405227206958
Epoch:6, w0:0.5676854943067733, w1:1.3855708417059693, w2:0.9624997700203514, bias:-0.2196343604548869, loss:0.42719021777745836
Epoch:7, w0:0.5284369598733688, w1:1.4243198590133632, w2:0.9582271803210524, bias:-0.2417060229688

(0.22088774513080653,
 2.4776232240313854,
 1.1739622769157416,
 -0.6502780752030489)

In [104]:
coef, intercept

(array([[0.23373315],
        [2.5493822 ],
        [1.2048651 ]], dtype=float32), array([-0.66569364], dtype=float32))

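As you can see, the weights and bias found by our gradient descent (0.2209, 2.4776, 1.1740 and -0.6503) are very close to the ones learned by Keras (0.2337, 2.5494, 1.2049 and -0.6657).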

A little cleanup for the gradient descent function:
* We do not need to calculate each weight separately; NumPy can handle all of them at once with vectorized operations. The updated code looks like this:

In [115]:
def gradient_descent(X, y_true, epochs, loss_threshold):
    num_features = X.shape[1]  # number of columns/features
    # initialize weights and bias
    weights = np.ones(shape=num_features)
    bias = 0
    learning_rate = 0.5
    n = X.shape[0]  # number of rows/samples
    for i in range(epochs):

        # forward pass: predictions and loss with the current weights
        weighted_sum = (weights @ X.T) + bias
        y_pred = sigmoid(weighted_sum)
        loss = log_loss(y_true, y_pred)

        # calculate the partial derivatives for all weights at once
        weight_deriv = 1/n * (X.T @ (y_pred - y_true))

        bias_d = np.mean(y_pred - y_true)

        # update the weights and the bias by stepping against the gradient
        weights = weights - learning_rate * weight_deriv

        bias = bias - learning_rate * bias_d
        if i % 50 == 0:  # print only every 50th iteration
            print(f'Epoch:{i}, weights:{weights},  bias:{bias}, loss:{loss}')

        # stop early once the loss reaches the threshold
        if loss <= loss_threshold:
            print(f'Epoch:{i}, weights:{weights},  bias:{bias}, loss:{loss}')
            break
    return weights, bias

In [116]:
gradient_descent(X_train_scaled,
                 y_train,
                 1000,
                 0.3626)

Epoch:0, weights:[0.91618674 1.07189076 0.99526602],  bias:-0.04025501819616826, loss:0.5280521057753129
Epoch:50, weights:[0.17434606 2.11007066 1.01739978],  bias:-0.5564826490371374, loss:0.3663762264753037
Epoch:100, weights:[0.1994183  2.34779816 1.1148544 ],  bias:-0.6199921629487526, loss:0.36326155856384584
Epoch:150, weights:[0.21627609 2.44996256 1.16127081],  bias:-0.6439345865685552, loss:0.3626840417657526
Epoch:174, weights:[0.22088775 2.47762322 1.17396228],  bias:-0.6502780752030489, loss:0.3625990673125825


(array([0.22088775, 2.47762322, 1.17396228]), -0.6502780752030489)
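
To turn the learned weights into class labels, one option is to threshold the sigmoid output at 0.5. The sketch below assumes that cutoff; `predict_labels` is a hypothetical helper, and the rows in `X_demo` are made-up scaled values rather than real test data.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def predict_labels(X_scaled, weights, bias, threshold=0.5):
    # Hypothetical helper: probability >= threshold is read as "purchased"
    probabilities = sigmoid(X_scaled @ weights + bias)
    return (probabilities >= threshold).astype(int)

# Weights and bias copied from the gradient_descent output above
weights = np.array([0.22088775, 2.47762322, 1.17396228])
bias = -0.6502780752030489

# Made-up scaled rows (gender, age, salary), for illustration only
X_demo = np.array([[0.0, 0.0, 0.0],
                   [1.0, 1.0, 1.0],
                   [0.0, -1.0, -1.0]])
y_demo = np.array([0, 1, 0])

labels = predict_labels(X_demo, weights, bias)
print(labels)                     # [0 1 0]
print(np.mean(labels == y_demo))  # accuracy on the made-up rows: 1.0
```

The same helper could be applied to `X_test_scaled` and compared with `y_test` to get an accuracy figure comparable to the one reported by `model.evaluate`.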