## Understanding Logistic Regression as a Neuron
Logistic regression is an algorithm used for binary classification.

<img src="LR.gif"/>

###### First we shall create a toy dataset
A toy dataset makes it easy to visualize what is happening.

Let's say we want to classify each example as either <strong>A Mango</strong> or <strong>Not A Mango</strong>.

In [1]:
# importing the necessary libraries
import numpy as np

# each row is [weight, class]; class 1 = Mango and class 0 = Not-Mango
data = np.array([[50, 1], 
                 [75, 1],
                 [100, 0],
                 [150, 0]])

# in this data, a weight < 100 is a mango and a weight >= 100 is not a mango

In [2]:
# Separate the examples with features and their class labels from data
# x = examples and y = class label for each example
x = data[:, 0].reshape(1, 4)         # reshape is used to avoid rank 1 array
y = data[:, 1].reshape(1, 4) 
print("Shape of x:", x.shape)        # feel free to try without reshape and check the shape
print("Shape of y:",y.shape)
y

Shape of x: (1, 4)
Shape of y: (1, 4)


array([[1, 1, 0, 0]])

### Initializing the parameters
Note: In logistic regression we can initialize the weights and bias to zero. In a neural network, however, the weights should be initialized randomly; otherwise the weights feeding each hidden unit are identical, every unit computes the same function, and the network cannot learn distinct features.
- Use: `np.random.randn(a,b) * 0.01` to randomly initialize a matrix of shape (a,b)
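
As a minimal sketch of the difference between the two initializations, the snippet below builds a zero matrix and a small random matrix of the same shape. The layer sizes and the seed are hypothetical values chosen only for illustration; they are not part of this notebook's model.

```python
import numpy as np

np.random.seed(0)  # hypothetical seed, only to make the sketch reproducible

n_features, n_hidden = 1, 3   # hypothetical layer sizes for illustration

# Zero initialization: every hidden unit starts with identical weights
w_zero = np.zeros((n_features, n_hidden))

# Small random initialization breaks the symmetry between hidden units
w_rand = np.random.randn(n_features, n_hidden) * 0.01

print("Zero init:  ", w_zero)
print("Random init:", w_rand)
```

With zero initialization all columns are equal, so each hidden unit would receive the same gradient; the random version gives each unit a different starting point.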

In [3]:
w = np.zeros((x.shape[0], 1)) # feel free to randomly initialize the w
b = 0 

print("w: ", w)
print("Shape of W: ", w.shape)

w:  [[0.]]
Shape of W:  (1, 1)


### Forward Propagation
We shall use the sigmoid activation function:

$$ \text{sigmoid}(z) = \sigma(z) = \frac{1}{1 + e^{-z}} $$

Compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, \dots, a^{(m-1)}, a^{(m)})$, one activation per training example.

In [4]:
def sigmoid(z):
    # element-wise logistic function; works on scalars and arrays
    return 1 / (1 + np.exp(-z))

# linear computation
z = np.dot(w.T, x) + b
print(z)

# using sigmoid
A = sigmoid(z)
print("Computed Sigmoid: ",A)
A.shape

[[0. 0. 0. 0.]]
Computed Sigmoid:  [[0.5 0.5 0.5 0.5]]


(1, 4)
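
The `sigmoid` above is fine for this toy data, but `np.exp(-z)` can emit an overflow warning when `z` is a large negative number. The following is one possible numerically safer variant, offered as a sketch; the name `stable_sigmoid` is hypothetical and is not used elsewhere in this notebook.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that avoids exponentiating large positive numbers."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # for z >= 0, exp(-z) is at most 1, so this form cannot overflow
    out[pos] = 1 / (1 + np.exp(-z[pos]))
    # for z < 0, rewrite sigmoid(z) as exp(z) / (1 + exp(z))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1 + ez)
    return out

print(stable_sigmoid(np.zeros((1, 4))))            # same result as sigmoid for z = 0
print(stable_sigmoid(np.array([-1000.0, 1000.0])))
```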

### Computing the Cost Function
$$ J(\theta) = -\frac{1}{m} \sum_{i = 1}^{m} \left[ y^{(i)}\log\left(h_\theta(x^{(i)})\right) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right) \right] $$

Here $h_\theta(x^{(i)}) = a^{(i)}$ is the sigmoid output for example $i$.

In [5]:
# y = 0
# a = 0.5
# loss = y * np.log(a) + (1-y)*np.log(1-a)
# loss

In [6]:
# Calculating the cost
m = x.shape[1] # total number of examples in the data, x.shape[1] = 4
costs = [] # keeps the cost of every iteration; the cost variable alone is overwritten each time
cost = - np.sum(( y*np.log(A) + ((1-y)*np.log(1-A)))) / m
costs.append(cost)
print(cost)

0.6931471805599453
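
As a worked check of this value: with $w = 0$ and $b = 0$ every activation is $0.5$, so each of the four examples contributes $\log(0.5)$ to the sum, whether its label is 1 or 0:

$$ J = -\frac{1}{4}\left[ 2\log(0.5) + 2\log(1 - 0.5) \right] = -\log(0.5) = \log 2 \approx 0.6931 $$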


### Backward Propagation
<img src="Backprop.gif"/>
<strong>
1. The loss function $L(\theta)$ computes the error for a single training example. <br>
2. The cost function $J(\theta)$ is the average of the loss over the entire training set.
</strong>

- $$ \frac{\partial L}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a} $$
- $$ \frac{\partial L}{\partial z} = a - y $$
- $$ \frac{\partial L}{\partial w} = x \cdot \frac{\partial L}{\partial z} $$
- $$ \frac{\partial L}{\partial b} = \frac{\partial L}{\partial z} $$
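
The second line follows from the first by the chain rule. Since the derivative of the sigmoid is $\frac{\partial a}{\partial z} = a(1 - a)$:

$$ \frac{\partial L}{\partial z} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} = \left( -\frac{y}{a} + \frac{1-y}{1-a} \right) a(1 - a) = -y(1 - a) + (1 - y)a = a - y $$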

In [7]:
# Backward propagation
# dL/da = -(y/a) + ((1-y)/(1-a))

print("A: ",A)
print("y: ",y)

dz = (A - y)                  # dL/dz = a - y, one value per example
dw = np.dot(x, dz.T) / m      # dJ/dw = average of x * dz over the m examples
db = np.sum(dz) / m           # dJ/db = average of dz over the m examples
print("dZ: ",dz)
print("dW: ",dw)
print("db: ",db)

A:  [[0.5 0.5 0.5 0.5]]
y:  [[1 1 0 0]]
dZ:  [[-0.5 -0.5  0.5  0.5]]
dW:  [[15.625]]
db:  0.0


### Gradient Checking
$$\frac {\partial}{\partial \theta} J(\theta) \approx \frac{J(\theta + \epsilon) - J(\theta - \epsilon)}{2\epsilon} , \epsilon = 10^{-4}$$

In [8]:
epsilon = 10**(-4)
print(w.shape)

def compute_cost(weights, bias, x, y, m):
    z = np.dot(weights.T, x) + bias
    A = sigmoid(z)
    cost = - np.sum(( y*np.log(A) + ((1-y)*np.log(1-A)) )) / m
    return cost
    
def grad_check(weights, bias, x, y, m):
    pos_weights = np.copy(weights)
    neg_weights = np.copy(weights)
    
    pos_weights[0] = pos_weights[0] + epsilon
    neg_weights[0] = neg_weights[0] - epsilon
    
    compute_pos_cost = compute_cost(pos_weights, bias, x, y, m)
#     print(compute_pos_cost)
    compute_neg_cost = compute_cost(neg_weights, bias, x, y, m)
#     print(compute_neg_cost)
    grad_cost = (compute_pos_cost - compute_neg_cost) / (2 * epsilon)
    return grad_cost

(1, 1)


In [9]:
# compare the value of dW from above cell
grad_check(w, b, x, y, m)

# the value is approximately equal to dW (15.625), so the backpropagation gradient for w is correct

15.624999999999666
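
The check above perturbs only `w`. The same idea can be applied to `b`; the sketch below does so with a hypothetical helper `grad_check_bias`, which is not part of the original notebook. It redefines the data and helper functions so that it runs on its own.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost(weights, bias, x, y, m):
    A = sigmoid(np.dot(weights.T, x) + bias)
    return -np.sum(y * np.log(A) + (1 - y) * np.log(1 - A)) / m

def grad_check_bias(weights, bias, x, y, m, epsilon=1e-4):
    # hypothetical helper: perturb only the bias, keep the weights fixed
    pos_cost = compute_cost(weights, bias + epsilon, x, y, m)
    neg_cost = compute_cost(weights, bias - epsilon, x, y, m)
    return (pos_cost - neg_cost) / (2 * epsilon)

# same toy data and initial parameters as in the cells above
x = np.array([[50, 75, 100, 150]])
y = np.array([[1, 1, 0, 0]])
m = x.shape[1]
w = np.zeros((x.shape[0], 1))
b = 0

A = sigmoid(np.dot(w.T, x) + b)
db = np.sum(A - y) / m                       # analytic gradient
db_numeric = grad_check_bias(w, b, x, y, m)  # numerical estimate
print("db:", db, "numerical db:", db_numeric)
```

Both values should be close to zero at the initial parameters, matching the `db` printed in the backward propagation cell.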

### Optimize / Update parameters
Gradient Descent

In [10]:
# updating the parameters 
learning_rate = 0.001
w = w - learning_rate * dw
b = b - learning_rate * db

In [11]:
# updated parameters
print(w)
print(b)

[[-0.015625]]
0.0
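
As a worked check of these values, writing the learning rate as $\alpha = 0.001$ (the symbol $\alpha$ is introduced here only for this illustration) and using $dw = 15.625$ and $db = 0$ from the backward pass:

$$ w := w - \alpha \, dw = 0 - 0.001 \times 15.625 = -0.015625, \qquad b := b - \alpha \, db = 0 - 0.001 \times 0 = 0 $$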


### Predict the training set

In [12]:
# predicting the training set with new parameters value
predict = sigmoid(np.dot(w.T, x) + b)
predict

array([[0.31405054, 0.23651624, 0.17328821, 0.08756384]])

After a single update the predictions are still poor: we want outputs close to [1, 1, 0, 0], but every prediction is below 0.5.

## 2nd iteration
We reuse the weights and bias computed in the previous iteration.

In [13]:
# carry forward the w and b 
z = np.dot(w.T, x) + b
A = sigmoid(z)

# compute the cost
cost = - np.sum(( y*np.log(A) + ((1-y)*np.log(1-A)))) / m
costs.append(cost)

# backpropagation
dz = (A - y)                  
dw = np.dot(x, dz.T) / m  
# gradient check: the analytic and numerical gradients should roughly agree
print(dw)
print(grad_check(w, b, x, y, m))
db = np.sum(dz) / m           

# update the parameters
w = w - learning_rate * dw
b = b - learning_rate * db

# predict the training set
predict = sigmoid(np.dot(w.T, x) + b)
print("Predict: ", predict)
print("Costs: ", costs)

[[-15.27383962]]
-15.273687042650641
Predict:  [[0.49568489 0.4934904  0.49129616 0.48690876]]
Costs:  [0.6931471805599453, 0.7204690131374397]


## 3rd iteration 

In [14]:
# carry forward the w and b 
z = np.dot(w.T, x) + b
A = sigmoid(z)

# compute the cost
cost = - np.sum(( y*np.log(A) + ((1-y)*np.log(1-A)))) / m
costs.append(cost)

# backpropagation
dz = (A - y)                  
dw = np.dot(x, dz.T) / m      
db = np.sum(dz) / m           

# update the parameters
w = w - learning_rate * dw
b = b - learning_rate * db

# predict the training set
predict = sigmoid(np.dot(w.T, x) + b)
print("Predict: ", predict)
print("Costs: ", costs)

Predict:  [[0.31989005 0.24387178 0.18110786 0.09419627]]
Costs:  [0.6931471805599453, 0.7204690131374397, 0.6878144031746712]
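
The iterations so far repeat the same four steps: forward pass, cost, backward pass, update. One way to avoid copying the cell is to wrap those steps in a function. The sketch below uses a hypothetical helper named `gradient_step` and restarts from the zero-initialized parameters so that it runs on its own.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_step(w, b, x, y, learning_rate):
    """One forward pass, cost computation, backward pass and parameter update."""
    m = x.shape[1]
    A = sigmoid(np.dot(w.T, x) + b)                               # forward pass
    cost = -np.sum(y * np.log(A) + (1 - y) * np.log(1 - A)) / m   # cost before the update
    dz = A - y                                                    # backward pass
    dw = np.dot(x, dz.T) / m
    db = np.sum(dz) / m
    return w - learning_rate * dw, b - learning_rate * db, cost

# same toy data, initial parameters and learning rate as above
x = np.array([[50, 75, 100, 150]])
y = np.array([[1, 1, 0, 0]])
w, b = np.zeros((1, 1)), 0.0
costs = []
for _ in range(3):
    w, b, cost = gradient_step(w, b, x, y, learning_rate=0.001)
    costs.append(cost)
print("Costs:", costs)
```

Run from the initial parameters, the three recorded costs should match the `Costs` list printed above.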


In [15]:
# doing the same operation for 50000 iterations
for i in range(50000):
    # carry forward the w and b 
    z = np.dot(w.T, x) + b
    A = sigmoid(z)

    # compute the cost
    cost = - np.sum(( y*np.log(A) + ((1-y)*np.log(1-A)))) / m
    costs.append(cost)
    # prints cost at every 5000th iteration
    if i%5000 == 0:
        print(i, cost)

    # backpropagation
    dz = (A - y)                  
    dw = np.dot(x, dz.T) / m      
    db = np.sum(dz) / m           

    # update the parameters
    w = w - learning_rate * dw
    b = b - learning_rate * db

    # predict the training set
    predict = sigmoid(np.dot(w.T, x) + b)

# checking the final prediction
print("Final prediction: ", predict)

0 0.7124065426563126
5000 0.5606916012290944
10000 0.48369676691299734
15000 0.4329869696643792
20000 0.39380795672215
25000 0.36280721422234197
30000 0.33771619182749435
35000 0.31698682328018823
40000 0.2995459576565257
45000 0.28463556038115834
Final prediction:  [[0.84076563 0.60911763 0.31502586 0.03851658]]
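
The final predictions are probabilities, not class labels. A minimal sketch of turning them into labels follows, assuming a cut-off of 0.5; the notebook does not specify a threshold, so that value is an assumption.

```python
import numpy as np

# final probabilities copied from the output above
predict = np.array([[0.84076563, 0.60911763, 0.31502586, 0.03851658]])
y = np.array([[1, 1, 0, 0]])

threshold = 0.5  # assumed cut-off; not specified in the notebook
labels = (predict > threshold).astype(int)   # 1 = Mango, 0 = Not-Mango
accuracy = np.mean(labels == y)              # fraction of correct labels

print("Predicted labels:", labels)      # [[1 1 0 0]]
print("Training accuracy:", accuracy)   # 1.0
```

With this threshold all four training examples are classified correctly, even though the second and third probabilities are still far from 1 and 0.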
