# Introduction
Consider the following neural network:

<img src = "../artifacts/neural_networks_36.png" alt = "drawing" width = "500">

The computation graph for this network is shown below:

<img src = "../artifacts/neural_networks_37.png" alt = "drawing" width = "500">

# Forward Propagation
In forward propagation, values flow from left to right, from the input layer to the output layer. A forward pass does the following:
- Compute $z_i$ for each neuron.
- Apply the activation function to it.
- Pass the result to the neurons in the next layer.
- At the output layer, obtain the class probabilities.
- Use these probabilities to calculate the loss. Since this is a multi-class classification problem, the loss function is categorical cross-entropy.

The final objective is to compute $z^2$, the output-layer scores, and from it the probabilities $a^2$.
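The forward-pass steps above can be collected into a single function. This is a minimal sketch, assuming the same architecture as the cells below (ReLU hidden layer, softmax output); the function name `forward` and the randomly generated demo inputs are illustrative and not part of the original notebook. The `_demo` suffixes keep the sketch from overwriting the notebook's own variables.

```python
import numpy as np

def forward(x, w1, b1, w2, b2):
    """Two-layer forward pass: affine -> ReLU -> affine -> softmax."""
    z1 = x @ w1 + b1                      # hidden pre-activations, shape (m, h)
    a1 = np.maximum(0, z1)                # ReLU activation
    z2 = a1 @ w2 + b2                     # output scores, shape (m, n)
    exp_scores = np.exp(z2)
    probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)  # row-wise softmax
    return z1, a1, z2, probs

# Hypothetical demo data: 5 samples, d = 2 inputs, h = 4 hidden units, n = 3 classes
rng = np.random.default_rng(0)
x_demo = rng.standard_normal((5, 2))
w1_demo, b1_demo = 0.01 * rng.standard_normal((2, 4)), np.zeros((1, 4))
w2_demo, b2_demo = 0.01 * rng.standard_normal((4, 3)), np.zeros((1, 3))

_, _, _, probs_demo = forward(x_demo, w1_demo, b1_demo, w2_demo, b2_demo)
print(probs_demo.shape)        # (5, 3)
print(probs_demo.sum(axis=1))  # each row sums to 1 (up to floating point)
```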

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
df = pd.read_csv("spiral.csv")
df.head()
```

|   | x1 | x2 | y |
|---|---|---|---|
| 0 | 0.0 | 0.0 | 0 |
| 1 | -0.00065 | 0.01008 | 0 |
| 2 | 0.009809 | 0.017661 | 0 |
| 3 | 0.007487 | 0.029364 | 0 |
| 4 | -2.7e-05 | 0.040404 | 0 |


```python
# separating features and labels
x = df.drop(columns = ["y"])
y = df["y"]
x.shape, y.shape
```

```
((300, 2), (300,))
```

```python
# network dimensions (the weights and biases are initialized at random in the next cells)
d = 2 # dimensions or number of inputs
n = 3 # number of classes or number of neurons in the output layer
h = 4 # number of neurons in the hidden layer
```

```python
# input layer to the hidden layer
# weight and bias of layer 1
w1 = 0.01 * np.random.randn(d, h)
b1 = np.zeros((1, h))
w1.shape, b1.shape
```

```
((2, 4), (1, 4))
```

### Calculating $z^1$
$z^1 = x w^1 + b^1$: each row of $x$ (one sample) is multiplied by each column of $w^1$ (one hidden neuron), and the bias $b^1$ is added to every row of the result.

```python
# z1 = x . w1 + b1; b1 has shape (1, h) and is broadcast to every row
z1 = np.dot(x, w1) + b1
z1.shape
```

```
(300, 4)
```
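To make the row-by-column description concrete, here is a hand-checkable example with made-up numbers (2 samples, 2 inputs, and 3 hidden neurons rather than the notebook's 4). The bias row of shape `(1, 3)` is broadcast by NumPy, so it is added to every sample.

```python
import numpy as np

x_small = np.array([[1.0, 2.0],
                    [3.0, 4.0]])           # 2 samples, 2 features each
w_small = np.array([[1.0, 0.0, -1.0],
                    [0.5, 1.0,  2.0]])     # column j holds the weights of hidden neuron j
b_small = np.array([[0.1, 0.2, 0.3]])      # one bias per neuron, shape (1, 3)

z_small = np.dot(x_small, w_small) + b_small   # bias row is broadcast to both samples
print(z_small)
# [[2.1 2.2 3.3]
#  [5.1 4.2 5.3]]

# Entry (i, j) is sample i dotted with neuron j's weights, plus neuron j's bias
i, j = 1, 2
print(x_small[i] @ w_small[:, j] + b_small[0, j])  # 5.3
```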

### Calculating $a^1$
The ReLU activation, $a^1 = \max(0, z^1)$, is applied element-wise to $z^1$.

```python
# ReLU activation function
a1 = np.maximum(0, z1)
a1.shape
```

```
(300, 4)
```
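A small example of what `np.maximum(0, z1)` does element by element, on made-up values. It also computes the ReLU derivative mask that backpropagation uses when passing gradients back through this layer; treating the derivative at exactly 0 as 0 is an assumption (a common convention), not something stated in the original.

```python
import numpy as np

z_example = np.array([[-2.0,  0.0, 3.5],
                      [ 1.0, -0.5, 0.0]])
a_example = np.maximum(0, z_example)      # element-wise: negatives become 0
print(a_example)
# [[0.  0.  3.5]
#  [1.  0.  0. ]]

# Derivative of ReLU: 1 where z > 0, else 0 (assumed convention: 0 at z == 0)
relu_grad = (z_example > 0).astype(float)
print(relu_grad)
```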

```python
# hidden layer to the output layer
# weight and bias of layer 2
w2 = 0.01 * np.random.randn(h, n)
b2 = np.zeros((1, n))
w2.shape, b2.shape
```

```
((4, 3), (1, 3))
```
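The shapes above fix the number of trainable parameters. A quick count using the notebook's sizes ($d = 2$, $h = 4$, $n = 3$):

```python
d, h, n = 2, 4, 3                    # same sizes as the notebook
params_layer1 = d * h + h            # entries of w1 (2 x 4) plus b1 (4)
params_layer2 = h * n + n            # entries of w2 (4 x 3) plus b2 (3)
total_params = params_layer1 + params_layer2
print(params_layer1, params_layer2, total_params)  # 12 15 27
```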

### Calculating $z^2$
To calculate $z^2$, $a^1$ is multiplied by $w^2$ and the bias $b^2$ is added to the result: $z^2 = a^1 w^2 + b^2$.

```python
z2 = np.dot(a1, w2) + b2
z2.shape
```

```
(300, 3)
```

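### Calculating $a^2$
The softmax function is applied to each row of $z^2$, turning the three class scores of every sample into probabilities that sum to 1.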

```python
# apply the softmax function to compute a2
z2_exp = np.exp(z2)
a2 = z2_exp / np.sum(z2_exp, axis = 1, keepdims = True)
probs = a2
probs.shape
```

```
(300, 3)
```
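The cell above exponentiates the raw scores directly, which works here because the small initial weights (scaled by 0.01) keep $z^2$ close to zero. With larger scores, `np.exp` can overflow. A common remedy, not used in the original cell, is to subtract each row's maximum first; softmax is unchanged because the common factor $e^{-\max_j z_j}$ cancels between numerator and denominator. A minimal sketch, where `stable_softmax` is a hypothetical helper name:

```python
import numpy as np

def stable_softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow in np.exp."""
    z_shifted = z - z.max(axis=1, keepdims=True)
    exp_z = np.exp(z_shifted)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

scores = np.array([[1000.0, 1001.0, 1002.0],   # np.exp(1000.0) on its own would overflow to inf
                   [   0.0,    1.0,    2.0]])
p_stable = stable_softmax(scores)
# Both rows come out identical: shifting every score by a constant does not change softmax
print(p_stable)
```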

<img src = "../artifacts/neural_networks_38.png" alt = "drawing" width = "500">

# Loss Calculation
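### Will the loss function change?
No. It is still categorical cross-entropy computed from the output probabilities $a^2$; adding a hidden layer only changes how those probabilities are produced.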
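Categorical cross-entropy only needs the predicted probability of each sample's true class. A minimal sketch on made-up probabilities, assuming the loss is averaged over samples (a common choice; the original text does not show the averaging or any regularization term):

```python
import numpy as np

probs_ex = np.array([[0.7, 0.2, 0.1],    # sample 0, true class 0
                     [0.1, 0.3, 0.6]])   # sample 1, true class 2
labels_ex = np.array([0, 2])

# Pick p[i, y_i] for every sample i, then average -log of those probabilities
correct_probs = probs_ex[np.arange(len(labels_ex)), labels_ex]  # [0.7, 0.6]
loss_ex = -np.mean(np.log(correct_probs))
print(round(loss_ex, 4))  # 0.4338
```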

# Backward Propagation
### Will the gradient calculation change for an n-layer neural network?
No. Each gradient is computed the same way as before; the only difference is that the gradients now have to be backpropagated through one additional layer (the hidden layer).
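To illustrate that the same rules are simply applied once more, here is a minimal sketch of a complete backward pass on random toy data. The helper `affine_backward` and all `_t` names are hypothetical, and `dz2_t` uses the result $dz^2 = p - I(i = y)$ derived below (for the loss summed over samples; divide by $m$ if the loss is averaged). The same helper serves both layers; the only extra step for the hidden layer is masking the gradient with the ReLU derivative.

```python
import numpy as np

def affine_backward(dz, a_prev, w):
    """Gradients for z = a_prev @ w + b, given dz = dL/dz."""
    dw = a_prev.T @ dz                   # same shape as w
    db = dz.sum(axis=0, keepdims=True)   # same shape as b: (1, units)
    da_prev = dz @ w.T                   # gradient passed to the previous layer
    return dw, db, da_prev

# Hypothetical toy network: 6 samples, d = 2, h = 4, n = 3
rng = np.random.default_rng(1)
x_t = rng.standard_normal((6, 2))
y_t = rng.integers(0, 3, size=6)
w1_t, b1_t = rng.standard_normal((2, 4)), np.zeros((1, 4))
w2_t, b2_t = rng.standard_normal((4, 3)), np.zeros((1, 3))

# Forward pass
z1_t = x_t @ w1_t + b1_t
a1_t = np.maximum(0, z1_t)
z2_t = a1_t @ w2_t + b2_t
exp_t = np.exp(z2_t - z2_t.max(axis=1, keepdims=True))
p_t = exp_t / exp_t.sum(axis=1, keepdims=True)

# Backward pass
dz2_t = p_t.copy()
dz2_t[np.arange(6), y_t] -= 1                               # p - I(i = y)
dw2_t, db2_t, da1_t = affine_backward(dz2_t, a1_t, w2_t)    # output layer
dz1_t = da1_t * (z1_t > 0)                                  # back through ReLU
dw1_t, db1_t, _ = affine_backward(dz1_t, x_t, w1_t)         # hidden layer, same rule

print(dw1_t.shape, db1_t.shape, dw2_t.shape, db2_t.shape)  # (2, 4) (1, 4) (4, 3) (1, 3)
```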

```python
# number of data points (training samples)
m = y.shape[0]
m
```

```
300
```

### Calculating $dz^2$

<img src = "../artifacts/neural_networks_39.png" alt = "drawing" width = "500">

$dz^2 = \frac{\partial L}{\partial z^2}$

By the chain rule,

$\frac{\partial L}{\partial z^2} = \frac{\partial L}{\partial a^2} \cdot \frac{\partial a^2}{\partial z^2}$

Here, $a^2$ holds the output probabilities.

Writing $p$ for $a^2$: $\frac{\partial L}{\partial z^2} = \frac{\partial L}{\partial p} \cdot \frac{\partial p}{\partial z^2}$

This has the same form as the derivative of the loss with respect to $z$ that was calculated previously:

$dz = \frac{\partial L}{\partial p} \cdot \frac{\partial p}{\partial z}$

That derivative came out to be $dz_i = p_i - I(i = y)$, where $I(i = y)$ is 1 if $i$ is the true class $y$ and 0 otherwise.
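As a reminder of where this result comes from: for a single sample with true class $y$, softmax outputs $p_i = e^{z_i} / \sum_k e^{z_k}$ and loss $L = -\log p_y$, the derivative of the softmax output and the resulting loss gradient are:

$$
\frac{\partial p_y}{\partial z_i} = p_y \left( I(i = y) - p_i \right)
$$

$$
\frac{\partial L}{\partial z_i} = -\frac{1}{p_y} \cdot \frac{\partial p_y}{\partial z_i} = -\left( I(i = y) - p_i \right) = p_i - I(i = y)
$$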

Hence, $dz^2_i = p_i - I(i = y)$.

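```python
# dz2 = p - I(i = y): subtract 1 from each sample's true-class probability.
# .copy() keeps probs (and a2, which is the same array) unchanged.
dz2 = probs.copy()
dz2[range(m), y] -= 1
```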

The shape of $dz^2$ is the same as that of the probabilities, `(m, n)` (in this case, `(300, 3)`).
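The formula for $dz^2$ can be sanity-checked numerically. This sketch, on random made-up scores, compares $p - I(i = y)$ with a central finite-difference gradient of the cross-entropy loss summed over samples (with a mean loss, both sides would simply be divided by $m$). `ce_loss` is a hypothetical helper name.

```python
import numpy as np

def ce_loss(z, y):
    """Summed categorical cross-entropy of softmax(z) against integer labels y."""
    exp_z = np.exp(z - z.max(axis=1, keepdims=True))
    p = exp_z / exp_z.sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(len(y)), y]).sum()

rng = np.random.default_rng(2)
z_chk = rng.standard_normal((4, 3))   # 4 samples, 3 classes
y_chk = np.array([0, 2, 1, 2])

# Analytic gradient: softmax probabilities minus the one-hot true class
exp_chk = np.exp(z_chk - z_chk.max(axis=1, keepdims=True))
dz_analytic = exp_chk / exp_chk.sum(axis=1, keepdims=True)
dz_analytic[np.arange(4), y_chk] -= 1

# Numerical gradient by central differences, one entry at a time
eps = 1e-6
dz_numeric = np.zeros_like(z_chk)
for i in range(z_chk.shape[0]):
    for j in range(z_chk.shape[1]):
        z_plus, z_minus = z_chk.copy(), z_chk.copy()
        z_plus[i, j] += eps
        z_minus[i, j] -= eps
        dz_numeric[i, j] = (ce_loss(z_plus, y_chk) - ce_loss(z_minus, y_chk)) / (2 * eps)

# Largest disagreement; it should be tiny (well below 1e-6)
print(np.max(np.abs(dz_analytic - dz_numeric)))
```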

### Calculating $dw^2$ and $db^2$
The gradients $dw^2$ and $db^2$ are computed the same way as $dw$ and $db$ were in the softmax classifier, with $a^1$ playing the role of the input $x$.

<img src = "../artifacts/neural_networks_40.png" alt = "drawing" width = "500">
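In code, this follows the softmax-classifier pattern with $a^1$ in place of the input: $dw^2 = (a^1)^T dz^2$, and $db^2$ is the column-wise sum of $dz^2$. A minimal sketch on made-up arrays (the `_ex` names are illustrative), which also checks that the matrix product equals the sum, over samples, of each sample's outer product $a^1_k (dz^2_k)^T$:

```python
import numpy as np

rng = np.random.default_rng(3)
a1_ex = np.maximum(0, rng.standard_normal((5, 4)))  # hidden activations, (m, h) = (5, 4)
dz2_ex = rng.standard_normal((5, 3))                # output-layer gradient, (m, n) = (5, 3)

dw2_ex = a1_ex.T @ dz2_ex                   # (h, n) = (4, 3), same shape as w2
db2_ex = dz2_ex.sum(axis=0, keepdims=True)  # (1, n) = (1, 3), same shape as b2

# a1.T @ dz2 accumulates one outer product per sample
dw2_loop = sum(np.outer(a1_ex[k], dz2_ex[k]) for k in range(5))
print(np.allclose(dw2_ex, dw2_loop))  # True
```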

