# Notations

\[1\]: First layer

\[2\]: Second layer

...

\[n\]: n-th layer

The computation graph below shows how this notation attaches to each layer of our neural network. This is what we will dive into in this week's videos.

![Neural Network Computation Graph](images/nn-graph-small.png)

# Neural Network Representation

We start by naming the parts of our neural network for easier reference.

1. Input layer: takes in the inputs directly
2. Hidden layer: layers/nodes between input and output layers
3. Output layer: the final activation before output $\hat{y}$

![Neural Network Representation](images/neural-network-representation-small.png)

An alternative notation for the inputs $x$ of our neural network is $a^{[0]}$, the activations of the zeroth layer.

Subsequently, the hidden layer outputs the activations $a^{[1]}$, where the $i$-th node generates $a^{[1]}_i$.

Finally, our $\hat{y}$ can be denoted as $a^{[2]}$.

What we see above is known as a **2 layer neural network**. By convention we do not count the input layer when naming neural networks.

The parameters of each layer are denoted with the same superscript. For the hidden layer they are $W^{[1]}$ and $b^{[1]}$, with shapes `(4, 3)` and `(4, 1)` respectively: 4 hidden nodes, each taking 3 inputs.
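To make the shapes concrete, here is a minimal NumPy sketch. The variable names (`n_x`, `n_h`, `n_y`, `W1`, ...) and the random initial values are assumptions for illustration only. The shapes of `W2` and `b2` are not stated above; they are inferred from the output layer having a single neuron that takes the 4 hidden activations as input.

```python
import numpy as np

# Layer sizes implied by the shapes above: 3 inputs, 4 hidden nodes, 1 output node.
n_x, n_h, n_y = 3, 4, 1

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Hidden layer parameters: one row of W1 per hidden node.
W1 = rng.standard_normal((n_h, n_x))  # W^{[1]}, shape (4, 3)
b1 = np.zeros((n_h, 1))               # b^{[1]}, shape (4, 1)

# Output layer parameters (shapes inferred, see the note above).
W2 = rng.standard_normal((n_y, n_h))  # W^{[2]}, shape (1, 4)
b2 = np.zeros((n_y, 1))               # b^{[2]}, shape (1, 1)

for name, arr in [("W1", W1), ("b1", b1), ("W2", W2), ("b2", b2)]:
    print(name, arr.shape)
```

Keeping the biases as `(n, 1)` column vectors rather than flat arrays of shape `(n,)` keeps every quantity two-dimensional, which makes shape mismatches easier to spot.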

# Computing a Neural Network's Output

Recall that each node in the neural network computes $w^T x + b$ followed by an activation function $\sigma$. Suppose that this node is now in hidden layer 1. The computation for the **first node** in hidden layer 1 would be

$$z_1^{[1]} = w_1^{[1]T}x + b_1^{[1]}$$

$$a_1^{[1]} = \sigma(z_1^{[1]})$$

Similarly, the second node would compute:

$$z_2^{[1]} = w_2^{[1]T}x + b_2^{[1]}$$

$$a_2^{[1]} = \sigma(z_2^{[1]})$$

The same is done for the 3rd and 4th nodes. We could do this in a neural network by running each neuron in the layer through a for loop. But as we learnt in previous videos, that is highly inefficient. So we try to **vectorize** this operation.
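For comparison, the non-vectorized approach might look like the sketch below. The function names `sigmoid` and `hidden_layer_loop` are hypothetical, and the input `x` is assumed to be a column vector of shape `(3, 1)`.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def hidden_layer_loop(W1, b1, x):
    """Compute a^{[1]} one node at a time (slow, shown for illustration)."""
    n_h = W1.shape[0]
    a1 = np.zeros((n_h, 1))
    for i in range(n_h):
        w_i = W1[i, :].reshape(-1, 1)        # w_i^{[1]} as a column vector, shape (3, 1)
        z_i = (w_i.T @ x).item() + b1[i, 0]  # z_i^{[1]} = w_i^{[1]T} x + b_i^{[1]}
        a1[i, 0] = sigmoid(z_i)              # a_i^{[1]} = sigma(z_i^{[1]})
    return a1

# Example call with made-up values.
W1 = np.arange(12, dtype=float).reshape(4, 3) / 10.0
b1 = np.zeros((4, 1))
x = np.ones((3, 1))
a1 = hidden_layer_loop(W1, b1, x)
print(a1.shape)
```

Each iteration repeats the two equations above for one node, so the cost of the Python-level loop grows with the number of nodes in the layer.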

## Vectorization

First we stack the four row vectors $w_i^{[1]T}$ on top of each other to get the matrix $W^{[1]}$ of shape `(4, 3)`. Then we multiply this matrix by our $x$ and add the bias vector $b^{[1]}$ to it. The result is a vector of shape `(4, 1)` whose $i$-th entry is:

$$w_i^{[1]T}x + b_i^{[1]}$$

![Vectorization](images/vectorize-nn-small.png)

We call this result $z^{[1]}$. To find the output values for the layer, we just have to apply an activation function to each of the values in $z^{[1]}$. In this case it will be the sigmoid function. We call this result $a^{[1]}$.
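A minimal NumPy sketch of the vectorized hidden layer follows. The names `sigmoid` and `hidden_layer_vectorized` are hypothetical, and the numeric values are made up so the result is easy to check by hand.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def hidden_layer_vectorized(W1, b1, x):
    """Compute z^{[1]} and a^{[1]} for all hidden nodes at once."""
    z1 = W1 @ x + b1   # (4, 3) @ (3, 1) + (4, 1) -> (4, 1)
    a1 = sigmoid(z1)   # sigmoid applied to every entry of z1
    return z1, a1

# Rows of W1 are [0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11].
W1 = np.arange(12, dtype=float).reshape(4, 3)
b1 = np.zeros((4, 1))
x = np.ones((3, 1))

z1, a1 = hidden_layer_vectorized(W1, b1, x)
print(z1.ravel())  # row sums of W1: 3, 12, 21, 30
```

The single matrix product replaces the per-node loop: row $i$ of `W1` times `x` is exactly $w_i^{[1]T}x$.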

To calculate the output value $\hat{y}$, we perform the same operations on the output layer as we did for the hidden layer, taking $a^{[1]}$ as the input to the single output layer neuron. To compute the final result, we perform the following:

$$z^{[2]} = W^{[2]}a^{[1]} + b^{[2]}$$

Then we apply the sigmoid function to obtain $a^{[2]} = \sigma(z^{[2]})$, which is our $\hat{y}$.
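Putting both layers together, the full forward pass for a single example can be sketched as below. The function name `forward` and the all-zero parameter values are assumptions chosen for illustration; the shape of `W2` is inferred from the single output neuron.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass of the 2 layer network for one input column vector x."""
    a0 = x                # a^{[0]}: the inputs
    z1 = W1 @ a0 + b1     # z^{[1]} = W^{[1]} a^{[0]} + b^{[1]}
    a1 = sigmoid(z1)      # a^{[1]}
    z2 = W2 @ a1 + b2     # z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}
    a2 = sigmoid(z2)      # a^{[2]}, i.e. y-hat
    return a2

# With all-zero parameters every z is 0, so every activation is sigmoid(0) = 0.5.
x = np.ones((3, 1))
W1, b1 = np.zeros((4, 3)), np.zeros((4, 1))
W2, b2 = np.zeros((1, 4)), np.zeros((1, 1))

y_hat = forward(x, W1, b1, W2, b2)
print(y_hat)  # [[0.5]]
```

The all-zero parameters are only a hand-checkable test case, not a recommended way to set up the network.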