# Shallow neural network

## Neural network overview

- Logistic regression: $(x, w, b) \rightarrow z = w^{T}x +b \rightarrow a = \sigma(z) \rightarrow \mathcal{L}(a, y)$
- A neural network stacks multiple sigmoid units in layers, and the output of one layer is the input of the next: $(x, W^{[1]}, b^{[1]}) \rightarrow z^{[1]} = W^{[1]}x +b^{[1]} \rightarrow a^{[1]} = \sigma(z^{[1]}) \rightarrow (a^{[1]}, W^{[2]}, b^{[2]}) \rightarrow z^{[2]} = W^{[2]}a^{[1]} +b^{[2]} \rightarrow a^{[2]} = \sigma(z^{[2]}) \rightarrow \mathcal{L}(a^{[2]}, y)$
- The backward pass computes the derivatives in reverse order, from the output back to the input: $da^{[2]} \rightarrow dz^{[2]} \rightarrow (dW^{[2]}, db^{[2]}) \rightarrow da^{[1]} \rightarrow dz^{[1]} \rightarrow (dW^{[1]}, db^{[1]})$
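The backward chain above can be sketched in NumPy for a single example. This is a minimal sketch, not a definitive implementation: the notes do not say which loss $\mathcal{L}$ is used, so binary cross-entropy is assumed here, and the function names (`forward`, `loss`, `backward`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    # Forward pass for one example x of shape (m, 1).
    a1 = sigmoid(W1 @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    return a1, a2

def loss(a2, y):
    # ASSUMPTION: binary cross-entropy; the notes only write L(a, y).
    return float(-(y * np.log(a2) + (1 - y) * np.log(1 - a2)).sum())

def backward(x, y, W2, a1, a2):
    # Follows the chain da2 -> dz2 -> (dW2, db2) -> da1 -> dz1 -> (dW1, db1).
    da2 = -(y / a2) + (1 - y) / (1 - a2)
    dz2 = da2 * a2 * (1 - a2)      # sigmoid derivative; simplifies to a2 - y
    dW2 = dz2 @ a1.T
    db2 = dz2
    da1 = W2.T @ dz2
    dz1 = da1 * a1 * (1 - a1)
    dW1 = dz1 @ x.T
    db1 = dz1
    return dW1, db1, dW2, db2
```

Each gradient has the same shape as the parameter it belongs to, which is a quick sanity check when writing this by hand.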


## Neural network representation (2 layers)

By convention the input layer is not counted, so a network with an input layer, one hidden layer, and an output layer is called a 2-layer network. It consists of the following layers:

- Input layer: passes the input $X$ to the hidden layer: 
$
a^{[0]} = X = \begin{bmatrix}
    x_{1} \\
    x_{2} \\
    x_{3} \\
    ... \\
    x_{m} \\
\end{bmatrix}
$ 
- Hidden layer: 
$
a^{[1]} = \begin{bmatrix}
    a^{[1]}_{1} \\
    a^{[1]}_{2} \\
    a^{[1]}_{3} \\
    ... \\
    a^{[1]}_{n} \\
\end{bmatrix}
$ 
associated with parameters 
$
W^{[1]} = \begin{bmatrix}
    w^{[1]}_{11} & w^{[1]}_{12} ... & w^{[1]}_{1m}  \\
    w^{[1]}_{21} & w^{[1]}_{22} ... & w^{[1]}_{2m}  \\
    w^{[1]}_{31} & w^{[1]}_{32} ... & w^{[1]}_{3m}  \\
    ... \\
    w^{[1]}_{n1} & w^{[1]}_{n2} ... & w^{[1]}_{nm}  \\
\end{bmatrix}
$ 
(n x m because we have n nodes in layer 1 and m nodes in the input layer) and
$
b^{[1]} = \begin{bmatrix}
    b^{[1]}_{1} \\
    b^{[1]}_{2} \\
    b^{[1]}_{3} \\
    ... \\
    b^{[1]}_{n} \\
\end{bmatrix}
$ (n x 1).
- Output layer: $a^{[2]} = \hat{y}$ associated with parameters 
$
W^{[2]} = \begin{bmatrix}
    w^{[2]}_{1} & w^{[2]}_{2} ... & w^{[2]}_{n} 
\end{bmatrix}
$ 
(1 x n because we have 1 node in the output layer and n nodes in the hidden layer, layer 1) and
$
b^{[2]} 
$ (1 x 1).
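
The parameter shapes listed above can be checked with a few lines of NumPy. This is only a sketch: the sizes `m = 3` and `n = 4` are hypothetical, and the random values stand in for real weights.

```python
import numpy as np

m, n = 3, 4  # hypothetical sizes: m input features, n hidden nodes

rng = np.random.default_rng(0)
W1 = rng.standard_normal((n, m))  # hidden layer weights: one row per hidden node
b1 = np.zeros((n, 1))             # hidden layer bias: one entry per hidden node
W2 = rng.standard_normal((1, n))  # output layer weights: one row, n columns
b2 = np.zeros((1, 1))             # output layer bias: a single number

for name, p in [("W1", W1), ("b1", b1), ("W2", W2), ("b2", b2)]:
    print(name, p.shape)
```

The rule of thumb is that $W^{[l]}$ has one row per node in layer $l$ and one column per node in the layer before it.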

## Computing the neural network output

### Computing the logistic regression output
To compute the output of logistic regression, we take two steps:
- Compute $z$ with a linear function: $z = w^Tx + b$
- Compute activation $a$ as a sigmoid function of $z$: $a = \sigma(z)$
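
The two steps above can be sketched in NumPy as follows. The function name `logistic_output` is hypothetical, and `w` and `x` are assumed to be column vectors of shape (m, 1).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_output(w, b, x):
    z = w.T @ x + b    # step 1: linear function, result has shape (1, 1)
    a = sigmoid(z)     # step 2: sigmoid activation
    return a

# Example with hypothetical values: all-zero weights give z = 0, so a = 0.5.
a = logistic_output(np.zeros((3, 1)), 0.0, np.ones((3, 1)))
print(a.item())
```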

### Computing the neural network
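Computing the output of a neural network is similar to logistic regression, but the same two steps are repeated for every node in every layer:
- $z^{[l]}_{i} = (w^{[l]}_i)^Tx + b^{[l]}_{i}$
- $a^{[l]}_{i} = \sigma(z^{[l]}_{i})$

Here $l$ is the layer and $i$ is the node within that layer. For layers after the first, the input $x$ is replaced by the previous layer's activations $a^{[l-1]}$ (with $a^{[0]} = x$).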

**Vectorize**

$
z^{[l]} = 
\begin{bmatrix}
    z^{[l]}_1 \\
    z^{[l]}_2 \\
    ... \\
    z^{[l]}_n \\
\end{bmatrix}
= 
\begin{bmatrix}
    (w^{[l]}_1)^Tx + b^{[l]}_1\\
    (w^{[l]}_2)^Tx + b^{[l]}_2\\
    ... \\
    (w^{[l]}_n)^Tx + b^{[l]}_n\\
\end{bmatrix}
$

$
a^{[l]} = 
\begin{bmatrix}
    a^{[l]}_1 \\
    a^{[l]}_2 \\
    ... \\
    a^{[l]}_n \\
\end{bmatrix}
= 
\begin{bmatrix}
    \sigma(z^{[l]}_1) \\
    \sigma(z^{[l]}_2) \\
    ... \\
    \sigma(z^{[l]}_n) \\
\end{bmatrix}
$
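
To see that the stacked form is the same computation as the per-node form, the sketch below computes one layer both ways and compares the results. The function names are hypothetical, and row `i` of `W` is assumed to hold $(w^{[l]}_i)^T$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_loop(W, b, x):
    # Per-node version: one z_i and one a_i at a time.
    n = W.shape[0]
    a = np.zeros((n, 1))
    for i in range(n):
        z_i = W[i, :] @ x[:, 0] + b[i, 0]
        a[i, 0] = sigmoid(z_i)
    return a

def layer_vectorized(W, b, x):
    # Vectorized version: all nodes of the layer in one matrix product.
    return sigmoid(W @ x + b)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
b = rng.standard_normal((4, 1))
x = rng.standard_normal((3, 1))
print(np.allclose(layer_loop(W, b, x), layer_vectorized(W, b, x)))
```

The vectorized version avoids the explicit Python loop, which is the main reason for stacking the weights into a matrix.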

**Given input x, we have:**

$z^{[1]} = W^{[1]}x + b^{[1]}$

$a^{[1]} = \sigma(z^{[1]})$

$z^{[2]} = W^{[2]}a^{[1]} + b^{[2]}$

$\hat{y} = a^{[2]} = \sigma(z^{[2]})$
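
The four equations above translate almost line by line into NumPy. This is a minimal sketch for a single example; the function name `forward_pass` and the layer sizes in the usage example are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_pass(x, W1, b1, W2, b2):
    z1 = W1 @ x + b1     # shape (n, 1)
    a1 = sigmoid(z1)     # shape (n, 1)
    z2 = W2 @ a1 + b2    # shape (1, 1)
    a2 = sigmoid(z2)     # shape (1, 1), this is y_hat
    return {"z1": z1, "a1": a1, "z2": z2, "a2": a2}

# Usage with hypothetical sizes: 3 input features, 4 hidden nodes.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 1))
out = forward_pass(x,
                   rng.standard_normal((4, 3)), np.zeros((4, 1)),
                   rng.standard_normal((1, 4)), np.zeros((1, 1)))
print(out["a2"].shape)
```

Returning the intermediate values `z1`, `a1`, and `z2` alongside `a2` is a design choice: the backward computation needs them later.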

## Vectorizing across multiple examples