# Neural Networks - Model Representation - II - Forward Propagation
We look at how to carry out the computation efficiently through a <font color="blue">vectorized implementation</font>.
<br>We'll also consider <font color="blue">why NNs are good for learning complex non-linear hypotheses</font>.

## Original problem from before
The sequence of steps used to compute the output of the hypothesis is given by the equations below
<img src="images/four nodes - activation example.png">

## Define some additional terms
- $z_1^{(2)} = \Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3$
- This means that $a_1^{(2)} = g(z_1^{(2)})$
- N.B. Superscript numbers in parentheses denote the associated layer
- Similarly, we define $z_2^{(2)}$ and $z_3^{(2)}$ for the other two hidden units
  - These values are just linear combinations of the inputs $x_0, \dots, x_3$, weighted by the parameters $\Theta^{(1)}$
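To make these definitions concrete, the sketch below computes each $z_j^{(2)}$ and $a_j^{(2)}$ one hidden unit at a time in plain Python. The values in `theta1` and `x` are made-up numbers, and `sigmoid` / `hidden_layer_unvectorized` are hypothetical helper names used only for illustration.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters Theta^(1): one row per hidden unit, one column per input x0..x3.
theta1 = [
    [0.1, 0.2, -0.3, 0.4],
    [-0.5, 0.6, 0.7, -0.8],
    [0.9, -1.0, 1.1, 1.2],
]
x = [1.0, 2.0, 3.0, 4.0]  # x0 = 1 is the bias unit

def hidden_layer_unvectorized(theta, x):
    """Return the lists [z_1^(2), z_2^(2), z_3^(2)] and [a_1^(2), a_2^(2), a_3^(2)]."""
    z2, a2 = [], []
    for row in theta:  # row j holds Theta_j0^(1) .. Theta_j3^(1)
        # z_j^(2) = Theta_j0 x0 + Theta_j1 x1 + Theta_j2 x2 + Theta_j3 x3
        z = sum(theta_jk * x_k for theta_jk, x_k in zip(row, x))
        z2.append(z)
        a2.append(sigmoid(z))  # a_j^(2) = g(z_j^(2))
    return z2, a2

z2, a2 = hidden_layer_unvectorized(theta1, x)
```

Each hidden unit is handled by a separate loop iteration here; the vectorized form replaces this loop with a single matrix-vector product.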


## We can vectorize the neural network computation
- Define $x$ as the feature vector, including the bias unit $x_0 = 1$
- Define $z^{(2)}$ as the vector of $z$ values from the second layer
<img src="images/vectorize-NN-1.png">
- $z^{(2)}$ is a [3x1] vector
- We can vectorize the computation of the neural network in two steps:
  - $z^{(2)} = \Theta^{(1)}x$
    - $\Theta^{(1)}$ is the [3x4] parameter matrix defined above
    - $x$ is the [4x1] feature vector
  - $a^{(2)} = g(z^{(2)})$
    - $z^{(2)}$ is a [3x1] vector
    - $a^{(2)}$ is also a [3x1] vector
    - $g()$ applies the *sigmoid* (logistic) function element-wise to each member of the $z^{(2)}$ vector
  - For notational completeness, $a^{(1)} = x$
    - $a^{(1)}$ is the vector of activations in the input layer
    - Obviously the "activation" for the input layer is just the input!
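  - Having calculated the $a^{(2)}$ vector, we add the bias unit $a_0^{(2)} = 1$
    (making $a^{(2)}$ a [4x1] vector) and compute the final hypothesis as
    $z^{(3)} = \Theta^{(2)}a^{(2)}$ and $h_\Theta(x) = a^{(3)} = g(z^{(3)})$
    <img src="images/vectorize-NN-2.png">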
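The two vectorized steps, plus the bias unit and output layer, can be sketched with NumPy as follows. This is an illustrative sketch, not the course's reference code: the numbers in `Theta1` and `Theta2` are made up, and it uses 1-D NumPy arrays where the notes write [3x1] or [4x1] column vectors.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function g(z)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters: Theta^(1) is 3x4, Theta^(2) is 1x4.
Theta1 = np.array([[0.1, 0.2, -0.3, 0.4],
                   [-0.5, 0.6, 0.7, -0.8],
                   [0.9, -1.0, 1.1, 1.2]])
Theta2 = np.array([[0.5, -1.0, 2.0, 0.25]])

x = np.array([1.0, 2.0, 3.0, 4.0])  # feature vector with x0 = 1

a1 = x                                # a^(1) = x
z2 = Theta1 @ a1                      # z^(2) = Theta^(1) a^(1), shape (3,)
a2 = sigmoid(z2)                      # a^(2) = g(z^(2)), shape (3,)
a2 = np.concatenate(([1.0], a2))      # add bias unit a0^(2) = 1, shape (4,)
z3 = Theta2 @ a2                      # z^(3) = Theta^(2) a^(2), shape (1,)
h = sigmoid(z3)                       # h_Theta(x) = a^(3) = g(z^(3))
```

The single `Theta1 @ a1` product computes all three $z_j^{(2)}$ values at once, which is what makes the vectorized form both shorter and faster than looping over units.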

- This process is also called **<font color="blue">Forward Propagation</font>**
  - Start off with the activations of the input units (i.e. the $x$ vector as input)
  - Forward propagate, calculating the activations of each layer in turn
  - The steps above are a vectorized implementation of this process
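Forward propagation generalizes to any number of layers by repeating "add bias unit, multiply by $\Theta^{(l)}$, apply $g$". The sketch below assumes sigmoid activations in every layer and takes `x` *without* the bias unit, prepending $a_0 = 1$ inside the loop; `forward_propagate` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function g(z)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(thetas, x):
    """Return the activations of every layer, input layer first.

    thetas[l] maps one layer to the next and has shape
    (units in next layer, units in current layer + 1); the extra
    column multiplies the bias unit.
    """
    a = np.asarray(x, dtype=float)  # a^(1) = x, bias unit not included
    activations = [a]
    for theta in thetas:
        a_with_bias = np.concatenate(([1.0], a))  # prepend a0 = 1
        a = sigmoid(theta @ a_with_bias)          # a^(l+1) = g(Theta^(l) a^(l))
        activations.append(a)
    return activations

# Usage: a 3-3-1 network with all-zero weights outputs g(0) = 0.5 in every unit.
acts = forward_propagate([np.zeros((3, 4)), np.zeros((1, 4))], [2.0, 3.0, 4.0])
print(acts[-1])  # [0.5]
```

Returning every layer's activations (not just the output) is a design choice here; only `activations[-1]`, the hypothesis $h_\Theta(x)$, is needed for prediction.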

## Neural networks learning its own features
If we ignore the input layer, the diagram below looks a lot like logistic regression
- Layer 3 is a logistic regression node
- The only difference is that, instead of taking the original feature vector as input,
  it takes the values calculated by the hidden layer as its features
- <font color="red">The features $a_1^{(2)}$, $a_2^{(2)}$, and $a_3^{(2)}$ are 
  calculated (learned) by the network, not the original input features</font>
- So instead of being constrained by the original input features, 
  a neural network can learn its own features to feed into logistic regression
<img src="images/vectorize-NN-3.png">
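The claim that layer 3 is "just logistic regression" can be checked directly: the output unit computes $g(\theta^T x)$ exactly as logistic regression does, only with the learned features $[1, a_1^{(2)}, a_2^{(2)}, a_3^{(2)}]$ in place of the original $x$. This is a sketch with made-up parameter values; `logistic_regression_hypothesis` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function g(z)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression_hypothesis(theta, features):
    """h_theta(x) = g(theta^T x), for a feature vector that already includes x0 = 1."""
    return sigmoid(theta @ features)

# Hypothetical parameters for a 3-3-1 network.
Theta1 = np.array([[0.1, 0.2, -0.3, 0.4],
                   [-0.5, 0.6, 0.7, -0.8],
                   [0.9, -1.0, 1.1, 1.2]])
Theta2 = np.array([0.5, -1.0, 2.0, 0.25])  # 1-D: a single output unit

x = np.array([1.0, 2.0, 3.0, 4.0])  # original features, x0 = 1

# Features computed by the hidden layer, with the bias unit a0^(2) = 1 prepended.
learned = np.concatenate(([1.0], sigmoid(Theta1 @ x)))

# Layer 3 is ordinary logistic regression, fed `learned` instead of `x`.
h = logistic_regression_hypothesis(Theta2, learned)
```

Because `Theta1` is itself trained, the features in `learned` are chosen by the learning algorithm rather than fixed in advance.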


## Network Architectures
As well as the networks already seen, other architectures (topologies) are possible
- More or fewer nodes per layer
- More layers
  - Each layer computes more complex features from the previous layer's activations, so by the output layer the network can represent very complex non-linear hypotheses
<img src="images/NN Architecture - multiple layers.png">
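A different architecture only changes the number and shapes of the $\Theta^{(l)}$ matrices: if layer $l$ has $s_l$ units and layer $l+1$ has $s_{l+1}$ units, then $\Theta^{(l)}$ has shape $s_{l+1} \times (s_l + 1)$. The sketch below builds the matrices from a list of layer sizes and runs forward propagation through them. The sizes `[3, 5, 5, 4, 1]` are hypothetical (not taken from the diagram), and the random uniform initialization is only there to produce runnable numbers.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function g(z)."""
    return 1.0 / (1.0 + np.exp(-z))

def init_thetas(layer_sizes, rng):
    """One Theta^(l) per pair of adjacent layers, shape (s_{l+1}, s_l + 1)."""
    return [rng.uniform(-1.0, 1.0, size=(n_out, n_in + 1))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def predict(thetas, x):
    """Forward propagate x (without bias unit) through every layer."""
    a = np.asarray(x, dtype=float)
    for theta in thetas:
        a = sigmoid(theta @ np.concatenate(([1.0], a)))  # add bias, then g(Theta a)
    return a

layer_sizes = [3, 5, 5, 4, 1]  # hypothetical: 3 inputs, three hidden layers, 1 output
rng = np.random.default_rng(0)
thetas = init_thetas(layer_sizes, rng)

for l, theta in enumerate(thetas, start=1):
    print(f"Theta^({l}) shape: {theta.shape}")
# Theta^(1) shape: (5, 4)
# Theta^(2) shape: (5, 6)
# Theta^(3) shape: (4, 6)
# Theta^(4) shape: (1, 5)

h = predict(thetas, [0.2, -1.0, 0.7])
```

Adding a layer or changing a layer's width only means editing `layer_sizes`; the forward propagation loop itself stays the same.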