# Neural Networks - Model Representation - I
How do we represent neural networks (NNs)?
- Neural networks were developed as a way to simulate networks of neurones

## What does a neurone look like
- Three things to notice
  - **<font color="#1C3387">Cell body</font>**
  - Number of input wires **<font color="#1C3387">(dendrites)</font>**
  - Output wire **<font color="#1C3387">(axon)</font>**
<img src="images/brain neuron.png">
- Simple level
  - Neurone gets one or more **<font color="#1C3387">inputs through dendrites</font>**
  - Does **<font color="#1C3387">processing</font>**
  - Sends **<font color="#1C3387">output down axon</font>** 
    (electric spikes - pulse of electricity via axon to another neurone)

## Artificial neural network - representation of a neurone
- In an artificial neural network, a neurone is a **<font color="#1C3387">logistic unit</font>**
- That logistic computation is just like our previous logistic regression hypothesis calculation
  - Feed input via input wires
  - Logistic unit does computation
  - Sends output down output wires
<img src="images/representation of a neurone.png">
- Very simple model of a neurone's computation
  - Often good to include an $x_0$ input - the **<font color="#1C3387">bias unit</font>** (this is equal to 1)
- This is an artificial neurone with a **<font color="#1C3387">sigmoid (logistic) activation function</font>**
  - $Ɵ$ vector may also be called the **<font color="#1C3387">weights</font>** of a model
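
The logistic unit described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sigmoid` and `logistic_unit` and the example weights are hypothetical and do not come from the notes.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_unit(theta, x):
    """Output of one artificial neurone.

    theta: weights [theta_0, theta_1, ..., theta_n]
    x:     inputs  [x_1, ..., x_n], without the bias unit
    """
    inputs = [1.0] + list(x)  # prepend the bias unit x_0 = 1
    z = sum(t * xi for t, xi in zip(theta, inputs))  # theta^T x
    return sigmoid(z)

# Hypothetical weights and inputs, chosen only so the sum is easy to check
theta = [-1.0, 2.0, 0.5, -0.5]
x = [1.0, 2.0, 4.0]
output = logistic_unit(theta, x)  # z = -1 + 2 + 1 - 2 = 0, so output is 0.5
```

With these particular numbers the weighted sum is zero, so the unit outputs exactly the midpoint of the sigmoid.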

## A group of neurones strung together
- Input is $x_1$, $x_2$ and $x_3$
- Three neurones in layer 2 ($a_1^2$, $a_2^2$ and $a_3^2$)
- Final fourth neurone which produces the output ($a_1^3$)
<img src="images/neural network - 1.png">
- First layer is the **<font color="#1C3387">input layer</font>**
- Final layer is the **<font color="#1C3387">output layer</font>** - produces the value computed by the hypothesis
- Middle layer(s) are called the **<font color="#1C3387">hidden layers</font>**
  - You don't observe the values processed in the hidden layer
  - Not a great name
  - Can have many hidden layers

## Neural networks - notation
- **<font color="#E30000">$a_i^j$ - activation of unit $i$ in layer $j$</font>**
  - e.g., $a_1^2$ is the activation of the 1<sup>st</sup> unit in the second layer
  - By *<font color="#1C3387">activation</font>*, we mean the value which is computed and output by that node
<p>
- **<font color="#E30000">$Ɵ^j$ - matrix of parameters controlling the function mapping 
  from layer $j$ to layer $j + 1$</font>**
  - Parameters are controlling the *<font color="#1C3387">mapping</font>* from one layer to the next
    - If network has
      - **<font color="#1C3387">$s_j$</font>** units in layer **<font color="#1C3387">$j$</font>** and
      - **<font color="#1C3387">$s_{j+1}$</font>** units in layer 
        **<font color="#1C3387">$j + 1$</font>**
      - Then **<font color="#1C3387">$Ɵ^j$</font>** will be of dimensions 
        **<font color="#1C3387">$[s_{j+1} \times (s_j + 1)]$</font>**
        - Because
          - $s_{j+1}$ is equal to the number of units in layer $(j + 1)$
          - $(s_j + 1)$ is equal to the number of units in layer $j$, plus an additional unit (bias)
    - Looking at the $Ɵ$ matrix
      - Number of rows = the number of units in the following layer
      - Number of columns = the number of units in the current layer + 1 (bias unit)
      - So, if layer $j$ had 101 units and layer $j + 1$ had 21 units
        - Then $Ɵ^j$ size would be $[21 \times 102]$
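
The dimension rule can be checked with a tiny helper. This is a sketch only; `theta_shape` is a hypothetical name introduced here for illustration.

```python
def theta_shape(units_current, units_next):
    """Return (rows, columns) of the matrix mapping layer j to layer j + 1.

    rows    = number of units in layer j + 1
    columns = number of units in layer j, plus 1 for the bias unit
    """
    return (units_next, units_current + 1)

# The example from the notes: 101 units feeding 21 units
shape = theta_shape(101, 21)  # (21, 102)
```

The same helper gives `(3, 4)` for three inputs feeding three hidden units, and `(1, 4)` for three hidden units feeding a single output.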

## What is the computation performed
- We have to calculate the activation for each node
- Activation depends on
  - The input(s) to the node
  - The parameters associated with that node (from the $Ɵ$ matrix associated with that layer)

### Example
A network, with the associated calculations for the 4 nodes
- Calculate each of the layer-2 activations based on the input values with the bias term
  - <font color="#E30000">Every input/activation goes to every node in following layer</font>
- The activation value on each hidden unit (e.g. $a_1^2$ ) is equal to the sigmoid function 
  applied to the linear combination of inputs
  - Three input units
    - $Ɵ$<sup>$(1)$</sup> is the matrix of parameters mapping the input units to hidden units
    - $Ɵ$<sup>$(1)$</sup> here is a [3 x 4] dimensional matrix
  - Three hidden units
    - $Ɵ$<sup>$(2)$</sup> is the matrix of parameters mapping the hidden layer to output layer
    - $Ɵ$<sup>$(2)$</sup> here is a [1 x 4] dimensional matrix
<img src="images/four nodes - activation example.png">
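
The calculation for this 3-3-1 network can be sketched as a forward pass in plain Python. The parameter values below are made up purely for illustration, and the helper names `sigmoid` and `layer_forward` are assumptions; only the matrix shapes ([3 x 4] and [1 x 4]) come from the notes.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(theta, activations):
    """Map the activations of one layer to the activations of the next.

    theta is a list of rows, one row per unit in the next layer.
    Each row holds the bias weight first, then one weight per incoming unit.
    """
    with_bias = [1.0] + list(activations)  # add the bias unit (always 1)
    return [
        sigmoid(sum(w * a for w, a in zip(row, with_bias)))
        for row in theta
    ]

# Hypothetical parameters: theta1 is 3 x 4, theta2 is 1 x 4
theta1 = [
    [0.0, 1.0, -1.0, 0.0],
    [0.5, 0.0, 0.0, -0.5],
    [-1.0, 1.0, 1.0, 1.0],
]
theta2 = [[0.0, 1.0, 1.0, -2.0]]

x = [1.0, 1.0, 1.0]              # inputs x_1, x_2, x_3
a2 = layer_forward(theta1, x)    # three hidden-layer activations
h = layer_forward(theta2, a2)[0] # single output of the hypothesis
```

Every row of a parameter matrix is combined with every activation of the previous layer, which is the "every input goes to every node" rule stated above.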

 ### Perhaps more clearly shown as:
 Each "layer transition" uses a matrix of parameters with the following significance
 <br>For example:
 - $Ɵ$<sub><font color="green"><b>1</b></font><font color="purple"><b>3</b></font>
   </sub><font color="blue"><b>$^1$</b></font> means
   - <font color="green"><b>$1$</b></font> - we're mapping to node $1$ in layer $l+1$ (here, layer $2$)
   - <font color="purple"><b>$3$</b></font> - we're mapping from node $3$ in layer $l$ (here, layer $1$)
   - <font color="blue"><b>$1$</b></font> - we're mapping from layer $l = 1$
 <img src="images/activation explained.png">
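
The index convention can be illustrated by looking up a single entry of a parameter matrix stored as a list of rows. The values and the helper name `weight` are hypothetical. Note one assumption about storage: Python lists count from 0, so the destination node is shifted by one, while the source index needs no shift because column 0 holds the bias weight.

```python
# Hypothetical 3 x 4 parameter matrix mapping layer 1 to layer 2.
# Row r holds the weights into node r + 1 of layer 2.
# Column 0 is the bias weight; column c is the weight from node c of layer 1.
theta1 = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]

def weight(theta, to_node, from_node):
    """Look up the weight into to_node (counted from 1)
    coming from from_node (counted from 0, where 0 is the bias unit)."""
    return theta[to_node - 1][from_node]

# Weight into node 1 of layer 2 from node 3 of layer 1
w13 = weight(theta1, to_node=1, from_node=3)  # 0.4
```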