# Feedforward Neural Networks

A feedforward neural network is a map built from **neurons** (a.k.a. **nodes**) that are interconnected in layers so that data flows in only one direction, from input to output. Neural networks are used to approximate complicated functions from examples of their inputs and outputs.

You set up a model with weights $\mathbf{w}$ and seek
$$ \underset{\mathbf{w}}{\text{argmin}} \; E(\mathbf{w})$$
where $E$, known as the **loss function**, measures the error of your model.

## Neural Network with no Hidden Layers

![Perceptron](https://i.imgur.com/RHfrnA1.png)

In this network, the inputs are $x_1, x_2, \dots, x_n$ and the output is $f(w_0 + x_1 w_1 + x_2 w_2 + \dots + x_n w_n)$, where $f$ is the **activation function**.

The network has no hidden layers. It has an **input layer**, which takes in the inputs $x_1, \dots, x_n$.

It has an **output layer** $y$.

In the above example, $f$ is the **step function**:

$$ f(x) = \begin{cases}
1 & \text{if } x \ge 0\\
0 & \text{otherwise}
\end{cases}
$$

so the output is $1$ exactly when $w_0 + x_1 w_1 + \dots + x_n w_n \ge 0$.

A feedforward neural network is built by stacking layers of such units, with the outputs of one layer serving as the inputs to the next.

$w_0$ is usually referred to as the **bias** $b$.
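
The single neuron described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `step` and `perceptron` and the hand-picked weights (chosen so that the neuron computes logical AND) are assumptions made for the example.

```python
def step(x):
    """Step activation: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0


def perceptron(inputs, weights, bias):
    """Return f(w_0 + x_1 w_1 + ... + x_n w_n) with f the step function."""
    total = bias + sum(x * w for x, w in zip(inputs, weights))
    return step(total)


# Hypothetical weights and bias, picked by hand so the neuron acts as logical AND.
and_weights = [1.0, 1.0]
and_bias = -1.5

outputs = [perceptron([a, b], and_weights, and_bias) for a in (0, 1) for b in (0, 1)]
print(outputs)  # [0, 0, 0, 1]
```

Only the input $(1, 1)$ pushes the weighted sum above zero, so it is the only one mapped to $1$.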

### Relation to Least Squares Regression

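If $f(x) = x$, then the model is $y = w_0 + x_1 w_1 + x_2 w_2 + \dots + x_n w_n$. This is just a linear model. For $m$ inputs $(x_{i1}, \dots, x_{in})$ with $1 \le i \le m$, let

$$ A = \begin{bmatrix}
1 & x_{11} & x_{12} & \dots & x_{1n} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{m1} & x_{m2} & \dots & x_{mn}
\end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix}
w_0 \\
w_1 \\
\vdots \\
w_n
\end{bmatrix}
$$

Then $A\mathbf{w}$ is the vector of model outputs for all $m$ inputs, and choosing $\mathbf{w}$ to minimize the squared error between $A\mathbf{w}$ and the desired outputs is the least squares regression problem.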
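
To make the connection concrete, here is a small sketch for the single-feature case ($n = 1$), where the least squares weights have a simple closed form. The data are made up and lie exactly on $y = 1 + 2x$; the helper names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares fit of y = w0 + w1 * x for a single feature."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w1 = covariance / variance
    w0 = mean_y - w1 * mean_x
    return w0, w1


def predict(A, w):
    """Matrix-vector product A w: one model output per row of A."""
    return [sum(a * wj for a, wj in zip(row, w)) for row in A]


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # made-up data on the line y = 1 + 2x

w0, w1 = fit_line(xs, ys)
A = [[1.0, x] for x in xs]  # leading column of ones multiplies the bias w0

print(w0, w1)                # 1.0 2.0
print(predict(A, [w0, w1]))  # [1.0, 3.0, 5.0, 7.0]
```

The column of ones in `A` is what lets the bias $w_0$ be treated as just another weight.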

If $f$ is a different activation function, the model is no longer linear in general, but the same linear combination $w_0 + x_1 w_1 + \dots + x_n w_n$ is still computed first and then passed through $f$.

## Multi-Layer Neural Network

If you have multiple layers, the output of each neuron is used as an input to the neurons in the next layer.

![NeuralNet](https://i.imgur.com/QU8ilo9.png)

**Input Layer**: Holds your input data. Consists of three **neurons (nodes)**, labeled $x_1, x_2, x_3$ on the graph, one for each of the 3 features in the input.

**Output Layer**: Holds the output of the neural network. Consists of two **neurons (nodes)**, labeled $y_1, y_2$.

**Hidden Layers**: Intermediate layers in the network. The 1st hidden layer has 2 nodes, the 2nd hidden layer also has 2 nodes.
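
A forward pass through a network of this shape (3 inputs, two hidden layers of 2 nodes each, 2 outputs) might look like the sketch below. The weights, biases, and the choice of a sigmoid activation are all assumptions made for illustration; the figure does not specify them.

```python
import math


def sigmoid(x):
    """Sigmoid activation, assumed here for every layer."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases, activation):
    """Compute one layer; each row of `weights` belongs to one neuron."""
    return [
        activation(b + sum(w * x for w, x in zip(row, inputs)))
        for row, b in zip(weights, biases)
    ]


# Hypothetical parameters for a 3 -> 2 -> 2 -> 2 network.
hidden1_w = [[0.1, -0.2, 0.3], [0.4, 0.0, -0.1]]
hidden1_b = [0.0, 0.1]
hidden2_w = [[0.5, -0.5], [0.2, 0.3]]
hidden2_b = [0.0, 0.0]
output_w = [[1.0, -1.0], [0.5, 0.5]]
output_b = [0.0, 0.0]


def forward(x):
    """Feed x through both hidden layers, then the output layer."""
    h1 = layer_forward(x, hidden1_w, hidden1_b, sigmoid)
    h2 = layer_forward(h1, hidden2_w, hidden2_b, sigmoid)
    return layer_forward(h2, output_w, output_b, sigmoid)


y = forward([1.0, 2.0, 3.0])
print(y)  # two values, one per output node y_1, y_2
```

Each layer only sees the outputs of the layer before it, which is what makes the network feedforward.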

## Backpropagation

**The goal**: Find the weights $\mathbf{w}$ that minimize the loss function.

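**How to find these weights?** At every step of training, we propagate forward to calculate the output of the neural network, propagate the resulting error backward to compute the gradient of the loss with respect to each weight, and then use gradient descent to update the weights.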
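
As a sketch, one common form of the gradient descent update is written below. The step size $\eta$ (often called the learning rate) and the step index $t$ are symbols introduced here for illustration; they do not appear elsewhere in these notes.

$$ \mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \, \nabla E\left(\mathbf{w}^{(t)}\right) $$

The gradient $\nabla E$ points in the direction in which the loss grows fastest, so stepping against it lowers the loss when $\eta$ is small enough.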

### Example:

Consider $E(\mathbf{w}) = \frac{1}{2} \sum_{s=1}^m (y_{ds} - y_s)^2$

where $y_{ds}$ is the desired output and $y_s$ is the output of the model, for $1 \le s \le m$ ($m$ is the number of outputs). You can expand $y_s$. In a 3-layer neural network (input layer, one hidden layer, output layer):

$$ E(\mathbf{w}) = \frac{1}{2} \sum_{p=1}^m \left( y_{dp} - f_p^o \left( \sum_{q=1}^l w_{pq}^o z_q\right)\right)^2$$

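where $z_q = f_q^h\left(\sum_{i=1}^n w_{qi}^h x_{di}\right)$ is the output of the $q$-th hidden neuron. The superscripts $h$ and $o$ mark the hidden and output layers, $l$ is the number of hidden neurons, and $n$ is the number of inputs.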
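
The example can be turned into a small runnable sketch. It assumes a sigmoid for both $f^h$ and $f^o$, uses no bias terms (matching the formula for $E$ in this example), and trains on a single made-up input; the starting weights, the learning rate `rate`, and all function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w_h, w_o, x):
    """Return hidden outputs z and network outputs y for one input x."""
    z = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_h]
    y = [sigmoid(sum(w * zq for w, zq in zip(row, z))) for row in w_o]
    return z, y


def loss(w_h, w_o, x, y_d):
    """E(w) = 1/2 * sum_p (y_dp - y_p)^2."""
    _, y = forward(w_h, w_o, x)
    return 0.5 * sum((d - yp) ** 2 for d, yp in zip(y_d, y))


def gradients(w_h, w_o, x, y_d):
    """Backpropagate the error to get dE/dw for both weight matrices."""
    z, y = forward(w_h, w_o, x)
    # Output layer: (y_p - y_dp) times the sigmoid derivative y_p (1 - y_p).
    delta_o = [(yp - d) * yp * (1.0 - yp) for yp, d in zip(y, y_d)]
    grad_o = [[dp * zq for zq in z] for dp in delta_o]
    # Hidden layer: push the output deltas back through w_o (chain rule).
    delta_h = [
        sum(delta_o[p] * w_o[p][q] for p in range(len(w_o))) * z[q] * (1.0 - z[q])
        for q in range(len(z))
    ]
    grad_h = [[dq * xi for xi in x] for dq in delta_h]
    return grad_h, grad_o


def train(w_h, w_o, x, y_d, rate=0.5, steps=200):
    """Plain gradient descent: w <- w - rate * dE/dw."""
    for _ in range(steps):
        grad_h, grad_o = gradients(w_h, w_o, x, y_d)
        w_h = [[w - rate * g for w, g in zip(wr, gr)] for wr, gr in zip(w_h, grad_h)]
        w_o = [[w - rate * g for w, g in zip(wr, gr)] for wr, gr in zip(w_o, grad_o)]
    return w_h, w_o


# Hypothetical starting weights and one made-up example (n = 2, l = 2, m = 1).
w_h0 = [[0.1, -0.2], [0.3, 0.4]]
w_o0 = [[0.5, -0.5]]
x, y_d = [1.0, 0.5], [0.8]

before = loss(w_h0, w_o0, x, y_d)
w_h1, w_o1 = train(w_h0, w_o0, x, y_d)
after = loss(w_h1, w_o1, x, y_d)
print(before, after)  # the loss after training should be smaller
```

A useful sanity check for any hand-written backpropagation is to compare its gradients with finite differences of the loss, which should agree to several decimal places.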