<h1 style='text-align:center'>Multilayer Perceptrons</h1>

<img src='images/non-linear-meme.webp'/>

How do we learn a non-linear decision boundary?

<img src='images/non-linear.png'/>

By connecting several perceptrons together and introducing non-linear activation functions, neural networks can learn more complex functions, including non-linear decision boundaries like the one above. For a more intuitive understanding of how this works, check out this video: https://www.youtube.com/watch?v=u5GAVdLQyIg
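The classic example is XOR. Its positive cases (0,1) and (1,0) cannot be separated from (0,0) and (1,1) by a single straight line, so no single perceptron can learn it. The sketch below uses weights picked by hand rather than learned (an assumption for illustration), and the `ltu` and `xor_mlp` names are hypothetical. It shows that two hidden threshold units feeding one output unit are enough. Each unit also adds a bias, covered in the Bias Term section.

```python
def step(z):
    """Heaviside step: fires (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0

def ltu(inputs, weights, bias):
    """A single linear threshold unit: weighted sum plus bias, then a step."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)

def xor_mlp(x1, x2):
    # Hidden layer: two hand-picked LTUs (not learned).
    h_or = ltu([x1, x2], [1, 1], -0.5)      # fires if x1 OR x2
    h_nand = ltu([x1, x2], [-1, -1], 1.5)   # fires unless x1 AND x2
    # Output LTU: fires only when both hidden units fire (an AND).
    return ltu([h_or, h_nand], [1, 1], -1.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))
```

Each hidden unit draws one straight line. The output unit combines the two half-planes, which produces a decision region no single line could.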

## MLP Architecture 

An MLP is composed of one (passthrough) input layer, one or more layers of linear threshold units (LTUs), called hidden layers, and one final layer of LTUs called the output layer. When an artificial neural network (ANN) has two or more hidden layers, it is called a deep neural network (DNN).

<img src='images/mlp.png' />
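One way to make the architecture concrete is to describe an MLP by its list of layer sizes. In a fully connected network, the weights between two consecutive layers form a matrix. That matrix has one row per unit in the next layer and one column per unit in the previous layer. The sketch below makes a few assumptions: the helper names and layer sizes are hypothetical, and bias terms are not counted.

```python
def layer_shapes(layer_sizes):
    """Return the (rows, cols) shape of each weight matrix in a fully connected MLP.

    layer_sizes lists the number of units per layer, input layer first.
    The matrix between layer i and layer i+1 has one row per unit in
    layer i+1 and one column per unit in layer i.
    """
    return [(n_out, n_in) for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def is_deep(layer_sizes):
    """Two or more hidden layers (everything between input and output) makes a DNN."""
    n_hidden = len(layer_sizes) - 2
    return n_hidden >= 2

sizes = [4, 5, 3, 2]  # hypothetical: 4 inputs, hidden layers of 5 and 3 units, 2 outputs
shapes = layer_shapes(sizes)
n_weights = sum(rows * cols for rows, cols in shapes)  # connection weights only, no biases
print(shapes, n_weights, is_deep(sizes))
```

Here `sizes` has two hidden layers, so by the definition above it counts as a DNN.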

## How do MLPs Learn? 

Much like regular perceptrons: make a prediction, see how close it is to the actual answer, adjust the weights, and try again.

### Forward-Propagation  

In forward-propagation, each neuron in a hidden layer multiplies each of its inputs by a weight and sums the results. It then applies an activation function that decides whether (or how strongly) the neuron fires. Each layer's outputs become the next layer's inputs. The output layer makes the final decision based on the activations of the previous layer's neurons.
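Below is a minimal sketch of forward-propagation in plain Python. It assumes fully connected layers with a sigmoid activation. The 2-3-1 layer sizes and the weight values are arbitrary placeholders, not trained parameters. Each neuron's weighted sum also includes a bias term, covered in the Bias Term section.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_forward(inputs, weights, biases, activation):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Propagate x through a list of (weights, biases, activation) layers."""
    a = x
    for weights, biases, activation in layers:
        a = dense_forward(a, weights, biases, activation)  # output feeds the next layer
    return a

# Hypothetical 2-3-1 network with arbitrary (untrained) weights.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1], sigmoid),
    ([[1.0, -1.0, 0.5]], [0.2], sigmoid),
]
print(forward([1.0, 2.0], layers))
```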

### Additional Activation Functions 

<img src='images/activation.png'/>
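The figure is not reproduced in text here. As an assumption, the sketch below lists the commonly used activation functions (step, sigmoid, tanh, ReLU), along with the derivatives that back-propagation relies on. The step function's derivative is zero everywhere it is defined. That is why smooth or piecewise-linear functions are preferred when training with gradient descent.

```python
import math

def step(z):
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    return math.tanh(z)

def relu(z):
    return max(0.0, z)

# Derivatives used by gradient descent during back-propagation.
def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_prime(z):
    return 1.0 - math.tanh(z) ** 2

def relu_prime(z):
    # Undefined at exactly 0; using 0 there is a common convention.
    return 1.0 if z > 0 else 0.0

for f in (step, sigmoid, tanh, relu):
    print(f.__name__, [round(f(z), 3) for z in (-2.0, 0.0, 2.0)])
```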

##### Why do we need non-linear activation functions? 

<img src='images/derivative_functions.png'/>
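Without a non-linear activation, adding layers buys nothing, because two linear layers compose into a single linear layer. Write a hidden layer as $\mathbf{h} = W_1\mathbf{x} + \mathbf{b}_1$ and the output as $\hat{\mathbf{y}} = W_2\mathbf{h} + \mathbf{b}_2$ (notation introduced here for illustration). Then:

$$
\begin{aligned}
\hat{\mathbf{y}} &= W_2\mathbf{h} + \mathbf{b}_2 \\
&= W_2(W_1\mathbf{x} + \mathbf{b}_1) + \mathbf{b}_2 \\
&= \underbrace{(W_2 W_1)}_{W}\,\mathbf{x} + \underbrace{(W_2\mathbf{b}_1 + \mathbf{b}_2)}_{\mathbf{b}}
\end{aligned}
$$

The same argument applies to any number of stacked linear layers, so such a network could still only learn a linear decision boundary. A non-linear activation between the layers prevents this collapse.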

### Bias Term
Every layer except the output layer includes a bias neuron and is fully connected to the next layer.

<img src='images/bias_term.png'/>

The bias plays the role of the intercept in a linear equation. It is an additional learnable parameter of the neural network, added to the weighted sum of a neuron's inputs before the activation function is applied. Because it does not depend on the inputs, it lets the model fit data that does not pass through the origin. In effect, a bias value allows you to shift the activation function to the left or right, which may be critical for successful learning.
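Here is a small sketch of that shift. It assumes a single-input neuron with a sigmoid activation and a fixed weight `w = 1.0`, both chosen for illustration. The output crosses 0.5 where `w * x + b = 0`, that is at `x = -b / w`. So changing `b` slides the whole curve along the x-axis without changing its shape.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b):
    """Single-input neuron: weighted input plus bias, passed through a sigmoid."""
    return sigmoid(w * x + b)

w = 1.0
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
for b in (-2.0, 0.0, 2.0):
    # The output crosses 0.5 where w*x + b == 0, i.e. at x = -b / w,
    # so changing b moves the curve left or right.
    print(f"b={b:+.1f}", [round(neuron(x, w, b), 2) for x in xs])
```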

<img src='images/bias.png'/>

### Back-Propagation

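In back-propagation, we update the weights based on the cost function. Each weight is adjusted according to how much it contributed to the error, as measured by the gradient of the cost with respect to that weight.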

<img src='images/back.png'/>

In short: for each training instance, the backpropagation algorithm first makes a prediction (forward pass) and measures the error. It then goes through each layer in reverse to measure the error contribution from each connection (reverse pass). Finally, it slightly tweaks the connection weights to reduce the error (Gradient Descent step).
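The sketch below walks through those steps for one training instance. It assumes a tiny 2-2-1 network with sigmoid activations and a squared-error cost. The starting weights, input, target, and learning rate are hypothetical. As a standard sanity check, it compares the back-propagated gradients with a finite-difference estimate, then takes one gradient descent step.

```python
import copy
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def flatten(v):
    """Flatten a vector or a matrix (list of rows) into a flat list."""
    return [w for row in v for w in row] if isinstance(v[0], list) else list(v)

def forward(p, x):
    """Forward pass of a 2-2-1 sigmoid network: hidden activations and output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    y_hat = sigmoid(sum(w * hj for w, hj in zip(p["W2"], h)) + p["b2"][0])
    return h, y_hat

def loss(p, x, y):
    """Squared-error cost for a single training instance."""
    _, y_hat = forward(p, x)
    return 0.5 * (y_hat - y) ** 2

def backprop(p, x, y):
    """Gradient of the cost w.r.t. every parameter, via the chain rule."""
    h, y_hat = forward(p, x)                     # forward pass
    delta2 = (y_hat - y) * y_hat * (1 - y_hat)   # error signal at the output neuron
    # Reverse pass: each hidden neuron's contribution to the output error.
    delta1 = [delta2 * w2 * hj * (1 - hj) for w2, hj in zip(p["W2"], h)]
    return {
        "W1": [[d * xi for xi in x] for d in delta1],
        "b1": delta1,
        "W2": [delta2 * hj for hj in h],
        "b2": [delta2],
    }

def numerical_grad(p, x, y, eps=1e-6):
    """Central-difference estimate of each gradient, used only to check backprop."""
    grads = copy.deepcopy(p)
    for name, value in p.items():
        is_matrix = isinstance(value[0], list)
        rows = value if is_matrix else [value]
        g_rows = grads[name] if is_matrix else [grads[name]]
        for r, row in enumerate(rows):
            for c, original in enumerate(row):
                row[c] = original + eps
                up = loss(p, x, y)
                row[c] = original - eps
                down = loss(p, x, y)
                row[c] = original                # restore the parameter
                g_rows[r][c] = (up - down) / (2 * eps)
    return grads

def sgd_step(p, grads, lr):
    """Gradient descent step: nudge every parameter against its gradient."""
    new = {}
    for name, g in grads.items():
        if isinstance(g[0], list):
            new[name] = [[w - lr * gw for w, gw in zip(row, g_row)]
                         for row, g_row in zip(p[name], g)]
        else:
            new[name] = [w - lr * gw for w, gw in zip(p[name], g)]
    return new

# Hypothetical starting point for one training instance.
params = {
    "W1": [[0.15, 0.20], [0.25, 0.30]],
    "b1": [0.35, 0.35],
    "W2": [0.40, 0.45],
    "b2": [0.60],
}
x, y = [0.05, 0.10], 1.0

grads = backprop(params, x, y)
approx = numerical_grad(params, x, y)
max_diff = max(abs(a - b) for name in grads
               for a, b in zip(flatten(grads[name]), flatten(approx[name])))
print("max |backprop - numerical|:", max_diff)

new_params = sgd_step(params, grads, lr=0.5)
print("loss before:", loss(params, x, y), "after:", loss(new_params, x, y))
```

The numerical estimate needs two extra forward passes per parameter. Back-propagation gets every gradient from one forward pass and one reverse pass, which is why it scales to large networks.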

### Working with Multiclass Problems

An MLP is often used for classification, with each output neuron corresponding to a different binary class (e.g., spam/ham, urgent/not-urgent, and so on). When the classes are mutually exclusive (e.g., classes 0 through 9 for digit image classification), the output layer is typically modified by replacing the individual activation functions with a shared softmax function (see Figure 10-9). The softmax function was introduced in Chapter 3. The output of each neuron then corresponds to the estimated probability of the corresponding class. Note that the signal flows only in one direction (from the inputs to the outputs), so this architecture is an example of a feedforward neural network (FNN).

<img src='images/softmax.png'/>
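Here is a minimal softmax sketch in plain Python. Subtracting the largest score before exponentiating is a common trick that avoids overflow without changing the result. The scores are hypothetical. The cross-entropy cost shown alongside is the one typically paired with softmax for mutually exclusive classes. It is added here as an assumption rather than taken from the text above.

```python
import math

def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability; result is unchanged
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_class):
    """Cost for one instance: negative log of the probability given to the true class."""
    return -math.log(probs[target_class])

scores = [2.0, 1.0, 0.1]  # hypothetical raw outputs for a 3-class problem
probs = softmax(scores)
predicted = probs.index(max(probs))  # the class with the highest estimated probability
print([round(p, 3) for p in probs], predicted, round(cross_entropy(probs, 0), 3))
```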

## Resources

https://www.youtube.com/watch?v=u5GAVdLQyIg

https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi

https://www.coursera.org/learn/neural-networks-deep-learning/home/welcome

https://towardsdatascience.com/optimizing-neural-networks-where-to-start-5a2ed38c8345