# What are Activation Functions and Loss Functions?<br>And how to pick the right one?

<img src="../images/note/which one.png" alt="Which one?" />

## What is an Artificial Neural Network?
<img src="../images/note/Multilayer Perceptron (MLP).png" alt="Multilayer Perceptron (MLP)" />

The above diagram is a Multilayer Perceptron (MLP).
* An MLP must have at least three layers: the input layer, a hidden layer and the output layer.
* The layers are fully connected.
* Each node in one layer connects with a weight to every node in the next layer.

Neural networks can learn complex patterns using layers of neurons that mathematically transform the data.
The layers between the input and output are referred to as “hidden layers”.
A neural network can learn relationships between the features that other algorithms cannot easily discover.
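
To make "fully connected" concrete, here is a minimal sketch that counts the parameters of a small MLP. The layer sizes (3 inputs, 4 hidden neurons, 1 output) and the function name `count_parameters` are made up for illustration, and the sketch assumes the common convention of one bias per neuron in each receiving layer.

```python
# Hypothetical layer sizes: 3 input features, 4 hidden neurons, 1 output neuron.
layer_sizes = [3, 4, 1]


def count_parameters(sizes):
    """Count the weights and biases of a fully connected network."""
    weights = 0
    biases = 0
    # Walk over consecutive pairs of layers: (input, hidden), (hidden, output).
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights += n_in * n_out  # one weight per (sending node, receiving node) pair
        biases += n_out          # assumption: one bias per receiving node
    return weights, biases


n_weights, n_biases = count_parameters(layer_sizes)
print(n_weights, n_biases)  # 16 5
```

Because every node connects to every node in the next layer, the number of weights grows with the product of adjacent layer sizes.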

## What is a Neuron?
* It's a <strong>mathematical function</strong>.
* It computes the sum of one or more inputs, each multiplied by its weight.
* This value is then passed to a non-linear function called an <strong>activation function</strong> to become the neuron's output.
<img src="../images/note/neuron.png" alt="What is a Neuron" />

* The x values refer to inputs, either the original features or inputs from a previous hidden layer
* At each layer, there is also a bias b which can help better fit the data
* The neuron passes the value a to all neurons it is connected to in the next layer, or returns it as the final value

A neuron starts with a linear equation:

<img src="../images/note/linear equation.png" alt="Linear equation" />

It then applies a non-linear activation function to the result:

<img src="../images/note/non-linear activation function.png" alt="non-linear activation function" />
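
The two steps above (weighted sum plus bias, then activation) can be sketched in a few lines. This is only an illustration: the function name `neuron`, the example numbers, and the choice of `math.tanh` as the activation are assumptions, not taken from the diagrams.

```python
import math


def neuron(inputs, weights, bias, activation):
    """Compute a = activation(w1*x1 + w2*x2 + ... + b)."""
    # Step 1: linear part, a weighted sum of the inputs plus the bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Step 2: non-linear part, the activation function.
    return activation(z)


# Made-up example values.
x = [1.0, 2.0]    # inputs (features or outputs of a previous layer)
w = [0.5, -0.25]  # one weight per input
b = 0.1           # bias

a = neuron(x, w, b, math.tanh)  # tanh is an arbitrary choice of activation
print(a)
```

Here the linear part is 0.5 * 1.0 + (-0.25) * 2.0 + 0.1 = 0.1, so the output is tanh(0.1).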

## What is an Activation Function?
<ul>
<li>An activation function is a non-linear function applied by a neuron to introduce non-linear properties in the network.
<ul>
<li>A relationship is linear if a change in the first variable always corresponds to a constant change in the second variable. In a non-linear relationship, a change in the first variable doesn&rsquo;t necessarily correspond to a constant change in the second. The two variables still affect each other, but not in a fixed proportion.</li>
</ul>
</li>
</ul>
<img src="../images/note/Linear function.png" alt="linear and non-linear models" />

<table>
<th colspan=2>
Neuron with and without activation function
</th>
<tr>
  <td>
<img src="../images/note/neuron without af.png" alt="without activation function">
  </td>
  <td>
  <img src="../images/note/neuron with af.png" alt="with activation function">
  </td>
</tr>
</table>
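
One way to see why the non-linearity matters: without an activation function, stacking layers gains nothing, because two linear steps collapse into a single linear step. The sketch below uses one-dimensional "layers" with made-up weights, and ReLU as one possible non-linearity. All names are illustrative.

```python
def linear(w, b):
    """Return the function x -> w * x + b."""
    return lambda x: w * x + b


def relu(z):
    """A simple non-linear activation: negative values become 0."""
    return max(0.0, z)


layer1 = linear(2.0, 1.0)   # made-up weights
layer2 = linear(-3.0, 0.5)


def stacked(x):
    """Two layers with no activation in between."""
    return layer2(layer1(x))


# Algebra: -3 * (2x + 1) + 0.5 = -6x - 2.5, which is a single linear layer.
collapsed = linear(-6.0, -2.5)


def stacked_with_relu(x):
    """Two layers with a ReLU in between."""
    return layer2(relu(layer1(x)))


for x in [-2.0, 0.0, 1.5]:
    print(x, stacked(x), collapsed(x), stacked_with_relu(x))
```

For every input, `stacked` and `collapsed` agree. With the ReLU in between, the output at x = -2.0 is 0.5 instead of 9.5, so the network is no longer a single straight line.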

## Choosing the right activation function
<img src="../images/note/activation function summary.png" alt="Activation function summary" />

### Regression: Predicting a numerical value

E.g. Predicting the price of a product
* The final layer of the neural network will have one neuron and the value it returns is a continuous numerical value.
* To understand the accuracy of the prediction, it is compared with the true value which is also a continuous number.

<img src="../images/note/regression predicting a numerical value.png" alt="Regression: predicting a numerical value" />

#### Chosen Activation Function
<strong>Linear</strong>: this returns an unbounded numerical value, which is what regression requires
<img src="../images/note/Linear Activation Function.png" alt="Linear AF" />

<strong>OR</strong>

<strong>ReLU</strong>: this results in a numerical value that is never negative (0 or greater), which suits targets such as prices
<img src="../images/note/Rectified Linear Unit (ReLU) Activation Function.png" alt="ReLU AF" />
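
As a rough sketch, both output activations are one-liners. The function names and the sample values are illustrative only.

```python
def linear(z):
    """Identity activation: any real number passes through unchanged."""
    return z


def relu(z):
    """Rectified Linear Unit: negative inputs become 0, the rest pass through."""
    return max(0.0, z)


pre_activations = [-2.0, 0.0, 3.5]  # made-up values of w.x + b
print([linear(z) for z in pre_activations])  # [-2.0, 0.0, 3.5]
print([relu(z) for z in pre_activations])    # [0.0, 0.0, 3.5]
```

Linear is the safer default when the target can be negative. ReLU only makes sense when the target can never be below 0.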

#### Loss Function
<strong>Mean squared error (MSE)</strong>: this finds the average squared difference between the predicted values and the true values
<img src="../images/note/Mean squared error (MSE).png" alt="Mean squared error (MSE)" />
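
A minimal sketch of MSE in plain Python, assuming the usual definition (sum of squared differences divided by the number of examples). The prices are made-up numbers.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


prices_true = [10.0, 20.0, 30.0]  # made-up true prices
prices_pred = [12.0, 18.0, 33.0]  # made-up predictions

mse = mean_squared_error(prices_true, prices_pred)
print(mse)
```

The errors are -2, 2 and -3, so the squared errors are 4, 4 and 9, and the MSE is 17 / 3. Squaring means that large misses are penalised much more than small ones.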

### Categorical: Predicting a binary outcome

E.g. predicting whether a transaction is fraudulent or not
* The final layer of the neural network will have one neuron and will return a value between 0 and 1, which can be interpreted as a probability.
* To understand the accuracy of the prediction, it is compared with the true value. If the example belongs to the class, the true value is 1; otherwise it is 0.

<img src="../images/note/Categorical Predicting a binary outcome.png" alt="Categorical: predicting a binary outcome" />

#### Chosen Activation Function
<strong>Sigmoid</strong>: this results in a value between 0 and 1, which we can interpret as how confident the model is that the example belongs to the class
<img src="../images/note/Sigmoid Activation Function.png" alt="Sigmoid" />
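
A small sketch of the sigmoid (logistic) function, 1 / (1 + e^-z). The split into two branches is an implementation choice made here to avoid overflow for large negative inputs. It is not something the note prescribes.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z).
    e = math.exp(z)
    return e / (1.0 + e)


for z in [-4.0, 0.0, 4.0]:  # made-up pre-activation values
    print(z, sigmoid(z))
```

An input of 0 maps to exactly 0.5. Large positive inputs approach 1 and large negative inputs approach 0, which is why the output can be read as a probability.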

#### Loss Function
<strong>Binary Cross Entropy</strong>: cross entropy quantifies the difference between two probability distributions. Because the outcome is binary, our model predicts the distribution {p, 1-p}. We use binary cross-entropy to compare this with the true distribution {y, 1-y}
<img src="../images/note/Binary Cross Entropy.png" alt="Binary Cross Entropy" />
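
As a hedged sketch, binary cross-entropy averaged over a batch can be written as below, using the standard formula -(y log p + (1 - y) log(1 - p)). The `eps` clipping is an assumption added here to avoid taking log(0). The labels and predictions are made up.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all examples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # assumption: clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1]             # made-up true classes
confident = [0.9, 0.1, 0.8]    # predictions close to the labels
unsure = [0.6, 0.4, 0.5]       # predictions near 0.5

print(binary_cross_entropy(labels, confident))
print(binary_cross_entropy(labels, unsure))
```

The confident, correct predictions give the lower loss. A prediction of 0.5 for a single positive example costs log(2), and the loss grows without bound as a prediction moves towards the wrong extreme.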