# Activation Functions
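The activation function $\sigma:\mathbb{R}\rightarrow\mathbb{R}$ determines the output of a neuron from its input: it takes the weighted sum of the inputs and applies a (typically non-linear) transformation. The non-linearity matters, because without it a stack of layers collapses into a single linear map. The choice of activation function affects how well gradients propagate during training and what range the outputs take. Most activations act on one scalar at a time; softmax, listed last, is the exception and acts on a whole vector. Common activation functions include: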

> __Sigmoid function__: A smooth S-shaped curve that maps inputs to the range $(0, 1)$:
>$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
> The sigmoid function is often used in the output layer for binary classification, because its output lies strictly between 0 and 1 and can be interpreted as the probability of the positive class.
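
To make the probability interpretation concrete, here is a minimal sketch in plain Julia. The names `logistic` and `predict_label` and the 0.5 threshold are illustrative assumptions, not part of `NNlib.jl`.

```julia
# Hypothetical helper implementing the sigmoid formula above.
logistic(x::Real) = 1 / (1 + exp(-x))

# Illustrative decision rule: predict class 1 when the probability
# reaches the (assumed) threshold, and class 0 otherwise.
predict_label(z::Real; threshold = 0.5) = logistic(z) >= threshold ? 1 : 0

logistic(0.0)        # 0.5: a raw score of zero means "undecided"
predict_label(2.0)   # 1
predict_label(-2.0)  # 0
```

Large positive scores push the output toward 1 and large negative scores push it toward 0, but neither bound is reached for moderate inputs.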

> __Hyperbolic tangent (tanh)__: Similar to sigmoid but maps inputs to the range $(-1, 1)$:
>$$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$
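> The tanh function is often used in hidden layers because its outputs are centered around zero, which tends to make gradient-based optimization better behaved than with the strictly positive outputs of the sigmoid.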
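
The similarity to the sigmoid is exact: $\tanh(x) = 2\sigma(2x) - 1$, so tanh is a rescaled and shifted sigmoid. The sketch below checks this identity numerically and compares average outputs on a symmetric grid. The helper names and the grid are illustrative assumptions.

```julia
# Hypothetical helpers; `tanh` itself comes from Julia's Base.
logistic(x::Real) = 1 / (1 + exp(-x))
tanh_via_sigmoid(x::Real) = 2 * logistic(2x) - 1

xs = -3.0:0.5:3.0  # symmetric grid of inputs, chosen for illustration

# Largest deviation between Base.tanh and the sigmoid-based formula.
max_gap = maximum(abs.(tanh.(xs) .- tanh_via_sigmoid.(xs)))

# Average outputs: tanh is centered at 0, the sigmoid at 0.5.
mean_tanh = sum(tanh.(xs)) / length(xs)
mean_logistic = sum(logistic.(xs)) / length(xs)
```

On this grid `max_gap` is at the level of floating-point rounding, `mean_tanh` is approximately 0, and `mean_logistic` is approximately 0.5.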

> __Rectified Linear Unit (ReLU)__: A piecewise linear function that outputs the input if positive and zero otherwise:
>$$\text{ReLU}(x) = \max(0, x)$$
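> ReLU helps mitigate the vanishing gradient problem because its derivative is exactly 1 for every positive input, so gradients do not shrink as they pass through active units. This makes it a popular choice for hidden layers in deep networks. However, it can suffer from [the dying ReLU problem](https://arxiv.org/abs/1903.06733): a neuron whose input is negative for every training example outputs zero, receives zero gradient, and therefore stops learning.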
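
Both effects can be seen directly in the derivatives. The following is a minimal sketch. The names `relu_sketch`, `relu_grad`, and `logistic_grad` are hypothetical, and taking the derivative at exactly zero to be 0 is a convention assumed here.

```julia
# Hypothetical scalar definitions for illustration.
relu_sketch(x::Real) = max(zero(x), x)

# Derivative of ReLU; the value at x == 0 is a convention (0 assumed here).
relu_grad(x::Real) = x > 0 ? one(x) : zero(x)

# Derivative of the sigmoid, sigma(x) * (1 - sigma(x)), for comparison.
logistic(x::Real) = 1 / (1 + exp(-x))
logistic_grad(x::Real) = logistic(x) * (1 - logistic(x))

relu_sketch.([-2.0, 0.0, 3.0])  # [0.0, 0.0, 3.0]

# For a large positive input the sigmoid gradient is tiny,
# while the ReLU gradient stays at 1.
logistic_grad(10.0)
relu_grad(10.0)                 # 1.0

# "Dying" unit: if every input is negative, every gradient is zero.
relu_grad.([-3.0, -1.5, -0.2])  # [0.0, 0.0, 0.0]
```

The sigmoid derivative never exceeds 0.25, so a product of many such factors shrinks quickly with depth, whereas active ReLU units pass the gradient through unchanged.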

> __Softmax function__: Converts a vector of raw scores (logits) into a probability distribution over $K$ classes:
>$$\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$
> where $z_i$ is the raw score for class $i$. Each output is positive and the outputs sum to 1, so they can be read as class probabilities. This makes softmax the usual choice for the output layer in multi-class classification.
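
A minimal sketch of the softmax formula in plain Julia follows. The name `softmax_sketch` is hypothetical. Subtracting the maximum score before exponentiating is a common numerical-stability trick that is not part of the formula above; it leaves the result unchanged mathematically, because the common factor $e^{-\max_j z_j}$ cancels between numerator and denominator.

```julia
# Hypothetical implementation for illustration; not NNlib's `softmax`.
function softmax_sketch(z::AbstractVector{<:Real})
    shifted = z .- maximum(z)   # largest entry becomes 0, avoiding overflow in exp
    e = exp.(shifted)
    return e ./ sum(e)          # normalize so the entries sum to 1
end

p = softmax_sketch([1.0, 2.0, 3.0])
sum(p)      # approximately 1
argmax(p)   # 3: the largest logit gets the largest probability

# Adding a constant to every logit leaves the output unchanged,
# and large logits do not overflow thanks to the shift.
q = softmax_sketch([1001.0, 1002.0, 1003.0])
```

Here `q` matches `p` up to rounding, since the two logit vectors differ only by a constant offset.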

We use the [activation functions exported by the `NNlib.jl` package](https://fluxml.ai/NNlib.jl/dev/).
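
As a brief usage sketch, assuming `NNlib.jl` is installed in the active environment: the scalar activations are applied elementwise with Julia's broadcasting dot, while `softmax` takes the whole vector. The input vector `x` is an arbitrary example.

```julia
using NNlib  # assumes the NNlib.jl package is installed

x = [-2.0, 0.0, 3.0]   # example pre-activations (arbitrary values)

a_relu    = relu.(x)     # elementwise ReLU
a_sigmoid = sigmoid.(x)  # elementwise sigmoid
a_tanh    = tanh.(x)     # tanh is provided by Julia's Base
probs     = softmax(x)   # acts on the whole vector; entries sum to 1
```

Note the asymmetry: `relu.(x)` and `sigmoid.(x)` use broadcasting because they are scalar functions, whereas `softmax(x)` is called without the dot because each output depends on every entry of `x`.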

___