# Neural Networks From Scratch - Lecture 7: Tanh Activation Function

## Overview
- The Tanh function (hyperbolic tangent) is an activation function used in neural networks, similar to the sigmoid function but with distinct advantages.
- This lecture discusses the characteristics of the Tanh function, its advantages over sigmoid, and its implementation.

## Problems with the Sigmoid Activation Function
- **Vanishing Gradient Problem**: For large positive or negative inputs the sigmoid curve saturates and its gradient becomes very small, leading to slow convergence during training.
- **Not Zero-Centered**: Outputs always lie between 0 and 1, so they are always positive, which can cause issues during optimization.
- **Limited to Binary Classification**: The sigmoid function is suitable for binary classification but not for multi-class outputs where exactly one class is correct.

### Example of Multi-Class Classification
- Consider an output layer with four neurons, representing four classes.
- If the weighted sums are \([2, 3, 5, 1]\), applying the sigmoid function to each one yields approximately \([0.88, 0.95, 0.99, 0.73]\).
- These outputs sum to about 3.56 rather than 1, so they cannot be read as a probability distribution over the four classes. Since sigmoid does not enforce that outputs sum to 1, it is unsuitable for multi-class classification.
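
The four-neuron example can be checked numerically. The following is a minimal sketch using only the standard-library `math` module; the helper name `sigmoid` and the variable names are chosen here for illustration and are not part of the lecture's code.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Weighted sums of the four output neurons from the example
weighted_sums = [2, 3, 5, 1]

outputs = [sigmoid(z) for z in weighted_sums]
rounded = [round(p, 2) for p in outputs]
total = sum(outputs)

print(rounded)          # [0.88, 0.95, 0.99, 0.73]
print(round(total, 2))  # 3.56
```

Each output is computed independently of the others, which is why nothing forces the four values to add up to 1.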

## Introduction to Tanh Activation Function
- The Tanh function is defined as:
$$
\text{tanh}(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
$$
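- **Range**: The output of Tanh lies between -1 and +1, approaching -1 for large negative inputs and +1 for large positive inputs.
- **Zero-Centered**: Tanh is a zero-centered function: \(\text{tanh}(0) = 0\) and outputs are symmetric around zero, addressing one of sigmoid's drawbacks.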
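
As a quick check of the definition, range, and zero-centering, here is a small sketch that evaluates the formula directly. The function name `tanh_from_definition` and the sample inputs are assumptions made for this illustration.

```python
import math

def tanh_from_definition(x):
    """tanh(x) = (e^x - e^-x) / (e^x + e^-x), written out term by term."""
    pos, neg = math.exp(x), math.exp(-x)
    return (pos - neg) / (pos + neg)

inputs = [-3.0, -1.0, 0.0, 1.0, 3.0]  # symmetric around zero
outputs = [tanh_from_definition(x) for x in inputs]

# The hand-written formula agrees with the standard-library tanh
assert all(math.isclose(y, math.tanh(x), abs_tol=1e-12)
           for x, y in zip(inputs, outputs))

# Every output lies between -1 and +1
assert all(-1.0 < y < 1.0 for y in outputs)

# Symmetric inputs give outputs that average to (numerically) zero
mean_output = sum(outputs) / len(outputs)
print(mean_output)
```

The direct formula is useful for understanding but overflows for very large \(|x|\) because of `math.exp`, so in practice a library routine such as `math.tanh` or `np.tanh` is the safer choice.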

### Tanh Function Characteristics
- **Steeper Slope**: Around zero, Tanh has a steeper slope than sigmoid (a maximum slope of 1 at \(x = 0\), versus 0.25 for sigmoid), which gives larger gradients and can speed up training.
- **Graph of Tanh**:
![Tanh Activation Function](https://production-media.paperswithcode.com/methods/Screen_Shot_2020-05-27_at_4.23.22_PM_dcuMBJl.png)
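
To make the slope comparison concrete, the sketch below evaluates both slopes at a few points near zero. It assumes the standard derivative formulas \(1 - \text{tanh}^2(x)\) for Tanh and \(\sigma(x)(1 - \sigma(x))\) for sigmoid; the function names are hypothetical helpers for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_slope(x):
    """Derivative of sigmoid: s * (1 - s), where s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_slope(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

# Compare the two slopes at a few inputs close to zero
for x in [0.0, 0.5, 1.0]:
    print(x, round(tanh_slope(x), 3), round(sigmoid_slope(x), 3))
```

At \(x = 0\) the Tanh slope is 1 while the sigmoid slope is 0.25. The advantage holds near zero; far from zero both functions flatten out.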

### Derivative of the Tanh Function
- The derivative is given by:
$$
\frac{d}{dx} \text{tanh}(x) = 1 - \text{tanh}^2(x)
$$
- **Range**: The derivative lies between 0 and 1. It reaches its maximum of 1 at \(x = 0\) and approaches 0 as \(|x|\) grows, so Tanh shows saturation effects similar to sigmoid.
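
The derivative formula follows from the quotient rule applied to the definition of Tanh. The shorthand \(u\) and \(v\) for the numerator and denominator is introduced here only for this derivation:

$$
\begin{aligned}
u &= e^x - e^{-x}, \qquad v = e^x + e^{-x}, \qquad u' = v, \qquad v' = u \\
\frac{d}{dx} \text{tanh}(x) &= \frac{u'v - uv'}{v^2} = \frac{v^2 - u^2}{v^2} = 1 - \left(\frac{u}{v}\right)^2 = 1 - \text{tanh}^2(x)
\end{aligned}
$$

Because \(\text{tanh}(0) = 0\), the derivative equals 1 at \(x = 0\), and it shrinks toward 0 as \(\text{tanh}^2(x)\) approaches 1.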

## Drawbacks of Tanh
- **Still Prone to Vanishing Gradient**: While better than sigmoid near zero, Tanh still saturates for large \(|x|\), where its derivative approaches 0, so it can still encounter vanishing gradients.
- **Computationally Intensive**: Evaluating Tanh involves exponentials, so it takes longer to compute than simpler activation functions.
- **Not Suitable for Output Layers**: Because its range is -1 to +1, Tanh can produce negative outputs, making it unsuitable for producing probabilities (which must be non-negative) in multi-class classification.
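
The vanishing-gradient drawback can be illustrated with a deliberately simplified sketch. It assumes a hypothetical chain of 10 layers in which every pre-activation equals 2.0 and ignores the weights entirely, so it only shows how the local Tanh derivatives multiply during backpropagation; the names and numbers are illustrative, not from the lecture.

```python
import math

def tanh_derivative(x):
    """Local gradient of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

# The local gradient shrinks quickly as |x| grows (saturation)
for x in [0.0, 2.0, 5.0]:
    print(x, tanh_derivative(x))

# Hypothetical chain: 10 layers, each with pre-activation 2.0 (assumption).
# Backpropagation multiplies the local derivatives together.
num_layers = 10
pre_activation = 2.0
chained_gradient = tanh_derivative(pre_activation) ** num_layers
print(chained_gradient)

# A negative output cannot be interpreted as a probability
negative_output = math.tanh(-1.0)
print(negative_output)
```

With a local derivative of roughly 0.07 at \(x = 2\), ten such factors multiplied together leave almost no gradient for the earliest layers. The last line shows the output-layer issue: a negative input gives a negative output.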

## Python Implementation
```python
import numpy as np
import matplotlib.pyplot as plt

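def tanh(x):
    """Element-wise hyperbolic tangent of a scalar or NumPy array."""
    return np.tanh(x)

def tanh_derivative(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1 - np.tanh(x) ** 2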

# Input range
x = np.linspace(-4, 4, 100)

# Tanh and its derivative
y_tanh = tanh(x)
y_derivative = tanh_derivative(x)

# Plotting
plt.figure(figsize=(10, 5))
plt.plot(x, y_tanh, label='tanh', color='blue', linewidth=3)
plt.plot(x, y_derivative, label='Derivative', color='orange', linewidth=3)
plt.title('Tanh Activation Function and Its Derivative')
plt.xlabel('Input')
plt.ylabel('Output')
plt.axhline(0, color='grey', lw=0.5, ls='--')
plt.axvline(0, color='grey', lw=0.5, ls='--')
plt.legend()
plt.grid()
plt.show()
```
