# **Activation Functions: Sigmoid, ReLU, and others**

## Python Demonstration of Activation Functions

1. **Sigmoid**
2. **Tanh**
3. **ReLU**
4. **Leaky ReLU**
5. **ELU (Exponential Linear Unit)**
6. **Swish** (used in modern architectures like EfficientNet)

---

### Theoretical Summary

| Activation | Formula | Range | Differentiability | Common Usage |
|------------|---------|-------|--------------------|---------------|
| **Sigmoid** | $$ \sigma(x) = \frac{1}{1 + e^{-x}} $$ | (0, 1) | Yes | Binary classification |
| **Tanh** | $$ \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} $$ | (-1, 1) | Yes | Hidden layers |
| **ReLU** | $$ f(x) = \max(0, x) $$ | [0, ∞) | Everywhere except x = 0 | CNNs |
| **Leaky ReLU** | $$ f(x) = \max(0.01x, x) $$ | (-∞, ∞) | Everywhere except x = 0 | CNNs affected by the dying ReLU problem |
| **ELU** | $$ f(x) = \begin{cases} x, & x > 0 \\ \alpha(e^x - 1), & x \leq 0 \end{cases} $$ | (-α, ∞) | Yes for α = 1; otherwise not at x = 0 | Deep nets |
| **Swish** | $$ f(x) = x \cdot \sigma(x) $$ | [≈ -0.28, ∞) | Yes | Modern DNNs |
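
A few entries in the table can be checked numerically. The sketch below is an illustrative check rather than part of the original notebook: the grid bounds, the step size `h`, and the helper name `one_sided_slopes` are assumptions chosen for this example. It estimates the lower bound of Swish and ELU on a dense grid and compares the left and right slopes at x = 0, which differ wherever the function has a kink.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    return x * sigmoid(x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

# Dense grid; the bounds and resolution are arbitrary choices for this check.
x = np.linspace(-20.0, 20.0, 400001)

swish_values = swish(x)
swish_min = swish_values.min()            # lower end of the Swish range
swish_argmin = x[swish_values.argmin()]   # input where the minimum occurs
elu_min = elu(x).min()                    # approaches -alpha but never reaches it

h = 1e-6  # step for the one-sided finite differences (assumed value)

def one_sided_slopes(f):
    """Return (left slope, right slope) of f at x = 0."""
    zero = np.array(0.0)
    left = (f(zero) - f(np.array(-h))) / h
    right = (f(np.array(h)) - f(zero)) / h
    return float(left), float(right)

print("swish minimum:", round(float(swish_min), 4), "at x =", round(float(swish_argmin), 3))
print("elu minimum on grid:", float(elu_min))
print("leaky_relu slopes at 0:", one_sided_slopes(leaky_relu))
print("elu (alpha=1) slopes at 0:", one_sided_slopes(elu))
print("elu (alpha=0.5) slopes at 0:", one_sided_slopes(lambda v: elu(v, alpha=0.5)))
```

Matching left and right slopes suggest the function is differentiable at zero; unequal slopes indicate a kink there.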

In [1]:
import matplotlib as mpl
# Point Matplotlib at a local ffmpeg binary (Windows path; adjust for your system)
mpl.rcParams['animation.ffmpeg_path'] = r'C:\ffmpeg\bin\ffmpeg.exe'
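
The hard-coded ffmpeg path only works on a machine that has the binary at that location. As a hedged alternative, the sketch below picks a writer at run time. The helper name `pick_writer` is hypothetical, and the GIF fallback assumes Pillow is installed so that Matplotlib's `pillow` writer is available.

```python
import shutil

def pick_writer(ffmpeg_path=None):
    """Return (writer_name, file_extension) suitable for FuncAnimation.save.

    Hypothetical helper: prefers ffmpeg when an executable can be found,
    otherwise falls back to Matplotlib's Pillow-based GIF writer.
    """
    # Try an explicit path first (if given), then whatever is on PATH.
    for candidate in (ffmpeg_path, "ffmpeg"):
        if candidate and shutil.which(candidate):
            return "ffmpeg", ".mp4"
    return "pillow", ".gif"

writer_name, extension = pick_writer()
print(f"Using writer {writer_name!r}, saving files with extension {extension!r}")
```

With this in place, a save call could look like `ani.save(f"{name}_activation{extension}", writer=writer_name, fps=10)`.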

In [2]:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
import matplotlib.animation as animation
import os

# Activation functions
def relu(x): return np.maximum(0, x)
def sigmoid(x): return 1 / (1 + np.exp(-x))
def tanh(x): return np.tanh(x)
def softplus(x): return np.log(1 + np.exp(x))
def softsign(x): return x / (1 + np.abs(x))
def leaky_relu(x, alpha=0.01): return np.where(x > 0, x, x * alpha)
def elu(x, alpha=1.0): return np.where(x > 0, x, alpha * (np.exp(x) - 1))
def swish(x): return x * sigmoid(x)
def mish(x): return x * np.tanh(softplus(x))
def gelu(x): return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))
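# Piecewise-linear sigmoid with slope 0.5; libraries often use a gentler slope
# (PyTorch's Hardsigmoid, for example, is clip(x / 6 + 1 / 2, 0, 1)).
def hard_sigmoid(x): return np.clip((x + 1) / 2, 0, 1)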
def linear(x): return x

functions = {
    "relu": relu,
    "sigmoid": sigmoid,
    "tanh": tanh,
    "softplus": softplus,
    "softsign": softsign,
    "leaky_relu": leaky_relu,
    "elu": elu,
    "swish": swish,
    "mish": mish,
    "gelu": gelu,
    "hard_sigmoid": hard_sigmoid,
    "linear": linear
}

# Create meshgrid
x = np.linspace(-10, 10, 100)
y = np.linspace(-10, 10, 100)
X, Y = np.meshgrid(x, y)
Z_input = X + Y

# Output directory
output_dir = "activation_videos"
os.makedirs(output_dir, exist_ok=True)

# Create animation for each activation function
for name, func in functions.items():
    print(f"Generating animation for: {name}")
    Z = func(Z_input)

    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')

    def update_view(angle, ax=ax, X=X, Y=Y, Z=Z):
        ax.clear()
        ax.plot_surface(X, Y, Z, cmap=cm.viridis)
        ax.set_title(f"{name.upper()} Activation Function", fontsize=12)
        ax.set_xlabel("X")
        ax.set_ylabel("Y")
        ax.set_zlabel(f"{name.upper()}(X + Y)")
        ax.view_init(30, angle)

    ani = animation.FuncAnimation(fig, update_view, frames=np.arange(0, 360, 4), interval=100)
    output_path = os.path.join(output_dir, f"{name}_activation.mp4")
    ani.save(output_path, writer='ffmpeg', fps=10)
    plt.close(fig)  # close this figure explicitly so figures do not accumulate
print("All animations generated!")


Generating animation for: relu
Generating animation for: sigmoid
Generating animation for: tanh
Generating animation for: softplus
Generating animation for: softsign
Generating animation for: leaky_relu
Generating animation for: elu
Generating animation for: swish
Generating animation for: mish
Generating animation for: gelu
Generating animation for: hard_sigmoid
Generating animation for: linear
All animations generated!
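
The `sigmoid` and `softplus` definitions above are fine for the plotted range, but for inputs of large magnitude `np.exp` overflows: the naive softplus returns `inf` for very large x, and the naive sigmoid emits an overflow warning for very negative x. A minimal sketch of numerically stable variants follows; the names `stable_sigmoid` and `stable_softplus` are assumptions introduced for this example.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that never exponentiates a large positive number."""
    x = np.asarray(x, dtype=float)
    # exp(-|x|) always lies in [0, 1], so it cannot overflow.
    z = np.exp(-np.abs(x))
    # For x >= 0 use 1 / (1 + e^-x); for x < 0 use e^x / (1 + e^x).
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

def stable_softplus(x):
    """Softplus log(1 + e^x), computed as log(e^0 + e^x)."""
    x = np.asarray(x, dtype=float)
    return np.logaddexp(0.0, x)

big = np.array([-1000.0, 0.0, 1000.0])
print(stable_sigmoid(big))
print(stable_softplus(big))
```

For these three inputs the stable sigmoid gives 0, 0.5, and 1, and the stable softplus gives 0, log 2, and 1000, with no overflow warnings.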


In [3]:
import numpy as np
import matplotlib.pyplot as plt
import os

# Activation functions
def relu(x): return np.maximum(0, x)
def sigmoid(x): return 1 / (1 + np.exp(-x))
def tanh(x): return np.tanh(x)
def softplus(x): return np.log(1 + np.exp(x))
def softsign(x): return x / (1 + np.abs(x))
def leaky_relu(x, alpha=0.01): return np.where(x > 0, x, x * alpha)
def elu(x, alpha=1.0): return np.where(x > 0, x, alpha * (np.exp(x) - 1))
def swish(x): return x * sigmoid(x)
def mish(x): return x * np.tanh(softplus(x))
def gelu(x): return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))
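# Piecewise-linear sigmoid with slope 0.5; libraries often use a gentler slope
# (PyTorch's Hardsigmoid, for example, is clip(x / 6 + 1 / 2, 0, 1)).
def hard_sigmoid(x): return np.clip((x + 1) / 2, 0, 1)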
def linear(x): return x

functions = {
    "relu": relu,
    "sigmoid": sigmoid,
    "tanh": tanh,
    "softplus": softplus,
    "softsign": softsign,
    "leaky_relu": leaky_relu,
    "elu": elu,
    "swish": swish,
    "mish": mish,
    "gelu": gelu,
    "hard_sigmoid": hard_sigmoid,
    "linear": linear
}

# Create x range
x = np.linspace(-10, 10, 400)

# Output directory
output_dir = "activation_plots"
os.makedirs(output_dir, exist_ok=True)

# Create and save static 2D plots
for name, func in functions.items():
    print(f"Generating plot for: {name}")
    y = func(x)

    plt.figure(figsize=(6, 4))
    plt.plot(x, y, label=name.upper(), color="blue")
    plt.title(f"{name.upper()} Activation Function")
    plt.xlabel("Input")
    plt.ylabel("Output")
    plt.grid(True)
    plt.axhline(0, color='black', linewidth=0.5)
    plt.axvline(0, color='black', linewidth=0.5)
    plt.legend()
    plt.tight_layout()
    
    output_path = os.path.join(output_dir, f"{name}_activation.png")
    plt.savefig(output_path)
    plt.close()

print("All activation function plots saved in 'activation_plots/' folder.")


Generating plot for: relu
Generating plot for: sigmoid
Generating plot for: tanh
Generating plot for: softplus
Generating plot for: softsign
Generating plot for: leaky_relu
Generating plot for: elu
Generating plot for: swish
Generating plot for: mish
Generating plot for: gelu
Generating plot for: hard_sigmoid
Generating plot for: linear
All activation function plots saved in 'activation_plots/' folder.
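
The static plots show the shape of each function, but saturation is easier to see in the slope. The sketch below is an illustrative addition, not part of the original notebook: it estimates derivatives with `np.gradient` for three of the functions. The grid size and the choice of functions are assumptions made for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

x = np.linspace(-10.0, 10.0, 2001)
activations = {"sigmoid": sigmoid, "tanh": np.tanh, "relu": relu}

# np.gradient uses central differences in the interior of the grid
# and one-sided differences at the two end points.
slopes = {name: np.gradient(f(x), x) for name, f in activations.items()}

for name, s in slopes.items():
    print(f"{name:8s} max slope = {s.max():.3f}, slope at x = 10: {s[-1]:.2e}")
```

The sigmoid slope peaks at about 0.25 and tanh at about 1, and both fall toward zero at the edges of the range, which is the saturation behind vanishing gradients. ReLU keeps a slope of 1 for all positive inputs.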


In [5]:
import pandas as pd

# Summary records for each activation function (one dict per table row)
activation_details = [
    {
        "Function": "ReLU",
        "Formula": "f(x) = max(0, x)",
        "Pros": "Simple, fast, and effective; avoids vanishing gradients.",
        "Cons": "Can 'die' during training if inputs are always negative.",
        "Example": "Input: [-2, 0, 3] → Output: [0, 0, 3]"
    },
    {
        "Function": "Sigmoid",
        "Formula": "f(x) = 1 / (1 + exp(-x))",
        "Pros": "Good for binary classification; outputs in (0,1).",
        "Cons": "Saturates and kills gradients; centered at 0.5.",
        "Example": "Input: [-2, 0, 2] → Output: [0.12, 0.5, 0.88]"
    },
    {
        "Function": "Tanh",
        "Formula": "f(x) = tanh(x)",
        "Pros": "Centered at zero; better than sigmoid in hidden layers.",
        "Cons": "Still suffers from saturation and vanishing gradients.",
        "Example": "Input: [-2, 0, 2] → Output: [-0.96, 0, 0.96]"
    },
    {
        "Function": "Softmax",
        "Formula": "f(xᵢ) = exp(xᵢ) / sum(exp(xⱼ))",
        "Pros": "Outputs valid probability distributions.",
        "Cons": "Not suitable for hidden layers; soft competition.",
        "Example": "Input: [2.0, 1.0, 0.1] → Output: [0.66, 0.24, 0.10]"
    },
    {
        "Function": "Leaky ReLU",
        "Formula": "f(x) = x if x > 0 else αx",
        "Pros": "Fixes dying ReLU by allowing small gradients when x < 0.",
        "Cons": "Still not zero-centered.",
        "Example": "Input: [-2, 0, 2] → Output: [-0.02, 0, 2]"
    },
    {
        "Function": "ELU",
        "Formula": "f(x) = x if x >= 0 else α(exp(x)-1)",
        "Pros": "Negative values allow mean activation closer to zero.",
        "Cons": "Computationally more expensive than ReLU.",
        "Example": "Input: [-1, 0, 1] → Output: [-0.63, 0, 1]"
    },
    {
        "Function": "SELU",
        "Formula": "λ * (x if x > 0 else α*(exp(x)-1))",
        "Pros": "Self-normalizing when used with correct initialization.",
        "Cons": "Sensitive to network configuration and dropout.",
        "Example": "Input: [-1, 0, 1] → Output: [-1.11, 0, 1.05]"
    },
    {
        "Function": "GELU",
        "Formula": "f(x) ≈ 0.5x(1 + tanh(√(2/π)*(x + 0.044715x³)))",
        "Pros": "Smooth and combines benefits of ReLU and sigmoid.",
        "Cons": "More computationally intensive.",
        "Example": "Input: [-2, 0, 2] → Output: [-0.05, 0, 1.95]"
    },
    {
        "Function": "Swish (SiLU)",
        "Formula": "f(x) = x * sigmoid(x)",
        "Pros": "Non-monotonic, smooth, improves model performance.",
        "Cons": "Heavier computation than ReLU.",
        "Example": "Input: [-2, 0, 2] → Output: [-0.24, 0, 1.76]"
    },
    {
        "Function": "Mish",
        "Formula": "f(x) = x * tanh(softplus(x))",
        "Pros": "Improved performance in vision tasks, smooth activation.",
        "Cons": "High computational cost.",
        "Example": "Input: [-2, 0, 2] → Output: [-0.25, 0, 1.94]"
    },
    {
        "Function": "Softplus",
        "Formula": "f(x) = log(1 + exp(x))",
        "Pros": "Smooth version of ReLU.",
        "Cons": "Not sparse; all neurons fire.",
        "Example": "Input: [-2, 0, 2] → Output: [0.13, 0.69, 2.13]"
    },
    {
        "Function": "Softsign",
        "Formula": "f(x) = x / (1 + |x|)",
        "Pros": "Slower saturation than tanh/sigmoid.",
        "Cons": "Weaker gradients for large values.",
        "Example": "Input: [-2, 0, 2] → Output: [-0.67, 0, 0.67]"
    },
    {
        "Function": "Linear",
        "Formula": "f(x) = x",
        "Pros": "Used in output layers for regression.",
        "Cons": "No non-linearity; can't learn complex patterns.",
        "Example": "Input: [-2, 0, 2] → Output: [-2, 0, 2]"
    }
]

df = pd.DataFrame(activation_details)
df.columns = ['Activation Function', 'Mathematical Formula', 'Advantages', 'Disadvantages', 'Example Output']
df.head(10)


,Activation Function,Mathematical Formula,Advantages,Disadvantages,Example Output
0,ReLU,"f(x) = max(0, x)","Simple, fast, and effective; avoids vanishing ...",Can 'die' during training if inputs are always...,"Input: [-2, 0, 3] → Output: [0, 0, 3]"
1,Sigmoid,f(x) = 1 / (1 + exp(-x)),"Good for binary classification; outputs in (0,1).",Saturates and kills gradients; centered at 0.5.,"Input: [-2, 0, 2] → Output: [0.12, 0.5, 0.88]"
2,Tanh,f(x) = tanh(x),Centered at zero; better than sigmoid in hidde...,Still suffers from saturation and vanishing gr...,"Input: [-2, 0, 2] → Output: [-0.96, 0, 0.96]"
3,Softmax,f(xᵢ) = exp(xᵢ) / sum(exp(xⱼ)),Outputs valid probability distributions.,Not suitable for hidden layers; soft competition.,"Input: [2.0, 1.0, 0.1] → Output: [0.66, 0.24, ..."
4,Leaky ReLU,f(x) = x if x > 0 else αx,Fixes dying ReLU by allowing small gradients w...,Still not zero-centered.,"Input: [-2, 0, 2] → Output: [-0.02, 0, 2]"
5,ELU,f(x) = x if x >= 0 else α(exp(x)-1),Negative values allow mean activation closer t...,Computationally more expensive than ReLU.,"Input: [-1, 0, 1] → Output: [-0.63, 0, 1]"
6,SELU,λ * (x if x > 0 else α*(exp(x)-1)),Self-normalizing when used with correct initia...,Sensitive to network configuration and dropout.,"Input: [-1, 0, 1] → Output: [-1.11, 0, 1.05]"
7,GELU,f(x) ≈ 0.5x(1 + tanh(√(2/π)*(x + 0.044715x³))),Smooth and combines benefits of ReLU and sigmoid.,More computationally intensive.,"Input: [-2, 0, 2] → Output: [-0.05, 0, 1.95]"
8,Swish (SiLU),f(x) = x * sigmoid(x),"Non-monotonic, smooth, improves model performa...",Heavier computation than ReLU.,"Input: [-2, 0, 2] → Output: [-0.24, 0, 1.76]"
9,Mish,f(x) = x * tanh(softplus(x)),"Improved performance in vision tasks, smooth a...",High computational cost.,"Input: [-2, 0, 2] → Output: [-0.25, 0, 1.94]"


# Activation Functions Overview

| Activation Function | Mathematical Formula | Advantages | Disadvantages | Example Output |
|---------------------|----------------------|------------|----------------|----------------|
| ReLU | $f(x) = \max(0, x)$ | Simple, fast, and effective; avoids vanishing gradients. | Can 'die' during training if inputs are always negative. | `Input: [-2, 0, 2]` → `Output: [0, 0, 2]` |
| Sigmoid | $f(x) = \frac{1}{1 + \exp(-x)}$ | Good for binary classification; outputs in $(0,1)$. | Saturates and kills gradients; centered at 0.5. | `Input: [-2, 0, 2]` → `Output: [0.12, 0.5, 0.88]` |
| Tanh | $f(x) = \tanh(x)$ | Centered at zero; better than sigmoid in hidden layers. | Still suffers from saturation and vanishing gradients. | `Input: [-2, 0, 2]` → `Output: [-0.96, 0, 0.96]` |
| Softmax | $f(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$ | Outputs valid probability distributions. | Not suitable for hidden layers; soft competition. | `Input: [2.0, 1.0, 0.1]` → `Output: [0.66, 0.24, 0.10]` |
| Leaky ReLU | $f(x) = x$ if $x > 0$ else $\alpha x$ | Fixes dying ReLU by allowing small gradients when $x < 0$. | Still not zero-centered. | `Input: [-2, 0, 2]` → `Output: [-0.02, 0, 2]` |
| ELU | $f(x) = x$ if $x \geq 0$ else $\alpha(\exp(x) - 1)$ | Negative values allow mean activation closer to zero. | Computationally more expensive than ReLU. | `Input: [-1, 0, 1]` → `Output: [-0.63, 0, 1]` |
| SELU | $f(x) = \lambda x$ if $x > 0$ else $\lambda \alpha (\exp(x) - 1)$ | Self-normalizing when used with correct initialization. | Sensitive to network configuration and dropout. | `Input: [-1, 0, 1]` → `Output: [-1.11, 0, 1.05]` |
| GELU | $f(x) \approx 0.5x\left(1 + \tanh\left(\sqrt{2/\pi}\,(x + 0.044715x^3)\right)\right)$ | Smooth and combines benefits of ReLU and sigmoid. | More computationally intensive. | `Input: [-2, 0, 2]` → `Output: [-0.05, 0, 1.95]` |
| Swish (SiLU) | $f(x) = x \cdot \sigma(x)$ | Non-monotonic, smooth, improves model performance. | Heavier computation than ReLU. | `Input: [-2, 0, 2]` → `Output: [-0.24, 0, 1.76]` |
| Mish | $f(x) = x \cdot \tanh(\mathrm{softplus}(x))$ | Improved performance in vision tasks, smooth activation. | High computational cost. | `Input: [-2, 0, 2]` → `Output: [-0.25, 0, 1.94]` |
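
Example values like those in the overview are easy to get slightly wrong by hand, so it can help to recompute them. The sketch below is an illustrative check under stated assumptions: the `softmax` and `selu` helpers are written here for the example (they are not defined earlier in the notebook), and the SELU constants are the commonly published values for α and λ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Subtracting the maximum leaves the result unchanged and avoids overflow.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
    # Assumed constants: the standard alpha and lambda (scale) for SELU.
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def swish(x):
    return x * sigmoid(x)

def mish(x):
    return x * np.tanh(np.logaddexp(0.0, x))  # softplus via logaddexp

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

sym = np.array([-2.0, 0.0, 2.0])
results = {
    "softmax": softmax(np.array([2.0, 1.0, 0.1])),
    "selu": selu(np.array([-1.0, 0.0, 1.0])),
    "swish": swish(sym),
    "mish": mish(sym),
    "gelu": gelu(sym),
}
for name, values in results.items():
    print(f"{name:8s}", np.round(values, 2))
```

Rounding to two decimals makes the printed values directly comparable with the example column.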
