# 1. Sigmoid (Logistic) Activation Function

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

## Advantages:
- Smooth gradient: The function is differentiable everywhere, so gradient-based optimizers always receive a well-defined gradient.
- Output range: (0, 1), which can be interpreted as probabilities.

## Disadvantages:
- Vanishing gradient problem: For inputs of large magnitude (very high or very low), the function saturates and the gradient becomes very small, which can slow down training, especially in deep networks.
- Outputs not zero-centered: All outputs are positive, which can slow down convergence during gradient descent.
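
The saturation described above can be checked numerically. The following is a minimal sketch in plain Python, assuming the standard definition σ(x) = 1 / (1 + e^(-x)) and its derivative σ'(x) = σ(x)(1 - σ(x)); the helper names `sigmoid` and `sigmoid_grad` are illustrative and not part of Keras.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^-x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x):
    """Derivative of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient peaks at 0.25 for x = 0 and shrinks quickly as |x| grows.
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  gradient={sigmoid_grad(x):.6f}")
```

Because the gradient never exceeds 0.25, multiplying such factors across many layers during backpropagation is one way the overall gradient can become very small.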

## Usage Recommendations:
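- Sigmoid: Historically used in hidden layers but less common there now due to the vanishing gradient problem. It is still a natural choice for an output layer that should produce a probability.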

## Example Code

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Create the model
model_sigmoid = Sequential([
    Dense(64, input_shape=(784,), activation='sigmoid'),
    Dense(64, activation='sigmoid'),
    Dense(10, activation='softmax')
])

# Compile the model
model_sigmoid.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Summary of the model
model_sigmoid.summary()

# 2. Hyperbolic Tangent (tanh) Activation Function

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

## Advantages:
- Zero-centered: The outputs range from -1 to 1, which can help with convergence.
- Smooth gradient: Like the sigmoid, it provides a smooth gradient.

## Disadvantages:
- Vanishing gradient problem: Similar to sigmoid, though less severe.
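
To make the comparison with sigmoid concrete, here is a small sketch in plain Python. It assumes the standard derivatives, 1 - tanh(x)² for tanh and σ(x)(1 - σ(x)) for sigmoid; the helper names are illustrative only.

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t

def sigmoid_grad(x):
    """Derivative of the sigmoid, included only for comparison."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

# Near zero, tanh passes a gradient of up to 1.0 versus 0.25 for sigmoid.
# Both functions still saturate: the gradients shrink as |x| grows.
for x in (0.0, 0.5, 1.0, 5.0):
    print(f"x={x:4.1f}  tanh'={tanh_grad(x):.6f}  sigmoid'={sigmoid_grad(x):.6f}")

# Zero-centered output: tanh is an odd function, so tanh(-x) == -tanh(x).
print(math.tanh(-1.0), math.tanh(1.0))
```

The larger peak gradient applies around zero only; for inputs of large magnitude tanh saturates as well, which is why the vanishing gradient problem remains.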

## Usage Recommendations:
- Tanh: Historically used in hidden layers but less common there now due to the vanishing gradient problem.

## Example Code

In [None]:
# Create the model
model_tanh = Sequential([
    Dense(64, input_shape=(784,), activation='tanh'),
    Dense(64, activation='tanh'),
    Dense(10, activation='softmax')
])

# Compile the model
model_tanh.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Summary of the model
model_tanh.summary()

# 3. Rectified Linear Unit (ReLU) Activation Function

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

## Advantages:
- Computationally efficient: Only requires a threshold at zero.
- Sparse activation: A portion of neurons are deactivated, promoting sparsity and efficient computation.
- Alleviates vanishing gradient problem: The gradient is exactly 1 for positive inputs, so it does not saturate there, which tends to increase the convergence rate of gradient descent.

## Disadvantages:
- Dying ReLU problem: Neurons can sometimes get stuck during training and output zero for all inputs. This can be mitigated using variants like Leaky ReLU.
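
The dying ReLU problem can be illustrated with a single neuron. This is a minimal sketch in plain Python, assuming the standard definition f(x) = max(0, x); the weight and bias values are hypothetical and chosen only so that every pre-activation is negative.

```python
def relu(x):
    """ReLU: pass positive inputs through, clamp everything else to zero."""
    return max(0.0, x)

def relu_grad(x):
    """Gradient of ReLU (using the common convention of 0 at x == 0)."""
    return 1.0 if x > 0 else 0.0

# Hypothetical neuron whose bias has drifted so far negative that
# weight * x + bias is below zero for every input in the data.
weight, bias = 0.5, -10.0
inputs = [0.0, 1.0, 2.0, 5.0]

pre_activations = [weight * x + bias for x in inputs]
outputs = [relu(z) for z in pre_activations]
grads = [relu_grad(z) for z in pre_activations]

print(outputs)  # [0.0, 0.0, 0.0, 0.0]
print(grads)    # [0.0, 0.0, 0.0, 0.0]
```

With a zero gradient on every input, no update reaches the neuron's weights through this activation, so it cannot recover on its own.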

## Usage Recommendations:
- ReLU: The default choice for most deep learning models due to its simplicity and effectiveness.

## Example Code

In [None]:
# Create the model
model_relu = Sequential([
    Dense(64, input_shape=(784,), activation='relu'),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
model_relu.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Summary of the model
model_relu.summary()

# 4. Leaky ReLU Activation Function

![image.png](attachment:image.png)

## $f(x) = \max(\alpha x, x)$
#### Where $\alpha$ is a small constant, typically 0.01.

![image-2.png](attachment:image-2.png)

## Advantages:
- Mitigates the dying ReLU problem: Allows a small, non-zero gradient when the unit is not active.

## Disadvantages:
- Computationally slightly less efficient: Due to the added multiplication.
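
A short sketch in plain Python shows the effect of the small negative slope. It assumes α = 0.01 and reuses a hypothetical neuron whose pre-activations are all negative; the helper names are illustrative and not the Keras layer.

```python
def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: identity for positive inputs, slope alpha otherwise."""
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.01):
    """Gradient of Leaky ReLU: 1 for positive inputs, alpha otherwise."""
    return 1.0 if x > 0 else alpha

# Hypothetical neuron with only negative pre-activations.
weight, bias = 0.5, -10.0
inputs = [0.0, 1.0, 2.0, 5.0]
pre_activations = [weight * x + bias for x in inputs]

grads = [leaky_relu_grad(z) for z in pre_activations]
print(grads)  # [0.01, 0.01, 0.01, 0.01]

# For 0 < alpha < 1 the piecewise form agrees with max(alpha * x, x).
for x in (-3.0, -0.5, 0.0, 2.0):
    assert leaky_relu(x) == max(0.01 * x, x)
```

The gradient is small but never zero, so weight updates can still flow through an inactive unit.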

## Usage Recommendations:
- Leaky ReLU: Used when the dying ReLU problem is encountered.


## Example Code

In [None]:
from tensorflow.keras.layers import LeakyReLU

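# Create the model. LeakyReLU is applied as its own layer after each Dense layer.
# Note: newer Keras releases (Keras 3) name this argument `negative_slope`; `alpha` is the older name.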
model_leaky_relu = Sequential([
    Dense(64, input_shape=(784,)),
    LeakyReLU(alpha=0.01),
    Dense(64),
    LeakyReLU(alpha=0.01),
    Dense(10, activation='softmax')
])

# Compile the model
model_leaky_relu.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Summary of the model
model_leaky_relu.summary()

# 5. Parametric ReLU (PReLU) Activation Function

![image.png](attachment:image.png)

## $f(x) = \max(\alpha_i x, x)$
#### Where $\alpha_i$ is a parameter learned during training.

![image-2.png](attachment:image-2.png)

## Advantages:
- Adaptive: The parameter α can be learned to best fit the data.

## Disadvantages:
- Increased computational cost: Due to learning an additional parameter.
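
How α is learned can be sketched with a toy example in plain Python. The gradient of the output with respect to α is x for non-positive inputs and 0 otherwise. The input, target, starting value of α, learning rate, and squared-error loss below are all hypothetical choices for illustration, not Keras defaults.

```python
def prelu(x, alpha):
    """PReLU: identity for positive inputs, learned slope alpha otherwise."""
    return x if x > 0 else alpha * x

def prelu_grad_alpha(x):
    """Derivative of the PReLU output with respect to alpha."""
    return 0.0 if x > 0 else x

# Toy setup: one negative input whose desired output is -1.0,
# which corresponds to alpha = 0.5.
x, target = -2.0, -1.0
alpha = 0.25
learning_rate = 0.05

for step in range(20):
    error = prelu(x, alpha) - target                 # d(loss)/d(output) is 2 * error
    grad_alpha = 2.0 * error * prelu_grad_alpha(x)   # chain rule for squared error
    alpha -= learning_rate * grad_alpha              # plain gradient descent step

print(f"learned alpha = {alpha:.4f}")
```

In a real network the same update is handled by the optimizer alongside the weights, which is where the extra cost of the additional parameter comes from.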

## Usage Recommendations:
- PReLU: Used when the dying ReLU problem is encountered.

## Example Code

In [None]:
from tensorflow.keras.layers import PReLU

# Create the model
model_prelu = Sequential([
    Dense(64, input_shape=(784,)),
    PReLU(),
    Dense(64),
    PReLU(),
    Dense(10, activation='softmax')
])

# Compile the model
model_prelu.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Summary of the model
model_prelu.summary()


# 6. Exponential Linear Unit (ELU)

![image.png](attachment:image.png)

## $f(x) = \begin{cases} x, & x > 0 \\ \alpha(e^{x} - 1), & x \le 0 \end{cases}$
#### Where $\alpha$ is a positive constant that determines the value ($-\alpha$) to which an ELU saturates for negative net inputs.

![image-2.png](attachment:image-2.png)

## Advantages:
- Improves learning characteristics: Like ReLU, it is the identity for positive inputs; like Leaky ReLU, it keeps a non-zero gradient for negative inputs.
- Smooth gradient: Negative outputs push mean activations closer to zero, which helps reduce the vanishing gradient problem.

## Disadvantages:
- Computational cost: More complex than ReLU because of the exponential.

## Usage Recommendations:
- ELU: Can be used when additional performance is needed and computational resources allow.
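
As a quick numerical check, here is a minimal sketch of ELU in plain Python. It assumes the standard piecewise definition (x for x > 0, α(e^x - 1) otherwise) with α = 1.0; the helper names are illustrative only.

```python
import math

def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (e^x - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def elu_grad(x, alpha=1.0):
    """Gradient of ELU: 1 for positive inputs, alpha * e^x otherwise."""
    return 1.0 if x > 0 else alpha * math.exp(x)

# Positive inputs pass through unchanged; negative inputs approach -alpha.
for x in (-20.0, -2.0, -0.5, 0.0, 2.0):
    print(f"x={x:6.1f}  elu={elu(x):.6f}  gradient={elu_grad(x):.6f}")
```

The gradient for negative inputs is small but non-zero, and the output levels off at -α rather than growing without bound.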

## Example Code

In [None]:
# Create the model
model_elu = Sequential([
    Dense(64, input_shape=(784,), activation='elu'),
    Dense(64, activation='elu'),
    Dense(10, activation='softmax')
])

# Compile the model
model_elu.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Summary of the model
model_elu.summary()


# 7. Swish Activation Function

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

## Advantages:
- Smooth and non-monotonic: Often outperforms ReLU on deep networks.

## Disadvantages:
- Computationally more expensive: Due to the combination of multiplication and sigmoid function.
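
The non-monotonic shape can be seen numerically. This is a rough sketch in plain Python using f(x) = x · sigmoid(x), the same form as the `swish` function in the example code; the grid search and the name `x_min` are illustrative only.

```python
import math

def swish(x):
    """Swish: x * sigmoid(x), written as x / (1 + e^-x)."""
    return x / (1.0 + math.exp(-x))

# Going from 0 toward negative values, swish first decreases and then
# rises back toward zero, so it is not monotonic.
for x in (-5.0, -1.28, 0.0, 2.0):
    print(f"x={x:6.2f}  swish={swish(x):.5f}")

# Coarse grid search for the location of the dip on [-5, 0].
xs = [i / 100.0 for i in range(-500, 1)]
x_min = min(xs, key=swish)
print(f"minimum near x = {x_min:.2f}, swish = {swish(x_min):.4f}")
```

The dip sits a little below x = -1, after which the output returns toward zero for more negative inputs.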

## Usage Recommendations:
- Swish: Can be used when additional performance is needed and computational resources allow.

## Example Code 

In [None]:
from tensorflow.keras.layers import Activation

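# Swish: f(x) = x * sigmoid(x). `tf` is imported in the first code cell.
# Recent TensorFlow versions also accept activation='swish' directly.
def swish(x):
    return x * tf.nn.sigmoid(x)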

# Create the model
model_swish = Sequential([
    Dense(64, input_shape=(784,)),
    Activation(swish),
    Dense(64),
    Activation(swish),
    Dense(10, activation='softmax')
])

# Compile the model
model_swish.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Summary of the model
model_swish.summary()


# 8. Usage Recommendations:
- Sigmoid and Tanh: Historically used but less common now due to the vanishing gradient problem.
- ReLU: The default choice for most deep learning models due to its simplicity and effectiveness.
- Leaky ReLU / PReLU: Used when the dying ReLU problem is encountered.
- ELU and Swish: Can be used when additional performance is needed and computational resources allow.

![image.png](attachment:image.png)