# **Step Activation Function: Explanation & Limitations**

The video discusses the **Step Activation Function**, explaining its use, limitations, and why it is no longer commonly used in neural networks. Here's a breakdown:

---

## **1. What is the Step Activation Function?**
- It is an activation function used in early neural networks, particularly **perceptrons**.
- The function outputs **1** if the input is greater than or equal to a threshold (commonly 0) and **0** otherwise.
- This makes it useful for **binary classification problems**, where the output is either **yes (1)** or **no (0)**.

---

## **2. How Does it Work?**
The function follows a simple rule:

$$
f(x) =
\begin{cases}
    1, & \text{if } x \geq \theta \\
    0, & \text{otherwise}
\end{cases}
$$

where:
- \( x \) is the input (the weighted sum of the neuron's inputs).
- \( \theta \) (threshold) is usually 0 but can be any value.
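
As a rough illustration of the rule above, the function can be sketched in a few lines of plain Python. The name `step` and the default `theta=0.0` are choices made for this example, not something taken from the video.

```python
def step(x, theta=0.0):
    """Step activation: 1 if x >= theta, otherwise 0 (theta is the threshold)."""
    return 1 if x >= theta else 0


# Try a negative input, an input exactly at the threshold, and a positive input.
outputs = [step(x) for x in (-2.0, 0.0, 3.5)]
print(outputs)  # [0, 1, 1]
```

Note that an input exactly equal to the threshold maps to 1 here, matching the \( x \geq \theta \) case in the formula.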

---

## **3. Example Calculation**
Given inputs \( (1, 0) \) and three neurons with weights \( (1, 2) \), \( (2, 3) \), and \( (3, 4) \), we calculate the **weighted sum** for each neuron:

$$
(1 \times 1) + (0 \times 2) = 1
$$

$$
(1 \times 2) + (0 \times 3) = 2
$$

$$
(1 \times 3) + (0 \times 4) = 3
$$

Applying the step function (threshold = 0), the outputs are **1, 1, 1**.
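
The same calculation can be reproduced with a short Python sketch. The per-neuron weight pairs are read off the three sums above; the variable names (`inputs`, `neuron_weights`) and the absence of a bias term are assumptions made for this example.

```python
def step(x, theta=0.0):
    """Step activation: 1 if x >= theta, otherwise 0."""
    return 1 if x >= theta else 0


inputs = (1, 0)
# One weight pair per neuron, as used in the three weighted sums above.
neuron_weights = [(1, 2), (2, 3), (3, 4)]

# Weighted sum for each neuron: sum of input * weight.
weighted_sums = [
    sum(i * w for i, w in zip(inputs, weights)) for weights in neuron_weights
]
activations = [step(s) for s in weighted_sums]

print(weighted_sums)  # [1, 2, 3]
print(activations)    # [1, 1, 1]
```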

---

## **4. Why is it Not Used Anymore?**
### **Drawbacks:**
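1. **Non-Differentiability & Zero Derivative**  
   - The function is **piecewise constant**: it is flat on both sides of the threshold, so its **derivative is zero** everywhere except at the threshold itself, where it jumps and is **not differentiable**.
   - Backpropagation relies on gradients (derivatives) to update weights.  
   - Since the gradient through a step unit is zero, **gradient-based weight updates vanish**, so a network of step units cannot be trained with backpropagation.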

2. **Binary Classification Only**  
   - The output is **either 0 or 1**, so a single step unit cannot directly represent more than two classes or express how confident a prediction is.

3. **Thresholding Issue**  
   - The function does not consider the magnitude of input values.
   - Example:  
     - If \( x = 10 \) or \( x = 100 \), the output is still **1**.
     - It does not distinguish between large and small inputs.
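
The first and third drawbacks can be checked numerically. The sketch below estimates the derivative with a central finite difference; the helper `numerical_derivative` and the step size `h` are assumptions made for this example, and the sample points are deliberately chosen away from the threshold, where the derivative is undefined.

```python
def step(x, theta=0.0):
    """Step activation: 1 if x >= theta, otherwise 0."""
    return 1 if x >= theta else 0


def numerical_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Away from the threshold the function is flat, so the estimated slope is zero.
gradients = [numerical_derivative(step, x) for x in (-5.0, 0.5, 10.0, 100.0)]
print(gradients)  # [0.0, 0.0, 0.0, 0.0]

# Inputs of very different magnitude produce the same output.
print(step(10), step(100))  # 1 1
```

A weight update proportional to these gradients would therefore be zero for every sample point shown.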

---

## **5. Conclusion**
- **Step functions were used in early perceptrons (late 1950s and 1960s)** but have been replaced by other activation functions.
- Functions like **Sigmoid, ReLU, and Tanh** address these problems by providing non-zero gradients for training and outputs that vary with the input magnitude.
- In the **next video**, the Sigmoid activation function will be explained.

---
