# Derivatives and Activation Functions

## Run The Next Cell 
As in the tensors exercise, we are going to use torch! We also need one more library for fun visuals: Matplotlib.

In [None]:
import torch
import matplotlib.pyplot as plt
print('PyTorch version:', torch.__version__)

### Note: In case you are not familiar with writing math formulas using Markdown, it's time to learn!
Here is a nice cheat sheet for that: https://www.upyesp.org/posts/makrdown-vscode-math-notation/

## Derivatives Refresher
Given a function $f(x)$, the derivative $f'(x)$ measures how fast $f(x)$ changes as $x$ changes, i.e. the slope of $f$ at the point $x$.
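
To make the idea of "rate of change" concrete, here is a minimal sketch that approximates a derivative with a central finite difference. The helper name `finite_difference`, the step size `h`, and the test function `square` are assumptions chosen for illustration, not part of the exercise.

```python
def finite_difference(func, x, h=1e-5):
    # Central difference: slope of the line through (x - h) and (x + h)
    return (func(x + h) - func(x - h)) / (2 * h)

def square(x):
    # Illustrative function; its exact derivative is 2x
    return x ** 2

approx = finite_difference(square, 3.0)
print(approx)  # approximately 6, the exact derivative 2x at x = 3
```

The smaller the step `h`, the closer the slope gets to the true derivative, until floating-point rounding starts to dominate.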

**Exercise 1:** Find the derivative of the following function:
$$
f(x) = 3x^3 - 2x^2 + 5x - 4
$$

**Your Solution:**

$$f'(x)=9x^2-4x+5$$

**Exercise 2:** Write $f(x)$ and $f'(x)$ as two functions and try to visualize them.

**Your Solution:**

In [None]:
def f(x):
    return 3*x**3 - 2*x**2 + 5*x - 4

def f_prime(x):
    return 9*x**2 - 4*x + 5

x = torch.linspace(-3, 3, 100)

plt.plot(x, f(x), label='f(x)')
plt.plot(x, f_prime(x), label="f'(x)")
plt.legend()
plt.title('Random Function and its Derivative')
plt.grid(True)
plt.show()
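
One way to sanity-check the hand-derived $f'(x)$ is to compare it with PyTorch's automatic differentiation. The sketch below repeats `f` and `f_prime` so that it runs on its own; the sample points are arbitrary and only serve as an illustration.

```python
import torch

def f(x):
    return 3 * x**3 - 2 * x**2 + 5 * x - 4

def f_prime(x):
    return 9 * x**2 - 4 * x + 5

# Arbitrary sample points (an assumption for illustration)
x = torch.tensor([-2.0, 0.0, 1.5], requires_grad=True)

# Summing the outputs lets a single backward() call fill x.grad
# with df/dx evaluated at every sample point
f(x).sum().backward()

autograd_matches = torch.allclose(x.grad, f_prime(x.detach()))
print(x.grad)
print(autograd_matches)
```

If the hand-derived formula were wrong, the two results would disagree at some of the sample points.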

## Sigmoid Function
The sigmoid function is defined as:
$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$

**Exercise 1:** Calculate its derivative and express it in terms of $\sigma(x)$

**Your Solution:**



Given the sigmoid:
$$
\sigma(x) = \frac{1}{1 + e^{-x}} = (1 + e^{-x})^{-1}
$$

we compute its derivative $\sigma'(x)$.


Let $g(u) = u^{-1}$ and $u(x) = 1 + e^{-x}$, so that $\sigma$ is a composition of the two and the chain rule applies:

$$
\sigma(x) = g(u(x))
$$
Then:

$$
\frac{d}{dx}\sigma(x) = g'(u(x)) \cdot u'(x)
$$

We have $g'(u) = -u^{-2}$ and $u'(x) = \frac{d}{dx}(1 + e^{-x}) = -e^{-x}$.  

Therefore:

$$
\sigma'(x) = - (1 + e^{-x})^{-2} \cdot (-e^{-x}) = \frac{e^{-x}}{(1 + e^{-x})^{2}}
$$

Write the result as a product:

$$
\sigma'(x) = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
$$

Recognize the first factor as $\sigma(x)$.  

For the second factor, observe:

$$
\frac{e^{-x}}{1 + e^{-x}} = \frac{1 + e^{-x} - 1}{1 + e^{-x}} = 1 - \frac{1}{1 + e^{-x}} = 1 - \sigma(x)
$$

$$
\boxed{\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)}
$$

**Exercise 2:** Implement the sigmoid function and its derivative, then visualize both.

In [None]:
def sigmoid(x):
    return 1 / (1 + torch.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

x = torch.linspace(-8, 8, 200)
plt.plot(x, sigmoid(x), label='sigmoid(x)')
plt.plot(x, sigmoid_derivative(x), label="sigmoid'(x)")
plt.title('Sigmoid and its Derivative')
plt.legend(); plt.grid(True); plt.show()
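
The identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ can also be checked numerically against autograd. This is a minimal sketch that redefines the two functions so it is self-contained. Using `float64` is a choice made here, because $1 - \sigma(x)$ loses precision in `float32` when $\sigma(x)$ is close to 1.

```python
import torch

def sigmoid(x):
    return 1 / (1 + torch.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

# float64 keeps the comparison tight for large |x|
x = torch.linspace(-8, 8, 200, dtype=torch.float64).requires_grad_()

# Differentiate the built-in torch.sigmoid with autograd
torch.sigmoid(x).sum().backward()

sigmoid_matches = torch.allclose(x.grad, sigmoid_derivative(x.detach()))
print(sigmoid_matches)
```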

## ReLU Function
The Rectified Linear Unit (ReLU) is defined as:
$$
f(x) = \max(0, x)
$$

**Exercise 1:** Calculate the derivative of ReLU.

**Your Solution:**


Given the Rectified Linear Unit (ReLU) function:

$$
\text{ReLU}(x) = \max(0, x)
$$

we compute its derivative $\text{ReLU}'(x)$.



The ReLU function can be written as a piecewise function:

$$
\text{ReLU}(x) =
\begin{cases}
x, & \text{if } x > 0, \\
0, & \text{if } x \leq 0.
\end{cases}
$$



For $x > 0$:

$$
\frac{d}{dx}\text{ReLU}(x) = \frac{d}{dx}(x) = 1
$$

For $x < 0$:

$$
\frac{d}{dx}\text{ReLU}(x) = \frac{d}{dx}(0) = 0
$$

At $x = 0$, the derivative is undefined: the function is continuous there, but its slope jumps from 0 on the left to 1 on the right, so it is not differentiable at 0.  
In practice, deep learning frameworks define it as either 0 or 1 at $x = 0$ for convenience, most commonly **0**.

$$
\boxed{
\text{ReLU}'(x) =
\begin{cases}
1 & \text{if } x > 0, \\
0 & \text{if } x \leq 0.
\end{cases}
}
$$


**Exercise 2:** Implement ReLU and its derivative, then visualize both.

In [None]:
def relu(x):
    return torch.maximum(torch.tensor(0.0), x)

def relu_derivative(x):
    return (x > 0).float()

x = torch.linspace(-5, 5, 200)
plt.plot(x, relu(x), label='ReLU(x)')
plt.plot(x, relu_derivative(x), label="ReLU'(x)")
plt.title('ReLU and its Derivative')
plt.legend(); plt.grid(True); plt.show()
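
To see which convention PyTorch itself uses at $x = 0$, the sketch below asks autograd for the gradient of the built-in `torch.relu` at a negative point, at zero, and at a positive point. The three sample values are arbitrary choices for illustration.

```python
import torch

# One negative point, zero, and one positive point
x = torch.tensor([-1.0, 0.0, 2.0], requires_grad=True)

torch.relu(x).sum().backward()

# Hand-written derivative using the "0 at x = 0" convention
manual = (x.detach() > 0).float()

print(x.grad)  # tensor([0., 0., 1.])
relu_matches = torch.equal(x.grad, manual)
print(relu_matches)
```

The gradient at zero comes out as 0, which agrees with the `x > 0` test used in `relu_derivative`.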

## Binary Cross-Entropy Loss
For a single prediction $y_{pred}$ and true label $y \in \{0, 1\}$, the **binary cross-entropy loss** is:
$$
\mathcal{L} = -[y \log(y_{pred}) + (1 - y) \log(1 - y_{pred})]
$$


**Exercise 1:** Calculate the derivative with respect to $y_{pred}$.

**Your Solution:**

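We first differentiate the **binary cross-entropy loss** with respect to the predicted probability, then apply the chain rule to get the gradient with respect to the model output (logit).  
The logit gradient is the quantity that gets backpropagated in binary classification networks ending in a sigmoid. Below, $\hat{y}$ denotes $y_{pred}$ and $L$ denotes $\mathcal{L}$.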


Let the model output (before activation) be $z$, and the **sigmoid function** be

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$

The predicted probability is:

$$
\hat{y} = \sigma(z)
$$

and the true label is $y \in \{0, 1\}$.

The **binary cross-entropy loss** is defined as:

$$
L(y, \hat{y}) = - \big( y \log(\hat{y}) + (1 - y)\log(1 - \hat{y}) \big)
$$


Differentiate $L$ with respect to $\hat{y}$:

$$
\frac{\partial L}{\partial \hat{y}}
= - \left( \frac{y}{\hat{y}} - \frac{1 - y}{1 - \hat{y}} \right)
$$

Simplify:

$$
\frac{\partial L}{\partial \hat{y}}
= \frac{\hat{y} - y}{\hat{y}(1 - \hat{y})}
$$

Since $\hat{y} = \sigma(z)$, and

$$
\sigma'(z) = \sigma(z)\,(1 - \sigma(z)) = \hat{y}(1 - \hat{y}),
$$

we have:

$$
\frac{\partial \hat{y}}{\partial z} = \hat{y}(1 - \hat{y})
$$

We want $\frac{\partial L}{\partial z}$, which the chain rule gives as:

$$
\frac{\partial L}{\partial z}
= \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z}
$$

Substitute both expressions:

$$
\frac{\partial L}{\partial z}
= \left( \frac{\hat{y} - y}{\hat{y}(1 - \hat{y})} \right) \cdot \hat{y}(1 - \hat{y})
$$

Simplify:

$$
\frac{\partial L}{\partial z} = \hat{y} - y
$$

The derivative of the **binary cross-entropy loss** with respect to the model logit $z$ is:

$$
\boxed{\frac{\partial L}{\partial z} = \sigma(z) - y}
$$
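
The boxed result can be checked against autograd. The sketch below uses `torch.nn.functional.binary_cross_entropy_with_logits`, which applies the sigmoid and the loss in one step. The logits `z` and labels `y` are made-up values for illustration.

```python
import torch
import torch.nn.functional as F

# Made-up logits and labels (assumptions for illustration)
z = torch.tensor([-2.0, 0.5, 3.0], dtype=torch.float64, requires_grad=True)
y = torch.tensor([0.0, 1.0, 1.0], dtype=torch.float64)

# reduction='sum' keeps each sample's gradient unscaled
loss = F.binary_cross_entropy_with_logits(z, y, reduction='sum')
loss.backward()

# Hand-derived gradient: sigma(z) - y
expected = torch.sigmoid(z.detach()) - y

gradient_matches = torch.allclose(z.grad, expected)
print(z.grad)
print(gradient_matches)
```

With the default `reduction='mean'`, each gradient entry would be divided by the number of samples, so the comparison would need the same scaling.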

**Exercise 2:** Implement *BCE* and its derivative, then visualize both.

In [None]:
def binary_cross_entropy(y_pred, y):
    eps = 1e-8  # small constant to avoid log(0)
    return - (y * torch.log(y_pred + eps) + (1 - y) * torch.log(1 - y_pred + eps))

def bce_derivative(y_pred, y):
    eps = 1e-8  # small constant to avoid division by zero
    return - (y / (y_pred + eps)) + ((1 - y) / (1 - y_pred + eps))

y_pred = torch.linspace(0.01, 0.99, 100)

for label in [0, 1]:
    plt.plot(y_pred, binary_cross_entropy(y_pred, torch.tensor(label)), label=f'y={label}')
plt.title('Binary Cross-Entropy Loss')
plt.legend(); plt.grid(True); plt.show()

for label in [0, 1]:
    plt.plot(y_pred, bce_derivative(y_pred, torch.tensor(label)), label=f"dL/dy_pred (y={label})")
plt.title('Derivative of BCE Loss')
plt.legend(); plt.grid(True); plt.show()