<a href="https://colab.research.google.com/github/ywang1110/PyTorch_Colab_Files/blob/main/01_02_autograd.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Autograd

## ✅ What is a gradient?

A **gradient** is the **slope** or **derivative** of a function — it tells you:

> **How much the output changes when the input changes.**

In deep learning, it's how we know which direction to move our parameters to **minimize loss**.

---

**🧭 Intuition (Simple English)**

Imagine you're on a hill and want to go **down** to the lowest point.

* The **gradient** tells you:

  * Which **direction is downhill**
  * How **steep** the slope is

That’s how **gradient descent** works!

---

**🔧 In PyTorch**

If you have:

```python
x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 3*x + 1
y.backward()
print(x.grad)
```

PyTorch computes:

$$
\frac{dy}{dx} = 2x + 3
$$

So:

$$
\texttt{x.grad} = 2 \times 2 + 3 = 7
$$

This gradient tells you:

> If `x` increases by a little, `y` increases by **about 7× that amount**
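
One way to check this reading of the gradient is to nudge `x` by a small amount and measure how much `y` actually moves. The sketch below compares the autograd result with a central finite difference; the helper `f`, the step size `h`, and the use of `float64` are illustrative assumptions, not part of the original example.

```python
import torch

def f(x):
    # Same function as above: y = x^2 + 3x + 1
    return x**2 + 3*x + 1

# float64 keeps the finite-difference estimate numerically clean
x = torch.tensor(2.0, dtype=torch.float64, requires_grad=True)
y = f(x)
y.backward()

# Nudge x by a small step h in both directions and measure the change in y
h = 1e-6
with torch.no_grad():
    numeric_slope = (f(x + h) - f(x - h)) / (2 * h)

print(f"autograd gradient: {x.grad.item():.4f}")
print(f"finite difference: {numeric_slope.item():.4f}")
```

Both numbers should agree to several decimal places, which is what "y changes by about 7× the change in x" means in practice.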

---

**📚 In Deep Learning**

* The **gradient of the loss** tells the model:

  > How should I adjust my weights to make predictions better?

* Optimizers (like SGD, Adam) use these gradients to **update model parameters**.
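
As a rough illustration of what an optimizer does with a gradient, the sketch below runs plain gradient descent by hand on the earlier function `y = x**2 + 3*x + 1`. The learning rate and the number of steps are illustrative assumptions; real optimizers such as SGD or Adam wrap the same idea in more machinery.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
learning_rate = 0.1  # illustrative value

for step in range(100):
    y = x**2 + 3*x + 1
    y.backward()                      # fills x.grad with dy/dx = 2x + 3
    with torch.no_grad():
        x -= learning_rate * x.grad   # step against the gradient
    x.grad.zero_()                    # reset before the next backward pass

# The minimum of y is where 2x + 3 = 0, i.e. x = -1.5
print(f"x after descent: {x.item():.4f}")
```

Each step moves `x` a little downhill, so it settles near the point where the slope is zero.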

---

**✅ Summary**

| Term       | Meaning                                        |
| ---------- | ---------------------------------------------- |
| Gradient   | The rate of change of a value (slope)          |
| Use        | Tells how to adjust variables to minimize loss |
| In PyTorch | Computed by `.backward()`, stored in `.grad`   |

In [1]:
import torch
import numpy

## Basic Autograd

### Create Tensor that requires gradient



#### Tensor.requires_grad
* Is True if gradients need to be computed for this Tensor,
* False otherwise.


In [2]:
input_x = torch.tensor(2.0, requires_grad=True)
print(f"input tensor: {input_x}")
print(f"Requires grad: {input_x.requires_grad}")

input tensor: 2.0
Requires grad: True


#### Define function y = x^2 + 3x + 1

In [3]:
output_y = input_x**2 + input_x*3 + 1
print(f"Function value: {output_y}")
print(f"y requires grad: {output_y.requires_grad}")

Function value: 11.0
y requires grad: True


### Compute gradient

#### torch.Tensor.backward

* It tells PyTorch to **compute the gradient of `output_y` with respect to `input_x`**
* Since `output_y = input_x**2 + input_x*3 + 1`, the math is:

$$
\frac{dy}{dx} = 2x + 3
$$

At `x = 2.0`, we get:

$$
2 \times 2 + 3 = 4 + 3 = 7
$$

---

## ✅ Summary

| Expression            | Meaning                          |
| --------------------- | -------------------------------- |
| `output_y.backward()` | Computes gradient dy/dx          |
| `input_x.grad`        | Shows result: dy/dx at `x = 2.0` |
| Gradient value        | `7.0`                            |



In [4]:
output_y.backward()
print(input_x.grad) # dy/dx = 2x + 3 → at x=2 → 2*2 + 3 = 7

tensor(7.)


### Zero gradients

In [5]:
print(input_x.grad.zero_())

tensor(0.)


## Vector Autograd

### Create vector


In [6]:
input_vector = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
print(f"Input vector: {input_vector}")

Input vector: tensor([1., 2., 3.], requires_grad=True)


### Vector function

In [7]:
output_vector = input_vector**2 + input_vector*3 + 1 # element-wise operations
print(f"Function values: {output_vector}")

Function values: tensor([ 5., 11., 19.], grad_fn=<AddBackward0>)


### Scalar output for backpropagation

In [8]:
loss_value = output_vector.sum() # sum to get scalar
print(f"Loss value: {loss_value}")

Loss value: 35.0


### Compute gradients

In [9]:
loss_value.backward()
print(f"Gradients: {input_vector.grad}") # dy/dx = 2*x + 3
print(f"Theoretical values: {2*input_vector + 3}")

Gradients: tensor([5., 7., 9.])
Theoretical values: tensor([5., 7., 9.], grad_fn=<AddBackward0>)
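
Summing to a scalar is one way to make `.backward()` work on a vector output. Calling `.backward()` with no arguments on a non-scalar tensor raises a `RuntimeError`, but you can pass a `gradient` tensor of the same shape instead. The sketch below uses a tensor of ones, which should give the same result as summing first.

```python
import torch

input_vector = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
output_vector = input_vector**2 + input_vector*3 + 1  # non-scalar output

# Weighting every output element by 1 is equivalent to calling .sum() first
output_vector.backward(gradient=torch.ones_like(output_vector))

print(input_vector.grad)  # tensor([5., 7., 9.])
```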


## Computational Graph

### Build complex computational graph

In [10]:
var_x = torch.tensor(1.0, requires_grad=True)
var_a = torch.tensor(2.0, requires_grad=True)
var_b = torch.tensor(3.0, requires_grad=True)

### Composite function

In [11]:
intermediate_y = var_a * var_x**2 + var_b * var_x + 1

In [12]:
final_z = intermediate_y**2 + 2*intermediate_y

In [13]:
print(f"x = {var_x.data}, a = {var_a.data}, b = {var_b.data}")

x = 1.0, a = 2.0, b = 3.0


In [14]:
print(f"y = a*x^2 + b*x + 1 = {intermediate_y.data}")

y = a*x^2 + b*x + 1 = 6.0


In [15]:
print(f"z = y^2 + 2*y = {final_z.data}")

z = y^2 + 2*y = 48.0


### Backpropagation

In [16]:
final_z.backward()

In [17]:
print(f'dz/dx = {var_x.grad}')

dz/dx = 98.0


In [18]:
print(f'dz/da = {var_a.grad}')

dz/da = 14.0


In [19]:
print(f'dz/db = {var_b.grad}')

dz/db = 14.0
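
The three tensors above are *leaf* tensors, which is why their `.grad` is filled in. The intermediate tensor `intermediate_y` is part of the graph too, but its gradient is normally discarded after the backward pass. As a small sketch, `retain_grad()` can be used to keep it so you can inspect `dz/dy` directly:

```python
import torch

var_x = torch.tensor(1.0, requires_grad=True)
var_a = torch.tensor(2.0, requires_grad=True)
var_b = torch.tensor(3.0, requires_grad=True)

intermediate_y = var_a * var_x**2 + var_b * var_x + 1
intermediate_y.retain_grad()  # keep dz/dy, which is otherwise not stored
final_z = intermediate_y**2 + 2*intermediate_y

print(var_x.is_leaf, intermediate_y.is_leaf)  # True False
print(final_z.grad_fn is not None)            # True: z remembers how it was built

final_z.backward()
print(f"dz/dy = {intermediate_y.grad}")       # 2*y + 2 at y = 6
```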


### Manual Verification

**🧩 Step 1: Define scalar variables with gradients**

```python
var_x = torch.tensor(1.0, requires_grad=True)
var_a = torch.tensor(2.0, requires_grad=True)
var_b = torch.tensor(3.0, requires_grad=True)
```

You define 3 scalar variables:

* `x = 1.0`, `a = 2.0`, `b = 3.0`
* All have `requires_grad=True` to track gradients

---

**🧮 Step 2: Build composite function**

```python
intermediate_y = var_a * var_x**2 + var_b * var_x + 1
final_z = intermediate_y**2 + 2 * intermediate_y
```

This builds:

### Inner function:

$$
y = a x^2 + b x + 1
$$

### Final function:

$$
z = y^2 + 2y
$$

So this is a **composite function**:
$z = (a x^2 + b x + 1)^2 + 2(a x^2 + b x + 1)$

---

**📤 Step 3: Forward Pass (Print Values)**

```python
print(f"x = {var_x.data}, a = {var_a.data}, b = {var_b.data}")
print(f"y = a*x^2 + b*x + 1 = {intermediate_y.data}")
print(f"z = y^2 + 2*y = {final_z.data}")
```

At:

* `x = 1`, `a = 2`, `b = 3`
* Then:

$$
y = 2 \times 1^2 + 3 \times 1 + 1 = 6 \\
z = 6^2 + 2 \times 6 = 36 + 12 = 48
$$

---

**🔁 Step 4: Backward (Auto Gradients)**

```python
final_z.backward()
```

Now PyTorch computes:

* $\frac{dz}{dx}$
* $\frac{dz}{da}$
* $\frac{dz}{db}$

Then you print:

```python
print(f"dz/dx = {var_x.grad}")
print(f"dz/da = {var_a.grad}")
print(f"dz/db = {var_b.grad}")
```

---

**📐 Step 5: Manual gradient for $dz/dx$**

Using chain rule:

$$
z = y^2 + 2y \Rightarrow \frac{dz}{dy} = 2y + 2 \\
y = ax^2 + bx + 1 \Rightarrow \frac{dy}{dx} = 2ax + b
$$

So:

$$
\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx} = (2y + 2)(2ax + b)
$$

```python
manual_dz_dx = (2*intermediate_y.data + 2) * (2*var_a.data*var_x.data + var_b.data)
```

Plug in the values:

* `y = 6`, `a = 2`, `x = 1`, `b = 3`

$$
\frac{dz}{dx} = (2 \times 6 + 2)(2 \times 2 \times 1 + 3) = 14 \times 7 = 98
$$

So this confirms:

```python
dz/dx = 98
```

✅ Matches `var_x.grad`

---

**✅ Summary**

| Variable | Meaning               | Gradient Calculated  |
| -------- | --------------------- | -------------------- |
| `x`      | Input                 | $(2y + 2)(2a x + b)$ |
| `a`      | Quadratic coefficient | $(2y + 2)(x^2)$      |
| `b`      | Linear coefficient    | $(2y + 2)(x)$        |
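
The formulas in the table can be checked against autograd for all three variables at once. The sketch below rebuilds the same graph and uses `torch.autograd.grad`, which returns the gradients directly instead of writing them to `.grad`. The `manual_*` names are illustrative.

```python
import torch

var_x = torch.tensor(1.0, requires_grad=True)
var_a = torch.tensor(2.0, requires_grad=True)
var_b = torch.tensor(3.0, requires_grad=True)

intermediate_y = var_a * var_x**2 + var_b * var_x + 1
final_z = intermediate_y**2 + 2*intermediate_y

# Gradients of z with respect to x, a, and b, returned as a tuple
grad_x, grad_a, grad_b = torch.autograd.grad(final_z, (var_x, var_a, var_b))

# Chain-rule formulas from the table, evaluated without tracking gradients
with torch.no_grad():
    dz_dy = 2*intermediate_y + 2
    manual_dz_dx = dz_dy * (2*var_a*var_x + var_b)
    manual_dz_da = dz_dy * var_x**2
    manual_dz_db = dz_dy * var_x

pairs = [("x", grad_x, manual_dz_dx),
         ("a", grad_a, manual_dz_da),
         ("b", grad_b, manual_dz_db)]
for name, auto, manual in pairs:
    print(f"dz/d{name}: autograd = {auto.item()}, manual = {manual.item()}")
```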

## Gradient Accumulation

### Create parameter

In [20]:
param_x = torch.tensor(1.0, requires_grad=True)

### Multiple forward and backward passes

In [21]:
for iteration in range(3):
  output_y = param_x**2 + iteration
  output_y.backward()
  print(f"Iteration {iteration + 1} : {param_x.grad}")

Iteration 1 : 2.0
Iteration 2 : 4.0
Iteration 3 : 6.0


#### 🧩 Code Breakdown

```python
param_x = torch.tensor(1.0, requires_grad=True)
```

* You define a scalar parameter `x = 1.0`
* `requires_grad=True` lets PyTorch track gradients

---

### 🔁 Loop with forward and backward

```python
for iteration in range(3):
    output_y = param_x**2 + iteration
    output_y.backward()
    print(f"Iteration {iteration+1}: gradient = {param_x.grad}")
```

Each time:

$$
y = x^2 + \text{iteration}
\Rightarrow \frac{dy}{dx} = 2x
\Rightarrow \frac{dy}{dx} = 2 \times 1.0 = 2
$$

But since **you didn’t clear gradients**, PyTorch **adds** each new gradient to the previous one.

---

### 🧪 Output:

```python
Iteration 1: gradient = 2.0
Iteration 2: gradient = 4.0   (2 + 2)
Iteration 3: gradient = 6.0   (4 + 2)
```

---

## 🔧 Why gradients accumulate

By default, PyTorch does **not reset `.grad`** after `.backward()`, so gradients **accumulate** across calls. This is useful when:

* Doing custom gradient accumulation across mini-batches
* Manually controlling gradient updates

## ✅ Summary

| Concept                  | Meaning                                 |
| ------------------------ | --------------------------------------- |
| `.backward()`            | Adds gradient to `.grad`                |
| `.grad.zero_()`          | Clears stored gradient                  |
| Why gradients accumulate | By design (e.g. for batch accumulation) |
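
To make the mini-batch use case concrete, here is a minimal sketch on made-up data (the inputs, targets, starting weight, and chunk count are all assumptions for illustration). It computes an MSE gradient once on the full batch, then again by accumulating over two equal-sized chunks. Dividing each chunk's loss by the number of chunks makes the accumulated gradient match the full-batch one.

```python
import torch

torch.manual_seed(0)
inputs = torch.randn(8, 1)          # hypothetical data
targets = 2.0 * inputs + 1.0

weight = torch.tensor(0.5, requires_grad=True)

# Gradient of the MSE loss computed on the full batch
full_loss = ((weight * inputs - targets)**2).mean()
full_loss.backward()
full_batch_grad = weight.grad.clone()
weight.grad.zero_()

# Same gradient built up from two mini-batches of 4 samples each
n_chunks = 2
for chunk_x, chunk_y in zip(inputs.chunk(n_chunks), targets.chunk(n_chunks)):
    chunk_loss = ((weight * chunk_x - chunk_y)**2).mean() / n_chunks
    chunk_loss.backward()           # adds to weight.grad

print(f"Full batch:  {full_batch_grad.item():.6f}")
print(f"Accumulated: {weight.grad.item():.6f}")
```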

#### Clear gradients

In [22]:
print(f"Gradient: {param_x.grad}")
param_x.grad.zero_()
print(f"Gradient after clearing: {param_x.grad}")

Gradient: 6.0
Gradient after clearing: 0.0


## Disable Gradient Demo

In [23]:
input_x = torch.tensor(1.0, requires_grad=True)

In [24]:
output_y1 = input_x ** 2
print(f"Normal computation: requires_grad = {output_y1.requires_grad}")

Normal computation: requires_grad = True


### Use torch.no_grad() to disable gradient

In [25]:
with torch.no_grad():
  output_y2 = input_x ** 2
  print(f"no_grad context: requires_grad = {output_y2.requires_grad}")

no_grad context: requires_grad = False


### Use detach() to detach tensor

In [26]:
output_y3 = input_x ** 2
print(f"Before detach: requires_grad = {output_y3.requires_grad}")
y3_detached = output_y3.detach()
print(f"After detach: requires_grad = {y3_detached.requires_grad}")

Before detach: requires_grad = True
After detach: requires_grad = False
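
The `requires_grad` flag shows that tracking is off. The practical effect is that no gradient flows back through a detached tensor. In the sketch below (values chosen for illustration), both factors depend on `input_x`, but the detached one is treated as a constant during backpropagation.

```python
import torch

input_x = torch.tensor(3.0, requires_grad=True)

tracked = input_x ** 2                 # part of the graph
constant = (input_x ** 2).detach()     # same value, cut off from the graph
output = tracked * constant

output.backward()
# Only the tracked factor contributes: constant * 2x = 9 * 6 = 54
# (without detach, d/dx of x^4 would be 4x^3 = 108)
print(input_x.grad)  # tensor(54.)

# A result computed under no_grad has no grad_fn at all
with torch.no_grad():
    untracked = input_x ** 2
print(untracked.grad_fn)  # None
```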


## Linear Regression Autograd Example

In [27]:
torch.manual_seed(42)
n_samples = 100
true_weight = 2.0
true_bias = 1.0

In [28]:
input_x = torch.randn(n_samples, 1)
print(input_x.shape)
print(input_x[:5])

torch.Size([100, 1])
tensor([[ 1.9269],
        [ 1.4873],
        [ 0.9007],
        [-2.1055],
        [ 0.6784]])


In [29]:
target_y = input_x * true_weight + true_bias + 0.1 * torch.randn(n_samples, 1)

### Initialize parameters

In [30]:
weight_param = torch.tensor(0.0, requires_grad=True)
bias_param = torch.tensor(0.0, requires_grad=True)

### Train parameters

In [31]:
learning_rate = 0.01
n_epochs = 100

In [32]:
loss_history = []

In [33]:
for epoch in range(n_epochs):
  # Forward pass
  predictions = weight_param * input_x + bias_param
  loss = ((predictions - target_y)**2).mean() # MSE

  # Backward pass (backpropagation)
  loss.backward()

  # Parameter update
  with torch.no_grad():
    weight_param -= learning_rate * weight_param.grad
    bias_param -= learning_rate * bias_param.grad

    # zero gradients
    weight_param.grad.zero_()
    bias_param.grad.zero_()

  loss_history.append(loss.item())

  if ((epoch + 1) % 20 == 0):
    print(f"Epoch {epoch + 1}: Loss = {loss.item():.4f}, w = {weight_param.item():.4f}, b = {bias_param.item():.4f}")

print("\nFinal parameters:")
print(f"Learned weight: {weight_param.item():.4f} (True: {true_weight})")
print(f"Learned bias: {bias_param.item():.4f} (True: {true_bias})")

Epoch 20: Loss = 2.3416, w = 0.6630, b = 0.3662
Epoch 40: Loss = 1.0303, w = 1.1058, b = 0.5999
Epoch 60: Loss = 0.4566, w = 1.4017, b = 0.7487
Epoch 80: Loss = 0.2051, w = 1.5996, b = 0.8432
Epoch 100: Loss = 0.0947, w = 1.7320, b = 0.9030

Final parameters:
Learned weight: 1.7320 (True: 2.0)
Learned bias: 0.9030 (True: 1.0)
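
The run above stops at `w ≈ 1.73` because 100 steps at a learning rate of 0.01 are not enough to converge. As a sketch of the same loop written with a built-in optimizer, the version below uses `torch.optim.SGD` in place of the manual update and `zero_()` calls. The larger learning rate (0.1) and epoch count (200) are illustrative assumptions, not values from the original run.

```python
import torch

torch.manual_seed(42)
n_samples = 100
input_x = torch.randn(n_samples, 1)
target_y = input_x * 2.0 + 1.0 + 0.1 * torch.randn(n_samples, 1)

weight_param = torch.tensor(0.0, requires_grad=True)
bias_param = torch.tensor(0.0, requires_grad=True)

# The optimizer owns the update rule; lr is an illustrative choice
optimizer = torch.optim.SGD([weight_param, bias_param], lr=0.1)

for epoch in range(200):
    optimizer.zero_grad()                                # clear old gradients
    predictions = weight_param * input_x + bias_param    # forward pass
    loss = ((predictions - target_y)**2).mean()          # MSE
    loss.backward()                                      # backward pass
    optimizer.step()                                     # parameter update

print(f"Learned weight: {weight_param.item():.4f} (True: 2.0)")
print(f"Learned bias: {bias_param.item():.4f} (True: 1.0)")
```

`optimizer.step()` does the same job as the `with torch.no_grad():` update, and `optimizer.zero_grad()` replaces the per-parameter `grad.zero_()` calls.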


## Custom function demo