PyTorch can automatically compute the gradient of a function with respect to the tensors it depends on.
When creating the tensor, we must indicate that it requires gradient computation by setting the flag `requires_grad=True`.

Sample Program:

In [1]:
import torch 
import numpy as np
x = torch.rand(3,requires_grad=True) 
print(x)

tensor([0.6694, 0.6904, 0.6900], requires_grad=True)
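To see what the flag changes, the following sketch compares a tensor created without `requires_grad` to one created with it. Only the second one records the operations applied to it (through `grad_fn`) and receives a `.grad` after `backward()`. The variable names are illustrative:

```python
import torch

# Without requires_grad, autograd treats the tensor as a constant
c = torch.rand(3)
print(c.requires_grad)  # False

# With requires_grad=True, operations on the tensor are recorded in a graph
x = torch.rand(3, requires_grad=True)
y = (2 * x).sum()
print(y.grad_fn is not None)  # True

print(x.grad)  # None: no gradient has been computed yet
y.backward()
print(x.grad)  # tensor([2., 2., 2.]), since d(sum(2x))/dx_i = 2
```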


<b><u>Problem 1:</u></b>

Consider that $ y $ and $ z $ are calculated as follows:

$$
y = x^2
$$

$$
z = 2y + 3
$$

We are interested in how output $ z $ changes with input $ x $:

$$
\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}
$$

$$
\frac{dz}{dx} = 2 \cdot 2x = 4x
$$

For input $x = 3.5$, we get $z = 2(3.5)^2 + 3 = 27.5$ and $\frac{dz}{dx} = 4 \cdot 3.5 = 14$.

In [2]:
# set up simple graph relating x, y and z 
x = torch.tensor(3.5, requires_grad=True) 
y = x*x 
z = 2*y + 3 
print("x: ", x) 
print("y = x*x: ", y) 
print("z= 2*y + 3: ", z) 
# work out gradients 
z.backward() 
print("Working out gradients dz/dx") 
# what is gradient at x = 3.5 
print("Gradient at x = 3.5: ", x.grad) 

x:  tensor(3.5000, requires_grad=True)
y = x*x:  tensor(12.2500, grad_fn=<MulBackward0>)
z= 2*y + 3:  tensor(27.5000, grad_fn=<AddBackward0>)
Working out gradients dz/dx
Gradient at x = 3.5:  tensor(14.)
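By default, autograd keeps `.grad` only for leaf tensors such as `x`. Calling `retain_grad()` on the intermediate tensor `y` also exposes the factor $\frac{dz}{dy} = 2$, so both factors of $\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}$ can be inspected. A minimal sketch on the same graph:

```python
import torch

x = torch.tensor(3.5, requires_grad=True)
y = x * x
y.retain_grad()  # keep the gradient of the non-leaf tensor y
z = 2 * y + 3
z.backward()

print("dz/dy:", y.grad)  # tensor(2.)
print("dz/dx:", x.grad)  # tensor(14.) = dz/dy * dy/dx = 2 * 7
```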


<b><u> Problem 2: </u></b> 

Consider the function $ f(x) = (x - 2)^2 $.

Compute $ \frac{d}{dx} f(x) $ and then $ f'(1) $. Analytically, $ f'(x) = 2(x - 2) $, so $ f'(1) = -2 $. Write code to check the analytical gradient against PyTorch's.

In [3]:
def f(x): 
    return (x-2)**2 
def fp(x): 
    return 2*(x-2)  
x = torch.tensor([1.0], requires_grad=True) 
y = f(x) 
y.backward() 
print('Analytical f\'(x):', fp(x)) 
print('PyTorch\'s f\'(x):', x.grad)

Analytical f'(x): tensor([-2.], grad_fn=<MulBackward0>)
PyTorch's f'(x): tensor([-2.])
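A third, independent check is a numerical derivative. The sketch below uses a central finite difference, $\frac{f(x+h) - f(x-h)}{2h}$, through a hypothetical helper `numerical_derivative`; the step `h=1e-4` and the use of double precision are assumptions chosen to keep rounding error small:

```python
import torch

def f(x):
    return (x - 2) ** 2

# Hypothetical helper: central finite difference (f(x + h) - f(x - h)) / (2h)
def numerical_derivative(func, x, h=1e-4):
    return (func(x + h) - func(x - h)) / (2 * h)

# Double precision keeps the finite-difference rounding error small
x = torch.tensor([1.0], dtype=torch.float64, requires_grad=True)
f(x).backward()

with torch.no_grad():  # the numerical estimate does not need a graph
    approx = numerical_derivative(f, x)

print("Autograd f'(1):", x.grad)
print("Finite difference f'(1):", approx)
```

PyTorch also provides `torch.autograd.gradcheck`, which automates this kind of comparison for double-precision inputs.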


<b><u> Problem 3: </u></b>

Define a function $ y = x^2 + 5 $. The tensor $ y $ carries not only the value of the expression evaluated at $ x $ but also a `grad_fn` attribute, which records the operation that produced $ y $ so that autograd can compute the gradient $\frac{\partial y}{\partial x}$ during the backward pass. Compare the result with the analytical gradient.

In [4]:
x = torch.tensor([2.0]) 
x.requires_grad_(True)  #indicate we will need the gradients with respect to this variable 
y = x**2 + 5 
print(y)

tensor([9.], grad_fn=<AddBackward0>)
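Note that `grad_fn` is a node in the computation graph, not the numerical value of the gradient. Tensors created directly by the user are *leaf* tensors and have no `grad_fn`; tensors produced by operations do. A quick check:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = x**2 + 5

# x was created by the user: a leaf tensor with no grad_fn
print(x.is_leaf, x.grad_fn)  # True None
# y was produced by operations: a non-leaf tensor that records its grad_fn
print(y.is_leaf, y.grad_fn is not None)  # False True
```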


To evaluate the partial derivative $\frac{\partial y}{\partial x}$, we call `.backward()` on $ y $; the resulting gradient is stored in `x.grad`.

In [5]:
y.backward()  #dy/dx 
print('PyTorch gradient:', x.grad) 

# Compare with the analytical gradient of y = x**2 + 5.
# torch.no_grad() uses only the tensor's value, without recording gradient information
with torch.no_grad():
    dy_dx = 2*x  # analytical gradient
print('Analytical gradient:', dy_dx)

PyTorch gradient: tensor([4.])
Analytical gradient: tensor([4.])
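Because the result is *stored* in `x.grad`, repeated calls to `backward()` add to it rather than overwrite it. The sketch below runs two backward passes (rebuilding `y` each time, since the graph is freed after `backward()`) and then resets the gradient with `x.grad.zero_()`:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)

# Two separate backward passes: the gradients add up in x.grad
for _ in range(2):
    y = x**2 + 5
    y.backward()
accumulated = x.grad.clone()
print(accumulated)  # tensor([8.]) = 4 + 4

# Reset the stored gradient before the next computation
x.grad.zero_()
y = x**2 + 5
y.backward()
print(x.grad)  # tensor([4.])
```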


<span style="font-size: 18px;">
<b><u>Problem 4:</u></b>

Write a function to compute the gradient of the sigmoid function $ \sigma(x) = \frac{1}{1 + e^{-x}} $.



Express $ \sigma(x) $ as a composition of several elementary functions:  $ \sigma(x) = s(c(b(a(x)))) $

where:

- $ a(x) = -x $  
- $ b(a) = e^a $  
- $ c(b) = 1 + b $  
- $ s(c) = \frac{1}{c} $

Each intermediate variable is a basic expression for which the local gradients can be easily computed.

The input to this function is $ x $ and the output is node $ s $. We want the gradient of $ s $ with respect to $ x $, $\frac{\partial s}{\partial x}$. To reuse the intermediate computations, we apply the chain rule:

$\frac{\partial s}{\partial x} = \frac{\partial s}{\partial c} \cdot \frac{\partial c}{\partial b} \cdot \frac{\partial b}{\partial a} \cdot \frac{\partial a}{\partial x}$
</span>
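Writing out each local gradient makes the chain rule concrete: these are exactly the factors $-1/c^2$, $1$, $e^{a}$, and $-1$ multiplied in the manual backward pass. Substituting the intermediate variables back in also recovers the standard closed form $\sigma'(x) = \sigma(x)(1 - \sigma(x))$:

$$
\frac{\partial s}{\partial c} = -\frac{1}{c^2}, \quad \frac{\partial c}{\partial b} = 1, \quad \frac{\partial b}{\partial a} = e^{a}, \quad \frac{\partial a}{\partial x} = -1
$$

$$
\frac{\partial s}{\partial x} = \left(-\frac{1}{c^2}\right)(1)\left(e^{-x}\right)(-1) = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\bigl(1 - \sigma(x)\bigr)
$$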

In [6]:
def grad_sigmoid_manual(x): 
    """ Implements the gradient of the logistic sigmoid function  
        sigma(x) = 1 / (1 + e^{-x})  
    """ 
    # Forward pass, keeping track of intermediate values for use in the backward pass 
    a = -x         # -x in denominator 
    b = np.exp(a)  # e^{-x} in denominator 
    c = 1 + b      # 1 + e^{-x} in denominator 
    s = 1.0 / c    # Final result, 1.0 / (1 + e^{-x}) 
    
    # Backward pass 
    dsdc = (-1.0 / (c**2)) 
    dsdb = dsdc * 1 
    dsda = dsdb * np.exp(a) 
    dsdx = dsda * (-1) 
    
    return dsdx

def sigmoid(x): 
    y = 1.0 / (1.0 + torch.exp(-x)) 
    return y



In [7]:
input_x = 2.0  
x = torch.tensor(input_x).requires_grad_(True) 
y = sigmoid(x) 
y.backward() 
# Compare the results of manual and automatic gradient functions: 
print('autograd:', x.grad.item()) 
print('manual:', grad_sigmoid_manual(input_x)) 

autograd: 0.10499356687068939
manual: 0.1049935854035065
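The two results agree to about seven decimal places. The remaining difference is a precision effect: `torch.tensor(2.0)` defaults to 32-bit floats, while `grad_sigmoid_manual` computes with Python/NumPy 64-bit floats. The sketch below repeats the autograd computation in `torch.float64`, using PyTorch's built-in `torch.sigmoid`, and compares it with the closed form $\sigma(x)(1 - \sigma(x))$:

```python
import numpy as np
import torch

input_x = 2.0

# Autograd in double precision (PyTorch's default dtype is float32)
x = torch.tensor(input_x, dtype=torch.float64, requires_grad=True)
y = torch.sigmoid(x)  # PyTorch's built-in sigmoid
y.backward()

# Closed form sigma'(x) = sigma(x) * (1 - sigma(x)), computed in NumPy float64
s = 1.0 / (1.0 + np.exp(-input_x))
closed_form = s * (1.0 - s)

print('autograd (float64):', x.grad.item())
print('closed form (float64):', closed_form)
```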


<b><u> Problem 5: </u></b>

Compute the gradient of the function $ y = 8x^4 + 3x^3 + 7x^2 + 6x + 3 $ and verify PyTorch's gradient against the analytical gradient $ \frac{dy}{dx} = 32x^3 + 9x^2 + 14x + 6 $.

In [8]:
x = torch.tensor(2.0, requires_grad=True)
# Analytical gradient for comparison
analytical_gradient = 32*x**3 + 9*x**2 + 14*x + 6
print("Analytical Gradient:", analytical_gradient.item())

Analytical Gradient: 326.0


In [9]:
y = 8*x**4 + 3*x**3 + 7*x**2 + 6*x + 3
y.backward()
x.grad

tensor(326.)
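The check at $x = 2$ extends to several inputs in one pass. Calling `backward()` without arguments requires a scalar output, so the sketch sums the element-wise outputs first; because each $y_i$ depends only on $x_i$, the gradient of the sum with respect to $x_i$ is exactly $dy_i/dx_i$. The sample points are arbitrary choices:

```python
import torch

# Evaluate the polynomial at several points at once
x = torch.tensor([-1.0, 0.0, 1.0, 2.0], requires_grad=True)
y = 8*x**4 + 3*x**3 + 7*x**2 + 6*x + 3

# backward() needs a scalar; since y_i depends only on x_i,
# d(sum(y))/dx_i equals dy_i/dx_i
y.sum().backward()

with torch.no_grad():
    analytical = 32*x**3 + 9*x**2 + 14*x + 6

print('PyTorch:', x.grad)        # tensor([-31., 6., 61., 326.])
print('Analytical:', analytical)  # tensor([-31., 6., 61., 326.])
```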

<b><u> Problem 6: </u></b>

Work out the gradient <span style="font-size: 18px;">$ \frac{da}{dw} $ </span> for $ a = \mathrm{ReLU}(wx + b) $ and compare the result with the analytical gradient.

In [10]:
import torch

# Define w, x, b as tensors with requires_grad=True for gradient computation
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(1.0)
b = torch.tensor(0.5)

# Define the function a = ReLU(wx + b)
a = torch.relu(w * x + b)

# Compute the gradient da/dw
a.backward()

# Print the computed gradient
print("Computed Gradient da/dw:", w.grad.item())

# Analytical gradient for comparison:
# the ReLU derivative is 1 if wx + b > 0 and 0 otherwise, so da/dw = x when wx + b > 0, else 0.
wx_b = w * x + b
analytical_gradient = x.item() if wx_b > 0 else 0.0
print("Analytical Gradient da/dw:", analytical_gradient)


Computed Gradient da/dw: 1.0
Analytical Gradient da/dw: 1.0
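The comparison above covers only the active case, where $wx + b > 0$. The sketch below, built around a hypothetical helper `relu_grads`, also makes `x` and `b` require gradients and evaluates both sides of the ReLU. When the unit is active, $\frac{\partial a}{\partial w} = x$, $\frac{\partial a}{\partial x} = w$, and $\frac{\partial a}{\partial b} = 1$; when $wx + b < 0$, every partial derivative is 0. The value $b = -3$ is an arbitrary choice that makes the pre-activation negative:

```python
import torch

# Hypothetical helper: returns (da/dw, da/dx, da/db) for a = ReLU(w*x + b)
def relu_grads(w_val, x_val, b_val):
    # All three inputs require grad so every partial derivative is available
    w = torch.tensor(w_val, requires_grad=True)
    x = torch.tensor(x_val, requires_grad=True)
    b = torch.tensor(b_val, requires_grad=True)
    a = torch.relu(w * x + b)
    a.backward()
    return w.grad.item(), x.grad.item(), b.grad.item()

# Active unit (w*x + b = 2.5 > 0): da/dw = x, da/dx = w, da/db = 1
print(relu_grads(2.0, 1.0, 0.5))   # (1.0, 2.0, 1.0)
# Inactive unit (w*x + b = -1.0 < 0): every gradient is 0
print(relu_grads(2.0, 1.0, -3.0))
```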
