In [1]:
import torch

We want to calculate the gradient of some function with respect to x. Setting **requires_grad=True** on x tells PyTorch to record every operation on it in a **computation graph**

By default, **requires_grad=False**

In [2]:
x = torch.randn(size=(3, ), requires_grad=True)
x

tensor([-2.1159, -0.8031, -2.4649], requires_grad=True)
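To make the default concrete, here is a minimal sketch that compares a tensor created without the flag to one created with it. The names `a` and `b` are only illustrative.

```python
import torch

a = torch.randn(3)                      # requires_grad defaults to False
b = torch.randn(3, requires_grad=True)  # autograd will track operations on b

print(a.requires_grad)  # False
print(b.requires_grad)  # True
print(b.is_leaf)        # True: b was created by the user, not by an operation
print(b.grad)           # None: no backward pass has run yet
```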

Any operation on x now adds a node to the computational graph.

Because the + operation is used here, **grad_fn=\<AddBackward0\>**

In [3]:
y = x+2
print(y)
print(y.grad_fn)

tensor([-0.1159,  1.1969, -0.4649], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x7a8849cd3f40>
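The same pattern holds for other operations: each result remembers the function that produced it, while a tensor created directly by the user has no grad_fn. A small sketch (the variable names are illustrative):

```python
import torch

x = torch.randn(3, requires_grad=True)
print(x.grad_fn)  # None: x is a leaf tensor, nothing produced it

y_add = x + 2
y_mul = x * 2
y_pow = x ** 2

# grad_fn.name() returns the name of the backward node as a string
names = [t.grad_fn.name() for t in (y_add, y_mul, y_pow)]
print(names)  # ['AddBackward0', 'MulBackward0', 'PowBackward0']
```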


### Calculating gradients

In [4]:
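z = y*y*2
print(z)
# z is a vector, so backward() needs a weight vector v of the same shape.
# Passing x uses x itself as v, so x.grad = (dz/dx) * x = 4*y*x, not the plain derivative 4*y.
z.backward(x)
print(x.grad)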

tensor([0.0269, 2.8652, 0.4322], grad_fn=<MulBackward0>)
tensor([ 0.9813, -3.8449,  4.5832])
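When backward() is called on a vector with an argument v, the result stored in x.grad is the element-wise derivative multiplied by v. The sketch below checks this against the hand-derived formula for z = 2*(x+2)^2, using fixed inputs and an arbitrary, purely illustrative v.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x + 2
z = y * y * 2                        # z_i = 2*(x_i + 2)^2, so dz_i/dx_i = 4*(x_i + 2)

v = torch.tensor([1.0, 0.5, 0.0])    # illustrative weights, one per element of z
z.backward(v)

expected = 4 * (x.detach() + 2) * v  # analytic derivative, weighted by v
print(x.grad)                        # values 12, 8, 0
print(torch.allclose(x.grad, expected))  # True
```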


Since y has a single element, backward() can be called without arguments and the gradient is created implicitly

In [5]:
x = torch.ones(size=(1, ), requires_grad=True)
y = x**2
# dy/dx
y.backward() # Computes dy/dx, stores it in x.grad, and returns None
print(x.grad)

tensor([2.])


**grad can be implicitly created only for scalar outputs.**

For a vector output, we have to create a vector of the same size and pass it as the **argument to .backward()**

**v** weights how much each element of z contributes to the gradient (backward computes a vector-Jacobian product). A **tensor of ones** gives every element equal weight, which is the same as taking the gradient of z.sum()

In [6]:
x = torch.ones(size=(3, ), requires_grad=True)
y = x**2
z = y*y*2
print(z)

v = torch.ones_like(z)
# v = torch.tensor([0.1, 1.0, 0.001], dtype=torch.float32)
z.backward(v)
print(x.grad)

tensor([2., 2., 2.], grad_fn=<MulBackward0>)
tensor([8., 8., 8.])
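One way to read the role of v: calling z.backward(v) gives the same gradient as reducing z to the scalar (v*z).sum() and calling backward() on that. The sketch below compares the two, reusing the values from the commented-out v in the cell above.

```python
import torch

v = torch.tensor([0.1, 1.0, 0.001], dtype=torch.float32)

# Route 1: pass v directly to backward()
x1 = torch.ones(size=(3, ), requires_grad=True)
y1 = x1**2
z1 = y1*y1*2
z1.backward(v)

# Route 2: build the weighted scalar explicitly, then call backward() with no argument
x2 = torch.ones(size=(3, ), requires_grad=True)
y2 = x2**2
z2 = y2*y2*2
(v * z2).sum().backward()

print(x1.grad)  # 8 * v, since dz/dx = 8*x^3 = 8 at x = 1
print(torch.allclose(x1.grad, x2.grad))  # True
```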


In [7]:
# d(z_mean)/dx
x = torch.randn(size=(3, ), requires_grad=True)
y = x+2
z = y*y*2
z_mean = z.mean()
print(z_mean)
z_mean.backward()
print(x.grad)

tensor(15.8178, grad_fn=<MeanBackward0>)
tensor([3.4531, 4.9754, 2.3458])


### Prevent tracking of gradients:

1. x.requires_grad_(False) -> Modifies x in-place

2. x.detach() -> Creates a new tensor that doesn't require the gradient

3. with torch.no_grad(): -> Operations inside the block are not tracked

In [8]:
x = torch.rand(size=(3, ), requires_grad=True)
print(x)

# Modifies x in-place
x.requires_grad_(False)
print(x)

tensor([0.5594, 0.7272, 0.6071], requires_grad=True)
tensor([0.5594, 0.7272, 0.6071])


In [9]:
x = torch.rand(size=(3, ), requires_grad=True)
print(x)

y = x.detach()
print(y)

tensor([0.1746, 0.7210, 0.6549], requires_grad=True)
tensor([0.1746, 0.7210, 0.6549])


In [10]:
x = torch.rand(size=(3, ), requires_grad=True)
print(x)

with torch.no_grad():
    y = x+2
    print(y)

tensor([0.5700, 0.5308, 0.0444], requires_grad=True)
tensor([2.5700, 2.5308, 2.0444])
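The three options differ in what they change. As a rough side-by-side sketch: detach() returns a new tensor that shares memory with x, no_grad() only affects results computed inside the block, and requires_grad_(False) changes x itself.

```python
import torch

x = torch.rand(size=(3, ), requires_grad=True)

# detach(): new tensor object, same underlying memory, no gradient tracking
d = x.detach()
print(d.requires_grad)               # False
print(d.data_ptr() == x.data_ptr())  # True: d and x share storage

# no_grad(): results computed inside the block are not tracked
with torch.no_grad():
    y = x * 2
print(y.requires_grad)               # False
print(x.requires_grad)               # True: x itself is unchanged

# requires_grad_(False): flips the flag on x in-place
x.requires_grad_(False)
print(x.requires_grad)               # False
```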


**backward()** keeps accumulating gradients, i.e. it adds each new gradient to the value already stored in .grad, until the gradients are explicitly reset.

Here, each epoch produces a gradient of 3, which is summed with the previously stored values

In [11]:
weights = torch.ones(size=(4, ), requires_grad=True)

# Dummy operation which simulates w*x+b where b=0
for epoch in range(1):
    model_output = (weights*3).sum()
    model_output.backward()
    print(weights.grad)

print("-"*50)
for epoch in range(2):
    model_output = (weights*3).sum()
    model_output.backward()
    print(weights.grad)

tensor([3., 3., 3., 3.])
--------------------------------------------------
tensor([6., 6., 6., 6.])
tensor([9., 9., 9., 9.])


To prevent this, zero out the .grad of the variable with respect to which the gradient is being calculated

In [12]:
weights = torch.ones(size=(4, ), requires_grad=True)

# Dummy operation which simulates w*x+b
for epoch in range(1):
    model_output = (weights*3).sum()
    model_output.backward()
    print(weights.grad)
    # Emptying the grads
    weights.grad.zero_()

print("-"*50)
for epoch in range(2):
    model_output = (weights*3).sum()
    model_output.backward()
    print(weights.grad)
    # Emptying the grads
    weights.grad.zero_()

tensor([3., 3., 3., 3.])
--------------------------------------------------
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
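Another way to reset, shown here only as an alternative sketch, is to set .grad back to None instead of filling it with zeros. The next backward() call then creates a fresh gradient tensor.

```python
import torch

weights = torch.ones(size=(4, ), requires_grad=True)

for epoch in range(3):
    model_output = (weights*3).sum()
    model_output.backward()
    print(weights.grad)                    # tensor([3., 3., 3., 3.]) on every epoch
    grad_snapshot = weights.grad.clone()   # kept only so the value can be inspected later
    # Dropping the gradient tensor entirely instead of zeroing it in-place
    weights.grad = None
```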


### Using optimizers

In [13]:
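weights = torch.rand(size=(3, 3), requires_grad=True)

optimizer = torch.optim.SGD(params=[weights], lr=0.01)
# No backward() has run yet, so weights.grad is None and step() leaves weights unchanged
optimizer.step()
# Resets the gradients of every parameter the optimizer manages
optimizer.zero_grad()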
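In a training loop the usual order is backward(), then step(), then zero_grad(). The sketch below reuses the dummy (weights*3).sum() operation from the earlier cells as a stand-in for a real loss; the seed and the number of epochs are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)  # arbitrary seed so the run is repeatable
weights = torch.rand(size=(3, 3), requires_grad=True)
optimizer = torch.optim.SGD(params=[weights], lr=0.01)

before = weights.detach().clone()

for epoch in range(3):
    loss = (weights*3).sum()   # dummy stand-in for a loss
    loss.backward()            # fills weights.grad with 3s
    optimizer.step()           # weights <- weights - lr * weights.grad
    optimizer.zero_grad()      # reset so the next epoch does not accumulate

# Each step subtracts lr * grad = 0.01 * 3 = 0.03, so three steps subtract about 0.09
change = before - weights.detach()
print(change)
```

Without the zero_grad() call, the second and third steps would use accumulated gradients of 6 and 9, and the total change would no longer be 0.09.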