## PyTorch Autograd

Training a neural network relies on error back-propagation. The model parameters are adjusted by gradient descent, which requires the gradient of the loss function with respect to each parameter, $\frac{\partial L}{\partial \theta}$. PyTorch's autograd engine computes these gradients automatically by recording the operations applied to tensors in a computational graph.
![Computational graph of the example (source: https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html)](comp-graph.png "Computational graph")
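
For reference, the gradient descent step that consumes this gradient can be written as below. Here $\alpha$ is the learning rate (the symbol is chosen to match the `alpha` variable used in the training cells of this notebook), and $\theta$ stands for any trainable parameter such as `w` or `b`.

$$
\theta \leftarrow \theta - \alpha \, \frac{\partial L}{\partial \theta}
$$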

In [40]:
import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # weights, tracked by autograd
b = torch.randn(3, requires_grad=True)  # bias, tracked by autograd
z = torch.matmul(x, w) + b  # logits
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

In [41]:
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

Gradient function for z = <AddBackward0 object at 0x7f2989ad5220>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x7f2989ad50d0>


In [42]:
print(f"Gradient function for x = {x.grad_fn}")

Gradient function for x = None


In [43]:
print(f"Gradient function for y = {y.grad_fn}")

Gradient function for y = None
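
The `None` values above are expected: `grad_fn` is only set on tensors that were produced by a tracked operation. A small sketch that makes this visible, using the `is_leaf` and `requires_grad` attributes of `torch.Tensor` (the loop and variable names are only for illustration):

```python
import torch

x = torch.ones(5)                          # plain input, not tracked by autograd
w = torch.randn(5, 3, requires_grad=True)  # trainable parameter created by the user
z = torch.matmul(x, w)                     # result of a tracked operation

# Tensors created directly by the user are leaves and have no grad_fn,
# whether or not they require gradients. Only z was produced by an
# operation on a tracked tensor, so only z carries a grad_fn.
for name, t in [("x", x), ("w", w), ("z", z)]:
    print(f"{name}: requires_grad={t.requires_grad}, "
          f"is_leaf={t.is_leaf}, grad_fn={t.grad_fn}")
```

After `backward()`, gradients are stored in the `.grad` attribute of leaf tensors that have `requires_grad=True`, which is why `w.grad` and `b.grad` are the values to read.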


In [44]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.0118, 0.2882, 0.2693],
        [0.0118, 0.2882, 0.2693],
        [0.0118, 0.2882, 0.2693],
        [0.0118, 0.2882, 0.2693],
        [0.0118, 0.2882, 0.2693]])
tensor([0.0118, 0.2882, 0.2693])
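
Every row of `w.grad` equals `b.grad` here. A likely explanation, under the assumption that the loss uses its default mean reduction over the 3 outputs: the gradient with respect to each logit is $(\sigma(z_j) - y_j)/3$, the gradient for `b` is exactly that vector, and the gradient for `w` is the outer product of `x` with it. Because `x` is all ones, the rows are identical. The sketch below checks this closed form against autograd; the seed and the names `dz`, `expected_b`, `expected_w` are illustrative choices, not part of the original cells.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so the check is repeatable

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Closed-form gradient of the mean BCE-with-logits loss w.r.t. the logits
dz = (torch.sigmoid(z.detach()) - y) / y.numel()
expected_b = dz                   # dL/db_j  = dz_j
expected_w = torch.outer(x, dz)   # dL/dw_ij = x_i * dz_j, shape (5, 3)

print(torch.allclose(b.grad, expected_b, atol=1e-6))
print(torch.allclose(w.grad, expected_w, atol=1e-6))
```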


In [45]:
z = torch.matmul(x, w)+b
print(z.requires_grad)

True


In [46]:
with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)

False
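
Another way to get an untracked result is to call `detach()` on an existing tensor. The sketch below reuses the shapes from the cells above; `z_det` is just an illustrative name.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b   # tracked: depends on w and b
z_det = z.detach()           # same values, cut off from the graph

print(z.requires_grad)
print(z_det.requires_grad)
```

`torch.no_grad()` disables tracking for everything computed inside the block, while `detach()` acts on a single tensor that has already been computed.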


In [82]:
import torch
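alpha = 0.1  # learning rate

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b  # computed once, outside the loop

for i in range(10):
    # z still holds the values computed from the initial w and b,
    # so the printed loss does not change even though w and b are updated
    loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
    print(loss)
    loss.backward(retain_graph=True)  # the graph behind z is reused, so it must be retained
    with torch.no_grad():
        # gradient descent step, not tracked by autograd
        w.sub_(w.grad*alpha)
        b.sub_(b.grad*alpha)
        # reset the gradients so they do not accumulate across iterations
        w.grad = None
        b.grad = None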

tensor(1.3036, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(1.3036, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(1.3036, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(1.3036, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(1.3036, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(1.3036, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(1.3036, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(1.3036, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(1.3036, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(1.3036, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
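
The loss stays at the same value because `z` was computed once, before the loop, so updating `w` and `b` in place never changes it. A minimal sketch of the effect, where the fixed shift of `0.5` is a made-up stand-in for a real gradient step and `z_stale`, `z_fresh` and `bce` are illustrative names:

```python
import torch

bce = torch.nn.functional.binary_cross_entropy_with_logits

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z_stale = torch.matmul(x, w) + b   # forward pass with the initial parameters

with torch.no_grad():
    w.sub_(0.5)                    # stand-in for a parameter update

z_fresh = torch.matmul(x, w) + b   # forward pass with the updated parameters

# z_stale keeps its old values; only z_fresh reflects the new w
print(bce(z_stale, y).item(), bce(z_fresh, y).item())
```

Recomputing the forward pass inside the loop, as the next cell does, is what lets the loss respond to the updates.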


In [104]:
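# Reference: https://medium.com/@mrityu.jha/understanding-the-grad-of-autograd-fc8d266fd6cf
import torch
alpha = 0.1  # learning rate

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

n = 100  # number of gradient descent steps
for i in range(n):
    # recompute the forward pass with the current w and b
    z = torch.matmul(x, w)+b
    loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
    if i % (n // 10) == 0:  # print every 10th step
        print(loss)
    # a fresh graph is built every iteration, so retain_graph is not needed
    loss.backward()
    with torch.no_grad():
        # gradient descent step, then reset the gradients
        w -= alpha*w.grad
        w.grad = None
        b -= alpha*b.grad
        b.grad = None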

tensor(0.8867, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(0.4407, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(0.2634, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(0.1822, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(0.1378, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(0.1102, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(0.0916, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(0.0782, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(0.0682, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
tensor(0.0604, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
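
The manual update above can also be delegated to an optimizer. The following is a sketch of the same loop with `torch.optim.SGD`, not a replacement for the cell above; the seed and the `losses` list are illustrative additions, and the learning rate is set to the same `0.1`.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so the run is repeatable

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

optimizer = torch.optim.SGD([w, b], lr=0.1)  # plain gradient descent on w and b
losses = []

for i in range(100):
    z = torch.matmul(x, w) + b
    loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
    losses.append(loss.item())

    optimizer.zero_grad()  # clear gradients from the previous step
    loss.backward()        # compute w.grad and b.grad
    optimizer.step()       # w <- w - lr * w.grad, same for b

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

`optimizer.zero_grad()` plays the role of setting `w.grad` and `b.grad` to `None`, and `optimizer.step()` performs the in-place update that the manual loop wraps in `torch.no_grad()`.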
