* We assign tensors and a function.

### Autograd
* We will use gradient descent to find a weight $w$ that satisfies the equation $3 = w \cdot 1$.
* Obviously, $w$ should be 3.
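For intuition, here is a minimal sketch of the same gradient descent in plain Python, without autograd. The derivative of the squared error $(wx - y)^2$ is written out by hand as $2(wx - y)x$. The learning rate of 0.01 matches the cells below; the step count is an illustrative choice.

```python
# Plain-Python gradient descent on loss(w) = (w*x - y)**2, no autograd involved.
x, y = 1.0, 3.0   # input and expected output, as in the cells below
w = 1.0           # initial guess for the weight
lr = 0.01         # learning rate

for _ in range(2000):
    grad = 2 * (w * x - y) * x   # d/dw (w*x - y)**2, derived by hand
    w -= lr * grad               # step against the gradient

print(w)  # very close to 3
```

Autograd replaces the hand-written `grad` line: it computes the same derivative from the recorded operations.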

In [1]:
import torch

x = torch.tensor(1.)          # input tensor
y = torch.tensor(3.)          # expected output
lossFunc = torch.nn.MSELoss() # mean squared error loss

* We draw a directed acyclic graph (DAG): because `w` is created with `requires_grad=True`, autograd records every operation that uses it.

In [2]:
w = torch.tensor(1., requires_grad=True)  # leaf tensor whose gradient we want
z = x*w                                   # recorded in the DAG
loss = lossFunc(z, y)                     # MSELoss(input, target)
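To see what autograd recorded, we can inspect the `grad_fn` attribute of each tensor. This sketch rebuilds the same tensors so it runs on its own. It assumes a reasonably recent PyTorch, where backward node class names carry a numeric suffix such as `MulBackward0` (older releases print them without the suffix).

```python
import torch

x = torch.tensor(1.)
y = torch.tensor(3.)
w = torch.tensor(1., requires_grad=True)
z = x * w
loss = torch.nn.MSELoss()(z, y)

# Leaf tensors created directly by the user have no grad_fn.
print(w.is_leaf, w.grad_fn)          # True None
# Tensors produced by recorded operations point to their backward node.
print(type(z.grad_fn).__name__)      # e.g. MulBackward0
print(type(loss.grad_fn).__name__)   # e.g. MseLossBackward0
print(z.requires_grad)               # True, inherited from w
```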

* Upon running loss.backward(), autograd traces the DAG backward and differentiates loss with respect to w. MSELoss averages over a single element here, so it is just the squared error:<br>
$\mathrm{loss}\big|_{x=1,\,y=3} = (z - y)^2 = (xw - y)^2 = (w - 3)^2$<br>
$\left.\frac{d\,\mathrm{loss}}{dw}\right|_{w=1} = (2w - 6)\big|_{w=1} = -4$<br>
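The hand-derived value can be cross-checked without touching `w.grad`: `torch.autograd.grad(outputs, inputs)` returns the gradient directly as a tuple. A minimal sketch that rebuilds the same tensors:

```python
import torch

x = torch.tensor(1.)
y = torch.tensor(3.)
w = torch.tensor(1., requires_grad=True)

loss = torch.nn.MSELoss()(x * w, y)          # (w - 3)^2 at x = 1, y = 3
(dloss_dw,) = torch.autograd.grad(loss, w)   # one tuple entry per input tensor
analytic = 2 * w.detach() - 6                # 2w - 6 from the derivation above

print(dloss_dw, analytic)  # tensor(-4.) tensor(-4.)
```

Unlike `loss.backward()`, `torch.autograd.grad` does not write into `w.grad`, which stays `None` here.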

In [3]:
print('Before backpropagation:', w.grad)
loss.backward()
print('After backpropagation:', w.grad)

Before backpropagation: None
After backpropagation: tensor(-4.)


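* Manually update $w$ with one gradient-descent step, $w \leftarrow w - 0.01 \cdot \frac{d\,\mathrm{loss}}{dw}$, then reset the gradient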

In [4]:
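w.data.sub_(w.grad*0.01)  # w <- w - 0.01 * dloss/dw; .data bypasses autograd
w.grad.data.zero_()       # reset the gradient, since backward() accumulates into .grad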

tensor(0.)
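The `zero_()` call matters because `backward()` adds into `.grad` rather than overwriting it. A small sketch showing what happens when the gradient is not reset between two backward passes at the same `w`:

```python
import torch

x = torch.tensor(1.)
y = torch.tensor(3.)
w = torch.tensor(1., requires_grad=True)
loss_fn = torch.nn.MSELoss()

loss_fn(x * w, y).backward()
first = w.grad.clone()         # tensor(-4.)
loss_fn(x * w, y).backward()   # new graph, same w: the new gradient is added
second = w.grad.clone()        # tensor(-8.), not tensor(-4.)

w.grad.zero_()                 # reset before the next update
print(first, second, w.grad)   # tensor(-4.) tensor(-8.) tensor(0.)
```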

* What is $w$ after one update? It should be $1 - 0.01 \cdot (-4) = 1.04$

In [5]:
w.data

tensor(1.0400)
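Writing through `.data` works, but current PyTorch code more commonly performs the update inside `torch.no_grad()`, which keeps autograd from recording the in-place step on the leaf tensor. A sketch of the same single update in that style:

```python
import torch

x = torch.tensor(1.)
y = torch.tensor(3.)
w = torch.tensor(1., requires_grad=True)
lr = 0.01

loss = torch.nn.MSELoss()(x * w, y)
loss.backward()

with torch.no_grad():
    w -= lr * w.grad   # in-place update of a leaf; allowed because nothing is recorded
w.grad.zero_()         # clear the gradient for the next step

print(w)  # tensor(1.0400, requires_grad=True)
```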

* Do 100000 update cycles

In [6]:
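for _ in range(100000):
    z = x*w
    loss = lossFunc(z, y)     # rebuild the DAG each iteration
    loss.backward()           # compute dloss/dw into w.grad
    w.data.sub_(w.grad*0.01)  # gradient-descent step, learning rate 0.01
    w.grad.data.zero_()       # clear the gradient for the next iteration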

* What is $w$ after 100001 updates (the single update above plus 100000 in the loop)?

In [7]:
w.data # converged to the expected value, 3

tensor(3.0000)
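The manual loop above is what `torch.optim.SGD` packages up. A minimal sketch of the same training with an optimizer: with $x = 1$, each step shrinks the error $w - 3$ by a factor of $1 - 2 \cdot 0.01 = 0.98$, so about a thousand steps already reach float32 precision, and the step count below is smaller than in the loop above.

```python
import torch

x = torch.tensor(1.)
y = torch.tensor(3.)
w = torch.tensor(1., requires_grad=True)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD([w], lr=0.01)

for _ in range(1000):
    optimizer.zero_grad()      # plays the role of w.grad.data.zero_()
    loss = loss_fn(x * w, y)
    loss.backward()
    optimizer.step()           # plays the role of w.data.sub_(w.grad*0.01)

print(w.item())  # approximately 3.0
```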

* What if we don't draw the DAG?

In [8]:
with torch.no_grad():  # operations inside this block are not recorded in a DAG
    loss_has_no_grad = lossFunc(z, y)
try:
    loss_has_no_grad.backward()
except RuntimeError:  # raised because loss_has_no_grad has no grad_fn
    print('We turned off DAG drawing when we defined loss_has_no_grad, so we cannot differentiate it.')
    print("We don't draw the DAG when we are not training the network. However, ")

We turned off DAG drawing when we defined loss_has_no_grad, so we cannot differentiate it.
We don't draw the DAG when we are not training the network. However, 
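The failure comes from the result not being attached to any graph. A short sketch that makes this visible through `requires_grad` and `grad_fn`, and shows `detach()` as another way to obtain a graph-free tensor:

```python
import torch

w = torch.tensor(1., requires_grad=True)
z = w * 2                      # recorded: z has a grad_fn

with torch.no_grad():
    z_no_grad = w * 2          # computed, but not recorded
z_detached = z.detach()        # shares data with z, cut off from the graph

print(z.requires_grad, z.grad_fn is not None)        # True True
print(z_no_grad.requires_grad, z_no_grad.grad_fn)    # False None
print(z_detached.requires_grad, z_detached.grad_fn)  # False None

raised = False
try:
    z_no_grad.backward()
except RuntimeError:           # backward() needs a tensor that requires grad
    raised = True
print(raised)                  # True
```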


### More about the DAG
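* z = matmul(x, w) + b depends on the result of matmul(x, w) and on b, so len(z.grad_fn.next_functions) == 2: one entry is the backward node of the matmul, the other is the AccumulateGrad node of the leaf b
* loss depends on z and on y; y does not require a gradient, so in the output below loss.grad_fn.next_functions also has two entries, but the second is None and only the first leads back to z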

In [9]:
import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # weights (leaf)
b = torch.randn(3, requires_grad=True)     # bias (leaf)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

# x, y, w and b have grad_fn None, so only z and loss are printed.
for name, t in {'x': x, 'y': y, 'w': w, 'b': b, 'z': z, 'loss': loss}.items():
    if t.grad_fn:
        print(f"{name.rjust(4)} \thas a {t.grad_fn.next_functions}")

   z 	has a ((<SqueezeBackward3 object at 0x7f403b65dbe0>, 0), (<AccumulateGrad object at 0x7f403b65dc40>, 0))
loss 	has a ((<AddBackward0 object at 0x7f403b65dbb0>, 0), (None, 0))
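Since `next_functions` links each backward node to the nodes of its inputs, the whole graph can be walked recursively from `loss.grad_fn`. A minimal sketch with a hypothetical helper `walk_graph` (not part of PyTorch); the exact node names, for example the `SqueezeBackward` variant produced by `matmul`, can differ between PyTorch versions.

```python
import torch

def walk_graph(fn, depth=0, out=None):
    """Depth-first walk of the backward graph rooted at grad_fn node `fn`.

    Returns a list of (depth, node_name) pairs. A node shared by several
    paths would be listed once per path; the graph is acyclic, so this ends.
    """
    if out is None:
        out = []
    if fn is None:                 # None marks an input that needs no gradient
        return out
    out.append((depth, type(fn).__name__))
    for next_fn, _ in fn.next_functions:
        walk_graph(next_fn, depth + 1, out)
    return out

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

nodes = walk_graph(loss.grad_fn)
for depth, name in nodes:
    print("  " * depth + name)     # indentation shows distance from loss
```

The walk ends at `AccumulateGrad` nodes, one per leaf tensor that requires a gradient (here `w` and `b`); these are the nodes that write into `.grad`.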
