# Differentiation in Autograd
Let’s take a look at how autograd collects gradients. We create two tensors `a` and `b` with `requires_grad` enabled (in C++, by passing `torch::requires_grad()` as the tensor options). This signals to autograd that every operation on them should be tracked.

In [1]:
#pragma cling add_include_path("../../libtorch/include")
#pragma cling add_include_path("../../libtorch/include/torch/csrc/api/include")
#pragma cling add_library_path("../../libtorch/lib")
#pragma cling load("libtorch")

In [2]:
#include <iostream>
#include <tuple>
#include <torch/torch.h>
namespace nn = torch::nn;

# 1.1 Basic Usage

In [3]:
torch::Tensor a = torch::tensor({2.0, 3.0}, torch::requires_grad());

In [4]:
std::cout << a << std::endl;

 2
 3
[ CPUFloatType{2} ]


In [5]:
torch::Tensor b = torch::tensor({6.0, 4.0}, torch::requires_grad());

In [6]:
std::cout << b << std::endl;

 6
 4
[ CPUFloatType{2} ]


We create another tensor Q from a and b.

$$Q = 3a^3 - b^2$$

In [7]:
torch::Tensor Q = 3*a.pow(3) - b.pow(2);

In [8]:
std::cout << Q << std::endl;

-12
 65
[ CPUFloatType{2} ]


We want the gradients of `Q` with respect to the parameters `a` and `b`:

$$\frac{\partial Q}{\partial a} = 9a^2$$

$$\frac{\partial Q}{\partial b} = -2b$$

When we call `.backward()` on `Q`, autograd calculates these gradients and stores them in each input tensor's gradient field, which libtorch exposes through `.grad()`.

Because `Q` is a vector rather than a scalar, we must explicitly pass a `gradient` argument to `Q.backward()`. This argument is a tensor of the same shape as `Q` and represents the gradient of `Q` with respect to itself:

$$\frac{dQ}{dQ} = 1$$

Equivalently, we can first reduce `Q` to a scalar and call `backward()` without an argument, e.g. `Q.sum().backward()`.

In [9]:
torch::Tensor external_grad = torch::tensor({1., 1.});
Q.backward(/*gradient=*/external_grad);
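A short derivation of what the `gradient` argument does, assuming (as the PyTorch autograd documentation describes) that `backward(v)` computes a vector-Jacobian product with $v$ = `external_grad`. Because `Q` is computed elementwise, $\partial Q_j / \partial a_i = 0$ whenever $i \neq j$, so the sums collapse to a single term:

$$
\bar{a}_i = \sum_j v_j \frac{\partial Q_j}{\partial a_i} = 9a_i^2 \, v_i,
\qquad
\bar{b}_i = \sum_j v_j \frac{\partial Q_j}{\partial b_i} = -2b_i \, v_i
$$

With $v = (1, 1)$ this is exactly the gradient of $\sum_j Q_j$, which is why `Q.sum().backward()` gives the same result. For $a = (2, 3)$ and $b = (6, 4)$ it evaluates to $\bar{a} = (36, 81)$ and $\bar{b} = (-12, -8)$.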

In [10]:
std::cout << a.grad() << std::endl;

 36
 81
[ CPUFloatType{2} ]


In [11]:
std::cout << b.grad() << std::endl;

-12
 -8
[ CPUFloatType{2} ]
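The printed gradients can be checked without autograd. The following is a minimal plain-C++ sketch (no libtorch) that evaluates the formulas above elementwise and compares them against central finite differences, a numerical technique swapped in here purely for verification; the function names `q`, `dq_da`, `dq_db` and `central_diff` are hypothetical.

```cpp
// Elementwise version of Q = 3a^3 - b^2 from the cells above.
double q(double a, double b) { return 3.0 * a * a * a - b * b; }

// Analytic partial derivatives derived in the text.
double dq_da(double a) { return 9.0 * a * a; }
double dq_db(double b) { return -2.0 * b; }

// Central finite difference: approximates f'(x) as (f(x + h) - f(x - h)) / (2h).
template <typename F>
double central_diff(F f, double x, double h = 1e-5) {
    return (f(x + h) - f(x - h)) / (2.0 * h);
}
```

Evaluated at `a = (2, 3)` and `b = (6, 4)`, `dq_da` and `dq_db` give 36, 81 and -12, -8, the same values printed by `a.grad()` and `b.grad()`; the finite-difference estimates agree to within rounding error.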


# 1.2 Computational Graph
Conceptually, autograd keeps a record of data (tensors) and all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG) consisting of Function objects (in libtorch, `torch::autograd::Node`). In this DAG, leaves are the input tensors and roots are the output tensors. By tracing this graph from roots to leaves, autograd computes the gradients automatically using the chain rule.

In a forward pass, autograd does two things simultaneously:

* run the requested operation to compute a resulting tensor, and
* maintain the operation’s gradient function in the DAG.

The backward pass kicks off when `.backward()` is called on the DAG root. Autograd then:

* computes the gradients from each `grad_fn`,
* accumulates them in the respective tensor's gradient (`.grad()` in C++), and
* using the chain rule, propagates all the way to the leaf tensors.
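The forward/backward steps above can be sketched as a tiny scalar reverse-mode autodiff in plain C++. This is a hypothetical illustration, not libtorch's implementation: `Node`, `leaf`, `add_scalar`, `square` and `backward` are made-up names. Each forward op records its inputs and a `backward_fn` closure (playing the role of `grad_fn`); `backward` then walks the graph from the root toward the leaves, applying the chain rule and accumulating into `grad`.

```cpp
#include <functional>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical minimal scalar autograd node, for illustration only (not libtorch).
struct Node {
    explicit Node(double v) : value(v) {}
    double value;
    double grad = 0.0;                           // accumulated d(root)/d(this)
    std::vector<std::shared_ptr<Node>> parents;  // inputs of the op that made this node
    std::function<void()> backward_fn;           // plays the role of grad_fn
};
using Var = std::shared_ptr<Node>;

Var leaf(double v) { return std::make_shared<Node>(v); }

// Forward: compute x + c and record how to send gradients back to x.
Var add_scalar(const Var& x, double c) {
    auto out = std::make_shared<Node>(x->value + c);
    out->parents = {x};
    Node* self = out.get();
    out->backward_fn = [self, x] { x->grad += self->grad; };  // d(x + c)/dx = 1
    return out;
}

// Forward: compute x^2 and record its local derivative 2x.
Var square(const Var& x) {
    auto out = std::make_shared<Node>(x->value * x->value);
    out->parents = {x};
    Node* self = out.get();
    out->backward_fn = [self, x] { x->grad += 2.0 * x->value * self->grad; };
    return out;
}

// Backward: run each node's backward_fn in reverse topological order,
// so a node's gradient is complete before it is pushed to its inputs.
void backward(const Var& root) {
    std::vector<Node*> order;
    std::unordered_set<Node*> seen;
    std::function<void(Node*)> visit = [&](Node* n) {
        if (!seen.insert(n).second) return;
        for (const auto& p : n->parents) visit(p.get());
        order.push_back(n);  // post-order: inputs before outputs
    };
    visit(root.get());
    root->grad = 1.0;  // seed: d(root)/d(root) = 1
    for (auto it = order.rbegin(); it != order.rend(); ++it) {
        if ((*it)->backward_fn) (*it)->backward_fn();
    }
}
```

For the graph used in the next cells, `y = x + 1` and `z = y^2 + 3`, starting from `x = 0.5` this sketch gives `z = 5.25` and `dz/dx = 2(x + 1) = 3`. Unlike libtorch, it stores `grad` on every node rather than only on leaves and never resets gradients, so `backward` should be called once per graph.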

In [12]:
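// Own the sizes in a std::vector: at::IntArrayRef is a non-owning view,
// so initializing it from a braced list leaves it dangling.
std::vector<int64_t> shape = {3, 1};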
torch::Tensor x = torch::randn(shape, torch::requires_grad());

In [13]:
std::cout << x << std::endl;

 0.5560
-1.0466
-0.8018
[ CPUFloatType{3,1} ]


In [14]:
torch::Tensor y = x+1;
std::cout << y << std::endl;
std::cout << y.requires_grad() << std::endl;
std::cout << y.grad_fn() << std::endl;
std::cout << y.grad_fn()->name() << std::endl;

 1.5560
-0.0466
 0.1982
[ CPUFloatType{3,1} ]
1
0x55d18ace60d0
AddBackward1


In [15]:
torch::Tensor z = y.pow(2) + 3;
std::cout << z << std::endl;
std::cout << z.requires_grad() << std::endl;
std::cout << z.grad_fn() << std::endl;
std::cout << z.grad_fn()->name() << std::endl;

 5.4211
 3.0022
 3.0393
[ CPUFloatType{3,1} ]
1
0x55d18ad30240
AddBackward1


## Disabling Gradient Tracking
### Exclusion from the DAG
Autograd tracks operations on all tensors whose `requires_grad` flag is set to true. For tensors that don’t require gradients, setting this flag to false excludes them from the gradient computation DAG.

The output tensor of an operation will require gradients even if only a single input tensor has `requires_grad` set to true.

In [16]:
// std::vector owns the sizes; a braced-list at::IntArrayRef would dangle.
std::vector<int64_t> shape = {5, 5};
torch::Tensor x = torch::rand(shape);
torch::Tensor y = torch::rand(shape);
torch::Tensor z = torch::rand(shape, torch::requires_grad());

In [17]:
torch::Tensor a = x + y;
std::cout << "Does `a` require gradients? : " << a.requires_grad() << std::endl;

Does `a` require gradients? : 0


In [18]:
torch::Tensor b = x + z;
std::cout << "Does `b` require gradients? : " << b.requires_grad() << std::endl;

Does `b` require gradients? : 1
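The rule demonstrated by cells 16 to 18 amounts to a logical OR over the inputs' flags. A minimal sketch of just that rule, assuming grad mode is enabled; `output_requires_grad` is a hypothetical helper, not libtorch code:

```cpp
#include <initializer_list>

// Hypothetical helper: an op's output requires gradients if at least one
// of its inputs does (assumes grad mode is enabled).
bool output_requires_grad(std::initializer_list<bool> input_flags) {
    for (bool flag : input_flags) {
        if (flag) return true;  // one tracked input is enough
    }
    return false;
}
```

Here `output_requires_grad({false, false})` corresponds to `a = x + y` and returns `false`, while `output_requires_grad({false, true})` corresponds to `b = x + z` and returns `true`.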


### Disabling Tracking with `torch::NoGradGuard`

By default, all tensors with `requires_grad` set track their computational history and support gradient computation. However, there are cases where we do not need this, for example when we have trained the model and only want to apply it to some input data, i.e. we only want to do *forward* computations through the network. We can stop tracking computations by constructing a `torch::NoGradGuard` (the C++ counterpart of Python's `torch.no_grad()`) inside a block; tracking stays disabled until the guard goes out of scope:

In [19]:
{
    torch::NoGradGuard no_grad;
    torch::Tensor c = x + z;
    std::cout << "Does `c` require gradients? : " << c.requires_grad() << std::endl;
}

Does `c` require gradients? : 0
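`torch::NoGradGuard` is a scoped (RAII) object: it turns gradient tracking off when constructed and restores the previous setting when destroyed, and PyTorch keeps this grad mode per thread. The sketch below imitates that pattern in plain C++; `grad_mode_enabled`, `NoGradGuardSketch` and `result_requires_grad` are hypothetical names, not libtorch API.

```cpp
// Hypothetical per-thread switch, standing in for libtorch's grad mode.
thread_local bool grad_mode_enabled = true;

// Hypothetical RAII guard: disables tracking for its lifetime,
// then restores whatever mode was active before.
class NoGradGuardSketch {
public:
    NoGradGuardSketch() : previous_(grad_mode_enabled) { grad_mode_enabled = false; }
    ~NoGradGuardSketch() { grad_mode_enabled = previous_; }
    NoGradGuardSketch(const NoGradGuardSketch&) = delete;
    NoGradGuardSketch& operator=(const NoGradGuardSketch&) = delete;

private:
    bool previous_;
};

// An op records history only if grad mode is on and some input requires grad.
bool result_requires_grad(bool any_input_requires_grad) {
    return grad_mode_enabled && any_input_requires_grad;
}
```

Saving the previous mode, rather than unconditionally re-enabling tracking, is what makes nested guards safe: leaving an inner block keeps tracking disabled until the outer guard is destroyed.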


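Another way to exclude a result from tracking is the `detach()` method, which returns a new tensor that shares the same data but is detached from the computation graph and does not require gradients: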

In [21]:
torch::Tensor d = x + z;
torch::Tensor d_detached = d.detach();
std::cout << "Does `d_detached` require gradients? : " << d_detached.requires_grad() << std::endl;

Does `d_detached` require gradients? : 0


# 1.3 Autograd in a Neural Network

In [22]:
int sample_size = 2;
int number_features = 3;

torch::Tensor x = torch::arange(sample_size*number_features).reshape({sample_size,number_features}) * 1.0;

In [24]:
std::cout << x << std::endl;

 0  1  2
 3  4  5
[ CPUFloatType{2,3} ]


In [26]:
torch::Tensor target = torch::arange(2, sample_size+2).reshape({sample_size, 1}) * 1.0;

In [27]:
std::cout << target << std::endl;

 2
 3
[ CPUFloatType{2,1} ]


In [28]:
torch::Tensor w = torch::ones({3, 1}, torch::requires_grad());

In [29]:
std::cout << w << std::endl;
std::cout << w.grad() << std::endl;

 1
 1
 1
[ CPUFloatType{3,1} ]
[ Tensor (undefined) ]


In [30]:
torch::Tensor output = x.matmul(w);
std::cout << output << std::endl;

  3
 12
[ CPUFloatType{2,1} ]


In [31]:
torch::Tensor loss = (output - target).square().mean();

In [32]:
loss.backward();

In [33]:
std::cout << w.grad() << std::endl;

 27
 37
 47
[ CPUFloatType{3,1} ]
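As a hand check of the printed gradient, take the mean-squared-error loss from cell [31] with $N = 2$ samples, $X$ = `x`, $t$ = `target` and $w = (1, 1, 1)^\top$:

$$
L(w) = \frac{1}{N}\sum_{i=1}^{N}\left(x_i^\top w - t_i\right)^2,
\qquad
\nabla_w L = \frac{2}{N} X^\top (Xw - t)
$$

$$
Xw - t = \begin{pmatrix} 3 \\ 12 \end{pmatrix} - \begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 1 \\ 9 \end{pmatrix},
\qquad
\nabla_w L = \frac{2}{2}\begin{pmatrix} 0 & 3 \\ 1 & 4 \\ 2 & 5 \end{pmatrix}\begin{pmatrix} 1 \\ 9 \end{pmatrix} = \begin{pmatrix} 27 \\ 37 \\ 47 \end{pmatrix}
$$

which matches `w.grad()`.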


In [34]:
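void update_param(torch::Tensor param, float learning_rate){
    // Disable tracking so the in-place update is not recorded in the graph.
    torch::NoGradGuard no_grad;
    // Gradient-descent step: param <- param - learning_rate * param.grad()
    param.add_(param.grad(), -learning_rate);
}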

In [35]:
update_param(w, 0.01);

In [36]:
std::cout << w << std::endl;

 0.7300
 0.6300
 0.5300
[ CPUFloatType{3,1} ]
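The gradient computation and update step of section 1.3 can also be reproduced in plain C++ without libtorch. This is a minimal sketch with hypothetical helper names (`mse_grad`, `sgd_step`) that hand-codes the MSE derivative $\frac{2}{N} X^\top (Xw - t)$ in place of autograd:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Gradient of L(w) = mean((Xw - t)^2) with respect to w: (2/N) X^T (Xw - t).
Vector mse_grad(const Matrix& X, const Vector& w, const Vector& t) {
    const std::size_t n = X.size();
    assert(t.size() == n);
    Vector grad(w.size(), 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        assert(X[i].size() == w.size());
        double residual = -t[i];  // becomes x_i . w - t_i
        for (std::size_t j = 0; j < w.size(); ++j) residual += X[i][j] * w[j];
        for (std::size_t j = 0; j < w.size(); ++j) grad[j] += 2.0 * residual * X[i][j] / n;
    }
    return grad;
}

// Plain gradient-descent step, mirroring update_param: w <- w - lr * grad.
void sgd_step(Vector& w, const Vector& grad, double lr) {
    assert(w.size() == grad.size());
    for (std::size_t j = 0; j < w.size(); ++j) w[j] -= lr * grad[j];
}
```

With the notebook's `x`, `target`, all-ones `w` and a learning rate of 0.01, `mse_grad` returns (27, 37, 47) and one `sgd_step` leaves `w` at approximately (0.73, 0.63, 0.53), matching the libtorch cells.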
