https://pytorch.org/docs/stable/notes/autograd.html

How autograd encodes the history
Autograd is a reverse automatic differentiation system. Conceptually, as you execute operations, autograd records a graph of all the operations that created the data, giving you a directed acyclic graph whose leaves are the input tensors and whose roots are the output tensors. By tracing this graph from roots to leaves, you can automatically compute the gradients using the chain rule.
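As a minimal sketch of the chain rule being applied from root to leaf, consider the toy function below. The function z = 3x^2 + 1 and the variable names are chosen only for illustration and are not part of the original notes.

```python
import torch

# Leaf input: requires_grad=True asks autograd to track operations on it.
x = torch.tensor(2.0, requires_grad=True)

# Forward pass: each operation is recorded as a node in the graph.
y = x ** 2        # dy/dx = 2x
z = 3 * y + 1     # dz/dy = 3

# Backward pass: walk from the root (z) to the leaf (x), applying the chain rule.
z.backward()

# dz/dx = dz/dy * dy/dx = 3 * 2x, which is 12 at x = 2.
print(x.grad)
```

Here z is the root and x is the only leaf, so a single call to backward() fills in x.grad.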

Internally, autograd represents this graph as a graph of Function objects (really expressions), which can be apply()ed to compute the result of evaluating the graph. When computing the forward pass, autograd simultaneously performs the requested computations and builds up a graph representing the function that computes the gradient (the .grad_fn attribute of each torch.Tensor is an entry point into this graph). When the forward pass is completed, we evaluate this graph in the backward pass to compute the gradients.
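The .grad_fn entry point can be inspected directly. The following is a small sketch, assuming a two-operation graph made up for illustration; the exact node class names printed may vary between PyTorch versions.

```python
import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)
b = a * 2
c = b.sum()

# A leaf tensor created by the user has no grad_fn.
print(a.grad_fn)

# Results of tracked operations point at the node that will compute their gradient.
print(type(b.grad_fn).__name__)
print(type(c.grad_fn).__name__)

# next_functions links a node to the nodes of its inputs, one step closer to the leaves.
print(c.grad_fn.next_functions)
```

Following next_functions repeatedly from c.grad_fn walks the same path that the backward pass takes.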

An important thing to note is that the graph is recreated from scratch at every iteration, and this is exactly what allows for using arbitrary Python control flow statements that can change the overall shape and size of the graph at every iteration. You don’t have to encode all possible paths before you launch the training: what you run is what you differentiate.
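A rough sketch of "what you run is what you differentiate": the hypothetical forward function below uses an ordinary Python loop and branch, so the recorded graph has a different size on each call. The function, its name, and the step counts are assumptions made for illustration.

```python
import torch

def forward(x, n_steps):
    # Ordinary Python control flow: the loop count and the branch taken
    # can differ on every call, and only the executed path is recorded.
    h = x
    for _ in range(n_steps):
        if h.sum() > 0:
            h = h * 2
        else:
            h = h - 1
    return h.sum()

x = torch.ones(3, requires_grad=True)
grads = {}

for n_steps in (1, 3):
    x.grad = None              # clear the gradient left by the previous iteration
    loss = forward(x, n_steps)
    loss.backward()            # differentiates the graph built during this call
    grads[n_steps] = x.grad.clone()
    print(n_steps, x.grad)
```

With x set to ones, the positive branch runs every time, so each loop step doubles the gradient.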


Setting requires_grad
requires_grad is a flag, defaulting to False unless the tensor is wrapped in an nn.Parameter, that allows for fine-grained exclusion of subgraphs from gradient computation. It takes effect in both the forward and backward passes:

During the forward pass, an operation is only recorded in the backward graph if at least one of its input tensors requires grad. During the backward pass (.backward()), only leaf tensors with requires_grad=True will have gradients accumulated into their .grad fields.
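Both effects can be observed in a short sketch. The tensors w and x and the sum-of-products expression are hypothetical and chosen only to keep the example small.

```python
import torch
from torch import nn

w = nn.Parameter(torch.randn(3))   # nn.Parameter defaults to requires_grad=True
x = torch.randn(3)                 # plain tensors default to requires_grad=False

# Forward: an operation is recorded only if at least one input requires grad.
untracked = x * 2
tracked = (w * x).sum()
print("untracked:", untracked.requires_grad, untracked.grad_fn)
print("tracked:", tracked.requires_grad)

# Backward: gradients accumulate only into leaves with requires_grad=True.
tracked.backward()
print("w.grad is set:", w.grad is not None)
print("x.grad is None:", x.grad is None)
```

Since tracked is the sum of w * x, the gradient stored in w.grad equals x, while x itself receives no gradient.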

It is important to note that even though every tensor has this flag, setting it only makes sense for leaf tensors (tensors that do not have a grad_fn, e.g., a nn.Module’s parameters). Non-leaf tensors (tensors that do have grad_fn) are tensors that have a backward graph associated with them. Thus their gradients will be needed as an intermediary result to compute the gradient for a leaf tensor that requires grad. **From this definition, it is clear that all non-leaf tensors will automatically have requires_grad=True.**

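The sentence above is wrong if "non-leaf" is read as "any intermediate result". In the final directed acyclic graph, if all of the leaf tensors or other upstream tensors feeding an intermediate node have requires_grad == False, then that tensor's requires_grad also defaults to False. (Such a tensor has no grad_fn, so by PyTorch's convention is_leaf reports it as a leaf.) See the example below.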


In [2]:
import torch

x1 = torch.randn(5, requires_grad=True)
print(x1)

x2 = torch.randn(5)

y1 = x1.pow(x1)

y2 = x2.pow(x2)

z = y1 + y2

print("x1 requires_grad:", x1.requires_grad)
print("x2 requires_grad:", x2.requires_grad)
print("y1 requires_grad:", y1.requires_grad)
print("y2 requires_grad:", y2.requires_grad)
print("z requires_grad:", z.requires_grad)

tensor([-0.6190,  0.3973,  0.2645, -0.0750,  0.6133], requires_grad=True)
x1 requires_grad: True
x2 requires_grad: False
y1 requires_grad: True
y2 requires_grad: False
z requires_grad: True
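To connect the output above with the leaf and non-leaf definitions, here is a small sketch that also prints is_leaf and whether grad_fn is set. It uses multiplication instead of pow, and the variable names simply mirror the cell above.

```python
import torch

x1 = torch.randn(5, requires_grad=True)
x2 = torch.randn(5)

y1 = x1 * 2   # depends on a tensor that requires grad
y2 = x2 * 2   # depends only on tensors that do not require grad

for name, t in [("x1", x1), ("x2", x2), ("y1", y1), ("y2", y2)]:
    print(name, "is_leaf:", t.is_leaf, "grad_fn is None:", t.grad_fn is None)
```

Although y2 is computed from x2, no operation was recorded for it, so it has no grad_fn and counts as a leaf under PyTorch's convention.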


In [13]:
import torch

a = torch.randn(1)
print(a, a.shape)
a = a.squeeze()
print(a, a.shape)

print(type(a))

x = 1.0
print(type(x))

x += a
print(x, type(x))

tensor([0.1747]) torch.Size([1])
tensor(0.1747) torch.Size([])
<class 'torch.Tensor'>
<class 'float'>
tensor(1.1747) <class 'torch.Tensor'>


In [29]:
import torch
from torch import nn
from typing import Optional

class MyLinear(nn.Module):
  def __init__(self, in_features, out_features):
    super().__init__()
    # Assigning an nn.Parameter goes through __setattr__, which registers it in self._parameters
    self.weight = nn.Parameter(torch.randn(in_features, out_features))
    self.bias = nn.Parameter(torch.randn(out_features))

  def forward(self, input):
    return (input @ self.weight) + self.bias

  def register_module(self, name: str, module: Optional[nn.Module]) -> None:
    r"""Alias for :func:`add_module`."""
    self.add_module(name, module)

  def __setattr__(self, name, value):
    # nn.Module.__setattr__ does the actual registration; the print traces every attribute assignment
    super().__setattr__(name, value)
    print(name, " is being added into this instance")

my = MyLinear(3, 64)
print(len(list(my.parameters())))

training  is being added into this instance
_parameters  is being added into this instance
_buffers  is being added into this instance
_non_persistent_buffers_set  is being added into this instance
_backward_hooks  is being added into this instance
_is_full_backward_hook  is being added into this instance
_forward_hooks  is being added into this instance
_forward_pre_hooks  is being added into this instance
_state_dict_hooks  is being added into this instance
_load_state_dict_pre_hooks  is being added into this instance
_load_state_dict_post_hooks  is being added into this instance
_modules  is being added into this instance
weight  is being added into this instance
bias  is being added into this instance


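TypeError: object of type 'generator' has no len()

(This TypeError comes from calling len(my.parameters()) directly, because parameters() returns a generator. With the list(...) wrapper used in the cell above, the call succeeds and reports 2 parameters: weight and bias.)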