![alt text](pictures/cover.png "PyTorch")

# Notes on PyTorch
**Author: Zhang Xiwu**

*Created: 2017.05.08*

## Package: <span style="color:red;">Autograd</span>
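The **Autograd** package has two important classes: <span style="color:blue;">Variable</span> and <span style="color:blue;">Function</span>.
- The structure of a Variable is shown in the figure below. It wraps the .data, .grad, and .creator attributes.
  - .data is the Variable's underlying data, which is a Tensor;
  - .grad stores the gradient with respect to this Variable;
  - .creator references the Function that produced this Variable.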
![alt text](pictures/Variable.png "Variable")

- Function
  - <span style="color:blue;">Variable</span> and <span style="color:blue;">Function</span> are interconnected and form an acyclic graph that encodes a complete history of the computation. Each Variable has a <span style="color:blue;">.creator</span> attribute that references the <span style="color:blue;">Function</span> that created it (except for Variables created by the user, whose <span style="color:blue;">creator is None</span>).
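
The graph described above can be sketched in a few lines of plain Python. This is a toy illustration only, not PyTorch's implementation: the classes below merely reuse the names `Variable` and `Function`, work on plain floats instead of Tensors, and the `backward_fn` field is a hypothetical name introduced for this sketch. Gradients are propagated path by path, which is simple but inefficient for large graphs.

```python
class Function:
    """Toy graph node: remembers its inputs and how to send gradients back."""
    def __init__(self, inputs, backward_fn):
        self.inputs = inputs            # Variables consumed by this operation
        self.backward_fn = backward_fn  # output gradient -> one gradient per input


class Variable:
    """Toy counterpart of the Variable described above (floats only)."""
    def __init__(self, data, creator=None):
        self.data = data        # the value itself
        self.grad = 0.0         # accumulated gradient with respect to this Variable
        self.creator = creator  # None for Variables created by the user

    def __add__(self, other):
        other = other if isinstance(other, Variable) else Variable(other)
        return Variable(self.data + other.data,
                        Function((self, other), lambda g: (g, g)))

    def __mul__(self, other):
        other = other if isinstance(other, Variable) else Variable(other)
        return Variable(self.data * other.data,
                        Function((self, other),
                                 lambda g: (g * other.data, g * self.data)))

    def backward(self, grad=1.0):
        # Accumulate the incoming gradient, then follow .creator back to the inputs.
        self.grad += grad
        if self.creator is not None:
            input_grads = self.creator.backward_fn(grad)
            for inp, g in zip(self.creator.inputs, input_grads):
                inp.backward(g)


x = Variable(1.0)      # created by the user, so x.creator is None
y = x + 2
z = y * y * 3          # z = 3 * (x + 2)^2
z.backward()

print(x.creator)       # None
print(z.data, x.grad)  # 27.0 18.0
```

Here `x.grad` equals $6(x+2)$ evaluated at $x=1$, and every non-leaf Variable (`y`, `z`) carries a `Function` in `.creator`, which is what lets `backward()` walk the history of the computation.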


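### The .backward() method of Variable
To compute derivatives, call .backward() on a Variable. If the Variable is a scalar, .backward() needs no argument; if the Variable has more than one element, .backward() must be given an argument of the same size as the Variable.
#### The following example shows how to differentiate when the Variable (here `out`) has only one element: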

    import torch
    from torch.autograd import Variable

    x = Variable(torch.ones(2, 2), requires_grad=True)  # create a Variable

    # operations
    y = x + 2
    z = y * y * 3
    out = z.mean()

    # compute the gradients
    out.backward()

    # derivative of out with respect to x: d(out)/dx
    print(x.grad)

Output:

    Variable containing:
     4.5000  4.5000
     4.5000  4.5000
    [torch.FloatTensor of size 2x2]



#### Derivation:
$out = \displaystyle \frac{1}{4}\sum_i z_i$, 
$z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$.

Therefore,
$\displaystyle \frac{\partial out}{\partial x_i} = \frac{1}{4}\cdot 6(x_i+2) = \frac{3}{2}(x_i+2)$, hence
$\displaystyle \frac{\partial out}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.
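
As a sanity check on this derivation, the analytic gradient can be compared with a central-difference estimate in plain Python. This sketch does not use PyTorch; the helper names `out` and `numerical_grad` and the step size `eps` are choices made for this illustration, and the 2x2 tensor of ones is flattened into a list of four floats.

```python
def out(x):
    """out = mean of 3 * (x_i + 2)^2 over the elements of x."""
    return sum(3 * (xi + 2) ** 2 for xi in x) / len(x)


def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of d f / d x_i for every index i."""
    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads


x = [1.0, 1.0, 1.0, 1.0]               # torch.ones(2, 2), flattened
exact = [1.5 * (xi + 2) for xi in x]   # (3/2) * (x_i + 2) from the derivation
approx = numerical_grad(out, x)

print(exact)  # [4.5, 4.5, 4.5, 4.5]
print(all(abs(a - e) < 1e-4 for a, e in zip(approx, exact)))
```

The numerical estimate should agree with the analytic value 4.5 up to floating-point error, which matches the `x.grad` printed by the example.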

___
#### When the Variable has more than one element:

    import torch
    from torch.autograd import Variable

    x = Variable(torch.Tensor([1, 3, 1, 2]), requires_grad=True)
    y = x + 2
    z = y * y * 3
    print("z =")
    print(z)

    gradients = torch.Tensor([1, 1, 0.1, 1])  # the argument that .backward() requires here
    z.backward(gradients)
    print("dz/dx =")
    print(x.grad)

Output:

    z =
    Variable containing:
     27
     75
     27
     48
    [torch.FloatTensor of size 4]

    dz/dx =
    Variable containing:
     18.0000
     30.0000
      1.8000
     24.0000
    [torch.FloatTensor of size 4]



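Note: $z_i = 3(x_i+2)^2$, so $\displaystyle \frac{dz_i}{dx_i}=6(x_i+2)$. Each entry of `x.grad` printed above is this derivative multiplied by the corresponding entry of `gradients`, which is why the third entry is $0.1 \times 18 = 1.8$.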

##### Q: How should the 'gradients' argument be understood?
Ans: 
- From Doc: gradient (Tensor) – Gradient of the differentiated function w.r.t. the data. Required only if the data has more than one element. Type and location should match these of self.data.

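- (Personal understanding) 'gradients' is not a step size. It supplies one weight per element of z, and .backward() computes the weighted sum of derivatives $\displaystyle \sum_j \text{gradients}_j \frac{\partial z_j}{\partial x_i}$ (a vector-Jacobian product). With all weights equal to 1 this is the gradient of $\sum_j z_j$; in the example above, the weight 0.1 scales the third entry from 18 to 1.8.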
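
One reading of the `gradients` argument that is consistent with the printed output is a vector-Jacobian product: each element of z is given a weight, and the weighted derivatives are summed for each $x_i$. The plain-Python sketch below reproduces the numbers from the example without PyTorch; the names `weights`, `jacobian`, and `vjp` are hypothetical helpers introduced here, not PyTorch API.

```python
x = [1.0, 3.0, 1.0, 2.0]
weights = [1.0, 1.0, 0.1, 1.0]  # plays the role of `gradients` in the example


def z(x):
    """Element-wise z_j = 3 * (x_j + 2)^2."""
    return [3 * (xj + 2) ** 2 for xj in x]


def jacobian(x):
    """J[j][i] = d z_j / d x_i; diagonal because z_j depends only on x_j."""
    n = len(x)
    return [[6 * (x[i] + 2) if i == j else 0.0 for i in range(n)]
            for j in range(n)]


def vjp(weights, J):
    """Vector-Jacobian product: result[i] = sum_j weights[j] * J[j][i]."""
    return [sum(weights[j] * J[j][i] for j in range(len(J)))
            for i in range(len(J[0]))]


x_grad = vjp(weights, jacobian(x))

print(z(x))                           # [27.0, 75.0, 27.0, 48.0]
print([round(g, 4) for g in x_grad])  # [18.0, 30.0, 1.8, 24.0]
```

Building the full Jacobian is only for illustration; an autograd engine would normally propagate the weights backward through the graph without forming the matrix.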