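# Introduction to PyTorch
## Introduction to Torch's Tensor Library
In deep learning we operate on tensors. A tensor is a generalization of vectors and matrices to any number of dimensions: a vector is a 1D tensor and a matrix is a 2D tensor. Let's first see what we can do with tensors.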

In [1]:
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

<torch._C.Generator at 0x1114bbe50>

### Creating Tensors
We can create tensors from Python lists with the torch.tensor() function.

In [4]:
# torch.tensor(data) creates a torch.Tensor object with the given data
V_data = [1., 2., 3.]
V = torch.tensor(V_data)
print(V)

# Creates a matrix
M_data = [[1., 2., 3.], [4., 5., 6.]]
M = torch.tensor(M_data)
print(M)

# Create a 3D tensor of size 2x2x2.
T_data = [[[1., 2.], [3., 4.]],
          [[5., 6.], [7., 8.]]]
T = torch.tensor(T_data)
print(T)

# Index into V and get a scalar (0 dimensional tensor)
print(V[0])
# Get a Python number from it
print(V[0].item())

# Index into M and get a vector
print(M[0])

# Index into T and get a matrix
print(T[0])

tensor([ 1.,  2.,  3.])
tensor([[ 1.,  2.,  3.],
        [ 4.,  5.,  6.]])
tensor([[[ 1.,  2.],
         [ 3.,  4.]],

        [[ 5.,  6.],
         [ 7.,  8.]]])
tensor(1.)
1.0
tensor([ 1.,  2.,  3.])
tensor([[ 1.,  2.],
        [ 3.,  4.]])


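You can also create tensors of other data types; see the official documentation for details. Tensors created from Python floats, as in the examples above, are of Float type (torch.float32) by default. To create an integer tensor, try torch.LongTensor(). Float and Long are generally the most commonly used types.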
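As a small sketch of how data types are chosen, the cell below lets torch.tensor() infer the type from the data and also requests one explicitly through the dtype argument. The variable names are only illustrative, and the printed defaults assume the global default dtype has not been changed.

```python
import torch

# Python floats are stored as 32-bit floats by default.
f = torch.tensor([1., 2., 3.])
# Python ints are inferred as 64-bit integers (Long).
i = torch.tensor([1, 2, 3])
# The dtype can also be requested explicitly.
d = torch.tensor([1, 2, 3], dtype=torch.float64)

print(f.dtype, i.dtype, d.dtype)

# .float() and .long() convert between the two most common types.
print(i.float().dtype, f.long().dtype)
```

The conversion methods return new tensors and leave the originals unchanged.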

You can create a multi-dimensional tensor filled with random data, drawn from a standard normal distribution, using torch.randn().

In [9]:
x = torch.randn((3, 4, 5))
print("x.size() is {}".format(x.size()))
print(x)

x.size() is torch.Size([3, 4, 5])
tensor([[[ 0.7575, -0.4068, -0.1277,  0.2804,  0.0375],
         [-0.6378, -0.8148, -0.6895,  0.7705, -1.0739],
         [-0.2015, -0.5603,  0.6817, -0.5170,  1.7902],
         [ 0.5877,  0.2505, -0.7930,  0.5231,  1.2236]],

        [[-0.9458, -1.3529,  3.3837, -2.4044, -0.3891],
         [-0.0796,  0.7605, -1.0025,  0.9462,  0.3512],
         [ 1.5728,  1.7185, -0.0594, -2.4919,  0.2423],
         [ 0.2883, -0.1095,  0.3126,  1.5038,  0.5038]],

        [[ 0.6223, -0.4481, -0.2856,  0.3880,  0.2352],
         [ 1.9142,  1.8364,  1.3245, -0.0705,  0.3470],
         [-0.6537,  1.5586,  0.2186, -0.5743,  1.4571],
         [ 1.7710, -2.0173,  0.4235,  0.5730, -1.7962]]])
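The random values above depend on the seed set with torch.manual_seed() at the top of the notebook. The following sketch, with illustrative variable names, shows that resetting the seed reproduces the same draw from torch.randn().

```python
import torch

torch.manual_seed(1)
a = torch.randn(2, 3)

# Resetting the seed restarts the random number generator,
# so the next draw repeats the previous one.
torch.manual_seed(1)
b = torch.randn(2, 3)

print(torch.equal(a, b))
```

Without the second call to torch.manual_seed(), b would continue the random stream and would generally differ from a.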


### Operations with Tensors

In [10]:
x = torch.tensor([1., 2., 3.])
y = torch.tensor([4., 5., 6.])
z = x + y
print(z)

tensor([ 5.,  7.,  9.])


See the official documentation for the many other operations available. They are not limited to mathematical operations.
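To give a flavor of what else is available, here is a brief sketch of a few common operations on the same x and y as above. The matrix M is introduced only for illustration.

```python
import torch

x = torch.tensor([1., 2., 3.])
y = torch.tensor([4., 5., 6.])
M = torch.tensor([[1., 2., 3.], [4., 5., 6.]])

elementwise = x * y        # element-wise product
dot = torch.dot(x, y)      # dot product, a 0-dimensional tensor
matvec = M @ x             # matrix-vector product
transposed = M.t()         # transpose of a 2D tensor

print(elementwise)
print(dot)
print(matvec)
print(transposed.size())
```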

Concatenation is one particularly useful operation.

In [12]:
# By default, it concatenates along the first axis (concatenates rows)
x_1 = torch.randn(2, 5)
y_1 = torch.randn(3, 5)
z_1 = torch.cat([x_1, y_1])
print(z_1.size())

# Concatenate columns:
x_2 = torch.randn(2, 3)
y_2 = torch.randn(2, 5)
z_2 = torch.cat([x_2, y_2], 1)
print(z_2.size())

# If your tensors are not compatible, torch will complain.  Uncomment to see the error
# torch.cat([x_1, x_2])

torch.Size([5, 5])
torch.Size([2, 8])
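The commented-out line in the cell above fails because the two tensors differ in a dimension other than the one being concatenated. The sketch below catches that error instead of letting it stop the notebook; the failed flag is only there for illustration.

```python
import torch

x_1 = torch.randn(2, 5)
x_2 = torch.randn(2, 3)

try:
    # Along dim 0 the column counts (5 and 3) would have to match.
    torch.cat([x_1, x_2])
    failed = False
except RuntimeError as err:
    failed = True
    print("torch.cat failed:", err)

# Along dim 1 only the row counts have to match, and both have 2 rows.
z = torch.cat([x_1, x_2], dim=1)
print(z.size())
```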


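### Reshaping Tensors
Use the .view() method to reshape a tensor. This method is used heavily, because many deep learning modules require inputs of a specific shape, so you often need to reshape a tensor before passing it to a module.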

In [14]:
x = torch.randn(2, 3, 4)
print(x)
print(x.view(2, 12)) # Reshape to 2 rows, 12 columns
# Same as above. If one of the dimensions is -1, its size can be inferred
print(x.view(2, -1))

tensor([[[ 0.9385,  1.4253,  1.5083,  0.1054],
         [-1.6050, -0.1064,  0.2466,  0.6125],
         [-0.7129, -0.0639,  1.0757, -0.5536]],

        [[-1.6160,  0.0934, -1.3898, -0.3105],
         [ 0.2693, -0.4886,  1.3694,  0.4539],
         [-0.0498,  0.3745,  1.4389,  1.4151]]])
tensor([[ 0.9385,  1.4253,  1.5083,  0.1054, -1.6050, -0.1064,  0.2466,
          0.6125, -0.7129, -0.0639,  1.0757, -0.5536],
        [-1.6160,  0.0934, -1.3898, -0.3105,  0.2693, -0.4886,  1.3694,
          0.4539, -0.0498,  0.3745,  1.4389,  1.4151]])
tensor([[ 0.9385,  1.4253,  1.5083,  0.1054, -1.6050, -0.1064,  0.2466,
          0.6125, -0.7129, -0.0639,  1.0757, -0.5536],
        [-1.6160,  0.0934, -1.3898, -0.3105,  0.2693, -0.4886,  1.3694,
          0.4539, -0.0498,  0.3745,  1.4389,  1.4151]])
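As an illustration of reshaping to fit a module's expected input, the sketch below flattens a batch of 3x4 inputs so it can be passed to a linear layer. The batch size of 8 and the layer sizes are arbitrary choices made for this example.

```python
import torch
import torch.nn as nn

batch = torch.randn(8, 3, 4)   # 8 examples, each of shape 3x4
flat = batch.view(8, -1)       # -1 is inferred as 3 * 4 = 12

# nn.Linear expects the last dimension to equal in_features.
layer = nn.Linear(in_features=12, out_features=5)
out = layer(flat)

print(flat.size(), out.size())

# view() does not copy: the result shares storage with the original.
print(flat.data_ptr() == batch.data_ptr())
```

Because the view shares storage with the original tensor, in-place changes to one are visible through the other.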


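## Computation Graphs and Automatic Differentiation
The concept of a computation graph is essential to efficient deep learning programming, because it spares you from writing the backpropagation of gradients by hand. A computation graph specifies which operations were applied to the input tensors to produce the output tensors; in other words, it tracks the computation history of the outputs. This is what makes automatic gradient computation possible. In code, the requires_grad flag determines whether a tensor's computation history is tracked.

First, think about it from a programmer's perspective. Normally all we care about in a torch.Tensor is its data and its shape. When we add two tensors we get another tensor, and given only that tensor we cannot tell where it came from (it might have been read from a file, or be the product of several tensors, and so on). If requires_grad=True, however, we can trace how the output tensor was created.

Let's look at a short piece of code to make this concrete: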

In [18]:
# Tensor factory methods have a ``requires_grad`` flag
x = torch.tensor([1., 2., 3.], requires_grad=True)

# With requires_grad=True, you can still do all 
# the operations you previously could
y = torch.tensor([4., 5., 6.], requires_grad=True)
z = x + y
print(z)

# BUT z knows something extra
print(z.grad_fn)

tensor([ 5.,  7.,  9.])
<AddBackward1 object at 0x114ef4cf8>


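So z knows that it was produced by an addition, rather than being read from a file or produced by a multiplication, an exponential, or some other operation. If you keep following grad_fn backward, you will eventually find x and y.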
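Following grad_fn back to x and y can be sketched as below. This relies on the next_functions attribute of graph nodes and the variable attribute of AccumulateGrad nodes, which are autograd internals rather than a stable public interface, so treat it as an exploratory sketch that may behave differently across PyTorch versions.

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = torch.tensor([4., 5., 6.], requires_grad=True)
z = x + y

# x and y were created by the user, so they are leaves of the graph;
# z was produced by an operation, so it is not.
print(x.is_leaf, y.is_leaf, z.is_leaf)

# Each entry of next_functions is a (node, input_index) pair.
# For leaf tensors the node is an AccumulateGrad object that
# holds a reference to the tensor itself.
leaves = [node.variable for node, _ in z.grad_fn.next_functions]
print(leaves[0] is x, leaves[1] is y)
```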

But how does this help us compute gradients?

In [20]:
# Let's sum up all the entries in z
s = z.sum()
print(s)
print(s.grad_fn)

tensor(21.)
<SumBackward0 object at 0x114ef4dd8>


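From calculus, we know that the derivative of s with respect to the first element of x is 1: s is the sum of the entries x_i + y_i, so by the chain rule each x_i contributes with coefficient 1. Let's verify this: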

In [30]:
print(x.grad)  # gradient of x before calling backward()
# Gradients accumulate across backward() calls. If x.grad is not None
# (for example because this cell was run before), uncomment the next line to reset it.
# x.grad = None
s.backward()
print(x.grad)

None
tensor([ 1.,  1.,  1.])
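The same check works for a less trivial function, and it also shows why x.grad sometimes has to be reset. The sketch below uses s = sum(x_i^2), whose gradient is 2x; this function is chosen here only for illustration. Calling backward() on a freshly built graph adds to the existing x.grad instead of replacing it.

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)

# ds/dx_i = 2 * x_i for s = sum(x_i ** 2)
s = (x ** 2).sum()
s.backward()
first = x.grad.clone()
print(first)

# A second backward pass accumulates into x.grad.
(x ** 2).sum().backward()
second = x.grad.clone()
print(second)

# Resetting x.grad before the next pass gives the plain gradient again.
x.grad = None
(x ** 2).sum().backward()
print(x.grad)
```

In a training loop this reset is usually done for all parameters at once, for example with the optimizer's zero_grad() method.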


Understanding the following code is important for deep learning programming:

In [31]:
x = torch.randn(2, 2)
y = torch.randn(2, 2)
# By default, user created Tensors have ``requires_grad=False``
print(x.requires_grad, y.requires_grad)
z = x + y
# So you can't backprop through z
print(z.grad_fn)

# ``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``
# flag in-place. The input flag defaults to ``True`` if not given.
x = x.requires_grad_()
y = y.requires_grad_()
# z contains enough information to compute gradients, as we saw above
z = x + y
print(z.grad_fn)
# If any input to an operation has ``requires_grad=True``, so will the output
print(z.requires_grad)

# Now z has the computation history that relates itself to x and y
# Can we just take its values, and **detach** it from its history?
new_z = z.detach()

# ... does new_z have information to backprop to x and y?
# NO!
print(new_z.grad_fn)
# And how could it? ``z.detach()`` returns a tensor that shares the same storage
# as ``z``, but with the computation history forgotten. It doesn't know anything
# about how it was computed.
# In essence, we have broken the Tensor away from its past history

False False
None
<AddBackward1 object at 0x114f110b8>
True
None
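The comment about shared storage in the cell above can be checked directly. In this sketch, data_ptr() is used to compare the memory addresses of the underlying data, and independent is an illustrative name for a copy that no longer shares storage.

```python
import torch

x = torch.randn(2, 2, requires_grad=True)
y = torch.randn(2, 2, requires_grad=True)
z = x + y

new_z = z.detach()
# The detached tensor has no history ...
print(new_z.requires_grad, new_z.grad_fn)
# ... but it points at the same underlying data as z.
print(new_z.data_ptr() == z.data_ptr())

# Add .clone() to get a copy with its own storage.
independent = z.detach().clone()
print(independent.data_ptr() == z.data_ptr())
```

Because the storage is shared, modifying new_z in place also changes the values seen through z, so clone first if the original values must stay intact.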


You can also stop autograd from tracking the computation history of tensors that have requires_grad=True by wrapping the code block in with torch.no_grad():

In [32]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

True
True
False
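A common place for torch.no_grad() is inference, where gradients are not needed. The sketch below uses a small nn.Linear layer as a stand-in for a trained model; the layer and input sizes are arbitrary assumptions made for this example.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 2)       # stand-in for a trained model
inputs = torch.randn(4, 3)

# The layer's parameters require gradients, so the output is tracked.
out_tracked = model(inputs)

# Inside no_grad() the same computation records no history.
with torch.no_grad():
    out_untracked = model(inputs)

print(out_tracked.requires_grad, out_tracked.grad_fn is not None)
print(out_untracked.requires_grad, out_untracked.grad_fn)
print(torch.allclose(out_tracked, out_untracked))
```

The values are the same either way; only the bookkeeping for backpropagation is skipped.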
