## Basic configuration
### Importing packages and checking versions

In [1]:
import PIL
import torch
import torch.nn as nn
import torchvision
print(torch.__version__)
print(torch.version.cuda)
print(torch.backends.cudnn.version())
print(torch.cuda.get_device_name(0))

1.5.0
10.1
7603
GeForce RTX 2080 Ti


### Updating PyTorch

With Anaconda, `PyTorch` is installed under `anaconda3/lib/python3.7/site-packages/torch/`.  
`conda update pytorch torchvision -c pytorch`  

### Fixing the random seed

`torch.manual_seed(0)`

`torch.cuda.manual_seed_all(0)`  
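
The two calls above can be wrapped in a small helper. The following is a minimal sketch: `seed_everything` is a hypothetical name, and seeding Python's `random` module and NumPy as well is an assumption about what a typical training script uses, not something stated in these notes.

```python
import random

import numpy as np
import torch


def seed_everything(seed=0):
    """Seed the RNGs a typical training script touches (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable


seed_everything(0)
first = torch.rand(3)
seed_everything(0)
second = torch.rand(3)
print(torch.equal(first, second))  # the same seed reproduces the same draw
```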

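### Running a program on specific GPUs

Set the environment variable on the command line:

`CUDA_VISIBLE_DEVICES=0,1 python train.py`
or set it in code (this needs `import os` and must run before CUDA is initialized):

`os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'`   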

### Checking for CUDA support

`torch.cuda.is_available()`

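### Enabling cuDNN benchmark mode

`Benchmark` mode speeds up computation, but because the computation involves some randomness, the forward-pass results can differ slightly from run to run.

`torch.backends.cudnn.benchmark = True`
To avoid this fluctuation, set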

`torch.backends.cudnn.deterministic = True`
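
The two flags are often toggled together. The sketch below uses a hypothetical helper, `configure_cudnn`; turning `benchmark` off whenever determinism is requested is an assumption based on common practice, not a rule stated in these notes.

```python
import torch


def configure_cudnn(reproducible):
    """Trade speed for run-to-run stability (hypothetical helper)."""
    # benchmark=True lets cuDNN auto-tune its algorithms, which may vary between runs.
    torch.backends.cudnn.benchmark = not reproducible
    # deterministic=True restricts cuDNN to deterministic algorithms.
    torch.backends.cudnn.deterministic = reproducible


configure_cudnn(reproducible=True)
print(torch.backends.cudnn.benchmark, torch.backends.cudnn.deterministic)
```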

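### Freeing GPU memory

Sometimes `GPU` memory is not released promptly after a run is interrupted with `Control-C` and has to be cleared manually. From inside `PyTorch`:

`torch.cuda.empty_cache()`  
Alternatively, on the command line, find the program's `PID` with `ps` and then end the process with `kill`:

`ps aux | grep python`   
`ps aux`: lists every process on the system  
`kill -9 [pid]`
or directly reset the `GPU` whose memory was not cleared: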

`nvidia-smi --gpu-reset -i [gpu_id]`

In [2]:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

### Working with tensors  

`torch.is_tensor(obj)`  
`torch.is_storage(obj)`:Returns True if obj is a PyTorch storage object.  
`torch.is_complex(input), input:Tensor`  
`torch.is_floating_point(input), input:Tensor`  

In [2]:
import torch
torch.tensor([1.2, 3]).dtype # initial default for floating point is torch.float32

torch.float32

In [3]:
torch.set_default_dtype(torch.float64)

In [4]:
torch.tensor([1.2, 3]).dtype

torch.float64

In [5]:
torch.get_default_dtype()

torch.float64

In [6]:
torch.set_default_tensor_type(torch.FloatTensor)  # changed to torch.float32, the dtype for torch.FloatTensor

In [7]:
torch.get_default_dtype()

torch.float32

In [8]:
torch.set_default_tensor_type(torch.DoubleTensor)

In [9]:
torch.tensor([1.2, 3]).dtype 

torch.float64

`torch.numel(input)--> int, input:Tensor`  

Returns the total number of elements in the input tensor.

In [10]:
a = torch.randn(1, 2, 3, 4, 5)
torch.numel(a)

120

In [11]:
a = torch.zeros(4,4)
torch.numel(a)

16

In [57]:
# Type conversion
tensor = tensor.cuda()
tensor = tensor.cpu()
print(tensor.dtype)
tensor = tensor.long()
print(tensor.dtype)

torch.int64
torch.int64


`torch.set_printoptions(precision=None, threshold=None, edgeitems=None, linewidth=None, profile=None)`

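`precision` is the number of digits of precision for floating point output (default 4);  
`threshold` is the total number of elements above which the `tensor` is printed in summarized form (default 1000);  
`edgeitems` is the number of items shown at the beginning and end of each dimension when summarizing (default 3);  
`linewidth` is the number of characters per line before a line break is inserted (default 80);  
`profile` selects a preset of printing defaults (`'default'`, `'short'`, or `'full'`), which the other arguments can then override.  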

In [12]:
a = torch.randn(2, 3)

In [13]:
a

tensor([[ 0.8221, -1.8515, -0.3171],
        [ 0.1327,  0.3422,  0.3552]])

In [14]:
torch.set_printoptions(precision=2)

In [15]:
a

tensor([[ 0.82, -1.85, -0.32],
        [ 0.13,  0.34,  0.36]])

In [17]:
torch.set_printoptions(precision=8)

In [18]:
tensor = torch.randn(3,4,5)
print(tensor.type())  # data type
print(tensor.size())  # shape of the tensor, a torch.Size (a tuple subclass)
print(tensor.dim())   # number of dimensions

torch.DoubleTensor
torch.Size([3, 4, 5])
3


In [19]:
torch.tensor([[0.11111, 0.222222, 0.3333333]],
                 dtype=torch.float64,
                 device=torch.device('cuda:0')) 

tensor([[0.11111000, 0.22222200, 0.33333330]], device='cuda:0')

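### Difference between tensor.data and tensor.detach()

Prefer `.detach()`. `x.data` returns a `tensor` that shares its data with `x`, is not added to the computation history of `x`, and has `requires_grad = False`. This is sometimes unsafe, because **in-place changes made through `x.data` are not tracked by autograd**, so a later backward pass can silently produce wrong gradients. `.detach()` also returns a `tensor` with the same data and `requires_grad=False`, but `in-place` modifications to it are reported to `autograd`, which raises an error during backpropagation if the overwritten values are needed.  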

### tensor.data

In [3]:
a = torch.tensor([1,2,3.], requires_grad =True)

In [4]:
out = a.sigmoid()
out

tensor([0.7311, 0.8808, 0.9526], grad_fn=<SigmoidBackward>)

In [6]:
c = out.data
c

tensor([0.7311, 0.8808, 0.9526])

In [7]:
c.zero_()

tensor([0., 0., 0.])

In [8]:
out

tensor([0., 0., 0.], grad_fn=<SigmoidBackward>)

In [9]:
out.sum().backward()

In [10]:
a.grad                #  Silently wrong result: the gradient was computed from the modified out

tensor([0., 0., 0.])

### tensor.detach()

In [12]:
a = torch.tensor([1,2,3.], requires_grad =True)
out = a.sigmoid()
out

tensor([0.7311, 0.8808, 0.9526], grad_fn=<SigmoidBackward>)

In [13]:
c = out.detach()
c

tensor([0.7311, 0.8808, 0.9526])

In [15]:
c.zero_()

tensor([0., 0., 0.])

In [16]:
out

tensor([0., 0., 0.], grad_fn=<SigmoidBackward>)

In [17]:
out.sum().backward()  #  backward needs the original values of out, which c.zero_() overwrote, so autograd raises an error

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [3]], which is output 0 of SigmoidBackward, is at version 2; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

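### torch.as_tensor/zeros/empty/arange/linspace/eye/full
`torch.as_tensor(data, dtype=None, device=None) → Tensor, data: array_like`  

Converts an array to a tensor. For a NumPy array with a matching dtype and device, the tensor shares memory with the array instead of copying it.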

In [22]:
import numpy as np
a = np.array([1, 2, 3])
t = torch.as_tensor(a)

In [23]:
t

tensor([1, 2, 3])

In [24]:
t[0]

tensor(1)

In [25]:
t[0] = -1

In [26]:
t

tensor([-1,  2,  3])

In [27]:
torch.zeros(2,3) #shape is (2,3)

tensor([[0., 0., 0.],
        [0., 0., 0.]])

In [28]:
torch.zeros(5)

tensor([0., 0., 0., 0., 0.])

In [33]:
input = torch.empty(2, 3, dtype=torch.int64)
torch.zeros_like(input)

tensor([[0, 0, 0],
        [0, 0, 0]])

In [30]:
torch.arange(1, 2.5, 0.5)

tensor([1.00000000, 1.50000000, 2.00000000])

In [31]:
torch.linspace(start=-10, end=10, steps=5) #steps:number of points to sample between start and end. Default: 100.

tensor([-10.,  -5.,   0.,   5.,  10.])

In [34]:
torch.eye(3, 3)

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])

In [35]:
torch.full((2, 3), 3.141592)

tensor([[3.14159200, 3.14159200, 3.14159200],
        [3.14159200, 3.14159200, 3.14159200]])

### torch.cat and torch.chunk
`torch.chunk(tensor, chunk_num, dim)` splits `tensor` along `dim` (rows or columns) into `chunk_num` `tensor` blocks and returns them as a tuple.

In [42]:
a = torch.Tensor([[1,2,4]])
b = torch.Tensor([[4,5,7], [3,9,8], [9,6,7]])
c = torch.cat((a,b), dim=0)
print(c)
print(c.size())
print('********************')
d = torch.chunk(c,4,dim=0)
print(d)
print(len(d))

tensor([[1., 2., 4.],
        [4., 5., 7.],
        [3., 9., 8.],
        [9., 6., 7.]])
torch.Size([4, 3])
********************
(tensor([[1., 2., 4.]]), tensor([[4., 5., 7.]]), tensor([[3., 9., 8.]]), tensor([[9., 6., 7.]]))
4


### torch.narrow

In [43]:
x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
x

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

Parameters:

`input (Tensor) – the tensor to narrow`    
`dim (int) – the dimension along which to narrow`    
`start (int) – the index at which the narrowed dimension starts`  
`length (int) – the number of elements to keep along that dimension`  

In [44]:
torch.narrow(x, 0, 0, 2) 

tensor([[1, 2, 3],
        [4, 5, 6]])

In [45]:
torch.narrow(x, 1, 1, 2)

tensor([[2, 3],
        [5, 6],
        [8, 9]])

### torch.nonzero

In [46]:
torch.nonzero(torch.tensor([[0.6, 0.0, 0.0, 0.0],
                            [0.0, 0.4, 0.0, 0.0],
                            [0.0, 0.0, 1.2, 0.0],
                            [0.0, 0.0, 0.0,-0.4]]))

	nonzero(Tensor input, *, Tensor out)
Consider using one of the following signatures instead:
	nonzero(Tensor input, *, bool as_tuple)


tensor([[0, 0],
        [1, 1],
        [2, 2],
        [3, 3]])

In [47]:
torch.nonzero(torch.tensor([[0.6, 0.0, 0.0, 0.0],
                                [0.0, 0.4, 0.0, 0.0],
                                [0.0, 0.0, 1.2, 0.0],
                                [0.0, 0.0, 0.0,-0.4]]), as_tuple=True)

(tensor([0, 1, 2, 3]), tensor([0, 1, 2, 3]))

### torch.unsqueeze  
Inserts a dimension of size 1 at the given position.

In [48]:
import torch

x = torch.Tensor([1, 2, 3, 4])  # torch.Tensor is an alias for the default tensor type (torch.FloatTensor).

print('-' * 50)
print(x)  # tensor([1., 2., 3., 4.])
print(x.size())  # torch.Size([4])
print(x.dim())  # 1
print(x.numpy())  # [1. 2. 3. 4.]

--------------------------------------------------
tensor([1., 2., 3., 4.])
torch.Size([4])
1
[1. 2. 3. 4.]


In [49]:
print('-' * 50)
print(torch.unsqueeze(x, 0))  # tensor([[1., 2., 3., 4.]])
print(torch.unsqueeze(x, 0).size())  # torch.Size([1, 4])
print(torch.unsqueeze(x, 0).dim())  # 2
print(torch.unsqueeze(x, 0).numpy())  # [[1. 2. 3. 4.]]

--------------------------------------------------
tensor([[1., 2., 3., 4.]])
torch.Size([1, 4])
2
[[1. 2. 3. 4.]]


In [50]:
print('-' * 50)
print(torch.unsqueeze(x, 1))
print(torch.unsqueeze(x, 1).size())  # torch.Size([4, 1])
print(torch.unsqueeze(x, 1).dim())  # 2

--------------------------------------------------
tensor([[1.],
        [2.],
        [3.],
        [4.]])
torch.Size([4, 1])
2


In [51]:
print('-' * 50)
print(torch.unsqueeze(x, -1))
print(torch.unsqueeze(x, -1).size())  # torch.Size([4, 1])
print(torch.unsqueeze(x, -1).dim())  # 2

--------------------------------------------------
tensor([[1.],
        [2.],
        [3.],
        [4.]])
torch.Size([4, 1])
2


In [52]:
print('-' * 50)
print(torch.unsqueeze(x, -2))  # tensor([[1., 2., 3., 4.]])
print(torch.unsqueeze(x, -2).size())  # torch.Size([1, 4])
print(torch.unsqueeze(x, -2).dim())  # 2

--------------------------------------------------
tensor([[1., 2., 3., 4.]])
torch.Size([1, 4])
2


`unsqueeze_` and `unsqueeze` do the same thing. The difference is that `unsqueeze_` is an `in-place` operation: `unsqueeze` does not change the `tensor` it is called on, so its result must be assigned to a variable, whereas `unsqueeze_` modifies the tensor itself.
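
The in-place versus out-of-place behaviour can be checked directly. This is a small illustrative sketch; the variable names are chosen here for the example only.

```python
import torch

x = torch.tensor([1., 2., 3., 4.])

y = x.unsqueeze(0)              # returns a new tensor; x keeps its shape
shape_before = tuple(x.shape)
print(shape_before, tuple(y.shape))

x.unsqueeze_(0)                 # modifies x itself
shape_after = tuple(x.shape)
print(shape_after)
```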

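### torch.squeeze  
Removes the dimensions of size 1 from the input tensor's shape and returns the result. For an input of shape (A×1×B×1×C×1×D), the output has shape (A×B×C×D).
When `dim` is given, only that dimension is squeezed. For example, with an input of shape (A×1×B), `squeeze(input, 0)` leaves the tensor unchanged, and only `squeeze(input, 1)` changes the shape to (A×B).

A dimension of size 1 only pads the shape and holds no additional data, so such dimensions can be removed when reducing dimensionality without losing any elements.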

In [53]:
m = torch.zeros(2, 1, 2, 1, 2)
print(m.size())  # torch.Size([2, 1, 2, 1, 2])

n = torch.squeeze(m)
print(n.size())  # torch.Size([2, 2, 2])

n = torch.squeeze(m, 0)  # when dim is given, only that dimension is squeezed
print(n.size())  # torch.Size([2, 1, 2, 1, 2])

torch.Size([2, 1, 2, 1, 2])
torch.Size([2, 2, 2])
torch.Size([2, 1, 2, 1, 2])


In [None]:
n = torch.squeeze(m, 1)
print(n.size())  # torch.Size([2, 2, 1, 2])

In [54]:
n = torch.squeeze(m, 2)
print(n.size())  # torch.Size([2, 1, 2, 1, 2])

n = torch.squeeze(m, 3)
print(n.size())  # torch.Size([2, 1, 2, 2])

torch.Size([2, 1, 2, 1, 2])
torch.Size([2, 1, 2, 2])


In [4]:
# Tensor[N, C, H, W]
images = torch.randn(32, 3, 56, 56)

In [5]:
images.sum(dim=1).shape

torch.Size([32, 56, 56])

In [6]:
NCHW = ['N', 'C', 'H', 'W']
images = torch.randn(32, 3, 56, 56, names=NCHW)



In [7]:
images.sum('C').shape

torch.Size([32, 56, 56])

In [8]:
images.select('C', index=0)

(output omitted: a named tensor of shape [32, 56, 56] with names ('N', 'H', 'W'))

In [9]:
tensor = torch.rand(3,4,1,2,names=('C', 'N', 'H', 'W'))

In [10]:
tensor.shape

torch.Size([3, 4, 1, 2])

In [11]:
tensor = tensor.align_to('N', 'C', 'H', 'W')

In [12]:
tensor.shape

torch.Size([4, 3, 1, 2])

In [25]:
# torch.Tensor -> PIL.Image.  
torch.clamp(img_2 * 255, min=0, max=255).byte()

tensor([[[  0,   0, 255,  ...,   0, 255,   0],
         [  0, 242,   0,  ...,   0,   0,   0],
         [255,   0,   0,  ...,   0,   0,   0],
         ...,
         [  0,   0,   0,  ..., 110,   0, 139],
         [  0, 147,   0,  ..., 153,  15, 185],
         [  0,  11,   0,  ...,   0, 146, 174]],

        [[230,   0, 255,  ..., 140,   0, 255],
         [  0, 177, 229,  ...,   0, 236,   0],
         [  0, 255, 184,  ...,   0,   0,   0],
         ...,
         [  0,   0,  57,  ..., 255,   0,  62],
         [ 86,   0,   0,  ...,   0,   0, 213],
         [255, 255,   0,  ...,   0,  78,   0]],

        [[  0,   0,   0,  ...,  54,   0, 242],
         [  0, 113, 114,  ..., 255, 154, 128],
         [114,   0, 255,  ..., 244, 255,   0],
         ...,
         [ 87,   0,   0,  ...,   0,   0,  76],
         [  0,   0,   0,  ...,   0,   0, 255],
         [  0, 177,   0,  ...,  72, 255, 136]]], dtype=torch.uint8)

In [29]:
image = PIL.Image.fromarray(torch.clamp(img_2 * 255, min=0, max=255).byte().permute(1, 2, 0).cpu().numpy()) # permute: reorder dimensions (C, H, W) -> (H, W, C)
#image = torchvision.transforms.functional.to_pil_image(img_2)  # Equivalent way


In [41]:
# PIL.Image -> torch.Tensor.
tensor = torch.from_numpy(np.asarray(PIL.Image.open('/home/weiweia92/Downloads/kobe.jpeg'))).permute(2, 0, 1).float()/255
#tensor = torchvision.transforms.functional.to_tensor(PIL.Image.open(path))  # Equivalent way

### Converting between np.ndarray and PIL.Image

`# np.ndarray -> PIL.Image.`
`image = PIL.Image.fromarray(ndarray.astype(np.uint8))`

`# PIL.Image -> np.ndarray.`
`ndarray = np.asarray(PIL.Image.open(path))`

In [42]:
value = torch.rand(1).item() # extract the value as a Python number

In [43]:
value

0.2026575207710266

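### Reshaping tensors  
When the output of a convolutional layer is fed into a fully connected layer, the tensor usually has to be reshaped.
Unlike `Tensor.view`, `torch.reshape` also handles non-contiguous input tensors automatically, copying the data when a view is not possible.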
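
The difference on non-contiguous input can be seen on a transposed tensor. This is a minimal sketch; `view_failed` is only a flag introduced for the illustration.

```python
import torch

x = torch.arange(6).reshape(2, 3)
t = x.t()                       # transposing makes the tensor non-contiguous
print(t.is_contiguous())

try:
    t.view(6)                   # view requires compatible strides
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)

flat = t.reshape(6)             # reshape falls back to copying the data
print(flat)
```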

In [58]:
tensor = torch.rand(2,3,4)
shape = (6, 4)
tensor = torch.reshape(tensor, shape)

In [59]:
tensor.shape

torch.Size([6, 4])

In [63]:
x = torch.randn(2, 3)
x

tensor([[ 0.50822594, -0.53035324, -1.06180327],
        [-0.53724749,  0.00923640,  0.29436072]])

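### Difference between torch.transpose and torch.permute  
In common: both reorder dimensions.  
Differences:  
`Tensor.permute(a, b, c, d, ...)`: `permute` reorders all dimensions of a tensor of any rank in a single call. In the PyTorch 1.5.0 used here there is no `torch.permute()` function, so it can only be called as `Tensor.permute()`.  
`torch.transpose(Tensor, a, b)`: `transpose` swaps exactly two dimensions per call (on a tensor of any rank, not only 2D matrices) and can be called in two ways, `torch.transpose(x, a, b)` or `x.transpose(a, b)`.  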
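
A short sketch of how the two relate on a 3-D tensor: one `transpose` equals a `permute` that swaps two axes, and a cyclic reordering takes two `transpose` calls. Variable names are illustrative.

```python
import torch

x = torch.randn(2, 3, 5)

swapped = torch.transpose(x, 0, 2)    # swap two dimensions of a 3-D tensor
reordered = x.permute(2, 1, 0)        # the same reordering written with permute
print(swapped.shape)
print(torch.equal(swapped, reordered))

rotated = x.permute(2, 0, 1)          # a cyclic reordering in one call
two_steps = x.transpose(0, 2).transpose(1, 2)
print(torch.equal(rotated, two_steps))
```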

In [64]:
torch.transpose(x, 0, 1)

tensor([[ 0.50822594, -0.53724749],
        [-0.53035324,  0.00923640],
        [-1.06180327,  0.29436072]])

In [69]:
x = torch.randn(2, 3, 5)
x.size()

torch.Size([2, 3, 5])

In [71]:
x.permute(2, 0, 1).size()

torch.Size([5, 2, 3])

### torch.Generator

In [76]:
g_cpu = torch.Generator()

In [77]:
g_cpu

<torch._C.Generator at 0x7f45ad2a0550>

In [78]:
g_cpu.device

device(type='cpu')

In [79]:
g_cuda = torch.Generator(device='cuda')

In [80]:
g_cuda

<torch._C.Generator at 0x7f45ad2a03d0>

In [81]:
g_cuda.device

device(type='cuda')

### torch.Generator methods 
`get_state`  
`initial_seed`  
`manual_seed`  
`seed`  
`set_state`  

In [82]:
g_cpu.get_state() # a torch.ByteTensor

tensor([  1, 209, 156,  ...,   0,   0,   0], dtype=torch.uint8)

In [83]:
g_cpu.initial_seed()

67280421310721

In [84]:
g_cpu.manual_seed(2147483647)

<torch._C.Generator at 0x7f45ad2a0550>

### Random Sampling   
`torch.seed()`
`torch.manual_seed(seed)`  
`torch.initial_seed()`  
`torch.get_rng_state()`  
`torch.set_rng_state()`  
`torch.default_generator`  

### Probability and statistics

In [87]:
a = torch.empty(3, 3).uniform_(0, 1)  # generate a uniform random matrix with range [0, 1]
a

tensor([[0.29625628, 0.10042466, 0.94973225],
        [0.02867882, 0.46407592, 0.45112413],
        [0.99227606, 0.27476751, 0.11713139]])

In [88]:
torch.bernoulli(a)

tensor([[0., 0., 1.],
        [0., 1., 0.],
        [1., 1., 0.]])

In [89]:
torch.normal(mean=torch.arange(1., 11.), std=torch.arange(1, 0, -0.1))

tensor([1.44029525, 2.09361571, 1.39344268, 4.79772939, 5.50238010, 6.08322211,
        7.40052536, 7.77761422, 8.85335743, 9.91384263])

In [90]:
rates = torch.rand(4, 4) * 5
torch.poisson(rates)

tensor([[4., 1., 3., 0.],
        [4., 0., 2., 0.],
        [0., 5., 1., 0.],
        [0., 1., 0., 1.]])

In [91]:
torch.randint(3, 10, (2, 2))

tensor([[5, 4],
        [3, 9]])

In [92]:
torch.randn(2, 3)

tensor([[ 1.08835184, -0.04653659,  0.82810138],
        [-0.10339053,  0.52852463,  0.75752584]])

### torch.unbind

In [65]:
torch.unbind(x)

(tensor([ 0.50822594, -0.53035324, -1.06180327]),
 tensor([-0.53724749,  0.00923640,  0.29436072]))

### torch.take  
Treats the `input tensor` as a 1-D tensor and returns the values at the given indices.

In [66]:
src = torch.tensor([[4, 3, 5],[6, 7, 8]])
torch.take(src, torch.tensor([0, 2, 5]))

tensor([4, 5, 8])

In [67]:
x = torch.randn(3, 2)
y = torch.ones(3, 2)
x

tensor([[-3.67208683e-02,  2.12627604e+00],
        [-1.70442794e+00, -7.90883500e-02],
        [-2.75677552e-04, -3.71878714e-01]])

In [68]:
torch.where(x>0, x, y)

tensor([[1.00000000, 2.12627604],
        [1.00000000, 1.00000000],
        [1.00000000, 1.00000000]])

### Shuffling

`tensor = tensor[torch.randperm(tensor.size(0))]  # Shuffle the first dimension`  
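
When features and labels are stored in separate tensors, the same permutation has to be applied to both. A minimal sketch with made-up toy data:

```python
import torch

features = torch.arange(10).reshape(5, 2)   # 5 samples; row i is [2i, 2i+1]
labels = torch.arange(5)                    # label i belongs to sample i

perm = torch.randperm(features.size(0))     # one permutation shared by both
features_shuffled = features[perm]
labels_shuffled = labels[perm]

# Each row should still sit next to its own label after shuffling.
print(torch.equal(features_shuffled[:, 0], labels_shuffled * 2))
```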

### Horizontal flip

`PyTorch` does not support negative-step slicing such as `tensor[::-1]`; a horizontal flip can be implemented with tensor indexing.

`# Assume tensor has shape N*D*H*W.`
`tensor = tensor[:, :, :, torch.arange(tensor.size(3) - 1, -1, -1).long()]`  
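
`torch.flip` gives the same result as the index-based flip and may read more clearly. The sketch below compares the two on a small made-up tensor.

```python
import torch

tensor = torch.arange(24).reshape(1, 2, 3, 4)   # N*D*H*W

by_index = tensor[:, :, :, torch.arange(tensor.size(3) - 1, -1, -1).long()]
by_flip = torch.flip(tensor, dims=[3])          # reverse the W dimension
print(torch.equal(by_index, by_flip))
print(by_flip[0, 0, 0])
```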

### Copying tensors

There are three ways to copy a tensor, each suited to a different need.   

| Operation                 | New/Shared memory | Still in computation graph |
|---------------------------|-------------------|----------------------------|
| `tensor.clone()`          | New               | Yes                        |
| `tensor.detach()`         | Shared            | No                         |
| `tensor.detach().clone()` | New               | No                         |
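
The three copy modes can be verified by checking `requires_grad` and the underlying data pointer. This is an illustrative sketch; comparing `data_ptr()` is used here as a simple proxy for shared memory.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2                          # a non-leaf tensor inside the graph

cloned = y.clone()                 # new memory, still in the graph
detached = y.detach()              # shared memory, out of the graph
copied = y.detach().clone()        # new memory, out of the graph

print(cloned.requires_grad, detached.requires_grad, copied.requires_grad)
print(cloned.data_ptr() == y.data_ptr())
print(detached.data_ptr() == y.data_ptr())
print(copied.data_ptr() == y.data_ptr())
```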

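### Concatenating tensors

The difference between `torch.cat` and `torch.stack` is that `torch.cat` concatenates along an existing dimension, while `torch.stack` adds a new dimension. For example, given three 10×5 tensors, `torch.cat` along dimension 0 produces a 30×5 tensor, whereas `torch.stack` produces a 3×10×5 tensor.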

In [59]:
tensor1 = torch.rand(10, 5)
tensor2 = torch.rand(10, 5)
tensor3 = torch.rand(10, 5)
tensor_cat = torch.cat([tensor1, tensor2, tensor3], dim=0)

In [60]:
tensor_cat.shape

torch.Size([30, 5])

In [74]:
tensor_cat.size()

torch.Size([30, 5])

In [63]:
tensor1 = torch.rand(10, 5)
tensor2 = torch.rand(10, 5)
tensor3 = torch.rand(10, 5)
tensor_stack = torch.stack([tensor1, tensor2, tensor3], dim=0)

In [64]:
tensor_stack.shape

torch.Size([3, 10, 5])

In [53]:
tensor = torch.rand(3, 2, 2)

In [66]:
tensor

tensor([[[0.0219, 0.6251],
         [0.0798, 0.6325]],

        [[0.9203, 0.3982],
         [0.0589, 0.8266]],

        [[0.8809, 0.1143],
         [0.9980, 0.5072]]])

In [67]:
N = tensor.size(0)
N

3

### Parallelism

In [97]:
torch.get_num_threads()

16

In [98]:
torch.set_num_threads(8)

In [99]:
torch.get_num_threads()

8

### Locally disabling gradient computation  

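The context managers `torch.no_grad()`, `torch.enable_grad()`, and `torch.set_grad_enabled()` locally disable or enable gradient computation. They are thread local, so they won't work if you send work to another thread using the `threading` module.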
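
A brief sketch of the usual ways to switch gradient tracking off and on locally; `predict` is a hypothetical function used only to show the decorator form.

```python
import torch

x = torch.ones(2, requires_grad=True)

with torch.no_grad():
    y = x * 2                       # not recorded by autograd
print(y.requires_grad)

with torch.set_grad_enabled(True):
    z = x * 2                       # recorded as usual
print(z.requires_grad)


@torch.no_grad()
def predict(inp):
    return inp * 2


p = predict(x)
print(p.requires_grad)
```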

### Checking whether two tensors are equal

In [73]:
torch.allclose(tensor1, tensor2)  # float tensors: equal within a tolerance
torch.equal(tensor1, tensor2)     # int tensors: same size and exactly equal elements

False
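
Why floating point tensors call for `torch.allclose` can be seen with the classic `0.1 + 0.2` rounding case. A small sketch with illustrative values:

```python
import torch

a = torch.tensor([0.1 + 0.2, 1.0], dtype=torch.float64)
b = torch.tensor([0.3, 1.0], dtype=torch.float64)
exact = torch.equal(a, b)        # exact comparison fails on rounding error
close = torch.allclose(a, b)     # comparison within rtol/atol succeeds
print(exact, close)

i = torch.tensor([1, 2, 3])
j = torch.tensor([1, 2, 3])
print(torch.equal(i, j))
```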

### Matrix multiplication

`# Matrix multiplication: (m*n) * (n*p) -> (m*p).`  
`result = torch.mm(tensor1, tensor2)`  

`# Batch matrix multiplication: (b*m*n) * (b*n*p) -> (b*m*p).`  
`result = torch.bmm(tensor1, tensor2)`  

`# Element-wise multiplication.`  
`result = tensor1 * tensor2`  
Pairwise Euclidean distances between two sets of points  

`# X1 is of shape m*d, X2 is of shape n*d.`  
`dist = torch.sqrt(torch.sum((X1[:,None,:] - X2) ** 2, dim=2))`   
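
The broadcasting in the distance formula can be traced on a tiny example. The points below are made up so that the distances are easy to check by hand.

```python
import torch

X1 = torch.tensor([[0., 0.], [3., 4.]])             # m = 2 points, d = 2
X2 = torch.tensor([[0., 0.], [0., 4.], [3., 0.]])   # n = 3 points

# X1[:, None, :] has shape m*1*d and broadcasts against X2 (n*d) to m*n*d.
dist = torch.sqrt(torch.sum((X1[:, None, :] - X2) ** 2, dim=2))
print(dist.shape)   # m*n
print(dist)
```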

## Model definition  

### Convolutional layers

The most common convolutional layer configurations are

`conv = torch.nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=True)`  
`conv = torch.nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0, bias=True)`  

### GAP (global average pooling) layer

`gap = torch.nn.AdaptiveAvgPool2d(output_size=1)`  
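
A quick sketch showing that the GAP layer reduces each feature map to a single value, equal to the mean over the spatial dimensions. The input sizes are arbitrary.

```python
import torch

gap = torch.nn.AdaptiveAvgPool2d(output_size=1)
x = torch.randn(8, 16, 7, 7)              # N*C*H*W feature maps

pooled = gap(x)                           # N*C*1*1
flat = torch.flatten(pooled, 1)           # N*C, ready for a linear layer
print(pooled.shape, flat.shape)
print(torch.allclose(flat, x.mean(dim=(2, 3)), atol=1e-6))
```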

### Bilinear pooling  
`
X = torch.reshape(X, (N, D, H * W))                   # Assume X has shape N*D*H*W
X = torch.bmm(X, torch.transpose(X, 1, 2)) / (H * W)  # Bilinear pooling
assert X.size() == (N, D, D)
X = torch.reshape(X, (N, D * D))
X = torch.sign(X) * torch.sqrt(torch.abs(X) + 1e-5)   # Signed-sqrt normalization
X = torch.nn.functional.normalize(X)                  # L2 normalization`  
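
The steps above can be packaged as a function and run on random features. This is a sketch under the same assumption that the input has shape N*D*H*W; `bilinear_pool` is a hypothetical name.

```python
import torch


def bilinear_pool(X):
    """Bilinear pooling of an N*D*H*W feature map (illustrative helper)."""
    N, D, H, W = X.size()
    X = torch.reshape(X, (N, D, H * W))
    X = torch.bmm(X, torch.transpose(X, 1, 2)) / (H * W)   # N*D*D
    X = torch.reshape(X, (N, D * D))
    X = torch.sign(X) * torch.sqrt(torch.abs(X) + 1e-5)    # signed square root
    return torch.nn.functional.normalize(X)                # L2-normalize each row


features = torch.randn(4, 8, 7, 7)
out = bilinear_pool(features)
print(out.shape)
print(out.norm(dim=1))   # each row has unit L2 norm
```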

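### Multi-GPU synchronized BN (batch normalization)

When code runs on multiple GPUs with `torch.nn.DataParallel`, PyTorch's BN layers by default compute the mean and standard deviation independently from the data on each card. Synchronized BN computes them from the data on all cards together, which mitigates inaccurate estimates of the mean and standard deviation when the batch size is small. It is an effective trick for improving performance in tasks such as object detection.

PyTorch now supports synchronized BN officially (note that `torch.nn.SyncBatchNorm` is meant to be used with `torch.nn.parallel.DistributedDataParallel`, one process per GPU)

`sync_bn = torch.nn.SyncBatchNorm(num_features, eps=1e-05, momentum=0.1, affine=True, 
                                 track_running_stats=True)`  

Converting all BN layers of an existing network to synchronized BN layers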

In [76]:
def convertBNtoSyncBN(module, process_group=None):
    '''Recursively replace all BN layers with SyncBN layers.

    Args:
        module[torch.nn.Module]. Network
    '''
    if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
        sync_bn = torch.nn.SyncBatchNorm(module.num_features, module.eps, module.momentum, 
                                         module.affine, module.track_running_stats, process_group)
        sync_bn.running_mean = module.running_mean
        sync_bn.running_var = module.running_var
        if module.affine:
            # A plain tensor cannot be assigned to a parameter slot, so re-wrap the copies.
            sync_bn.weight = torch.nn.Parameter(module.weight.clone().detach())
            sync_bn.bias = torch.nn.Parameter(module.bias.clone().detach())
        return sync_bn
    else:
        for name, child_module in module.named_children():
            # setattr takes the new value as its third argument; recurse with this same function.
            setattr(module, name, convertBNtoSyncBN(child_module, process_group=process_group))
        return module
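
PyTorch also ships a built-in conversion, `torch.nn.SyncBatchNorm.convert_sync_batchnorm`, which may be preferable to a hand-written recursion. The sketch below only performs the conversion on a small made-up model; running a forward pass through `SyncBatchNorm` still needs a distributed GPU setup.

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.BatchNorm2d(8),
    torch.nn.ReLU(),
)

# Replace every BatchNorm layer in the model with SyncBatchNorm.
sync_model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
print(type(sync_model[1]).__name__)
```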

### Counting the total number of model parameters


`# torch.numel: returns the total number of elements in the input tensor   
num_parameters = sum(torch.numel(parameter) for parameter in model.parameters())`  
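
The count can be checked by hand on a small model. This sketch uses a made-up two-layer network; counting only trainable parameters via `requires_grad` is an optional variant added here for illustration.

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(10, 5),    # 10*5 weights + 5 biases = 55
    torch.nn.ReLU(),
    torch.nn.Linear(5, 2),     # 5*2 weights + 2 biases = 12
)

num_parameters = sum(torch.numel(p) for p in model.parameters())
num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(num_parameters, num_trainable)
```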

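### Initializing model weights

Note the difference between `model.modules()` and `model.children()`: `model.modules()` recursively iterates over all sub-layers of the model (including the model itself), while `model.children()` returns only the model's outermost sub-layers.


`# Common practice for initialization.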
for layer in model.modules():
    if isinstance(layer, torch.nn.Conv2d):
        torch.nn.init.kaiming_normal_(layer.weight, mode='fan_out',
                                      nonlinearity='relu')
        if layer.bias is not None:
            torch.nn.init.constant_(layer.bias, val=0.0)
    elif isinstance(layer, torch.nn.BatchNorm2d):
        torch.nn.init.constant_(layer.weight, val=1.0)
        torch.nn.init.constant_(layer.bias, val=0.0)
    elif isinstance(layer, torch.nn.Linear):
        torch.nn.init.xavier_normal_(layer.weight)
        if layer.bias is not None:
            torch.nn.init.constant_(layer.bias, val=0.0)`

`# Initialization with given tensor.
layer.weight = torch.nn.Parameter(tensor)`
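
The difference between `model.modules()` and `model.children()` shows up as soon as a model contains a nested container. A minimal sketch with a made-up nested model:

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()),
    torch.nn.Linear(4, 2),
)

children = list(model.children())   # only the two direct sub-layers
modules = list(model.modules())     # the model itself plus every nested layer
print(len(children), len(modules))
```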

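## Other PyTorch notes

### Model definition

Define layers that have parameters, as well as pooling layers, with `torch.nn` modules, and call activation functions directly from `torch.nn.functional`. The difference is that a `torch.nn` module calls `torch.nn.functional` underneath, but the module also holds the layer's parameters and handles both the training and testing states of the network. When using `torch.nn.functional`, pay attention to the network state, for example
`def forward(self, x):
    ...
    x = torch.nn.functional.dropout(x, p=0.5, training=self.training)`
Before calling `model(x)`, switch the network state with `model.train()` or `model.eval()`.
Wrap code blocks that do not need gradients in `with torch.no_grad()`. `model.eval()` and `torch.no_grad()` differ: `model.eval()` switches the network to the testing state, in which layers such as BN and dropout compute differently than during training, whereas `torch.no_grad()` turns off autograd for `PyTorch` tensors to reduce memory use and speed up computation, so results computed under it cannot be used for `loss.backward()`.
The input of `torch.nn.CrossEntropyLoss` does not need to pass through `Softmax`: `torch.nn.CrossEntropyLoss` is equivalent to `torch.nn.functional.log_softmax` followed by `torch.nn.NLLLoss`.
Before `loss.backward()`, clear the accumulated gradients with `optimizer.zero_grad()`. `optimizer.zero_grad()` and `model.zero_grad()` have the same effect when the optimizer holds all of the model's parameters.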
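
Two of these notes can be checked numerically: the `CrossEntropyLoss` equivalence, and the combined effect of `model.eval()` with `torch.no_grad()`. The sketch below uses made-up logits and a toy model.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)                 # raw scores, no Softmax applied
target = torch.tensor([0, 2, 1, 2])

ce = torch.nn.CrossEntropyLoss()(logits, target)
nll = torch.nn.NLLLoss()(F.log_softmax(logits, dim=1), target)
print(torch.allclose(ce, nll))

model = torch.nn.Sequential(torch.nn.Linear(3, 3), torch.nn.Dropout(p=0.5))
model.eval()                               # dropout becomes the identity
with torch.no_grad():                      # no graph is built
    out1 = model(logits)
    out2 = model(logits)
print(torch.equal(out1, out2), out1.requires_grad)
```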