# PyTorch Basics (Week 1)

### Source: Chapter 1, Natural Language Processing with PyTorch (2019), Delip Rao and Brian McMahan, O’Reilly. Source code is available at https://github.com/joosthub/PyTorchNLPBook

### PyTorch is an open-source machine learning library based on the Torch library. It is used for applications such as computer vision and natural language processing and was primarily developed by Meta AI (formerly Facebook's AI Research lab).

### PyTorch tutorial: refer to https://pytorch.org/tutorials/

In [3]:
import torch
import numpy as np
torch.manual_seed(1234)

<torch._C.Generator at 0x1945bcc48f0>

## Tensors

* Scalar is a single number.
* Vector is an array of numbers.
* Matrix is a 2-D array of numbers.
* Tensors are N-D arrays of numbers.

#### Creating Tensors

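You can create a tensor by passing its shape as arguments. Here is a tensor with 2 rows and 3 columns. Note that `torch.Tensor(2, 3)` allocates memory without initializing it, so the printed values are arbitrary.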

In [4]:
x = torch.Tensor(2, 3)
print(x)

tensor([[3.4600e+18, 2.1216e-42, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00]])
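The arbitrary values printed above come from uninitialized memory. As a minimal sketch (the variable names are illustrative, not from the book), `torch.empty` makes that behaviour explicit, while `torch.zeros` gives the same shape with well-defined contents:

```python
import torch

# torch.empty allocates memory without initializing it, like torch.Tensor(2, 3)
uninitialized = torch.empty(2, 3)
print(uninitialized.shape)   # the shape is defined, the values are not

# torch.zeros returns the same shape with every element set to 0
zeros = torch.zeros(2, 3)
print(zeros)
```

Only the shape of an uninitialized tensor should be relied on; its values should be overwritten before use.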


In [5]:
def describe(x):
    print("Type: {}".format(x.type()))
    print("Shape/size: {}".format(x.shape))
    print("Values: \n{}\n".format(x))

In [6]:
describe(torch.Tensor(2, 3))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[3.4601e+18, 2.1216e-42, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00]])



In [7]:
describe(torch.randn(2, 3))  # the standard normal distribution (average is 0; standard deviation is 1)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[ 0.0461,  0.4024, -1.0115],
        [ 0.2167, -0.6123,  0.5036]])



When prototyping, it is common to create a tensor of a specific shape filled with random numbers.

In [8]:
x = torch.rand(2, 3)  # a uniform distribution on the interval [0, 1) ; i.e. 0 <= value < 1
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0.7749, 0.8208, 0.2793],
        [0.6817, 0.2837, 0.6567]])
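Random tensors are reproducible when the seed is fixed, which is why the notebook calls `torch.manual_seed` at the top. A small sketch of that behaviour on the CPU (the names `first` and `second` are illustrative):

```python
import torch

torch.manual_seed(1234)
first = torch.rand(2, 3)

torch.manual_seed(1234)    # resetting the seed restarts the random stream
second = torch.rand(2, 3)

print(torch.equal(first, second))  # True: both draws are identical
```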



You can also initialize tensors of ones or zeros.

In [9]:
describe(torch.zeros(2, 3))
describe(torch.ones(2, 3))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0., 0., 0.],
        [0., 0., 0.]])

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1., 1., 1.],
        [1., 1., 1.]])



Tensors can be initialized and then filled in place.

Note: operations that end in an underscore (`_`) are in-place operations.

In [10]:
x = torch.Tensor(3,4).fill_(5)
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([3, 4])
Values: 
tensor([[5., 5., 5., 5.],
        [5., 5., 5., 5.],
        [5., 5., 5., 5.]])
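To make the underscore convention concrete, the following sketch contrasts `add` (which returns a new tensor) with `add_` (which modifies the tensor it is called on). The tensors are illustrative:

```python
import torch

x = torch.ones(2, 2)

y = x.add(1)     # out-of-place: returns a new tensor, x is unchanged
print(x)         # still all ones
print(y)         # all twos

x.add_(1)        # in-place: x itself is modified
print(x)         # now all twos
```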



Tensors can be initialized from a list of lists.

In [11]:
x = torch.Tensor([[1, 2,],  
                  [2, 4,]])
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[1., 2.],
        [2., 4.]])
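The capitalized `torch.Tensor` constructor always produces the default float type, even for integer input. The lowercase `torch.tensor` factory infers the type from the data instead. A brief sketch, assuming the default dtype has not been changed:

```python
import torch

legacy = torch.Tensor([[1, 2], [2, 4]])     # always the default float type
inferred = torch.tensor([[1, 2], [2, 4]])   # dtype inferred from the ints: int64
explicit = torch.tensor([[1, 2], [2, 4]], dtype=torch.float32)

print(legacy.dtype, inferred.dtype, explicit.dtype)
```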



Tensors can be initialized from NumPy arrays. NumPy is a Python library that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on them. Because NumPy defaults to 64-bit floats, the resulting tensor is a `DoubleTensor`.

In [12]:
npy = np.random.rand(2, 3)
print(npy.dtype)
describe(torch.from_numpy(npy))

float64
Type: torch.DoubleTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0.5880, 0.2784, 0.2787],
        [0.0970, 0.2383, 0.2931]], dtype=torch.float64)
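A tensor created with `torch.from_numpy` shares memory with the source array, so a change to one is visible in the other. The sketch below illustrates this, along with converting to 32-bit floats and back to NumPy (variable names are illustrative):

```python
import numpy as np
import torch

npy = np.zeros((2, 3))             # float64 by default
shared = torch.from_numpy(npy)     # shares memory with npy
npy[0, 0] = 7.0                    # the change is visible through the tensor
print(shared[0, 0])

as_float32 = shared.float()        # a float32 copy, no longer shared
print(as_float32.dtype)

back = as_float32.numpy()          # tensor to NumPy (CPU tensors only)
print(back.dtype)
```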



#### Tensor Types

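`FloatTensor` (32-bit floating point) is the default type of the tensors we have created so far. Integer constructors such as `torch.arange` produce a `LongTensor` (64-bit integer) instead, and a tensor can be cast from one type to another.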

In [13]:
torch.arange(6)

tensor([0, 1, 2, 3, 4, 5])

In [14]:
x = torch.arange(6).view(2, 3)
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])



In [15]:
x = torch.FloatTensor([[1, 2, 3],  
                       [4, 5, 6]])
describe(x)

x = x.long()
describe(x)

x = torch.tensor([[1, 2, 3], 
                  [4, 5, 6]], dtype=torch.int64)    # torch.IntTensor when dtype=torch.int32
describe(x)

x = x.float() 
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1., 2., 3.],
        [4., 5., 6.]])

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1, 2, 3],
        [4, 5, 6]])

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1, 2, 3],
        [4, 5, 6]])

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1., 2., 3.],
        [4., 5., 6.]])
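Besides `x.type()`, the `dtype` attribute reports a tensor's element type, and `.to(...)` casts to any dtype. The following sketch also shows that mixing integer and float tensors promotes the result to float. It assumes the default dtype is still `float32`:

```python
import torch

ints = torch.arange(3)                    # int64
floats = torch.tensor([0.5, 0.5, 0.5])    # float32 (the default float dtype)
print(ints.dtype, floats.dtype)

promoted = ints + floats                  # int64 + float32 promotes to float32
print(promoted.dtype)

doubles = ints.to(torch.float64)          # explicit cast to 64-bit float
print(doubles.dtype)
```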



In [16]:
x = torch.randn(2, 3)
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[ 1.5385, -0.9757,  1.5769],
        [ 0.3840, -0.6039, -0.5240]])



In [17]:
describe(torch.add(x, x))        # elementwise addition

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[ 3.0771, -1.9515,  3.1539],
        [ 0.7680, -1.2077, -1.0479]])



In [18]:
describe(x + x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[ 3.0771, -1.9515,  3.1539],
        [ 0.7680, -1.2077, -1.0479]])



In [19]:
describe(x * x)            # elementwise multiplication

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[2.3671, 0.9521, 2.4867],
        [0.1475, 0.3646, 0.2745]])



In [20]:
describe(torch.mul(x, x))   # elementwise multiplication

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[2.3671, 0.9521, 2.4867],
        [0.1475, 0.3646, 0.2745]])



In [21]:
import torch
x = torch.arange(6).view(2, 3)
describe(x)
describe(x[:1, :2])
describe(x[0, 1])

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])

Type: torch.LongTensor
Shape/size: torch.Size([1, 2])
Values: 
tensor([[0, 1]])

Type: torch.LongTensor
Shape/size: torch.Size([])
Values: 
1
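Slicing is not the only way to pick elements. As an illustrative sketch, a whole column can be selected with `:`, a boolean mask keeps the elements that satisfy a condition, and `torch.index_select` gathers non-contiguous rows or columns:

```python
import torch

x = torch.arange(6).view(2, 3)

second_col = x[:, 1]                  # every row, column 1
print(second_col)                     # tensor([1, 4])

masked = x[x > 2]                     # boolean mask, result is 1-D
print(masked)                         # tensor([3, 4, 5])

cols = torch.tensor([0, 2])
picked = torch.index_select(x, dim=1, index=cols)   # columns 0 and 2
print(picked)
```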



## Operations

### Using tensors for linear algebra is a foundation of modern deep learning practice

Reshaping changes the shape of a tensor without changing its elements or their order; the new shape must hold the same total number of elements. In PyTorch, reshaping is done with `view`.

In [22]:
x = torch.arange(0, 20)

print(x)
print(x.view(1, 20))
print(x.view(2, 10))
print(x.view(4, 5))
print(x.view(5, 4))
print(x.view(10, 2))
print(x.view(20, 1))

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19])
tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
         18, 19]])
tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]])
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]])
tensor([[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11],
        [12, 13],
        [14, 15],
        [16, 17],
        [18, 19]])
tensor([[ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 9],
        [10],
        [11],
        [12],
        [13],
        [14],
        [15],
        [16],
        [17],
        [18],
        [19]])

We can use `view` to add size-1 dimensions, which is useful when combining a tensor with tensors of other shapes. When the shapes differ, PyTorch stretches the size-1 dimensions to match. This is called <b>broadcasting</b>.

In [23]:
x = torch.arange(12).view(3, 4)
y = torch.arange(4).view(1, 4)
z = torch.arange(3).view(3, 1)

print("x:\n", x, "\n")
print("y:\n", y, "\n")
print("z:\n", z, "\n")
print("x + y:\n", x + y)
print("x + z:\n", x + z)

x:
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]]) 

y:
 tensor([[0, 1, 2, 3]]) 

z:
 tensor([[0],
        [1],
        [2]]) 

x + y:
 tensor([[ 0,  2,  4,  6],
        [ 4,  6,  8, 10],
        [ 8, 10, 12, 14]])
x + z:
 tensor([[ 0,  1,  2,  3],
        [ 5,  6,  7,  8],
        [10, 11, 12, 13]])
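Broadcasting also applies when both operands have a size-1 dimension: each is stretched along the axis where it has size 1. The sketch below adds a column to a row and shows that shapes which cannot be aligned raise an error (the name `outer_sum` is illustrative):

```python
import torch

y = torch.arange(4).view(1, 4)    # shape [1, 4]
z = torch.arange(3).view(3, 1)    # shape [3, 1]

# both size-1 dimensions are stretched, giving shape [3, 4]
outer_sum = z + y
print(outer_sum)

# trailing sizes 4 and 3 do not match and neither is 1, so this fails
try:
    torch.ones(3, 4) + torch.ones(3)
except RuntimeError as err:
    print("broadcast failed:", err)
```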


`unsqueeze` adds a dimension of size 1 at the given position, and `squeeze` removes dimensions of size 1.

In [24]:
x = torch.arange(12).view(3, 4)
describe(x)

x = x.unsqueeze(dim=1)  # add a size-1 dimension at position 1
describe(x)

x = x.squeeze()         # remove all size-1 dimensions
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([3, 4])
Values: 
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

Type: torch.LongTensor
Shape/size: torch.Size([3, 1, 4])
Values: 
tensor([[[ 0,  1,  2,  3]],

        [[ 4,  5,  6,  7]],

        [[ 8,  9, 10, 11]]])

Type: torch.LongTensor
Shape/size: torch.Size([3, 4])
Values: 
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
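Called without arguments, `squeeze` removes every size-1 dimension. Passing a dimension restricts it to that axis, and it leaves the tensor unchanged if that axis is larger than 1. A short sketch with illustrative shapes:

```python
import torch

x = torch.zeros(1, 3, 1, 4)
print(x.squeeze().shape)        # torch.Size([3, 4]): all size-1 dims removed
print(x.squeeze(0).shape)       # torch.Size([3, 1, 4]): only dim 0 removed
print(x.squeeze(1).shape)       # torch.Size([1, 3, 1, 4]): dim 1 has size 3

row = torch.arange(4)
print(row.unsqueeze(0).shape)   # torch.Size([1, 4])
print(row.unsqueeze(-1).shape)  # torch.Size([4, 1])
```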



All of the standard mathematical operations apply (such as `add` below).

In [25]:
x = torch.arange(12).reshape(3, 4)
print(x)
print(x.add_(x))  # in place operations.

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
tensor([[ 0,  2,  4,  6],
        [ 8, 10, 12, 14],
        [16, 18, 20, 22]])


Many operations reduce a tensor along a dimension, such as `sum`:

In [26]:
x = torch.arange(12).reshape(3, 4)
print("x: \n", x)
print("---")
print("Summing across rows (dim=0): \n", x.sum(dim=0))  # or torch.sum(x, dim=0)
print("---")
print("Summing across columns (dim=1): \n", x.sum(dim=1)) # or torch.sum(x, dim=1)

x: 
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
---
Summing across rows (dim=0): 
 tensor([12, 15, 18, 21])
---
Summing across columns (dim=1): 
 tensor([ 6, 22, 38])
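A reduction normally drops the dimension it sums over. Passing `keepdim=True` keeps it as size 1, which lets the result broadcast against the original tensor. The sketch below uses this to normalize each row so that it sums to 1 (the variable names are illustrative):

```python
import torch

x = torch.arange(12).reshape(3, 4).float()

row_sums = x.sum(dim=1, keepdim=True)   # shape [3, 1] instead of [3]
normalized = x / row_sums               # [3, 4] / [3, 1] broadcasts per row
print(row_sums.shape)
print(normalized.sum(dim=1))            # each row now sums to 1

col_means = x.mean(dim=0)               # mean needs a floating-point tensor
print(col_means)
```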


We can combine tensors by concatenating them, first along the rows (`dim=0`) and then along the columns (`dim=1`). `torch.stack` instead joins them along a new dimension.

In [27]:
x = torch.arange(6).view(2,3)
describe(x)
describe(torch.cat([x, x], dim=0)) # concatenate along the rows
describe(torch.cat([x, x], dim=1)) # concatenate along the columns
describe(torch.stack([x, x]))      # concatenate on a new 0th dimension to "stack" the tensors

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])

Type: torch.LongTensor
Shape/size: torch.Size([4, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5],
        [0, 1, 2],
        [3, 4, 5]])

Type: torch.LongTensor
Shape/size: torch.Size([2, 6])
Values: 
tensor([[0, 1, 2, 0, 1, 2],
        [3, 4, 5, 3, 4, 5]])

Type: torch.LongTensor
Shape/size: torch.Size([2, 2, 3])
Values: 
tensor([[[0, 1, 2],
         [3, 4, 5]],

        [[0, 1, 2],
         [3, 4, 5]]])
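One way to see the relationship between the two functions: `stack` behaves like `cat` applied after each input has been given a new size-1 dimension. A minimal sketch checking that equivalence:

```python
import torch

x = torch.arange(6).view(2, 3)

stacked = torch.stack([x, x], dim=0)                           # shape [2, 2, 3]
via_cat = torch.cat([x.unsqueeze(0), x.unsqueeze(0)], dim=0)   # same result
print(torch.equal(stacked, via_cat))   # True

# the new axis can be placed at any position
last_axis = torch.stack([x, x], dim=2)
print(last_axis.shape)                 # torch.Size([2, 3, 2])
```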



#### Linear Algebra Tensor Functions

Transposing swaps two dimensions of a tensor. For a matrix, all the rows become columns and vice versa.

In [28]:
x = torch.arange(0, 12).view(3,4)
print("x: \n", x) 
print("---")
print("x.transpose(1, 0): \n", x.transpose(1, 0))   # same result as x.transpose(0, 1)

x: 
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
---
x.transpose(1, 0): 
 tensor([[ 0,  4,  8],
        [ 1,  5,  9],
        [ 2,  6, 10],
        [ 3,  7, 11]])
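A transposed tensor is a view onto the same storage with its strides swapped, so it is generally no longer contiguous in memory. In that case `view` can fail, and either `contiguous()` or `reshape` is needed. The following is a sketch of that behaviour, not code from the book:

```python
import torch

x = torch.arange(12).view(3, 4)
xt = x.transpose(0, 1)

print(xt.is_contiguous())               # False: only the strides changed
print(xt.data_ptr() == x.data_ptr())    # True: same underlying storage

try:
    xt.view(12)                         # view needs compatible strides
except RuntimeError as err:
    print("view failed:", err)

flat = xt.contiguous().view(12)         # copy into contiguous memory first
print(torch.equal(flat, xt.reshape(12)))   # reshape copies when it has to
```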


A three-dimensional tensor often represents a batch of sequences, where each sequence item has a feature vector. It is common to swap the batch and sequence dimensions so that the sequence can be indexed more easily in a sequence model.

Note: `transpose` swaps exactly two axes. `permute` (shown further below) reorders any number of axes at once.

In [29]:
batch_size = 3
seq_size = 4
feature_size = 5

x = torch.arange(batch_size * seq_size * feature_size).view(batch_size, seq_size, feature_size)

print("x.shape: \n", x.shape)
print("x: \n", x)
print("-----")

print("x.transpose(1, 0).shape: \n", x.transpose(1, 0).shape)
print("x.transpose(1, 0): \n", x.transpose(1, 0))

x.shape: 
 torch.Size([3, 4, 5])
x: 
 tensor([[[ 0,  1,  2,  3,  4],
         [ 5,  6,  7,  8,  9],
         [10, 11, 12, 13, 14],
         [15, 16, 17, 18, 19]],

        [[20, 21, 22, 23, 24],
         [25, 26, 27, 28, 29],
         [30, 31, 32, 33, 34],
         [35, 36, 37, 38, 39]],

        [[40, 41, 42, 43, 44],
         [45, 46, 47, 48, 49],
         [50, 51, 52, 53, 54],
         [55, 56, 57, 58, 59]]])
-----
x.transpose(1, 0).shape: 
 torch.Size([4, 3, 5])
x.transpose(1, 0): 
 tensor([[[ 0,  1,  2,  3,  4],
         [20, 21, 22, 23, 24],
         [40, 41, 42, 43, 44]],

        [[ 5,  6,  7,  8,  9],
         [25, 26, 27, 28, 29],
         [45, 46, 47, 48, 49]],

        [[10, 11, 12, 13, 14],
         [30, 31, 32, 33, 34],
         [50, 51, 52, 53, 54]],

        [[15, 16, 17, 18, 19],
         [35, 36, 37, 38, 39],
         [55, 56, 57, 58, 59]]])


`permute` is a more general version of `transpose`:

In [30]:
batch_size = 3
seq_size = 4
feature_size = 5

x = torch.arange(batch_size * seq_size * feature_size).view(batch_size, seq_size, feature_size)

print("x.shape: \n", x.shape)
print("x: \n", x)
print("-----")

print("x.permute(1, 0, 2).shape: \n", x.permute(1, 0, 2).shape)
print("x.permute(1, 0, 2): \n", x.permute(1, 0, 2))

x.shape: 
 torch.Size([3, 4, 5])
x: 
 tensor([[[ 0,  1,  2,  3,  4],
         [ 5,  6,  7,  8,  9],
         [10, 11, 12, 13, 14],
         [15, 16, 17, 18, 19]],

        [[20, 21, 22, 23, 24],
         [25, 26, 27, 28, 29],
         [30, 31, 32, 33, 34],
         [35, 36, 37, 38, 39]],

        [[40, 41, 42, 43, 44],
         [45, 46, 47, 48, 49],
         [50, 51, 52, 53, 54],
         [55, 56, 57, 58, 59]]])
-----
x.permute(1, 0, 2).shape: 
 torch.Size([4, 3, 5])
x.permute(1, 0, 2): 
 tensor([[[ 0,  1,  2,  3,  4],
         [20, 21, 22, 23, 24],
         [40, 41, 42, 43, 44]],

        [[ 5,  6,  7,  8,  9],
         [25, 26, 27, 28, 29],
         [45, 46, 47, 48, 49]],

        [[10, 11, 12, 13, 14],
         [30, 31, 32, 33, 34],
         [50, 51, 52, 53, 54]],

        [[15, 16, 17, 18, 19],
         [35, 36, 37, 38, 39],
         [55, 56, 57, 58, 59]]])
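With only two axes exchanged, `permute` matches `transpose`. Its extra generality shows when all three axes move at once. A sketch using the same batch, sequence, and feature sizes as above:

```python
import torch

x = torch.arange(3 * 4 * 5).view(3, 4, 5)    # [batch, seq, feature]

# swapping two axes: permute and transpose agree
print(torch.equal(x.permute(1, 0, 2), x.transpose(0, 1)))   # True

# reordering all three axes in one call
y = x.permute(2, 0, 1)                       # [feature, batch, seq]
print(y.shape)                               # torch.Size([5, 3, 4])

# element (i, j, k) of y is element (j, k, i) of x
print(y[4, 2, 3].item() == x[2, 3, 4].item())   # True
```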


Matrix multiplication is `mm`:

In [31]:
x1 = torch.arange(6).view(2, 3).float()
describe(x1)

x2 = torch.ones(3, 2)
x2[:, 1] += 1
describe(x2)

describe(torch.mm(x1, x2))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0., 1., 2.],
        [3., 4., 5.]])

Type: torch.FloatTensor
Shape/size: torch.Size([3, 2])
Values: 
tensor([[1., 2.],
        [1., 2.],
        [1., 2.]])

Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[ 3.,  6.],
        [12., 24.]])



In [32]:
x = torch.arange(0, 12).view(3,4).float()
print(x)

x2 = torch.ones(4, 2)
x2[:, 1] += 1
print(x2)

print(x.mm(x2))

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])
tensor([[1., 2.],
        [1., 2.],
        [1., 2.],
        [1., 2.]])
tensor([[ 6., 12.],
        [22., 44.],
        [38., 76.]])
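`torch.mm` only accepts 2-D matrices. As a hedged sketch of the more general alternatives, the `@` operator and `torch.matmul` give the same result for matrices and also handle batches of matrices by broadcasting over the leading dimensions:

```python
import torch

x1 = torch.arange(6).view(2, 3).float()
x2 = torch.ones(3, 2)

same = torch.allclose(x1 @ x2, torch.mm(x1, x2))
print(same)                            # True: @ is matrix multiplication

batch = torch.ones(5, 2, 3)            # five 2x3 matrices
out = torch.matmul(batch, x2)          # x2 is broadcast across the batch
print(out.shape)                       # torch.Size([5, 2, 2])
```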


See the [PyTorch Math Operations Documentation](https://pytorch.org/docs/stable/torch.html#math-operations) for more!

## Computing Gradients

PyTorch supports automatic differentiation of arbitrary scalar-valued functions.

In [33]:
x = torch.tensor([[2.0, 3.0]], requires_grad=True)
z = 3 * x
print(z)

tensor([[6., 9.]], grad_fn=<MulBackward0>)


This small snippet shows the gradient computation at work. We create a tensor and multiply it by 3. Then we reduce the result to a scalar with `sum()`, because `backward()` needs a scalar output to act as the loss. Calling `backward()` on the loss computes the rate of change of the loss with respect to each input. Since the scalar was created with `sum()`, each position in z and x contributes to the loss independently.

The rate of change of the output with respect to x is just the constant 3 that we multiplied x by.

In [34]:
x = torch.tensor([[2.0, 3.0]], requires_grad=True)
print("x: \n", x)
print("---")
z = 3 * x
print("z = 3*x: \n", z)
print("---")

loss = z.sum()
print("loss = z.sum(): \n", loss)
print("---")

loss.backward()

print("after loss.backward(), x.grad: \n", x.grad)


x: 
 tensor([[2., 3.]], requires_grad=True)
---
z = 3*x: 
 tensor([[6., 9.]], grad_fn=<MulBackward0>)
---
loss = z.sum(): 
 tensor(15., grad_fn=<SumBackward0>)
---
after loss.backward(), x.grad: 
 tensor([[3., 3.]])
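The gradient above is constant because the function is linear. For a nonlinear function the gradient depends on the input: with loss = sum(x²), the derivative is 2x. The sketch below also illustrates that gradients accumulate across `backward()` calls until they are cleared (the names `first_grad` and `second_grad` are illustrative):

```python
import torch

x = torch.tensor([[2.0, 3.0]], requires_grad=True)

loss = (x ** 2).sum()         # d(loss)/dx = 2 * x
loss.backward()
first_grad = x.grad.clone()
print(first_grad)             # tensor([[4., 6.]])

loss = (x ** 2).sum()         # rebuild the graph and backpropagate again
loss.backward()
second_grad = x.grad.clone()
print(second_grad)            # tensor([[ 8., 12.]]): added to the old gradient

x.grad.zero_()                # clear the accumulated gradient in place
print(x.grad)
```

This accumulation is why training loops typically zero the gradients before each backward pass.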


### CUDA Tensors

PyTorch operations run on either the CPU or the GPU. A few basic operations control which device a tensor lives on.

In [41]:
print(torch.cuda.is_available())

True


In [42]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


In [43]:
x = torch.rand(3, 3).to(device)
describe(x)
print(x.device)

Type: torch.cuda.FloatTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[0.8839, 0.8083, 0.7528],
        [0.8988, 0.6839, 0.7658],
        [0.9149, 0.3993, 0.1100]], device='cuda:0')

cuda:0


In [47]:
# y is created on the CPU, so x + y raises a RuntimeError while x is on the GPU
y = torch.rand(3, 3)
# x + y

In [44]:
cpu_device = torch.device("cpu")

In [40]:
y = y.to(cpu_device)
x = x.to(cpu_device)
x + y

tensor([[0.7432, 0.7451, 0.7907],
        [0.8631, 1.1666, 0.9664],
        [1.5513, 0.8455, 1.0217]])

In [None]:
if torch.cuda.is_available(): # only if GPU is available
    a = torch.rand(3,3).to(device='cuda:0') #  CUDA Tensor
    print(a)
    
    b = torch.rand(3,3).cuda()
    print(b)

    print(a + b)

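    # Uncommenting the next two lines moves a to the CPU while b stays on the GPU,
    # so a + b raises the RuntimeError shown below.
    # a = a.cpu()
    # print(a + b)
else:
    print('no gpu')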

tensor([[0.1230, 0.9638, 0.7695],
        [0.0378, 0.2239, 0.6772],
        [0.5274, 0.6325, 0.0910]], device='cuda:0')
tensor([[0.2323, 0.7269, 0.1187],
        [0.3951, 0.7199, 0.7595],
        [0.5311, 0.6449, 0.7224]], device='cuda:0')
tensor([[0.3552, 1.6906, 0.8883],
        [0.4330, 0.9438, 1.4367],
        [1.0585, 1.2775, 0.8134]], device='cuda:0')


RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
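A common way to avoid this error is to pick the device once and create or move every tensor relative to it. The following device-agnostic sketch runs unchanged with or without a GPU (the variable names are illustrative):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(3, 3, device=device)    # created directly on the chosen device
y = torch.rand(3, 3).to(x.device)      # moved to wherever x lives
total = x + y                          # both operands share a device

# NumPy only works with CPU memory, so move the result back first
as_numpy = total.cpu().numpy()
print(total.device, as_numpy.shape)
```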

### END