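# Mathematical Foundations

Linear algebra, calculus, automatic differentiation, and probability.

## Linear Algebra

### Tensors

Scalars, vectors, and matrices: omitted.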

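Tensor: a generalization of vectors and matrices to any number of axes. A scalar is a 0-dimensional tensor, a vector is a 1-dimensional tensor, and a matrix is a 2-dimensional tensor; in general, a tensor can have n axes for any n ≥ 0.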
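The relationship between the number of axes and the familiar objects can be checked directly. The following is a minimal sketch; the variable names (`scalar`, `vector`, `matrix`, `cube`) are chosen here for illustration only.

```python
import torch

# One tensor for each number of axes from 0 to 3.
scalar = torch.tensor(3.0)                # 0 axes
vector = torch.arange(4)                  # 1 axis
matrix = torch.arange(6).reshape(2, 3)    # 2 axes
cube = torch.arange(24).reshape(2, 3, 4)  # 3 axes

# ndim reports the number of axes; shape reports the size of each axis.
dims = [t.ndim for t in (scalar, vector, matrix, cube)]
print(dims)        # [0, 1, 2, 3]
print(cube.shape)  # torch.Size([2, 3, 4])
```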

In [2]:
import torch
A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = A.clone()
print("A: ", A, end="\n\n")

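# Operations between a scalar and a tensor
# tensor + scalar: the scalar is added to every element
print("A + 1: ", A + 1, end="\n\n")

# tensor * scalar: every element is multiplied by the scalar
print("A * 2: ", A * 2, end="\n\n")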

A:  tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.],
        [16., 17., 18., 19.]])

A + 1:  tensor([[ 1.,  2.,  3.,  4.],
        [ 5.,  6.,  7.,  8.],
        [ 9., 10., 11., 12.],
        [13., 14., 15., 16.],
        [17., 18., 19., 20.]])

A * 2:  tensor([[ 0.,  2.,  4.,  6.],
        [ 8., 10., 12., 14.],
        [16., 18., 20., 22.],
        [24., 26., 28., 30.],
        [32., 34., 36., 38.]])



In [3]:
# Operations between two tensors
# Hadamard product: elements at the same position are multiplied
print("A * B: ", A * B, end="\n\n")

# Matrix multiplication
print("A @ B.T: ", A @ B.T, end="\n\n")

A * B:  tensor([[  0.,   1.,   4.,   9.],
        [ 16.,  25.,  36.,  49.],
        [ 64.,  81., 100., 121.],
        [144., 169., 196., 225.],
        [256., 289., 324., 361.]])

A @ B.T:  tensor([[  14.,   38.,   62.,   86.,  110.],
        [  38.,  126.,  214.,  302.,  390.],
        [  62.,  214.,  366.,  518.,  670.],
        [  86.,  302.,  518.,  734.,  950.],
        [ 110.,  390.,  670.,  950., 1230.]])
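The two products place different requirements on shapes, which is why the cell above multiplies by `B.T` rather than `B`. A short sketch of the shape rules follows; the names `hadamard`, `gram_rows`, `gram_cols`, and `matmul_failed` are introduced here for illustration.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = A.clone()

# Elementwise product: both operands have shape (5, 4), and so does the result.
hadamard = A * B

# Matrix product: the inner sizes must match.
gram_rows = A @ B.T   # (5, 4) @ (4, 5) -> (5, 5)
gram_cols = A.T @ B   # (4, 5) @ (5, 4) -> (4, 4)

# (5, 4) @ (5, 4) has mismatched inner sizes, so PyTorch raises an error.
try:
    A @ B
    matmul_failed = False
except RuntimeError:
    matmul_failed = True

print(hadamard.shape, gram_rows.shape, gram_cols.shape, matmul_failed)
```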



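### Reducing and Non-Reducing Sums

Calling `sum()` or `mean()` on a tensor without arguments returns a scalar (a 0-dimensional tensor). This is a reduction: it removes axes from the tensor.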

In [4]:
print("Shape of A: ", A.shape, end="\n\n")
print("Sum of A: ", A.sum(), end="\n\n")
print("Mean of A: ", A.mean(), end="\n\n")
print("Mean of A: ", A.sum() / A.numel(), end="\n\n")

Shape of A:  torch.Size([5, 4])

Sum of A:  tensor(190.)

Mean of A:  tensor(9.5000)

Mean of A:  tensor(9.5000)



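By default, the sum and mean above reduce along every axis, so all elements collapse into a single scalar.

It is also possible to sum along one specific axis, which removes only that axis.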

In [5]:
print("Shape of A: ", A.shape, end="\n\n")

# Sum along axis 0: axis 0 disappears from the shape
print("Sum by axis 0: ", A.sum(axis=0))
print("Shape of sum by axis 0: ", A.sum(axis=0).shape, end="\n\n")

# Sum along axis 1: axis 1 disappears from the shape
print("Sum by axis 1: ", A.sum(axis=1))
print("Shape of sum by axis 1: ", A.sum(axis=1).shape, end="\n\n")

# Sum along axes 0 and 1: both axes disappear from the shape
# Equivalent to A.sum() for a 2-dimensional tensor
print("Sum by axis 0 and 1: ", A.sum(axis=[0, 1]))

Shape of A:  torch.Size([5, 4])

Sum by axis 0:  tensor([40., 45., 50., 55.])
Shape of sum by axis 0:  torch.Size([4])

Sum by axis 1:  tensor([ 6., 22., 38., 54., 70.])
Shape of sum by axis 1:  torch.Size([5])

Sum by axis 0 and 1:  tensor(190.)


Likewise, the mean can be computed along a specific axis.

In [6]:
print("Mean by axis 0: ", A.mean(axis=0))
print("Mean by axis 0: ", A.sum(axis=0) / A.shape[0])

Mean by axis 0:  tensor([ 8.,  9., 10., 11.])
Mean by axis 0:  tensor([ 8.,  9., 10., 11.])


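A sum can also be computed without reducing the number of axes, by passing `keepdims=True`.

Only the size of the specified axis becomes 1; the sizes of the other axes and the number of axes stay the same.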

In [7]:
print("Shape of A: ", A.shape, end="\n\n")

print("Sum by axis 0 and keepdims: \n", A.sum(axis=0, keepdims=True))
print("Shape of sum by axis 0 and keepdims: \n", A.sum(axis=0, keepdims=True).shape, end="\n\n")

Shape of A:  torch.Size([5, 4])

Sum by axis 0 and keepdims: 
 tensor([[40., 45., 50., 55.]])
Shape of sum by axis 0 and keepdims: 
 torch.Size([1, 4])
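The same rules apply to tensors with more than two axes. The sketch below uses a 3-dimensional tensor `X` (a name introduced here, not part of the notebook above) to compare the reducing and non-reducing forms.

```python
import torch

X = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)

# Reducing sum: axis 1 is removed from the shape.
reduced = X.sum(axis=1)
print(reduced.shape)   # torch.Size([2, 4])

# Non-reducing sum: axis 1 is kept with size 1.
kept = X.sum(axis=1, keepdims=True)
print(kept.shape)      # torch.Size([2, 1, 4])

# Summing over several axes at once removes all of them.
multi = X.sum(axis=[0, 2])
print(multi.shape)     # torch.Size([3])

# The values are identical; only the shape differs.
same_values = torch.equal(kept.squeeze(1), reduced)
```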



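The `cumsum()` function computes a running sum along an axis and leaves the shape unchanged. Along axis 0, row `i` of the result `C` is the sum of rows `0` through `i` of the input:

`C[i] = A[0] + A[1] + ... + A[i]`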

In [8]:
print("cumsum by axis 0: \n", A.cumsum(axis=0))

cumsum by axis 0: 
 tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  6.,  8., 10.],
        [12., 15., 18., 21.],
        [24., 28., 32., 36.],
        [40., 45., 50., 55.]])
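To make the running-sum behaviour concrete, the sketch below rebuilds the cumulative sum with an explicit loop and compares it with `cumsum()`. The names `C`, `manual`, and `running` are illustrative only.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
C = A.cumsum(axis=0)

# Rebuild the cumulative sum row by row.
manual = torch.zeros_like(A)
running = torch.zeros(A.shape[1])
for i in range(A.shape[0]):
    running = running + A[i]   # add the current row to the running total
    manual[i] = running

print(torch.equal(C, manual))             # True
# The last row of the cumulative sum is the full sum along axis 0.
print(torch.equal(C[-1], A.sum(axis=0)))  # True
```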


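Non-reducing sums are useful in practice. For example, keeping the summed axis lets broadcasting divide each element by its row total: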

In [9]:
print("A: ", A, end="\n\n")
print("Sum by axis 1 and keepdims: \n", A.sum(axis=1, keepdims=True), end="\n\n")

# Use broadcasting to compute each element's share of its row total
print("A / A.sum(axis=1, keepdims=True): \n", A / A.sum(axis=1, keepdims=True), end="\n\n")


A:  tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.],
        [16., 17., 18., 19.]])

Sum by axis 1 and keepdims: 
 tensor([[ 6.],
        [22.],
        [38.],
        [54.],
        [70.]])

A / A.sum(axis=1, keepdims=True): 
 tensor([[0.0000, 0.1667, 0.3333, 0.5000],
        [0.1818, 0.2273, 0.2727, 0.3182],
        [0.2105, 0.2368, 0.2632, 0.2895],
        [0.2222, 0.2407, 0.2593, 0.2778],
        [0.2286, 0.2429, 0.2571, 0.2714]])
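The `keepdims=True` argument matters here because of how broadcasting aligns shapes from the trailing axis. A sketch of what happens with and without it follows; the variable names are assumptions made for this example.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)

row_sums_kept = A.sum(axis=1, keepdims=True)  # shape (5, 1)
row_sums_flat = A.sum(axis=1)                 # shape (5,)

# (5, 4) / (5, 1): the size-1 axis is stretched across the 4 columns.
proportions = A / row_sums_kept

# (5, 4) / (5,): the trailing sizes 4 and 5 do not match, so this fails.
try:
    A / row_sums_flat
    flat_division_failed = False
except RuntimeError:
    flat_division_failed = True

# Each row of proportions sums to 1.
print(proportions.sum(axis=1))
print(flat_division_failed)
```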



### Products of Tensors

In [10]:
# Dot product: the product of two vectors (1-dimensional tensors)
x = torch.arange(4, dtype=torch.float32)
y = torch.ones(4, dtype=torch.float32)
print("Dot product of x and y: \n", torch.dot(x, y), end="\n\n")

Dot product of x and y: 
 tensor(6.)
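The dot product is the sum of the elementwise products, so it can also be written with the Hadamard product followed by a sum. A small sketch, with `dot_builtin` and `dot_manual` as illustrative names:

```python
import torch

x = torch.arange(4, dtype=torch.float32)
y = torch.ones(4, dtype=torch.float32)

dot_builtin = torch.dot(x, y)
# Multiply element by element, then add the results.
dot_manual = torch.sum(x * y)

print(dot_builtin, dot_manual)  # tensor(6.) tensor(6.)
```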



In [11]:
# Matrix-vector product
print("Shape of A: ", A.shape)
print("Shape of x: ", x.shape, end="\n\n")

print("A @ x: \n", torch.mv(A, x), end="\n\n")
print("A @ x: \n", A @ x, end="\n\n")

Shape of A:  torch.Size([5, 4])
Shape of x:  torch.Size([4])

A @ x: 
 tensor([ 14.,  38.,  62.,  86., 110.])

A @ x: 
 tensor([ 14.,  38.,  62.,  86., 110.])
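One way to read the matrix-vector product is as one dot product per row of the matrix. The sketch below checks this reading against `torch.mv`; the name `rowwise` is introduced here for illustration.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
x = torch.arange(4, dtype=torch.float32)

# Entry i of A @ x is the dot product of row i of A with x.
rowwise = torch.stack([torch.dot(row, x) for row in A])

print(rowwise)                             # tensor([ 14.,  38.,  62.,  86., 110.])
print(torch.equal(rowwise, torch.mv(A, x)))  # True
```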



In [12]:
# Matrix-matrix product
print("A @ B^T: \n", torch.mm(A, B.T), end="\n\n")
print("A @ B^T: \n", A @ B.T, end="\n\n")

A @ B^T: 
 tensor([[  14.,   38.,   62.,   86.,  110.],
        [  38.,  126.,  214.,  302.,  390.],
        [  62.,  214.,  366.,  518.,  670.],
        [  86.,  302.,  518.,  734.,  950.],
        [ 110.,  390.,  670.,  950., 1230.]])

A @ B^T: 
 tensor([[  14.,   38.,   62.,   86.,  110.],
        [  38.,  126.,  214.,  302.,  390.],
        [  62.,  214.,  366.,  518.,  670.],
        [  86.,  302.,  518.,  734.,  950.],
        [ 110.,  390.,  670.,  950., 1230.]])



### Norms

In [13]:
print("x: ", x, end="\n\n")

# L2 norm
print("L2 norm of x: \n", torch.norm(x), end="\n\n")

# L1 norm
print("L1 norm of x: \n", torch.abs(x).sum())

x:  tensor([0., 1., 2., 3.])

L2 norm of x: 
 tensor(3.7417)

L1 norm of x: 
 tensor(6.)


In general, the $L_p$ norm is defined as:
$$||x||_p=\left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$$
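The formula translates almost directly into code. The helper `lp_norm` below is a hypothetical function written for this example, not a PyTorch API; it is compared against `torch.linalg.vector_norm`, assuming a PyTorch version recent enough to provide that function.

```python
import torch

def lp_norm(x, p):
    """Compute (sum_i |x_i|^p)^(1/p) for a 1-dimensional tensor (illustrative helper)."""
    return x.abs().pow(p).sum().pow(1.0 / p)

x = torch.arange(4, dtype=torch.float32)

l1 = lp_norm(x, 1)   # 0 + 1 + 2 + 3
l2 = lp_norm(x, 2)   # sqrt(0 + 1 + 4 + 9)
l3 = lp_norm(x, 3)   # cube root of (0 + 1 + 8 + 27)

# Compare with the library implementation for each p.
for p, value in [(1, l1), (2, l2), (3, l3)]:
    print(p, value, torch.linalg.vector_norm(x, ord=p))
```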

The Frobenius norm of a matrix $X$ is analogous to the $L_2$ norm of a vector:
$$||X||_F=\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}x^2_{ij}}$$

In [15]:
# Frobenius norm of a matrix
print("A: ", A, end='\n\n')
print("Frobenius norm of A: \n", torch.norm(A))

A:  tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.],
        [16., 17., 18., 19.]])

Frobenius norm of A: 
 tensor(49.6991)
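Because the Frobenius norm sums the squares of all entries, it should equal the $L_2$ norm of the matrix flattened into a vector. The sketch below checks this in three ways; it uses `torch.linalg.norm` and `torch.linalg.vector_norm` instead of the `torch.norm` call above, and the variable names are illustrative.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)

# Direct translation of the formula: square, sum everything, take the root.
fro_manual = A.pow(2).sum().sqrt()

# L2 norm of the flattened matrix.
fro_flat = torch.linalg.vector_norm(A.flatten(), ord=2)

# Library call with the Frobenius norm requested explicitly.
fro_builtin = torch.linalg.norm(A, ord="fro")

print(fro_manual, fro_flat, fro_builtin)
```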
