## How to compute vector derivatives, matrix derivatives, and the Jacobian matrix in PyTorch

In [1]:
import torch
from torch.autograd.functional import jacobian

### Matrix derivatives

In [2]:
def func(x):
    return x.exp().sum(dim=1)

In [3]:
x = torch.randn(2, 3)
x

tensor([[ 0.4457,  0.5671,  1.2739],
        [-0.8329,  1.2512,  0.1236]])

In [4]:
y = func(x)
y

tensor([6.8994, 5.0609])

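The result is a tensor of shape $2 \times 2 \times 3$.

The shape is `(number of elements in y, *x.shape)`.

`res[i][j][k]` is the partial derivative of $y_i$ with respect to $x_{jk}$. Because $y_i$ depends only on row $i$ of `x`, every entry with $j \neq i$ is zero.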

In [5]:
jacobian(func, x)

tensor([[[1.5616, 1.7631, 3.5747],
         [0.0000, 0.0000, 0.0000]],

        [[0.0000, 0.0000, 0.0000],
         [0.4348, 3.4946, 1.1315]]])
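The block structure of this output can be checked against the closed form: since $y_i = \sum_k e^{x_{ik}}$, the derivative of $y_i$ with respect to $x_{jk}$ is $e^{x_{ik}}$ when $j = i$ and zero otherwise. The following is a minimal sketch of that check. The fixed seed and the names `expected`, `shape_ok`, and `values_ok` are assumptions added for illustration and are not part of the original notebook.

```python
import torch
from torch.autograd.functional import jacobian

def func(x):
    # Row-wise sum of exponentials: y_i = sum_k exp(x_ik)
    return x.exp().sum(dim=1)

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility
x = torch.randn(2, 3)
res = jacobian(func, x)

# Closed form: expected[i, j, k] = exp(x[i, k]) if i == j, else 0
expected = torch.zeros(2, 2, 3)
for i in range(2):
    expected[i, i] = x[i].exp()

shape_ok = res.shape == (2, 2, 3)
values_ok = torch.allclose(res, expected, atol=1e-6)
print(shape_ok, values_ok)
```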

### Vector derivatives

In [6]:
a = torch.randn(3)
a

tensor([-0.4791,  1.2708, -0.4565])

In [7]:
def func(x):
    return a + x

In [8]:
x = torch.randn(3)
x

tensor([-1.0204, -0.8899, -0.7270])

In [9]:
func(x)

tensor([-1.4994,  0.3809, -1.1835])

The result is a $3 \times 3$ matrix.

`res[i][j]` is the partial derivative of $y_i$ with respect to $x_j$.

In [10]:
jacobian(func, x)

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])
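The identity matrix appears because $y_i = a_i + x_i$ depends on $x_i$ alone, with slope 1, whatever the value of `a`. As a rough sketch, the check below confirms this and contrasts it with an elementwise product, whose Jacobian should be $\mathrm{diag}(a)$. The function `scale_by_a` is a hypothetical variant introduced here for comparison; it does not appear in the original notebook.

```python
import torch
from torch.autograd.functional import jacobian

a = torch.randn(3)
x = torch.randn(3)

def add_a(x):
    # Same function as in the text: y = a + x
    return a + x

def scale_by_a(x):
    # Hypothetical variant, not in the original: y = a * x (elementwise)
    return a * x

jac_add = jacobian(add_a, x)
jac_scale = jacobian(scale_by_a, x)

# dy_i/dx_j is 1 on the diagonal for a + x, and a_i on the diagonal for a * x
is_identity = torch.allclose(jac_add, torch.eye(3))
is_diag_a = torch.allclose(jac_scale, torch.diag(a))
print(is_identity, is_diag_a)
```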

### Automatic differentiation with PyTorch's backward

In [11]:
x = torch.randn(3, requires_grad=True)
x

tensor([-0.6152,  0.9683,  0.5469], requires_grad=True)

In [12]:
y = func(x)
y

tensor([-1.0943,  2.2391,  0.0903], grad_fn=<AddBackward0>)

In [13]:
y.backward(torch.ones_like(y))

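`y` is not a scalar, so calling `y.backward()` with no argument raises an error.

Conceptually, there is a scalar $l$ computed from `y` further along, and we assume that every entry of $\frac{\partial l}{\partial y}$ is 1.

The `ones_like` tensor passed to `backward` plays the role of $\frac{\partial l}{\partial y}$.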

In [14]:
x.grad

tensor([1., 1., 1.])
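One way to see what the argument of `backward` means is to build the scalar $l$ explicitly. The sketch below assumes $l = \sum_i v_i y_i$, so that $\frac{\partial l}{\partial y} = v$, and compares the two routes. The vector `v` and the names `x1`, `x2`, and `same` are illustrative assumptions, not taken from the original.

```python
import torch

a = torch.randn(3)
v = torch.tensor([1.0, 2.0, 3.0])  # assumption: an arbitrary dl/dy, not all ones

# Route 1: pass v as the gradient argument of backward
x1 = torch.randn(3, requires_grad=True)
y1 = a + x1
y1.backward(v)

# Route 2: build the scalar l = sum(v * y) and call backward() with no argument
x2 = x1.detach().clone().requires_grad_(True)
l = (v * (a + x2)).sum()
l.backward()

# Both routes should give dl/dx = v, since dy/dx is the identity here
same = torch.allclose(x1.grad, x2.grad)
print(x1.grad, x2.grad)
```

Note that `backward` accumulates into `.grad`, so calling it a second time on the same leaf tensor adds to the stored gradient rather than replacing it.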

### Verifying backward by multiplying a vector with the Jacobian matrix

In [15]:
torch.ones_like(y) @ jacobian(func, x)

tensor([1., 1., 1.])
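The product above builds the full Jacobian and then multiplies it by the vector. `torch.autograd.functional.vjp` computes the same vector-Jacobian product directly. The sketch below compares the explicit product, `vjp`, and `backward` on a nonlinear function, so that the Jacobian is not simply the identity. The function `nonlinear` and the fixed seed are assumptions chosen for illustration.

```python
import torch
from torch.autograd.functional import jacobian, vjp

def nonlinear(x):
    # Hypothetical function, not in the original: y_i = exp(x_i) * sum(x)
    return x.exp() * x.sum()

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility
x = torch.randn(3)
v = torch.ones(3)

# Explicit route: form the 3x3 Jacobian, then left-multiply by v
explicit = v @ jacobian(nonlinear, x)

# Direct route: vjp returns the pair (nonlinear(x), v^T J)
out, direct = vjp(nonlinear, x, v)

# backward route on a leaf copy of x
x_leaf = x.clone().requires_grad_(True)
nonlinear(x_leaf).backward(v)

match_vjp = torch.allclose(explicit, direct, atol=1e-5)
match_backward = torch.allclose(explicit, x_leaf.grad, atol=1e-5)
print(match_vjp, match_backward)
```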

### Gradients of a matrix product

In [16]:
a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(3, 2, requires_grad=True)
a, b

(tensor([[ 0.2632,  0.2993, -1.4357],
         [ 0.1571, -1.3713, -1.0222]], requires_grad=True),
 tensor([[-0.5882,  0.1172],
         [ 1.6023, -1.0599],
         [ 0.7127, -1.0141]], requires_grad=True))

In [17]:
y = a @ b
y

tensor([[-0.6984,  1.1696],
        [-3.0183,  2.5085]], grad_fn=<MmBackward0>)

We want $\frac{\partial l}{\partial a}$, again assuming that every entry of $\frac{\partial l}{\partial y}$ is 1.

In [18]:
y.backward(torch.ones_like(y))

In [19]:
a.grad

tensor([[-0.4709,  0.5425, -0.3014],
        [-0.4709,  0.5425, -0.3014]])

In [20]:
b.grad

tensor([[ 0.4203,  0.4203],
        [-1.0720, -1.0720],
        [-2.4579, -2.4579]])
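These gradients agree with the standard closed form for $y = ab$: with $G = \frac{\partial l}{\partial y}$, we have $\frac{\partial l}{\partial a} = G b^\top$ and $\frac{\partial l}{\partial b} = a^\top G$. When $G$ is all ones, every row of $G b^\top$ holds the row sums of `b` and every column of $a^\top G$ holds the column sums of `a`, which is why the rows of `a.grad` and the columns of `b.grad` repeat. The following is a minimal sketch of that check, using fresh random tensors; the names `g`, `grad_a_expected`, and `grad_b_expected` are illustrative.

```python
import torch

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(3, 2, requires_grad=True)

y = a @ b
g = torch.ones_like(y)  # dl/dy, assumed to be all ones as in the text
y.backward(g)

# Closed-form gradients of a matrix product
grad_a_expected = g @ b.detach().T  # shape (2, 3)
grad_b_expected = a.detach().T @ g  # shape (3, 2)

a_ok = torch.allclose(a.grad, grad_a_expected, atol=1e-6)
b_ok = torch.allclose(b.grad, grad_b_expected, atol=1e-6)
print(a_ok, b_ok)
```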

### Gradient of a single row of the matrix

In [21]:
def func(a):
    return a @ b

func(a)

tensor([[-0.6984,  1.1696],
        [-3.0183,  2.5085]], grad_fn=<MmBackward0>)

In [22]:
func(a[0])

tensor([-0.6984,  1.1696], grad_fn=<SqueezeBackward3>)

In [23]:
torch.ones_like(func(a[0])) @ jacobian(func, a[0])

tensor([-0.4709,  0.5425, -0.3014])

In [24]:
a.grad

tensor([[-0.4709,  0.5425, -0.3014],
        [-0.4709,  0.5425, -0.3014]])

### Gradient of a single column of the matrix

In [25]:
def func(b):
    return a @ b


In [26]:
torch.ones_like(func(b[:, 0])) @ jacobian(func, b[:, 0])

tensor([ 0.4203, -1.0720, -2.4579])

In [27]:
b.grad

tensor([[ 0.4203,  0.4203],
        [-1.0720, -1.0720],
        [-2.4579, -2.4579]])
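The row and column checks above each cover only index 0. As a sketch under the same all-ones assumption for $\frac{\partial l}{\partial y}$, the loop below repeats the comparison for every row of `a` and every column of `b`. It works on detached copies (`a0`, `b0`, names chosen here) so that the Jacobian calls do not interact with the graph used by `backward`.

```python
import torch
from torch.autograd.functional import jacobian

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(3, 2, requires_grad=True)
(a @ b).backward(torch.ones(2, 2))

a0, b0 = a.detach(), b.detach()
ones = torch.ones(2)

# Row i of a only affects row i of y: y[i] = a[i] @ b
rows_ok = all(
    torch.allclose(ones @ jacobian(lambda r: r @ b0, a0[i]), a.grad[i], atol=1e-6)
    for i in range(a0.shape[0])
)

# Column j of b only affects column j of y: y[:, j] = a @ b[:, j]
cols_ok = all(
    torch.allclose(ones @ jacobian(lambda c: a0 @ c, b0[:, j]), b.grad[:, j], atol=1e-6)
    for j in range(b0.shape[1])
)
print(rows_ok, cols_ok)
```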