PyTorch's linear layer `torch.nn.Linear(n, m)` holds a weight matrix $W_{m \times n}$ and a bias $b$ of length $m$, and maps a vector $x$ of length $n$ to a vector $y$ of length $m$:
$$
y = W x + b
$$

In [1]:
import torch
linear = torch.nn.Linear(3, 5)
x = torch.randn(3)
y = linear(x)

In [2]:
print(linear.weight.shape, linear.bias.shape)

torch.Size([5, 3]) torch.Size([5])


In [146]:
assert torch.allclose(y, linear.weight @ x + linear.bias)
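The same affine map can also be written with the functional form `torch.nn.functional.linear`, and for a batch of row vectors the layer computes $X W^\top + b$. Below is a minimal sketch of both checks; the fixed seed, the batch size of 4, and the names `layer`, `x0`, `X`, and `Y` are illustrative assumptions, chosen so the block runs on its own without touching the notebook's `linear` and `x`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility
layer = torch.nn.Linear(3, 5)  # hypothetical stand-in for `linear` above

# Single vector: module call, functional form, and W x + b all agree.
x0 = torch.randn(3)
y0 = layer(x0)
assert torch.allclose(y0, F.linear(x0, layer.weight, layer.bias), atol=1e-6)
assert torch.allclose(y0, layer.weight @ x0 + layer.bias, atol=1e-6)

# Batch of 4 row vectors: every row is transformed, i.e. Y = X W^T + b.
X = torch.randn(4, 3)
Y = layer(X)
assert Y.shape == (4, 5)
assert torch.allclose(Y, X @ layer.weight.T + layer.bias, atol=1e-6)
```

The transpose appears only because batched inputs are stored as rows; per sample, the map is still $y = W x + b$.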

Let's compute its Jacobian matrix with respect to the input $x$:

In [147]:
J = torch.func.jacrev(linear)(x)
print(J)

tensor([[ 0.4155, -0.1529,  0.2234],
        [ 0.1951, -0.4253,  0.5428],
        [-0.1048, -0.0363, -0.0245],
        [ 0.0065, -0.0197, -0.5145],
        [ 0.0813, -0.0605,  0.5442]], grad_fn=<ViewBackward0>)


It should be exactly equal to $W$, since $\partial y_i / \partial x_j = W_{ij}$ (see [math here](../math/operators/linear.ipynb)):

In [148]:
print(linear.weight)

Parameter containing:
tensor([[ 0.4155, -0.1529,  0.2234],
        [ 0.1951, -0.4253,  0.5428],
        [-0.1048, -0.0363, -0.0245],
        [ 0.0065, -0.0197, -0.5145],
        [ 0.0813, -0.0605,  0.5442]], requires_grad=True)
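Rather than comparing the two printouts by eye, the equality can be asserted directly. The sketch below also cross-checks against `torch.autograd.functional.jacobian`, used here as an alternative to `torch.func.jacrev`. The names `layer`, `x0`, `x1`, `J_rev`, and `J_auto` are illustrative, and the layer is re-created with a fixed seed so the block is self-contained.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility
layer = torch.nn.Linear(3, 5)  # hypothetical stand-in for `linear` above
x0 = torch.randn(3)

# Two ways of computing the Jacobian dy/dx, each of shape (5, 3).
J_rev = torch.func.jacrev(layer)(x0)
J_auto = torch.autograd.functional.jacobian(layer, x0)

assert J_rev.shape == (5, 3)
assert torch.allclose(J_rev, layer.weight)
assert torch.allclose(J_auto, layer.weight)

# The map is affine, so the Jacobian does not depend on the input point.
x1 = torch.randn(3)
assert torch.allclose(torch.func.jacrev(layer)(x1), J_rev)
```

The bias does not appear in the Jacobian because it is constant with respect to $x$.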


We can also verify it through backpropagation: calling `y.backward(v)` stores the vector-Jacobian product $v^\top J$ in `x.grad`:

In [149]:
x.requires_grad = True  # track gradients with respect to the input
y = linear(x)
y.retain_grad()  # keep y.grad for this non-leaf tensor; call after the graph is built
y.backward(torch.ones(5), retain_graph=True)  # upstream gradient v = ones(5)

In [150]:
# x.grad holds v^T J with v = ones(5), i.e. the column sums of J
assert torch.allclose(x.grad, torch.ones(5) @ J)
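The same identity holds for any upstream gradient $v$, not only a vector of ones: backprop through the layer returns $v^\top W$, or equivalently $W^\top v$. Here is a minimal sketch using `torch.autograd.grad`, which returns the gradient instead of accumulating it into `.grad`; the names `layer`, `x0`, `y0`, `v`, and `grad_x` are illustrative assumptions.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility
layer = torch.nn.Linear(3, 5)  # hypothetical stand-in for `linear` above
x0 = torch.randn(3, requires_grad=True)
v = torch.randn(5)  # arbitrary upstream gradient dL/dy

y0 = layer(x0)
# Vector-Jacobian product: returns a tuple with one gradient per input.
(grad_x,) = torch.autograd.grad(y0, x0, grad_outputs=v)

assert grad_x.shape == (3,)
assert torch.allclose(grad_x, v @ layer.weight, atol=1e-6)    # v^T W
assert torch.allclose(grad_x, layer.weight.T @ v, atol=1e-6)  # W^T v
```

Because `torch.autograd.grad` does not write to `x0.grad`, re-running the block gives the same result, whereas repeated `backward` calls add to an existing `.grad`.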