# Tensor Algebra in PyTorch

PyTorch provides the following functions for **algebraic operations**:

```
A.mm(B)                    # matrix multiplication of 2-D tensors; A.matmul(B) also accepts 1-D and batched tensors
A.mv(x)                    # matrix vector multiplication
x.dot(y)                   # inner vector product / dot product
x.t()                      # matrix transpose
```
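As a rough illustration of the shapes these functions produce, the following sketch uses small tensors filled with ones; the names `A`, `B`, `x` and `batch` are made up for this example. It also shows that `matmul` is more general than `mm`, since it accepts 1-D and batched operands.

```python
import torch

A = torch.ones(2, 3)          # 2 x 3 matrix
B = torch.ones(3, 2)          # 3 x 2 matrix
x = torch.ones(3)             # 1-D vector with 3 elements

print(A.mm(B).shape)          # torch.Size([2, 2])
print(A.mv(x).shape)          # torch.Size([2])
print(x.dot(x).shape)         # torch.Size([])  (a scalar tensor)
print(A.t().shape)            # torch.Size([3, 2])

# matmul also handles a 1-D right operand and a leading batch dimension
batch = torch.ones(5, 4, 2)   # 5 matrices of shape 4 x 2
print(torch.matmul(A, x).shape)       # torch.Size([2])
print(torch.matmul(batch, A).shape)   # torch.Size([5, 4, 3])
```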

## Vectors

In machine learning, **features** $x$ are typically described as a **column vector** of $n$ feature units:

$$x = \begin{pmatrix}x_1 \\ \vdots \\ x_n \end{pmatrix}$$

A single neuron then computes the **inner product** or **dot product** `torch.dot()` (a.k.a. scalar product in a Cartesian coordinate system) of $x$ with a **column vector** of **weights** $w$

$$w = \begin{pmatrix}w_1 \\ \vdots \\ w_n \end{pmatrix}$$

and passes the result to an **activation function** $f$, which yields a **scalar** $y$:

$$y = f(w \cdot x) = f\left(\sum_{i=1}^n w_i x_i\right)$$

In `PyTorch`, this is written as

In [1]:
import torch

x = torch.tensor([1, 2, 3])
print(x.numpy().shape)

w = torch.tensor([2, 3, 4])
print(w.numpy().shape)

y = torch.dot(w, x)               # alternatively w.dot(x)
print(y.item())
print(y.numpy().shape)

(3,)
(3,)
20
()


The **dot product** is **commutative**, hence we can also switch the order of the operands:

$$y = f(x \cdot w) = f\left(\sum_{i=1}^n x_i w_i\right)$$

In [2]:
y = torch.dot(x, w)               # alternatively x.dot(w)
print(y.item())
print(y.numpy().shape)

20
()
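The cells above only compute the weighted sum, not the activation $f$. A minimal sketch of a complete single neuron could look as follows, assuming the sigmoid as activation function and using illustrative float weights chosen so that the weighted sum is exactly zero:

```python
import torch

x = torch.tensor([1., 2., 3.])        # features
w = torch.tensor([0.5, -0.25, 0.])    # illustrative weights (not from the text above)

z = torch.dot(w, x)                   # weighted sum: 0.5*1 - 0.25*2 + 0*3 = 0
y = torch.sigmoid(z)                  # activation f applied to the scalar

print(z.item())                       # 0.0
print(y.item())                       # 0.5
```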


## Matrices

If we describe features and weights with an $n \times 1$ **column matrix**, we have

$$x = \begin{bmatrix}x_1 \\ \vdots \\ x_n \end{bmatrix} \qquad \text{and} \qquad w = \begin{bmatrix}w_1 \\ \vdots \\ w_n \end{bmatrix}$$

and the dot product can be written as a **matrix product** `torch.matmul()` using the **transpose** $w^\top$. A matrix product is only defined if the number of **columns** of the **left operand** is equal to the number of **rows** of the **right operand**, which holds here for the $1 \times n$ matrix $w^\top$ and the $n \times 1$ matrix $x$:

$$w^\top \cdot x = [w_1, \ldots, w_n] \cdot \begin{bmatrix}x_1 \\ \vdots \\ x_n \end{bmatrix} = \sum_{i=1}^n w_i x_i$$

If we use a **row matrix** $x = [x_1, \ldots, x_n]$ for the features, we must switch the order of the operands to get the same scalar:

$$x \cdot w = [x_1, \ldots, x_n] \cdot \begin{bmatrix}w_1 \\ \vdots \\ w_n \end{bmatrix} = \sum_{i=1}^n x_i w_i$$

In [3]:
torch.manual_seed(1)

w = torch.randn(3, 1)
print(w.t())
print(w.t().numpy().shape)

x = torch.tensor([[1.], [2.], [3.]])
print(x)
print(x.numpy().shape)

y = torch.matmul(w.t(), x)              # alternatively `w.t().mm(x)` or `w.t() @ x`
print(y.item())
print(y.numpy().shape)

tensor([[0.6614, 0.2669, 0.0617]])
(1, 3)
tensor([[1.],
        [2.],
        [3.]])
(3, 1)
1.3802322149276733
(1, 1)


As **matrix multiplication** is **non-commutative** (in general $AB \neq BA$), we must pay attention to the order of the operands.
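A small sketch with two hand-picked $2 \times 2$ matrices (the values are chosen for illustration only) makes the difference visible: multiplying by the permutation matrix `B` from the right swaps the columns of `A`, while multiplying from the left swaps its rows.

```python
import torch

A = torch.tensor([[1., 2.],
                  [3., 4.]])
B = torch.tensor([[0., 1.],
                  [1., 0.]])    # permutation matrix

AB = A @ B                      # swaps the columns of A
BA = B @ A                      # swaps the rows of A

print(AB)                       # tensor([[2., 1.],
                                #         [4., 3.]])
print(BA)                       # tensor([[3., 4.],
                                #         [1., 2.]])
print(torch.equal(AB, BA))      # False
```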

### Outer Product (Dyadic Product)

An outer product is a product of two vectors, or two one-column matrices, which results in a **matrix**:

$$x \otimes y = x \cdot y^\top = {\begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}} \cdot [ y_1, \cdots, y_n ] = \begin{bmatrix} x_1 y_1 & \cdots & x_1 y_n \\ \vdots & & \vdots \\ x_m y_1 & \cdots & x_m y_n \end{bmatrix}$$

In [4]:
print(w)
print(w.numpy().shape)

print(x.t())
print(x.t().numpy().shape)

y = torch.matmul(w, x.t())
print(y)
print(y.numpy().shape)

tensor([[0.6614],
        [0.2669],
        [0.0617]])
(3, 1)
tensor([[1., 2., 3.]])
(1, 3)
tensor([[0.6614, 1.3227, 1.9841],
        [0.2669, 0.5338, 0.8008],
        [0.0617, 0.1234, 0.1850]])
(3, 3)
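For 1-D tensors, the same result can be obtained without building column and row matrices by hand. The sketch below assumes a reasonably recent PyTorch release that provides `torch.outer()`; the vectors `u` and `v` are illustrative.

```python
import torch

u = torch.tensor([1., 2., 3.])      # 1-D vector, m = 3
v = torch.tensor([10., 20.])        # 1-D vector, n = 2

outer = torch.outer(u, v)           # m x n matrix with entries u_i * v_j

# the same product written as (m x 1) column matrix times (1 x n) row matrix
same = torch.matmul(u[:, None], v[None, :])

print(outer)                        # tensor([[10., 20.],
                                    #         [20., 40.],
                                    #         [30., 60.]])
print(torch.equal(outer, same))     # True
```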


## Multi Layer Networks

When we have a multi-layer neural network, the weights are typically a **matrix** $W$ with $n$ **rows** for $n$ **features** and $m$ **columns** for $m$ **hidden nodes**:

$$W = \begin{bmatrix}w_{11} & \cdots & w_{1m} \\ \vdots & & \vdots \\ w_{n1} & \cdots & w_{nm} \end{bmatrix}$$

Then the items of the hidden layer are a **column matrix**

$$h = f(W^\top x) = \begin{bmatrix}h_1 \\ \vdots \\ h_m \end{bmatrix}$$

with

$$W^\top x = \begin{bmatrix}w_{11} & \cdots & w_{n1} \\ \vdots & & \vdots \\ w_{1m} & \cdots & w_{nm} \end{bmatrix} \cdot \begin{bmatrix}x_1 \\ \vdots \\ x_n \end{bmatrix}$$

In [5]:
torch.manual_seed(1)

W = torch.randn(3, 2)
print(W.t())
print(W.t().numpy().shape)

x = torch.tensor([[1.], [2.], [3.]])
print(x)
print(x.numpy().shape)

y = torch.matmul(W.t(), x)
print(y)
print(y.numpy().shape)

tensor([[ 0.6614,  0.0617, -0.4519],
        [ 0.2669,  0.6213, -0.1661]])
(2, 3)
tensor([[1.],
        [2.],
        [3.]])
(3, 1)
tensor([[-0.5710],
        [ 1.0112]])
(2, 1)


As the number of **columns** of $x$ (here 1) is **not equal** to the number of **rows** of $W^\top$ (here 2), we can't calculate $x \cdot W^\top$:

In [6]:
y = torch.matmul(x, W.t())
print(y)
print(y.numpy().shape)

RuntimeError: size mismatch, m1: [3 x 1], m2: [2 x 3] at ../aten/src/TH/generic/THTensorMath.cpp:41

But as $(A \cdot B)^\top = B^\top \cdot A^\top$ and $(A^\top)^\top = A$, we can calculate

$$x^\top \cdot W = ((x^\top \cdot W)^\top)^\top = (W^\top \cdot (x^\top)^\top)^\top = (W^\top \cdot x)^\top = h^\top$$

to get the **transpose** of the **hidden column matrix**.

In [7]:
y = torch.matmul(x.t(), W)
print(y)
print(y.numpy().shape)

tensor([[-0.5710,  1.0112]])
(1, 2)


It is common in PyTorch to use a **row matrix** for a feature vector, hence we calculate the **matrix product** `torch.matmul()` as $x \cdot W$ to get a **hidden row matrix**:

$$h = f(x \cdot W) = [h_1, \ldots, h_m]$$


## Feature Batches

In PyTorch it is common to have a **batch** of $k$ **feature vectors**, which are described as a matrix having $k$ rows and $n$ columns. 

Each row represents one feature vector $x^{(i)} = (x_1^{(i)}, \ldots, x_n^{(i)})$ (consider features as stacked within a matrix):

$$X = \begin{bmatrix}x^{(1)} \\ \vdots \\ x^{(k)} \end{bmatrix} = \begin{bmatrix}x_1^{(1)} & \cdots & x_n^{(1)} \\ \vdots & & \vdots \\ x_1^{(k)} & \cdots & x_n^{(k)} \end{bmatrix} = \begin{bmatrix}x_{11} & \cdots & x_{1n} \\ \vdots & & \vdots \\ x_{k1} & \cdots & x_{kn} \end{bmatrix}$$

With a **weights matrix** from a multi layer neural network

$$W = \begin{bmatrix}w_{11} & \cdots & w_{1m} \\ \vdots & & \vdots \\ w_{n1} & \cdots & w_{nm} \end{bmatrix}$$

we then calculate the items of the hidden layer as the $k \times m$ **matrix product** of the $k \times n$ **feature matrix** and the $n \times m$ **weight matrix**:

$$h = f(X \cdot W) = \begin{bmatrix}h_{11} & \cdots & h_{1m} \\ \vdots & & \vdots \\ h_{k1} & \cdots & h_{km} \end{bmatrix} = \begin{bmatrix}h^{(1)} \\ \vdots \\ h^{(k)} \end{bmatrix}$$

with

$$X \cdot W = \begin{bmatrix}x_{11} & \cdots & x_{1n} \\ \vdots & & \vdots \\ x_{k1} & \cdots & x_{kn} \end{bmatrix} \cdot \begin{bmatrix}w_{11} & \cdots & w_{1m} \\ \vdots & & \vdots \\ w_{n1} & \cdots & w_{nm} \end{bmatrix}$$

In [8]:
torch.manual_seed(1)

X = torch.tensor([[1., 2., 3.]])          # batch of 1 sample, features as a row matrix
print(X)
print(X.numpy().shape)

W = torch.randn(3, 2)                     # 3 input units, 2 hidden units
print(W)
print(W.numpy().shape)

y = torch.matmul(X, W)
print(y)
print(y.numpy().shape)

tensor([[1., 2., 3.]])
(1, 3)
tensor([[ 0.6614,  0.2669],
        [ 0.0617,  0.6213],
        [-0.4519, -0.1661]])
(3, 2)
tensor([[-0.5710,  1.0112]])
(1, 2)
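The cell above uses a batch with a single sample. As a sketch of the general case, the following example uses an illustrative batch of $k = 4$ random samples and checks that each row of the $k \times m$ result equals the product of the corresponding feature vector with $W$.

```python
import torch

torch.manual_seed(1)

X = torch.randn(4, 3)     # k = 4 samples, n = 3 features each
W = torch.randn(3, 2)     # n = 3 input units, m = 2 hidden units

H = torch.matmul(X, W)    # k x m matrix, one hidden row per sample
print(H.shape)            # torch.Size([4, 2])

# compute every sample separately and stack the rows again
rows = torch.stack([torch.matmul(X[i], W) for i in range(X.shape[0])])
print(torch.allclose(H, rows, atol=1e-6))
```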


#### Example of a Neural Network

This is an example neural network with 3 **input units**, 2 **hidden units** and 1 **output unit**, with the bias terms `B_h` and `B_o` and the sigmoid function as activation function:

In [10]:
torch.manual_seed(1)

X = torch.randn(1, 3)

W_ih = torch.randn(3, 2)
W_ho = torch.randn(2, 1)

B_h = torch.randn(1, 2)
B_o = torch.randn(1, 1)

h = torch.sigmoid(torch.mm(X, W_ih) + B_h)
y = torch.sigmoid(torch.mm(h, W_ho) + B_o)

y

tensor([[0.1766]])
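The hidden layer above corresponds to what `torch.nn.Linear` computes. The following sketch is not part of the original example; it copies the hand-made weights into a `Linear` module to show the relationship. Note that `nn.Linear` stores its weight with shape `(out_features, in_features)`, i.e. as the transpose of `W_ih`.

```python
import torch
from torch import nn

torch.manual_seed(1)

X = torch.randn(1, 3)
W_ih = torch.randn(3, 2)
B_h = torch.randn(1, 2)

hidden = nn.Linear(in_features=3, out_features=2)
with torch.no_grad():
    hidden.weight.copy_(W_ih.t())        # weight has shape (2, 3)
    hidden.bias.copy_(B_h.squeeze(0))    # bias has shape (2,)

manual = torch.sigmoid(torch.mm(X, W_ih) + B_h)
layer = torch.sigmoid(hidden(X)).detach()

print(manual.shape)                      # torch.Size([1, 2])
print(torch.allclose(manual, layer, atol=1e-6))
```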

## Dimensions in PyTorch

### Addressing Dimensions

PyTorch provides functions like `torch.sum()`, `torch.mean()`, `torch.max()` or `torch.nn.Softmax()` that operate **along an axis** of a tensor; the axis is selected via the `dim` argument.

When tensors are combined with `torch.stack()`, the **new dimension** is **prepended** by default and takes the first position at `dim=0`:

In [48]:
a = torch.ones((4,3))
b = torch.ones((4,3))
c = torch.stack((a, b))
c.numpy().shape

(2, 4, 3)

This means that `dim` is the **index** of an axis in the **shape** of an **n-dimensional tensor**.
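Prepending is only the default behaviour of `torch.stack()`. As a short sketch, the `dim` argument can place the new axis at any other position of the shape, and negative values count from the end:

```python
import torch

a = torch.ones((4, 3))
b = torch.ones((4, 3))

print(torch.stack((a, b), dim=0).shape)    # torch.Size([2, 4, 3])
print(torch.stack((a, b), dim=1).shape)    # torch.Size([4, 2, 3])
print(torch.stack((a, b), dim=2).shape)    # torch.Size([4, 3, 2])
print(torch.stack((a, b), dim=-1).shape)   # torch.Size([4, 3, 2])
```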

#### Example for a 2-Dimensional Tensor

In [89]:
x = torch.ones((2, 4))
print(x)
print(x.numpy().shape)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]])
(2, 4)


In [90]:
x.sum()                           # sum over all elements

tensor(8.)

In [91]:
x.sum(dim=0)                      # sum in y-axis (here: dim index 0)

tensor([2., 2., 2., 2.])

In [92]:
x.sum(dim=1)                      # sum in x-axis (here: dim index 1)

tensor([4., 4.])

#### Example for a 3-Dimensional Tensor

In [93]:
x = torch.stack((torch.ones((3,4)), torch.ones((3,4))))
print(x)
print(x.numpy().shape)

tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])
(2, 3, 4)


In [94]:
x.sum()

tensor(24.)

In [95]:
x.sum(dim=0)                       # sum in  z-axis (here: dim index 0)

tensor([[2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.]])

In [96]:
x.sum(dim=1)                       # sum in  y-axis (here: dim index 1)

tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.]])

In [97]:
x.sum(dim=2)                       # sum in  x-axis (here: dim index 2)

tensor([[4., 4., 4.],
        [4., 4., 4.]])
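The other functions mentioned at the start of this section take the same `dim` argument. The following sketch uses an illustrative $2 \times 3$ tensor and additionally shows `keepdim=True`, which keeps the reduced axis with size 1 instead of removing it.

```python
import torch

x = torch.tensor([[1., 2., 3.],
                  [6., 5., 4.]])            # shape (2, 3)

print(x.sum(dim=1).shape)                   # torch.Size([2])
print(x.sum(dim=1, keepdim=True).shape)     # torch.Size([2, 1])
print(x.mean(dim=0))                        # tensor([3.5000, 3.5000, 3.5000])

values, indices = x.max(dim=1)              # maximum of each row and its position
print(values)                               # tensor([3., 6.])
print(indices)                              # tensor([2, 0])

p = torch.softmax(x, dim=1)                 # normalizes each row separately
print(p.shape)                              # torch.Size([2, 3])
print(p.sum(dim=1))                         # each row sums to 1 (up to rounding)
```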

### Adding Dimensions

In order to add new dimensions to a PyTorch tensor, you can add a new index with `[None]`:

In [106]:
x = torch.ones(32)
x.numpy().shape

(32,)

In [130]:
x[None].numpy().shape              # a trailing `:` can be skipped

(1, 32)

In [132]:
x[:, None].numpy().shape

(32, 1)

In [135]:
x[None, None, :, None].numpy().shape

(1, 1, 32, 1)

The `unsqueeze()` function adds a new dimension of size 1 at a **specific position**:

In [115]:
x = torch.ones((2, 3, 4))
x.numpy().shape

(2, 3, 4)

In [119]:
x.unsqueeze(dim=0).numpy().shape

(1, 2, 3, 4)

In [120]:
x.unsqueeze(dim=2).numpy().shape

(2, 3, 1, 4)
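Indexing with `None` and `unsqueeze()` are two notations for the same operation, and `squeeze()` reverses it. A brief sketch (variable names are illustrative):

```python
import torch

x = torch.ones((2, 3, 4))

a = x.unsqueeze(dim=1)       # new axis at position 1
b = x[:, None]               # the same operation written as indexing
c = x.unsqueeze(dim=-1)      # negative positions count from the end
d = a.squeeze(dim=1)         # removes the size-1 axis again

print(a.shape)               # torch.Size([2, 1, 3, 4])
print(b.shape)               # torch.Size([2, 1, 3, 4])
print(c.shape)               # torch.Size([2, 3, 4, 1])
print(d.shape)               # torch.Size([2, 3, 4])
```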

The `view()` function reshapes a tensor, which can also add new dimensions; a single `-1` acts as a placeholder whose size is inferred from the remaining dimensions:

In [121]:
x = torch.ones(32)
x.numpy().shape

(32,)

In [125]:
x.view(8, 2, 2).numpy().shape

(8, 2, 2)

In [136]:
x.view(-1, 4).numpy().shape              # `-1` can only be used once!

(8, 4)
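Two properties of `view()` are worth keeping in mind: the result shares memory with the original tensor, and it only works if the requested shape is compatible with the tensor's memory layout. The sketch below illustrates both with a small integer tensor and uses `reshape()` as the fallback, which copies the data when a view is not possible.

```python
import torch

x = torch.arange(12)             # shape (12,)
m = x.view(3, -1)                # -1 is inferred as 4
print(m.shape)                   # torch.Size([3, 4])

m[0, 0] = 100                    # a view shares memory with x
print(x[0].item())               # 100

t = m.t()                        # transposed, shape (4, 3), no longer contiguous
print(t.is_contiguous())         # False

try:
    t.view(-1)                   # not possible for this memory layout
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)               # True

flat = t.reshape(-1)             # reshape copies the data if necessary
print(flat.shape)                # torch.Size([12])
```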

### Broadcasting

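When operating on two tensors, PyTorch compares their shapes dimension by dimension, starting at the **trailing** (rightmost) dimension. Two dimensions are compatible when:

1. they are equal, or
2. one of them is 1, or
3. one of them does not exist, because the shapes have different lengths.

When one of the compared dimensions is 1 or missing, the size of the other is used for the result. In other words, dimensions with size 1 are stretched or copied to match the other.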

In [153]:
a = torch.ones((4, 3, 2))
b = torch.ones((   3, 1))
(a + b).shape

torch.Size([4, 3, 2])

In [155]:
a = torch.ones((4, 3, 2))
b = torch.ones((   3, 2))
(a + b).shape

torch.Size([4, 3, 2])

In [156]:
a = torch.ones((4, 3, 2))
b = torch.ones((   3, 3))
(a + b).shape

RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 2

In [157]:
a = torch.ones((1, 3))
b = torch.ones((3, 1))
(a + b).shape

torch.Size([3, 3])
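Broadcasting is also what allows a single bias row to be added to every sample of a batch. The following sketch uses illustrative tensors and assumes a reasonably recent PyTorch release that provides `torch.broadcast_shapes()`, which computes the resulting shape without allocating any tensors.

```python
import torch

H = torch.zeros(4, 2)                  # k = 4 samples, m = 2 hidden units
B_h = torch.tensor([[1., 2.]])         # one bias row of shape (1, 2)

out = H + B_h                          # the bias row is added to every sample
print(out.shape)                       # torch.Size([4, 2])

# expand() makes the stretching explicit without copying the data
explicit = B_h.expand(4, 2)
print(torch.equal(out, H + explicit))  # True

# resulting shapes for two of the cases shown above
print(torch.broadcast_shapes((4, 3, 2), (3, 1)))   # torch.Size([4, 3, 2])
print(torch.broadcast_shapes((1, 3), (3, 1)))      # torch.Size([3, 3])
```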