# Tensor Operations

## Tensor Transposition

- The transpose of a scalar is itself: $x^T = x$
- The transpose of a vector converts a column vector to a row vector, and vice versa
- Scalar and vector transposition are special cases of **matrix transposition**:
    - Flip of axes over the *main diagonal*: $(\boldsymbol{X}^T)_{i,j} = \boldsymbol{X}_{j,i}$

   $$\begin{bmatrix} x_{1,1} & x_{1,2} \\ x_{2,1} & x_{2,2} \\ x_{3,1} & x_{3,2} \end{bmatrix}^T = \begin{bmatrix} x_{1,1} & x_{2,1} & x_{3,1} \\ x_{1,2} & x_{2,2} & x_{3,2} \end{bmatrix}$$

### Tensor Transposition with Python

In [46]:
import numpy as np
import torch
import tensorflow as tf

In [47]:
X = np.array([[25, 2], [5, 26], [3, 7]])
X_tf = tf.Variable([[25, 2], [5, 26], [3, 7]])
X_pt = torch.tensor([[25, 2], [5, 26], [3, 7]])

##### Transposition with NumPy

In [48]:
X.T

array([[25,  5,  3],
       [ 2, 26,  7]])

##### Transposition with TensorFlow

In [49]:
tf.transpose(X_tf)

<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[25,  5,  3],
       [ 2, 26,  7]])>

##### Transposition with PyTorch

In [50]:
X_pt.T

tensor([[25,  5,  3],
        [ 2, 26,  7]])
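
As a quick check of the definition $(\boldsymbol{X}^T)_{i,j} = \boldsymbol{X}_{j,i}$, the following minimal sketch compares every element of the NumPy transpose against the original matrix. The variable name `X_T` is introduced here for illustration only.

```python
import numpy as np

X = np.array([[25, 2], [5, 26], [3, 7]])
X_T = X.T  # illustrative name, not used elsewhere in the notebook

# Transposition swaps the two axes: (3, 2) becomes (2, 3)
assert X.shape == (3, 2) and X_T.shape == (2, 3)

# Element (i, j) of the transpose is element (j, i) of the original
for i in range(X_T.shape[0]):
    for j in range(X_T.shape[1]):
        assert X_T[i, j] == X[j, i]

# Transposing twice gives back the original matrix
assert np.array_equal(X_T.T, X)
```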

## Basic Tensor Arithmetic

Adding a scalar to a tensor, or multiplying a tensor by a scalar, applies the operation to each element and retains the tensor's shape:

In [51]:
X * 2

array([[50,  4],
       [10, 52],
       [ 6, 14]])

In [52]:
X + 2

array([[27,  4],
       [ 7, 28],
       [ 5,  9]])

In [53]:
X*2+2

array([[52,  6],
       [12, 54],
       [ 8, 16]])

In TensorFlow and PyTorch, as in NumPy, the arithmetic operators are overloaded: the same operator (e.g. `*`, `+`) is dispatched to a library-specific implementation depending on the types of its operands. In all three libraries, `*` is element-wise multiplication, while matrix multiplication is written with `@` (or `torch.matmul` / `tf.matmul`). To make the intent explicit, we can also call the functions provided by the libraries:

In [54]:
torch.add(torch.mul(X_pt, 2), 2)

tensor([[52,  6],
        [12, 54],
        [ 8, 16]])

In [55]:
tf.add(tf.multiply(X_tf, 2), 2)

<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[52,  6],
       [12, 54],
       [ 8, 16]])>
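
The sketch below illustrates, for PyTorch, that the overloaded operators and the explicit functions compute the same result, and that `*` is element-wise while `@` performs matrix multiplication. The names `elementwise` and `matmul` are illustrative and not part of the notebook.

```python
import torch

X_pt = torch.tensor([[25, 2], [5, 26], [3, 7]])

# The operators and the explicit functions give identical results
assert torch.equal(X_pt * 2 + 2, torch.add(torch.mul(X_pt, 2), 2))

# `*` multiplies element by element, so the shape stays (3, 2)
elementwise = X_pt * X_pt
assert elementwise.shape == (3, 2)

# `@` is matrix multiplication: (3, 2) @ (2, 3) gives (3, 3)
matmul = X_pt @ X_pt.T
assert matmul.shape == (3, 3)
assert torch.equal(matmul, torch.matmul(X_pt, X_pt.T))
```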

### Hadamard Product (or Element-wise Product)

If two tensors have the same shape, operations such as addition and multiplication are applied element-wise by default. The element-wise product is called the **Hadamard product**. The Hadamard product of two tensors $\boldsymbol{A}$ and $\boldsymbol{X}$ is denoted as $\boldsymbol{A} \odot \boldsymbol{X}$.

In [56]:
X

array([[25,  2],
       [ 5, 26],
       [ 3,  7]])

In [57]:
A = X + 2
A

array([[27,  4],
       [ 7, 28],
       [ 5,  9]])

In [58]:
A + X

array([[52,  6],
       [12, 54],
       [ 8, 16]])

In [59]:
# the Hadamard product in NumPy
A * X

array([[675,   8],
       [ 35, 728],
       [ 15,  63]])

In [60]:
A_pt = X_pt + 2

In [61]:
A_pt + X_pt

tensor([[52,  6],
        [12, 54],
        [ 8, 16]])

In [62]:
# the Hadamard product in PyTorch
A_pt * X_pt

tensor([[675,   8],
        [ 35, 728],
        [ 15,  63]])

In [63]:
A_tf = X_tf + 2

In [64]:
A_tf + X_tf

<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[52,  6],
       [12, 54],
       [ 8, 16]])>

In [65]:
# the Hadamard product in TensorFlow
A_tf * X_tf

<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[675,   8],
       [ 35, 728],
       [ 15,  63]])>
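
In index notation, the Hadamard product is $(\boldsymbol{A} \odot \boldsymbol{X})_{i,j} = A_{i,j} X_{i,j}$. The following sketch spells this out with an explicit loop and checks it against NumPy's `*` operator. The name `H` is illustrative.

```python
import numpy as np

X = np.array([[25, 2], [5, 26], [3, 7]])
A = X + 2

# Build the Hadamard product one element at a time
H = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        H[i, j] = A[i, j] * X[i, j]

# The loop agrees with the element-wise `*` operator and with np.multiply
assert np.array_equal(H, A * X)
assert np.array_equal(H, np.multiply(A, X))

# Unlike matrix multiplication, the Hadamard product is commutative
assert np.array_equal(A * X, X * A)
```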

### Tensor Reduction

Calculating the sum across all elements of a tensor is a common operation. For example:

- For vector $\boldsymbol{x}$ of length $n$, we calculate $\sum_{i=1}^{n} x_i$
- For matrix $\boldsymbol{X}$ of shape $m \times n$, we calculate $\sum_{i=1}^{m} \sum_{j=1}^{n} X_{i,j}$


In [66]:
X

array([[25,  2],
       [ 5, 26],
       [ 3,  7]])

In [67]:
X.sum()

68

In [68]:
torch.sum(X_pt)

tensor(68)

In [69]:
tf.reduce_sum(X_tf)

<tf.Tensor: shape=(), dtype=int32, numpy=68>

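The sum can also be taken along a specific axis: axis 0 sums down the rows (one result per column), and axis 1 sums across the columns (one result per row).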

In [70]:
X.sum(0)

array([33, 35])

In [71]:
X.sum(1)

array([27, 31, 10])

In [72]:
torch.sum(X_pt, 0)

tensor([33, 35])

In [73]:
tf.reduce_sum(X_tf, 1)

<tf.Tensor: shape=(3,), dtype=int32, numpy=array([27, 31, 10])>
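
One way to read the axis argument: the axis that is summed over is the one that disappears from the shape. The sketch below checks this on `X` with NumPy. The names `col_sums` and `row_sums` are illustrative.

```python
import numpy as np

X = np.array([[25, 2], [5, 26], [3, 7]])  # shape (3, 2)

# Summing over axis 0 collapses the 3 rows, leaving one sum per column
col_sums = X.sum(axis=0)
assert col_sums.shape == (2,)
assert np.array_equal(col_sums, X[0] + X[1] + X[2])

# Summing over axis 1 collapses the 2 columns, leaving one sum per row
row_sums = X.sum(axis=1)
assert row_sums.shape == (3,)
assert np.array_equal(row_sums, X[:, 0] + X[:, 1])

# Either way, the partial sums add up to the total over all elements
assert col_sums.sum() == row_sums.sum() == X.sum()
```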

Other operations that can be applied as reductions along all or a selection of axes include:

- maximum
- minimum
- mean
- product
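
As a rough illustration, these reductions follow the same pattern as the sum in NumPy. The sketch below uses the matrix `X` from earlier. The names `col_max`, `row_min`, `row_mean`, and `col_prod` are illustrative.

```python
import numpy as np

X = np.array([[25, 2], [5, 26], [3, 7]])

# Reductions over all elements
print(X.max(), X.min(), X.mean(), X.prod())

# The same reductions along a single axis
col_max = X.max(axis=0)     # largest value in each column
row_min = X.min(axis=1)     # smallest value in each row
row_mean = X.mean(axis=1)   # mean of each row (float result)
col_prod = X.prod(axis=0)   # product of the values in each column
```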

## The Dot Product

If we have two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ of the same length $n$, we can calculate the dot product between them.

The dot product can be written in several ways:

- $\boldsymbol{x} \cdot \boldsymbol{y}$
- $\boldsymbol{x}^T \boldsymbol{y}$
- $\langle \boldsymbol{x} , \boldsymbol{y}\rangle$

Regardless of the notation, the dot product is calculated as follows: first, the products are calculated element-wise, and then these products are summed to a single scalar value.

$$\boldsymbol{x} \cdot \boldsymbol{y} = \sum_{i=1}^{n} x_i y_i$$

In [74]:
x = np.array([25, 2, 5])
x_pt = torch.tensor([25, 2, 5])
x_tf = tf.Variable([25, 2, 5])
y = np.array([0, 1, 2])
y_pt = torch.tensor([0, 1, 2])
y_tf = tf.Variable([0, 1, 2])

In [75]:
np.dot(x, y)

12

In [76]:
np.dot(x_pt, y_pt)

12

In [77]:
torch.dot(x_pt, y_pt)

tensor(12)

In [78]:
tf.reduce_sum(tf.multiply(x_tf, y_tf))

<tf.Tensor: shape=(), dtype=int32, numpy=12>
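
For the vectors above, the formula works out to $25 \cdot 0 + 2 \cdot 1 + 5 \cdot 2 = 12$. A minimal sketch of the two steps (element-wise products, then a sum) in NumPy, with the illustrative names `products` and `manual_dot`:

```python
import numpy as np

x = np.array([25, 2, 5])
y = np.array([0, 1, 2])

# Step 1: element-wise products
products = x * y            # [0, 2, 10]

# Step 2: sum the products to a scalar
manual_dot = products.sum()

assert manual_dot == np.dot(x, y) == 12
```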

In [79]:
A = np.array([[1,2,3],[4,5,6]])
B = np.array([[7,8],[9,10],[11,12]])
np.dot(A, B)

array([[ 58,  64],
       [139, 154]])
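
For two-dimensional arrays, `np.dot` performs matrix multiplication: entry $(i, j)$ of the result is the dot product of row $i$ of the first matrix with column $j$ of the second. The sketch below checks this for the matrices above. `C` is an illustrative name.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])        # shape (2, 3)
B = np.array([[7, 8], [9, 10], [11, 12]])   # shape (3, 2)
C = np.dot(A, B)                            # shape (2, 2)

# Each entry is a row-by-column dot product
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        assert C[i, j] == np.dot(A[i, :], B[:, j])

# For 2-D arrays, the @ operator gives the same result
assert np.array_equal(C, A @ B)
```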

## Exercises

1. What is $\boldsymbol{Y}^T$ for the following matrix:

$$\boldsymbol{Y} = \begin{bmatrix} 42 & 4 & 7 & 99 \\ -99 & -3 & 17 & 22 \end{bmatrix}$$

In [80]:
y = np.array([[42, 4, 7, 99],[-99, -3, 17, 22]])
y

array([[ 42,   4,   7,  99],
       [-99,  -3,  17,  22]])

In [81]:
y.T

array([[ 42, -99],
       [  4,  -3],
       [  7,  17],
       [ 99,  22]])

2. What is the Hadamard product of the following matrices:

$$\begin{bmatrix} 25 & 10 \\ -2 & 1 \end{bmatrix} \odot \begin{bmatrix} -1 & 7 \\ 10 & 8 \end{bmatrix}$$

In [82]:
A = np.array([[25,10],[-2,1]])
B = np.array([[-1,7],[10,8]])
A * B

array([[-25,  70],
       [-20,   8]])

3. What is the dot product of the following vectors:

$$\boldsymbol{w} = [-1 \quad 2 \quad -2]$$
$$\boldsymbol{z} = [5 \quad 10 \quad 0]$$

In [83]:
w = np.array([-1, 2, -2])
z = np.array([5, 10, 0])
np.dot(w, z)

15

## Tensor Manipulation

### Tensor Aggregation

Finding the maximum, minimum, mean, and sum of a tensor is a common operation. It is often used to summarize the data in a tensor, and it can be done along all or a selection of axes.

It's called aggregation because it combines the values in a tensor into a single value or a smaller tensor. For example, if we have a tensor with 100 elements, we can aggregate it to a single value by calculating the sum or mean of all the elements.

In [99]:
X = torch.arange(0, 100, 10)
X

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [100]:
torch.max(X), torch.min(X), torch.mean(X.type(torch.float32)), torch.sum(X)

(tensor(90), tensor(0), tensor(45.), tensor(450))

The *positional* maximum and minimum of a tensor are helpful when we want the position at which the largest or smallest value occurs, rather than the value itself.

In [101]:
# Create a tensor
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")

# Returns index of max and min values
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")

Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0
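
The index returned by `argmax()` or `argmin()` can be used to look up the corresponding value. A small sketch, assuming the same tensor as above (the names `i_max` and `i_min` are illustrative):

```python
import torch

tensor = torch.arange(10, 100, 10)

i_max = tensor.argmax()  # 0-dim tensor holding the index of the largest value
i_min = tensor.argmin()  # 0-dim tensor holding the index of the smallest value

# Indexing with the position recovers the value itself
assert tensor[i_max] == tensor.max()
assert tensor[i_min] == tensor.min()

# .item() converts the 0-dim index tensor to a plain Python int
print(i_max.item(), i_min.item())
```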


### Reshaping

Reshaping changes the shape of a tensor while keeping its elements and their total count unchanged:

In [102]:
S = torch.arange(1., 8.)
S, S.shape

(tensor([1., 2., 3., 4., 5., 6., 7.]), torch.Size([7]))

Let's add a new dimension to the tensor:

In [93]:
R = S.reshape(1, 7)
R, R.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))
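
A reshape only succeeds if the new shape holds the same number of elements. The sketch below shows a valid column shape, the use of `-1` to let PyTorch infer one dimension, and the `RuntimeError` raised for an incompatible shape. The names `column`, `inferred`, and `failed` are illustrative.

```python
import torch

S = torch.arange(1., 8.)         # 7 elements

column = S.reshape(7, 1)         # 7 rows, 1 column
inferred = S.reshape(-1, 1)      # -1 asks PyTorch to infer the size (7 here)
assert column.shape == inferred.shape == (7, 1)

# 2 * 4 = 8 elements do not match the 7 available, so this fails
failed = False
try:
    S.reshape(2, 4)
except RuntimeError:
    failed = True
assert failed
```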

### View

The `view()` method returns a tensor with a different shape that shares the same data as the original tensor.

In [89]:
V = S.view(1, 7)
V, V.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

Because the view shares its data with the original tensor, changing an element through the view also changes the original tensor (its shape stays the same):

In [90]:
V[:, 0] = 5
print(f"Original tensor: {S}")
print(f"View of the tensor: {V}")

Original tensor: tensor([5., 2., 3., 4., 5., 6., 7.])
View of the tensor: tensor([[5., 2., 3., 4., 5., 6., 7.]])
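
One way to confirm that a view shares storage with its source is to compare `data_ptr()`, which returns the memory address of a tensor's first element. If an independent copy is needed, `clone()` provides one. A minimal sketch follows. The name `C` is illustrative.

```python
import torch

S = torch.arange(1., 8.)
V = S.view(1, 7)     # same data, different shape
C = S.clone()        # independent copy of the data

# The view points at the same memory as the original; the clone does not
assert V.data_ptr() == S.data_ptr()
assert C.data_ptr() != S.data_ptr()

# Writing through the view changes S but leaves the clone untouched
V[:, 0] = 5
assert S[0] == 5
assert C[0] == 1

# The original tensor keeps its own shape
assert S.shape == (7,)
```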


### Stacking

With stacking, we can combine multiple tensors into a single tensor.

In [91]:
T = torch.stack([S, S, S, S], dim=0)
T, T.shape

(tensor([[5., 2., 3., 4., 5., 6., 7.],
         [5., 2., 3., 4., 5., 6., 7.],
         [5., 2., 3., 4., 5., 6., 7.],
         [5., 2., 3., 4., 5., 6., 7.]]),
 torch.Size([4, 7]))
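
The `dim` argument controls where the new dimension is inserted. The sketch below contrasts `dim=0` with `dim=1`, and compares stacking with `torch.cat`, which joins tensors along an existing dimension instead of creating a new one. The names `rows`, `cols`, and `joined` are illustrative.

```python
import torch

S = torch.arange(1., 8.)   # shape [7]

rows = torch.stack([S, S, S, S], dim=0)   # new dimension first: [4, 7]
cols = torch.stack([S, S, S, S], dim=1)   # new dimension second: [7, 4]
assert rows.shape == (4, 7)
assert cols.shape == (7, 4)

# Stacking along dim=1 is the transpose of stacking along dim=0
assert torch.equal(cols, rows.T)

# torch.cat keeps the number of dimensions and lengthens an existing one
joined = torch.cat([S, S, S, S], dim=0)
assert joined.shape == (28,)
```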

### Squeezing and Unsqueezing

Squeezing removes dimensions of size 1 from a tensor.

In [95]:
Q = R.squeeze()
print(f"Original tensor: {R}")
print(f"Squeezed tensor: {Q}")
print(f"Shape of original tensor: {R.shape}")
print(f"Shape of squeezed tensor: {Q.shape}")

Original tensor: tensor([[1., 2., 3., 4., 5., 6., 7.]])
Squeezed tensor: tensor([1., 2., 3., 4., 5., 6., 7.])
Shape of original tensor: torch.Size([1, 7])
Shape of squeezed tensor: torch.Size([7])


The `unsqueeze()` method adds a dimension of size 1 to a tensor.

In [96]:
U = Q.unsqueeze(0)
print(f"Original tensor: {Q}")
print(f"Unsqueezed tensor: {U}")
print(f"Shape of original tensor: {Q.shape}")
print(f"Shape of unsqueezed tensor: {U.shape}")

Original tensor: tensor([1., 2., 3., 4., 5., 6., 7.])
Unsqueezed tensor: tensor([[1., 2., 3., 4., 5., 6., 7.]])
Shape of original tensor: torch.Size([7])
Shape of unsqueezed tensor: torch.Size([1, 7])
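
The argument of `unsqueeze()` is the position of the new dimension, and `squeeze()` accepts an optional `dim` to remove only one specific size-1 dimension. A short sketch follows. The names `as_row` and `as_col` are illustrative.

```python
import torch

Q = torch.arange(1., 8.)        # shape [7]

as_row = Q.unsqueeze(0)         # new dimension in front: [1, 7]
as_col = Q.unsqueeze(1)         # new dimension at the end: [7, 1]
assert as_row.shape == (1, 7)
assert as_col.shape == (7, 1)

# squeeze(dim) removes the given dimension only if it has size 1
assert as_col.squeeze(1).shape == (7,)
assert as_col.squeeze(0).shape == (7, 1)   # dim 0 has size 7, so nothing changes

# squeeze() undoes unsqueeze()
assert torch.equal(as_row.squeeze(), Q)
```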


### Permutation

Using the `permute()` method, we can change the order of the dimensions of a tensor.

The result is a view of the original tensor: it shares the same data, and only the order of the dimensions changes.

In [98]:
X = torch.rand(size=(224, 224, 3))
P = X.permute(2, 0, 1)

print(f"Original tensor shape: {X.shape}")
print(f"Permuted tensor shape: {P.shape}")

Original tensor shape: torch.Size([224, 224, 3])
Permuted tensor shape: torch.Size([3, 224, 224])
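
Because the permuted tensor is a view, element `P[c, h, w]` refers to the same memory as `X[h, w, c]`. The sketch below checks this on a small tensor. The shape `(4, 5, 3)` is chosen only to keep the example quick, and reading the axes as height, width, and channels is an assumption made for illustration.

```python
import torch

X = torch.rand(size=(4, 5, 3))   # assumed layout: height, width, channels
P = X.permute(2, 0, 1)           # channels first: [3, 4, 5]
assert P.shape == (3, 4, 5)

# Index order follows the permutation: P[c, h, w] is X[h, w, c]
assert P[2, 1, 3] == X[1, 3, 2]

# Writing to the original is visible through the permuted view
X[0, 0, 0] = 42.0
assert P[0, 0, 0] == 42.0

# Both tensors share the same underlying memory
assert P.data_ptr() == X.data_ptr()
```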
