# 2.3 Linear Algebra


In [1]:
import torch


## 2.3.1 Scalars

We represent a scalar as a tensor that contains exactly one element, i.e. a tensor with zero axes.


In [2]:
x = torch.tensor(3.0)
y = torch.tensor(2.0)

x + y, x * y, x / y, x ** y, x < y, x > y


(tensor(5.),
 tensor(6.),
 tensor(1.5000),
 tensor(9.),
 tensor(False),
 tensor(True))
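To make the "one element, zero axes" point concrete, the following minimal sketch inspects a scalar tensor. It only uses the `ndim`, `shape`, `numel()` and `item()` members of `torch.Tensor`; the variable name `value` is chosen here for illustration.

```python
import torch

x = torch.tensor(3.0)

# A scalar tensor has zero axes and an empty shape, but still holds one element.
print(x.ndim, x.shape, x.numel())

# .item() converts a single-element tensor to a plain Python number.
value = x.item()
print(value, type(value))
```

Converting with `item()` is handy when a scalar result has to be passed to ordinary Python code, such as logging or string formatting.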

## 2.3.2 Vectors

A vector is a fixed-length array of scalars.
We represent vectors as first-order tensors.

In [3]:
x = torch.arange(3)
x, len(x), x[0], x[2], x[:], x[-1]


(tensor([0, 1, 2]), 3, tensor(0), tensor(2), tensor([0, 1, 2]), tensor(2))

To indicate that a vector contains $n$ elements, we write $\mathbf{x} \in \mathbb{R}^{n}$.

To get the length of a vector, we use `len(x)` (above) or the `shape` attribute.


In [4]:
s = x.shape
s


torch.Size([3])

In [5]:
t1 = torch.empty(2, 2, 2)
t2 = torch.empty(2, 3, 2)

s1 = t1.shape
s2 = t2.shape

s1 + s2, s1 < s2, s1 > s2, s1 == s2, s1 is s2

(torch.Size([2, 2, 2, 2, 3, 2]), True, False, False, False)
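The results of `s1 + s2` and `s1 < s2` above can look surprising. They follow from `torch.Size` being a subclass of Python's `tuple`: `+` concatenates the two shapes and `<` compares them lexicographically, so neither is an element-wise tensor operation. A minimal sketch of that behaviour:

```python
import torch

s1 = torch.empty(2, 2, 2).shape
s2 = torch.empty(2, 3, 2).shape

# torch.Size subclasses tuple, so ordinary tuple semantics apply.
print(isinstance(s1, tuple))

# Lexicographic comparison: (2, 2, 2) < (2, 3, 2) because 2 < 3 at index 1.
print(tuple(s1) < tuple(s2))

# To compare sizes by element count, use numel() explicitly.
print(s1.numel(), s2.numel())
```

If the intent is "does this tensor hold fewer elements?", comparing `numel()` values is likely the safer choice than comparing the shapes directly.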

In [6]:
s1 = torch.Size([1, 2, 3, 1])



## 2.3.3 Matrices

A matrix is a second-order tensor. Matrices are denoted by bold capital letters, e.g. $\mathbf{X}$, $\mathbf{Y}$, etc.

The expression $\mathbf{A} \in \mathbb{R}^{m \times n}$ indicates that a matrix $\mathbf{A}$ contains $m \times n$ real-valued scalars, arranged as $m$ rows and $n$ columns.

Example of a matrix:
$$
\mathbf{A} =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} \\
\end{bmatrix}
$$

In [7]:
A = torch.arange(6).reshape(3, 2)
A


tensor([[0, 1],
        [2, 3],
        [4, 5]])

Transposing a matrix:
$\mathbf{B} = \mathbf{A}^\top$, where $b_{ij} = a_{ji}$


In [8]:
A.T


tensor([[0, 2, 4],
        [1, 3, 5]])

Symmetric matrices: $\mathbf{A} = \mathbf{A}^\top$


In [9]:
A = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
A

tensor([[1, 2, 3],
        [2, 0, 4],
        [3, 4, 5]])

In [10]:
A == A.T

tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])
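The element-wise comparison above returns a whole Boolean matrix. When a single yes/no answer is wanted, one option is `torch.equal`, or reducing the Boolean matrix with `.all()`. The sketch below assumes a second, non-symmetric matrix `N` that is introduced only for contrast and does not appear in the text.

```python
import torch

A = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
N = torch.arange(9).reshape(3, 3)  # hypothetical non-symmetric example

# torch.equal returns one Python bool: same shape and same elements.
is_sym_A = torch.equal(A, A.T)

# Equivalent check: reduce the element-wise comparison with all().
is_sym_N = bool((N == N.T).all())

print(is_sym_A, is_sym_N)
```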

## 2.3.4 Tensors

A tensor is an $n^\mathrm{th}$-order array.
We denote general tensors by capital letters with a special font face (e.g. $\mathcal{X}, \mathcal{Y}, \mathcal{Z}$) and index their elements in the same way as for matrices (e.g. $[\mathcal{X}]_{ijk}$).



In [11]:
torch.arange(24).reshape(2, 3, 4)

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])
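The indexing notation $[\mathcal{X}]_{ijk}$ maps directly onto tensor indexing in code, with the caveat that PyTorch indices are zero-based. A small sketch on the same `2 x 3 x 4` tensor (the names `elem` and `row` are illustrative):

```python
import torch

X = torch.arange(24).reshape(2, 3, 4)

# One index per axis selects a single element.
# With this layout the value at (i, j, k) is i*12 + j*4 + k.
elem = X[1, 2, 3]

# Fewer indices than axes select a sub-tensor: here a vector of length 4.
row = X[0, 1]

print(X.ndim, X.shape)
print(elem, row)
```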

## 2.3.5 Basic properties of tensor arithmetic

In [12]:
A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
B = A.clone()
A, A + B


(tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 tensor([[ 0.,  2.,  4.],
         [ 6.,  8., 10.]]))

In [13]:
A * B


tensor([[ 0.,  1.,  4.],
        [ 9., 16., 25.]])

In [14]:
a = 2
X = torch.arange(24).reshape(2, 3, 4)
a+X, (a*X).shape


(tensor([[[ 2,  3,  4,  5],
          [ 6,  7,  8,  9],
          [10, 11, 12, 13]],
 
         [[14, 15, 16, 17],
          [18, 19, 20, 21],
          [22, 23, 24, 25]]]),
 torch.Size([2, 3, 4]))

## 2.3.6 Reduction

Sum of vector elements: $\sum_{i=1}^n x_{i}$

Sum of matrix elements: $\sum_{i=1}^m\sum_{j=1}^n a_{ij}$


In [15]:
x = torch.arange(3, dtype=torch.float32)
x, x.sum()

(tensor([0., 1., 2.]), tensor(3.))

In [16]:
A, A.shape, A.sum(), A.sum().shape

(tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 torch.Size([2, 3]),
 tensor(15.),
 torch.Size([]))

In [17]:
A.sum(axis=0), A.sum(axis=0).shape

(tensor([3., 5., 7.]), torch.Size([3]))

In [18]:
A.sum(axis=1), A.sum(axis=1).shape

(tensor([ 3., 12.]), torch.Size([2]))

In [19]:
A.sum(axis=[0, 1]) == A.sum() # Same as A.sum()

tensor(True)

In [20]:
A.mean(), A.sum() / A.numel()

(tensor(2.5000), tensor(2.5000))

In [21]:
A.mean(axis=0), A.sum(axis=0) / A.shape[0]


(tensor([1.5000, 2.5000, 3.5000]), tensor([1.5000, 2.5000, 3.5000]))

## 2.3.7 Non-Reduction Sum




In [22]:
A


tensor([[0., 1., 2.],
        [3., 4., 5.]])

In [23]:
A.sum(axis=1)

tensor([ 3., 12.])

In [24]:
A.sum(axis=1, keepdims=True)


tensor([[ 3.],
        [12.]])

In [25]:
"""
Why is keepdims useful?
It preserves the number of axes of the original tensor, so the result can be broadcast against it in later steps, e.g. to divide each row of the matrix by its row sum.
"""
A / A.sum(axis=1, keepdims=True)


tensor([[0.0000, 0.3333, 0.6667],
        [0.2500, 0.3333, 0.4167]])
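A quick way to check the division above is to confirm that every row of the result sums to 1. The sketch below uses PyTorch's canonical argument names `dim` and `keepdim`, which should behave the same as the `axis` and `keepdims` spellings used in the cells above; `row_sums` and `normalized` are names chosen for this example.

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# keepdim=True keeps the reduced axis with length 1: shape (2, 1).
row_sums = A.sum(dim=1, keepdim=True)

# (2, 3) / (2, 1) broadcasts: each row is divided by its own sum.
normalized = A / row_sums

print(row_sums.shape)
print(normalized.sum(dim=1))
```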

In [26]:
"""
Cumulative sum
"""
A.cumsum(axis=0), A.cumsum(axis=1)


(tensor([[0., 1., 2.],
         [3., 5., 7.]]),
 tensor([[ 0.,  1.,  3.],
         [ 3.,  7., 12.]]))

## 2.3.8 Dot products

Given two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$, their *dot product* $\mathbf{x}^\top \mathbf{y}$ (or $\langle \mathbf{x}, \mathbf{y} \rangle$) is a sum over the products of the elements at the same position:
$\mathbf{x}^\top \mathbf{y} = \sum_{i=1}^{d} x_i y_i$.


In [27]:
"""
Example
"""
x = torch.tensor([1,2,3])
y = torch.tensor([4,5,6])

"""
Result will be: 1*4 + 2*5 + 3*6 = 32
"""
torch.dot(x,y)



tensor(32)

In [28]:
"""
More examples with numpy
"""
import numpy as np
np.dot([1,2,3],[1,2,3]), \
    np.dot([3,2,1],[1,2,7])


(14, 14)

In [29]:
"""
And how to do this the hard way, without using the built-in dot() function
"""
# Step 1 - pairwise multiplication
# Step 2 - summation
s1 = x * y
s1, s1.sum()


(tensor([ 4, 10, 18]), tensor(32))

## 2.3.9 Matrix-vector products

Now that we know how to calculate dot products, we can begin to
understand the *product* between an $m \times n$ matrix
$\mathbf{A}$ and an $n$-dimensional vector
$\mathbf{x}$. To start off, we visualize our matrix in terms of its row vectors

$$
\mathbf{A} =
\begin{bmatrix}
    \mathbf{a}^\top_{1} \\
    \mathbf{a}^\top_{2} \\
    \vdots \\
    \mathbf{a}^\top_{m} \\
\end{bmatrix}
$$

where each $\mathbf{a}^\top_{i} \in \mathbb{R}^n$ is a row vector representing the $i^\mathrm{th}$ row of the matrix $\mathbf{A}$.

The matrix-vector product $\mathbf{A}\mathbf{x}$ is simply a column vector of length $m$, whose $i^\mathrm{th}$ element is the dot product $\mathbf{a}^\top_i \mathbf{x}$:

$$
\mathbf{A}\mathbf{x} =
\begin{bmatrix}
    \mathbf{a}^\top_{1} \\
    \mathbf{a}^\top_{2} \\
    \vdots \\
    \mathbf{a}^\top_{m} \\
\end{bmatrix}
\mathbf{x} =
\begin{bmatrix}
    \mathbf{a}^\top_{1} \mathbf{x} \\
    \mathbf{a}^\top_{2} \mathbf{x} \\
    \vdots \\
    \mathbf{a}^\top_{m} \mathbf{x} \\
\end{bmatrix}.
$$

We can think of multiplication with a matrix $\mathbf{A}\in \mathbb{R}^{m \times n}$ as a transformation that
projects vectors from $\mathbb{R}^{n}$ to $\mathbb{R}^{m}$. These transformations are remarkably useful. For example, we can
represent rotations as multiplications by certain square matrices. Matrix-vector products also describe the key calculation involved in
computing the outputs of each layer in a neural network given the outputs from the previous layer.
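As a small illustration of the rotation remark, the sketch below builds the standard 2D rotation matrix and applies it to a vector. The angle `theta` and the vector `v` are arbitrary choices for this example, not values from the text.

```python
import math
import torch

theta = math.pi / 2  # rotate by 90 degrees counter-clockwise

# Standard 2x2 rotation matrix for angle theta.
R = torch.tensor([[math.cos(theta), -math.sin(theta)],
                  [math.sin(theta),  math.cos(theta)]])

v = torch.tensor([1.0, 0.0])

# The matrix-vector product maps the unit x-vector onto (approximately) the unit y-vector.
rotated = R @ v
print(rotated)

# A rotation leaves the Euclidean length unchanged.
print(torch.norm(v), torch.norm(rotated))
```

The result is only approximately `[0, 1]` because `cos(pi / 2)` is not exactly zero in floating point.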


In [30]:
x = x.to(torch.float32)
A, x, A.shape, x.shape


(tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 tensor([1., 2., 3.]),
 torch.Size([2, 3]),
 torch.Size([3]))

In [31]:
"""
Two equivalent ways to compute the matrix-vector product
"""
torch.mv(A, x), A@x


(tensor([ 8., 26.]), tensor([ 8., 26.]))
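To connect the code back to the definition, the matrix-vector product can also be assembled row by row from dot products. This is a sketch for understanding only; `torch.mv` or `@` is the practical choice. The name `by_rows` is illustrative.

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
x = torch.tensor([1.0, 2.0, 3.0])

# Iterating over a matrix yields its rows; each entry of Ax is a row-vector dot product.
by_rows = torch.stack([torch.dot(row, x) for row in A])

print(by_rows)
print(torch.mv(A, x))
```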

## 2.3.10 Matrix-Matrix multiplication

Assume we have 2 matrices: $\mathbf{A} \in \mathbb{R}^{m \times k}$ and $\mathbf{B} \in \mathbb{R}^{k \times n}$:

$$
\mathbf{A} =
\begin{bmatrix}
    a_{11} & a_{12} & \cdots & a_{1k} \\
    a_{21} & a_{22} & \cdots & a_{2k} \\
    \vdots & \vdots & \ddots & \vdots \\
    a_{m1} & a_{m2} & \cdots & a_{mk} \\
\end{bmatrix},
\quad
\mathbf{B} =
\begin{bmatrix}
    b_{11} & b_{12} & \cdots & b_{1n} \\
    b_{21} & b_{22} & \cdots & b_{2n} \\
    \vdots & \vdots & \ddots & \vdots \\
    b_{k1} & b_{k2} & \cdots & b_{kn} \\
\end{bmatrix}.
$$


Let $\mathbf{a}^\top_{i} \in \mathbb{R}^k$ denote the row vector representing the $i^\mathrm{th}$ row of the matrix $\mathbf{A}$ and let $\mathbf{b}_{j} \in \mathbb{R}^k$ denote the column vector from the $j^\mathrm{th}$ column of the matrix $\mathbf{B}$:

$$
\mathbf{A} =
\begin{bmatrix}
    \mathbf{a}^\top_{1} \\
    \mathbf{a}^\top_{2} \\
    \vdots \\
    \mathbf{a}^\top_{m} \\
\end{bmatrix},
\quad
\mathbf{B} =
\begin{bmatrix}
    \mathbf{b}_{1} & \mathbf{b}_{2} & \cdots & \mathbf{b}_{n} \\
\end{bmatrix}.
$$


To form the matrix product $\mathbf{C} \in \mathbb{R}^{m \times n}$, we compute each element $c_{ij}$ as the dot product between the $i^\mathrm{th}$ row of $\mathbf{A}$ and the $j^\mathrm{th}$ column of $\mathbf{B}$, i.e. $\mathbf{a}^\top_{i}\mathbf{b}_{j}$:

$$
\mathbf{C} = \mathbf{AB} =
\begin{bmatrix}
    \mathbf{a}^\top_{1} \\
    \mathbf{a}^\top_{2} \\
    \vdots \\
    \mathbf{a}^\top_{m} \\
\end{bmatrix}
\begin{bmatrix}
    \mathbf{b}_{1} & \mathbf{b}_{2} & \cdots & \mathbf{b}_{n} \\
\end{bmatrix}
=
\begin{bmatrix}
    \mathbf{a}^\top_{1} \mathbf{b}_1 & \mathbf{a}^\top_{1} \mathbf{b}_2 & \cdots & \mathbf{a}^\top_{1} \mathbf{b}_n \\
    \mathbf{a}^\top_{2} \mathbf{b}_1 & \mathbf{a}^\top_{2} \mathbf{b}_2 & \cdots & \mathbf{a}^\top_{2} \mathbf{b}_n \\
    \vdots & \vdots & \ddots & \vdots \\
    \mathbf{a}^\top_{m} \mathbf{b}_1 & \mathbf{a}^\top_{m} \mathbf{b}_2 & \cdots & \mathbf{a}^\top_{m} \mathbf{b}_n \\
\end{bmatrix}
$$


In [32]:
B = torch.ones(3, 4)
A, B


(tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 tensor([[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]))

In [33]:
torch.mm(A, B), A@B


(tensor([[ 3.,  3.,  3.,  3.],
         [12., 12., 12., 12.]]),
 tensor([[ 3.,  3.,  3.,  3.],
         [12., 12., 12., 12.]]))
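The element-wise definition $c_{ij} = \mathbf{a}^\top_{i}\mathbf{b}_{j}$ can be checked directly with two nested loops. This is a deliberately slow sketch meant to mirror the formula, not a replacement for `torch.mm`; the loop variables and the result name `C` are chosen for this example.

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # m x k = 2 x 3
B = torch.ones(3, 4)                                    # k x n = 3 x 4

m, n = A.shape[0], B.shape[1]
C = torch.empty(m, n)

# c_ij is the dot product of row i of A with column j of B.
for i in range(m):
    for j in range(n):
        C[i, j] = torch.dot(A[i], B[:, j])

print(C)
print(torch.allclose(C, A @ B))
```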

## 2.3.11 Norms
http://d2l.ai/chapter_preliminaries/linear-algebra.html#norms

- A norm is a function $\| \cdot \|$ that maps a vector to a scalar and tells us how big the vector is.
- The $\ell_2$ norm measures the (Euclidean) length of a vector.


$\ell_2 = \|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^n x_i^2}$



In [34]:
"""
Calculate the L2 norm: sqrt(3^2 + (-4)^2) = sqrt(9 + 16) = 5
"""
u = torch.tensor([3.0, -4.0])
torch.norm(u)


tensor(5.)

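The $\ell_1$ norm (Manhattan distance) sums the absolute values of the elements. It is simple to compute and less sensitive to outliers than the $\ell_2$ norm.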

$\ell_1 = \|\mathbf{x}\|_1 = \sum_{i=1}^n \left|x_i \right|$


In [35]:
"""
Manhattan distance (L1 norm)

L1 = |3| + |-4| = 7
"""
u, torch.abs(u).sum()


(tensor([ 3., -4.]), tensor(7.))
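Both vector norms can be computed either from their formulas or through `torch.linalg.norm`, which selects the norm via its `ord` argument. The sketch below compares the two routes on the same vector `u`; the `*_manual` and `*_linalg` names are illustrative.

```python
import torch

u = torch.tensor([3.0, -4.0])

# l2 norm from the formula: square root of the sum of squares.
l2_manual = torch.sqrt((u ** 2).sum())
# l1 norm from the formula: sum of absolute values.
l1_manual = u.abs().sum()

# torch.linalg.norm takes the order of the norm via `ord`.
l2_linalg = torch.linalg.norm(u, ord=2)
l1_linalg = torch.linalg.norm(u, ord=1)

print(l2_manual, l2_linalg)
print(l1_manual, l1_linalg)
```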

In [36]:
"""
Frobenius norm for matrices
"""
M = torch.ones((4,9))
M, f"Torch: {torch.norm(M)}", f"Numpy: {np.linalg.norm(M)}",


(tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1.]]),
 'Torch: 6.0',
 'Numpy: 6.0')
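The Frobenius norm treats the matrix like one long vector: $\|\mathbf{X}\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n x_{ij}^2}$. For the $4 \times 9$ matrix of ones this gives $\sqrt{36} = 6$, matching the output above. A minimal sketch that computes it from the formula and, as an alternative, with `torch.linalg.norm` and `ord="fro"`:

```python
import torch

M = torch.ones((4, 9))

# Frobenius norm from the formula: sqrt of the sum of all squared entries.
fro_manual = torch.sqrt((M ** 2).sum())

# The same norm requested explicitly by name.
fro_linalg = torch.linalg.norm(M, ord="fro")

print(fro_manual, fro_linalg)
```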

In [37]:
t = torch.tensor([2,3,4])
len(t)

3

In [38]:
"""
For a tensor X of arbitrary shape, does len(X) always correspond to the length of a certain axis of X? What is that axis?
"""
len(torch.ones(2,3,4)), len(torch.ones(1)), len(torch.ones(4,4,4)),len(torch.ones(4,2,3)),

"""
It always corresponds to the length of the first (i.e. zeroth) axis. For a matrix, that is the number of rows.
"""



'\nIt always corresponds to the length of the first (i.e. zeroth) axis. For a matrix, that is the number of rows.\n'

In [39]:
"""
Run A / A.sum(axis=1) and see what happens. Can you analyze the reason?
"""
A, A.sum(), A.sum(axis=1)


(tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 tensor(15.),
 tensor([ 3., 12.]))
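The cell above prints the operands but does not run the division itself. A sketch of what presumably happens: `A` has shape `(2, 3)` and `A.sum(dim=1)` has shape `(2,)`, and broadcasting aligns shapes from the last axis, so 3 is compared with 2 and the operation fails. Keeping the reduced axis makes the shapes compatible. The names `failed` and `fixed` are introduced here for illustration.

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)

try:
    # Shapes (2, 3) and (2,) do not broadcast: trailing sizes 3 and 2 differ.
    A / A.sum(dim=1)
    failed = False
except RuntimeError as err:
    failed = True
    print(err)

# With keepdim=True the divisor has shape (2, 1), which broadcasts over columns.
fixed = A / A.sum(dim=1, keepdim=True)
print(fixed)
```

Note that for a square matrix the unfixed division would run without an error but divide column-wise rather than row-wise, which is arguably the more dangerous case.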