# 2.3 Linear Algebra

In [161]:
import torch

## 2.3.1 Scalars
The expression $x \in \mathbb{R}$ is a formal way to say that $x$ is a real-valued scalar.

$x,y \in \{0, 1\}$ indicates that $x$ and $y$ are variables that can only take values $0$ or $1$.

Scalars are implemented as tensors that contain only one element.

In [162]:
x = torch.tensor(3.0)
y = torch.tensor(2.0)

x + y, x * y, x / y, x**y

(tensor(5.), tensor(6.), tensor(1.5000), tensor(9.))

## 2.3.2 Vectors
Vectors are implemented as 1st-order tensors. In general, such tensors can have arbitrary lengths. 

In [163]:
x = torch.arange(3)
x, len(x), x.shape

(tensor([0, 1, 2]), 3, torch.Size([3]))

$x_2$ denotes the second element of $\vec{x}$. It is a scalar.
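One point that often trips readers up is that math notation indexes from 1 while Python and PyTorch index from 0. The sketch below reuses the vector `torch.arange(3)` from the cell above and shows that $x_2$ corresponds to `x[1]`, and that indexing a single element yields a 0th-order tensor, i.e., a scalar.

```python
import torch

x = torch.arange(3)  # tensor([0, 1, 2])

# Math notation is 1-based: x_2 is the second element.
# PyTorch indexing is 0-based, so x_2 corresponds to x[1].
x_2 = x[1]
print(x_2)        # tensor(1)
print(x_2.ndim)   # 0: a 0th-order tensor, i.e. a scalar
```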

To indicate that a vector contains $n$ elements, we write $\vec{x} \in \mathbb{R}^n$. Formally, we call $n$ the dimensionality of the vector.

We use **order** to refer to the number of axes, and **dimensionality** exclusively to refer to the number of components.
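In PyTorch, the order of a tensor corresponds to its `ndim` attribute (PyTorch itself calls axes "dimensions", which is exactly the ambiguity this convention avoids), while the dimensionality of a vector is its length. The tensors `v` and `M` below are hypothetical examples chosen for illustration.

```python
import torch

v = torch.arange(5)    # a vector with 5 components
M = torch.zeros(2, 3)  # a 2x3 matrix

# Order = number of axes (ndim); dimensionality = number of components.
print(v.ndim, len(v))   # 1 5 -> order 1, dimensionality 5
print(M.ndim, M.shape)  # 2 torch.Size([2, 3]) -> order 2
```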

## 2.3.3 Matrices
Just as scalars are 0th-order tensors and vectors are 1st-order tensors, matrices are 2nd-order tensors.

The expression $\mathbf{A} \in \mathbb{R}^{m \times n}$ indicates that matrix $\mathbf{A}$ contains $m \times n$ real-valued scalars, arranged as $m$ rows and $n$ columns.

In [164]:
A = torch.arange(6).reshape(3, 2)
A

tensor([[0, 1],
        [2, 3],
        [4, 5]])

To flip the axes of a matrix, we transpose it. Each entry of the transpose $\mathbf{B} = \mathbf{A}^\top$ is given by

$$ b_{ij} = a_{ji}. $$

If a square matrix is equal to its transpose, we call it **symmetric**.

In [165]:
A.T

tensor([[0, 2, 4],
        [1, 3, 5]])
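Symmetry can be checked by comparing a matrix with its transpose. The matrix `S` below is a hypothetical example constructed so that $s_{ij} = s_{ji}$; the non-square `A` from above cannot be symmetric, because its transpose does not even have the same shape.

```python
import torch

# Hypothetical 3x3 matrix chosen to be symmetric (s_ij == s_ji).
S = torch.tensor([[1, 2, 3],
                  [2, 0, 4],
                  [3, 4, 5]])
print(S == S.T)             # elementwise comparison: all True
print(torch.equal(S, S.T))  # True: S equals its transpose

A = torch.arange(6).reshape(3, 2)
# A non-square matrix cannot be symmetric: the shapes already differ.
print(A.shape, A.T.shape)   # torch.Size([3, 2]) torch.Size([2, 3])
```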

## 2.3.4 Tensors
Tensors generalize scalars, vectors, and matrices to $n$th-order arrays, that is, arrays with an arbitrary number of axes.

In [166]:
torch.arange(24).reshape(2, 3, 4)

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])
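Indexing a higher-order tensor works the same way as for vectors and matrices, just with one index per axis. This sketch uses the same $2 \times 3 \times 4$ tensor as above; fixing only the first index leaves a 2nd-order tensor.

```python
import torch

X = torch.arange(24).reshape(2, 3, 4)

print(X.ndim)      # 3: a 3rd-order tensor has three axes
print(X.shape)     # torch.Size([2, 3, 4])
# One index per axis selects a single element (0-based indexing).
print(X[1, 2, 3])  # tensor(23): the last position along every axis
# Fixing only the first index yields a 3x4 matrix.
print(X[0].shape)  # torch.Size([3, 4])
```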

## 2.3.5 Tensor Arithmetic

In [167]:
A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
B = A.clone()  # Assign a copy of A to B by allocating new memory
A, A + B

(tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 tensor([[ 0.,  2.,  4.],
         [ 6.,  8., 10.]]))

In [168]:
A * B

tensor([[ 0.,  1.,  4.],
        [ 9., 16., 25.]])

In [169]:
a = 2
X = torch.arange(24).reshape(2, 3, 4)
a + X, (a * X).shape

(tensor([[[ 2,  3,  4,  5],
          [ 6,  7,  8,  9],
          [10, 11, 12, 13]],
 
         [[14, 15, 16, 17],
          [18, 19, 20, 21],
          [22, 23, 24, 25]]]),
 torch.Size([2, 3, 4]))

## 2.3.6 Reduction
By default, calling the sum function reduces a tensor along all of its axes, producing a scalar.

In [170]:
x = torch.arange(3, dtype=torch.float32)
x, x.sum()

(tensor([0., 1., 2.]), tensor(3.))

In [171]:
A.shape, A.sum()

(torch.Size([2, 3]), tensor(15.))

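To sum the elements down each column, i.e., to reduce along the rows (axis 0), we specify axis=0 in sum. Because the input matrix is reduced along axis 0 to produce the output vector, this axis is missing from the shape of the output. Likewise, specifying axis=1 sums across each row and removes axis 1.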

In [172]:
A.shape, A.sum(axis=0).shape

(torch.Size([2, 3]), torch.Size([3]))

In [173]:
A.shape, A.sum(axis=1).shape

(torch.Size([2, 3]), torch.Size([2]))

In [174]:
A.sum(axis=[0, 1]) == A.sum()  # Same as A.sum()

tensor(True)

In [175]:
A.mean() == A.sum() / A.numel()

tensor(True)
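Like sum, mean can also reduce along a single axis; the mean along axis 0 equals the column sums divided by the number of rows. A minimal check with the same $2 \times 3$ matrix:

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# Mean along axis 0: average each column over the 2 rows.
col_mean = A.mean(axis=0)
print(col_mean)  # tensor([1.5000, 2.5000, 3.5000])
print(torch.allclose(col_mean, A.sum(axis=0) / A.shape[0]))  # True
```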

## 2.3.7 Non-Reduction Sum

In [176]:
sum_A = A.sum(axis=1, keepdims=True)
sum_A, sum_A.shape

(tensor([[ 3.],
         [12.]]),
 torch.Size([2, 1]))

In [177]:
A / sum_A

tensor([[0.0000, 0.3333, 0.6667],
        [0.2500, 0.3333, 0.4167]])
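The division above works because keepdims=True keeps the reduced axis with length 1, so sum_A has shape (2, 1) and broadcasts across the columns of A: each row is divided by its own row sum, and each row of the result therefore sums to 1. Without keepdims the result has shape (2,), which broadcasting would align with the last axis of A (length 3) and reject. A related operation that keeps the input's shape is the cumulative sum, sketched here with cumsum:

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
sum_A = A.sum(axis=1, keepdims=True)  # shape (2, 1), broadcasts over columns

# Each row of A / sum_A is divided by its own row sum, so rows sum to 1.
print(torch.allclose((A / sum_A).sum(axis=1), torch.ones(2)))  # True

# Without keepdims the reduced axis disappears.
print(A.sum(axis=1).shape)  # torch.Size([2])

# Cumulative sum along axis 0: a running total that keeps A's shape.
print(A.cumsum(axis=0))
# tensor([[0., 1., 2.],
#         [3., 5., 7.]])
```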

## 2.3.8 Dot Products
Given two vectors $\vec{x}, \vec{y} \in \mathbb{R}^d$, their dot product $\vec{x}^\top\vec{y}$ (also known as the inner product) is a sum over the products of the elements at the same position:

$$ \vec{x}^\top\vec{y} = \sum_{i=1}^{d} x_i y_i. $$

In [178]:
y = torch.ones(3, dtype=torch.float32)
print(x)
print(y)
print(torch.dot(x, y))

tensor([0., 1., 2.])
tensor([1., 1., 1.])
tensor(3.)


In [179]:
# Equivalently:
torch.sum(x * y)

tensor(3.)

## 2.3.9 Matrix-Vector Products
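We can think of multiplication with a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ as a transformation that projects vectors from $\mathbb{R}^n$ to $\mathbb{R}^m$. For $\vec{x} \in \mathbb{R}^n$, the $i$th element of $\mathbf{A}\vec{x}$ is the dot product of the $i$th row of $\mathbf{A}$ with $\vec{x}$.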

Use cases:
- Representing rotations as multiplications by certain square matrices.
- Describing the key calculation involved in computing the outputs of each layer in a neural network.
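To make the rotation use case concrete, here is a minimal sketch using the standard $2 \times 2$ counterclockwise rotation matrix. The helper `rotation_matrix` is a hypothetical name introduced for illustration, not a PyTorch function. Rotating the unit vector $(1, 0)$ by 90 degrees should give $(0, 1)$ up to floating-point error, and rotating any vector should leave its length unchanged.

```python
import math
import torch

def rotation_matrix(theta: float) -> torch.Tensor:
    """Hypothetical helper: rotates vectors in R^2 counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return torch.tensor([[c, -s],
                         [s,  c]])

R = rotation_matrix(math.pi / 2)  # 90-degree rotation
e1 = torch.tensor([1.0, 0.0])
rotated = R @ e1                  # matrix-vector product
print(torch.allclose(rotated, torch.tensor([0.0, 1.0]), atol=1e-6))  # True

# Rotations preserve length, checked here with the Euclidean norm.
v = torch.tensor([3.0, -4.0])
print(torch.norm(R @ v))          # approximately 5
```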

In [180]:
print(A.shape)
print(x.shape)
print(torch.mv(A, x))
print(A @ x)

torch.Size([2, 3])
torch.Size([3])
tensor([ 5., 14.])
tensor([ 5., 14.])
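The row-by-row view of the matrix-vector product can be checked directly: computing one dot product per row of A and stacking the results reproduces A @ x. This sketch rebuilds the same A and x used above.

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
x = torch.arange(3, dtype=torch.float32)

# Entry i of A @ x is the dot product of row i of A with x.
row_dots = torch.stack([torch.dot(A[i], x) for i in range(A.shape[0])])
print(row_dots)                      # tensor([ 5., 14.])
print(torch.equal(row_dots, A @ x))  # True
```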


## 2.3.10 Matrix Multiplication

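Given $\mathbf{A} \in \mathbb{R}^{n \times k}$ and $\mathbf{B} \in \mathbb{R}^{k \times m}$, we can think of the matrix-matrix product $\mathbf{A}\mathbf{B}$ as performing $m$ matrix-vector products (one per column of $\mathbf{B}$), or $n \times m$ dot products (one per pair of a row of $\mathbf{A}$ and a column of $\mathbf{B}$), and stitching the results together to form an $n \times m$ matrix.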

Here, **A** is a $2 \times 3$ matrix and **B** is a $3 \times 4$ matrix.

After multiplication, we obtain a $2 \times 4$ matrix.

In [181]:
print(A, "\n")

B = torch.ones(3, 4)
print(B, "\n")

print(torch.mm(A, B), "\n")
print(A @ B)

tensor([[0., 1., 2.],
        [3., 4., 5.]]) 

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]) 

tensor([[ 3.,  3.,  3.,  3.],
        [12., 12., 12., 12.]]) 

tensor([[ 3.,  3.,  3.,  3.],
        [12., 12., 12., 12.]])
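The "stitching" view can be spelled out: column $j$ of $\mathbf{A}\mathbf{B}$ is the matrix-vector product of $\mathbf{A}$ with column $j$ of $\mathbf{B}$. A minimal sketch with the same $2 \times 3$ and $3 \times 4$ matrices:

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # 2 x 3
B = torch.ones(3, 4)                                     # 3 x 4

# Column j of A @ B is A times column j of B.
cols = [A @ B[:, j] for j in range(B.shape[1])]
AB = torch.stack(cols, dim=1)  # place the 4 columns side by side
print(AB.shape)                # torch.Size([2, 4])
print(torch.equal(AB, A @ B))  # True
```

Looping in Python like this is only for illustration; in practice, a single A @ B call is both clearer and much faster.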


## 2.3.11 Norms
The norm of a vector tells us how big it is. For instance, the $l_2$ norm measures the Euclidean length of a vector.

A norm is a function that maps a vector to a scalar and satisfies the following three properties:

1. Given any vector $\vec{x}$, if we scale the vector by a scalar $\alpha \in \mathbb{R}$, its norm scales by the absolute value of that scalar:

$$ \| \alpha \textbf{x} \| = |\alpha| \, \| \textbf{x} \| $$

2. For any vectors $\vec{x}$ and $\vec{y}$, norms satisfy the triangle inequality:

$$ \| \textbf{x} + \textbf{y} \| \leq \| \textbf{x} \| + \| \textbf{y} \| $$

3. The norm of a vector is nonnegative and only vanishes if the vector is zero.

$$ \| \textbf{x} \| > 0 \textrm{ for all } \textbf{x} \neq 0. $$

The $l_2$ norm is the square root of the sum of squares of a vector's elements:

$$ \| \textbf{x} \|_2 = \sqrt{\sum_{i=1}^n x_i^2} $$

In [182]:
u = torch.tensor([3.0, -4.0])
torch.norm(u)

tensor(5.)
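As a sanity check, the $l_2$ norm can be computed straight from its definition and compared with torch.norm, here for the same vector $(3, -4)$:

```python
import torch

u = torch.tensor([3.0, -4.0])

# l2 norm from its definition: square root of the sum of squared elements.
l2_manual = torch.sqrt((u ** 2).sum())
print(l2_manual)                                 # tensor(5.)
print(torch.allclose(l2_manual, torch.norm(u)))  # True
```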

The $l_1$ norm, which underlies the Manhattan distance, sums the absolute values of a vector's elements:

$$ \| \textbf{x} \|_1 = \sum_{i=1}^n |x_i| $$

Compared to the $l_2$ norm, it is less sensitive to outliers.

In [183]:
torch.abs(u).sum()

tensor(7.)
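Instead of composing abs and sum by hand, recent PyTorch versions also provide torch.linalg.vector_norm, whose ord argument selects the norm. This sketch assumes a PyTorch release that ships the torch.linalg module.

```python
import torch

u = torch.tensor([3.0, -4.0])

# ord=1 gives the l1 norm; the default ord=2 gives the l2 norm.
print(torch.linalg.vector_norm(u, ord=1))  # tensor(7.)
print(torch.linalg.vector_norm(u))         # tensor(5.)
```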

## 2.3.12 Discussion
- Elementwise products are called Hadamard products.
- By contrast, dot products, matrix–vector products, and matrix–matrix products are not elementwise: each output entry is a sum of products over a whole row, column, or vector.
- Compared to Hadamard products, matrix–matrix products take considerably longer to compute: for two $n \times n$ matrices, the straightforward algorithm takes cubic rather than quadratic time.
- Norms capture various notions of the magnitude of a vector (or matrix), and are commonly applied to the difference of two vectors to measure their distance apart.
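The last point can be sketched in a few lines. The vectors `p` and `q` are hypothetical examples; their difference is $(-3, -4)$, so the Euclidean and Manhattan distances are 5 and 7. For matrices, torch.norm defaults to the Frobenius norm, one common matrix norm, which is the square root of the sum of squares of all entries.

```python
import torch

# Hypothetical vectors chosen so the distances are easy to verify by hand.
p = torch.tensor([1.0, 2.0])
q = torch.tensor([4.0, 6.0])

# Distance between two vectors = norm of their difference.
print(torch.norm(p - q))                       # tensor(5.): Euclidean (l2) distance
print(torch.linalg.vector_norm(p - q, ord=1))  # tensor(7.): Manhattan (l1) distance

# For a matrix, torch.norm defaults to the Frobenius norm.
M = torch.ones(4, 9)
print(torch.norm(M))                           # tensor(6.), since sqrt(36) = 6
```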