# Linear Algebra
Below, we introduce the basic mathematical objects, arithmetic,
and operations in linear algebra.

## Scalars
A scalar is represented by a tensor with just one element.

In [1]:
import torch

x = torch.tensor(3.0)
y = torch.tensor(2.0)

x + y, x * y, x / y, x**y

(tensor(5.), tensor(6.), tensor(1.5000), tensor(9.))
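
As a small sketch that goes slightly beyond the cell above, we can check that a scalar tensor has no axes at all, and use `item()` to convert it into a plain Python number. The variable name `value` is only for illustration.

```python
import torch

x = torch.tensor(3.0)

# A scalar tensor has zero axes, so its shape is empty
print(x.ndim, x.shape)

# `item()` extracts the single element as a Python float
value = x.item()
print(type(value), value)
```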

## Vectors

[**You can think of a vector as simply a list of scalar values.**]

In [2]:
x = torch.arange(4)
x

tensor([0, 1, 2, 3])

We can refer to any element of a vector by using a subscript.
In code,
we (**access any element by indexing into the tensor.**)


In [3]:
x[3]

tensor(3)
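
Indexing follows the usual Python conventions. The sketch below assumes the same vector `x` as above and shows a negative index and a slice; the names `last` and `middle` are illustrative.

```python
import torch

x = torch.arange(4)

# Negative indices count from the end of the vector
last = x[-1]

# A slice returns a sub-vector (here, the elements at positions 1 and 2)
middle = x[1:3]

print(last, middle)
```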

### Length, Dimensionality, and Shape

In [4]:
# Length of a tensor
len(x)

4

The shape is a tuple that lists the length (dimensionality)
along each axis of the tensor.

In [5]:
x.shape

torch.Size([4])
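
For a vector, the shape has a single entry, and that entry agrees with `len`. The following sketch also prints `ndim` (the number of axes) and `numel()` (the total number of elements), two attributes not used in the text above.

```python
import torch

x = torch.arange(4)

# The shape of a vector has exactly one entry, equal to its length
print(x.shape[0] == len(x))

# Number of axes and total number of elements
print(x.ndim, x.numel())
```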

## Matrices

Just as vectors generalize scalars from order zero to order one,
matrices generalize vectors from order one to order two.
Matrices, which we will typically denote with bold-faced, capital letters
(e.g., $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$),
are represented in code as tensors with two axes.

In [6]:
A = torch.arange(20).reshape(5, 4)
A

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]])

In [7]:
# Matrix transpose
A.T

tensor([[ 0,  4,  8, 12, 16],
        [ 1,  5,  9, 13, 17],
        [ 2,  6, 10, 14, 18],
        [ 3,  7, 11, 15, 19]])

As a special type of square matrix,
[**a *symmetric matrix* $\mathbf{A}$ is equal to its transpose:
$\mathbf{A} = \mathbf{A}^\top$.**]
Here we define a symmetric matrix `B`.


In [8]:
B = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
B

tensor([[1, 2, 3],
        [2, 0, 4],
        [3, 4, 5]])

Now we compare `B` with its transpose.


In [9]:
B == B.T

tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])
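
The elementwise comparison returns one Boolean per entry. If we only want a single yes-or-no answer, one option is `torch.equal`, which returns `True` only when both tensors have the same shape and the same elements. The sketch below is one way to do this; the name `is_symmetric` is illustrative.

```python
import torch

B = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])

# A single Boolean: do B and its transpose match everywhere?
is_symmetric = torch.equal(B, B.T)
print(is_symmetric)

# A non-square matrix can never equal its transpose, because the shapes differ
A = torch.arange(20).reshape(5, 4)
print(torch.equal(A, A.T))
```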

## Tensors

Just as vectors generalize scalars, and matrices generalize vectors, we can build data structures with even more axes.
[**Tensors**]
(**give us a generic way of describing $n$-dimensional arrays with an arbitrary number of axes.**)

In [10]:
X = torch.arange(24).reshape(2, 3, 4)
X

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])
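
To access a single element of a tensor with three axes, we supply one index per axis. The short sketch below assumes the same `X` as above; `entry` and `block` are illustrative names.

```python
import torch

X = torch.arange(24).reshape(2, 3, 4)

# Three axes, with lengths 2, 3, and 4
print(X.ndim, X.shape)

# One index per axis selects a single element (here, the very last one)
entry = X[1, 2, 3]

# Indexing only the first axis returns a matrix of shape (3, 4)
block = X[0]
print(entry, block.shape)
```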

## Basic Properties of Tensor Arithmetic

In [11]:
A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = A.clone()  # Assign a copy of `A` to `B` by allocating new memory
A, A + B

(tensor([[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.],
         [12., 13., 14., 15.],
         [16., 17., 18., 19.]]),
 tensor([[ 0.,  2.,  4.,  6.],
         [ 8., 10., 12., 14.],
         [16., 18., 20., 22.],
         [24., 26., 28., 30.],
         [32., 34., 36., 38.]]))
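
The call to `clone` matters because plain assignment only creates a second name for the same memory. The sketch below contrasts the two; the alias `C` is introduced purely for illustration.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = A.clone()  # Independent copy with its own memory
C = A          # Alias: `C` and `A` refer to the same memory

B[0, 0] = 100.0  # Does not affect `A`
C[0, 0] = -1.0   # Also changes `A`

print(A[0, 0], B[0, 0])
```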

In [12]:
# Elementwise multiplication (the Hadamard product)
A * B

tensor([[  0.,   1.,   4.,   9.],
        [ 16.,  25.,  36.,  49.],
        [ 64.,  81., 100., 121.],
        [144., 169., 196., 225.],
        [256., 289., 324., 361.]])
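
Elementwise multiplication should not be confused with matrix multiplication. As a minimal illustration, the sketch below applies both to a small square matrix `M` (a hypothetical example, not used elsewhere in this section) and shows that the results differ.

```python
import torch

M = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# Elementwise (Hadamard) product: each entry is multiplied by itself
hadamard = M * M

# Matrix product: rows of the first factor times columns of the second
matmul = torch.mm(M, M)

print(hadamard)
print(matmul)
```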

[**Adding a scalar to a tensor or multiplying a tensor by a scalar**] produces a result with the same shape as the original tensor:
each element of the operand tensor is added to or multiplied by the scalar.


In [13]:
a = 2
X = torch.arange(24).reshape(2, 3, 4)
a + X, a * X

(tensor([[[ 2,  3,  4,  5],
          [ 6,  7,  8,  9],
          [10, 11, 12, 13]],
 
         [[14, 15, 16, 17],
          [18, 19, 20, 21],
          [22, 23, 24, 25]]]),
 tensor([[[ 0,  2,  4,  6],
          [ 8, 10, 12, 14],
          [16, 18, 20, 22]],
 
         [[24, 26, 28, 30],
          [32, 34, 36, 38],
          [40, 42, 44, 46]]]))

## Reduction

One useful operation that we can perform with arbitrary tensors
is to
calculate [**the sum of their elements.**]

In [14]:
x = torch.arange(4, dtype=torch.float32)
x, x.sum()

(tensor([0., 1., 2., 3.]), tensor(6.))

In [15]:
A.shape, A.sum()

(torch.Size([5, 4]), tensor(190.))

By default, invoking the function for calculating the sum
*reduces* a tensor along all its axes to a scalar.
We can also [**specify the axes along which the tensor is reduced via summation.**]
Take matrices as an example.
To reduce the row dimension (axis 0) by summing up elements of all the rows,
we specify `axis=0` when invoking the function.
Since the input matrix reduces along axis 0 to generate the output vector,
the dimension of axis 0 of the input is lost in the output shape.


In [16]:
# Sum over axis 0: the rows are added together, leaving one total per column
A_sum_axis0 = A.sum(axis=0)
A_sum_axis0, A_sum_axis0.shape

(tensor([40., 45., 50., 55.]), torch.Size([4]))

Specifying
`axis=1` will reduce the column dimension (axis 1) by summing up elements of all the columns.
Thus, the dimension of axis 1 of the input is lost in the output shape.


In [17]:
A_sum_axis1 = A.sum(axis=1)
A_sum_axis1, A_sum_axis1.shape

(tensor([ 6., 22., 38., 54., 70.]), torch.Size([5]))
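
To make the meaning of the `axis` argument concrete, the following sketch reproduces both reductions with explicit Python loops. It is meant only as an illustration of what is being summed, not as an efficient implementation; `col_totals` and `row_totals` are illustrative names.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)

# axis=0: add the 5 rows together, leaving one total per column
col_totals = sum(A[i] for i in range(A.shape[0]))

# axis=1: add the 4 columns together, leaving one total per row
row_totals = sum(A[:, j] for j in range(A.shape[1]))

print(torch.equal(col_totals, A.sum(axis=0)))
print(torch.equal(row_totals, A.sum(axis=1)))
```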

Reducing a matrix along both rows and columns via summation
is equivalent to summing up all the elements of the matrix.


In [18]:
A.sum(axis=[0, 1])  # Same as `A.sum()`

tensor(190.)

In [19]:
A.mean(), A.sum() / A.numel()

(tensor(9.5000), tensor(9.5000))

Likewise, the function for calculating the mean can also reduce a tensor along the specified axes.


In [20]:
A.mean(axis=0), A.sum(axis=0) / A.shape[0]

(tensor([ 8.,  9., 10., 11.]), tensor([ 8.,  9., 10., 11.]))
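
The same relationship holds along axis 1, where we divide by the number of columns instead. A brief sketch, assuming the same `A` as above:

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)

# Mean of each row: sum along axis 1, divided by the number of columns
row_means = A.mean(axis=1)
row_means_manual = A.sum(axis=1) / A.shape[1]

print(row_means, row_means_manual)
```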

### Non-Reduction Sum
:label:`subseq_lin-alg-non-reduction`

However,
sometimes it can be useful to [**keep the number of axes unchanged**]
when invoking the function for calculating the sum or mean.


In [21]:
sum_A = A.sum(axis=1, keepdims=True)
sum_A

tensor([[ 6.],
        [22.],
        [38.],
        [54.],
        [70.]])

For instance,
since `sum_A` still keeps its two axes after summing each row, we can (**divide `A` by `sum_A` with broadcasting.**)


In [22]:
A / sum_A

tensor([[0.0000, 0.1667, 0.3333, 0.5000],
        [0.1818, 0.2273, 0.2727, 0.3182],
        [0.2105, 0.2368, 0.2632, 0.2895],
        [0.2222, 0.2407, 0.2593, 0.2778],
        [0.2286, 0.2429, 0.2571, 0.2714]])
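
As a quick sanity check, the sketch below compares the shapes with and without `keepdims=True` and confirms that every row of the normalized matrix sums to one. The names `normalized` and `row_sums` are illustrative.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)

# Without keepdims the summed axis disappears; with it, the axis has length 1
print(A.sum(axis=1).shape, A.sum(axis=1, keepdims=True).shape)

# Broadcasting divides each row of `A` by that row's sum
sum_A = A.sum(axis=1, keepdims=True)
normalized = A / sum_A

# Each row of the result should now sum to (approximately) 1
row_sums = normalized.sum(axis=1)
print(row_sums)
```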

If we want to calculate [**the cumulative sum of elements of `A` along some axis**], say `axis=0` (row by row),
we can call the `cumsum` function. This function will not reduce the input tensor along any axis.


In [23]:
A.cumsum(axis=0)

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  6.,  8., 10.],
        [12., 15., 18., 21.],
        [24., 28., 32., 36.],
        [40., 45., 50., 55.]])
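
Two properties of the cumulative sum are easy to verify: the output has the same shape as the input, and its last row along the chosen axis equals the ordinary sum along that axis. A minimal check, with `running` as an illustrative name:

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
running = A.cumsum(axis=0)

# The shape is unchanged: no axis is reduced
print(running.shape == A.shape)

# The last row of the running total is the full sum along axis 0
print(torch.equal(running[-1], A.sum(axis=0)))
```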

## Dot Products

In [24]:
y = torch.ones(4, dtype=torch.float32)
x, y, torch.dot(x, y)

(tensor([0., 1., 2., 3.]), tensor([1., 1., 1., 1.]), tensor(6.))

Note that
(**we can express the dot product of two vectors equivalently by performing an elementwise multiplication and then a sum:**)


In [25]:
torch.sum(x * y)

tensor(6.)
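
For comparison, the sketch below computes the same dot product in two further ways: with an explicit loop over the elements, and with the `@` operator, which for two one-dimensional tensors also yields the dot product. The names `manual` and `with_operator` are illustrative.

```python
import torch

x = torch.arange(4, dtype=torch.float32)
y = torch.ones(4, dtype=torch.float32)

# Explicit loop: multiply matching elements and add up the products
manual = sum(x[i] * y[i] for i in range(len(x)))

# The `@` operator on two vectors computes their dot product
with_operator = x @ y

print(manual, with_operator, torch.dot(x, y))
```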

## Matrix-Vector Products

To express matrix-vector products in code, we use
the `mv` function. Calling `torch.mv(A, x)` with a matrix
`A` and a vector `x` computes the matrix-vector product.
Note that the column dimension of `A` (its length along axis 1)
must be the same as the dimension of `x` (its length).


In [26]:
A.shape, x.shape, torch.mv(A, x)

(torch.Size([5, 4]), torch.Size([4]), tensor([ 14.,  38.,  62.,  86., 110.]))
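
Each element of the result is the dot product of one row of `A` with `x`. The sketch below rebuilds the product row by row to illustrate this, and also shows the equivalent `@` operator; `row_by_row` is an illustrative name.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
x = torch.arange(4, dtype=torch.float32)

# One dot product per row of `A`, stacked into a vector of length 5
row_by_row = torch.stack([torch.dot(A[i], x) for i in range(A.shape[0])])

print(torch.equal(row_by_row, torch.mv(A, x)))

# The `@` operator gives the same matrix-vector product
print(torch.equal(A @ x, torch.mv(A, x)))
```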

## Matrix-Matrix Multiplication

In [27]:
B = torch.ones(4, 3)
torch.mm(A, B)

tensor([[ 6.,  6.,  6.],
        [22., 22., 22.],
        [38., 38., 38.],
        [54., 54., 54.],
        [70., 70., 70.]])
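
In a matrix-matrix product, the entry in row $i$ and column $j$ is the dot product of row $i$ of the first matrix with column $j$ of the second, so a $5 \times 4$ matrix times a $4 \times 3$ matrix gives a $5 \times 3$ result. The sketch below checks one entry by hand; `C` and `entry` are illustrative names.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = torch.ones(4, 3)

C = torch.mm(A, B)
print(C.shape)

# Entry (1, 2) is the dot product of row 1 of `A` and column 2 of `B`
entry = torch.dot(A[1], B[:, 2])
print(entry, C[1, 2])

# The `@` operator is an equivalent way to write the product
print(torch.equal(A @ B, C))
```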

## Norms

[**The $L_2$ *norm* of $\mathbf{x}$ is the square root of the sum of the squares of the vector elements:**]

**$$\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^n x_i^2},$$**


where the subscript $2$ is often omitted in $L_2$ norms, i.e., $\|\mathbf{x}\|$ is equivalent to $\|\mathbf{x}\|_2$. In code,
we can calculate the $L_2$ norm of a vector as follows.


In [28]:
u = torch.tensor([3.0, -4.0])
torch.norm(u)

tensor(5.)
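
To connect the code with the formula, the following sketch computes the $L_2$ norm directly from its definition and compares it with `torch.linalg.norm`, which is the function the PyTorch documentation recommends over the older `torch.norm`. The name `manual_l2` is illustrative.

```python
import torch

u = torch.tensor([3.0, -4.0])

# Definition: square each element, sum, then take the square root
manual_l2 = torch.sqrt(torch.sum(u ** 2))

# Library call; for a vector the default is the L2 norm
library_l2 = torch.linalg.norm(u)

print(manual_l2, library_l2)
```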

You will also frequently encounter [**the $L_1$ *norm***],
which is expressed as the sum of the absolute values of the vector elements:

**$$\|\mathbf{x}\|_1 = \sum_{i=1}^n \left|x_i \right|.$$**

In [29]:
torch.abs(u).sum()

tensor(7.)
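
As an alternative sketch, the same value can be obtained from `torch.linalg.norm` by passing `ord=1`, which for a vector selects the sum of absolute values. The name `library_l1` is illustrative.

```python
import torch

u = torch.tensor([3.0, -4.0])

# Definition: sum of absolute values
manual_l1 = torch.abs(u).sum()

# Library call with `ord=1` for the L1 norm of a vector
library_l1 = torch.linalg.norm(u, ord=1)

print(manual_l1, library_l1)
```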

In [30]:
torch.norm(torch.ones((4, 9)))

tensor(6.)
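
Applied to a matrix, this call computes the Frobenius norm, i.e., the square root of the sum of the squares of all matrix elements. For a $4 \times 9$ matrix of ones this is $\sqrt{36} = 6$. The sketch below checks this against the definition and against `torch.linalg.norm` with `ord='fro'`; the names `M`, `manual_fro`, and `library_fro` are illustrative.

```python
import torch

M = torch.ones((4, 9))

# Definition: square every element, sum them all, take the square root
manual_fro = torch.sqrt(torch.sum(M ** 2))

# Library call requesting the Frobenius norm explicitly
library_fro = torch.linalg.norm(M, ord='fro')

print(manual_fro, library_fro)
```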

## Summary

* Scalars, vectors, matrices, and tensors are basic mathematical objects in linear algebra.
* Vectors generalize scalars, and matrices generalize vectors.
* Scalars, vectors, matrices, and tensors have zero, one, two, and an arbitrary number of axes, respectively.
* A tensor can be reduced along the specified axes by `sum` and `mean`.
* Elementwise multiplication of two matrices is called their Hadamard product. It is different from matrix multiplication.
* In deep learning, we often work with norms such as the $L_1$ norm, the $L_2$ norm, and the Frobenius norm.
* We can perform a variety of operations over scalars, vectors, matrices, and tensors.

## Exercises

1. Prove that the transpose of a matrix $\mathbf{A}$'s transpose is $\mathbf{A}$: $(\mathbf{A}^\top)^\top = \mathbf{A}$.
1. Given two matrices $\mathbf{A}$ and $\mathbf{B}$, show that the sum of transposes is equal to the transpose of a sum: $\mathbf{A}^\top + \mathbf{B}^\top = (\mathbf{A} + \mathbf{B})^\top$.
1. Given any square matrix $\mathbf{A}$, is $\mathbf{A} + \mathbf{A}^\top$ always symmetric? Why?
1. We defined the tensor `X` of shape (2, 3, 4) in this section. What is the output of `len(X)`?
1. For a tensor `X` of arbitrary shape, does `len(X)` always correspond to the length of a certain axis of `X`? What is that axis?
1. Run `A / A.sum(axis=1)` and see what happens. Can you explain why?
1. When traveling between two points in Manhattan, what is the distance that you need to cover in terms of the coordinates, i.e., in terms of avenues and streets? Can you travel diagonally?
1. Consider a tensor with shape (2, 3, 4). What are the shapes of the summation outputs along axis 0, 1, and 2?
1. Feed a tensor with 3 or more axes to the `torch.linalg.norm` function and observe its output. What does this function compute for tensors of arbitrary shape?


[Discussions](https://discuss.d2l.ai/t/31)
