Reference sheet: The Matrix Cookbook (Petersen and Pedersen, 2006)

## 2.1 Scalars, Vectors, Matrices and Tensors

- Scalars: A scalar is just a single number. We usually use lowercase variable names to represent scalars.
- Vectors: A vector is an array of numbers. Typically we give vectors lowercase names written in bold typeface, e.g. $\boldsymbol{x}$.
    - $\boldsymbol{x}_{-1}$ is the vector containing all elements of $\boldsymbol{x}$ except for $x_1$. More generally, for a set of indices $S$, $\boldsymbol{x}_{-S}$ is the vector containing all elements of $\boldsymbol{x}$ except those indexed by $S$.
- Matrices: A matrix is a 2-D array of numbers, e.g. $\boldsymbol{A}$.
- Tensors: A tensor is an array with more than two axes.

In the context of deep learning, we also use some less conventional notation.
We allow the addition of a matrix and a vector, yielding another matrix:
$\boldsymbol{C}=\boldsymbol{A}+\boldsymbol{b}$, where $C_{i, j}=A_{i, j}+b_{j}$. In other words, the vector $\boldsymbol{b}$ is added to each row of the matrix.

This implicit copying of $\boldsymbol{b}$ to many locations is called broadcasting.
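As a minimal sketch of broadcasting, assuming NumPy as the array library (the notes do not name one) and made-up values for `A` and `b`, the implicit sum can be compared with a version where the copies of `b` are built explicitly:

```python
import numpy as np

# Illustrative values only; any (m, n) matrix and length-n vector would do.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])    # shape (2, 3)
b = np.array([10.0, 20.0, 30.0])   # shape (3,)

# Broadcasting: b is implicitly added to every row of A.
C = A + b

# The same sum with the copies of b made explicit, one per row of A.
C_explicit = A + np.tile(b, (A.shape[0], 1))
```

Both `C` and `C_explicit` hold the same matrix; broadcasting simply avoids materializing the repeated rows.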

## 2.2 Multiplying Matrices and Vectors

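The matrix product $\boldsymbol{C}=\boldsymbol{A}\boldsymbol{B}$ requires $\boldsymbol{A}$ to have as many columns as $\boldsymbol{B}$ has rows, and is defined by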
$$C_{i, j}=\sum_{k} A_{i, k} B_{k, j}$$

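The element-wise (Hadamard) product, denoted $\boldsymbol{A} \odot \boldsymbol{B}$, multiplies corresponding entries instead: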
![Element-wise product](./figs/element-wise.jpg)
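The difference between the two products can be sketched in NumPy (an assumed library choice). The helper `matmul_by_definition` is a hypothetical name introduced here; it spells out the summation over $k$ with explicit loops and is meant for illustration, not performance.

```python
import numpy as np

def matmul_by_definition(A, B):
    """Compute C[i, j] = sum_k A[i, k] * B[k, j] with explicit loops."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("A must have as many columns as B has rows")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

# Illustrative values only.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

C_loop = matmul_by_definition(A, B)  # matrix product from the definition
C_matmul = A @ B                     # NumPy's built-in matrix product
C_elementwise = A * B                # element-wise product: A[i, j] * B[i, j]
```

In NumPy, `@` is the matrix product and `*` is the element-wise product, so the two are easy to confuse when translating formulas into code.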

## 2.3 Identity and Inverse Matrices

We denote the identity matrix that preserves $n$-dimensional vectors as $\boldsymbol{I}_{n}$, so that
 $$
\forall \boldsymbol{x} \in \mathbb{R}^{n}, \boldsymbol{I}_{n} \boldsymbol{x}=\boldsymbol{x}
$$


The matrix inverse of $\boldsymbol{A}$ is denoted as $\boldsymbol{A}^{-1}$, and it is defined as the matrix such that 

$$
\boldsymbol{A}^{-1} \boldsymbol{A}=\boldsymbol{I}_{n}
$$
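Both defining properties can be checked numerically. The following is a small sketch assuming NumPy, with an arbitrary invertible matrix chosen for illustration; comparisons use `np.allclose` because floating-point inverses are only approximate.

```python
import numpy as np

n = 3
I = np.eye(n)                      # identity matrix I_n
x = np.array([1.0, -2.0, 0.5])     # arbitrary vector in R^3

# I_n x = x for every x in R^n.
identity_preserves_x = np.allclose(I @ x, x)

# An arbitrary invertible matrix (its determinant is 8).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
A_inv = np.linalg.inv(A)

# A^{-1} A = I_n, up to floating-point rounding.
inverse_gives_identity = np.allclose(A_inv @ A, I)
```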

## 2.4 Linear Dependence and Span
$$
\begin{array}{c}
{A x=b} \\ 
{A^{-1} A x=A^{-1} b} \\ 
{I_{n} x=A^{-1} b} \\ 
{x=A^{-1} b}
\end{array}
$$
If the inverse matrix $\boldsymbol{A}^{-1}$ exists, then exactly one solution exists for every vector $b$.

$$
\left[\begin{array}{ccc}{a_{11}} & {a_{12}} & {a_{13}} \\ {a_{21}} & {a_{22}} & {a_{23}} \\ {a_{31}} & {a_{32}} & {a_{33}}\end{array}\right]\left[\begin{array}{c}{x_{1}} \\ {x_{2}} \\ {x_{3}}\end{array}\right]=\left[\begin{array}{c}{b_{1}} \\ {b_{2}} \\ {b_{3}}\end{array}\right] \Leftrightarrow\left\{\begin{array}{l}{a_{11} x_{1}+a_{12} x_{2}+a_{13} x_{3}=b_{1}} \\ {a_{21} x_{1}+a_{22} x_{2}+a_{23} x_{3}=b_{2}} \\ {a_{31} x_{1}+a_{32} x_{2}+a_{33} x_{3}=b_{3}}\end{array}\right.
$$

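Determining whether $Ax=b$ has a solution amounts to determining whether the vector $b$ lies in the span of the columns of $A$.

For a matrix to be invertible, it must be square and all of its columns must be linearly independent.

A square matrix with linearly dependent columns is called singular.

As the equations above show, $b$ can be viewed as a linear combination of the columns of $A$, with the entries of $x$ as the coefficients.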
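The ideas in this section can be sketched numerically, assuming NumPy and small made-up matrices. The helper `in_column_span` is a hypothetical name; it uses a rank comparison, which is one common way (not the only one) to test whether $b$ lies in the span of the columns of a matrix.

```python
import numpy as np

# An invertible matrix: exactly one solution for every b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)            # preferred over forming the inverse
x_via_inverse = np.linalg.inv(A) @ b # x = A^{-1} b, as in the derivation

# b is a linear combination of the columns of A, weighted by x.
b_from_columns = x[0] * A[:, 0] + x[1] * A[:, 1]

def in_column_span(M, v):
    """True if v is in the span of the columns of M (rank test)."""
    augmented = np.column_stack([M, v])
    return np.linalg.matrix_rank(augmented) == np.linalg.matrix_rank(M)

# A singular matrix: the second column is twice the first.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
solvable = in_column_span(S, np.array([1.0, 2.0]))     # on the line spanned by the columns
unsolvable = in_column_span(S, np.array([1.0, 0.0]))   # off that line
```

For the singular matrix `S`, a solution exists only for vectors on the line spanned by its columns, so `solvable` is true while `unsolvable` is false.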

## 2.5 Norms
The $L^p$ norm is given by 
$$
\|\boldsymbol{x}\|_{p}=\left(\sum_{i}\left|x_{i}\right|^{p}\right)^{\frac{1}{p}}
$$
for $p \in \mathbb{R}$, $p \geq 1$, where $\boldsymbol{x}$ is a vector.

On an intuitive level, the norm of a vector $\boldsymbol{x}$ measures the distance from the origin to the point $\boldsymbol{x}$.

The $L^2$ norm, with $p=2$, is known as the Euclidean norm. It is often denoted simply as $\|\boldsymbol{x}\|$, with the subscript 2 omitted.
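The formula translates almost directly into code. This sketch assumes NumPy; `lp_norm` is a hypothetical helper name, and the vector is chosen so that the results are easy to check by hand.

```python
import numpy as np

def lp_norm(x, p):
    """L^p norm of a vector: (sum_i |x_i|^p)^(1/p), for p >= 1."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0])

l1 = lp_norm(x, 1)   # |3| + |-4| = 7
l2 = lp_norm(x, 2)   # sqrt(9 + 16) = 5, the Euclidean norm
```

NumPy's own `np.linalg.norm(x, ord=p)` computes the same quantity for vectors and defaults to the $L^2$ norm when `ord` is omitted.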

## 2.6 Special Kinds of Matrices and Vectors

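- A diagonal matrix has nonzero entries only on the main diagonal; all other entries are zero.
- The identity matrix is a diagonal matrix whose diagonal entries are all 1.
- A symmetric matrix is a matrix that is equal to its own transpose: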
$$
A=A^{\top}
$$
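These three kinds of matrices can be constructed and checked in a few lines. The sketch below assumes NumPy and uses made-up values.

```python
import numpy as np

# Diagonal matrix built from a vector of diagonal entries.
d = np.array([2.0, 3.0, 5.0])
D = np.diag(d)
off_diagonal = D - np.diag(np.diag(D))   # everything except the main diagonal
is_diagonal = not off_diagonal.any()

# The identity matrix is the diagonal matrix with all ones on the diagonal.
I = np.eye(3)
identity_is_diagonal_of_ones = np.array_equal(I, np.diag(np.ones(3)))

# A symmetric matrix equals its own transpose.
S = np.array([[1.0, 7.0, 2.0],
              [7.0, 4.0, 0.0],
              [2.0, 0.0, 9.0]])
is_symmetric = np.array_equal(S, S.T)
```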

## 2.7 Eigendecomposition

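Eigendecomposition is one of the most widely used matrix decompositions: we decompose a matrix into a set of eigenvectors and eigenvalues.

An eigenvector of a square matrix $\boldsymbol{A}$ is a nonzero vector $\boldsymbol{v}$ such that multiplication by $\boldsymbol{A}$ only rescales $\boldsymbol{v}$:
$$\boldsymbol{A} \boldsymbol{v}=\lambda \boldsymbol{v}$$
The scalar $\lambda$ is called the eigenvalue corresponding to this eigenvector.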

$$
\begin{aligned} A \overrightarrow{\mathbf{v}} &=\lambda \overrightarrow{\mathbf{v}} \\ A \overrightarrow{\mathbf{v}}-\lambda I \overrightarrow{\mathbf{v}} &=0 \\(A-\lambda I) \overrightarrow{\mathbf{v}} &=0 \\ \operatorname{det}(A-\lambda I) &=0 \end{aligned}
$$

For more on eigenvalues and eigenvectors, see:
- <https://www.youtube.com/watch?v=8UX82qVJzYI>

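With the eigenvectors as the columns of $\boldsymbol{V}$ and the corresponding eigenvalues collected in the vector $\boldsymbol{\lambda}$, the eigendecomposition can be written as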
$$
\boldsymbol{A}=\boldsymbol{V} \operatorname{diag}(\boldsymbol{\lambda}) \boldsymbol{V}^{-1}
$$
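The relations in this section can be verified on a small example. This sketch assumes NumPy and a made-up symmetric $2 \times 2$ matrix whose eigenvalues are 1 and 3; the order in which `np.linalg.eig` returns them is not guaranteed, so nothing below depends on it.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Columns of V are eigenvectors; eigenvalues[i] belongs to V[:, i].
eigenvalues, V = np.linalg.eig(A)

# Residuals of A v = lambda v for each eigenpair (should be about zero).
residuals = [
    np.linalg.norm(A @ V[:, i] - eigenvalues[i] * V[:, i])
    for i in range(len(eigenvalues))
]

# det(A - lambda I) should vanish at each eigenvalue.
char_dets = [np.linalg.det(A - lam * np.eye(2)) for lam in eigenvalues]

# Reassemble A = V diag(lambda) V^{-1}.
A_reconstructed = V @ np.diag(eigenvalues) @ np.linalg.inv(V)
```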

## 2.8 Singular Value Decomposition

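The singular value decomposition (SVD) factorizes a matrix into singular vectors and singular values.

Every real matrix has a singular value decomposition, but not every matrix has an eigendecomposition. For example, a non-square matrix has no eigendecomposition, so in that case we must use the SVD instead.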

$$
\boldsymbol{A}=\boldsymbol{U} \boldsymbol{D} \boldsymbol{V}^{\top}
$$
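As a sketch of the non-square case, assuming NumPy and a made-up $2 \times 3$ matrix: `np.linalg.svd` returns the singular values as a 1-D array and returns $\boldsymbol{V}^{\top}$ rather than $\boldsymbol{V}$, so the rectangular matrix $\boldsymbol{D}$ has to be assembled by hand before reconstructing $\boldsymbol{A}$.

```python
import numpy as np

# A 2x3 matrix: not square, so it has no eigendecomposition.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# With the default full_matrices=True: U is 2x2, Vt is 3x3, s has length 2.
U, s, Vt = np.linalg.svd(A)

# Place the singular values on the diagonal of a matrix shaped like A.
D = np.zeros(A.shape)
D[:len(s), :len(s)] = np.diag(s)

# Reassemble A = U D V^T.
A_reconstructed = U @ D @ Vt
```

A standard property not stated in the notes above is that $\boldsymbol{U}$ and $\boldsymbol{V}$ are orthogonal and the singular values are non-negative, which can be checked with `np.allclose(U.T @ U, np.eye(2))` and `(s >= 0).all()`.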

## PCA