# Linear Algebra 

# Resources: 

1. Ian Goodfellow presents an amazing summary of linear algebra for machine learning in a chapter of his book. [pdf](http://www.deeplearningbook.org/contents/linear_algebra.html)

2. Ron Larson, *Elementary Linear Algebra*. [pdf](https://bonniekhanhtran.files.wordpress.com/2016/05/math-g235.pdf)

3. MIT Algebra Spring 2015, https://www.youtube.com/playlist?list=PLE7DDD91010BC51F8

# Matrix Transpose 

<img src="./images/linear_fig1.JPG" width="500">

# Matrix Inverse 

**A matrix $A$ is invertible (nonsingular) if it is a square matrix and its columns are linearly independent. (A square matrix is called singular if its columns are linearly dependent.)**

 **Proof 1:** If $A$ ($m \times n$) is invertible, then the linear equation $Ax = b$ has exactly one solution, $x = A^{-1}b$, for every $b$. To see what this requires, consider the span of the columns of $A$ (its column space). Since $Ax = a_{:,1}x_1 + a_{:,2}x_2 + \dots + a_{:,n}x_n$, where $a_{:,i}$ is the $i^{th}$ column of $A$ and $x_i$ is the $i^{th}$ element of the vector $x$, $Ax$ is a linear combination of the columns of $A$ with the entries of $x$ as coefficients. There are two requirements:
 
  a) The columns of $A$ must span $\mathbb{R}^{m}$, meaning that every point $b \in \mathbb{R}^{m}$ can be written as a linear combination of the columns of $A$ with some coefficients $x$. Thus $A$ must contain $m$ linearly independent columns, so the number of columns $n$ must be at least the number of rows $m$ ($n \geq m$).
  
  b) Since the solution $x$ must be unique, $A$ must be a square matrix ($m = n$). If $n > m$, the columns cannot all be linearly independent, so more than one choice of coefficients $x$ produces the same $b$. This can also be explained by the definition of the inverse, which requires $A^{-1}A = AA^{-1} = I$: if $A$ is not square, the two products have different sizes and cannot both equal the same identity matrix.
  
 **How to find the inverse of a matrix?** 
  <img src="./images/linear_inverse_1.JPG" width="500">

  
 **Theorems**
 1. If A and B are invertible, then $(AB)^{-1} = B^{-1}A^{-1}$
 1. $(cA)^{-1} = \frac{1}{c}A^{-1}$ for any nonzero scalar $c$
 1. $(A^T)^{-1} = (A^{-1})^T$
 1. $(A^k)^{-1} = (A^{-1})^k$
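
The identities above can be checked numerically. The following is a minimal Python sketch, not a proof: the helper names (`matmul`, `transpose`, `inverse_2x2`), the matrices `A` and `B`, and the scalar `c` are illustrative assumptions, and the power rule is only checked for $k = 2$. Exact fractions are used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def inverse_2x2(A):
    """Closed-form inverse of a 2x2 matrix, computed exactly with Fractions."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


# Hypothetical example values; any invertible 2x2 matrices and nonzero c would do.
A = [[2, 1], [8, 7]]
B = [[1, 2], [3, 4]]
c = 5

A_inv, B_inv = inverse_2x2(A), inverse_2x2(B)

# (AB)^{-1} = B^{-1} A^{-1}
rule_product = inverse_2x2(matmul(A, B)) == matmul(B_inv, A_inv)
# (cA)^{-1} = (1/c) A^{-1}
rule_scalar = (inverse_2x2([[c * v for v in row] for row in A])
               == [[v / c for v in row] for row in A_inv])
# (A^T)^{-1} = (A^{-1})^T
rule_transpose = inverse_2x2(transpose(A)) == transpose(A_inv)
# (A^k)^{-1} = (A^{-1})^k, checked here for k = 2
rule_power = inverse_2x2(matmul(A, A)) == matmul(A_inv, A_inv)

print(rule_product, rule_scalar, rule_transpose, rule_power)
```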

# MIT Algebra Spring 2015
<h2> Lecture 1: The Geometry of Linear Equations. </h2>

When solving the linear equation $AX = b$, where $A$ is an $m \times n$ matrix and $X$ is an $n \times 1$ vector, we can think of $AX$ in terms of a row picture or a column picture. 

For example, $AX = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \end{bmatrix}$

In the **row picture**, $X = \begin{bmatrix} x \\ y \end{bmatrix}$ is the intersection (if it exists) of the two lines $2x - y = 0$ and $-x + 2y = 3$ in $\mathbb{R}^{2}$.

In the **column picture**, $b$ is a linear combination of the columns of $A$. We can write:
$AX = x\begin{bmatrix} 2 \\ -1 \end{bmatrix} + y \begin{bmatrix} -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \end{bmatrix}$,
where each column of $A$ is a vector in $\mathbb{R}^{2}$. Based on this intuition, if the columns of $A$ are linearly independent, then their linear combinations cover the entire space $\mathbb{R}^{2}$. This means that for every $b$ there exists a solution $X$ such that $AX = b$.
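
To make the two pictures concrete, here is a small Python sketch for the $2 \times 2$ example above. It solves the system with Cramer's rule (a technique chosen here only for brevity, not one used in the lecture) and then checks both the row picture and the column picture. The function name `solve_2x2` is a hypothetical helper.

```python
from fractions import Fraction


def solve_2x2(A, b):
    """Solve a 2x2 system AX = b with Cramer's rule; returns (x, y) as Fractions."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("columns of A are linearly dependent")
    x = Fraction(b[0] * a22 - a12 * b[1], det)
    y = Fraction(a11 * b[1] - b[0] * a21, det)
    return x, y


A = [[2, -1], [-1, 2]]
b = [0, 3]
x, y = solve_2x2(A, b)

# Row picture: the point (x, y) lies on both lines.
on_line_1 = 2 * x - y == 0
on_line_2 = -x + 2 * y == 3

# Column picture: x * (column 1) + y * (column 2) reproduces b.
col1 = [A[0][0], A[1][0]]
col2 = [A[0][1], A[1][1]]
combination = [x * u + y * v for u, v in zip(col1, col2)]

print(x, y)                  # 1 2
print(on_line_1, on_line_2)  # True True
print(combination == b)      # True
```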

<h2> Lecture 2: Elimination of Matrices. </h2>

The Gauss or Gauss-Jordan algorithm can be used to solve $AX = b$ by elimination. By applying the following three elementary row operations, one can transform any matrix to row-echelon form (Gauss) or reduced row-echelon form (Gauss-Jordan):
- Interchange two rows 
- Multiply a row by a nonzero constant
- Add a multiple of a row to another row

Remember, these operations must be done for both sides of an equation.
  <img src="./images/linear_row_echelon.JPG" width="500">
 
For example:

$A = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 8 & 1 \\ 0 & 4 & 1  \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & -2\\ 0 & 4 & 1   \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & -2\\ 0 & 0 & 5  \end{bmatrix}$ (Row-echelon form).
Now, using back-substitution, we can solve for $X$.
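
The elimination and back-substitution steps can be sketched in Python as follows. This is a minimal version that assumes no row exchanges are needed. The notes do not give a right-hand side for this example, so the vector `b = [2, 12, 2]` is an assumed value chosen only for illustration, and the function names are hypothetical.

```python
from fractions import Fraction


def forward_eliminate(A, b):
    """Reduce the augmented matrix [A | b] to row-echelon form (no row exchanges)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        if M[col][col] == 0:
            raise ValueError("zero pivot: a row exchange would be needed")
        for row in range(col + 1, n):
            # Subtract a multiple of the pivot row to zero out the entry below the pivot.
            factor = M[row][col] / M[col][col]
            M[row] = [r - factor * p for r, p in zip(M[row], M[col])]
    return M


def back_substitute(M):
    """Solve an upper-triangular augmented system from the last row upward."""
    n = len(M)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


A = [[1, 2, 1], [3, 8, 1], [0, 4, 1]]
b = [2, 12, 2]  # assumed right-hand side, not given in the notes

M = forward_eliminate(A, b)
U = [row[:3] for row in M]  # matches the row-echelon form shown above
solution = back_substitute(M)

print(U == [[1, 2, 1], [0, 2, -2], [0, 0, 5]])  # True
print([int(v) for v in solution])               # [2, 1, -2]
```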

Interestingly, all of the above elementary row operations can be carried out by multiplying $A$ on the left by elementary matrices. Similar to the way we think of $AX$ as a linear combination of the columns of $A$, we can think of $XA$ as a linear combination of the rows of $A$, where $X$ is a $1 \times m$ row vector and the result $XA$ is another row vector. For example: 

$XA = \begin{bmatrix} 1 & 2 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 & 2 \\ -1 & 0 & 1 \\ 3 & 1 & 0 \end{bmatrix} = 1 \cdot \begin{bmatrix} 1 & 1 & 2 \end{bmatrix} + 2 \cdot \begin{bmatrix} -1 & 0 & 1 \end{bmatrix} + 0 \cdot \begin{bmatrix} 3 & 1 & 0 \end{bmatrix} = \begin{bmatrix} -1 & 1 & 4 \end{bmatrix}$
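
The same row-vector product can be written as a short Python sketch that accumulates one scaled row at a time, mirroring the sum above. The function name `row_combination` is a hypothetical helper.

```python
def row_combination(x, A):
    """Compute the row vector x times A as sum_i x[i] * (row i of A)."""
    result = [0] * len(A[0])
    for coeff, row in zip(x, A):
        result = [acc + coeff * v for acc, v in zip(result, row)]
    return result


x = [1, 2, 0]
A = [[1, 1, 2], [-1, 0, 1], [3, 1, 0]]

product = row_combination(x, A)
print(product)  # [-1, 1, 4]
```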

Using the above idea, to convert $A = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 8 & 1 \\ 0 & 4 & 1  \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & -2\\ 0 & 4 & 1   \end{bmatrix}$, we keep row 1 and row 3 and subtract 3 times row 1 from row 2. Thus, the elementary matrix $E_{21}$ that does this is: 
$E_{21} = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1  \end{bmatrix}$.

Similarly, to transform $\begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & -2\\ 0 & 4 & 1   \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & -2\\ 0 & 0 & 5  \end{bmatrix}$,
the elementary matrix is $E_{32} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1  \end{bmatrix}$.

The above elementary matrices can be combined into a single one: $E = E_{32}E_{21}$
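
As a quick check, the following Python sketch applies $E_{21}$ and then $E_{32}$ to $A$ and confirms that the combined matrix $E = E_{32}E_{21}$ produces the same result in one multiplication. The `matmul` helper is an illustrative assumption.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2, 1], [3, 8, 1], [0, 4, 1]]
E21 = [[1, 0, 0], [-3, 1, 0], [0, 0, 1]]  # subtract 3 * row 1 from row 2
E32 = [[1, 0, 0], [0, 1, 0], [0, -2, 1]]  # subtract 2 * row 2 from row 3

step1 = matmul(E21, A)   # [[1, 2, 1], [0, 2, -2], [0, 4, 1]]
U = matmul(E32, step1)   # [[1, 2, 1], [0, 2, -2], [0, 0, 5]]

E = matmul(E32, E21)     # [[1, 0, 0], [-3, 1, 0], [6, -2, 1]]
print(matmul(E, A) == U)  # True
```

Note the order: $E_{21}$ is applied first, so it sits closest to $A$ in the product.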

In other cases, we want to exchange two rows. Say we want to exchange row 1 and row 2 of $A$. The elementary matrix (permutation matrix) that achieves this is: 
$P_{12} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1  \end{bmatrix}$

<h2> Lecture 3: Inverse of Matrices </h2>

To find the inverse of a matrix $A$, we can convert the augmented matrix $[A, I]$ into $[I, R]$ using the Gauss-Jordan method; then $R = A^{-1}$. This is because when we transform $A$ into the identity matrix $I$, we apply a product of elementary matrices $E$ such that $EA = I$, so $E = A^{-1}$. The same operations are applied to $I$, giving $R = EI = E$, so $R$ is exactly $A^{-1}$.

 There are cases in which we cannot find the inverse of $A$. In other words, $A$ is not invertible when:
 - $A$ is not a square matrix
 - the columns of $A$ are not linearly independent.
 
It is because in both cases no product of elementary matrices $E$ gives $EA = I$, so $A^{-1}$ does not exist.
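
The Gauss-Jordan procedure on $[A, I]$ can be sketched in Python as below. This is a minimal illustration under stated assumptions rather than a production routine: the function names are hypothetical, exact fractions are used instead of floating point, and a row exchange is performed whenever the current pivot is zero. Both non-invertible cases listed above raise an error.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def gauss_jordan_inverse(A):
    """Invert a square matrix by reducing the augmented matrix [A | I] to [I | R]."""
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("only square matrices can be inverted")
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot_row = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot_row is None:
            raise ValueError("matrix is singular: columns are linearly dependent")
        M[col], M[pivot_row] = M[pivot_row], M[col]
        # Scale the pivot row so the pivot becomes 1.
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        # Clear every other entry in this column, above and below the pivot.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [row[n:] for row in M]


A = [[1, 2, 1], [3, 8, 1], [0, 4, 1]]
R = gauss_jordan_inverse(A)
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul(A, R) == identity, matmul(R, A) == identity)
```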

<h2> Lecture 4: Factorization into A = LU </h2> 

We know that by multiplying $A$ by elementary matrices, we have $E_{32}E_{21}A = U$, where $U$ is in row-echelon form (upper triangular). Therefore $A = E_{21}^{-1}E_{32}^{-1}U$.

Assuming there are no row exchanges when performing elimination, the matrix $L = E_{21}^{-1}E_{32}^{-1}$ is lower triangular. This is because all elementary matrices used in forward elimination, other than row exchanges, are lower triangular, so their inverses and products are also lower triangular. 

For example, $ A = \begin{bmatrix} 2 & 1\\ 8 & 7 \end{bmatrix} = \begin{bmatrix} 1 & 0\\ 4 & 1 \end{bmatrix} \begin{bmatrix} 2& 1\\ 0 & 3 \end{bmatrix} $
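
A minimal Python sketch of this factorization, assuming no row exchanges are needed, is shown below. The function name `lu_no_exchange` is hypothetical. Each multiplier used during elimination is stored directly in $L$, which is why $L$ comes out lower triangular with ones on the diagonal.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def lu_no_exchange(A):
    """Factor a square matrix as A = LU, assuming every pivot is nonzero."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        if U[col][col] == 0:
            raise ValueError("zero pivot: a row exchange (PA = LU) would be needed")
        for row in range(col + 1, n):
            factor = U[row][col] / U[col][col]
            L[row][col] = factor  # the multiplier goes straight into L
            U[row] = [u - factor * p for u, p in zip(U[row], U[col])]
    return L, U


A2 = [[2, 1], [8, 7]]
L2, U2 = lu_no_exchange(A2)
print(L2 == [[1, 0], [4, 1]], U2 == [[2, 1], [0, 3]])  # True True

A3 = [[1, 2, 1], [3, 8, 1], [0, 4, 1]]
L3, U3 = lu_no_exchange(A3)
print(matmul(L3, U3) == A3)  # True
```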

The cost of elimination on an $n \times n$ matrix is about $\frac{1}{3}n^3$ operations.

**Permutation matrices**: used to perform row exchanges. There are $n!$ permutation matrices of size $n \times n$: 6 for $3 \times 3$ and 24 for $4 \times 4$. 
An interesting property of a permutation matrix is that $P^T = P^{-1}$.
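
The following Python sketch enumerates all permutation matrices of a given size and checks the property $P^T = P^{-1}$ by verifying $P^T P = I$ for each one. The count of $n \times n$ permutation matrices is $n!$. The helper names are illustrative assumptions.

```python
from itertools import permutations


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def permutation_matrices(n):
    """Build every n x n permutation matrix: row i of P picks row order[i] of A."""
    return [[[int(j == src) for j in range(n)] for src in order]
            for order in permutations(range(n))]


count_3 = len(permutation_matrices(3))
count_4 = len(permutation_matrices(4))
print(count_3, count_4)  # 6 24

# P^T P = I for every permutation matrix, so P^T is the inverse of P.
all_orthogonal = all(matmul(transpose(P), P) == identity(4)
                     for P in permutation_matrices(4))
print(all_orthogonal)  # True

# Exchanging rows 1 and 2 of a 3 x 3 matrix.
A = [[1, 2, 1], [3, 8, 1], [0, 4, 1]]
P12 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
swapped = matmul(P12, A)
print(swapped)  # [[3, 8, 1], [1, 2, 1], [0, 4, 1]]
```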

To deal with row exchanges in practice, MATLAB computes $PA = LU$, given that $A$ is invertible. The lecture does not clearly explain how $P$ is calculated, but MATLAB picks pivots with large values for numerical accuracy.

<h2> Lecture 5: Vector Spaces: </h2> 

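**Vector space**: a set of vectors that is closed under linear combinations, so adding any two vectors in the space or multiplying one by a scalar gives a vector that is still in the space. For example, $\mathbb{R}^2$ consists of all vectors with 2 real components.

**Subspace**: subspaces of $\mathbb{R}^2$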