In [1]:
import numpy as np # more basic functionality
import scipy # advanced functionality, built on numpy
import matplotlib as mpl
import matplotlib.pyplot as plt

# Summary

- **Eigenvalues and Eigenvectors**
    - An eigenvector is a vector whose orientation does not change when it is multiplied by the transformation matrix; it is only scaled by a factor equal to its corresponding eigenvalue.
- **Diagonalization & Eigendecomposition**
    - Two applications of eigenvalues and eigenvectors that are very useful when handling data in matrix form, because they let you decompose a matrix into factors that are easy to manipulate.
- **Underlying assumption behind the diagonalization and eigendecomposition**
    - Make sure that the matrix you are trying to decompose is a square ($n\times n$) matrix and has $n$ linearly independent eigenvectors (having $n$ different eigenvalues guarantees this, but is not required).

## Eigenvalue Problem and the Characteristic Polynomial

A non-zero vector $v \in \mathbb{R}^n$ is an **eigenvector of a square matrix $A \in \mathbb{R}^{n\times n}$** 
- if it satisfies a linear equation of the form $Av = \lambda v$ for some scalar $\lambda$, <mark>which we are solving for</mark>
    - This is called the eigenvalue equation/problem
    - **Geometric intuition**: The eigenvector(s) of $A$ are the vector(s) $v$ which $A$ **only elongates/shrinks** (or flips, when $\lambda < 0$), and never takes off their span(s). 
        - The amount of this elongation/shrink is $\lambda$, a scalar value
- Rearranging the eigenvalue problem:

$$ \begin{align*}
Av&=\lambda v\\ 
Av&=(\lambda I) v = \begin{bsmallmatrix} 
                    \lambda & 0 & \dots & 0 \\ 
                    0 & \lambda & \dots & 0 \\ 
                    \vdots & \vdots & \ddots & \vdots \\ 
                    0 & 0 & \dots & \lambda \\ 
                    \end{bsmallmatrix} \begin{bsmallmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bsmallmatrix}\\
(A-\lambda I) v &= 0\\
\end{align*} $$

$\text{Since }v \ne 0\text{, solve for:}$

$$ \begin{align*}
p(\lambda) = \text{det}(A-\lambda I)&=0\\
\end{align*} $$



- <mark>The **only way for $(A-\lambda I) v = 0$ to be possible** (given non-zero $v$) is for $\text{det}(A-\lambda I)=0$</mark>
    - i.e. the matrix $(A-\lambda I)$ represents a *linear transformation of the vector space which "reduces" its dimensionality* (at least 1 dimension is lost)
    - A matrix **cannot squish non-zero vectors into the $\vec{0}$ vector unless its determinant is 0**
- By solving $\text{det}(A-\lambda I)=0$, we get the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$ (an $n\times n$ matrix has $n$ of them, counting repeated roots, and some may be complex).
    - Expanding $\text{det}(A-\lambda I)$ gives the **characteristic polynomial** $p(\lambda)$, a degree-$n$ polynomial in $\lambda$ whose roots are the eigenvalues
 
#### Why the $\text{det}(A-\lambda I) = 0$ observation matters
- The observation that $\text{det}(A-\lambda I) = 0$ is useful because solving it yields the eigenvalues $\lambda$.
    - Those help us solve for the eigenvectors $v$ (i.e. the non-zero vectors that the diagonally altered matrix $(A-\lambda I)$ "shrinks" to $\vec{0}$)
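As a small check of the steps above, the sketch below builds the characteristic polynomial of a $2\times 2$ matrix by hand and compares its roots with NumPy's eigenvalues. It assumes the standard $2\times 2$ identity $p(\lambda) = \lambda^2 - \text{tr}(B)\lambda + \text{det}(B)$; the matrix `B` and the variable names are illustrative choices, not part of the cells in this notebook.

```python
import numpy as np

# Illustrative 2x2 matrix (an assumption for this sketch; it has distinct eigenvalues)
B = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# For a 2x2 matrix: det(B - λI) = λ^2 - trace(B)·λ + det(B)
char_poly_coeffs = np.array([1.0, -np.trace(B), np.linalg.det(B)])

# The roots of the characteristic polynomial are the eigenvalues
roots = np.sort(np.roots(char_poly_coeffs))
eig_vals_B = np.sort(np.linalg.eigvals(B))

print('Characteristic polynomial coefficients:', char_poly_coeffs)
print('Roots of p(λ):', roots)
print('np.linalg.eigvals(B):', eig_vals_B)

# Each root makes (B - λI) singular, i.e. det(B - λI) is (numerically) zero
dets_at_roots = [np.linalg.det(B - lam * np.eye(2)) for lam in roots]
print('det(B - λI) at each root:', np.round(dets_at_roots, 10))
```

For larger matrices the polynomial is rarely expanded by hand; routines such as `np.linalg.eig` compute the eigenvalues directly.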

In [2]:
# A = np.random.randn(4,4)
# A = np.array([[3,1],
#              [0,2]])

import scipy
A = np.array([[4,2,2],
             [2,4,2],
             [2,2,4]])
# A = np.diag((1,2,3))
detA = np.linalg.det(A)

print('A:\n', A)
print('determinant(A):', detA)

eig_vals,eig_vecs = np.linalg.eig(A)
# eig_vals,eig_vecs = np.linalg.eigh(A)
# eig_vals,eig_vecs = scipy.linalg.eig(A)

print('\nEigenvalues - shape:', eig_vals.shape, 'values:', np.round(eig_vals,0))
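# Each printed row is a unit-length eigenvector. Values can differ from Wolfram and others because
# eigenvectors are only defined up to scale, and the repeated eigenvalue 2 has a 2D eigenspace.
print('\nEigenvectors - shape:', eig_vecs.shape, 'values:\n', np.round(eig_vecs.T,2))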

# Link 1: https://www.wolframalpha.com/input?i=eigenvectors+of+%7B%7B4%2C2%2C2%7D%2C%7B2%2C4%2C2%7D%2C%7B2%2C2%2C4%7D%7D
# Link 2: https://matrixcalc.org/vectors.html#eigenvectors%28%7B%7B4,2,2%7D,%7B2,4,2%7D,%7B2,2,4%7D%7D%29

# But let's test it against the eigenvalue problem: Av = λv (column j of each product belongs to eigenvalue j)
print('Check against the Eigenvalue Problem Av = λv')
print('\nAv:\n', A@eig_vecs)
print('\nλv:\n', eig_vals*eig_vecs)

A:
 [[4 2 2]
 [2 4 2]
 [2 2 4]]
determinant(A): 32.0

Eigenvalues - shape: (3,) values: [2. 8. 2.]

Eigenvectors - shape: (3, 3) values:
 [[-0.82  0.41  0.41]
 [ 0.58  0.58  0.58]
 [ 0.51 -0.81  0.3 ]]
Check against the Eigenvalue Problem Av = λv

Av:
 [[-1.63299316  4.61880215  1.01339709]
 [ 0.81649658  4.61880215 -1.61564839]
 [ 0.81649658  4.61880215  0.6022513 ]]

λv:
 [[-1.63299316  4.61880215  1.01339709]
 [ 0.81649658  4.61880215 -1.61564839]
 [ 0.81649658  4.61880215  0.6022513 ]]
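The printed eigenvectors need not match those from other tools: any non-zero multiple of an eigenvector is still an eigenvector, and for the repeated eigenvalue 2 any two independent vectors in its eigenspace will do. A minimal sketch, using hand-picked integer vectors (an illustrative choice, not output copied from any tool), shows that they also satisfy $Av=\lambda v$:

```python
import numpy as np

A = np.array([[4, 2, 2],
              [2, 4, 2],
              [2, 2, 4]])

# Hand-picked candidate eigenpairs (illustrative, not unit length)
candidates = [
    (8, np.array([1, 1, 1])),
    (2, np.array([-1, 1, 0])),
    (2, np.array([-1, 0, 1])),
]

checks = []
for lam, v in candidates:
    ok = np.array_equal(A @ v, lam * v)  # exact comparison is safe: integer arithmetic
    checks.append(ok)
    print(f'λ={lam}, v={v}, Av={A @ v}, Av == λv: {ok}')

# Any linear combination of the two λ=2 eigenvectors is again a λ=2 eigenvector
w = 3 * candidates[1][1] - 2 * candidates[2][1]
combo_ok = np.array_equal(A @ w, 2 * w)
print('Combination w =', w, 'is an eigenvector for λ=2:', combo_ok)
```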


## Eigenbasis, Diagonalisation, and Eigen-decomposition

## Eigenbasis
- If our <mark>**basis vectors** ($\hat{i} = \begin{bsmallmatrix} 1 \\ 0 \end{bsmallmatrix}$, $\hat{j} = \begin{bsmallmatrix} 0 \\ 1 \end{bsmallmatrix}$, $\dots$) are themselves **eigenvectors**</mark>, the basis is called an **eigenbasis**. If we then inspect $A$, the transformation matrix, it will have the form known as a <mark>**Diagonal Matrix**</mark>:

$$A_{\text{diag}} = \begin{bsmallmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bsmallmatrix}$$

- Its form is <mark>(diagonal) because recall $A$ **only scales** (stretch/shrink) its eigenvectors</mark>, which in this case are the basis vectors
    - It is very easy to compute large powers of diagonal matrices (they simply scale vectors by the eigenvalues)
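To illustrate why powers of a diagonal matrix are cheap, here is a minimal sketch; the matrix `D` and exponent `k` are arbitrary values chosen for the example.

```python
import numpy as np

# Illustrative diagonal matrix with eigenvalues 2 and 3 on the main diagonal
D = np.diag([2.0, 3.0])
k = 10

# Raising a diagonal matrix to a power only raises each diagonal entry to that power
D_power_fast = np.diag(np.diag(D) ** k)

# Reference: k-fold matrix multiplication
D_power_ref = np.linalg.matrix_power(D, k)

print('D^k from the diagonal entries:\n', D_power_fast)
print('Same as repeated multiplication:', np.allclose(D_power_fast, D_power_ref))
```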

## <mark>Diagonalisation</mark>: Using the Eigenbasis to Diagonalise a non-diagonal $A$
1. Find the eigenvectors of $A$
2. Make a **change of basis** matrix $S$ whose columns are the eigenvectors of $A$. We'll use this to switch the coordinate system of $A$
3. Diagonalise $A$ to get $\Lambda$ by doing this:
$$\Lambda = S^{-1}AS$$
4. The new matrix $\Lambda$ is diagonal (provided $S$ is invertible), with the **eigenvalues of $A$ on the main diagonal**, in the same order as the eigenvector columns of $S$

#### Derivation: Why is diagonalisation $\Lambda = S^{-1}AS$ possible in the first place?

Show that $AS=S\Lambda$
- Suppose $A$ (an $n\times n$ matrix) has $m$ linearly independent eigenvectors, with $m = n$ so that $S$ is square and invertible;
    - then $S$ is the matrix whose columns are the eigenvectors of $A$, $v_{1\dots m}$
    - $AS = A \begin{bsmallmatrix} v_1 & v_2 & \dots & v_m \end{bsmallmatrix} = \begin{bsmallmatrix} A v_1 & A v_2 & \dots & A v_m \end{bsmallmatrix} = \begin{bsmallmatrix} \lambda_1 v_1 & \lambda_2 v_2 & \dots & \lambda_m v_m \end{bsmallmatrix} $ (final step because recall $Av_i=\lambda_i v_i$)
- And so $\begin{bsmallmatrix} \lambda_1 v_1 & \lambda_2 v_2 & \dots & \lambda_m v_m \end{bsmallmatrix}$ = $\begin{bsmallmatrix} \ v_1 & \ v_2 & \dots & \ v_m \end{bsmallmatrix} \begin{bsmallmatrix} 
\ \lambda_1 & 0 & \dots & 0 \\
\ 0 & \lambda_2 & \dots & 0 \\
\ \vdots & \vdots & \ddots & \vdots \\
\ 0 & 0 & \dots & \lambda_m \\
\end{bsmallmatrix} $ which is $S\Lambda$
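The identity $AS=S\Lambda$ can be checked numerically. The sketch below reuses the $3\times 3$ example matrix from this notebook; `AS` and `S_Lambda` are names introduced here for the two sides of the equation.

```python
import numpy as np

A = np.array([[4, 2, 2],
              [2, 4, 2],
              [2, 2, 4]])

eig_vals, eig_vecs = np.linalg.eig(A)

S = eig_vecs                 # columns are eigenvectors of A
Lambda = np.diag(eig_vals)   # eigenvalues on the main diagonal, same order as the columns of S

AS = A @ S                   # column j is A v_j
S_Lambda = S @ Lambda        # column j is λ_j v_j

print('AS:\n', np.round(AS, 4))
print('SΛ:\n', np.round(S_Lambda, 4))
print('AS == SΛ:', np.allclose(AS, S_Lambda))
```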

#### Assumptions:

The matrix $A$ you are trying to decompose must:
- be a square matrix
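- have $n$ linearly independent eigenvectors, one for each row of the $n\times n$ matrix (different eigenvalues guarantee this but are not required: the example $A$ above has the repeated eigenvalue 2 and is still diagonalisable)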
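One way to test the second assumption without relying on the eigenvalues being different is to count the independent eigenvectors for each eigenvalue. The sketch below assumes the rank-nullity fact that this count equals $n - \text{rank}(A-\lambda I)$; the helper `num_independent_eigvecs` is a hypothetical name introduced here, and the eigenvalues are passed in by hand.

```python
import numpy as np

def num_independent_eigvecs(M, lam):
    """Dimension of the eigenspace of M for eigenvalue lam: n - rank(M - lam*I)."""
    n = M.shape[0]
    return n - np.linalg.matrix_rank(M - lam * np.eye(n))

A = np.array([[4, 2, 2],
              [2, 4, 2],
              [2, 2, 4]])
shear = np.array([[1, 1],
                  [0, 1]])

# A has eigenvalues 2 (repeated) and 8: the counts add up to n = 3, so A is diagonalisable
a_count = num_independent_eigvecs(A, 2) + num_independent_eigvecs(A, 8)
print('Independent eigenvectors of A:', a_count, 'out of', A.shape[0])

# The shear matrix has eigenvalue 1 (repeated) but only one independent eigenvector
shear_count = num_independent_eigvecs(shear, 1)
print('Independent eigenvectors of the shear matrix:', shear_count, 'out of', shear.shape[0])
```

When the total is smaller than $n$, as for the shear matrix, no invertible $S$ exists and the matrix cannot be diagonalised.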

In [3]:
print('Recall A:\n',A)
print('\nAnd as calculated earlier for A:')
print('\nEigenvalues:', np.round(eig_vals,0))
print('\nEigenvectors:\n', np.round(eig_vecs.T,2))
print('\n------\n')

# Define a change of basis matrix S, and then diagonalise A using Lambda = S^(-1) A S
S = eig_vecs 
Lambda = np.linalg.inv(S)@A@S

print('Lambda = S^-1 @ A @ S:\n', np.round(Lambda,2))

Recall A:
 [[4 2 2]
 [2 4 2]
 [2 2 4]]

And as calculated earlier for A:

Eigenvalues: [2. 8. 2.]

Eigenvectors:
 [[-0.82  0.41  0.41]
 [ 0.58  0.58  0.58]
 [ 0.51 -0.81  0.3 ]]

------

Lambda = S^-1 @ A @ S:
 [[ 2.  0. -0.]
 [ 0.  8. -0.]
 [-0. -0.  2.]]


## Now that we know $AS=S\Lambda$, we can do:

### Diagonalisation:

- <mark>Takes $A$ matrix to produce $\Lambda$, a diagonal matrix with **eigenvalues on the main diagonal**</mark>
- Multiply both sides by $S^{-1}$ **from the left**. 

$$
\begin{align*}
S^{-1}\times AS &= S^{-1}\times S\Lambda \\
S^{-1}AS &= S^{-1}S\Lambda \\
S^{-1}AS &= \Lambda \\ \\
\Lambda &= S^{-1}AS
\end{align*}
$$

### Eigendecomposition

- After diagonalising $A$ to get $\Lambda$, <mark>we can use eigendecomposition to compute powers of $A$ (repeated multiplications of $A$ by itself) quickly</mark>
- Multiply both sides by $S^{-1}$ **from the right**

$$
\begin{align*}
AS \times S^{-1} &= S\Lambda \times S^{-1} \\
ASS^{-1} &= S\Lambda S^{-1} \\
A &= S\Lambda S^{-1}
\end{align*}
$$

Now note what happens if we raise $A$ to arbitrary powers:

$$
\begin{align*}
A^2 &= (S\Lambda S^{-1})(S\Lambda S^{-1}) \\
&= S\Lambda (S^{-1}S)\Lambda S^{-1} = S\Lambda^2 S^{-1}\\\\
\text{In general } A^k &= S\Lambda^k S^{-1}
\end{align*}
$$
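A minimal sketch of $A^k = S\Lambda^k S^{-1}$ on the example matrix, compared against repeated multiplication. It uses `np.linalg.eigh` (listed as a commented-out alternative in the earlier cell), which is an assumption that only holds because this $A$ is symmetric; the exponent `k = 5` is arbitrary.

```python
import numpy as np

A = np.array([[4, 2, 2],
              [2, 4, 2],
              [2, 2, 4]])
k = 5

# eigh is valid here only because A is symmetric; it returns orthonormal eigenvectors
eig_vals, S = np.linalg.eigh(A)

# Λ^k: raise each eigenvalue on the diagonal to the k-th power
Lambda_k = np.diag(eig_vals ** k)

A_k_eig = S @ Lambda_k @ np.linalg.inv(S)
A_k_ref = np.linalg.matrix_power(A, k)

print('A^k via eigendecomposition:\n', np.round(A_k_eig, 6))
print('A^k via repeated multiplication:\n', A_k_ref)
print('Match:', np.allclose(A_k_eig, A_k_ref))
```

The saving comes from $\Lambda^k$ needing only $n$ scalar powers, whatever the value of $k$.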

In [4]:
# Perform eigendecomposition to recover A from its eigenvectors and eigenvalues
A_eig_decomp = S@Lambda@np.linalg.inv(S)
print('A_eigendecomposed = S @ Lambda @ S^-1:\n', A_eig_decomp)

A_eigendecomposed = S @ Lambda @ S^-1:
 [[4. 2. 2.]
 [2. 4. 2.]
 [2. 2. 4.]]


# Questions

- When exactly do we use decompositions?
- What is the intuition behind an eigenvalue and eigenvector?
- Some interesting edge cases (what are the eigenvalues and eigenvectors for):
    - A rotation-only matrix like $\begin{bsmallmatrix} 0 & -1 \\ 1 & 0 \end{bsmallmatrix}$ has only imaginary eigenvalues, $\lambda=\pm i$. It has no real eigenvectors, as every non-zero real vector is rotated off its span
    - A shear matrix like $\begin{bsmallmatrix} 1 & 1 \\ 0 & 1 \end{bsmallmatrix}$: only 1 distinct eigenvalue ($\lambda = 1$, repeated twice) and only 1 independent eigenvector direction (all non-zero vectors on the x-axis are eigenvectors), so it cannot be diagonalised
    - A scaling-only matrix like $\begin{bsmallmatrix} 2 & 0 \\ 0 & 2 \end{bsmallmatrix}$: only 1 distinct eigenvalue ($\lambda = 2$), but **all** non-zero vectors in the plane are eigenvectors, and scaled by 2x
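
The rotation and scaling cases can be checked numerically. This is a minimal sketch; the random test vectors and the seed are arbitrary choices made for the example.

```python
import numpy as np

rotation = np.array([[0.0, -1.0],
                     [1.0,  0.0]])
scaling = np.array([[2.0, 0.0],
                    [0.0, 2.0]])

# Rotation by 90 degrees: the eigenvalues are purely imaginary
rot_eig_vals = np.linalg.eigvals(rotation)
print('Rotation eigenvalues:', rot_eig_vals)

# Uniform scaling: every vector v satisfies Av = 2v
rng = np.random.default_rng(0)
random_vecs = rng.standard_normal((2, 5))   # 5 arbitrary vectors stored as columns
scaled_ok = np.allclose(scaling @ random_vecs, 2 * random_vecs)
print('Scaling matrix maps every test vector v to 2v:', scaled_ok)
```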