# Lecture 3: Eigenvalues and eigenvectors

## Recap of the previous lecture
- Vector norms
- Matrix norms
- Scalar product
- Unitary matrices
- LU decomposition

## Today 

- Eigenvalues and eigenvectors 
- Similar matrices
- Eigendecomposition
- Matrix Powers

## Eigenvalues: The Key Idea

If we can find a solution $x \ne 0$ to

$$
Ax = \lambda x
$$

then, for this vector, the matrix $A$ **acts like a scalar**.  $x$ is called an **eigenvector** of $A$, and $\lambda$ is called an **eigenvalue**.

In fact, for an $m \times m$ matrix $A$, we typically find $m$ linearly independent eigenvectors $x_1,x_2,\ldots,x_m$ and $m$ corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$.   Such a matrix is called **diagonalizable**.  Most matrices are diagonalizable; we will deal with the rare "defective" (non-diagonalizable) case later.

Given such a **basis of eigenvectors**, the key idea for using them is:

1. Take any vector $x$ and expand it in this basis: $x = c_1 x_1 + \cdots + c_m x_m$, i.e. $x = Xc$, so $c = X^{-1}x$, where $X$ is the matrix whose *columns are the eigenvectors*.

2. For each eigenvector $x_k$, the matrix $A$ acts like a scalar $\lambda_k$.  Multiplying or dividing by $A$ corresponds to multiplying or dividing $x_k$ by $\lambda_k$.  **Solve your problem for each eigenvector by treating A as the scalar λ**, then add up the results weighted by the coefficients $c_k$.
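The two steps above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the matrix `A` and the vector `x` are arbitrary choices made here, not examples from the lecture.

```python
import numpy as np

# Arbitrary illustrative matrix and vector (assumptions, not from the lecture)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([3.0, -1.0])

lam, X = np.linalg.eig(A)      # eigenvalues lam[k]; eigenvectors are the columns of X

# Step 1: expand x in the eigenvector basis, c = X^{-1} x (solve X c = x)
c = np.linalg.solve(X, x)

# Step 2: on each eigenvector, A acts like the scalar lam[k]
Ax_via_eig = X @ (lam * c)     # "multiply by A": scale each coefficient by lam[k]
y_via_eig = X @ (c / lam)      # "divide by A" (solve A y = x): divide each coefficient by lam[k]

print(np.allclose(Ax_via_eig, A @ x))   # agrees with the direct product
print(np.allclose(A @ y_via_eig, x))    # y_via_eig solves A y = x
```

Multiplying the scaled coefficients by `X` at the end is what recombines the per-eigenvector answers into a single vector.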

## The characteristic polynomial

To *find* the eigenvalues, one approach is to realize that $Ax = \lambda x$ means:

$$
(A - \lambda I) x = 0 \, ,
$$

so the matrix $A - \lambda I$ has a **non-trivial kernel** and must be **singular**.

That means its **determinant** vanishes:

$$ p(\lambda) = \det(A - \lambda I) = 0, $$

where $p(\lambda)$ is the **characteristic polynomial of A: a polynomial of degree m** if $A$ is $m \times m$.  

The **roots of this polynomial are the eigenvalues λ**.

A polynomial of degree $m$ has exactly $m$ roots in the complex plane when counted with multiplicity, and typically these $m$ roots are distinct.  **This is why most matrices have $m$ distinct eigenvalues**; eigenvectors belonging to distinct eigenvalues are linearly independent, so such matrices are **diagonalizable**.
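To make the link between roots and eigenvalues concrete, here is a small NumPy sketch. The $3 \times 3$ matrix is an arbitrary symmetric example chosen here (an assumption), and the helper name `p` is hypothetical. The sketch samples $p(\lambda) = \det(A - \lambda I)$ at $m+1$ points, recovers the degree-$m$ polynomial by interpolation, and compares its roots with the eigenvalues NumPy computes directly.

```python
import numpy as np

# Arbitrary symmetric example (assumption); symmetric so that all eigenvalues are real
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
m = A.shape[0]

def p(lam):
    """Characteristic polynomial p(lam) = det(A - lam*I)."""
    return np.linalg.det(A - lam * np.eye(m))

# A degree-m polynomial is fixed by m+1 samples, so interpolation recovers p
samples = np.arange(m + 1, dtype=float)
coeffs = np.polyfit(samples, [p(t) for t in samples], deg=m)

roots = np.sort(np.roots(coeffs).real)             # roots of the characteristic polynomial
eigenvalues = np.sort(np.linalg.eigvals(A).real)   # eigenvalues computed directly

print(coeffs)               # highest-degree coefficient first
print(roots, eigenvalues)
```

For this matrix the polynomial works out by hand to $-\lambda^3 + 9\lambda^2 - 24\lambda + 18$, with leading coefficient $(-1)^m$.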

## Recall the determinant
The determinant of a square matrix $A$ is defined as 

$$\det A = \sum_{\sigma \in S_n} \mathrm{sgn}({\sigma})\prod^n_{i=1} a_{i, \sigma_i},$$

where 
- $S_n$ is the set of all **permutations** of the numbers $1, \ldots, n$
- $\mathrm{sgn}(\sigma)$ is the **signature** of the permutation: $(-1)^p$, where $p$ is the number of transpositions needed to build $\sigma$. Equivalently, the sign is negative iff $\sigma$ has an odd number of misordered pairs of indices.
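The permutation-sum definition can be turned into code almost literally. The sketch below is for illustration only: the function names `permutation_sign` and `leibniz_det` are hypothetical, the test matrix is an arbitrary choice, and the cost grows like $n!$, so this is not how determinants are computed in practice.

```python
import itertools
import numpy as np

def permutation_sign(perm):
    """Return +1 or -1: (-1) raised to the number of misordered pairs."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def leibniz_det(A):
    """Determinant from the permutation-sum definition; only sensible for tiny n."""
    n = A.shape[0]
    total = 0.0
    for sigma in itertools.permutations(range(n)):
        term = permutation_sign(sigma)
        for i in range(n):
            term *= A[i, sigma[i]]      # one entry from each row i, column sigma(i)
        total += term
    return total

# Arbitrary 3x3 example (assumption)
A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 2.0],
              [1.0, 1.0, 4.0]])
print(leibniz_det(A), np.linalg.det(A))
```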

<img src="determinant.png" style="width: 700px;">  

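## Properties of the determinant

- $\det(AB) = \det(A) \det(B)$
- For a diagonalizable $A = X \Lambda X^{-1}$, where $\Lambda$ is the diagonal matrix of eigenvalues, $\det A = \det \Lambda = \lambda_1 \lambda_2 \cdots \lambda_m$.  That is, the **determinant is the product of the eigenvalues**.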
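Both properties are easy to check numerically. The following is a minimal sketch assuming random $4 \times 4$ test matrices with a fixed seed (an arbitrary choice); the eigenvalues of a real matrix may be complex, so the product is compared up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, arbitrary choice
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# det(AB) = det(A) det(B)
product_rule_holds = np.isclose(np.linalg.det(A @ B),
                                np.linalg.det(A) * np.linalg.det(B))

# det(A) = product of the eigenvalues (complex ones appear in conjugate pairs)
lam = np.linalg.eigvals(A)
det_is_eig_product = np.isclose(np.prod(lam), np.linalg.det(A))

print(product_rule_holds, det_is_eig_product)
```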

## Similar matrices

B is **similar** to A if and only if $B = S^{-1} A S$ for some invertible matrix S.

It also follows that $A = SBS^{-1} = (S^{-1})^{-1} B (S^{-1})$, i.e. if B is similar to A using S, then A is similar to B using $S^{-1}$.

Characteristic polynomial:

$$
\det(A - \lambda I) = \det(SBS^{-1} - \lambda I) = \det \left[ S (B - \lambda I) S^{-1}   \right]
= \det(S) \det(B - \lambda I) \det(S^{-1}) = \det(B - \lambda I)
$$

i.e. **similar matrices have the same characteristic polynomial**.

Setting $\lambda = 0$ gives $\det A = \det B$, i.e. **similar matrices have the same determinant**.
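As a quick numerical illustration of the identity above, the sketch below builds $A = SBS^{-1}$ from a small $B$ and $S$ (both arbitrary choices made here, not from the lecture) and evaluates the two characteristic polynomials at a handful of sample values of $\lambda$.

```python
import numpy as np

# Arbitrary illustrative matrices (assumptions)
B = np.array([[2.0, 0.0],
              [1.0, 3.0]])
S = np.array([[1.0, 2.0],
              [3.0, 5.0]])              # det(S) = -1, so S is invertible
A = S @ B @ np.linalg.inv(S)            # A is similar to B

I = np.eye(2)
ts = np.linspace(-3.0, 3.0, 7)          # sample values of lambda
p_A = np.array([np.linalg.det(A - t * I) for t in ts])
p_B = np.array([np.linalg.det(B - t * I) for t in ts])

print(np.allclose(p_A, p_B))                              # same characteristic polynomial
print(np.isclose(np.linalg.det(A), np.linalg.det(B)))     # same determinant
```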

## Trace = Sum of λ’s
The trace is defined as the **sum of the diagonal elements** of any matrix.  By plugging in the definition of matrix multiplication, one can quickly show that the trace has a crucial property:

$$
\operatorname{trace} (AB) = \operatorname{trace}(BA)
$$


It follows that **similar matrices have the same trace**, since if $A=SBS^{-1}$ then 

$$
\operatorname{trace} (A) = \operatorname{trace}(SBS^{-1}) = \operatorname{trace}(S^{-1}SB) = \operatorname{trace}(B)
$$

In particular, a diagonalizable $A = X \Lambda X^{-1}$ is similar to $\Lambda$, so the **trace of a matrix is the sum of the eigenvalues**!
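The trace identities can be checked the same way. This is a minimal sketch assuming random $4 \times 4$ matrices with a fixed seed (an arbitrary choice).

```python
import numpy as np

rng = np.random.default_rng(1)           # fixed seed, arbitrary choice
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# trace(AB) = trace(BA), even though AB != BA in general
cyclic_property_holds = np.isclose(np.trace(A @ B), np.trace(B @ A))

# trace(A) = sum of the eigenvalues (imaginary parts of conjugate pairs cancel)
lam = np.linalg.eigvals(A)
trace_is_eig_sum = np.isclose(np.sum(lam), np.trace(A))

print(cyclic_property_holds, trace_is_eig_sum)
```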

## Eigenvalue example

For example, consider the matrix

$$
A = \begin{pmatrix} 1 & 1 \\ -2 & 4 \end{pmatrix}
$$

The characteristic polynomial is

$$
\det(A - \lambda I) = \det \begin{pmatrix} 1 - \lambda & 1 \\ -2 & 4 - \lambda \end{pmatrix} = (1 - \lambda)(4 - \lambda) - (1)(-2) = \lambda^2 - 5\lambda + 6 = (\lambda - 2) (\lambda - 3)
$$

where we have used high-school algebra to factor the polynomial.   Hence its roots, the eigenvalues of $A$, are $\lambda \in \{2, 3\}$.  As a check, $2 + 3 = 5$ is the trace of $A$ and $2 \cdot 3 = 6$ is its determinant.
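The hand calculation can be confirmed numerically. A minimal NumPy sketch: it finds the roots of $\lambda^2 - 5\lambda + 6$ and, separately, asks NumPy for the eigenvalues of $A$ directly.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [-2.0, 4.0]])

# Roots of lambda^2 - 5*lambda + 6 (coefficients listed from highest degree down)
char_poly_roots = np.sort(np.roots([1.0, -5.0, 6.0]))

# Eigenvalues computed directly from the matrix
eigenvalues = np.sort(np.linalg.eigvals(A))

print(char_poly_roots, eigenvalues)
```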

## Eigenvectors

Once we have the eigenvalues, finding the eigenvectors is (in principle) easy: **the eigenvectors are just a basis for the nullspace** of

$$
A - \lambda I
$$

when $\lambda$ is an eigenvalue.

For example, with the matrix above, let's take the eigenvalue $\lambda_1 = 2$:

$$
A - 2I = \begin{pmatrix} -1 & 1 \\ -2 & 2 \end{pmatrix}
$$

We could go through Gaussian elimination to find the nullspace, but we can see by inspection that the second column is minus the first, hence $x_1 = (1, 1)$ is a basis for the nullspace:

$$
(A - 2I) x_1 = \begin{pmatrix} -1 & 1 \\ -2 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
$$

or

$$
A x_1 = 2 x_1
$$

as desired.  $x_1 = (1, 1)$ is an eigenvector!

For the other eigenvalue, $\lambda = 3$, we get:

$$
A - 3I = \begin{pmatrix} -2 & 1 \\ -2 & 1 \end{pmatrix}
$$

Here the first column is $-2$ times the second, so a basis for the nullspace is $x_2 = (1, 2)$, and $A x_2 = 3 x_2$.
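A short NumPy check of both eigenvectors follows. `np.linalg.eig` returns eigenvectors normalized to unit length (and possibly with the opposite sign), so the sketch tests whether each computed vector is parallel to the hand-computed one rather than equal to it.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [-2.0, 4.0]])
x1 = np.array([1.0, 1.0])    # hand-computed eigenvector for lambda = 2
x2 = np.array([1.0, 2.0])    # hand-computed eigenvector for lambda = 3

print(A @ x1, 2 * x1)        # these two should agree
print(A @ x2, 3 * x2)        # and so should these

lam, V = np.linalg.eig(A)    # eigenvectors are the columns of V
for lam_k, v_k in zip(lam, V.T):
    hand = x1 if np.isclose(lam_k, 2.0) else x2
    # Two vectors in the plane are parallel iff the 2x2 determinant [hand, v_k] vanishes
    parallel = np.isclose(np.linalg.det(np.column_stack([hand, v_k])), 0.0)
    print(lam_k, parallel)
```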

## Complex eigenvalues

If we change the matrix to:
$$
\begin{pmatrix} 1 & 3 \\ -2 & 4 \end{pmatrix}
$$
we get a characteristic polynomial:
$$
\det \begin{pmatrix} 1 - \lambda & 3 \\ -2 & 4 - \lambda \end{pmatrix} = (1 - \lambda)(4 - \lambda) - (3)(-2) = \lambda^2 - 5\lambda + 10
$$
whose roots, from the quadratic formula, are:
$$
\lambda = \frac{5 \pm \sqrt{5^2 - 40}}{2} = \frac{5 \pm \sqrt{-15}}{2}
$$
which are complex!

**Eigenvalues may be complex numbers, even for real matrices**. 

For real matrices, they are the [roots of a real polynomial](https://en.wikipedia.org/wiki/Complex_conjugate_root_theorem) and hence come in [complex conjugate pairs](https://en.wikipedia.org/wiki/Complex_conjugate).
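A minimal NumPy sketch confirming the complex-conjugate pair for this matrix; the eigenvalues are sorted by imaginary part only so that they can be compared with the quadratic-formula result.

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [-2.0, 4.0]])

lam = np.linalg.eigvals(A)               # complex array for this real matrix
lam = lam[np.argsort(lam.imag)]          # order: negative imaginary part first

# Quadratic formula: (5 +/- sqrt(-15)) / 2
expected = np.array([5 - 1j * np.sqrt(15), 5 + 1j * np.sqrt(15)]) / 2

print(lam)
print(np.allclose(lam, expected))
print(np.isclose(lam[0], np.conj(lam[1])))   # the pair is complex-conjugate
```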

## Computing eigenvalues = polynomial roots = hard

* Everyone learns the [quadratic formula](https://en.wikipedia.org/wiki/Quadratic_formula) to find roots of a quadratic (degree-2) polynomial.

* There is a (horrible) [cubic formula](https://en.wikipedia.org/wiki/Cubic_function) to find the roots of any cubic (degree-3) polynomial.

* There is a (terrifying) [quartic formula](https://en.wikipedia.org/wiki/Quartic_function) to find the roots of any quartic (degree-4) polynomial.

* There is **no formula** (in terms of a *finite number* of ±,×,÷,ⁿ√) for the roots of an **arbitrary quintic** polynomial or **any degree ≥ 5**.  This is the [Abel–Ruffini theorem](https://en.wikipedia.org/wiki/Abel%E2%80%93Ruffini_theorem), proved in the 19th century.

In practice, computing eigenvalues by hand, especially by this method, is even more pointless than doing Gaussian elimination by hand, for the reasons listed above, so I will **focus more on the properties of eigenvalues and how to use them than how to compute them.**  The computer will give us their values.

## Transpose: Same eigenvalues!

One of the properties of determinant is that $\det A^T = \det A$.   It follows that
$$\det(A-\lambda I) = \det\left[ (A -\lambda I)^T \right] = \det (A^T - \lambda I)$$
and therefore $A$ and $A^T$ have the **same eigenvalues!**  (They have the **same characteristic polynomial**.)

However, $A$ and $A^T$ in general have **different eigenvectors**. 

$N(A - \lambda I) \ne N(A^T - \lambda I)$ in general. 
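Using the $2 \times 2$ example matrix from earlier, a brief NumPy sketch shows both statements: the eigenvalues of $A$ and $A^T$ coincide, while an eigenvector of $A$ fails to be an eigenvector of $A^T$.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [-2.0, 4.0]])

lam_A, V_A = np.linalg.eig(A)
lam_T, V_T = np.linalg.eig(A.T)

# Same eigenvalues (compare after sorting, since the order is not guaranteed)
same_eigenvalues = np.allclose(np.sort(lam_A), np.sort(lam_T))

# Take one eigenvector of A and test it against A^T with the same eigenvalue
v = V_A[:, 0]
is_eigvec_of_A = np.allclose(A @ v, lam_A[0] * v)
is_eigvec_of_AT = np.allclose(A.T @ v, lam_A[0] * v)

print(same_eigenvalues, is_eigvec_of_A, is_eigvec_of_AT)
```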

## Eigendecomposition
If a matrix $A$ of size $n\times n$ has $n$ linearly independent eigenvectors $s_i$, $i=1,\dots,n$:

$$ As_i = \lambda_i s_i, $$

then this can be written as

$$ A S = S \Lambda, \quad\text{where}\quad S=(s_1,\dots,s_n), \quad \Lambda = \text{diag}(\lambda_1, \dots, \lambda_n), $$

or equivalently

$$ A = S\Lambda S^{-1}. $$

- This is called the **eigendecomposition** of a matrix. Matrices that can be represented by their eigendecomposition are called **diagonalizable**.
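As a sketch of the factorization, the $2 \times 2$ example from earlier in the lecture can be rebuilt from its hand-computed eigenvectors $(1,1)$, $(1,2)$ and eigenvalues $2$, $3$, and also from the output of `np.linalg.eig`. The two eigenvector matrices differ by column scaling, yet both reproduce $A$.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [-2.0, 4.0]])

# Hand-computed eigenvectors as columns of S, eigenvalues on the diagonal of Lam
S = np.array([[1.0, 1.0],
              [1.0, 2.0]])
Lam = np.diag([2.0, 3.0])
A_from_hand = S @ Lam @ np.linalg.inv(S)

# Same construction from NumPy's (unit-length) eigenvectors
lam, V = np.linalg.eig(A)
A_from_numpy = V @ np.diag(lam) @ np.linalg.inv(V)

print(np.allclose(A @ S, S @ Lam))       # A S = S Lam
print(np.allclose(A_from_hand, A))
print(np.allclose(A_from_numpy, A))
```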

## Diagonalization and Matrix Powers

If $Ax = \lambda x$, then applying $A$ repeatedly gives $A^n x = \lambda^n x$ (for positive $n$, and also for negative $n$ when $A$ is invertible): **if x is an eigenvector of A, then it is also an eigenvector of Aⁿ with the eigenvalue raised to the n-th power**.

There is another cute way to see this for diagonalizable matrices.  If $A = X\Lambda X^{-1}$, then for $n \ge 0$

$$
A^n = \underbrace{AAA\cdots A}_{n\mbox{ times}} 
    = \underbrace{X\Lambda X^{-1}X\Lambda X^{-1}X\Lambda X^{-1}\cdots X\Lambda X^{-1}}_{n\mbox{ times}} = X\Lambda^n X^{-1}
$$
because all of the $X$ terms *cancel* except the first and last ones.  $\Lambda^n$ is just the diagonal matrix with $\lambda_1^n, \lambda_2^n, \ldots$ on the diagonal.

So, since we have the diagonalization of $A^n$, we immediately see that its eigenvectors are the columns of $X$ (same as for $A$) and its eigenvalues are $\lambda_k^n$.

If $A$ is invertible (all $\lambda_k \ne 0$), then $A^{-1}x = \lambda^{-1} x$ for an eigenvector $x$, and it immediately follows that $A^{-1} = X \Lambda^{-1} X^{-1}$ where $\Lambda^{-1}$ is the diagonal matrix with $\lambda_k^{-1}$ on the diagonal.  Similarly for $A^{-n} = X \Lambda^{-n} X^{-1}$.
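A minimal NumPy sketch of these formulas, using the $2 \times 2$ example matrix from earlier; the exponent `n = 5` is an arbitrary choice. The results are compared against NumPy's own `matrix_power` and `inv`.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [-2.0, 4.0]])
lam, X = np.linalg.eig(A)
X_inv = np.linalg.inv(X)

n = 5                                            # arbitrary positive power
A_pow = X @ np.diag(lam ** n) @ X_inv            # X Lam^n X^{-1}
A_inv = X @ np.diag(1.0 / lam) @ X_inv           # X Lam^{-1} X^{-1}; needs all lam != 0

print(np.allclose(A_pow, np.linalg.matrix_power(A, n)))
print(np.allclose(A_inv, np.linalg.inv(A)))
```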

## Matrix square roots

One really cool thing that diagonalization allows us to do easily is to compute $A^n$ for **non-integer powers n**.   For example, we can now see how to find the [square root of a matrix](https://en.wikipedia.org/wiki/Square_root_of_a_matrix), at least for diagonalizable matrices.

If $A$ is a square matrix, its square root $\sqrt{A} = A^{1/2}$ is just a matrix such that $A^{1/2} A^{1/2} = A$.  But how would we find such a matrix?

As usual, for eigenvalues it is easy: if $Ax=\lambda x$, then we obviously want $A^{1/2} x = \lambda^{1/2} x$.  If $A$ is diagonalizable and we do this for *every* eigenvalue/vector, we get the diagonalization:

$$
\sqrt{A} = A^{1/2} = X \underbrace{\begin{pmatrix} \sqrt{\lambda_1} & & & \\ & \sqrt{\lambda_2} & & \\ & & \ddots & \\ & & & \sqrt{\lambda_m} \end{pmatrix}}_{\sqrt{\Lambda}} X^{-1}
$$
(Obviously, this may give a complex matrix if any eigenvalues are negative or complex.)

Does this have the property that $A^{1/2} A^{1/2} = A$? Yes!  $X \sqrt{\Lambda} X^{-1} X \sqrt{\Lambda} X^{-1} = X \sqrt{\Lambda} \sqrt{\Lambda} X^{-1} = A$, since obviously $\sqrt{\Lambda} \sqrt{\Lambda} = \Lambda$ from the definition above.

Let's try it:
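Here is a minimal NumPy sketch, assuming the $2 \times 2$ example matrix from earlier in the lecture (eigenvalues $2$ and $3$, both positive). The eigenvalues are cast to complex before taking square roots so that the same code would also cope with negative or complex eigenvalues.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [-2.0, 4.0]])

lam, X = np.linalg.eig(A)

# sqrt(Lam): square roots of the eigenvalues on the diagonal
sqrt_Lam = np.diag(np.sqrt(lam.astype(complex)))
sqrt_A = X @ sqrt_Lam @ np.linalg.inv(X)

print(sqrt_A)
print(np.allclose(sqrt_A @ sqrt_A, A))   # squaring the result recovers A
```

Because both eigenvalues are positive here, the imaginary parts of `sqrt_A` are zero up to rounding.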

## Next lecture
- Derivatives
- Partial derivatives
- Free maxima and minima 

## Questions?