# Lecture 2a: Eigendecomposition of Data and Systems
In this lecture, we will discuss the eigendecomposition of a square matrix and how it can be used to understand data and systems in unsupervised machine learning. There are several key ideas in this lecture:

* __Eigendecomposition__ allows us to decompose a matrix into its constituent parts, the [eigenvectors and eigenvalues](https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors). These values can help us understand the structure of the data or system represented by the matrix. We'll look at two approaches to estimate the [eigenvalues and eigenvectors](https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors) of a matrix.
* __Power iteration method__ estimates the _largest_ (in absolute value) eigenvalue and its eigenvector. Given a _diagonalizable_ matrix $\mathbf{A}$, the power iteration algorithm produces a number $\lambda$, which is the eigenvalue of $\mathbf{A}$ that is greatest in absolute value, and a nonzero vector $\mathbf{v}$, which is a corresponding eigenvector, such that $\mathbf{A}\mathbf{v} = \lambda\cdot\mathbf{v}$.
* __QR factorization__ is another approach to compute the eigendecomposition of the matrix $\mathbf{A}$. However, unlike power iteration, this approach will give all eigenvalues and eigenvectors of the matrix $\mathbf{A}$. The QR factorization algorithm relies on the [QR decomposition](https://en.wikipedia.org/wiki/QR_decomposition), which itself relies on the [Gram-Schmidt algorithm](https://en.wikipedia.org/wiki/Gram–Schmidt_process).

Lecture notes can be found [here](https://github.com/varnerlab/CHEME-5820-Lectures-Spring-2025/blob/main/lectures/week-2/L2a/docs/Notes.pdf).

## Eigendecomposition
Suppose we have a real square matrix $\mathbf{A}\in\mathbb{R}^{m\times{m}}$, which could be a measurement dataset (e.g., the columns of $\mathbf{A}$ represent feature
vectors $\mathbf{x}_{1},\dots,\mathbf{x}_{m}$), an incidence array in a graph, etc. Eigenvalue-eigenvector problems involve finding a set of scalar values $\left\{\lambda_{1},\dots,\lambda_{m}\right\}$ called
[eigenvalues](https://mathworld.wolfram.com/Eigenvalue.html) and a set of linearly independent vectors
$\left\{\mathbf{v}_{1},\dots,\mathbf{v}_{m}\right\}$ called [eigenvectors](https://mathworld.wolfram.com/Eigenvector.html) such that:
$$
\begin{equation}
\mathbf{A}\cdot\mathbf{v}_{j} = \lambda_{j}\cdot\mathbf{v}_{j}\qquad{j=1,2,\dots,m}
\end{equation}
$$
where $\mathbf{v}_{j}\in\mathbb{R}^{m}$ and $\lambda_{j}\in\mathbb{R}$. (In general, a real matrix can have complex eigenvalues and eigenvectors; they are guaranteed to be real when $\mathbf{A}$ is symmetric.) So, why is this interesting?
* Eigenvectors represent fundamental directions of the matrix $\mathbf{A}$. For the linear transformation defined by a matrix $\mathbf{A}$, the eigenvectors are the only nonzero vectors that stay on the same line through the origin during the transformation: they are scaled (and flipped if the eigenvalue is negative) but not rotated.
* Each eigenvalue is the scale factor for its eigenvector. An eigenvalue is a scalar that indicates how much the corresponding eigenvector is stretched or compressed during the linear transformation represented by the matrix $\mathbf{A}$.

Another interpretation we'll explore later is that eigenvectors represent the most critical directions in the data or system, and the eigenvalues represent the importance of these directions.
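
As a quick sanity check of the defining relation $\mathbf{A}\cdot\mathbf{v}_{j} = \lambda_{j}\cdot\mathbf{v}_{j}$, here is a minimal sketch in plain Python. The $2\times{2}$ matrix, its eigenpairs, and the helper name `matvec` are illustrative choices made for this example, not part of the lecture materials.

```python
# Minimal check of A v = lambda v for a hand-picked 2x2 symmetric matrix.
# The matrix and eigenpairs below are illustrative assumptions.

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

A = [[2.0, 1.0],
     [1.0, 2.0]]

# Eigenpairs of this particular matrix: (eigenvalue, eigenvector)
eigenpairs = [(3.0, [1.0, 1.0]), (1.0, [1.0, -1.0])]

for lam, v in eigenpairs:
    Av = matvec(A, v)                    # left-hand side:  A v
    scaled = [lam * v_i for v_i in v]    # right-hand side: lambda v
    print(f"lambda = {lam}: A v = {Av}, lambda v = {scaled}")

# A vector that is not an eigenvector does change direction under A
w = [1.0, 0.0]
Aw = matvec(A, w)  # not a scalar multiple of w
print(f"w = {w}, A w = {Aw}")
```

For each eigenpair the two sides agree, while the vector `w` is mapped to a vector pointing in a different direction.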

## Power iteration
The [power iteration method](https://en.wikipedia.org/wiki/Power_iteration) is an iterative algorithm to estimate the largest (in absolute value) eigenvalue and its corresponding eigenvector of a square matrix. We'll consider only real-valued matrices here, but this approach can also be used for matrices with complex entries.

__Eigenvector__: Suppose we have a real-valued square _diagonalizable_ matrix $\mathbf{A}\in\mathbb{R}^{m\times{m}}$ whose eigenvalues have the property $|\lambda_{1}|>|\lambda_{2}|\geq\dots\geq|\lambda_{m}|$. Then, the eigenvector $\mathbf{v}_{1}$, which corresponds to the largest (in absolute value) eigenvalue $\lambda_{1}$, can be (iteratively) estimated as:
$$
\mathbf{v}_{1}^{(k+1)} = \frac{\mathbf{A}\mathbf{v}_{1}^{(k)}}{\lVert \mathbf{A}\mathbf{v}_{1}^{(k)} \rVert}\quad{k=0,1,2\dots}
$$

where $\lVert \star \rVert$ denotes [some vector norm](https://mathworld.wolfram.com/VectorNorm.html); typically, we'll use the [L2 (Euclidean) norm](https://mathworld.wolfram.com/L2-Norm.html). The [power iteration method](https://en.wikipedia.org/wiki/Power_iteration) will converge as $k\rightarrow\infty$ when a few properties are true, namely, the dominant eigenvalue is strictly larger in magnitude than the rest, i.e., $|\lambda_{2}|/|\lambda_{1}| < 1$, and the initial guess $\mathbf{v}_{1}^{(0)}$ has a nonzero component in the direction of $\mathbf{v}_{1}$.
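
To see where the convergence requirement comes from, one can expand the initial guess in the eigenvector basis. This sketch assumes $\mathbf{A}$ is diagonalizable, so that such a basis exists; the coefficients $c_{j}$ are notation introduced here for illustration:
$$
\mathbf{v}_{1}^{(0)} = \sum_{j=1}^{m} c_{j}\mathbf{v}_{j}
\quad\Longrightarrow\quad
\mathbf{A}^{k}\mathbf{v}_{1}^{(0)} = \sum_{j=1}^{m} c_{j}\lambda_{j}^{k}\mathbf{v}_{j}
= \lambda_{1}^{k}\left(c_{1}\mathbf{v}_{1} + \sum_{j=2}^{m} c_{j}\left(\frac{\lambda_{j}}{\lambda_{1}}\right)^{k}\mathbf{v}_{j}\right)
$$
If $c_{1}\neq{0}$ and $|\lambda_{j}/\lambda_{1}| < 1$ for all $j\geq{2}$, every term in the remaining sum decays as $k$ grows, so the normalized iterate lines up with $\mathbf{v}_{1}$. The slowest-decaying term shrinks like $|\lambda_{2}/\lambda_{1}|^{k}$, which sets the convergence rate.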

__Algorithm__
* __Initialization__. We begin (iteration $k=0$) with an initial guess of the eigenvector $\mathbf{v}_{1}^{(0)}$, which can be randomly chosen or chosen to be an approximation of the dominant eigenvector. 
* __Update__: Next, we repeatedly multiply this vector by the matrix $\mathbf{A}$ and normalize the result. This iterative approach capitalizes on the property that the dominant eigenvalue will exert the most influence on the vector $\mathbf{v}$ over successive iterations, allowing it to converge towards the eigenvector associated with the largest eigenvalue.
* __Stopping__: We stop the iteration procedure after a fixed number of iterations is reached or when the difference between successive iterations is _small_ in some sense, i.e., $\lVert \mathbf{v}_{1}^{(k)} - \mathbf{v}_{1}^{(k-1)} \rVert\leq\epsilon$. In practice, we'll use both stopping criteria.
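
The three steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `power_iteration`, `matvec`, and `norm2`, the default tolerance and iteration cap, and the fixed random seed are all choices made for this example. Two details go beyond the description above and are labeled in the comments: the sign of each iterate is aligned with the previous one (so that a negative dominant eigenvalue does not make the iterate flip back and forth), and the eigenvalue is estimated with the Rayleigh quotient $\mathbf{v}^{\top}\mathbf{A}\mathbf{v}$ for a unit-norm $\mathbf{v}$.

```python
import math
import random

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def norm2(v):
    """L2 (Euclidean) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def power_iteration(A, max_iter=1000, tol=1e-10, seed=0):
    """Estimate the dominant eigenpair of a square matrix A (list of rows)."""
    # Initialization: random unit-norm guess (seed fixed for reproducibility)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(len(A))]
    nv = norm2(v)
    v = [x / nv for x in v]

    k = 0
    for k in range(1, max_iter + 1):
        # Update: multiply by A, then normalize
        w = matvec(A, v)
        nw = norm2(w)
        if nw == 0.0:
            raise ValueError("A v is the zero vector; pick a different initial guess")
        v_new = [x / nw for x in w]

        # Assumption (not in the text above): align the sign with the previous
        # iterate so a negative dominant eigenvalue does not cause flipping
        if sum(a * b for a, b in zip(v_new, v)) < 0.0:
            v_new = [-x for x in v_new]

        # Stopping: small change between successive iterates, or max_iter reached
        diff = norm2([a - b for a, b in zip(v_new, v)])
        v = v_new
        if diff <= tol:
            break

    # Assumption: eigenvalue estimate via the Rayleigh quotient (v has unit norm)
    lam = sum(a * b for a, b in zip(v, matvec(A, v)))
    return lam, v, k

# Usage on an illustrative 2x2 symmetric matrix with eigenvalues 3 and 1
A = [[2.0, 1.0],
     [1.0, 2.0]]
lam, v, iterations = power_iteration(A)
print(f"eigenvalue estimate: {lam:.6f} after {iterations} iterations")
print(f"eigenvector estimate: {v}")
```

Both stopping criteria are used together here: the loop ends when successive iterates agree to within `tol` or when `max_iter` is reached, whichever comes first.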

While simple and efficient, especially for large sparse matrices, the [power iteration method](https://en.wikipedia.org/wiki/Power_iteration) may converge slowly, particularly when the largest eigenvalue is close in magnitude to the next-largest eigenvalue.
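
The effect of the eigenvalue gap on convergence speed can be illustrated with a small experiment. This is a sketch under assumptions: the helper `iterations_to_converge`, the tolerance, the starting vector, and the two diagonal test matrices (chosen so their eigenvalues can be read off the diagonal) are all hypothetical choices for this example.

```python
import math

def iterations_to_converge(A, v, tol=1e-8, max_iter=10_000):
    """Count power-iteration steps until successive iterates differ by <= tol."""
    for k in range(1, max_iter + 1):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]   # A v
        nw = math.sqrt(sum(x * x for x in w))                   # ||A v||
        v_new = [x / nw for x in w]                             # normalize
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(v_new, v)))
        v = v_new
        if diff <= tol:
            return k
    return max_iter

v0 = [1.0, 1.0]  # has a nonzero component along the dominant eigenvector [1, 0]

# Diagonal matrices: eigenvalues are the diagonal entries
well_separated = [[1.0, 0.0], [0.0, 0.5]]    # |lambda_2| / |lambda_1| = 0.5
nearly_equal   = [[1.0, 0.0], [0.0, 0.95]]   # |lambda_2| / |lambda_1| = 0.95

n_fast = iterations_to_converge(well_separated, v0)
n_slow = iterations_to_converge(nearly_equal, v0)
print(f"ratio 0.50: {n_fast} iterations")
print(f"ratio 0.95: {n_slow} iterations")
```

The matrix whose two eigenvalues are close in magnitude needs many more iterations to meet the same tolerance.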

Additional references:
* https://www.cs.cornell.edu/~bindel/class/cs6210-f16/lec/2016-10-17.pdf
* https://blogs.sas.com/content/iml/2012/05/09/the-power-method.html

## QR factorization and Gram-Schmidt
Fill me in