# Lecture 2a: Eigendecomposition of Data and Systems
Last week, we looked at [K-means clustering](https://en.wikipedia.org/wiki/K-means_clustering) to group data based on the (distance) similarity of the features. Thus, we imposed an ordering on the data with the presumption that similar things will be close together. Today, we take a different perspective: let's let the data tell us which should be grouped and what combinations of features are most (or least) important.

In this lecture, we will discuss the eigendecomposition of a square matrix and how it can be used to understand data and systems in unsupervised machine learning. There are several key ideas in this lecture:

* __Eigendecomposition__ allows us to decompose a matrix into its constituent parts, the [eigenvectors and eigenvalues](https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors). These values help us understand the structure of the data or system represented by the matrix. We'll look at two approaches to estimate the [eigenvalues and eigenvectors](https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors) of a matrix:
    * __Power iteration method__ estimates the _largest_ eigenvalue/eigenvector pair. Given a _diagonalizable_ matrix $\mathbf{A}$ the power iteration algorithm will produce a number $\lambda$, which is the greatest (in absolute value) eigenvalue of $\mathbf{A}$ and a nonzero vector $\mathbf{v}$ which is a corresponding eigenvector of $\lambda$ such that $\mathbf{A}\mathbf{v} = \lambda\cdot\mathbf{v}$.
    * __QR iteration__ is another approach to compute the eigendecomposition of the matrix $\mathbf{A}$. However, unlike power iteration, this approach will give all eigenvalues and eigenvectors of the matrix $\mathbf{A}$. The QR factorization algorithm relies on the [QR decomposition](https://en.wikipedia.org/wiki/QR_decomposition), which itself relies on the [Gram-Schmidt algorithm](https://en.wikipedia.org/wiki/Gram–Schmidt_process).
* __Buy versus build__: While we will explore these two approaches to compute the eigendecomposition of a matrix in the lecture and associated lab, most, if not all, computing platforms already have built-in functionality to do this computation. For example, [Julia has the `eigen(...)` function exported by the LinearAlgebra.jl package](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.eigen). Most of the time, there is no need to reinvent the wheel (and your implementation will likely be worse in space and time complexity). So, when a good buy option exists, take it.

Lecture notes can be found: [here!](https://github.com/varnerlab/CHEME-5820-Lectures-Spring-2025/blob/main/lectures/week-2/L2a/docs/Notes.pdf)

## Setup, Data and Prerequisites
We set up the computational environment by including the `Include.jl` file, loading any needed resources, such as sample datasets, and setting up any required constants. The `Include.jl` file loads external packages, various functions that we will use in the exercise, and custom types to model the components of our problem.

In [3]:
include("Include.jl");

### Data
We'll use samples from a coagulation dataset taken before and during pregnancy. The original data [was generated in the lab of Prof. Bravo at the University of Vermont Medical Center](https://www.linkedin.com/in/maria-cristina-bravo-2489351/). We then developed a probabilistic model of the data and generated various synthetic measurement sets using this model, which we'll look at today. 

Let's load a `visit 1` (pre-pregnancy, default) dataset from disk using [the `MySyntheticDataset()` function](src/Files.jl).

In [5]:
dataset = MySyntheticDataset() |> d-> d["ensemble"]; # this is loading visit 1 data (baseline, non-pregnant) by default

The keys of the dataset dictionary are the `actual` patient indexes. These keys point to `synthetic` patient measurement vectors constructed by building a model of the original data distribution. To explore this data, specify an original patient index (one of the keys of the original dictionary) in the `original_patient_index::Int` variable

In [7]:
original_patient_index = 7; # i ∈ {keys}

Next, we'll build a data matrix with the `synthetic` measurement vectors for the specified original patient index. We'll store this in the `D::Array{<:Number, 2}` matrix. We'll have `100` synthetic patients, each with `32` clinical measurements. The record at index `0` is the original patient record, so `D` has `101` rows in total.

We'll [z-score center each feature of the dataset](https://en.wikipedia.org/wiki/Feature_scaling) when the `zscore_convert_data_flag::Bool` flag is set to a value of `true`.
* __Scaling__: In [z-score feature scaling](https://en.wikipedia.org/wiki/Feature_scaling), we subtract off the mean of each feature and then divide by the standard deviation, i.e., $x^{\prime} = (x - \mu)/\sigma$ where $x$ is the unscaled data, and $x^{\prime}$ is the scaled data. Under this scaling regime, $x^{\prime}\leq{0}$ will be values that are less than or equal to the mean value $\mu$, while $x^{\prime}>0$ indicate values that are greater than the mean. The range of data is measured in quanta of the standard deviation $\sigma$.
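
The scaling rule above can be checked on a small example. The following is a minimal sketch, assuming a hypothetical toy matrix `X` (not the coagulation data) whose columns are features; after scaling, every column should have a mean of approximately zero and a standard deviation of approximately one.

```julia
using Statistics

# Hypothetical toy data: 4 examples (rows) × 2 features (columns)
X = [1.0 10.0; 2.0 20.0; 3.0 30.0; 4.0 40.0];

# z-score each column: subtract the column mean, divide by the column standard deviation
μ = mean(X, dims = 1);   # 1 × 2 row of feature means
σ = std(X, dims = 1);    # 1 × 2 row of feature standard deviations
Z = (X .- μ) ./ σ;       # broadcasting applies x′ = (x - μ)/σ column by column

# after scaling, each feature should have mean ≈ 0 and standard deviation ≈ 1
column_means = vec(mean(Z, dims = 1));
column_stds = vec(std(Z, dims = 1));
```

The broadcast form does the same work as a nested loop over rows and columns, just in a single expression.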

In [9]:
zscore_convert_data_flag = true; # if this is true, we z-score center the data

In [10]:
D = let

    M = dataset[original_patient_index];
    number_of_rows = length(M); # number of synthetic patients
    number_of_cols = length(M[1]) - 1; # number of measurements (features), first col is the visit number
    D = Array{Float64,2}(undef, number_of_rows, number_of_cols);

    for i ∈ 0:(number_of_rows - 1)
        for j ∈ 1:(number_of_cols)
            D[i+1,j] = M[i][j+1];
        end
    end

    D̂ = copy(D); # z-scale this data? 
    if (zscore_convert_data_flag == true) 
        for j ∈ 1:number_of_cols
            sample_vector = D[:,j]; 
            μ = mean(sample_vector);
            σ = std(sample_vector);

            for i ∈ 1:number_of_rows
                D̂[i,j] = (sample_vector[i] - μ)/σ;
            end
        end
    end
    
    D̂ # return (original -or- z-score scaled data)
end;

In [11]:
D # what is in this dataset?

101×32 Matrix{Float64}:
 -0.235898    -0.400982    0.491598   …  -0.945111   -0.240582   -1.28826
  2.64205     -0.112588    0.604192       1.06829     0.292611   -0.31851
 -0.426973    -0.125557    0.727522      -2.36602     0.937498   -0.430226
 -0.00917291   1.07028    -0.274728       0.748622   -0.04306     0.765123
  2.39505      1.179       1.16139        0.546061   -0.49863     0.337057
  0.455682    -0.612293   -0.323544   …  -0.499281   -1.41524     1.24904
  0.00108153   0.0340816   1.52975       -1.00115    -0.0296104   2.00794
 -0.1397      -0.0683655   0.71844        0.124909    1.5044     -0.218094
  0.0412372    1.65581    -0.657545       1.89657    -0.158904   -0.61306
 -0.659116    -1.29706     1.5989        -0.667092    0.840188   -0.839817
 -1.20167     -0.991119    0.111004   …   0.938794    0.0608561   0.205705
  1.51375      2.73005     1.82544       -1.10368    -0.555378    1.44378
 -0.386925    -0.0698584  -0.31693       -1.3328     -0.491284    2.64475
  ⋮     

### Covariance matrix
Next, compute the covariance matrix $\mathbf{\Sigma}$, a square symmetric matrix whose eigenvalues have some magical properties. The covariance matrix is a square matrix that summarizes the variance and covariance of the features in the dataset.
Suppose we have a dataset $\mathcal{D} = \left\{\mathbf{x}_{1},\mathbf{x}_{2},\dots,\mathbf{x}_{n}\right\}$ where each $\mathbf{x}_{i}\in\mathbb{R}^{m}$ is a feature vector.
The covariance between feature vectors $i$ and $j$, denoted as $\text{cov}\left(\mathbf{x}_{i},\mathbf{x}_{j}\right)$, is the $(i,j)$ element of the real-valued symmetric covariance matrix $\mathbf{\Sigma}\in\mathbb{R}^{n\times{n}}$: 
$$
\begin{equation}
    \Sigma_{ij} = \text{cov}\left(\mathbf{x}_{i},\mathbf{x}_{j}\right) = \sigma_{i}\,\sigma_{j}\,\rho_{ij}\qquad\text{for}\quad{i,j \in \mathcal{D}}
\end{equation}
$$
where $\sigma_{i}$ denotes the standard deviation of the feature vector $\mathbf{x}_{i}$, $\sigma_{j}$ denotes the standard deviation of the 
feature vector $\mathbf{x}_{j}$, and $\rho_{ij}$ denotes the correlation between features $i$ and $j$ in the dataset $\mathcal{D}$. The correlation is given by:
$$
\begin{equation}
\rho_{ij} = \frac{\mathbb{E}\left[(\mathbf{x}_{i}-\mu_{i})\cdot(\mathbf{x}_{j} - \mu_{j})\right]}{\sigma_{i}\sigma_{j}}\qquad\text{for}\quad{i,j \in \mathcal{D}}
\end{equation}
$$
where $\mathbb{E}(\cdot)$ denotes the expected value, and $\mu_{i}$ denotes the mean of the feature vector $\mathbf{x}_{i}$.
The diagonal elements of the covariance matrix $\Sigma_{ii}\in\mathbf{\Sigma}$ are the variances of features $i$,
while the off-diagonal elements $\Sigma_{ij}\in\mathbf{\Sigma}$ for $i\neq{j}$ measure the relationship between features 
$i$ and $j$ in the dataset $\mathcal{D}$.
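
To make the relationship $\Sigma_{ij} = \sigma_{i}\,\sigma_{j}\,\rho_{ij}$ concrete, here is a minimal sketch, assuming a hypothetical toy matrix `X` with examples in rows and features in columns. It builds the covariance matrix element by element from the standard deviations and correlations and keeps the built-in `cov(...)` result alongside for comparison.

```julia
using Statistics, LinearAlgebra

# Hypothetical toy data: 5 examples (rows) × 3 features (columns)
X = [1.0 2.0 0.5; 2.0 1.0 1.5; 3.0 4.0 1.0; 4.0 3.0 2.5; 5.0 6.0 2.0];

σ = vec(std(X, dims = 1)); # standard deviation of each feature
ρ = cor(X);                # 3 × 3 correlation matrix, ρ[i,j] for features i and j

# build Σ element by element from Σᵢⱼ = σᵢ σⱼ ρᵢⱼ
Σ_manual = [σ[i] * σ[j] * ρ[i, j] for i ∈ 1:3, j ∈ 1:3];

# built-in covariance (columns are treated as the variables)
Σ_builtin = cov(X);
```

The two matrices should agree to floating-point precision, and the diagonal of either one holds the feature variances $\sigma_{i}^{2}$.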

In [13]:
Σ = cov(D) # features x features, while transpose of D gives examples x examples

32×32 Matrix{Float64}:
  1.0          0.0339898   -0.0664774  …  -0.139084    -0.0650261
  0.0339898    1.0         -0.032252      -0.00390982  -0.0337692
 -0.0664774   -0.032252     1.0            0.190691     0.417705
  0.116338     0.00172211   0.100233      -0.101903     0.0452993
  0.0442201   -0.0573173    0.105923       0.0361718    0.00331409
  0.00591742   0.0315375   -0.0846749  …   0.01858      0.0799587
 -0.0947654   -0.0673657    0.242433       0.2639       0.293764
 -0.0882258    0.0791754    0.278398       0.0899397    0.208094
  0.0688532   -0.0185064    0.319922       0.135729     0.191669
 -0.0411815   -0.148643     0.0139708     -0.0950178   -0.0852398
 -0.287338     0.144238     0.223241   …   0.110889     0.00641493
 -0.105303     0.105286     0.123968       0.110409    -0.19473
 -0.124966     0.20102     -0.0424669     -0.234802     0.0928804
  ⋮                                    ⋱   ⋮           
 -0.0674445   -0.0900764    0.0574698  …   0.466954     0.00454851


Finally, we set some constants that we'll use throughout the lecture. Please look at the comment beside the constant value for its meaning, permissible values, units, etc.

In [15]:
number_of_examples = size(D,1); # number of synthetic patients (this includes the original record)
number_of_features = size(D,2); # number of features (measurements)
maxiter = 1000; # maximum number of iterations
ϵ = 1e-10; # stopping criteria

## Eigendecomposition
Suppose we have a real square matrix $\mathbf{A}\in\mathbb{R}^{m\times{m}}$ which could be a measurement dataset, e.g., the columns of $\mathbf{A}$ represent feature 
vectors $\mathbf{x}_{1},\dots,\mathbf{x}_{m}$ or an adjacency array in a graph with $m$ nodes, etc. Eigenvalue-eigenvector problems involve finding a set of scalar values $\left\{\lambda_{1},\dots,\lambda_{m}\right\}$ called 
[eigenvalues](https://mathworld.wolfram.com/Eigenvalue.html) and a set of linearly independent vectors 
$\left\{\mathbf{v}_{1},\dots,\mathbf{v}_{m}\right\}$ called [eigenvectors](https://mathworld.wolfram.com/Eigenvector.html) such that:
$$
\begin{equation}
\mathbf{A}\cdot\mathbf{v}_{j} = \lambda_{j}\cdot\mathbf{v}_{j}\qquad{j=1,2,\dots,m}
\end{equation}
$$
where $\mathbf{v}\in\mathbb{C}^{m}$ and $\lambda\in\mathbb{C}$. We can put the eigenvalues and eigenvectors together in matrix-vector form, which gives us an interesting matrix decomposition:
$$
\mathbf{A} = \mathbf{V}\cdot\text{diag}(\lambda)\cdot\mathbf{V}^{-1}
$$
where $\mathbf{V}$ denotes the matrix of eigenvectors, where the eigenvectors form the columns of the matrix $\mathbf{V}$, $\text{diag}(\lambda)$ denotes a diagonal matrix with the eigenvalues along the main diagonal, and $\mathbf{V}^{-1}$ denotes the inverse of the eigenvector matrix.
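
As a quick illustration of the decomposition, the sketch below uses the built-in `eigen(...)` function on a hypothetical $2\times{2}$ matrix `B` (chosen only because its eigenvalues, 2 and 5, are easy to verify by hand). It checks each eigenpair against $\mathbf{B}\mathbf{v}_{j} = \lambda_{j}\mathbf{v}_{j}$ and reassembles the matrix from its factors.

```julia
using LinearAlgebra

# Hypothetical 2 × 2 example with distinct real eigenvalues (so it is diagonalizable)
B = [4.0 1.0; 2.0 3.0];

F = eigen(B);      # built-in eigendecomposition
λ = F.values;      # eigenvalues
V = F.vectors;     # eigenvectors stored as columns

# each pair should satisfy B v = λ v, so these residual norms should be ≈ 0
residuals = [norm(B * V[:, j] - λ[j] * V[:, j]) for j ∈ eachindex(λ)];

# reassemble B from its factors: B = V diag(λ) V⁻¹
B_rebuilt = V * Diagonal(λ) * inv(V);
```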

__Symmetric real matrices__

The eigendecomposition of a symmetric real matrix $\mathbf{A}\in\mathbb{R}^{m\times{m}}$ has some special properties. 
First, all the eigenvalues $\left\{\lambda_{1},\lambda_{2},\dots,\lambda_{m}\right\}$ of the matrix $\mathbf{A}$ are real-valued.
Next, the eigenvectors $\left\{\mathbf{v}_{1},\mathbf{v}_{2},\dots,\mathbf{v}_{m}\right\}$ of the matrix $\mathbf{A}$ are orthogonal, i.e., $\left<\mathbf{v}_{i},\mathbf{v}_{j}\right> = 0$ for $i\neq{j}$. Finally, the (normalized) eigenvectors $\hat{\mathbf{v}}_{j} = \mathbf{v}_{j}/\lVert\mathbf{v}_{j}\rVert$ of a symmetric real-valued matrix 
form an orthonormal basis for the space spanned by the matrix $\mathbf{A}$ such that:
$$
\begin{equation}
\left<\hat{\mathbf{v}}_{i},\hat{\mathbf{v}}_{j}\right> = \delta_{ij}\qquad\text{for}\quad{i,j\in\mathbf{A}}
\end{equation}
$$
where $\delta_{ij}$ is the [Kronecker delta function](https://en.wikipedia.org/wiki/Kronecker_delta). 

__So, why is this interesting__?
* Eigenvectors represent fundamental directions of the matrix $\mathbf{A}$. For the linear transformation defined by a matrix $\mathbf{A}$, eigenvectors are the only vectors that do not change direction during the transformation. If we think about the matrix $\mathbf{A}$ as a machine, we put the eigenvector $\mathbf{v}_{\star}$ into the machine, and we get back the same eigenvector $\mathbf{v}_{\star}$ multiplied by a scalar, the eigenvalue $\lambda_{\star}$.
* Eigenvalues are scale factors for their eigenvector. An eigenvalue is a scalar that indicates how much a corresponding eigenvector is stretched or compressed during a linear transformation represented by the matrix $\mathbf{A}$.
* We can use the eigendecomposition to diagonalize the matrix $\mathbf{A}$, i.e., transform the matrix into a diagonal form where the eigenvalues lie along the main diagonal. To see this, solve the eigendecomposition for the $\text{diag}(\lambda) = \mathbf{V}^{-1}\cdot\mathbf{A}\cdot\mathbf{V}$. We can also use the eigenvalues to classify a matrix $\mathbf{A}$ as positive (semi)definite or negative (semi)definite (which will be handy later).
* If the matrix $\mathbf{A}$ is symmetric and real-valued, then all the eigenvalues will be real-valued, and the eigenvectors will be orthogonal (handy properties, as we shall soon see).
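
The points in the list above can be illustrated numerically. This is a minimal sketch, assuming a hypothetical symmetric matrix `S` (not the covariance matrix from this lecture): it checks that the eigenvectors are orthonormal, that $\mathbf{V}^{\top}\mathbf{S}\mathbf{V}$ is diagonal, and uses the signs of the eigenvalues to classify the matrix.

```julia
using LinearAlgebra

# Hypothetical symmetric example matrix
S = [2.0 -1.0 0.0; -1.0 2.0 -1.0; 0.0 -1.0 2.0];

F = eigen(Symmetric(S)); # the Symmetric wrapper selects a symmetric eigensolver
λ = F.values;            # real-valued eigenvalues
V = F.vectors;           # eigenvectors stored as columns

gram = transpose(V) * V;             # should be ≈ I, i.e., ⟨v̂ᵢ, v̂ⱼ⟩ = δᵢⱼ
Λ_check = transpose(V) * S * V;      # V⁻¹ = Vᵀ for orthonormal V, so this should be ≈ diag(λ)
is_positive_definite = all(λ .> 0);  # all eigenvalues positive ⇒ positive definite
```

For this particular `S`, the eigenvalues are $2-\sqrt{2}$, $2$, and $2+\sqrt{2}$, so the matrix is positive definite.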

Finally, another interpretation we'll also explore later is that eigenvectors represent the most critical directions in the data or system, and eigenvalues (or functions of them) represent their importance. Hmm, interesting. But how can we calculate them (given the buy versus build caveat above)?

However, before we do anything, let's build the square matrix $\mathbf{A}$ that we want to decompose:

In [17]:
A = Σ; # let's use the covariance for our example matrix

### Method 1: Power iteration
The [power iteration method](https://en.wikipedia.org/wiki/Power_iteration) is an iterative algorithm to compute the largest eigenvalue and its corresponding eigenvector of a square (real) matrix; we'll consider only real-valued matrices here, but this approach can also be used for matrices with complex entries. 

* Perhaps the most famous application of [power iteration](https://en.wikipedia.org/wiki/Power_iteration) is the [Google PageRank algorithm](https://epubs.siam.org/doi/10.1137/050623280). Google's PageRank algorithm, which uses power iteration, utilizes the dominant eigenvalue and its corresponding eigenvector of a link connection matrix to assess the importance of web pages within a connection network.

How does the [power iteration method](https://en.wikipedia.org/wiki/Power_iteration) work?

__Phase 1: Eigenvector__: Suppose we have a real-valued square _diagonalizable_ matrix $\mathbf{A}\in\mathbb{R}^{m\times{m}}$ whose eigenvalues have the property $|\lambda_{1}|\geq|\lambda_{2}|\dots\geq|\lambda_{m}|$. Then, the eigenvector $\mathbf{v}_{1}\in\mathbb{C}^{m}$ which corresponds to the largest eigenvalue $\lambda_{1}\in\mathbb{C}$ can be (iteratively) estimated as:
$$
\mathbf{v}_{1}^{(k+1)} = \frac{\mathbf{A}\mathbf{v}_{1}^{(k)}}{\Vert \mathbf{A}\mathbf{v}_{1}^{(k)} \Vert}\quad{k=0,1,2\dots}
$$

where $\lVert \star \rVert$ denotes the [L2 (Euclidean) vector norm](https://mathworld.wolfram.com/L2-Norm.html). The [power iteration method](https://en.wikipedia.org/wiki/Power_iteration) converges to a value for the eigenvector as $k\rightarrow\infty$ when a few properties are true, namely, the dominant eigenvalue is strictly larger in magnitude than all the others, i.e., $|\lambda_{2}|/|\lambda_{1}| < 1$ (which is unknown beforehand), and we pick an initial guess for $\mathbf{v}_{1}$ that has a nonzero component along the dominant eigenvector (in our case, a random vector will work).

__Phase 2: Eigenvalue__: Once we have an estimate for the eigenvector $\hat{\mathbf{v}}_{1}$, we can estimate the corresponding eigenvalue $\hat{\lambda}_{1}$ using [the Rayleigh quotient](https://en.wikipedia.org/wiki/Rayleigh_quotient). This argument proceeds from the definition of the eigenvalues and eigenvectors. We know, from the definition of eigenvalue-eigenvector pairs, that:
$$
\mathbf{A}\hat{\mathbf{v}}_{1} - \hat{\lambda}_{1}\hat{\mathbf{v}}_{1}\simeq{0}
$$
where we use the $\simeq$ symbol because we don't have the true eigenvector $\mathbf{v}_{1}$, only an estimate of it. To solve this expression for the (estimated) eigenvalue $\hat{\lambda}_{1}$, we multiply through by the transpose of the eigenvector and solve for the eigenvalue:
$$
\hat{\lambda}_{1} \simeq \frac{\hat{\mathbf{v}}_{1}^{T}\mathbf{A}\hat{\mathbf{v}}_{1}}{\hat{\mathbf{v}}_{1}^{T}\hat{\mathbf{v}}_{1}} = \frac{\left<\mathbf{A}\hat{\mathbf{v}}_{1},\hat{\mathbf{v}}_{1}\right>}{\left<\hat{\mathbf{v}}_{1},\hat{\mathbf{v}}_{1}\right>}
$$
where $\left<\star,\star\right>$ denotes [an inner product](https://mathworld.wolfram.com/InnerProduct.html). 

__Algorithm__
* __Initialization__: We begin (iteration $k=0$) with an initial (random) guess of the eigenvector $\mathbf{v}_{1}^{(0)}$, the maximum number of iterations we are willing to take `maxiter`, and a tolerance parameter $\epsilon>0$.  
* __Update__: Next, we repeatedly multiply the current estimate $\mathbf{v}_{1}^{(k)}$ by the matrix $\mathbf{A}$ and normalize the result by $\Vert\mathbf{A}\mathbf{v}_{1}^{(k)}\Vert$. This iterative approach capitalizes on the property that the dominant eigenvalue will exert the most influence on the vector $\mathbf{v}$ over successive iterations, allowing it to converge towards the eigenvector associated with the largest eigenvalue.
* __Stopping__: We stop the iteration procedure after `maxiter` number of iterations is reached or when the difference between successive iterations is _small_ in some sense, i.e., $\lVert \mathbf{v}_{1}^{(k)} - \mathbf{v}_{1}^{(k-1)} \rVert\leq\epsilon$ where $\lVert\star\rVert$ is [some vector norm](https://mathworld.wolfram.com/VectorNorm.html). In practice, we'll use both stopping criteria and the L2 norm to guard against an infinite loop (our iteration will be implemented using [a `while` loop](https://docs.julialang.org/en/v1/base/base/#while)).

While simple to implement, the [power iteration method](https://en.wikipedia.org/wiki/Power_iteration) may exhibit slow convergence, mainly when the largest eigenvalue is close in magnitude to other eigenvalues, i.e., $|\lambda_{1}|/|\lambda_{2}| \sim 1$.
Check out a [power iteration pseudo-code in the course notes](https://github.com/varnerlab/CHEME-5820-Lectures-Spring-2025/blob/main/lectures/week-2/L2a/docs/Notes.pdf)
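
The initialization, update, and stopping steps above can be sketched in a few lines of Julia. This is an illustrative sketch, not the course's `poweriteration(...)` implementation: the function name `power_iteration_sketch` and the test matrix `B` are hypothetical, and the sketch assumes a positive dominant eigenvalue so that successive iterates do not alternate in sign.

```julia
using LinearAlgebra

# Sketch only: `power_iteration_sketch` is a hypothetical name, not the course's `poweriteration(...)`
function power_iteration_sketch(A::AbstractMatrix, v₀::AbstractVector; maxiter::Int = 1000, ϵ::Float64 = 1e-10)
    v = v₀ / norm(v₀);             # normalized initial guess
    for k ∈ 1:maxiter
        w = A * v;
        v_new = w / norm(w);       # update: multiply by A, then renormalize (L2 norm)
        converged = norm(v_new - v) ≤ ϵ;  # successive iterates are close
        v = v_new;
        converged && break         # stop early; otherwise the loop ends at maxiter
    end
    λ = dot(v, A * v) / dot(v, v); # Rayleigh quotient estimate of the eigenvalue
    return (v, λ)
end

# Hypothetical symmetric test matrix with eigenvalues 1 and 3
B = [2.0 1.0; 1.0 2.0];
(v, λ) = power_iteration_sketch(B, [1.0, 0.0]);
```

For this `B`, the ratio $|\lambda_{2}|/|\lambda_{1}| = 1/3$ is small, so the iteration should settle quickly; a ratio close to one would need many more iterations.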

__Additional references__:
* [Prof. David Bindel: Cornell CS 6210: Matrix Computations (2016), Lecture on Power Iteration](https://www.cs.cornell.edu/~bindel/class/cs6210-f16/lec/2016-10-17.pdf)
* [Prof. Tom Trogdon: UCI MATH 105A: Numerical Analysis (2016), Lecture 22: The Power Method](https://faculty.washington.edu/trogdon/105A/html/Lecture22.html)
* [Prof. Thomas Strohmer: UCD MATH 108: Mathematical Algorithms for Artificial Intelligence and Big Data (2017), Lecture on PageRank](https://math.ucdavis.edu/%7Estrohmer/courses/180BigData/180lecture_pagerank.pdf)

We've implemented the [power iteration method](https://en.wikipedia.org/wiki/Power_iteration) in the [`poweriteration(...)` function](src/Compute.jl). 
* The [`poweriteration(...)` function](src/Compute.jl) takes the square matrix $\mathbf{A}$, an initial guess for the eigenvector $\mathbf{v}^{(0)}_{1}$ and (optional) keyword parameters controlling the stopping criteria as arguments. The function returns a tuple holding the estimated eigenvector $\hat{\mathbf{v}}_{1}$ and eigenvalue $\hat{\lambda}_{1}$.

In [20]:
(v̂,λ̂) = let

    n = size(A,1); # how many rows (cols) do we have? (square)
    vₒ = randn(n); # initial random guess

    # call the poweriteration function
    (v, λ) = poweriteration(A, vₒ, maxiter = maxiter, ϵ = ϵ);

    # return -
    (v,λ)
end;

Converged in 140 iterations


To test our power iteration implementation, let's compare the values of the largest eigenvalue, eigenvector pair $(\hat{\lambda}_{1}, \hat{\mathbf{v}}_{1})$ that we just estimated with those computed [using the `eigen(...)` function exported by the LinearAlgebra.jl package](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.eigen). The built-in [`eigen(...)` function](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.eigen) takes the matrix $\mathbf{A}$ (and some additional optional arguments) and returns [an `Eigen` factorization object](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.Eigen) holding the eigenvalues and eigenvectors.
* __Check__: We use the [`@assert` macro](https://docs.julialang.org/en/v1/base/base/#Base.@assert) in combination with the shortcut version of [the `isapprox(...)` function](https://docs.julialang.org/en/v1/base/math/#Base.isapprox) to compare our result to the built in function. If the argument to [the `@assert` macro](https://docs.julialang.org/en/v1/base/base/#Base.@assert) evaluates to `false`, an [`AssertionError` instance](https://docs.julialang.org/en/v1/base/base/#Core.AssertionError) is thrown.

In [22]:
let
    F = eigen(A); # compute the eigendecomposition

    # get
    λ = maximum(F.values); # get the max eigenvalue (sorted, this should be the last element)
    v = F.vectors[:,end]; # eigenvectors are sorted - the last column is v₁

    # tests
    @assert (λ ≈ λ̂) && (abs.(v) ≈ abs.(v̂))  # do the eigenvalues and eigenvectors match?
end

__What's in the largest eigenvector?__ Let's use [the `softmax(...)` function](src/Compute.jl) to transform the eigenvector associated with the largest eigenvalue into a probability vector (sums to one, all entries are non-negative). The [softmax](https://en.wikipedia.org/wiki/Softmax_function) for some vector $\mathbf{z}$ is defined as
$$
\begin{equation}
\sigma(\mathbf{z})_{i} = \frac{e^{z_{i}}}{\sum_{j=1}^{m}e^{z_{j}}}\quad{i=1,2,\dots,m}
\end{equation}
$$
where $\sigma(\mathbf{z})_{i}$ is the $i$th component of the transformed eigenvector. We apply [the `argmax(...)` function](https://docs.julialang.org/en/v1/base/collections/#Base.argmax) to the transformed vector to get the index of the largest component:

In [75]:
v̂

32-element Vector{Float64}:
  0.07244043702439566
  0.024313379862462497
 -0.33915475720642513
 -0.07095805537107926
 -0.05760404590763579
 -0.06829650382078492
 -0.269920129788821
 -0.21149739260192324
 -0.2059913324200421
 -0.0005691914700015318
 -0.08309605338158103
 -0.020388792251569917
  0.01749459509085242
  ⋮
 -0.11517536358053056
  0.06248316389430274
 -0.04080162661096731
  0.06097107265249597
 -0.15342243824016039
  0.002993380886373368
 -0.40544641737927034
 -0.1461596826862385
 -0.2214973483414665
  0.04217526063872276
 -0.16664372448110407
 -0.40480804476517007
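
The softmax and `argmax(...)` steps described above might look like the following. This is a minimal sketch: `softmax_sketch` is a hypothetical stand-in for the course's `softmax(...)` function (whose implementation is not shown here), and the vector `z` is made up for illustration rather than taken from the eigenvector printed above.

```julia
# Sketch only: `softmax_sketch` is a hypothetical stand-in for the course's `softmax(...)` function
function softmax_sketch(z::AbstractVector{<:Real})
    shifted = z .- maximum(z);   # subtracting the max leaves the result unchanged but avoids overflow
    w = exp.(shifted);
    return w / sum(w)
end

z = [0.07, -0.34, -0.41, 0.02];  # hypothetical vector, not the eigenvector computed above
p = softmax_sketch(z);           # non-negative entries that sum to one
i_max = argmax(p);               # index of the largest component
```

Because the softmax is monotone, `argmax(p)` picks the largest _signed_ entry of `z`, not the largest in magnitude. Since an eigenvector is only defined up to sign, it may be worth inspecting `abs.(z)` as well when interpreting which features dominate.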

### Method 2: QR Iteration
[QR iteration](https://en.wikipedia.org/wiki/QR_algorithm) is a fundamental technique in numerical linear algebra, primarily used for computing the eigenvalues and eigenvectors of matrices. The algorithm leverages the concept of [QR decomposition](https://en.wikipedia.org/wiki/QR_decomposition), which expresses a (rectangular) matrix $\mathbf{A}\in\mathbb{R}^{n\times{m}}$ as a product of an orthogonal matrix $\mathbf{Q}\in\mathbb{R}^{n\times{n}}$ and an upper triangular matrix $\mathbf{R}\in\mathbb{R}^{n\times{m}}$:
$$
\mathbf{A} = \mathbf{Q}\mathbf{R}
$$
where $\mathbf{Q}^{\top}\mathbf{Q} = \mathbf{I}$. The core of the QR iteration algorithm involves iteratively decomposing a given matrix $\mathbf{A}$ into its $\mathbf{Q}$ and $\mathbf{R}$ factors and then reformulating the matrix for subsequent iterations. Under certain conditions, [QR iteration](https://en.wikipedia.org/wiki/QR_algorithm) will converge to a triangular matrix with the eigenvalues of the original matrix $\mathbf{A}$ listed on the diagonal.

__Algorithm__
* __Initialization__. We begin by specifying an initial matrix $\mathbf{A}_{1} = \mathbf{A}$, the maximum number of iterations `maxiter` that we are willing to do, and a tolerance parameter $\epsilon$.
* __Update__. For iteration $k = 1,2,\dots$, compute the [QR decomposition](https://en.wikipedia.org/wiki/QR_decomposition) of $\mathbf{A}_{k} = \mathbf{Q}_{k}\mathbf{R}_{k}$. We then form a new matrix $\mathbf{A}_{k+1} = \mathbf{R}_{k}\mathbf{Q}_{k}$, which can be re-written as $\mathbf{A}_{k+1} = \mathbf{Q}^{T}_{k}\mathbf{A}_{k}\mathbf{Q}_{k}$.
* __Stopping__. We stop the iteration procedure after `maxiter` iterations are reached or when the difference between successive iterations is _small_ in some sense, i.e., $\lVert \mathbf{A}_{k+1} - \mathbf{A}_{k} \rVert_{1}\leq\epsilon$ where $\lVert\star\rVert_{1}$ denotes the [p = 1 matrix norm](https://en.wikipedia.org/wiki/Matrix_norm), or perhaps $\lVert \mathbf{\lambda}_{k+1} - \mathbf{\lambda}_{k} \rVert_{2}\leq\epsilon$ where $\lVert\star\rVert_{2}$ is [the L2-vector norm](https://en.wikipedia.org/wiki/Norm_(mathematics)#Euclidean_norm), i.e., the eigenvalues don't change between iterations, and $\epsilon$ is a tolerance parameter.

Once we have converged to the matrix $\mathbf{A}_{\star}$, we read the eigenvalues from the diagonal of $\mathbf{A}_{\star}$. To compute the eigenvector for each eigenvalue $\lambda_{\star}$, we solve the homogeneous system of linear algebraic equations:
$$
\left(\mathbf{A} - \lambda_{\star}\mathbf{I} \right)\cdot\mathbf{v}_{\star} = \mathbf{0}
$$
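
The update and stopping rules above can be sketched with the built-in `qr(...)` function from LinearAlgebra.jl. This is an illustrative sketch, not the course's `myqriteration(...)` implementation: the name `qr_iteration_sketch` and the test matrix `S` are hypothetical. One swapped-in technique should be flagged: instead of solving the homogeneous system for each eigenvector, the sketch accumulates the product $\mathbf{Q}_{1}\mathbf{Q}_{2}\cdots\mathbf{Q}_{k}$, whose columns approximate the eigenvectors when the matrix is symmetric.

```julia
using LinearAlgebra

# Sketch only: `qr_iteration_sketch` is a hypothetical name, not the course's `myqriteration(...)`
function qr_iteration_sketch(A::AbstractMatrix; maxiter::Int = 1000, ϵ::Float64 = 1e-10)
    Ak = Matrix{Float64}(A);                 # A₁ = A
    V = Matrix{Float64}(I, size(A)...);      # running product Q₁Q₂⋯Qₖ (assumption: A is symmetric)
    for k ∈ 1:maxiter
        F = qr(Ak);                          # Aₖ = QₖRₖ
        Q = Matrix(F.Q);
        R = F.R;
        A_next = R * Q;                      # Aₖ₊₁ = RₖQₖ = QₖᵀAₖQₖ
        V = V * Q;
        done = opnorm(A_next - Ak, 1) ≤ ϵ;   # p = 1 matrix norm stopping rule
        Ak = A_next;
        done && break
    end
    return (diag(Ak), V)                     # eigenvalue estimates, eigenvector estimates (columns)
end

# Hypothetical symmetric test matrix
S = [2.0 -1.0 0.0; -1.0 2.0 -1.0; 0.0 -1.0 2.0];
(λ, V) = qr_iteration_sketch(S);
```

This unshifted form is the simplest version of the algorithm; it is meant to show the mechanics of the iteration rather than to be efficient.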

Before we implement [QR iteration](https://en.wikipedia.org/wiki/QR_algorithm), let's look at how to compute the $\mathbf{Q}$ and $\mathbf{R}$ matrices.

### Classical and Modified Gram-Schmidt
The [QR decomposition](https://en.wikipedia.org/wiki/QR_decomposition) can be computed using a variety of approaches, including a handy technique called [the Gram–Schmidt algorithm](https://en.wikipedia.org/wiki/Gram%E2%80%93Schmidt_process). In principle, Gram-Schmidt orthogonalization generates a set of mutually orthogonal vectors $\mathbf{q}_{1},\mathbf{q}_{2},\dots, \mathbf{q}_{n}$ starting from a set of linearly independent vectors $\mathbf{x}_{1},\mathbf{x}_{2},\dots,\mathbf{x}_{n}$ 
by subtracting the projection of each vector onto the previous vectors, i.e.,
$$
\begin{equation}
\mathbf{q}_{k}=\mathbf{x}_{k}-\sum_{i=1}^{k-1}c_{k,i}\cdot\mathbf{q}_{i},
\qquad{k=1,\dots,n}
\end{equation}
$$
where the coefficients $c_{k,1},c_{k,2},\dots,c_{k,k-1}$ are chosen to make the vectors $\mathbf{q}_{1},\mathbf{q}_{2},\dots,\mathbf{q}_{k}$ orthogonal.
The $c_{\star}$ coefficients represent the component of the vector $\mathbf{x}_{k}$ that lies in the direction of the vectors $\mathbf{q}_{1},\mathbf{q}_{2},\dots,\mathbf{q}_{k-1}$. See [the course notes for details about computing $c_{\star}$](https://github.com/varnerlab/CHEME-5820-Lectures-Spring-2025/blob/main/lectures/week-2/L2a/docs/Notes.pdf).

Classical Gram-Schmidt can sometimes produce _almost_ orthogonal vectors because of roundoff error, which led to the Modified Gram-Schmidt algorithm. Check out the [Classical and Modified Gram-Schmidt pseudo-code in the course notes](https://github.com/varnerlab/CHEME-5820-Lectures-Spring-2025/blob/main/lectures/week-2/L2a/docs/Notes.pdf).
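
The difference between the two variants is easiest to see side by side. The following is a minimal sketch with hypothetical helper names (`classical_gs_sketch`, `modified_gs_sketch`); it is not the course's `orthogonalize(...)` implementation. Unlike the formula above, the sketch also normalizes each vector, so the columns of the result are orthonormal and the projection coefficient reduces to an inner product.

```julia
using LinearAlgebra

# Sketch only: hypothetical helper names, not the course's `orthogonalize(...)` implementation
function classical_gs_sketch(X::AbstractMatrix)
    (n, m) = size(X);
    Q = zeros(n, m);
    for k ∈ 1:m
        q = X[:, k];
        for i ∈ 1:(k - 1)
            q -= dot(Q[:, i], X[:, k]) * Q[:, i];  # coefficient uses the ORIGINAL column xₖ
        end
        Q[:, k] = q / norm(q);                     # normalize so the columns are orthonormal
    end
    return Q
end

function modified_gs_sketch(X::AbstractMatrix)
    (n, m) = size(X);
    Q = zeros(n, m);
    for k ∈ 1:m
        q = X[:, k];
        for i ∈ 1:(k - 1)
            q -= dot(Q[:, i], q) * Q[:, i];        # coefficient uses the UPDATED vector q
        end
        Q[:, k] = q / norm(q);
    end
    return Q
end

# Hypothetical 4 × 3 matrix with linearly independent columns
X = [1.0 1.0 0.0; 1.0 0.0 1.0; 0.0 1.0 1.0; 1.0 1.0 1.0];
Qc = classical_gs_sketch(X);
Qm = modified_gs_sketch(X);
R = transpose(Qm) * X;  # R = QᵀX should be upper triangular up to roundoff
```

In exact arithmetic the two functions return the same result; the modified version only differs in how roundoff error accumulates, which matters for ill-conditioned inputs rather than for a small well-conditioned example like this one.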

__Additional references__:
* [Prof. Tom Trogdon: UCI MATH 105A: Numerical Analysis (2016), Lecture 21: Orthogonal Matricies](https://faculty.washington.edu/trogdon/105A/html/Lecture21.html)
* [Prof. Tom Trogdon: UCI MATH 105A: Numerical Analysis (2016), Lecture 23: The modified Gram-Schmidt procedure](https://faculty.washington.edu/trogdon/105A/html/Lecture23.html)

In [27]:
(T, QCGS, QMGS) = let
    T = randn(10,3); # notice this is not square
    QCGS = orthogonalize(T, ClassicalGramSchmidtAlgorithm())
    QMGS = orthogonalize(T, ModifiedGramSchmidtAlgorithm()) # assumes a ModifiedGramSchmidtAlgorithm type is defined alongside the classical one
    T,QCGS,QMGS # return
end;

Once we have $\mathbf{Q}$, we can compute the $\mathbf{R}$ factor as:
$$
\mathbf{R} = \mathbf{Q}^{\top}\mathbf{A}
$$
which takes advantage of the properties of orthogonal matrices, namely, $\mathbf{Q}^{\top}\mathbf{Q} = \mathbf{I}$.

In [29]:
R = transpose(QCGS)*T;

Are the columns of $\mathbf{Q}$ orthonormal?

In [31]:
dot(QCGS[:,2],QCGS[:,3]) # inner product

2.6052839771361824e-17

In [32]:
transpose(QCGS)*QCGS

3×3 Matrix{Float64}:
  1.0          -1.94209e-17  -4.61402e-17
 -1.94209e-17   1.0           2.60528e-17
 -4.61402e-17   2.60528e-17   1.0

#### Compute eigenvalues and eigenvectors using QR iteration
We can use [QR decomposition](https://en.wikipedia.org/wiki/QR_decomposition) to compute the eigenvectors and eigenvalues of a square matrix $\mathbf{A}$. We've implemented our own QR iteration algorithm in [the `myqriteration(...)` method](src/Eigendecomposition.jl). This code uses [the `qr(...)` function exported by LinearAlgebra.jl](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.qr) to compute the [QR decomposition](https://en.wikipedia.org/wiki/QR_decomposition) (instead of [our `orthogonalize(...)` function](src/Compute.jl) discussed above).
* The [`myqriteration(...)` method](src/Eigendecomposition.jl) takes the matrix $\mathbf{A}$ that we want to decompose, along with the optional `maxiter` and $\epsilon$ tolerance parameters. This method returns the sorted eigenvalues and their corresponding eigenvectors organized in [a `Tuple`](https://docs.julialang.org/en/v1/base/base/#Core.Tuple). We'll process this data and return $\text{diag}(\lambda)$ and $\mathbf{V}$, i.e., the diagonal matrix of eigenvalues and the matrix of eigenvectors.

In [34]:
myeiegnresult = myqriteration(A, maxiter = maxiter, ϵ = ϵ);

Converged in 727 iterations


In [35]:
(Λ̂,V̂) = let

    # initialize -
    (n,m) = size(A); # what is the dimension of A?
    Λ = Matrix{Float64}(1.0*I, n, n); # builds the I matrix, we'll update with λ -
    V = Array{Float64,2}(undef, n,n); # builds an empty V matrix

    # call our qr-iteration method
    myeiegnresult = myqriteration(A, maxiter = maxiter, ϵ = ϵ);

    # package the eigenvalues into Λ -
    for i ∈ 1:n
        Λ[i,i] = myeiegnresult[1][i];
    end

    # package the eigenvectors into the V-matrix
    for i ∈ 1:n
        v = myeiegnresult[2][i]; # this gets the ith eigenvector
        for j ∈ 1:n
            V[j,i] = v[j];
        end
    end

    Λ,V
end;

Converged in 727 iterations


##### Check: Eigenvalues
Let's check if the eigenvalues computed by the [`eigen(...)` function](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.eigen) are the same as the ones we just calculated [by our `myqriteration(...)` implementation](src/Compute.jl). First, let's use the [built-in eigen function](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.eigen) and see what we get. The [eigen function](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.eigen) takes a square array `A` as an argument and returns the eigendecomposition.

In [37]:
(Λ,V) = let

    # initialize -
    (n,m) = size(A); # what is the dimension of A?
    Λ = Matrix{Float64}(1.0*I, n, n); # builds the I matrix, we'll update with λ -
    
    # Decompose using the built-in function
    F = eigen(A);   # eigenvalues and vectors in F of type Eigen
    λ = F.values;   # vector of eigenvalues
    V = F.vectors;  # n x n matrix of eigenvectors, each col is an eigenvector

    # package the eigenvalues into Λ -
    for i ∈ 1:n
        Λ[i,i] = λ[i];
    end

    Λ,V
end;

How far apart are the eigenvalues estimated using the built-in function versus our QR iteration implementation?

In [39]:
norm(diag(Λ) - diag(Λ̂)) # is this small? If so, we are good to go

2.7292091353942218e-9

##### Check: Eigenvectors
Let's do the same thing with the eigenvectors. How similar are our eigenvectors computed using [the `myqriteration(...)` function](src/Compute.jl) to those calculated using [the `eigen(...)` method](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.eigen)?

In [41]:
i = 2; # which eigenvector do I want to check?
norm(abs.(V[:,i]) - abs.(V̂[:,i])) # the |v| agree, but why do we need the absolute values?

1.5831938896116381e-13
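
The absolute values matter because an eigenvector is only determined up to a nonzero scalar: if $\mathbf{v}$ satisfies $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$, so does $-\mathbf{v}$. Two correct algorithms can therefore return unit eigenvectors that differ by a sign. The sketch below illustrates this on a hypothetical 2×2 symmetric matrix `S`; the helper `alignsign` is an illustrative name, not part of any library, and aligning signs this way is one possible alternative to comparing element-wise absolute values.

```julia
using LinearAlgebra

# hypothetical 2×2 symmetric example (eigenvalues are 1 and 3)
S = [2.0 1.0;
     1.0 2.0];

F = eigen(S);
λ = F.values[2];      # largest eigenvalue
v = F.vectors[:,2];   # corresponding unit eigenvector
w = -v;               # sign-flipped copy, still an eigenvector for the same λ

# hypothetical helper: flip w if it points opposite to the reference vector v
alignsign(w, v) = dot(w, v) < 0 ? -w : w

r = norm(S*w - λ*w)                   # ≈ 0, so w satisfies S*w = λ*w
d_raw = norm(v - w)                   # ≈ 2, the raw difference looks large
d_aligned = norm(v - alignsign(w, v)) # ≈ 0 once the signs agree
```

Comparing `abs.(v)` with `abs.(w)` hides the sign flip, but it would also hide element-wise sign errors. Aligning the overall sign first and then taking the norm of the difference is a slightly stricter test.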

### Check: Are the magical properties of $\mathbf{\Sigma}$ true?
The covariance matrix $\mathbf{\Sigma}$ is a real-valued, symmetric matrix. So, its eigenvalues and eigenvectors should have two important properties:
* __Property 1__: All the eigenvalues $\left\{\lambda_{1},\lambda_{2},\dots,\lambda_{m}\right\}$ of the matrix $\mathbf{\Sigma}$ are real-valued.
* __Property 2__: The eigenvectors $\left\{\mathbf{v}_{1},\mathbf{v}_{2},\dots,\mathbf{v}_{m}\right\}$ of the matrix $\mathbf{\Sigma}$ are orthogonal, i.e., $\left<\mathbf{v}_{i},\mathbf{v}_{j}\right> = 0$ for $i\neq{j}$. Further, the normalized eigenvectors $\hat{\mathbf{v}}_{j} = \mathbf{v}_{j}/\lVert\mathbf{v}_{j}\rVert$ of a real-valued symmetric matrix are orthonormal, i.e., $\left<\hat{\mathbf{v}}_{i},\hat{\mathbf{v}}_{j}\right> = \delta_{ij}$ for $i,j\in\left\{1,2,\dots,m\right\}$, where $\delta_{ij}$ is the [Kronecker delta function](https://en.wikipedia.org/wiki/Kronecker_delta).

In [43]:
F = eigen(A);   # eigenvalues and vectors in F of type Eigen

In [44]:
λ = F.values   # vector of eigenvalues

32-element Vector{Float64}:
 0.0014527780159717676
 0.004227900511241689
 0.022419588285230645
 0.03620596262883423
 0.045861218559380246
 0.08684049302927382
 0.10088717873785562
 0.13745562578456494
 0.17118390178109663
 0.2297646801919446
 0.24578677552002437
 0.3234784147815236
 0.3757628716693392
 ⋮
 1.1300050964721537
 1.1501059356725132
 1.2837146382018603
 1.4743233692512792
 1.5406386656596625
 1.6648200258004073
 1.8021540370981772
 2.0149169413795214
 2.4345687250043135
 3.1049549269027206
 3.5626012400351406
 4.166259130277582

In [45]:
V = F.vectors  # n x n matrix of eigenvectors, each col is an eigenvector

32×32 Matrix{Float64}:
 -0.00716029    0.0183125    0.0202499   …  -0.0123746    0.0724404
  0.00850918    0.00981436   0.0193196       0.0734779    0.0243134
  0.047095      0.081259    -0.157562       -0.106824    -0.339155
  0.00270045    0.047222    -0.0307358       0.0891756   -0.0709581
  0.0260907    -0.0621795    0.0569926      -0.00593653  -0.057604
  0.0294774    -0.0273706   -0.00410907  …   0.0364807   -0.0682965
  0.0807985     0.0174296    0.0135959      -0.164639    -0.26992
 -0.0501546     0.0278806   -0.0197476      -0.066491    -0.211497
 -0.0215418    -0.0474811    0.00980129      0.0152977   -0.205991
 -0.00448647    0.0458928   -0.0402995       0.0543541   -0.000569192
  0.000898109   0.0050567   -0.0204656   …  -0.11858     -0.0830961
  0.000546541   0.105316    -0.0785019      -0.0300036   -0.0203888
 -0.0617869    -0.0532542    0.0785565       0.0980275    0.0174946
  ⋮                                      ⋱   ⋮           
  0.0101275     0.0194467    0.00340866

In [46]:
dot(V[:,1], V[:,32])

1.1102230246251565e-16

In [47]:
transpose(V)*V

32×32 Matrix{Float64}:
  1.0          -1.8731e-16    1.84748e-16  …  -2.2619e-16    1.76712e-16
 -1.8731e-16    1.0           4.37424e-17     -1.21459e-16  -7.83334e-17
  1.84748e-16   4.37424e-17   1.0              1.17752e-16   4.20765e-17
 -6.48828e-17   1.34039e-18  -8.93977e-17      5.69244e-17  -2.20794e-17
  1.31551e-16   9.17413e-17   2.69345e-16     -1.41507e-17  -1.24097e-18
 -9.78677e-17  -3.5094e-17    4.87207e-16  …   2.62904e-17   5.16934e-17
  7.32971e-17   6.05868e-17   1.68057e-16      1.07265e-17   1.52493e-17
 -3.96194e-17  -1.2621e-18    1.10441e-16     -1.00785e-17  -1.028e-16
  1.30805e-16   8.12315e-17   2.30345e-17     -5.79707e-18   5.74693e-18
  5.20457e-18   7.90135e-17  -1.41447e-16      3.30371e-17  -1.16385e-17
  1.57284e-18  -7.04451e-17   1.31732e-16  …   4.7487e-17   -5.18278e-18
 -9.43813e-17   1.03723e-17  -7.46978e-17      7.45315e-17  -4.30277e-18
 -9.01354e-17  -1.90522e-17   2.84421e-16      1.28934e-17   3.62452e-17
  ⋮                           
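
Rather than reading a large matrix by eye, both properties can be reduced to a pair of Boolean checks. The following is a minimal sketch on a hypothetical 3×3 symmetric matrix `S` (not the lecture's data), and the tolerance `1e-12` is an arbitrary choice.

```julia
using LinearAlgebra

# hypothetical symmetric matrix standing in for a covariance matrix
S = [4.0 1.0 0.5;
     1.0 3.0 0.2;
     0.5 0.2 2.0];

F = eigen(Symmetric(S)); # Symmetric(...) tells eigen to use the symmetric solver
λ = F.values;
V = F.vectors;

# Property 1: every eigenvalue is real-valued
property_1 = all(isreal, λ)

# Property 2: the eigenvectors are orthonormal, so VᵀV should equal I
property_2 = norm(transpose(V)*V - I) < 1e-12
```

The off-diagonal entries of order `1e-16` in the output above are floating-point round-off, which is why the second check uses a tolerance instead of exact equality.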

# Today?
What did we do today? Give me three things.