# Lecture 2a: Eigendecomposition of Data and Systems
In this lecture, we will discuss the eigendecomposition of a square matrix and how it can be used to understand data and systems in unsupervised machine learning. There are several key ideas in this lecture:

* __Eigendecomposition__ allows us to decompose a matrix into its constituent parts, the [eigenvectors and eigenvalues](https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors). These values can help us understand the structure of the data or system represented by the matrix. We'll look at two approaches to estimate the [eigenvalues and eigenvectors](https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors) of a matrix.
* __Power iteration method__ estimates the _largest_ eigenvalue/eigenvector pair. Given a _diagonalizable_ matrix $\mathbf{A}$, the power iteration algorithm produces a number $\lambda$, the eigenvalue of $\mathbf{A}$ that is greatest in absolute value, and a nonzero vector $\mathbf{v}$, a corresponding eigenvector, such that $\mathbf{A}\mathbf{v} = \lambda\cdot\mathbf{v}$.
* __QR factorization__ is another approach to compute the eigendecomposition of the matrix $\mathbf{A}$. However, unlike power iteration, this approach will give all eigenvalues and eigenvectors of the matrix $\mathbf{A}$. The QR factorization algorithm relies on the [QR decomposition](https://en.wikipedia.org/wiki/QR_decomposition), which itself relies on the [Gram-Schmidt algorithm](https://en.wikipedia.org/wiki/Gram–Schmidt_process).

Lecture notes can be found: [here!](https://github.com/varnerlab/CHEME-5820-Lectures-Spring-2025/blob/main/lectures/week-2/L2a/docs/Notes.pdf)

## Setup and Prerequisites
We set up the computational environment by including the `Include.jl` file, loading any needed resources, such as sample datasets, and setting up any required constants. The `Include.jl` file loads external packages, various functions that we will use in the exercise, and custom types to model the components of our problem.

In [3]:
include("Include.jl");

We'll use the coagulation dataset. Let's load this data from disk using [the `MySyntheticDataset()` function](src/Files.jl).

In [5]:
dataset = MySyntheticDataset() |> d-> d["ensemble"]; 

The keys of the dataset dictionary are the `actual` patient indexes. These keys point to `synthetic` patient measurement vectors constructed by building a model of the original data distribution. To explore this data, specify an original patient index (one of the keys of the original dictionary) in the `original_patient_index::Int` variable:

In [7]:
original_patient_index = 7; # i ∈ {keys}

Next, we'll build a data matrix with the `synthetic` measurement vectors for the specified original patient index. We'll store this in the `D::Array{<:Number, 2}` matrix, with one row per synthetic patient and one column per measurement. The data can be [z-score transformed](https://en.wikipedia.org/wiki/Standard_score), i.e., centered and normalized by the standard deviation so that all features are on the same scale. Note that the z-score step is commented out in the cell below, so `D` holds the raw measurements.

In [9]:
M = dataset[original_patient_index][0]

33-element Vector{Float64}:
    1.0
    0.30836988
    0.95415
 1379.48378288
   19.75514189
   12.804792939999999
    0.8867103181999999
  115.60658202
  156.28490704
   15.33133564
   11.96380865
 3781.5834824
   92.22819643799998
    ⋮
    1.737781717
    1.631652416
 1576.229293
  227.991075025
   36.51941662875
    4.875
   15.5
   41.5
   30.43333333
   99.0
   54.25833333
 1545.5

In [10]:
D = let

    M = dataset[original_patient_index];
    number_of_rows = length(M); # number of synthetic patients
    number_of_cols = length(M[1]) - 1; # number of measurements (features), first col is the visit number
    D = Array{Float64,2}(undef, number_of_rows, number_of_cols);

    for i ∈ 0:(number_of_rows - 1)
        for j ∈ 1:(number_of_cols)
            D[i+1,j] = M[i][j+1];
        end
    end

    D̂ = copy(D); # the z-score scaling below is commented out, so D̂ holds the raw measurements
    # for j ∈ 1:number_of_cols
    #     sample_vector = D[:,j]; 
    #     μ = mean(sample_vector);
    #     σ = std(sample_vector);

    #     for i ∈ 1:number_of_rows
    #         D̂[i,j] = (sample_vector[i] - μ)/σ;
    #     end
    # end
    
    D̂
end;
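
The z-score loop in the cell above is commented out. A minimal sketch of the same centering and scaling is given below. It uses a hypothetical helper `zscore_columns` (not part of `Include.jl`) and a small toy matrix rather than the coagulation data, and it assumes no column has zero variance:

```julia
using Statistics

# Hypothetical helper (an assumption, not from Include.jl): z-score each column of X.
# A column with zero standard deviation would produce NaN or Inf entries.
function zscore_columns(X::Matrix{Float64})
    μ = mean(X, dims = 1);  # 1 × n row of column means
    σ = std(X, dims = 1);   # 1 × n row of column (sample) standard deviations
    return (X .- μ) ./ σ;   # broadcasting applies each column's μ and σ to every row
end

X = [1.0 10.0; 2.0 20.0; 3.0 30.0]; # toy data: 3 examples, 2 features on different scales
Z = zscore_columns(X);
```

After the transform, both columns of `Z` have zero mean and unit standard deviation, even though the raw columns differ by a factor of ten.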

In [11]:
D

101×32 Matrix{Float64}:
 0.30837    0.95415   1379.48  …  30.4333   99.0     54.2583  1545.5
 1.05454    1.21144   1390.44     40.3337   99.6431  62.8345  1738.98
 0.258829   1.19987   1402.45     50.2747   98.5461  73.2072  1716.69
 0.367153   2.26673   1304.9      38.0187   99.541   57.4354  1955.18
 0.990502   2.36371   1444.68     28.3245   99.4763  50.1078  1869.77
 0.487677   0.765631  1300.14  …  27.3373   99.1424  35.3646  2051.73
 0.369812   1.34229   1480.53     44.1981   98.9821  57.6517  2203.14
 0.333311   1.25089   1401.56     53.1082   99.3418  82.3255  1759.01
 0.380223   2.78909   1267.64     32.6413   99.9077  55.5721  1680.21
 0.198641   0.154719  1487.26     46.1733   99.0888  71.642   1634.97
 0.0579719  0.427665  1342.44  …  40.5358   99.6018  59.1068  1843.57
 0.762005   3.74746   1509.31     27.9641   98.9494  49.195   2090.58
 0.269213   1.24956   1300.79     32.8467   98.8762  50.2259  2330.19
 ⋮                             ⋱                      ⋮       
 0.1

Finally, we set some constants that we'll use throughout the lecture. See the comment beside the constant value for its meaning, permissible values, units, etc.

In [13]:
number_of_examples = size(D,1); # number of synthetic patients
number_of_features = size(D,2); # number of features (measurements)
maxiter = 25000; # maximum number of iterations
ϵ = 1e-8; # stopping tolerance

## Eigendecomposition
Suppose we have a real square matrix $\mathbf{A}\in\mathbb{R}^{m\times{m}}$ which could be a measurement dataset, e.g., the columns of $\mathbf{A}$ represent feature 
vectors $\mathbf{x}_{1},\dots,\mathbf{x}_{m}$ or an incidence array in a graph, etc. Eigenvalue-eigenvector problems involve finding a set of scalar values $\left\{\lambda_{1},\dots,\lambda_{m}\right\}$ called 
[eigenvalues](https://mathworld.wolfram.com/Eigenvalue.html) and a set of linearly independent vectors 
$\left\{\mathbf{v}_{1},\dots,\mathbf{v}_{m}\right\}$ called [eigenvectors](https://mathworld.wolfram.com/Eigenvector.html) such that:
$$
\begin{equation}
\mathbf{A}\cdot\mathbf{v}_{j} = \lambda_{j}\cdot\mathbf{v}_{j}\qquad{j=1,2,\dots,m}
\end{equation}
$$
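where $\mathbf{v}_{j}\in\mathbb{R}^{m}$ and $\lambda_{j}\in\mathbb{R}$ (real eigenvalues are guaranteed when $\mathbf{A}$ is symmetric; a general real matrix can have complex eigenvalues). So, why is this interesting?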
* Eigenvectors represent fundamental directions of the matrix $\mathbf{A}$. For the linear transformation defined by a matrix $\mathbf{A}$, the eigenvectors are the only vectors that do not change direction during the transformation.
* Eigenvalues are scale factors for their eigenvector. An eigenvalue is a scalar that indicates how much a corresponding eigenvector is stretched or compressed during a linear transformation represented by the matrix $\mathbf{A}$.

Another interpretation we'll explore later is that eigenvectors represent the most critical directions in the data or system, and the eigenvalues represent the importance of these directions.
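
As a quick numerical check of the defining relation $\mathbf{A}\cdot\mathbf{v}_{j} = \lambda_{j}\cdot\mathbf{v}_{j}$, the sketch below uses `eigen` from Julia's `LinearAlgebra` standard library on a small symmetric matrix. The matrix is an illustrative assumption, not the lecture dataset:

```julia
using LinearAlgebra

A = [2.0 1.0; 1.0 2.0]; # small symmetric example (assumption, not the lecture data)
F = eigen(A);           # eigenvalues in F.values, eigenvectors in the columns of F.vectors

for j ∈ eachindex(F.values)
    λj = F.values[j];
    vj = F.vectors[:, j];
    residual = norm(A * vj - λj * vj); # should be near machine precision
    println("λ = $(λj), residual = $(residual)")
end
```

Each residual measures how far the pair is from satisfying the eigenvalue equation, so values near zero confirm that multiplying by $\mathbf{A}$ only rescales each eigenvector.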

## Method 1: Power iteration
The [power iteration method](https://en.wikipedia.org/wiki/Power_iteration) is an iterative algorithm to compute the largest eigenvalue and its corresponding eigenvector of a square (real) matrix; we'll consider only real-valued matrices here, but this approach can be used for matrices with complex entries. 

__Eigenvector__: Suppose we have a real-valued square _diagonalizable_ matrix $\mathbf{A}\in\mathbb{R}^{m\times{m}}$ whose eigenvalues are ordered such that $|\lambda_{1}|\geq|\lambda_{2}|\geq\dots\geq|\lambda_{m}|$. Then, the eigenvector $\mathbf{v}_{1}$, which corresponds to the largest (in magnitude) eigenvalue $\lambda_{1}$, can be (iteratively) estimated as:
$$
\mathbf{v}_{1}^{(k+1)} = \frac{\mathbf{A}\mathbf{v}_{1}^{(k)}}{\Vert \mathbf{A}\mathbf{v}_{1}^{(k)} \Vert}\quad{k=0,1,2\dots}
$$

where $\lVert \star \rVert$ denotes [some vector norm](https://mathworld.wolfram.com/VectorNorm.html), typically the [L2 (Euclidean) norm](https://mathworld.wolfram.com/L2-Norm.html). The [power iteration method](https://en.wikipedia.org/wiki/Power_iteration) converges to the eigenvector as $k\rightarrow\infty$ when two conditions hold: the largest eigenvalue is strictly dominant, i.e., $|\lambda_{2}|/|\lambda_{1}| < 1$, and the initial guess $\mathbf{v}_{1}^{(0)}$ has a nonzero component in the direction of $\mathbf{v}_{1}$.
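
To see why the eigenvalue ratio controls convergence, we can expand the initial guess in the eigenvector basis, which exists because $\mathbf{A}$ is diagonalizable. The coefficients $c_{j}$ are notation introduced here for illustration only. Writing $\mathbf{v}_{1}^{(0)} = \sum_{j=1}^{m}c_{j}\mathbf{v}_{j}$ and applying $\mathbf{A}$ a total of $k$ times gives:
$$
\mathbf{A}^{k}\mathbf{v}_{1}^{(0)} = \sum_{j=1}^{m}c_{j}\lambda_{j}^{k}\mathbf{v}_{j} = \lambda_{1}^{k}\left(c_{1}\mathbf{v}_{1} + \sum_{j=2}^{m}c_{j}\left(\frac{\lambda_{j}}{\lambda_{1}}\right)^{k}\mathbf{v}_{j}\right)
$$
Each ratio $|\lambda_{j}/\lambda_{1}|$ with $j\geq{2}$ is below one when $\lambda_{1}$ is strictly dominant, so those terms decay as $k$ grows and, provided $c_{1}\neq{0}$, the normalized iterate aligns with $\mathbf{v}_{1}$. The slowest term decays like $|\lambda_{2}/\lambda_{1}|^{k}$, so a ratio near one means slow convergence.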

__Eigenvalue__: Once we have an estimate for the eigenvector $\hat{\mathbf{v}}_{1}$, we can compute an estimate of the corresponding eigenvalue $\hat{\lambda}_{1}$ using [the Rayleigh quotient](https://en.wikipedia.org/wiki/Rayleigh_quotient). We know, from the definition of eigenvalue-eigenvector pairs, that:
$$
\mathbf{A}\hat{\mathbf{v}}_{1} - \hat{\lambda}_{1}\hat{\mathbf{v}}_{1}\simeq{0}
$$
To solve this for the eigenvalue $\hat{\lambda}_{1}$, we can multiply through by the transpose of the eigenvector and solve for the eigenvalue:
$$
\hat{\lambda}_{1} \simeq \frac{\hat{\mathbf{v}}_{1}^{T}\mathbf{A}\hat{\mathbf{v}}_{1}}{\hat{\mathbf{v}}_{1}^{T}\hat{\mathbf{v}}_{1}}
$$
Note that we have used the $\simeq$ symbol as this expression will give the true eigenvalue only when we have the true eigenvector. In our case, we have an approximation of the eigenvector, which could be a good or poor approximation depending on how many iterations we take.
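
The effect of an inexact eigenvector on the Rayleigh quotient can be sketched with a small example. The matrix, the two trial vectors, and the helper name `rayleigh` are illustrative assumptions, not part of the lecture code:

```julia
using LinearAlgebra

# Rayleigh quotient: eigenvalue estimate for an (approximate) eigenvector v
rayleigh(A, v) = dot(v, A * v) / dot(v, v);

A = [2.0 1.0; 1.0 2.0]; # illustrative symmetric matrix whose dominant eigenvalue is 3
v_exact = [1.0, 1.0];   # exact eigenvector for the dominant eigenvalue
v_rough = [1.0, 0.8];   # perturbed guess, not an eigenvector

λ_exact = rayleigh(A, v_exact); # recovers the eigenvalue
λ_rough = rayleigh(A, v_rough); # only approximates it
```

With the exact eigenvector the quotient returns the eigenvalue itself; with the perturbed vector it returns a nearby value, which is the sense of the $\simeq$ symbol above.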

__Algorithm__
* __Initialization__. We begin (iteration $k=0$) with an initial (random) guess of the eigenvector $\mathbf{v}_{1}^{(0)}$, the maximum number of iterations we are willing to take, `maxiter`, and a tolerance parameter $\epsilon>0$.
* __Update__: Next, we repeatedly multiply the current estimate $\mathbf{v}_{1}^{(k)}$ by the matrix $\mathbf{A}$ and normalize the result by $\Vert\mathbf{A}\mathbf{v}_{1}^{(k)}\Vert$. This works because the component of the estimate along the dominant eigenvector grows fastest under repeated multiplication by $\mathbf{A}$, so the estimate converges toward the eigenvector associated with the largest eigenvalue.
* __Stopping__: We stop the iteration procedure after `maxiter` number of iterations is reached or when the difference between successive iterations is _small_ in some sense, i.e., $\lVert \mathbf{v}_{1}^{(k)} - \mathbf{v}_{1}^{(k-1)} \rVert\leq\epsilon$. In practice, we'll use both stopping criteria to guard against an infinite loop.
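
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the `poweriteration` function loaded by `Include.jl`, whose implementation may differ; the name `poweriteration_sketch` and the test matrix are hypothetical:

```julia
using LinearAlgebra

# Hypothetical sketch of power iteration (the lecture's own version lives in Include.jl)
function poweriteration_sketch(A::AbstractMatrix{Float64}, v₀::Vector{Float64};
    maxiter::Int = 1000, ϵ::Float64 = 1e-8)

    v = v₀ / norm(v₀); # initialization: normalized initial guess
    for _ ∈ 1:maxiter
        w = A * v;
        v_new = w / norm(w);               # update: multiply by A, then normalize
        converged = norm(v_new - v) ≤ ϵ;   # stopping: small change between iterations
        v = v_new;
        converged && break;
    end
    λ = dot(v, A * v) / dot(v, v); # Rayleigh quotient estimate of the eigenvalue
    return (v, λ);
end

A = [2.0 1.0; 1.0 2.0]; # illustrative symmetric matrix (assumption)
(v, λ) = poweriteration_sketch(A, [1.0, 0.0]);
```

The loop is bounded by `maxiter` and also exits early on the tolerance test, so both stopping criteria are active at once.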

While simple and efficient, especially for large sparse matrices, the [power iteration method](https://en.wikipedia.org/wiki/Power_iteration) may converge slowly, particularly when the largest eigenvalue is close in magnitude to the next largest eigenvalue.

Additional references:
* https://www.cs.cornell.edu/~bindel/class/cs6210-f16/lec/2016-10-17.pdf
* https://blogs.sas.com/content/iml/2012/05/09/the-power-method.html

In [33]:
(v,λ) = let

    A = transpose(D)*D; # build a square matrix from the data
    n = size(A,1); # how many rows (cols) do we have?
    vₒ = randn(n); # initial random guess

    # call the poweriteration function
    (v,λ) = poweriteration(A, vₒ, maxiter = maxiter, ϵ = ϵ);

    # return -
    (v,λ)
end;

## Method 2: QR factorization and Gram-Schmidt
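Unlike power iteration, the QR approach estimates all eigenvalues and eigenvectors of the matrix $\mathbf{A}$. It relies on the [QR decomposition](https://en.wikipedia.org/wiki/QR_decomposition), which itself relies on the [Gram-Schmidt algorithm](https://en.wikipedia.org/wiki/Gram–Schmidt_process). The cell below calls the `eigen` function from Julia's `LinearAlgebra` standard library to compute the full decomposition of $\mathbf{A} = \mathbf{D}^{T}\mathbf{D}$.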
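
A minimal sketch of the QR approach named in this section's heading is given below. It assumes a symmetric matrix with eigenvalues of distinct magnitude and linearly independent columns, uses the basic unshifted QR iteration with classical Gram-Schmidt, and is not the lecture's implementation. The function names `gram_schmidt_qr` and `qr_iteration_sketch` are hypothetical:

```julia
using LinearAlgebra

# Classical Gram-Schmidt: A = Q*R with orthonormal columns in Q and upper-triangular R.
# Assumes the columns of A are linearly independent.
function gram_schmidt_qr(A::Matrix{Float64})
    n = size(A, 2);
    Q = zeros(size(A));
    R = zeros(n, n);
    for j ∈ 1:n
        v = A[:, j];
        for i ∈ 1:(j - 1)
            R[i, j] = dot(Q[:, i], A[:, j]); # projection coefficient
            v -= R[i, j] * Q[:, i];          # remove the component along Q[:, i]
        end
        R[j, j] = norm(v);
        Q[:, j] = v / R[j, j];
    end
    return (Q, R);
end

# Unshifted QR iteration: factor Ak = Q*R, then form R*Q, which is similar to Ak.
function qr_iteration_sketch(A::Matrix{Float64}; maxiter::Int = 500, ϵ::Float64 = 1e-10)
    Ak = copy(A);
    V = Matrix{Float64}(I, size(A)...); # accumulates the orthogonal factors
    for _ ∈ 1:maxiter
        (Q, R) = gram_schmidt_qr(Ak);
        Ak = R * Q;
        V = V * Q;
        norm(Ak - Diagonal(diag(Ak))) ≤ ϵ && break; # stop when Ak is nearly diagonal
    end
    return (diag(Ak), V); # eigenvalue estimates and eigenvector estimates (columns of V)
end

A = [2.0 1.0; 1.0 2.0]; # illustrative symmetric matrix (assumption)
(λ, V) = qr_iteration_sketch(A);
```

For a symmetric matrix, the diagonal of the final iterate approximates the eigenvalues and the columns of `V` approximate the eigenvectors. Production eigensolvers add shifts and a preliminary reduction step, which this sketch omits.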

In [31]:
A = transpose(D)*D; # build a square matrix from the data
eigen(A)

Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
32-element Vector{Float64}:
      0.7965972275689526
      0.8701429641944097
      2.039551836582301
      2.7904185818636726
      4.190406985710426
     10.500815662637573
     13.066580433098135
     45.2197229642944
     56.38952446979662
     73.71270386779292
    108.95065243994259
    196.56312777062917
    218.30174625892894
      ⋮
   8012.657170514377
  10887.201683745288
  17516.934946637335
  31896.468087663383
  51257.711158272316
  70895.1030640178
 515727.08641797875
      2.0513055222368066e6
      3.0972800898292297e6
      4.241929776681231e6
      1.4358778322967645e7
      8.158508857877813e9
vectors:
32×32 Matrix{Float64}:
 -0.00178058    0.0385867     0.098397     …   0.000158009  -4.03042e-5
  0.0447398    -0.0196024    -0.00142773       0.000314894  -0.000145721
  0.000297513  -0.000988625  -0.00161957       0.177651     -0.148088
  0.0390617    -0.00581845   -0.0149327        0.00344472   -0.002