# Example: Computing the Eigendecomposition of a Covariance Matrix using QR Iteration
In this example, we will compute the eigendecomposition of a covariance matrix using the QR algorithm, which relies on the Gram-Schmidt process for orthogonalization. The covariance matrix will be computed from the daily log growth rate of stock prices.

> __Learning Objectives__
> 
> By the end of this example, you should be able to:
> 
> * __Compute empirical covariance matrices:__ Calculate covariance matrices from financial return data by centering observations and applying the covariance formula.
> * __Apply QR iteration for eigendecomposition:__ Use the QR algorithm to compute eigenvalues and eigenvectors of a covariance matrix and verify results against built-in methods.
> * __Interpret eigendecomposition results:__ Analyze eigenvalues and eigenvectors as market and sector factors to understand sources of variance in financial returns.

Let's get started!
___

## Setup, Data, and Prerequisites
First, we set up the computational environment by including the `Include.jl` file and loading any needed resources.

> The [`include(...)` command](https://docs.julialang.org/en/v1/base/base/#include) evaluates the contents of the input source file, `Include.jl`, in the notebook's global scope. The `Include.jl` file sets paths, loads required external packages, etc. For additional information on functions and types used in this material, see the [Julia programming language documentation](https://docs.julialang.org/en/v1/). 

Let's set up our code environment:

In [1]:
include(joinpath(@__DIR__, "Include.jl")); # include the Include.jl file

In addition to standard Julia libraries, we'll also use [the `VLDataScienceMachineLearningPackage.jl` package](https://github.com/varnerlab/VLDataScienceMachineLearningPackage.jl). Check out [the documentation](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/) for more information on the functions, types, and data used in this material.

### Data
We gathered a daily open-high-low-close dataset for each firm in the S&P 500 from `01-03-2014` until `12-31-2024`, along with data for a few exchange-traded funds and volatility products during that time. 

Let's load the `original_dataset::DataFrame` by calling [the `MyTrainingMarketDataSet()` function](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/data/#VLDataScienceMachineLearningPackage.MyTrainingMarketDataSet) and remove firms that do not have the maximum number of trading days. The cleaned dataset $\mathcal{D}$ will be stored in the `dataset` variable.

In [2]:
original_dataset = MyTrainingMarketDataSet() |> x-> x["dataset"];

Not all tickers in our dataset have the maximum number of trading days for various reasons, e.g., acquisition or de-listing events. Let's collect only those tickers with the maximum number of trading days.

First, let's compute the number of records for a firm that we know has a maximum value, e.g., `AAPL`, and save that value in the `maximum_number_trading_days::Int64` variable:

In [3]:
maximum_number_trading_days = original_dataset["AAPL"] |> nrow;

Now, let's iterate through our data and collect only tickers with `maximum_number_trading_days` records. Save that data in the `dataset::Dict{String,DataFrame}` variable:

In [4]:
dataset = let

    dataset = Dict{String,DataFrame}();
    for (ticker,data) ∈ original_dataset
        if (nrow(data) == maximum_number_trading_days)
            dataset[ticker] = data;
        end
    end
    dataset
end;

Next, let's get a list of the firms in our cleaned dataset (and sort them alphabetically). We store the sorted firm ticker symbols in the `list_of_tickers::Array{String,1}` variable.

In [5]:
list_of_tickers = keys(dataset) |> collect |> sort # list of firm "ticker" symbols in alphabetical order

424-element Vector{String}:
 "A"
 "AAL"
 "AAP"
 "AAPL"
 "ABBV"
 "ABT"
 "ACN"
 "ADBE"
 "ADI"
 "ADM"
 ⋮
 "WYNN"
 "XEL"
 "XOM"
 "XRAY"
 "XYL"
 "YUM"
 "ZBRA"
 "ZION"
 "ZTS"

Finally, let's set up a ticker map that holds the index of each ticker value. We'll save this in the `tickerindexmap::Dict{String,Int}` dictionary:

In [6]:
tickerindexmap = let

    # initialize -
    tickerindexmap = Dict{String,Int}();
    for i ∈ eachindex(list_of_tickers)
        tickerindexmap[list_of_tickers[i]] = i;
    end

    tickerindexmap;
end

Dict{String, Int64} with 424 entries:
  "EMR"  => 132
  "CTAS" => 101
  "HSIC" => 187
  "KIM"  => 217
  "PLD"  => 310
  "IEX"  => 194
  "BAC"  => 48
  "CBOE" => 69
  "EXR"  => 144
  "NCLH" => 271
  "CVS"  => 103
  "DRI"  => 119
  "DTE"  => 120
  "ZION" => 423
  "AVY"  => 43
  "EW"   => 140
  "EA"   => 124
  "NWSA" => 289
  "CAG"  => 65
  ⋮      => ⋮

__Load the tickers file:__ The tickers file (contained in the `data/` folder) lists all S&P 500 firms as of `12-31-2024`, plus a few exchange-traded funds and volatility products, together with the full name and business sector of each entry. We'll save data for each ticker in the `tickerinfo::Dict{String, NamedTuple}` dictionary (where the keys are the ticker symbols, and the values are named tuples containing the full name `name` and business sector `sector` of each firm):

In [22]:
tickerinfo = let

    # initialize -
    tickerinfo = Dict{String, NamedTuple}();
    
    # load the ticker data file -
    path_to_ticker_data_file = joinpath(_PATH_TO_DATA, "Ticker-data.csv");
    df = CSV.read(path_to_ticker_data_file, DataFrame; stringtype=String);
    number_of_firms = nrow(df);

    for i ∈ 1:number_of_firms
        ticker = df[i, :Symbol];
        name = df[i, :Name] |> String;
        sector = df[i, :Sector] |> String;
        tickerinfo[ticker] = (name=name, sector=sector);
    end

    tickerinfo;
end

Dict{String, NamedTuple} with 515 entries:
  "NI"   => (name = "NiSource", sector = "Utilities")
  "EMR"  => (name = "Emerson Electric Company", sector = "Industrials")
  "CTAS" => (name = "Cintas Corporation", sector = "Industrials")
  "HSIC" => (name = "Henry Schein", sector = "Health Care")
  "KIM"  => (name = "Kimco Realty", sector = "Real Estate")
  "PLD"  => (name = "Prologis", sector = "Real Estate")
  "IEX"  => (name = "IDEX Corporation", sector = "Industrials")
  "TPR"  => (name = "Tapestry", sector = "Consumer Discretionary")
  "BAC"  => (name = "Bank of America", sector = "Financials")
  "CBOE" => (name = "Cboe Global Markets", sector = "Financials")
  "EXR"  => (name = "Extra Space Storage", sector = "Real Estate")
  "NCLH" => (name = "Norwegian Cruise Line Holdings", sector = "Consumer Discre…
  "CVS"  => (name = "CVS Health", sector = "Health Care")
  "DRI"  => (name = "Darden Restaurants", sector = "Consumer Discretionary")
  "DTE"  => (name = "DTE Energy", sector = "Uti

### Compute the return matrix
Next, let's compute the return array which contains, for each day and each firm in our dataset, the value of the growth rate between time $j$ and $j-1$. 

>  __Continuously Compounded Growth Rate (CCGR)__
>
> Let's assume a model of the share price of firm $i$ is governed by an expression of the form:
>$$
\begin{align*}
S^{(i)}_{j} &= S^{(i)}_{j-1}\;\exp\left(\underbrace{g^{(i)}_{j,j-1}\Delta{t}_{j}}_{\text{return}}\right)
\end{align*}
$$
> where $S^{(i)}_{j-1}$ denotes the share price of firm $i$ at time index $j-1$, $S^{(i)}_{j}$ denotes the share price of firm $i$ at time index $j$, and $\Delta{t}_{j} = t_{j} - t_{j-1}$ denotes the length of a time step (units: years) between time index $j-1$ and $j$. The value we are going to estimate is the return, which is the growth rate $g^{(i)}_{j,j-1}$ (units: inverse years) for each firm $i$ multiplied by the time step in the dataset. 

We've implemented [the `log_growth_matrix(...)` function](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/data/#VLDataScienceMachineLearningPackage.log_growth_matrix) which takes the cleaned dataset and a list of ticker symbols, and returns the growth rate array. Each row of the growth rate array is a time step, while each column corresponds to a firm from the `list_of_tickers::Array{String,1}` array. We can get the return by passing in a time-step value of `1`.
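
To make the growth-rate definition concrete, here is a minimal sketch for a single price series. Taking the logarithm of the price model gives $g^{(i)}_{j,j-1} = \left(1/\Delta{t}_{j}\right)\ln\left(S^{(i)}_{j}/S^{(i)}_{j-1}\right)$. The `log_growth_rates` helper and the price values below are hypothetical and only illustrate the formula; this is not the package's `log_growth_matrix(...)` implementation, which may handle additional details (e.g., the risk-free rate).

```julia
# Hypothetical helper for illustration only (not the package implementation):
# computes g = (1/Δt) * log(S_j / S_{j-1}) for each consecutive pair of prices
function log_growth_rates(S::Vector{Float64}; Δt::Float64 = 1.0)
    return [(1 / Δt) * log(S[j] / S[j-1]) for j ∈ 2:length(S)]
end

S = [100.0, 101.0, 99.0, 102.0];   # made-up share prices, one per time index
g = log_growth_rates(S; Δt = 1.0); # with Δt = 1, each entry is a one-step log return
```

A useful property of log returns is that they add across time: the sum of the entries of `g` equals the log of the last price divided by the first.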

We save the growth rate array in the `X::Array{Float64,2}` variable:

In [7]:
X = let

    # initialize -
    r̄ = 0.0; # assume the risk-free rate is 0

    # compute the growth matrix -
    growth_rate_array = log_growth_matrix(dataset, list_of_tickers, Δt = 1.0, 
        risk_free_rate = r̄); # other optional parameters are at their defaults

    growth_rate_array; # return
end;

___

## Task 1: Compute the Empirical Covariance Matrix
In this task, let's compute the empirical covariance matrix $\hat{\mathbf{\Sigma}}$ for our dataset $\mathcal{D}$ using code that we write ourselves (we'll never do this in practice, but it's a good exercise). The empirical covariance matrix is given by:
$$
\hat{\mathbf{\Sigma}} = \frac{1}{n-1}\tilde{\mathbf{X}}^{\top}\tilde{\mathbf{X}}
$$
where $\tilde{\mathbf{X}}$ is the centered data matrix:
$$
\tilde{\mathbf{X}} = \mathbf{X} - \mathbf{1}\mathbf{m}^{\top}
$$
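Here $\mathbf{1} \in \mathbb{R}^{n}$ is a vector of ones, $\mathbf{m} \in \mathbb{R}^{m}$ is the vector of mean returns (one entry per firm), and $\mathbf{1}\mathbf{m}^{\top}$ is an $n \times m$ matrix in which every row is identical and contains the mean return of each firm. The dimension $n$ is the number of time periods and $m$ is the number of firms; note that the bold $\mathbf{m}$ denotes the mean vector, while the italic $m$ is a dimension.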

> __Outer product:__ The $\mathbf{1}\mathbf{m}^{\top}$ is an example of an outer product. The [outer product](https://en.wikipedia.org/wiki/Outer_product) of two vectors $\mathbf{a} \in \mathbb{R}^{n}$ and $\mathbf{b} \in \mathbb{R}^{m}$ is the $n \times m$ matrix $\mathbf{a}\mathbf{b}^{\top}$. Each element of the outer product is computed as $(\mathbf{a}\mathbf{b}^{\top})_{ij} = a_i b_j$. 
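
As a small illustration of how the outer product is used for centering, consider a made-up $3 \times 2$ data matrix. This sketch uses the plain `ones(n) * m'` product from base Julia rather than the `⊗` function used in this notebook, and the names `X_small`, `m_small`, `P`, and `X_tilde` are hypothetical.

```julia
using Statistics

X_small = [1.0 2.0; 3.0 4.0; 5.0 6.0];    # made-up data: 3 time periods, 2 firms
m_small = mean(X_small, dims = 1) |> vec; # column means: [3.0, 4.0]
P = ones(3) * m_small';                   # outer product: every row of P equals the mean vector
X_tilde = X_small - P;                    # centered data: each column now has zero mean
```

After centering, each column of `X_tilde` sums to zero, which is what the covariance formula above assumes.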

Recall that the data matrix $\mathbf{X} \in\mathbb{R}^{n \times m}$ computed above holds, in each row $k$, the __returns__ for all $m$ firms at time period $k$. We computed it [using the `log_growth_matrix(...)` function from the `VLDataScienceMachineLearningPackage.jl` package](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/data/#VLDataScienceMachineLearningPackage.log_growth_matrix) with a time step of $\Delta{t} = 1$, so the growth rates are already returns over a single time step.

First, let's compute the mean returns for each firm and store them in the `m::Array{Float64,1}` variable:

In [8]:
m = mean(X, dims=1) |> vec # mean returns for each firm

424-element Vector{Float64}:
  0.00043152690397797527
 -0.0001469541078208755
 -0.00031710102271930243
  0.0009238140812477853
  0.00044105742821765466
  0.00038868965269282677
  0.0005289761752874106
  0.0007284547060446172
  0.0005265024709424277
  5.618535648601576e-5
  ⋮
 -0.00029807586991059844
  0.00032461074815436536
  2.6916182694995763e-5
 -0.0003392763144159303
  0.00043857557703725285
  0.0002064503070884351
  0.0007130714183989561
  0.00021701413350653834
  0.0005858730432623267

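Now, let's form the centered data matrix $\tilde{\mathbf{X}}$ by subtracting the mean returns from each row of the data matrix $\mathbf{X}$. We do this in two steps: first we build the outer product $\mathbf{1}\mathbf{m}^{\top}$ using the `⊗` function, then we subtract it from $\mathbf{X}$ and store the centered data in the `X_centered::Array{Float64,2}` variable: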

In [9]:
r, c = size(X)
ones_vector = ones(r)
⊗(ones_vector, m) # outer product of ones_vector and m

2766×424 Matrix{Float64}:
 0.000431527  -0.000146954  -0.000317101  …  0.000217014  0.000585873
 0.000431527  -0.000146954  -0.000317101     0.000217014  0.000585873
 0.000431527  -0.000146954  -0.000317101     0.000217014  0.000585873
 0.000431527  -0.000146954  -0.000317101     0.000217014  0.000585873
 0.000431527  -0.000146954  -0.000317101     0.000217014  0.000585873
 0.000431527  -0.000146954  -0.000317101  …  0.000217014  0.000585873
 0.000431527  -0.000146954  -0.000317101     0.000217014  0.000585873
 0.000431527  -0.000146954  -0.000317101     0.000217014  0.000585873
 0.000431527  -0.000146954  -0.000317101     0.000217014  0.000585873
 0.000431527  -0.000146954  -0.000317101     0.000217014  0.000585873
 ⋮                                        ⋱               
 0.000431527  -0.000146954  -0.000317101     0.000217014  0.000585873
 0.000431527  -0.000146954  -0.000317101     0.000217014  0.000585873
 0.000431527  -0.000146954  -0.000317101     0.000217014  0.000585873
 0.00

The outer product creates a matrix where each row contains the mean returns, allowing us to subtract the mean from each data point in the matrix.

In [10]:
X_centered = let 
    r, c = size(X)
    ones_vector = ones(r)
    X̃ = X .- ⊗(ones_vector, m);
end

2766×424 Matrix{Float64}:
 -0.00391389    0.0250717    -0.0110757    …   0.000758758  -0.00457502
  0.0107441     0.00439892    0.00584244      -0.00340269    0.00332868
  0.0127155     0.00354218    0.000338403      0.00450918   -0.0108297
  0.00213365    0.0686387     0.00703199       0.0123199    -0.0020471
  0.00677517    0.0103835     0.0134887       -0.00882298    0.0168867
  0.00200431   -0.0155826    -0.00282885   …  -0.00779546   -0.0129519
  0.0109205    -0.00177269    0.0195462       -0.00726801   -0.00490968
  0.00769032    0.0041688     0.00788889       0.0174411    -0.00113277
  0.00477837    0.00679031    0.000742733     -0.00869703    0.00511985
  0.00441036    0.0244706     0.00401781       0.011203     -0.00628531
  ⋮                                        ⋱                
 -0.0177684     0.0154026    -0.0091056       -0.0366419    -0.0162462
 -0.0103991    -0.0102059    -0.0398453       -0.0282911    -0.02892
  0.00835239    0.0166178     0.0291932        0.0147388 

Finally, let's compute the empirical covariance matrix $\hat{\mathbf{\Sigma}}$, annualize it by multiplying by the number of trading days in a year ($T = 252$), and store it in the `Σ̂::Array{Float64,2}` variable:

In [11]:
Σ̂ = let 

    # initialize -
    T = 252; # number of trading days in a year
    (r,c) = size(X_centered)
    Σ = (1/(r-1)) * (X_centered' * X_centered)
    Σ*T; # return the annualized empirical covariance matrix
end

424×424 Matrix{Float64}:
 0.0535771   0.032397    0.0212982  …  0.0355662   0.0280938   0.0266666
 0.032397    0.207093    0.0431437     0.051429    0.0689041   0.0280481
 0.0212982   0.0431437   0.117391      0.0308539   0.037867    0.019717
 0.0229702   0.0312627   0.0163174     0.032633    0.0201827   0.022609
 0.0179206   0.0176175   0.0158407     0.0157326   0.0168026   0.0185472
 0.0244078   0.0195811   0.0145372  …  0.023159    0.0167416   0.0221966
 0.025643    0.0334402   0.0218931     0.0313218   0.0280694   0.023824
 0.0294242   0.02952     0.0185233     0.0369478   0.0198333   0.0262802
 0.0297618   0.0458278   0.0231343     0.0433011   0.0335821   0.0233085
 0.0169025   0.0325409   0.0206063     0.0219846   0.0327002   0.0131991
 ⋮                                  ⋱                          
 0.0339199   0.0921199   0.035255   …  0.0503196   0.0581754   0.0308858
 0.00888225  0.00814595  0.0118987     0.00616705  0.00592554  0.0112662
 0.0161836   0.0376862   0.0190661    

__Check__: Let's check our covariance matrix against [the `cov(...)` function from the Julia standard library](https://docs.julialang.org/en/v1/stdlib/Statistics/#Statistics.cov). Compute the covariance matrix using the built-in function and compare it to your result:

> __Test__ We'll compare the two covariance matrices by computing the Frobenius norm of their difference. The Frobenius norm of a matrix $\mathbf{A} \in \mathbb{R}^{n \times m}$ is defined as:
> $$
\|\mathbf{A}\|_{F} = \sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m} |a_{ij}|^{2}}
> $$
> where $a_{ij}$ is the element in the $i^{th}$ row and $j^{th}$ column of matrix $\mathbf{A}$. If the Frobenius norm of the difference between the two covariance matrices is very small (close to zero), it indicates that they are nearly identical, confirming the correctness of our implementation. We'll use the [`@assert` macro](https://docs.julialang.org/en/v1/base/base/#Base.@assert) to enforce this check.
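
Before running the check, here is a tiny sketch of the Frobenius norm on a made-up $2 \times 2$ matrix, comparing the definition above with the `norm(...)` function from `LinearAlgebra`. The names `frobenius_by_hand` and `frobenius_builtin` are hypothetical.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0];                 # made-up matrix
frobenius_by_hand = sqrt(sum(abs2, A)); # square root of the sum of squared entries: sqrt(1 + 4 + 9 + 16)
frobenius_builtin = norm(A);            # norm(...) applied to a matrix treats it entry-wise (Frobenius norm)
```

Note that `norm(A)` for a matrix is the Frobenius norm, whereas `opnorm(A)` computes an operator norm, which is a different quantity.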

So what do we see?


In [None]:
let

    # initialize -
    ϵ = 1e-8; # tolerance for the Frobenius norm comparison
    T = 252; # number of trading days in a year
    Σ_builtin = cov(X)*T; # annualized empirical covariance matrix using the built-in cov(...) function
    Δ = Σ̂ - Σ_builtin;
    frobenius_norm = norm(Δ); # Frobenius norm (default for matrices)
    test = frobenius_norm < ϵ

    # if test fails, throw an error -
    @assert test "Covariance matrices do not match within tolerance!"
end

Ok! So if we get here without an error, our covariance matrix implementation agrees with the built-in `cov(...)` function to within the tolerance!

___

## Task 2: Compute the Eigendecomposition using the QR Algorithm
In this task, we will compute the eigendecomposition of the empirical covariance matrix $\hat{\mathbf{\Sigma}}$ using our implementation of the QR algorithm in [the `qriteration(...)` function in the `Compute.jl` file](../src/Compute.jl).

> __QR Iteration Convergence:__ The QR algorithm iteratively decomposes the matrix into orthogonal $\mathbf{Q}$ and upper-triangular $\mathbf{R}$ factors, then reverses the product to form $\mathbf{A}_{k+1} = \mathbf{R}_{k}\mathbf{Q}_{k}$. This process gradually transforms the matrix toward diagonal form, with diagonal elements converging to eigenvalues while the accumulated product of $\mathbf{Q}$ matrices yields the corresponding eigenvectors.
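
To illustrate the idea, here is a minimal sketch of unshifted QR iteration for a small symmetric matrix. This is not the `qriteration(...)` implementation from `Compute.jl`: the function name `qr_iteration_sketch`, its arguments, and its stopping rule are assumptions made for illustration, and it uses the built-in `qr(...)` factorization from `LinearAlgebra` instead of a hand-written Gram-Schmidt step.

```julia
using LinearAlgebra

# Hypothetical sketch (not the Compute.jl implementation) of unshifted QR iteration
function qr_iteration_sketch(A::Matrix{Float64}; maxiter::Int = 500, tolerance::Float64 = 1e-10)
    Ak = copy(A);                       # current iterate
    V = Matrix{Float64}(I, size(A)...); # accumulated product of the Q factors
    for _ ∈ 1:maxiter
        F = qr(Ak);                     # factor the iterate: Ak = Q*R
        Q = Matrix(F.Q);
        Ak = F.R * Q;                   # reverse the product (a similarity transform of Ak)
        V = V * Q;                      # accumulate the orthogonal factors
        # stop when the off-diagonal part of the iterate is small
        if norm(Ak - Diagonal(diag(Ak))) < tolerance
            break
        end
    end
    return (diag(Ak), V)                # approximate eigenvalues and eigenvectors (columns of V)
end

A = [2.0 1.0; 1.0 3.0];                 # made-up symmetric test matrix
(λ, V) = qr_iteration_sketch(A);
```

For a symmetric matrix with distinct eigenvalue magnitudes, the iterates approach a diagonal matrix, so `A * V` should be close to `V * Diagonal(λ)`. Convergence slows down when eigenvalues are close in magnitude, which is one reason practical implementations use shifts.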

We'll save the eigenvalues in the `λ̂::Array{Float64,1}` variable and the eigenvectors in the `V̂::Array{Float64,2}` variable. The eigenvalues and eigenvectors are sorted in descending order based on the eigenvalue magnitude.

In [None]:
(λ̂,V̂) = let

    # initialize -
    maxiter = 1000; # max number of iterations
    tolerance = 1e-9; # tolerance for convergence

    # call our QR iteration function -
    result = qriteration(Σ̂; maxiter = maxiter, tolerance = tolerance);

    λ = result[1]; # eigenvalues
    tmpdict = result[2]; # eigenvectors
    number_of_rows = length(λ);
    V = zeros(number_of_rows, number_of_rows);
    for i ∈ 1:number_of_rows
        V[:,i] = tmpdict[i];
    end

    # sort the eigenpairs by eigenvalue magnitude -
    p = sortperm(λ, rev=true); # indices that would sort λ in descending order
    λ = λ[p]
    V = V[:,p]

    (λ,V); # return
end

([11.438058870655444, 1.595819459020517, 1.1983172323103428, 0.9913463704158854, 0.8821003299986302, 0.7218934238657446, 0.5110538829534128, 0.4433158835074257, 0.42410094575995183, 0.40109541051295494  …  0.003141200093345577, 0.0030032234323237325, 0.0029450466978901834, 0.0029043419685604965, 0.0024374860556287833, 0.0023239405869121253, 0.0011683475444739903, 0.0008197993962321032, 0.0008004082795630774, 0.00020091649390589135], [-0.042413850294204546 0.05785052459201625 … -0.003749383941944555 0.002294131234357588; -0.08444908198854581 -0.05786956337900864 … 0.000697604059543025 -0.0009776709554531712; … ; -0.06871769022820234 -0.07600622038332525 … 0.001978653491762944 0.00809290103175616; -0.037098632606339624 0.06134369996581131 … -0.006274549177533917 0.0019203225901283023])

Before we think about what the eigenvalues and eigenvectors mean, let's verify that our implementation is correct by checking the values against the built-in Julia [`eigen(...)` function](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.eigen). 

> __Check__: We perform two verification tests. First, we compare the eigenvalues by computing the maximum absolute difference between our computed eigenvalues and those from the built-in function, ensuring the difference is less than a tolerance of $10^{-4}$. Second, we compare the eigenvectors by normalizing each pair, accounting for sign ambiguity (eigenvectors are defined up to a sign), and verifying the maximum component-wise difference is less than a tolerance of $10^{-3}$.
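
The sign ambiguity is easy to see on a small made-up example: if $\mathbf{v}$ is a unit eigenvector, so is $-\mathbf{v}$. The sketch below (with hypothetical names `v`, `w`, and `w_aligned`) flips one copy and then realigns it using the sign of the dot product, which is the same idea used in the check.

```julia
using LinearAlgebra

A = [2.0 1.0; 1.0 3.0];          # made-up symmetric matrix
F = eigen(Symmetric(A));         # for symmetric input, eigenvalues are returned in ascending order
λ_max = F.values[end];           # largest eigenvalue
v = F.vectors[:, end];           # unit eigenvector for the largest eigenvalue
w = -v;                          # the flipped vector is an equally valid eigenvector

# align signs before comparing: flip w if it points away from v
w_aligned = dot(v, w) < 0 ? -w : w;
```

Without the alignment step, `v - w` would have norm 2 even though both vectors describe the same eigen-direction.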

So what do we see?

In [14]:
let

    # compute the eigendecomposition using the built-in function -
    F = eigen(Σ̂);
    λ = F.values; # grab the eigenvalues
    V = F.vectors; # grab the eigenvectors

    # sort the eigenpairs by eigenvalue magnitude -
    p = sortperm(λ, rev=true); # indices that would sort λ in descending order
    λ = λ[p]
    V = V[:,p]

    # Test 1: let's compare the eigenvalues -
    ϵ = 1e-4; # tolerance for comparison
    maximum_eigenvalue_delta = maximum(abs.(λ̂ - λ))
    @assert maximum_eigenvalue_delta < ϵ "Eigenvalues do not match within tolerance!"

    # Test 2: let's compare the eigenvectors (up to a sign) -
    ΔV = similar(V̂)
    for i ∈ 1:length(λ̂)
        v1 = V̂[:,i] / norm(V̂[:,i])
        v2 = V[:,i] / norm(V[:,i])
        if dot(v1, v2) < 0
            v2 = -v2
        end
        ΔV[:,i] = v1 - v2
    end
    ϵv = 1e-3; # tolerance for eigenvector comparison
    maximum_eigenvector_delta = maximum(abs.(ΔV))
    @assert maximum_eigenvector_delta < ϵv "Eigenvectors do not match within tolerance!"
end

Ok, so if we get here without an error, the eigenvalues and eigenvectors from our QR algorithm implementation agree with the built-in `eigen(...)` results to within the stated tolerances!

___

## Task 3: What do the Eigenvalues and Eigenvectors Mean?
In this task, we'll interpret the eigenvalues and eigenvectors we computed from the empirical covariance matrix $\hat{\mathbf{\Sigma}}$. Eigendecomposition is a type of matrix factorization of the form:
$$
\begin{align*}
\hat{\mathbf{\Sigma}} &= \mathbf{V}\mathbf{\Lambda}\mathbf{V}^{\top}
\end{align*}
$$
where $\mathbf{V} \in \mathbb{R}^{m \times m}$ is a matrix whose columns are the orthonormal eigenvectors of $\hat{\mathbf{\Sigma}}$, and $\mathbf{\Lambda} \in \mathbb{R}^{m \times m}$ is a diagonal matrix whose diagonal elements are the eigenvalues of $\hat{\mathbf{\Sigma}}$. One way to interpret the eigenvalues and eigenvectors is through the lens of market factor models. 
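
As a quick sanity check of the factorization itself, the sketch below rebuilds a small made-up symmetric matrix from its eigenpairs using the built-in `eigen(...)` function. The names `Σ_small` and `Σ_rebuilt` are hypothetical.

```julia
using LinearAlgebra

Σ_small = [4.0 1.0; 1.0 3.0];   # made-up symmetric positive definite matrix
F = eigen(Symmetric(Σ_small));
V = F.vectors;                  # columns are orthonormal eigenvectors
Λ = Diagonal(F.values);         # eigenvalues on the diagonal
Σ_rebuilt = V * Λ * V';         # reconstruct the matrix from its eigendecomposition
```

Up to floating-point error, `Σ_rebuilt` matches `Σ_small`, and `V' * V` is the identity matrix.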

> __Market Factor:__ The eigenvector $\mathbf{v}_1$ corresponding to the largest eigenvalue $\lambda_{1}$ can be interpreted as the market factor, while the other eigenvectors $\mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_m$ correspond to sector or idiosyncratic factors (where we assume the eigenvalues are sorted in descending order).

Let's start by verifying that the eigenvectors we computed are orthonormal. We can do this by checking that the matrix product of the transpose of the eigenvector matrix $\mathbf{V}^{\top}$ and the eigenvector matrix $\mathbf{V}$ yields the identity matrix $\mathbf{I}$.

> __Check:__ In the check below, we compute the product $\mathbf{V}^{\top}\mathbf{V}$ and verify that it is close to the identity matrix within a specified tolerance. In particular, we compute the maximum absolute difference between the elements of the computed product and the identity matrix, and assert that this difference is less than a small tolerance value (e.g., $1 \times 10^{-6}$).

So what do we see?

In [15]:
let
    I_test = transpose(V̂)*V̂ |> Matrix; # compute the identity test matrix
    I_true = Matrix(I, size(I_test)...); # true identity matrix
    ϵ = 1e-6; # tolerance for comparison
    ΔI = I_test - I_true;
    maximum_identity_delta = maximum(abs.(ΔI))
    @assert maximum_identity_delta < ϵ "Eigenvectors are not orthonormal within tolerance!" 
end

If we get here without an error, the eigenvectors are orthonormal (to within the tolerance)! Now, let's examine the eigenvector $\mathbf{v}_1$ associated with the largest eigenvalue and interpret it as the market factor. In particular, let's compute the normalized absolute values of the components of $\mathbf{v}_1$ to understand the relative influence of each firm on the market factor.

> __Normalized Influence:__ We normalize by dividing each $|v_{1,i}|$ by the sum of all absolute components, creating a percentage-like measure of relative contribution. This shows which firms have the strongest loadings on this principal direction of variance.
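
Here is the normalization on a made-up three-component vector, so the arithmetic is easy to follow. The names `v`, `σ`, and `order` are hypothetical.

```julia
v = [0.125, -0.5, 0.375];        # made-up eigenvector components (signs can differ)
σ = abs.(v) ./ sum(abs.(v));     # normalized influence: non-negative entries that sum to one
order = sortperm(σ, rev = true); # component indices ranked from most to least influential
```

In this toy case the second component has the largest share (one half), followed by the third and then the first, regardless of the signs of the original entries.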

In [None]:
let
    
    # initialize -
    number_of_firms = length(list_of_tickers);
    number_of_firms_to_display = 20; # number of firms to display
    df = DataFrame(); # initialize the DataFrame to hold the results

    # get the eigenvector associated with the largest eigenvalue, and scale it -
    mode = 1; # index of the eigenvector to look at (1 = largest eigenvalue)
    v₁ = V̂[:,mode];
    σ = abs.(v₁) ./ sum(abs.(v₁)); # normalized influence: non-negative entries that sum to one
    sortperm_indices = sortperm(σ, rev=true); # indices that would sort σ in descending order
    
    for i ∈ 1:number_of_firms_to_display
        firm_index = sortperm_indices[i];
        ticker = list_of_tickers[firm_index];
        
        # get data from the tickerinfo dictionary
        name = tickerinfo[ticker].name
        sector = tickerinfo[ticker].sector
        
        push!(df, (
            Ticker = ticker, mode = mode, rank = i, Name = name, Sector = sector
        ))
    end

    # make a table display of the results -
    pretty_table(
        df;
        backend = :text,
        fit_table_in_display_horizontally = false,
        fit_table_in_display_vertically = false,
        table_format = TextTableFormat(borders = text_table_borders__compact)
    );

end

 -------- ------- ------- -------------------------------- ------------------------
  Ticker    mode    rank                             Name                   Sector
  String   Int64   Int64                           String                   String
 -------- ------- ------- -------------------------------- ------------------------
    NCLH       1       1   Norwegian Cruise Line Holdings   Consumer Discretionary
     CCL       1       2             Carnival Corporation   Consumer Discretionary
     RCL       1       3            Royal Caribbean Group   Consumer Discretionary
     LNC       1       4                 Lincoln National               Financials
    PENN       1       5             Penn National Gaming   Consumer Discretionary
     APA       1       6                  APA Corporation                   Energy
     MGM       1       7        MGM Resorts International   Consumer Discretionar

__So how should we interpret these values?__ The top eigenvector is tied to the direction in return space with the largest variance. So the names with the largest coefficients $|v_{1,i}|$ are the ones that, in our sample, have the strongest exposure to that highest-variance mode. Ok, but why these particular names?
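
One way to see the link between the top eigenvector and variance: for a unit vector $\mathbf{u}$, the variance of the data projected onto $\mathbf{u}$ is $\mathbf{u}^{\top}\hat{\mathbf{\Sigma}}\mathbf{u}$, and this quantity equals $\lambda_{1}$ when $\mathbf{u} = \mathbf{v}_{1}$. The sketch below checks this on a small made-up covariance matrix; the names `Σ_small`, `variance_along_v₁`, and `variance_along_u` are hypothetical.

```julia
using LinearAlgebra

Σ_small = [4.0 1.0; 1.0 3.0];                  # made-up covariance matrix
F = eigen(Symmetric(Σ_small));                 # eigenvalues in ascending order
λ₁ = F.values[end];                            # largest eigenvalue
v₁ = F.vectors[:, end];                        # corresponding unit eigenvector

variance_along_v₁ = dot(v₁, Σ_small * v₁);     # variance captured along the top eigenvector
u = [1.0, 0.0];                                # some other unit direction (the first coordinate axis)
variance_along_u = dot(u, Σ_small * u);        # equals the (1,1) entry, 4.0
```

The variance along `v₁` matches `λ₁`, and no other unit direction (such as `u`) captures more.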

> __Why our specific tickers?__ 
> 
> The tickers with the largest influence share two properties over the sample period (2014 to 2024, which includes 2020): very high volatility (big day-to-day moves) and strong common macro sensitivity. For example, the $\beta$ (a measure of sensitivity to market moves) for many of these names, such as `NCLH` or `CCL`, is above 2 (meaning they tend to move more than twice as much as the market, either up or down, on average). These tickers also cluster in travel and leisure businesses, which were heavily impacted during the COVID-19 pandemic.

Ok, if this is true, why do we have some low-$\beta$ assets ($\beta < 1$) on this list, e.g., `HAL` or `FANG`?

> __Be careful!__ Just because a stock has a high loading on the first eigenvector does not necessarily mean it has a high $\beta$ coefficient. The first eigenvector captures the direction of maximum variance in the data, which may not align perfectly with the movements of the S&P 500, e.g., the `SPY` ETF. Some stocks may have high volatility due to other factors, such as sector-specific news or events, which can lead to a high loading on the first eigenvector even if their $\beta$ is low.
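
To illustrate the distinction, here is a sketch with entirely made-up return series, using the standard single-index estimate $\beta = \text{cov}(r_{i}, r_{M})/\text{var}(r_{M})$ (this formula is an assumption of the sketch; the notebook does not compute $\beta$). The `beta` helper and the series `r_market`, `noise`, `r_a`, and `r_b` are hypothetical.

```julia
using Statistics

# Hypothetical helper: single-index beta estimate from two return series
beta(r_asset, r_market) = cov(r_asset, r_market) / var(r_market);

r_market = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01]; # made-up market returns
noise = [0.08, 0.07, -0.09, -0.06, 0.05, -0.05];      # made-up asset-specific shocks

r_a = 2.0 .* r_market;          # asset A: moves exactly twice the market, no specific shocks
r_b = 0.5 .* r_market .+ noise; # asset B: weak market sensitivity, large specific shocks

β_a = beta(r_a, r_market);      # high beta by construction
β_b = beta(r_b, r_market);      # low beta, despite large swings
```

In this toy example asset B has a larger variance than asset A but a $\beta$ below one, so a high-variance name can carry a large weight in a variance-driven direction without being strongly tied to the market.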

What about other eigenvectors? Can we interpret them as sector factors?

In [None]:
let

    # initialize -
    index_to_look_at = 2; # index of the eigenvector to look at (sorted by eigenvalue magnitude 1 = largest)
    number_of_firms = length(list_of_tickers);
    number_of_firms_to_display = 10; # number of firms to display
    df = DataFrame(); # initialize the DataFrame to hold the results

    # get the selected eigenvector, and scale it -
    v = V̂[:,index_to_look_at]; # eigenvector associated with the index_to_look_at-th largest eigenvalue
    σ = abs.(v) ./ sum(abs.(v)); # normalized influence: non-negative entries that sum to one
    sortperm_indices = sortperm(σ, rev=true); # indices that would sort σ in descending order
    
    for i ∈ 1:number_of_firms_to_display
        firm_index = sortperm_indices[i];
        ticker = list_of_tickers[firm_index];
        
        # get data from the tickerinfo dictionary
        name = tickerinfo[ticker].name
        sector = tickerinfo[ticker].sector
        
        push!(df, (
            Ticker = ticker, mode = index_to_look_at, rank = i, Name = name, Sector = sector
        ))
    end

    # make a table display of the results -
    pretty_table(
        df;
        backend = :text,
        fit_table_in_display_horizontally = false,
        fit_table_in_display_vertically = false,
        table_format = TextTableFormat(borders = text_table_borders__compact)
    );

end

 -------- ------- ------- ---------------------- ------------------------
  Ticker    mode    rank                   Name                   Sector
  String   Int64   Int64                 String                   String
 -------- ------- ------- ---------------------- ------------------------
     APA       2       1        APA Corporation                   Energy
    FANG       2       2     Diamondback Energy                   Energy
     DVN       2       3           Devon Energy                   Energy
     OXY       2       4   Occidental Petroleum                   Energy
     HAL       2       5            Halliburton                   Energy
     EOG       2       6          EOG Resources                   Energy
     HES       2       7       Hess Corporation                   Energy
     SLB       2       8           Schlumberger                   Energy
     COP       2       9         Co

__Wow! Yes, we can!__ When we examine subsequent eigenvectors, we see that they often correspond to specific sectors in the market. For example, the top-ranked firms shown above for the second eigenvector are energy firms. This suggests that these eigenvectors capture sector-specific factors that influence stock returns, beyond the broad market factor captured by the first eigenvector.

> __Exploring Other Modes:__ You can change the `index_to_look_at` variable to explore different eigenvectors (e.g., 2, 3, 4, etc.). Each successive eigenvector captures orthogonal directions of variance, often revealing distinct sector or style factors in the market.

There is so much more we could explore here, but we'll stop for now. Great work!

___

## Summary
The QR iteration algorithm computes the eigendecomposition of a covariance matrix, revealing market and sector factors that explain variance in financial returns.

> __Key Takeaways__
> 
> * **Empirical covariance captures return relationships:** The covariance matrix computed from centered return data quantifies the statistical relationships between asset returns across the market.
> * **QR iteration converges to eigenvalues:** The QR algorithm iteratively transforms the covariance matrix into a form where diagonal elements converge to eigenvalues, with corresponding eigenvectors capturing principal directions of variance.
> * **Eigenvectors represent market factors:** The eigenvector associated with the largest eigenvalue identifies the primary market factor with the highest variance exposure, while subsequent eigenvectors often correspond to sector-specific factors.


Eigendecomposition provides a mathematical framework for understanding the structure of financial return data and identifying common sources of risk.

___