# Example: Learning Linear Regression Parameters using Singular Value Decomposition (SVD)
This example will familiarize students with using [Singular Value Decomposition](https://en.wikipedia.org/wiki/Singular_value_decomposition) to estimate the parameters in ordinary least squares and regularized least squares calculations. In particular, we'll look at estimating the parameters in single-index models for equity returns.

### Single index model of Sharpe
A single index model describes the return of a firm’s stock in terms of a firm-specific return and the overall market return. One of the simplest (yet still widely used) single index models was developed by [Sharpe (1963)](https://en.wikipedia.org/wiki/Single-index_model#:~:text=The%20single%2Dindex%20model%20(SIM,used%20in%20the%20finance%20industry.)). Let $R_{i}(t)\equiv\left(r_{i}\left(t\right) - r_{f}\right)$ 
and $R_{m}(t)\equiv\left(r_{m}\left(t\right)-r_{f}\right)$ denote the firm-specific and market **excess returns**, where $r_{f}$ denotes the risk-free rate of return.
Further, let $\epsilon_{i}\left(t\right)$ denote stationary normally distributed random noise
with mean zero and standard deviation $\sigma_{i}$. Then, the single index model of Sharpe is given by:

$$
\begin{equation*}
R_{i}\left(t\right) = \alpha_{i}+\beta_{i}\cdot{R}_{m}\left(t\right)+\epsilon_{i}
\left(t\right)\qquad{t=1,2,\dots,T}
\end{equation*}
$$

where $\alpha_{i}$ and $\beta_{i}$ are (unknown) model parameters: 
* $\alpha_{i}$ describes the firm-specific return not explained by the market; thus, $\alpha_{i}$ is the idiosyncratic return of firm $i$.
* $\beta_{i}$ has two interpretations. First, it measures the sensitivity of the excess return of firm $i$ to the excess return of the market. 
A large $\beta_{i}$ (greater than one) suggests that the market returns (or losses) are amplified for firm $i$, while a small $\beta_{i}$ (less than one) suggests that the market returns (or losses) are damped for firm $i$. 
Second, it represents the risk of investing in firm $i$ relative to the overall market.
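
To make the roles of $\alpha_{i}$ and $\beta_{i}$ concrete, here is a minimal sketch that simulates the single index model with synthetic data and then recovers the parameters by least squares. The parameter values, the noise level, and all variable names (`α_true`, `β_true`, `Rm_sim`, and so on) are made up for illustration and do not come from the dataset used in this example.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical "true" parameters and noise level, chosen only for illustration
α_true, β_true, σ_noise = 0.02, 1.2, 0.1;
T = 5_000;

rng = MersenneTwister(1234);               # fixed seed so the sketch is reproducible
Rm_sim = randn(rng, T);                    # synthetic market excess returns
ϵ = σ_noise .* randn(rng, T);              # mean-zero normal noise
Ri_sim = α_true .+ β_true .* Rm_sim .+ ϵ;  # single index model

# ordinary least squares: the columns of X_sim multiply α and β
X_sim = [ones(T) Rm_sim];
θ_sim = X_sim \ Ri_sim;                    # backslash solves the least-squares problem

# the least-squares β is also the ratio cov(Rᵢ, Rₘ) / var(Rₘ)
β_cov = cov(Ri_sim, Rm_sim) / var(Rm_sim);
```

With enough samples, `θ_sim` should land close to `[α_true, β_true]`, and the slope should agree with the covariance ratio `β_cov`.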

## Setup
This example requires several external libraries and a function to compute the outer product. Let's download and install these packages and call our `Include.jl` file.

In [1]:
include("Include.jl");

[32m[1m    Updating[22m[39m git-repo `https://github.com/varnerlab/VLDecisionsPackage.jl.git`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-4800-5800-Examples-AY-2024/week-10/L10c/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-4800-5800-Examples-AY-2024/week-10/L10c/Manifest.toml`
[32m[1m  Activating[22m[39m project at `~/Desktop/julia_work/CHEME-4800-5800-Examples-AY-2024/week-10/L10c`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-4800-5800-Examples-AY-2024/week-10/L10c/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-4800-5800-Examples-AY-2024/week-10/L10c/Manifest.toml`
[32m[1m    Updating[22m[39m registry at `~/.julia/registries/General.toml`
[32m[1m    Updating[22m[39m git-repo `https://github.com/varnerlab/VLDecisionsPackage.jl.git`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-4800-5800-Examples-AY-2024/w

## Prerequisites 
We gathered a daily open-high-low-close `dataset` for each firm in the [S&P500](https://en.wikipedia.org/wiki/S%26P_500) from `01-03-2018` until `03-14-2024`, along with data for a few exchange-traded funds and volatility products during that time. 
* We load the `original_dataset` by calling the `MyPortfolioDataSet()` function and then remove firms that do not have the maximum number of trading days, i.e., they are missing data. We save the cleaned data in the `dataset` variable.
* We'll also grab the list of firms in the `dataset` and sort them alphabetically. We'll store the sorted list of firms in the `my_list_of_tickers` variable.
* Finally, we'll compute the excess return matrix `R` and the excess return for the market index, in this case treated as `SPY`.

In [2]:
dataset = Dict{String,DataFrame}();
original_dataset = MyPortfolioDataSet() |> x->x["dataset"];
maximum_number_trading_days = original_dataset["AAPL"] |> nrow; # AAPL serves as the reference for a complete record
for (ticker,data) ∈ original_dataset
    if (nrow(data) == maximum_number_trading_days) # keep only firms with no missing trading days
        dataset[ticker] = data;
    end
end
my_list_of_tickers = keys(dataset) |> collect |> x->sort(x); # sorted list of the remaining tickers

The single index model uses a market index, e.g., the [S&P500](https://en.wikipedia.org/wiki/S%26P_500), as the base from which we compute the return for firm $i$. Let's get the index in our dataset that corresponds to [SPDR SPY](https://www.ssga.com/us/en/intermediary/etfs/capabilities/spdr-core-equity-etfs/spy-sp-500), an ETF that tracks the [S&P500](https://en.wikipedia.org/wiki/S%26P_500). 
* We'll use the [findfirst function](https://docs.julialang.org/en/v1/base/strings/#Base.findfirst-Tuple{AbstractString,%20AbstractString}) on the `my_list_of_tickers` list to find the index that corresponds to `SPY` and store this value in the `idx_spy` variable; we'll use it later.
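
As a quick illustration of how `findfirst` behaves with a predicate, here is a small sketch on a made-up ticker list (`tickers_demo` is hypothetical, not the course dataset). Note that `findfirst` returns `nothing` when no element matches, which is worth checking before using the result as an index.

```julia
tickers_demo = ["AAPL", "IBM", "SPY"];              # made-up, already sorted list
i_spy = findfirst(x -> x == "SPY", tickers_demo);   # index of the first match
i_none = findfirst(x -> x == "XYZ", tickers_demo);  # no match, so this is `nothing`
```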

In [3]:
idx_spy = findfirst(x->x=="SPY", my_list_of_tickers); 

Finally, we must calculate the `excess return` from the price information stored in the `dataset`. We'll do this for every firm in the `dataset` using the `μ(...)` function. We'll store the excess return data in the `R` matrix, and the excess return for `SPY` in the `Rₘ` variable:

In [4]:
risk_free_rate = 0.05;
R = μ(dataset, my_list_of_tickers, risk_free_rate = risk_free_rate) |> x-> transpose(x) |> Matrix;
Rₘ = R[idx_spy, :]; # this is the excess return of the market, which we approximate by the SPY ETF

In [5]:
Rₘ

1557-element Vector{Float64}:
  1.5050377454909538
  1.095468088086578
  0.7757592626051124
  0.7390204094703445
 -0.659217123297486
  1.2784386070568685
  1.8262837004768115
  0.5011457033442471
  0.5948732710078719
  0.44603924309987253
  0.4564092862912787
  1.7522322189385078
  1.0553224381234976
  ⋮
 -0.4719770170266065
 -0.14738780347442887
  0.8981279176165984
  1.5807628608286195
  0.6777283645585958
 -2.4011983851922336
  0.9188625992892551
  2.087665996949812
 -0.16219197151137427
 -1.7062801647597288
  2.118848347155344
  0.5559387917487099
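
The implementation of `μ(...)` is not shown in this example. As a rough sketch of what such a calculation could look like, the block below computes annualized log growth rates from a price series and subtracts the risk-free rate. The helper name `excess_log_returns`, the time step `Δt = 1/252` (one trading day in years), and the prices are all assumptions made for illustration; the course function may use a different price field or convention.

```julia
# Hypothetical helper (not the course's μ function): annualized log excess returns
function excess_log_returns(prices::Vector{Float64}; risk_free_rate::Float64 = 0.05,
    Δt::Float64 = 1.0/252.0)

    n = length(prices);
    returns = Vector{Float64}(undef, n - 1);
    for j ∈ 2:n
        growth = (1/Δt)*log(prices[j]/prices[j-1]); # annualized growth rate between days j-1 and j
        returns[j-1] = growth - risk_free_rate;     # subtract the risk-free rate
    end
    return returns;
end

prices = [100.0, 101.0, 100.5, 102.0]; # made-up prices for illustration
R_example = excess_log_returns(prices);
```

A series of `n` prices yields `n - 1` excess returns, and a day on which the price falls gives a negative entry.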

## Compute $\theta$ using pseudo-inverse and SVD approaches

#### Pseudo inverse approach
Regularized least squares estimates of the unknown parameters $\mathbf{\theta}$ _minimize_ the sum of squared errors between the model-calculated and observed 
outputs plus a penalty term:
\begin{equation*}
\hat{\mathbf{\theta}}_{\lambda} = \arg\min_{\mathbf{\theta}} ||~\mathbf{y} - \mathbf{X}\cdot\mathbf{\theta}~||^{2}_{2} + \lambda\cdot||~\mathbf{\theta}~||^{2}_{2}
\end{equation*}
where $||\star||^{2}_{2}$ is the square of the p = 2 vector norm, $\lambda\geq{0}$ denotes the regularization parameter, and $\hat{\mathbf{\theta}}$ denotes the estimated parameter vector. 
The $\hat{\mathbf{\theta}}$ that minimizes the $||\star||^{2}_{2}$ loss plus penalty for data matrix $\mathbf{X}$ is given by:
\begin{equation*}
\hat{\mathbf{\theta}}_{\lambda} = \left(\mathbf{X}^{T}\mathbf{X}+\lambda\cdot\mathbf{I}\right)^{-1}\mathbf{X}^{T}\mathbf{y}
\end{equation*}
The matrix $\mathbf{X}^{T}\mathbf{X}+\lambda\cdot\mathbf{I}$ is called the _regularized normal matrix_, while $\mathbf{X}^{T}\mathbf{y}$ is called the _moment vector_.
The matrix $\left(\mathbf{X}^{T}\mathbf{X}+\lambda\cdot\mathbf{I}\right)^{-1}$ must exist for the solution $\hat{\mathbf{\theta}}_{\lambda}$ to exist.
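
The normal-equation solution can be sketched in a few lines of Julia. This is an illustration on synthetic data, not the course's `θ(...)` implementation; the helper name `ridge`, the data, and the parameter values are all made up. The sketch solves the linear system $\left(\mathbf{X}^{T}\mathbf{X}+\lambda\cdot\mathbf{I}\right)\mathbf{\theta} = \mathbf{X}^{T}\mathbf{y}$ with the backslash operator rather than forming the inverse explicitly.

```julia
using LinearAlgebra, Random

rng = MersenneTwister(42);
n_obs = 200;
X_demo = [ones(n_obs) randn(rng, n_obs)];                 # intercept column plus one feature
y_demo = X_demo*[0.5, 2.0] .+ 0.1 .* randn(rng, n_obs);   # made-up parameters and noise

# Hypothetical helper: solve (XᵀX + λI)θ = Xᵀy without forming the inverse
ridge(X, y, λ) = (transpose(X)*X + λ*I) \ (transpose(X)*y);

θ_ols = ridge(X_demo, y_demo, 0.0);     # λ = 0 recovers ordinary least squares
θ_ridge = ridge(X_demo, y_demo, 10.0);  # λ > 0 shrinks the estimate toward zero
```

For $\lambda > 0$ the regularized normal matrix is positive definite, so the system has a unique solution even when $\mathbf{X}^{T}\mathbf{X}$ alone is singular.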

#### SVD approach
Let the [singular value decomposition (SVD)](https://en.wikipedia.org/wiki/Singular_value_decomposition) of the $n\times{p}$ data matrix $\mathbf{X}$ be given by:
\begin{equation}
\mathbf{X} = \mathbf{U}\cdot\mathbf{\Sigma}\cdot\mathbf{V}^{T}
\end{equation}
where $\mathbf{U}$ is an orthogonal matrix, $\mathbf{\Sigma}$ is a diagonal singular value matrix,
and $\mathbf{V}$ is an orthogonal matrix. Then, the regularized least-squares estimate of the unknown parameter vector $\mathbf{\theta}$ is given by:
\begin{equation}
\hat{\mathbf{\theta}}_{\lambda} = \Bigl[\left(\mathbf{\Sigma}^{T}\mathbf{\Sigma}+\lambda\mathbf{I}\right)\mathbf{V}^{T}\Bigr]^{-1}\cdot\mathbf{\Sigma^{T}}\cdot\mathbf{U}^{T}\cdot\mathbf{y}
\end{equation}
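
To see where this expression comes from, here is a short derivation sketch. It starts from the normal-equation solution $\hat{\mathbf{\theta}}_{\lambda} = \left(\mathbf{X}^{T}\mathbf{X}+\lambda\cdot\mathbf{I}\right)^{-1}\mathbf{X}^{T}\mathbf{y}$ and assumes that $\mathbf{U}$ has orthonormal columns ($\mathbf{U}^{T}\mathbf{U} = \mathbf{I}$) and that $\mathbf{V}$ is square and orthogonal ($\mathbf{V}^{T}\mathbf{V} = \mathbf{V}\mathbf{V}^{T} = \mathbf{I}$):
\begin{equation*}
\begin{aligned}
\mathbf{X}^{T}\mathbf{X}+\lambda\mathbf{I} &= \mathbf{V}\mathbf{\Sigma}^{T}\mathbf{U}^{T}\mathbf{U}\mathbf{\Sigma}\mathbf{V}^{T}+\lambda\mathbf{V}\mathbf{V}^{T} = \mathbf{V}\left(\mathbf{\Sigma}^{T}\mathbf{\Sigma}+\lambda\mathbf{I}\right)\mathbf{V}^{T} \\
\hat{\mathbf{\theta}}_{\lambda} &= \mathbf{V}\left(\mathbf{\Sigma}^{T}\mathbf{\Sigma}+\lambda\mathbf{I}\right)^{-1}\mathbf{V}^{T}\cdot\mathbf{V}\mathbf{\Sigma}^{T}\mathbf{U}^{T}\mathbf{y} \\
&= \mathbf{V}\left(\mathbf{\Sigma}^{T}\mathbf{\Sigma}+\lambda\mathbf{I}\right)^{-1}\mathbf{\Sigma}^{T}\mathbf{U}^{T}\mathbf{y} = \Bigl[\left(\mathbf{\Sigma}^{T}\mathbf{\Sigma}+\lambda\mathbf{I}\right)\mathbf{V}^{T}\Bigr]^{-1}\mathbf{\Sigma}^{T}\mathbf{U}^{T}\mathbf{y}
\end{aligned}
\end{equation*}
The last step uses $\left(\mathbf{A}\mathbf{V}^{T}\right)^{-1} = \mathbf{V}\mathbf{A}^{-1}$ for an invertible matrix $\mathbf{A}$ and an orthogonal matrix $\mathbf{V}$. Because $\mathbf{\Sigma}^{T}\mathbf{\Sigma}+\lambda\mathbf{I}$ is diagonal, the only inverse required is that of a diagonal matrix.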

In [6]:
λ = 0.0;
ticker_to_explore = "IBM";
idx_of_ticker = findfirst(x->x==ticker_to_explore, my_list_of_tickers);

In [7]:
θ̂_pinv = θ(R,idx_of_ticker, Rₘ, λ = λ, method = MyMatrixAlgebraLearningMethod())

2-element Vector{Float64}:
 -0.06439967698961394
  0.8938876296406099

In [8]:
θ̂_svd = θ(R,idx_of_ticker, Rₘ, λ = λ, method = MySVDLearingMethod())

2-element Vector{Float64}:
 -0.06439967698961394
  0.89388762964061

#### Check: Do the two approaches give the same answer?
Let's use the [@assert macro](https://docs.julialang.org/en/v1/base/base/#Base.@assert) to check if the parameter vectors `θ̂_pinv` and `θ̂_svd` are `close`. If this test fails, [an AssertionError](https://docs.julialang.org/en/v1/base/base/#Core.AssertionError) is thrown.

In [9]:
@assert θ̂_pinv ≈ θ̂_svd
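
The same comparison can be reproduced without the course helpers. The sketch below computes both estimates on synthetic data and compares them with `≈` (the `isapprox` function), which by default uses a relative tolerance of roughly the square root of machine epsilon. All names (`X_chk`, `y_chk`, `θ_normal`, `θ_from_svd`) and the data are made up for illustration; this is not the implementation behind `MyMatrixAlgebraLearningMethod` or `MySVDLearingMethod`.

```julia
using LinearAlgebra, Random

rng = MersenneTwister(7);
n_pts = 500;
X_chk = [ones(n_pts) randn(rng, n_pts)];
y_chk = X_chk*[-0.05, 0.9] .+ 0.2 .* randn(rng, n_pts);   # made-up parameters and noise
λ_chk = 0.0;

# normal-equation route: solve (XᵀX + λI)θ = Xᵀy
θ_normal = (transpose(X_chk)*X_chk + λ_chk*I) \ (transpose(X_chk)*y_chk);

# SVD route: θ̂ = V (ΣᵀΣ + λI)⁻¹ Σᵀ Uᵀ y, using the thin SVD returned by svd
F = svd(X_chk);
Σ_chk = Diagonal(F.S);
θ_from_svd = F.V * ((transpose(Σ_chk)*Σ_chk + λ_chk*I) \ (transpose(Σ_chk)*(transpose(F.U)*y_chk)));

# ≈ compares the two vectors up to floating-point round-off
same = θ_normal ≈ θ_from_svd;
```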

## A deeper look at the SVD approach
One of the exciting things about using the [singular value decomposition](https://en.wikipedia.org/wiki/Singular_value_decomposition) to solve the linear regression problem for the unknown parameter vector $\hat{\theta}_{\lambda}$ is the summation relationship:
$$
\begin{equation*}
\hat{\theta}_{\lambda} = \sum_{i=1}^{r}\frac{\sigma_{i}(\mathbf{u}_{i}^{T}\cdot\mathbf{y})}{\sigma_{i}^{2}+\lambda}\cdot\mathbf{v}_{i}
\end{equation*}
$$
where $r$ denotes the rank of the data matrix $\mathbf{X}$, $\mathbf{u}_{i}$ and $\mathbf{v}_{i}$ are the $i$-th columns of $\mathbf{U}$ and $\mathbf{V}$, respectively, $\sigma_{i}$ is the $i$-th singular value, and $\lambda$ is the regularization parameter.
This tells us what each mode of the [singular value decomposition](https://en.wikipedia.org/wiki/Singular_value_decomposition) contributes to the unknown parameter vector $\hat{\theta}_{\lambda}$, which is __super cool__!!
* Let's reconstruct the unknown parameters by summing the `r` modes of the data matrix $\mathbf{X}$. First, let's set up the output vector `Y` (the excess returns of the selected firm) and the data matrix $\mathbf{X}$, whose first column of ones multiplies $\alpha$ and whose second column, the market excess return, multiplies $\beta$.

In [10]:
Y = R[idx_of_ticker, :];
max_length = length(Y);
X = [ones(max_length) Rₘ];
IM = diagm(ones(2)); # we have two parameters -

Compute the [singular value decomposition](https://en.wikipedia.org/wiki/Singular_value_decomposition) using the [svd function](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.svd) exported by the [Linear Algebra package](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#man-linalg) in Julia. By default, the Julia [svd function](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.svd) returns the singular values as a vector, so let's use the [diagm function](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.diagm) to compute a diagonal matrix from the singular value vector.

In [11]:
(U,d,V) = svd(X);
Σ = diagm(d) # svd returns the singular values as a vector; diagm builds the diagonal matrix

2×2 Matrix{Float64}:
 98.5355   0.0
  0.0     39.4475
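
To get a feel for what `svd` returns, here is a small sketch on a made-up `3×2` matrix (the names `B`, `U_B`, `d_B`, and `V_B` are hypothetical). By default `svd` returns the thin factorization, so the left factor has as many columns as there are singular values, and multiplying the three factors back together should reproduce the original matrix up to round-off.

```julia
using LinearAlgebra

B = [1.0 2.0; 3.0 4.0; 5.0 6.0];             # made-up 3×2 matrix
(U_B, d_B, V_B) = svd(B);                    # thin SVD: U_B is 3×2, V_B is 2×2
B_rebuilt = U_B*diagm(d_B)*transpose(V_B);   # should match B up to round-off
gram_U = transpose(U_B)*U_B;                 # columns of U_B are orthonormal, so this is ≈ I
```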

Next, let's compute the [rank](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.rank) of the data matrix $\mathbf{X}\in\mathbb{R}^{n\times{p}}$. We know that the rank $r\leq\min\left(n,p\right)$. But what is our intuition about rank? The usual definition is the number of linearly independent columns (equivalently, rows) of $\mathbf{X}$.
* __Alternative view__: The [rank](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.rank) is also the number of non-zero singular values; thus, we can think of the [rank](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.rank) as the number of independent pieces of information that are contained in the data matrix $\mathbf{X}$.

In [12]:
r = rank(X) # the number of non-zero singular values (modes) of X

2
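
The link between rank and singular values is easier to see on a matrix that is deliberately rank deficient. In the sketch below, the third column of a made-up matrix `A` is the sum of the first two, so only two columns are independent. The relative cutoff `1e-10` used to decide which singular values count as numerically zero is an arbitrary choice for illustration.

```julia
using LinearAlgebra

# Made-up 4×3 matrix whose third column is the sum of the first two
A = [1.0 0.0 1.0;
     0.0 1.0 1.0;
     1.0 1.0 2.0;
     2.0 1.0 3.0];

σ_A = svdvals(A);                        # singular values, largest first
cutoff = 1e-10*σ_A[1];                   # arbitrary threshold for "numerically zero"
n_nonzero = count(σ -> σ > cutoff, σ_A); # number of singular values above the cutoff
r_A = rank(A; rtol = 1e-10);             # rank computed with the same relative tolerance
```

In floating-point arithmetic the smallest singular value is rarely exactly zero, which is why a tolerance is needed in practice.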

Now, we can compute the contribution to the unknown parameter vector $\theta$ that comes from each mode of the [singular value decomposition](https://en.wikipedia.org/wiki/Singular_value_decomposition). The first (largest) mode contributes most of the $\beta$ parameter (risk), while the second mode mainly corrects $\alpha$, which is a measure of the firm-specific excess return:

In [13]:
θ̂ = zeros(2); # we have two parameters -
for i ∈ 1:r

    # grab the singular value and singular vectors for mode i -
    σᵢ = Σ[i,i];
    uᵢ = U[:,i];
    vᵢ = V[:,i];

    # compute the coefficient -
    cᵢ = (σᵢ*dot(transpose(uᵢ),Y))/(σᵢ^2 + λ)

    # compute the parameter update -
    θ̂ += cᵢ*vᵢ

    @show (i, θ̂)
end

(i, θ̂) = (1, [0.009373526798496713, 0.8931133549095052])
(i, θ̂) = (2, [-0.06439967698961394, 0.8938876296406101])
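
The mode-by-mode sum also makes the effect of the regularization parameter visible: the coefficient of mode $i$ equals $\left(\mathbf{u}_{i}^{T}\cdot\mathbf{y}\right)/\sigma_{i}$ multiplied by the factor $\sigma_{i}^{2}/\left(\sigma_{i}^{2}+\lambda\right)$, so modes with small singular values are damped the most. The sketch below wraps the loop above in a hypothetical helper, `θ_from_modes`, and applies it to synthetic data; the function name, its keyword arguments, and the data are made up for illustration.

```julia
using LinearAlgebra, Random

# Hypothetical helper: sum the first `modes` terms of the SVD expansion for a given λ
function θ_from_modes(X::Matrix{Float64}, y::Vector{Float64}; λ::Float64 = 0.0,
    modes::Int = rank(X))

    F = svd(X);
    θ̂ = zeros(size(X, 2));
    for i ∈ 1:modes
        cᵢ = (F.S[i]*dot(F.U[:, i], y))/(F.S[i]^2 + λ); # coefficient of mode i
        θ̂ += cᵢ*F.V[:, i];                              # add the contribution of mode i
    end
    return θ̂;
end

rng = MersenneTwister(11);
X_m = [ones(300) randn(rng, 300)];
y_m = X_m*[-0.05, 0.9] .+ 0.2 .* randn(rng, 300);   # made-up parameters and noise

θ_full = θ_from_modes(X_m, y_m);             # all modes, λ = 0: the least-squares estimate
θ_one = θ_from_modes(X_m, y_m; modes = 1);   # only the dominant mode
θ_reg = θ_from_modes(X_m, y_m; λ = 50.0);    # each mode damped by σᵢ²/(σᵢ² + λ)
```

With `λ = 0` and all modes included, the sum reproduces the ordinary least-squares estimate; increasing `λ` shrinks the norm of the estimate.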
