# L7b: Estimating Single Index Models (SIMs) from Historical Data
In this lab, students will estimate single index models from historical data. Single index models are widely used in finance to model the returns of assets based on their relationship with a market index.

> __Learning Objectives:__
>
> By the end of this lab, you will be able to:
> * **Transform historical price data into growth rate matrices** - We will compute continuously compounded growth rates from daily S&P500 stock prices spanning 2014-2024 and verify full column rank (424 independent firms) in the resulting matrix, gaining hands-on experience structuring financial time series for econometric analysis.
>
> * **Estimate single index model parameters and quantify uncertainty** - We will apply the normal equations approach to estimate alpha and beta parameters using ordinary least squares, then compute residual variance, standard errors, and 95% confidence intervals to assess the statistical significance of our parameter estimates.
>
> * **Apply SVD-based estimation for numerical robustness** - We will decompose the design matrix using Singular Value Decomposition and reconstruct parameters via the pseudoinverse formula $\hat{\mathbf{\theta}} = \sum_{i=1}^{r}(\mathbf{u}_i^{\top}\mathbf{y}/\sigma_i)\mathbf{v}_i$, learning why SVD provides superior numerical stability compared to direct matrix inversion when dealing with potential multicollinearity.

Let's go!
___

## Setup, Data, and Prerequisites
First, we set up the computational environment by including the `Include.jl` file and loading any needed resources.

> __Include:__ The [`include(...)` command](https://docs.julialang.org/en/v1/base/base/#include) evaluates the contents of the input source file, `Include.jl`, in the notebook's global scope. The `Include.jl` file sets paths, loads required external packages, etc. For additional information on functions and types used in this material, see the [Julia programming language documentation](https://docs.julialang.org/en/v1/). 

Let's set up our code environment:

In [3]:
include(joinpath(@__DIR__, "Include.jl")); # include the Include.jl file

In addition to standard Julia libraries, we'll also use [the `VLDataScienceMachineLearningPackage.jl` package](https://github.com/varnerlab/VLDataScienceMachineLearningPackage.jl). Check out [the documentation](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/) for more information on the functions, types, and data used in this material.

### Data
We gathered a daily open-high-low-close dataset for each firm in the [S&P500](https://en.wikipedia.org/wiki/S%26P_500) from `01-03-2014` until `12-31-2024`, along with data for a few exchange-traded funds and volatility products during that time. 

Let's load the `original_dataset::DataFrame` by calling [the `MyTrainingMarketDataSet()` function](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/data/#VLDataScienceMachineLearningPackage.MyTrainingMarketDataSet) and remove firms that do not have the maximum number of trading days. The cleaned dataset $\mathcal{D}$ will be stored in the `dataset` variable.

In [6]:
original_dataset = MyTrainingMarketDataSet() |> x-> x["dataset"];

Not all tickers in our dataset have the maximum number of trading days for various reasons, e.g., acquisition or de-listing events. Let's collect only those tickers with the maximum number of trading days.

First, let's compute the number of records for a firm that we know has a maximum value, e.g., `AAPL`, and save that value in the `maximum_number_trading_days::Int64` variable:

In [8]:
maximum_number_trading_days = original_dataset["AAPL"] |> nrow

2767

Now, let's iterate through our data and collect only tickers with `maximum_number_trading_days` records. Save that data in the `dataset::Dict{String,DataFrame}` variable:

In [10]:
dataset = Dict{String,DataFrame}();
for (ticker,data) ∈ original_dataset
    if (nrow(data) == maximum_number_trading_days)
        dataset[ticker] = data;
    end
end
dataset;

How many firms do we have with the full number of trading days? Let's use [the `length(...)` method](https://docs.julialang.org/en/v1/base/collections/#Base.length) - notice this works for dictionaries, in addition to arrays, sets, and other collections.

In [12]:
length(dataset) # tells us how many keys are in the dictionary (how many firms in our dataset?)

424

Finally, let's get a list of the firms in our cleaned dataset (and sort them alphabetically). We store the sorted firm ticker symbols in the `list_of_tickers::Array{String,1}` variable.

In [14]:
list_of_tickers = keys(dataset) |> collect |> sort # list of firm "ticker" symbols in alphabetical order

424-element Vector{String}:
 "A"
 "AAL"
 "AAP"
 "AAPL"
 "ABBV"
 "ABT"
 "ACN"
 "ADBE"
 "ADI"
 "ADM"
 "ADP"
 "ADSK"
 "AEE"
 ⋮
 "WST"
 "WU"
 "WY"
 "WYNN"
 "XEL"
 "XOM"
 "XRAY"
 "XYL"
 "YUM"
 "ZBRA"
 "ZION"
 "ZTS"

### Compute the growth rate matrix
Next, let's compute the growth rate array which contains, for each day and each firm in our dataset, the value of the growth rate between time $j$ and $j-1$. 

>  __Continuously Compounded Growth Rate (CCGR)__
>
> Let's assume a model of the share price of firm $i$ is governed by an expression of the form:
>$$
\begin{align*}
S^{(i)}_{j} &= S^{(i)}_{j-1}\;\exp\left(g^{(i)}_{j,j-1}\Delta{t}_{j}\right)
\end{align*}
$$
> where $S^{(i)}_{j-1}$ denotes the share price of firm $i$ at time index $j-1$, $S^{(i)}_{j}$ denotes the share price of firm $i$ at time index $j$, and $\Delta{t}_{j} = t_{j} - t_{j-1}$ denotes the length of a time step (units: years) between time index $j-1$ and $j$. The value we are going to estimate is the growth rate $g^{(i)}_{j,j-1}$ (units: inverse years) for each firm $i$, and each time step in the dataset.
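
Solving the price model above for the growth rate gives $g^{(i)}_{j,j-1} = (1/\Delta{t}_{j})\ln\left(S^{(i)}_{j}/S^{(i)}_{j-1}\right)$. The following is a minimal sketch of that calculation, assuming a short hypothetical price series; the prices and the helper name `growth_rates` are illustrative and are not part of the course package:

```julia
# Hypothetical daily closing prices for one firm (illustrative values only)
S = [100.0, 101.0, 99.5, 100.2];
Δt = 1/252; # one trading day in units of years

# g[j] = (1/Δt) * ln(S[j+1]/S[j]) for each consecutive pair of prices
growth_rates(S, Δt) = (1/Δt) .* log.(S[2:end] ./ S[1:end-1]);

g = growth_rates(S, Δt) # T prices give T-1 growth rates
```

Note that $T$ prices produce $T-1$ growth rates, which is why the growth rate array has one fewer row than the number of trading days.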

We've implemented [the `log_growth_matrix(...)` function](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/data/#VLDataScienceMachineLearningPackage.log_growth_matrix) which takes the cleaned dataset and a list of ticker symbols, and returns the growth rate array. Each row of the growth rate array is a time step, while each column corresponds to a firm from the `list_of_tickers::Array{String,1}` array.

In [16]:
growth_rate_array = let

    # initialize -
    τ = (1/252); # time-step one-day in units of years (trading year is 252 days)
    r̄ = 0.0; # assume the risk-free rate is 0

    # compute the growth matrix -
    growth_rate_array = log_growth_matrix(dataset, list_of_tickers, Δt = τ, 
        risk_free_rate = r̄); # other optional parameters are at their defaults

    growth_rate_array; # return
end

2766×424 Matrix{Float64}:
 -0.877554    6.28105    -2.87097     …  -0.755391   0.245894  -1.00527
  2.81626     1.07149     1.39239         2.13832   -0.80279    0.986468
  3.31305     0.855597    0.00536803      0.109877   1.191     -2.58144
  0.646425   17.2599      1.69215         0.274716   3.1593    -0.368228
  1.81609     2.57961     3.31924         0.621677  -2.1687     4.40309
  0.61383    -3.96384    -0.79278     …  -0.862739  -1.90977   -3.11624
  2.86071    -0.483751    4.84573         1.7657    -1.77685   -1.0896
  2.04671     1.0135      1.90809         1.67597    4.44984   -0.137819
  1.31289     1.67413     0.107259       -1.50708   -2.13696    1.43784
  1.22016     6.12957     0.932578       -1.53202    2.87784   -1.43626
 -0.437668    4.87009     1.00774     …  -0.321261   9.50827   -3.00873
  1.36281     3.61317    -2.34776         0.710613   4.52223    0.340531
 -4.73904     1.38585    -3.01624        -2.15245   -6.64907    1.40612
  ⋮                                

> **Growth Rate Matrix Structure**
>
> The `growth_rate_array` is a matrix $\mathbf{G} \in \mathbb{R}^{m \times n}$ where each **row** represents a trading day (time step) in our dataset, each **column** represents a firm from the S&P500, and each **element** $G_{i,j}$ contains the continuously compounded growth rate for firm $j$ on day $i$.

The matrix has 424 firms (columns) and there are $T-1$ = 2,766 trading days (rows), capturing the daily growth rate dynamics of the S&P500 components from 2014 to 2024. Is there redundancy in the data?

Let's check the rank of the growth rate array using [the `rank(...)` function](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.rank):

In [18]:
rank(growth_rate_array) # tells us the rank of the growth rate array

424

The growth rate matrix has **full column rank** (rank = 424), which means that its 424 columns are linearly independent: no column can be written as an exact linear combination of the others.

> __Why is this significant?__ This tells us that no ticker's growth rate series can be reproduced exactly as a linear combination of the others. Each column contributes some information of its own, even though there may be strong correlations between them.
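
To see what a loss of column rank would look like, here is a small sketch on a hypothetical matrix (the values are made up for illustration). Appending a column that is an exact linear combination of existing columns adds a column but leaves the rank unchanged:

```julia
using LinearAlgebra

# Small hypothetical matrix: 5 time steps (rows), 3 "firms" (columns)
G = [ 1.0  0.5 -0.2;
     -0.3  1.2  0.7;
      0.8 -0.4  1.1;
      0.1  0.9 -0.6;
     -0.7  0.3  0.4];

# Append a fourth column that is an exact linear combination of the first two
G_redundant = hcat(G, 2.0 .* G[:,1] .- G[:,2]);

r_full = rank(G)                # every column is linearly independent
r_redundant = rank(G_redundant) # the extra column adds no new information
```

In this sketch both ranks are 3, even though `G_redundant` has four columns.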

Now, let's build a single index model for a single firm, to see how this works.
___

## Task 1: Build a single index model for a test firm
In this task, let's build a single index model for a test firm, so that we can see how to set up this calculation and test that it works. 

> __Single Index Model (SIM)__
> 
> Single index models are factor models that consider only the return (growth) of the market factor. These models were originally developed by Sharpe, 1963: [Sharpe, William F. (1963). "A Simplified Model for Portfolio Analysis". Management Science, 9(2): 277-293. doi:10.1287/mnsc.9.2.277.](https://pubsonline.informs.org/doi/abs/10.1287/mnsc.9.2.277)
>
> Suppose the growth of firm $i$ at time $t$ is denoted by $g^{(t)}_{i}$. Then, the single index model of the return (growth rate) is given by:
> $$
g^{(t)}_{i} = \alpha_{i} + \beta_{i}\;g^{(t)}_{M} + \epsilon^{(t)}_{i}
$$
> where $\alpha_{i}$ is the _idiosyncratic (firm-specific) growth_, $\beta_{i}$ is the sensitivity of the growth rate of firm $i$ to the market (it is also a measure of risk), so that $\beta_{i}\;g^{(t)}_{M}$ is the component of the growth rate explained by the market, $g^{(t)}_{M}$ is the growth rate of the market index at time $t$, and $\epsilon^{(t)}_{i}$ denotes an error term associated with firm $i$ that describes the growth not captured by the firm-specific or market terms.

Let's start by picking the ticker symbol for our test firm; we'll save this in the `my_test_ticker::String` variable:

In [21]:
my_test_ticker = "PG"; # ticker symbol for our test firm

Next, we need to pull out the growth (return) of the market portfolio from the `growth_rate_array::Array{Float64,2}` matrix.


* We'll use the [SPDR S&P 500 ETF Trust (SPY)](https://www.ssga.com/us/en/individual/etfs/funds/spdr-sp-500-etf-trust-spy) as our market index. The SPY is an exchange-traded fund (ETF) that tracks the performance of the S&P 500 index, which is a market-capitalization-weighted index of 500 of the largest publicly traded companies in the U.S.


To do this, look up the index for our market portfolio surrogate `SPY`, then store the growth rate (column from the growth rate array) in the `Rₘ::Array{Float64,1}` variable:

In [23]:
Rₘ = findfirst(x->x=="SPY", list_of_tickers) |> i -> growth_rate_array[:,i];

Then, we need to formulate the data matrix $\hat{\mathbf{X}}$ and the response vector $\mathbf{y}$ for our test firm. The data matrix $\hat{\mathbf{X}} \in \mathbb{R}^{(T-1) \times 2}$ will have two columns: a column of ones (to account for the intercept term $\alpha$) and a column containing the market growth rates $g^{(t)}_{M}$ (stored in `Rₘ`). The response vector $\mathbf{y} \in \mathbb{R}^{(T-1)}$ will contain the growth rates for our test firm.

In [70]:
X̂,y = let

    # get the growth values for our test firm -
    Rᵢ = findfirst(x-> x== my_test_ticker, list_of_tickers) |> j-> growth_rate_array[:, j];

    # TODO: Build the design matrix X̂ and response vector y
    # X̂ should have shape (T-1, 2): first column of ones, second column of market returns Rₘ
    # y should contain the firm's returns Rᵢ
    max_length = length(Rᵢ);
    y = Rᵢ;
    X̂ = [ones(max_length) Rₘ]; # <-- fill me in: [ones(...) Rₘ]

    
    X̂,y # return
end;

In [72]:
X̂

2766×2 Matrix{Float64}:
 1.0  -0.58454
 1.0   0.882626
 1.0   0.205992
 1.0   0.0318663
 1.0   0.300334
 1.0  -1.33599
 1.0   0.364513
 1.0   2.04306
 1.0  -0.28005
 1.0  -0.393881
 1.0  -0.0336959
 1.0   0.441388
 1.0  -1.95715
 ⋮    
 1.0  -0.500752
 1.0   0.945151
 1.0  -0.971834
 1.0  -3.77619
 1.0  -3.04467
 1.0   0.276633
 1.0   1.65103
 1.0   2.73181
 1.0   0.7875
 1.0  -2.71887
 1.0  -2.4625
 1.0  -0.920341

Now, we can estimate the single index model parameters $\theta=(\alpha, \beta)$ using ordinary least squares (OLS). The OLS estimator has the closed form solution (no regularization $\delta=0$):
$$
\begin{align*}
\hat{\mathbf{\theta}} &= \left(\hat{\mathbf{X}}^{\top}\hat{\mathbf{X}}\right)^{-1}\hat{\mathbf{X}}^{\top}\mathbf{y}
\end{align*}
$$
if $\hat{\mathbf{X}}^{\top}\hat{\mathbf{X}}$ is invertible, which holds exactly when $\hat{\mathbf{X}}$ has full column rank. In our case, $\hat{\mathbf{X}}$ has two columns, so we need to check that $\hat{\mathbf{X}}^{\top}\hat{\mathbf{X}}$ has rank 2.


In [74]:
@assert rank(transpose(X̂) * X̂) == 2 # check that X'X is invertible

If we get here without an error, then we know that $\hat{\mathbf{X}}^{\top}\hat{\mathbf{X}}$ is invertible, and we can compute the single index model parameters for our test firm. Let's save these values in the $\hat{\mathbf{\theta}}$ variable:

In [78]:
# TODO: Implement the normal equations solution for OLS
# Hint: θ̂ = (X̂ᵀX̂)⁻¹X̂ᵀy
# Use: transpose(), inv(), and matrix multiplication
θ̂ = inv(transpose(X̂)*X̂)*transpose(X̂)*y; # <-- fill me in

In [80]:
θ̂

2-element Vector{Float64}:
 0.014676312973311125
 0.49051042376920384
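
As a cross-check on the normal equations, the sketch below fits synthetic single index data two ways: with the explicit inverse and with Julia's backslash operator, which solves the least squares problem for a rectangular matrix without forming the inverse. The "true" parameters, noise level, and variable names are assumptions chosen for illustration and are unrelated to the `PG` data:

```julia
using LinearAlgebra, Random

Random.seed!(1234); # fixed seed so the sketch is reproducible

# Synthetic single index data with assumed "true" parameters (illustrative only)
α_true, β_true = 0.02, 0.5;
n = 500;
g_market = randn(n);                                        # stand-in for market growth rates
g_firm = α_true .+ β_true .* g_market .+ 0.1 .* randn(n);   # firm growth rates with noise

X = [ones(n) g_market]; # design matrix: intercept column, market column

θ_normal = inv(transpose(X)*X)*transpose(X)*g_firm; # normal equations
θ_backslash = X \ g_firm;                           # least squares via backslash

θ_normal ≈ θ_backslash # the two estimates should agree for this well-conditioned X
```

For a well-conditioned design matrix like this one, the two estimates should agree to within floating-point error, and both should land near the assumed parameters.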

## Task 2: Uncertainty quantification of the single index model parameters
In this task, we'll compute the uncertainty in our single index model parameters $\hat{\mathbf{\theta}}$ for our test firm. To do this, we need to compute the standard errors of the parameter estimates $\mathrm{SE}(\hat{\mathbf{\theta}})$, which requires us to estimate the variance of the error terms $\hat{\sigma}^{2}$. 

Let's start there.

> __Theory:__ Since the __true__ error variance $\sigma^2$ is unknown, we estimate it by $\hat{\sigma}^2$, computed from the residuals $\mathbf{r} = \mathbf{y} - \hat{\mathbf{X}}\hat{\mathbf{\theta}}$ as:
> $$
\begin{align*}
\hat{\sigma}^{2} &= \left(\frac{1}{\Delta{t}}\right)\frac{\lVert~\mathbf{r}~\rVert^{2}_{2}}{n-p} = \left(\frac{1}{\Delta{t}}\right)\frac{1}{n-p}\sum_{i=1}^{n}r_i^2
\end{align*}
$$
> where $n$ is the number of observations, $p$ is the number of parameters, $\Delta{t}$ is the length of a time step (the same value used to compute the growth rates), $\lVert\star\rVert_{2}^{2}$ denotes the $\ell_2$ norm squared, and $r_i = y_i - \hat{\mathbf{x}}_i^{\top}\hat{\mathbf{\theta}}$ is the $i$-th residual, i.e., the difference between the observed and predicted value for observation $i$.

We implement this computation in the code block below and save the result in the `training_variance::Float64` variable:

In [82]:
training_variance = let

    # initialize -
    p = length(θ̂); # number of parameters
    n = size(X̂,1); # number of training observations
    Δt  = (1/252); # time-step one-day in units of years (trading year is 252 days)

    # compute the residual vector -
    r = y .- X̂*θ̂; # residual vector
    
    # compute the variance using the formula from the theory box above
    my_variance = (1/Δt)*(1/(n-p))*norm(r)^2

    # compare with Julia's var(...), which divides by n-1 rather than n-p,
    # so the two values differ by the factor (n-1)/(n-p)
    built_in_variance = (1/Δt)*var(r, corrected=true); # variance - Julia
    @show built_in_variance, my_variance; # show

    my_variance; # return
end;

(built_in_variance, my_variance) = (1238.9715663741292, 1239.4198194733954)
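
The small gap between the two printed values comes from the divisor: `var(r, corrected=true)` divides by $n-1$, whereas the formula in the theory box divides by $n-p$ (the residuals of a regression with an intercept have zero mean, so the mean subtraction inside `var` changes nothing). A quick check using only the numbers reported above ($n = 2766$, $p = 2$):

```julia
n, p = 2766, 2; # observations and parameters for the test firm

built_in_variance = 1238.9715663741292; # printed above, divisor n-1
my_variance = 1239.4198194733954;       # printed above, divisor n-p

# Rescaling the n-1 version by (n-1)/(n-p) should recover the n-p version
rescaled = built_in_variance*(n-1)/(n-p);
isapprox(rescaled, my_variance; rtol = 1e-6)
```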


Next, let's compute the standard error. The standard error of the parameter estimates $\hat{\mathbf{\theta}}$ quantifies the uncertainty in the estimated parameters due to the variability in the data. Let's compute the standard error for each parameter $\hat{\theta}_j$ that we estimated in Task 1.

We save the standard errors in the `SE::Vector{Float64}` variable, where element $j$ corresponds to the standard error of a parameter estimate $\text{SE}(\hat{\theta}_j)$.

In [84]:
SE = let
    
    # initialize -
    p = length(θ̂); # number of parameters
    n = size(X̂,1); # number of training samples
    Δt  = (1/252); # time-step one-day in units of years (trading year is 252 days)

    # compute the standard error for each parameter
    SE = sqrt.(diag(inv(transpose(X̂)*X̂))*training_variance*Δt);

    SE; # return
end

2-element Vector{Float64}:
 0.0422195602563985
 0.01966250116372708

Now that we have the standard error for each of the model parameters, we can compute the uncertainty in the parameter estimates $\hat{\mathbf{\theta}}$. Let's compute confidence intervals for each parameter estimate.
> __Confidence Intervals:__ A $(1-\alpha) \times 100\%$ confidence interval for each parameter $\theta_j$ is given by:
> $$
\begin{align*}
\hat{\theta}_j \pm t_{1-\alpha/2,\nu}\; \text{SE}(\hat{\theta}_j)
\end{align*}
$$
> where $\alpha$ here denotes the significance level (not the intercept of the single index model) and $t_{1-\alpha/2,\nu}$ is the $(1-\alpha/2)$-quantile of a Student $t$ distribution with $\nu = n - p$ degrees of freedom. When $\nu$ is large (here $\nu = 2766 - 2 = 2764$), the $t$ quantiles are close to the standard normal ones: for a 95% confidence interval, $\alpha = 0.05$ and $t_{1-\alpha/2,\nu} \approx 1.96$; for a 99.9% confidence interval, $\alpha = 0.001$ and $t_{1-\alpha/2,\nu} \approx 3.291$. The standard error $\text{SE}(\hat{\theta}_j)$ (computed above) quantifies the uncertainty in the parameter estimate $\hat{\theta}_j$. It is given by:
>$$
\begin{align*}
\text{SE}(\hat{\theta}_{j}) &= \sqrt{\hat{\sigma}^{2}\,\Delta{t}\;\bigl[(\hat{\mathbf X}^\top\hat{\mathbf X})^{-1}\bigr]_{jj}}
\end{align*}
$$
> where the factor $\Delta{t}$ cancels the $1/\Delta{t}$ scaling included in our definition of $\hat{\sigma}^{2}$, matching the standard error code above.

Want a little more detail on confidence intervals? See this [advanced topic note](CHEME-5800-L7b-Advanced-CI-Derivation-Fall-2025.ipynb).

Let's build a table that shows the parameter ranges for a 95.0% confidence interval using [the `PrettyTables.jl` package](https://github.com/ronisbr/PrettyTables.jl). (You can adjust this to show another confidence interval if you like).

In [87]:
let

    # initialize -
    t = 1.96; # for a 95% confidence interval
    df = DataFrame(); # hold the data (rows) for the table

    # build features of the table -
    feature_labels = Array{String,1}();
    push!(feature_labels, "α");
    push!(feature_labels, "β");

    for i ∈ eachindex(θ̂)

        center = θ̂[i];
        lower_bound = θ̂[i] - t*SE[i];
        upper_bound = θ̂[i] + t*SE[i];

        row_df = (
            i = i,
            feature = feature_labels[i],
            p = round(center, digits=4),
            l = round(lower_bound, digits=4),
            u = round(upper_bound, digits=4),
            cz = (lower_bound <= 0.0 <= upper_bound ? "yes" : "no")
        ) # data for the row

        push!(df, row_df) # add the row to the dataframe
    end

    # show the table -
    pretty_table(df, backend = :text,
        table_format = TextTableFormat(borders = text_table_borders__simple)) # text backend with simple borders
end

     i   feature         p         l         u       cz
 Int64    String   Float64   Float64   Float64   String
     1         α    0.0147   -0.0681    0.0974      yes
     2         β    0.4905     0.452     0.529       no
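
Another way to read the table is through the ratio of each estimate to its standard error (a $t$-statistic for the null hypothesis that the parameter is zero). The sketch below uses only the values reported above; the variable names `θ_hat`, `t_stat`, and `significant` are illustrative:

```julia
θ_hat = [0.014676312973311125, 0.49051042376920384]; # estimates reported in Task 1
SE = [0.0422195602563985, 0.01966250116372708];      # standard errors reported above

t_stat = θ_hat ./ SE;               # ratio of each estimate to its standard error
significant = abs.(t_stat) .> 1.96  # true where the 95% interval excludes zero
```

The ratio is roughly 0.35 for $\alpha$ and roughly 25 for $\beta$, which agrees with the `cz` column: the interval for $\alpha$ contains zero, while the interval for $\beta$ does not.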


___

## Task 3: SVD-Based Parameter Estimation

In the previous tasks, we used the normal equations approach: $\hat{\mathbf{\theta}} = (\hat{\mathbf{X}}^{\top}\hat{\mathbf{X}})^{-1}\hat{\mathbf{X}}^{\top}\mathbf{y}$. However, this approach can be numerically unstable when the data matrix $\hat{\mathbf{X}}$ has a high condition number (e.g., because of multicollinearity), since forming $\hat{\mathbf{X}}^{\top}\hat{\mathbf{X}}$ squares the condition number.

Let's use **Singular Value Decomposition (SVD)** to estimate the parameters more robustly.

> **Theory:** If $\hat{\mathbf{X}} = \mathbf{U}\mathbf{S}\mathbf{V}^{\top}$ is the SVD decomposition of $\hat{\mathbf{X}}$, then the least squares solution becomes:
> $$\hat{\mathbf{\theta}} = \mathbf{V}\mathbf{S}^{+}\mathbf{U}^{\top}\mathbf{y}$$
> where $\mathbf{S}^{+}$ is the Moore-Penrose pseudoinverse of the diagonal matrix $\mathbf{S}$. This approach is more numerically stable and handles rank-deficient matrices gracefully.
> 
> For practical computation, this can be written in index notation as:
> $$
\boxed{
\begin{equation*}
\hat{\mathbf{\theta}} = \sum_{i=1}^{r_{\hat{X}}}\left(\frac{\mathbf{u}_{i}^{\top}\mathbf{y}}{\sigma_{i}}\right)\mathbf{v}_{i}\quad\blacksquare
\end{equation*}}
$$
> where $r_{\hat{X}} \leq \min(n,p)$ is the rank of the data matrix $\hat{\mathbf{X}}$ (with equality when $\hat{\mathbf{X}}$ has full rank), $\mathbf{u}_{i}$ and $\mathbf{v}_{i}$ are the $i$-th columns of $\mathbf{U}$ and $\mathbf{V}$, respectively, and $\sigma_{i}$ is the $i$-th singular value (with $\sigma_i > 0$).
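
The boxed sum can be sketched on a small hypothetical problem and checked against Julia's built-in `pinv(...)`. The matrix, response, and tolerance rule below are assumptions for illustration; in particular, the tolerance used to count nonzero singular values is one common convention, not the only choice:

```julia
using LinearAlgebra

# Small hypothetical design matrix (intercept + one regressor) and response
X = [1.0 -0.5; 1.0 0.3; 1.0 1.2; 1.0 -1.1; 1.0 0.8];
y = [-0.2, 0.2, 0.6, -0.5, 0.45];

U, S, V = svd(X); # thin SVD: U is 5×2, S holds 2 singular values, V is 2×2

# Keep only singular values above a tolerance (the numerical rank)
tol = maximum(size(X))*eps(Float64)*maximum(S);
r = count(>(tol), S);

# θ = Σᵢ (uᵢᵀy/σᵢ) vᵢ over the r retained modes
θ_svd = sum((dot(U[:,i], y)/S[i]) .* V[:,i] for i ∈ 1:r);

θ_svd ≈ pinv(X)*y # should agree with the built-in pseudoinverse
```

Restricting the sum to singular values above the tolerance is what lets this formula handle rank-deficient matrices: modes with $\sigma_i \approx 0$ are dropped instead of being divided by.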

Ok, so let's play around with this. First, let's compute the SVD of our data matrix $\hat{\mathbf{X}}$ from Task 1:


In [89]:
U,S,V = svd(X̂);

In [91]:
S

2-element Vector{Float64}:
 112.96598769316324
  52.51079224392125
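
The singular values also tell us how well-conditioned this particular problem is. As a quick sketch using the two values printed above (the names `κ_X` and `κ_XtX` are illustrative), the 2-norm condition number of $\hat{\mathbf{X}}$ is the ratio of the largest to the smallest singular value, and the condition number of $\hat{\mathbf{X}}^{\top}\hat{\mathbf{X}}$ is its square:

```julia
S = [112.96598769316324, 52.51079224392125]; # singular values printed above

κ_X = S[1]/S[end]; # 2-norm condition number of X̂
κ_XtX = κ_X^2;     # condition number of X̂ᵀX̂ (its singular values are σᵢ²)
(κ_X, κ_XtX)
```

Both numbers are small here (roughly 2.2 and 4.6), so for this test firm the normal equations and the SVD route should give essentially the same answer; the SVD advantage matters when the ratio is many orders of magnitude larger.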

The SVD gives us three factors: `U` (the left singular vectors, stored as columns), `S` (the singular values, returned as a vector sorted from largest to smallest), and `V` (the right singular vectors, stored as columns). Now, let's compute the individual parameter modes that make up the final estimate. Each mode corresponds to one singular value and captures how that singular component contributes to the overall parameter vector.

We'll store these in the `parameter_modes::Dict{Int, Array{Float64,1}}` dictionary, where each key is the singular value index and each value is the corresponding parameter contribution $(\mathbf{u}_i^{\top}\mathbf{y}/\sigma_i)\mathbf{v}_i$:

In [93]:
parameter_modes = let
    
    # initialize -
    r = rank(X̂); # rank of the design matrix
    parameter_modes = Dict{Int, Array{Float64,1}}(); # create an empty dictionary
    
    # TODO: Loop through each singular value and compute the parameter contribution
    # Hint: For each i, compute (uᵢᵀy/σᵢ) * vᵢ and store in parameter_modes[i]
    for i ∈ 1:r
        u = U[:,i]; # i-th left singular vector
        v = V[:,i]; # i-th right singular vector
        σ = S[i]; # i-th singular value
        parameter_modes[i] = ((transpose(u)*y)/σ)*v; # <-- fill me in: ((transpose(u)*y)/σ)*v
    end

    parameter_modes; # return
end

Dict{Int64, Vector{Float64}} with 2 entries:
  2 => [0.000274155, -8.04949e-6]
  1 => [0.0144022, 0.490518]
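
Using only the printed values, we can check by hand that the two modes add up to the normal equations estimate from Task 1. This is a sketch at display precision, so the comparison uses a loose absolute tolerance; the names `mode_1`, `mode_2`, and `θ_sum` are illustrative:

```julia
# Modes as printed above (shown to about six significant digits)
mode_1 = [0.0144022, 0.490518];
mode_2 = [0.000274155, -8.04949e-6];

θ_hat = [0.014676312973311125, 0.49051042376920384]; # normal equations estimate from Task 1

θ_sum = mode_1 .+ mode_2;            # sum over both singular modes
isapprox(θ_sum, θ_hat; atol = 1e-5)  # agreement up to the printed precision
```

The first mode carries almost all of the estimate. The second mode shifts $\alpha$ by about $2.7\times10^{-4}$ and $\beta$ by about $-8\times10^{-6}$, so leaving it out gives a close but not identical parameter vector.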

Now, we can reconstruct the parameter estimates using the SVD components. Let's implement the SVD-based parameter estimation and compare it to our previous OLS estimates.

In [101]:
θ̂_svd = let

    # TODO: Reconstruct the full parameter estimate by summing all modes
    # Hint: Start with a zero vector, then add each parameter_modes[i]
    # You can use r = rank(X̂) or r = 2 (since we have 2 parameters)
    r = rank(X̂); # or simply use r = 2
    p = zero(θ̂); # initialize parameter estimate as zero vector

    # TODO: Sum up all the parameter modes
    for i ∈ 1:r
        p += parameter_modes[i] # <-- fill me in
    end
    
    p; # return
end;

___

Let's compare our SVD-based estimate `θ̂_svd` with the normal equations estimate `θ̂` to verify they match. The sum must run over all `r = rank(X̂)` modes: stopping at the first mode would give `[0.014402158050404176, 0.4905184732571335]`, which is close to, but not equal to, `θ̂`.

In [103]:
@show θ̂; # Normal equations estimate
@show round.(θ̂_svd, digits = 7); # SVD-based estimate (rounded for display)
@show θ̂ ≈ θ̂_svd; # Should be true (approximately equal)

θ̂ = [0.014676312973311125, 0.49051042376920384]
round.(θ̂_svd, digits = 7) = [0.0146763, 0.4905104]
θ̂ ≈ θ̂_svd = true


## Summary
In this lab, we estimated single index models for S&P500 firms using both the normal equations approach and SVD-based methods, comparing parameter estimates and quantifying uncertainty through confidence intervals.

> __Key Takeaways:__
> 
> * **Full column rank confirms linearly independent columns**: Our growth rate matrix having rank 424 tells us that no firm's growth rate series is an exact linear combination of the others, justifying individual parameter estimation for each stock despite potential correlations.
>
> * **Confidence intervals reveal statistical significance**: By computing standard errors and confidence intervals, we can determine whether our parameter estimates are distinguishable from zero. For our test firm `PG`, the 95% interval for $\alpha$ contains zero while the interval for $\beta$ does not, so the alpha estimate cannot be separated from noise, which is critical for distinguishing meaningful abnormal returns from statistical artifacts.
>
> * **SVD provides robust alternatives to direct inversion**: The two approaches give the same estimate when $\hat{\mathbf{X}}$ has full column rank, but the SVD approach avoids forming and inverting $\hat{\mathbf{X}}^{\top}\hat{\mathbf{X}}$ and works through the pseudoinverse, making it the preferred method when dealing with near-singular matrices or the multicollinearity common in financial data.

These dual-method parameter estimation techniques form the foundation for empirical asset pricing and portfolio construction in modern quantitative finance.
___

## Disclaimer and Risks

__This content is offered solely for training and informational purposes__. No offer or solicitation to buy or sell securities or derivative products or any investment or trading advice or strategy is made, given, or endorsed by the teaching team.

__Trading involves risk__. Carefully review your financial situation before investing in securities, futures contracts, options, or commodity interests. Past performance, whether actual or indicated by historical tests of strategies, is no guarantee of future performance or success. Trading is generally inappropriate for someone with limited resources, investment or trading experience, or a low-risk tolerance. Only risk capital that is not required for living expenses.

__You are fully responsible for any investment or trading decisions you make__. Such decisions should be based solely on evaluating your financial circumstances, investment or trading objectives, risk tolerance, and liquidity needs.