# Fundamental Factor Models

Fundamental factor models take asset returns $r_t$ and factor loadings $B_t$ as inputs, and estimate factor returns $f_t$ and idiosyncratic (idio) returns $\epsilon_t$.

Process:
- Data ingestion: correctness, outliers, consistency across vendors, missing data
- Estimation universe selection
- Winsorization: identify outliers and winsorize
- Loading generation: generate $B_t$
- Cross-sectional regression: estimate $f_t$ and $\epsilon_t$
- Time-series estimation:
  - factor cov matrix
  - idio cov matrix
  - risk-adjusted performance of factor returns
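The winsorization step can be implemented in several ways. Below is a minimal sketch that assumes percentile clipping within a single cross-section. The function name `winsorize_cross_section` and the 1%/99% default cutoffs are illustrative choices, not part of the process above. Missing values (NaN) are ignored when computing the cutoffs and are left as NaN.

```python
import numpy as np

def winsorize_cross_section(x: np.ndarray, lower_q: float = 0.01, upper_q: float = 0.99) -> np.ndarray:
    """Clip one cross-section (one value per asset) to its [lower_q, upper_q] quantiles.

    Hypothetical helper: NaNs are skipped when computing the quantiles and stay NaN.
    """
    lo, hi = np.nanquantile(x, [lower_q, upper_q])  # cutoffs from non-missing values only
    return np.clip(x, lo, hi)                       # np.clip propagates NaN unchanged

# Example: one extreme return (0.5) in a small cross-section with a missing value
x = np.array([0.01, -0.02, 0.015, 0.5, -0.01, np.nan])
w = winsorize_cross_section(x, lower_q=0.0, upper_q=0.8)
```

Median/MAD-based thresholds are a common alternative when the cross-section is small or heavy-tailed.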

## Cross-sectional regression

Start with a single-period model:

$$ r_t = Bf_t + \epsilon_t $$

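where $r_t \in \mathbb{R}^n$ (one return per asset) and $f_t \in \mathbb{R}^m$ (one return per factor) are column vectors, and $B \in \mathbb{R}^{n \times m}$ is the loadings matrix $B_t$ with the time subscript dropped for brevity. To estimate $f_t$, we solve the least-squares problem: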

$$ \min_{f_t} \|r_t - Bf_t\|^2 $$
$$ \text{s.t. } f_t \in \mathbb{R}^{m} $$
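As a quick sanity check of the unweighted problem, the sketch below simulates one cross-section from the model and solves it with `np.linalg.lstsq`. All data are synthetic, and the sizes and scales are arbitrary assumptions. At the optimum, the residuals are orthogonal to the columns of $B$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_assets, n_factors = 500, 5

B = rng.normal(size=(n_assets, n_factors))       # loadings B, shape (n, m)
f_true = rng.normal(scale=0.01, size=n_factors)  # factor returns f_t, shape (m,)
eps = rng.normal(scale=0.02, size=n_assets)      # idio returns eps_t, shape (n,)
r = B @ f_true + eps                             # asset returns r_t, shape (n,)

# Ordinary least squares: argmin_f ||r - B f||^2
f_ols, *_ = np.linalg.lstsq(B, r, rcond=None)
resid = r - B @ f_ols                            # estimated idio returns
```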

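However, the idio returns $\epsilon_t$ are usually heteroskedastic (their variances differ across assets), which makes plain least squares inefficient. We make them homoskedastic by premultiplying both sides by $\Omega_{\epsilon}^{-1/2}$, where $\Omega_{\epsilon} = \text{var}(\epsilon_t)$ is the $n \times n$ idio covariance matrix and $\Omega_{\epsilon}^{-1/2}$ is its symmetric inverse square root: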

$$ \Omega_{\epsilon}^{-1/2}r_t = \Omega_{\epsilon}^{-1/2}Bf_t + \Omega_{\epsilon}^{-1/2}\epsilon_t $$
$$ \text{var}(\Omega_{\epsilon}^{-1/2}\epsilon_t) = 
    \Omega_{\epsilon}^{-1/2}\Omega_{\epsilon}\Omega_{\epsilon}^{-1/2} = I_n
$$
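One way to compute $\Omega_{\epsilon}^{-1/2}$ for a general (not necessarily diagonal) symmetric positive-definite matrix is through its eigendecomposition $\Omega_{\epsilon} = V \Lambda V^T$, which gives $\Omega_{\epsilon}^{-1/2} = V \Lambda^{-1/2} V^T$. The sketch below checks numerically that the whitened covariance is the identity. The random covariance matrix and the helper name `inv_sqrt_spd` are illustrative assumptions.

```python
import numpy as np

def inv_sqrt_spd(omega: np.ndarray) -> np.ndarray:
    """Symmetric inverse square root Omega^{-1/2} of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(omega)         # omega = V diag(lambda) V^T
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T  # V diag(lambda^{-1/2}) V^T

# Example: a random non-diagonal SPD covariance for a handful of assets
rng = np.random.default_rng(2)
n_assets = 6
A = rng.normal(size=(n_assets, n_assets))
omega = A @ A.T / n_assets + 0.1 * np.eye(n_assets)  # SPD by construction
W = inv_sqrt_spd(omega)
whitened_cov = W @ omega @ W                         # var(Omega^{-1/2} eps)
```

In most fundamental models $\Omega_{\epsilon}$ is taken to be diagonal, so $\Omega_{\epsilon}^{-1/2}$ reduces to dividing each asset's row by its idio volatility.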

The optimization problem then becomes a weighted least-squares problem:

$$ \min_{f_t} \|\Omega_{\epsilon}^{-1/2}(r_t - Bf_t)\|^2 $$
$$ \text{s.t. } f_t \in \mathbb{R}^{m} $$
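Because $\Omega_{\epsilon}^{-1/2}$ is symmetric, $\|\Omega_{\epsilon}^{-1/2}x\|^2 = x^T\Omega_{\epsilon}^{-1}x$. Writing the objective as this quadratic form and setting its gradient with respect to $f_t$ to zero makes the normal equations explicit:

$$
\begin{aligned}
J(f_t) &= (r_t - Bf_t)^T\Omega_{\epsilon}^{-1}(r_t - Bf_t) \\
\nabla_{f_t} J &= -2B^T\Omega_{\epsilon}^{-1}(r_t - Bf_t) = 0 \\
\Longrightarrow\quad B^T\Omega_{\epsilon}^{-1}B\,f_t &= B^T\Omega_{\epsilon}^{-1}r_t
\end{aligned}
$$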

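The solution follows from the normal equations $B^T\Omega_{\epsilon}^{-1}B\,f_t = B^T\Omega_{\epsilon}^{-1}r_t$. Assuming $B$ has full column rank, so that $B^T\Omega_{\epsilon}^{-1}B$ is invertible, the solution is the generalized least-squares (GLS) estimator: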

$$ \hat{f}_t = (B^T\Omega_{\epsilon}^{-1}B)^{-1}B^T\Omega_{\epsilon}^{-1}r_t $$
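The sketch below computes $\hat{f}_t$ directly from the closed form. It then checks that the result matches ordinary least squares run on the whitened system. It assumes a diagonal $\Omega_{\epsilon}$ and synthetic data, and the helper name `wls_factor_returns` is hypothetical. It uses `np.linalg.solve` rather than forming an explicit inverse, which is the numerically safer choice.

```python
import numpy as np

def wls_factor_returns(B, r, omega_diag):
    """f_hat = (B^T Omega^{-1} B)^{-1} B^T Omega^{-1} r, assuming a diagonal Omega."""
    BtW = B.T / omega_diag                   # B^T Omega^{-1}: column i scaled by 1/omega_i
    return np.linalg.solve(BtW @ B, BtW @ r)

rng = np.random.default_rng(1)
n_assets, n_factors = 400, 4
B = rng.normal(size=(n_assets, n_factors))
idio_vol = rng.uniform(0.01, 0.05, size=n_assets)  # heteroskedastic idio vols (synthetic)
omega_diag = idio_vol**2                           # diagonal of Omega_eps
f_true = rng.normal(scale=0.01, size=n_factors)
r = B @ f_true + rng.normal(size=n_assets) * idio_vol

f_gls = wls_factor_returns(B, r, omega_diag)

# Same estimate via OLS on the whitened system Omega^{-1/2} r = Omega^{-1/2} B f + noise
s = 1.0 / idio_vol                                 # diagonal of Omega^{-1/2}
f_whitened, *_ = np.linalg.lstsq(B * s[:, None], r * s, rcond=None)
```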

For the multi-period case, where the factor returns form a matrix $F \in \mathbb{R}^{m \times T}$, the method is unchanged: each period's factor returns are solved independently, and the resulting columns are stacked to form $F$.
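Here is a minimal sketch of the period-by-period loop. It allows time-varying loadings $B_t$ and assumes a diagonal idio covariance that is given for each period. The array layout (`B` as `(T, n, m)`, `R` as `(n, T)`) and the function name `factor_returns_panel` are assumptions made for illustration. The data are synthetic.

```python
import numpy as np

def factor_returns_panel(B, R, omega_diag):
    """Per-period GLS with a diagonal idio covariance.

    B: (T, n, m) loadings B_t; R: (n, T) asset returns; omega_diag: (T, n) idio variances.
    Returns F: (m, T) factor returns and E: (n, T) idio returns.
    """
    T, n, m = B.shape
    F = np.empty((m, T))
    E = np.empty((n, T))
    for t in range(T):                          # each period is an independent regression
        BtW = B[t].T / omega_diag[t]            # B_t^T Omega_t^{-1}, shape (m, n)
        F[:, t] = np.linalg.solve(BtW @ B[t], BtW @ R[:, t])
        E[:, t] = R[:, t] - B[t] @ F[:, t]
    return F, E

rng = np.random.default_rng(3)
T, n, m = 60, 300, 4
B = rng.normal(size=(T, n, m))                  # time-varying loadings (synthetic)
idio_vol = rng.uniform(0.01, 0.05, size=n)
omega_diag = np.tile(idio_vol**2, (T, 1))       # (T, n); constant over time in this example
F_true = rng.normal(scale=0.01, size=(m, T))
R = np.einsum("tnm,mt->nt", B, F_true) + rng.normal(size=(n, T)) * idio_vol[:, None]
F, E = factor_returns_panel(B, R, omega_diag)
```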

### Idio covariance matrix

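We don't know the idio cov matrix before solving for the factor returns, yet we need it to solve for them: a chicken-and-egg problem. We can borrow the idea of the Expectation-Maximization (EM) procedure and alternate between the two, estimating the idio cov matrix from the idio returns across all periods:
- we start with an identity idio cov matrix (i.e., plain least squares);
- solve for the factor returns, compute the resulting idio returns, and estimate a new idio cov matrix from them;
- solve for the factor returns again using the new idio cov matrix, get new idio returns, and update the idio cov matrix, repeating until the estimates stop changing.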
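The alternating scheme can be sketched as follows. This sketch rests on several assumptions. The idio covariance is diagonal and constant over time. Each asset's idio variance is estimated as the mean squared idio return, which treats idio returns as zero-mean. The loop stops after a fixed number of iterations or when the variances change by less than a relative tolerance. The function names and the `max_iter`/`rtol` parameters are hypothetical, and the data are synthetic.

```python
import numpy as np

def gls_panel(B, R, omega_diag):
    """Per-period GLS with one diagonal idio covariance shared by all periods."""
    T, n, m = B.shape
    F = np.empty((m, T))
    E = np.empty((n, T))
    for t in range(T):
        BtW = B[t].T / omega_diag               # B_t^T Omega^{-1}
        F[:, t] = np.linalg.solve(BtW @ B[t], BtW @ R[:, t])
        E[:, t] = R[:, t] - B[t] @ F[:, t]
    return F, E

def estimate_with_idio_iteration(B, R, max_iter=20, rtol=1e-6):
    n = R.shape[0]
    omega_diag = np.ones(n)                     # step 1: identity idio cov (plain OLS)
    for _ in range(max_iter):
        F, E = gls_panel(B, R, omega_diag)      # step 2: factor returns and idio returns
        new_omega = np.mean(E**2, axis=1)       # per-asset idio variance (zero-mean assumption)
        done = np.max(np.abs(new_omega / omega_diag - 1.0)) < rtol
        omega_diag = new_omega                  # step 3: re-solve with the updated matrix
        if done:
            break
    return F, E, omega_diag

rng = np.random.default_rng(4)
T, n, m = 250, 200, 4
B = rng.normal(size=(T, n, m))
idio_vol = rng.uniform(0.01, 0.05, size=n)      # true (unknown) idio vols
F_true = rng.normal(scale=0.01, size=(m, T))
R = np.einsum("tnm,mt->nt", B, F_true) + rng.normal(size=(n, T)) * idio_vol[:, None]
F, E, omega_hat = estimate_with_idio_iteration(B, R)
```

Strictly speaking, this is iterated feasible GLS in the spirit of EM rather than a formal EM algorithm on a likelihood.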

If we have $T$ periods, we can use the first half of the sample to estimate the idio cov matrix, then use and update it in a walk-forward manner:
- use the idio cov matrix estimated through time $t-1$ to estimate the factor returns and idio returns at time $t$;
- update the idio cov matrix with the new idio returns from time $t$.
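Below is a sketch of the walk-forward variant. Several choices here are simplifying assumptions. The burn-in estimate uses a single plain-OLS pass over the first half, though the iterative scheme could be used instead. The idio covariance is diagonal. Each update is an expanding-window mean of squared idio returns; an exponentially weighted update is a common alternative. The function name `walk_forward` is hypothetical. The final lines check that there is no look-ahead: perturbing the last period's returns leaves all earlier factor returns unchanged.

```python
import numpy as np

def walk_forward(B, R):
    """B: (T, n, m) loadings; R: (n, T) returns. Factor returns are NaN during the burn-in."""
    T, n, m = B.shape
    burn = T // 2
    # Burn-in: plain OLS on the first half, then per-asset mean squared idio return
    E0 = np.empty((n, burn))
    for t in range(burn):
        f, *_ = np.linalg.lstsq(B[t], R[:, t], rcond=None)
        E0[:, t] = R[:, t] - B[t] @ f
    omega = np.mean(E0**2, axis=1)
    count = burn
    F = np.full((m, T), np.nan)
    E = np.full((n, T), np.nan)
    for t in range(burn, T):
        BtW = B[t].T / omega                    # uses the idio cov estimated through t-1 only
        F[:, t] = np.linalg.solve(BtW @ B[t], BtW @ R[:, t])
        E[:, t] = R[:, t] - B[t] @ F[:, t]
        omega = (count * omega + E[:, t]**2) / (count + 1)  # expanding-window update with time t
        count += 1
    return F, E, omega

rng = np.random.default_rng(5)
T, n, m = 120, 150, 3
B = rng.normal(size=(T, n, m))
idio_vol = rng.uniform(0.01, 0.05, size=n)
F_true = rng.normal(scale=0.01, size=(m, T))
R = np.einsum("tnm,mt->nt", B, F_true) + rng.normal(size=(n, T)) * idio_vol[:, None]
F, E, omega_last = walk_forward(B, R)

# No look-ahead check: shift the last period's returns and re-run
R_shifted = R.copy()
R_shifted[:, -1] += 1.0
F_shifted = walk_forward(B, R_shifted)[0]
```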

