# Chapter 7 - Statistical Factor Models

## 7.1 Basics

### 7.1.1 Best Low-Rank Approximation and PCA

A factor model is $R = BF$: it decomposes the returns $R$ into low-rank factors $F$ via factor loadings $B$. To estimate it, we run an SVD, $R=USV^T$. With the columns of $U$ and $V$ sorted so that the diagonal entries of $S$ are in descending order, we select the first $m$ components, where $m<n$, such that

$$R \simeq U_mS_mV_m^T $$

Assign

$$B = U_m$$
$$F = S_mV_m^T$$
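
As a minimal sketch of the assignment above, the following NumPy snippet builds $B$ and $F$ from a truncated SVD. The return matrix is synthetic and the sizes `n`, `T`, `m` are illustrative assumptions, with $R$ laid out as assets by periods ($n \times T$).

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, m = 8, 200, 3                  # assets, periods, factors (illustrative)
R = rng.standard_normal((n, T))      # synthetic n x T return matrix

# Thin SVD; NumPy returns the singular values already sorted in descending order.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

B = U[:, :m]                         # loadings B = U_m, shape n x m
F = np.diag(s[:m]) @ Vt[:m, :]       # factor returns F = S_m V_m^T, shape m x T
R_m = B @ F                          # rank-m approximation of R

# Squared Frobenius error of the truncation and the energy in the discarded components.
err = np.linalg.norm(R - R_m, "fro") ** 2
discarded = np.sum(s[m:] ** 2)
print(err, discarded)
```

The two printed numbers should agree: the squared Frobenius error of the rank-$m$ truncation is the sum of the discarded squared singular values, which is why the truncated SVD is the best low-rank approximation.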

To formulate PCA as an eigenvalue decomposition (EVD) problem, we have

$$ \max w^T\hat{\Sigma}w$$
$$\text{s.t. } ||w|| \leq 1$$

Solve it with the Lagrangian

\begin{align}
L & = -w^T\hat{\Sigma}w + \lambda (w^Tw - 1) \\
dL & = -2\hat{\Sigma}w + 2\lambda w = 0 \\
    \hat{\Sigma}w &= \lambda w \\
    \lambda &= w^T\hat{\Sigma}w
\end{align}

Since the goal is to maximize $w^T\hat{\Sigma}w$ and $\lambda = w^T\hat{\Sigma}w$, the solution is the eigenvector associated with the largest eigenvalue.

To formulate PCA as an SVD problem, we have
\begin{align}
R &= USV^T \\
\hat{\Sigma} &= \frac{1}{T} RR^T \\
            &= \frac{1}{T} (USV^T)(VSU^T) \\
            &= \frac{1}{T} US^2U^T \\
w^T\hat{\Sigma}w &= \frac{1}{T} w^TUS^2U^Tw
\end{align}

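Set $U^Tw = v$ and drop the constant factor $1/T$. Since $U$ is orthogonal, $||v|| = ||w||$, so the constraint carries over and we have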
\begin{align}
\text{max } & v^TS^2v \\
\text{s.t. } & v^Tv \leq 1
\end{align}

Since $S^2$ is a diagonal matrix sorted in descending order, the optimal solution is $v = (1,0,\ldots,0)^T$, i.e. we select the first (largest) diagonal value.
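
A quick numerical check of the two formulations, as a sketch on synthetic data (the scaling matrix and sizes are assumptions chosen so that the top eigenvalue is well separated): the eigenvalues of $\hat{\Sigma}$ should equal the squared singular values of $R$ divided by $T$, and the top eigenvector should match the first column of $U$ up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 5, 500
# Synthetic returns with unequal variances so the top eigenvalue is well separated.
R = np.diag([3.0, 2.0, 1.0, 0.5, 0.25]) @ rng.standard_normal((n, T))

Sigma_hat = R @ R.T / T

# EVD route: eigh returns eigenvalues in ascending order, so reverse them.
evals, evecs = np.linalg.eigh(Sigma_hat)
evals, evecs = evals[::-1], evecs[:, ::-1]

# SVD route: left singular vectors of R, squared singular values over T.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

w_evd = evecs[:, 0]                       # top eigenvector of Sigma_hat
w_svd = U[:, 0]                           # first left singular vector of R
objective = w_svd @ Sigma_hat @ w_svd     # value of w^T Sigma_hat w at the optimum

print(np.allclose(evals, s**2 / T), abs(w_evd @ w_svd), objective, evals[0])
```

Eigenvectors are only defined up to sign, which is why the comparison uses the absolute value of the inner product rather than elementwise equality.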

If the diagonal values of $S^2$ (the eigenvalues, up to the factor $1/T$) aren't distinct, the solution $v$ isn't unique either. Say we have $\lambda_1 = \lambda_2$, where $\lambda_i = s_i^2$ is the $i$th diagonal entry of $S^2$. Both unit vectors $e_1$ and $e_2$ attain the maximum:

\begin{align}
e_1^TS^2e_1 &= s_1^2 = \lambda_1 \\
e_2^TS^2e_2 &= s_2^2 = \lambda_2 = \lambda_1
\end{align}

Then any $v = (v_1, v_2, 0, \ldots, 0)^T$ with $v_1^2 + v_2^2 = 1$ attains the same value, so every unit vector in the span of $e_1$ and $e_2$ is a solution:

\begin{align}
v^TS^2v &= v_1^2s_1^2 + v_2^2s_2^2 = (v_1^2 + v_2^2)\lambda_1 = \lambda_1
\end{align}


Finally, to solve for $m < n$ components in one go, we have

\begin{align}
\text{max } & \text{trace}(W^T\hat{\Sigma}W) \\
\text{s.t. } & W^TW = I_m \\
W & \in \mathbb{R}^{n\times m}
\end{align}

This is the same as solving for the eigenvectors one by one, with each subsequent problem having an additional constraint that the new eigenvector is orthogonal to the previous ones.
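
The trace formulation can be sanity-checked numerically. In this sketch (synthetic data, illustrative sizes), the top-$m$ eigenvectors attain a trace equal to the sum of the $m$ largest eigenvalues, and randomly drawn orthonormal $W$ never exceed it. The random candidates come from a QR decomposition, which is just one convenient way to generate orthonormal columns.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, m = 6, 400, 2
R = rng.standard_normal((n, T))
Sigma_hat = R @ R.T / T

evals, evecs = np.linalg.eigh(Sigma_hat)        # ascending order
W_star = evecs[:, ::-1][:, :m]                  # top-m eigenvectors as columns
best = np.trace(W_star.T @ Sigma_hat @ W_star)

# Compare against random n x m matrices with orthonormal columns.
trials = []
for _trial in range(200):
    Q, _r = np.linalg.qr(rng.standard_normal((n, m)))
    trials.append(np.trace(Q.T @ Sigma_hat @ Q))

print(best, evals[-m:].sum(), max(trials))
```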

### 7.1.2 MLE and PCA

The idea is to start with a rotated factor model so that the factor covariance matrix is an identity matrix, which can be achieved following section 4.4.1 (identity factor covariance matrix). Assume the idio covariance matrix is diagonal with a constant variance $\sigma^2$.

$$ f \sim N(0,I_m) $$
$$ \epsilon \sim N(0, \sigma^2 I_n) $$
$$ \Sigma_r = BB^T + \sigma^2 I_n $$

The MLE of the factor model covariance matrix becomes:

$$ \max -\log|\hat{\Sigma}_r| - \langle \hat{\Sigma}_r^{-1}, \Sigma_r \rangle $$
$$ \hat{\Sigma}_r = \hat{B}\hat{B}^T + \hat{\sigma}^2I_n$$

where $\Sigma_r$ now denotes the empirical covariance matrix. The solution is:

\begin{align}
    \hat{B} & = U_m(S_m - \hat{\sigma}^2I_m)^{1/2} \\
    \hat{\sigma}^2 & = \bar{\lambda}
\end{align}

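where $U_m$ and $S_m$ hold the first $m$ eigenvectors and eigenvalues of the empirical covariance matrix, and $\bar{\lambda}$ is the average of its last $n-m$ eigenvalues. We can apply a transformation to make it more intuitive: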

\begin{align}
    \hat{B} & = U_m \\
    \Sigma_f & = S_m - \bar{\lambda}I_m \\
    \Sigma_{\epsilon} & = \bar{\lambda}I_n 
\end{align}

The transformation provides the insight: we shrink each factor variance by $\bar{\lambda}$, the average variance of the "unselected" components, and categorize that variance as idio returns/variance.
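
The transformed solution ($\hat{B} = U_m$, $\Sigma_f = S_m - \bar{\lambda}I_m$, $\Sigma_{\epsilon} = \bar{\lambda}I_n$) is easy to reproduce. The sketch below simulates returns from the assumed model with hypothetical values for `B_true` and `sigma_true`, then reads the estimates off the eigendecomposition of the empirical covariance matrix. It is an illustration of the closed form, not a general-purpose MLE routine.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T, m = 10, 2000, 2
B_true = rng.standard_normal((n, m))   # hypothetical loadings
sigma_true = 0.5                       # hypothetical idio volatility
# Simulate r = B f + eps with f ~ N(0, I_m) and eps ~ N(0, sigma^2 I_n).
R = B_true @ rng.standard_normal((m, T)) + sigma_true * rng.standard_normal((n, T))
Sigma_emp = R @ R.T / T

evals, evecs = np.linalg.eigh(Sigma_emp)
evals, evecs = evals[::-1], evecs[:, ::-1]     # descending order

lam_bar = evals[m:].mean()                     # average of the last n - m eigenvalues
U_m = evecs[:, :m]
Sigma_f = np.diag(evals[:m] - lam_bar)         # shrunk factor variances
Sigma_eps = lam_bar * np.eye(n)                # constant diagonal idio covariance

Sigma_model = U_m @ Sigma_f @ U_m.T + Sigma_eps
model_evals = np.linalg.eigvalsh(Sigma_model)[::-1]
print(lam_bar, sigma_true**2)
```

The model covariance keeps the top $m$ sample eigenvalues unchanged and flattens the remaining ones to $\bar{\lambda}$, which is the shrinkage interpretation described above.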

Experiments show that:
- the shrinkage mentioned above helps counter the upward bias of the sample factor variance estimate.
- however, the shrinkage tends to overcorrect and introduce a downward bias. Therefore, the MLE approach is generally biased.


### 7.1.3 Regressions via SVD

Once we know the factor loadings $U_m$ and asset returns $R$, we can get the factor returns via regression and vice versa.
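
Both directions can be sketched with ordinary least squares on synthetic data (sizes are illustrative). Because $U_m$ has orthonormal columns, the cross-sectional regression reduces to $U_m^TR$, and regressing the returns on the resulting factor returns gives back $U_m$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T, m = 7, 150, 3
R = rng.standard_normal((n, T))
U, s, Vt = np.linalg.svd(R, full_matrices=False)
U_m = U[:, :m]

# Cross-sectional regression of R on the loadings U_m gives the factor returns.
F_ols, *_ = np.linalg.lstsq(U_m, R, rcond=None)            # shape m x T

# Time-series regression of R^T on F^T recovers the loadings.
B_ols_T, *_ = np.linalg.lstsq(F_ols.T, R.T, rcond=None)    # shape m x n
B_ols = B_ols_T.T

print(np.allclose(F_ols, np.diag(s[:m]) @ Vt[:m, :]), np.allclose(B_ols, U_m))
```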

## 7.2 Beyond Basics

### Spiked Covariance Model

For the empirical covariance matrix:
$$\tilde{\Omega}_r = T^{-1}\sum^T_{t=1}r_tr_t^T$$

The matrix is spiked if there are an $m$ with $0<m<n$ and a positive constant $C$ such that, as $T \to \infty$, the eigenvalues satisfy:

$$\lambda_i = \lim_{T\to\infty}\lambda_{T,i}\begin{cases}
= 1 & \text{for all } i > m \\
\geq Cn & \text{for all } i \leq m
\end{cases}$$

A spiked matrix has a subset of eigenvalues that scale with the size of the matrix, while the rest of the eigenvalues are constant. In the context of a factor model, as we add more assets to the universe, the gap between factor and idio variances grows, to the point that the idio variances become negligible and are thus "diversified" away.

To see this, assume we have done the rotation and rescaling so that both the factor and idio covariance matrices are identity matrices; the covariance matrix is now $BB^T+I_n$. Since the nonzero eigenvalues of $BB^T$ and $B^TB$ are the same, we can work with $B^TB = \sum_i b_i^Tb_i$, where $b_i$ is the $i$th row of $B$. We can then rewrite $B^TB = n\left(\frac{1}{n}\sum_i b_i^Tb_i\right) = nE(b_i^Tb_i)$, with the expectation taken as the average over rows. Let $\mu_i$ be the eigenvalues of the expectation; the eigenvalues of $B^TB$ are then $n\mu_i$. The top $m$ eigenvalues of the overall covariance matrix are therefore $n\mu_i + 1$, and the remaining $n-m$ are equal to 1. As $n$ increases, the factor eigenvalues increase while the idio eigenvalues stay constant.
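
The scaling argument can be illustrated with a small simulation. This is a sketch under an added assumption that is not in the text: the rows $b_i$ are drawn i.i.d. standard normal, so $E(b_i^Tb_i) = I_m$ and every $\mu_i$ is 1.

```python
import numpy as np

rng = np.random.default_rng(5)
m = 3
results = {}
for n in (50, 200, 800):
    # Assumption for this sketch: rows b_i are i.i.d. standard normal.
    B = rng.standard_normal((n, m))
    Sigma = B @ B.T + np.eye(n)                       # rotated, rescaled covariance
    evals = np.linalg.eigvalsh(Sigma)[::-1]           # descending order
    factor_evals = np.linalg.eigvalsh(B.T @ B)[::-1]  # eigenvalues of the m x m matrix
    results[n] = {
        "top": evals[:m],                # spiked eigenvalues, roughly n * mu_i + 1
        "rest": evals[m:],               # idio eigenvalues, all equal to 1
        "from_BtB": factor_evals + 1.0,  # eigenvalues of B^T B shifted by 1
    }
    print(n, np.round(evals[:m] / n, 2))
```

The printed ratios of the top eigenvalues to $n$ should stay of order one as $n$ grows, while the remaining $n-m$ eigenvalues stay at 1.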

To illustrate, borrowing from section 9.3, the $i$th factor-mimicking portfolio (FMP) is $w_i = B(B^TB)^{-1}e_i$. Plug it into each part of the covariance matrix. Factor variance:

$$w_i^T(BB^T)w_i = e_i^{T}(B^TB)^{-1}B^T(BB^T)B(B^TB)^{-1}e_i=1$$

Idio variance:

\begin{align}
w_i^Tw_i &= e_i^T(B^TB)^{-1}B^TB(B^TB)^{-1}e_i \\ 
        &= e_i^T(B^TB)^{-1}e_i \\
        &= e_i^TVS^{-2}V^Te_i \\
        &\leq \lambda_m^{-1} e_i^TVV^Te_i \\
        &= \lambda_m^{-1} ||V^Te_i||^2 \\
        &\leq \lambda_m^{-1} ||V^T||^2||e_i||^2 \\
        &= \lambda_m^{-1} ||V^T||^2 \\
        &= \lambda_m^{-1} \\
        &\leq 1/Cn
\end{align}
where $\lambda_m$ is the $m$th largest eigenvalue of $BB^T$ and $B^TB$.
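
A numerical check of the two results, as a sketch with a synthetic loadings matrix (random `B` and the sizes are assumptions): each FMP should have factor variance 1 and idio variance no larger than $\lambda_m^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 500, 4
B = rng.standard_normal((n, m))              # synthetic loadings
BtB_inv = np.linalg.inv(B.T @ B)
W = B @ BtB_inv                              # column i is w_i = B (B^T B)^{-1} e_i

factor_var = np.diag(W.T @ (B @ B.T) @ W)    # w_i^T (B B^T) w_i for each i
idio_var = np.diag(W.T @ W)                  # w_i^T w_i for each i
lam_m = np.linalg.eigvalsh(B.T @ B)[0]       # smallest, i.e. m-th largest, eigenvalue

print(factor_var, idio_var.max(), 1.0 / lam_m)
```

With $n = 500$ the idio variances come out on the order of $1/n$, far below the unit factor variance, which is the "diversified away" effect in numbers.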