David G. Luenberger - Investment Science
# **Chapter 9 Data and Statistics**
Cosma Rohilla Shalizi - Advanced Data Analysis from an Elementary Point of View
# **Chapter 15 Principal Components Analysis**

hse07088@snu.ac.kr
***

## **Ch9 Data and Statistics**

### **Basic Estimation Method: Using Historical Data**

**Period-Length Effects**
- For a period of length $p$ (expressed as a fraction of a year), the expected return and standard deviation relate to their yearly values as
$$\bar{r}_p=p\bar{r}_y$$
$$\bar{\sigma}_p=\sqrt p\bar{\sigma}_y$$ 
- the ratio of standard deviation to expected rate of return, $\bar{\sigma}_p/\bar{r}_p=\bar{\sigma}_y/(\sqrt p\,\bar{r}_y)$, increases dramatically as the period length is reduced
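
A minimal numerical sketch of the scaling rules above. The yearly figures (`r_year`, `sigma_year`) and the 250-day trading year are assumptions chosen for illustration, not values taken from data.

```python
import numpy as np

# Assumed yearly figures, for illustration only.
r_year = 0.12      # expected yearly return
sigma_year = 0.15  # yearly standard deviation

# p is the period length as a fraction of a year (250 trading days assumed).
periods = {"year": 1.0, "quarter": 1 / 4, "month": 1 / 12, "day": 1 / 250}

ratios = {}
for name, p in periods.items():
    r_p = p * r_year                    # mean scales linearly with p
    sigma_p = np.sqrt(p) * sigma_year   # std scales with sqrt(p)
    ratios[name] = sigma_p / r_p        # equals (sigma_year / r_year) / sqrt(p)
    print(f"{name:8s} r_p={r_p:.5f}  sigma_p={sigma_p:.5f}  ratio={ratios[name]:.2f}")
```

The printed ratio grows like $1/\sqrt p$, which is why short-period returns look so noisy relative to their mean.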

**Estimation of $r$**
- Estimator: Sample Mean $$\hat{\bar{r}}=\frac{1}{n}\sum_{i=1}^n r_i$$
    - unbiased estimator $$E(\hat{\bar{r}})=E\left(\frac{1}{n}\sum_{i=1}^n r_i\right)=\bar{r}$$
    - its variance and standard deviation (for uncorrelated $r_i$ with common variance $\sigma^2$): $$\sigma_{\hat{\bar{r}}}^2=E\left[(\hat{\bar{r}}-\bar{r})^2\right]=E\left[\left(\frac{1}{n}\sum_{i=1}^n(r_i-\bar r)\right)^2\right]=\frac{1}{n}\sigma^2$$ $$\sigma_{\hat{\bar{r}}}=\frac{\sigma}{\sqrt n}$$
- **Mean Blur**
    - typically, a very large number of samples is required to bring the standard deviation of the estimate down to an acceptable level relative to the mean
    - however, such a long history may not be available
    - mean values are not likely to be constant over that length of time
- basically *impossible* to measure $\bar{r}$ to within workable accuracy using historical data
- the problem cannot be improved much by changing the period length
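
To make the mean blur concrete, the following sketch evaluates $\sigma/\sqrt n$ for monthly data. The monthly mean and standard deviation (`r_month`, `sigma_month`) and the 10% accuracy target are assumed values for illustration.

```python
import numpy as np

r_month = 0.01        # assumed monthly mean return
sigma_month = 0.0433  # assumed monthly standard deviation

def std_error_of_mean(sigma, n):
    """Standard deviation of the sample mean of n uncorrelated returns."""
    return sigma / np.sqrt(n)

for years in (1, 4, 16, 64):
    n = 12 * years
    se = std_error_of_mean(sigma_month, n)
    print(f"{years:3d} years: std error = {se:.4%}, relative to mean = {se / r_month:.2f}")

# Months of data needed for the standard error to be 10% of the mean.
target = 0.10 * r_month
n_required = int(np.ceil((sigma_month / target) ** 2))
print(f"months required: {n_required} (about {n_required / 12:.0f} years)")
```

Under these assumptions well over a century of monthly data would be needed, which is the sense in which the mean cannot be measured to workable accuracy.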

**Estimation of $\sigma$**
- Estimator: Sample Variance $$s^2 = \frac{1}{n-1}\sum_{i=1}^n(r_i-\hat{\bar{r}})^2$$
    - unbiased estimator
    - assuming normally distributed returns, its variance and standard deviation are: $$\operatorname{var}(s^2) = \frac{2\sigma^4}{n-1}$$ $$\operatorname{stdev}(s^2) = \frac{\sqrt 2\sigma^2}{\sqrt{n-1}}$$
- the relative error in the estimate of $\sigma^2$, $\sqrt{2/(n-1)}$, is not too extreme if $n$ is reasonably large
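
The formula for the standard deviation of $s^2$ can be checked by simulation. This is a sketch assuming i.i.d. normal returns. The sample size, number of trials and parameter values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.0433   # assumed true monthly standard deviation
n = 60           # observations per sample (five years of months)
trials = 20_000  # number of simulated histories

# Each row is one simulated history of n normal returns.
samples = rng.normal(loc=0.01, scale=sigma, size=(trials, n))
s2 = samples.var(axis=1, ddof=1)   # unbiased sample variance per history

empirical_std = s2.std()
theoretical_std = np.sqrt(2.0) * sigma**2 / np.sqrt(n - 1)

print(f"mean of s^2      : {s2.mean():.6f} (true sigma^2 = {sigma**2:.6f})")
print(f"stdev of s^2     : {empirical_std:.6f} (formula: {theoretical_std:.6f})")
print(f"relative error   : {theoretical_std / sigma**2:.3f}")
```

With 60 observations the relative error of the variance estimate is under 20%, far better than what the same history delivers for the mean.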

**$a$ Blur**
- the blur phenomenon applies to the parameters of a factor model, but mainly to the determination of $a$
- similarly, the $\alpha$ of the SML cannot be reliably estimated. On the other hand, the relative error in estimating $\beta$ is somewhat better

### **The Effect of Estimation Errors**
The negative impact of estimation errors on portfolio quality is significantly *larger* for expected-return errors than for variance and covariance errors

**Ways in which portfolios are negatively impacted by estimation errors:**
- Condition number
    - The higher the condition number of a matrix, the closer it is to being singular, and the more sensitive the result becomes to its input
    - In portfolio design the relevant matrix is the covariance matrix $V = [\sigma_{ij}]$ $$\kappa(V) = \lVert V \rVert \lVert V^{-1} \rVert$$
- Leverage 
- Large weights
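
The link between the condition number and error sensitivity can be illustrated with a small sketch. The equicorrelated covariance matrix, the expected returns `u`, and the rule "weights proportional to $V^{-1}\mathbf{u}$, rescaled to sum to one" are all assumptions made for the example.

```python
import numpy as np

def equicorrelated_cov(n_assets, sigma, rho):
    """Covariance matrix with equal variances and one common correlation."""
    corr = np.full((n_assets, n_assets), rho)
    np.fill_diagonal(corr, 1.0)
    return sigma**2 * corr

def normalized_weights(V, u):
    """Weights proportional to V^{-1} u, rescaled to sum to one."""
    raw = np.linalg.solve(V, u)
    return raw / raw.sum()

u = np.array([0.10, 0.11, 0.12])               # assumed expected returns
u_perturbed = u + np.array([0.01, 0.0, 0.0])   # a small estimation error

results = {}
for rho in (0.5, 0.99):
    V = equicorrelated_cov(3, 0.2, rho)
    kappa = np.linalg.cond(V)   # 2-norm condition number
    shift = np.linalg.norm(normalized_weights(V, u_perturbed) - normalized_weights(V, u))
    results[rho] = (kappa, shift)
    print(f"rho={rho:.2f}  kappa={kappa:8.1f}  weight shift={shift:.3f}")
```

The same error in one expected return moves the weights far more when the assets are nearly collinear, and the resulting weights are large and leveraged.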

**Effect on the Efficient Frontier**
- "Theory"
    - optimization problem $$\begin{align}\max_{\mathbf{w}}\quad &\mathbf{w}^T\mathbf{u}\\ \text{subject to}\quad &\mathbf{w}^T\mathbf{V}\mathbf{w} \leq \sigma^2 \\ &\mathbf{w}^T\mathbf{1}=1\end{align} $$
- "Think"
    - assume the estimation process produces an unbiased estimate of $\mathbf{u}$ of the form $\mathbf{u}+\mathbf{e}$
    - optimization problem $$\begin{align}\max_{\mathbf{w}}\quad &\mathbf{w}^T(\mathbf{u}+\mathbf{e})\\ \text{subject to}\quad &\mathbf{w}^T\mathbf{V}\mathbf{w} \leq \sigma^2 \\ &\mathbf{w}^T\mathbf{1}=1\end{align} $$
    - expected performance $$E\left[\max_{\mathbf{w}}\mathbf{w}^T(\mathbf{u}+\mathbf{e})\right]$$
    - On average, the performance of "Think" is greater than or equal to that of "Theory" $$E\left[\max_{\mathbf{w}}\mathbf{w}^T(\mathbf{u}+\mathbf{e})\right] \geq \max_{\mathbf{w}}E\left[\mathbf{w}^T(\mathbf{u}+\mathbf{e})\right]=\max_{\mathbf{w}}\mathbf{w}^T\mathbf{u}$$
- "Actual"
    - actual expected return $\mathbf{w}^T\mathbf{u}$ using $\mathbf{w}_{\mathbf{u}+\mathbf{e}}$ maximizing $\mathbf{w}^T(\mathbf{u}+\mathbf{e})$
    - since the weights chosen at the "Think" stage are generally not optimal for the true $\mathbf{u}$, the "Actual" value is never better, and typically worse, than the one that would be obtained in "Theory"

![image.png](attachment:image.png)
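
The ordering of "Think", "Theory" and "Actual" can be reproduced with a small Monte Carlo sketch. It relies on the closed-form solution of the problem above (minimum-variance portfolio plus a zero-sum tilt). The covariance matrix, true means `u`, risk limit `sigma_max` and noise level `noise_std` are assumed values.

```python
import numpy as np

def max_return_weights(u, V, sigma_max):
    """Maximize w @ u subject to w @ V @ w <= sigma_max**2 and sum(w) == 1."""
    ones = np.ones(len(u))
    Vinv_1 = np.linalg.solve(V, ones)
    Vinv_u = np.linalg.solve(V, u)
    C = ones @ Vinv_1
    A = ones @ Vinv_u
    w_mv = Vinv_1 / C                 # minimum-variance portfolio
    z = Vinv_u - (A / C) * Vinv_1     # tilt direction, weights sum to zero
    D = z @ V @ z                     # variance added per unit of squared tilt
    slack = sigma_max**2 - 1.0 / C    # variance budget above the minimum
    if slack < 0:
        raise ValueError("sigma_max is below the minimum attainable risk")
    return w_mv + np.sqrt(slack / D) * z

rng = np.random.default_rng(0)
n_assets = 5
V = 0.04 * (0.3 * np.ones((n_assets, n_assets)) + 0.7 * np.eye(n_assets))
u = np.array([0.08, 0.09, 0.10, 0.11, 0.12])   # assumed true means
sigma_max, noise_std = 0.15, 0.03

theory = max_return_weights(u, V, sigma_max) @ u
think, actual = [], []
for _ in range(2000):
    u_hat = u + rng.normal(0.0, noise_std, size=n_assets)  # unbiased estimate u + e
    w_hat = max_return_weights(u_hat, V, sigma_max)
    think.append(w_hat @ u_hat)    # what the investor believes
    actual.append(w_hat @ u)       # what the portfolio really earns on average
think, actual = np.array(think), np.array(actual)

print(f"average Think : {think.mean():.4f}")
print(f"Theory        : {theory:.4f}")
print(f"average Actual: {actual.mean():.4f}")
```

Every simulated "Actual" value lies at or below "Theory" because the estimated weights remain feasible for the true problem, while the average "Think" value is inflated by the optimizer chasing the errors.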

**Effect on the Maximum Tangent**
- optimization problem solved with the estimated means $\mathbf{u}+\mathbf{e}$ $$\begin{align} \text{maximize}_{\mathbf{w}} \quad &\frac{\mathbf{w}^T(\mathbf{u}+\mathbf{e})}{\sqrt{\mathbf{w}^T\mathbf{V}\mathbf{w}}} \\ \text{subject to}\quad &\mathbf{w}^T\mathbf{1}=1 \end{align}$$
- On average, Think $\geq$ Theory $\geq$ Actual

**Compounding Effect**
- estimation error is not reduced by time diversification, especially if the estimates are determined from historical studies
    - errors are not independent from period to period
    - they are often identical or close to identical, since they are formed from fixed histories
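- the standard deviation of the estimation error accumulated over $N$ periods is $N\sigma$ (the error is essentially the same in every period), while the expected value grows to $N\bar{r}$; the ratio therefore remains $\sigma/\bar{r}$ and does not decrease with time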
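
A short simulation contrasts the two cases: an estimation error that is repeated identically in every period versus errors that are redrawn independently each period. The values of `sigma`, `N` and the number of trials are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.02      # assumed std of the one-period estimation error
N = 10            # number of periods
trials = 50_000   # simulated scenarios

# Same error in every period (estimate formed once from a fixed history).
same_error = rng.normal(0.0, sigma, size=trials)
total_same = N * same_error

# Independent error in each period (the case time diversification would need).
independent_errors = rng.normal(0.0, sigma, size=(trials, N))
total_independent = independent_errors.sum(axis=1)

std_same = total_same.std()
std_independent = total_independent.std()
print(f"identical errors  : {std_same:.4f} (N * sigma       = {N * sigma:.4f})")
print(f"independent errors: {std_independent:.4f} (sqrt(N) * sigma = {np.sqrt(N) * sigma:.4f})")
```

Only in the independent case does the accumulated error grow like $\sqrt N\sigma$; with a fixed history it grows like $N\sigma$, exactly in step with the expected value.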

### **Conservative Approaches**
- require portfolio weights to be nonnegative: this eliminates leverage
- produce portfolios with relatively few nonzero weights (concentrating on only a few securities)
    - the resulting portfolio is easier to manage
    - tends to *reduce the condition number* of the covariance matrix and thus further *reduce error sensitivity*
    - no single asset should have a significant weight
- explicitly discourage large weights by adding a penalty term to the portfolio objective function
    - modify the optimization problem as follows: $$\begin{align}\text{minimize}_\mathbf{w} \quad &\mathbf{w}^T\mathbf{V}\mathbf{w}+c\mathbf{w}^T\mathbf{P}\mathbf{w}\\\text{subject to} \quad &\mathbf{w}^T\bar{\mathbf{r}}=\bar{r}\\&\mathbf{w}^T\mathbf{1}=1 \end{align}$$ where $\mathbf{P}$ is a positive-definite matrix and $c>0$
- set an upper bound on the weights
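
The penalized problem is an equality-constrained quadratic program, so it can be solved from its KKT linear system. The sketch below assumes $\mathbf{P}=\mathbf{I}$. The covariance matrix, expected returns `r_bar`, target return and penalty weight `c` are assumed values, and `penalized_min_variance` is a hypothetical helper name.

```python
import numpy as np

def penalized_min_variance(V, P, c, r_bar, target):
    """Minimize w'Vw + c w'Pw subject to w'r_bar = target and w'1 = 1."""
    n = len(r_bar)
    Q = V + c * P
    A = np.vstack([r_bar, np.ones(n)])   # constraint matrix, shape (2, n)
    b = np.array([target, 1.0])
    # Stationarity (2Qw + A'lambda = 0) stacked on top of feasibility (Aw = b).
    kkt = np.block([[2.0 * Q, A.T], [A, np.zeros((2, 2))]])
    rhs = np.concatenate([np.zeros(n), b])
    return np.linalg.solve(kkt, rhs)[:n]

# Assumed inputs: four highly correlated assets and an ambitious return target.
V = 0.04 * (0.9 * np.ones((4, 4)) + 0.1 * np.eye(4))
P = np.eye(4)
r_bar = np.array([0.08, 0.10, 0.11, 0.12])
target = 0.13

w_plain = penalized_min_variance(V, P, 0.0, r_bar, target)
w_penalized = penalized_min_variance(V, P, 0.5, r_bar, target)
print("no penalty :", np.round(w_plain, 3), " norm =", round(np.linalg.norm(w_plain), 3))
print("c = 0.5    :", np.round(w_penalized, 3), " norm =", round(np.linalg.norm(w_penalized), 3))
```

Both solutions satisfy the two constraints, and the penalized weights never have a larger Euclidean norm than the unpenalized ones.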

**Better Estimates**
Although the estimate based on the average of historical returns is *the best possible unbiased estimate* in the sense of *mean-square error*, there are other estimation methods that are not unbiased but have superior performance in a quadratic sense
- estimators $\mathbf{u}^0$: superior to the sample mean $\hat{\mathbf{u}}$ in the sense of a lower expected value of the loss function $(\mathbf{u}-\mathbf{u}^0)^T\mathbf{V}^{-1}(\mathbf{u}-\mathbf{u}^0)$
- **shrinkage estimators**: use a weighted combination of two estimators $\hat{\mathbf{u}}$, $\mathbf{u}^0$
    - **James-Stein Shrinkage estimator** ($\mathbf{u}^0 = u_0\mathbf{1}$) $$\mathbf{u}_{JS}=(1-w)\hat{\mathbf{u}}+wu_0\mathbf{1}$$ $$w=\min \left[1, \frac{n-2}{N(\hat{\mathbf{u}}-u_0\mathbf{1})^T\mathbf{V}^{-1}(\hat{\mathbf{u}}-u_0\mathbf{1})}\right]$$ Here $n$ is the number of assets and $N$ the number of observations. Note that $0 < w \leq 1$ for $n>2$, and in practice, shrinkage estimators applied to stock data tend to produce estimates of mean returns about 2-3% lower than the standard estimate
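
The shrinkage formula can be sketched directly in NumPy. Several things here are assumptions for illustration: the data are simulated, $\mathbf{V}$ is treated as known, the target $u_0$ defaults to the grand mean of the sample means, and `james_stein_mean` is a hypothetical helper name.

```python
import numpy as np

def james_stein_mean(returns, V, u0=None):
    """Shrink the sample mean toward u0 * 1 (requires more than 2 assets)."""
    N, n = returns.shape            # N observations (rows), n assets (columns)
    u_hat = returns.mean(axis=0)    # ordinary sample mean
    if u0 is None:
        u0 = u_hat.mean()           # assumed target: grand mean of the sample means
    diff = u_hat - u0
    distance = diff @ np.linalg.solve(V, diff)   # (u_hat - u0 1)' V^{-1} (u_hat - u0 1)
    w = min(1.0, (n - 2) / (N * distance))
    return (1.0 - w) * u_hat + w * u0, w

rng = np.random.default_rng(0)
n_assets, n_obs = 6, 60
u_true = np.linspace(0.005, 0.015, n_assets)   # assumed true monthly means
V = 0.0433**2 * (0.3 * np.ones((n_assets, n_assets)) + 0.7 * np.eye(n_assets))
returns = rng.multivariate_normal(u_true, V, size=n_obs)

u_hat = returns.mean(axis=0)
u_js, w = james_stein_mean(returns, V)
print("shrinkage weight w:", round(float(w), 3))
print("sample mean       :", np.round(u_hat, 4))
print("James-Stein       :", np.round(u_js, 4))
```

The shrunk estimates keep the same cross-sectional average as the sample means but are pulled toward it, so extreme individual estimates are moderated.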

### **Tilting Away From Equilibrium**
Subjective information about the expected return, or information based on a careful analysis of the firm, can be systematically *combined* with the estimates derived from historical data to develop superior estimates

e.g., information from detailed fundamental analyses of the firm, including an analysis of its future projects, its management, its financial condition, its competition, and the projected market for its products or services

- In each case we also assign a *variance* to the estimate
- As additional information is added, the solution will tilt away from the initial (equilibrium) solution
- The degree of departure, or tilt, will depend on the nature of the adjoined equations and the degree of confidence we have in them, as expressed by the variances and covariances of the error terms

## **Ch15 Principal Components Analysis**

to go forward in multivariate data analysis, we need to somehow lift the curse of dimensionality (dimensionality reduction)

**Principal components analysis (PCA)**

Takes high-dimensional data, and uses the dependencies between the variables to represent it in a more tractable, lower-dimensional form, without losing too much information

Assume the data are $p$-dimensional vectors that we want to summarize by projecting them down onto a $q$-dimensional subspace, with $q \ll p$

*Principal Components*
- $q$ orthonormal vectors which span the subspace
    - stacked as the columns of $\mathbf{w}$, they satisfy $\mathbf{w}^T\mathbf{w}=\mathbf{I}$
- Throughout, assume that the data have been "centered", so that every variable has mean 0
- The $k$-th component is the *variance-maximizing* direction *orthogonal* to the previous $k-1$ components
    - The variance is $$\begin{align} \hat{\mathbb{V}}[\mathbf{w}\cdot \mathbf{x}_i] &= \frac{1}{n}\sum_i (\mathbf{x}_i\cdot\mathbf{w})^2 -(\bar{\mathbf{x}}\cdot\mathbf{w})^2\\ &= \frac{1}{n}(\mathbf{x}\mathbf{w})^T(\mathbf{x}\mathbf{w}) -(\bar{\mathbf{x}}^T\mathbf{w})^T(\bar{\mathbf{x}}^T\mathbf{w})\\ &=\mathbf{w}^T\left(\frac{1}{n}\mathbf{x}^T\mathbf{x}\right)\mathbf{w}-\mathbf{w}^T(\bar{\mathbf{x}}\bar{\mathbf{x}}^T)\mathbf{w} \\ &=\mathbf{w}^T\mathbf{V}\mathbf{w}\end{align}$$ where $\mathbf{x}$ is the $n\times p$ data matrix and $\mathbf{V}=\frac{1}{n}\mathbf{x}^T\mathbf{x}$ is the sample covariance matrix; the second term vanishes because $\bar{\mathbf{x}}=\mathbf{0}$ for centered data
    - maximization problem $$\begin{align} \max_{\mathbf{w}} \quad &\mathbf{w}^T\mathbf{V}\mathbf{w}\\ \text{subject to}\quad &\mathbf{w}^T\mathbf{w}=1 \end{align}$$
    - Solving with the Lagrange multiplier $\lambda$, $$\mathcal{L}(\mathbf{w}, \lambda)\equiv \mathbf{w}^T\mathbf{V}\mathbf{w}-\lambda(\mathbf{w}^T\mathbf{w}-1)$$ $$\begin{align}\frac{\partial\mathcal{L}}{\partial \lambda} &= -(\mathbf{w}^T\mathbf{w}-1)\\ \frac{\partial\mathcal{L}}{\partial \mathbf{w}}&=2\mathbf{V}\mathbf{w}-2\lambda\mathbf{w}\end{align}$$ Setting both derivatives to zero gives $$\begin{align}\mathbf{w}^T\mathbf{w}&=1\\\mathbf{V}\mathbf{w}&=\lambda\mathbf{w}\end{align}$$ Thus, the desired vector $\mathbf{w}$ is an eigenvector of the covariance matrix $\mathbf{V}$, and since $\mathbf{w}^T\mathbf{V}\mathbf{w}=\lambda$ at such a point, the maximizing vector is the one associated with the largest eigenvalue $\lambda$
    - Note that, because $\mathbf{V}$ is a covariance matrix, it is a *non-negative-definite matrix*, so the eigenvalues of $\mathbf{V}$ must all be $\geq0$
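
The eigenvector characterization can be checked numerically. This sketch uses synthetic data (the `mixing` matrix is an arbitrary assumption) and `numpy.linalg.eigh`, which returns eigenvalues of a symmetric matrix in ascending order.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3

# Synthetic correlated data: independent normals passed through an assumed mixing matrix.
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.2, 0.3]])
x = rng.normal(size=(n, p)) @ mixing
x = x - x.mean(axis=0)            # center every variable

V = x.T @ x / n                   # sample covariance matrix (1/n convention)
eigenvalues, eigenvectors = np.linalg.eigh(V)
order = np.argsort(eigenvalues)[::-1]          # sort from largest to smallest
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

w1 = eigenvectors[:, 0]                        # first principal component
projected_variance = np.var(x @ w1)            # variance of the data along w1

# Compare with an arbitrary unit-length direction.
random_direction = rng.normal(size=p)
random_direction /= np.linalg.norm(random_direction)
random_variance = np.var(x @ random_direction)

print("eigenvalues             :", np.round(eigenvalues, 4))
print("variance along w1       :", round(float(projected_variance), 4))
print("variance along random w :", round(float(random_variance), 4))
```

The variance along the leading eigenvector equals the largest eigenvalue, and no other unit direction exceeds it.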

*Projection Residuals*

Picking the top $q$ components, we can define a projection operator $\mathbf{P}_q$
- The images of the data are then $\mathbf{x}\mathbf{P}_q$
- The projection residuals are $\mathbf{x}(\mathbf{I}-\mathbf{P}_q)$
- $R^2$ of the projection $$R^2 = \frac{\sum_{i=1}^q\lambda_i}{\sum_{i=1}^p\lambda_i}$$
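
A sketch of the projection, its residuals and $R^2$ on synthetic data. The two-factor data-generating process is an assumption, and the projection operator is built as $\mathbf{P}_q=\mathbf{w}_q\mathbf{w}_q^T$ from the top $q$ eigenvectors, the standard construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 400, 5, 2

# Assumed data: two latent factors plus idiosyncratic noise.
factors = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, p))
x = factors @ loadings + 0.3 * rng.normal(size=(n, p))
x = x - x.mean(axis=0)                       # center the data

V = x.T @ x / n
eigenvalues, eigenvectors = np.linalg.eigh(V)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

W = eigenvectors[:, :q]                      # top q principal components
P_q = W @ W.T                                # projection onto their span
images = x @ P_q                             # projected data
residuals = x @ (np.eye(p) - P_q)            # what the projection leaves out

r2 = eigenvalues[:q].sum() / eigenvalues.sum()
r2_direct = 1.0 - (residuals**2).sum() / (x**2).sum()
print(f"R^2 from eigenvalues: {r2:.4f}")
print(f"R^2 from residuals  : {r2_direct:.4f}")
```

The two values agree because the total variance is the sum of all eigenvalues and the residual variance is the sum of the discarded ones.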