# Chapter 9 - Portfolio Management: the Basics

## Why MVO
- Interpretability
- Data availability: multi-period optimization requires too much data
- Computational tractability: easy to solve
- Short-term investment horizon

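The goal is to maximize the expected utility of end-of-period wealth, $\max_w E\left[V(W_0 + w^Tr)\right]$, where $V$ is the utility function, $W_0$ the initial wealth, $w$ the positions, and $r$ the asset returns. The utility function should be:
- increasing (more wealth is better)
- concave (each additional unit of wealth adds less utility, which is what makes the investor risk averse)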

\begin{align}
E[V(W_0 + w^Tr)] &\simeq E[V(W_0) + V'(W_0)w^Tr + V''(W_0)(w^Tr)^2 / 2] \\
&= V(W_0) + V'(W_0)w^TE[r] + V''(W_0)E[(w^Tr)^2] / 2 \\
&= V(W_0) + V'(W_0)w^TE[r] + V''(W_0)(Var(w^Tr) + E^2[w^Tr]) / 2 \\
&= V(W_0) + V'(W_0)w^T\mu + V''(W_0)(w^T\Omega_r w + (w^T\mu)^2) / 2 \\
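&\simeq V(W_0) + V'(W_0)w^T\mu + \frac{V''(W_0)}{2} w^T\Omega_r w \quad \text{(dropping } (w^T\mu)^2 \text{, which is small next to } w^T\Omega_r w \text{ over short horizons)} \\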
\frac{E[V(W_0 + w^Tr)] - V(W_0)}{V'(W_0)} &\simeq w^T\mu - \frac{\rho}{2}w^T\Omega_r w \\
\rho &= - \frac{V''(W_0)}{V'(W_0)}
\end{align}
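
The quality of the second-order expansion can be checked numerically. The sketch below is only an illustration: the log utility, the wealth level, and the five equally likely P&L scenarios are made-up assumptions, and `exact_and_approx` is a hypothetical helper name.

```python
import numpy as np

def exact_and_approx(V, dV, d2V, W0, pnl):
    """Compare E[V(W0 + pnl)] with its second-order Taylor expansion around W0.

    pnl holds equally likely outcomes of w^T r (an assumption of this sketch).
    """
    exact = np.mean(V(W0 + pnl))
    approx = V(W0) + dV(W0) * pnl.mean() + 0.5 * d2V(W0) * np.mean(pnl ** 2)
    return exact, approx

W0 = 100.0
pnl = np.array([-2.0, -1.0, 0.5, 1.5, 3.0])  # hypothetical P&L scenarios

exact, approx = exact_and_approx(
    np.log, lambda x: 1.0 / x, lambda x: -1.0 / x ** 2, W0, pnl
)
print(f"exact={exact:.8f} approx={approx:.8f} diff={abs(exact - approx):.2e}")
```

When the P&L is small relative to wealth, the two numbers should agree closely; the gap is driven by third-order and higher terms.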

For $V(x) = -\exp(-ax)$
\begin{align}
V'(x) &= a\exp(-ax) \\
V''(x) &= -a^2\exp(-ax) \\
\rho &= -\frac{V''(x)}{V'(x)} = a
\end{align}
The objective function is $w^T\mu - \frac{a}{2}w^T\Omega_r w$. This is the usual form of MVO, where $a$ controls the risk aversion.

For $V(x) = \log(x)$
\begin{align}
V'(x) &= 1/x \\
V''(x) &= -1/x^2 \\
\rho &= -\frac{V''(x)}{V'(x)} = 1/x
\end{align}
The objective function is $w^T\mu - \frac{1}{2W_0}w^T\Omega_r w$. This is associated with the Kelly criterion: the risk aversion $\rho = 1/W_0$ falls as wealth grows, so you take on more risk as you become wealthier.
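
As a sanity check on the two risk-aversion results, $\rho = -V''/V'$ can be estimated with central finite differences. This is a rough sketch; `risk_aversion`, the step size `h`, and the evaluation points are illustrative choices, not part of the original notes.

```python
import numpy as np

def risk_aversion(V, x, h=1e-3):
    """Estimate rho = -V''(x) / V'(x) with central finite differences."""
    d1 = (V(x + h) - V(x - h)) / (2 * h)
    d2 = (V(x + h) - 2 * V(x) + V(x - h)) / h ** 2
    return -d2 / d1

a = 0.5
rho_exp = risk_aversion(lambda x: -np.exp(-a * x), 2.0)  # should be close to a
rho_log = risk_aversion(np.log, 4.0)                     # should be close to 1/4

print(rho_exp, rho_log)
```

For the exponential utility the estimate does not depend on the wealth level, while for the log utility it shrinks as wealth grows.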



## MVO

If we maximize the Sharpe ratio directly, $\max_w \frac{\alpha^T w}{\sqrt{w^T\Omega_r w}}$, the solution is not unique: the Sharpe ratio is invariant to leverage, $SR(w) = SR(\kappa w)$ for all $\kappa > 0$, so infinitely many $w$ attain the same maximum. To pin down a solution, we upper-bound the volatility:

\begin{align}
\text{max } & \alpha^T w \\
\text{s.t. } & \sqrt{w^T\Omega_r w} \leq \sigma
\end{align}

Solve this with a Lagrange multiplier:
\begin{align}
L &= \alpha^T w - \lambda\left(w^T\Omega_r w - \sigma^2\right) \\
\nabla L &= \alpha - 2\lambda\Omega_r w = 0 \\
w &= \frac{1}{2\lambda} \Omega_r^{-1} \alpha
\end{align}

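The objective is linear in $w$, so the volatility bound is always binding at the optimum (for $\alpha \neq 0$). Complementary slackness then pins down $\lambda$: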
\begin{align}
\lambda\left(w^T\Omega_r w - \sigma^2\right) &= 0 \\
\lambda &\geq 0 \\
w^T\Omega_r w - \sigma^2 &= 0 \\
\frac{1}{2\lambda} \alpha^T \Omega_r^{-1} \Omega_r \frac{1}{2\lambda} \Omega_r^{-1} \alpha - \sigma^2 &= 0 \\
\frac{1}{4\lambda^2} \alpha^T \Omega_r^{-1} \alpha &= \sigma^2 \\
\lambda &= \frac{\sqrt{\alpha^T \Omega_r^{-1} \alpha}}{2\sigma}
\end{align}

Put them together:
\begin{align}
w &= \frac{\sigma}{\sqrt{\alpha^T \Omega_r^{-1} \alpha}} \Omega_r^{-1} \alpha
\end{align}

Looking at it the other way, we know that $w \propto \Omega_r^{-1}\alpha$. Starting with $w = \Omega_r^{-1}\alpha$, then $\sigma_p^2 = \alpha^T\Omega_r^{-1}\Omega_r\Omega_r^{-1}\alpha = \alpha^T\Omega_r^{-1}\alpha$. Scaling the weights so $\sigma_p = \sigma$, we need $w = \frac{\sigma}{\sqrt{\alpha^T\Omega_r^{-1}\alpha}} \Omega_r^{-1} \alpha$.

An insight is that the optimal weight is independent of the scale of the expected returns, as long as their relative values are correct. Plug $\kappa \alpha$ (with $\kappa > 0$) into the optimal weight formula and the constant $\kappa$ cancels out.
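
A minimal NumPy sketch of the closed-form solution, under made-up inputs: the three-asset `alpha`, `omega`, and the 10% volatility target are illustrative assumptions, and `mvo_vol_target` is a hypothetical helper name.

```python
import numpy as np

def mvo_vol_target(alpha, omega, sigma):
    """w = sigma / sqrt(alpha' Omega^{-1} alpha) * Omega^{-1} alpha."""
    x = np.linalg.solve(omega, alpha)  # Omega^{-1} alpha without forming the inverse
    return sigma / np.sqrt(alpha @ x) * x

# Hypothetical inputs: vols of 20%, 30%, 25% with mild correlations.
alpha = np.array([0.02, 0.01, 0.015])
omega = np.array([
    [0.04, 0.006, 0.0],
    [0.006, 0.09, 0.012],
    [0.0, 0.012, 0.0625],
])
sigma = 0.10

w = mvo_vol_target(alpha, omega, sigma)
port_vol = np.sqrt(w @ omega @ w)                      # equals sigma: the bound binds
w_scaled = mvo_vol_target(3.0 * alpha, omega, sigma)   # same weights: kappa cancels

print(w, port_vol, np.allclose(w, w_scaled))
```

Using `np.linalg.solve` instead of an explicit inverse is a numerical-stability choice; the result is the same up to rounding.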

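To transform this into a more intuitive formulation, let $V$ be the diagonal matrix of asset volatilities (not the utility function above) and let $C$ be the correlation matrix, so that $\Omega_r = VCV$. Define $s = V^{-1}\alpha$, the vector of per-asset Sharpe ratios, and $v = Vw$, the volatility allocation (each weight multiplied by that asset's volatility).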

\begin{align}
w  &= \frac{1}{2\lambda} \Omega_r^{-1} \alpha \\
   &= \frac{1}{2\lambda} (VCV)^{-1} \alpha \\
Vw &= \frac{1}{2\lambda} VV^{-1}C^{-1}V^{-1} \alpha \\
   &= \frac{1}{2\lambda} C^{-1}V^{-1} \alpha \\
   &= \frac{1}{2\lambda} C^{-1}s \\
v  &= \frac{1}{2\lambda} C^{-1}s \\
\alpha_p &= w^T\alpha = w^TVV^{-1}\alpha = v^Ts = \frac{1}{2\lambda} s^T C^{-1} s \\
\sigma_p^2 &= w^T\Omega_r w = (w^TVV^{-1})(VCV)(V^{-1}Vw) = v^TCv \\
         &= \frac{1}{4\lambda^2} s^T C^{-1}s \\
SR_p &= \frac{\alpha_p}{\sigma_p} = \frac{s^T C^{-1} s}{\sqrt{s^T C^{-1} s}} = \sqrt{s^T C^{-1} s}
\end{align}

So the volatility allocation $v$ is proportional to $C^{-1}s$: the inverse of the correlation matrix applied to the vector of asset Sharpe ratios.
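
The identities above can be verified numerically. In this sketch the volatilities, correlations, and alphas are invented for illustration, and $1/(2\lambda)$ is set to 1 because it cancels in the Sharpe ratio.

```python
import numpy as np

# Hypothetical inputs.
vols = np.array([0.20, 0.30, 0.25])
C = np.array([
    [1.0, 0.10, 0.0],
    [0.10, 1.0, 0.16],
    [0.0, 0.16, 1.0],
])
alpha = np.array([0.02, 0.01, 0.015])

V = np.diag(vols)
omega = V @ C @ V
s = alpha / vols                      # per-asset Sharpe ratios, s = V^{-1} alpha

w = np.linalg.solve(omega, alpha)     # Omega^{-1} alpha, taking 1/(2*lambda) = 1
v = vols * w                          # volatility allocation, v = V w
v_direct = np.linalg.solve(C, s)      # C^{-1} s

sharpe = (w @ alpha) / np.sqrt(w @ omega @ w)
sharpe_formula = np.sqrt(s @ np.linalg.solve(C, s))

print(np.allclose(v, v_direct), sharpe, sharpe_formula)
```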

In addition to the first formulation, we can directly add the volatility constraint as a penalty term:

\begin{align}
\text{max } & \alpha^T w - \lambda w^T\Omega_r w \\
\text{s.t. } & w \in R^n
\end{align}

The optimal weight is $\frac{1}{2\lambda}\Omega_r^{-1}\alpha$, and this is equivalent to the first formulation if $\lambda$ is set equal to the Lagrange multiplier:
$$\lambda = \frac{\sqrt{\alpha^T\Omega_r^{-1}\alpha}}{2\sigma}$$
The tighter the volatility bound $\sigma$, the higher $\lambda$ and the heavier the penalty on variance.
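
A short sketch of the equivalence between the penalized and the volatility-constrained formulations. The inputs are hypothetical, and `lambda_for_vol` and `mvo_penalized` are illustrative helper names.

```python
import numpy as np

def lambda_for_vol(alpha, omega, sigma):
    """Penalty that reproduces a volatility bound of sigma."""
    return np.sqrt(alpha @ np.linalg.solve(omega, alpha)) / (2 * sigma)

def mvo_penalized(alpha, omega, lam):
    """Maximizer of alpha'w - lam * w' Omega w."""
    return np.linalg.solve(omega, alpha) / (2 * lam)

# Hypothetical inputs.
alpha = np.array([0.02, 0.01, 0.015])
omega = np.array([
    [0.04, 0.006, 0.0],
    [0.006, 0.09, 0.012],
    [0.0, 0.012, 0.0625],
])

lam = lambda_for_vol(alpha, omega, sigma=0.10)
w_pen = mvo_penalized(alpha, omega, lam)
vol_pen = np.sqrt(w_pen @ omega @ w_pen)      # recovers the 10% target
grad = alpha - 2 * lam * omega @ w_pen        # first-order condition, should be ~0

lam_tight = lambda_for_vol(alpha, omega, sigma=0.05)  # tighter bound, larger penalty
print(vol_pen, lam, lam_tight)
```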

The third formulation swaps the constraint and the objective of the first formulation, i.e. minimize variance while keeping the expected return at or above a threshold $\mu$ (a scalar here, not the expected-return vector used earlier). The optimal weight in this case is:

$$w = \frac{\mu}{\alpha^T\Omega_r^{-1}\alpha}\Omega_r^{-1}\alpha$$

We can get this solution without formally solving the optimization problem. Again, the optimal weight is proportional to $\Omega_r^{-1}\alpha$. Since the constraint is always binding, the portfolio's expected return must equal $\mu$. Therefore, we just need to scale $\Omega_r^{-1}\alpha$ so that $\alpha^Tw = \mu$, which gives the scaling factor $\mu / (\alpha^T\Omega_r^{-1}\alpha)$.
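
The scaling argument can be sketched in a few lines. The inputs and the 1% return target are made up, and `min_var_return_target` is a hypothetical helper name; the single-asset comparison portfolio is just one feasible alternative chosen for illustration.

```python
import numpy as np

def min_var_return_target(alpha, omega, mu):
    """Scale Omega^{-1} alpha so that the expected return alpha'w equals mu."""
    x = np.linalg.solve(omega, alpha)
    return mu / (alpha @ x) * x

# Hypothetical inputs.
alpha = np.array([0.02, 0.01, 0.015])
omega = np.array([
    [0.04, 0.006, 0.0],
    [0.006, 0.09, 0.012],
    [0.0, 0.012, 0.0625],
])
mu = 0.01

w = min_var_return_target(alpha, omega, mu)
exp_ret = w @ alpha
var_opt = w @ omega @ w

# Another portfolio with the same expected return: everything in the first asset.
w_alt = np.array([mu / alpha[0], 0.0, 0.0])
var_alt = w_alt @ omega @ w_alt

print(exp_ret, var_opt, var_alt)
```

The alternative portfolio meets the same return target but should carry a higher variance than the scaled $\Omega_r^{-1}\alpha$ solution.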

To further interpret the volatility allocation version of MVO:
- if assets are uncorrelated, $C=I$, then the optimal Sharpe ratio is $||s|| = \sqrt{\sum_i s_i^2}$ (squared Sharpe ratios add up!)
- if assets are correlated, with a common pairwise correlation $\rho > 0$
  - the vol allocation is approximately proportional to the deviation of each asset's Sharpe from the average Sharpe (when the number of assets is large), i.e. overweight if the Sharpe is above average and vice versa
  - the optimal Sharpe increases as
    - the number of assets increases
    - asset Sharpe dispersion increases (interesting)
  - the Sharpe is upper-bounded by $s/\sqrt{\rho}$ if all assets have the same Sharpe $s$
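
These claims can be illustrated with a small experiment, assuming a constant pairwise correlation $\rho$ between all assets (an assumption of this sketch). The values of `s0` and `rho` are arbitrary, and `optimal_sharpe` is a hypothetical helper name.

```python
import numpy as np

def optimal_sharpe(s, rho):
    """sqrt(s' C^{-1} s) for a constant-correlation matrix C."""
    n = len(s)
    C = (1 - rho) * np.eye(n) + rho * np.ones((n, n))
    return np.sqrt(s @ np.linalg.solve(C, s))

s0, rho = 0.5, 0.25
bound = s0 / np.sqrt(rho)

# Equal Sharpe ratios: the optimum grows with n but stays below s0 / sqrt(rho).
sharpes = {n: optimal_sharpe(np.full(n, s0), rho) for n in (1, 10, 100, 1000)}

# Same average Sharpe, more dispersion: the optimum is higher.
sr_equal = optimal_sharpe(np.array([0.5, 0.5]), rho)
sr_dispersed = optimal_sharpe(np.array([0.3, 0.7]), rho)

print(sharpes, bound, sr_equal, sr_dispersed)
```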

To further interpret the centerpiece $\Omega_r^{-1}\alpha$, we use that:
$$w_i \propto \sqrt{[\Omega_r^{-1}]_{i,i}}\left(\sqrt{[\Omega_r^{-1}]_{i,i}}\,\alpha_i - \sum_{j\neq i} \rho_{i,j} \sqrt{[\Omega_r^{-1}]_{j,j}}\,\alpha_j\right)$$
which follows from $[\Omega_r^{-1}]_{i,j} = -\rho_{i,j}\sqrt{[\Omega_r^{-1}]_{i,i}[\Omega_r^{-1}]_{j,j}}$. Here $\rho_{i,j}$ is the partial correlation of asset $i$ and asset $j$, i.e. their correlation after removing their collinearity with the other assets:
- Regress the returns of asset $i$ and of asset $j$ on the returns of all the other assets.
- $\rho_{i, j}$ is the correlation between the residuals from the two regressions above.

The interpretation of the above equation is:
- the diagonal terms of a precision matrix are always positive, so asset $i$'s own alpha enters its weight with a positive sign
- if two assets share overlapping information after removing the correlations with the other assets ($\rho_{i,j} > 0$) and asset $j$ has a positive alpha, then the allocation to asset $i$ should be reduced.
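
The two descriptions of $\rho_{i,j}$ (residual correlation versus precision-matrix entries) can be compared directly. The sketch below works with a population covariance matrix rather than sample regressions, which is a simplifying assumption; the covariance is synthetic and both function names are hypothetical.

```python
import numpy as np

def partial_corr_from_precision(omega, i, j):
    """Partial correlation read off the precision matrix P = Omega^{-1}."""
    P = np.linalg.inv(omega)
    return -P[i, j] / np.sqrt(P[i, i] * P[j, j])

def partial_corr_from_regression(omega, i, j):
    """Correlation of the residuals after regressing assets i and j on the others.

    Uses the population regression: the residual covariance is the Schur
    complement of the 'others' block in Omega.
    """
    others = [k for k in range(len(omega)) if k not in (i, j)]
    pair = [i, j]
    S_aa = omega[np.ix_(pair, pair)]
    S_ab = omega[np.ix_(pair, others)]
    S_bb = omega[np.ix_(others, others)]
    resid_cov = S_aa - S_ab @ np.linalg.solve(S_bb, S_ab.T)
    return resid_cov[0, 1] / np.sqrt(resid_cov[0, 0] * resid_cov[1, 1])

# Synthetic positive-definite covariance for four assets.
A = np.array([[1.0, 0.2], [0.8, -0.3], [0.5, 0.5], [-0.2, 0.9]])
omega = A @ A.T + np.diag([0.5, 0.4, 0.6, 0.3])

rho_prec = partial_corr_from_precision(omega, 0, 1)
rho_reg = partial_corr_from_regression(omega, 0, 1)
print(rho_prec, rho_reg)
```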

## Trading in Factor Space

### FMP

The goal of a factor-mimicking portfolio (FMP) is to find weights such that the portfolio trades a single factor $f_i$. With asset returns $r = Bf + \epsilon$ and portfolio factor exposures $b = B^Tw$, we minimize the tracking variance between that factor's return and the portfolio return:
\begin{align}
r^Tw &= b^Tf + w^T\epsilon \\
(r^Tw - f_i)^2 &= (b^Tf + w^T\epsilon - f_i)^2 \\
         &= ((b_i-1)f_i + \sum_{j\neq i}b_jf_j + w^T\epsilon)^2
\end{align}

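The factor-driven part of the tracking error vanishes when $b_i = 1$ and $b_j = 0$ for all $j \neq i$, i.e. $B^Tw = e_i$, which leaves only the idiosyncratic variance $w^T\Omega_\epsilon w$ to minimize. So the formulation is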
\begin{align}
\text{min } & w^T\Omega_\epsilon w \\
\text{s.t. } & B^Tw = e_i
\end{align}

To solve this, introduce a vector of multipliers $\lambda \in R^m$, one per factor-exposure constraint:
\begin{align}
L &= w^T\Omega_\epsilon w - \lambda^T(B^Tw - e_i) \\
\nabla L &= 2\Omega_\epsilon w - B\lambda = 0 \\
w &= \frac{1}{2}\Omega_\epsilon^{-1}B\lambda \\
B^Tw &= e_i \\
\frac{1}{2}B^T\Omega_\epsilon^{-1}B\lambda &= e_i \\
\lambda &= 2(B^T\Omega_\epsilon^{-1}B)^{-1}e_i \\
w &= \Omega_\epsilon^{-1}B(B^T\Omega_\epsilon^{-1}B)^{-1}e_i
\end{align}
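
A minimal sketch of the FMP formula for all factors at once: column $i$ of the returned matrix is the weight vector for factor $i$. The loadings `B` and the diagonal `omega_eps` are made-up inputs, and `fmp_weights` is a hypothetical helper name.

```python
import numpy as np

def fmp_weights(B, omega_eps):
    """Columns are FMP weights: Omega_eps^{-1} B (B' Omega_eps^{-1} B)^{-1}."""
    X = np.linalg.solve(omega_eps, B)   # Omega_eps^{-1} B, shape (n, m)
    return X @ np.linalg.inv(B.T @ X)   # shape (n, m)

# Hypothetical loadings for 5 assets on 2 factors (first column: a market-like factor).
B = np.array([
    [1.0, 0.2],
    [1.0, -0.5],
    [1.0, 0.8],
    [1.0, -0.1],
    [1.0, 0.4],
])
omega_eps = np.diag([0.04, 0.09, 0.0625, 0.05, 0.03])  # idiosyncratic variances

W = fmp_weights(B, omega_eps)
exposures = B.T @ W                                # identity: unit exposure to own factor only
idio_var = np.diag(W.T @ omega_eps @ W)            # idiosyncratic variance of each FMP

print(np.round(exposures, 12))
print(idio_var)
```

Note that $W^Tr$ has the same form as a generalized least squares estimate of the factor returns from a cross-sectional regression of $r$ on $B$.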

Repeating this for each of the $m$ factors gives $m$ portfolios at each timestamp. From these we can estimate the expected factor returns and then optimize:
1. generate the FMP weights
2. compute portfolio returns using the FMP weights
3. calculate the average returns as the expected factor returns
4. add modification (penalties) to the expected factor returns
5. optimize in factor space (swap asset weights with factor weights, asset returns with factor returns)
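
The five steps can be strung together in a toy pipeline. Everything here is an assumption made for illustration: the simulated data, the 50% shrinkage standing in for step 4, the use of the sample covariance of FMP returns as the factor covariance, and the volatility-targeted MVO in step 5.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, m = 500, 6, 2

# Simulated inputs (hypothetical): loadings, idiosyncratic variances, returns.
B = rng.normal(size=(n, m))
idio_var = rng.uniform(0.01, 0.04, size=n)
f = np.array([0.002, -0.001]) + 0.01 * rng.normal(size=(T, m))
r = f @ B.T + rng.normal(size=(T, n)) * np.sqrt(idio_var)

# 1. FMP weights (diagonal Omega_eps, so the inverse is a row-wise division).
X = B / idio_var[:, None]
W = X @ np.linalg.inv(B.T @ X)          # (n, m), one column per factor

# 2. FMP returns at each timestamp.
fmp_ret = r @ W                         # (T, m)

# 3. Average FMP returns as expected factor returns.
mu_f = fmp_ret.mean(axis=0)

# 4. Modification: a simple shrinkage toward zero (placeholder choice).
mu_f_adj = 0.5 * mu_f

# 5. MVO in factor space with a volatility target, then map back to assets.
omega_f = np.cov(fmp_ret, rowvar=False)
x = np.linalg.solve(omega_f, mu_f_adj)
sigma = 0.01
factor_w = sigma / np.sqrt(mu_f_adj @ x) * x
asset_w = W @ factor_w

print(mu_f, factor_w, asset_w)
```

Because $B^TW = I$, the asset portfolio `asset_w` has factor exposures exactly equal to `factor_w`.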

