# Homework 5

## FINM 36700 - 2024

### UChicago Financial Mathematics

* Mark Hendricks
* hendricks@uchicago.edu

***

In [24]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Section 1: Harvard Case

*This section will not be graded, but it will be discussed in class.*

**Smart Beta Exchange-Traded-Funds and Factor Investing**.

* The case is a good introduction to important pricing factors.
* It also gives useful introduction and context to ETFs, passive vs active investing, and so-called “smart beta” funds.

1. Describe how each of the factors (other than MKT) is measured. That is, each factor is a portfolio of stocks: which stocks are included in the factor portfolio?

2. Is the factor portfolio...
* long-only
* long-short
* value-weighted
* equally-weighted

4. What steps are taken in the factor construction to try to reduce the correlation between the factors?
5. What is the point of figures 1-6?
6. How is a “smart beta” ETF different from a traditional ETF?
7. Is it possible for all investors to have exposure to the “value” factor?
8. How does factor investing differ from traditional diversification?


If you need more info on how these factor portfolios are created, see Ken French’s website, and the following details:

https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/f-f_5_factors_2x3.html

https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/det_mom_factor.html

***

# 2. The Factors

Use the data found in `factor_pricing_data.xlsx`.

* FACTORS: Monthly excess return data for the overall equity market, $\tilde{r}^{\text{MKT}}$.
* The column header to the market factor is `MKT` rather than `MKT-RF`, but it is indeed already in excess return form.
* The sheet also contains data on five additional factors.
* All factor data is already provided as excess returns.

In [4]:
factors = pd.read_excel("factor_pricing_data.xlsx", sheet_name=1, index_col=0)
factors

Date,MKT,SMB,HML,RMW,CMA,UMD
1980-01-31,0.0551,0.0183,0.0175,-0.0170,0.0164,0.0755
1980-02-29,-0.0122,-0.0157,0.0061,0.0004,0.0268,0.0788
1980-03-31,-0.1290,-0.0693,-0.0101,0.0146,-0.0119,-0.0955
1980-04-30,0.0397,0.0105,0.0106,-0.0210,0.0029,-0.0043
1980-05-31,0.0526,0.0211,0.0038,0.0034,-0.0031,-0.0112
...,...,...,...,...,...,...
2024-04-30,-0.0467,-0.0256,-0.0052,0.0148,-0.0030,-0.0042
2024-05-31,0.0434,0.0076,-0.0166,0.0298,-0.0307,-0.0002
2024-06-30,0.0277,-0.0437,-0.0331,0.0051,-0.0178,0.0090
2024-07-31,0.0124,0.0828,0.0573,0.0022,0.0043,-0.0242


1. Analyze the factors, similar to how you analyzed the three Fama-French factors in Homework 4.
You now have three additional factors, so let’s compare their univariate statistics.
* mean
* volatility
* Sharpe

2. Based on the factor statistics above, answer the following.
* (a) Does each factor have a positive risk premium (positive expected excess return)?
* (b) How have the factors performed since the time of the case (2015-present)?

3. Report the correlation matrix across the six factors.
* Does the construction method succeed in keeping correlations small?
* Fama and French say that HML is somewhat redundant in their 5-factor model. Does this seem to be the case?

4. Report the tangency weights for a portfolio of these 6 factors.
* Which factors seem most important? And least?
* Are the factors with low mean returns still useful?
* Re-do the tangency portfolio, but this time only include MKT, SMB, HML, and UMD. Which factors get high/low tangency weights now?

What do you conclude about the importance or unimportance of these styles?

### 1)

In [5]:
def stats_mean_vol_sharpe(data, portfolio=None, portfolio_name='Portfolio', annualize=12):
    """Annualized mean, volatility, and Sharpe ratio of excess returns."""
    if portfolio is None:
        returns = data
    else:
        # Keep portfolio returns as a one-column DataFrame so the output has a labeled column
        returns = (data @ portfolio).to_frame(portfolio_name)

    output = returns.agg(['mean', 'std'])
    output.loc['sharpe'] = output.loc['mean'] / output.loc['std']

    # Scale the per-period statistics to annual frequency
    output.loc['mean'] *= annualize
    output.loc['std'] *= np.sqrt(annualize)
    output.loc['sharpe'] *= np.sqrt(annualize)

    return output

In [6]:
stats_mean_vol_sharpe(factors)

Unnamed: 0,MKT,SMB,HML,RMW,CMA,UMD
mean,0.086277,0.008319,0.025809,0.047096,0.029537,0.062709
std,0.156904,0.101873,0.109999,0.083213,0.073084,0.154564
sharpe,0.549872,0.081665,0.234629,0.565962,0.404148,0.405714
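
As a quick sanity check on the annualization above: the mean scales with 12 and the volatility with √12, so the Sharpe ratio scales with √12. The sketch below backs out monthly figures from the annualized MKT numbers in the table and recomputes the annualized Sharpe ratio; the variable names are illustrative only.

```python
import numpy as np

# Annualized MKT statistics copied from the table above
mean_ann, vol_ann = 0.086277, 0.156904

# Implied monthly figures: mean scales with 12, volatility with sqrt(12)
mean_m = mean_ann / 12
vol_m = vol_ann / np.sqrt(12)

# Monthly Sharpe scaled by sqrt(12) equals annualized mean over annualized vol
sharpe_ann = np.sqrt(12) * mean_m / vol_m
print(round(sharpe_ann, 4))
```

The result should agree with the `sharpe` row for MKT up to rounding of the inputs.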


### 2)

Over the full sample (starting in 1980), every factor has a positive mean excess return, although SMB's is small (about 0.8% per year).

In [10]:
stats_mean_vol_sharpe(factors['2015':])

Unnamed: 0,MKT,SMB,HML,RMW,CMA,UMD
mean,0.116586,-0.0195,-0.017855,0.050886,-0.00841,0.021083
std,0.160285,0.104524,0.132709,0.073338,0.083989,0.140812
sharpe,0.727369,-0.186559,-0.134544,0.693862,-0.100137,0.149723


However, if we consider only the performance since 2015, SMB, HML and CMA have negative mean excess returns.

### 3)

In [11]:
factors.corr()

Unnamed: 0,MKT,SMB,HML,RMW,CMA,UMD
MKT,1.0,0.227756,-0.204356,-0.246768,-0.357823,-0.175585
SMB,0.227756,1.0,-0.029072,-0.414055,-0.049575,-0.055304
HML,-0.204356,-0.029072,1.0,0.219651,0.67845,-0.216986
RMW,-0.246768,-0.414055,0.219651,1.0,0.127209,0.079525
CMA,-0.357823,-0.049575,0.67845,0.127209,1.0,0.008398
UMD,-0.175585,-0.055304,-0.216986,0.079525,0.008398,1.0


Some correlations certainly aren't small. For example, CMA-HML is high at 0.68, and RMW-SMB (-0.41) and CMA-MKT (-0.36) are also notable. Because HML is so highly correlated with CMA, a case can be made that HML is redundant in the model given the other factors.
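
One way to probe the redundancy question more directly is a spanning regression: regress HML on a constant and the other factors, and check whether the intercept is close to zero and the R² is high. The sketch below is only an illustration under stated assumptions: `spanning_regression` is a hypothetical helper, it uses `np.linalg.lstsq` rather than `statsmodels` to stay dependency-light, and it runs on synthetic data (not the homework data) in which HML is constructed to load on CMA.

```python
import numpy as np
import pandas as pd

def spanning_regression(factors: pd.DataFrame, target: str) -> pd.Series:
    """Regress one factor on a constant and the remaining factors (OLS)."""
    y = factors[target].to_numpy()
    others = factors.drop(columns=target)
    X = np.column_stack([np.ones(len(others)), others.to_numpy()])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1 - resid.var() / y.var()
    return pd.Series([coef[0], *coef[1:], r2],
                     index=['alpha', *others.columns, 'R2'])

# Synthetic stand-in data (NOT the homework data): HML is built to load on CMA
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(0.004, 0.03, size=(500, 3)),
                    columns=['MKT', 'RMW', 'CMA'])
demo['HML'] = 0.8 * demo['CMA'] + rng.normal(0, 0.01, size=500)

out = spanning_regression(demo, 'HML')
print(out)
```

On the actual data, the same call would be `spanning_regression(factors[['MKT', 'SMB', 'HML', 'RMW', 'CMA']], 'HML')`; a small alpha would suggest the other factors span HML.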

### 4)

In [15]:
Sigma = factors.cov() * 12
mu = factors.mean() * 12

w_tan = np.linalg.solve(Sigma, mu)
w_tan /= w_tan.sum()

In [16]:
pd.Series(data=w_tan, index=factors.columns)

MKT    0.209760
SMB    0.077337
HML   -0.042142
RMW    0.313263
CMA    0.338982
UMD    0.102798
dtype: float64

RMW and CMA receive the largest weights, even though CMA has had a negative mean excess return since 2015 and RMW has a much lower mean than MKT.

HML and SMB seem to be the least useful given their weights.

In [18]:
factors_small = factors[['MKT',"SMB", "HML","UMD"]]
Sigma_s = factors_small.cov() * 12
mu_s = factors_small.mean() * 12

w_tan_s = np.linalg.solve(Sigma_s, mu_s)
w_tan_s /= w_tan_s.sum()

pd.Series(data=w_tan_s, index=factors_small.columns)

MKT    0.365529
SMB   -0.032422
HML    0.356199
UMD    0.310694
dtype: float64

This time HML gains a lot of weight, which corroborates the earlier hypothesis that it is less useful when CMA is included.

SMB is still unimportant given its weight.
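
Since the tangency calculation is repeated for two factor sets, it can be wrapped in a small helper. The sketch below is a minimal version under stated assumptions: `tangency_weights` is a hypothetical name, and the demo runs on synthetic data rather than the homework file. The annualization factor of 12 is dropped because it cancels between `Sigma` and `mu`.

```python
import numpy as np
import pandas as pd

def tangency_weights(excess_returns: pd.DataFrame) -> pd.Series:
    """Tangency weights: proportional to inv(Sigma) @ mu, normalized to sum to 1."""
    mu = excess_returns.mean()
    Sigma = excess_returns.cov()
    w = np.linalg.solve(Sigma.to_numpy(), mu.to_numpy())
    return pd.Series(w / w.sum(), index=excess_returns.columns)

# Synthetic stand-in data (NOT the homework data)
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.normal(0.005, 0.04, size=(600, 4)),
                    columns=['MKT', 'SMB', 'HML', 'UMD'])

w_all = tangency_weights(demo)                   # all four columns
w_sub = tangency_weights(demo[['MKT', 'HML']])   # a subset, as in the re-do above
print(w_all, w_sub, sep='\n')
```

With the real data, `tangency_weights(factors)` and `tangency_weights(factors[['MKT', 'SMB', 'HML', 'UMD']])` should reproduce the two weight vectors shown above.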

***

# 3. Testing Modern LPMs

Consider the following factor models:
* CAPM: MKT
* Fama-French 3F: MKT, SMB, HML
* Fama-French 5F: MKT, SMB, HML, RMW, CMA
* AQR: MKT, HML, RMW, UMD

We are not saying this is “the” AQR model, but it is a good illustration of their most publicized factors: value, momentum, and more recently, profitability.

For instance, the AQR model is...

$$
\mathbb{E}[\tilde{r}^i] = \beta^{i, \text{MKT}}\mathbb{E}[\tilde{f}_t^{\text{MKT}}] +
 \beta^{i,\text{HML}} \mathbb{E}[\tilde{f}_t^{\text{HML}}] +
  \beta^{i, \text{RMW}} \mathbb{E}[\tilde{f}_t^{\text{RMW}}] +
   \beta^{i, \text{UMD}} \mathbb{E}[\tilde{f}_t^{\text{UMD}}]
$$

We will test these models with the time-series regressions. Namely, for each asset i, estimate the following regression to test the AQR model:

$$
\tilde{r}_t^i = \alpha^i + \beta^{i, \text{MKT}}\tilde{f}_t^{\text{MKT}} +
 \beta^{i, \text{HML}}\tilde{f}_t^{\text{HML}} +
  \beta^{i, \text{RMW}}\tilde{f}_t^{\text{RMW}} +
   \beta^{i, \text{UMD}}\tilde{f}_t^{\text{UMD}}
   + \epsilon_t^i
$$

Data
* PORTFOLIOS: Monthly excess return data on 49 equity portfolios sorted by their industry. Denote these as $\tilde{r}^i$, for $i = 1, \ldots, 49$.

* You do NOT need the risk-free rate data. It is provided only for completeness. The other two tabs are already in terms of excess returns.

In [28]:
ports = pd.read_excel("factor_pricing_data.xlsx", sheet_name=2, index_col = 0)
factors4 = pd.read_excel("factor_pricing_data.xlsx", sheet_name=1, index_col = 0)[['MKT','HML','RMW','UMD']]

1. Test the AQR 4-Factor Model using the time-series test. (We are not doing the cross-sectional regression tests.)
* For each regression, report the estimated α and r-squared.
* Calculate the mean-absolute-error of the estimated alphas.
* If the pricing model worked, should these alpha estimates be large or small? Why?
* Based on your MAE stat, does this seem to support the pricing model or not?

2. Test the CAPM, FF 3-Factor Model and the FF 5-Factor Model.
   * Report the MAE statistic for each of these models and compare it with the AQR Model MAE.
   * Which model fits best?
   
3. Does any particular factor seem especially important or unimportant for pricing? Do you think Fama and French should use the Momentum Factor?

4. This does not matter for pricing, but report the average (across $n$ estimations) of the time-series regression r-squared statistics.
   * Do this for each of the three models you tested.
   * Do these models lead to high time-series r-squared stats? That is, would these factors be good in a Linear Factor Decomposition of the assets?

5. We tested three models using the time-series tests (focusing on the time-series alphas.) Re-test these models, but this time use the cross-sectional test.
* Report the time-series premia of the factors (just their sample averages), and compare to the cross-sectionally estimated premia of the factors. Do they differ substantially?
* Report the MAE of the cross-sectional regression residuals for each of the four models. How do they compare to the MAE of the time-series alphas?

***

### 1)

In [58]:
def reg_output(x, factors):
    a = sm.OLS(x, sm.add_constant(factors)).fit()
    return pd.Series({'R2': a.rsquared, 'alpha': a.params['const']})

reg_output4 = lambda x : reg_output(x, factors4)

In [60]:
regs_out = ports.apply(reg_output4)
regs_out

Unnamed: 0,Agric,Food,Soda,Beer,Smoke,Toys,Fun,Books,Hshld,Clths,...,Boxes,Trans,Whlsl,Rtail,Meals,Banks,Insur,RlEst,Fin,Other
R2,0.339175,0.464221,0.308354,0.420762,0.273489,0.50834,0.616501,0.686774,0.55841,0.618299,...,0.582882,0.709527,0.753533,0.686135,0.636929,0.774982,0.675543,0.606331,0.814851,0.594589
alpha,0.000643,0.000579,0.001436,0.001422,0.002942,-0.003111,0.002542,-0.002452,-0.000674,-0.001513,...,0.000103,-0.001652,-0.001213,0.001665,-6.7e-05,-0.001814,-0.000622,-0.004791,0.001548,-0.003522


In [56]:
regs_out.loc['alpha'].abs().mean()

0.001916051581645664

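If the pricing model were correct, the alphas should be close to zero: the model says each asset's expected excess return is fully explained by its factor betas times the factor premia, and alpha (the regression intercept) is the part of the mean excess return that the factors leave unexplained.

The MAE of about 0.19% per month (roughly 2.3% per year) seems fairly small, although a formal statistical test would be needed to judge whether the alphas are jointly zero.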
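
As a first step toward such testing, one could look at the t-statistic of each alpha. The sketch below is an illustration only: `alpha_tstats` is a hypothetical helper, it assumes classical (homoskedastic) OLS standard errors, and it is run on synthetic data in which only asset `C` is given a nonzero alpha. A joint test across all assets would require more than these individual t-statistics.

```python
import numpy as np
import pandas as pd

def alpha_tstats(assets: pd.DataFrame, factors: pd.DataFrame) -> pd.DataFrame:
    """OLS alpha and its classical (homoskedastic) t-statistic for each asset."""
    X = np.column_stack([np.ones(len(factors)), factors.to_numpy()])
    Y = assets.to_numpy()
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)   # one column of coefficients per asset
    resid = Y - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof        # residual variance per asset
    xtx_inv = np.linalg.inv(X.T @ X)
    se_alpha = np.sqrt(sigma2 * xtx_inv[0, 0])     # standard error of the intercept
    return pd.DataFrame({'alpha': coef[0], 't_alpha': coef[0] / se_alpha},
                        index=assets.columns)

# Synthetic stand-in data (NOT the homework data): only asset C has a true alpha
rng = np.random.default_rng(2)
fac = pd.DataFrame(rng.normal(0.005, 0.04, size=(600, 2)), columns=['MKT', 'HML'])
betas = np.array([[1.0, 0.2], [0.8, -0.3], [1.2, 0.5]])
true_alpha = np.array([0.0, 0.0, 0.01])
rets = fac.to_numpy() @ betas.T + true_alpha + rng.normal(0, 0.02, size=(600, 3))
assets = pd.DataFrame(rets, columns=['A', 'B', 'C'])

out = alpha_tstats(assets, fac)
print(out)
print('MAE of alphas:', out['alpha'].abs().mean())
```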

### 2)

In [68]:
# Reuse the factor sheet already loaded into `factors` instead of re-reading the file
factors1 = factors[['MKT']]
factors3 = factors[['MKT','HML','SMB']]
factors5 = factors[['MKT','HML','SMB', 'RMW', 'CMA']]

reg_output1 = lambda x : reg_output(x, factors1)
reg_output3 = lambda x : reg_output(x, factors3)
reg_output5 = lambda x : reg_output(x, factors5)

In [69]:
regs_out1 = ports.apply(reg_output1)
regs_out1

Unnamed: 0,Agric,Food,Soda,Beer,Smoke,Toys,Fun,Books,Hshld,Clths,...,Boxes,Trans,Whlsl,Rtail,Meals,Banks,Insur,RlEst,Fin,Other
R2,0.330534,0.366464,0.25374,0.333662,0.189975,0.495417,0.599294,0.652329,0.493945,0.558941,...,0.554475,0.667362,0.741252,0.662701,0.581694,0.610876,0.572667,0.527597,0.772817,0.589199
alpha,0.001695,0.003818,0.003923,0.004953,0.00677,-0.00314,0.000574,-0.001206,0.001765,4.4e-05,...,0.000977,-1.6e-05,7.2e-05,0.002532,0.00211,7e-06,0.001946,-0.004235,0.000385,-0.002754


In [117]:
regs_out1.loc['alpha'].abs().mean()

0.0017001307693664562

In [63]:
regs_out5 = ports.apply(reg_output5)
regs_out5

Unnamed: 0,Agric,Food,Soda,Beer,Smoke,Toys,Fun,Books,Hshld,Clths,...,Boxes,Trans,Whlsl,Rtail,Meals,Banks,Insur,RlEst,Fin,Other
R2,0.363375,0.490799,0.313214,0.442389,0.305338,0.549694,0.613717,0.702782,0.589291,0.628634,...,0.578576,0.721812,0.800149,0.68808,0.646969,0.782535,0.6806,0.695625,0.823622,0.595786
alpha,2.8e-05,-0.00022,-0.000153,0.000846,0.000916,-0.0057,0.001388,-0.003698,-0.001803,-0.003385,...,-0.000551,-0.002648,-0.002279,0.001223,-0.001342,-0.001284,-0.000215,-0.007038,0.002369,-0.004058


In [108]:
regs_out5.loc['alpha'].abs().mean()

0.0026139465380550357

In [66]:
regs_out3 = ports.apply(reg_output3)
regs_out3

Unnamed: 0,Agric,Food,Soda,Beer,Smoke,Toys,Fun,Books,Hshld,Clths,...,Boxes,Trans,Whlsl,Rtail,Meals,Banks,Insur,RlEst,Fin,Other
R2,0.357393,0.416709,0.280459,0.36236,0.236211,0.528722,0.609808,0.688783,0.513703,0.574434,...,0.56771,0.694622,0.773986,0.664633,0.591929,0.7667,0.674567,0.677585,0.796552,0.592239
alpha,0.001344,0.00298,0.002899,0.004569,0.005512,-0.003176,0.000528,-0.002111,0.001494,-0.000513,...,0.000305,-0.000839,-0.00021,0.002774,0.001583,-0.002513,0.000252,-0.005825,-0.000493,-0.003095


In [67]:
regs_out3.loc['alpha'].abs().mean()

0.001998650354322511

By the MAE criterion, the CAPM fits best (0.0017 per month), followed by AQR (0.0019), FF3 (0.0020), and FF5 (0.0026).
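
The four regressions above follow the same pattern, so the comparison can be sketched as a loop over a dictionary of models. This is a minimal illustration, not the notebook's own code: `MODELS` and `ts_alpha_mae` are hypothetical names, the factor lists are taken from the model definitions in this section, and the data are synthetic.

```python
import numpy as np
import pandas as pd

# Factor sets for each model, as listed at the top of this section
MODELS = {
    'CAPM': ['MKT'],
    'FF3': ['MKT', 'SMB', 'HML'],
    'FF5': ['MKT', 'SMB', 'HML', 'RMW', 'CMA'],
    'AQR': ['MKT', 'HML', 'RMW', 'UMD'],
}

def ts_alpha_mae(assets: pd.DataFrame, factors: pd.DataFrame) -> float:
    """Mean absolute time-series alpha across assets (per period)."""
    X = np.column_stack([np.ones(len(factors)), factors.to_numpy()])
    coef, *_ = np.linalg.lstsq(X, assets.to_numpy(), rcond=None)
    return float(np.abs(coef[0]).mean())

# Synthetic stand-in data (NOT the homework data)
rng = np.random.default_rng(3)
names = ['MKT', 'SMB', 'HML', 'RMW', 'CMA', 'UMD']
fac = pd.DataFrame(rng.normal(0.004, 0.03, size=(400, 6)), columns=names)
loadings = rng.normal(0.5, 0.3, size=(6, 10))
assets = pd.DataFrame(fac.to_numpy() @ loadings + rng.normal(0, 0.02, size=(400, 10)))

mae = pd.Series({m: ts_alpha_mae(assets, fac[cols]) for m, cols in MODELS.items()})
print(mae.sort_values())
```

With the real data, replacing `assets` with `ports` and `fac` with `factors` should give the four MAE values reported above.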

### 3)

Since the AQR model has a substantially lower MAE than the FF 5-factor model, and a slightly lower MAE than the FF 3-factor model, it could be argued that momentum is an important factor.

CMA does not seem very important: the FF 5-factor model, which adds RMW and CMA to the 3-factor model, has a higher alpha MAE, while the AQR model includes RMW without CMA and does better. Meanwhile, MKT seems to be by far the most important factor, since the CAPM alone produces the smallest alpha MAE.

### 4)

In [74]:
pd.Series(
data=[
regs_out1.loc['R2'].mean(),
regs_out3.loc['R2'].mean(),
regs_out5.loc['R2'].mean()],
index= ['CAPM', 'FF3', 'FF5'])

CAPM    0.526107
FF3     0.571484
FF5     0.595951
dtype: float64

The average $R^2$ values (roughly 0.53 to 0.60) are not very high, so these factors would be of limited use in a Linear Factor Decomposition or for hedging.

### 5)

In [81]:
def reg_betas(x, factors):
    a = sm.OLS(x, sm.add_constant(factors)).fit()
    return a.params

In [109]:
reg_betas1 = lambda x : reg_betas(x, factors1)
reg_betas3 = lambda x : reg_betas(x, factors3)
reg_betas4 = lambda x : reg_betas(x, factors4)
reg_betas5 = lambda x : reg_betas(x, factors5)

In [90]:
capm_betas = ports.apply(reg_betas1).loc['MKT']
sm.OLS(ports.mean() * 12, capm_betas).fit().params

MKT    0.086096
dtype: float64

In [94]:
ff3_betas = ports.apply(reg_betas3).loc[['MKT','HML','SMB']]
sm.OLS(ports.mean() * 12, ff3_betas.T).fit().params

MKT    0.102214
HML   -0.015204
SMB   -0.064485
dtype: float64

In [111]:
ff4_betas = ports.apply(reg_betas4).loc[['MKT','HML', 'RMW','UMD']]
sm.OLS(ports.mean() * 12, ff4_betas.T).fit().params

MKT    0.089125
HML   -0.038711
RMW    0.043748
UMD    0.060747
dtype: float64

In [98]:
ff5_betas = ports.apply(reg_betas5).loc[['MKT','HML','SMB', 'RMW','CMA']]
sm.OLS(ports.mean() * 12, ff5_betas.T).fit().params

MKT    0.096486
HML   -0.030994
SMB   -0.056991
RMW    0.033605
CMA   -0.011559
dtype: float64

In [100]:
factors.mean() * 12

MKT    0.086277
SMB    0.008319
HML    0.025809
RMW    0.047096
CMA    0.029537
UMD    0.062709
dtype: float64

The MKT, RMW and UMD premia are similar under both approaches. SMB, HML and CMA, however, are not consistent: their sample means are positive, while their cross-sectionally estimated premia are negative.

In [113]:
sm.OLS(ports.mean() * 12, sm.add_constant(capm_betas)).fit().params['const']

0.08538430702935564

In [114]:
sm.OLS(ports.mean() * 12, sm.add_constant(ff3_betas.T)).fit().params['const']

0.06224499665984369

In [115]:
sm.OLS(ports.mean() * 12, sm.add_constant(ff4_betas.T)).fit().params['const']

0.06939177575580863

In [116]:
sm.OLS(ports.mean() * 12, sm.add_constant(ff5_betas.T)).fit().params['const']

0.05312189072347109

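The intercepts of the cross-sectional regressions (roughly 5% to 9% per year) are much larger than the MAE of the time-series alphas (roughly 2% to 3% per year once the monthly values are multiplied by 12).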
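
The question also asks for the MAE of the cross-sectional residuals, which the cells above do not compute. A minimal sketch of that calculation follows, under stated assumptions: `cross_sectional_test` is a hypothetical helper, `betas` is an assets-by-factors table (the transpose of `ff3_betas` and the like), and the demo uses synthetic betas and premia rather than the homework data.

```python
import numpy as np
import pandas as pd

def cross_sectional_test(mean_excess: pd.Series, betas: pd.DataFrame,
                         intercept: bool = False):
    """Regress average excess returns on betas; return premia and residual MAE."""
    X = betas.to_numpy()
    if intercept:
        X = np.column_stack([np.ones(len(betas)), X])
    y = mean_excess.to_numpy()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    names = (['const'] if intercept else []) + list(betas.columns)
    return pd.Series(coef, index=names), float(np.abs(resid).mean())

# Synthetic stand-in data (NOT the homework data): 49 assets, 2 factors
rng = np.random.default_rng(4)
betas = pd.DataFrame({'MKT': rng.normal(1.0, 0.3, size=49),
                      'HML': rng.normal(0.0, 0.5, size=49)})
true_premia = np.array([0.08, 0.03])
mean_excess = pd.Series(betas.to_numpy() @ true_premia + rng.normal(0, 0.005, size=49))

premia, resid_mae = cross_sectional_test(mean_excess, betas)
print(premia)
print('MAE of cross-sectional residuals:', resid_mae)
```

With the real data, a call such as `cross_sectional_test(ports.mean() * 12, ff3_betas.T)` would give an annualized residual MAE that can be compared with 12 times the time-series alpha MAE.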