# Money Management Fama-MacBeth Regression Model

In this notebook I discuss the well-known Fama-MacBeth regression model. The analysis estimates the parameters of the capital asset pricing model (CAPM), FF-3, and FF-5 to determine the risk premia attached to the factors. The source is [here](https://github.com/PacktPublishing/Hands-On-Machine-Learning-for-Algorithmic-Trading/blob/master/Chapter07/02_fama_macbeth.ipynb).

This notebook covers the following:
- **Environment Initiation**
- **Two-Step Procedure**
- **Fama-MacBeth Cross-Sectional Regression**

## Environment Initiation

Let me import all the required modules.

In [None]:
from pprint import pprint
from pathlib import Path
import warnings

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.api import OLS, add_constant
from linearmodels.asset_pricing import TradedFactorModel, LinearFactorModel, LinearFactorModelGMM

# due to https://stackoverflow.com/questions/50394873/import-pandas-datareader-gives-importerror-cannot-import-name-is-list-like
# the patch must run before anything from pandas_datareader is imported;
# may become obsolete when fixed
pd.core.common.is_list_like = pd.api.types.is_list_like
import pandas_datareader.data as web
from pandas_datareader.famafrench import get_available_datasets

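We use the Fama-French factor models, which come in a three-factor (FF-3) and a five-factor (FF-5) version. The table below introduces the notation. Each factor is built from portfolios, abbreviated **PF**; BE/ME is the book-to-market ratio and OP is operating profitability.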

| Concept  | Explanation   |
| ------   | ------        |
| Size (SMB)          | Nine small stock PF minus nine large stock PF |
| Value (HML)         | Two value PF minus two growth (with low BE/ME value) PF |
| Profitability (RMW) | Two robust OP PF minus two weak OP PF |
| Investment (CMA)    | Two conservative investment portfolios minus two aggressive investment portfolios |
| Market              | Value-weight return of all firms incorporated in and listed on major US exchanges with good data minus the one-month Treasury bill rate |

In [26]:
ff_factor = 'F-F_Research_Data_5_Factors_2x3'
ff_factor_data = web.DataReader(ff_factor, 'famafrench', start='2010', end='2017-12')[0]
ff_factor_data.info()

<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 96 entries, 2010-01 to 2017-12
Freq: M
Data columns (total 6 columns):
Mkt-RF    96 non-null float64
SMB       96 non-null float64
HML       96 non-null float64
RMW       96 non-null float64
CMA       96 non-null float64
RF        96 non-null float64
dtypes: float64(6)
memory usage: 5.2 KB


In [27]:
ff_factor_data.describe()

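|       | Mkt-RF   | SMB      | HML       | RMW      | CMA      | RF       |
| ------ | ------  | ------   | ------    | ------   | ------   | ------   |
| count | 96.0     | 96.0     | 96.0      | 96.0     | 96.0     | 96.0     |
| mean  | 1.158437 | 0.053854 | -0.064896 | 0.1475   | 0.049271 | 0.012604 |
| std   | 3.579997 | 2.296482 | 2.199398  | 1.550507 | 1.408406 | 0.022583 |
| min   | -7.89    | -4.55    | -4.5      | -3.99    | -3.33    | 0.0      |
| 25%   | -0.9175  | -1.6     | -1.5125   | -1.025   | -0.9625  | 0.0      |
| 50%   | 1.235    | 0.18     | -0.295    | 0.155    | -0.015   | 0.0      |
| 75%   | 3.19     | 1.505    | 1.14      | 1.145    | 0.92     | 0.01     |
| max   | 11.35    | 6.81     | 8.29      | 3.48     | 3.7      | 0.09     |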


Fama and French also provide numerous portfolios that we can use to illustrate the estimation of factor exposures. Here we use the 17 industry portfolios at monthly frequency. Let us download them and subtract the risk-free rate to obtain excess returns.

In [28]:
ff_portfolio = '17_Industry_Portfolios'
ff_portfolio_data = web.DataReader(ff_portfolio, 'famafrench', start='2010', end='2017-12')[0]
ff_portfolio_data = ff_portfolio_data.sub(ff_factor_data.RF, axis=0)
ff_factor_data = ff_factor_data.drop('RF', axis=1)
ff_portfolio_data.info()

<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 96 entries, 2010-01 to 2017-12
Freq: M
Data columns (total 17 columns):
Food     96 non-null float64
Mines    96 non-null float64
Oil      96 non-null float64
Clths    96 non-null float64
Durbl    96 non-null float64
Chems    96 non-null float64
Cnsum    96 non-null float64
Cnstr    96 non-null float64
Steel    96 non-null float64
FabPr    96 non-null float64
Machn    96 non-null float64
Cars     96 non-null float64
Trans    96 non-null float64
Utils    96 non-null float64
Rtail    96 non-null float64
Finan    96 non-null float64
Other    96 non-null float64
dtypes: float64(17)
memory usage: 13.5 KB


In [29]:
ff_portfolio_data.describe()

|       | Food | Mines | Oil | Clths | Durbl | Chems | Cnsum | Cnstr | Steel | FabPr | Machn | Cars | Trans | Utils | Rtail | Finan | Other |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| count | 96.0 | 96.0 | 96.0 | 96.0 | 96.0 | 96.0 | 96.0 | 96.0 | 96.0 | 96.0 | 96.0 | 96.0 | 96.0 | 96.0 | 96.0 | 96.0 | 96.0 |
| mean  | 1.045625 | 0.203229 | 0.547917 | 1.396979 | 1.154896 | 1.303438 | 1.13625 | 1.73125 | 0.555625 | 1.351042 | 1.227604 | 1.278854 | 1.465521 | 0.89125 | 1.234375 | 1.243646 | 1.281979 |
| std   | 2.795857 | 7.902683 | 5.577552 | 5.025167 | 5.137095 | 5.594722 | 3.174283 | 5.246562 | 7.389824 | 4.694688 | 4.811242 | 5.718887 | 4.151203 | 3.237306 | 3.508655 | 4.80835 | 3.710443 |
| min   | -5.17 | -24.38 | -12.01 | -10.0 | -13.21 | -17.39 | -7.3 | -13.96 | -20.49 | -11.96 | -9.08 | -11.65 | -8.56 | -6.99 | -9.18 | -11.02 | -7.92 |
| 25%   | -0.785 | -5.8325 | -3.1675 | -1.865 | -2.0175 | -1.445 | -0.92 | -2.4625 | -4.41 | -1.4475 | -2.0475 | -1.245 | -0.88 | -0.745 | -0.9625 | -1.4475 | -1.0675 |
| 50%   | 0.93 | -0.415 | 1.04 | 1.16 | 1.205 | 1.435 | 1.47 | 2.19 | 0.66 | 1.485 | 1.545 | 0.645 | 1.505 | 1.215 | 0.88 | 1.94 | 1.575 |
| 75%   | 3.1875 | 5.7075 | 3.915 | 3.8575 | 4.315 | 4.4425 | 3.3175 | 5.39 | 4.22 | 3.875 | 4.6575 | 4.8025 | 4.2275 | 2.965 | 3.355 | 4.0525 | 3.5175 |
| max   | 6.67 | 21.92 | 16.3 | 17.2 | 16.58 | 18.37 | 8.29 | 15.55 | 21.35 | 17.66 | 14.65 | 20.86 | 13.16 | 7.9 | 12.36 | 13.43 | 10.79 |
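
The cell that loads the industry portfolios turns raw returns into excess returns with `DataFrame.sub(..., axis=0)`. The sketch below illustrates that alignment on a tiny made-up frame; the column names, dates, and numbers are assumptions for illustration and are not Fama-French data.

```python
import pandas as pd

# Hypothetical monthly returns in percent (made-up numbers, not Fama-French data)
idx = pd.period_range("2010-01", periods=3, freq="M")
returns = pd.DataFrame({"Food": [1.0, 2.0, -1.0],
                        "Oil": [0.5, -0.5, 3.0]}, index=idx)
rf = pd.Series([0.01, 0.02, 0.0], index=idx, name="RF")

# axis=0 aligns the Series with the row index, so each month's
# risk-free rate is subtracted from every column in that month's row
excess = returns.sub(rf, axis=0)
print(excess)
```

Without `axis=0`, pandas would try to match the Series index against the column labels instead of the dates, which is not what is wanted here.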


## Two-Step Procedure

Fama-MacBeth regression is a two-step procedure.

To address the inference problem caused by the correlation of the residuals, Fama and MacBeth proposed a two-step methodology for a cross-sectional regression of returns on factors. The two-stage Fama-MacBeth regression is designed to estimate the premium that the market pays for exposure to a particular risk factor. The two stages consist of:

- **First stage**: N time-series regressions, one for each asset or portfolio, of its excess returns on the factors to estimate the factor loadings.

- **Second stage**: T cross-sectional regressions, one for each time period, of the excess returns on the estimated factor loadings to estimate the risk premia.
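
The two stages can be sketched on synthetic data before running them on the Fama-French data. This is a minimal NumPy illustration and not the notebook's implementation: the dimensions, the random seed, the noise level, and all variable names (`true_betas`, `est_betas`, `lambdas`) are assumptions chosen for the example, and `np.linalg.lstsq` stands in for the `OLS` calls used in the cells that follow.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, K = 96, 17, 2   # periods, portfolios, factors (assumed sizes)

# Synthetic factor returns and true loadings (made-up values, not Fama-French data)
factors = rng.normal(loc=[1.0, 0.2], scale=[3.0, 2.0], size=(T, K))
true_betas = rng.uniform(0.5, 1.5, size=(N, K))
noise = rng.normal(scale=0.1, size=(T, N))
excess_returns = factors @ true_betas.T + noise            # shape (T, N)

# First stage: N time-series regressions of returns on a constant and the factors.
# lstsq solves all N regressions at once, one per column of excess_returns.
X = np.column_stack([np.ones(T), factors])
coefs, *_ = np.linalg.lstsq(X, excess_returns, rcond=None)  # shape (K + 1, N)
est_betas = coefs[1:].T                                     # drop intercepts -> (N, K)

# Second stage: T cross-sectional regressions of each period's returns
# on the estimated betas (no constant, as in the notebook's second step).
lam, *_ = np.linalg.lstsq(est_betas, excess_returns.T, rcond=None)  # shape (K, T)
lambdas = lam.T                                                     # one row per period

# The risk premium estimate is the time-series average of the period estimates.
risk_premia = lambdas.mean(axis=0)
print(risk_premia, factors.mean(axis=0))
```

In this synthetic setup the averaged second-stage estimates land close to the sample means of the factors, because the returns were generated from those factors with little noise.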

In [30]:
betas = []
for industry in ff_portfolio_data:
    step1 = OLS(endog=ff_portfolio_data[industry],
                exog=sm.add_constant(ff_factor_data)).fit()
    betas.append(step1.params.drop('const'))

betas = pd.DataFrame(betas,
                     columns=ff_factor_data.columns,
                     index=ff_portfolio_data.columns)



In [31]:
betas.info()

<class 'pandas.core.frame.DataFrame'>
Index: 17 entries, Food  to Other
Data columns (total 5 columns):
Mkt-RF    17 non-null float64
SMB       17 non-null float64
HML       17 non-null float64
RMW       17 non-null float64
CMA       17 non-null float64
dtypes: float64(5)
memory usage: 1.4+ KB


In [32]:
lambdas = []
for period in ff_portfolio_data.index:
    step2 = OLS(endog=ff_portfolio_data.loc[period, betas.index],
                exog=betas).fit()
    lambdas.append(step2.params)

lambdas = pd.DataFrame(lambdas,
                       index=ff_portfolio_data.index,
                       columns=betas.columns.tolist())

In [33]:
lambdas.info()

<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 96 entries, 2010-01 to 2017-12
Freq: M
Data columns (total 5 columns):
Mkt-RF    96 non-null float64
SMB       96 non-null float64
HML       96 non-null float64
RMW       96 non-null float64
CMA       96 non-null float64
dtypes: float64(5)
memory usage: 10.2 KB


In [34]:
lambdas.mean()

Mkt-RF    1.181043
SMB       0.112553
HML      -1.234931
RMW      -0.341728
CMA      -0.627899
dtype: float64

In [35]:
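# Ratio of the mean to the standard deviation of the period-by-period risk premia.
# Note: this is not yet a t-statistic; the Fama-MacBeth t-statistic
# multiplies this ratio by sqrt(T), with T = 96 periods here.
t = lambdas.mean().div(lambdas.std())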
t

Mkt-RF    0.328238
SMB       0.029175
HML      -0.285483
RMW      -0.113031
CMA      -0.181547
dtype: float64
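
The ratio above divides the mean by the standard deviation of the period-by-period estimates. The conventional Fama-MacBeth t-statistic instead divides the mean by its standard error, that is, the standard deviation over the square root of the number of periods. A minimal sketch of that calculation follows; the function name `fama_macbeth_tstats` and the toy input are hypothetical, and the sketch assumes the period estimates are serially uncorrelated.

```python
import numpy as np

def fama_macbeth_tstats(lambdas):
    """Return mean / (std / sqrt(T)) for each column of a (T, K) array of estimates."""
    lambdas = np.asarray(lambdas, dtype=float)
    n_periods = lambdas.shape[0]
    mean = lambdas.mean(axis=0)
    std = lambdas.std(axis=0, ddof=1)   # sample standard deviation, as pandas uses by default
    return mean / (std / np.sqrt(n_periods))

# Toy period-by-period estimates for two factors (made-up numbers)
toy_lambdas = np.array([[1.0, -0.5],
                        [2.0, 0.5],
                        [3.0, -1.0],
                        [4.0, 1.0]])
t_stats = fama_macbeth_tstats(toy_lambdas)

# The same numbers follow from scaling the mean-to-std ratio by sqrt(T)
ratio = toy_lambdas.mean(axis=0) / toy_lambdas.std(axis=0, ddof=1)
print(t_stats, ratio * np.sqrt(len(toy_lambdas)))
```

Applied to the notebook's `lambdas`, this amounts to multiplying the ratio above by `sqrt(96)`. The result need not match the t-statistics reported by `linearmodels`, which uses its own covariance estimator.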

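## Fama-MacBeth Model

Let us now estimate the Fama-MacBeth model with `LinearFactorModel` from the `linearmodels` package, which also reports standard errors and a J-statistic for the risk premia.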

In [36]:
mod = LinearFactorModel(portfolios=ff_portfolio_data, factors=ff_factor_data)
res = mod.fit()
print(res)

                      LinearFactorModel Estimation Summary                      
No. Test Portfolios:                 17   R-squared:                      0.6905
No. Factors:                          5   J-statistic:                    17.359
No. Observations:                    96   P-value                         0.1366
Date:                  Wed, Dec 25 2019   Distribution:                 chi2(12)
Time:                          11:34:09                                         
Cov. Estimator:                  robust                                         
                                                                                
                            Risk Premia Estimates                             
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Mkt-RF         1.1810     0.4105     2.8772     0.0040      0.3765      1.9856
SMB            0.1126     0.8869    