# 6 Fama-French 3-factor model
Expected return under the Fama-French 3-factor model:
$$
    R_E = R_f + \beta_{mkt} (r_m - r_f) + \beta_{SMB} E_{SMB} + \beta_{HML} E_{HML}
$$

where

\begin{align*}
    SMB &= \frac{1}{3}(\text{Small Value} + \text{Small Neutral} + \text{Small Growth}) - 
          \frac{1}{3}(\text{Big Value} + \text{Big Neutral} + \text{Big Growth}) \\
    HML &= \frac{1}{2}(\text{Small Value} + \text{Big Value}) - 
           \frac{1}{2}(\text{Small Growth} + \text{Big Growth})
\end{align*}

(`SMB` = "Small minus Big", `HML` = "High minus Low")
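The two definitions above are plain averages of portfolio returns, so they can be sketched in a few lines. The function name `smb_hml` and the sample returns are hypothetical and chosen only for illustration.

```python
def smb_hml(small_value, small_neutral, small_growth,
            big_value, big_neutral, big_growth):
    """Return (SMB, HML) from the six size/value portfolio returns."""
    # SMB: average of the three small portfolios minus average of the three big ones
    smb = ((small_value + small_neutral + small_growth) / 3
           - (big_value + big_neutral + big_growth) / 3)
    # HML: average of the two value portfolios minus average of the two growth ones
    hml = (small_value + big_value) / 2 - (small_growth + big_growth) / 2
    return smb, hml


# Hypothetical one-month returns in percent (not real data)
smb, hml = smb_hml(small_value=2.0, small_neutral=1.5, small_growth=1.0,
                   big_value=1.2, big_neutral=0.9, big_growth=0.6)
print(round(smb, 4), round(hml, 4))
```

Note that the Neutral portfolios enter SMB but not HML.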

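Here,

Big/Small is determined by market capitalization: the top 50% are Big and the bottom 50% are Small.

Value/Neutral/Growth is determined by BE/ME = book value / market value (= `1/PBR`): the top 30% are Value, the middle 40% are Neutral, and the bottom 30% are Growth. (In the original Fama-French construction, both sets of breakpoints are computed from NYSE stocks.)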
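A minimal sketch of this 2x3 sort, assuming the breakpoints are taken from the sample itself (median market cap, 30th and 70th percentiles of BE/ME). The ten stocks and the column names `me` and `be_me` are hypothetical.

```python
import pandas as pd

# Hypothetical universe: market cap (me) and book-to-market ratio (be_me)
stocks = pd.DataFrame(
    {"me": [50, 400, 120, 900, 30, 700, 250, 80, 1500, 60],
     "be_me": [1.4, 0.3, 0.9, 0.2, 1.8, 0.6, 0.7, 1.1, 0.4, 0.5]},
    index=list("ABCDEFGHIJ"),
)

# Size breakpoint: median market cap
size_cut = stocks["me"].median()
# Value breakpoints: 30th and 70th percentiles of BE/ME
low_cut, high_cut = stocks["be_me"].quantile([0.3, 0.7])

stocks["size"] = stocks["me"].gt(size_cut).map({True: "Big", False: "Small"})
stocks["value"] = pd.cut(
    stocks["be_me"],
    bins=[-float("inf"), low_cut, high_cut, float("inf")],
    labels=["Growth", "Neutral", "Value"],
)
print(stocks)
```

Each stock ends up in one of six buckets (Small/Big crossed with Growth/Neutral/Value), which are the building blocks of SMB and HML.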

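In short, the F-F 3-factor model is CAPM with size risk and value risk added. Because the risk premia for size and value enter on top of the market premium, the expected return is typically higher than under CAPM for assets with positive SMB and HML loadings; a negative loading lowers it instead.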
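To see how the size and value terms move the expected return relative to CAPM, the formula can be evaluated directly. All premia and betas below are hypothetical annual figures in percent, not estimates.

```python
def ff3_expected_return(rf, mkt_premium, smb_premium, hml_premium,
                        beta_mkt, beta_smb, beta_hml):
    """Expected return under the 3-factor formula (all inputs in the same units)."""
    return (rf + beta_mkt * mkt_premium
            + beta_smb * smb_premium + beta_hml * hml_premium)


# Hypothetical premia: risk-free 2%, market 6%, SMB 2%, HML 3%
rf, mkt, smb_p, hml_p = 2.0, 6.0, 2.0, 3.0

capm_only = ff3_expected_return(rf, mkt, smb_p, hml_p, 1.0, 0.0, 0.0)
small_value = ff3_expected_return(rf, mkt, smb_p, hml_p, 1.0, 0.5, 0.4)
large_growth = ff3_expected_return(rf, mkt, smb_p, hml_p, 1.0, -0.2, -0.3)
print(capm_only, small_value, large_growth)
```

With the same market beta, positive SMB and HML loadings raise the expected return above the CAPM value, while negative loadings push it below.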

In [1]:
import pandas_datareader.data as web
import pandas_datareader.famafrench as ff

datasets = ff.get_available_datasets()

print('No. of datasets:{0}'.format(len(datasets)))

No. of datasets:297


In [2]:
datasets

['F-F_Research_Data_Factors',
 'F-F_Research_Data_Factors_weekly',
 'F-F_Research_Data_Factors_daily',
 'F-F_Research_Data_5_Factors_2x3',
 'F-F_Research_Data_5_Factors_2x3_daily',
 'Portfolios_Formed_on_ME',
 'Portfolios_Formed_on_ME_Wout_Div',
 'Portfolios_Formed_on_ME_Daily',
 'Portfolios_Formed_on_BE-ME',
 'Portfolios_Formed_on_BE-ME_Wout_Div',
 'Portfolios_Formed_on_BE-ME_Daily',
 'Portfolios_Formed_on_OP',
 'Portfolios_Formed_on_OP_Wout_Div',
 'Portfolios_Formed_on_OP_Daily',
 'Portfolios_Formed_on_INV',
 'Portfolios_Formed_on_INV_Wout_Div',
 'Portfolios_Formed_on_INV_Daily',
 '6_Portfolios_2x3',
 '6_Portfolios_2x3_Wout_Div',
 '6_Portfolios_2x3_weekly',
 '6_Portfolios_2x3_daily',
 '25_Portfolios_5x5',
 '25_Portfolios_5x5_Wout_Div',
 '25_Portfolios_5x5_Daily',
 '100_Portfolios_10x10',
 '100_Portfolios_10x10_Wout_Div',
 '100_Portfolios_10x10_Daily',
 '6_Portfolios_ME_OP_2x3',
 '6_Portfolios_ME_OP_2x3_Wout_Div',
 '6_Portfolios_ME_OP_2x3_daily',
 '25_Portfolios_ME_OP_5x5',
 '25_Portf

In [3]:
# Find the datasets whose names contain both '10' and 'Industry'
df_10_insustry = [dataset for dataset in datasets if '10' in dataset and 'Industry' in dataset]
print(df_10_insustry)

['10_Industry_Portfolios', '10_Industry_Portfolios_Wout_Div', '10_Industry_Portfolios_daily']


In [4]:
# The DESCR output two cells below shows that key 0 of the returned dict holds
# "Average Value Weighted Returns -- Monthly" (30 rows x 10 cols).
ds_industry = web.DataReader(df_10_insustry[0], 'famafrench', start='2017-06-23', end='2019-11-01')
print(type(ds_industry))


<class 'dict'>


In [5]:
ds_industry.keys()

dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 'DESCR'])

In [6]:
print(ds_industry['DESCR'])

10 Industry Portfolios
----------------------

This file was created by CMPT_IND_RETS using the 202304 CRSP database. It contains value- and equal-weighted returns for 10 industry portfolios. The portfolios are constructed at the end of June. The annual returns are from January to December. Missing data are indicated by -99.99 or -999. Copyright 2023 Kenneth R. French

  0 : Average Value Weighted Returns -- Monthly (30 rows x 10 cols)
  1 : Average Equal Weighted Returns -- Monthly (30 rows x 10 cols)
  2 : Average Value Weighted Returns -- Annual (3 rows x 10 cols)
  3 : Average Equal Weighted Returns -- Annual (3 rows x 10 cols)
  4 : Number of Firms in Portfolios (30 rows x 10 cols)
  5 : Average Firm Size (30 rows x 10 cols)
  6 : Sum of BE / Sum of ME (3 rows x 10 cols)
  7 : Value-Weighted Average of BE/ME (3 rows x 10 cols)
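The `DESCR` text doubles as an index of the tables in the returned dict, and it names the missing-data codes (-99.99 and -999). A small sketch of using both pieces of information follows. The `descr` string is an excerpt of the output above, while the `raw` frame is hypothetical. Whether the reader has already converted the sentinel values is not visible in the output, so the replacement is a defensive step.

```python
import re

import numpy as np
import pandas as pd

# Excerpt of the DESCR text shown above
descr = """\
  0 : Average Value Weighted Returns -- Monthly (30 rows x 10 cols)
  1 : Average Equal Weighted Returns -- Monthly (30 rows x 10 cols)
  2 : Average Value Weighted Returns -- Annual (3 rows x 10 cols)
"""

# Map each integer key to its table title
pattern = r"^[ \t]*(\d+)[ \t]*:[ \t]*(.*?)[ \t]*\(\d+ rows x \d+ cols\)[ \t]*$"
tables = {int(m.group(1)): m.group(2)
          for m in re.finditer(pattern, descr, flags=re.M)}
print(tables)

# Hypothetical frame containing the documented missing-data codes
raw = pd.DataFrame({"NoDur": [1.2, -99.99, 0.4], "Durbl": [-999, 2.1, 0.3]})
clean = raw.replace([-99.99, -999], np.nan)
print(clean)
```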


In [7]:
ds_industry[0].head()

Date,NoDur,Durbl,Manuf,Enrgy,HiTec,Telcm,Shops,Hlth,Utils,Other
2017-06,-1.03,3.51,1.28,-0.08,-2.1,-2.22,-1.9,5.54,-1.89,4.22
2017-07,-0.13,-1.17,2.18,2.07,3.74,5.27,0.12,0.7,2.98,1.49
2017-08,-1.73,-0.1,0.27,-5.09,3.08,-2.64,-1.67,2.58,2.2,-0.38
2017-09,-0.33,5.28,4.17,10.95,0.65,-1.67,2.43,2.05,-1.97,4.36
2017-10,0.09,1.25,2.8,0.49,6.88,-5.68,2.72,-2.27,3.07,2.08


In [8]:
df_5_factor = [dataset for dataset in datasets if '5' in dataset and 'Factor' in dataset]
df_5_factor

['F-F_Research_Data_5_Factors_2x3',
 'F-F_Research_Data_5_Factors_2x3_daily',
 'Developed_5_Factors',
 'Developed_5_Factors_Daily',
 'Developed_ex_US_5_Factors',
 'Developed_ex_US_5_Factors_Daily',
 'Europe_5_Factors',
 'Europe_5_Factors_Daily',
 'Japan_5_Factors',
 'Japan_5_Factors_Daily',
 'Asia_Pacific_ex_Japan_5_Factors',
 'Asia_Pacific_ex_Japan_5_Factors_Daily',
 'North_America_5_Factors',
 'North_America_5_Factors_Daily',
 'Emerging_5_Factors']

In [9]:
ds_factors = web.DataReader(df_5_factor[0], 'famafrench', start='2016-06-23', end='2019-11-01')
print('\nKEYS\n{0}'.format(ds_factors.keys()))


KEYS
dict_keys([0, 1, 'DESCR'])



In [10]:
print('DATASET DESCRIPTION \n {0}'.format(ds_factors['DESCR']))

DATASET DESCRIPTION 
 F-F Research Data 5 Factors 2x3
-------------------------------

This file was created by CMPT_ME_BEME_OP_INV_RETS using the 202304 CRSP database. The 1-month TBill return is from Ibbotson and Associates Inc.

  0 : (42 rows x 6 cols)
  1 : Annual Factors: January-December (4 rows x 6 cols)


In [11]:
ds_factors[0].head()

Date,Mkt-RF,SMB,HML,RMW,CMA,RF
2016-06,-0.05,0.44,-1.48,1.39,1.91,0.02
2016-07,3.95,2.49,-1.27,1.25,-1.19,0.02
2016-08,0.49,1.7,3.13,-1.88,-0.34,0.02
2016-09,0.25,1.86,-1.23,-2.23,0.03,0.02
2016-10,-2.02,-4.04,4.12,0.96,0.28,0.02


#### 6.5.2 Regression of fund returns on factor data
Fetch the monthly three factors (Mkt-RF, SMB, HML), compute the fund's returns, and regress the returns on the factors:

In [12]:
import pandas_datareader.data as web
import pandas_datareader.famafrench as ff
import pandas as pd

datasets = ff.get_available_datasets()

df_3_factor = datasets[0]
df_3_factor

'F-F_Research_Data_Factors'

In [13]:
ds_factors = web.DataReader(df_3_factor, 'famafrench', start='1980-02-01', end='2019-06-30')
print(ds_factors)

{0:          Mkt-RF   SMB   HML    RF
Date                             
1980-02   -1.22 -1.85  0.61  0.89
1980-03  -12.90 -6.64 -1.01  1.21
1980-04    3.97  1.05  1.06  1.26
1980-05    5.26  2.13  0.38  0.81
1980-06    3.06  1.66 -0.76  0.61
...         ...   ...   ...   ...
2019-02    3.40  2.05 -2.67  0.18
2019-03    1.10 -3.03 -4.10  0.19
2019-04    3.97 -1.74  2.14  0.21
2019-05   -6.94 -1.31 -2.35  0.21
2019-06    6.93  0.27 -0.72  0.18

[473 rows x 4 columns], 1:       Mkt-RF    SMB    HML     RF
Date                             
1980   22.13   5.66 -24.61  11.24
1981  -18.13   7.11  25.04  14.71
1982   10.66   8.68  13.29  10.54
1983   13.75  14.00  20.52   8.80
1984   -6.05  -8.22  19.13   9.85
1985   24.91   0.55   1.29   7.72
1986   10.12  -9.55   9.34   6.16
1987   -3.87 -10.95  -1.70   5.47
1988   11.55   5.78  14.99   6.35
1989   20.49 -12.86  -4.03   8.37
1990  -13.95 -13.99 -10.03   7.81
1991   29.18  16.08 -14.72   5.60
1992    6.23   7.74  24.49   3.51
1993    8.21   6


In [14]:
ds_factors[0].index

PeriodIndex(['1980-02', '1980-03', '1980-04', '1980-05', '1980-06', '1980-07',
             '1980-08', '1980-09', '1980-10', '1980-11',
             ...
             '2018-09', '2018-10', '2018-11', '2018-12', '2019-01', '2019-02',
             '2019-03', '2019-04', '2019-05', '2019-06'],
            dtype='period[M]', name='Date', length=473)

In [15]:
# Convert the index to strings (strftime = "string format time") so it can be merged with the fund returns computed later
ds_factors[0].index = ds_factors[0].index.strftime('%Y-%m')
ds_factors[0].index

Index(['1980-02', '1980-03', '1980-04', '1980-05', '1980-06', '1980-07',
       '1980-08', '1980-09', '1980-10', '1980-11',
       ...
       '2018-09', '2018-10', '2018-11', '2018-12', '2019-01', '2019-02',
       '2019-03', '2019-04', '2019-05', '2019-06'],
      dtype='object', name='Date', length=473)
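Converting the `PeriodIndex` to strings is one way to make the two indexes comparable. An alternative is to keep the periods and convert the fund's dates to monthly periods instead. The sketch below shows both options on two rows; the fund returns in `fund` are hypothetical.

```python
import pandas as pd

# Two rows of factor data on a monthly PeriodIndex (values from the output above)
factors = pd.DataFrame(
    {"Mkt-RF": [-1.22, -12.90]},
    index=pd.period_range("1980-02", periods=2, freq="M", name="Date"),
)

# Hypothetical monthly fund returns (percent) stamped with month-end dates
fund = pd.DataFrame(
    {"portfolio": [0.5, -8.0]},
    index=pd.to_datetime(["1980-02-29", "1980-03-31"]),
)

# Option A: turn the periods into 'YYYY-MM' strings, as in the cell above
as_str = factors.index.strftime("%Y-%m")
print(list(as_str))

# Option B: keep the PeriodIndex and convert the fund's dates to monthly periods
fund.index = fund.index.to_period("M")
merged = fund.join(factors, how="inner")
print(merged)
```

Option B keeps the index typed, so later date arithmetic or slicing by period still works.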

In [16]:
ds_3_factors = ds_factors[0]
ds_3_factors

Date,Mkt-RF,SMB,HML,RF
1980-02,-1.22,-1.85,0.61,0.89
1980-03,-12.90,-6.64,-1.01,1.21
1980-04,3.97,1.05,1.06,1.26
1980-05,5.26,2.13,0.38,0.81
1980-06,3.06,1.66,-0.76,0.61
...,...,...,...,...
2019-02,3.40,2.05,-2.67,0.18
2019-03,1.10,-3.03,-4.10,0.19
2019-04,3.97,-1.74,2.14,0.21
2019-05,-6.94,-1.31,-2.35,0.21


Analyze the fund with ticker `FCNTX` using the F-F 3-factor model and run the regression:

In [17]:
import yfinance as yf

# Fidelity Contrafund Fund
ticker = "FCNTX"
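# Note: in yfinance, interval='1m' means one-minute bars; monthly bars are '1mo'.
# The output below contains one row per trading day, not one row per month.
pxclose = yf.download(ticker, start='1980-01-01', end='2019-06-30', interval='1m')['Adj Close']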

pxclose.head()

[*********************100%***********************]  1 of 1 completed


Datetime
1980-01-02 09:30:00-05:00    0.120388
1980-01-03 09:30:00-05:00    0.118800
1980-01-04 09:30:00-05:00    0.120600
1980-01-07 09:30:00-05:00    0.120177
1980-01-08 09:30:00-05:00    0.123565
Name: Adj Close, dtype: float64

In [18]:
pxclose.index = pxclose.index.strftime('%Y-%m')
pxclose.head()

Datetime
1980-01    0.120388
1980-01    0.118800
1980-01    0.120600
1980-01    0.120177
1980-01    0.123565
Name: Adj Close, dtype: float64

In [19]:
# Returns: row-over-row percentage change as a decimal fraction; drop the first (NaN) row
ret_data = pxclose.pct_change().iloc[1:]
ret_data.head()

Datetime
1980-01   -0.013193
1980-01    0.015152
1980-01   -0.003512
1980-01    0.028194
1980-01    0.000000
Name: Adj Close, dtype: float64
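The returns above are computed row by row, so each value is the change from one trading day to the next even though the index label is a month. To get one return per month from daily prices, one option is to take the last price in each month first. The sketch below uses a synthetic price series, not FCNTX data.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices on business days (not real fund data)
days = pd.bdate_range("2019-01-01", "2019-03-29")
daily_px = pd.Series(100 + 0.1 * np.arange(len(days)), index=days)

# Last available price in each calendar month
month_end_px = daily_px.groupby(daily_px.index.to_period("M")).last()

# Month-over-month return in percent, one row per month
monthly_ret_pct = month_end_px.pct_change().dropna() * 100
print(monthly_ret_pct)
```

The result has a unique monthly index, so each month can match exactly one row of the factor table.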

In [20]:
ret_data = pd.DataFrame(ret_data)
ret_data.head()

Datetime,Adj Close
1980-01,-0.013193
1980-01,0.015152
1980-01,-0.003512
1980-01,0.028194
1980-01,0.0


In [21]:
ret_data.columns = ['portfolio']
ret_data.head()

Datetime,portfolio
1980-01,-0.013193
1980-01,0.015152
1980-01,-0.003512
1980-01,0.028194
1980-01,0.0


In [22]:
# Merge ret_data and ds_3_factors. To merge on the index, set left_index and right_index to True;
# to keep only the rows whose keys appear in both frames, set how='inner'.
regress_data = ret_data.merge(ds_3_factors, how='inner', left_index=True, right_index=True)
regress_data.head()

Unnamed: 0,portfolio,Mkt-RF,SMB,HML,RF
1980-02,0.0,-1.22,-1.85,0.61,0.89
1980-02,-0.002665,-1.22,-1.85,0.61,0.89
1980-02,0.0,-1.22,-1.85,0.61,0.89
1980-02,0.010686,-1.22,-1.85,0.61,0.89
1980-02,-0.002644,-1.22,-1.85,0.61,0.89
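In the merged table above, the key `1980-02` appears on several rows, so one monthly factor row is paired with many return rows. `DataFrame.merge` accepts a `validate` argument that raises when the keys are not unique, which makes this kind of fan-out visible. The frames and the helper name `safe_merge` below are hypothetical.

```python
import pandas as pd
from pandas.errors import MergeError

factors = pd.DataFrame({"RF": [0.89, 1.21]}, index=["1980-02", "1980-03"])

# Returns whose month label repeats (daily rows relabeled by month)
daily_labeled = pd.DataFrame({"portfolio": [0.0, -0.002665, 0.010686]},
                             index=["1980-02", "1980-02", "1980-03"])

# Hypothetical returns with exactly one row per month
monthly = pd.DataFrame({"portfolio": [-1.1, -9.8]},
                       index=["1980-02", "1980-03"])


def safe_merge(returns, factor_table):
    """Inner-join on the index and insist that keys are unique on both sides."""
    return returns.merge(factor_table, how="inner", left_index=True,
                         right_index=True, validate="one_to_one")


try:
    safe_merge(daily_labeled, factors)
    duplicated_keys_detected = False
except MergeError:
    duplicated_keys_detected = True

ok = safe_merge(monthly, factors)
print(duplicated_keys_detected)
print(ok)
```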


In [23]:
regress_data.rename(columns={'Mkt-RF': "mkt_excess"}, inplace=True)
regress_data.head()

Unnamed: 0,portfolio,mkt_excess,SMB,HML,RF
1980-02,0.0,-1.22,-1.85,0.61,0.89
1980-02,-0.002665,-1.22,-1.85,0.61,0.89
1980-02,0.0,-1.22,-1.85,0.61,0.89
1980-02,0.010686,-1.22,-1.85,0.61,0.89
1980-02,-0.002644,-1.22,-1.85,0.61,0.89


In [24]:
# Store the excess return in the 'port_excess' column
regress_data['port_excess'] = regress_data['portfolio'] - regress_data['RF']
regress_data.head()

Unnamed: 0,portfolio,mkt_excess,SMB,HML,RF,port_excess
1980-02,0.0,-1.22,-1.85,0.61,0.89,-0.89
1980-02,-0.002665,-1.22,-1.85,0.61,0.89,-0.892665
1980-02,0.0,-1.22,-1.85,0.61,0.89,-0.89
1980-02,0.010686,-1.22,-1.85,0.61,0.89,-0.879314
1980-02,-0.002644,-1.22,-1.85,0.61,0.89,-0.892644
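In the table above, `portfolio` is a decimal fraction (the output of `pct_change`) while `RF` and the factors are quoted in percent, so the subtraction mixes units. A minimal sketch of putting both sides in percent first, assuming `portfolio` holds monthly returns; the two return values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"portfolio": [0.012, -0.034],   # decimal fractions (hypothetical monthly returns)
     "RF": [0.89, 1.21]},            # percent, as in the factor table
    index=["1980-02", "1980-03"],
)

# Convert the fund return to percent before subtracting the risk-free rate
df["port_excess_pct"] = df["portfolio"] * 100 - df["RF"]
print(df)
```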


In [27]:
import statsmodels.formula.api as smf

# Regress the fund's excess return on the three factors (OLS with an intercept)
model = smf.ols(formula="port_excess ~ mkt_excess + SMB + HML", data=regress_data).fit()

print(model.params)

Intercept    -0.348937
mkt_excess    0.003356
SMB           0.003714
HML          -0.006339
dtype: float64
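As a sanity check on what this regression estimates, the sketch below generates synthetic excess returns from known factor loadings and recovers them by least squares. It uses `numpy.linalg.lstsq` in place of the statsmodels call so that the example is self-contained; the point estimates are those of OLS with an intercept. Every number is synthetic.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 480  # roughly 40 years of monthly observations

# Synthetic factor returns in percent: columns are mkt_excess, SMB, HML
factors = rng.normal(0.0, [4.5, 3.0, 3.0], size=(n, 3))

# Known "true" parameters used to generate the fund's excess return
true_alpha = 0.1
true_betas = np.array([0.95, 0.10, -0.20])
port_excess = true_alpha + factors @ true_betas + rng.normal(0.0, 0.5, n)

# Least squares with an intercept column
X = np.column_stack([np.ones(n), factors])
coef, *_ = np.linalg.lstsq(X, port_excess, rcond=None)
alpha_hat, betas_hat = coef[0], coef[1:]
print(alpha_hat, betas_hat)
```

When the inputs are aligned and in the same units, the estimated loadings land close to the values used to generate the data.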


In [26]:
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:            port_excess   R-squared:                       0.012
Model:                            OLS   Adj. R-squared:                  0.012
Method:                 Least Squares   F-statistic:                     39.66
Date:                Sun, 18 Jun 2023   Prob (F-statistic):           1.82e-25
Time:                        20:10:45   Log-Likelihood:                -1849.9
No. Observations:                9937   AIC:                             3708.
Df Residuals:                    9933   BIC:                             3737.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.3489      0.003   -117.071      0.0

Interpreting the regression results:
First, the fit yields the following linear regression equation:
$$
    \text{port excess} = 0.0034 \times \text{mkt excess} + 0.0037 \times \text{SMB} - 0.0063 \times \text{HML} - 0.3489
$$

The F-F 3-factor model is
$$
    R_E = R_f + \beta_{mkt} (r_m - r_f) + \beta_{SMB} E_{SMB} + \beta_{HML} E_{HML}
$$
so, matching terms, the OLS linear regression gives
$$
    \beta_{mkt}=0.0034, \quad \beta_{SMB}=0.0037, \quad \beta_{HML}=-0.0063
$$

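Read literally against this equation, the intercept would suggest $R_f=-0.3489$, but that reading does not hold: the dependent variable `port_excess` already has `RF` subtracted, so the intercept is the regression alpha, not the risk-free rate. The near-zero betas and the $R^2$ of 0.012 also say little about the F-F 3-factor model itself, because the inputs are mismatched. `portfolio` holds row-over-row price changes as decimal fractions under repeated year-month labels (9,937 observations, against 473 monthly factor rows), whereas the factors and `RF` are monthly values in percent. `port_excess` is therefore dominated by $-\text{RF}$, which is consistent with the negative intercept. The issue lies in the data preparation rather than in linear regression as such.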
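One way to probe where an intercept of this size can come from is a small simulation. In the cells above, `port_excess` is a decimal return minus `RF` in percent, so it is mostly $-\text{RF}$. The sketch below reproduces that setup with synthetic numbers only (it is not a re-run of the FCNTX data) and fits the same regression by least squares.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000

# Synthetic factors in percent: mkt_excess, SMB, HML
factors = rng.normal(0.0, [4.5, 3.0, 3.0], size=(n, 3))
# Synthetic risk-free rate in percent per month
rf = rng.uniform(0.0, 0.7, n)
# Synthetic daily-sized returns as decimal fractions, unrelated to the factors
small_ret = rng.normal(0.0005, 0.01, n)

# Mixed units, as in the cells above: decimal return minus percent RF
port_excess = small_ret - rf

X = np.column_stack([np.ones(n), factors])
coef, *_ = np.linalg.lstsq(X, port_excess, rcond=None)
intercept, betas = coef[0], coef[1:]
print(intercept, -rf.mean(), betas)
```

In this setup the intercept sits near the negative of the average `RF` and the factor coefficients are close to zero, which is the same qualitative pattern as the regression output above.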

What would the result look like with the `F-F 5-factor model` instead of the 3-factor model?