# MFIN7034 Problem Set 1 – Factor Model
In this problem set, you will run regressions to understand how factor models work in (cross-sectional and time-series) asset return analysis.

Data sources:
1. https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html#Research
2. https://global-q.org/factors.html
3. https://finance.wharton.upenn.edu/~stambaug/

To match the stock-return dataset, use the monthly version of every factor dataset and restrict it to 2000-01 through 2022-12. \
Submission: Proper visualization and clear interpretation and discussion, such as explaining why the coefficient of a factor changes over time, will also be graded.

## Understand dataset and preprocessing

In [1]:
import pandas as pd
import os

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all' # show the output of every expression in a cell, not only the last one

In [14]:
# loading data

# manual modification: 
# 1. delete 'Annual Factors' at the end of 'F-F_Research_Data_5_Factors_2x3.csv'
# 2. remove header comment '%' in liq_data_1962_2023.txt
return_df = pd.read_csv('data/monthly_stock_returns.csv')
ff5_df = pd.read_csv('data/F-F_Research_Data_5_Factors_2x3.csv', comment='%')
q5_df = pd.read_csv('data/q5_factors_monthly_2023.csv')

# special handling
liq_df = pd.read_csv(
    'data/liq_data_1962_2023.txt',
    sep='\t',               # tab-separated file
    comment='%',            # skip comment lines marked with '%'
    header=0,               # use the first non-comment line as the header
    skipinitialspace=True,  # ignore spaces after each delimiter
    na_values=-99,          # treat -99 as a missing value
    usecols=[0, 1, 2, 3]    # explicitly select the first 4 columns
)
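The `read_csv` options used for the liquidity file can be checked on a small in-memory sample before touching the real file. The sketch below is only illustrative: the sample text, its column names, and its values are made up for the demonstration and are not taken from `liq_data_1962_2023.txt`.

```python
import io
import pandas as pd

# Hypothetical sample mimicking a tab-separated file with a '%' comment line
# and -99 as the missing-value code (not the real liquidity data).
sample = (
    "% comment line that should be skipped\n"
    "Month\tA\tB\tC\tExtra\n"
    "200001\t0.01\t-99\t0.03\t1\n"
    "200002\t0.02\t0.05\t-99\t2\n"
)

demo_df = pd.read_csv(
    io.StringIO(sample),
    sep='\t',
    comment='%',
    header=0,
    skipinitialspace=True,
    na_values=-99,
    usecols=[0, 1, 2, 3],   # drops the fifth column, 'Extra'
)

print(demo_df)
print(demo_df.isna().sum())  # count of missing values per column
```

If the missing-value count per column is not what you expect on the real file, the missing-value code or the delimiter is probably different from what this sketch assumes.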

In [7]:
return_df.head()
ff5_df.head()
q5_df.head()
liq_df.head()

Unnamed: 0,PERMNO,YYYYMM,MthPrc,MthRet
0,10324,200001,52.0,0.155556
1,10324,200002,57.4375,0.104567
2,10324,200003,50.125,-0.127312
3,10324,200004,48.8125,-0.026185
4,10324,200005,56.8125,0.163892


Unnamed: 0.1,Unnamed: 0,Mkt-RF,SMB,HML,RMW,CMA,RF
0,196307,-0.39,-0.41,-0.97,0.68,-1.18,0.27
1,196308,5.07,-0.8,1.8,0.36,-0.35,0.25
2,196309,-1.57,-0.52,0.13,-0.71,0.29,0.27
3,196310,2.53,-1.39,-0.1,2.8,-2.01,0.29
4,196311,-0.85,-0.88,1.75,-0.51,2.24,0.27


Unnamed: 0,year,month,R_F,R_MKT,R_ME,R_IA,R_ROE,R_EG
0,1967,1,0.3927,8.1852,6.8122,-2.9263,1.8813,-2.5511
1,1967,2,0.3743,0.7557,1.6235,-0.2915,3.5399,2.1792
2,1967,3,0.3693,4.0169,1.9836,-1.6772,1.8417,-1.1192
3,1967,4,0.3344,3.8786,-0.67,-2.8972,1.0253,-1.6371
4,1967,5,0.3126,-4.2807,2.7366,2.1864,0.6038,0.1191


Unnamed: 0,Month,Agg Liq.,Innov Liq (eq8),Traded Liq (LIQ_V)
0,196208,-0.017537,0.00426,
1,196209,-0.004075,0.011757,
2,196210,-0.104212,-0.074128,
3,196211,-0.019742,0.028572,
4,196212,-0.005089,0.013037,


Standardise the timestamp column: every dataframe gets a `Month` column of pandas datetimes. The analysis sample is monthly, from 2000-01 to 2022-12.

In [15]:
return_df['Month'] = pd.to_datetime(return_df['YYYYMM'], format='%Y%m')
return_df.drop(columns=['YYYYMM'], inplace=True)

In [16]:
# the first column of the French file holds the date as YYYYMM
ff5_df['Month'] = pd.to_datetime(ff5_df.iloc[:, 0], format='%Y%m')
ff5_df.drop(ff5_df.columns[0], axis=1, inplace=True)

In [29]:
q5_df['Month'] = pd.to_datetime(
    q5_df['year'].astype(str) + '-' + q5_df['month'].astype(str).str.zfill(2), # zfill(2) left-pads the month: '1' -> '01'
    format="%Y-%m"
)
q5_df.drop(columns=['year', 'month'], inplace=True)

In [32]:
liq_df['Month'] = pd.to_datetime(liq_df['Month'], format='%Y%m')

In [34]:
return_df.head()
ff5_df.head()
q5_df.head()
liq_df.head()

Unnamed: 0,PERMNO,MthPrc,MthRet,Month
0,10324,52.0,0.155556,2000-01-01
1,10324,57.4375,0.104567,2000-02-01
2,10324,50.125,-0.127312,2000-03-01
3,10324,48.8125,-0.026185,2000-04-01
4,10324,56.8125,0.163892,2000-05-01


Unnamed: 0,Mkt-RF,SMB,HML,RMW,CMA,RF,Month
0,-0.39,-0.41,-0.97,0.68,-1.18,0.27,1963-07-01
1,5.07,-0.8,1.8,0.36,-0.35,0.25,1963-08-01
2,-1.57,-0.52,0.13,-0.71,0.29,0.27,1963-09-01
3,2.53,-1.39,-0.1,2.8,-2.01,0.29,1963-10-01
4,-0.85,-0.88,1.75,-0.51,2.24,0.27,1963-11-01


Unnamed: 0,R_F,R_MKT,R_ME,R_IA,R_ROE,R_EG,Month
0,0.3927,8.1852,6.8122,-2.9263,1.8813,-2.5511,1967-01-01
1,0.3743,0.7557,1.6235,-0.2915,3.5399,2.1792,1967-02-01
2,0.3693,4.0169,1.9836,-1.6772,1.8417,-1.1192,1967-03-01
3,0.3344,3.8786,-0.67,-2.8972,1.0253,-1.6371,1967-04-01
4,0.3126,-4.2807,2.7366,2.1864,0.6038,0.1191,1967-05-01


Unnamed: 0,Month,Agg Liq.,Innov Liq (eq8),Traded Liq (LIQ_V)
0,1962-08-01,-0.017537,0.00426,
1,1962-09-01,-0.004075,0.011757,
2,1962-10-01,-0.104212,-0.074128,
3,1962-11-01,-0.019742,0.028572,
4,1962-12-01,-0.005089,0.013037,
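The previews above still start in the 1960s, and the factor files appear to report returns in percent (e.g. `Mkt-RF` of 5.07) while `MthRet` looks like a decimal return (e.g. 0.155556). A minimal sketch of restricting a frame to the 2000-01 to 2022-12 window and putting a factor on the decimal scale follows. The helper names `restrict_sample` and `percent_to_decimal` are hypothetical, the toy numbers are made up, and the percent-versus-decimal reading is an assumption that should be checked against each data source's documentation.

```python
import pandas as pd

def restrict_sample(df, start='2000-01', end='2022-12', date_col='Month'):
    """Keep rows whose month lies in [start, end], both ends inclusive."""
    months = df[date_col].dt.to_period('M')
    mask = (months >= pd.Period(start, freq='M')) & (months <= pd.Period(end, freq='M'))
    return df.loc[mask].reset_index(drop=True)

def percent_to_decimal(df, cols):
    """Return a copy of df with the given columns divided by 100."""
    out = df.copy()
    out[cols] = out[cols] / 100.0
    return out

# Toy factor frame (made-up numbers) that straddles both sample boundaries.
toy = pd.DataFrame({
    'Month': pd.to_datetime(['199912', '200001', '202212', '202301'], format='%Y%m'),
    'Mkt-RF': [1.0, -2.0, 3.0, 4.0],
})

toy_sample = percent_to_decimal(restrict_sample(toy), ['Mkt-RF'])
print(toy_sample)
```

Comparing on monthly periods rather than raw timestamps keeps the filter inclusive whether a source stamps a month on its first or its last day.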


## Task 1: Factor Regression (Naive)
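
A minimal sketch of what a naive time-series factor regression could look like: one asset's excess return is regressed on the factor returns by ordinary least squares, $r_t - r_{f,t} = \alpha + \beta^\top f_t + \varepsilon_t$. The data below are simulated purely for illustration, the function name `factor_ols` is hypothetical, and this is not necessarily the specification the problem set intends.

```python
import numpy as np

def factor_ols(excess_ret, factors):
    """OLS of excess returns (T,) on factors (T, K); returns (alpha, betas)."""
    X = np.column_stack([np.ones(len(factors)), factors])  # prepend intercept
    coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
    return coef[0], coef[1:]

# Simulated example: 276 months (the length of 2000-01 to 2022-12)
# and 2 made-up factors with known loadings.
rng = np.random.default_rng(0)
T = 276
factors = rng.normal(0.0, 0.04, size=(T, 2))
true_alpha, true_betas = 0.001, np.array([1.2, -0.5])
excess_ret = true_alpha + factors @ true_betas + rng.normal(0.0, 0.01, size=T)

alpha_hat, betas_hat = factor_ols(excess_ret, factors)
print(alpha_hat, betas_hat)
```

With real data, `excess_ret` would be a stock's monthly return minus the risk-free rate and `factors` the matching rows of a factor dataframe, both on the same scale (percent or decimal).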

## Task 2: Fama-MacBeth Regression
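
A minimal sketch of the two-pass Fama-MacBeth procedure, assuming full-sample betas: first estimate each asset's factor loadings from a time-series regression, then regress the cross-section of returns on those loadings month by month, and finally average the monthly slopes. The function name `fama_macbeth` and the simulated panel are hypothetical, and refinements such as rolling betas or Newey-West standard errors are left out.

```python
import numpy as np

def fama_macbeth(returns, factors):
    """returns: (T, N) excess returns; factors: (T, K).
    Returns (mean_lambdas, t_stats), each of length K + 1 (intercept first)."""
    T, N = returns.shape
    # Pass 1: time-series OLS per asset gives the betas, shape (N, K).
    X = np.column_stack([np.ones(T), factors])
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)       # (K + 1, N)
    betas = coef[1:].T
    # Pass 2: cross-sectional OLS of each month's returns on the betas.
    Z = np.column_stack([np.ones(N), betas])                  # (N, K + 1)
    lambdas, *_ = np.linalg.lstsq(Z, returns.T, rcond=None)   # (K + 1, T)
    lambdas = lambdas.T                                       # (T, K + 1)
    # Average the monthly estimates; standard errors come from their
    # time-series variation.
    mean_lambdas = lambdas.mean(axis=0)
    se = lambdas.std(axis=0, ddof=1) / np.sqrt(T)
    return mean_lambdas, mean_lambdas / se

# Simulated panel: 276 months, 50 assets, one made-up factor.
rng = np.random.default_rng(1)
T, N = 276, 50
factor = rng.normal(0.005, 0.04, size=(T, 1))
true_betas = rng.uniform(0.5, 1.5, size=N)
returns = factor * true_betas + rng.normal(0.0, 0.02, size=(T, N))

mean_lambdas, t_stats = fama_macbeth(returns, factor)
print(mean_lambdas, t_stats)
```

In this simulation the average slope on beta should land close to the sample mean of the factor, which is a quick sanity check that the two passes are wired together correctly.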

## Task 3: LASSO Regression