Quantitative Finance Project
By: Varun Gopal, Tyler Dixon, and Abhinav Kakumanu

The objective of this project is to apply the methods learned in class to market data.

Our project examines economic data and builds a predictive model to help forecast industry returns.

In [73]:
# Import basic libraries

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import datetime

import pandas_datareader as pdr
import pandas_datareader.famafrench

In [3]:
# Get all the datasets 
pandas_datareader.famafrench.get_available_datasets()

['F-F_Research_Data_Factors',
 'F-F_Research_Data_Factors_weekly',
 'F-F_Research_Data_Factors_daily',
 'F-F_Research_Data_5_Factors_2x3',
 'F-F_Research_Data_5_Factors_2x3_daily',
 'Portfolios_Formed_on_ME',
 'Portfolios_Formed_on_ME_Wout_Div',
 'Portfolios_Formed_on_ME_Daily',
 'Portfolios_Formed_on_BE-ME',
 'Portfolios_Formed_on_BE-ME_Wout_Div',
 'Portfolios_Formed_on_BE-ME_Daily',
 'Portfolios_Formed_on_OP',
 'Portfolios_Formed_on_OP_Wout_Div',
 'Portfolios_Formed_on_OP_Daily',
 'Portfolios_Formed_on_INV',
 'Portfolios_Formed_on_INV_Wout_Div',
 'Portfolios_Formed_on_INV_Daily',
 '6_Portfolios_2x3',
 '6_Portfolios_2x3_Wout_Div',
 '6_Portfolios_2x3_weekly',
 '6_Portfolios_2x3_daily',
 '25_Portfolios_5x5',
 '25_Portfolios_5x5_Wout_Div',
 '25_Portfolios_5x5_Daily',
 '100_Portfolios_10x10',
 '100_Portfolios_10x10_Wout_Div',
 '100_Portfolios_10x10_Daily',
 '6_Portfolios_ME_OP_2x3',
 '6_Portfolios_ME_OP_2x3_Wout_Div',
 '6_Portfolios_ME_OP_2x3_daily',
 '25_Portfolios_ME_OP_5x5',
 '25_Portf

In [4]:
ff = pdr.get_data_famafrench('10_Industry_Portfolios', 1926)

In [22]:
type(ff)

dict

In [24]:
ff.keys()

dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 'DESCR'])

In [47]:
# Select table 1 from the returned dict (ff['DESCR'] describes what each table contains)
ten_ind = ff[1]

In [45]:
# Column names are the industry labels
industry_name = list(ff[0].columns)
industry_name

['NoDur',
 'Durbl',
 'Manuf',
 'Enrgy',
 'HiTec',
 'Telcm',
 'Shops',
 'Hlth ',
 'Utils',
 'Other']

In [48]:
# Compound the monthly percent returns within each quarter into quarterly decimal returns
ten_ind = ten_ind.resample('Q').apply(lambda x: (1 + (x / 100)).product() - 1)

In [49]:
ten_ind

Date,NoDur,Durbl,Manuf,Enrgy,HiTec,Telcm,Shops,Hlth,Utils,Other
1926Q3,0.062499,0.008370,0.050122,-0.014611,0.057796,0.026294,-0.000549,0.095794,0.048697,0.039668
1926Q4,0.036895,-0.054746,-0.002352,0.094341,-0.030808,0.021030,0.007802,0.077902,0.043239,-0.003773
1927Q1,-0.014018,0.012062,0.058804,-0.018129,0.070875,0.085652,0.029499,0.085339,0.067053,0.130579
1927Q2,0.060628,-0.011549,0.040557,0.029293,0.068964,-0.004983,0.034686,0.149034,0.074695,0.084061
1927Q3,0.108930,0.072260,0.101493,0.041747,0.102346,0.112249,0.131985,0.152781,0.210272,0.068413
...,...,...,...,...,...,...,...,...,...,...
2022Q1,-0.081467,-0.138146,-0.034966,0.482865,-0.126784,-0.016195,-0.093717,-0.166785,0.067288,-0.039082
2022Q2,-0.141088,-0.180388,-0.178501,-0.094793,-0.248041,-0.183191,-0.184645,-0.242632,-0.040033,-0.163498
2022Q3,-0.129309,-0.068174,-0.054447,0.037160,-0.040105,-0.142568,-0.041964,-0.013281,-0.039541,-0.035206
2022Q4,0.020131,0.017820,0.085372,0.146125,-0.016337,-0.061564,0.049917,-0.061893,0.096549,0.020764


In [62]:
type(ten_ind.index)

pandas.core.indexes.period.PeriodIndex
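
The quarterly compounding used above can be checked on a tiny example. This is a minimal sketch on made-up monthly returns (the values and the names `monthly_pct`, `quarter`, and `by_hand` are hypothetical, not project data). It groups by quarter with `groupby` rather than `resample`, which should give the same result for this calculation.

```python
import pandas as pd

# Hypothetical monthly returns in percent for a single quarter (not real data).
monthly_pct = pd.Series(
    [1.0, 2.0, -1.0],
    index=pd.period_range("2000-01", periods=3, freq="M"),
)

# Map each month to its calendar quarter.
quarter = monthly_pct.index.asfreq("Q")

# Convert percent to gross returns, multiply within each quarter, subtract 1.
quarterly = (1 + monthly_pct / 100).groupby(quarter).prod() - 1

# The same quantity computed by hand for comparison.
by_hand = 1.01 * 1.02 * 0.99 - 1
```

Compounding differs from simply summing the monthly percentages, because each month's return applies to the value left by the previous month.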

We now have returns from 1926 onward for ten industry portfolios. The industries are:
1. Consumer Nondurables -- Food, Tobacco, Textiles, Apparel, Leather, Toys
2. Consumer Durables -- Cars, TVs, Furniture, Household Appliances
3. Manufacturing -- Machinery, Trucks, Planes, Chemicals, Off Furn, Paper, Com Printing
4. Oil, Gas, and Coal Extraction and Products
5. Business Equipment -- Computers, Software, and Electronic Equipment
6. Telephone and Television Transmission
7. Wholesale, Retail, and Some Services (Laundries, Repair Shops)
8. Healthcare, Medical Equipment, and Drugs
9. Utilities
10. Other -- Mines, Constr, BldMt, Trans, Hotels, Bus Serv, Entertainment, Finance

The next step is to gather key economic data from the Federal Reserve and align it with the industry returns.
The first question is: which variables do we want to include?

We will try to choose variables that do not disproportionately affect one specific industry. Some data may still skew towards a particular industry; where it does, we will highlight the bias.

According to https://groww.in/blog/macroeconomic-factors-that-influence-us-stock-markets, the main macroeconomic factors that influence US stock markets are:
1. Gross Domestic Product (GDP)
2. Inflation
3. Unemployment Rate (Payrolls)
4. Retail Sales
5. Industrial Output 

We will also get market data on interest rates, corporate profits, and corporate debt.


Get the data from Federal Reserve Economic Data (FRED)

Another hypothesis is that markets react quickly to economic indicators. Since indicators such as GDP are lagging, we will need market-based proxies (such as yield spreads) to understand how stocks react to the prospect of economic events.

First, let us use the common models and then we can use some of the proxies later on.

In [67]:
# Use real GDP (GDPC1), since inflation is examined separately below
rgdp = pdr.get_data_fred('GDPC1', 1950)

In [68]:
rgdp = rgdp.pct_change().dropna()

In [69]:
rgdp.head()

DATE,GDPC1
1950-04-01,0.030498
1950-07-01,0.038644
1950-10-01,0.019148
1951-01-01,0.013582
1951-04-01,0.017327


In [79]:
rgdp.index = rgdp.index.to_period('Q')

In [80]:
rgdp.head()

DATE,GDPC1
1950Q2,0.030498
1950Q3,0.038644
1950Q4,0.019148
1951Q1,0.013582
1951Q2,0.017327


In [84]:
# CPI to capture inflation (this series is the monthly percent change in CPI)
cpi = pdr.get_data_fred('CPALTT01USM657N', 1960)

In [85]:
# Create a dummy variable marking high-inflation months (monthly change above 2%)
cpi['High'] = (cpi['CPALTT01USM657N'] > 2).astype(int)

In [86]:
cpi

DATE,CPALTT01USM657N,High
1960-01-01,-0.340136,0
1960-02-01,0.341297,0
1960-03-01,0.000000,0
1960-04-01,0.340136,0
1960-05-01,0.000000,0
...,...,...
2022-09-01,0.215078,0
2022-10-01,0.405649,0
2022-11-01,-0.101003,0
2022-12-01,-0.307009,0
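
Because the CPI series holds month-over-month percent changes, a threshold of 2 is rarely crossed and the `High` column shown is all zeros in the visible rows. If the intent is to flag periods of high annual inflation (an assumption on our part), one option is to compound the trailing 12 monthly changes first. The sketch below uses made-up values; the names `mom`, `yoy`, and `high` and the 2% annual threshold are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical month-over-month CPI changes in percent (not real data).
idx = pd.period_range("2000-01", periods=24, freq="M")
mom = pd.Series([0.1] * 12 + [0.4] * 12, index=idx, name="cpi_mom")

# Compound the trailing 12 monthly changes into a year-over-year rate, in percent.
gross = 1 + mom / 100
yoy = (gross.rolling(12).apply(np.prod, raw=True) - 1) * 100

# Flag months whose trailing 12-month inflation exceeds the assumed 2% threshold.
# The first 11 months have no full window (NaN) and are therefore flagged 0.
high = (yoy > 2).astype(int)
```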


In [12]:
# Corporate Profits
corp_profit = pdr.get_data_fred('CP', 1947)

In [13]:
corp_profit.pct_change().dropna()

DATE,CP
1947-04-01,-0.053801
1947-07-01,-0.010775
1947-10-01,0.091762
1948-01-01,0.057369
1948-04-01,0.050634
...,...
2021-10-01,-0.023466
2022-01-01,0.026317
2022-04-01,0.074003
2022-07-01,-0.050048


In [14]:
# Unemployment Rate
unemp = pdr.get_data_fred('UNRATE', 1948)

In [15]:
unemp

DATE,UNRATE
1948-01-01,3.4
1948-02-01,3.8
1948-03-01,4.0
1948-04-01,3.9
1948-05-01,3.5
...,...
2022-11-01,3.6
2022-12-01,3.5
2023-01-01,3.4
2023-02-01,3.6


In [16]:
# Interest rates (effective federal funds rate)
irate = pdr.get_data_fred('FEDFUNDS', 1953)

In [43]:
irate = irate.pct_change().dropna()

In [44]:
irate[750:]

DATE,FEDFUNDS
2017-02-01,0.015385
2017-03-01,0.196970
2017-04-01,0.139241
2017-05-01,0.011111
2017-06-01,0.142857
...,...
2022-11-01,0.227273
2022-12-01,0.084656
2023-01-01,0.056098
2023-02-01,0.055427


We need to combine the data into a single table, aggregating the monthly series to quarters so that every series shares the same frequency.
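
One way to do this is sketched below on made-up data. The frames `monthly` and `quarterly` stand in for a monthly FRED series and a series already on a quarterly PeriodIndex; the values are hypothetical, and averaging within the quarter is only one possible aggregation (an end-of-quarter value would be another).

```python
import pandas as pd

# Hypothetical monthly series with a DatetimeIndex (not real data).
monthly = pd.DataFrame(
    {"UNRATE": [4.0, 4.2, 4.4, 5.0, 5.0, 5.0]},
    index=pd.date_range("2000-01-01", periods=6, freq="MS"),
)

# Hypothetical quarterly series already on a quarterly PeriodIndex.
quarterly = pd.DataFrame(
    {"GDPC1": [0.01, 0.02]},
    index=pd.period_range("2000Q1", periods=2, freq="Q"),
)

# Average the monthly values within each calendar quarter.
monthly_q = monthly.groupby(monthly.index.to_period("Q")).mean()

# Inner join keeps only the quarters present in both tables.
combined = quarterly.join(monthly_q, how="inner")
```

An inner join drops quarters that are missing from either series, which matters here because the series start in different years.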

Next, we run regressions to estimate how sensitive industry returns are to these economic variables.

In [35]:
# import sklearn
from sklearn.linear_model import LinearRegression
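
As a minimal sketch of the regression step, the example below fits `LinearRegression` on synthetic data. The factor matrix `X`, the return series `y`, and the values in `true_betas` are all hypothetical; the data is generated without noise so that the fitted coefficients can be compared against known values. It is not the project's final model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: 40 quarters of two macro factors (not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))

# Hypothetical industry return built from known sensitivities and no noise,
# so the fit can be checked against the true values.
true_betas = np.array([0.5, -0.3])
y = 0.01 + X @ true_betas

model = LinearRegression().fit(X, y)
betas = model.coef_        # estimated sensitivity to each factor
alpha = model.intercept_   # estimated constant term
```

With real data the same call would be made once per industry column, using the combined quarterly table as `X`.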