# Smart Beta

**Smart Beta** lets portfolio managers create financial indices based on **alternative weightings and factors** for portfolio construction. Most stock market indices use market capitalization (**share price × number of shares outstanding**) to determine which companies should be most heavily represented in the index (the S&P 500 uses market capitalization weighting). Smart Beta-focused indices instead select their constituents using other criteria, such as certain financial ratios or technical indicators (1).
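To make the difference between the two weighting schemes concrete, here is a minimal sketch. The tickers, prices and share counts are made up for illustration, and the function names are my own, not part of any library.

```python
# Hypothetical prices and share counts for three made-up tickers.
holdings = {
    "AAA": {"price": 100.0, "shares_outstanding": 50},
    "BBB": {"price": 20.0, "shares_outstanding": 200},
    "CCC": {"price": 10.0, "shares_outstanding": 100},
}


def market_caps(data):
    """Market capitalization = share price * number of shares outstanding."""
    return {t: d["price"] * d["shares_outstanding"] for t, d in data.items()}


def cap_weights(data):
    """Weight each ticker by its share of the total market capitalization."""
    caps = market_caps(data)
    total = sum(caps.values())
    return {t: cap / total for t, cap in caps.items()}


def equal_weights(data):
    """Give every ticker the same weight, regardless of its size."""
    n = len(data)
    return {t: 1.0 / n for t in data}


cap_w = cap_weights(holdings)
eq_w = equal_weights(holdings)
print(cap_w)
print(eq_w)
```

Under cap weighting the largest company dominates the index, while equal weighting treats every constituent the same. An alternative weighting scheme changes exactly this step.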

For this Smart Beta index construction, I'll create a data set of financial ratios for the constituents of the S&P 500 by building a web scraper that automates the task of copying data from https://www.barchart.com/ . After we've collected this data and merged it with the corresponding company sectors, we can create our Smart Beta index.

(1) https://www.invesco.com/static/us/investors/contentdetail?contentId=a8569b81e1c0e510VgnVCM100000c2f1bf0aRCRD

#### Collecting the Necessary Data

In [2]:
#Import the necessary packages
import os
import time
import pandas as pd
import urllib.request
from bs4 import BeautifulSoup

In [3]:
#Change the directory to where the relevant CSV files are stored
os.chdir('/Users/zacharyescalante/Documents/Python/yewno')
print(os.getcwd())
print(os.listdir())

/Users/zacharyescalante/Documents/Python/yewno
['.ipynb_checkpoints', 'smart_beta.ipynb', 'sp-500-technical.csv', 'sp500_sector.csv', 'Yewno.csv', 'yewno_webscraper.py']


In [4]:
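#Create a list of the S&P 500 tickers that we can pass to our url request
#(pd.DataFrame.from_csv has been removed from pandas; read_csv with index_col=0 reads the first column as the index)
df_names = pd.read_csv('sp-500-technical.csv', index_col=0)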
sp_names = list(df_names.index)

In [118]:
#Use urllib and BeautifulSoup to collect the relevant data from www.barchart.com,
#then create a pandas DataFrame from this data.
#Each field is matched to a table cell by its position on the profile page.
right_fields = ['Market Capitalization, $K', 'Shares Outstanding, K', 'Annual Sales',
                'Annual Net Income', 'Last Quarter Sales', 'Last Quarter Net Income',
                '36-Month Beta', '% of Insider Shareholders',
                '% of Institutional Shareholders', '1-Year Total Return',
                '3-Year Total Return', '5-Year Total Return', '5-Year Revenue Growth',
                '5-Year Earnings Growth', '5-Year Dividend Growth']
even_fields = ['Price/Earnings ttm', 'Price/Earnings to Growth',
               'Return-on-Assets Before Tax', 'Debt/Equity', 'Price/Cash Flow',
               'Book Value/Share']
odd_fields = ['Price/Earnings forward', 'Return-on-Equity After Tax', 'Net Margin %',
              'Price/Sales', 'Price/Book', 'Interest Coverage']

dict_ = {}
for i, name in enumerate(sp_names, start=1):
    print(name, i)
    webpage = 'https://www.barchart.com/stocks/quotes/' + name + '/profile'
    with urllib.request.urlopen(webpage) as response:
        html = response.read()

    #Name the parser explicitly; BeautifulSoup warns when none is given
    soup = BeautifulSoup(html, 'lxml')

    #Query each group of cells once instead of once per field
    right_cells = soup.find_all('td', {'class': 'text-right'})
    even_rows = soup.find_all('tr', {'class': 'even'})
    odd_rows = soup.find_all('tr', {'class': 'odd'})

    row = {}
    for k, field in enumerate(right_fields):
        row[field] = right_cells[k].text.strip()
    for k, field in enumerate(even_fields):
        #The value sits in the second cell of each row
        row[field] = even_rows[k].find_all('td')[1].text.strip()
    for k, field in enumerate(odd_fields):
        row[field] = odd_rows[k].find_all('td')[1].text.strip()
    dict_[name] = row

df = pd.DataFrame.from_dict(dict_, orient='index')

A 1






AAL 2
AAP 3
AAPL 4
ABBV 5
ABC 6
ABT 7
ACN 8
ADBE 9
ADI 10
ADM 11
ADP 12
ADS 13
ADSK 14
AEE 15
AEP 16
AES 17
AET 18
AFL 19
AGN 20
AIG 21
AIV 22
AIZ 23
AJG 24
AKAM 25
ALB 26
ALGN 27
ALK 28
ALL 29
ALLE 30
ALXN 31
AMAT 32
AMD 33
AME 34
AMG 35
AMGN 36
AMP 37
AMT 38
AMZN 39
ANDV 40
ANSS 41
ANTM 42
AON 43
AOS 44
APA 45
APC 46
APD 47
APH 48
APTV 49
ARE 50
ARNC 51
ATVI 52
AVB 53
AVGO 54
AVY 55
AWK 56
AXP 57
AYI 58
AZO 59
BA 60
BAC 61
BAX 62
BBT 63
BBY 64
BDX 65
BEN 66
BF.B 67
BHF 68
BHGE 69
BIIB 70
BK 71
BKNG 72
BLK 73
BLL 74
BMY 75
BRK.B 76
BSX 77
BWA 78
BXP 79
C 80
CA 81
CAG 82
CAH 83
CAT 84
CB 85
CBOE 86
CBRE 87
CBS 88
CCI 89
CCL 90
CDNS 91
CELG 92
CERN 93
CF 94
CFG 95
CHD 96
CHRW 97
CHTR 98
CI 99
CINF 100
CL 101
CLX 102
CMA 103
CMCSA 104
CME 105
CMG 106
CMI 107
CMS 108
CNC 109
CNP 110
COF 111
COG 112
COL 113
COO 114
COP 115
COST 116
COTY 117
CPB 118
CRM 119
CSCO 120
CSX 121
CTAS 122
CTL 123
CTSH 124
CTXS 125
CVS 126
CVX 127
CXO 128
D 129
DAL 130
DE 131
DFS 132
DG 133
DGX 134
DHI 135
DHR 136
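A long scraping loop like the one above makes hundreds of requests, and a single failed request would stop it. Below is a minimal sketch of a retry helper built only on the standard library. `fetch_html`, its parameters and the back-off scheme are my own assumptions, not part of the original scraper or of any library.

```python
import time
import urllib.error
import urllib.request


def fetch_html(url, opener=urllib.request.urlopen, retries=3, delay_seconds=1.0):
    """Return the page body as bytes, retrying when the request fails.

    `opener` is injectable so the helper can be exercised without network access.
    HTTPError is a subclass of URLError, so both are retried.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.URLError as err:
            last_error = err
            if attempt < retries:
                # Wait a little longer after each failed attempt.
                time.sleep(delay_seconds * attempt)
    raise last_error
```

Inside the loop, `html = fetch_html(webpage)` would take the place of the `with urllib.request.urlopen(...)` block. A short `time.sleep` between tickers is also a polite way to limit the load on the site.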

In [7]:
#Store the data in a CSV for further analysis
df.to_csv('Yewno.csv')

In [6]:
df = pd.read_csv('Yewno.csv', index_col=0)

In [8]:
df.head()

| Ticker | Market Capitalization, $K | Shares Outstanding, K | Annual Sales | Annual Net Income | Last Quarter Sales | Last Quarter Net Income | 36-Month Beta | % of Insider Shareholders | % of Institutional Shareholders | 1-Year Total Return | ... | Return-on-Assets Before Tax | Debt/Equity | Price/Cash Flow | Book Value/Share | Price/Earnings forward | Return-on-Equity After Tax | Net Margin % | Price/Sales | Price/Book | Interest Coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 20525854 | 322480 | 4,472 M | 684,000 K | 1,211 M | -320,000 K | 1.31 | 0.53% | 85.85% | 24.30% | ... | 9.72 | 0.37 | 21.57 | 14.01 | 23.57 | 17.7 | 14.78 | 4.74 | 4.69 | 11.16 |
| AAL | 23794212 | 473140 | 42,207 M | 1,919 M | 10,600 M | 258,000 K | 0.96 | 0.64% | 80.54% | 20.33% | ... | 4.57 | 5.73 | 5.8 | 8.2 | 8.46 | 63.78 |  | 0.58 | 6.35 | 3.93 |
| AAP | 8279103 | 73980 | 9,374 M | 475,510 K | 2,037 M | 184,500 K | 1.04 | 4.10% | -21.08% | -25.49% | ... | 4.71 | 0.31 | 13.28 | 46.21 | 16.7 | 12.45 | 5.92 | 0.92 | 2.52 | 9.85 |
| AAPL | 854361792 | 5074010 | 229,234 M | 48,351 M | 88,293 M | 20,065 M | 1.25 | 0.08% | 61.35% | 20.05% | ... | 13.83 | 0.74 | 15.26 | 27.56 | 14.8 | 37.37 | 23.53 | 3.82 | 6.27 | 0.0 |
| ABBV | 142567936 | 1587970 | 28,216 M | 5,309 M | 7,739 M | 52,000 K | 1.61 | 0.07% | 70.54% | 41.65% | ... | 13.24 | 6.07 | 14.0 | 3.19 | 11.95 | 158.15 |  | 5.19 | 28.88 | 8.7 |
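Several of the scraped columns are still strings ('4,472 M', '684,000 K', '0.53%'), so they can't be averaged or compared directly. Here is a minimal sketch of a converter, assuming 'K' means thousands and 'M' means millions. The 'B' suffix is my own addition and does not appear in the sample above, and `parse_amount` is a hypothetical helper name.

```python
def parse_amount(text):
    """Convert strings such as '4,472 M', '684,000 K' or '0.53%' to floats.

    Amounts are returned in plain units, percentages as fractions,
    and empty strings as None.
    """
    # Assumed suffix meanings; 'B' is included only as a guess for larger values.
    multipliers = {"K": 1e3, "M": 1e6, "B": 1e9}
    cleaned = text.strip().replace(",", "")
    if not cleaned:
        return None
    if cleaned.endswith("%"):
        return float(cleaned[:-1]) / 100.0
    parts = cleaned.split()
    if len(parts) == 2 and parts[1] in multipliers:
        return float(parts[0]) * multipliers[parts[1]]
    return float(cleaned)


print(parse_amount("4,472 M"))
print(parse_amount("-320,000 K"))
print(parse_amount("0.53%"))
```

Applied column by column (for example with `df['Annual Sales'].map(parse_amount)`), this would make the sales and income figures usable as numbers.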


In [9]:
#Load S&P 500 sector data
df_sector = pd.read_csv('sp500_sector.csv', index_col=0)

In [10]:
#Merge the 'Sector' column with the S&P constituent data
df_sp500 = pd.merge(df, df_sector[['Sector']], how='left', on=None, left_index=True, right_index=True)

In [11]:
#df_sp500 is now our merged data set
df_sp500.columns

Index(['Market Capitalization, $K', 'Shares Outstanding, K', 'Annual Sales',
       'Annual Net Income', 'Last Quarter Sales', 'Last Quarter Net Income',
       '36-Month Beta', '% of Insider Shareholders',
       '% of Institutional Shareholders', '1-Year Total Return',
       '3-Year Total Return', '5-Year Total Return', '5-Year Revenue Growth',
       '5-Year Earnings Growth', '5-Year Dividend Growth',
       'Price/Earnings ttm', 'Price/Earnings to Growth',
       'Return-on-Assets Before Tax', 'Debt/Equity', 'Price/Cash Flow',
       'Book Value/Share', 'Price/Earnings forward',
       'Return-on-Equity After Tax', 'Net Margin %', 'Price/Sales',
       'Price/Book', 'Interest Coverage', 'Sector'],
      dtype='object')
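A left merge keeps every scraped ticker even when the sector file has no matching row, which silently leaves `Sector` empty for those names. The sketch below shows one way to check for that with the `indicator` option of `pd.merge`. The two small frames are made-up stand-ins for `df` and `df_sector`.

```python
import pandas as pd

# Hypothetical stand-ins for the scraped ratios and the sector file.
ratios = pd.DataFrame({"Debt/Equity": [0.37, 5.73, 0.31]}, index=["AAA", "BBB", "ZZZ"])
sectors = pd.DataFrame({"Sector": ["Health Care", "Industrials"]}, index=["AAA", "BBB"])

# indicator=True adds a '_merge' column saying where each row was found.
merged = pd.merge(
    ratios,
    sectors[["Sector"]],
    how="left",
    left_index=True,
    right_index=True,
    indicator=True,
)

# Tickers present in the ratios but missing from the sector file.
unmatched = merged.index[merged["_merge"] == "left_only"].tolist()
merged = merged.drop(columns="_merge")
print(unmatched)
```

Any ticker listed in `unmatched` would drop out of every sector-based rule later on, so it is worth looking at before building the index.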

#### Creating the Smart Beta Index

The first step in creating a Smart Beta index is defining the goal of the index. Some indices are targeted to growth stocks, others to momentum stocks.

Our goal is to create an ETF that outperforms the S&P 500 in a bear market. The S&P returned 23.91% YoY as of January 31, 2018, but that figure had dropped to 11.77% YoY by March 31, 2018 (1). My belief is that this could signal that the S&P 500 will perform poorly for the rest of 2018 (an assumption I'm using for this assignment), so I want to provide a long-equity option that outperforms the broader S&P 500 index. To accomplish this, I'm going to focus on companies with **low beta** (a beta below 1, meaning the stock has historically been less sensitive to moves in the relevant index) and solid financial ratios. The two ratios I want to highlight are **Debt/Equity** and **Price/Cash Flow**.
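For reference, beta is usually estimated as the covariance of a stock's returns with the index's returns, divided by the variance of the index's returns. The sketch below uses that textbook definition with made-up return series. How Barchart computes its '36-Month Beta' is not stated in this notebook, so treat this as an assumption rather than their method.

```python
def beta(stock_returns, market_returns):
    """Estimate beta as cov(stock, market) / var(market) using sample statistics."""
    n = len(stock_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two return series of equal length (at least 2 points)")
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(stock_returns, market_returns)) / (n - 1)
    var = sum((m - mean_m) ** 2 for m in market_returns) / (n - 1)
    return cov / var


# Hypothetical monthly returns: the stock moves half as much as the market.
market = [0.01, -0.02, 0.03, -0.01]
defensive_stock = [0.5 * r for r in market]
print(beta(defensive_stock, market))
```

A stock that always moves half as far as the index has a beta of about 0.5, which is the kind of name a low-beta screen is looking for.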

There are several factors that I want to consider when creating my Smart Beta index:

1. **36-Month Beta**: I want the stocks that I select to have a beta less than 1 **compared to their sector**
2. **Debt/Equity**: A low debt/equity ratio means that a company faces less risk from having to refinance its debt in a rising interest rate environment (rate hikes are anticipated in the second half of 2018 (2)). I'll select stocks whose 'Debt/Equity' ratio is below the mean for their sector.
3. **Price/Cash Flow**: A low price/cash flow ratio may indicate that the market has not yet priced in the company's current cash flow, and that it may be hesitant to discount the stock price further if solid cash flow is already proven. I'll select stocks whose 'Price/Cash Flow' ratio is below the mean for their sector.

(1) https://ycharts.com/indicators/sandp_500_1_year_return  <br>
(2) https://www.wsj.com/articles/fed-raises-interest-rates-sees-continued-path-of-increases-in-2018-1513191780

In [15]:
#Let's find the average Beta, P/CF and Debt/Equity ratios for each sector
print('36-Month Beta \n', df_sp500.groupby('Sector')['36-Month Beta'].mean(), '\n')
print('Price/Cash Flow \n', df_sp500.groupby('Sector')['Price/Cash Flow'].mean(), '\n')
print('Debt/Equity \n', df_sp500.groupby('Sector')['Debt/Equity'].mean(), '\n')

36-Month Beta 
 Sector
Consumer Discretionary        1.070488
Consumer Staples              0.697941
Energy                        1.115484
Financials                    1.172794
Health Care                   0.969016
Industrials                   1.062687
Information Technology        1.145493
Materials                     1.265600
Real Estate                   0.587273
Telecommunication Services    0.606667
Utilities                     0.286786
Name: 36-Month Beta, dtype: float64 

Price/Cash Flow 
 Sector
Consumer Discretionary        13.591975
Consumer Staples              16.515882
Energy                        11.389355
Financials                    13.759851
Health Care                   20.003500
Industrials                   15.406716
Information Technology        19.791857
Materials                     13.596000
Real Estate                   16.326061
Telecommunication Services     5.013333
Utilities                      7.986071
Name: Price/Cash Flow, dtype: float64 

Debt/

#### Index Rules

Looking at the average values for each sector, four sector betas stand out as high relative to the rest of the index (Information Technology, Financials, Materials, Energy). Since a high-beta stock is more volatile than the rest of the index, we will eliminate stocks from these sectors for our low-risk ETF (with the goal of outperforming the S&P in a bear market).
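The same choice can be derived from the printed sector means instead of being typed by hand. This is a minimal sketch: the means are copied from the output above, while the cutoff of 1.1 is my own assumption, picked so that it separates the four sectors named here from the rest.

```python
# Mean 36-month beta per sector, copied from the groupby output above.
sector_mean_beta = {
    "Consumer Discretionary": 1.070488,
    "Consumer Staples": 0.697941,
    "Energy": 1.115484,
    "Financials": 1.172794,
    "Health Care": 0.969016,
    "Industrials": 1.062687,
    "Information Technology": 1.145493,
    "Materials": 1.265600,
    "Real Estate": 0.587273,
    "Telecommunication Services": 0.606667,
    "Utilities": 0.286786,
}

BETA_CUTOFF = 1.1  # assumed threshold, not taken from the original analysis

# Sectors whose mean beta is above the cutoff.
high_beta_sectors = sorted(
    sector for sector, mean_beta in sector_mean_beta.items() if mean_beta > BETA_CUTOFF
)
print(high_beta_sectors)
```

Building the list this way avoids typos and duplicates, and makes the rule easy to re-run when the data is refreshed.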

In [18]:
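#Sectors to exclude. Note that 'Financials' is not in this list, so Financials
#stocks remain in df_sp500_v1 (see the output below).
hi_beta_sectors = ['Materials', 'Information Technology', 'Energy']
df_sp500_v1 = df_sp500[~df_sp500.Sector.isin(hi_beta_sectors)]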

In [43]:
set(df_sp500_v1['Sector'])

{'Consumer Discretionary',
 'Consumer Staples',
 'Financials',
 'Health Care',
 'Industrials',
 'Real Estate',
 'Telecommunication Services',
 'Utilities'}

Now that I've filtered out the high-beta sectors, I'll select the stocks whose 'Debt/Equity' and 'Price/Cash Flow' ratios are both lower than the mean for their respective sectors. There are various ways we could determine the weightings, but for simplicity's sake we'll stick with equal weightings.

In [58]:
index_names = list(df_sp500_v1.index[(df_sp500_v1['Sector'] == 'Consumer Discretionary') & (df_sp500_v1['Debt/Equity'] < 1.103086) & (df_sp500_v1['Price/Cash Flow'] < 13.591975)]) \
+ list(df_sp500_v1.index[(df_sp500_v1['Sector'] == 'Consumer Staples') & (df_sp500_v1['Debt/Equity'] < 2.021765) & (df_sp500_v1['Price/Cash Flow'] < 16.515882)]) \
+ list(df_sp500_v1.index[(df_sp500_v1['Sector'] == 'Financials') & (df_sp500_v1['Debt/Equity'] < 0.458529) & (df_sp500_v1['Price/Cash Flow'] < 13.759851)]) \
+ list(df_sp500_v1.index[(df_sp500_v1['Sector'] == 'Health Care') & (df_sp500_v1['Debt/Equity'] < 0.505574) & (df_sp500_v1['Price/Cash Flow'] < 20.003500)]) \
+ list(df_sp500_v1.index[(df_sp500_v1['Sector'] == 'Industrials') & (df_sp500_v1['Debt/Equity'] < 1.365075) & (df_sp500_v1['Price/Cash Flow'] < 15.406716)]) \
+ list(df_sp500_v1.index[(df_sp500_v1['Sector'] == 'Real Estate') & (df_sp500_v1['Debt/Equity'] < 1.020303) & (df_sp500_v1['Price/Cash Flow'] < 16.326061)]) \
+ list(df_sp500_v1.index[(df_sp500_v1['Sector'] == 'Telecommunication Services') & (df_sp500_v1['Debt/Equity'] < 1.673333) & (df_sp500_v1['Price/Cash Flow'] < 5.013333)]) \
+ list(df_sp500_v1.index[(df_sp500_v1['Sector'] == 'Utilities') & (df_sp500_v1['Debt/Equity'] < 1.725000) & (df_sp500_v1['Price/Cash Flow'] < 7.986071)])
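The cell above hard-codes each sector's means, so it has to be edited whenever the data changes. As an alternative, here is a minimal sketch that computes the sector means on the fly with `groupby(...).transform('mean')`. The toy frame and the helper name `below_sector_mean` are made up for illustration, and the columns are assumed to be numeric.

```python
import pandas as pd

# Hypothetical data: three utilities and two health care names.
toy = pd.DataFrame(
    {
        "Sector": ["Utilities", "Utilities", "Utilities", "Health Care", "Health Care"],
        "Debt/Equity": [1.0, 2.0, 3.0, 0.8, 0.2],
        "Price/Cash Flow": [6.0, 9.0, 12.0, 30.0, 10.0],
    },
    index=["U1", "U2", "U3", "H1", "H2"],
)


def below_sector_mean(frame, columns, sector_col="Sector"):
    """Return the index labels whose values are below their sector mean in every column."""
    # transform('mean') returns one row per stock, holding that stock's sector mean.
    sector_means = frame.groupby(sector_col)[columns].transform("mean")
    # Comparisons involving NaN evaluate to False, so missing ratios are never selected.
    mask = (frame[columns] < sector_means).all(axis=1)
    return list(frame.index[mask])


selected = below_sector_mean(toy, ["Debt/Equity", "Price/Cash Flow"])
print(selected)
```

On the real data the call would be something like `below_sector_mean(df_sp500_v1, ['Debt/Equity', 'Price/Cash Flow'])`. Results could differ slightly from the hard-coded version for stocks sitting right at a rounded threshold.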

In [62]:
#List of names in our index
index_names

['AAP',
 'AZO',
 'BBY',
 'BWA',
 'CCL',
 'CMCSA',
 'DIS',
 'DLTR',
 'EXPE',
 'FL',
 'GPS',
 'GT',
 'HAS',
 'IPG',
 'KORS',
 'KSS',
 'LB',
 'LEG',
 'MGM',
 'MHK',
 'NCLH',
 'NWL',
 'PHM',
 'PVH',
 'RCL',
 'RL',
 'SNA',
 'TGT',
 'TSCO',
 'TWX',
 'WHR',
 'ADM',
 'CAG',
 'COTY',
 'CPB',
 'CVS',
 'GIS',
 'KHC',
 'MDLZ',
 'PG',
 'SJM',
 'TAP',
 'TSN',
 'WBA',
 'WMT',
 'AFL',
 'ALL',
 'AMG',
 'BEN',
 'CB',
 'HIG',
 'HRB',
 'LNC',
 'MET',
 'PFG',
 'PRU',
 'STI',
 'TMK',
 'TRV',
 'UNM',
 'WLTW',
 'AGN',
 'ALXN',
 'BAX',
 'BIIB',
 'CERN',
 'CI',
 'COO',
 'DHR',
 'HCA',
 'HSIC',
 'HUM',
 'MDT',
 'PFE',
 'XRAY',
 'ALK',
 'AYI',
 'CMI',
 'CSX',
 'DAL',
 'DOV',
 'EFX',
 'ETN',
 'FBHS',
 'FDX',
 'GE',
 'HII',
 'IR',
 'JCI',
 'JEC',
 'KSU',
 'LUV',
 'NSC',
 'PCAR',
 'PNR',
 'PWR',
 'RSG',
 'SRCL',
 'TXT',
 'UTX',
 'AVB',
 'CBRE',
 'DRE',
 'HST',
 'KIM',
 'MAA',
 'O',
 'PLD',
 'WELL',
 'CTL',
 'DUK',
 'EIX',
 'EXC',
 'PCG',
 'PEG',
 'SCG']

#### Smart Beta Summary

We've gone through the exercise of gathering the data and selecting stocks based on our index rules. The next steps would be to backtest the strategy and see how it performs in various bear markets.
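As a rough idea of what such a backtest could measure, the sketch below compares an equal-weighted basket against a benchmark on cumulative return and maximum drawdown. All return figures and tickers are invented for illustration, and the basket is assumed to be rebalanced to equal weights every period.

```python
def equal_weight_returns(returns_by_ticker):
    """Average the tickers' returns in each period (equal weights, rebalanced each period)."""
    series = list(returns_by_ticker.values())
    n_periods = len(series[0])
    if any(len(s) != n_periods for s in series):
        raise ValueError("all return series must cover the same periods")
    return [sum(s[t] for s in series) / len(series) for t in range(n_periods)]


def cumulative_return(period_returns):
    """Compound the per-period returns into one total return."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth - 1.0


def max_drawdown(period_returns):
    """Largest peak-to-trough loss of the compounded value (a negative fraction, or 0.0)."""
    value, peak, worst = 1.0, 1.0, 0.0
    for r in period_returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


# Hypothetical per-period returns for a two-stock basket and a benchmark.
basket = {"AAA": [0.02, -0.04, 0.00], "BBB": [0.00, -0.02, 0.02]}
benchmark = [0.02, -0.10, 0.01]

portfolio = equal_weight_returns(basket)
print(cumulative_return(portfolio), max_drawdown(portfolio))
print(cumulative_return(benchmark), max_drawdown(benchmark))
```

In a real backtest the invented numbers would be replaced by historical returns for the index constituents and the S&P 500 over past bear markets.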

**Pros**

A Smart Beta strategy relies on various financial ratios or stock momentum to determine the constituents. This allows the PM to make better (or "smarter") stock-selection decisions, which can **enhance portfolio returns**, **reduce portfolio risks** and even increase the amount of **dividend income** a client earns.

**Cons**

A more narrowly based index may have **unique downside risks** that are not present in a more traditional broad-based index. As more people chase these strategies, it's possible that smart beta stock selection could **crowd** into the same trades and cause favored stocks to trade at an artificial premium.