# Empirical Analysis of Market Capitalization Weighted Index (MWI) vs. Equal Weighted Index (EWI) Performance for S&P 500

This notebook presents an empirical analysis of the performance of Market Capitalization Weighted Indexing (MWI) and Equal Weighted Indexing (EWI) strategies applied to a subset of the S&P 500. The objective is to explore why the EWI strategy often outperforms the MWI strategy.

For the sake of stability and simplicity in this analysis, we are working with the 411 companies that have continuously been part of the S&P 500 from January 2006 to August 2023. This eliminates any effects due to companies entering or leaving the index during this period.

We begin by fetching the necessary data for these companies using open-source Python packages such as `yfinance`, together with web scraping. All data used in this analysis is freely available; no proprietary or paid sources were used.

Next, we build MWI and EWI portfolios from this data and calculate returns and other relevant metrics. To keep the analysis simple, these custom portfolios ignore transaction fees. Both portfolios are rebalanced after the close on the third Friday of March, June, September, and December.
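
The rebalancing schedule described above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's own implementation; `third_friday` and `rebalance_dates` are hypothetical helper names introduced here.

```python
import datetime as dt

REBALANCE_MONTHS = (3, 6, 9, 12)  # March, June, September, December

def third_friday(year, month):
    """Return the date of the third Friday of the given month."""
    first = dt.date(year, month, 1)
    # weekday(): Monday is 0, Friday is 4. Days from the 1st to the first Friday:
    days_to_first_friday = (4 - first.weekday()) % 7
    # The third Friday is two weeks after the first one.
    return first + dt.timedelta(days=days_to_first_friday + 14)

def rebalance_dates(start_year, end_year):
    """List the quarterly rebalancing dates for the given range of years (inclusive)."""
    return [
        third_friday(year, month)
        for year in range(start_year, end_year + 1)
        for month in REBALANCE_MONTHS
    ]

dates = rebalance_dates(2006, 2023)
print(dates[:4])
```

The list covers whole calendar years, so it would still need trimming to the actual sample window (January 2006 to August 2023).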

Through exploratory data analysis, we delve into the distribution characteristics of individual stock returns, focusing on their skewness as a potential explanatory factor for the observed performance difference. We also visualize various aspects of our portfolios to better understand their composition and performance over time.

Let's begin.

## Step 1: Fetching the Tickers of S&P 500 Constituents

The function `get_sp500_tickers()` fetches the ticker symbols of all companies that are currently constituents of the S&P 500 index.

This is done by scraping the Wikipedia page that lists the S&P 500 constituents. We use the pandas `read_html` function, which parses the tables on an HTML page and returns them as a list of DataFrames. We take the first table, extract its `Symbol` column, and return the ticker symbols as a list.



In [3]:
import pandas as pd 

def get_sp500_tickers():
    url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
    table = pd.read_html(url, header=0)
    df = table[0]
    return list(df.Symbol)

After running this function, we will have a list of ticker symbols for all current S&P 500 constituents stored in the `tickers` variable.


In [4]:
tickers = get_sp500_tickers()

In [10]:
sorted(tickers)[:10]

['A', 'AAL', 'AAP', 'AAPL', 'ABBV', 'ABC', 'ABT', 'ACGL', 'ACN', 'ADBE']

## Step 2: Fetching Historical Market Capitalization Data

To construct our **Market Capitalization Weighted Index (MWI)**, we need to collect historical daily market capitalization data for each constituent of the S&P 500. Sourcing this type of data can often be challenging due to access restrictions and cost constraints. After thorough exploration, I found a freely available API endpoint on StockAnalysis.com that supplies this precise information.
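
To make the role of market capitalization concrete, here is a minimal sketch of how the two weighting schemes differ on a single date. It uses three market caps taken from the sample output shown later in this notebook (1998-12-01); `market_cap_weights` and `equal_weights` are hypothetical helper names, and the notebook's actual portfolio construction may differ.

```python
# Market caps on 1998-12-01 for three tickers, taken from the sample output.
market_caps = {
    "MMM": 32661500000.0,
    "AOS": 610000000.0,
    "ABT": 72663700000.0,
}

def market_cap_weights(caps):
    """MWI: each stock's weight is its share of the total market cap."""
    total = sum(caps.values())
    return {symbol: cap / total for symbol, cap in caps.items()}

def equal_weights(caps):
    """EWI: every stock gets the same weight, 1/N."""
    n = len(caps)
    return {symbol: 1.0 / n for symbol in caps}

mwi = market_cap_weights(market_caps)
ewi = equal_weights(market_caps)
print(mwi)
print(ewi)
```

In this toy universe the largest company dominates the MWI portfolio, while the EWI portfolio gives the smallest company the same weight as the largest.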

This endpoint was discovered by examining the Network tab in the browser's Developer Tools while interacting with the `stockanalysis.com` website. This allowed us to observe the API calls made by the web application.

The API endpoint we identified is `https://stockanalysis.com/api/symbol/s/{symbol}/marketcap?t=price`, where `{symbol}` is the ticker symbol of the stock we're interested in. This API call returns a JSON object with a `status` field, indicating the success or failure of the request, and a `data` field containing the actual market cap data. The `data` field is an array of arrays, where each inner array consists of a Unix timestamp in milliseconds (representing the date) and the market cap for the specified stock on that date.
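
The response shape described above can be parsed offline as follows. This is a sketch under stated assumptions: the payload is a hand-built example, not a captured response; the `status` value is a placeholder; the timestamps are assumed to be in milliseconds, as the `unit='ms'` argument in the notebook's code suggests; and `parse_market_cap` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Hand-built payload mirroring the described structure. The "status" value is a
# placeholder (assumption); the market caps are the MMM values from the sample output.
payload = {
    "status": 200,
    "data": [
        [912470400000, 32661500000.0],  # 1998-12-01 00:00 UTC, in milliseconds
        [912556800000, 32484900000.0],  # 1998-12-02 00:00 UTC, in milliseconds
    ],
}

def parse_market_cap(payload):
    """Convert [timestamp_ms, market_cap] pairs into (ISO date, market_cap) tuples."""
    rows = []
    for timestamp_ms, market_cap in payload["data"]:
        # Convert milliseconds to seconds, then to a UTC calendar date.
        date = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).date()
        rows.append((date.isoformat(), market_cap))
    return rows

rows = parse_market_cap(payload)
print(rows)
```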

To gather data efficiently for all the stocks in our list, I implemented a concurrent fetching system using the `concurrent.futures` module. It sends multiple API requests at the same time, which significantly reduces the total time required to collect the data.



In the following code, we pull historical market capitalization data for all stocks currently listed in the S&P 500 index:




In [11]:
import requests
import pandas as pd
from concurrent.futures import ThreadPoolExecutor, as_completed
from tenacity import retry, stop_after_attempt, wait_fixed

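# Retry up to 3 times, waiting 1 second between attempts, before giving up.
@retry(stop=stop_after_attempt(3), wait=wait_fixed(1))
def get_market_cap(symbol):
    url = f"https://stockanalysis.com/api/symbol/s/{symbol}/marketcap?t=price"
    # A timeout keeps a stalled request from blocking its worker thread indefinitely.
    response = requests.get(url, timeout=10)
    # Raise on HTTP errors so that tenacity retries the call.
    response.raise_for_status()
    data = response.json()
    # Each entry in data['data'] is [timestamp in milliseconds, market cap].
    return pd.DataFrame({
        'Date': [pd.to_datetime(x[0], unit='ms') for x in data['data']],
        f'Market Cap {symbol}': [x[1] for x in data['data']]
    })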

def get_market_caps(tickers):
    # Unpack instead of calling pop(0) so the caller's list is left unmodified.
    first_ticker, *remaining = tickers
    master_df = get_market_cap(first_ticker)

    with ThreadPoolExecutor(max_workers=3) as executor:
        # Map each future back to its ticker so failures can be reported by symbol.
        future_to_ticker = {executor.submit(get_market_cap, ticker): ticker for ticker in remaining}
        for future in as_completed(future_to_ticker):
            ticker = future_to_ticker[future]
            try:
                df = future.result()
                # The outer join keeps every date seen for any ticker; gaps become NaN.
                master_df = master_df.merge(df, on='Date', how='outer')
            except Exception as e:
                print(f"Error occurred while getting market cap data for {ticker}: {e}")

    return master_df
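
The submit-then-collect pattern used above can be tried offline with a stub in place of the network call. This is only a sketch: `FAKE_API`, `fake_fetch`, and `fetch_all` are hypothetical names, and the stub data reuses two tickers from the sample output. Unlike the notebook's function, it collects results and errors separately instead of merging as it goes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Stand-in for the remote API: maps a symbol to {date: market_cap}.
FAKE_API = {
    "MMM": {"1998-12-01": 32661500000.0, "1998-12-02": 32484900000.0},
    "AOS": {"1998-12-01": 610000000.0, "1998-12-02": 612800000.0},
}

def fake_fetch(symbol):
    """Return stub data for known symbols and raise for anything else."""
    if symbol not in FAKE_API:
        raise ValueError(f"no data for {symbol}")
    return FAKE_API[symbol]

def fetch_all(symbols, fetch=fake_fetch, max_workers=3):
    """Fetch all symbols concurrently; one failure does not stop the others."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its symbol so results can be labeled.
        future_to_symbol = {executor.submit(fetch, s): s for s in symbols}
        for future in as_completed(future_to_symbol):
            symbol = future_to_symbol[future]
            try:
                results[symbol] = future.result()
            except Exception as exc:
                errors[symbol] = exc
    return results, errors

results, errors = fetch_all(["MMM", "AOS", "NOPE"])
print(sorted(results), sorted(errors))
```

Returning the failed symbols makes it easy to retry just those tickers afterwards.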

In [12]:
data = get_market_caps(tickers)

In [14]:
data.head()

| | Date | Market Cap MMM | Market Cap ABBV | Market Cap AOS | Market Cap ABT | Market Cap ACN | Market Cap ATVI | Market Cap ADM | Market Cap ADBE | Market Cap ADP | ... | Market Cap WTW | Market Cap GWW | Market Cap WYNN | Market Cap XEL | Market Cap XYL | Market Cap ZBRA | Market Cap ZBH | Market Cap YUM | Market Cap ZION | Market Cap ZTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1998-12-01 | 32661500000.0 | | 610000000.0 | 72663700000.0 | | | | 3018500000.0 | 23301100000.0 | ... | | 4106500000.0 | | | | 1046800000.0 | | 7022000000.0 | 4022400000.0 | |
| 1 | 1998-12-02 | 32484900000.0 | | 612800000.0 | 74181300000.0 | | | | 2957000000.0 | 23980900000.0 | ... | | 4095200000.0 | | | | 1044900000.0 | | 6936400000.0 | 4100600000.0 | |
| 2 | 1998-12-03 | 31882900000.0 | | 611400000.0 | 72663700000.0 | | | | 3018500000.0 | 22848000000.0 | ... | | 4112200000.0 | | | | 1039000000.0 | | 7098400000.0 | 4171000000.0 | |
| 3 | 1998-12-04 | 32208000000.0 | | 612800000.0 | 75501700000.0 | | | | 2965400000.0 | 23394800000.0 | ... | | 4207200000.0 | | | | 1039000000.0 | | 7537400000.0 | 4257000000.0 | |
| 4 | 1998-12-07 | 30927700000.0 | | 598400000.0 | 75501700000.0 | | | | 3064200000.0 | 23582100000.0 | ... | | 4142600000.0 | | | | 1062400000.0 | | 7547800000.0 | 4261700000.0 | |
