In [24]:
import pandas as pd
from datetime import date
import plotly.express as px
from utilities import color_map

# S&P 500
- The S&P 500 is a US-based stock market index that dates back to 1957. The index tracks the value of 503 US-registered companies. Although ~500 does not even scratch the surface of the total number of publicly traded companies in the US, the companies listed in the S&P 500 account for roughly 80% of total US market capitalization, so their performance has a significant effect on the US economy.
- Tickers are abbreviations that link back to a given company.

In [128]:
# url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'

# # Read the table from the Wikipedia page
# tables = pd.read_html(url, header=0)
# df = tables[0]

# # Add year column
# df['Year_Added'] = pd.to_datetime(df['Date added']).dt.year

# # Keep the columns that are needed and discard the rest
# df = df[['Symbol', 'Security', 'GICS Sector', 'Year_Added']]

# # Rename columns in accordance with preference
# df.columns = ['company_ticker', 'company_name', 'sector', 'year_added']
# df.sample(5)

df = pd.read_csv('data/snp_market_cap.csv')
df.head(3)

Unnamed: 0,Ticker,Sector,MarketCap,Security,GICS Sector,GICS Sub-Industry,Headquarters Location,Date added,CIK,Founded,Year_Added
0,MMM,Industrials,78714490000.0,3M,Industrials,Industrial Conglomerates,"Saint Paul, Minnesota",1957-03-04,66740,1902,1957
1,AOS,Industrials,9143292000.0,A. O. Smith,Industrials,Building Products,"Milwaukee, Wisconsin",2017-07-26,91142,1916,2017
2,ABT,Healthcare,233034300000.0,Abbott Laboratories,Health Care,Health Care Equipment,"North Chicago, Illinois",1957-03-04,1800,1888,1957


- It would be interesting to find out which original companies (i.e., those added at the index's inception in 1957) are still in the index today. Their survival suggests that they withstood market volatility and evolved strategically with a changing environment; in that sense they are stable companies. It would also be interesting to see which sectors these companies belong to.
    - Source: https://www.home.saxo/content/articles/equities/why-berkshire-hathaway-is-crushing-the-sp-500-19032025
        - "While many investors have a home-bias or prefer the better-known U.S. stocks, Berkshire has been quietly expanding internationally."
            - "Buffett has been increasing his stakes in five major Japanese trading houses (Itochu, Sumitomo, Marubeni, Mitsubishi, and Mitsui). These companies trade at low valuations, offer strong dividends, and provide exposure to global commodity markets."
        - "Tech-light portfolio. While the S&P 500 is heavily weighted toward high-growth, high-multiple tech names, Berkshire’s portfolio skews towards industrials, energy, and insurance – sectors that have historically held up well in periods of economic uncertainty and inflation."
        

In [4]:
# Filter only companies that were added in 1957
stocks_added_1957 = df[df['Year_Added'] == 1957]  # 53 stocks added in 1957 are still in the index
# Create a dataframe of counts of original stocks to plot
stocks_1957_counts = pd.DataFrame(stocks_added_1957.Sector.value_counts()).reset_index()
stocks_1957_counts.columns = ['Sector', 'Count']
stocks_1957_counts

Unnamed: 0,Sector,Count
0,Industrials,14
1,Consumer Defensive,11
2,Utilities,10
3,Energy,6
4,Healthcare,5
5,Consumer Cyclical,2
6,Financial Services,2
7,Technology,2
8,Basic Materials,1


In [5]:
# Create a pie chart of the original companies, categorized by sector
fig_1957 = px.pie(stocks_1957_counts, 
                  values='Count', 
                  names='Sector', 
                  color ='Sector',
                  color_discrete_map=color_map, 
                  title='S&P 500 Original Stocks Added in 1957 By Sector (2025)',
                  hole=0.3)
fig_1957.update_traces(textposition='outside', textinfo='percent+label')
fig_1957.update_layout(showlegend=False)
fig_1957.show()

- It would be interesting to see how the sectors are divided in the index.
    -  Source: https://www.home.saxo/content/articles/equities/why-berkshire-hathaway-is-crushing-the-sp-500-19032025
        - Berkshire Hathaway is often called a "mini-index fund" because it provides exposure to a wide range of industries. Saxo’s Warren Buffett shortlist provides a quick look at the largest companies in the Berkshire Hathaway portfolio, but here is what one share of Berkshire effectively gives you exposure to:
            - Financials – Bank of America, American Express, Citigroup
            - Consumer staples – Coca-Cola, Kraft Heinz
            - Consumer discretionary – Domino’s Pizza, Pool Corp, Constellation Brands
            - Energy & utilities – Occidental Petroleum, Chevron Energy
            - Insurance – GEICO, Chubb
            - Industrials & railroads – BNSF Railway, Precision Castparts
            - Technology (selective exposure) – Apple (Berkshire’s single largest holding, but the company avoids speculative tech investments)
        - This could make Berkshire an excellent diversifier for investors who are too heavily weighted in U.S. tech stocks.
            - If your portfolio is overloaded with tech, Berkshire offers exposure to more stable sectors like insurance, consumer goods, and industrials.
            - If your portfolio is light on U.S. tech, you still get exposure to Apple, but in a way that’s balanced with traditional value plays.
        - Interest rate & market cycle sensitivity: Many of Berkshire’s businesses, including insurance and financials, benefit from high interest rates. If rates fall sharply, profit margins could shrink. Similarly, if consumer demand weakens, holdings like Coca-Cola and retail-adjacent businesses could face headwinds. The same defensive qualities that make Berkshire attractive in uncertain markets might hold it back when markets turn risk-on.


In [134]:
# Create df to plot sector composition
sector_composition = pd.DataFrame(df['Sector'].value_counts()).reset_index()
sector_composition.columns = ['Sector', 'Count']

In [135]:
# Create pie chart of snp500 by sector
fig_sector = px.pie(
    sector_composition, 
    values='Count',
    names='Sector',
    color='Sector',
    color_discrete_map=color_map,
    title='Current S&P 500 by Sector (June 2025)',
    hole=0.6
)
fig_sector.update_traces(textposition='outside', textinfo='percent+label')
fig_sector.update_layout(showlegend=False, annotations=[dict(
        text='Total Count<br><b>503 Companies',
        font_size=16,
        showarrow=False,
    )],)
fig_sector.show()

- What sector has the largest market cap?

In [None]:
from utilities import download_batches_market_cap

In [129]:
# snp500_market_cap = download_batches_market_cap(ticker_list=df['company_ticker'], batch_size=100)
snp500_market_cap = pd.read_csv('data/snp_market_cap.csv')
snp500_market_cap.head(3)

Unnamed: 0,Ticker,Sector,MarketCap,Security,GICS Sector,GICS Sub-Industry,Headquarters Location,Date added,CIK,Founded,Year_Added
0,MMM,Industrials,78714490000.0,3M,Industrials,Industrial Conglomerates,"Saint Paul, Minnesota",1957-03-04,66740,1902,1957
1,AOS,Industrials,9143292000.0,A. O. Smith,Industrials,Building Products,"Milwaukee, Wisconsin",2017-07-26,91142,1916,2017
2,ABT,Healthcare,233034300000.0,Abbott Laboratories,Health Care,Health Care Equipment,"North Chicago, Illinois",1957-03-04,1800,1888,1957


In [130]:
market_cap_total = round(snp500_market_cap['MarketCap'].sum()/1e9,3)
print(f'Total market cap: {market_cap_total} billion') # $54.83 trillion

Total market cap: 54831.531 billion


In [None]:
# Create a dictionary with ticker as keys and sectors as values, using info from yfinance
sector_map = snp500_market_cap.set_index('Ticker')['Sector'].to_dict()

# Apply to wikipedia df
df.loc[:,'Sector'] = snp500_market_cap['Ticker'].map(sector_map)

- Sector names and company counts per sector do not match between Wikipedia and Yahoo Finance. Using Yahoo Finance's categorization because it's more current.
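
One way to separate pure naming differences from genuine reclassifications is to normalize the Wikipedia (GICS) names to Yahoo Finance's names before comparing. The sketch below is only an illustration: `GICS_TO_YAHOO` is a hypothetical lookup built from the name pairs visible in this dataset, and it assumes those pairs are simple one-to-one renames. The column names `sector` (Wikipedia) and `Sector` (Yahoo) follow the comparison cell in this notebook.

```python
import pandas as pd

# Hypothetical lookup: GICS sector name -> Yahoo Finance sector name,
# assembled from the pairs that appear side by side in this dataset
GICS_TO_YAHOO = {
    'Health Care': 'Healthcare',
    'Information Technology': 'Technology',
    'Consumer Discretionary': 'Consumer Cyclical',
    'Consumer Staples': 'Consumer Defensive',
    'Financials': 'Financial Services',
    'Materials': 'Basic Materials',
}

# Three rows taken from the comparison table in this notebook
sample = pd.DataFrame({
    'company_ticker': ['MMM', 'ABT', 'YUM'],
    'sector': ['Industrials', 'Health Care', 'Consumer Discretionary'],
    'Sector': ['Industrials', 'Healthcare', 'Consumer Cyclical'],
})

# Rename GICS labels first; names missing from the lookup are left unchanged
sample['sector_normalized'] = sample['sector'].replace(GICS_TO_YAHOO)
sample['Match'] = sample['sector_normalized'] == sample['Sector']
print(sample[['company_ticker', 'sector', 'Sector', 'Match']])
```

Any row that still shows `Match == False` after this step would be a company the two sources place in different sectors, rather than one they merely label differently.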

In [None]:
# Which sector categorizations do not match between Wikipedia and Yahoo?
# Create a new column, Match, with boolean to filter the sectors that differ between sources later
df.loc[:,'Match'] = df['sector'] == df['Sector']
df

Unnamed: 0,company_ticker,company_name,sector,year_added,Sector,Match
0,MMM,3M,Industrials,1957,Industrials,True
1,AOS,A. O. Smith,Industrials,2017,Industrials,True
2,ABT,Abbott Laboratories,Health Care,1957,Healthcare,False
3,ABBV,AbbVie,Health Care,2012,Healthcare,False
4,ACN,Accenture,Information Technology,2011,Technology,False
...,...,...,...,...,...,...
498,XYL,Xylem Inc.,Industrials,2011,Industrials,True
499,YUM,Yum! Brands,Consumer Discretionary,1997,Consumer Cyclical,False
500,ZBRA,Zebra Technologies,Information Technology,2019,Technology,False
501,ZBH,Zimmer Biomet,Health Care,2001,Healthcare,False


In [None]:
df[~df['Match']]  # only rows where columns differ

In [131]:
# # Merge Wikipedia and yfinance market cap dfs
# # NOTE: I saved the merged dataset so this code block is no longer necessary

# df = df.rename(columns={"company_ticker": "Ticker"})
# merged_df = pd.merge(left=snp500_market_cap, right=df, on=['Ticker', 'Sector'])
# merged_df.head(2)

# See which sectors are adding the most market cap to the index

grouped = df.groupby(['Sector'])['MarketCap'].sum().reset_index()
grouped['MarketCap'] = round(grouped['MarketCap'] / 1e9, 3)  # Convert to billions
grouped.sort_values(by='MarketCap', ascending=False, inplace=True)
grouped


Unnamed: 0,Sector,MarketCap
9,Technology,17324.87
1,Communication Services,7677.517
5,Financial Services,6572.895
2,Consumer Cyclical,5898.559
6,Healthcare,5036.25
7,Industrials,4108.698
3,Consumer Defensive,3367.465
4,Energy,1584.529
10,Utilities,1245.188
8,Real Estate,1125.256


In [132]:

# Create donut chart of market cap by sector
fig_market_cap = px.pie(
    grouped,
    names='Sector',
    values='MarketCap',
    color='Sector',
    hole=0.6,
    color_discrete_map=color_map,
    title='S&P 500 Market Capitalization by Sector (June 2025)',
    labels={"MarketCap": "Market Cap"}
)

# Add center text with a layout annotation
fig_market_cap.update_layout(
    annotations=[dict(
        text='Total Market Cap<br><b>$54.83 Trillion',
        font_size=16,
        showarrow=False,
    )],
    showlegend=False
)
fig_market_cap.update_traces(textposition='outside', textinfo='percent+label')

fig_market_cap.show()

In [126]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

In [136]:
# Create subplots with 1 row and 2 columns
fig_combined = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]],
                    subplot_titles=['Pie Chart 1', 'Pie Chart 2'])


pie1 = fig_sector.data[0]
pie1.hole = 0.6
pie1.textinfo='label+percent'

fig_combined.add_trace(pie1, row=1, col=1)



fig_combined.add_trace(fig_market_cap.data[0], row=1, col=2)


fig_combined.update_layout(
    annotations=[
        dict(text='Total Count<br><b>503 Companies', x=0.225, y=0.45, font_size=16, showarrow=False),  # Center of left donut
        dict(text='Total Market Cap<br><b>$54.83 Trillion', x=0.78, y=0.45, font_size=16, showarrow=False)   # Center of right donut
    ], 
    showlegend=False,
    title_text='S&P 500 by Sector and Market Capitalization (June 2025)'
)
fig_combined.show()

- Technology accounts for approximately 1/3 of market cap, so its performance significantly influences the economy. Who are the biggest players in this sector?
- Who are the biggest players, by market cap, in the Communication Services and Financial Services sectors?
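
The "approximately 1/3" figure can be checked with a quick calculation. The snippet below is a minimal sketch that hard-codes the sector totals and the overall total (in billions of USD) from the outputs shown in this notebook; the variable names are illustrative only.

```python
# Values in billions of USD, copied from the outputs in this notebook
total_market_cap = 54831.531
top_sectors = {
    'Technology': 17324.87,
    'Communication Services': 7677.517,
    'Financial Services': 6572.895,
}

# Share of total index market cap, in percent, rounded to one decimal place
shares = {
    sector: round(cap / total_market_cap * 100, 1)
    for sector, cap in top_sectors.items()
}
print(shares)
```

Technology works out to just under 32% of the total, which is close to the one-third estimate.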

In [107]:
snp_sorted_by_sector = snp500_market_cap.sort_values(['Sector', 'MarketCap'])

# Inspect the result: rows are ordered by sector, then by ascending market cap
snp_sorted_by_sector

Unnamed: 0,Ticker,Sector,MarketCap,Security,GICS Sector,GICS Sub-Industry,Headquarters Location,Date added,CIK,Founded,Year_Added
13,ALB,Basic Materials,7.039658e+09,Albemarle Corporation,Materials,Specialty Chemicals,"Charlotte, North Carolina",2016-07-01,915913,1994,2016
158,EMN,Basic Materials,9.101712e+09,Eastman Chemical Company,Materials,Specialty Chemicals,"Kingsport, Tennessee",1994-01-01,915389,1920,1994
328,MOS,Basic Materials,1.154720e+10,Mosaic Company (The),Materials,Fertilizers & Agricultural Chemicals,"Tampa, Florida",2011-09-26,1285785,2004 (1865 / 1909),2011
95,CF,Basic Materials,1.492760e+10,CF Industries,Materials,Fertilizers & Agricultural Chemicals,"Deerfield, Illinois",2008-08-27,1324404,1946,2008
296,LYB,Basic Materials,1.817196e+10,LyondellBasell,Materials,Specialty Chemicals,"Rotterdam, Netherlands",2012-09-05,1489393,2007,2012
...,...,...,...,...,...,...,...,...,...,...,...
121,CEG,Utilities,9.364899e+10,Constellation Energy,Utilities,Electric Utilities,"Baltimore, Maryland",2022-02-02,1868275,1999,2022
419,SO,Utilities,9.710277e+10,Southern Company,Utilities,Electric Utilities,"Atlanta, Georgia",1957-03-04,92122,1945,1957
337,NEE,Utilities,1.485507e+11,NextEra Energy,Utilities,Multi-Utilities,"Juno Beach, Florida",1976-06-30,753308,1984 (1925),1976
60,BRK.B,,,Berkshire Hathaway,Financials,Multi-Sector Holdings,"Omaha, Nebraska",2010-02-16,1067983,1839,2010


In [114]:
# List the top 10 technology companies that have the highest market cap
tech_top10 = snp_sorted_by_sector[snp_sorted_by_sector['Sector']=='Technology'].sort_values(by='MarketCap', ascending=False)[:10]
tech_top10

Unnamed: 0,Ticker,Sector,MarketCap,Security,GICS Sector,GICS Sub-Industry,Headquarters Location,Date added,CIK,Founded,Year_Added
317,MSFT,Technology,3496118000000.0,Microsoft,Information Technology,Systems Software,"Redmond, Washington",1994-06-01,789019,1975,1994
347,NVDA,Technology,3456211000000.0,Nvidia,Information Technology,Semiconductors,"Santa Clara, California",2001-11-30,1045810,1993,2001
39,AAPL,Technology,3045708000000.0,Apple Inc.,Information Technology,"Technology Hardware, Storage & Peripherals","Cupertino, California",1982-11-30,320193,1977,1982
71,AVGO,Technology,1161052000000.0,Broadcom,Information Technology,Semiconductors,"Palo Alto, California",2014-05-08,1730168,1961,2014
356,ORCL,Technology,487992100000.0,Oracle Corporation,Information Technology,Application Software,"Austin, Texas",1989-08-31,1341439,1977,1989
360,PLTR,Technology,301407700000.0,Palantir Technologies,Information Technology,Application Software,"Denver, Colorado",2024-09-23,1321655,2003,2024
406,CRM,Technology,262431600000.0,Salesforce,Information Technology,Application Software,"San Francisco, California",2008-09-15,1108524,1999,2008
106,CSCO,Technology,261597600000.0,Cisco,Information Technology,Communications Equipment,"San Jose, California",1993-12-01,858877,1984,1993
242,IBM,Technology,249887000000.0,IBM,Information Technology,IT Consulting & Other Services,"Armonk, New York",1957-03-04,51143,1911,1957
254,INTU,Technology,215229900000.0,Intuit,Information Technology,Application Software,"Mountain View, California",2000-12-05,896878,1983,2000


In [115]:
comm_top10 = snp_sorted_by_sector[snp_sorted_by_sector['Sector']=='Communication Services'].sort_values(by='MarketCap', ascending=False)[:10]
comm_top10

Unnamed: 0,Ticker,Sector,MarketCap,Security,GICS Sector,GICS Sub-Industry,Headquarters Location,Date added,CIK,Founded,Year_Added
19,GOOGL,Communication Services,2049928000000.0,Alphabet Inc. (Class A),Communication Services,Interactive Media & Services,"Mountain View, California",2014-04-03,1652044,1998,2014
20,GOOG,Communication Services,2049922000000.0,Alphabet Inc. (Class C),Communication Services,Interactive Media & Services,"Mountain View, California",2006-04-03,1652044,1998,2006
311,META,Communication Services,1721355000000.0,Meta Platforms,Communication Services,Interactive Media & Services,"Menlo Park, California",2013-12-23,1326801,2004,2013
333,NFLX,Communication Services,532183400000.0,Netflix,Communication Services,Movies & Entertainment,"Los Gatos, California",2010-12-20,1065280,1997,2010
431,TMUS,Communication Services,279161700000.0,T-Mobile US,Communication Services,Wireless Telecommunication Services,"Bellevue, Washington",2019-07-15,1283699,1994,2019
482,DIS,Communication Services,204763700000.0,Walt Disney Company (The),Communication Services,Movies & Entertainment,"Burbank, California",1976-06-30,1744489,1923,1976
47,T,Communication Services,202196800000.0,AT&T,Communication Services,Integrated Telecommunication Services,"Dallas, Texas",1983-11-30,732717,1983 (1885),1983
470,VZ,Communication Services,184671700000.0,Verizon,Communication Services,Integrated Telecommunication Services,"New York City, New York",1983-11-30,732712,1983 (1877),1983
116,CMCSA,Communication Services,129231800000.0,Comcast,Communication Services,Cable & Satellite,"Philadelphia, Pennsylvania",2002-11-19,1166691,1963,2002
98,CHTR,Communication Services,54763230000.0,Charter Communications,Communication Services,Cable & Satellite,"Stamford, Connecticut",2016-09-08,1091667,1993,2016


In [116]:
fin_top10 = snp_sorted_by_sector[snp_sorted_by_sector['Sector']=='Financial Services'].sort_values(by='MarketCap', ascending=False)[:10]
fin_top10

Unnamed: 0,Ticker,Sector,MarketCap,Security,GICS Sector,GICS Sub-Industry,Headquarters Location,Date added,CIK,Founded,Year_Added
266,JPM,Financial Services,738487600000.0,JPMorgan Chase,Financials,Diversified Banks,"New York City, New York",1975-06-30,19617,2000 (1799 / 1871),1975
474,V,Financial Services,702610900000.0,Visa Inc.,Financials,Transaction & Payment Processing Services,"San Francisco, California",2009-12-21,1403161,1958,2009
304,MA,Financial Services,531853300000.0,Mastercard,Financials,Transaction & Payment Processing Services,"Harrison, New York",2008-07-18,1141391,1966,2008
57,BAC,Financial Services,338708700000.0,Bank of America,Financials,Diversified Banks,"Charlotte, North Carolina",1976-06-30,70858,1998 (1923 / 1874),1976
487,WFC,Financial Services,248391600000.0,Wells Fargo,Financials,Diversified Banks,"San Francisco, California",1976-06-30,72971,1852,1976
26,AXP,Financial Services,211907100000.0,American Express,Financials,Consumer Finance,"New York City, New York",1976-06-30,4962,1850,1976
327,MS,Financial Services,211481500000.0,Morgan Stanley,Financials,Investment Banking & Brokerage,"New York City, New York",1993-07-29,895421,1935,1993
220,GS,Financial Services,188400400000.0,Goldman Sachs,Financials,Investment Banking & Brokerage,"New York City, New York",2002-07-22,886982,1869,2002
65,BX,Financial Services,167134100000.0,Blackstone Inc.,Financials,Asset Management & Custody Banks,"New York City, New York",2023-09-18,1393818,1985,2023
380,PGR,Financial Services,163744100000.0,Progressive Corporation,Financials,Property & Casualty Insurance,"Mayfield Village, Ohio",1997-08-04,80661,1937,1997


- Plot YTD returns for the top 10 companies by market cap in the Technology, Communication Services, and Financial Services sectors.
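
The ticker lists used for this step could also be derived from the data instead of typed by hand. The following is a rough sketch, assuming a frame with `Ticker`, `Sector`, and `MarketCap` columns like `snp500_market_cap`; `top_n_by_sector` is a hypothetical helper, not part of `utilities`.

```python
import pandas as pd

def top_n_by_sector(frame, sectors, n=10):
    """Return {sector: [tickers]} holding the n largest companies by market cap."""
    subset = frame[frame['Sector'].isin(sectors)]
    return {
        sector: group.nlargest(n, 'MarketCap')['Ticker'].tolist()
        for sector, group in subset.groupby('Sector')
    }

# Small sample with market caps taken from the tables in this notebook
sample = pd.DataFrame({
    'Ticker': ['AAPL', 'MSFT', 'NVDA', 'V', 'JPM'],
    'Sector': ['Technology', 'Technology', 'Technology',
               'Financial Services', 'Financial Services'],
    'MarketCap': [3045708000000.0, 3496118000000.0, 3456211000000.0,
                  702610900000.0, 738487600000.0],
})

top2 = top_n_by_sector(sample, ['Technology', 'Financial Services'], n=2)
print(top2)
```

Deriving the lists this way keeps them in sync with the dataset if the market cap file is refreshed.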

In [20]:
import yfinance as yf
from utilities import fetch_historical_data, calculate_ytd_return
from datetime import date
import numpy as np

In [6]:
tech_top10_list = ['MSFT', 'NVDA', 'AAPL', 'AVGO', 'ORCL', 'PLTR', 'CRM', 'CSCO', 'IBM', 'INTU']
comm_top10_list = ['GOOGL', 'GOOG', 'META', 'NFLX', 'TMUS', 'DIS', 'T', 'VZ', 'CMCSA', 'CHTR']
fin_top10_list = ['JPM', 'V', 'MA', 'BAC', 'WFC', 'AXP', 'MS', 'GS', 'BX', 'PGR']
full_top10_list = tech_top10_list + comm_top10_list + fin_top10_list

today = date.today()
start = date(year=today.year, month=1, day=1)

hist_data = fetch_historical_data(full_top10_list, start_date=start, end_date=today)
hist_data

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Ticker
0,2025-01-02 00:00:00-05:00,423.900262,424.438202,413.261173,416.976868,16896500,0.0,0.0,MSFT
1,2025-01-03 00:00:00-05:00,419.467282,422.405996,417.933202,421.728607,16662900,0.0,0.0,MSFT
2,2025-01-06 00:00:00-05:00,426.360784,432.656586,423.850447,426.211365,20573600,0.0,0.0,MSFT
3,2025-01-07 00:00:00-05:00,427.356962,429.000637,419.188356,420.752350,18139100,0.0,0.0,MSFT
4,2025-01-08 00:00:00-05:00,421.838167,425.334733,419.925537,422.933960,15054600,0.0,0.0,MSFT
...,...,...,...,...,...,...,...,...,...
3355,2025-06-09 00:00:00-04:00,278.000000,278.709991,266.010010,271.309998,5885900,0.0,0.0,PGR
3356,2025-06-10 00:00:00-04:00,270.100006,271.000000,264.549988,265.489990,3987300,0.0,0.0,PGR
3357,2025-06-11 00:00:00-04:00,264.959991,265.500000,262.109985,263.220001,3013400,0.0,0.0,PGR
3358,2025-06-12 00:00:00-04:00,263.350006,268.450012,262.929993,268.420013,3045600,0.0,0.0,PGR


In [None]:
# Assign sector information to historical data of top 10 companies by sector
hist_data['Sector'] = ''
hist_data.loc[hist_data['Ticker'].isin(tech_top10_list), 'Sector'] = 'Technology'
hist_data.loc[hist_data['Ticker'].isin(comm_top10_list), 'Sector'] = 'Communication Services'
hist_data.loc[hist_data['Ticker'].isin(fin_top10_list), 'Sector'] = 'Financial Services'
hist_data_copy = hist_data.copy()


In [19]:
hist_data_copy['First Close'] = hist_data_copy.groupby(['Ticker'])['Close'].transform('first')
hist_data_copy

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Ticker,Sector,First Close
0,2025-01-02 00:00:00-05:00,423.900262,424.438202,413.261173,416.976868,16896500,0.0,0.0,MSFT,Technology,416.976868
1,2025-01-03 00:00:00-05:00,419.467282,422.405996,417.933202,421.728607,16662900,0.0,0.0,MSFT,Technology,416.976868
2,2025-01-06 00:00:00-05:00,426.360784,432.656586,423.850447,426.211365,20573600,0.0,0.0,MSFT,Technology,416.976868
3,2025-01-07 00:00:00-05:00,427.356962,429.000637,419.188356,420.752350,18139100,0.0,0.0,MSFT,Technology,416.976868
4,2025-01-08 00:00:00-05:00,421.838167,425.334733,419.925537,422.933960,15054600,0.0,0.0,MSFT,Technology,416.976868
...,...,...,...,...,...,...,...,...,...,...,...
3355,2025-06-09 00:00:00-04:00,278.000000,278.709991,266.010010,271.309998,5885900,0.0,0.0,PGR,Financial Services,236.021561
3356,2025-06-10 00:00:00-04:00,270.100006,271.000000,264.549988,265.489990,3987300,0.0,0.0,PGR,Financial Services,236.021561
3357,2025-06-11 00:00:00-04:00,264.959991,265.500000,262.109985,263.220001,3013400,0.0,0.0,PGR,Financial Services,236.021561
3358,2025-06-12 00:00:00-04:00,263.350006,268.450012,262.929993,268.420013,3045600,0.0,0.0,PGR,Financial Services,236.021561


In [21]:
# Calculate YTD
hist_data_copy['YTD Return'] = calculate_ytd_return(hist_data_copy['Close'], hist_data_copy['First Close'])
hist_data_copy

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Ticker,Sector,First Close,YTD Return
0,2025-01-02 00:00:00-05:00,423.900262,424.438202,413.261173,416.976868,16896500,0.0,0.0,MSFT,Technology,416.976868,0.000
1,2025-01-03 00:00:00-05:00,419.467282,422.405996,417.933202,421.728607,16662900,0.0,0.0,MSFT,Technology,416.976868,1.140
2,2025-01-06 00:00:00-05:00,426.360784,432.656586,423.850447,426.211365,20573600,0.0,0.0,MSFT,Technology,416.976868,2.215
3,2025-01-07 00:00:00-05:00,427.356962,429.000637,419.188356,420.752350,18139100,0.0,0.0,MSFT,Technology,416.976868,0.905
4,2025-01-08 00:00:00-05:00,421.838167,425.334733,419.925537,422.933960,15054600,0.0,0.0,MSFT,Technology,416.976868,1.429
...,...,...,...,...,...,...,...,...,...,...,...,...
3355,2025-06-09 00:00:00-04:00,278.000000,278.709991,266.010010,271.309998,5885900,0.0,0.0,PGR,Financial Services,236.021561,14.951
3356,2025-06-10 00:00:00-04:00,270.100006,271.000000,264.549988,265.489990,3987300,0.0,0.0,PGR,Financial Services,236.021561,12.485
3357,2025-06-11 00:00:00-04:00,264.959991,265.500000,262.109985,263.220001,3013400,0.0,0.0,PGR,Financial Services,236.021561,11.524
3358,2025-06-12 00:00:00-04:00,263.350006,268.450012,262.929993,268.420013,3045600,0.0,0.0,PGR,Financial Services,236.021561,13.727
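
`calculate_ytd_return` lives in `utilities` and is not shown here. Judging from the output above, it appears to compute the percentage change of each close relative to the first close of the year, rounded to three decimals. The sketch below is an assumption about that behavior, not the actual implementation; `ytd_return_pct` is a hypothetical name.

```python
import pandas as pd

def ytd_return_pct(close, first_close):
    """Percentage change of `close` relative to `first_close`, rounded to 3 decimals."""
    return ((close / first_close - 1) * 100).round(3)

# First three MSFT closes from the table above
close = pd.Series([416.976868, 421.728607, 426.211365])
first_close = pd.Series([416.976868] * 3)

returns = ytd_return_pct(close, first_close)
print(returns.tolist())
```

On these three rows the sketch reproduces the `YTD Return` values shown above (0.0, 1.14, 2.215).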


In [None]:
# Merge to get a list of company names for better plotting
security = pd.DataFrame(snp500_market_cap[['Security', 'Ticker']])
hist_data_merge = pd.merge(hist_data_copy, security, how='inner', on='Ticker')
hist_data_merge.head(5)

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Ticker,Sector,First Close,YTD Return,Security
0,2025-01-02 00:00:00-05:00,423.900262,424.438202,413.261173,416.976868,16896500,0.0,0.0,MSFT,Technology,416.976868,0.0,Microsoft
1,2025-01-03 00:00:00-05:00,419.467282,422.405996,417.933202,421.728607,16662900,0.0,0.0,MSFT,Technology,416.976868,1.14,Microsoft
2,2025-01-06 00:00:00-05:00,426.360784,432.656586,423.850447,426.211365,20573600,0.0,0.0,MSFT,Technology,416.976868,2.215,Microsoft
3,2025-01-07 00:00:00-05:00,427.356962,429.000637,419.188356,420.75235,18139100,0.0,0.0,MSFT,Technology,416.976868,0.905,Microsoft
4,2025-01-08 00:00:00-05:00,421.838167,425.334733,419.925537,422.93396,15054600,0.0,0.0,MSFT,Technology,416.976868,1.429,Microsoft


In [120]:
ytd_top10= px.line(
    hist_data_merge, x='Date', y='YTD Return', color='Security', facet_col='Sector'
)
ytd_top10.show()

In [82]:
ytd_top10_tech = px.line(
    hist_data_merge[hist_data_merge['Sector']=='Technology'], x='Date', y='YTD Return', color='Security', color_discrete_sequence=px.colors.qualitative.T10,
    title='Top 10 Technology Companies YTD Returns (%)'
)
ytd_top10_tech.update_layout(yaxis_title='YTD Returns (%)', legend_title='Companies')
ytd_top10_tech.show()

In [83]:
ytd_top10_comm = px.line(
    hist_data_merge[hist_data_merge['Sector']=='Communication Services'], x='Date', y='YTD Return', color='Security', color_discrete_sequence=px.colors.qualitative.T10,
    title='Top 10 Communication Companies YTD Returns (%)'
)
ytd_top10_comm.update_layout(yaxis_title='YTD Returns (%)', legend_title='Companies')
ytd_top10_comm.show()

In [84]:
ytd_top10_fin = px.line(
    hist_data_merge[hist_data_merge['Sector']=='Financial Services'], x='Date', y='YTD Return', color='Security', color_discrete_sequence=px.colors.qualitative.T10,
    title='Top 10 Financial Companies YTD Returns (%)'
)
ytd_top10_fin.update_layout(yaxis_title='YTD Returns (%)', legend_title='Companies')

ytd_top10_fin.show()

- 53 of the companies added in 1957 are still in the index today. Let's see how they performed this year and whether they were affected by the tariff announcement.

In [None]:
original_list = snp500_market_cap[snp500_market_cap['Year_Added']==1957]['Ticker'].to_list() #53

In [None]:
# top 10 original by market cap
snp500_market_cap[snp500_market_cap['Year_Added']==1957].sort_values('MarketCap', ascending=False)[:10]

Unnamed: 0,Ticker,Sector,MarketCap,Security,GICS Sector,GICS Sub-Industry,Headquarters Location,Date added,CIK,Founded,Year_Added
186,XOM,Energy,449366200000.0,ExxonMobil,Energy,Integrated Oil & Gas,"Irving, Texas",1957-03-04,34088,1999,1957
379,PG,Consumer Defensive,384551500000.0,Procter & Gamble,Consumer Staples,Personal Care Products,"Cincinnati, Ohio",1957-03-04,80424,1837,1957
112,KO,Consumer Defensive,307109700000.0,Coca-Cola Company (The),Consumer Staples,Soft Drinks & Non-alcoholic Beverages,"Atlanta, Georgia",1957-03-04,21344,1886,1957
207,GE,Industrials,272601300000.0,GE Aerospace,Industrials,Aerospace & Defense,"Evendale, Ohio",1957-03-04,40545,1892,1957
242,IBM,Technology,249887000000.0,IBM,Information Technology,IT Consulting & Other Services,"Armonk, New York",1957-03-04,51143,1911,1957
99,CVX,Energy,242876000000.0,Chevron Corporation,Energy,Integrated Oil & Gas,"San Ramon, California",1957-03-04,93410,1879,1957
2,ABT,Healthcare,233034300000.0,Abbott Laboratories,Health Care,Health Care Equipment,"North Chicago, Illinois",1957-03-04,1800,1888,1957
310,MRK,Healthcare,198296000000.0,Merck & Co.,Health Care,Pharmaceuticals,"Kenilworth, New Jersey",1957-03-04,310158,1891,1957
392,RTX,Industrials,185830700000.0,RTX Corporation,Industrials,Aerospace & Defense,"Waltham, Massachusetts",1957-03-04,101829,1922,1957
368,PEP,Consumer Defensive,179762600000.0,PepsiCo,Consumer Staples,Soft Drinks & Non-alcoholic Beverages,"Purchase, New York",1957-03-04,77476,1898,1957


In [69]:
top10_original_list = snp500_market_cap[snp500_market_cap['Year_Added']==1957].sort_values('MarketCap', ascending=False)[:10]['Ticker'].to_list()

hist_data_original = fetch_historical_data(top10_original_list, start_date=start, end_date=today)
hist_data_original

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Ticker
0,2025-01-02 00:00:00-05:00,106.340636,107.047742,104.965709,105.388008,12685400,0.0,0.0,XOM
1,2025-01-03 00:00:00-05:00,106.065654,106.546875,105.535324,105.928162,14237900,0.0,0.0,XOM
2,2025-01-06 00:00:00-05:00,106.301347,107.813766,105.594242,105.810303,15623700,0.0,0.0,XOM
3,2025-01-07 00:00:00-05:00,106.988823,108.088758,106.340640,106.802223,12625900,0.0,0.0,XOM
4,2025-01-08 00:00:00-05:00,105.388008,105.780845,104.111296,105.014816,17858100,0.0,0.0,XOM
...,...,...,...,...,...,...,...,...,...
1115,2025-06-09 00:00:00-04:00,129.830002,130.649994,129.179993,129.960007,8453100,0.0,0.0,PEP
1116,2025-06-10 00:00:00-04:00,130.199997,132.119995,129.460007,131.830002,11852900,0.0,0.0,PEP
1117,2025-06-11 00:00:00-04:00,131.940002,131.970001,129.789993,129.899994,9168400,0.0,0.0,PEP
1118,2025-06-12 00:00:00-04:00,129.889999,132.330002,129.710007,132.300003,11444900,0.0,0.0,PEP


In [None]:
hist_data_original['First Close'] = hist_data_original.groupby('Ticker')['Close'].transform('first')
hist_data_original['YTD Returns'] = calculate_ytd_return(hist_data_original['Close'], hist_data_original['First Close'])
hist_data_original

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Ticker,First Close,YTD Returns
0,2025-01-02 00:00:00-05:00,106.340636,107.047742,104.965709,105.388008,12685400,0.0,0.0,XOM,105.388008,0.000
1,2025-01-03 00:00:00-05:00,106.065654,106.546875,105.535324,105.928162,14237900,0.0,0.0,XOM,105.388008,0.513
2,2025-01-06 00:00:00-05:00,106.301347,107.813766,105.594242,105.810303,15623700,0.0,0.0,XOM,105.388008,0.401
3,2025-01-07 00:00:00-05:00,106.988823,108.088758,106.340640,106.802223,12625900,0.0,0.0,XOM,105.388008,1.342
4,2025-01-08 00:00:00-05:00,105.388008,105.780845,104.111296,105.014816,17858100,0.0,0.0,XOM,105.388008,-0.354
...,...,...,...,...,...,...,...,...,...,...,...
1115,2025-06-09 00:00:00-04:00,129.830002,130.649994,129.179993,129.960007,8453100,0.0,0.0,PEP,147.277557,-11.758
1116,2025-06-10 00:00:00-04:00,130.199997,132.119995,129.460007,131.830002,11852900,0.0,0.0,PEP,147.277557,-10.489
1117,2025-06-11 00:00:00-04:00,131.940002,131.970001,129.789993,129.899994,9168400,0.0,0.0,PEP,147.277557,-11.799
1118,2025-06-12 00:00:00-04:00,129.889999,132.330002,129.710007,132.300003,11444900,0.0,0.0,PEP,147.277557,-10.170


In [76]:
top10_original = snp500_market_cap[snp500_market_cap['Year_Added']==1957].sort_values('MarketCap', ascending=False)[:10][['Security', 'Ticker']]
ytd_top10_og = pd.merge(hist_data_original, top10_original, how='inner', on='Ticker')

In [78]:
ytd_top10_og

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Ticker,First Close,YTD Returns,Security
0,2025-01-02 00:00:00-05:00,106.340636,107.047742,104.965709,105.388008,12685400,0.0,0.0,XOM,105.388008,0.000,ExxonMobil
1,2025-01-03 00:00:00-05:00,106.065654,106.546875,105.535324,105.928162,14237900,0.0,0.0,XOM,105.388008,0.513,ExxonMobil
2,2025-01-06 00:00:00-05:00,106.301347,107.813766,105.594242,105.810303,15623700,0.0,0.0,XOM,105.388008,0.401,ExxonMobil
3,2025-01-07 00:00:00-05:00,106.988823,108.088758,106.340640,106.802223,12625900,0.0,0.0,XOM,105.388008,1.342,ExxonMobil
4,2025-01-08 00:00:00-05:00,105.388008,105.780845,104.111296,105.014816,17858100,0.0,0.0,XOM,105.388008,-0.354,ExxonMobil
...,...,...,...,...,...,...,...,...,...,...,...,...
1115,2025-06-09 00:00:00-04:00,129.830002,130.649994,129.179993,129.960007,8453100,0.0,0.0,PEP,147.277557,-11.758,PepsiCo
1116,2025-06-10 00:00:00-04:00,130.199997,132.119995,129.460007,131.830002,11852900,0.0,0.0,PEP,147.277557,-10.489,PepsiCo
1117,2025-06-11 00:00:00-04:00,131.940002,131.970001,129.789993,129.899994,9168400,0.0,0.0,PEP,147.277557,-11.799,PepsiCo
1118,2025-06-12 00:00:00-04:00,129.889999,132.330002,129.710007,132.300003,11444900,0.0,0.0,PEP,147.277557,-10.170,PepsiCo


In [85]:
ytd_org_fig = px.line(
    ytd_top10_og, x='Date', y='YTD Returns', color='Security', color_discrete_sequence=px.colors.qualitative.T10,
    title='Top 10 Original Companies YTD Returns (%)'
)
ytd_org_fig.update_layout(yaxis_title='YTD Returns (%)', legend_title='Companies')
ytd_org_fig.show()

- It would be interesting to see how many companies are added each year per sector.

In [None]:
# entrants_df = df['Year_Added'].value_counts().reset_index(name='Count')
new_entrants_df = df.groupby(['Year_Added', 'Sector']).size().reset_index(name='Count')

fig_new_entrants = px.bar(
    # Exclude 1957, the index's inception year
    new_entrants_df[new_entrants_df['Year_Added'] != 1957],
    x='Year_Added',
    y='Count',
    color='Sector',
    barmode='stack',
    color_discrete_map=color_map,
    # category_orders={"Year_Added": sorted(new_entrants_df["Year_Added"].unique())}, # ensure that year is chronological
    title='Annual Additions to S&P 500 by Sector')


fig_new_entrants.update_layout(
    xaxis_title='Year',
    yaxis_title='Count',
    # xaxis=dict(type='category')
)
fig_new_entrants.show()

# How did global indices do this week?

In [87]:
import yfinance as yf
import pandas as pd
from datetime import date, timedelta
from utilities import fetch_historical_data, calculate_ytd_return, map_country, color_map_countries
import plotly.express as px

In [88]:
tickers = ['^GSPC', '000001.SS', '^HSI', '^AXJO', '^NSEI', '^GSPTSE', '^GDAXI', '^FTSE', '^N225', '^MXX', '^BVSP']

today = date.today()
# Subtracting a timedelta avoids a ValueError when today.day is 6 or less
start = today - timedelta(days=6)

# print(f'Start date: {start}, End date: {today}')

hist = fetch_historical_data(tickers, start, today)

- Handling every exchange's timezone is messy, so I dropped Australia and labeled each row by trading day instead of using the timezone-aware timestamps returned by yfinance.

In [None]:
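# Drop Australia's ^AXJO; .copy() avoids pandas' SettingWithCopyWarning on the assignments below
hist = hist[hist['Ticker'] != '^AXJO'].copy()
# Label each ticker's rows as consecutive days starting 9 June 2025 (e.g. '9 June 2025');
# this assumes every index has exactly one row per day from 9 June onward
hist['Day'] = (hist.groupby('Ticker').cumcount() + 9).astype(str) + ' June 2025'
hist['Day'] = pd.to_datetime(hist['Day'], format='%d %B %Y').dt.date
hist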
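
- A possible alternative to counting rows, sketched under the assumption that `hist` still carries yfinance's `Date` column with each exchange's local timestamp: read the calendar date directly from the timestamp text, so the offset is ignored and a missing trading day cannot shift the labels. `local_trading_day` is a hypothetical helper (it is not part of `utilities`), and the sample rows are made up for illustration.

```python
import pandas as pd

def local_trading_day(dates):
    """Return each timestamp's local calendar date, ignoring the UTC offset."""
    # The first 10 characters of an ISO-style timestamp are YYYY-MM-DD
    day_text = dates.astype(str).str[:10]
    return pd.to_datetime(day_text, format='%Y-%m-%d').dt.date

# Illustrative rows only; real data would come from fetch_historical_data
sample = pd.DataFrame({
    'Date': [
        '2025-06-09 00:00:00-04:00',
        '2025-06-09 00:00:00+08:00',
        '2025-06-10 00:00:00+09:00',
    ],
    'Ticker': ['^GSPC', '^HSI', '^N225'],
})
sample['Day'] = local_trading_day(sample['Date'])
print(sample[['Ticker', 'Day']])
```

Because the date comes from each row's own timestamp, this would also work for weeks with exchange holidays, where the row-count approach would mislabel later days.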

In [90]:
# Load YTD dataset to extract closing value of the first day of the year
hist0 = pd.read_csv('data/ytd_6Jun.csv')
hist0.head(2)

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Ticker,First Close,YTD Return,Country
0,2025-01-02,5903.259766,5935.089844,5829.529785,5868.549805,3621680000,0.0,0.0,^GSPC,5868.549805,0.0,GSPC (United States)
1,2025-01-03,5891.069824,5949.339844,5888.660156,5942.470215,3667340000,0.0,0.0,^GSPC,5868.549805,1.26,GSPC (United States)


In [91]:
# Extract closing value of the first day of the year
first_close = hist0.groupby('Ticker')['First Close'].first()
first_close = pd.DataFrame(first_close).reset_index()

# Merge to dataframe with new data
hist_first_close = pd.merge(hist, first_close, on='Ticker', how='left', validate='many_to_one')
hist_first_close = hist_first_close[['Day', 'Close', 'Volume', 'Ticker', 'First Close']]

# Calculate YTD return for new data
hist_first_close['YTD Return'] = calculate_ytd_return(hist_first_close['Close'], hist_first_close['First Close'])

# Map ticker to country so it's easier to see which index belongs to which country
hist_first_close['Ticker (Country)'] = hist_first_close['Ticker'].map(map_country)
hist_first_close.head(5)

Unnamed: 0,Day,Close,Volume,Ticker,First Close,YTD Return,Ticker (Country)
0,2025-06-09,6005.879883,4642360000,^GSPC,5868.549805,2.34,GSPC (USA)
1,2025-06-10,6038.810059,4882880000,^GSPC,5868.549805,2.901,GSPC (USA)
2,2025-06-11,6022.240234,5111550000,^GSPC,5868.549805,2.619,GSPC (USA)
3,2025-06-12,6045.259766,4669500000,^GSPC,5868.549805,3.011,GSPC (USA)
4,2025-06-13,5976.970215,5258910000,^GSPC,5868.549805,1.847,GSPC (USA)
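
- The `YTD Return` values above are consistent with the percentage change of `Close` relative to `First Close`, rounded to three decimals. The sketch below shows that calculation; `ytd_return_sketch` is a hypothetical stand-in, and the real `calculate_ytd_return` in `utilities` may be implemented differently.

```python
import pandas as pd

def ytd_return_sketch(close, first_close, decimals=3):
    """Percentage change of close relative to the first close of the year."""
    return ((close / first_close - 1) * 100).round(decimals)

# Two ^GSPC rows taken from the output above
check = pd.DataFrame({
    'Close': [6005.879883, 6038.810059],
    'First Close': [5868.549805, 5868.549805],
})
check['YTD Return'] = ytd_return_sketch(check['Close'], check['First Close'])
print(check)
```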


In [119]:
# Datetime values are tricky to plot; to show only the date without the time component, the tick values are specified manually
tickvals = hist_first_close['Day'].sort_values().unique()
tickvals = [pd.Timestamp(val).to_pydatetime() for val in tickvals]

ytd_fig = px.line(
    data_frame=hist_first_close, 
    x='Day', y='YTD Return', color='Ticker (Country)',
    color_discrete_sequence=px.colors.qualitative.T10,
    title='Global Indices YTD Returns (June 9 - June 13)'
)

ytd_fig.update_layout(
    xaxis=dict(
        tickmode='array',
        tickformat='%m-%d',
        tickvals=tickvals
    ),
    yaxis_title='YTD Return (%)'
)


ytd_fig.show()