## README

### Alpha Vantage

https://www.alphavantage.co/support/#api-key

API Limit
- 5 API requests per minute and 500 requests per day

Academic Access
- 30 requests per minute and no daily limit
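
The per-minute limits above translate into a minimum pause between calls. A minimal sketch, assuming requests are spaced evenly; the helper name `min_interval_seconds` is hypothetical and not part of any Alpha Vantage client:

```python
def min_interval_seconds(requests_per_minute: int) -> float:
    """Smallest even spacing (in seconds) that stays within a per-minute quota."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60.0 / requests_per_minute


# Standard key: 5 requests per minute
print(min_interval_seconds(5))
# Academic access: 30 requests per minute
print(min_interval_seconds(30))
```

With academic access this gives the 2-second pause used by the download loops in this notebook.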

In [15]:
import pandas as pd
import requests
import time
import datetime

from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

## S&P500

https://www.slickcharts.com/sp500

In [86]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Make a request to the website with a browser-like User-Agent
request_headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0'}
page = requests.get("https://www.slickcharts.com/sp500", headers=request_headers)
page.raise_for_status()  # fail early on an HTTP error

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(page.content, 'html.parser')

# Find the table element in the HTML
table = soup.find('table', {'class': 'table table-hover table-borderless table-sm'})

# Extract the column names and the data rows (the first <tr> is the header row)
columns = [th.text.strip() for th in table.find_all('th')]
rows = []
for tr in table.find_all('tr')[1:]:
    rows.append([td.text.strip() for td in tr.find_all('td')])

# Create a pandas DataFrame from the column names and rows
sp500_df = pd.DataFrame(rows, columns=columns)

In [99]:
sp500_df["Symbol_modified"] = sp500_df["Symbol"].str.replace(".", "-", regex=False)

In [216]:
sp500_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 505 entries, 0 to 504
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   #                505 non-null    object
 1   Company          505 non-null    object
 2   Symbol           505 non-null    object
 3   Weight           505 non-null    object
 4   Price            505 non-null    object
 5   Chg              505 non-null    object
 6   % Chg            505 non-null    object
 7   Symbol_modified  505 non-null    object
dtypes: object(8)
memory usage: 31.7+ KB
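
Every scraped column comes back as `object` (string). A minimal sketch of converting the numeric columns, assuming the cell formats shown in this notebook (plain decimals, and `% Chg` wrapped as `(-0.02%)`); the thousands-separator handling is an assumption about how larger prices might be rendered, and `to_numeric_columns` is a hypothetical helper name:

```python
import pandas as pd


def to_numeric_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with Weight, Price, Chg and % Chg parsed as floats."""
    out = df.copy()
    for col in ["Weight", "Price", "Chg"]:
        # Assumption: drop thousands separators such as "1,234.50" before parsing
        cleaned = out[col].str.replace(",", "", regex=False)
        out[col] = pd.to_numeric(cleaned, errors="coerce")
    # "(-0.02%)" -> "-0.02": strip the surrounding parentheses and percent sign
    out["% Chg"] = pd.to_numeric(out["% Chg"].str.strip("()%"), errors="coerce")
    return out


# One row copied from the table in this notebook (AAPL)
sample = pd.DataFrame(
    {"Weight": ["7.160776"], "Price": ["160.22"], "Chg": ["-0.03"], "% Chg": ["(-0.02%)"]}
)
converted = to_numeric_columns(sample)
print(converted.dtypes)
```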


In [100]:
sp500_df.head(10)

Unnamed: 0,#,Company,Symbol,Weight,Price,Chg,% Chg,Symbol_modified
0,1,Apple Inc.,AAPL,7.160776,160.22,-0.03,(-0.02%),AAPL
1,2,Microsoft Corporation,MSFT,6.261452,280.96,0.39,(0.14%),MSFT
2,3,Amazon.com Inc.,AMZN,2.665966,98.49,0.36,(0.36%),AMZN
3,4,NVIDIA Corporation,NVDA,2.026405,267.7,-0.09,(-0.03%),NVDA
4,5,Alphabet Inc. Class A,GOOGL,1.905394,105.7,0.26,(0.25%),GOOGL
5,6,Alphabet Inc. Class C,GOOG,1.671415,106.41,0.35,(0.33%),GOOG
6,7,Berkshire Hathaway Inc. Class B,BRK.B,1.628577,298.97,0.05,(0.02%),BRK-B
7,8,Tesla Inc.,TSLA,1.566153,190.23,-0.19,(-0.10%),TSLA
8,9,Meta Platforms Inc. Class A,META,1.377436,206.25,0.24,(0.12%),META
9,10,UnitedHealth Group Incorporated,UNH,1.330112,476.83,0.84,(0.18%),UNH


### Listing status

https://www.alphavantage.co/documentation/#listing-status

This API returns a list of active or delisted US stocks and ETFs, either as of the latest trading day or at a specific time in history.

Take the top 25 constituents by index weight from the S&P 500 list at
https://www.slickcharts.com/sp500

Exclude Merck & Co (MRK), which is not tagged by the NYT API.
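
The selection described above could be sketched as follows. This assumes a frame with `Symbol` and `Weight` columns like the scraped S&P 500 table; `top_by_weight` is a hypothetical helper, and the MRK weight in the sample is a placeholder, not a real figure:

```python
import pandas as pd


def top_by_weight(df: pd.DataFrame, n: int = 25, exclude=("MRK",)) -> pd.DataFrame:
    """Return the n heaviest constituents, skipping the excluded symbols."""
    ranked = df.assign(Weight=pd.to_numeric(df["Weight"], errors="coerce"))
    ranked = ranked[~ranked["Symbol"].isin(exclude)]
    return ranked.nlargest(n, "Weight")


sample = pd.DataFrame(
    {
        "Symbol": ["AAPL", "MSFT", "MRK", "AMZN"],
        # MRK weight is a made-up placeholder for illustration only
        "Weight": ["7.160776", "6.261452", "0.9", "2.665966"],
    }
)
top2 = top_by_weight(sample, n=2)
print(top2["Symbol"].tolist())
```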

In [3]:
url = "https://www.alphavantage.co/query?function=LISTING_STATUS&apikey=demo"
stock_df = pd.read_csv(url)

In [88]:
stock_df

Unnamed: 0,symbol,name,exchange,assetType,ipoDate,delistingDate,status
0,A,Agilent Technologies Inc,NYSE,Stock,1999-11-18,,Active
1,AA,Alcoa Corp,NYSE,Stock,2016-10-18,,Active
2,AAA,AXS FIRST PRIORITY CLO BOND ETF,NYSE ARCA,ETF,2020-09-09,,Active
3,AAAU,Goldman Sachs Physical Gold ETF,BATS,ETF,2018-08-15,,Active
4,AAC,Ares Acquisition Corporation - Class A,NYSE,Stock,2021-03-25,,Active
...,...,...,...,...,...,...,...
11694,ZYNE,Zynerba Pharmaceuticals Inc,NASDAQ,Stock,2015-08-05,,Active
11695,ZYRX,Global Earnings Capital Ltd,NASDAQ,Stock,2007-07-13,,Active
11696,ZYXI,Zynex Inc,NASDAQ,Stock,2012-07-23,,Active
11697,ZZK,,NYSE ARCA,Stock,2020-07-22,,Active


In [90]:
stock_df.loc[stock_df["symbol"].isna()]

Unnamed: 0,symbol,name,exchange,assetType,ipoDate,delistingDate,status
7112,,Nano Labs Ltd,NASDAQ,Stock,2022-07-12,,Active
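
The missing symbol is most likely a parsing artifact: `pd.read_csv` treats the literal string `NA` as a missing value by default, and Nano Labs appears to trade under the ticker `NA`. A minimal sketch of keeping such tickers as text while still treating empty fields as missing, using an inline CSV in the same column layout (the ticker value is an assumption inferred from the output above):

```python
import io

import pandas as pd

csv_text = (
    "symbol,name,exchange,assetType,ipoDate,delistingDate,status\n"
    "NA,Nano Labs Ltd,NASDAQ,Stock,2022-07-12,,Active\n"
    "ZZK,,NYSE ARCA,Stock,2020-07-22,,Active\n"
)

# Default behaviour: the string "NA" is converted to NaN
default_df = pd.read_csv(io.StringIO(csv_text))

# Disable the default NA strings and mark only empty fields as missing
fixed_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

print(default_df["symbol"].tolist())
print(fixed_df["symbol"].tolist())
```

The same two keyword arguments can be passed when reading the `LISTING_STATUS` URL.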


In [103]:
available = set(stock_df["symbol"])

sp500 = set(sp500_df["Symbol_modified"])

assert sp500.issubset(available)
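
A bare `assert` does not say which symbols are missing when it fails. A small sketch that reports them instead; `missing_symbols` is a hypothetical helper and the sets below are illustrative samples, not the real listings:

```python
def missing_symbols(required, available):
    """Return the required symbols that are absent from the available set, sorted."""
    return sorted(set(required) - set(available))


# Illustrative samples only
available = {"AAPL", "MSFT", "BRK-B"}
sp500 = {"AAPL", "BRK-B", "BF-B"}

missing = missing_symbols(sp500, available)
if missing:
    print(f"Symbols not found in the listing: {missing}")
```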

In [93]:
# stock_df.loc[stock_df["symbol"].str.contains("BRK", na=False)]
# stock_df.loc[stock_df["symbol"] == "PFE"]
# stock_df.loc[stock_df["name"].str.contains("isa", na=False)]

Unnamed: 0,symbol,name,exchange,assetType,ipoDate,delistingDate,status
1501,BRK-A,Berkshire Hathaway Inc - Class A,NYSE,Stock,1980-03-17,,Active
1502,BRK-B,Berkshire Hathaway Inc - Class B,NYSE,Stock,1996-05-09,,Active
1503,BRKH,BurTech Acquisition Corp - Class A,NASDAQ,Stock,2022-01-31,,Active
1504,BRKHU,BurTech Acquisition Corp - Units (1 Ord Share ...,NASDAQ,Stock,2021-12-13,,Active
1505,BRKHW,BurTech Acquisition Corp - Warrants (18/12/2026),NASDAQ,Stock,2022-02-01,,Active
1506,BRKL,Brookline Bancorp Inc,NASDAQ,Stock,1998-03-25,,Active
1507,BRKR,Bruker Corp,NASDAQ,Stock,2000-08-04,,Active
1508,BRKY,Direxion Breakfast Commodities Strategy ETF,NYSE ARCA,ETF,2022-06-01,,Active


### Intraday Extended

https://www.alphavantage.co/documentation/#intraday-extended

- This API returns historical intraday time series for the trailing 2 years, covering over 2 million data points per ticker. 
- Includes extended trading hours where applicable (e.g., 4:00am to 8:00pm Eastern Time for the US market). 
- The intraday data is derived from the Securities Information Processor (SIP) market-aggregated data. 
- You can query both raw (as-traded) and split/dividend-adjusted intraday data from this endpoint.
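
The two-year history is requested one slice at a time. A minimal sketch of enumerating the slice names, assuming the `year{N}month{M}` naming used by the download loop in this notebook (`intraday_slices` is a hypothetical helper):

```python
def intraday_slices(years: int = 2, months_per_year: int = 12) -> list:
    """List slice names such as 'year1month1' ... 'year2month12'."""
    return [
        f"year{year}month{month}"
        for year in range(1, years + 1)
        for month in range(1, months_per_year + 1)
    ]


slices = intraday_slices()
print(len(slices), slices[0], slices[-1])
```

Judging by the logged date ranges in this notebook, `year1month1` is the most recent window and later slices go further back in time.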

Storage (approximate size on disk)

1min
- 12 stocks - 300MB
- 500 stocks - 12600MB (12.6GB)

5min
- 500 stocks - 2520MB (2.5GB)
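
The 5min figure follows from the 1min figure if file size is assumed to scale with the number of bars, which is only a rough approximation. A sketch of that arithmetic (`scale_storage_mb` is a hypothetical helper):

```python
def scale_storage_mb(size_mb: float, from_interval_min: int, to_interval_min: int) -> float:
    """Rescale a storage estimate, assuming size is proportional to bar count."""
    return size_mb * from_interval_min / to_interval_min


# 500 stocks at 1min take about 12600MB, so 5min bars need about a fifth of that
five_min_mb = scale_storage_mb(12600, from_interval_min=1, to_interval_min=5)
print(f"{five_min_mb:.0f}MB ({five_min_mb / 1000:.1f}GB)")
```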

In [39]:
API_KEY = "8VUOFOJS4BENDZG9"

In [40]:
symbols = [
# "AAPL",
# "AMZN",
# "GOOGL",
# "META",
# "MSFT",
# "TSLA",
"NVDA",
"UNH", 
"XOM",
"JNJ",
"JPM",
"V",
# "PG",
# "MA",
# "HD",  
# "ABBV",
# "CVX",
# "LLY",
# "AVGO",
# "PEP",
# "KO",
# "PFE",
]

In [207]:
df_list = []

# Series.iteritems() was removed in pandas 2.0; items() is the equivalent call
for index, symbol in sp500_df["Symbol_modified"].items():
    print(f"Processing {index}-th symbol: {symbol}")
    for year in [1,2]:
        for month in range(1,13):    
            url = "https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY_EXTENDED"
            # format url
            url += f"&symbol={symbol}"
            url += "&interval=5min" # 1min, 5min, 15min, 30min, 60min
            url += f"&slice=year{year}month{month}"
            url += f"&apikey={API_KEY}"
            print(url)

            df = pd.read_csv(url)
            # 30 requests per minute so sleep 2 seconds between requests
            time.sleep(2)
            # sense check data
            if len(df) < 5:
                raise ValueError(f"URL {url} returns incorrect data: {df}")
            df["time"] = pd.to_datetime(df["time"])
            print(f"Result has {len(df)} rows from {df['time'].min()} to {df['time'].max()}")
            df["symbol"] = symbol
            df_list.append(df)
master_df = pd.concat(df_list)

Processing KDP
https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY_EXTENDED&symbol=KDP&interval=5min&slice=year1month1&apikey=8VUOFOJS4BENDZG9
Result has 1932 rows from 2023-02-23 06:35:00 to 2023-03-24 17:10:00
https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY_EXTENDED&symbol=KDP&interval=5min&slice=year1month2&apikey=8VUOFOJS4BENDZG9
Result has 1794 rows from 2023-01-24 07:20:00 to 2023-02-22 19:45:00
https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY_EXTENDED&symbol=KDP&interval=5min&slice=year1month3&apikey=8VUOFOJS4BENDZG9
Result has 1561 rows from 2022-12-27 08:30:00 to 2023-01-23 16:20:00
https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY_EXTENDED&symbol=KDP&interval=5min&slice=year1month4&apikey=8VUOFOJS4BENDZG9
Result has 1712 rows from 2022-11-25 06:50:00 to 2022-12-23 16:05:00
https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY_EXTENDED&symbol=KDP&interval=5min&slice=year1month5&apikey=8VUOFOJS4BENDZG9
Result ha
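
The request URL can also be assembled with `urllib.parse.urlencode` instead of string concatenation, which keeps the parameters in one place and escapes them if needed. A minimal sketch using the same endpoint and parameters as the loop above; `intraday_extended_url` is a hypothetical helper, and `"demo"` stands in for a real key:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"


def intraday_extended_url(symbol: str, slice_name: str, api_key: str, interval: str = "5min") -> str:
    """Build a TIME_SERIES_INTRADAY_EXTENDED request URL."""
    params = {
        "function": "TIME_SERIES_INTRADAY_EXTENDED",
        "symbol": symbol,
        "interval": interval,  # 1min, 5min, 15min, 30min, 60min
        "slice": slice_name,
        "apikey": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = intraday_extended_url("KDP", "year1month1", "demo")
print(url)
```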

In [208]:
master_df.groupby("symbol").agg({"time":["min", "max", "count"]})

Unnamed: 0_level_0,time,time,time
Unnamed: 0_level_1,min,max,count
symbol,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
KDP,2021-04-05 08:55:00,2023-03-24 17:10:00,42249


In [105]:
# history_df = pd.read_csv("stock_price_2023_03_17.csv")
# history_df.groupby("symbol").agg({"time":["min", "max", "count"]})
# new_df = pd.concat([history_df, master_df])

Unnamed: 0_level_0,time,time,time
Unnamed: 0_level_1,min,max,count
symbol,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
AAPL,2021-03-29 04:01:00,2023-03-17 19:57:00,401838
AMZN,2021-03-29 07:02:00,2023-03-17 20:00:00,293876
GOOGL,2021-03-29 06:05:00,2023-03-17 20:00:00,240543
JNJ,2021-03-29 04:22:00,2023-03-17 19:44:00,207435
JPM,2021-03-29 04:01:00,2023-03-17 20:00:00,234296
META,2022-06-09 06:14:00,2023-03-17 20:00:00,129364
MSFT,2021-03-29 04:01:00,2023-03-17 20:00:00,306143
NVDA,2021-03-29 04:10:00,2023-03-17 20:00:00,353032
TSLA,2021-03-29 04:01:00,2023-03-17 20:00:00,389519
UNH,2021-03-29 08:01:00,2023-03-17 17:43:00,200140


In [44]:
new_df.groupby("symbol").agg({"time":["min", "max", "count"]})

Unnamed: 0_level_0,time,time,time
Unnamed: 0_level_1,min,max,count
symbol,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
AAPL,2021-03-29 04:01:00,2023-03-17 19:57:00,401838
AMZN,2021-03-29 07:02:00,2023-03-17 20:00:00,293876
GOOGL,2021-03-29 06:05:00,2023-03-17 20:00:00,240543
JNJ,2021-03-29 04:22:00,2023-03-17 19:44:00,207435
JPM,2021-03-29 04:01:00,2023-03-17 20:00:00,234296
META,2022-06-09 06:14:00,2023-03-17 20:00:00,129364
MSFT,2021-03-29 04:01:00,2023-03-17 20:00:00,306143
NVDA,2021-03-29 04:10:00,2023-03-17 20:00:00,353032
TSLA,2021-03-29 04:01:00,2023-03-17 20:00:00,389519
UNH,2021-03-29 08:01:00,2023-03-17 17:43:00,200140


In [45]:
# new_df.to_csv("stock_price_2023_03_17.csv", index=False, mode="w")
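
When a fresh download is appended to a saved history, the two can overlap in time. A minimal sketch of merging them without duplicate bars, assuming each bar is identified by `symbol` and `time`; the `close` column and the sample values are illustrative assumptions, and `merge_price_history` is a hypothetical helper:

```python
import pandas as pd


def merge_price_history(history_df: pd.DataFrame, new_df: pd.DataFrame) -> pd.DataFrame:
    """Concatenate two price frames, keeping the newest row per (symbol, time)."""
    combined = pd.concat([history_df, new_df], ignore_index=True)
    combined = combined.drop_duplicates(subset=["symbol", "time"], keep="last")
    return combined.sort_values(["symbol", "time"]).reset_index(drop=True)


# Made-up sample bars for illustration
history = pd.DataFrame(
    {
        "time": pd.to_datetime(["2023-03-17 09:30", "2023-03-17 09:35"]),
        "close": [1.0, 2.0],
        "symbol": ["KDP", "KDP"],
    }
)
new = pd.DataFrame(
    {
        "time": pd.to_datetime(["2023-03-17 09:35", "2023-03-17 09:40"]),
        "close": [2.5, 3.0],
        "symbol": ["KDP", "KDP"],
    }
)
merged = merge_price_history(history, new)
print(merged)
```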

## News

In [108]:
import requests

# replace the "demo" apikey below with your own key from https://www.alphavantage.co/support/#api-key
url = 'https://www.alphavantage.co/query?function=NEWS_SENTIMENT&tickers=AAPL&apikey=demo'
r = requests.get(url)
data = r.json()

In [115]:
pd.DataFrame(data["feed"])

Unnamed: 0,title,url,time_published,authors,summary,banner_image,source,category_within_source,source_domain,topics,overall_sentiment_score,overall_sentiment_label,ticker_sentiment
0,3 Warren Buffett Stocks That Are Crushing the S&P 500 This Year. Are They Still Smart Picks?,https://www.fool.com/investing/2023/03/25/warren-buffett-stocks-crushing-sp-500/,20230325T095000,[Keith Speights],"The present looks bright for these stocks. The future could, too.",https://g.foolcdn.com/image/?url=https%3A%2F%2Fg.foolcdn.com%2Feditorial%2Fimages%2F725141%2Fwarren-buffett-brka-brkb-berkshire-hathaway-motley-fool2.jpg&op=resize&h=340,Motley Fool,,www.fool.com,"[{'topic': 'Retail & Wholesale', 'relevance_score': '0.25'}, {'topic': 'Financial Markets', 'relevance_score': '0.990999'}, {'topic': 'Manufacturing', 'relevance_score': '0.25'}, {'topic': 'Earnings', 'relevance_score': '0.360215'}, {'topic': 'Technology', 'relevance_score': '0.25'}, {'topic': 'Finance', 'relevance_score': '0.25'}]",0.26133,Somewhat-Bullish,"[{'ticker': 'NVDA', 'relevance_score': '0.064623', 'ticker_sentiment_score': '0.134178', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'AAPL', 'relevance_score': '0.373378', 'ticker_sentiment_score': '0.328048', 'ticker_sentiment_label': 'Somewhat-Bullish'}, {'ticker': 'FND', 'relevance_score': '0.128822', 'ticker_sentiment_score': '0.200589', 'ticker_sentiment_label': 'Somewhat-Bullish'}, {'ticker': 'AMZN', 'relevance_score': '0.483434', 'ticker_sentiment_score': '0.31209', 'ticker_sentiment_label': 'Somewhat-Bullish'}, {'ticker': 'BRK-A', 'relevance_score': '0.064623', 'ticker_sentiment_score': '0.170497', 'ticker_sentiment_label': 'Somewhat-Bullish'}]"
1,"After TikTok chief's grilling in Washington, Apple's Tim Cook is all smiles in Beijing | Business",https://www.cnn.com/2023/03/25/business/apple-tim-cook-visits-beijing-intl-hnk/index.html,20230325T074600,"[Jake Kwon, Kathleen Magramo]","After TikTok chief's grilling in Washington, Apple's Tim Cook is all smiles in ...","https://media.cnn.com/api/v1/images/stellar/prod/230323220805-douyin-tiktok-comparison-wang-pkg-vpx.jpg?c=16x9&q=h_144,w_256,c_fill",CNN,Economy,www.cnn.com,"[{'topic': 'Earnings', 'relevance_score': '0.158519'}, {'topic': 'Technology', 'relevance_score': '1.0'}]",0.079037,Neutral,"[{'ticker': 'AAPL', 'relevance_score': '0.543681', 'ticker_sentiment_score': '0.192362', 'ticker_sentiment_label': 'Somewhat-Bullish'}, {'ticker': 'WB', 'relevance_score': '0.053992', 'ticker_sentiment_score': '0.09075', 'ticker_sentiment_label': 'Neutral'}]"
2,Tech Is Moving Away From China. Who Could Benefit.,https://www.barrons.com/articles/tech-exodus-from-china-accelerates-these-emerging-markets-may-benefit-7f965392,20230325T070149,[Craig Mellow],Tech Exodus From China Accelerates. These Emerging Markets May Benefit. ...,https://images.barrons.com/im-750131?width=639&height=426,Barrons,,www.barrons.com,"[{'topic': 'Life Sciences', 'relevance_score': '0.25'}, {'topic': 'Retail & Wholesale', 'relevance_score': '0.25'}, {'topic': 'Financial Markets', 'relevance_score': '0.108179'}, {'topic': 'Manufacturing', 'relevance_score': '0.25'}, {'topic': 'Technology', 'relevance_score': '0.25'}]",-0.047457,Neutral,"[{'ticker': 'SSNLF', 'relevance_score': '0.088472', 'ticker_sentiment_score': '-0.033553', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'AAPL', 'relevance_score': '0.132368', 'ticker_sentiment_score': '-0.020412', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'TSLA', 'relevance_score': '0.088472', 'ticker_sentiment_score': '-0.035748', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'GM', 'relevance_score': '0.088472', 'ticker_sentiment_score': '0.0', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'HYMLF', 'relevance_score': '0.044304', 'ticker_sentiment_score': '-0.028691', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'CAT', 'relevance_score': '0.088472', 'ticker_sentiment_score': '0.0', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'PG', 'relevance_score': '0.088472', 'ticker_sentiment_score': '0.0', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'HPQ', 'relevance_score': '0.044304', 'ticker_sentiment_score': '-0.084041', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'HPE', 'relevance_score': '0.088472', 'ticker_sentiment_score': '-0.088591', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'DELL', 'relevance_score': '0.088472', 'ticker_sentiment_score': '-0.088591', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'GE', 'relevance_score': '0.088472', 'ticker_sentiment_score': '0.0', 
'ticker_sentiment_label': 'Neutral'}, {'ticker': 'KYOCF', 'relevance_score': '0.044304', 'ticker_sentiment_score': '0.055814', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'FOREX:EUR', 'relevance_score': '0.044304', 'ticker_sentiment_score': '-0.084041', 'ticker_sentiment_label': 'Neutral'}]"
3,Market Rally Still Hasn't Done This; What To Do Now,https://www.investors.com/market-trend/stock-market-today/market-rally-still-hasnt-done-this-microsoft-tesla-near-buy-points/,20230324T214048,"[ED CARSON, Investor's Business Daily]","Market Rally Still Hasn't Done This. Microsoft, Tesla Lead 6 Nasdaq ... Investor's Business Daily ...",https://www.investors.com/wp-content/uploads/2016/04/SSA-040816-shutterstock.jpg,Investors Business Daily,,www.investors.com,"[{'topic': 'Life Sciences', 'relevance_score': '0.2'}, {'topic': 'Technology', 'relevance_score': '0.2'}, {'topic': 'Finance', 'relevance_score': '0.2'}, {'topic': 'Economy - Monetary', 'relevance_score': '0.158519'}, {'topic': 'Financial Markets', 'relevance_score': '1.0'}, {'topic': 'Manufacturing', 'relevance_score': '0.2'}, {'topic': 'Real Estate & Construction', 'relevance_score': '0.2'}]",0.019531,Neutral,"[{'ticker': 'MTH', 'relevance_score': '0.046953', 'ticker_sentiment_score': '0.061064', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'PACW', 'relevance_score': '0.070379', 'ticker_sentiment_score': '-0.097146', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'NVDA', 'relevance_score': '0.046953', 'ticker_sentiment_score': '0.011039', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'AAPL', 'relevance_score': '0.117028', 'ticker_sentiment_score': '0.115205', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'JPM', 'relevance_score': '0.070379', 'ticker_sentiment_score': '-0.097146', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'MS', 'relevance_score': '0.070379', 'ticker_sentiment_score': '-0.097146', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'STM', 'relevance_score': '0.046953', 'ticker_sentiment_score': '0.011039', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'COIN', 'relevance_score': '0.070379', 'ticker_sentiment_score': '-0.078792', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'MSFT', 'relevance_score': '0.163275', 'ticker_sentiment_score': '0.171955', 
'ticker_sentiment_label': 'Somewhat-Bullish'}, {'ticker': 'MELI', 'relevance_score': '0.140212', 'ticker_sentiment_score': '0.135622', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'TSLA', 'relevance_score': '0.140212', 'ticker_sentiment_score': '0.086125', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'PD', 'relevance_score': '0.046953', 'ticker_sentiment_score': '0.061064', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'REGN', 'relevance_score': '0.093744', 'ticker_sentiment_score': '0.12307', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'SQ', 'relevance_score': '0.070379', 'ticker_sentiment_score': '-0.078792', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'PANW', 'relevance_score': '0.093744', 'ticker_sentiment_score': '0.148777', 'ticker_sentiment_label': 'Neutral'}]"
4,1 Big Reason to Buy This Nasdaq Stock Hand Over Fist Before It Is Too Late,https://www.fool.com/investing/2023/03/24/1-big-reason-to-buy-this-nasdaq-stock-hand-over-fi/,20230324T211500,[Harsh Chauhan],"Apple has been in fine form on the market, and a rumored product could help it sustain its impressive momentum.",https://g.foolcdn.com/image/?url=https%3A%2F%2Fg.foolcdn.com%2Feditorial%2Fimages%2F725141%2Fwarren-buffett-brka-brkb-berkshire-hathaway-motley-fool2.jpg&op=resize&h=340,Motley Fool,,www.fool.com,"[{'topic': 'Economy - Monetary', 'relevance_score': '0.158519'}, {'topic': 'Financial Markets', 'relevance_score': '0.996023'}, {'topic': 'Earnings', 'relevance_score': '0.796627'}, {'topic': 'Technology', 'relevance_score': '0.5'}, {'topic': 'Finance', 'relevance_score': '0.5'}]",0.215524,Somewhat-Bullish,"[{'ticker': 'GS', 'relevance_score': '0.039811', 'ticker_sentiment_score': '0.254828', 'ticker_sentiment_label': 'Somewhat-Bullish'}, {'ticker': 'AAPL', 'relevance_score': '0.657084', 'ticker_sentiment_score': '0.373343', 'ticker_sentiment_label': 'Bullish'}]"
5,Investor Action Plan: Can The Market Bootstrap Itself To A Strong March Finish?,https://www.investors.com/research/investing-action-plan/stock-trading-action-plan-can-the-market-bootstrap-itself-to-a-strong-march-finish/,20230324T211048,"[Investor's Business Daily, IBD STAFF]",Investor Action Plan: Can The Market Bootstrap Itself To A Strong ... Investor's Business Daily ...,https://www.investors.com/wp-content/uploads/2023/03/Stock-monopolyshoe-01-adobe.jpg,Investors Business Daily,,www.investors.com,"[{'topic': 'Life Sciences', 'relevance_score': '0.142857'}, {'topic': 'Energy & Transportation', 'relevance_score': '0.142857'}, {'topic': 'Technology', 'relevance_score': '0.142857'}, {'topic': 'Finance', 'relevance_score': '0.142857'}, {'topic': 'Economy - Monetary', 'relevance_score': '0.451494'}, {'topic': 'Economy - Fiscal', 'relevance_score': '0.310843'}, {'topic': 'Retail & Wholesale', 'relevance_score': '0.142857'}, {'topic': 'Financial Markets', 'relevance_score': '0.999993'}, {'topic': 'Manufacturing', 'relevance_score': '0.142857'}, {'topic': 'Earnings', 'relevance_score': '1.0'}, {'topic': 'Mergers & Acquisitions', 'relevance_score': '0.158519'}, {'topic': 'Real Estate & Construction', 'relevance_score': '0.142857'}]",0.131084,Neutral,"[{'ticker': 'CALM', 'relevance_score': '0.05381', 'ticker_sentiment_score': '0.0', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'AAPL', 'relevance_score': '0.080638', 'ticker_sentiment_score': '0.139757', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'MU', 'relevance_score': '0.080638', 'ticker_sentiment_score': '0.047454', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'JEF', 'relevance_score': '0.080638', 'ticker_sentiment_score': '-0.008258', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'NEOG', 'relevance_score': '0.05381', 'ticker_sentiment_score': '-0.062768', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'PRGS', 'relevance_score': '0.080638', 'ticker_sentiment_score': '0.051237', 
'ticker_sentiment_label': 'Neutral'}, {'ticker': 'SMTC', 'relevance_score': '0.080638', 'ticker_sentiment_score': '0.038598', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'AEHR', 'relevance_score': '0.080638', 'ticker_sentiment_score': '0.047454', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'MSFT', 'relevance_score': '0.080638', 'ticker_sentiment_score': '0.139757', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'LULU', 'relevance_score': '0.05381', 'ticker_sentiment_score': '0.213837', 'ticker_sentiment_label': 'Somewhat-Bullish'}, {'ticker': 'MELI', 'relevance_score': '0.107375', 'ticker_sentiment_score': '0.150302', 'ticker_sentiment_label': 'Somewhat-Bullish'}, {'ticker': 'TSLA', 'relevance_score': '0.080638', 'ticker_sentiment_score': '0.139757', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'PAYX', 'relevance_score': '0.05381', 'ticker_sentiment_score': '0.065763', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'NEM', 'relevance_score': '0.05381', 'ticker_sentiment_score': '0.008998', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'WBA', 'relevance_score': '0.080638', 'ticker_sentiment_score': '-0.048274', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'EVGO', 'relevance_score': '0.05381', 'ticker_sentiment_score': '0.084108', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'CTAS', 'relevance_score': '0.080638', 'ticker_sentiment_score': '0.163187', 'ticker_sentiment_label': 'Somewhat-Bullish'}, {'ticker': 'REGN', 'relevance_score': '0.05381', 'ticker_sentiment_score': '0.135992', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'NCMGF', 'relevance_score': '0.02692', 'ticker_sentiment_score': '0.008352', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'PANW', 'relevance_score': '0.05381', 'ticker_sentiment_score': '0.135992', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'CRYPTO:NEM', 'relevance_score': '0.02692', 'ticker_sentiment_score': '0.008352', 'ticker_sentiment_label': 'Neutral'}]"
6,Why Warren Buffett's 1999 Warning Still Applies Today,https://www.benzinga.com/news/23/03/31495182/why-warren-buffetts-1999-warning-still-applies-today,20230324T170546,[William Dahl],"Warren Buffett arrived in Sun Valley, Idaho, with an unpopular warning - one the visionaries in the room were in no mood to hear. The tech leaders gathered there were bent on changing the world and had already made fortunes doing it.",https://cdn.benzinga.com/files/images/story/2023/03/24/warren.png?width=1200&height=800&fit=crop,Benzinga,News,www.benzinga.com,"[{'topic': 'Financial Markets', 'relevance_score': '0.998682'}, {'topic': 'Manufacturing', 'relevance_score': '0.333333'}, {'topic': 'Technology', 'relevance_score': '0.333333'}, {'topic': 'Finance', 'relevance_score': '0.333333'}]",-0.055832,Neutral,"[{'ticker': 'MSFT', 'relevance_score': '0.122269', 'ticker_sentiment_score': '0.037599', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'META', 'relevance_score': '0.061315', 'ticker_sentiment_score': '0.051437', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'AAPL', 'relevance_score': '0.061315', 'ticker_sentiment_score': '0.032087', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'TSLA', 'relevance_score': '0.061315', 'ticker_sentiment_score': '0.051437', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'INTC', 'relevance_score': '0.061315', 'ticker_sentiment_score': '0.007031', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'BRK-A', 'relevance_score': '0.061315', 'ticker_sentiment_score': '-0.184759', 'ticker_sentiment_label': 'Somewhat-Bearish'}]"
7,Visa's stock has paid off big over its 15 years. Can the payments giant continue that success from here?,https://www.marketwatch.com/story/visas-stock-has-paid-off-big-over-its-15-years-can-the-payments-giant-continue-that-success-from-here-9e944525,20230324T161900,[Emily Bary],"Visa shares have shot up nearly 1,900% from the price of the company's historic initial public offering, which took place 15 years ago this past weekend.",https://images.mktw.net/im-750138?width=700&height=472,MarketWatch,Top Stories,www.marketwatch.com,"[{'topic': 'Economy - Monetary', 'relevance_score': '0.158519'}, {'topic': 'IPO', 'relevance_score': '0.999937'}, {'topic': 'Financial Markets', 'relevance_score': '0.796627'}, {'topic': 'Earnings', 'relevance_score': '0.795202'}, {'topic': 'Technology', 'relevance_score': '0.5'}, {'topic': 'Finance', 'relevance_score': '0.5'}]",0.119494,Neutral,"[{'ticker': 'BABA', 'relevance_score': '0.049221', 'ticker_sentiment_score': '-0.033588', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'PYPL', 'relevance_score': '0.098255', 'ticker_sentiment_score': '-0.048764', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'AAPL', 'relevance_score': '0.122643', 'ticker_sentiment_score': '0.142138', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'MA', 'relevance_score': '0.288893', 'ticker_sentiment_score': '0.077433', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'JPM', 'relevance_score': '0.024622', 'ticker_sentiment_score': '0.001918', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'V', 'relevance_score': '0.719968', 'ticker_sentiment_score': '0.23984', 'ticker_sentiment_label': 'Somewhat-Bullish'}]"
8,"Bank Fears Come Knocking Again, Rattling Stocks And Keeping Volatility In Driver's Seat",https://www.benzinga.com/markets/23/03/31494257/bank-fears-come-knocking-again-rattling-stocks-and-keeping-volatility-in-drivers-seat,20230324T160203,[TD Ameritrade Network],"( Friday Market Open ) Bank fears continue to rattle Wall Street, demonstrating how sensitive the market remains to perceived instability in the sector and the volatility that can result.",https://cdn.benzinga.com/files/images/story/2023/03/24/shutterstock_564998512_2.jpg?width=1200&height=800&fit=crop,Benzinga,Markets,www.benzinga.com,"[{'topic': 'Energy & Transportation', 'relevance_score': '0.25'}, {'topic': 'Technology', 'relevance_score': '0.25'}, {'topic': 'Finance', 'relevance_score': '0.25'}, {'topic': 'Economy - Monetary', 'relevance_score': '0.769861'}, {'topic': 'IPO', 'relevance_score': '0.158519'}, {'topic': 'Economy - Fiscal', 'relevance_score': '0.310843'}, {'topic': 'Retail & Wholesale', 'relevance_score': '0.25'}, {'topic': 'Financial Markets', 'relevance_score': '1.0'}, {'topic': 'Earnings', 'relevance_score': '0.890401'}, {'topic': 'Economy - Macro', 'relevance_score': '0.158519'}]",0.026516,Neutral,"[{'ticker': 'AAPL', 'relevance_score': '0.0726', 'ticker_sentiment_score': '0.062586', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'CME', 'relevance_score': '0.036338', 'ticker_sentiment_score': '-0.008736', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'FDX', 'relevance_score': '0.036338', 'ticker_sentiment_score': '-0.009743', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'GS', 'relevance_score': '0.054483', 'ticker_sentiment_score': '-0.010001', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'CCL', 'relevance_score': '0.036338', 'ticker_sentiment_score': '0.0', 'ticker_sentiment_label': 'Neutral'}, {'ticker': 'WBA', 'relevance_score': '0.036338', 'ticker_sentiment_score': '0.174246', 'ticker_sentiment_label': 'Somewhat-Bullish'}, {'ticker': 'FOREX:USD', 'relevance_score': 
'0.036338', 'ticker_sentiment_score': '0.034363', 'ticker_sentiment_label': 'Neutral'}]"
9,"Apple ( AAPL ) Deepens Focus on Streaming, Movie Business",https://www.zacks.com/stock/news/2070274/apple-aapl-deepens-focus-on-streaming-movie-business,20230324T155400,[Zacks Investment Research],Apple (AAPL) is deepening its focus on expanding its footprint in the entertainment business with plans to spend $1 billion on producing movies.,https://staticx-tuner.zacks.com/images/articles/main/b2/56.jpg,Zacks Commentary,,www.zacks.com,"[{'topic': 'Earnings', 'relevance_score': '0.650727'}, {'topic': 'Retail & Wholesale', 'relevance_score': '0.5'}, {'topic': 'Technology', 'relevance_score': '0.5'}]",0.315413,Somewhat-Bullish,"[{'ticker': 'NFLX', 'relevance_score': '0.33628', 'ticker_sentiment_score': '0.349272', 'ticker_sentiment_label': 'Somewhat-Bullish'}, {'ticker': 'AAPL', 'relevance_score': '0.871926', 'ticker_sentiment_score': '0.515186', 'ticker_sentiment_label': 'Bullish'}, {'ticker': 'AMZN', 'relevance_score': '0.255641', 'ticker_sentiment_score': '0.285643', 'ticker_sentiment_label': 'Somewhat-Bullish'}]"
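
Each article carries a nested `ticker_sentiment` list covering every ticker it mentions. A minimal sketch of pulling out only the entries for one ticker, assuming the field names shown in the output above; `ticker_sentiment_for` is a hypothetical helper and the sample feed is a trimmed copy of one row:

```python
import pandas as pd


def ticker_sentiment_for(feed: list, ticker: str) -> pd.DataFrame:
    """Flatten the nested ticker_sentiment entries that match one ticker."""
    rows = []
    for article in feed:
        for entry in article.get("ticker_sentiment", []):
            if entry["ticker"] == ticker:
                rows.append(
                    {
                        "title": article["title"],
                        # Scores arrive as strings, so convert them to floats
                        "relevance_score": float(entry["relevance_score"]),
                        "ticker_sentiment_score": float(entry["ticker_sentiment_score"]),
                        "ticker_sentiment_label": entry["ticker_sentiment_label"],
                    }
                )
    return pd.DataFrame(rows)


sample_feed = [
    {
        "title": "Apple ( AAPL ) Deepens Focus on Streaming, Movie Business",
        "ticker_sentiment": [
            {"ticker": "NFLX", "relevance_score": "0.33628",
             "ticker_sentiment_score": "0.349272", "ticker_sentiment_label": "Somewhat-Bullish"},
            {"ticker": "AAPL", "relevance_score": "0.871926",
             "ticker_sentiment_score": "0.515186", "ticker_sentiment_label": "Bullish"},
        ],
    }
]
aapl_sentiment = ticker_sentiment_for(sample_feed, "AAPL")
print(aapl_sentiment)
```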


In [202]:
start_date = "2022-03-01"
end_date = "2023-03-31"
month_start_dates = pd.date_range(start=start_date, end=end_date, freq="SMS")
month_end_dates = pd.date_range(start=start_date, end=end_date, freq="SM")

# create a list of tuples of (start date, end date)
dates = []
for start, end in zip(month_start_dates, month_end_dates):
    # add one day to start date to avoid overlapping date range
    if start.day == 15:
        start += datetime.timedelta(days=1)
    t = (start.strftime("%Y%m%d"), end.strftime("%Y%m%d"))
    dates.append(t) 
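
The same half-month windows can be built with the standard library alone, which avoids relying on pandas offset aliases (some aliases have been renamed in recent pandas releases, so this may be worth checking for your version). A minimal sketch; `semi_monthly_windows` is a hypothetical helper:

```python
import calendar


def semi_monthly_windows(start_year: int, start_month: int, end_year: int, end_month: int) -> list:
    """Return (start, end) YYYYMMDD pairs: days 1-15 and 16-end of each month."""
    windows = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        last_day = calendar.monthrange(year, month)[1]
        windows.append((f"{year}{month:02d}01", f"{year}{month:02d}15"))
        windows.append((f"{year}{month:02d}16", f"{year}{month:02d}{last_day}"))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return windows


windows = semi_monthly_windows(2022, 3, 2023, 3)
print(len(windows), windows[0], windows[1], windows[-1])
```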

In [18]:
next(sp500_df[["Company", "Symbol_modified"]].itertuples(index=True, name=None))

(0, 'Apple Inc.', 'AAPL')

In [203]:
df_list = []

# index=False so each tuple is (company, symbol) and unpacks cleanly in the loop below
sp500_tuples = sp500_df[["Company", "Symbol_modified"]].itertuples(index=False, name=None)

for company, symbol in sp500_tuples:
    print(f"Processing company:{company} symbol: {symbol}") 
    for start, end in dates:
        url = "https://www.alphavantage.co/query?function=NEWS_SENTIMENT"
        # format url
        url += f"&tickers={symbol}"
        url += "&topics=earnings"
        url += f"&time_from={start}T0000"
        url += f"&time_to={end}T2359"
        url += "&sort=RELEVANCE"
        url += "&limit=200"
        url += f"&apikey={API_KEY}"
        print(f"Pulling data from {start} to {end}")
        print(url)

        r = requests.get(url)
        # 30 requests per minute so sleep 2 seconds between requests
        time.sleep(2)
        data = r.json()
        if "feed" not in data:
            raise ValueError(f"Unexpected data format: {data}")
        df = pd.DataFrame(data["feed"])
        # sense check data
        if len(df) < 5:
            raise ValueError(f"URL {url} returns incorrect data: {df}")
        df["time_published"] = pd.to_datetime(df["time_published"])
        print(f"Result has {len(df)} rows from {df['time_published'].min()} to {df['time_published'].max()}")

        # add company and symbol columns
        df["company"] = company
        df["symbol"] = symbol
        df_list.append(df)
    break  # test run: stop after the first company; remove to process every symbol

master_df = pd.concat(df_list)

Processing company:Apple Inc. symbol: AAPL
Pulling data from 20220301 to 20220315
https://www.alphavantage.co/query?function=NEWS_SENTIMENT&tickers=AAPL&topics=earnings&time_from=20220301T0000&time_to=20220315T2359&sort=RELEVANCE&limit=200&apikey=8VUOFOJS4BENDZG9
Result has 156 rows from 2022-03-03 08:00:00 to 2022-03-15 14:30:11
Processing company:Apple Inc. symbol: AAPL
Pulling data from 20220316 to 20220331
https://www.alphavantage.co/query?function=NEWS_SENTIMENT&tickers=AAPL&topics=earnings&time_from=20220316T0000&time_to=20220331T2359&sort=RELEVANCE&limit=200&apikey=8VUOFOJS4BENDZG9
Result has 200 rows from 2022-03-18 14:24:42 to 2022-03-31 20:56:00
Processing company:Apple Inc. symbol: AAPL
Pulling data from 20220401 to 20220415
https://www.alphavantage.co/query?function=NEWS_SENTIMENT&tickers=AAPL&topics=earnings&time_from=20220401T0000&time_to=20220415T2359&sort=RELEVANCE&limit=200&apikey=8VUOFOJS4BENDZG9
Result has 200 rows from 2022-04-01 14:45:00 to 2022-04-15 16:31:44
Proc

Result has 161 rows from 2023-03-01 00:54:43 to 2023-03-15 15:00:00
Processing company:Apple Inc. symbol: AAPL
Pulling data from 20230316 to 20230331
https://www.alphavantage.co/query?function=NEWS_SENTIMENT&tickers=AAPL&topics=earnings&time_from=20230316T0000&time_to=20230331T2359&sort=RELEVANCE&limit=200&apikey=8VUOFOJS4BENDZG9
Result has 96 rows from 2023-03-16 00:51:48 to 2023-03-25 09:50:00
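
When a response has no `feed` key, the JSON body usually explains why. A minimal sketch of surfacing that message; the key names `Note`, `Information`, and `Error Message` are assumptions about how the API reports rate limits and invalid requests, and `extract_feed` is a hypothetical helper:

```python
def extract_feed(data: dict) -> list:
    """Return the article list, or raise with the API's own explanation."""
    if "feed" in data:
        return data["feed"]
    # Assumed keys for rate-limit notices and request errors
    for key in ("Note", "Information", "Error Message"):
        if key in data:
            raise RuntimeError(f"API returned {key}: {data[key]}")
    raise ValueError(f"Unexpected data format: {data}")


articles = extract_feed({"feed": [{"title": "example"}]})
print(len(articles))
```

Catching `RuntimeError` separately makes it possible to sleep and retry on a rate-limit notice instead of aborting the whole download.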


In [204]:
master_df["source"].value_counts()

Motley Fool                 1142
Benzinga                     791
Investors Business Daily     683
Zacks Commentary             656
MarketWatch                  286
The Street                   261
StockMarket.com              126
CNN                          107
The Financial Express         91
CNBC                          78
Reuters                       73
Forbes                        65
Business Standard             49
Barrons                       33
Kiplinger                     32
Stocknews.com                 30
Money Control                 26
PR Newswire                   15
Al Jareeza                    14
Associated Press              11
Business Insider              11
South China Morning Post      11
Fox Business News             10
PennyStocks.com               10
Wall Street Journal            9
Money Morning                  5
Investing News Network         5
Cointelegraph                  2
Fast Company                   1
Investor Ideas                 1
Name: sour

In [206]:
master_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4634 entries, 0 to 95
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype         
---  ------                   --------------  -----         
 0   title                    4634 non-null   object        
 1   url                      4634 non-null   object        
 2   time_published           4634 non-null   datetime64[ns]
 3   authors                  4634 non-null   object        
 4   summary                  4634 non-null   object        
 5   banner_image             4628 non-null   object        
 6   source                   4634 non-null   object        
 7   category_within_source   4634 non-null   object        
 8   source_domain            4634 non-null   object        
 9   topics                   4634 non-null   object        
 10  overall_sentiment_score  4634 non-null   float64       
 11  overall_sentiment_label  4634 non-null   object        
 12  ticker_sentiment         4634 non-nu