### Datasets downloaded for this project

Historical stock data is extracted from **Yahoo Finance** with the Python module *yfinance*.

The ticker lists provided by student A are passed to *yf.download()* to pull the data. The original downloaded files are:

- Dataset 1: Stock data of S&P 500 members

- Dataset 2: Stock data of non-members of S&P 500

- Dataset 3: Stock data of "SPY" (SPDR S&P 500 ETF), which is used as the benchmark for comparing the performance of the stocks.

##### Download Dataset 1:

In [1]:
# Importing the necessary packages
import pandas as pd
import yfinance as yf

In [2]:
# Read the dataset (provided by student A) that contains ticker names of S&P 500 members:
sp500 = pd.read_csv("../Tran_Dao_Data/Biber_Martin_StudentA_tickers_sp500_stage.csv")
sp500

Unnamed: 0,Symbol,Security,Sector,Sub-Industry
0,MMM,3M,Industrials,Industrial Conglomerates
1,ABT,Abbott Laboratories,Health Care,Health Care Equipment
2,ABBV,AbbVie,Health Care,Pharmaceuticals
3,ABMD,Abiomed,Health Care,Health Care Equipment
4,ACN,Accenture,Information Technology,IT Consulting & Other Services
...,...,...,...,...
500,YUM,Yum! Brands,Consumer Discretionary,Restaurants
501,ZBRA,Zebra Technologies,Information Technology,Electronic Equipment & Instruments
502,ZBH,Zimmer Biomet,Health Care,Health Care Equipment
503,ZION,Zions Bancorp,Financials,Regional Banks


In [3]:
# Select the column containing tickers as series and then convert it into a list:
tickers_sp500 = sp500['Symbol'].to_list()
#print('List of Tickers: ', tickers_sp500)
len(tickers_sp500)

505
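Ticker lists taken from index constituent tables often write share classes with a dot (for example `BRK.B`), while Yahoo Finance uses a hyphen (`BRK-B`). Whether student A's file already follows the Yahoo convention is not stated here, so the following is only a minimal sketch of a clean-up step that could run before the download. The helper name `to_yahoo_symbols` is hypothetical.

```python
def to_yahoo_symbols(symbols):
    """Return upper-cased, de-duplicated symbols with '.' replaced by '-'.

    Hypothetical helper: it assumes the input may use dots for share
    classes and may contain stray whitespace or repeated entries.
    """
    seen = set()
    cleaned = []
    for symbol in symbols:
        candidate = str(symbol).strip().upper().replace(".", "-")
        # Keep the first occurrence only, preserving the original order
        if candidate and candidate not in seen:
            seen.add(candidate)
            cleaned.append(candidate)
    return cleaned


example = to_yahoo_symbols(["MMM", "BRK.B", " abt ", "MMM"])
print(example)  # ['MMM', 'BRK-B', 'ABT']
```

In the notebook this could be applied as `tickers_sp500 = to_yahoo_symbols(tickers_sp500)` before calling the download function.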

In [4]:
# Define a function that downloads the stock data for each ticker in a list
# and collects the results in a single dictionary:
def download_data(tickers_list):
    """
    Return a dictionary where the keys are the tickers
    and the values are the data downloaded for each ticker.
    """
    tickers_data = {}  # empty dictionary

    for ticker in tickers_list:
        # Download the data for this ticker
        df = yf.download(ticker, start='2020-12-31', end='2021-11-01')

        # Move the date index into a regular "Date" column
        df = df.reset_index()

        # Add the (ticker, dataframe) pair to the main dictionary
        tickers_data[ticker] = df

    return tickers_data
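With roughly a thousand tickers, some requests may come back with no rows (for example for a delisted symbol), and the loop above would store those empty frames silently. The following is a minimal sketch of a variant that keeps track of such tickers. The names `download_data_checked`, `fetch` and `fake_fetch` are hypothetical; `fetch` stands for any callable that takes a ticker and returns a dataframe, such as a small wrapper around `yf.download` with the same start and end dates. The stand-in fetcher uses made-up numbers so that the sketch runs offline.

```python
import pandas as pd


def download_data_checked(tickers_list, fetch):
    """
    Return (tickers_data, missing): a dictionary of non-empty dataframes
    keyed by ticker, and a list of tickers for which no rows came back.
    """
    tickers_data = {}
    missing = []
    for ticker in tickers_list:
        df = fetch(ticker)
        if df is None or df.empty:
            missing.append(ticker)
            continue
        # Same post-processing as download_data: dates become a column
        tickers_data[ticker] = df.reset_index()
    return tickers_data, missing


def fake_fetch(ticker):
    """Offline stand-in for a real downloader (made-up prices)."""
    if ticker == "DELISTED":
        return pd.DataFrame()
    dates = pd.to_datetime(["2021-01-04", "2021-01-05"])
    return pd.DataFrame({"Close": [1.0, 2.0]}, index=pd.Index(dates, name="Date"))


data, missing = download_data_checked(["AAA", "DELISTED"], fake_fetch)
print(sorted(data), missing)
```

The `missing` list can then be inspected or retried separately instead of ending up as gaps in the combined dataframe.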


In [5]:
# Function call:

data_sp500 = download_data(tickers_sp500)

[*********************100%***********************]  1 of 1 completed

In [6]:
# Combine the dictionary of dataframes into a single dataframe:

combined_data_sp500 = pd.concat(data_sp500)

combined_data_sp500

Unnamed: 0,Unnamed: 1,Date,Open,High,Low,Close,Adj Close,Volume
MMM,0,2020-12-30,173.880005,174.919998,173.380005,174.110001,168.753479,1419100
MMM,1,2020-12-31,174.119995,174.869995,173.179993,174.789993,169.412537,1841300
MMM,2,2021-01-04,175.000000,176.199997,170.550003,171.869995,166.582382,2996200
MMM,3,2021-01-05,172.009995,173.250000,170.649994,171.580002,166.301315,2295300
MMM,4,2021-01-06,172.720001,175.570007,172.039993,174.190002,168.831024,3346400
...,...,...,...,...,...,...,...,...
ZTS,206,2021-10-25,208.809998,211.770004,207.100006,211.520004,211.267685,1224700
ZTS,207,2021-10-26,211.000000,211.740005,208.000000,210.520004,210.268875,1219300
ZTS,208,2021-10-27,210.259995,211.070007,209.000000,209.580002,209.330002,1625200
ZTS,209,2021-10-28,209.839996,213.419998,209.580002,212.669998,212.669998,1073700


In [7]:
# Write the original dataframe into a csv file:

combined_data_sp500.to_csv("../Tran_Dao_Data/Tran_Dao_StudC_data_sp500_src.csv")
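Because `pd.concat` on a dictionary builds a two-level index (ticker, row number), the CSV written above stores both levels as leading columns without names. The following is a minimal sketch of naming the levels before writing and restoring them when reading the file back. It uses made-up frames and an in-memory buffer instead of the project file, and the level names `Ticker` and `Row` are assumptions, not names used elsewhere in the notebook.

```python
import io

import pandas as pd

# Made-up stand-ins for the per-ticker dataframes
frames = {
    "AAA": pd.DataFrame({"Date": ["2021-01-04", "2021-01-05"], "Close": [1.0, 2.0]}),
    "BBB": pd.DataFrame({"Date": ["2021-01-04", "2021-01-05"], "Close": [3.0, 4.0]}),
}

# Dictionary keys become the outer index level; name both levels explicitly
combined = pd.concat(frames).rename_axis(["Ticker", "Row"])

# Write to an in-memory buffer (a file path works the same way)
buffer = io.StringIO()
combined.to_csv(buffer)
buffer.seek(0)

# Read back, rebuilding the two-level index and parsing the dates
restored = pd.read_csv(buffer, index_col=["Ticker", "Row"], parse_dates=["Date"])
print(restored.loc[("BBB", 1), "Close"])
```

Naming the levels avoids `Unnamed: 0` and `Unnamed: 1` columns appearing when the file is loaded later with a plain `pd.read_csv`.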

##### Download Dataset 2:

In [8]:
# Read the dataset (provided by student A) that contains the ticker names of S&P 500 non-members:
non_sp500 = pd.read_csv("../Tran_Dao_Data/Biber_Martin_StudentA_non_members_stage.csv")
non_sp500

Unnamed: 0,Ticker,High,Low,Close,Volume,Trading_Volume
0,LQD,132.80,131.90,131.90,13745400,1.813018e+09
1,VER,51.64,50.20,50.30,29651600,1.491475e+09
2,XLRN,179.90,177.10,178.80,7884200,1.409695e+09
3,BABA,136.90,133.60,133.70,9426940,1.260382e+09
4,ZM,218.60,203.40,206.00,5679784,1.170036e+09
...,...,...,...,...,...,...
495,IMMR,6.39,5.79,5.80,3231300,1.874154e+07
496,FORM,43.88,42.40,42.60,439200,1.870992e+07
497,ISBC,16.20,15.79,15.98,1170300,1.870139e+07
498,IDEX,1.80,1.68,1.70,10976200,1.865954e+07


In [9]:
# Select the column containing tickers as series and then convert it into a list:
tickers_non_sp500 = non_sp500['Ticker'].to_list()
len(tickers_non_sp500)

500

In [10]:
# Download the stock data of the non-members using the function "download_data" defined above:

data_non_sp500 = download_data(tickers_non_sp500)

[*********************100%***********************]  1 of 1 completed

In [11]:
# Combine the dictionary of dataframes into a single dataframe:

combined_data_non_sp500 = pd.concat(data_non_sp500)

combined_data_non_sp500

Unnamed: 0,Unnamed: 1,Date,Open,High,Low,Close,Adj Close,Volume
LQD,0,2020-12-30,137.889999,138.110001,137.789993,138.100006,136.487366,5343100.0
LQD,1,2020-12-31,138.039993,138.220001,137.960007,138.130005,136.517029,6932300.0
LQD,2,2021-01-04,137.889999,137.889999,137.380005,137.429993,135.825195,14891400.0
LQD,3,2021-01-05,137.059998,137.059998,136.550003,136.990005,135.390350,16294400.0
LQD,4,2021-01-06,135.779999,135.979996,135.399994,135.880005,134.293289,19136500.0
...,...,...,...,...,...,...,...,...
GAN,206,2021-10-25,15.280000,15.366000,14.820000,15.190000,15.190000,435500.0
GAN,207,2021-10-26,15.310000,15.610000,14.950000,14.980000,14.980000,369800.0
GAN,208,2021-10-27,14.840000,15.150000,14.530000,14.560000,14.560000,343200.0
GAN,209,2021-10-28,14.610000,14.740000,14.280000,14.600000,14.600000,404800.0


In [12]:
# Write the original dataframe into a csv file:

combined_data_non_sp500.to_csv("../Tran_Dao_Data/Tran_Dao_StudC_data_non_sp500_src.csv")

##### Download Dataset 3:

In [13]:
# Download the data of ticker "SPY"
data_spy = yf.download('SPY', start='2020-12-31', end='2021-11-01')
data_spy

[*********************100%***********************]  1 of 1 completed


Date,Open,High,Low,Close,Adj Close,Volume
2020-12-30,372.339996,373.100006,371.570007,371.989990,369.566559,49455300
2020-12-31,371.779999,374.660004,371.230011,373.880005,371.444244,78520700
2021-01-04,375.309998,375.450012,364.820007,368.790009,366.387390,110210800
2021-01-05,368.100006,372.500000,368.049988,371.329987,368.910828,66426200
2021-01-06,369.709991,376.980011,369.119995,373.549988,371.116394,107997700
...,...,...,...,...,...,...
2021-10-25,454.279999,455.899994,452.390015,455.549988,455.549988,45214500
2021-10-26,457.200012,458.489990,455.559998,455.959991,455.959991,56075100
2021-10-27,456.450012,457.160004,453.859985,453.940002,453.940002,72438000
2021-10-28,455.459991,458.399994,455.450012,458.320007,458.320007,51437900


In [14]:
# Write the original dataframe into a csv file:

data_spy.to_csv("../Tran_Dao_Data/Tran_Dao_StudC_data_spy_src.csv")
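Dataset 3 serves as the benchmark for the other two datasets. One plausible way to compare a stock with SPY over the download window is the total return computed from `Adj Close`. The sketch below uses made-up prices and a hypothetical ticker `AAA`; it illustrates the idea and is not necessarily the comparison used later in the project.

```python
import pandas as pd

# Made-up adjusted closing prices, one column per ticker
adj_close = pd.DataFrame(
    {"SPY": [100.0, 102.0, 110.0], "AAA": [50.0, 55.0, 60.0]},
    index=pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"]),
)

# Total return over the window: last price divided by first price, minus 1
total_return = adj_close.iloc[-1] / adj_close.iloc[0] - 1

# Return of each ticker in excess of the benchmark
excess_over_spy = total_return - total_return["SPY"]

print(total_return.round(4))
print(excess_over_spy.round(4))
```

A positive value in `excess_over_spy` means the ticker outperformed SPY over the window, and a negative value means it lagged behind.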