# Herron - Downloading Data

This notebook shows how to use the yfinance, pandas-datareader, and requests-cache packages to download data from 
    [Yahoo! Finance](https://finance.yahoo.com/), 
    [the Kenneth French Data Library](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html), 
    [Federal Reserve Economic Data (FRED)](https://fred.stlouisfed.org/), 
    and others.
For completeness, this notebook also covers saving to and reading from .csv and .pkl files.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [2]:
plt.rcParams['figure.dpi'] = 300
%precision 4
pd.options.display.float_format = '{:.4f}'.format

## The yfinance Package

The [yfinance package](https://github.com/ranaroussi/yfinance) provides "a reliable, threaded, and Pythonic way to download historical market data from Yahoo! finance."
Other packages provide similar functionality, but I think yfinance is the simplest to use.
To avoid repeated calls to Yahoo! Finance's application programming interface (API), we will use the [requests-cache package](https://github.com/requests-cache/requests-cache).
These packages should already be installed in your DataCamp Workspace environment.
If not, we can install these packages with the `%pip` magic in the following cell, which we only need to run once.
If you use a local installation of the Anaconda distribution, you can instead run `! conda install -c conda-forge yfinance requests-cache`.

In [3]:
# %pip install yfinance requests-cache

In [4]:
# ! conda install -c conda-forge yfinance requests-cache

In [5]:
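from datetime import timedelta
import yfinance as yf
import requests_cache
# Cache responses for one day; a timedelta is a documented expire_after type
session = requests_cache.CachedSession(expire_after=timedelta(days=1))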

We can download data for the FAANG stocks (Facebook, Amazon, Apple, Netflix, and Google).
We can pass tickers as either a space-delimited string or a list of strings.

In [6]:
faang = yf.download(tickers='META AMZN AAPL NFLX GOOG', session=session)

[*********************100%***********************]  5 of 5 completed


## The pandas-datareader Package

The [pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/index.html) package provides easy access to a variety of data sources, such as 
    [the Kenneth French Data Library](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) 
    and 
    [Federal Reserve Economic Data (FRED)](https://fred.stlouisfed.org/).
The pandas-datareader package also provides access to Yahoo! Finance data, but the yfinance package has better documentation.
The pandas-datareader package should already be installed in your DataCamp Workspace environment.
If not, we can install this package with the `%pip` magic in the following cell, which we only need to run once.
If you use a local installation of the Anaconda distribution, you can instead run `! conda install -c conda-forge pandas-datareader`.

In [7]:
# %pip install pandas-datareader

In [8]:
# ! conda install -c conda-forge pandas-datareader

We will use `pdr` as the abbreviated prefix for pandas-datareader.

In [9]:
import pandas_datareader as pdr

Here is an example with the daily benchmark factors from Ken French's Data Library.
The `get_available_datasets()` function provides the exact names for all of Ken French's data sets.

In [10]:
pdr.famafrench.get_available_datasets(session=session)[:5]

['F-F_Research_Data_Factors',
 'F-F_Research_Data_Factors_weekly',
 'F-F_Research_Data_Factors_daily',
 'F-F_Research_Data_5_Factors_2x3',
 'F-F_Research_Data_5_Factors_2x3_daily']

Note that pandas-datareader returns a dictionary of data frames and returns the most recent five years of data unless we specify a `start` date.
Most of French's data are available back through the second half of 1926.

In [11]:
ff = pdr.get_data_famafrench('F-F_Research_Data_Factors_daily', start='1900', session=session)
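
Because the result is a dictionary rather than a single data frame, it helps to see how to pull a table out of it. The sketch below runs offline on a made-up stand-in, `ff_toy`, which is hypothetical and only mimics the layout I would expect from `pdr.get_data_famafrench()`: integer keys for the tables plus a `'DESCR'` key with a text description. The factor values are invented. French reports returns in percent, so dividing by 100 gives decimal returns.

```python
import pandas as pd

# Hypothetical stand-in for the dictionary returned by pdr.get_data_famafrench();
# the values are invented and only the layout is meant to match.
dates = pd.date_range('1926-07-01', periods=3, freq='B', name='Date')
ff_toy = {
    0: pd.DataFrame(
        {
            'Mkt-RF': [0.10, 0.45, 0.17],
            'SMB': [-0.25, -0.33, 0.30],
            'HML': [-0.27, -0.06, -0.39],
            'RF': [0.009, 0.009, 0.009],
        },
        index=dates,
    ),
    'DESCR': 'Toy description of the data set',
}

# Inspect the keys first, then select the table we want by its key
print(list(ff_toy.keys()))

# Convert percent returns to decimal returns
factors = ff_toy[0] / 100
print(factors)
```

With the real download, the same pattern would apply to `ff` in place of `ff_toy`, assuming the keys match this layout.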

## Saving and Reading Data with .csv and .pkl Files

The most universal way to save data is to write a .csv file (i.e., a file with comma-separated values) with the `.to_csv()` method.
You may need to add a "Data" folder at the same level as your "Notebooks" folder using the "File Browser" in JupyterLab's left sidebar.

In [12]:
faang.to_csv('../../Data/FAANG.csv')

We have to pass several arguments to `pd.read_csv()` since the `faang` data frame has a column multiindex (i.e., one level of variables and another for tickers).

In [13]:
pd.read_csv('../../Data/FAANG.csv', header=[0,1], index_col=[0], parse_dates=True)

| | Adj Close | Adj Close | Adj Close | Adj Close | Adj Close | Close | Close | Close | Close | Close | ... | Open | Open | Open | Open | Open | Volume | Volume | Volume | Volume | Volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | AAPL | AMZN | GOOG | META | NFLX | AAPL | AMZN | GOOG | META | NFLX | ... | AAPL | AMZN | GOOG | META | NFLX | AAPL | AMZN | GOOG | META | NFLX |
| 1980-12-12 | 0.1000 | NaN | NaN | NaN | NaN | 0.1283 | NaN | NaN | NaN | NaN | ... | 0.1283 | NaN | NaN | NaN | NaN | 469033600 | NaN | NaN | NaN | NaN |
| 1980-12-15 | 0.0948 | NaN | NaN | NaN | NaN | 0.1217 | NaN | NaN | NaN | NaN | ... | 0.1222 | NaN | NaN | NaN | NaN | 175884800 | NaN | NaN | NaN | NaN |
| 1980-12-16 | 0.0879 | NaN | NaN | NaN | NaN | 0.1127 | NaN | NaN | NaN | NaN | ... | 0.1133 | NaN | NaN | NaN | NaN | 105728000 | NaN | NaN | NaN | NaN |
| 1980-12-17 | 0.0900 | NaN | NaN | NaN | NaN | 0.1155 | NaN | NaN | NaN | NaN | ... | 0.1155 | NaN | NaN | NaN | NaN | 86441600 | NaN | NaN | NaN | NaN |
| 1980-12-18 | 0.0926 | NaN | NaN | NaN | NaN | 0.1189 | NaN | NaN | NaN | NaN | ... | 0.1189 | NaN | NaN | NaN | NaN | 73449600 | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2022-09-12 | 163.4300 | 136.4500 | 111.8700 | 168.9600 | 236.5300 | 163.4300 | 136.4500 | 111.8700 | 168.9600 | 236.5300 | ... | 159.5900 | 134.1000 | 111.9900 | 167.3900 | 233.6100 | 104956000 | 53826900.0000 | 19732900.0000 | 23220400.0000 | 6047400.0000 |
| 2022-09-13 | 153.8400 | 126.8200 | 105.3100 | 153.1300 | 218.1300 | 153.8400 | 126.8200 | 105.3100 | 153.1300 | 218.1300 | ... | 159.9000 | 131.0100 | 108.8900 | 161.5400 | 226.5000 | 122656600 | 72694000.0000 | 33015000.0000 | 44444100.0000 | 8000100.0000 |
| 2022-09-14 | 155.3100 | 128.5500 | 105.8700 | 151.4700 | 224.1200 | 155.3100 | 128.5500 | 105.8700 | 151.4700 | 224.1200 | ... | 154.7900 | 127.3600 | 105.4400 | 153.3300 | 219.8200 | 87965400 | 45316800.0000 | 22115800.0000 | 43064200.0000 | 8230300.0000 |
| 2022-09-15 | 152.3700 | 126.2800 | 103.9000 | 149.5500 | 235.3800 | 152.3700 | 126.2800 | 103.9000 | 149.5500 | 235.3800 | ... | 154.6500 | 127.3800 | 105.0100 | 149.8000 | 230.4700 | 90481100 | 52887200.0000 | 26494900.0000 | 34606300.0000 | 19454100.0000 |
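
To see why `pd.read_csv()` needs the `header` and `index_col` arguments here, consider this minimal sketch on a small data frame. The `toy` frame is a hypothetical stand-in for `faang` that reuses a few AAPL and AMZN values from the output above, and an in-memory buffer replaces the file on disk so the example runs without a "Data" folder.

```python
import io

import numpy as np
import pandas as pd

# Toy frame with two column levels (variable, ticker), like the faang data frame
dates = pd.date_range('2022-09-12', periods=3, freq='D', name='Date')
toy = pd.DataFrame(
    {
        ('Close', 'AAPL'): [163.43, 153.84, 155.31],
        ('Close', 'AMZN'): [136.45, 126.82, 128.55],
        ('Volume', 'AAPL'): [104956000, 122656600, 87965400],
        ('Volume', 'AMZN'): [53826900, 72694000, 45316800],
    },
    index=dates,
)

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer)

# Default arguments: the ticker row and the index-name row are read as data
buffer.seek(0)
naive = pd.read_csv(buffer)

# header=[0, 1] rebuilds the two column levels and index_col=0 restores the dates
buffer.seek(0)
roundtrip = pd.read_csv(buffer, header=[0, 1], index_col=0, parse_dates=True)

print(naive.shape, naive.columns.nlevels)
print(roundtrip.shape, roundtrip.columns.nlevels)
```

The default read keeps a single column level and treats the two extra header rows as data, while the second read recovers the original layout.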


We can use a .pkl file to save and read a pandas object as-is.
These .pkl files are easier to use than .csv files but less universal.

In [14]:
faang.to_pickle('../../Data/FAANG.pkl')

In [15]:
pd.read_pickle('../../Data/FAANG.pkl')

| | Adj Close | Adj Close | Adj Close | Adj Close | Adj Close | Close | Close | Close | Close | Close | ... | Open | Open | Open | Open | Open | Volume | Volume | Volume | Volume | Volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | AAPL | AMZN | GOOG | META | NFLX | AAPL | AMZN | GOOG | META | NFLX | ... | AAPL | AMZN | GOOG | META | NFLX | AAPL | AMZN | GOOG | META | NFLX |
| 1980-12-12 | 0.1000 | NaN | NaN | NaN | NaN | 0.1283 | NaN | NaN | NaN | NaN | ... | 0.1283 | NaN | NaN | NaN | NaN | 469033600 | NaN | NaN | NaN | NaN |
| 1980-12-15 | 0.0948 | NaN | NaN | NaN | NaN | 0.1217 | NaN | NaN | NaN | NaN | ... | 0.1222 | NaN | NaN | NaN | NaN | 175884800 | NaN | NaN | NaN | NaN |
| 1980-12-16 | 0.0879 | NaN | NaN | NaN | NaN | 0.1127 | NaN | NaN | NaN | NaN | ... | 0.1133 | NaN | NaN | NaN | NaN | 105728000 | NaN | NaN | NaN | NaN |
| 1980-12-17 | 0.0900 | NaN | NaN | NaN | NaN | 0.1155 | NaN | NaN | NaN | NaN | ... | 0.1155 | NaN | NaN | NaN | NaN | 86441600 | NaN | NaN | NaN | NaN |
| 1980-12-18 | 0.0926 | NaN | NaN | NaN | NaN | 0.1189 | NaN | NaN | NaN | NaN | ... | 0.1189 | NaN | NaN | NaN | NaN | 73449600 | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2022-09-12 | 163.4300 | 136.4500 | 111.8700 | 168.9600 | 236.5300 | 163.4300 | 136.4500 | 111.8700 | 168.9600 | 236.5300 | ... | 159.5900 | 134.1000 | 111.9900 | 167.3900 | 233.6100 | 104956000 | 53826900.0000 | 19732900.0000 | 23220400.0000 | 6047400.0000 |
| 2022-09-13 | 153.8400 | 126.8200 | 105.3100 | 153.1300 | 218.1300 | 153.8400 | 126.8200 | 105.3100 | 153.1300 | 218.1300 | ... | 159.9000 | 131.0100 | 108.8900 | 161.5400 | 226.5000 | 122656600 | 72694000.0000 | 33015000.0000 | 44444100.0000 | 8000100.0000 |
| 2022-09-14 | 155.3100 | 128.5500 | 105.8700 | 151.4700 | 224.1200 | 155.3100 | 128.5500 | 105.8700 | 151.4700 | 224.1200 | ... | 154.7900 | 127.3600 | 105.4400 | 153.3300 | 219.8200 | 87965400 | 45316800.0000 | 22115800.0000 | 43064200.0000 | 8230300.0000 |
| 2022-09-15 | 152.3700 | 126.2800 | 103.9000 | 149.5500 | 235.3800 | 152.3700 | 126.2800 | 103.9000 | 149.5500 | 235.3800 | ... | 154.6500 | 127.3800 | 105.0100 | 149.8000 | 230.4700 | 90481100 | 52887200.0000 | 26494900.0000 | 34606300.0000 | 19454100.0000 |
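
As a quick check on the "as-is" claim for .pkl files, here is a minimal sketch that pickles a small data frame and reads it back. The `toy` frame is a hypothetical stand-in for `faang` built from a few AAPL values shown above, and a temporary directory replaces the "Data" folder so the example runs anywhere.

```python
import os
import tempfile

import pandas as pd

# Toy frame with two column levels and mixed dtypes (float prices, integer volume)
dates = pd.date_range('2022-09-12', periods=3, freq='D', name='Date')
toy = pd.DataFrame(
    {
        ('Close', 'AAPL'): [163.43, 153.84, 155.31],
        ('Volume', 'AAPL'): [104956000, 122656600, 87965400],
    },
    index=dates,
)

# Save to a temporary .pkl file and read it back with no extra arguments
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy.pkl')
    toy.to_pickle(path)
    restored = pd.read_pickle(path)

# The column levels, the date index, and the dtypes all survive the round trip
print(restored.equals(toy))
print(restored.dtypes)
```

Unlike the .csv case, `pd.read_pickle()` needs no `header` or `index_col` arguments because the pickle stores the pandas object itself. The trade-off is that .pkl files are specific to Python and should only be read from sources you trust.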
