### Importing Finance Data


#### Pandas-Datareader Module

https://pandas-datareader.readthedocs.io/en/latest/index.html

In [1]:
from pandas_datareader import wb
from datetime import datetime

In [2]:
# get gdp data from the World Bank API. 

start = datetime(2005, 1, 1)
end = datetime(2008, 1, 1)
indicator_id = 'NY.GDP.PCAP.KD'

gdp_per_capita = wb.download(indicator=indicator_id, start=start, end=end, country=['US', 'CA', 'TUR'])

print(gdp_per_capita)

                    NY.GDP.PCAP.KD
country       year                
Canada        2008    48497.560167
              2007    48536.539413
              2006    48014.931197
              2005    47257.472197
Turkey        2008    10602.856580
              2007    10640.422825
              2006    10252.360487
              2005     9692.120586
United States 2008    49319.478865
              2007    49856.281491
              2006    49405.767296
              2005    48499.812376


In [3]:
from pandas_datareader.nasdaq_trader import get_nasdaq_symbols

symbols = get_nasdaq_symbols()
print(symbols.head())

        Nasdaq Traded                                      Security Name  \
Symbol                                                                     
A                True            Agilent Technologies, Inc. Common Stock   
AA               True                    Alcoa Corporation Common Stock    
AAAU             True                       Perth Mint Physical Gold ETF   
AACG             True  ATA Creativity Global - American Depositary Sh...   
AADR             True                AdvisorShares Dorsey Wright ADR ETF   

       Listing Exchange Market Category    ETF  Round Lot Size  Test Issue  \
Symbol                                                                       
A                     N                  False           100.0       False   
AA                    N                  False           100.0       False   
AAAU                  P                   True           100.0       False   
AACG                  Q               G  False           100.0       False   

One API that does just that is the Federal Reserve Bank of St. Louis (FRED), which we can access by first importing the pandas_datareader.data sub-module and then calling it’s DataReader function:

The DataReader function takes 4 arguments:

    'MORTGAGE30US' - An identifier provided by the API specifying the data we want back, in this case 30 year mortgage data in the US
    'fred' - The name of the API we want to access
    start_date, end_date - The date range we want the data to be from


In [4]:
import pandas_datareader.data as web

start = datetime(2019,12,1) # year, month, day
end = datetime(2019,12,8)
web.DataReader('SP500', 'fred', start, end)

Unnamed: 0_level_0,SP500
DATE,Unnamed: 1_level_1
2019-12-02,3113.87
2019-12-03,3093.2
2019-12-04,3112.76
2019-12-05,3117.43
2019-12-06,3145.91


In [5]:
#Tiingo

import pandas_datareader as dr 

goog = dr.get_data_tiingo('GOOG', api_key='e207521e103ad24d2d03e9ea0bb934f1acf7fadc')
goog.tail()


Unnamed: 0_level_0,Unnamed: 1_level_0,adjClose,adjHigh,adjLow,adjOpen,adjVolume,close,divCash,high,low,open,splitFactor,volume
symbol,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
GOOG,2019-12-02 00:00:00+00:00,1289.92,1305.83,1281.0,1301.0,1511851,1289.92,0.0,1305.83,1281.0,1301.0,1.0,1511851
GOOG,2019-12-03 00:00:00+00:00,1295.28,1298.46,1279.0,1279.57,1268647,1295.28,0.0,1298.46,1279.0,1279.57,1.0,1268647
GOOG,2019-12-04 00:00:00+00:00,1320.54,1325.8,1304.87,1307.01,1538110,1320.54,0.0,1325.8,1304.87,1307.01,1.0,1538110
GOOG,2019-12-05 00:00:00+00:00,1328.13,1329.36,1316.44,1328.0,1212818,1328.13,0.0,1329.36,1316.44,1328.0,1.0,1212818
GOOG,2019-12-06 00:00:00+00:00,1340.62,1344.0,1333.44,1333.44,1315510,1340.62,0.0,1344.0,1333.44,1333.44,1.0,1315510


In [6]:
#shift
start = datetime(2008,1,1)
end = datetime(2018,1,1)

gdp = web.DataReader('GDP', 'fred', start, end)
gdp['growth'] = gdp['GDP'] - gdp['GDP'].shift(1)
print(gdp)

                  GDP   growth
DATE                          
2008-01-01  14651.039      NaN
2008-04-01  14805.611  154.572
2008-07-01  14835.187   29.576
2008-10-01  14559.543 -275.644
2009-01-01  14394.547 -164.996
2009-04-01  14352.850  -41.697
2009-07-01  14420.312   67.462
2009-10-01  14628.021  207.709
2010-01-01  14721.350   93.329
2010-04-01  14926.098  204.748
2010-07-01  15079.917  153.819
2010-10-01  15240.843  160.926
2011-01-01  15285.828   44.985
2011-04-01  15496.189  210.361
2011-07-01  15591.850   95.661
2011-10-01  15796.460  204.610
2012-01-01  16019.758  223.298
2012-04-01  16152.257  132.499
2012-07-01  16257.151  104.894
2012-10-01  16358.863  101.712
2013-01-01  16569.591  210.728
2013-04-01  16637.926   68.335
2013-07-01  16848.748  210.822
2013-10-01  17083.137  234.389
2014-01-01  17104.555   21.418
2014-04-01  17432.909  328.354
2014-07-01  17721.657  288.748
2014-10-01  17849.912  128.255
2015-01-01  17984.178  134.266
2015-04-01  18219.405  235.227
2015-07-

#### Variance

Variance measures how far a set of numbers are spread out from their average. In finance, this is used to determine the volatility of investments.

#### Covariance

Covariance, in a financial context, describes the relationship between the returns on two different investments over a period of time, and can be used to help balance a portfolio.

The larger the number the more investments tend to move up and down at the same time.

In [7]:
#Thrift Savings Plan (TSP) API
import pandas_datareader.tsp as tsp

start = datetime(2009, 1, 1)
end = datetime(2019, 1, 1)

tsp_data = tsp.TSPReader(start,end).read()

In [8]:
print(tsp_data.var())

L Income    0.094226
L 2020      0.254266
L 2030      0.951525
L 2040      1.483319
L 2050      0.606738
G Fund      0.009592
F Fund      0.313593
C Fund      4.651628
S Fund      3.785746
I Fund      1.166750
dtype: float64


In [9]:
print(tsp_data.cov())

          L Income    L 2020    L 2030    L 2040    L 2050    G Fund  \
L Income  0.094226  0.154570  0.295889  0.368095  0.234595  0.028736   
L 2020    0.154570  0.254266  0.488493  0.608224  0.387923  0.046528   
L 2030    0.295889  0.488493  0.951525  1.187745  0.759189  0.085957   
L 2040    0.368095  0.608224  1.187745  1.483319  0.948508  0.106166   
L 2050    0.234595  0.387923  0.759189  0.948508  0.606738  0.067241   
G Fund    0.028736  0.046528  0.085957  0.106166  0.067241  0.009592   
F Fund    0.152820  0.246218  0.443143  0.544630  0.343415  0.053429   
C Fund    0.655163  1.080896  2.100297  2.620580  1.674410  0.191530   
S Fund    0.511008  0.860671  1.741164  2.190495  1.409628  0.129621   
I Fund    0.305641  0.507356  1.014727  1.272304  0.816339  0.082888   

            F Fund    C Fund    S Fund    I Fund  
L Income  0.152820  0.655163  0.511008  0.305641  
L 2020    0.246218  1.080896  0.860671  0.507356  
L 2030    0.443143  2.100297  1.741164  1.014727  
L 2

#### Review

+ Python is able to import financial data from csv files as well as public financial APIs.
+ The pandas read_csv function can be used to import data from a csv file into a pandas dataframe.
+ Pandas-datareader makes it easy to import data from public financial APIs.
+ Python’s datetime function can be used to create datetime objects which are often used to specify time ranges for financial data.
+ API keys are unique identifiers required for some APIs in order to access data.
+ Sometimes APIs can be flaky. To mitigate the damage this might cause it’s best to test your code often and keep up to date with the pandas-datareader documentation and GitHub page.
+ The shift function can be used on the rows in a DataFrame column to shift them up or down.
+ Pandas provides common statistical functions like var and cov to make it easy to calculate variance and covariance on a dataset.
