## World Bank Module
**Table of Contents**
1. Set up the data connection to the World Bank Climate API using the _wbpy_ library. Note: country_codes is the list of ISO country codes used by the World Bank; the codes act as 'keys' to its database
2. How is the data stored? Preliminary processing and a naive implementation of grab_worldbank()
3. Final version of grab_worldbank()
4. Example uses

**Note: wbpy must be installed before running this notebook.**
- wbpy: https://github.com/mattduck/wbpy
    * Not an official Python library for the World Bank, but it is well documented and well used, and it handles the intricacies of interfacing with the World Bank RESTful API.
    * Our project will unit test every function that uses the wbpy library.
    
**The functions grab_worldbank() and test_output_grab_worldbank() are also stored in grab_worldbank.py.**

## 1 & 2. Data connection and naive grab_worldbank()

In [10]:
import wbpy
import pandas as pd
import sys

country_codes = pd.read_csv('country_iso_codes.csv')['CODE'].dropna()
climate_api = wbpy.ClimateAPI()



In [9]:
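def grab_worldbank():
    """
    Fetch yearly temperature data ('tas') from the World Bank Climate API
    for every country code listed in country_iso_codes.csv.
    
    Arguments:
    - Currently None
    
    Returns:
    - dict: maps each country code to the result object returned by
      climate_api.get_instrumental(); codes that raise an error are skipped.
    
    """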
    climate_api = wbpy.ClimateAPI()
    country_codes = pd.read_csv('country_iso_codes.csv')['CODE'].dropna()
    dataset = {}
    for code in country_codes: 
        try:       
            dataset[code]=climate_api.get_instrumental(data_type='tas',interval='year',locations=[code])
        except Exception as e: 
            # Report the failing code together with the error, then move on
            print("Encountered Error Getting Data for Code: {} ({})".format(code, e))
            continue
    return dataset
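
Because grab_worldbank() creates its own wbpy.ClimateAPI() client, it can only be run with network access. One possible way to make the loop unit-testable is to pass the client in as an argument. The sketch below assumes that design: the names `fetch_temperatures`, `fake_get_instrumental`, and `stub_api` are hypothetical and are not part of wbpy or of this project, and only the `get_instrumental(...)` keyword arguments are taken from the cell above.

```python
from unittest.mock import Mock


def fetch_temperatures(api, codes):
    """Hypothetical variant of the loop above with the API client injected."""
    dataset, failed = {}, []
    for code in codes:
        try:
            dataset[code] = api.get_instrumental(
                data_type='tas', interval='year', locations=[code])
        except Exception:
            # Record the failing code instead of only printing it.
            failed.append(code)
    return dataset, failed


def fake_get_instrumental(data_type, interval, locations):
    """Stub for the real client; the made-up code 'XXX' simulates missing data."""
    code = locations[0]
    if code == 'XXX':
        raise ValueError("no data")
    result = Mock()
    result.as_dict.return_value = {code: {'1901': 13.096678}}
    return result


# The stub replaces wbpy.ClimateAPI(), so no network access is needed.
stub_api = Mock()
stub_api.get_instrumental.side_effect = fake_get_instrumental

dataset, failed = fetch_temperatures(stub_api, ['AFG', 'XXX'])
print(sorted(dataset), failed)
```

Returning the failed codes alongside the data lets a test assert on them directly, rather than inspecting printed output.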



In [5]:
#Run it 
wb_data = grab_worldbank()
print(type(wb_data))

<class 'dict'>


Let's see how we can use the wb_data dict variable...

In [7]:

for key in wb_data.keys():
    country_data = wb_data[key].as_dict()[key]
    print("Temperature Data for {}".format(key))
    for year_key in country_data.keys():
        print("...{} has temperature value of {} C".format(year_key, country_data[year_key]))
    break #so we don't output the whole dataset...

Temperature Data for AFG
...1901 has temperature value of 13.096678 C
...1902 has temperature value of 13.309858 C
...1903 has temperature value of 12.064352 C
...1904 has temperature value of 12.619226 C
...1905 has temperature value of 12.437119 C
...1906 has temperature value of 12.978949 C
...1907 has temperature value of 12.276689 C
...1908 has temperature value of 12.527205 C
...1909 has temperature value of 12.797849 C
...1910 has temperature value of 12.057898 C
...1911 has temperature value of 12.295179 C
...1912 has temperature value of 12.9023695 C
...1913 has temperature value of 12.642592 C
...1914 has temperature value of 13.308415 C
...1915 has temperature value of 13.922549 C
...1916 has temperature value of 12.5204525 C
...1917 has temperature value of 13.175219 C
...1918 has temperature value of 11.86811 C
...1919 has temperature value of 12.038753 C
...1920 has temperature value of 11.508769 C
...1921 has temperature value of 12.502941 C
...1922 has temperature value
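
The nested layout printed above (country code, then year, then temperature) can be summarized with plain Python. The following is a minimal sketch on a hand-written sample rather than a live API call: the three AFG values are copied from the output above, the year keys are assumed to be strings, and the helper name `summarize` is hypothetical.

```python
# Hand-written sample shaped like wb_data[key].as_dict(): {code: {year: temperature}}.
# Only the first three AFG rows from the output above are included.
sample = {'AFG': {'1901': 13.096678, '1902': 13.309858, '1903': 12.064352}}


def summarize(yearly):
    """Hypothetical helper: mean temperature plus the warmest and coldest years."""
    temps = list(yearly.values())
    return {
        'mean': sum(temps) / len(temps),
        'warmest_year': max(yearly, key=yearly.get),
        'coldest_year': min(yearly, key=yearly.get),
    }


for code, yearly in sample.items():
    stats = summarize(yearly)
    print("{}: mean {:.2f} C, warmest {}, coldest {}".format(
        code, stats['mean'], stats['warmest_year'], stats['coldest_year']))
```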

## 3. Convert the data retrieval mechanism above into a fuller Python function, expanding on the earlier grab_worldbank()

In [114]:
def grab_worldbank(select_countries=None, start_date=None, end_date=None):
    """
    Fetch yearly historical temperature data from the World Bank Climate API.
    
    Arguments:
        select_countries (list): List of countries from which to obtain temperature data. 
                                Default 'None' means obtain from all countries
        start_date (int): Starting date in yyyy integer format for the historical data. 
                          Default 'None' means 1901, the earliest available year on the World Bank dataset
        end_date (int): Ending date in yyyy integer format. Default 'None' means 2015. 
    
    Returns:
    - Pandas DataFrame: one row per country and year, with columns
      'Country', 'Code', 'Year', and 'Temperature'.
    
    """
    # Imports come after the docstring so that Python treats the string above as the docstring
    import wbpy
    import pandas as pd
    import sys
    import numpy as np
    
    # Argument/type checking
    if (  (select_countries is not None and not isinstance(select_countries,list)) or 
          (start_date is not None and not isinstance(start_date,int)) or 
           (end_date is not None and not isinstance(end_date, int))
        ) :
        raise ValueError("Error: Arguments are of incorrect datatype!")
    
    
    if (end_date is not None and end_date > 2015): 
        raise ValueError("Error: Ending date cannot exceed 2015")
        
    if (start_date is None):
        start_date = 1901
    if (end_date is None):
        end_date = 2015
    
    # Dictionary to record country codes that do NOT have data
    err_dict = {}
    
    # Instantiate the API interface using the wbpy package
    climate_api = wbpy.ClimateAPI()
    
    # Read in the country names and codes (taken from the World Bank website) from the local CSV
    iso_codes_df = pd.read_csv('country_iso_codes.csv').dropna()
    codes_list = np.array(iso_codes_df['CODE'])
    country_list = np.array(iso_codes_df['COUNTRY'])

    
    # Build the (country, code) pairs to fetch
    if (select_countries is not None):
        code_country_pairs = [(country, code) for country, code in zip(country_list, codes_list) if country in select_countries]
    else: 
        code_country_pairs = list(zip(country_list, codes_list))
    data_len = len(code_country_pairs)
    dataset = {}
    for k in range(0,data_len):
        code = code_country_pairs[k][1]
        try:       
            dataset[code]=climate_api.get_instrumental(data_type='tas',interval='year',locations=[code])
        except Exception as e: 
            print("Warning: Data Does Not Exist for Country Code: {}".format(code))
            # Record the failing country code so it is skipped below
            err_dict[code] = 1
            continue
    
    # Create a pandas DataFrame of the data
    
    df = pd.DataFrame(columns=['Country','Code','Year','Temperature'])
    
    count = 0
    for k in range(0,data_len):
        c_name,c_code = code_country_pairs[k][0], code_country_pairs[k][1]
        if (c_code not in err_dict): 
            country_data = dataset[c_code].as_dict()[c_code]
            for year_key in country_data.keys():
                if (int(year_key) >= start_date and int(year_key) <= end_date):
                    df.loc[count] = [c_name, c_code,year_key, country_data[year_key]]
                    count += 1
    return df
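
Filling the DataFrame one row at a time with `df.loc[count] = ...` works, but it can become slow once all countries and years are included. A common alternative is to collect the rows in a list and build the DataFrame once. The sketch below shows that pattern under a few assumptions: `build_frame` is a hypothetical helper, its input is the plain `{code: {year: temperature}}` dict (what `.as_dict()` returns per country), the sample values are copied from the United States output in Section 4, and `'Nowhere'`/`'XXX'` is a made-up entry standing in for a country without data. Unlike the function above, it stores `Year` as an integer.

```python
import pandas as pd


def build_frame(yearly_by_code, code_country_pairs, start_date=1901, end_date=2015):
    """Hypothetical helper: flatten {code: {year: temp}} into one DataFrame."""
    rows = []
    for c_name, c_code in code_country_pairs:
        # Countries with no data are simply absent from the dict.
        for year_key, temp in yearly_by_code.get(c_code, {}).items():
            year = int(year_key)
            if start_date <= year <= end_date:
                rows.append((c_name, c_code, year, temp))
    # Build the DataFrame once, instead of growing it row by row.
    return pd.DataFrame(rows, columns=['Country', 'Code', 'Year', 'Temperature'])


sample = {'USA': {'1901': 6.618749, '1902': 6.464327, '1903': 6.073844}}
pairs = [('United States', 'USA'), ('Nowhere', 'XXX')]  # 'XXX' has no data
df = build_frame(sample, pairs, start_date=1902)
print(df)
```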
   

## 4. Example Usage

In [115]:
def test_output_grab_worldbank():
    """Get all data for only United States"""
    print("Get all data for only United States")
    select = ['United States']
    wb_data_df = grab_worldbank(select_countries=select)
    print(wb_data_df.head(30))
    print("------------------------------------------------------------")

    """Get all data for only United States, Burkina Faso, and Georgia"""
    print("Get all data for only United States, Burkina Faso, and Georgia")
    select = ['United States','Burkina Faso', 'Georgia']
    wb_data_df = grab_worldbank(select_countries=select)
    print(wb_data_df.head(30))
    print("------------------------------------------------------------")

    """Get data from 2000 through the default end date for only United States, Burkina Faso, and Georgia"""
    print("Get data from 2000 through the default end date for only United States, Burkina Faso, and Georgia")
    select = ['United States','Burkina Faso', 'Georgia']
    wb_data_df = grab_worldbank(select_countries=select, start_date = 2000)
    print(wb_data_df.head(30))
    print("------------------------------------------------------------")

    """Get only 1910-1915 data for only United States, Burkina Faso, and Georgia"""
    print("Get only 1910-1915 data for only United States, Burkina Faso, and Georgia")
    select = ['United States','Burkina Faso', 'Georgia']
    wb_data_df = grab_worldbank(select_countries=select, start_date = 1910, end_date=1915)
    print(wb_data_df.head(30))
    print("------------------------------------------------------------")

    """Get all data for all countries"""
    print("Get all data for all countries")
    wb_data_df = grab_worldbank()
    print(wb_data_df.tail(20))
    print("------------------------------------------------------------")
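
test_output_grab_worldbank() prints results for visual inspection. For automated checks, assertions on the returned DataFrame may be more useful. The following is a minimal sketch, assuming the column layout returned by grab_worldbank(): `check_worldbank_frame` is a hypothetical helper, and the sample rows are copied from the United States output in this section, so no API call is made.

```python
import pandas as pd


def check_worldbank_frame(df, start_date=1901, end_date=2015):
    """Hypothetical validator for a DataFrame shaped like grab_worldbank()'s output."""
    assert list(df.columns) == ['Country', 'Code', 'Year', 'Temperature']
    # Years may be stored as strings or ints, so normalize before comparing.
    years = df['Year'].astype(int)
    assert years.between(start_date, end_date).all()
    # Each (Code, Year) pair should appear at most once.
    assert not df.duplicated(subset=['Code', 'Year']).any()
    return True


sample = pd.DataFrame(
    [('United States', 'USA', 1910, 6.572676),
     ('United States', 'USA', 1911, 6.600330)],
    columns=['Country', 'Code', 'Year', 'Temperature'],
)
print(check_worldbank_frame(sample, start_date=1910, end_date=1915))
```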

In [None]:
test_output_grab_worldbank()

Get all data for only United States
          Country Code  Year  Temperature
0   United States  USA  1901     6.618749
1   United States  USA  1902     6.464327
2   United States  USA  1903     6.073844
3   United States  USA  1904     6.149883
4   United States  USA  1905     6.599617
5   United States  USA  1906     6.522858
6   United States  USA  1907     6.326528
7   United States  USA  1908     6.698544
8   United States  USA  1909     6.037902
9   United States  USA  1910     6.572676
10  United States  USA  1911     6.600330
11  United States  USA  1912     6.377984
12  United States  USA  1913     6.598086
13  United States  USA  1914     6.960702
14  United States  USA  1915     6.805295
15  United States  USA  1916     6.159147
16  United States  USA  1917     5.670571
17  United States  USA  1918     6.365317
18  United States  USA  1919     6.488107
19  United States  USA  1920     6.091616
20  United States  USA  1921     7.385264
21  United States  USA  1922     6.65653