## Vignette for WHO_GHO_API_client package 

### by Toby Law

In [1]:
# ==============================================================+
# Author    : Toby Law                                          |
# Class     : MODERN DATA STRUCTURES (G5072)                    |
# Assignment: Final Project                                     |
# Date      : 12/10/2021 - 18/01/2022                           |
# ==============================================================+

## API client for WHO GHO OData API
*****

## Importing packages and dependencies

In [563]:
import os
import requests
import json
import pandas as pd
import re
import numpy as np

## Introduction

#### Data description
The World Health Organization Global Health Observatory hosts a database containing global health indicator measurements and statistics content, and the GHO OData API provides a query interface.

API documentation: [https://www.who.int/data/gho/info/gho-odata-api](https://www.who.int/data/gho/info/gho-odata-api)

Base URL: [https://ghoapi.azureedge.net/api/](https://ghoapi.azureedge.net/api/)

## Overview of the WHO_GHO_API_client package functions:

The database contains measurement for indicators of health (eg. immunization coverage, air pollution attributable deaths), as well as a plethora of dimensions in which these indicators are measured (eg. age, sex, country/region, income). 
To make it easier to obtain data without directly interacting with the API, this package aims to provide a API wrapper/client that allows the user to view and search indicators and dimensions available in the database, select the dimensions and indicators they are interested in, and easily download the data they need in form of a filtered pandas dataframe, which can be exported as csv files if needed.

The package will be broken down into the following 5 main and accessory functions, and executed accordingly:

* `get_indicators()`: returns all possible indicators of global health in the database, with the option of implementing a filtered search on the indicators names retrieved.

* `get_records()`: retrieves indicator records based on selected dimension filters, which depends on
     - `query_parser()`: Allows user to specify dimensions and indicators they would like to retrieve from the database in the arguments, and parse them into the format that could be submitted via a get request.

* `get_dimensions()`: returns all possible dimensions of measurement in the database, with the option of implementing a regex search on the dimensions retrieved by Function 3. 

* `get_dimension_values()`: implement a regex search on the values of a specific dimension. 


## Package functions

#### `get_indicators()`: Retrieve dataframe of indicators codes and description, and search through indicator names.

In [453]:
def get_indicators(IndicatorName = 'all'):
    """
    Retrieves by default a dataframe of indicators codes and description available on the GHO database, 
    and provides a simple functionality to filter for indicator names of interest.
    The search function is case-insensitive.
    Returns a dash symbol when no records are found. 
    
    Parameters
    ----------
    IndicatorName : str
        A search word which we want to retrieve relevant indicators for. 

    Returns
    -------
    pandas.core.frame.DataFrame
        A dataframe containing the requested indicators recorded in the GHO database.
        Consists of indicator codes and descriptions.

    Examples
    --------
    >>> get_indicators(IndicatorName = 'female')
    returns a dataframe of all indicators containing "female" in the description.
    """
    if IndicatorName == 'all':
        url = "https://ghoapi.azureedge.net/api/Indicator"
    elif IndicatorName != 'all': 
        url = "https://ghoapi.azureedge.net/api/Indicator?$filter=contains(IndicatorName," + "'{}'".format(IndicatorName) + ")" 
    r = requests.get(url)
    indicators = r.json()
    return pd.DataFrame(indicators['value'])

### Demonstration of `get_indicators()` function
Getting all indicators in the database.

In [565]:
get_indicators()

Unnamed: 0,IndicatorCode,IndicatorName,Language
0,AIR_10,Ambient air pollution attributable DALYs per ...,EN
1,AIR_11,Household air pollution attributable deaths,EN
2,AIR_12,Household air pollution attributable deaths in...,EN
3,AIR_13,Household air pollution attributable deaths pe...,EN
4,AIR_14,Household air pollution attributable deaths p...,EN
...,...,...,...
2237,Adult_nonsmoked_svy_yr,Year of latest adult prevalence survey (nonsmo...,EN
2238,Adult_daily_e-cig,Prevalence of daily e-cigarette use among adul...,EN
2239,P_compl_all_sfe,Overall compliance with regulations on smoke-f...,EN
2240,GASPTICFM,Number of isolates tested for cefixime,EN


Search for indicators containing "FEMALE" in its description.

In [543]:
get_indicators(IndicatorName = 'female')

Unnamed: 0,IndicatorCode,IndicatorName,Language
0,DEVICES23,Total density per million females aged from 50...,EN
1,NTD_LEPR11,Number of female leprosy new cases,EN
2,SH_STA_FGMS,Proportion of girls and women aged 15-49 years...,EN


If the search does not yield any valid indicators within the database, a dash will be returned. 

In [455]:
get_indicators(IndicatorName = 'jibberish')

#### `get_records()`: Return dataframe of records for indicators, with filters implemented and summary of recorded dimensions (with query parsing by Function 4):

* includes optional argument whether to output final dataframe as csv.
* generates a summary of geographical, temporal information plus dimensions recorded for this indicator.

In [580]:
def get_records(indicator_code = "AIR_11", spatial_dimension = '', country = '', temporal_dimension = '', year = '', 
                filter_1 = '', filter_1_value = '', 
                filter_2 = '', filter_2_value = '', 
                filter_3 = '', filter_3_value = '', 
                summary = True, to_csv = True, 
                csv_name = "output.csv"):
    """
    Returns a dataframe of records for an indicator of choice, optionally with geographical, temporal and demographical 
    filters implemented.
    Allows for the option to generate a summary of recorded dimensions for the indicator.
    The pulled dataframe can also be saved to csv. 
    
    Parameters
    ----------
    indicator_code : str
        The GHO database unique identifier of the indicator we want to obtain records for. 
        Run get_indicators() function and refer to the IndicatorCode column.
        
    spatial_dimension : str
        (Optional) If the chosen indicator has records for more than one type of spatial dimension (eg. records for a single
        country and continent), this argument can be used to filter records for the desired one.
    
    country : str
        (Optional) If the chosen indicators has records on a country level, this argument can be used to filter records for a country
        of interest.
    
    temporal_dimension : str
        (Optional) If the chosen indicator has records for more than one type of temporal dimension (eg. month of the year vs. 
        entire year), this argument can be used to filter records for the desired one.
    
    year : str
        (Optional) If the chosen indicators has records on a yearly basis, this argument can be used to filter for a year
        of interest.
        
    filter_1/filter_2/filter_3 : str
        (Optional) Additional demographical dimensions to filter indicator records by. 
        Refer to full indicator dataframe or summary for filtering options.
    
    filter_1/filter_2/filter_3 value : str 
        (Optional) Dimension values to filter indicator records by. 
        Refer to full indicator dataframe or summary for filtering options.
        
    summary : bool
        If true, generates summary dictionary of the recorded dimensions for the indicator of choice.
    
    to_csv : bool
        If true, saves the resultant indicator entries dataframe to a local csv file.
    
    csv_name : str
        Name for output csv file.

    Returns
    -------
    pandas.core.frame.DataFrame
        Containing the requested indicator entries.
        
    dict
        Summarizing the values in each column of the output dataframe.
        Provides a overview of what dimensions are measured for a certain indicator, to use as guidance for further filtering.
    
    csv
        Output dataframe saved locally. 

    Examples
    --------
    >>> get_records(indicator_code = 'AIR_11', to_csv = True, 
                    csv_name = 'Household air pollution attributable deaths.csv', country = "USA")
            Returns a dataframe and csv file with entries for the AIR_11 indicator, filtered for records about the USA only.
    """
    # Calling the query_parser function to format a query url using submitted arguments
    try:
        url = query_parser(IndicatorCode = indicator_code, SpatialDimType = spatial_dimension, SpatialDim = country, 
                       TimeDimType = temporal_dimension, TimeDim = year, 
                       Dim1Type = filter_1, Dim1 = filter_1_value, 
                       Dim2Type = filter_2, Dim2 = filter_2_value,  
                       Dim3Type = filter_3, Dim3 = filter_3_value)
        records = requests.get(url) 
        r_df = pd.DataFrame(records.json()['value'])
    except ValueError:
        print("Invalid search criteria provided, please check arguments and try again.")
        return
    
    # generate a dictionary summarizing the recorded dimensions for this indicator for user consideration
    if summary == True and r_df.shape != (0,0):
        desired_columns = ['IndicatorCode', 'SpatialDimType', 'SpatialDim', 'TimeDimType',
                           'TimeDim', 'Dim1Type', 'Dim1', 'Dim2Type', 'Dim2', 'Dim3Type', 'Dim3']
        objects = r_df[desired_columns]
        summary_dict = {}
        for (colname, data) in objects.iteritems():
            summary_dict[colname] = data.unique()
    elif summary == True and r_df.shape == (0,0):
        summary_dict = {}
        raise Exception("0 entries matching search criteria, please adjust and try again.")
        return
    
    # generate output file
    if to_csv == True and r_df.shape != (0,0):
        r_df.to_csv(csv_name)      
        
    return r_df, summary_dict

### Demonstration of the `get_records()` function
Here we get all records for the indicator AIR_11, which generates a data container containing a summary of the dimensions recorded for this indicator, as well as returns a dataframe of all entries. 

In [567]:
data_container = get_records(indicator_code = 'AIR_11', to_csv = True, 
                             csv_name = 'Household air pollution attributable deaths.csv')

In [568]:
data_container[1]

{'IndicatorCode': array(['AIR_11'], dtype=object),
 'SpatialDimType': array(['COUNTRY'], dtype=object),
 'SpatialDim': array(['AFG', 'AGO', 'ALB', 'ARE', 'ARG', 'ARM', 'ATG', 'AUS', 'AUT',
        'AZE', 'BDI', 'BEL', 'BEN', 'BFA', 'BGD', 'BGR', 'BHR', 'BHS',
        'BIH', 'BLR', 'BLZ', 'BOL', 'BRA', 'BRB', 'BRN', 'BTN', 'BWA',
        'CAF', 'CAN', 'CHE', 'CHL', 'CHN', 'CIV', 'CMR', 'COD', 'COG',
        'COL', 'COM', 'CPV', 'CRI', 'CUB', 'CYP', 'CZE', 'DEU', 'DJI',
        'DNK', 'DOM', 'DZA', 'ECU', 'EGY', 'ERI', 'ESP', 'EST', 'ETH',
        'FIN', 'FJI', 'FRA', 'FSM', 'GAB', 'GBR', 'GEO', 'GHA', 'GIN',
        'GMB', 'GNB', 'GNQ', 'GRC', 'GRD', 'GTM', 'GUY', 'HND', 'HRV',
        'HTI', 'HUN', 'IDN', 'IND', 'IRL', 'IRN', 'IRQ', 'ISL', 'ISR',
        'ITA', 'JAM', 'JOR', 'JPN', 'KAZ', 'KEN', 'KGZ', 'KHM', 'KIR',
        'KOR', 'KWT', 'LAO', 'LBR', 'LCA', 'LKA', 'LSO', 'LTU', 'LUX',
        'LVA', 'MAR', 'MDA', 'MDG', 'MDV', 'MEX', 'MKD', 'MLI', 'MLT',
        'MMR', 'MNE', 'MNG', '

In [569]:
data_container[0]

Unnamed: 0,Id,IndicatorCode,SpatialDimType,SpatialDim,TimeDimType,TimeDim,Dim1Type,Dim1,Dim2Type,Dim2,...,DataSourceDim,Value,NumericValue,Low,High,Comments,Date,TimeDimensionValue,TimeDimensionBegin,TimeDimensionEnd
0,19579884,AIR_11,COUNTRY,AFG,YEAR,2016,SEX,BTSX,ENVCAUSE,ENVCAUSE113,...,,7899 [6275–9483],7898.80386,6275.02344,9483.33106,,2018-07-05T14:47:04.12+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
1,19579885,AIR_11,COUNTRY,AFG,YEAR,2016,SEX,BTSX,ENVCAUSE,ENVCAUSE068,...,,414 [300–518],414.40803,300.24280,517.98480,,2018-07-05T14:47:04.157+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
2,19579886,AIR_11,COUNTRY,AFG,YEAR,2016,SEX,BTSX,ENVCAUSE,ENVCAUSE114,...,,3359 [2647–4116],3359.34339,2647.01563,4115.60352,,2018-07-05T14:47:04.183+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
3,19579887,AIR_11,COUNTRY,AFG,YEAR,2016,SEX,BTSX,ENVCAUSE,ENVCAUSE039,...,,7789 [5587–9560],7788.50188,5586.95117,9560.36035,,2018-07-05T14:47:04.21+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
4,19579888,AIR_11,COUNTRY,AFG,YEAR,2016,SEX,BTSX,ENVCAUSE,ENVCAUSE000,...,,21141 [17400–24673],21140.72425,17399.77930,24672.51953,,2018-07-05T14:47:04.243+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3235,19583119,AIR_11,COUNTRY,ZWE,YEAR,2016,SEX,MLE,ENVCAUSE,ENVCAUSE068,...,,67 [48–85],67.22270,48.19599,85.12967,,2018-07-05T14:48:43.177+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
3236,19583120,AIR_11,COUNTRY,ZWE,YEAR,2016,SEX,MLE,ENVCAUSE,ENVCAUSE114,...,,335 [274–408],335.31383,273.99298,408.30600,,2018-07-05T14:48:43.213+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
3237,19583121,AIR_11,COUNTRY,ZWE,YEAR,2016,SEX,MLE,ENVCAUSE,ENVCAUSE039,...,,2568 [1908–3129],2567.78753,1908.41724,3128.80029,,2018-07-05T14:48:43.25+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
3238,19583122,AIR_11,COUNTRY,ZWE,YEAR,2016,SEX,MLE,ENVCAUSE,ENVCAUSE118,...,,335 [91–539],334.70182,91.49987,539.23285,,2018-07-05T14:48:43.28+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00


Using the information in the previous summary object, we can refine our filtering for AIR11 for the US only.

In [574]:
data_container2 = get_records(indicator_code = 'AIR_11', to_csv = True, 
                              csv_name = 'Household air pollution attributable deaths.csv', country = "USA")

In [575]:
data_container2[1]

{'IndicatorCode': array(['AIR_11'], dtype=object),
 'SpatialDimType': array(['COUNTRY'], dtype=object),
 'SpatialDim': array(['USA'], dtype=object),
 'TimeDimType': array(['YEAR'], dtype=object),
 'TimeDim': array([2016], dtype=int64),
 'Dim1Type': array(['SEX'], dtype=object),
 'Dim1': array(['BTSX', 'FMLE', 'MLE'], dtype=object),
 'Dim2Type': array(['ENVCAUSE'], dtype=object),
 'Dim2': array(['ENVCAUSE039', 'ENVCAUSE114', 'ENVCAUSE113', 'ENVCAUSE068',
        'ENVCAUSE118', 'ENVCAUSE000'], dtype=object),
 'Dim3Type': array([None], dtype=object),
 'Dim3': array([None], dtype=object)}

In [576]:
data_container2[0]

Unnamed: 0,Id,IndicatorCode,SpatialDimType,SpatialDim,TimeDimType,TimeDim,Dim1Type,Dim1,Dim2Type,Dim2,...,DataSourceDim,Value,NumericValue,Low,High,Comments,Date,TimeDimensionValue,TimeDimensionBegin,TimeDimensionEnd
0,19582926,AIR_11,COUNTRY,USA,YEAR,2016,SEX,BTSX,ENVCAUSE,ENVCAUSE039,...,,0 [0–0],0.0,0.0,0.0,,2018-07-05T14:48:37.343+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
1,19582927,AIR_11,COUNTRY,USA,YEAR,2016,SEX,BTSX,ENVCAUSE,ENVCAUSE114,...,,0 [0–0],0.0,0.0,0.0,,2018-07-05T14:48:37.373+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
2,19582928,AIR_11,COUNTRY,USA,YEAR,2016,SEX,BTSX,ENVCAUSE,ENVCAUSE113,...,,0 [0–0],0.0,0.0,0.0,,2018-07-05T14:48:37.407+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
3,19582929,AIR_11,COUNTRY,USA,YEAR,2016,SEX,BTSX,ENVCAUSE,ENVCAUSE068,...,,0 [0–0],0.0,0.0,0.0,,2018-07-05T14:48:37.44+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
4,19582930,AIR_11,COUNTRY,USA,YEAR,2016,SEX,BTSX,ENVCAUSE,ENVCAUSE118,...,,0 [0–0],0.0,0.0,0.0,,2018-07-05T14:48:37.473+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
5,19582931,AIR_11,COUNTRY,USA,YEAR,2016,SEX,BTSX,ENVCAUSE,ENVCAUSE000,...,,0 [0–0],0.0,0.0,0.0,,2018-07-05T14:48:37.51+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
6,19582932,AIR_11,COUNTRY,USA,YEAR,2016,SEX,FMLE,ENVCAUSE,ENVCAUSE068,...,,0 [0–0],0.0,0.0,0.0,,2018-07-05T14:48:37.54+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
7,19582933,AIR_11,COUNTRY,USA,YEAR,2016,SEX,FMLE,ENVCAUSE,ENVCAUSE113,...,,0 [0–0],0.0,0.0,0.0,,2018-07-05T14:48:37.57+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
8,19582934,AIR_11,COUNTRY,USA,YEAR,2016,SEX,FMLE,ENVCAUSE,ENVCAUSE039,...,,0 [0–0],0.0,0.0,0.0,,2018-07-05T14:48:37.6+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00
9,19582935,AIR_11,COUNTRY,USA,YEAR,2016,SEX,FMLE,ENVCAUSE,ENVCAUSE114,...,,0 [0–0],0.0,0.0,0.0,,2018-07-05T14:48:37.637+02:00,2016,2016-01-01T00:00:00+01:00,2016-12-31T00:00:00+01:00


If no entries match the given search criteria for the indicator, an Exception is raised.

In [577]:
data_container3 = get_records(indicator_code = 'AIR_11', to_csv = True, 
                              csv_name = 'Household air pollution attributable deaths.csv',
                              year = 2013)

Exception: 0 entries matching search criteria, please adjust and try again.

When invalid arguments are submitted to the function and the request does not return a valid JSON response, we use a try-except clause to catch this error and output a helper message.

In [578]:
wronginfo = get_records(indicator_code = 'jibberish', to_csv = True, 
                        csv_name = 'Household air pollution attributable deaths.csv', 
                        year = 2013, country = "CHL")

Invalid search criteria provided, please check arguments and try again.


#### `query_parser()`: Return full dataframe for indicator, with filters implemented

The function parses filtering criteria into the format recognized by the API, and then is submitted as the get request. This could have been done with two options:

- Option 1: filter using built-in parsing of API (example below) - saves space locally
- Option 2: filter locally after Function 2 retrieves dataframe for indicator - allows preliminary inspection before filtering

We chose Option 1 for its notable benefit of saving space locally. If the user already knows exactly what data points they require, they are able to just pull exactly what they need without redundancy. We can also cover functionality of Option 2 by just requesting for the entire dataframe for an indicator without filtering, if preliminary inspection is needed.

In [579]:
def query_parser(IndicatorCode = 'WHOSIS_000001',
                 SpatialDimType = '', SpatialDim = '',
                 TimeDimType = '', TimeDim = '',
                 Dim1Type = '', Dim1 = '',
                 Dim2Type = '', Dim2 = '',
                 Dim3Type = '', Dim3 = ''):
    """
    Accessory function of get_records(). 
    With user-defined geographical, temporal and demographical filters, a request query is parsed for the desired indicator.
    All arguments are taken from those submitted to get_records().
    
    Parameters
    ----------
    IndicatorCode : str
        The GHO database unique identifier for which we want to query.
    
    SpatialDimType : str
        If the chosen indicator has records for more than one type of spatial dimension (eg. records for a single
        country and continent), this argument can be used to filter records for the desired one.
    
    SpatialDim : str
        If the chosen spatial dimension has more than one unique values, 
        this argument can be used to filter records for the desired one.
    
    TimeDimType : str
        If the chosen indicator has records for more than one type of temporal dimension (eg. month of the year vs. 
        entire year), this argument can be used to filter records for the desired one.
    
    TimeDim : str
        If the chosen temporal dimension has more than one unique values, 
        this argument can be used to filter records for the desired one.
    
    Dim1/Dim2/Dim3Type : str
        Additional demographical dimensions to filter indicator records by. Refer to full indicator dataframe or summary for
        filtering options.
    
    Dim1/Dim2/Dim3 : str
        Dimension values to filter indicator records by. Refer to full indicator dataframe or summary for
        filtering options.

    Returns
    -------
    str
        A parsed url used to query the database and obtain the data entries of interest from the GHO database.

    Examples
    --------
    >>> query_parser(IndicatorCode = 'WHOSIS_000001', SpatialDimType = 'Region', TimeDimType = 'year', Dim1 = 'WQ1')
        "https://ghoapi.azureedge.net/api/WHOSIS_000001?$filter=SpatialDimType eq 'Region' and TimeDimType eq 'year' and Dim1 eq 'WQ1'"
    """ 
    d = locals()
    given_filters = {key:value for key, value in d.items() if value} 
    if len(given_filters) >= 1:
        parsed_request_url = "https://ghoapi.azureedge.net/api/" + IndicatorCode
        if len(given_filters) > 1:
            del(given_filters["IndicatorCode"])
            filters = '?$filter='
            for key, value in given_filters.items():
                if key == "TimeDim":
                    filters += key + " eq " + str(value) + ' and '
                else: 
                    filters += key + " eq " + "'{}'".format(value) + ' and '   
            parsed_request_url +=  filters.rstrip(' and ')
        
    return parsed_request_url

### Demonstration of the `query_parser()` function

Returns a properly formatted search query that can be interpreted by the GHO OData API server. When the parsed url is submitted we can see the desired datapoints are pulled to local. 

In [445]:
query_parser(IndicatorCode = 'WHOSIS_000001', SpatialDimType = 'Region', TimeDimType = 'year', Dim1 = 'WQ1')

"https://ghoapi.azureedge.net/api/WHOSIS_000001?$filter=SpatialDimType eq 'Region' and TimeDimType eq 'year' and Dim1 eq 'WQ1'"

In [448]:
test_url = query_parser(IndicatorCode = 'vdpt', SpatialDimType = 'country', TimeDimType = 'year', Dim1 = 'WQ1')
test1 = requests.get(test_url)
pd.DataFrame(test1.json()['value'])

Unnamed: 0,Id,IndicatorCode,SpatialDimType,SpatialDim,TimeDimType,TimeDim,Dim1Type,Dim1,Dim2Type,Dim2,...,DataSourceDim,Value,NumericValue,Low,High,Comments,Date,TimeDimensionValue,TimeDimensionBegin,TimeDimensionEnd
0,26738390,vdpt,COUNTRY,AFG,YEAR,2010,WEALTHQUINTILE,WQ1,,,...,EQ_MICS,44.1 [36.1-52.3],44.05745,36.11072,52.32084,,2021-08-26T13:07:02.91+02:00,2010,2010-01-01T00:00:00+01:00,2010-12-31T00:00:00+01:00
1,26738540,vdpt,COUNTRY,AFG,YEAR,2015,WEALTHQUINTILE,WQ1,,,...,EQ_DHS,48.9 [43.7-54.1],48.89169,43.68638,54.12114,,2021-08-26T13:07:08.28+02:00,2015,2015-01-01T00:00:00+01:00,2015-12-31T00:00:00+01:00
2,26738796,vdpt,COUNTRY,ALB,YEAR,2008,WEALTHQUINTILE,WQ1,,,...,EQ_DHS,100.0 [100.0-100.0],100.00000,100.00000,100.00000,,2021-08-26T13:07:18.42+02:00,2008,2008-01-01T00:00:00+01:00,2008-12-31T00:00:00+01:00
3,26739063,vdpt,COUNTRY,DZA,YEAR,2012,WEALTHQUINTILE,WQ1,,,...,EQ_MICS,83.9 [79.7-87.4],83.89432,79.67336,87.37764,,2021-08-26T13:07:29.597+02:00,2012,2012-01-01T00:00:00+01:00,2012-12-31T00:00:00+01:00
4,26739203,vdpt,COUNTRY,DZA,YEAR,2018,WEALTHQUINTILE,WQ1,,,...,EQ_MICS,55.8 [49.8-61.6],55.80416,49.80199,61.64142,,2021-08-26T13:07:35.51+02:00,2018,2018-01-01T00:00:00+01:00,2018-12-31T00:00:00+01:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
371,26792976,vdpt,COUNTRY,ZWE,YEAR,2010,WEALTHQUINTILE,WQ1,,,...,EQ_DHS,67.4 [59.2-74.6],67.37429,59.18990,74.62109,,2021-08-26T13:41:06.993+02:00,2010,2010-01-01T00:00:00+01:00,2010-12-31T00:00:00+01:00
372,26793131,vdpt,COUNTRY,ZWE,YEAR,2014,WEALTHQUINTILE,WQ1,,,...,EQ_MICS,87.4 [83.6-90.3],87.35632,83.63739,90.32776,,2021-08-26T13:41:12.813+02:00,2014,2014-01-01T00:00:00+01:00,2014-12-31T00:00:00+01:00
373,26793311,vdpt,COUNTRY,ZWE,YEAR,2015,WEALTHQUINTILE,WQ1,,,...,EQ_DHS,79.8 [72.6-85.5],79.79072,72.60308,85.47005,,2021-08-26T13:41:19.093+02:00,2015,2015-01-01T00:00:00+01:00,2015-12-31T00:00:00+01:00
374,26793431,vdpt,COUNTRY,ZWE,YEAR,2019,WEALTHQUINTILE,WQ1,,,...,EQ_MICS,87.0 [81.0-91.4],87.03647,80.99952,91.35992,,2021-08-26T13:41:23.463+02:00,2019,2019-01-01T00:00:00+01:00,2019-12-31T00:00:00+01:00


In [449]:
test2_url = query_parser(IndicatorCode = 'DEVICES23', SpatialDimType = 'country', TimeDimType = 'year', TimeDim = "2010")
test2 = requests.get(test2_url)
pd.DataFrame(test2.json()['value'])

Unnamed: 0,Id,IndicatorCode,SpatialDimType,SpatialDim,TimeDimType,TimeDim,Dim1Type,Dim1,Dim2Type,Dim2,...,DataSourceDim,Value,NumericValue,Low,High,Comments,Date,TimeDimensionValue,TimeDimensionBegin,TimeDimensionEnd
0,38733,DEVICES23,COUNTRY,AFG,YEAR,2010,,,,,...,,0.00,0.00,,,,2013-06-11T14:01:25.3+02:00,2010,2010-01-01T00:00:00+01:00,2010-12-31T00:00:00+01:00
1,38734,DEVICES23,COUNTRY,ALB,YEAR,2010,,,,,...,,62.71,62.71,,,,2013-06-11T14:01:25.3+02:00,2010,2010-01-01T00:00:00+01:00,2010-12-31T00:00:00+01:00
2,38735,DEVICES23,COUNTRY,AGO,YEAR,2010,,,,,...,,6.98,6.98,,,Only public sector data,2013-06-11T14:01:25.3+02:00,2010,2010-01-01T00:00:00+01:00,2010-12-31T00:00:00+01:00
3,38736,DEVICES23,COUNTRY,ATG,YEAR,2010,,,,,...,,188.50,188.50,,,the quantity was estimated from the population...,2013-06-11T14:01:25.3+02:00,2010,2010-01-01T00:00:00+01:00,2010-12-31T00:00:00+01:00
4,38737,DEVICES23,COUNTRY,ARM,YEAR,2010,,,,,...,,21.06,21.06,,,,2013-06-11T14:01:25.3+02:00,2010,2010-01-01T00:00:00+01:00,2010-12-31T00:00:00+01:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
116,40353,DEVICES23,COUNTRY,KAZ,YEAR,2010,,,,,...,,23.82,23.82,,,,2013-06-11T14:01:25.3+02:00,2010,2010-01-01T00:00:00+01:00,2010-12-31T00:00:00+01:00
117,40354,DEVICES23,COUNTRY,MWI,YEAR,2010,,,,,...,,0,0.00,,,,2013-06-11T14:01:25.3+02:00,2010,2010-01-01T00:00:00+01:00,2010-12-31T00:00:00+01:00
118,40355,DEVICES23,COUNTRY,KOR,YEAR,2010,,,,,...,,444.57,444.57,,,,2013-06-11T14:01:25.3+02:00,2010,2010-01-01T00:00:00+01:00,2010-12-31T00:00:00+01:00
119,40356,DEVICES23,COUNTRY,SYC,YEAR,2010,,,,,...,,96.06,96.06,,,the quantity was estimated from the population...,2013-06-11T14:01:25.3+02:00,2010,2010-01-01T00:00:00+01:00,2010-12-31T00:00:00+01:00


When there are no records that match the queried criteria, a dash is returned.

In [450]:
test3_url = query_parser(IndicatorCode = 'DEVICES23', SpatialDimType = 'country', TimeDimType = 'year', Dim1 = 'wq1')
test3 = requests.get(test3_url)
print(test3.status_code)
pd.DataFrame(test3.json()['value'])

200


#### `search_dimensions()`: Retrieving and searching types of dimensions

Each indicator has a series of dimensions that tells us where, when and from whom the data entries were collected from. `get_dimensions()` allows us to see and search through dimensions available in the dataset. This is helpful for preliminary understanding the what data entries are present in the database. 

It pulls a dataframe of all dimensions recorded in the database. A local search is implemented on the pulled dataframe, since the GHO API does not have this as a built-in functionality.

In [581]:
def search_dimensions(search_for = 'all'):
    """
    Retrieves a dataframe of all dimensions recorded in the GHO database. 
    A local, case-insensitive keyword search can be optionally implemented, to locate dimensions of interest. 
    
    Parameters
    ----------
    search_for : str
        A searchword specifying dimensions of interest. Used to filter for dimension codes containing the searchword.

    Returns
    -------
    pandas.core.frame.DataFrame
        Containing all dimension codes and descriptions recorded in the database, or alternatively fitting the search criteria.

    Examples
    --------
    >>> search_dimensions()
        Retrives all dimensions recorded in the database.
        
    >>> search_dimensions(search_for = "type")
        Retrives dimensions in the database that contains the word "type" in its code.
    """
    r = requests.get('https://ghoapi.azureedge.net/api/Dimension')
    dimensions = pd.DataFrame(r.json()['value'])
    
    if search_for == 'all':
        return dimensions
    elif search_for != 'all':
        dimensions = dimensions[dimensions.Code.str.contains(search_for.upper())]
        if dimensions.shape == (0,2):
            raise Exception('No matching dimensions found, please adjust your search.')
        
    return dimensions

### Demonstration of `get_dimensions()` function

A simple call with no arguments gives all dimensions recorded in the database.

In [423]:
search_dimensions()

Unnamed: 0,Code,Title
0,ADVERTISINGTYPE,SUBSTANCE_ABUSE_ADVERTISING_TYPES
1,AGEGROUP,Age Group
2,ALCOHOLTYPE,Beverage Types
3,AMRGLASSCATEGORY,AMR GLASS Category
4,ARCHIVE,Archive date
...,...,...
87,WEALTHQUINTILE,Wealth Quintile
88,WHOINCOMEREGION,WHO Income Region
89,WORLDBANKINCOMEGROUP,World Bank income group
90,WORLDBANKREGION,World Bank Region


By entering the keyword "type", we retrieve dimensions that contain this word in their code. We can see that there are probably records about substance and alcohol abuse, as well as possibly records about driving and motor road usage. 

In [461]:
search_dimensions(search_for = 'type')

Unnamed: 0,Code,Title
0,ADVERTISINGTYPE,SUBSTANCE_ABUSE_ADVERTISING_TYPES
2,ALCOHOLTYPE,Beverage Types
5,AWARENESSACTIVITYTYPE,SUBSTANCE_ABUSE_AWARENESS_ACTIVITY_TYPES
7,BEVERAGETYPE,SUBSTANCE_ABUSE_BEVERAGE_TYPES
11,COMMUNITYACTIONTYPE,Community Action
12,CONSUMPTIONTYPE,Consumption type
17,DRIVERTYPE,Driver Type
39,LEGISLATIONTYPE,Legislation type
40,MEASUREIMPORTANCETYPE,SUBSTANCE_ABUSE_MEASURE_IMPORTANCE_TYPE
42,MOTOCYCLEOCCUPANTTYPE,Motorcycle Occupant Type


If there are no dimensions matching the search, an Exception is raised. 

In [562]:
search_dimensions(search_for = 'jibberish')

Exception: No matching dimensions found, please adjust your search.

#### `get_dimension_values()`: Retrieve a specific dimension and search for values within
Allows user to inspect and search through values of a specific dimension. 

In [540]:
def get_dimension_values(dimension_code = 'YEAR', search_for = ''):
    """
    Allows user to inspect, and run a case-insensitive search through the values of a specific dimension.

    Parameters
    ----------
    
    dimension_code : str
        The GHO database unique identifier for each dimension.
    
    search_for : str
        A searchword specifying dimension values of interest. 
        This search applies to all elements in the dimension values dataframe. 

    Returns
    -------
    pandas.core.frame.DataFrame
        Containing all exisiting values and descriptions for a particular dimension, 
        or alternatively values that fit the search criteria.

    Examples
    --------
    >>> get_dimension_values(dimension_code = "country")
        Retrives all countries with records in the GHO database.
        
    >>> get_dimension_values(dimension_code = "country", search_for = "africa")
        Retrives all African countries with records in the GHO database.    

    """
    url = "https://ghoapi.azureedge.net/api/DIMENSION/" + dimension_code.upper() + "/DimensionValues"
    dimension_values = requests.get(url)
    dimension_values_df = pd.DataFrame(dimension_values.json()['value'])
    
    if search_for == '':
        pass
    elif search_for != '':
        result = dimension_values_df.apply(lambda row: row.astype(str).str.contains(search_for, na = False, flags=re.IGNORECASE).any(), axis=1)
        dimension_values_df = dimension_values_df.loc[result]
        
    if dimension_values_df.shape == (0,6):
            raise Exception('No matching dimension values found, please adjust your search.')
            
    return dimension_values_df

### Demonstration of `get_dimension_values()` function

Here we get all values in the year dimension. Then we use the `search_for` argument to search for dimensions containing "2020".

In [531]:
get_dimension_values(dimension_code = 'year')

Unnamed: 0,Code,Title,Dimension,ParentDimension,ParentCode,ParentTitle
0,1920,1920,YEAR,,,
1,1949,1949,YEAR,,,
2,1950,1950,YEAR,,,
3,1951,1951,YEAR,,,
4,1952,1952,YEAR,,,
...,...,...,...,...,...,...
268,2021,2021,YEAR,,,
269,2022,2022,YEAR,,,
270,2025,2025,YEAR,,,
271,PROJECTION2015,2015,YEAR,,,


In [532]:
get_dimension_values(dimension_code = 'year', search_for = '2020')

Unnamed: 0,Code,Title,Dimension,ParentDimension,ParentCode,ParentTitle
250,2014-2020,2014-2020,YEAR,,,
255,2015-2020,2015-2020,YEAR,,,
266,2019-2020,2019-2020,YEAR,,,
267,2020,2020,YEAR,,,


Similarly, here we see all options of the "country" and "region" dimensions. 

The dataframe information then allows us to conduct finer searches on those dimensions: eg. for a particular country, countries within a geographical region (in this case here African countries), as well as European subregions that are recorded in the database.

In [533]:
get_dimension_values(dimension_code = 'country')

Unnamed: 0,Code,Title,Dimension,ParentDimension,ParentCode,ParentTitle
0,ABW,Aruba,COUNTRY,REGION,AMR,Americas
1,AFG,Afghanistan,COUNTRY,REGION,EMR,Eastern Mediterranean
2,AGO,Angola,COUNTRY,REGION,AFR,Africa
3,AIA,Anguilla,COUNTRY,REGION,AMR,Americas
4,ALB,Albania,COUNTRY,REGION,EUR,Europe
...,...,...,...,...,...,...
240,YUG890,SPATIAL_SYNONYM,COUNTRY,REGION,EUR,Europe
241,YUG891,SPATIAL_SYNONYM,COUNTRY,REGION,EUR,Europe
242,ZAF,South Africa,COUNTRY,REGION,AFR,Africa
243,ZMB,Zambia,COUNTRY,REGION,AFR,Africa


In [534]:
get_dimension_values(dimension_code = "country", search_for = "africa")

Unnamed: 0,Code,Title,Dimension,ParentDimension,ParentCode,ParentTitle
2,AGO,Angola,COUNTRY,REGION,AFR,Africa
16,BDI,Burundi,COUNTRY,REGION,AFR,Africa
18,BEN,Benin,COUNTRY,REGION,AFR,Africa
19,BFA,Burkina Faso,COUNTRY,REGION,AFR,Africa
33,BWA,Botswana,COUNTRY,REGION,AFR,Africa
34,CAF,Central African Republic,COUNTRY,REGION,AFR,Africa
40,CIV,Côte d'Ivoire,COUNTRY,REGION,AFR,Africa
41,CMR,Cameroon,COUNTRY,REGION,AFR,Africa
42,COD,Democratic Republic of the Congo,COUNTRY,REGION,AFR,Africa
43,COG,Congo,COUNTRY,REGION,AFR,Africa


In [535]:
get_dimension_values(dimension_code = "country", search_for = "Togo")

Unnamed: 0,Code,Title,Dimension,ParentDimension,ParentCode,ParentTitle
213,TGO,Togo,COUNTRY,REGION,AFR,Africa


In [536]:
get_dimension_values(dimension_code = 'region')

Unnamed: 0,Code,Title,Dimension,ParentDimension,ParentCode,ParentTitle
0,AFR,Africa,REGION,,,
1,AMR,Americas,REGION,,,
2,EMR,Eastern Mediterranean,REGION,,,
3,EUR,Europe,REGION,,,
4,GBD_REG14_AFRD,"Africa region, stratum D (AFR D)",REGION,,,
5,GBD_REG14_AFRE,"Africa region, stratum E(AFR E)",REGION,,,
6,GBD_REG14_AMRA,"Americas region, stratum A (AMR A)",REGION,,,
7,GBD_REG14_AMRB,"Americas region, stratum B (AMR B)",REGION,,,
8,GBD_REG14_AMRD,"Americas region, stratum D (AMR D)",REGION,,,
9,GBD_REG14_EMRB,"Eastern Mediterranean region, stratum B (EMR B)",REGION,,,


In [537]:
get_dimension_values(dimension_code = 'region', search_for = 'europe')

Unnamed: 0,Code,Title,Dimension,ParentDimension,ParentCode,ParentTitle
3,EUR,Europe,REGION,,,
11,GBD_REG14_EURA,"Europe region, stratum A (EUR A)",REGION,,,
12,GBD_REG14_EURB,"Europe region, stratum B (EUR B)",REGION,,,
13,GBD_REG14_EURC,"Europe region, stratum C (EUR C)",REGION,,,
21,OECD_HII_EUR,"Europe, high-income OECD",REGION,,,
26,OECD_NON_EUR,"Europe, non-OECD",REGION,,,
34,WHO_LMI_EUR,Low-and-middle-income countries of the Europea...,REGION,,,


If invalid arguments are submitted: the function raises an Exception if the search value is not found. If an invalid dimension code is submitted, the API returns a dash.

In [542]:
get_dimension_values(dimension_code = "country", search_for = "jibberish")

Exception: No matching dimension values found, please adjust your search.

In [541]:
get_dimension_values(dimension_code = "jibberish", search_for = "2020")