# This is a tutorial/demo on how to use the `Datamart` REST API.

## Installation

This Jupyter notebook requires Python 3.6 or later (it uses f-strings) with these packages installed:

```
pip install notebook
pip install requests
pip install pandas
```

To run it, change to the directory containing this notebook and type:

```
jupyter notebook
```

Then, open this page in the web browser: http://localhost:8888/notebooks/Datamart%20Data%20API%20Demo.ipynb

## Configuration

This notebook can access the Datamart REST API server at ISI or a server running locally. Edit the cell below to choose a server. As saved, it points to a local server in development mode (`http://localhost:5000`).

To run your own server **locally**, follow the instructions here: [README](README.md)

In [1]:
## set datamart api url
# The datamart server running at ISI
# datamart_api_url = 'https://datamart:datamart-api-789@dsbox02.isi.edu/datamart-api'

# Datamart server running on localhost
# datamart_api_url = 'http://localhost:14080'

# Datamart server running on localhost in development mode
datamart_api_url = 'http://localhost:5000'


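If you switch servers often, you can read the base URL from an environment variable instead of editing the cell above. This also keeps credentials, such as the ones embedded in the ISI URL, out of the notebook. The sketch below is one way to do this. It assumes a variable named `DATAMART_API_URL`, which is our own choice and not something the Datamart server defines.

```python
import os

# Fallback used when the (hypothetical) DATAMART_API_URL variable is not set;
# it matches the development-mode server selected in the cell above.
DEFAULT_DATAMART_API_URL = 'http://localhost:5000'


def get_datamart_api_url(default=DEFAULT_DATAMART_API_URL):
    """Return the Datamart base URL from the environment, or a default."""
    url = os.environ.get('DATAMART_API_URL', default)
    # Strip a trailing slash so f'{url}/metadata/datasets' never contains '//'
    return url.rstrip('/')


datamart_api_url = get_datamart_api_url()
```

For example, starting Jupyter with `DATAMART_API_URL=http://localhost:14080 jupyter notebook` from a POSIX shell would target the server on port 14080.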
## Import Python modules

In [2]:
from requests import get, post, put, delete
import json
import pandas as pd
from io import StringIO
from IPython.display import display, HTML

### Get all datasets 

**GET `/metadata/datasets`**

In [3]:
response = get(f'{datamart_api_url}/metadata/datasets')
df = pd.DataFrame(response.json())
df

|   | name | description | url | dataset_id |
|---|---|---|---|---|
| 0 | UAZ Indicators | Collection of indicators, including indicators... | https://github.com/ml4ai/delphi | UAZ |
| 1 | WDI dataset | World Development Indicators | https://databank.worldbank.org/source/world-de... | WDI |
| 2 | Corruption Perceptions Index | Transparency International Corruption Percept... | https://www.transparency.org/ | TICPI |
| 3 | SIPRI Military Expenditure | Military expenditure by country, in millions o... | https://sipri.org/databases/milex | SIPRI |
| 4 | economic fitness dataset | EconomicFitness | https://databank.banquemondiale.org/source/eco... | EconomicFitness |
| 5 | Agricultural Market Information System (AMIS) | The Agricultural Market Information System (AM... | http://www.amis-outlook.org | AMIS |
| 6 | test test test | testy test | https://test.com | TEST000 |
| 7 | World Press Freedom Index | Published every year since 2002 by Reporters W... | https://rsf.org/en | WPFI |
| 8 | Poverty Rate Global DP | Poverty Rate Global DP | http://url | DPPoverty |
| 9 | FSI dataset | data downloaded from FSI | https://fragilestatesindex.org | FSI |


As of June 25, 2020, there are 11 datasets in the database. More datasets will be added as they are processed.

We can also get metadata about one dataset using the `dataset_id`.

### Get metadata about one dataset

**GET `/metadata/datasets/{dataset_id}`**

In [4]:
response = get(f'{datamart_api_url}/metadata/datasets/WDI')
df = pd.DataFrame(response.json())
df

|   | name | description | url | dataset_id |
|---|---|---|---|---|
| 0 | WDI dataset | World Development Indicators | https://databank.worldbank.org/source/world-de... | WDI |


### Get all variables in a dataset 

**GET `/metadata/datasets/{dataset_id}/variables`**

In [8]:
response = get(f'{datamart_api_url}/metadata/datasets/WDI/variables')
print(json.dumps(response.json()[:5], indent=2)) # print only 5 variables

[
  {
    "name": "_2005 PPP conversion factor, GDP (LCU per international $)",
    "variable_id": "_2005_ppp_conversion_factor_gdp_lcu_per_international",
    "description": "_2005 PPP conversion factor, GDP (LCU per international $) in WDI",
    "corresponds_to_property": "PWDI-002",
    "qualifier": [
      {
        "name": "point in time",
        "identify": "P585"
      },
      {
        "name": "stated in",
        "identify": "P248"
      }
    ]
  },
  {
    "name": "_2005 PPP conversion factor, private consumption (LCU per international $)",
    "variable_id": "_2005_ppp_conversion_factor_private_consumption_lcu_per_international",
    "description": "_2005 PPP conversion factor, private consumption (LCU per international $) in WDI",
    "corresponds_to_property": "PWDI-003",
    "qualifier": [
      {
        "name": "point in time",
        "identify": "P585"
      },
      {
        "name": "stated in",
        "identify": "P248"
      }
    ]
  },
  {
    "name": "Acces

In [9]:
print('Total number of variables in dataset: {} is {}'.format('WDI', len(response.json())))

Total number of variables in dataset: WDI is 1429


### Get metadata about one variable

**GET `/metadata/datasets/{dataset_id}/variables/{variable_id}`**

In [10]:
response = get(f'{datamart_api_url}/metadata/datasets/WDI/variables/access_to_electricity_of_population')
print(json.dumps(response.json(), indent=2))

{
  "name": "Access to electricity (% of population)",
  "variable_id": "access_to_electricity_of_population",
  "dataset_id": "WDI",
  "description": "Access to electricity (% of population) in WDI",
  "corresponds_to_property": "PWDI-005",
  "qualifier": [
    {
      "name": "point in time",
      "identify": "P585"
    },
    {
      "name": "stated in",
      "identify": "P248"
    }
  ]
}


### Find a variable using keyword search

**GET `/metadata/variables?keyword={keyword}`**

Search for variables related to **road**:

In [11]:
response = get(f'{datamart_api_url}/metadata/variables?keyword=road')
df = pd.DataFrame(response.json())
df

|   | variable_id | name | rank | dataset_id |
|---|---|---|---|---|
| 0 | mortality_caused_by_road_traffic_injury_per_10... | Mortality caused by road traffic injury (per ... | 0.075991 | WDI |
| 1 | road_fatalities | Road Fatalities | 0.075991 | OECD |
| 2 | VUAZ-8054 | WDI: Mortality caused by road traffic injury[... | 0.060793 | UAZ |


Search for variables related to **road AND fatalities** by passing both words in a single `keyword` value:

In [12]:
response = get(f'{datamart_api_url}/metadata/variables?keyword=road fatalities')
df = pd.DataFrame(response.json())
df

|   | variable_id | name | rank | dataset_id |
|---|---|---|---|---|
| 0 | road_fatalities | Road Fatalities | 0.334428 | OECD |


Search for variables related to **road OR fatalities** by repeating the `keyword` parameter:

In [13]:
response = get(f'{datamart_api_url}/metadata/variables?keyword=road&keyword=fatalities')
df = pd.DataFrame(response.json())
df

|   | variable_id | name | rank | dataset_id |
|---|---|---|---|---|
| 0 | road_fatalities | Road Fatalities | 0.075991 | OECD |
| 1 | mortality_caused_by_road_traffic_injury_per_10... | Mortality caused by road traffic injury (per ... | 0.037995 | WDI |
| 2 | VUAZ-8054 | WDI: Mortality caused by road traffic injury[... | 0.030396 | UAZ |
| 3 | VUAZ-8136 | Conflict fatalities[number of cases] | 0.030396 | UAZ |

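The AND and OR queries above differ only in how the `keyword` parameter is encoded. The AND form sends one value that contains a space. The OR form repeats the parameter once per word. The standard-library sketch below builds both forms; `keyword_query` is a hypothetical helper, not part of the API. If you use `requests` instead, `params={'keyword': ['road', 'fatalities']}` produces the same repeated-parameter (OR) encoding.

```python
from urllib.parse import quote, urlencode


def keyword_query(words, match_all=True):
    """Build the query string for GET /metadata/variables.

    match_all=True  -> one keyword value with space-separated words (AND)
    match_all=False -> one keyword parameter per word (OR)
    """
    if match_all:
        params = {'keyword': ' '.join(words)}
    else:
        params = {'keyword': list(words)}
    # doseq=True expands list values into repeated parameters; quote_via=quote
    # encodes the space as %20, as requests does for the literal URL above.
    return urlencode(params, doseq=True, quote_via=quote)


print(keyword_query(['road', 'fatalities']))         # keyword=road%20fatalities
print(keyword_query(['road', 'fatalities'], False))  # keyword=road&keyword=fatalities
```

Append the result to `f'{datamart_api_url}/metadata/variables?'` to get the URLs used in the cells above.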

### Get time series data for a variable

**GET `/datasets/{dataset_id}/variables/{variable_id}`**

In [14]:
response = get(f'{datamart_api_url}/datasets/WDI/variables/access_to_electricity_of_population')
df = pd.read_csv(StringIO(response.text))
display(HTML(df.fillna('').head(20).to_html(index=False)))

| dataset_id | variable_id | variable | main_subject | main_subject_id | value | value_unit | time | time_precision | country | admin1 | admin2 | admin3 | coordinate | stated_in | stated_in_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 73.6 |  | 2000-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 76.34446 |  | 2001-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 77.307663 |  | 2002-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 78.251656 |  | 2003-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 79.171516 |  | 2004-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 81.6 |  | 2005-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 80.943794 |  | 2006-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 81.820259 |  | 2007-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 82.708366 |  | 2008-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 83.621689 |  | 2009-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |

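The time-series endpoint returns CSV in a long format, with one row per observation. If you only need the numbers, the standard `csv` module is enough. The sketch below maps each year to its numeric `value` for one country, using the first two rows of the response above as sample input. `values_by_year` is a hypothetical helper. It assumes `time` always starts with a four-digit year, as it does in this output.

```python
import csv
import io

# Header and first two observations from the response shown above.
SAMPLE_CSV = '''dataset_id,variable_id,variable,main_subject,main_subject_id,value,value_unit,time,time_precision,country,admin1,admin2,admin3,coordinate,stated_in,stated_in_id
WDI,access_to_electricity_of_population,Access to electricity (% of population),Gabon,Q1000,73.6,,2000-01-01T00:00:00Z,year,Gabon,,,,"POINT(11.5, -0.68333055555556)",WDI,Q8035640
WDI,access_to_electricity_of_population,Access to electricity (% of population),Gabon,Q1000,76.34446,,2001-01-01T00:00:00Z,year,Gabon,,,,"POINT(11.5, -0.68333055555556)",WDI,Q8035640
'''


def values_by_year(csv_text, country):
    """Return {year: value} for one country from a time-series CSV response."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row['country'] != country:
            continue
        year = int(row['time'][:4])  # '2000-01-01T00:00:00Z' -> 2000
        result[year] = float(row['value'])
    return result


print(values_by_year(SAMPLE_CSV, 'Gabon'))  # {2000: 73.6, 2001: 76.34446}
```

With the live API, call `values_by_year(response.text, 'Gabon')` instead.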

### Get time series data for a variable for a country

**GET `/datasets/{dataset_id}/variables/{variable_id}?country={country}`**

Get data for **Gabon**

In [15]:
response = get(f'{datamart_api_url}/datasets/WDI/variables/access_to_electricity_of_population?country=Gabon')
df = pd.read_csv(StringIO(response.text))
display(HTML(df.fillna('').to_html(index=False)))

| dataset_id | variable_id | variable | main_subject | main_subject_id | value | value_unit | time | time_precision | country | admin1 | admin2 | admin3 | coordinate | stated_in | stated_in_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 73.6 |  | 2000-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 76.34446 |  | 2001-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 77.307663 |  | 2002-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 78.251656 |  | 2003-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 79.171516 |  | 2004-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 81.6 |  | 2005-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 80.943794 |  | 2006-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 81.820259 |  | 2007-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 82.708366 |  | 2008-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 83.621689 |  | 2009-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |


Get data for **Gabon OR Guinea**

In [16]:
response = get(f'{datamart_api_url}/datasets/WDI/variables/access_to_electricity_of_population?country=Gabon&country=Guinea')
df = pd.read_csv(StringIO(response.text))
display(HTML(df.fillna('').to_html(index=False)))

| dataset_id | variable_id | variable | main_subject | main_subject_id | value | value_unit | time | time_precision | country | admin1 | admin2 | admin3 | coordinate | stated_in | stated_in_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 73.6 |  | 2000-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 76.34446 |  | 2001-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 77.307663 |  | 2002-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 78.251656 |  | 2003-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 79.171516 |  | 2004-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 81.6 |  | 2005-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 80.943794 |  | 2006-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 81.820259 |  | 2007-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 82.708366 |  | 2008-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |
| WDI | access_to_electricity_of_population | Access to electricity (% of population) | Gabon | Q1000 | 83.621689 |  | 2009-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 |

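The table above shows only rows for Gabon. To check which countries a multi-country response actually contains, count the rows for each `country`. The standard-library sketch below does this from the CSV text the endpoint returns. `rows_per_country` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def rows_per_country(csv_text):
    """Count the rows for each value of the 'country' column in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row['country'] for row in reader)


# Synthetic input with only the columns needed here (not real Datamart output)
example = 'country,time\nGabon,2000-01-01T00:00:00Z\nGabon,2001-01-01T00:00:00Z\nGuinea,2000-01-01T00:00:00Z\n'
print(rows_per_country(example))  # Counter({'Gabon': 2, 'Guinea': 1})
```

With the live API, pass `response.text`. If the DataFrame is already loaded, `df['country'].value_counts()` gives the same counts.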

### Get time series data for all variables in a dataset

**GET `/datasets/{dataset_id}/variables`**

**Note: by default, this API returns data for only 20 variables. To return more, set the `limit` query parameter in the URL.**

For example, to fetch 50 variables:

**GET `/datasets/{dataset_id}/variables?limit=50`**

In [19]:
response = get(f'{datamart_api_url}/datasets/WDI/variables')
df = pd.read_csv(StringIO(response.text), dtype=object)
print(f'Number of rows in the file: {len(df)}')
display(HTML(df.fillna('').head(20).to_html(index=False)))

Number of rows in the file: 29311


dataset_id,main_subject,main_subject_id,time,time_precision,country,admin1,admin2,admin3,coordinate,stated_in,stated_in_id,_2005_ppp_conversion_factor_gdp_lcu_per_international,_2005_ppp_conversion_factor_gdp_lcu_per_international_NAME,_2005_ppp_conversion_factor_gdp_lcu_per_international_UNIT,_2005_ppp_conversion_factor_private_consumption_lcu_per_international,_2005_ppp_conversion_factor_private_consumption_lcu_per_international_NAME,_2005_ppp_conversion_factor_private_consumption_lcu_per_international_UNIT,access_to_clean_fuels_and_technologies_for_cooking_of_population,access_to_clean_fuels_and_technologies_for_cooking_of_population_NAME,access_to_clean_fuels_and_technologies_for_cooking_of_population_UNIT,access_to_electricity_of_population,access_to_electricity_of_population_NAME,access_to_electricity_of_population_UNIT,access_to_electricity_rural_of_rural_population,access_to_electricity_rural_of_rural_population_NAME,access_to_electricity_rural_of_rural_population_UNIT,access_to_electricity_urban_of_urban_population,access_to_electricity_urban_of_urban_population_NAME,access_to_electricity_urban_of_urban_population_UNIT,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_of_population_ages_15,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_of_population_ages_15_NAME,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_of_population_ages_15_UNIT,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_female_of_population_ages_15,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_female_of_population_ages_15_NAME,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_female_of_population_ages_15_UNIT,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_male_of_population_ages_15,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_male_of_population_ages_15_NAME,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_male_of_population_ages_15_UNIT,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_older_adults_of_population_ages_25,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_older_adults_of_population_ages_25_NAME,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_older_adults_of_population_ages_25_UNIT,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_poorest_40_of_population_ages_15,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_poorest_40_of_population_ages_15_NAME,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_poorest_40_of_population_ages_15_UNIT,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_primary_education_or_less_of_population_ages_15,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_primary_education_or_less_of_population_ages_15_NAME,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_primary_education_or_less_of_population_ages_15_UNIT,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_richest_60_of_population_ages_15,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_richest_60_of_population_ages_15_NAME,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_richest_60_of_population_ages_15_UNIT,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_secondary_education_or_more_of_population_ages_15,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_secondary_education_or_more_of_population_ages_15_NAME,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_secondary_education_or_more_of_population_ages_15_UNIT,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_young_adults_of_population_ages_15_24,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_young_adults_of_population_ages_15_24_NAME,account_ownership_at_a_financial_institution_or_with_a_mobile_money_service_provider_young_adults_of_population_ages_15_24_UNIT,adequacy_of_social_insurance_programs_of_total_welfare_of_beneficiary_households,adequacy_of_social_insurance_programs_of_total_welfare_of_beneficiary_households_NAME,adequacy_of_social_insurance_programs_of_total_welfare_of_beneficiary_households_UNIT,adequacy_of_social_protection_and_labor_programs_of_total_welfare_of_beneficiary_households,adequacy_of_social_protection_and_labor_programs_of_total_welfare_of_beneficiary_households_NAME,adequacy_of_social_protection_and_labor_programs_of_total_welfare_of_beneficiary_households_UNIT,adequacy_of_social_safety_net_programs_of_total_welfare_of_beneficiary_households,adequacy_of_social_safety_net_programs_of_total_welfare_of_beneficiary_households_NAME,adequacy_of_social_safety_net_programs_of_total_welfare_of_beneficiary_households_UNIT,adequacy_of_unemployment_benefits_and_almp_of_total_welfare_of_beneficiary_households,adequacy_of_unemployment_benefits_and_almp_of_total_welfare_of_beneficiary_households_NAME,adequacy_of_unemployment_benefits_and_almp_of_total_welfare_of_beneficiary_households_UNIT,adjusted_net_enrollment_rate_primary_of_primary_school_age_children,adjusted_net_enrollment_rate_primary_of_primary_school_age_children_NAME,adjusted_net_enrollment_rate_primary_of_primary_school_age_children_UNIT
WDI,Gabon,Q1000,2005-01-01T00:00:00Z,year,Gabon,,,,"POINT(11.5, -0.68333055555556)",WDI,Q8035640,256.2303101,"_2005 PPP conversion factor, GDP (LCU per international $)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
WDI,The Gambia,Q1005,2005-01-01T00:00:00Z,year,The Gambia,,,,"POINT(-15.5, 13.5)",WDI,Q8035640,7.560359062,"_2005 PPP conversion factor, GDP (LCU per international $)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
WDI,Guinea,Q1006,2005-01-01T00:00:00Z,year,Guinea,,,,"POINT(-11.0, 10.0)",WDI,Q8035640,1219.348401,"_2005 PPP conversion factor, GDP (LCU per international $)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
WDI,Guinea-Bissau,Q1007,2005-01-01T00:00:00Z,year,Guinea-Bissau,,,,"POINT(-15.0, 12.0)",WDI,Q8035640,217.3003471,"_2005 PPP conversion factor, GDP (LCU per international $)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
WDI,Cameroon,Q1009,2005-01-01T00:00:00Z,year,Cameroon,,,,"POINT(12.0, 7.0)",WDI,Q8035640,251.0153029,"_2005 PPP conversion factor, GDP (LCU per international $)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
WDI,Cape Verde,Q1011,2005-01-01T00:00:00Z,year,Cape Verde,,,,"POINT(-24.083333333333, 15.916666666667)",WDI,Q8035640,69.3602975,"_2005 PPP conversion factor, GDP (LCU per international $)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
WDI,Lesotho,Q1013,2005-01-01T00:00:00Z,year,Lesotho,,,,"POINT(28.25, -29.55)",WDI,Q8035640,3.490095754,"_2005 PPP conversion factor, GDP (LCU per international $)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
WDI,Liberia,Q1014,2005-01-01T00:00:00Z,year,Liberia,,,,"POINT(-9.75, 6.533333)",WDI,Q8035640,0.492553151,"_2005 PPP conversion factor, GDP (LCU per international $)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
WDI,Libya,Q1016,2005-01-01T00:00:00Z,year,Libya,,,,"POINT(17.0, 27.0)",WDI,Q8035640,0.7345614,"_2005 PPP conversion factor, GDP (LCU per international $)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
WDI,Madagascar,Q1019,2005-01-01T00:00:00Z,year,Madagascar,,,,"POINT(47.0, -20.0)",WDI,Q8035640,649.5681317,"_2005 PPP conversion factor, GDP (LCU per international $)",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

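In the wide response above, each variable contributes three columns: `<variable_id>` (the value), `<variable_id>_NAME` and `<variable_id>_UNIT`. If that naming pattern holds for every variable, you can recover the variable IDs in a response from its header alone. The sketch below does this; `variable_ids_from_header` is a hypothetical helper.

```python
import csv
import io


def variable_ids_from_header(csv_text):
    """Return the variable IDs in a wide CSV response, in column order.

    A column counts as a variable when matching '<id>_NAME' and
    '<id>_UNIT' columns are also present.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    columns = set(header)
    return [
        col for col in header
        if f'{col}_NAME' in columns and f'{col}_UNIT' in columns
    ]


header_line = (
    'dataset_id,main_subject,time,country,'
    'access_to_electricity_of_population,'
    'access_to_electricity_of_population_NAME,'
    'access_to_electricity_of_population_UNIT\n'
)
print(variable_ids_from_header(header_line))  # ['access_to_electricity_of_population']
```

Given the default limit described above, `len(variable_ids_from_header(response.text))` should be 20 for the request in the previous cell.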

### Get time series for multiple variables in a dataset

**GET `/datasets/{dataset_id}/variables?variable={variable_id}`**

Get data for variables **`access_to_clean_fuels_and_technologies_for_cooking_of_population` AND
`access_to_electricity_of_population`**

In [20]:
response = get(f'{datamart_api_url}/datasets/WDI/variables?variable=access_to_clean_fuels_and_technologies_for_cooking_of_population&variable=access_to_electricity_of_population')
df = pd.read_csv(StringIO(response.text))
# display only 30 rows
display(HTML(df.fillna('').head(30).to_html(index=False)))

| dataset_id | main_subject | main_subject_id | time | time_precision | country | admin1 | admin2 | admin3 | coordinate | stated_in | stated_in_id | access_to_clean_fuels_and_technologies_for_cooking_of_population | access_to_clean_fuels_and_technologies_for_cooking_of_population_NAME | access_to_clean_fuels_and_technologies_for_cooking_of_population_UNIT | access_to_electricity_of_population | access_to_electricity_of_population_NAME | access_to_electricity_of_population_UNIT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WDI | Gabon | Q1000 | 2000-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 58.72 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2001-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 60.59 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2002-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 62.4 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2003-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 64.33 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2004-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 65.35 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2005-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 67.19 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2006-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 68.85 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2007-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 69.84 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2008-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 71.14 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2009-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 72.06 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |


### Get time series for multiple variables in a dataset, filter by country

**GET `/datasets/{dataset_id}/variables?variable={variable_id}&country={country}`**

Get data for variables 
**`access_to_clean_fuels_and_technologies_for_cooking_of_population` AND
`access_to_electricity_of_population`** 
and country **Gabon**

In [21]:
response = get(f'{datamart_api_url}/datasets/WDI/variables?variable=access_to_clean_fuels_and_technologies_for_cooking_of_population&variable=access_to_electricity_of_population&country=Gabon')
df = pd.read_csv(StringIO(response.text))
display(HTML(df.fillna('').to_html(index=False)))

| dataset_id | main_subject | main_subject_id | time | time_precision | country | admin1 | admin2 | admin3 | coordinate | stated_in | stated_in_id | access_to_clean_fuels_and_technologies_for_cooking_of_population | access_to_clean_fuels_and_technologies_for_cooking_of_population_NAME | access_to_clean_fuels_and_technologies_for_cooking_of_population_UNIT | access_to_electricity_of_population | access_to_electricity_of_population_NAME | access_to_electricity_of_population_UNIT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WDI | Gabon | Q1000 | 2000-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 58.72 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2001-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 60.59 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2002-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 62.4 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2003-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 64.33 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2004-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 65.35 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2005-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 67.19 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2006-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 68.85 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2007-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 69.84 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2008-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 71.14 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |
| WDI | Gabon | Q1000 | 2009-01-01T00:00:00Z | year | Gabon |  |  |  | POINT(11.5, -0.68333055555556) | WDI | Q8035640 | 72.06 | Access to clean fuels and technologies for cooking (% of population) |  |  |  |  |

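Long query strings like the one above are easy to mistype. The sketch below assembles the URL for `GET /datasets/{dataset_id}/variables` from lists of variables and countries, plus the optional `limit` parameter described earlier, using only the standard library. `time_series_url` is a hypothetical helper, not part of the API.

```python
from urllib.parse import quote, urlencode


def time_series_url(base_url, dataset_id, variables=(), countries=(), limit=None):
    """Build a URL for GET /datasets/{dataset_id}/variables.

    Each entry in `variables` and `countries` becomes a repeated query
    parameter, matching the hand-written URLs in the cells above.
    """
    params = [('variable', v) for v in variables]
    params += [('country', c) for c in countries]
    if limit is not None:
        params.append(('limit', limit))
    # Percent-encode the dataset_id in case it contains reserved characters
    dataset = quote(dataset_id, safe='')
    url = f'{base_url}/datasets/{dataset}/variables'
    return f'{url}?{urlencode(params)}' if params else url


print(time_series_url('http://localhost:5000', 'WDI',
                      variables=['access_to_electricity_of_population'],
                      countries=['Gabon', 'Guinea']))
# http://localhost:5000/datasets/WDI/variables?variable=access_to_electricity_of_population&country=Gabon&country=Guinea
```

In this notebook, `get(time_series_url(datamart_api_url, 'WDI', [...], ['Gabon']))` should request the same data as the cell above. Alternatively, `requests` can build the query itself from `params={'variable': [...], 'country': 'Gabon'}`.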

### Create a new dataset

**NOTE: If the following POST requests have already been run against this Datamart server, the server will respond with error messages.**

**POST `/metadata/datasets`**

In [5]:
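# Define a new dataset.
# NOTE: the cells below add a variable and data to the dataset 'TEST01';
# use that identifier here if the TEST01 dataset does not exist yet.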
test_dataset = {
    "name": "TEST04",
    "dataset_id": "TEST04",
    "description": "TEST04",
    "url": "http://test01.com/test"
}

In [6]:
# post it to the API
td_response = post(f'{datamart_api_url}/metadata/datasets', json=test_dataset)
print(json.dumps(td_response.json(), indent=2))


{
  "name": "TEST04",
  "description": "TEST04",
  "url": "http://test01.com/test",
  "dataset_id": "TEST04"
}


**NOTE: If the above POST request has already been run against this Datamart server, the server will respond with:**

```
{
  "Error": "Dataset identifier TEST01 has already been used"
}
```

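Creating a dataset twice returns an error. To make the notebook safe to rerun, you can check the existing `dataset_id`s first. The sketch below does this and assumes the list returned by `GET /metadata/datasets` (shown earlier) is complete. `create_dataset_if_missing` is a hypothetical helper. The HTTP functions are passed in so it can be tried without a server. In this notebook it would be called as `create_dataset_if_missing(datamart_api_url, test_dataset, get, post)`.

```python
def create_dataset_if_missing(base_url, dataset, http_get, http_post):
    """POST a dataset definition only if its dataset_id is not already taken.

    http_get/http_post are expected to behave like requests.get/requests.post.
    Returns the POST response, or None if the dataset already exists.
    """
    existing = http_get(f'{base_url}/metadata/datasets')
    existing.raise_for_status()
    known_ids = {d['dataset_id'] for d in existing.json()}
    if dataset['dataset_id'] in known_ids:
        return None  # already defined; skip the POST that would fail
    return http_post(f'{base_url}/metadata/datasets', json=dataset)
```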
Retrieve all datasets

In [17]:
response = get(f'{datamart_api_url}/metadata/datasets')
df = pd.DataFrame(response.json())
df

|   | name | description | url | dataset_id |
|---|---|---|---|---|
| 0 | UAZ Indicators | Collection of indicators, including indicators... | https://github.com/ml4ai/delphi | UAZ |
| 1 | WDI dataset | World Development Indicators | https://databank.worldbank.org/source/world-de... | WDI |
| 2 | Corruption Perceptions Index | Transparency International Corruption Percept... | https://www.transparency.org/ | TICPI |
| 3 | SIPRI Military Expenditure | Military expenditure by country, in millions o... | https://sipri.org/databases/milex | SIPRI |
| 4 | economic fitness dataset | EconomicFitness | https://databank.banquemondiale.org/source/eco... | EconomicFitness |
| 5 | Agricultural Market Information System (AMIS) | The Agricultural Market Information System (AM... | http://www.amis-outlook.org | AMIS |
| 6 | test test test | testy test | https://test.com | TEST000 |
| 7 | World Press Freedom Index | Published every year since 2002 by Reporters W... | https://rsf.org/en | WPFI |
| 8 | Poverty Rate Global DP | Poverty Rate Global DP | http://url | DPPoverty |
| 9 | FSI dataset | data downloaded from FSI | https://fragilestatesindex.org | FSI |


The newly created dataset should now be included in the list returned by this call.

### Create a variable in the dataset `TEST01`

**POST `/metadata/datasets/{dataset_id}/variables`**

In [9]:
# define a new variable
test_variable = {
    "name": "test variable for test dataset",
    "variable_id": "TEST01-01"
}

In [10]:
tv_response = post(f'{datamart_api_url}/metadata/datasets/TEST01/variables', json=test_variable)
print(json.dumps(tv_response.json(), indent=2))

{
  "name": "test variable for test dataset",
  "variable_id": "TEST01-01",
  "dataset_id": "TEST01",
  "corresponds_to_property": "PTEST01-TEST01-01"
}


**NOTE: If the above POST request has already been run against this Datamart server, the server will respond with:**

```
{
  "Error": "Variable TEST01-01 has already been defined in dataset TEST01"
}
```

Retrieve all variables for the dataset `TEST01`

In [18]:
response = get(f'{datamart_api_url}/metadata/datasets/TEST01/variables')
df = pd.DataFrame(response.json())
df

|   | name | variable_id | dataset_id |
|---|---|---|---|
| 0 | test variable for test dataset | TEST01-01 | TEST01 |


The variable `TEST01-01` is created in the dataset `TEST01`

### Upload data to a variable

Let's upload some data to the variable `TEST01-01` in the dataset `TEST01`.

**PUT `/datasets/{dataset_id}/variables/{variable_id}`**

In [5]:
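import json
import os

from requests import put


def upload_data(file_path, url):
    """Upload a CSV file to `url` as multipart form data with an HTTP PUT."""
    file_name = os.path.basename(file_path)
    # Use a context manager so the file handle is closed after the upload
    with open(file_path, mode='rb') as f:
        files = {'file': (file_name, f, 'application/octet-stream')}
        response = put(url, files=files)
    if response.status_code == 400:
        # Validation failed: pretty-print the list of error objects
        print(json.dumps(response.json(), indent=2))
    else:
        # On success the server replies with a short JSON message, e.g. "2 rows imported!"
        print(response.json())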

The upload data API validates the input file.

The required columns are:
- main_subject
- value
- time
- time_precision
- country

We will upload the contents of the file `test/test_data/test_sample.csv`, which is a valid file.

In [18]:
df = pd.read_csv('test/test_data/test_sample.csv')
df

|   | main_subject | value | value_unit | time | time_precision | country | source | dataset_id | variable_id |
|---|---|---|---|---|---|---|---|---|---|
| 0 | belllgium | 1.8 | Annual growth % | 2019-01-01T00:00:00Z | year | belllgium | OECD | TEST01 | TEST01-01 |
| 1 | bellgium | 1.9 | Annual growth % | 2020-01-01T00:00:00Z | year | bellgium | OECD | TEST01 | TEST01-01 |

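The upload API rejects files that lack a required column, so checking the header locally first can save a round trip. The standard-library sketch below does this check. `missing_required_columns` is a hypothetical helper, and `REQUIRED_COLUMNS` simply copies the list above. The server remains the authority on what counts as valid.

```python
import csv

# Required columns as listed above; the server's validation is authoritative.
REQUIRED_COLUMNS = ['main_subject', 'value', 'time', 'time_precision', 'country']


def missing_required_columns(file_path):
    """Return the required columns that are absent from a CSV file's header row."""
    with open(file_path, newline='') as f:
        header = next(csv.reader(f), [])
    return [col for col in REQUIRED_COLUMNS if col not in header]
```

Based on the headers shown in this notebook, `missing_required_columns('test/test_data/test_sample.csv')` should return `[]`. For `test/test_data/test_sample_missing_header.csv` it should return `['main_subject']`.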

In [25]:
url = f'{datamart_api_url}/datasets/TEST01/variables/TEST01-01'
file_path = 'test/test_data/test_sample.csv'
upload_data(file_path, url)


2 rows imported!


Get the data for the variable `TEST01-01` to check whether the rows were added:

In [26]:
response = get(f'{datamart_api_url}/datasets/TEST01/variables/TEST01-01')
df = pd.read_csv(StringIO(response.text))
display(HTML(df.to_html()))

| | dataset_id | variable_id | variable | main_subject | main_subject_id | value | value_unit | time | time_precision | country | coordinate | stated_in | stated_in_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TEST01 | TEST01-01 | test variable for test dataset | Belgium | Q31 | 1.8 | Annual growth % | 2019-01-01T00:00:00Z | year | Belgium | POINT(4.6680555555556, 50.641111111111) | OECD | QTEST01Source-0 |
| 1 | TEST01 | TEST01-01 | test variable for test dataset | Belgium | Q31 | 1.9 | Annual growth % | 2020-01-01T00:00:00Z | year | Belgium | POINT(4.6680555555556, 50.641111111111) | OECD | QTEST01Source-0 |


Success! The two rows for 2019 and 2020 were added. Note that the misspelled country names in the input file were resolved to `Belgium` (`Q31`).
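To confirm the upload programmatically instead of by eye, the CSV returned by the GET request can be parsed and its `time` column inspected. This is a sketch assuming the `time` values use the ISO format shown above, so their first four characters are the year. `years_present` is a hypothetical helper, and the sample below keeps only a subset of the response's columns.

```python
import csv
import io

def years_present(csv_text):
    """Return the sorted distinct years found in the `time` column of a CSV response."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # time values look like 2019-01-01T00:00:00Z; the first four characters are the year
    return sorted({row['time'][:4] for row in reader if row.get('time')})


sample = (
    "dataset_id,variable_id,main_subject,value,time,time_precision,country\n"
    "TEST01,TEST01-01,Belgium,1.8,2019-01-01T00:00:00Z,year,Belgium\n"
    "TEST01,TEST01-01,Belgium,1.9,2020-01-01T00:00:00Z,year,Belgium\n"
)
print(years_present(sample))  # ['2019', '2020']
```

In the notebook this would be `years_present(response.text)`.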

**Delete the rows added to the dataset so that this notebook can be run again**

In [27]:
response = delete(f'{datamart_api_url}/datasets/TEST01/variables/TEST01-01')

**The data has been deleted**

In the example below, the file `test_sample_missing_header.csv` is missing the required column `main_subject`.

In [13]:
df = pd.read_csv('test/test_data/test_sample_missing_header.csv')
df

| | value | value_unit | time | time_precision | country |
|---|---|---|---|---|---|
| 0 | 1.8 | Annual growth % | 2021-01-01T00:00:00Z | year | belllgium |
| 1 | 1.9 | Annual growth % | 2022-01-01T00:00:00Z | year | bellgium |


Let's try to upload this file.

In [14]:
url = f'{datamart_api_url}/datasets/TEST01/variables/TEST01-01'
file_path = 'test/test_data/test_sample_missing_header.csv'
upload_data(file_path, url)

[
  {
    "Error": "Missing required column: 'main_subject'",
    "Line Number": 1,
    "Column": "main_subject",
    "Description": "The uploaded file is missing a required column: main_subject. Please add the missing column and upload again."
  }
]


As expected, the API returns an error about the missing column `main_subject`.

In the example below, the file `test_sample_invalid.csv` contains invalid values in several columns.

In [15]:
df = pd.read_csv('test/test_data/test_sample_invalid.csv')
df

| | main_subject | value | value_unit | time | time_precision | country | source | dataset_id | variable_id |
|---|---|---|---|---|---|---|---|---|---|
| 0 | shdjshduihskdj | fifty | Annual growth % | 20-01-01T00:00:00Z | blah | belllgium | OECD | FAO | fake_gdp_growth |
| 1 | bellgium | 1.9 | Annual growth % | 2022-01-01T00:00:00Z | year | shdjshduihskdj | OECD | OECD | real_gdp_growth |
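Some of these problems can be caught before contacting the server. The sketch below uses a hypothetical helper, `local_value_errors`, that checks only two things: that `value` parses as a number, and that `time_precision` is one of the legal values the server reports for this file. Wikification and the dataset/variable ID checks are left to the server. It assumes the `value` and `time_precision` columns are present, and it numbers lines with the header on line 1, as in the server's messages.

```python
import csv
import io

# Legal time_precision values, copied from the server's validation message for this file
LEGAL_PRECISIONS = {
    'billion years', 'hundred million years', 'million years',
    'hundred thousand years', 'ten thousand years', 'millennium',
    'century', 'decade', 'year', 'month', 'day', 'hour', 'minute', 'second',
}

def local_value_errors(csv_text):
    """Return (line_number, column, bad_value) tuples for non-numeric values
    and unknown time_precision entries. Line 1 is the header row."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):
        try:
            float(row['value'])
        except (TypeError, ValueError):
            problems.append((line_number, 'value', row['value']))
        if row['time_precision'] not in LEGAL_PRECISIONS:
            problems.append((line_number, 'time_precision', row['time_precision']))
    return problems


sample = (
    "main_subject,value,time,time_precision,country\n"
    "shdjshduihskdj,fifty,20-01-01T00:00:00Z,blah,belllgium\n"
    "bellgium,1.9,2022-01-01T00:00:00Z,year,shdjshduihskdj\n"
)
print(local_value_errors(sample))
# [(2, 'value', 'fifty'), (2, 'time_precision', 'blah')]
```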


Let's try to upload this file.

In [17]:
url = f'{datamart_api_url}/datasets/TEST01/variables/TEST01-01'
file_path = 'test/test_data/test_sample_invalid.csv'
upload_data(file_path, url)

[
  {
    "Error": "Value Error: 'fifty'",
    "Line Number": 2,
    "Column": "value",
    "Description": "'fifty' is not a valid number"
  },
  {
    "Error": "Illegal precision value: 'blah'",
    "Line Number": 2,
    "Column": "time_precision",
    "Description": "Legal precision values are: 'billion years,hundred million years,million years,hundred thousand years,ten thousand years,millennium,century,decade,year,month,day,hour,minute,second'"
  },
  {
    "Error": "Could not wikify: 'shdjshduihskdj'",
    "Line Number": 2,
    "Column": "main_subject",
    "Description": "Could not find a Wikidata Qnode for the main subject: 'shdjshduihskdj.' Please check for spelling mistakes in the country name."
  },
  {
    "Error": "Dataset ID in the file: 'FAO' is not same as Dataset ID in the url : 'TEST01'",
    "Line Number": 2,
    "Column": "dataset_id",
    "Description": "Dataset IDs in the input file should match the Dataset Id in the API url"
  },
  {
    "Error": "Variable ID in t

The API lists all the errors in the file, and they must be fixed before the file can be uploaded.
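When a file has many problems, it can help to group the reported errors by column. This is a minimal sketch assuming the error body is a list of objects with `Column` and `Error` keys, as shown above; `errors_by_column` is a hypothetical helper. Since `upload_data` only prints the response, using this in the notebook would mean calling `put` yourself and passing `response.json()` when the status code is 400.

```python
import json
from collections import defaultdict

def errors_by_column(errors):
    """Group validation errors (a list of dicts) by their "Column" field."""
    grouped = defaultdict(list)
    for err in errors:
        grouped[err.get('Column', '<unknown>')].append(err.get('Error', ''))
    return dict(grouped)


body = '''[
  {"Error": "Value Error: 'fifty'", "Line Number": 2, "Column": "value"},
  {"Error": "Illegal precision value: 'blah'", "Line Number": 2, "Column": "time_precision"}
]'''
for column, messages in errors_by_column(json.loads(body)).items():
    print(column, '->', messages)
```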

### Upload an annotated spreadsheet

An annotated spreadsheet can also be uploaded to a dataset. For an example of the annotation format, see https://docs.google.com/spreadsheets/d/1fLEPvEu9OuKa2_7BEzhY0oWGZ_9CMXEE/edit#gid=280610980

**POST `/datasets/{dataset_id}/annotated`**

Let's upload a sample annotated file to the dataset `TEST04`.

In [9]:
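import os
import json
from requests import post

def upload_data_post(file_path, url):
    """Upload an annotated spreadsheet with POST and pretty-print the JSON response."""
    file_name = os.path.basename(file_path)
    # Close the file handle once the request completes
    with open(file_path, mode='rb') as f:
        files = {'file': (file_name, f, 'application/octet-stream')}
        response = post(url, files=files)
    # Both error (400) and success responses are JSON, so print them the same way
    print(json.dumps(response.json(), indent=2))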

In [10]:
url = f'{datamart_api_url}/datasets/TEST04/annotated'
# A local path; replace it with the location of your own annotated spreadsheet
file_path = '/Users/amandeep/Github/t2wml-annotation/tests/data/test_file_main_subject_country.xlsx'
upload_data_post(file_path, url)

[
  {
    "name": "INGO",
    "variable_id": "ingo",
    "dataset_id": "TEST04",
    "description": "INGO in TEST04",
    "corresponds_to_property": "PVARIABLE-QTEST04-003",
    "qualifier": [
      {
        "name": "Location",
        "identifier": "PQUALIFIER-QTEST04-006"
      },
      {
        "name": "Attack context",
        "identifier": "PQUALIFIER-QTEST04-005"
      },
      {
        "name": "Means of attack",
        "identifier": "PQUALIFIER-QTEST04-004"
      },
      {
        "name": "City",
        "identifier": "PQUALIFIER-QTEST04-002"
      },
      {
        "name": "stated in",
        "identifier": "P248"
      },
      {
        "name": "point in time",
        "identifier": "P585"
      }
    ]
  }
]


In [11]:
response = get(f'{datamart_api_url}/metadata/datasets/TEST04/variables')
print(len(response.json()))  # number of variables in TEST04
print(json.dumps(response.json(), indent=2))  # print all variables


1
[
  {
    "name": "INGO",
    "variable_id": "ingo",
    "description": "INGO in TEST04",
    "corresponds_to_property": "PVARIABLE-QTEST04-003",
    "qualifier": [
      {
        "name": "Location",
        "identifier": "PQUALIFIER-QTEST04-006"
      },
      {
        "name": "Attack context",
        "identifier": "PQUALIFIER-QTEST04-005"
      },
      {
        "name": "Means of attack",
        "identifier": "PQUALIFIER-QTEST04-004"
      },
      {
        "name": "City",
        "identifier": "PQUALIFIER-QTEST04-002"
      },
      {
        "name": "stated in",
        "identifier": "P248"
      },
      {
        "name": "point in time",
        "identifier": "P585"
      }
    ]
  }
]


In [12]:
response = get(f'{datamart_api_url}/datasets/TEST04/variables/ingo')
df = pd.read_csv(StringIO(response.text))
display(HTML(df.fillna('').to_html(index=False)))

| dataset_id | variable_id | variable | main_subject | main_subject_id | value | value_unit | time | time_precision | country | admin1 | admin2 | admin3 | coordinate | stated_in | stated_in_id | Location | Attack_context | Means_of_attack | City |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TEST04 | ingo | INGO | Ethiopia | Q115 | 0.0 | person | 1997-09-24T00:00:00Z | | Ethiopia | | | | POINT(40.0, 9.0) | | | Unknown | Individual attack | Shooting | roadside |
| TEST04 | ingo | INGO | Ethiopia | Q115 | 0.0 | person | 1998-06-25T00:00:00Z | | Ethiopia | | | | POINT(40.0, 9.0) | | | Road | Ambush | Kidnapping | travelling from Gode to Degeh Bur |
| TEST04 | ingo | INGO | Ethiopia | Q115 | 1.0 | person | 1999-04-01T00:00:00Z | | Ethiopia | | | | POINT(40.0, 9.0) | | | Unknown | Unknown | Kidnapping | around the corner |
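A quick way to see how the qualifiers listed in the variable metadata surface in the CSV is to compare their names against the header row. The sketch below uses a hypothetical helper, `unmatched_qualifiers`. It assumes, from the TEST04 output above only, that spaces in a qualifier name become underscores in the column name (for example `Attack context` becomes `Attack_context`). Any qualifier it reports as unmatched needs a closer look. Here that is `point in time`, which appears to be carried by the standard `time` column.

```python
import csv
import io

def unmatched_qualifiers(variable, csv_text):
    """Return qualifier names that have no matching column in the CSV header.

    Assumption (inferred from the TEST04 output only): a qualifier named
    "Attack context" shows up as the column "Attack_context".
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [q['name'] for q in variable.get('qualifier', [])
            if q['name'].replace(' ', '_') not in header]


variable = {"qualifier": [{"name": "Location"}, {"name": "Attack context"},
                          {"name": "stated in"}, {"name": "point in time"}]}
header_line = "dataset_id,variable_id,time,stated_in,Location,Attack_context\n"
print(unmatched_qualifiers(variable, header_line))  # ['point in time']
```

In the notebook, the first argument would be one entry from the `/metadata/datasets/TEST04/variables` response and the second the text of the `/datasets/TEST04/variables/ingo` response.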
