# This is a tutorial/demo on how to use the `Datamart` REST API.

## Installation

This Jupyter notebook requires Python 3.6 or later (the code uses f-strings) with these packages installed:

```
pip install notebook
pip install requests
pip install pandas
```

To run it, change to the directory containing this notebook and type:

```
jupyter notebook
```

Then, open this page in the web browser: http://localhost:8888/notebooks/Datamart%20Data%20API%20Demo.ipynb

## Configuration

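The cell below lists three Datamart REST API servers: the one hosted at ISI, one running on localhost, and one running on localhost in development mode. As written, the notebook uses the development server at `http://localhost:5000`; uncomment a different line to choose another server.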

To run your own server **locally**, follow the instructions in the [README](README.md).

In [1]:
```python
# Set the Datamart API URL

# The Datamart server running at ISI
# datamart_api_url = 'https://datamart:datamart-api-789@dsbox02.isi.edu:8888/datamart-api'

# Datamart server running on localhost
# datamart_api_url = 'http://localhost:14080'

# Datamart server running on localhost in development mode
datamart_api_url = 'http://localhost:5000'
```

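Editing the cell is the simplest way to switch servers. If you switch often, one option is to read the URL from an environment variable and fall back to the development server. The variable name `DATAMART_API_URL` and the helper `get_datamart_api_url` are hypothetical, not part of the Datamart API or this repository; treat this as a sketch under that assumption.

```python
import os

# Hypothetical default; matches the development-mode URL used in the cell above
DEFAULT_DATAMART_API_URL = 'http://localhost:5000'


def get_datamart_api_url(env=None):
    """Return the API base URL from the (hypothetical) DATAMART_API_URL variable, else the default."""
    env = os.environ if env is None else env
    # Strip a trailing slash so f'{url}/metadata/datasets' does not produce '//'
    return env.get('DATAMART_API_URL', DEFAULT_DATAMART_API_URL).rstrip('/')


datamart_api_url = get_datamart_api_url()
```

Passing a plain dict as `env` makes the lookup easy to check without touching the real environment.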

## Import Python modules

In [2]:
```python
from requests import get, post, put
import json
import pandas as pd
from io import StringIO
from IPython.display import display, HTML
```

### Get all datasets 

**GET `/metadata/datasets`**

In [3]:
```python
response = get(f'{datamart_api_url}/metadata/datasets')
print(json.dumps(response.json(), indent=2))
```

[
  {
    "name": "FSI dataset",
    "description": "data downloaded from FSI",
    "url": "https://fragilestatesindex.org",
    "dataset_id": "FSI"
  },
  {
    "name": "UAZ Indicators",
    "description": "Collection of indicators, including indicators from FAO, WDI, FEWSNET, CLiMIS, UNICEF, ieconomics.com, UNHCR, DSSAT, WHO, IMF, WHP, ACLDE, World Bank and IOM-DTM",
    "url": "https://github.com/ml4ai/delphi",
    "dataset_id": "UAZ"
  },
  {
    "name": "WGI dataset",
    "description": "Worldwide Governance Indicators",
    "url": "https://databank.worldbank.org/source/worldwide-governance-indicators",
    "dataset_id": "WGI"
  },
  {
    "name": "OECD dataset",
    "description": "data downloaded from OECD",
    "url": "https://data.oecd.org",
    "dataset_id": "OECD"
  }
]


As of June 14, 2020, the database contains four datasets: `FSI`, `OECD`, `UAZ` and `WGI`. More datasets will be added as they are processed.

We can also get metadata about one dataset using the `dataset_id`.
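
The endpoint returns a JSON list with one record per dataset. A minimal sketch (the helper `index_by_id` is hypothetical) that indexes the records by `dataset_id`, so later cells can look one up without another request. The sample string reuses two records from the output above; in the notebook you would pass `response.json()` instead.

```python
import json

# Two records copied from the GET /metadata/datasets output above
sample = '''[
  {"name": "FSI dataset", "description": "data downloaded from FSI",
   "url": "https://fragilestatesindex.org", "dataset_id": "FSI"},
  {"name": "OECD dataset", "description": "data downloaded from OECD",
   "url": "https://data.oecd.org", "dataset_id": "OECD"}
]'''


def index_by_id(datasets):
    """Map each dataset_id to its full metadata record."""
    return {record['dataset_id']: record for record in datasets}


datasets = index_by_id(json.loads(sample))
print(sorted(datasets))         # ['FSI', 'OECD']
print(datasets['OECD']['url'])  # https://data.oecd.org
```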

### Get metadata about one dataset

**GET `/metadata/datasets/{dataset_id}`**

In [4]:
```python
response = get(f'{datamart_api_url}/metadata/datasets/OECD')
print(json.dumps(response.json(), indent=2))
```

[
  {
    "name": "OECD dataset",
    "description": "data downloaded from OECD",
    "url": "https://data.oecd.org",
    "dataset_id": "OECD"
  }
]


### Get all variables in a dataset 

**GET `/metadata/datasets/{dataset_id}/variables`**

In [5]:
```python
response = get(f'{datamart_api_url}/metadata/datasets/OECD/variables')
print(json.dumps(response.json()[:4], indent=2))  # print only the first 4 variables
```

[
  {
    "variable_id": "gdp_per_capita",
    "dataset_id": "OECD"
  },
  {
    "variable_id": "gross_national_income_gni_per_capita",
    "dataset_id": "OECD"
  },
  {
    "variable_id": "household_disposable_income",
    "dataset_id": "OECD"
  },
  {
    "variable_id": "real_gdp_growth",
    "dataset_id": "OECD"
  }
]


In [6]:
```python
print('Total number of variables in dataset: {} is {}'.format('OECD', len(response.json())))
```

Total number of variables in dataset: OECD is 112


### Get metadata about one variable

**GET `/metadata/datasets/{dataset_id}/variables/{variable_id}`**

In [7]:
```python
response = get(f'{datamart_api_url}/metadata/datasets/OECD/variables/real_gdp_growth')
print(json.dumps(response.json(), indent=2))
```

{
  "variable_id": "real_gdp_growth",
  "dataset_id": "OECD",
  "description": "Real GDP growth in OECD",
  "corresponds_to_property": "POECD-005",
  "qualifier": [
    {
      "identifier": "P248",
      "name": "stated in"
    },
    {
      "identifier": "P585",
      "name": "point in time"
    }
  ]
}
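
Each qualifier pairs an `identifier` with a readable `name`. `P248` and `P585` look like Wikidata property ids, and they appear to correspond to the `stated_in` and `time` columns of the time-series output further down. Assuming every variable's metadata has the shape shown above, a small hypothetical helper can turn the list into a lookup table:

```python
import json

# Metadata copied from the GET /metadata/datasets/OECD/variables/real_gdp_growth output above
metadata = json.loads('''{
  "variable_id": "real_gdp_growth",
  "dataset_id": "OECD",
  "description": "Real GDP growth in OECD",
  "corresponds_to_property": "POECD-005",
  "qualifier": [
    {"identifier": "P248", "name": "stated in"},
    {"identifier": "P585", "name": "point in time"}
  ]
}''')


def qualifier_names(variable_metadata):
    """Map each qualifier identifier to its human-readable name."""
    # .get() so metadata without a 'qualifier' key yields an empty dict
    return {q['identifier']: q['name'] for q in variable_metadata.get('qualifier', [])}


print(qualifier_names(metadata))  # {'P248': 'stated in', 'P585': 'point in time'}
```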


### Find a variable using keyword search

**GET `/metadata/variables?keyword={keyword}`**

Query for variables related to: **road**

In [8]:
```python
response = get(f'{datamart_api_url}/metadata/variables?keyword=road')
print(json.dumps(response.json(), indent=2))
```

[
  {
    "variable_id": "road_fatalities",
    "name": " Road Fatalities",
    "rank": 0.0759909,
    "dataset_id": "OECD"
  },
  {
    "variable_id": "VUAZ-8054",
    "name": " WDI: Mortality caused by road traffic injury[per 100,000 people]",
    "rank": 0.0607927,
    "dataset_id": "UAZ"
  }
]


Query for variables related to **road AND fatalities** (both words in one `keyword` parameter):

In [9]:
```python
response = get(f'{datamart_api_url}/metadata/variables?keyword=road fatalities')
print(json.dumps(response.json(), indent=2))
```

[
  {
    "variable_id": "road_fatalities",
    "name": " Road Fatalities",
    "rank": 0.334428,
    "dataset_id": "OECD"
  }
]


Query for variables related to **road OR fatalities** (one `keyword` parameter per word):

In [10]:
```python
response = get(f'{datamart_api_url}/metadata/variables?keyword=road&keyword=fatalities')
print(json.dumps(response.json(), indent=2))
```

[
  {
    "variable_id": "road_fatalities",
    "name": " Road Fatalities",
    "rank": 0.0759909,
    "dataset_id": "OECD"
  },
  {
    "variable_id": "VUAZ-8054",
    "name": " WDI: Mortality caused by road traffic injury[per 100,000 people]",
    "rank": 0.0303964,
    "dataset_id": "UAZ"
  },
  {
    "variable_id": "VUAZ-8136",
    "name": " Conflict fatalities[number of cases]",
    "rank": 0.0303964,
    "dataset_id": "UAZ"
  }
]
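
The three queries suggest that words inside a single `keyword` parameter are combined with AND, while repeated `keyword` parameters are combined with OR. Instead of writing these query strings by hand, you can build them with `urllib.parse.urlencode`. The helper `keyword_query` is a hypothetical sketch; `quote_via=quote` encodes spaces as `%20`, which should match how `requests` encodes the space in `keyword=road fatalities`.

```python
from urllib.parse import quote, urlencode


def keyword_query(*keywords):
    """Build the query string for GET /metadata/variables.

    Each argument becomes its own `keyword` parameter (OR in the examples above);
    words inside one argument share a single parameter (AND in the examples above).
    """
    # doseq=True expands the list into repeated keyword=... pairs
    return urlencode({'keyword': list(keywords)}, doseq=True, quote_via=quote)


print(keyword_query('road fatalities'))     # keyword=road%20fatalities
print(keyword_query('road', 'fatalities'))  # keyword=road&keyword=fatalities
```

In the notebook, `get(f'{datamart_api_url}/metadata/variables?{keyword_query("road", "fatalities")}')` would repeat the OR query above.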


### Get time series data for a variable

**GET `/datasets/{dataset_id}/variables/{variable_id}`**

In [11]:
```python
response = get(f'{datamart_api_url}/datasets/OECD/variables/real_gdp_growth')
df = pd.read_csv(StringIO(response.text))
display(HTML(df.to_html()))
```

Unnamed: 0,dataset_id,variable_id,variable,main_subject,main_subject_id,value,value_unit,time,time_precision,country,coordinate,stated_in,stated_in_id
0,OECD,real_gdp_growth,Real GDP growth,Russia,Q159,3.1,Annual growth %,2011-01-01T00:00:00Z,year,Russia,"POINT(100.0, 62.0)",Organisation for Economic Cooperation and Development,Q41550
1,OECD,real_gdp_growth,Real GDP growth,Russia,Q159,3.7,Annual growth %,2012-01-01T00:00:00Z,year,Russia,"POINT(100.0, 62.0)",Organisation for Economic Cooperation and Development,Q41550
2,OECD,real_gdp_growth,Real GDP growth,Russia,Q159,1.8,Annual growth %,2013-01-01T00:00:00Z,year,Russia,"POINT(100.0, 62.0)",Organisation for Economic Cooperation and Development,Q41550
3,OECD,real_gdp_growth,Real GDP growth,Russia,Q159,0.7,Annual growth %,2014-01-01T00:00:00Z,year,Russia,"POINT(100.0, 62.0)",Organisation for Economic Cooperation and Development,Q41550
4,OECD,real_gdp_growth,Real GDP growth,Russia,Q159,-2.3,Annual growth %,2015-01-01T00:00:00Z,year,Russia,"POINT(100.0, 62.0)",Organisation for Economic Cooperation and Development,Q41550
5,OECD,real_gdp_growth,Real GDP growth,Russia,Q159,0.3,Annual growth %,2016-01-01T00:00:00Z,year,Russia,"POINT(100.0, 62.0)",Organisation for Economic Cooperation and Development,Q41550
6,OECD,real_gdp_growth,Real GDP growth,Russia,Q159,1.6,Annual growth %,2017-01-01T00:00:00Z,year,Russia,"POINT(100.0, 62.0)",Organisation for Economic Cooperation and Development,Q41550
7,OECD,real_gdp_growth,Real GDP growth,Russia,Q159,2.3,Annual growth %,2018-01-01T00:00:00Z,year,Russia,"POINT(100.0, 62.0)",Organisation for Economic Cooperation and Development,Q41550
8,OECD,real_gdp_growth,Real GDP growth,Canada,Q16,3.1,Annual growth %,2011-01-01T00:00:00Z,year,Canada,"POINT(-109.0, 56.0)",Organisation for Economic Cooperation and Development,Q41550
9,OECD,real_gdp_growth,Real GDP growth,Canada,Q16,1.8,Annual growth %,2012-01-01T00:00:00Z,year,Canada,"POINT(-109.0, 56.0)",Organisation for Economic Cooperation and Development,Q41550
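
The endpoint returns CSV text, which the cell above parses with `pandas.read_csv`. If you only need a couple of columns, the standard library is enough. The sketch below (the helper `values_by_country` is hypothetical) groups the `value` column by `country` with `csv.DictReader`; the sample keeps only the `country`, `time` and `value` columns of Belgium and Latvia rows that appear in this notebook.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Trimmed copy of rows returned by the time-series endpoint in this notebook
sample_csv = """country,time,value
Belgium,2011-01-01T00:00:00Z,1.8
Belgium,2012-01-01T00:00:00Z,0.2
Belgium,2013-01-01T00:00:00Z,0.2
Belgium,2014-01-01T00:00:00Z,1.3
Belgium,2015-01-01T00:00:00Z,1.7
Belgium,2016-01-01T00:00:00Z,1.5
Belgium,2017-01-01T00:00:00Z,1.7
Belgium,2018-01-01T00:00:00Z,1.4
Latvia,2011-01-01T00:00:00Z,6.4
Latvia,2012-01-01T00:00:00Z,4.0
"""


def values_by_country(csv_text):
    """Group the numeric `value` column by `country`."""
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        series[row['country']].append(float(row['value']))
    return dict(series)


averages = {country: round(mean(values), 3)
            for country, values in values_by_country(sample_csv).items()}
print(averages)  # {'Belgium': 1.225, 'Latvia': 5.2}
```

In the notebook, `values_by_country(response.text)` would apply the same grouping to the live response.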


### Get time series data for a variable for a country

**GET `/datasets/{dataset_id}/variables/{variable_id}?country={country}`**

Get data for **Belgium**

In [12]:
```python
response = get(f'{datamart_api_url}/datasets/OECD/variables/real_gdp_growth?country=Belgium')
df = pd.read_csv(StringIO(response.text))
display(HTML(df.to_html()))
```

Unnamed: 0,dataset_id,variable_id,variable,main_subject,main_subject_id,value,value_unit,time,time_precision,country,coordinate,stated_in,stated_in_id
0,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,1.8,Annual growth %,2011-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
1,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,0.2,Annual growth %,2012-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
2,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,0.2,Annual growth %,2013-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
3,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,1.3,Annual growth %,2014-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
4,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,1.7,Annual growth %,2015-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
5,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,1.5,Annual growth %,2016-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
6,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,1.7,Annual growth %,2017-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
7,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,1.4,Annual growth %,2018-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550


Get data for **Belgium OR Latvia**

In [13]:
```python
response = get(f'{datamart_api_url}/datasets/OECD/variables/real_gdp_growth?country=Belgium&country=Latvia')
df = pd.read_csv(StringIO(response.text))
display(HTML(df.to_html()))
```

Unnamed: 0,dataset_id,variable_id,variable,main_subject,main_subject_id,value,value_unit,time,time_precision,country,coordinate,stated_in,stated_in_id
0,OECD,real_gdp_growth,Real GDP growth,Latvia,Q211,6.4,Annual growth %,2011-01-01T00:00:00Z,year,Latvia,"POINT(25.0, 57.0)",Organisation for Economic Cooperation and Development,Q41550
1,OECD,real_gdp_growth,Real GDP growth,Latvia,Q211,4.0,Annual growth %,2012-01-01T00:00:00Z,year,Latvia,"POINT(25.0, 57.0)",Organisation for Economic Cooperation and Development,Q41550
2,OECD,real_gdp_growth,Real GDP growth,Latvia,Q211,2.4,Annual growth %,2013-01-01T00:00:00Z,year,Latvia,"POINT(25.0, 57.0)",Organisation for Economic Cooperation and Development,Q41550
3,OECD,real_gdp_growth,Real GDP growth,Latvia,Q211,1.9,Annual growth %,2014-01-01T00:00:00Z,year,Latvia,"POINT(25.0, 57.0)",Organisation for Economic Cooperation and Development,Q41550
4,OECD,real_gdp_growth,Real GDP growth,Latvia,Q211,3.0,Annual growth %,2015-01-01T00:00:00Z,year,Latvia,"POINT(25.0, 57.0)",Organisation for Economic Cooperation and Development,Q41550
5,OECD,real_gdp_growth,Real GDP growth,Latvia,Q211,2.1,Annual growth %,2016-01-01T00:00:00Z,year,Latvia,"POINT(25.0, 57.0)",Organisation for Economic Cooperation and Development,Q41550
6,OECD,real_gdp_growth,Real GDP growth,Latvia,Q211,4.6,Annual growth %,2017-01-01T00:00:00Z,year,Latvia,"POINT(25.0, 57.0)",Organisation for Economic Cooperation and Development,Q41550
7,OECD,real_gdp_growth,Real GDP growth,Latvia,Q211,4.8,Annual growth %,2018-01-01T00:00:00Z,year,Latvia,"POINT(25.0, 57.0)",Organisation for Economic Cooperation and Development,Q41550
8,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,1.8,Annual growth %,2011-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
9,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,0.2,Annual growth %,2012-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
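
Each extra country is another `country` parameter. A sketch of a URL builder for this endpoint, assuming dataset and variable ids may need percent-encoding in the path; the helper `time_series_url` is hypothetical.

```python
from urllib.parse import quote, urlencode


def time_series_url(base_url, dataset_id, variable_id, countries=()):
    """Build GET /datasets/{dataset_id}/variables/{variable_id}, optionally filtered by country."""
    # quote() percent-encodes characters that are unsafe in a URL path
    url = f"{base_url}/datasets/{quote(dataset_id)}/variables/{quote(variable_id)}"
    if countries:
        # Repeat the `country` parameter once per country, as in the Belgium/Latvia example
        url += '?' + urlencode({'country': list(countries)}, doseq=True, quote_via=quote)
    return url


print(time_series_url('http://localhost:5000', 'OECD', 'real_gdp_growth', ['Belgium', 'Latvia']))
# http://localhost:5000/datasets/OECD/variables/real_gdp_growth?country=Belgium&country=Latvia
```

`requests` can also build the query for you: `get(url, params={'country': ['Belgium', 'Latvia']})` sends the same repeated `country` parameters.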


### Upload data to a variable

Let's upload some data to the variable `real_gdp_growth` in the `OECD` dataset.

**PUT `/datasets/{dataset_id}/variables/{variable_id}`**

In [14]:
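```python
import os

def upload_data(file_path, url):
    """PUT a CSV file to a Datamart variable URL and print the server's response."""
    file_name = os.path.basename(file_path)
    # Open the file in a `with` block so the handle is closed once the request is sent
    with open(file_path, mode='rb') as f:
        files = {
            'file': (file_name, f, 'application/octet-stream')
        }
        response = put(url, files=files)
    if response.status_code == 400:
        # Validation errors come back as a JSON list; pretty-print them
        print(json.dumps(response.json(), indent=2))
    else:
        print(response.json())
```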

The data upload API validates the input file.

In the example below, the file `oecd_gdp_sample_missing_header.csv` is missing a required column `variable_id`.

The required columns are:
- dataset_id
- variable_id
- main_subject
- value
- time
- time_precision
- country
- source
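
Since the server rejects files with missing columns, a quick local check can save a round trip. A minimal sketch (the helper `missing_columns` is hypothetical) that reads only the CSV header and reports which of the required columns above are absent. The sample header is assumed from the DataFrame displayed for `oecd_gdp_sample_missing_header.csv` below. The server's validation remains authoritative, since it also checks values, not just column names.

```python
import csv
import io

# Required columns as listed above
REQUIRED_COLUMNS = ['dataset_id', 'variable_id', 'main_subject', 'value',
                    'time', 'time_precision', 'country', 'source']


def missing_columns(csv_text):
    """Return the required columns absent from the CSV header, in the order listed."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


# Assumed header of oecd_gdp_sample_missing_header.csv (no variable_id column)
sample = 'main_subject,value,value_unit,time,time_precision,country,source,dataset_id\n'
print(missing_columns(sample))  # ['variable_id']
```

For a file on disk, `missing_columns(open(file_path).read())` would run the same check before calling `upload_data`.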

In [15]:
```python
df = pd.read_csv('test/test_data/oecd_gdp_sample_missing_header.csv')
df
```

Unnamed: 0,main_subject,value,value_unit,time,time_precision,country,source,dataset_id
0,belllgium,1.8,Annual growth %,2021-01-01T00:00:00Z,year,belllgium,OECD,OECD
1,bellgium,1.9,Annual growth %,2022-01-01T00:00:00Z,year,bellgium,OECD,OECD


Let's try to upload this file:

In [16]:
```python
url = f'{datamart_api_url}/datasets/OECD/variables/real_gdp_growth'
file_path = 'test/test_data/oecd_gdp_sample_missing_header.csv'
upload_data(file_path, url)
```

[
  {
    "Error": "Missing required column: 'variable_id'",
    "Line Number": 1,
    "Column": "variable_id",
    "Description": "The uploaded file is missing a required column: variable_id. Please add the missing column and upload again."
  }
]


As expected, the API returns an error about the missing `variable_id` column.

The next file, `oecd_gdp_sample_invalid.csv`, contains invalid values in several required columns.

In [17]:
```python
df = pd.read_csv('test/test_data/oecd_gdp_sample_invalid.csv')
df
```

Unnamed: 0,main_subject,value,value_unit,time,time_precision,country,source,dataset_id,variable_id
0,shdjshduihskdj,fifty,Annual growth %,20-01-01T00:00:00Z,blah,belllgium,OECD,FAO,fake_gdp_growth
1,bellgium,1.9,Annual growth %,2022-01-01T00:00:00Z,year,shdjshduihskdj,OECD,OECD,real_gdp_growth


Let's try to upload this file:

In [18]:
```python
url = f'{datamart_api_url}/datasets/OECD/variables/real_gdp_growth'
file_path = 'test/test_data/oecd_gdp_sample_invalid.csv'
upload_data(file_path, url)
```

[
  {
    "Error": "Value Error: 'fifty'",
    "Line Number": 2,
    "Column": "value",
    "Description": "'fifty' is not a valid number"
  },
  {
    "Error": "Illegal precision value: 'blah'",
    "Line Number": 2,
    "Column": "time_precision",
    "Description": "Legal precision values are: 'billion years,hundred million years,million years,hundred thousand years,ten thousand years,millennium,century,decade,year,month,day,hour,minute,second'"
  },
  {
    "Error": "Could not wikify: 'shdjshduihskdj'",
    "Line Number": 2,
    "Column": "main_subject",
    "Description": "Could not find a Wikidata Qnode for the main subject: 'shdjshduihskdj.' Please check for spelling mistakes in the country name."
  },
  {
    "Error": "Dataset ID in the file: 'FAO' is not same as Dataset ID in the url : 'OECD'",
    "Line Number": 2,
    "Column": "dataset_id",
    "Description": "Dataset IDs in the input file should match the Dataset Id in the API url"
  },
  {
    "Error": "Variable ID in the

The API lists all the errors in the file; every one of them must be fixed before the file can be uploaded.
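
Each error is a JSON object with `Error`, `Line Number`, `Column` and `Description` keys. A hypothetical helper `format_errors` that condenses the list into one line per problem, sorted by line number. The sample reuses errors from the responses above, trimmed to the keys the helper reads.

```python
def format_errors(errors):
    """Condense the API's validation error list into one readable line per problem."""
    # sorted() is stable, so errors on the same line keep the order the API returned
    ordered = sorted(errors, key=lambda e: e['Line Number'])
    return [f"line {e['Line Number']}, column {e['Column']}: {e['Error']}" for e in ordered]


# Two errors from oecd_gdp_sample_invalid.csv plus the missing-column error
# from oecd_gdp_sample_missing_header.csv, trimmed to the keys used above
errors = [
    {"Error": "Value Error: 'fifty'", "Line Number": 2, "Column": "value"},
    {"Error": "Illegal precision value: 'blah'", "Line Number": 2, "Column": "time_precision"},
    {"Error": "Missing required column: 'variable_id'", "Line Number": 1, "Column": "variable_id"},
]
print('\n'.join(format_errors(errors)))
```

Inside `upload_data`, the 400 branch could print `'\n'.join(format_errors(response.json()))` instead of the raw JSON.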

Now we upload the contents of `test/test_data/oecd_gdp_sample.csv`, which is a valid file.

In [19]:
```python
df = pd.read_csv('test/test_data/oecd_gdp_sample.csv')
df
```

Unnamed: 0,main_subject,value,value_unit,time,time_precision,country,source,dataset_id,variable_id
0,belllgium,1.8,Annual growth %,2021-01-01T00:00:00Z,year,belllgium,OECD,OECD,real_gdp_growth
1,bellgium,1.9,Annual growth %,2022-01-01T00:00:00Z,year,bellgium,OECD,OECD,real_gdp_growth


In [20]:
```python
url = f'{datamart_api_url}/datasets/OECD/variables/real_gdp_growth'
file_path = 'test/test_data/oecd_gdp_sample.csv'
upload_data(file_path, url)
```


2 rows imported!


Get the data for the variable `real_gdp_growth` to check that the new rows were added:

In [21]:
```python
response = get(f'{datamart_api_url}/datasets/OECD/variables/real_gdp_growth?country=Belgium')
df = pd.read_csv(StringIO(response.text))
display(HTML(df.to_html()))
```

Unnamed: 0,dataset_id,variable_id,variable,main_subject,main_subject_id,value,value_unit,time,time_precision,country,coordinate,stated_in,stated_in_id
0,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,1.8,Annual growth %,2011-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
1,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,0.2,Annual growth %,2012-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
2,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,0.2,Annual growth %,2013-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
3,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,1.3,Annual growth %,2014-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
4,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,1.7,Annual growth %,2015-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
5,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,1.5,Annual growth %,2016-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
6,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,1.7,Annual growth %,2017-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
7,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,1.4,Annual growth %,2018-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",Organisation for Economic Cooperation and Development,Q41550
8,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,1.8,Annual growth %,2021-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",OECD,QOECDSource-0
9,OECD,real_gdp_growth,Real GDP growth,Belgium,Q31,1.9,Annual growth %,2022-01-01T00:00:00Z,year,Belgium,"POINT(4.6680555555556, 50.641111111111)",OECD,QOECDSource-0


Success! The 2 rows for 2021 and 2022 were added.

### Create a new dataset

**POST `/metadata/datasets`**

In [22]:
```python
# Define a new dataset
wdi = {
    "name": "World Development Indicators",
    "dataset_id": "WDI",
    "description": "Indicators from World Bank",
    "url": "https://data.worldbank.org/indicator"
}
```

In [23]:
```python
# Post it to the API
wdi_response = post(f'{datamart_api_url}/metadata/datasets', json=wdi)
print(json.dumps(wdi_response.json(), indent=2))
```


{
  "name": "World Development Indicators",
  "description": "Indicators from World Bank",
  "url": "https://data.worldbank.org/indicator",
  "dataset_id": "WDI"
}


Retrieve all datasets

In [24]:
response = get(f'{datamart_api_url}/metadata/datasets')
print(json.dumps(response.json(), indent=2))

[
  {
    "name": "FSI dataset",
    "description": "data downloaded from FSI",
    "url": "https://fragilestatesindex.org",
    "dataset_id": "FSI"
  },
  {
    "name": "OECD dataset",
    "description": "data downloaded from OECD",
    "url": "https://data.oecd.org",
    "dataset_id": "OECD"
  },
  {
    "name": "UAZ Indicators",
    "description": "Collection of indicators, including indicators from FAO, WDI, FEWSNET, CLiMIS, UNICEF, ieconomics.com, UNHCR, DSSAT, WHO, IMF, WHP, ACLDE, World Bank and IOM-DTM",
    "url": "https://github.com/ml4ai/delphi",
    "dataset_id": "UAZ"
  },
  {
    "name": "WGI dataset",
    "description": "Worldwide Governance Indicators",
    "url": "https://databank.worldbank.org/source/worldwide-governance-indicators",
    "dataset_id": "WGI"
  },
  {
    "name": "World Development Indicators",
    "description": "Indicators from World Bank",
    "url": "https://data.worldbank.org/indicator",
    "dataset_id": "WDI"
  }
]


The newly created dataset `WDI` now appears in the list.
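
This notebook does not show how the server responds if the `WDI` POST is sent a second time, for example when the cell is re-run. One defensive pattern is to list the datasets first and POST only if the `dataset_id` is new. The sketch below takes the GET and POST steps as callables so it can run without a server; `ensure_dataset` and the in-memory fakes are hypothetical.

```python
def ensure_dataset(payload, list_datasets, create_dataset):
    """Create a dataset only if its dataset_id is not already listed.

    list_datasets() should return the parsed GET /metadata/datasets list, and
    create_dataset(payload) should POST the payload and return the parsed response.
    Returns a (record, created) tuple.
    """
    for existing in list_datasets():
        if existing['dataset_id'] == payload['dataset_id']:
            return existing, False
    return create_dataset(payload), True


# In-memory stand-ins for the GET and POST calls, so the sketch runs without a server
store = [{'dataset_id': 'OECD', 'name': 'OECD dataset'}]

def fake_list():
    return list(store)

def fake_create(payload):
    store.append(payload)
    return payload


sample_payload = {'dataset_id': 'WDI', 'name': 'World Development Indicators'}
print(ensure_dataset(sample_payload, fake_list, fake_create)[1])  # True: created
print(ensure_dataset(sample_payload, fake_list, fake_create)[1])  # False: already listed
```

With the notebook's imports, the real callables could be `lambda: get(f'{datamart_api_url}/metadata/datasets').json()` and `lambda p: post(f'{datamart_api_url}/metadata/datasets', json=p).json()`.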

### Create a variable in the dataset `WDI`

**POST `/metadata/datasets/{dataset_id}/variables`**

In [25]:
```python
# Define a new variable
gdp = {
    "name": "gross domestic product based on purchasing power parity",
    "variable_id": "GDP"
}
```

In [26]:
```python
gdp_response = post(f'{datamart_api_url}/metadata/datasets/WDI/variables', json=gdp)
print(json.dumps(gdp_response.json(), indent=2))
```

{
  "name": "gross domestic product based on purchasing power parity",
  "variable_id": "GDP",
  "dataset_id": "WDI"
}


Retrieve all variables for the dataset `WDI`

In [27]:
```python
response = get(f'{datamart_api_url}/metadata/datasets/WDI/variables')
print(json.dumps(response.json(), indent=2))
```

[
  {
    "name": "gross domestic product based on purchasing power parity",
    "variable_id": "GDP",
    "dataset_id": "WDI"
  }
]


The variable `GDP` is created in the dataset `WDI`
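
The OECD variable ids shown earlier (for instance `real_gdp_growth` for the variable named "Real GDP growth") look like snake_case forms of the names, while `GDP` was chosen by hand. Nothing in this notebook says the API requires either style. If you want ids derived consistently from names, a hypothetical helper could look like this:

```python
import re


def to_variable_id(name):
    """Hypothetical helper: derive a snake_case variable_id from a variable name."""
    # Lowercase, collapse each run of non-alphanumeric characters into '_', trim the ends
    return re.sub(r'[^a-z0-9]+', '_', name.lower()).strip('_')


print(to_variable_id('Real GDP growth'))
# real_gdp_growth
print(to_variable_id('gross domestic product based on purchasing power parity'))
# gross_domestic_product_based_on_purchasing_power_parity
```

For example, `{"name": name, "variable_id": to_variable_id(name)}` would build a payload for `POST /metadata/datasets/WDI/variables`.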