In [None]:
import pandas as pd

# Interacting with APIs using Python

Now that we have an understanding of APIs, we can start interacting with them programmatically.

# The `requests` library

- `requests` is a Python library to interact with the web
- It sends and receives data to and from URLs
- You can think of it as a web browser (Chrome, Firefox), but without the graphic interface
- APIs are URLs, so we can use `requests` to interact with APIs in Python

To use `requests`, import it first:

In [None]:
import requests

## Receiving information from the web

- When we access a URL, most of the data we receive is in HTML format
- A familiar example:

In [None]:
url = 'https://www.worldbank.org/en/home'

In [None]:
response = requests.get(url)

In [None]:
print(response)

A status code of 200 indicates that the request was successful.
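
Other status codes signal different outcomes. As a minimal sketch (the helper name `describe_status` is made up for illustration), the standard library's `http.HTTPStatus` can translate a numeric code into its standard phrase and a rough category:

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable summary of an HTTP status code.

    Hypothetical helper for illustration; HTTPStatus raises ValueError
    for codes it does not know.
    """
    phrase = HTTPStatus(code).phrase  # e.g. 200 -> 'OK'
    if 200 <= code < 300:
        category = 'success'
    elif 400 <= code < 500:
        category = 'client error'
    elif 500 <= code < 600:
        category = 'server error'
    else:
        category = 'other'
    return '{} {} ({})'.format(code, phrase, category)


for code in (200, 404, 500):
    print(describe_status(code))
```

With a real response, the number to pass is `response.status_code`.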

The `.text` attribute gives access to the text source of the object retrieved in the response. In this case, the object is an HTML file.

In [None]:
print(response.text)

This is how this code looks when rendered in a web browser, by the way:

![world-bank-site](img/world-bank-site.PNG)

## Interacting with APIs

Remember the API to get live data from the ISS? http://open-notify.org

We'll retrieve information from one of its two endpoints:

- Astronauts in space now - http://api.open-notify.org/astros.json

In [None]:
url = 'http://api.open-notify.org/astros.json'

In [None]:
response = requests.get(url)

In [None]:
print(response)

This time the result stored in `response` is not an HTML file but a JSON file. We can use the `.json()` method to retrieve it:

In [None]:
response.json()

JSON is a widely used data format, especially in APIs. In Python, JSON objects are loaded as dictionaries and JSON arrays as lists.

In [None]:
data_dic = response.json()
type(data_dic)

In [None]:
data_dic['people']
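
The same conversion can be tried offline with the standard library's `json` module. The sketch below assumes the response has roughly the shape shown; the names and values in `sample_text` are made up for illustration and are not real API output:

```python
import json

# Made-up sample that mimics the assumed shape of the astros.json response
sample_text = '''
{
  "message": "success",
  "number": 2,
  "people": [
    {"name": "Astronaut A", "craft": "ISS"},
    {"name": "Astronaut B", "craft": "ISS"}
  ]
}
'''

sample = json.loads(sample_text)   # JSON object -> Python dict
print(type(sample))
print(type(sample['people']))      # JSON array -> Python list

# Each element of the list is itself a dict with the keys 'name' and 'craft'
names = [person['name'] for person in sample['people']]
print(names)
```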

Furthermore, we can transform the list inside `data_dic['people']` into a Pandas dataframe for further analysis.

In [None]:
df = pd.DataFrame(data=data_dic['people'])

In [None]:
df

# Coding a simple API client

We can write Python functions to simplify programmatic API access.

## Example 1: URL-based parameters

After exploring the geoBoundaries API and its documentation, we know that it takes URLs of the following generic form:

`https://www.geoboundaries.org/api/current/gbOpen/[3-letter-iso-code]/[admin-level]/`

We can then build a function that takes the 3-letter ISO code and the administrative boundary level as parameters to automate API calls.

In [None]:
def fetch_geoboundaries_data(country_code, admin_level):
    
    endpoint = 'https://www.geoboundaries.org/api/current/gbOpen/'
    url = endpoint + country_code + '/ADM' + str(admin_level)
    response = requests.get(url)
    
    if response.status_code == 200:
        
        data = response.json()
        return data
        
    else:
        
        print('Request failed!')
        return None    

We can use our new function to fetch admin-1 level data from Kenya.

In [None]:
kenya_data = fetch_geoboundaries_data('KEN', 1)

In [None]:
print(kenya_data)

A few notes:
- You might have noticed that the result stored in `kenya_data` is not the actual admin-1 level boundaries, but _metadata_ about the boundaries data
- A visual inspection of `kenya_data` shows that a URL to the data is in the key `simplifiedGeometryGeoJSON`

In [None]:
kenya_data['simplifiedGeometryGeoJSON']

- You can use `requests` once again to fetch the data from this URL

**Important:**
- Not all APIs will provide direct access to the information you need
- Many will require additional coding to get from the initial API call to the data of your interest

**Challenge:** create a function that builds upon `fetch_geoboundaries_data()` and returns the actual geographic data.

## Example 2: Argument-based parameters

- The example of geoboundaries.org takes _URL-based_ API query parameters
- Many APIs take _argument-based_ query parameters
- This will be the case every time you have to use query parameters separated by an ampersand symbol (`&`) after a question mark (`?`) in the API endpoint
- Take this generic example:

`https://api.org/endpoint/?parameter1=value1&parameter2=value2&parameter3=value3`

- Theoretically, it's _possible_ to modify an API URL call using concatenated strings in Python to build argument-based queries

```{python}
endpoint = 'https://api.org/endpoint/'
p1 = 'parameter1'
v1 = 'value1'
requests.get(endpoint + '?' + p1 + '=' + v1)
```

- However, the convention in Python is to **use the argument `params` of `requests.get()`** to pass argument-based query parameters

```{python}
parameters = {'parameter1': 'value1'}
requests.get(endpoint, params=parameters)
```
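
To see what a dictionary of parameters turns into, here is a minimal sketch using the standard library's `urlencode`. It is only an illustration of the encoding step, not how `requests` is implemented, and the helper name `build_query_url` is hypothetical:

```python
from urllib.parse import urlencode


def build_query_url(endpoint, parameters):
    """Join an endpoint and a dict of query parameters into a single URL.

    Hypothetical helper for illustration only.
    """
    return endpoint + '?' + urlencode(parameters)


endpoint = 'https://api.org/endpoint/'
parameters = {'parameter1': 'value1', 'parameter2': 'value 2'}

url = build_query_url(endpoint, parameters)
print(url)
```

Note how the space in `'value 2'` is escaped automatically. Handling special characters like this is one reason to prefer a dictionary of parameters over hand-built strings.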

### Applied example: the WBG API

The WBG has an extensive API with country indicators, among many other data. We'll use the endpoint of the total population API to fetch country population data.

Documentation and use examples of the WBG API can be found in the [WBG Knowledge Base](https://datahelpdesk.worldbank.org/knowledgebase) and the [Developer Information resources](https://datahelpdesk.worldbank.org/knowledgebase/topics/125589-developer-information).

In [None]:
def fetch_population_by_year(year):
    
    endpoint = 'https://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL'
    parameters = {'date': year, 'format':'json'}
    # note: the API documentation specifies that format=json
    # is a required parameter in order to return the results as JSON
    response = requests.get(endpoint, params=parameters)
    
    if response.status_code == 200:
        
        data = response.json()
        return data
    
    else:
        
        print('Request failed!')
        return None

In [None]:
pop_2015 = fetch_population_by_year(2015)

In [None]:
pop_2015

A few notes:
- This is the same data you obtain when opening https://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL?date=2015&format=json in a web browser
- In this case, the resulting JSON is not a dictionary but a list
- Note the detail of the information in the keys `page`, `pages`, `per_page`, and `total` in the first element of `pop_2015`

In [None]:
print('Total obs: {}'.format(pop_2015[0]['total']))
print('Obs per page: {}'.format(pop_2015[0]['per_page']))
print('Total pages: {}'.format(pop_2015[0]['pages']))
print('Current page: {}'.format(pop_2015[0]['page']))

- The result is only one page with 50 observations, out of a total of 266
- This means that results are incomplete! More API calls are needed to complete the total 266 observations
- A further inspection of the API documentation shows that the parameters `page=page_number` or `per_page=obs_per_page` can be used to retrieve complete results.

**Important:** When fetching data using APIs, always inspect the result to look for possible limitations in the results' default format. You might be inadvertently missing observations in your API calls!
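
The general pattern for collecting paginated results can be sketched without touching the network. The example below assumes each call returns `[metadata, records]` with a `pages` key in the metadata, as in the result above. The function names and the fake data are made up for illustration:

```python
def collect_all_pages(fetch_page):
    """Call fetch_page(page) until every page has been retrieved.

    fetch_page must return [metadata, records], where metadata
    contains the total number of pages under the key 'pages'.
    """
    records = []
    page = 1
    while True:
        metadata, page_records = fetch_page(page)
        records.extend(page_records)
        if page >= metadata['pages']:
            break
        page += 1
    return records


# Stand-in for a real API: 266 fake observations served 50 per page
FAKE_OBS = [{'id': i} for i in range(266)]
PER_PAGE = 50


def fake_fetch_page(page):
    """Return one page of fake results in the [metadata, records] shape."""
    start = (page - 1) * PER_PAGE
    pages = -(-len(FAKE_OBS) // PER_PAGE)  # ceiling division
    metadata = {'page': page, 'pages': pages,
                'per_page': PER_PAGE, 'total': len(FAKE_OBS)}
    return [metadata, FAKE_OBS[start:start + PER_PAGE]]


all_obs = collect_all_pages(fake_fetch_page)
print(len(all_obs))
```

With a real API, `fetch_page` would wrap a `requests.get()` call that passes the page number as a query parameter.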

## Coding API clients - Main takeaways 1

- Programming a client for an API requires reviewing the documentation and understanding how the API is meant to be used
- Many APIs will combine the use of URL-based and argument-based parameters to pass information in API queries. A good API client will take that into account to build the correct query URL and parameters argument. Take the following example:

In [None]:
def fetch_population_by_year_country(year, country):
    
    endpoint = 'https://api.worldbank.org/v2/country/' + country + '/indicator/SP.POP.TOTL'
    parameters = {'date': year, 'format':'json'}
    # note: the API documentation specifies that format=json
    # is a required parameter in order to return the results as JSON
    response = requests.get(endpoint, params=parameters)
    
    if response.status_code == 200:
        
        data = response.json()
        return data
    
    else:
        
        print('Request failed!')
        return None

In [None]:
brazil_data = fetch_population_by_year_country(2015, 'BRA')

In [None]:
brazil_data

## Coding API clients - Main takeaways 2

- Remember that further coding might be needed to get from the API result to the information that is relevant for a user
- Some APIs split large results into pages and return only one page per call. It's up to the user to review the results and take measures to ensure the data is complete

**Challenges:**
- Modify the function `fetch_population_by_year()` to retrieve the complete results (not only page 1) in the population endpoint
- Create a new function that builds upon `fetch_population_by_year()` and returns a Pandas Series of country and region populations for a given year

# Python API client libraries

- Programming a client for an API is not always needed
- Many APIs have their own client in the form of a Python library
- Check the example of [`geopy`](https://geopy.readthedocs.io/), a client for geocoding services such as [Nominatim](https://nominatim.org/)

In [None]:
# Installing geopy in your personal library
!pip install geopy

In [None]:
from geopy.geocoders import Nominatim

In [None]:
# Important: please add a user alias in this function
geolocator = Nominatim(user_agent='write-your-alias-here')

In [None]:
query = geolocator.geocode('1818 H St NW, Washington DC')

In [None]:
print(query)

In [None]:
print('The address of the WB main building is: {}'.format(query.address))
print('The location of the WB main building is: {}, {}'.format(query.latitude, query.longitude))

Please note the following:
- We didn't have to code an API client using `requests` this time: `geopy` is the API client
- The results of our query are not in JSON format. `geopy` returns an object of its own class with the attributes `.address`, `.latitude`, and `.longitude`, among others
- We've mostly seen examples of database query APIs, but APIs can do much more!
    + `geopy` is an example of an API that does some data processing with the information passed (the address)
    + Remember: in general, an API is a channel to interact with a web server


A few notes about API authentication:
- Some APIs require some form of authentication to control API overuse
    + That's why `Nominatim()` requires the `user_agent` parameter: it's a way of detecting which API calls come from the same alias
    + This is a very soft form of authentication
- When authentication is needed, most APIs will require users to register an account and will provide a unique combination of characters called an _API key_ that identifies the user
- When they are required, API keys are usually passed as argument-based parameters. If the API has a dedicated client library, the library will usually ask for the key when the client is set up (as with `user_agent` in `Nominatim()`). We'll explain more about this later
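
As a rough sketch of passing a key as an argument-based parameter, the example below reads it from an environment variable instead of writing it in the notebook. The variable name `MY_API_KEY`, the parameter name `api_key`, and the helper `add_api_key` are all assumptions made for illustration. Each API documents its own parameter name:

```python
import os


def add_api_key(parameters, env_var='MY_API_KEY', key_name='api_key',
                environ=os.environ):
    """Return a copy of parameters with the API key added.

    Hypothetical helper: env_var and key_name are placeholders,
    check the documentation of the API you are using.
    """
    key = environ.get(env_var)
    if key is None:
        raise RuntimeError(
            'Set the {} environment variable first'.format(env_var))
    with_key = dict(parameters)   # copy, so the input dict is not modified
    with_key[key_name] = key
    return with_key


# Demo with a fake environment so that no real key is needed
demo_env = {'MY_API_KEY': 'not-a-real-key'}
params = add_api_key({'format': 'json'}, environ=demo_env)
print(params)
```

The resulting dictionary can then be passed to `requests.get()` through the `params` argument.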

## The World Bank API Python library

- The World Bank API we used for one of the examples above also has a dedicated client Python library
- Release blog post [here](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data)
- Documentation [here](https://pypi.org/project/wbgapi/)
- Examples [here](https://nbviewer.org/github/tgherzog/wbgapi/blob/master/examples/wbgapi-cookbook.ipynb)

In [None]:
!pip install wbgapi

In [None]:
import wbgapi as wb

This example gets the total population of Brazil for all years available:

In [None]:
wb.data.DataFrame('SP.POP.TOTL', 'BRA', labels=True)

Remember the endpoint of this API? This URL would have returned a similar result in JSON format:
https://api.worldbank.org/v2/country/BRA/indicator/SP.POP.TOTL?date=1960:2021&format=json

We can also get the series for multiple countries if we specify a list instead of a single string:

In [None]:
wb.data.DataFrame('SP.POP.TOTL', ['BRA', 'ARG', 'URY', 'PRY'], labels=True)

Lastly, we can specify the years we want in a population query:

In [None]:
countries = ['BRA', 'ARG', 'URY', 'PRY']
years = range(2015, 2021) # note the last element is never included in range()
wb.data.DataFrame('SP.POP.TOTL', countries, years, labels=True)

The WB API has hundreds of indicators available. They can be explored with `wb.series.info()`:

In [None]:
wb.series.info()

## Python API client libraries - Main takeaways

- API client libraries greatly facilitate the use of APIs. You don't have to code your own client anymore!
- You need to review the documentation of each library to know how to use it
- The resulting variables from client libraries might not be in JSON format
    + `geopy` returned an ad-hoc variable class
    + `wb.data.DataFrame` returned a Pandas dataframe, which is very convenient for further data wrangling
- Many client libraries do much more than just retrieving the API results