<a href="https://colab.research.google.com/github/worldbank/dec-python-course/blob/main/1-foundations/4-api-and-dataviz/foundations-s4-api.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import pandas as pd

# Interacting with APIs using Python

Now that we have an understanding of APIs, we can start interacting with them programmatically.

# The `requests` library

- `requests` is a Python library to interact with the internet
- It sends and receives data to and from URLs
- You can think of it as a web browser (Chrome, Firefox), but without the graphic interface
- APIs are URLs, so we can use `requests` to interact with APIs in Python

To enable the use of `requests`, run:

In [None]:
import requests

## Sending a request and receiving information from the web

- When you access a URL in your web browser, it is sending a request to receive information from a web server
- Usually, most of the information your browser receives as a response to your request is in HTML format
- Your web browser then renders the HTML information in the response and shows it to you
- A probably familiar example:

`https://www.worldbank.org/en/home`

![world-bank-site](img/world-bank-site.png)

Interacting with the web using Python through `requests` is no different from this. The basic syntax of `requests` is the following:

`requests.get(my_url)`

- `my_url` is a string with the URL address you want to access
- the `get()` method of `requests` uses your internet connection to send a request to that URL and obtain a response from it
- you can save the response in a Python variable this way:

`response = requests.get(my_url)`

See the following example:

In [None]:
url = 'https://www.worldbank.org/en/home'

In [None]:
response = requests.get(url)

`response` is an object of a custom type defined by the `requests` library, similar to how data frames and series are custom types defined by the Pandas library.

In [None]:
type(response)

The server's reply is stored in `response` even when the request was unsuccessful (for example, when the page does not exist). To check whether your request was successful, use the `status_code` attribute of the response or print `response`.

In [None]:
response.status_code

In [None]:
print(response)
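Note that `requests.get()` only returns a response object when the server actually answers. If no answer arrives at all (for example, because the URL is malformed, there is no internet connection, or the server takes too long), `requests` raises an exception instead. The sketch below shows one possible way to handle that case. The helper name `safe_get` and the 10-second timeout are illustrative choices, not part of the `requests` library.

```python
import requests


def safe_get(url, timeout=10):
    """Return the response for `url`, or None if no response was received.

    `safe_get` is a hypothetical helper written for illustration.
    """
    try:
        # timeout (in seconds) stops the call from waiting forever
        return requests.get(url, timeout=timeout)
    except requests.exceptions.RequestException as error:
        # Base class of the errors raised by requests
        # (malformed URL, no connection, timeout, ...)
        print('No response received: {}'.format(error))
        return None
```

With a helper like this, you would first check that the result is not `None` and only then look at its `status_code`.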

A status code of 200 means that your request received a successful response. These are some of the most common types of response codes:

- **200 - OK:** successful request
- **403 - Forbidden:** the user (you) is not authorized to access that URL
- **404 - Not found:** the web server cannot find the requested resource (often because your URL is incorrect)
- **429 - Too many requests:** the web server has a limit on how many requests a user can send over a period of time (rate limit) and you went over that limit
- **500 - Internal server error:** the request didn't work because of an unspecified error originated by the web server, not the user
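If you come across a status code that is not in this list, Python's standard library can tell you its official name. The following is a small sketch using the built-in `http.HTTPStatus`. The function name `describe_status` is made up for this example.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable label for an HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # The code is not a standard HTTP status known to Python
        return '{} - Unknown status code'.format(code)
    return '{} - {}'.format(status.value, status.phrase)


# Print the labels of the codes listed above
for code in (200, 403, 404, 429, 500):
    print(describe_status(code))
```

In a notebook, you could call `describe_status(response.status_code)` right after a request.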

If a request is successful, the response variable will contain the content (also called "body") of the response from the server. When you access a URL from your web browser, this is the part that contains the HTML code that your browser renders.

In `response`, the response content is in the `content` attribute:

In [None]:
response.content
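The output of `response.content` starts with `b'...'` because it holds raw bytes. `requests` also provides `response.text`, which contains the same content decoded into a regular string. The sketch below imitates that decoding step on a made-up bytes value, so it runs without an internet connection.

```python
# Made-up bytes standing in for what `response.content` might hold
raw_content = b'<html><body><h1>Hello</h1></body></html>'
print(type(raw_content))

# Decoding turns the bytes into a regular Python string
html_text = raw_content.decode('utf-8')
print(type(html_text))
print(html_text)
```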

## Interacting with APIs

- `requests` works very similarly when you use it to interact with an API instead of a URL with HTML code in the response content
- the only difference is that the content of an API request will be in a data-friendly format, such as JSON
- JSON is a format for storing data that is commonly used to transfer data on the web
- Python can read JSON data into lists or dictionaries (more on this below)
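To make the last point concrete, here is a minimal sketch using Python's built-in `json` module on two made-up JSON strings. It only illustrates the mapping between JSON and Python types and does not contact any API.

```python
import json

# Made-up JSON text: one JSON object and one JSON array
json_object_text = '{"message": "success", "number": 2}'
json_array_text = '[{"page": 1}, [10, 20, 30]]'

as_dict = json.loads(json_object_text)   # JSON object -> Python dictionary
as_list = json.loads(json_array_text)    # JSON array  -> Python list

print(type(as_dict), as_dict['number'])
print(type(as_list), as_list[1])
```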

Remember the API to get live data from the ISS? http://open-notify.org

We'll retrieve information from one of its two endpoints:

- Astronauts in space now - http://api.open-notify.org/astros.json

In [None]:
url = 'http://api.open-notify.org/astros.json'

In [None]:
response = requests.get(url)

In [None]:
response.status_code

In [None]:
response.content

Response objects from `requests` have a `json()` method that converts the JSON content of a response into Python dictionaries or lists, depending on the structure of the JSON data.

In [None]:
response.json()

The result of `response.json()` is a Python dictionary for this example.

In [None]:
data_dic = response.json()
type(data_dic)

Now that the response content is saved into a dictionary, we can explore more about its content.

In [None]:
data_dic.keys()

In [None]:
data_dic['number']

In [None]:
data_dic['people']

In [None]:
data_dic['message']

Furthermore, we can transform the list inside `data_dic['people']` into a Pandas dataframe for further analysis.

In [None]:
df = pd.DataFrame(data=data_dic['people'])

In [None]:
df

**Note:** the content and format of the JSON data in the response is specific to the API endpoint you access.
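Because the format depends on the endpoint, code that assumes a particular key may fail with a `KeyError` when used on a different response. The sketch below shows one defensive pattern using the dictionary method `.get()`. The payload is made up: it imitates the keys explored above, and the `name` and `craft` keys inside each person are an assumption about the format.

```python
from collections import Counter

# Made-up payload shaped like the astronauts response explored above.
# The 'name' and 'craft' keys inside each person are an assumption.
sample = {
    'message': 'success',
    'number': 3,
    'people': [
        {'name': 'Person A', 'craft': 'ISS'},
        {'name': 'Person B', 'craft': 'ISS'},
        {'name': 'Person C', 'craft': 'Other craft'},
    ],
}


def count_people_by_craft(payload):
    """Count people per craft; return an empty Counter if 'people' is missing."""
    people = payload.get('people', [])   # .get() avoids a KeyError for other formats
    return Counter(person.get('craft', 'unknown') for person in people)


print(count_people_by_craft(sample))
print(count_people_by_craft({'unexpected': 'format'}))
```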

# Exercise 1

Create a function that repeats the steps shown above and returns the latitude and longitude of the current location of the ISS. The location endpoint is this: `http://api.open-notify.org/iss-now.json`.

Suggested steps:

1. Use `requests` to send a request for this URL and store the result in a variable
2. If the response was not successful, return `None`
3. If it was, extract the response content into a dictionary
4. Extract the latitude and longitude from the dictionary and return them in your function as numbers (the verification below expects `float` values)

In [None]:
# Note: this function should not take any inputs. Leave the parentheses empty after the function name
def iss_location():
    
    # === REPLACE THE EMPTY STRINGS AND None BELOW TO ADD YOUR SOLUTION ===
    
    url = ''            # 1. Add URL here
    response = None     # 2. use requests to get the response of the URL
    
    if response.status_code == 200:
        
        data = None         # 3. Extract the data from response with the json() method
        latitude = None     # 4. Extract the latitude from the data dictionary
        longitude = None    # 5. Extract the longitude from the data dictionary
        
        return latitude, longitude
    
    # === DO NOT MODIFY THE FUNCTION FROM THIS POINT ON ===
    else:
        return None    

Remove the first line of the following block and run it to verify that your solution works. It should not return an error.

In [None]:
%%script echo Remove this line after filling in your own code

position = iss_location()
assert isinstance(position[0], float) and isinstance(position[1], float)

# Coding a simple API client

- Remember we introduced the term "client" in the API explanation? A client is a piece of code that facilitates the interaction with an API
- In the example with the astronauts, we had to go through several coding steps to execute the request, obtain the JSON data, and load it into a Pandas data frame
- All of those steps could be packed in a Python function that simplified the process of interacting with the API. That function is an API client
- In fact, in exercise 1 you were inadvertently creating an API client!
- Most APIs require custom information to be passed in the URL. This can be incorporated into an API client, as in the examples below

## Example 1: URL-based parameters

We previously introduced the geoBoundaries API example. After exploring this API and its documentation, we know that it takes URLs with the following generic form:

`https://www.geoboundaries.org/api/current/gbOpen/[3-letter-iso-code]/[admin-level]/`

Then, we can build a function that takes the 3-letter ISO code and the administrative boundaries level as parameters to automate API calls.

In [None]:
def fetch_geoboundaries_data(country_code, admin_level):
    
    endpoint = 'https://www.geoboundaries.org/api/current/gbOpen/'
    url = endpoint + country_code + '/ADM' + str(admin_level)
    response = requests.get(url)
    
    if response.status_code == 200:
        
        data = response.json()
        return data
        
    else:
        
        print('Request failed!')
        return None    

We can use our new function to fetch admin-1 level data from Kenya.

In [None]:
kenya_data = fetch_geoboundaries_data('KEN', 1)

In [None]:
kenya_data

A few notes:
- You might have noticed that the result stored in `kenya_data` is not the actual admin-1 level boundaries, but _metadata_ about the boundaries data
- A visual inspection of `kenya_data` shows that a URL to the data is in the key `simplifiedGeometryGeoJSON`

In [None]:
kenya_data['simplifiedGeometryGeoJSON']

- You can use `requests` once again to fetch the data from this URL

**Important:**
- Not all APIs will provide direct access to the information you need
- Many will require additional coding to get from the initial API call to the data of your interest
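A client that builds URLs from user input can also check that input before sending any request, which gives clearer error messages than a failed API call. The following is a minimal sketch of that idea for the URL form shown above. The helper name `build_geoboundaries_url` and its validation rules are illustrative assumptions, not rules taken from the geoBoundaries documentation.

```python
def build_geoboundaries_url(country_code, admin_level):
    """Build the metadata URL for a country and an admin level.

    Hypothetical helper: the checks below are illustrative assumptions.
    """
    code = str(country_code).strip().upper()
    if len(code) != 3 or not code.isalpha():
        raise ValueError(
            'country_code should be a 3-letter ISO code, got: {!r}'.format(country_code)
        )

    level = int(admin_level)
    if level < 0:
        raise ValueError(
            'admin_level should not be negative, got: {!r}'.format(admin_level)
        )

    endpoint = 'https://www.geoboundaries.org/api/current/gbOpen/'
    return '{}{}/ADM{}/'.format(endpoint, code, level)


print(build_geoboundaries_url('ken', 1))
```

The returned string could then be passed to `requests.get()` as in `fetch_geoboundaries_data()`.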

## Example 2: Argument-based parameters

- The example of geoBoundaries takes _URL-based_ API query parameters
- Many APIs take _argument-based_ query parameters
- This will be the case every time you have to use query parameters separated by an ampersand symbol (`&`) after a question mark (`?`) in the API endpoint
- Take this generic example:

`https://api.org/endpoint/?parameter1=value1&parameter2=value2&parameter3=value3`

- Theoretically, it's _possible_ to modify an API URL call using concatenated strings in Python to build argument-based queries

In [None]:
endpoint = 'https://api.org/endpoint/'
p1 = 'parameter1'
v1 = 'value1'
url = endpoint+'?'+p1+'='+v1
print(url)

- Then we could use `requests` as usual to obtain a response from this API

```{python}
requests.get(url)
```

- However, the convention in Python is to **use the argument `params` of `requests.get()`** to pass argument-based query parameters

```{python}
parameters = {'parameter1': 'value1'}
requests.get(endpoint, params=parameters)
```
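One practical reason to prefer `params` is that `requests` takes care of joining the parameters and encoding special characters such as spaces. The sketch below prepares a request without sending it, only to inspect the URL that `requests` would use. The endpoint is the generic placeholder from the text and the parameter values are made up.

```python
import requests

endpoint = 'https://api.org/endpoint/'   # generic placeholder URL from the text
parameters = {'parameter1': 'value1', 'parameter2': 'two words'}

# Prepare the request without sending it, just to look at the final URL
prepared = requests.Request('GET', endpoint, params=parameters).prepare()
print(prepared.url)
```

After a real request, the same information is available in the `url` attribute of the response.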

### Applied example: the WBG API

The WBG has an extensive API with country indicators, among many other data. We'll use the endpoint of the total population API to fetch country population data.

Documentation and use examples of the WBG API can be found in the [WBG Knowledge Base](https://datahelpdesk.worldbank.org/knowledgebase) and the [Developer Information resources](https://datahelpdesk.worldbank.org/knowledgebase/topics/125589-developer-information).

In [None]:
def fetch_population_by_year(year):
    
    endpoint = 'https://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL'
    parameters = {'date': year, 'format':'json'}
    # note: the API documentation specifies that format=json
    # is a required parameter in order to return the results as JSON
    response = requests.get(endpoint, params=parameters)
    
    if response.status_code == 200:
        
        data = response.json()
        return data
    
    else:
        
        print('Request failed!')
        return None

In [None]:
pop_2015 = fetch_population_by_year(2015)

In [None]:
pop_2015

A few notes:
- This is the same data you obtain when accessing https://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL?date=2015&format=json in a web browser
- In this case, Python interprets the resulting JSON as a list, not a dictionary
- The first element of this list contains metadata about the data returned by the API. The second element contains the actual data

In [None]:
pop_2015[0]

- Note the detail of the information in the keys `page`, `pages`, `per_page`, and `total` in the first element of `pop_2015`

In [None]:
print('Total obs: {}'.format(pop_2015[0]['total']))
print('Obs per page: {}'.format(pop_2015[0]['per_page']))
print('Total pages: {}'.format(pop_2015[0]['pages']))
print('Current page: {}'.format(pop_2015[0]['page']))

- The result is only one page with 50 observations, out of a total of 266
- This means that results are incomplete! More API calls are needed to complete the total 266 observations
- A further inspection of the API documentation shows that the parameters `page=page_number` or `per_page=obs_per_page` can be used to retrieve complete results.

**Important:** When fetching data using APIs, always inspect the result to look for possible limitations in the results' default format. You might be inadvertently missing observations in your API calls!
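One general way to deal with paged results is to request page after page until the metadata says there are no more. The sketch below illustrates that loop using the `page` route. It assumes responses shaped like the one above, that is, a list with a metadata dictionary (with a `pages` key) followed by the rows. `fetch_page` is a hypothetical function you would write yourself, and a made-up stand-in replaces the real API so the example runs without an internet connection.

```python
def collect_all_pages(fetch_page):
    """Call fetch_page(1), fetch_page(2), ... and gather the rows of every page.

    `fetch_page` is a hypothetical function that takes a page number and
    returns a list shaped like [metadata, rows], where metadata has a 'pages' key.
    """
    all_rows = []
    page = 1
    while True:
        metadata, rows = fetch_page(page)
        all_rows.extend(rows or [])        # rows may be None or empty
        if page >= int(metadata['pages']):
            break                          # this was the last page
        page += 1
    return all_rows


# Stand-in for the API: three made-up pages with two rows each
def fake_fetch_page(page):
    rows = [{'row': 2 * (page - 1) + i} for i in (1, 2)]
    return [{'page': page, 'pages': 3, 'per_page': 2, 'total': 6}, rows]


rows = collect_all_pages(fake_fetch_page)
print(len(rows))
```

With a real API, `fetch_page` could wrap a `requests.get()` call that adds the page number to the parameters dictionary and returns `response.json()`.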

## Coding API clients - Main takeaways 1

- Programming a client for an API requires reviewing its documentation and understanding how the API is used
- Many APIs will combine the use of URL-based and argument-based parameters to pass information in API queries. A good API client will take that into account to build the correct query URL and parameters argument. Take the following example:

In [None]:
def fetch_population_by_year_country(year, country):
    
    endpoint = 'https://api.worldbank.org/v2/country/' + country + '/indicator/SP.POP.TOTL'
    parameters = {'date': year, 'format':'json'}
    # note: the API documentation specifies that format=json
    # is a required parameter in order to return the results as JSON
    response = requests.get(endpoint, params=parameters)
    
    if response.status_code == 200:
        
        data = response.json()
        return data
    
    else:
        
        print('Request failed!')
        return None

In [None]:
brazil_data = fetch_population_by_year_country(2015, 'BRA')

In [None]:
brazil_data

## Coding API clients - Main takeaways 2

- Remember that further coding might be needed to get from the API result to the information that is relevant for a user
- Some APIs split results with many observations into pages and return only a limited number of observations per request. It's up to the user to review the results and take measures to ensure the data is complete
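As an illustration of the first point, API rows are often nested dictionaries that need to be simplified before analysis. The sketch below flattens rows into plain dictionaries. The sample rows are made up: the nested `country` key and the `date` and `value` keys are assumptions about the row format, and the numbers are placeholders, not real data.

```python
# Made-up rows imitating the data part of a population response.
# The key names are assumptions and the numbers are placeholders.
sample_rows = [
    {'country': {'id': 'AA', 'value': 'Country A'}, 'date': '2015', 'value': 1000},
    {'country': {'id': 'BB', 'value': 'Country B'}, 'date': '2015', 'value': None},
]


def flatten_rows(rows):
    """Turn nested API rows into flat dictionaries, skipping missing values."""
    flat = []
    for row in rows:
        if row.get('value') is None:
            continue   # no observation for this country and year
        flat.append({
            'country': row['country']['value'],
            'year': int(row['date']),
            'population': row['value'],
        })
    return flat


print(flatten_rows(sample_rows))
```

A list of flat dictionaries like this one can be passed to `pd.DataFrame()`, as was done with the astronauts data.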

# Exercise 2a

Create a function that builds upon `fetch_geoboundaries_data()` and returns the actual geographic data from the API.

Suggested steps:

1. Inspect the result of `fetch_geoboundaries_data()` to locate the URL with the geographic data in the resulting dictionary
1. Access the URL using `requests.get()` and store the response in a variable
1. Use the `.json()` method to transform the response content into a dictionary and return that variable

In [None]:
# == DO NOT MODIFY THIS FUNCTION; MODIFY THE NEXT ONE ==
def fetch_geoboundaries_data(country_code, admin_level):
    
    endpoint = 'https://www.geoboundaries.org/api/current/gbOpen/'
    url = endpoint + country_code + '/ADM' + str(admin_level)
    response = requests.get(url)
    
    if response.status_code == 200:
        
        data = response.json()
        return data
        
    else:
        
        print('Request failed!')
        return None

# == MODIFY THIS FUNCTION FOR YOUR ANSWER ==
def obtain_geodata(country_code, admin_level):
    
    metadata = fetch_geoboundaries_data(country_code, admin_level)
    
    if metadata is None:
        # this is a check to see if the previous line worked. Do not modify it
        return None
    
    else:
        # === REPLACE THE None BELOW TO ADD YOUR SOLUTION ===
        
        data_url = None       # 1. Extract the URL containing the data from the metadata dictionary
        response = None       # 2. Use requests to get a new response from that URL
        
        if response.status_code == 200:
            geojson_data = None    # 3. Use the json() method to extract the data from response
            return geojson_data
        
        # === DO NOT MODIFY THE FUNCTION FROM THIS POINT ON ===
        else:
            print('Request failed!')
            return None

# Exercise 2b

Modify the function `fetch_population_by_year()` below to retrieve the complete results for a given year (not only page 1) from the population endpoint.

Suggested steps:

1. Run `fetch_population_by_year()` for any year and inspect the resulting list. You will note that the first element of the list is a dictionary with a key `total` that indicates the total number of observations in the query result
1. In your function, save that number into a variable
1. Send a new API request adding the parameter `per_page` in the parameters dictionary. Its value should be the total number of observations
1. The JSON object in the response content will be a new list with the same format: the first element of the list will contain metadata about the API result and the data will be in the second element
1. Return the second element of your list

In [None]:
def fetch_population_by_year(year):
    
    endpoint = 'https://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL'
    parameters = {'date': year, 'format':'json'}
    response = requests.get(endpoint, params=parameters)
    
    if response.status_code == 200:
        
        data = response.json()
        
        # === ADD YOUR SOLUTION HERE ===
        # Note that this time we're not giving you the steps
        # to follow and variables to create. Figuring that
        # out is also part of the exercise :)
        
        return data
    
    else:
        
        print('Request failed!')
        return None

# Python API client libraries

- Programming a client for an API is not always needed
- Many APIs have their own client in the form of a Python library
- Check the example of [`geopy`](https://geopy.readthedocs.io/), a Python client for geocoding services such as [Nominatim](https://nominatim.org/)

In [None]:
# Installing geopy in your Python environment
!pip install geopy

In [None]:
from geopy.geocoders import Nominatim

In [None]:
# Important: please add a user alias in this function
geolocator = Nominatim(user_agent='write-your-alias-here')

In [None]:
query = geolocator.geocode('1818 H St NW, Washington DC')

In [None]:
print(query)

`query` is an object of a custom type defined by this library.

In [None]:
type(query)

If you want to check the attributes and methods of this variable type, you can use `help(query)`, `dir(query)`, `query?`, or check the [`geopy` library documentation](https://geopy.readthedocs.io/en/stable/).

In [None]:
query?

In [None]:
print('The address of the WB main building is: {}'.format(query.address))
print('The location of the WB main building is: {}, {}'.format(query.latitude, query.longitude))

When an API client is a Python library, it will most probably have ad-hoc variable types with particular attributes and methods. You will have to check the corresponding library documentation to learn how to operate with them.
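One detail worth knowing from the `geopy` documentation is that `geolocator.geocode()` returns `None` when the service finds no match, so accessing `.address` on the result would fail. The sketch below shows one way to guard against that. The function name `describe_location` is made up, and a stand-in object with placeholder values replaces a real query result so the example runs without contacting the service.

```python
from types import SimpleNamespace


def describe_location(location):
    """Format a geocoding result, or explain that nothing was found."""
    if location is None:
        return 'No match found for this address'
    return '{} ({}, {})'.format(
        location.address, location.latitude, location.longitude
    )


# Stand-in object with placeholder values, used instead of a real query result
fake_result = SimpleNamespace(address='Some address', latitude=1.5, longitude=-2.5)

print(describe_location(fake_result))
print(describe_location(None))
```

In the notebook, you could call `describe_location(query)` on the result of `geolocator.geocode()`.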

Please also note the following:
- We didn't have to code an API client using `requests` this time: `geopy` is the API client
- The results of our query are not in JSON format. `geopy` returns an ad-hoc variable class with the attributes `.address`, `.latitude`, and `.longitude`, among others
- We've mostly seen examples of database query APIs, but APIs can do much more!
    + `geopy` is an example of an API that does some data processing with the information passed (the address)
    + Remember: in general, an API is a channel to interact with a web server

A few notes about API authentication:
- Some APIs require some form of authentication to control API overuse
    + That's why `Nominatim()` requires the `user_agent` parameter: it's a way of detecting which API calls come from the same alias
    + This is a very soft form of authentication
- When authentication is needed, most APIs will require users to register an account and will provide a unique combination of characters called an _API key_ that identifies the user
- When they are required, API keys are usually passed as argument-based parameters. If the API has a dedicated client library, the library will usually ask for the key when you set up the client (as with `user_agent` in `Nominatim()`)
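Since an API key identifies you, it is good practice not to write it directly in a notebook you might share. A common pattern is to read it from an environment variable and add it to the parameters dictionary. The sketch below assumes a hypothetical environment variable `MY_API_KEY` and a hypothetical parameter name `api_key`. Each API documents the parameter name it actually expects.

```python
import os


def build_params_with_key(base_params, env_var='MY_API_KEY', key_name='api_key'):
    """Return a copy of base_params with the API key added.

    Both the environment variable name and the parameter name are
    hypothetical; check the API documentation for the name it expects.
    """
    api_key = os.environ.get(env_var)
    if api_key is None:
        raise RuntimeError('Set the {} environment variable first'.format(env_var))
    params = dict(base_params)      # copy, so the caller's dictionary is untouched
    params[key_name] = api_key
    return params


# For demonstration only: normally the variable is set outside the notebook
os.environ['MY_API_KEY'] = 'not-a-real-key'
print(build_params_with_key({'format': 'json'}))
```

The resulting dictionary could then be passed to `requests.get()` through its `params` argument.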

## The World Bank API Python library

- The World Bank API we used for one of the examples above also has a dedicated client Python library
- Release blog post [here](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data)
- Documentation [here](https://pypi.org/project/wbgapi/)
- Examples [here](https://nbviewer.org/github/tgherzog/wbgapi/blob/master/examples/wbgapi-cookbook.ipynb)

In [None]:
!pip install wbgapi

In [None]:
import wbgapi as wb

This example gets the total population of Brazil for all years available:

In [None]:
wb.data.DataFrame('SP.POP.TOTL', 'BRA', labels=True)

Remember the endpoint of this API? This URL would have returned a similar result in JSON format:
https://api.worldbank.org/v2/country/BRA/indicator/SP.POP.TOTL?date=1960:2021&format=json

We can also get the series for multiple countries if we specify a list instead of a single string:

In [None]:
wb.data.DataFrame('SP.POP.TOTL', ['BRA', 'ARG', 'URY', 'PRY'], labels=True)

Lastly, we can specify the years we want in a population query:

In [None]:
countries = ['BRA', 'ARG', 'URY', 'PRY']
years = range(2015, 2021) # note: range() never includes its stop value, so this covers 2015 to 2020
wb.data.DataFrame('SP.POP.TOTL', countries, years, labels=True)

The WB API has hundreds of indicators available. They can be explored with `wb.series.info()`:

In [None]:
wb.series.info()

## Python API client libraries - Main takeaways

- API client libraries greatly facilitate the use of APIs. You don't have to code your own client anymore!
- You need to review the library documentation to know how to use them
- The resulting variables from client libraries might not be in JSON format
    + `geopy` returned an ad-hoc variable class
    + `wb.data.DataFrame` returned a Pandas dataframe, which is very convenient for further data analysis
- Many client libraries do much more than just retrieving the API results

**Final note only if you're working on Colab:**
Remember to go to `File` > `Save a copy in Drive` to save a copy of this notebook in your Google account.