# Welcome to Session 6 - Getting Web-Based Data

Much useful data is available online and can be accessed with a script. Application Programming Interfaces (APIs) make data available using URLs that are requested (accessed) by the script. The data are delivered in a predictable format for use.

Note that some APIs limit how many requests can be made in a given time period, or how many can be made per second. These may lock a user out if the limits are exceeded.

Also, many APIs require users to register and obtain an API Key which must be used with requests to obtain data. This requires extra steps to authenticate with the API when using it. In this lesson, we're using APIs that do not require API keys.
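On the rate limits mentioned above: one simple way to stay under a limit is to pause between requests. The following is a minimal sketch, not a method prescribed by any particular API. The helper name fetch_all, the min_interval value, and the stand-in fake_fetch function are all hypothetical; in a real script, the function passed in would make the actual request.

```python
import time

def fetch_all(items, fetch, min_interval=0.5):
    """Call fetch(item) for each item, leaving at least min_interval seconds between calls."""
    results = []
    last_call = None
    for item in items:
        if last_call is not None:
            # Work out how much of the interval is still left to wait
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        results.append(fetch(item))
    return results

# Hypothetical stand-in for a real request function, so the sketch runs without network access
def fake_fetch(name):
    return name.upper()

names = ['Sciaenops ocellatus', 'Callinectes sapidus']
throttled_results = fetch_all(names, fake_fetch, min_interval=0.1)
print(throttled_results)
```

The right interval depends on the API; its documentation usually states the limits.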

## Requesting Data

### The requests Library

The Python requests library is a simple HTTP library for interacting with web content. Requests is not included with Python; it must be installed on your system to import it.

To make a request using a URL, we will use the requests library's get() function, written as requests.get().


A request URL has two basic parts:
1) The base URL, which specifies the API address and the specific API application, or feature, that is requested
2) Data which forms the context of our request, to pass to the API as part of the URL

We'll use the [World Register of Marine Species (WoRMS) API](https://www.marinespecies.org/aphia.php?p=webservice) which is called Aphia.

In [None]:
import requests

# We'll use the API to get the currently accepted species ID for Sciaenops ocellatus
# The API URL root (used for all Aphia API applications) is https://www.marinespecies.org/rest
# The specific API application to get the accepted species ID is /AphiaIDByName/{ScientificName}
# An example complete URL to get the AphiaID for Sciaenops ocellatus is https://www.marinespecies.org/rest/AphiaIDByName/Sciaenops ocellatus

sciname = 'Sciaenops ocellatus'
aphiaID = requests.get(f'https://www.marinespecies.org/rest/AphiaIDByName/{sciname}')
print(aphiaID)

Let's take a closer look at our request string

requests.get(f'https://www.marinespecies.org/rest/AphiaIDByName/{sciname}')

1) requests.get()
    * We're using the 'get' function of the requests library - this means we're using a URL and the 'get' method to obtain the data.
    * Another method that is sometimes used is the 'post' method. The method to use is determined by the API.

2) f'https://www.marinespecies.org/rest/AphiaIDByName/{sciname}'
    * We're dynamically constructing the URL as a formatted string.
    * We've written most of it as a string, but we're using a variable {sciname} for the species name.
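The species name contains a space, which is not a valid URL character. The requests library percent-encodes it for us when the request is sent. As a small sketch of what that encoding looks like, the standard library's urllib.parse.quote() can build the same kind of URL by hand. The variable names base_url and application are only illustrative.

```python
from urllib.parse import quote

base_url = 'https://www.marinespecies.org/rest'  # API URL root
application = 'AphiaIDByName'                    # the specific API application
sciname = 'Sciaenops ocellatus'                  # the data we pass in the URL

# quote() percent-encodes characters that are not URL-safe, such as the space
encoded_name = quote(sciname)
full_url = f'{base_url}/{application}/{encoded_name}'
print(full_url)
```

The space in the species name appears as %20 in the printed URL.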

The request to the API returns a response object whose content, in this case, is JSON encoded.
   * JSON means JavaScript Object Notation.
   * JSON is a language-independent data format; it is not Python specific.

In [None]:
# Response objects contain numerous data items. Here are a few:

# The URL that was used to retrieve the data
print(aphiaID.url)

# The data that was returned
print(aphiaID.text)

# The response code (200 = OK, 404 = Not Found, etc.)
print(aphiaID.status_code)

In [None]:
# As a reminder, we can easily list all of the response object's available attributes and methods with dir()

dir(aphiaID)

Response objects whose content is JSON can easily be decoded into Python objects with the .json() method.

In [None]:
decodedID = aphiaID.json()
print(decodedID)

We could also write this more concisely by chaining the .json() method onto our request.

In [None]:
sciname = 'Sciaenops ocellatus'
aphiaID = requests.get(f'https://www.marinespecies.org/rest/AphiaIDByName/{sciname}').json()
print(aphiaID)
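Decoding immediately is concise, but it skips the chance to look at the status code first. If a request fails, or the response body is empty, decoding raises an error. The sketch below shows one way to guard against that. The helper decode_if_ok is hypothetical, and it works on a plain status code and text so that it runs without network access; with a real response object, these would be its status_code and text attributes. The ID in the example is a made-up placeholder, not a real AphiaID.

```python
import json

def decode_if_ok(status_code, text):
    """Return the decoded JSON only when the request succeeded and returned content."""
    if status_code == 200 and text:
        return json.loads(text)
    return None  # the caller decides how to handle a failed or empty response

# '12345' is a made-up placeholder for a response body
print(decode_if_ok(200, '12345'))
print(decode_if_ok(404, ''))
```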

### A Bit More About JSON

Let's take a look and see what Python data encoded as JSON look like with json.dumps().

In [None]:
import json

fishnames=['Red drum', 'Sheepshead', 'Flounder']
fishcounts=[12,3,5]
mysites=['Grice Cove','St. Helena Sound','Bulls Bay']

mydict = {'fish':{'names':fishnames, 'counts':fishcounts},'sites':mysites}

myjson = json.dumps(mydict)
print(f'This is JSON {myjson}')

And let's convert the JSON back to a Python object with json.loads()

In [None]:
mydata = json.loads(myjson)
print(f'This is a Python object converted from JSON {mydata}')
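A round trip through JSON does not always give back exactly the same Python types. The short sketch below, using a made-up record, shows that a tuple comes back as a list, and that True and None are written as true and null in the JSON text. It also uses the indent argument of json.dumps() to make the JSON easier to read.

```python
import json

# A made-up record containing a tuple, a boolean, and None
record = {'species': 'Red drum', 'lengths_cm': (61.0, 58.5), 'tagged': True, 'notes': None}

encoded = json.dumps(record, indent=2)  # indent=2 spreads the JSON over several lines
print(encoded)

decoded = json.loads(encoded)
print(decoded)  # the tuple has become a list
```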

Now that we're familiar with the JSON data structure, let's carry on exploring the WoRMS API.

### Activity 1

Use the AphiaID number generated by the provided code to get the vernaculars (common names) for Sciaenops ocellatus from the WoRMS API
1. The specific WoRMS API application to get the vernaculars for a given AphiaID is /AphiaVernacularsByAphiaID/{aphiaID}
2. Remember to unpack the JSON response
3. Print the unpacked response

When you're done with Activity 1, please use the [Miro Board](https://miro.com/app/board/uXjVNCUJ0JI=/) to indicate completion in the area for this session and this activity.

In [None]:
# You'll need this code to get the aphiaID variable for Sciaenops ocellatus

import requests
sciname = 'Sciaenops ocellatus'
aphiaID = requests.get(f'https://www.marinespecies.org/rest/AphiaIDByName/{sciname}').json()

# Tackle Activity 1 here




Here we have a chunk of data. We see square brackets and curly braces. We have lists and dictionaries.

### Activity 2

Iterate over the list and print the following for each dictionary:

"A vernacular for {species name} in {language} is {vernacular}"
e.g. "A vernacular for Sciaenops ocellatus in English is red drum"

Hint: Remember how we constructed the URL as a formatted string with a variable?

f'https://www.marinespecies.org/rest/AphiaIDByName/{sciname}'

* You'll need to use three variables (two are dictionary references).

In [None]:
# Tackle Activity 2 here




### Activity 3

Now let's make another request using the AphiaID to get the full species classification as 'speciesClassif'

The specific API application for this is /AphiaClassificationByAphiaID/{ID}



In [None]:
# You'll need this code

import requests
sciname = 'Sciaenops ocellatus'
ID = requests.get(f'https://www.marinespecies.org/rest/AphiaIDByName/{sciname}').json()

# Tackle Activity 3 here





This is a mess of nested dictionaries!

If we look closely, we see a dictionary with four keys (AphiaID, rank, scientificname, and child). The first three have integer or string values, but the value of child is itself a dictionary with the same four keys, whose child is again a dictionary. And so on, all the way down the classification to the species dictionary, whose child has a value of None.

### Question 1

How many levels of iteration would we have to go through to extract all of the data from this series of nested dictionaries?

### Recursive Functions to Iterate Over Objects of Unknown or Variable Depth

In [None]:
# It's neither efficient nor feasible to hard code many depths of nesting, especially if we don't know
# how many levels there will be.
# A really handy technique for addressing this is to write a recursive function.

def unpack(mydict): # Pass the function a dictionary
    for k, v in mydict.items(): # Iterate over its top level, extracting keys (k) and values (v)
        if isinstance(v, dict): # Test type of value to see if it is a dictionary
            unpack(v) # If it is a dictionary, call this same unpack() function to process it
        else: # If it is not a dictionary (we assume in this case that it is a string, integer, etc.)
            print(f'{k} : {v}') # print the key and value separated by spaces and a colon

unpack(speciesClassif)

### Question 2

Why did 'child : None' print at the very end of the list?
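The unpack() function above prints as it goes. A variation on the same recursive idea is to collect the values and return them, so they can be used later in the script. The following is only a sketch: the function name collect_ranks is hypothetical, and the sample dictionary is a shortened, made-up stand-in for a classification response (the AphiaID values are placeholders).

```python
# A shortened, made-up stand-in for a nested classification response
sample_classif = {
    'AphiaID': 1, 'rank': 'Kingdom', 'scientificname': 'Animalia',
    'child': {
        'AphiaID': 2, 'rank': 'Phylum', 'scientificname': 'Chordata',
        'child': {
            'AphiaID': 3, 'rank': 'Species', 'scientificname': 'Sciaenops ocellatus',
            'child': None,
        },
    },
}

def collect_ranks(node, found=None):
    """Walk down the chain of 'child' dictionaries, collecting (rank, name) pairs."""
    if found is None:
        found = []
    if node is None:  # the bottom of the nesting has been reached
        return found
    found.append((node['rank'], node['scientificname']))
    return collect_ranks(node['child'], found)  # recurse into the next level

lineage = collect_ranks(sample_classif)
for rank, name in lineage:
    print(f'{rank} : {name}')
```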

### Activity 4

Using Callinectes sapidus as the species, write a script to use the Aphia API to accomplish the following:

1) Import the requests library.

2) Find the AphiaID for Callinectes sapidus (using https://www.marinespecies.org/rest/AphiaIDByName/{speciesname}).

3) Use the AphiaID to get all synonyms for the species' scientific name (using /AphiaSynonymsByAphiaID/{ID}).

4) Look at the output and iterate over it (no need for a recursive function here) to extract the synonym's scientific name, authority, and status, as well as the valid species name, and valid authority. Print a sentence using the data points to inform the reader about the unaccepted name and authority for the species and the currently accepted name and authority for the species.

In [None]:
# Tackle Activity 4 here







### API Requests That Require Several Parameters

Some APIs don't have multiple applications or URLs for making requests. These APIs typically have a single URL, and the user must pass several parameters to define the request. An example of this is [NOAA's Tides & Currents API](https://api.tidesandcurrents.noaa.gov/api/prod/).

It is often necessary to read the API documentation to understand what data may be obtained, which parameters must be submitted, and how parameters must be formatted.

Here's an example request URL for this API that contains numerous parameters:

https://api.tidesandcurrents.noaa.gov/api/prod/datagetter?begin_date=20130808 15:00&end_date=20130808 15:06&station=8454000&product=water_temperature&units=english&time_zone=gmt&application=ports_screen&format=json

1. https://api.tidesandcurrents.noaa.gov/api/prod/datagetter (The API URL)
2. ? (? is used before the first parameter)
3. begin_date=20130808 15:00 (first parameter has a key of begin_date and a value of 20130808 15:00)
4. & (& precedes all subsequent parameters)
5. end_date=20130808 15:06 (second parameter has a key of end_date and a value of 20130808 15:06)
6. &station=8454000 (third parameter has a key of station and a value [station id number] of 8454000)
7. &product=water_temperature (fourth parameter has a key of product [i.e. which data is requested] and a value of water_temperature)
8. &units=english (fifth parameter has a key of units and a value of english [meaning imperial, not metric units])
9. &time_zone=gmt (sixth parameter has a key of time_zone and a value of gmt)
10. &application=ports_screen (seventh parameter has a key of application [an indication of who we are] and a value of ports_screen)
11. &format=json (eighth parameter has a key of format [in what format do we want the data returned?] and a value of json)
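The pieces listed above can be assembled, and taken apart again, with the standard library's urllib.parse module. This is a sketch for illustration only and runs without network access. Note that the encoded URL writes the space in each date as + and the colon as %3A.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

url = 'https://api.tidesandcurrents.noaa.gov/api/prod/datagetter'
params = {
    'begin_date': '20130808 15:00',
    'end_date': '20130808 15:06',
    'station': '8454000',
    'product': 'water_temperature',
    'units': 'english',
    'time_zone': 'gmt',
    'application': 'ports_screen',
    'format': 'json',
}

# urlencode() joins the key=value pairs with & and encodes unsafe characters
full_url = f'{url}?{urlencode(params)}'
print(full_url)

# parse_qs() reverses the process: it splits the query string back into keys and values
parsed = parse_qs(urlsplit(full_url).query)
print(parsed['begin_date'])
```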

In [None]:
# We can construct this URL with its parameters, using a dictionary.
import requests

url = 'https://api.tidesandcurrents.noaa.gov/api/prod/datagetter'
payload = {
    'begin_date':'20130808 15:00',
    'end_date':'20130808',
    'station':'8454000',
    'product':'water_temperature',
    'units':'english',
    'time_zone':'gmt',
    'application':'CofC',
    'format':'json'
}

output = requests.get(url, params=payload).json()
print(output)

We need the [API Response Help page](https://api.tidesandcurrents.noaa.gov/api/prod/responseHelp.html) to understand how to interpret this output.

We have a dictionary containing two key-value pairs.
1. The metadata key references a dictionary containing four key-value pairs.
2. The data key references a list containing one dictionary for every observation made. Each observation dictionary contains:
    * A timestamp (t)
    * A value (v) for the measurement
    * A flag string (f) with comma-separated data flags (0 means no flag and 1 means flag). See the [API Response Help page](https://api.tidesandcurrents.noaa.gov/api/prod/responseHelp.html) for flag explanations for each data type.
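As a sketch of how this structure can be navigated, the example below uses a small made-up dictionary with the same shape as the one described above. The station name, coordinates, and readings are placeholders, not real observations, and the sketch assumes the measurement values arrive as strings.

```python
# A made-up dictionary shaped like the response described above (placeholder values)
sample_output = {
    'metadata': {'id': '8454000', 'name': 'Example Station', 'lat': '0.0', 'lon': '0.0'},
    'data': [
        {'t': '2013-08-08 15:00', 'v': '72.5', 'f': '0,0,0'},
        {'t': '2013-08-08 15:06', 'v': '72.7', 'f': '0,0,0'},
        {'t': '2013-08-08 15:12', 'v': '72.6', 'f': '0,1,0'},
    ],
}

station_name = sample_output['metadata']['name']

# Convert each measurement from a string to a number
temps = [float(obs['v']) for obs in sample_output['data']]

# Keep the timestamps of observations where any flag is set to 1
flagged = [obs['t'] for obs in sample_output['data'] if '1' in obs['f'].split(',')]

print(f'{station_name}: {len(temps)} observations, maximum {max(temps)}')
print(f'Flagged observations: {flagged}')
```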

In [None]:
# We can print this much more clearly using the Pretty Printer library.

import pprint as pp
pp.pprint(output)

Let's plot it using an external library, Matplotlib (this does not come with Python; it must be downloaded and installed on your system).

In [None]:
import matplotlib.pyplot as plt

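ts = []  # an empty list for the x axis (timestamps)
tmp = []  # an empty list for the y axis (temperatures)
for obs in output['data']:  # Iterate over the observation dictionaries ('obs' avoids shadowing the built-in dict)
    if obs['v']:  # Skip observations with an empty value
        ts.append(obs['t'])
        tmp.append(float(obs['v']))  # Values arrive as strings; convert them so the y axis is numeric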


fig, ax = plt.subplots(figsize=(15,5))
ax.plot(ts,tmp)
ax.set_xticks(ts[::5])
ax.set_xticklabels(ts[::5], rotation=45)
ax.set_title("Water Temperature")
plt.xlabel("Timestamp")
plt.ylabel("Temperature (F)")
plt.grid(color='green',linestyle='--')
plt.show()
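The timestamps in the plot above are plain strings. If we need to do arithmetic with them, such as working out the time elapsed between observations, they can be converted to datetime objects. This sketch assumes the timestamps look like 'YYYY-MM-DD HH:MM'; the three values are made-up examples.

```python
from datetime import datetime

# Made-up timestamps in the assumed 'YYYY-MM-DD HH:MM' format
raw_times = ['2013-08-08 15:00', '2013-08-08 15:06', '2013-08-08 15:12']

# strptime() parses a string into a datetime object using a format pattern
parsed_times = [datetime.strptime(t, '%Y-%m-%d %H:%M') for t in raw_times]

# Minutes elapsed since the first observation
elapsed_minutes = [(t - parsed_times[0]).total_seconds() / 60 for t in parsed_times]
print(elapsed_minutes)
```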

### Responses in Formats Other than JSON

The NOAA Tides and Currents API has an option to request data in several formats: JSON, CSV, and XML.

In [None]:
import requests

url = 'https://api.tidesandcurrents.noaa.gov/api/prod/datagetter'
payload = {
    'begin_date':'20130808 15:00',
    'end_date':'20130808',
    'station':'8454000',
    'product':'water_temperature',
    'units':'english',
    'time_zone':'gmt',
    'application':'CofC',
    'format':'xml'
}

output = requests.get(url, params=payload) # We pass the dictionary of parameters as part of our request
print(output.text)
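XML text can be parsed with the standard library's xml.etree.ElementTree module. The sketch below uses a small made-up XML string; the tag name wt and the attribute names are assumptions for illustration, so check the printed output above for the names the API actually uses.

```python
import xml.etree.ElementTree as ET

# A made-up XML document; tag and attribute names are assumptions
sample_xml = '''<data>
  <observations>
    <wt t="2013-08-08 15:00" v="72.5" f="0,0,0" />
    <wt t="2013-08-08 15:06" v="72.7" f="0,0,0" />
  </observations>
</data>'''

root = ET.fromstring(sample_xml)  # parse the text into a tree of elements

# iter('wt') finds every <wt> element; get() reads an attribute's value
readings = [(el.get('t'), float(el.get('v'))) for el in root.iter('wt')]
for timestamp, value in readings:
    print(f'{timestamp} : {value}')
```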

In [None]:
import requests, csv  # We need to import the csv library

url = 'https://api.tidesandcurrents.noaa.gov/api/prod/datagetter'
payload = {
    'begin_date':'20130808 15:00',
    'end_date':'20130808',
    'station':'8454000',
    'product':'water_temperature',
    'units':'english',
    'time_zone':'gmt',
    'application':'CofC',
    'format':'csv'
}

output = requests.get(url, params=payload)
for row in csv.reader(output.text.splitlines()):  # csv.reader needs an iterable of lines, so we split the text into lines
    print(row)  # Each row is a list of strings
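When the first row of CSV data holds column names, csv.DictReader can turn each later row into a dictionary keyed by those names. The sketch below uses made-up CSV text; the column names are assumptions, so compare them with the header row printed above.

```python
import csv
import io

# Made-up CSV text; the column names are assumptions
sample_csv = 'Date Time,Water Temperature\n2013-08-08 15:00,72.5\n2013-08-08 15:06,72.7\n'

# io.StringIO lets the csv module read a string as if it were a file
reader = csv.DictReader(io.StringIO(sample_csv))
rows = list(reader)

for row in rows:
    print(f"{row['Date Time']} : {row['Water Temperature']}")
```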

### Authentication and APIs

The APIs we've used were open to everyone and needed no authentication, but some APIs require the user to either set up a user name and password or apply for a 'key' (a code string) to access the API.

Here's a simple example:

my_secret_key = 'r8EdCcMpUUP7lwMnYPEOxonUWo9s7DnbTIrMdwPR1DulJ6WtLXKGT7O5uyWl' (This is not a real key, but this is what they often look like)

response = requests.get(f'http://www.worldcat.org/webservices/catalog/search/worldcat/sru?query=srw.bn={isbn}&servicelevel=full&wskey={my_secret_key}')

With this API, the key must be passed with every request.
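Because a key is meant to stay private, a common practice is to keep it out of the script itself, for example by reading it from an environment variable. The following is a sketch only: MY_API_KEY is a hypothetical variable name, the fallback value is a placeholder rather than a real key, and the query value is made up. The URL is built but no request is sent.

```python
import os
from urllib.parse import urlencode

# MY_API_KEY is a hypothetical environment variable; the fallback is a placeholder, not a real key
my_secret_key = os.environ.get('MY_API_KEY', 'not-a-real-key')

# The key is passed as one more parameter alongside the (made-up) query
params = {'query': 'example', 'wskey': my_secret_key}
query_string = urlencode(params)

# Avoid printing the key itself
print(f'Query string has {len(params)} parameters, including the key')
```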

Some APIs use more complex processes to establish a connection and then make requests, but these are usually addressed in the API documentation.

### Summative Assessment Quiz

The purpose of summative assessment quizzes is twofold:

1) The process of recall helps to transfer information from short term to longer term memory.
2) The quizzes help us evaluate the effectiveness of our training sessions.

Take [Summative Assessment Quiz 6](https://cofc.libwizard.com/f/intro-python-6) to test your knowledge about this session.

### Resources

* [Python Requests Library Documentation](https://docs.python-requests.org/en/latest/)
* [Python Documentation - JSON](https://docs.python.org/3/library/json.html)