All records currently accepted into OBIS carry a taxonomic name as the primary attribute and focus of the observation (and, in some cases, of measurements), and most of those names have been validated against the World Register of Marine Species (WoRMS). To get a sense of the institutions, datasets, and other characteristics of OBIS data that may contribute to the Essential Ocean Variables (EOVs) for Biodiversity, it is useful to assemble a mapping of higher-level taxonomic identifiers from WoRMS that can be used in queries wherever OBIS supports the unambiguous use of "taxonid."

This notebook provides a setup process for working with OBIS data using taxonomic identifiers. It builds and updates a file, eov_packet.json, stored here, that contains a list of the [EOVs for Biodiversity](http://www.goosocean.org/index.php?option=com_content&view=article&id=14&Itemid=114) and some related details. Each EOV record has, for now, a supplied list of taxon names that can be adjusted over time as we refine the process for working with OBIS. Since the initial aim is to build on what is already in OBIS, I started with the dynamic list of Classes in OBIS retrieved via https://api.obis.org/statistics/composition/class. For mangroves and seagrasses, I had to set a list at the Family level, since the species we have records for are all in Class Magnoliopsida. The resulting list also gives us something to work against when identifying important taxonomic groups for the EOVs that are not yet represented in OBIS.
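
As a rough illustration of the input this notebook expects, the sketch below shows what a single entry in eov_packet.json might look like before the update runs, along with a small check that the keys read by the update loop are present. The key names come from the code further down. The `url` value is a placeholder, the taxa list is illustrative rather than complete, and `missing_keys` is a hypothetical helper that is not part of pegasus_functions.

```python
# Keys the update loop reads from each EOV entry (taken from the notebook code).
REQUIRED_KEYS = ("label", "url", "obis_worms_taxa_list")

# Illustrative entry only: the URL is a placeholder and the taxa list is not complete.
example_packet = {
    "coral": {
        "label": "Hard coral cover and composition",
        "url": "https://example.org/placeholder-eov-url",
        "obis_worms_taxa_list": ["Anthozoa"],
    }
}


def missing_keys(packet):
    """Return {eov: [missing key, ...]} for entries that lack a required key."""
    problems = {}
    for eov, config in packet.items():
        missing = [key for key in REQUIRED_KEYS if key not in config]
        if missing:
            problems[eov] = missing
    return problems


# An empty dict means every entry has the keys the update loop needs.
print(missing_keys(example_packet))
```

Running a check like this before the update loop would surface a malformed entry as a readable report rather than a `KeyError` partway through the API calls.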

Starting from the taxa list (stored in the property obis_worms_taxa_list), the code runs through an update and summarization process that queries a couple of different OBIS API routes and assembles data to drive a number of other processes and reports. Operationally, we can rerun this process, or something like it, to routinely refresh eov_packet.json whenever there is new data in OBIS or a revised list of taxa groups for the EOVs. As we settle on the latter, we can also add properties to OBIS itself that associate records with EOVs inside the data system.
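
The per-taxon step of that process can be sketched as a single function. This is a minimal sketch, not the notebook's actual implementation: it assumes the response shapes seen in this notebook (a `results` list from `/taxon/{name}` and a flat dictionary from `/statistics/all`), and the names `summarize_taxon`, `fetch_json`, and `fake_fetch` are hypothetical. The fetcher is passed in as a parameter so the logic can be exercised without network access.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

OBIS_API = "https://api.obis.org"
# Ranks accepted for a higher-level taxon, matching the filter used in this notebook.
ACCEPTED_RANKS = ("Family", "Class", "Subphylum")


def fetch_json(url):
    """Default fetcher: GET a URL and decode the JSON body (standard library only)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def summarize_taxon(name, fetch=fetch_json):
    """Return the taxon record plus summary stats, or None if no usable rank is found."""
    taxon_url = f"{OBIS_API}/taxon/{quote(name)}"
    results = fetch(taxon_url).get("results", [])
    record = next((r for r in results if r.get("taxonRank") in ACCEPTED_RANKS), None)
    if record is None:
        return None
    stats_url = f"{OBIS_API}/statistics/all?taxonid={record['taxonID']}"
    summary = dict(record, api=taxon_url)
    summary["summary_stats"] = dict(fetch(stats_url), api=stats_url)
    return summary


def fake_fetch(url):
    """Stand-in fetcher returning canned data shaped like the responses in this notebook."""
    if "/taxon/" in url:
        return {"results": [{"taxonID": 1292, "taxonRank": "Class", "scientificName": "Anthozoa"}]}
    return {"records": 1249114, "datasets": 447}


result = summarize_taxon("Anthozoa", fetch=fake_fetch)
print(result["summary_stats"]["api"])
```

Separating the fetch from the summarization is one possible design choice; it keeps the rank filter and URL construction testable offline and leaves retry or caching policy to the caller.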

We will need to do some work on these lists, which are very simplistic at the moment. Ultimately, we need more than taxon names for some of the EOVs, such as the plankton groups, where life history details are needed to determine a species' ecosystem function. In the near term this will be problematic for the integrated observation data, because that kind of observational detail is lacking in OBIS, in WoRMS, and in many of the datasets we are analyzing against. We may need to introduce a more complex algorithm for working with the data, including some way of characterizing uncertainty in the observation records as they relate to calculating subvariables or building derivative reports.

In [1]:
from IPython.display import display
import requests
import json
from datetime import datetime
import pegasus_functions as pf

In [2]:
eov_packet = pf.get_eov_packet()

In [4]:
new_eov_doc = dict()
for eov,config in eov_packet.items():
    # Set up a new dictionary object for the EOV
    new_eov_doc[eov] = dict()
    # Copy in the constant variables we will maintain
    new_eov_doc[eov]["label"] = config["label"]
    new_eov_doc[eov]["url"] = config["url"]
    # The simple list of higher level taxa names will likely be the thing that changes over time
    new_eov_doc[eov]["obis_worms_taxa_list"] = config["obis_worms_taxa_list"]
    
    # Set up new dictionary structure to contain the taxa summary from OBIS/WoRMS
    new_eov_doc[eov]["obis_worms_taxa"] = dict()

    # Loop through the supplied taxa names for an EOV and retrieve details from the OBIS API
    for taxon in config["obis_worms_taxa_list"]:
        # Retrieve the taxon report for the supplied name
        url_obis_taxon = f"https://api.obis.org/taxon/{taxon}"
        try:
            r_obis_taxon = requests.get(url_obis_taxon).json()
            usable_result = next((r for r in r_obis_taxon["results"] if r["taxonRank"] in ["Family","Class","Subphylum"]), None)
        except Exception as e:
            usable_result = None
            print(e)
            print(url_obis_taxon)

        if usable_result is None:
            # Print out a message if we fail to get a valid result from the API for some reason
            print(taxon, "NOT FOUND")
        else:
            # Start the taxon record with the high level taxon result from the OBIS API
            new_eov_doc[eov]["obis_worms_taxa"][taxon] = usable_result
            # Record the API url we used and the date we cached the information
            new_eov_doc[eov]["obis_worms_taxa"][taxon]["api"] = url_obis_taxon
            new_eov_doc[eov]["obis_worms_taxa"][taxon]["date_cached"] = datetime.utcnow().isoformat()
            
            # Set up a process to retrieve summary results on the observation records in OBIS for the taxon
            url_summary_stats = f"https://api.obis.org/statistics/all?taxonid={usable_result['taxonID']}"
            try:
                # Retrieve and record summary stats for the taxon in the data structure along with the API route and date/time
                r_summary_stats = requests.get(url_summary_stats).json()
                new_eov_doc[eov]["obis_worms_taxa"][taxon]["summary_stats"] = r_summary_stats
                new_eov_doc[eov]["obis_worms_taxa"][taxon]["summary_stats"]["api"] = url_summary_stats
                new_eov_doc[eov]["obis_worms_taxa"][taxon]["summary_stats"]["date_cached"] = datetime.utcnow().isoformat()
            except Exception:
                # Print out the URL we tried here in case it failed for some reason
                print(url_summary_stats)


https://api.obis.org/statistics/all?taxonid=1070
https://api.obis.org/statistics/all?taxonid=794


In [5]:
# Show what the new document looks like
display(new_eov_doc)

{'coral': {'label': 'Hard coral cover and composition',
  'obis_worms_taxa': {'Anthozoa': {'acceptedNameUsage': 'Anthozoa',
    'acceptedNameUsageID': 1292,
    'api': 'https://api.obis.org/taxon/Anthozoa',
    'class': 'Anthozoa',
    'classid': 1292,
    'date_cached': '2019-03-03T13:17:55.545118',
    'is_brackish': True,
    'is_marine': True,
    'is_terrestrial': False,
    'kingdom': 'Animalia',
    'kingdomid': 2,
    'phylum': 'Cnidaria',
    'phylumid': 1267,
    'scientificName': 'Anthozoa',
    'scientificNameAuthorship': 'Ehrenberg, 1834',
    'summary_stats': {'api': 'https://api.obis.org/statistics/all?taxonid=1292',
     'datasets': 447,
     'date_cached': '2019-03-03T13:17:56.569209',
     'records': 1249114,
     'species': 5336,
     'specieslevel': 772924,
     'taxa': 6312,
     'yearrange': [1758, 2018]},
    'taxonID': 1292,
    'taxonRank': 'Class',
    'taxonomicStatus': 'accepted'},
   'Ascidiacea': {'acceptedNameUsage': 'Ascidiacea',
    'acceptedNameUsageID

In [7]:
# Write out the new data to a JSON file
with open('eov_packet.json', 'w') as f:
    json.dump(new_eov_doc, f, indent=4)
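
Once the file is written, one way to use it is to total the cached record counts per EOV. The sketch below assumes the document structure built in this notebook (`obis_worms_taxa` entries that each hold an optional `summary_stats` dictionary with a `records` count). `records_per_eov` is a hypothetical helper, and the sample input reuses the Anthozoa count shown in the output above. In practice the input would come from `json.load` on eov_packet.json.

```python
def records_per_eov(eov_doc):
    """Sum the cached OBIS record counts across the taxa listed for each EOV."""
    totals = {}
    for eov, config in eov_doc.items():
        taxa = config.get("obis_worms_taxa", {})
        # Taxa whose summary request failed have no summary_stats and count as zero.
        totals[eov] = sum(
            taxon.get("summary_stats", {}).get("records", 0) for taxon in taxa.values()
        )
    return totals


# Small in-memory sample shaped like the document written above.
sample_doc = {
    "coral": {
        "obis_worms_taxa": {
            "Anthozoa": {"summary_stats": {"records": 1249114}},
        }
    }
}

print(records_per_eov(sample_doc))
```

These totals are only a rough guide: if one listed taxon is nested inside another, the same records are counted twice.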