At this point, all records accepted into OBIS have taxonomic names as a primary attribute and as the focus of the observations (and of measurements in some cases), and most of those names have been validated against the World Register of Marine Species (WoRMS). To get a sense of the institutions, datasets, and other characteristics of the OBIS data that may help contribute to the Essential Ocean Variables (EOVs) for Biodiversity, it will be useful to assemble a mapping of higher-level taxonomic identifiers from WoRMS that can be used in queries wherever OBIS supports the unambiguous use of "taxonid."

This notebook sets up a process for working with OBIS data using taxonomic identifiers. It builds on a file stored here, eov_packet.json, that contains a list of the [EOVs for Biodiversity](http://www.goosocean.org/index.php?option=com_content&view=article&id=14&Itemid=114) and some related details. For now, each EOV record has a supplied list of taxon names (mostly at the family level) that can be tweaked over time as we refine the process for working with OBIS. In some cases, I pulled together a representative set of names for which we do have data in OBIS, but more work is needed, particularly for the plankton and coral groups. We may be able to leverage the species function and other attribute information from WoRMS, but that information seems somewhat patchy at this point.
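The notebook only relies on three keys from each EOV record ("name", "label", and "taxonomic_names"), so a loader needs little more than reading the JSON list and checking those keys. The sketch below is a hypothetical stand-in for pf.get_eov_packet(), not its actual implementation; the assumption that eov_packet.json is a JSON list of objects is inferred from how the notebook iterates over it, and the real file may carry additional details per EOV.

```python
import json
from pathlib import Path

# Keys the notebook reads from each EOV record; the real file may hold more.
REQUIRED_KEYS = ("name", "label", "taxonomic_names")

def load_eov_packet(path="eov_packet.json"):
    """Hypothetical loader: read the EOV packet and check each record's keys."""
    packet = json.loads(Path(path).read_text(encoding="utf-8"))
    for eov in packet:
        missing = [k for k in REQUIRED_KEYS if k not in eov]
        if missing:
            # Fail early rather than hitting a KeyError inside the WoRMS loop
            raise ValueError(f"EOV record {eov!r} is missing {missing}")
    return packet
```

Validating up front means a typo in a hand-edited taxon list surfaces before any WoRMS calls are made.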

This and other scripts use a set of Python functions I created in the "pegasus_functions.py" file. This notebook uses the worms_info_from_names() function to retrieve WoRMS information through the AphiaRecordsByName API route. It builds a data structure of the information retrieved from WoRMS for each name in each EOV group, along with a simplified list of just the valid AphiaIDs for later use. That list matters because a number of the basic statistical API routes from OBIS accept multiple taxonid parameters and return a set of summarized results.
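To show roughly what a name lookup involves, here is a minimal sketch of calling the WoRMS REST route AphiaRecordsByName and reducing the returned AphiaRecords to distinct valid AphiaIDs. It is not the worms_info_from_names() implementation from pegasus_functions.py; the helper names are hypothetical, and choosing like=false (exact matching; WoRMS defaults to like=true) and marine_only=true are my own assumptions.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

WORMS_REST = "https://www.marinespecies.org/rest"

def aphia_records_by_name_url(name, like=False, marine_only=True):
    """Build a WoRMS AphiaRecordsByName request URL for one scientific name."""
    return (f"{WORMS_REST}/AphiaRecordsByName/{quote(name)}"
            f"?like={str(like).lower()}&marine_only={str(marine_only).lower()}")

def fetch_aphia_records(name):
    """Call WoRMS; it answers HTTP 204 with an empty body when nothing matches."""
    with urlopen(aphia_records_by_name_url(name)) as resp:
        body = resp.read()
    return json.loads(body) if body else []

def valid_aphiaids(records):
    """Collect distinct valid_AphiaID values from AphiaRecords, first-seen order."""
    ids = []
    for rec in records:
        vid = rec.get("valid_AphiaID")
        # Skip records without a valid ID and avoid duplicates from synonyms
        if vid is not None and vid not in ids:
            ids.append(vid)
    return ids
```

Using valid_AphiaID rather than AphiaID collapses synonyms onto their accepted taxon, which is what makes the resulting list safe to pass to OBIS as taxonid values.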

The taxonomic info files are written into their own local cache as JSON documents, which are then accessed with the get_worms_info() function. This info could be retrieved in real time instead, but it won't change often once we work out appropriate taxon name lists. We may also build more intelligence into OBIS itself so that it can respond to queries directly based on the EOV subvariables or the derived products that OBIS data can feed.
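The caching idea can be sketched as a cache-or-fetch helper: read `<eov name>.json` if it exists, otherwise call a fetch function and write the result. This is a hypothetical helper, not get_worms_info(); the `fetch` callable and the `cache_dir` parameter are assumptions for illustration, with the default matching the notebook's habit of writing the files to the working directory.

```python
import json
from pathlib import Path

def cached_worms_info(eov_name, fetch, cache_dir="."):
    """Return cached taxonomic info for an EOV, fetching and caching on a miss.

    `fetch` is any zero-argument callable returning a JSON-serializable dict,
    e.g. a lambda wrapping pf.worms_info_from_names(names).
    """
    path = Path(cache_dir) / f"{eov_name}.json"
    if path.exists():
        # Cache hit: no WoRMS calls are made
        return json.loads(path.read_text(encoding="utf-8"))
    info = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(info, indent=4), encoding="utf-8")
    return info
```

Deleting a cached file is then enough to force a fresh lookup after a taxon name list changes.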

In [1]:
import pegasus_functions as pf
import json

In [2]:
eov_packet = pf.get_eov_packet()

In [3]:
for eov in eov_packet:
    # Look up every taxon name for this EOV in WoRMS (AphiaRecordsByName)
    tax_data = pf.worms_info_from_names(eov["taxonomic_names"])
    # Cache the result locally as <eov name>.json for later use
    with open(f"{eov['name']}.json", 'w') as f:
        json.dump(tax_data, f, indent=4, separators=(',', ': '))
    print(eov['label'])
    print("Valid Aphia IDs:", tax_data['valid_aphiaids'])

Phytoplankton biomass and diversity
Valid Aphia IDs: [582177, 802, 148899, 368677, 160581, 17639, 146542, 345487, 17329, 115057, 852, 146230, 19542, 146232]
Zooplankton biomass and diversity
Valid Aphia IDs: [1410, 137224, 387338, 345868, 793, 2081, 149668, 125741, 732976, 135219, 135220, 14260, 1076, 1078, 1080, 1337, 11707, 1371, 1248, 325345, 1762, 101091, 1128, 1130, 1131, 586732, 1135, 883, 146421, 137212, 137214]
Fish abundance and distribution
Valid Aphia IDs: [11676]
Marine turtles, birds, mammals abundance and distribution
Valid Aphia IDs: [2689, 1836, 1837]
Hard coral cover and composition
Valid Aphia IDs: [1363]
Seagrass cover and composition
Valid Aphia IDs: [143768, 143769, 143770, 143751]
Macroalgal canopy cover and composition
Valid Aphia IDs: [143794, 830]
Mangrove cover and composition
Valid Aphia IDs: [344737, 235048, 413963, 235059, 414871, 234494]
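With the valid AphiaIDs cached per EOV, each list can be handed to OBIS as a single query. The sketch below assumes the OBIS v3 API at https://api.obis.org/v3, a `statistics` route, and a `taxonid` parameter that accepts a comma-separated list of AphiaIDs; those are my reading of the OBIS API and should be checked against its documentation. The helper names are hypothetical.

```python
import json
from pathlib import Path
from urllib.parse import urlencode

# Assumed OBIS v3 API base URL
OBIS_API = "https://api.obis.org/v3"

def obis_taxonid_url(route, aphiaids, **params):
    """Build an OBIS API URL passing several AphiaIDs as one comma-separated taxonid."""
    query = {"taxonid": ",".join(str(i) for i in aphiaids), **params}
    # safe="," keeps the commas readable instead of encoding them as %2C
    return f"{OBIS_API}/{route}?{urlencode(query, safe=',')}"

def eov_statistics_url(eov_name, cache_dir="."):
    """Read an EOV's cached WoRMS info (as written above) and build a statistics URL."""
    info = json.loads((Path(cache_dir) / f"{eov_name}.json").read_text(encoding="utf-8"))
    return obis_taxonid_url("statistics", info["valid_aphiaids"])
```

For the turtles, birds, and mammals group, obis_taxonid_url("statistics", [2689, 1836, 1837]) yields https://api.obis.org/v3/statistics?taxonid=2689,1836,1837, which can then be fetched with urllib.request.urlopen or any HTTP client.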
