Before moving on to other data retrieval activities, it is useful to examine what we found when consulting ITIS, whether through declared TSN identifiers or scientific name searches, and to see whether any further work would improve our ability to pull other data sources together accurately. This notebook examines the cached ITIS results and makes some decisions about what to do next.

In [1]:
import json
import bispy
import jsonschema
from collections import Counter

bis_utils = bispy.bis.Utils()

import helperfunctions

In [2]:
with open("../cache/workplan_species.json", "r") as f:
    workplan_species = json.load(f)

with open("../cache/itis.json", "r") as f:
    itis_cache = json.load(f)

Every case where the function returned more than one ITIS record means that something interesting happened. We can check this quickly by looking at the "Processing Metadata" that our function records. In all of the cases here, our code followed the accepted TSN declared at the point of discovery, whether the starting point was what appears to be an invalid TSN assignment in the FWS source information or a name lookup that matched an invalid name for which ITIS holds a valid record.

The following code block lets us examine what is going on in these cases. The Processing Metadata structure is what our function records about what it did, including the ITIS API URLs that resulted in some action. We record both the valid/accepted and invalid/unaccepted names from ITIS and reach back to the workplan_species to show that record.

In [3]:
# Multiple ITIS records in "data" indicate that the lookup followed an accepted TSN
for r in [i for i in itis_cache if "data" in i.keys() and len(i["data"]) > 1]:
    # Find the step that used the identifier or name supplied by the FWS source
    source_url = next((o for o in r["processing_metadata"]["details"] if "Exact Match" in o.keys()), None)
    if not source_url:
        source_url = next((o for o in r["processing_metadata"]["details"] if "Exact Match Fail" in o.keys()), None)
    source_url = next((v for k, v in source_url.items()), None)
    # Cached URLs encode spaces as a literal backslash followed by %20
    source_identifier = source_url.split(":")[-1].replace("\\%20", " ")
    if source_identifier.isdigit():
        print("ITIS TSN used in lookup:", source_identifier)
    else:
        print("Scientific name used in lookup:", source_identifier)

    display(r["processing_metadata"])


ITIS TSN used in lookup: 207135


{'status': 'success',
 'date_processed': '2019-09-16T17:30:10.297976',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:773525'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:207135'}]}

ITIS TSN used in lookup: 80066


{'status': 'success',
 'date_processed': '2019-09-16T17:30:10.479797',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:983630'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:80066'}]}

ITIS TSN used in lookup: 80079


{'status': 'success',
 'date_processed': '2019-09-16T17:30:11.067607',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:983775'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:80079'}]}

ITIS TSN used in lookup: 173717


{'status': 'success',
 'date_processed': '2019-09-16T17:30:15.196176',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:775913'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:173717'}]}

ITIS TSN used in lookup: 567231


{'status': 'success',
 'date_processed': '2019-09-16T17:30:15.761133',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:983772'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:567231'}]}

ITIS TSN used in lookup: 894872


{'status': 'success',
 'date_processed': '2019-09-16T17:30:20.841243',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:894898'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:894872'}]}

ITIS TSN used in lookup: 524343


{'status': 'success',
 'date_processed': '2019-09-16T17:30:21.529003',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:517582'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:524343'}]}

ITIS TSN used in lookup: 547326


{'status': 'success',
 'date_processed': '2019-09-16T17:30:21.969982',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:1063038'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:547326'}]}

ITIS TSN used in lookup: 609873


{'status': 'success',
 'date_processed': '2019-09-16T17:30:27.328918',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:102712'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:609873'}]}

ITIS TSN used in lookup: 183452


{'status': 'success',
 'date_processed': '2019-09-16T17:30:28.305081',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:822596'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:183452'}]}

ITIS TSN used in lookup: 209559


{'status': 'success',
 'date_processed': '2019-09-16T17:30:29.907358',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:683027'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:209559'}]}

Scientific name used in lookup: Quincuncina mitchelli


{'status': 'success',
 'date_processed': '2019-09-16T17:30:32.142369',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:906951'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Quincuncina\\%20mitchelli'}]}

Scientific name used in lookup: Quadrula houstonensis


{'status': 'success',
 'date_processed': '2019-09-16T17:30:32.824546',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:983629'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Quadrula\\%20houstonensis'}]}

Scientific name used in lookup: Macrhybopsis aestivalis tetranemus


{'status': 'success',
 'date_processed': '2019-09-16T17:30:32.933443',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:553282'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Macrhybopsis\\%20aestivalis\\%20tetranemus'}]}

Scientific name used in lookup: Aster puniceus scabricaulis


{'status': 'success',
 'date_processed': '2019-09-16T17:30:33.430622',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:541115'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Aster\\%20puniceus\\%20scabricaulis'}]}

Scientific name used in lookup: Aspidoscelis arizonae


{'status': 'success',
 'date_processed': '2019-09-16T17:30:34.235656',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:208930'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Aspidoscelis\\%20arizonae'}]}

Scientific name used in lookup: Sistrurus catenatus edwardsii


{'status': 'success',
 'date_processed': '2019-09-16T17:30:34.541900',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:1058793'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Sistrurus\\%20catenatus\\%20edwardsii'}]}

Scientific name used in lookup: Alces alces andersoni


{'status': 'success',
 'date_processed': '2019-09-16T17:30:35.327639',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:898420'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Alces\\%20alces\\%20andersoni'}]}

Scientific name used in lookup: Oncidium undulatum


{'status': 'success',
 'date_processed': '2019-09-16T17:30:36.258735',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:894691'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Oncidium\\%20undulatum'}]}

Scientific name used in lookup: Macroclemys temmincki


{'status': 'success',
 'date_processed': '2019-09-16T17:30:36.393658',
 'status_message': 'Followed Accepted TSN',
 'details': [{'Exact Match Fail': 'https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Macroclemys\\%20temmincki'},
  {'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:668671'},
  {'Fuzzy Match': 'https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Macroclemys\\%20temmincki~0.8'}]}

Scientific name used in lookup: Stilosoma extenuatum


{'status': 'success',
 'date_processed': '2019-09-16T17:30:36.984547',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:1082386'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Stilosoma\\%20extenuatum'}]}

Scientific name used in lookup: Pseudanophthalmus potomaca potomaca


{'status': 'success',
 'date_processed': '2019-09-16T17:30:38.360092',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:110131'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Pseudanophthalmus\\%20potomaca\\%20potomaca'}]}

Scientific name used in lookup: Martes pennanti


{'status': 'success',
 'date_processed': '2019-09-16T17:30:38.798108',
 'status_message': 'Followed Accepted TSN',
 'details': [{'TSN Search': 'https://services.itis.gov/?wt=json&rows=10&q=tsn:1086356'},
  {'Exact Match': 'https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Martes\\%20pennanti'}]}

There are essentially three things that seem to be going on here:

1. There are cases where the FWS data declares a TSN for what ITIS considers to be invalid (animals) or unaccepted (plants) taxonomy. It could be that FWS biologists disagree with ITIS, or simply that the FWS information is out of date. We don't yet have enough information to make a judgment.
2. There are cases where the FWS data uses a scientific name that ITIS considers to be invalid/unaccepted. This is essentially the same issue as the first, arrived at through a name rather than a TSN, and again FWS may simply be out of date with changes in the ITIS taxonomy.
3. There is one case of a misspelling (Macroclemys temmincki should have been Macroclemys temminckii) where the search had to fall back to a fuzzy match to find the record. In this case, ITIS considers one of its records to be invalid and the other valid, though both ultimately refer to the same species.
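
As a rough way to tally these three situations, the recorded steps can be inspected directly. The sketch below is not part of the original processing code: the `classify_case` function and its labels are hypothetical, and it assumes that a "Fuzzy Match" step signals a misspelling and that the field queried in the "Exact Match" URL (`tsn` or `nameWOInd`) reflects what the FWS source supplied. The sample records are abbreviated from the output above.

```python
from collections import Counter


def classify_case(processing_metadata):
    """Label one multi-record case using the steps recorded in its details."""
    # Flatten the list of single-key step dictionaries into one mapping
    steps = {}
    for step in processing_metadata["details"]:
        steps.update(step)

    # A fuzzy match step means the exact name search failed first
    if "Fuzzy Match" in steps:
        return "misspelled name"

    # Otherwise the field queried in the exact match shows what FWS supplied
    query = steps.get("Exact Match", "").split("q=")[-1]
    field = query.split(":")[0]
    if field == "tsn":
        return "declared TSN not valid/accepted"
    if field == "nameWOInd":
        return "scientific name not valid/accepted"
    return "unclassified"


# Three processing metadata records abbreviated from the output above
examples = [
    {"details": [
        {"TSN Search": "https://services.itis.gov/?wt=json&rows=10&q=tsn:773525"},
        {"Exact Match": "https://services.itis.gov/?wt=json&rows=10&q=tsn:207135"},
    ]},
    {"details": [
        {"TSN Search": "https://services.itis.gov/?wt=json&rows=10&q=tsn:1086356"},
        {"Exact Match": "https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Martes\\%20pennanti"},
    ]},
    {"details": [
        {"Exact Match Fail": "https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Macroclemys\\%20temmincki"},
        {"TSN Search": "https://services.itis.gov/?wt=json&rows=10&q=tsn:668671"},
        {"Fuzzy Match": "https://services.itis.gov/?wt=json&rows=10&q=nameWOInd:Macroclemys\\%20temmincki~0.8"},
    ]},
]

case_counts = Counter(classify_case(m) for m in examples)
print(case_counts)
```

Run against the full cache, the same function could be applied to each record's `processing_metadata` to count how many cases fall into each group.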

Because there could be disagreement on the part of species biologists with the taxonomic authority, the conservative course of action here is to record the valid ITIS name and other information as an additional potential point of reference. If we run into cases where the FWS name does not find results in another system we are checking for data, we can try the ITIS valid/accepted name to see what we can turn up, flagging that result with some uncertainty factor or note to help determine its utility.
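
One way this fallback might look in practice is sketched below. It is only an illustration: `lookup_with_fallback` and `search_fn` are hypothetical names standing in for whatever lookup another data system provides, the species names in the example are placeholders rather than real taxa, and the sketch assumes each record carries a "Lookup Name" and, where available, a "Valid ITIS Scientific Name".

```python
def lookup_with_fallback(record, search_fn):
    """Search with the FWS lookup name first, then the ITIS valid/accepted name."""
    fws_name = record["Lookup Name"]
    result = search_fn(fws_name)
    if result is not None:
        return {"result": result, "name_used": fws_name, "name_source": "FWS"}

    # Only retry when ITIS offers a different valid/accepted name
    valid_name = record.get("Valid ITIS Scientific Name")
    if valid_name and valid_name != fws_name:
        result = search_fn(valid_name)
        if result is not None:
            return {
                "result": result,
                "name_used": valid_name,
                "name_source": "ITIS valid/accepted",
                # Flag the result so its utility can be judged later
                "note": "Found with ITIS name after the FWS name returned nothing",
            }

    return {"result": None, "name_used": None, "name_source": None}


# Placeholder data: a fake external system that only knows the newer name
fake_system = {"Genus newname": {"id": 1}}
example_record = {
    "Lookup Name": "Genus oldname",
    "Valid ITIS Scientific Name": "Genus newname",
}

outcome = lookup_with_fallback(example_record, fake_system.get)
print(outcome["name_source"], outcome["name_used"])
```

Keeping the name source and a note alongside the result preserves the FWS name as the primary reference while making it clear when a result depended on the ITIS name.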

The following code block establishes a connection back to the appropriate originating record in the workplan_species data and injects ITIS valid/accepted names into a new attribute, just for these cases where we may want to use that additional information.

In [4]:
updated_workplan_species = list()

for r in [i for i in itis_cache if "data" in i.keys() and len(i["data"]) > 1]:
    # Recover the identifier or name that the FWS source supplied for the lookup
    source_url = next((o for o in r["processing_metadata"]["details"] if "Exact Match" in o.keys()), None)
    if not source_url:
        source_url = next((o for o in r["processing_metadata"]["details"] if "Exact Match Fail" in o.keys()), None)
    source_url = next((v for k, v in source_url.items()), None)
    # Cached URLs encode spaces as a literal backslash followed by %20
    source_identifier = source_url.split(":")[-1].replace("\\%20", " ")
    if source_identifier.isdigit():
        source_workplan_species = next((s for s in workplan_species if s["ITIS TSN"] == source_identifier), None)
    else:
        source_workplan_species = next((s for s in workplan_species if s["Lookup Name"] == source_identifier), None)

    # Skip the case if no originating workplan record can be found
    if source_workplan_species is None:
        continue

    source_workplan_species["Valid ITIS Scientific Name"] = next((i["nameWOInd"] for i in r["data"] if i["usage"] in ["valid","accepted"]), None)
    source_workplan_species["Valid ITIS TSN"] = next((i["tsn"] for i in r["data"] if i["usage"] in ["valid","accepted"]), None)
    updated_workplan_species.append(source_workplan_species)


In [5]:
# Build a new workplan species recordset, injecting the augmented records
new_workplan_species = list()
for s in workplan_species:
    new_record = next((r for r in updated_workplan_species if r["Lookup Name"] == s["Lookup Name"]), None)
    if new_record:
        new_workplan_species.append(new_record)
    else:
        new_workplan_species.append(s)

# Cache the new set of workplan species information
display(bis_utils.doc_cache("../cache/workplan_species.json", new_workplan_species))

{'Doc Cache File': '../cache/workplan_species.json',
 'Number of Documents in Cache': 363,
 'Document Number 10': {'Guild': 'Insects',
  'Species Name (Common)': "Franklin's bumblebee",
  'Scientific Name': 'Bombus franklini',
  'Lead FWS Regional Office': 'Region 1 - Pacific (Northwest)',
  'Proposed FWS Decision Timeframe (Fiscal Year)': 2018,
  'Range': 'OR',
  'Bin': None,
  'Lookup Name': 'Bombus franklini',
  'ECOS Link': 'https://ecos.fws.gov/ecp/species/7022',
  'ITIS TSN': '714804'}}

# Schema Validation
Building out the schema for the workplan_species dataset used in subsequent lookup and data assembly steps gave me an opportunity to explore the enum keyword of the JSON Schema specification. It restricts a property to a fixed set of allowed values, analogous to specifying enumerated domain values in FGDC metadata or an ISO Feature Catalog. A few properties in the workplan species list lend themselves to this approach. As we look toward scaling this methodology, we will develop the semantic infrastructure behind the system so that defined "code lists" and more robust vocabularies drive the validation mechanisms.
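
To illustrate how enum behaves, here is a minimal sketch using the jsonschema package. The schema fragment is hypothetical and is not the actual workplan_species schema: "Insects" appears as a Guild value in the cached data, but the other enum values are placeholders chosen only for the example.

```python
import jsonschema

# Hypothetical schema fragment; the enum values are placeholders,
# not the real list of guilds used in the workplan_species schema
guild_schema = {
    "type": "object",
    "properties": {
        "Guild": {"type": "string", "enum": ["Insects", "Fishes", "Plants"]}
    },
    "required": ["Guild"],
}

# A value from the enumerated list validates without raising
jsonschema.validate({"Guild": "Insects"}, guild_schema)

# A value outside the list raises a ValidationError
rejected = False
try:
    jsonschema.validate({"Guild": "Bugs"}, guild_schema)
except jsonschema.ValidationError as error:
    rejected = True
    print("Rejected:", error.message)
```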

I also ran into a technical snag with the jsonschema package: passing the entire list of workplan species documents for validation resulted in a recursion error that I could not resolve. Instead, I iterated through the documents and validated each one against the schema individually, which was not noticeably slower.
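
A variation on the per-document approach is to collect every failure rather than stopping at the first one. The sketch below assumes a record-level schema; `validate_records` and `record_schema` are hypothetical names, and the tiny schema here is only a stand-in for the real one loaded in the next cell.

```python
import jsonschema


def validate_records(records, schema):
    """Validate each record on its own and collect failures by list position."""
    validator = jsonschema.Draft7Validator(schema)
    failures = {}
    for index, record in enumerate(records):
        # iter_errors yields every problem instead of raising on the first
        messages = [error.message for error in validator.iter_errors(record)]
        if messages:
            failures[index] = messages
    return failures


# Stand-in schema for a single record (an assumption for this example)
record_schema = {
    "type": "object",
    "required": ["Lookup Name", "ITIS TSN"],
}

sample_records = [
    {"Lookup Name": "Bombus franklini", "ITIS TSN": "714804"},
    {"Lookup Name": "Bombus franklini"},  # missing ITIS TSN on purpose
]

failures = validate_records(sample_records, record_schema)
print(failures)
```

Reporting failures by position makes it easier to go back to the offending documents in the cache instead of fixing them one exception at a time.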

In [6]:
workplan_species_schema = helperfunctions.load_schema('workplan_species')
display(workplan_species_schema)

for record in new_workplan_species:
    jsonschema.validate(record, workplan_species_schema)

{'definitions': {'items': {'$id': '#/items',
   'type': ['object', 'array'],
   'title': 'Generic container for items in a dataset',
   'description': 'A JSON array or object property containing one or more items in a dataset or data structure within a dataset.'}},
 '$schema': 'http://json-schema.org/draft-07/schema#',
 '$id': 'http://data.usgs.gov/property_registry/',
 'type': ['array', 'object'],
 'title': 'FWS National Listing Workplan Species',
 'description': 'A processed dataset of the FWS National Listing Workplan species from original source material. Includes the addition of properties to aid in name/identifier lookup from other sources.',
 'items': {'$ref': '#/definitions/items',
  'required': ['Guild',
   'Species Name (Common)',
   'Scientific Name',
   'Lead FWS Regional Office',
   'Proposed FWS Decision Timeframe (Fiscal Year)',
   'Range',
   'Bin',
   'ECOS Link',
   'ITIS TSN',
   'Lookup Name']},
 'properties': {'Guild': {'$id': '#/items/properties/Guild',
   'type':