The USFWS Threatened and Endangered Species System (TESS) is part of the underlying data source for ECOS. TESS has a set of web services that are rudimentary but functional. In this notebook, we retrieve information from TESS, first for all of the "ECOS-scraped" species that have ITIS TSNs declared (cached in the cache/workplan_species.json file), and then for the remaining species in the list for which we did not get a TSN.

The main bispy function used here is the `search` method of the `Tess` class in the `ecos` module. It takes either an ITIS TSN or a scientific name and runs a search. The TESS service returns XML, which the method transforms, with slight tweaks, into a dictionary (JSON) structure.
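To make that request/response cycle concrete, here is a minimal sketch of what a TSN-or-name lookup against TESS could look like. It is not bispy's implementation. It assumes the `TessQuery` endpoint and `xquery` pattern visible in the `api` field of the cached documents in this notebook, and it assumes (hypothetically) that a name search filters on the `SCINAME` element in the same way. The helpers `build_tess_query` and `species_detail_to_dict` are illustrative names, and the XML layout in the example is an assumed shape rather than a captured response.

```python
from urllib.parse import quote
import xml.etree.ElementTree as ET

# Base URL taken from the "api" field of the cached TESS documents
TESS_QUERY_BASE = "https://ecos.fws.gov/ecp0/TessQuery?request=query&xquery="


def build_tess_query(identifier):
    """Build a TESS query URL from an ITIS TSN or a scientific name (hypothetical helper)."""
    identifier = str(identifier).strip()
    if identifier.isdigit():
        xquery = f"/SPECIES_DETAIL[TSN={identifier}]"
    else:
        # Assumption: a name search filters on the SCINAME element
        xquery = f'/SPECIES_DETAIL[SCINAME="{identifier}"]'
    # Percent-encode spaces and quotes but keep the XPath punctuation readable
    return TESS_QUERY_BASE + quote(xquery, safe="/[]=")


def species_detail_to_dict(xml_text):
    """Flatten each SPECIES_DETAIL element into a dict of child tag -> text.

    Returns None for no matches, a dict for one match, or a list of dicts for several.
    """
    root = ET.fromstring(xml_text)
    records = []
    for detail in root.iter("SPECIES_DETAIL"):  # iter() also checks the root element itself
        records.append({
            child.tag: (child.text.strip() if child.text and child.text.strip() else None)
            for child in detail
        })
    if not records:
        return None
    return records[0] if len(records) == 1 else records


# Assumed response shape, for illustration only
sample = """<results>
  <SPECIES_DETAIL><TSN>593084</TSN><SCINAME>Somatochlora margarita</SCINAME><REFUGE_OCCURRENCE/></SPECIES_DETAIL>
</results>"""
print(build_tess_query(593084))
print(species_detail_to_dict(sample))
```

Returning a dict for one match and a list for several mirrors the one-or-many behavior that the schema validation section of this notebook has to accommodate.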

In [1]:
import json
import bispy
from IPython.display import display
from joblib import Parallel, delayed
import jsonschema

tess = bispy.ecos.Tess()
bis_utils = bispy.bis.Utils()

import helperfunctions

In [2]:
# Open up the cached workplan species
# The with block closes the file automatically, so no explicit close() is needed
with open("../cache/workplan_species.json", "r") as f:
    workplan_species = json.load(f)

In [3]:
# Prepare two lists - one of the TSNs we already know about from ECOS scraping and the other of the remaining scientific names
tsn_list = [r["ITIS TSN"] for r in workplan_species if r["ITIS TSN"] is not None]
names_without_tsns = [r["Lookup Name"] for r in workplan_species if r["ITIS TSN"] is None]

In [4]:
%%time
# Use joblib to run multiple requests for TESS documents in parallel via known ITIS TSNs
tess_cache_from_tsn = Parallel(n_jobs=8)(delayed(tess.search)(tsn) for tsn in tsn_list)

CPU times: user 684 ms, sys: 93.7 ms, total: 778 ms
Wall time: 22.3 s
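Each `tess.search` call is a network request, so the work is largely I/O-bound and running several at once mostly overlaps waiting time. As a hedged illustration of the same fan-out pattern, here is a sketch that swaps in the standard library's `concurrent.futures.ThreadPoolExecutor` in place of joblib (a different tool than the notebook uses). `fake_search` is a hypothetical stand-in for `tess.search` that sleeps instead of calling TESS.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fake_search(identifier):
    """Hypothetical stand-in for tess.search: sleeps to mimic network latency."""
    time.sleep(0.1)
    return {"identifier": identifier, "processing_metadata": {"status": "success"}}


def search_all(identifiers, search=fake_search, max_workers=8):
    """Run search() over identifiers concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(search, identifiers))


start = time.perf_counter()
results = search_all(range(16))
elapsed = time.perf_counter() - start
# With 8 workers this typically takes about 0.2 s, versus about 1.6 s run sequentially
print(f"{len(results)} results in {elapsed:.2f} s")
```

Like joblib's `Parallel`, `pool.map` returns results in input order, so the output list still lines up with `tsn_list` or `names_without_tsns`.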


In [5]:
%%time
# Use joblib to run multiple requests for TESS documents in parallel via scientific names
tess_cache_from_names = Parallel(n_jobs=8)(delayed(tess.search)(name) for name in names_without_tsns)

CPU times: user 228 ms, sys: 14.9 ms, total: 243 ms
Wall time: 7.41 s


In [6]:
# If any new TSNs were found via name search, write them back to the workplan species JSON
# Multi-record results (a list under 'tess_species') are ambiguous, so only single matches are used
name_matches = [
    t for t in tess_cache_from_names
    if isinstance(t.get('tess_species'), dict) and int(t['tess_species']['SPECIES_DETAIL']['TSN']) > 0
]

updated_workplan_species = list()
for record in name_matches:
    workplan_record = next((d for d in workplan_species if d["Lookup Name"] == record['tess_species']['SPECIES_DETAIL']['SCINAME']), None)
    # TESS may return a scientific name that differs from our lookup name; skip those instead of failing
    if workplan_record is None:
        continue
    workplan_record["ITIS TSN"] = record["tess_species"]["SPECIES_DETAIL"]["TSN"]
    updated_workplan_species.append(workplan_record)

if len(updated_workplan_species) > 0:
    # Replace the original entries for updated species with their TSN-bearing versions
    updated_species_names = [s["Scientific Name"] for s in updated_workplan_species]
    new_workplan_species = list(filter(lambda i: i['Scientific Name'] not in updated_species_names, workplan_species))
    new_workplan_species.extend(updated_workplan_species)
    display(bis_utils.doc_cache("../cache/workplan_species.json", new_workplan_species))
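The write-back step is easier to test when pulled out of the notebook state. Below is a minimal sketch of the same merge as a pure function, assuming each name-search result carries `tess_species` → `SPECIES_DETAIL` with `TSN` and `SCINAME` (as the cell above expects) and that a TSN of 0 or less means TESS has no ITIS match. The function name `merge_new_tsns` is hypothetical.

```python
import copy


def merge_new_tsns(workplan_species, name_results):
    """Return a copy of workplan_species with ITIS TSNs filled in from TESS name searches.

    Hypothetical helper; record keys follow the notebook cell above.
    """
    # Map TESS scientific name -> TSN for results with a single, positive TSN
    found = {}
    for result in name_results:
        species = result.get("tess_species")
        # Multi-record (list) results are skipped because the match is ambiguous
        if not isinstance(species, dict):
            continue
        detail = species.get("SPECIES_DETAIL", {})
        if detail.get("SCINAME") and int(detail.get("TSN") or 0) > 0:
            found[detail["SCINAME"]] = detail["TSN"]

    merged = []
    for record in workplan_species:
        record = copy.deepcopy(record)  # leave the caller's list untouched
        if record.get("ITIS TSN") is None and record.get("Lookup Name") in found:
            record["ITIS TSN"] = found[record["Lookup Name"]]
        merged.append(record)
    return merged


workplan = [
    {"Scientific Name": "Somatochlora margarita", "Lookup Name": "Somatochlora margarita", "ITIS TSN": None},
    {"Scientific Name": "Genus species", "Lookup Name": "Genus species", "ITIS TSN": None},
]
results = [{"tess_species": {"SPECIES_DETAIL": {"TSN": "593084", "SCINAME": "Somatochlora margarita"}}}]
print([r["ITIS TSN"] for r in merge_new_tsns(workplan, results)])  # ['593084', None]
```

Unlike the notebook cell, this version keeps the original record order and does not mutate `workplan_species` in place, which makes it safe to rerun.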
    

In [7]:
# Cache the array of retrieved documents and return/display a random sample for verification
display(bis_utils.doc_cache("../cache/tess.json", tess_cache_from_tsn + tess_cache_from_names))

{'Doc Cache File': '../cache/tess.json',
 'Number of Documents in Cache': 363,
 'Document Number 49': {'processing_metadata': {'status': 'success',
   'date_processed': '2019-09-16T17:26:17.692552',
   'api': 'https://ecos.fws.gov/ecp0/TessQuery?request=query&xquery=/SPECIES_DETAIL[TSN=593084]'},
  'data': {'SPECIES_DETAIL': {'ENTITY_ID': '4132',
    'SPCODE': 'I06O',
    'VIPCODE': 'I01',
    'SCINAME': 'Somatochlora margarita',
    'COMNAME': 'Texas emerald',
    'INVNAME': 'emerald, Texas',
    'POP_ABBREV': 'Wherever found',
    'POP_DESC': 'Wherever found',
    'FAMILY': 'Corduliidae',
    'STATUS': 'UR',
    'STATUS_TEXT': 'Under Review in the Candidate or Petition Process',
    'LEAD_AGENCY': '1',
    'LEAD_REGION': '2',
    'COUNTRY': '1',
    'TSN': '593084',
    'DPS': '0',
    'REFUGE_OCCURRENCE': None}}}}
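The summary printed above suggests what `doc_cache` does: write the list of documents to a JSON file and report the path, the document count, and one randomly chosen document for spot-checking. The following is a hypothetical re-creation inferred only from that output, not bispy's source; `doc_cache_sketch` and its `seed` parameter are illustrative, and whether bispy's document numbers are 0- or 1-based is not known here (the sketch uses 0-based indexes).

```python
import json
import os
import random
import tempfile


def doc_cache_sketch(cache_path, docs, seed=None):
    """Hypothetical stand-in for a doc-cache helper: write docs to JSON and summarize them."""
    with open(cache_path, "w") as f:
        json.dump(docs, f, indent=2)
    summary = {"Doc Cache File": cache_path, "Number of Documents in Cache": len(docs)}
    if docs:
        # Pick one document at random so a human can spot-check the cache
        index = random.Random(seed).randrange(len(docs))
        summary[f"Document Number {index}"] = docs[index]
    return summary


cache_path = os.path.join(tempfile.mkdtemp(), "tess.json")
summary = doc_cache_sketch(cache_path, [{"TSN": "593084"}, {"TSN": "0"}], seed=1)
print(summary["Number of Documents in Cache"])  # 2
```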

# Schema Validation
The USFWS TESS system is documented in a set of [myUSGS wiki pages](https://my.usgs.gov/confluence/pages/viewpage.action?pageId=518426757). The schema documents notable aspects of the TESS information model, with links to code values. One challenge in using these data is that names and ITIS TSNs do not necessarily map one-to-one to TESS records. In some instances the "tess_species" key in the data structure contains an array of results because TESS returned more than one record, and the validation accounts for this.
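Code that consumes these documents has to cope with that one-or-many shape. A minimal sketch, assuming a record's `tess_species` value is either a single object, an array of objects, or absent; the helper name `tess_species_list` is hypothetical.

```python
def tess_species_list(record):
    """Return a record's TESS species results as a list, whether TESS found zero, one, or many."""
    value = record.get("tess_species")
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


single = {"tess_species": {"SPECIES_DETAIL": {"TSN": "593084"}}}
multiple = {"tess_species": [{"SPECIES_DETAIL": {"TSN": "1"}}, {"SPECIES_DETAIL": {"TSN": "2"}}]}
missing = {"processing_metadata": {"status": "failure"}}
print([len(tess_species_list(r)) for r in (single, multiple, missing)])  # [1, 2, 0]
```

Normalizing to a list up front lets downstream loops treat every record the same way; the schema's generic `items` definition expresses the same flexibility by allowing `'type': ['object', 'array']`.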

In [10]:
tess_schema = helperfunctions.load_schema('tess')
display(tess_schema)

jsonschema.validate(tess_cache_from_names + tess_cache_from_tsn, tess_schema)

{'definitions': {'items': {'$id': '#/items',
   'type': ['object', 'array'],
   'title': 'Generic container for items in a dataset',
   'description': 'A JSON array or object property containing one or more items in a dataset or data structure within a dataset.'}},
 '$schema': 'http://json-schema.org/draft-07/schema#',
 '$id': 'http://data.usgs.gov/property_registry/',
 'type': 'array',
 'title': 'TESS Schema',
 'description': 'A dataset containing summarized results from the USFWS Threatened and Endangered Species System pulled from its API. Data were assembled using a search function build into the experimental bispy software package (https://github.com/usgs-bcb/bispy).',
 'items': {'$ref': '#/definitions/items'},
 'properties': {'processing_metadata': {'$ref': 'common_properties.json#/definitions/processing_metadata'},
  'data': {'$ref': 'common_properties.json#/definitions/data',
   'required': ['SPECIES_DETAIL'],
   'properties': {'SPECIES_DETAIL': {'$id': '#/items/properties/tess