The World Register of Marine Species (WoRMS) is another taxonomic authority we use in our work. For the FWS work plan species, we rely mostly on ITIS because it is the taxonomic authority FWS uses most, and ITIS TSNs have been determined for most species. This notebook runs after the ITIS caching process: it retrieves the names that ITIS could not match and tries them against WoRMS using the worms module of the bispy package.

In [1]:
import json

import bispy
import jsonschema
import requests
from IPython.display import display
from joblib import Parallel, delayed

import helperfunctions

worms = bispy.worms.Worms()
bis_utils = bispy.bis.Utils()

In [2]:
# Load the cached ITIS results; the context manager closes the file on exit
with open("../cache/itis.json", "r") as f:
    itis_cache = json.load(f)

In [3]:
# Unwind the ITIS URLs into names for cases where we were not able to obtain an exact match
itis_query_prefix = "http://services.itis.gov/?wt=json&rows=10&q=nameWOInd:"
unmatched_itis_names = [
    i["processing_metadata"]["details"][0]["Exact Match Fail"]
    .replace(itis_query_prefix, "")
    .replace("\\%20", " ")
    for i in itis_cache
    if i["processing_metadata"]["status_message"] == "Not Matched"
]


In [4]:
unmatched_itis_names

['Copablepharon fuscum',
 'Oreohelix n. sp. 1',
 'Monadenia fidelis minor',
 'Vertigo sp.',
 'Eurycea sp.',
 'Quincuncina mitchelli',
 'Pleurobema ridellii',
 'Potamilus metnecktayi',
 'Quadrula houstonensis',
 'Macrhybopsis aestivalis tetranemus',
 'Ictalurus sp.',
 'Castilleja ornata',
 'Aster puniceus scabricaulis',
 'Helianthus occidentalis plantagineus',
 'astylis species',
 'Lycaena ferrisi',
 'heterocampa amanda',
 'litodonta alpina',
 'Donrichardsonia macroneuron',
 'Aspidoscelis arizonae',
 'Sistrurus catenatus edwardsii',
 'Deirochelys reticularia miaria',
 'Orconectes peruncus',
 'Orconectes quadruncus',
 'Schoenoplectus hallii',
 'Papaipema eryngii',
 'Grus canadensis pratensis',
 'Pleurobema athearni',
 'Toxolasma lividum',
 'Pleuronaia barnesiana',
 'Percina kusha',
 'Etheostoma maydeni',
 'Percina williamsi',
 'Oncidium undulatum',
 'Euphyes pilatka klotsi',
 'Atlantea tulita',
 'Macroclemys temmincki',
 'Eumeces egregius insularis',
 'Eumeces egregius egregius',
 'Pyrgu

In [5]:
%%time
# Use joblib to search WoRMS for each unmatched species name, running up to 8 searches in parallel
worms_result = Parallel(n_jobs=8)(delayed(worms.search)(name) for name in unmatched_itis_names)

CPU times: user 234 ms, sys: 73.1 ms, total: 307 ms
Wall time: 10.1 s
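
The timing above (about 10 s of wall time against roughly 0.3 s of CPU time) suggests the searches are dominated by waiting on the network, which is the case where a thread pool tends to work well. The following is a minimal sketch of that pattern using only the standard library. The endpoint shape is assumed from the `api` value that appears in the cached WoRMS output in this notebook; `build_worms_url`, `search_all`, and `fake_search` are hypothetical names used for illustration and do not describe how `worms.search` is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote

# Assumed endpoint shape, taken from the `api` value in the cached WoRMS output
WORMS_NAME_ENDPOINT = "http://www.marinespecies.org/rest/AphiaRecordsByName/"


def build_worms_url(name):
    """Build a like=true WoRMS name query with the name percent-encoded."""
    return f"{WORMS_NAME_ENDPOINT}{quote(name)}?like=true&marine_only=false&offset=1"


def search_all(names, search, max_workers=8):
    """Run search(name) for every name in a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(search, names))


# Hypothetical stand-in for worms.search: records the URL instead of requesting it
def fake_search(name):
    return {"name": name, "api": build_worms_url(name)}


results = search_all(["Percina kusha", "Vertigo sp."], fake_search)
```

Passing the search function in as an argument keeps the sketch testable offline; a real run would pass `worms.search` instead of the stand-in.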


In [6]:
# Cache the array of retrieved documents and return/display a random sample for verification
display(bis_utils.doc_cache("../cache/worms.json", worms_result))

{'Doc Cache File': 'cache/worms.json',
 'Number of Documents in Cache': 83,
 'Document Number 50': {'processing_metadata': {'status': 'error',
   'date_processed': '2019-07-24T15:51:38.218140',
   'status_message': 'Not Matched',
   'api': 'http://www.marinespecies.org/rest/AphiaRecordsByName/Pseudanophthalmus sanctipauli?like=true&marine_only=false&offset=1'}}}
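
A random sample shows one document at a time, so a tally by status message can be a useful complement when checking the cache. This is a minimal sketch, assuming every cached document carries `processing_metadata.status_message` as in the sample above. The `summarize_status` helper and the sample records are hypothetical, and "Some Other Status" is a placeholder rather than a value bispy is known to emit.

```python
from collections import Counter


def summarize_status(docs):
    """Count cached documents by processing_metadata.status_message."""
    return Counter(
        doc.get("processing_metadata", {}).get("status_message", "(missing)")
        for doc in docs
    )


# Illustrative records only; "Some Other Status" is a placeholder value
sample_docs = [
    {"processing_metadata": {"status": "error", "status_message": "Not Matched"}},
    {"processing_metadata": {"status": "error", "status_message": "Not Matched"}},
    {"processing_metadata": {"status_message": "Some Other Status"}},
]
status_counts = summarize_status(sample_docs)
```

In the notebook, the same call could be made on `worms_result` directly.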

# Schema Validation
Working through the WoRMS and ITIS processes and their schema documentation made it possible to establish three new common properties:

* date_created (a date/time from source data indicating when a record was created)
* date_modified (a date/time from source data indicating when a record was last updated/modified)
* biological_taxonomy (an array data structure containing the full taxonomic hierarchy upward from a given taxon record)

Because I build these data structures with a processing function, it seems reasonable to introduce the common properties at the build point, so I added functionality to the relevant functions in the bispy package that writes source attributes to these property names. Alternatively, it might be better to add a separate layer of logic that adds or transforms these common property names after the fact: retain the full original source data, then build something like a secondary index with the common properties.
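
The after-the-fact alternative could look something like the sketch below: a small mapping step that leaves the source record untouched and writes the common properties alongside it. This is only an illustration. The `COMMON_PROPERTY_MAP` mapping, the `add_common_properties` helper, and the source field names in the sample record are all assumptions, not what the bispy package does.

```python
import copy

# Hypothetical mapping of common property name -> source field name
COMMON_PROPERTY_MAP = {
    "date_created": "created",
    "date_modified": "modified",
    "biological_taxonomy": "taxonomy",
}


def add_common_properties(source_record, property_map=COMMON_PROPERTY_MAP):
    """Return a new document that keeps the source record whole and adds
    common properties for whichever mapped source fields are present."""
    doc = {"source": copy.deepcopy(source_record), "common": {}}
    for common_name, source_name in property_map.items():
        if source_name in source_record:
            doc["common"][common_name] = copy.deepcopy(source_record[source_name])
    return doc


# Made-up source record for illustration; it has no "created" field
record = {
    "scientificname": "Percina kusha",
    "modified": "2019-01-01T00:00:00Z",
    "taxonomy": [{"rank": "Genus", "name": "Percina"}],
}
indexed = add_common_properties(record)
```

Keeping the mapping in one dictionary per source means a new source can be added without touching the build functions, at the cost of maintaining a second document alongside the original.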

In [8]:
worms_schema = helperfunctions.load_schema('worms')
display(worms_schema)

jsonschema.validate(worms_result, worms_schema)

{'definitions': {'items': {'$id': '#/items',
   'type': ['object', 'array'],
   'title': 'Generic container for items in a dataset',
   'description': 'A JSON array or object property containing one or more items in a dataset'},
  'doi': {'$id': '#doi',
   'type': ['string', 'null'],
   'title': 'Digital Object Identifier',
   'description': 'A digital object identifier for or associated with a record. May be in the form of an HTTP url or a standalone identifier.',
   'examples': ['http://dx.doi.org/10.2305/IUCN.UK.2004.RLTS.T59435A11941314.en',
    '10.2305/IUCN.UK.2004.RLTS.T59435A11941314.en']},
  'resolvable_identifier': {'$id': '#resolvable_identifier',
   'type': 'string',
   'title': 'Resolvable Identifier',
   'description': 'Some form of resolvable identifier for a record that returns a response when accessed over an included protocol such as HTTP. May or may not provide for content negotiation.',
   'examples': ['https://www.iucnredlist.org/species/59435/11941314']},
  'citat