ScienceBase is a potential source of data for the community working with the FWS Workplan Species. In particular, discrete datasets that have passed the formal USGS Fundamental Science Practices review process and been published as Data Release products may offer data assets that are not widely known to that community and that may be partial results of USGS research funding dedicated to these species. This notebook uses a function that searches ScienceBase, constrains the query to Data Release products only, and packages the results as a cache.
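
As a rough sketch of what that constrained query looks like, the snippet below rebuilds the request URL recorded in the `api` field of the cached output further down in this notebook, using only the Python standard library. It is not the `bispy` implementation, and the variable names are illustrative assumptions. Only the endpoint and the parameter values come from that recorded URL.

```python
from urllib.parse import urlencode

# Endpoint and parameters copied from the 'api' and 'parameters' values
# shown in the cached output of this notebook
SB_ITEMS_URL = "https://www.sciencebase.gov/catalog/items"

params = {
    "q": "Cyprinodon tularosa",                 # scientific name to search for
    "max": 1000,                                # maximum number of items to return
    "filter0": "systemType=Data Release",       # restrict results to Data Release products
    "fields": "title,body,contacts,dates,webLinks,files",
}

# urlencode percent-encodes '=' and ',' and turns spaces into '+'
query_url = f"{SB_ITEMS_URL}?{urlencode(params)}"
print(query_url)
```

The `filter0` parameter is what limits the search to Data Release products. Dropping it would presumably search the whole catalog, although that is not shown in this notebook.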

In [1]:
import requests
import bispy
import jsonschema
from IPython.display import display
from joblib import Parallel, delayed

import helperfunctions

# ScienceBase search client and caching utilities from bispy
sb_search = bispy.sb.Search()
bis_utils = bispy.bis.Utils()

In [2]:
name_list = helperfunctions.workplan_species()

In [5]:
%%time
# Use joblib to run the ScienceBase Data Release searches in parallel, one request per scientific name
sb_results = Parallel(n_jobs=8)(
    delayed(sb_search.search_snapshot)(q=name, system_type="Data Release", fields="title,body,contacts,dates,webLinks,files")
    for name, name_source in name_list
)


CPU times: user 1.33 s, sys: 66.5 ms, total: 1.39 s
Wall time: 28.4 s
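
The wall time is far larger than the CPU time, which suggests the cell spends most of its time waiting on network responses. The fan-out pattern itself can be tried offline. The sketch below deliberately swaps joblib for the standard library's `concurrent.futures.ThreadPoolExecutor`, and swaps the ScienceBase call for a hypothetical `fake_search` stand-in that makes no request. The second entry in `example_names` is a placeholder, not a real workplan species.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_search(name):
    """Hypothetical stand-in for sb_search.search_snapshot; makes no network call."""
    return {"parameters": {"q": name}}


# Hypothetical (name, name_source) pairs shaped like the items the notebook iterates over
example_names = [
    ("Cyprinodon tularosa", "example source"),
    ("Placeholder species", "example source"),
]

with ThreadPoolExecutor(max_workers=8) as pool:
    # map() returns results in input order, as joblib.Parallel also does
    results = list(pool.map(fake_search, (name for name, _ in example_names)))

print([r["parameters"]["q"] for r in results])
```

Because results come back in input order, each result can be matched to its species name by position.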


In [14]:
# Cache the array of retrieved documents and return/display a random sample for verification
display(bis_utils.doc_cache("../cache/sb_datarelease.json", sb_results))

{'Doc Cache File': '../cache/sb_datarelease.json',
 'Number of Documents in Cache': 386,
 'Document Number 50': {'processing_metadata': {'status': 'success',
   'date_processed': '2019-09-16T18:56:04.180650',
   'status_message': 'no items found',
   'api': 'https://www.sciencebase.gov/catalog/items?q=Cyprinodon+tularosa&max=1000&filter0=systemType%3DData+Release&fields=title%2Cbody%2Ccontacts%2Cdates%2CwebLinks%2Cfiles'},
  'parameters': {'fields': 'title,body,contacts,dates,webLinks,files',
   'max': 1000,
   'filter0': 'systemType=Data Release',
   'q': 'Cyprinodon tularosa'}}}
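
The sampled document shows that a search can succeed and still report `'no items found'`. A minimal sketch for separating those empty results from the rest of the cache follows. It assumes every document carries a `processing_metadata.status_message` like the sample above, and that any message other than `'no items found'` indicates at least one hit. Neither assumption is confirmed by the output shown here, and `split_by_hits` is a hypothetical helper, not part of `bispy`.

```python
def split_by_hits(docs):
    """Partition cache documents by whether the search reported any items."""
    hits, misses = [], []
    for doc in docs:
        message = doc.get("processing_metadata", {}).get("status_message")
        # Assumption: only this exact message marks an empty result
        (misses if message == "no items found" else hits).append(doc)
    return hits, misses


# Hand-written documents: the first mirrors the sample above, the second is hypothetical
sample_docs = [
    {"processing_metadata": {"status": "success", "status_message": "no items found"},
     "parameters": {"q": "Cyprinodon tularosa"}},
    {"processing_metadata": {"status": "success", "status_message": "hypothetical other message"},
     "parameters": {"q": "Placeholder species"}},
]

hits, misses = split_by_hits(sample_docs)
print(len(hits), len(misses))
```

In the notebook, `split_by_hits(sb_results)` would give a quick count of how many of the 386 cached searches returned any Data Release items.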

In [22]:
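# Load the JSON Schema for this cache and display it for reference
sb_datarelease_schema = helperfunctions.load_schema('sb_datarelease')
display(sb_datarelease_schema)

# Validate the search results against the schema; raises jsonschema.ValidationError if they do not conform
jsonschema.validate(sb_results, sb_datarelease_schema)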

{'definitions': {'items': {'$id': '#items',
   'type': ['object', 'array'],
   'title': 'Generic container for items in a dataset',
   'description': 'A JSON array or object property containing one or more items in a dataset or data structure within a dataset.'}},
 '$schema': 'http://json-schema.org/draft-07/schema#',
 '$id': 'http://data.usgs.gov/property_registry/',
 'title': 'Cache of ScienceBase search results for Data Releases related to species',
 'description': 'A cached set of ScienceBase search results from the ScienceBase API using the Python sciencebasepy package.',
 'type': 'array',
 'items': {'$ref': '#/definitions/items',
  'type': 'object',
  'properties': {'processing_metadata': {'$ref': 'common_properties.json#/definitions/processing_metadata'},
   'parameters': {'$ref': 'common_properties.json#/definitions/parameters',
    'type': 'object',
    'properties': {'fields': {'type': 'string',
      'title': 'fields to return',
      'description': 'Comma-delimited list of 