It is often useful to gain a basic understanding of which species observation records might be available for use in modeling and analysis. The Global Biodiversity Information Facility (GBIF) is one source of occurrence data, and its robust API supports relatively rapid summarization, which makes it a good fit for building out a species information catalog. This notebook exercises an experimental function that ties together a couple of GBIF API services: it makes a reasonable guess at a species identifier (from the GBIF backbone taxonomy) and then pulls back a basic characterization of US-based occurrences. The three facets of the occurrence records that we can reasonably assemble quickly at this time are the following:

* basisOfRecord - The Darwin Core term that describes the basis of the occurrence record. It mostly helps distinguish museum specimens (which may or may not have accurate spatial information for where the specimen was collected in the field) from human observations or other methods of observing a species.
* year - Provides a basic time series of occurrence counts by year.
* institutionCode - A somewhat obscure set of codes for the institution providing the record. Further details exist behind these codes, but they give a basic idea of where the records come from.
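To make the two API steps above concrete, here is a minimal sketch of the requests involved. It is not bispy's implementation of `summarize_us_species`; the helper names (`species_match_url`, `us_occurrence_summary_url`, `facet_table`, `specimen_vs_observation`) are hypothetical. It assumes the public GBIF v1 endpoints `/species/match` (name matching against the backbone) and `/occurrence/search` with `country=US`, `limit=0` (counts only, no records), repeated `facet` parameters, and `facetLimit` raised so the `year` facet is not cut short. The specimen/observation split relies on an assumed naming convention for basisOfRecord values.

```python
"""Sketch of GBIF requests behind a US occurrence summary (not bispy's code)."""
from urllib.parse import urlencode

GBIF_API = "https://api.gbif.org/v1"  # public GBIF API base URL


def species_match_url(scientific_name):
    """URL that matches a scientific name against the GBIF backbone taxonomy."""
    return f"{GBIF_API}/species/match?" + urlencode({"name": scientific_name})


def us_occurrence_summary_url(taxon_key, facet_limit=1000):
    """URL for a US occurrence search that returns facet counts only.

    limit=0 suppresses individual records; each repeated `facet` parameter
    asks GBIF to count records by that field. facet_limit is an assumed
    value chosen so long year series are not truncated.
    """
    params = {
        "taxonKey": taxon_key,
        "country": "US",
        "limit": 0,
        "facet": ["basisOfRecord", "year", "institutionCode"],
        "facetLimit": facet_limit,
    }
    # doseq=True expands the facet list into repeated facet=... pairs
    return f"{GBIF_API}/occurrence/search?" + urlencode(params, doseq=True)


def facet_table(search_response):
    """Flatten a search response's 'facets' list into {field: {name: count}}."""
    return {
        facet["field"]: {c["name"]: c["count"] for c in facet["counts"]}
        for facet in search_response.get("facets", [])
    }


def specimen_vs_observation(basis_counts):
    """Group basisOfRecord facet counts into specimen vs observation totals.

    Assumption: specimen values end in "_SPECIMEN" (e.g. PRESERVED_SPECIMEN)
    and observation values end in "OBSERVATION" (e.g. HUMAN_OBSERVATION);
    everything else is tallied as "other".
    """
    totals = {"specimen": 0, "observation": 0, "other": 0}
    for basis, count in basis_counts.items():
        if basis.endswith("_SPECIMEN"):
            totals["specimen"] += count
        elif basis.endswith("OBSERVATION"):
            totals["observation"] += count
        else:
            totals["other"] += count
    return totals
```

To run it live, pass each URL to `requests.get(url).json()`; the `usageKey` field of the match response is the taxon key to feed into `us_occurrence_summary_url`, and `facet_table` then turns the search response into per-field lookups such as the institution code counts shown in the output further down.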

In [1]:
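import requests
import bispy
from IPython.display import display
from joblib import Parallel, delayed

import helperfunctions  # local module that supplies the workplan species list

# GBIF API helper and general BIS utilities (the latter used below for document caching)
gbif = bispy.gbif.Gbif()
bis_utils = bispy.bis.Utils()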

In [2]:
name_list = helperfunctions.workplan_species()

In [3]:
%%time
# Use joblib to run the GBIF summary requests in parallel (8 workers),
# one per (scientific name, name source) pair from the workplan list
gbif_results = Parallel(n_jobs=8)(
    delayed(gbif.summarize_us_species)(name, name_source)
    for name, name_source in name_list
)


CPU times: user 1.09 s, sys: 137 ms, total: 1.23 s
Wall time: 1min 15s
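Each task above spends most of its time waiting on HTTP responses, so a thread pool is a reasonable alternative way to fan out the calls. As a sketch, the standard library's `concurrent.futures.ThreadPoolExecutor` is swapped in here for joblib purely for illustration; `summarize` is a hypothetical stand-in for `gbif.summarize_us_species` that makes no network call, and `summarize_all` is likewise a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize(name, name_source):
    """Hypothetical stand-in for gbif.summarize_us_species (no network call)."""
    return {"name": name, "source": name_source}


def summarize_all(name_list, max_workers=8):
    """Run summarize over (name, name_source) pairs on a thread pool.

    executor.map yields results in input order, so the returned list lines
    up with name_list, just as joblib's Parallel output does.
    """
    names = [name for name, _ in name_list]
    sources = [source for _, source in name_list]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(summarize, names, sources))
```

With the real call substituted for `summarize`, `gbif_results = summarize_all(name_list)` would produce a list in the same shape as the joblib cell.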


In [4]:
# Cache the array of retrieved documents and return/display a random sample for verification
display(bis_utils.doc_cache("cache/gbif.json", gbif_results))

{'Doc Cache File': 'cache/gbif.json',
 'Document Number 236': {'GBIF Species Record': {'canonicalName': 'Percina macrocephala',
   'class': 'Actinopterygii',
   'classKey': 204,
   'family': 'Percidae',
   'familyKey': 4481,
   'genus': 'Percina',
   'genusKey': 2382030,
   'higherClassificationMap': {'1': 'Animalia',
    '204': 'Actinopterygii',
    '2382030': 'Percina',
    '44': 'Chordata',
    '4481': 'Percidae',
    '587': 'Perciformes'},
   'key': 2382086,
   'kingdom': 'Animalia',
   'kingdomKey': 1,
   'nubKey': 2382086,
   'order': 'Perciformes',
   'orderKey': 587,
   'parent': 'Percina',
   'parentKey': 2382030,
   'phylum': 'Chordata',
   'phylumKey': 44,
   'rank': 'SPECIES',
   'scientificName': 'Percina macrocephala (Cope, 1867)',
   'species': 'Percina macrocephala',
   'speciesKey': 2382086,
   'status': 'ACCEPTED',
   'synonym': False},
  'Occurrence Summary': {'count': 315,
   'facets': [{'counts': [{'count': 152, 'name': 'ntsrv'},
      {'count': 24, 'name': 'cumv'}