It is often useful to gain a basic understanding of what species observation records exist before using them in modeling and analysis. The Global Biodiversity Information Facility (GBIF) is one source of observation data, and its robust API lends itself to relatively rapid summarization when building out a species information catalog. This notebook exercises an experimental function that ties together a couple of aspects of the GBIF API to make a reasonable guess at a species identifier (from the GBIF taxonomic hub) and pull back a basic characterization of US-based occurrences. The three facets of the occurrence records that we can reasonably assemble quickly at this time are:

* basisOfRecord - The Darwin Core term describing the basis for a species occurrence record. It mostly helps distinguish museum specimens (which may or may not have accurate spatial information for where the specimen was collected in the field) from human observations and other methods of observing a species.
* year - Provides a basic time series by year for the number of occurrences.
* institutionCode - A somewhat obscure set of codes/terms for the institution providing the record. Further details exist behind each code, but the codes alone give a basic idea of where the records come from.
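
For readers who want to see what such a summary might look like at the API level, here is a minimal sketch. It is not the code inside `bispy`; it assumes the public GBIF v1 endpoints `species/match` and `occurrence/search` and their `facet`, `facetLimit`, `country`, and `taxonKey` parameters. The helper names (`species_match_url`, `occurrence_facet_url`, `flatten_facets`, `fetch_json`) are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public GBIF v1 API root.
GBIF_API = "https://api.gbif.org/v1"


def species_match_url(name):
    """Build a name-matching request against the GBIF taxonomy."""
    return f"{GBIF_API}/species/match?{urlencode({'name': name})}"


def occurrence_facet_url(taxon_key, country="US",
                         facets=("basisOfRecord", "year", "institutionCode"),
                         facet_limit=1000):
    """Build an occurrence search that returns facet counts only (limit=0)."""
    params = [("taxonKey", taxon_key), ("country", country),
              ("limit", 0), ("facetLimit", facet_limit)]
    # A repeated "facet" parameter requests one facet per field.
    params += [("facet", field) for field in facets]
    return f"{GBIF_API}/occurrence/search?{urlencode(params)}"


def flatten_facets(search_response):
    """Turn the 'facets' list of a search response into {field: {name: count}}."""
    return {
        facet["field"]: {c["name"]: c["count"] for c in facet["counts"]}
        for facet in search_response.get("facets", [])
    }


def fetch_json(url):
    """Fetch a URL and decode the JSON body (makes a network call)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

A typical flow would be to call `fetch_json(species_match_url(name))`, read the matched key from the response, and pass it to `occurrence_facet_url` before flattening the result with `flatten_facets`. Setting `limit=0` keeps the response small, since only the counts are of interest here.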

In [1]:
import requests
import json
import bispy
from IPython.display import display
from joblib import Parallel, delayed
import random

gbif = bispy.gbif.Gbif()
bis_utils = bispy.bis.Utils()

In [2]:
%%time
# Open up the cached workplan species
with open("cache/workplan_species.json", "r") as f:
    workplan_species = json.load(f)

CPU times: user 2.2 ms, sys: 1.59 ms, total: 3.79 ms
Wall time: 4.4 ms


In [3]:
%%time
# Use joblib to run multiple requests for records in parallel via scientific names
scientific_names = [r["Scientific Name"] for r in workplan_species]
gbif_results = Parallel(n_jobs=8)(
    delayed(gbif.summarize_us_species)(name) for name in scientific_names
)


CPU times: user 1.01 s, sys: 122 ms, total: 1.13 s
Wall time: 2min 36s
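
The timing above shows roughly one second of CPU time against more than two and a half minutes of wall time, which suggests the run is dominated by waiting on the network. As a hedged alternative sketch (not what this notebook runs), a standard-library thread pool can do the same fan-out while recording per-name failures instead of aborting the whole batch. The `summarize_all` helper and its `summarize` argument are hypothetical; in this notebook `summarize` would be something like `gbif.summarize_us_species`.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_all(names, summarize, max_workers=8):
    """Call summarize(name) for every name in parallel threads.

    Results come back in the same order as `names`. A failure for one
    name is captured in its record rather than raised.
    """
    def safe(name):
        try:
            return {"name": name, "result": summarize(name), "error": None}
        except Exception as exc:  # keep going if a single lookup fails
            return {"name": name, "result": None, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order.
        return list(pool.map(safe, names))
```

Threads suit this case because the work is I/O-bound; records with a non-empty `error` can be retried later without repeating the successful lookups.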


In [4]:
# Cache the array of retrieved documents and return/display a random sample for verification
display(bis_utils.doc_cache("cache/gbif.json", gbif_results))

{'Doc Cache File': 'cache/gbif.json',
 'Document Number 182': {'GBIF Species Record': {'canonicalName': 'Balduina atropurpurea',
   'class': 'Magnoliopsida',
   'classKey': 220,
   'family': 'Asteraceae',
   'familyKey': 3065,
   'genus': 'Balduina',
   'genusKey': 3104987,
   'higherClassificationMap': {'220': 'Magnoliopsida',
    '3065': 'Asteraceae',
    '3104987': 'Balduina',
    '414': 'Asterales',
    '6': 'Plantae',
    '7707728': 'Tracheophyta'},
   'key': 3104996,
   'kingdom': 'Plantae',
   'kingdomKey': 6,
   'nubKey': 3104996,
   'order': 'Asterales',
   'orderKey': 414,
   'parent': 'Balduina',
   'parentKey': 3104987,
   'phylum': 'Tracheophyta',
   'phylumKey': 7707728,
   'rank': 'SPECIES',
   'scientificName': 'Balduina atropurpurea R.M.Harper',
   'species': 'Balduina atropurpurea',
   'speciesKey': 3104996,
   'status': 'ACCEPTED',
   'synonym': False},
  'Occurrence Summary': {'count': 172,
   'facets': [{'counts': [{'count': 81, 'name': 'HUMAN_OBSERVATION'},
      