One of the most important aspects of building a continuous stream of new information about species in the FWS work plan, or in other circumstances, is the ability to identify and access newly published literature. Many of the structured databases assembled and organized into this collection are based wholly or partly on literature references and information extracted from literature. We are currently working with the team developing the xDD Digital Library on tools and techniques for a) identifying literature potentially applicable to species-based research and b) using natural language processing to pull specific data from those sources. This is an ongoing effort that will result in improved production capabilities over time.

In the near term, we take advantage of some basic and enhanced search functionality to identify potential articles of interest in the xDD library, which holds millions of documents and grows daily. The xdd module in the bispy package contains search and packaging functionality that interfaces with the xDD REST API.
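
For orientation, the kind of request that sits behind a snippet search can be sketched as follows. This is only an illustration: the base URL and the `term` parameter are assumptions about the public xDD API, `build_snippets_url` is a hypothetical helper, and the bispy xdd module may build its requests differently.

```python
from urllib.parse import urlencode

# Assumed base URL for the xDD snippets route; check the current xDD API
# documentation before relying on it.
XDD_SNIPPETS_URL = "https://xdd.wisc.edu/api/snippets"


def build_snippets_url(term, base_url=XDD_SNIPPETS_URL, **params):
    """Build a snippet search URL for a single search term (hypothetical helper)."""
    query = {"term": term}
    query.update(params)  # any extra query parameters are passed through as-is
    return f"{base_url}?{urlencode(query)}"


print(build_snippets_url("Strix occidentalis caurina"))
# https://xdd.wisc.edu/api/snippets?term=Strix+occidentalis+caurina
```

The resulting URL could then be fetched with `requests.get(url).json()`. The cells below go through bispy rather than calling the API directly.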

In [1]:
import requests
import json
import bispy
from IPython.display import display
from joblib import Parallel, delayed
import random

xdd = bispy.xdd.Xdd()

In [2]:
# Open up the cached workplan species
with open("cache/workplan_species.json", "r") as f:
    workplan_species = json.load(f)

In [3]:
# Use joblib to run xDD snippet requests in parallel, one per work plan scientific name
scientific_names = [r["Scientific Name"] for r in workplan_species]
xdd_results = Parallel(n_jobs=8)(delayed(xdd.snippets)(name) for name in scientific_names)
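
One possible variation on this fan-out, sketched with the standard library's thread pool rather than joblib, wraps each call so that a single failed request does not abort the whole batch. `fake_snippets` is a hypothetical stand-in for `xdd.snippets` so the sketch runs offline, and the "Error" key is an assumption, not something bispy is known to produce.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_snippets(name):
    """Hypothetical stand-in for xdd.snippets(name); makes no network call."""
    return {"Scientific Name": name, "Processing Metadata": {"Number Documents": 0}}


def safe_call(func, name):
    """Call func(name), returning an error record instead of raising."""
    try:
        return func(name)
    except Exception as exc:
        # Assumed record shape: no "Number Documents" key, so this record would
        # be dropped by a filter that requires a positive document count.
        return {"Scientific Name": name, "Processing Metadata": {"Error": str(exc)}}


names = ["Strix occidentalis caurina", "Gopherus agassizii"]

# Threads suit this workload because each task mostly waits on I/O.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda n: safe_call(fake_snippets, n), names))

print(len(results))
```

`pool.map` returns results in input order, so each record lines up with the name that produced it.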


In [16]:
# Dump the records we discovered and packaged (at least one document each) to a cache file
# I need to revisit this once I get some things cleared up with taxonomic matching to hopefully find more records
found = [x for x in xdd_results if x["Processing Metadata"].get("Number Documents", 0) > 0]
with open("cache/xdd.json", "w") as f:
    json.dump(found, f, indent=4)
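
The filter applied before caching can be tried on its own with a few hand-made records. The sample records below are hypothetical and only mimic the two keys used in this notebook ("Processing Metadata" and "Number Documents"). Unlike the cell above, this sketch also tolerates a record with no "Processing Metadata" at all.

```python
def has_documents(result):
    """Return True when a packaged result reports at least one document."""
    metadata = result.get("Processing Metadata", {})
    return metadata.get("Number Documents", 0) > 0


# Hypothetical records shaped like the packaged results
sample_results = [
    {"Processing Metadata": {"Number Documents": 2}, "Data": [{}, {}]},
    {"Processing Metadata": {"Number Documents": 0}},
    {"Processing Metadata": {}},  # no count reported
    {},                           # no metadata at all
]

kept = [r for r in sample_results if has_documents(r)]
print(len(kept))  # 1
```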

In [17]:
# Open the file back up and verify
with open("cache/xdd.json", "r") as f:
    xdd_cache = json.load(f)

print(len(xdd_cache))
# Show one randomly chosen record as a spot check
display(random.choice(xdd_cache))

271


{'Data': [{'URL': 'http://www.sciencedirect.com/science/article/pii/S1470160X01000036',
   '_gddid': '579f8f51cf58f138945af82b',
   'authors': 'Dale, Virginia H.; Beyeler, Suzanne C.',
   'coverDate': 'August 2001',
   'doi': '10.1016/S1470-160X(01)00003-6',
   'highlight': ['many other species (e.g. the northern spotted owl (Strix occidentalis caurina)',
    'the northern spotted owl (Strix occidentalis caurina) that occupy',
    'Strix occidentalis caurina) that occupy old growth forest in the Paciﬁc'],
   'publisher': 'Elsevier',
   'pubname': 'Ecological Indicators',
   'title': 'Challenges in the development and use of ecological indicators'},
  {'URL': 'http://www.nrcresearchpress.com/doi/abs/10.1139/x98-039',
   '_gddid': '57985e4dcf58f139fabbd5a0',
   'authors': 'Acker, S A; Zenner, E K; Emmingham, W H',
   'coverDate': 'May 1998',
   'doi': '10.1139/x98-039',
   'highlight': ['forests within the range of the northern spotted owl (Strix occidentalis caurina;',
    'the northern