The operational concept behind the Research Reference Library (RRL) is a continuously running process that receives new citation "registrants" and regularly checks for the best available, most current information on all registrants over time. This will eventually move to our message queueing pipeline infrastructure, but for now it is executed in the lab as we work through building this capability. The results of these processes are assembled into a document store that largely retains the original data structures provided by our related sources, whether through their APIs or whatever other interface we've been able to work with.

The next step in the process (something we're still working up) is to process the cached documents we pull together in the RRL and distribute usable data structures to other places. This will involve picking out particular attributes or combinations of information, running various synthesis algorithms, and shipping out a distribution copy of the data to an appropriate locale in our own systems or through our API for someone else to use. This is similar to the architectural approach we are taking for our other registry work: the Taxa Information Registry and the Spatial Feature Registry.
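
As a rough illustration of the "picking out particular attributes" step, here is a minimal sketch that flattens one cached document into a small distribution structure. The field names (`Citation String`, `Link Metadata`, `Link Response`, `DOI`, `author`) come from the sample record displayed in this notebook; the function name `summarize_record` and the output keys are hypothetical and only meant to start the conversation about what a distribution copy might hold.

```python
def summarize_record(record):
    """Flatten a cached RRL document into a small distribution structure."""
    # "Link Response" holds the metadata returned by link content negotiation
    link_response = record.get("Link Metadata", {}).get("Link Response", {})
    # Join given and family names, skipping whichever part is missing
    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in link_response.get("author", [])
    ]
    return {
        "citation": record.get("Citation String"),  # hypothetical output key
        "doi": link_response.get("DOI"),            # hypothetical output key
        "authors": authors,                         # hypothetical output key
    }


# Trimmed sample modeled on the sage grouse record shown in this notebook
sample = {
    "Citation String": "Apa, A.D., Thompson, T.R., and Reese, K.P., 2017, "
                       "Juvenile greater sage-grouse survival, movements, and "
                       "recruitment in Colorado: Journal of Wildlife Management, "
                       "v. 81, no. 4, p. 652–668.",
    "Link Metadata": {
        "Success": True,
        "Link Response": {
            "DOI": "10.1002/jwmg.21230",
            "author": [
                {"family": "Apa", "given": "Anthony D."},
                {"family": "Thompson", "given": "Thomas R."},
            ],
        },
    },
}

summary = summarize_record(sample)
print(summary)
```

Using `.get()` throughout keeps the sketch tolerant of records where a given source has not yet returned a result.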

This notebook grabs up a couple of existing records for some of the use cases we are working against and displays them with notes for further conversation. For now, you'll see that we call up the MongoDB database directly using the bis package, which relies on locally set environment variables to establish the connection. As we gather information on what the API needs to offer on this aspect of the BIS, we'll build in public REST API functionality there that will use an open authentication mechanism.

In [1]:
from IPython.display import display
from bis import dd

In [2]:
bis = dd.getDB("bis")
collection_rrl = bis["RRL"]
annotation_rrl = bis["RRL Annotations"]

# Sage Grouse
One of the use cases we are working on with Ecosystems folks is an annotated bibliography of sage grouse science. That bibliography consists of two parts - the citations themselves, which are processed just like any other citation, and a set of annotations created by reviewers. The latter is an information pool we are looking at for a separate component of the Biogeographic Information System, a generalized annotation store that will contain both human and software-generated annotations on anything else in the BIS (Research References, Taxa, Spatial Features, etc.).

The following code block runs a basic faceting aggregation showing the number of records where we found data and where we did not find data in the sources worked so far (link content negotiation, CrossRef and Scopus; this cell facets on link content negotiation and Scopus). In each output line, the first number is the number of citations where we found data for the category and the second is the number where we have not yet returned a result.

In [3]:
pipeline = [
    {"$unwind":{"path":"$Sources"}},
    {"$match":{"Sources.source":"https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=0"}},
    {"$facet":{
        "Link Content Negotiation":[{"$sortByCount":"$Link Metadata.Success"}],
        "Scopus":[{"$sortByCount":"$Scopus.Success"}]
        }  
    }
]

for record in collection_rrl.aggregate(pipeline):
    for name in ("Link Content Negotiation", "Scopus"):
        # Each facet is a list of {"_id": True/False, "count": n} buckets;
        # a bucket that is absent from the result is reported as 0
        counts = {bucket["_id"]: bucket["count"] for bucket in record[name]}
        print(name, "-", counts.get(True, 0), "/", counts.get(False, 0))


Link Content Negotiation - 160 / 7
Scopus - 149 / 17
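
To make the shape of the faceted result more concrete, here is a small in-memory sketch of what a `$sortByCount` facet produces. It does not touch MongoDB; the helper `sort_by_count` and the three sample documents are hypothetical stand-ins, and the emulation only covers the simple dotted-path case used in this notebook.

```python
from collections import Counter


def sort_by_count(records, dotted_path):
    """Emulate a $sortByCount facet over in-memory documents."""
    def lookup(doc):
        # Walk a dotted path such as "Scopus.Success"; missing fields give None
        for key in dotted_path.split("."):
            if not isinstance(doc, dict):
                return None
            doc = doc.get(key)
        return doc

    counts = Counter(lookup(record) for record in records)
    # most_common() orders buckets by descending count, like $sortByCount
    return [{"_id": value, "count": n} for value, n in counts.most_common()]


# Hypothetical documents: the third has no Scopus result recorded at all
sample_records = [
    {"Link Metadata": {"Success": True}, "Scopus": {"Success": True}},
    {"Link Metadata": {"Success": True}, "Scopus": {"Success": False}},
    {"Link Metadata": {"Success": False}},
]

link_facet = sort_by_count(sample_records, "Link Metadata.Success")
scopus_facet = sort_by_count(sample_records, "Scopus.Success")
print(link_facet)
print(scopus_facet)
```

In this sketch, documents that lack the field land in a separate `None` bucket rather than in the `False` bucket, which would be one reason the two numbers on each line need not add up to the same total across sources.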


## Sage Grouse Citation Record
The following code block outputs a single sage grouse bibliographic record in its entirety, including all assembled information and the associated annotations. As stated previously, the idea for this, operationally, would be to put together a bespoke index of information optimized for the Sage Grouse Bibliography with an accompanying set of API routes to help drive exploration, analysis and visualization functionality with that particular collection. We take advantage of the backend infrastructure and systematic processing through the RRL and Annotation models, but maintain the same customized functionality of interest to this particular community. Over time, we will likely see the same usage patterns again and again for similar use cases and can build a code library that takes advantage of economies of scale, but this approach provides the flexibility for any given community to meet their particular objectives while contributing to and taking advantage of a larger infrastructure.

This is simply the raw data dump in JSON format, so it's a little messy to wade through but still hopefully understandable enough. You can see that we've been able to compound the available information about this particular citation in some potentially useful ways, pulling up information about funding agency, references from the paper, and other useful details. Now, we need to work through this, determine what else is useful to the specific use case, assemble a new data model, and provide an API for public consumption.

In [10]:
sgRecord = collection_rrl.find_one({"$and":[{"Sources":{"$elemMatch":{"source":"https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=0"}}},{"Link Metadata.Success":True},{"Scopus.Success":True}]})
annotationTarget = "rrl:"+sgRecord["_id"]

display (sgRecord)

for annotation in annotation_rrl.find({"target":annotationTarget}):
    display (annotation)

{'Citation String': 'Apa, A.D., Thompson, T.R., and Reese, K.P., 2017, Juvenile greater sage-grouse survival, movements, and recruitment in Colorado: Journal of Wildlife Management, v. 81, no. 4, p. 652–668.',
 'Link Metadata': {'Date Checked': '2018-05-23T21:15:40.020848',
  'Link Checked': 'https://doi.org/10.1002/jwmg.21230',
  'Link Response': {'DOI': '10.1002/jwmg.21230',
   'ISSN': ['0022-541X'],
   'URL': 'http://dx.doi.org/10.1002/jwmg.21230',
   'author': [{'affiliation': [{'name': 'Colorado Division of Parks and Wildlife; 711 Independent Avenue Grand Junction CO 81505'}],
     'family': 'Apa',
     'given': 'Anthony D.',
     'sequence': 'first'},
    {'affiliation': [{'name': 'Department of Fish and Wildlife Sciences; University of Idaho; P. O. Box 441136 Moscow ID 83844 USA'}],
     'family': 'Thompson',
     'given': 'Thomas R.',
     'sequence': 'additional'},
    {'affiliation': [{'name': 'Department of Fish and Wildlife Sciences; University of Idaho; P. O. Box 441136 Mo

{'_id': ObjectId('5af487190601ba168f80ea57'),
 'body': 'Juvenile survival of GRSG is an important component of GRSG demographics, but available information is limited. GRSG populations have become isolated in some locations and may be augmented through reintroduction of captive-reared chicks. Information on juvenile survival of captive-reared chicks can inform such efforts.',
 'datetime': '2018-05-10T17:53:29.004658',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=1',
 'target': 'rrl:58219f9b42767011a13d7fd742cd5800',
 'type': 'Background'}

{'_id': ObjectId('5af487190601ba168f80ea58'),
 'body': 'The authors sought (1) to estimate and evaluate factors affecting the survival of juvenile GRSG, (2) to compare adult and juvenile survival, (3) to determine recruitment rates, (4) to identify movement from fall to winter ranges, and (5) to compare survival of wild and domestically hatched chicks.',
 'datetime': '2018-05-10T17:53:29.004658',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=1',
 'target': 'rrl:58219f9b42767011a13d7fd742cd5800',
 'type': 'Objectives'}

{'_id': ObjectId('5af487190601ba168f80ea59'),
 'body': 'The authors used radio telemetry to monitor movement and survival of 60–65 adult and 183 juvenile GRSG from September until March at two study sites from 2005 to 2008. Both wild chicks and domestically hatched chicks that had been introduced into wild broods were monitored. Survival was evaluated for influence of study area, year, body mass at two ages, day of hatch, and whether the chicks were wild or domestically hatched. ',
 'datetime': '2018-05-10T17:53:29.004658',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=1',
 'target': 'rrl:58219f9b42767011a13d7fd742cd5800',
 'type': 'Methods'}

{'_id': ObjectId('5af487190601ba168f80ea5a'),
 'body': 'Colorado; MZ II',
 'datetime': '2018-05-10T17:53:29.004658',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=1',
 'target': 'rrl:58219f9b42767011a13d7fd742cd5800',
 'type': 'Location'}

{'_id': ObjectId('5af487190601ba168f80ea5b'),
 'body': 'In juvenile GRSG, female survival rates were greater than male survival rates, with most mortality occurring in fall. Survival rates of wild and domestically hatched chicks were similar. Survival of juveniles was lower than adults during fall and spring but was comparable over winter months. Most surviving juveniles recruited into their natal population and did not migrate to other populations. Survival and recruitment rates and movement distances varied by study site.',
 'datetime': '2018-05-10T17:53:29.004658',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=1',
 'target': 'rrl:58219f9b42767011a13d7fd742cd5800',
 'type': 'Findings'}

{'_id': ObjectId('5af487190601ba168f80ea5c'),
 'body': 'Juvenile GRSG are vulnerable in the fall, when broods disassociate. Lower survival of juveniles compared to adults in fall and spring may be related to greater vulnerability of juveniles to predation during these time periods. Domestic hatching followed by introduction of young chicks into existing wild broods was deemed successful because survival rates of these birds were comparable to wild-hatched birds. However, observed recruitment rates may not be sufficient to replace adult mortality in these two populations.',
 'datetime': '2018-05-10T17:53:29.004658',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=1',
 'target': 'rrl:58219f9b42767011a13d7fd742cd5800',
 'type': 'Implications'}

{'_id': ObjectId('5af487190601ba168f80ea5d'),
 'body': ['Behavior or demographics', 'Captive breeding', 'Survival'],
 'datetime': '2018-05-10T17:53:29.004658',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=1',
 'target': 'rrl:58219f9b42767011a13d7fd742cd5800',
 'type': 'Topics'}

# Annotations
We are really excited to start working more on the annotations component of the BIS, and the Sage Grouse bibliography offers an interesting use case for experimentation. We also have ongoing work in our own portfolio on this, including the following use cases:

* Specific habitat relationship attributes for modeled vertebrate species and the citations from which the knowledge was generated
* Specific vegetation classification parameters in the US National Vegetation Classification and the reference papers tied to those
* Dam removal science characteristics

For this exercise, I started a new "RRL Annotations" data store just to capture the basic structured annotations that the Sage Grouse bibliography contains in a more generalized data model. We are still working through various ways of structuring and applying the W3C Annotation specification in a working data model, so this will be shifting.

The following code block displays a few of the scraped Sage Grouse annotations as a conversation starter. The notion here is that annotations need a place to flow freely as they are generated in any particular context. Contexts may include some form of web application provided for authorized users to build annotations on some particular set of papers. Or they might come from some algorithm operating in a text mining context. So far, we're dealing with the target of the annotation being a paper or report of some kind, something in the RRL. But annotations can also be made against other identified targets in the BIS.

The simple information model we're experimenting with here uses a convention to identify the target as the identifier of the RRL entity. The ID string there is the MD5 hash of the citation string used in the RRL registry process. We identify the source of the annotation here as the web app the annotations were scraped from, but that should actually be a pointer to the scraping algorithm or some way of representing the process that grabbed up the annotations. Ultimately, the sage grouse annotations should be sourced to the person who wrote them up, a piece of information I don't have access to yet. We include a type label string for now based on the section of the annotation from the web app, but these ultimately need to be more sophisticated pointers to a type classification vocabulary that tells us more about the type of annotation, including what form we expect its data to be in. The annotation itself in this case is just the text string scraped from the web app, and the data structure shows the raw output. The first query just shows the annotation type count currently in the data.

Lots more interesting work to do here like parsing the location information to tie to the Spatial Feature Registry and working through the similarities and differences in documenting scientific findings with the GCIS model we are working toward.
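
The target convention described above can be sketched as follows. This is only an illustration under assumptions: the helper names `rrl_target` and `build_annotation` are hypothetical, and the exact normalization applied to a citation string before hashing in the RRL registry process is not shown in this notebook, so the digest produced here may not match a stored identifier.

```python
import hashlib
from datetime import datetime, timezone


def rrl_target(citation_string):
    """Build an "rrl:<md5>" target identifier from a citation string."""
    # Assumption: the raw citation string is hashed as UTF-8 without cleanup
    digest = hashlib.md5(citation_string.encode("utf-8")).hexdigest()
    return "rrl:" + digest


def build_annotation(citation_string, annotation_type, body, source):
    """Assemble an annotation document with the fields seen in the output."""
    return {
        "target": rrl_target(citation_string),
        "type": annotation_type,   # plain label for now, not a vocabulary pointer
        "body": body,
        "source": source,
        "datetime": datetime.now(timezone.utc).isoformat(),
    }


annotation = build_annotation(
    "Apa, A.D., Thompson, T.R., and Reese, K.P., 2017, Juvenile greater "
    "sage-grouse survival, movements, and recruitment in Colorado: Journal of "
    "Wildlife Management, v. 81, no. 4, p. 652–668.",
    "Location",
    "Colorado; MZ II",
    "https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=1",
)
print(annotation["target"])
```

Because the target is derived only from the citation string, any process that generates annotations can compute it without first looking up the record, as long as everyone agrees on how the string is normalized.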

In [5]:
for record in annotation_rrl.aggregate([{"$facet":{"Annotation Type":[{"$sortByCount":"$type"}]}}]):
    display (record)


for record in annotation_rrl.find({}).limit(20):
    display (record)

{'Annotation Type': [{'_id': 'Topics', 'count': 167},
  {'_id': 'Implications', 'count': 167},
  {'_id': 'Findings', 'count': 167},
  {'_id': 'Background', 'count': 167},
  {'_id': 'Objectives', 'count': 167},
  {'_id': 'Methods', 'count': 167},
  {'_id': 'Location', 'count': 167}]}

{'_id': ObjectId('5af487190601ba168f80ea57'),
 'body': 'Juvenile survival of GRSG is an important component of GRSG demographics, but available information is limited. GRSG populations have become isolated in some locations and may be augmented through reintroduction of captive-reared chicks. Information on juvenile survival of captive-reared chicks can inform such efforts.',
 'datetime': '2018-05-10T17:53:29.004658',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=1',
 'target': 'rrl:58219f9b42767011a13d7fd742cd5800',
 'type': 'Background'}

{'_id': ObjectId('5af487190601ba168f80ea58'),
 'body': 'The authors sought (1) to estimate and evaluate factors affecting the survival of juvenile GRSG, (2) to compare adult and juvenile survival, (3) to determine recruitment rates, (4) to identify movement from fall to winter ranges, and (5) to compare survival of wild and domestically hatched chicks.',
 'datetime': '2018-05-10T17:53:29.004658',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=1',
 'target': 'rrl:58219f9b42767011a13d7fd742cd5800',
 'type': 'Objectives'}

{'_id': ObjectId('5af487190601ba168f80ea59'),
 'body': 'The authors used radio telemetry to monitor movement and survival of 60–65 adult and 183 juvenile GRSG from September until March at two study sites from 2005 to 2008. Both wild chicks and domestically hatched chicks that had been introduced into wild broods were monitored. Survival was evaluated for influence of study area, year, body mass at two ages, day of hatch, and whether the chicks were wild or domestically hatched. ',
 'datetime': '2018-05-10T17:53:29.004658',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=1',
 'target': 'rrl:58219f9b42767011a13d7fd742cd5800',
 'type': 'Methods'}

{'_id': ObjectId('5af487190601ba168f80ea5a'),
 'body': 'Colorado; MZ II',
 'datetime': '2018-05-10T17:53:29.004658',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=1',
 'target': 'rrl:58219f9b42767011a13d7fd742cd5800',
 'type': 'Location'}

{'_id': ObjectId('5af487190601ba168f80ea5b'),
 'body': 'In juvenile GRSG, female survival rates were greater than male survival rates, with most mortality occurring in fall. Survival rates of wild and domestically hatched chicks were similar. Survival of juveniles was lower than adults during fall and spring but was comparable over winter months. Most surviving juveniles recruited into their natal population and did not migrate to other populations. Survival and recruitment rates and movement distances varied by study site.',
 'datetime': '2018-05-10T17:53:29.004658',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=1',
 'target': 'rrl:58219f9b42767011a13d7fd742cd5800',
 'type': 'Findings'}

{'_id': ObjectId('5af487190601ba168f80ea5c'),
 'body': 'Juvenile GRSG are vulnerable in the fall, when broods disassociate. Lower survival of juveniles compared to adults in fall and spring may be related to greater vulnerability of juveniles to predation during these time periods. Domestic hatching followed by introduction of young chicks into existing wild broods was deemed successful because survival rates of these birds were comparable to wild-hatched birds. However, observed recruitment rates may not be sufficient to replace adult mortality in these two populations.',
 'datetime': '2018-05-10T17:53:29.004658',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=1',
 'target': 'rrl:58219f9b42767011a13d7fd742cd5800',
 'type': 'Implications'}

{'_id': ObjectId('5af487190601ba168f80ea5d'),
 'body': ['Behavior or demographics', 'Captive breeding', 'Survival'],
 'datetime': '2018-05-10T17:53:29.004658',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=1',
 'target': 'rrl:58219f9b42767011a13d7fd742cd5800',
 'type': 'Topics'}

{'_id': ObjectId('5af4871a0601ba168f80ea5e'),
 'body': 'Incorporating climate-change projections into species and land management conservation is challenging. Landscape-level vulnerability assessments may help managers consider potential effects of climate on habitat conditions that affect GRSG. Spatial data and maps quantify assessments and offer visualizations important to interpreting them and guiding management.',
 'datetime': '2018-05-10T17:53:29.911165',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=2',
 'target': 'rrl:da534ed6dbf0b10c8fab673c9d576aee',
 'type': 'Background'}

{'_id': ObjectId('5af4871a0601ba168f80ea5f'),
 'body': 'Project objectives were (1) to assess vulnerability of GRSG habitat to climate change and (2) to map vulnerability of GRSG habitat to climate change at a relevant scale to inform planning and habitat management for GRSG.',
 'datetime': '2018-05-10T17:53:29.911165',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=2',
 'target': 'rrl:da534ed6dbf0b10c8fab673c9d576aee',
 'type': 'Objectives'}

{'_id': ObjectId('5af4871a0601ba168f80ea60'),
 'body': 'The authors developed climate envelope models for sagebrush, pinyon pine, juniper, and cheatgrass to inform climate-change vulnerability assessments conducted at a subregional scale and at a local scale (defined by boundaries of Priority Areas for Conservation ). Vulnerability assessments also considered drought (both scales) and fire, conifer encroachment, risk of invasive annual grasses, and human modification (local scale only). Two commonly used indices of climate-change vulnerability were calculated. Comparisons were made between mean contemporary (1961–1990) and projected future (2041–2070) conditions. An ensemble of 23 future climate models and a moderate emissions scenario were used to model potential future conditions.',
 'datetime': '2018-05-10T17:53:29.911165',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=2',
 'target': 'rrl:da534ed6dbf0b10c8fab673c9d576aee',
 'type': '

{'_id': ObjectId('5af4871a0601ba168f80ea61'),
 'body': 'Utah, Nevada; MZ III',
 'datetime': '2018-05-10T17:53:29.911165',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=2',
 'target': 'rrl:da534ed6dbf0b10c8fab673c9d576aee',
 'type': 'Location'}

{'_id': ObjectId('5af4871a0601ba168f80ea62'),
 'body': 'Climate envelope models indicated a loss of suitable climate for Wyoming big sagebrush across much of the subregion. Multiyear droughts have occurred in all PACs in the subregion over the last century. Vulnerability to climate change was high according to both indices at both spatial scales. Local-level evaluations indicated higher risk in the drier Sheeprock PAC compared to the Strawberry PAC because of potential loss of sagebrush from climate change and increasing potential for conifer expansion and cheatgrass.',
 'datetime': '2018-05-10T17:53:29.911165',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=2',
 'target': 'rrl:da534ed6dbf0b10c8fab673c9d576aee',
 'type': 'Findings'}

{'_id': ObjectId('5af4871a0601ba168f80ea63'),
 'body': 'Periodic drought, increasing conifer cover, and invasion by cheatgrass were projected to affect sagebrush ecosystems across the region and within local evaluation areas. These changes resulted in projected vulnerability of GRSG. At the local level, landscape connections between the Strawberry PAC and other GRSG populations contributed to greater projected resilience of GRSG populations there. The authors presented a model framework that incorporates multiple stressors to assess potential future habitat vulnerability, which can assist decision makers in determining species vulnerability and inform management plans. ',
 'datetime': '2018-05-10T17:53:29.911165',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=2',
 'target': 'rrl:da534ed6dbf0b10c8fab673c9d576aee',
 'type': 'Implications'}

{'_id': ObjectId('5af4871a0601ba168f80ea64'),
 'body': ['Conifer expansion',
  'New geospatial data',
  'Energy development',
  'Fire or fuel breaks',
  'Nonnative invasive plants',
  'Weather and climate'],
 'datetime': '2018-05-10T17:53:29.911165',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=2',
 'target': 'rrl:da534ed6dbf0b10c8fab673c9d576aee',
 'type': 'Topics'}

{'_id': ObjectId('5af4871b0601ba168f80ea65'),
 'body': 'Pinyon pine and juniper woodlands have expanded substantially in the last 150 years, with most of that expansion occurring in sagebrush steppe ecosystems. Management treatments to control woodland expansion have been implemented since the 1950s with goals of improving watershed function, increasing livestock forage, and restoring wildlife habitat. GRSG are sensitive to the presence of conifers and may avoid areas and abandon leks as tree cover increases. Increased conifer cover has been associated with decreases in sagebrush, grasses, and forbs, which provide food and cover for GRSG.',
 'datetime': '2018-05-10T17:53:30.500782',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=3',
 'target': 'rrl:4648b1d47f0b7da63b6839479e152504',
 'type': 'Background'}

{'_id': ObjectId('5af4871b0601ba168f80ea66'),
 'body': 'Authors analyzed datasets from previous and ongoing experimental conifer treatment studies to compare how forb cover was affected by (1) fire treatments (fuel reduction, prescribed fire) and mechanical treatments (clear cutting, mastication) compared to control sites and (2) treatment in different phases of conifer expansion.',
 'datetime': '2018-05-10T17:53:30.500782',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=3',
 'target': 'rrl:4648b1d47f0b7da63b6839479e152504',
 'type': 'Objectives'}

{'_id': ObjectId('5af4871b0601ba168f80ea67'),
 'body': 'The authors analyzed forb cover (including annual and perennial forbs known to be consumed by GRSG) for 4–8 years following treatment at 18 sites across the northern Great Basin. Sites included multiple woodland types (western juniper, singleleaf pinyon and Utah juniper, Utah juniper, and Utah juniper and Colorado pinyon) and all woodland expansion phases. Fuel reduction treatments consisted of burning cut trees and slash in the winter or spring with minimal other disturbance to the site. ',
 'datetime': '2018-05-10T17:53:30.500782',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=3',
 'target': 'rrl:4648b1d47f0b7da63b6839479e152504',
 'type': 'Methods'}

{'_id': ObjectId('5af4871b0601ba168f80ea68'),
 'body': 'California, southwestern Idaho, Nevada, eastern Oregon, Utah; MZ IV, MZ V',
 'datetime': '2018-05-10T17:53:30.500782',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=3',
 'target': 'rrl:4648b1d47f0b7da63b6839479e152504',
 'type': 'Location'}

{'_id': ObjectId('5af4871b0601ba168f80ea69'),
 'body': 'Treatment of pinyon and juniper woodlands produced variable responses in perennial and annual forbs that are forage for GRSG. Cover of perennial forbs consumed by GRSG was greater in most, but not all, sites after treatment, and responses were similar among treatment methods. Annual forbs consumed by GRSG benefited most from prescribed fire, with cover being higher in most sites where prescribed fire treatments were applied compared to both control sites and sites with other treatments. However, though annual forb cover was dominated by native species in the western juniper sites, it was dominated by nonnative species in the Wyoming big sagebrush sites. Treatments applied to woodlands in early phases of expansion tended to produce a greater forb response. Differences among sites, including vegetation composition and site potential, influenced conditions after treatment.',
 'datetime': '2018-05-10T17:53:30.500782',
 'source': 'http

{'_id': ObjectId('5af4871b0601ba168f80ea6a'),
 'body': 'Patchy fires may provide habitat benefits for GRSG, but application requires caution so that the fires do not become too large and are not applied in inappropriate habitats. Mechanical and fuel-reduction treatments applied in the early phases of woodland expansion may achieve benefits for forbs while also retaining the sagebrush structure of the system, but they will require follow-up treatments. All treatments should target those seasonal habitats in which forb availability may be limiting for GRSG and site potential is high. ',
 'datetime': '2018-05-10T17:53:30.500782',
 'source': 'https://apps.usgs.gov/gsgbib/5654e232e4b071e7ea53d6e1.php?page=1&id=3',
 'target': 'rrl:4648b1d47f0b7da63b6839479e152504',
 'type': 'Implications'}
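
As a starting point for the location parsing mentioned earlier in this section, here is a minimal sketch that splits a "Location" annotation body into place names and zone codes. It assumes, based only on the examples in the output above, that semicolons separate groups, commas separate items, and items beginning with "MZ" are zone codes (presumably sage grouse management zones). The function name `parse_location` and the output keys are hypothetical.

```python
def parse_location(body):
    """Split a Location annotation body into place names and zone codes."""
    places, zones = [], []
    for segment in body.split(";"):
        for item in segment.split(","):
            item = item.strip()
            if not item:
                continue
            # Assumption: items that start with "MZ " are zone codes
            if item.upper().startswith("MZ "):
                zones.append(item)
            else:
                places.append(item)
    return {"places": places, "management_zones": zones}


simple = parse_location("Colorado; MZ II")
multi = parse_location(
    "California, southwestern Idaho, Nevada, eastern Oregon, Utah; MZ IV, MZ V"
)
print(simple)
print(multi)
```

Qualified names such as "southwestern Idaho" are kept verbatim here; resolving them to features in the Spatial Feature Registry would be a separate step.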

# Fire Science
Another use case we are working on with Ecosystems is the Fire Science Bibliography, another existing resource that we are collaborating on to enhance through the RRL processing pipeline. Mark Miller has already done some similar work, and we're looking forward to comparing code.

The following code block runs the same faceting routine seen above for Sage Grouse but filters to the Fire Science sourced citations.

In [6]:
pipeline = [
    {"$unwind":{"path":"$Sources"}},
    {"$match":{"Sources.source":"https://www2.usgs.gov/ecosystems/environments/USGS%20Wildland%20Fire%20Science%20Publications.2007to2016.pdf"}},
    {"$facet":{
        "Link Content Negotiation":[{"$sortByCount":"$Link Metadata.Success"}],
        "Scopus":[{"$sortByCount":"$Scopus.Success"}]
        }  
    }
]

for record in collection_rrl.aggregate(pipeline):
    for name in ("Link Content Negotiation", "Scopus"):
        # Each facet is a list of {"_id": True/False, "count": n} buckets;
        # a bucket that is absent from the result is reported as 0
        counts = {bucket["_id"]: bucket["count"] for bucket in record[name]}
        print(name, "-", counts.get(True, 0), "/", counts.get(False, 0))


Link Content Negotiation - 696 / 311
Scopus - 655 / 133


In [8]:
display (collection_rrl.find_one({"$and":[{"Sources":{"$elemMatch":{"source":"https://www2.usgs.gov/ecosystems/environments/USGS%20Wildland%20Fire%20Science%20Publications.2007to2016.pdf"}}},{"Link Metadata.Success":True},{"Scopus.Success":True},{"CrossRef.Success":True}]}))

{'Citation String': 'Albano, Christine M, Michael Dettinger, and Christopher E Soulard. 2016. “Data on Influence of Atmospheric Rivers on Vegetation Productivity and Fire Patterns in the Southwestern US.” https://doi.org/10.5066/F71Z42KJ.',
 'CrossRef': {'Date Checked': '2018-05-24T13:27:00.497575',
  'Query URL': 'https://api.crossref.org/works?mailto=bcb@usgs.gov&query.bibliographic=Albano, Christine M, Michael Dettinger, and Christopher E Soulard. 2016. “Data on Influence of Atmospheric Rivers on Vegetation Productivity and Fire Patterns in the Southwestern US.” https://doi.org/10.5066/F71Z42KJ.',
  'Record': {'DOI': '10.1002/2016jg003608',
   'ISSN': ['2169-8953'],
   'URL': 'http://dx.doi.org/10.1002/2016jg003608',
   'archive': ['Portico'],
   'author': [{'ORCID': 'http://orcid.org/0000-0003-1610-6961',
     'affiliation': [{'name': 'Desert Research Institute; University of Nevada, Reno; Reno Nevada USA'}],
     'authenticated-orcid': False,
     'family': 'Albano',
     'given':

# Next Steps
There is still plenty to do here.

* Finish out and improve on the search routines to find records for citations where we don't have an easily discernible DOI
* Work on the additional discovery capacity that Google Scholar should provide plus understanding their citation index information
* Build out the Web of Science functions once we nail down the APIs we can legitimately use
* Review assembled information for both of these use cases, develop custom data models with generalizable aspects
* Develop the APIs to help serve future generations of web apps and other functionality for these use cases
* Work out better methods for dealing with assembling additional information on USGS Series Pubs
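
For the first item in the list above, one possible starting point is sketched here: try to pull a DOI straight out of the citation string, and fall back to building a CrossRef bibliographic query when none is found. The regular expression is a common heuristic rather than a complete DOI grammar, the function names are hypothetical, and the `mailto` address is a placeholder. The query URL follows the same `mailto` and `query.bibliographic` pattern visible in the CrossRef output above; no request is actually sent.

```python
import re
from urllib.parse import urlencode

# Heuristic: "10.", a 4-9 digit registrant code, a slash, then a suffix
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def find_doi(citation_string):
    """Return the first DOI-looking substring, or None if there is none."""
    match = DOI_PATTERN.search(citation_string)
    if match is None:
        return None
    # Trailing sentence punctuation is usually not part of the DOI
    return match.group(0).rstrip(".,;)")


def crossref_query_url(citation_string, mailto):
    """Build (but do not send) a CrossRef bibliographic search URL."""
    params = {"mailto": mailto, "query.bibliographic": citation_string}
    return "https://api.crossref.org/works?" + urlencode(params)


with_doi = (
    "Albano, Christine M, Michael Dettinger, and Christopher E Soulard. 2016. "
    "“Data on Influence of Atmospheric Rivers on Vegetation Productivity and "
    "Fire Patterns in the Southwestern US.” https://doi.org/10.5066/F71Z42KJ."
)
without_doi = (
    "Apa, A.D., Thompson, T.R., and Reese, K.P., 2017, Juvenile greater "
    "sage-grouse survival, movements, and recruitment in Colorado: Journal of "
    "Wildlife Management, v. 81, no. 4, p. 652–668."
)

found = find_doi(with_doi)
missing = find_doi(without_doi)
fallback_url = crossref_query_url(without_doi, "you@example.org")
print(found, missing)
print(fallback_url)
```

Any record returned by a bibliographic search would still need to be checked against the original citation before being accepted as a match.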

Slightly further out, future work should tie these use cases into the text and data mining work we are doing with the GeoDeepDive (GDD) project. An initial function there will be to just determine which of the references have already been processed through the GDD engine, setting up the full text for mining. For instance, most USGS Series Pubs are pulled into the GDD library, and new pubs are harvested routinely. This will give us a starting point to see what all we can start working against.