This notebook explores using DOI content negotiation, which works across DOI registration agents, to retrieve basic metadata for research references where we have links. We'll also try the same approach with the other types of links recorded when registering research references to see what else might be possible.

In [1]:
import requests
from IPython.display import display
from bis2 import dd
from bis import rrl

In [2]:
bis = dd.getDB("bis")
collection_rrl = bis["RRL"]

# Notes
I ended up putting all of the very simple logic for this into a function in the rrl module of the bis package. Most, if not all, of the information coming back from content negotiation on DOI links is potentially useful; it provides everything needed to pull out the citation parts and dig for further information. DOIs issued by CrossRef, in particular, return a great deal of useful detail, in some instances including all of the references cited by a publication. In those cases, what we get from DOI content negotiation may be sufficient, with no further CrossRef search needed.
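
For reference, content negotiation on a DOI link amounts to requesting the link with an Accept header that asks for structured metadata instead of the landing page. The sketch below is not the actual function from the rrl module, which is not shown in this notebook. The function name `negotiate_link_metadata`, the `opener` parameter, and the choice of the CSL JSON media type are assumptions made for illustration; the `Success` and `Link Response` keys mirror the ones queried in the example cell of this notebook. It uses only the standard library so it runs without extra dependencies.

```python
import json
import urllib.request

# Media type for CSL JSON, one of the formats DOI resolvers can return (assumed choice).
CSL_JSON = "application/vnd.citationstyles.csl+json"


def negotiate_link_metadata(url, opener=urllib.request.urlopen, timeout=30):
    """Hypothetical helper: ask a link for structured metadata via the Accept header."""
    result = {"Success": False, "Link Response": None}
    try:
        request = urllib.request.Request(url, headers={"Accept": CSL_JSON})
        with opener(request, timeout=timeout) as response:
            body = response.read().decode("utf-8")
    except (OSError, ValueError) as error:
        # OSError covers HTTP errors, connection failures, and timeouts;
        # ValueError covers malformed URLs and undecodable response bodies.
        result["Error"] = str(error)
        return result

    try:
        result["Link Response"] = json.loads(body)
        result["Success"] = True
    except ValueError:
        # The server answered, but not with JSON (e.g., an HTML landing page).
        result["Error"] = "Response was not JSON"
    return result
```

The `opener` parameter exists only so the request can be swapped out for a stand-in when experimenting offline; calling `negotiate_link_metadata(url)` with a DOI link uses the real network.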

I did find some odd cases where a DOI link correctly resolves to a landing page URL (particularly on FigShare), but the structured information returned through content negotiation on the original link has no metadata to speak of (e.g., not even a title). When I resolved those links and reached the landing page, I found a different DOI for the publication. We'll have to explore this further to see if there's a way to retrieve the "correct" DOI reference for these cases.
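
One way to flag these cases for follow-up is to check whether the returned metadata carries a non-empty title. This is a minimal sketch, assuming the response is a dictionary of metadata fields; the function name `has_basic_metadata` is hypothetical, and treating the title as the only required field is an assumption, not something the rrl module is known to do.

```python
def has_basic_metadata(link_response, required_fields=("title",)):
    """Hypothetical check: True when every required field is present and non-empty."""
    if not isinstance(link_response, dict):
        return False
    for field in required_fields:
        value = link_response.get(field)
        # Some services return a field such as the title as a list of strings.
        if isinstance(value, (list, tuple)):
            value = " ".join(str(part) for part in value)
        if value is None or not str(value).strip():
            return False
    return True
```

Records that resolve successfully but fail this check would be the candidates for finding the "correct" DOI from the landing page.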

In [5]:
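count = 0
# Records that have a url but have not yet been given Link Metadata.
query = {"$and": [{"url": {"$exists": True}}, {"Link Metadata": {"$exists": False}}]}

# Process one record at a time until no unprocessed records remain.
while True:
    recordToCheck = collection_rrl.find_one(query)
    if recordToCheck is None:
        break

    count += 1
    # Retrieve metadata for the link and store it on the record.
    linkMetadata = rrl.ResearchReferenceLibrary.ref_link_data(recordToCheck["url"])
    print(count, collection_rrl.update_one({"_id": recordToCheck["_id"]}, {"$set": {"Link Metadata": linkMetadata}}))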


1 <pymongo.results.UpdateResult object at 0x1042255e8>
2 <pymongo.results.UpdateResult object at 0x10390a090>
3 <pymongo.results.UpdateResult object at 0x1038f6ea0>
...
1168 <pymongo.results.UpdateResult object at 0x10427c3f0>
1169 <pymongo.results.UpdateResult object at 0x10427c3f0>
... (output truncated)

# Example
Here's an example of a single record and the information we are able to assemble when we have a DOI and it responds with metadata via content negotiation.

In [10]:
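# Find one record that has a url and whose link metadata was retrieved successfully.
exampleRecord = collection_rrl.find_one({"$and":[{"Link Metadata":{"$exists":True}},{"url":{"$exists":True}},{"Link Metadata.Success":True}]})

# Show the original citation string alongside the metadata returned for its link.
print(exampleRecord["Citation String"])

display(exampleRecord["Link Metadata"]["Link Response"])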

Apa, A.D., Thompson, T.R., and Reese, K.P., 2017, Juvenile greater sage-grouse survival, movements, and recruitment in Colorado: Journal of Wildlife Management, v. 81, no. 4, p. 652–668.


{'DOI': '10.1002/jwmg.21230',
 'ISSN': ['0022-541X'],
 'URL': 'http://dx.doi.org/10.1002/jwmg.21230',
 'author': [{'affiliation': [{'name': 'Colorado Division of Parks and Wildlife; 711 Independent Avenue Grand Junction CO 81505'}],
   'family': 'Apa',
   'given': 'Anthony D.',
   'sequence': 'first'},
  {'affiliation': [{'name': 'Department of Fish and Wildlife Sciences; University of Idaho; P. O. Box 441136 Moscow ID 83844 USA'}],
   'family': 'Thompson',
   'given': 'Thomas R.',
   'sequence': 'additional'},
  {'affiliation': [{'name': 'Department of Fish and Wildlife Sciences; University of Idaho; P. O. Box 441136 Moscow ID 83844 USA'}],
   'family': 'Reese',
   'given': 'Kerry P.',
   'sequence': 'additional'}],
 'container-title': 'The Journal of Wildlife Management',
 'container-title-short': 'Jour. Wild. Mgmt.',
 'content-domain': {'crossmark-restriction': False, 'domain': []},
 'created': {'date-parts': [[2017, 4, 7]],
  'date-time': '2017-04-07T10:33:27Z',
  'timestamp': 1491