CrossRef and its API give us a lot of data to work with. For CrossRef records, I've found that simple content negotiation on the DOI link already returns everything that CrossRef's /works/ route offers, so there's no real value in retrieving that information again. This code instead works on anything in the registry that does not have a DOI. After initially trying some of the citation string parsing methods I've worked with in the past, I came across [this post](https://www.crossref.org/labs/resolving-citations-we-dont-need-no-stinkin-parser/), which inspired me to try a wide-open search with the full, and sometimes very messy, citation strings to see what comes back. So far, taking the first record returned by the search and accepting it only when its search score is above 60 seems to work pretty well, but we'll have to test and improve this over time.
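As a rough illustration of the "first record above a score threshold" idea, here is a minimal sketch. It is not the logic in the `rrl` module; it assumes the public CrossRef REST API at `https://api.crossref.org/works` with its `query.bibliographic` and `rows` parameters, and the names `pick_match` and `search_crossref` are hypothetical helpers made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public CrossRef REST API endpoint for works searches.
CROSSREF_WORKS_URL = "https://api.crossref.org/works"
# Threshold taken from the text above; expect to tune it over time.
SCORE_THRESHOLD = 60


def pick_match(response, threshold=SCORE_THRESHOLD):
    """Return the first search hit if its score is above the threshold, else None."""
    items = response.get("message", {}).get("items", [])
    if not items:
        return None
    first = items[0]
    if first.get("score", 0) > threshold:
        return first
    return None


def search_crossref(citation_string, threshold=SCORE_THRESHOLD):
    """Hypothetical helper: send the raw citation string as a free-text query."""
    params = urllib.parse.urlencode(
        {"query.bibliographic": citation_string, "rows": 1}
    )
    url = f"{CROSSREF_WORKS_URL}?{params}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return pick_match(json.load(resp), threshold)
```

Keeping the threshold check in its own function makes it easy to rerun stored search responses against a different cutoff without hitting the API again.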

In [1]:
from bis2 import dd
from bis import rrl

In [2]:
bis = dd.getDB("bis")
collection_rrl = bis["RRL"]

Eventually, this all needs to run on the message queuing infrastructure we are looking to build: something will check the registry for records that qualify for this process, put each citation string on the queue as a message, run the process, and write the results. In the meantime, this loop does the same job by calling a function in the rrl module of the bis package, where I put the logic. It looks for any record that did not get a DOI confirmed/resolved through content negotiation and has not yet had a CrossRef search attempted.
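To make the eventual queue-based flow a bit more concrete, here is a small sketch that uses Python's in-memory `queue.Queue` purely as a stand-in for whatever messaging infrastructure we end up with. The function names, the message shape, and the `lookup` and `write_result` callables are all assumptions for illustration; nothing here exists in the bis package.

```python
import queue


def enqueue_pending(records, work_queue):
    """Put one message on the queue per record that still needs a CrossRef lookup."""
    queued = 0
    for record in records:
        link_response = record.get("Link Metadata", {}).get("Link Response", {})
        has_doi = "DOI" in link_response
        already_tried = "CrossRef" in record
        if not has_doi and not already_tried:
            # Hypothetical message shape: just the id and the raw citation string.
            work_queue.put(
                {"_id": record["_id"], "citation": record["Citation String"]}
            )
            queued += 1
    return queued


def drain(work_queue, lookup, write_result):
    """Process every queued message: run the lookup, then hand off the result."""
    processed = 0
    while True:
        try:
            message = work_queue.get_nowait()
        except queue.Empty:
            return processed
        write_result(message["_id"], lookup(message["citation"]))
        processed += 1
```

In a real deployment the producer and the consumer would run as separate processes, with `lookup` wrapping the CrossRef search and `write_result` wrapping the database update.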

In [4]:
count = 0

# Records with no DOI from content negotiation and no CrossRef search attempt yet
pending_query = {
    "Link Metadata.Link Response.DOI": {"$exists": False},
    "CrossRef": {"$exists": False},
}

while True:
    recordToCheck = collection_rrl.find_one(pending_query)
    if recordToCheck is None:
        break

    # Setting the CrossRef field marks the record as tried, so the query stops matching it
    crossref_result = rrl.ResearchReferenceLibrary.lookup_crossref(recordToCheck["Citation String"])
    count += 1
    print(count, collection_rrl.update_one({"_id": recordToCheck["_id"]}, {"$set": {"CrossRef": crossref_result}}))
    

1 <pymongo.results.UpdateResult object at 0x106b8dbd0>
2 <pymongo.results.UpdateResult object at 0x107400630>
3 <pymongo.results.UpdateResult object at 0x106adcaf8>
4 <pymongo.results.UpdateResult object at 0x106a95e58>
5 <pymongo.results.UpdateResult object at 0x1063645a0>
6 <pymongo.results.UpdateResult object at 0x106524f78>
7 <pymongo.results.UpdateResult object at 0x106afc510>
8 <pymongo.results.UpdateResult object at 0x106afc8b8>
9 <pymongo.results.UpdateResult object at 0x106b92558>
10 <pymongo.results.UpdateResult object at 0x106afc5e8>
11 <pymongo.results.UpdateResult object at 0x106afc510>
12 <pymongo.results.UpdateResult object at 0x106522048>
13 <pymongo.results.UpdateResult object at 0x10656b750>
14 <pymongo.results.UpdateResult object at 0x10638aaf8>
15 <pymongo.results.UpdateResult object at 0x106b11a20>
16 <pymongo.results.UpdateResult object at 0x1065221b0>
17 <pymongo.results.UpdateResult object at 0x106522168>
18 <pymongo.results.UpdateResult object at 0x106ae9d38>
...

147 <pymongo.results.UpdateResult object at 0x1074974c8>
148 <pymongo.results.UpdateResult object at 0x10749ddc8>
149 <pymongo.results.UpdateResult object at 0x106b11b40>
150 <pymongo.results.UpdateResult object at 0x107400900>
151 <pymongo.results.UpdateResult object at 0x107515c60>
152 <pymongo.results.UpdateResult object at 0x106b92900>
153 <pymongo.results.UpdateResult object at 0x10749d990>
154 <pymongo.results.UpdateResult object at 0x107426048>
155 <pymongo.results.UpdateResult object at 0x106b19b88>
156 <pymongo.results.UpdateResult object at 0x107400750>
157 <pymongo.results.UpdateResult object at 0x107495870>
158 <pymongo.results.UpdateResult object at 0x106b19048>
159 <pymongo.results.UpdateResult object at 0x107515ca8>
160 <pymongo.results.UpdateResult object at 0x106b92480>
161 <pymongo.results.UpdateResult object at 0x106b196c0>
162 <pymongo.results.UpdateResult object at 0x107426798>
163 <pymongo.results.UpdateResult object at 0x107515b40>
...

291 <pymongo.results.UpdateResult object at 0x106ab5bd0>
292 <pymongo.results.UpdateResult object at 0x106b2a288>
293 <pymongo.results.UpdateResult object at 0x1074a0870>
294 <pymongo.results.UpdateResult object at 0x106ab5a20>
295 <pymongo.results.UpdateResult object at 0x1074974c8>
296 <pymongo.results.UpdateResult object at 0x106b11ee8>
297 <pymongo.results.UpdateResult object at 0x107400ee8>
298 <pymongo.results.UpdateResult object at 0x107497558>
299 <pymongo.results.UpdateResult object at 0x106b38ab0>
300 <pymongo.results.UpdateResult object at 0x1074a0b88>
301 <pymongo.results.UpdateResult object at 0x1074a0240>
302 <pymongo.results.UpdateResult object at 0x106b11948>
303 <pymongo.results.UpdateResult object at 0x106b11b88>
304 <pymongo.results.UpdateResult object at 0x106b11e58>
305 <pymongo.results.UpdateResult object at 0x107495a68>
306 <pymongo.results.UpdateResult object at 0x106b11c60>
307 <pymongo.results.UpdateResult object at 0x106ac6558>
...