In [1]:
import pandas as pd
from IPython.display import clear_output
import numpy as np

# Applying DOI and MagIC Link search functions

If you have many literature references that include a title, author, year, journal, volume, and pages but lack a DOI
or a MagIC Earthref Data DOI contribution link, these functions can help find the missing identifiers.

Many are familiar with the DOI assigned to most published papers. The functions here expect the bare DOI, which begins with '10.' (for example, '10.1029/A1B2C3'), without a leading "doi.org" address.
The MagIC link points to the rock magnetic data associated with a paper, so that the data are easy to find. The http links produced by these functions are the Earthref Data DOI links for MagIC contributions (for example, http://dx.doi.org/10.7288/V4/MAGIC/19904).
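
As a quick illustration of the bare DOI format described above, here is a minimal sketch of a shape check. The helper name `looks_like_bare_doi` and the regular expression are assumptions made for this example; they are not part of dl_search, and the pattern is a loose check rather than an official validator.

```python
import re

# Loose pattern for a bare DOI: '10.', a 4-9 digit registrant code, '/', then a suffix.
# Illustrative only; not an official DOI validator.
DOI_PATTERN = re.compile(r'^10\.\d{4,9}/\S+$')

def looks_like_bare_doi(value):
    """Return True if value is a str shaped like a bare DOI such as '10.1029/2021GC009990'."""
    return isinstance(value, str) and DOI_PATTERN.match(value) is not None

print(looks_like_bare_doi('10.1029/2021GC009990'))                   # True
print(looks_like_bare_doi('https://doi.org/10.1029/2021GC009990'))   # False
```

A check like this can be run over a column before searching, to spot values that still carry a "doi.org" prefix.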

The functions used in this notebook are found in the module "dl_search". 

In [2]:
import dl_search as srch   # searching functions

## Functions and their uses
See the documentation in the dl_search.py file, or call help() on a function:

In [3]:
help( srch.magic_link_from_doi )

Help on function magic_link_from_doi in module dl_search:

magic_link_from_doi(doi)
    This uses the earthref MagIC api to search for a magic contribution using a doi. 
    Adapted from pmagpy's "ipmag.download_magic_from_doi()"
    
    Input: 
        doi: str beginning with '10.'
        
    Output: 
        earthref_doi_link: http link if found
                           NaN if not found in MagIC database, try using magic_link_from_title to search the title in MagIC



### magic_link_from_doi 

In [4]:
doi = '10.1029/2021GC009990'
srch.magic_link_from_doi(doi)

'http://dx.doi.org/10.7288/V4/MAGIC/17452'

In [5]:
doi = '10.1029/2019GC008728'
srch.magic_link_from_doi(doi)

'http://dx.doi.org/10.7288/V4/MAGIC/16709'

### magic_link_from_title

In [6]:
title = "Geomagnetic field intensity between 70 000 and 130 000 years B.P. from a volcanic sequence on La Réunion, Indian Ocean"
#id = '19405'

srch.magic_link_from_title(title)

'http://dx.doi.org/10.7288/V4/MAGIC/19405'

### get_doi
Examples

In [7]:
title = 'Geomagnetic field intensity between 70 000 and 130 000 years B.P. from a volcanic sequence on La Reunion, Indian Ocean.' 
srch.get_doi(title)

'10.1016/0012-821x(96)00024-6'

### These can be used in tandem when magic_link_from_title cannot find the associated link

In [8]:
title = 'Evidence of anomalously weak geomagnetic field during Matuyama reversed epoch'
srch.magic_link_from_title(title)   # no content found

nan

The title search above returns NaN, so we try the second title search function:

In [9]:
title = 'Evidence of anomalously weak geomagnetic field during Matuyama reversed epoch'
srch.magic_link_from_title2(title)

'http://dx.doi.org/10.7288/V4/MAGIC/18775'

This search is successful.
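
The try-one-then-the-other pattern above can be wrapped in a small helper. This is a sketch only: `first_link_found` is a hypothetical name, and it assumes (as the outputs above suggest) that a search returns a str when it finds a link and NaN when it does not. The two stand-in functions exist only so the example runs without network access.

```python
import math

def first_link_found(title, search_funcs):
    """Try each search function in order; return the first str result, else NaN."""
    for search in search_funcs:
        result = search(title)
        if isinstance(result, str):   # a link was found
            return result
    return math.nan

# Hypothetical stand-ins for the real search functions
def always_missing(title):
    return math.nan

def always_found(title):
    return 'http://dx.doi.org/10.7288/V4/MAGIC/18775'

print(first_link_found('some title', [always_missing, always_found]))
```

In this notebook the call would look like `first_link_found(title, [srch.magic_link_from_title, srch.magic_link_from_title2])`.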

## Working Example: Adding DOIs and Earthref Data DOI links to a spreadsheet of references of \*paper titles

\*In this file, the paper titles may include pre-finalized titles.

In [10]:
# import references file with missing DOIs and MagIC links
pint_refs = pd.read_excel('../testdata/PINT_References_magiclinks.xlsx')
pint_refs_dois = pint_refs.loc[pint_refs['DOI'].notna()].reset_index()    # collect data rows that have DOIs 
pint_refs_no_dois = pint_refs.loc[pint_refs['DOI'].isna()].reset_index()  # collect data rows that have no DOI
# print dataframe with this selection
pd.set_option('display.max_rows', 5)
print(len(pint_refs))
pint_refs_dois    # 36 have DOIs out of the 413 rows of references, we'll keep this for later
pint_refs_no_dois  # 377 have none

413


Unnamed: 0,index,REFNO,AUTHORS,YEAR,TITLE,JOURNAL,VOL,PAGES,DOI,MagIC
0,0,1,"Aoki, Y., Kase, H., Ishibashi, K., Kinoshita, H.",1971,Evidence of anomalously weak geomagnetic field...,J. Geomag. Geoelect.,23,129-132,,
1,1,2,"Bagina, O.L., Minasyan, D.O., Petrova, G.N.",1976,Determination of the intensity of the ancient ...,Izv. Akad. Nauk. (in Russian),2,81-86,,
...,...,...,...,...,...,...,...,...,...,...
375,395,766,"Shcherbakova V.V., Zhidkov G.A., Pavlov V.E..,...",2004,The paleointensity determinations on Early Pro...,Kazan',,61-66,,
376,397,768,"Goguitchaichvili A., Valdivia L.A., Morales J....",2000,New contributions to the Early Pliocene geomag...,Geofisica Int.,39,277-284,,


In [11]:
# work with the section of the table that is missing DOIs
pint_refs_no_dois_titles = pint_refs_no_dois['TITLE']    # assign the title column to a variable

dois = []    # container for DOIs found
for i in range(len(pint_refs_no_dois_titles)):   # for each reference without a DOI (377)
    dois.append(srch.get_doi(pint_refs_no_dois_titles[i]))   # apply get_doi
    clear_output(wait=True)
    print( i+1,'/{n:5d}'.format(n=len(pint_refs_no_dois_titles)) )   # prints progress, takes ~5 min to run

377 /  377
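
Because a loop like the one above makes one request per title and runs for several minutes, a single failed call could stop it partway. The following is a minimal sketch of a more forgiving loop; `search_all` is a hypothetical helper, and treating any exception as "not found" is an assumption made for illustration. `fake_search` is a stand-in so the example runs offline.

```python
import math

def search_all(values, search_func):
    """Apply search_func to each value, recording NaN when a call raises."""
    results = []
    for i, value in enumerate(values, start=1):
        try:
            results.append(search_func(value))
        except Exception as err:      # e.g. a network timeout (assumed failure mode)
            print(f'{i}: search failed ({err}); recording NaN')
            results.append(math.nan)
    return results

# Hypothetical stand-in for a search function that fails on one input
def fake_search(title):
    if title == 'bad':
        raise ValueError('no response')
    return '10.0000/' + title

found = search_all(['a', 'bad', 'c'], fake_search)
print(found)
```

The result always has one entry per input, so it can be assigned directly to a dataframe column.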


In [12]:
dois # view the list of DOIs
# give the list of DOIs a second name (same list object, not a copy)
dois_n = dois
dois_n[0]

'10.5636/jgg.23.129'

In [13]:
# fill a new column in our dataframe with DOIs found
pint_refs_no_dois['DOI_found'] = dois_n

# view
pd.set_option('display.max_rows', 5)
pint_refs_no_dois

Unnamed: 0,index,REFNO,AUTHORS,YEAR,TITLE,JOURNAL,VOL,PAGES,DOI,MagIC,DOI_found
0,0,1,"Aoki, Y., Kase, H., Ishibashi, K., Kinoshita, H.",1971,Evidence of anomalously weak geomagnetic field...,J. Geomag. Geoelect.,23,129-132,,,10.5636/jgg.23.129
1,1,2,"Bagina, O.L., Minasyan, D.O., Petrova, G.N.",1976,Determination of the intensity of the ancient ...,Izv. Akad. Nauk. (in Russian),2,81-86,,,
...,...,...,...,...,...,...,...,...,...,...,...
375,395,766,"Shcherbakova V.V., Zhidkov G.A., Pavlov V.E..,...",2004,The paleointensity determinations on Early Pro...,Kazan',,61-66,,,
376,397,768,"Goguitchaichvili A., Valdivia L.A., Morales J....",2000,New contributions to the Early Pliocene geomag...,Geofisica Int.,39,277-284,,,10.22201/igeof.00167169p.2000.39.3.331


In [14]:
# now to find the MagIC Links
magic_links = []     # MagIC link container
for i in range(len(dois_n)): 
    magic_links.append(srch.magic_link_from_doi(dois_n[i]))
    clear_output(wait=True)
    print( i+1,'/{n:5d}'.format(n=len(dois_n)) )   # prints progress, takes ~5 min to run

377 /  377


In [15]:
magic_links  # view the list of MagIC links

# fill a new column in our dataframe with the MagIC links found
pint_refs_no_dois['MagIC_found'] = magic_links
pd.set_option('display.max_rows', 5)
pint_refs_no_dois

Unnamed: 0,index,REFNO,AUTHORS,YEAR,TITLE,JOURNAL,VOL,PAGES,DOI,MagIC,DOI_found,MagIC_found
0,0,1,"Aoki, Y., Kase, H., Ishibashi, K., Kinoshita, H.",1971,Evidence of anomalously weak geomagnetic field...,J. Geomag. Geoelect.,23,129-132,,,10.5636/jgg.23.129,http://dx.doi.org/10.7288/V4/MAGIC/18775
1,1,2,"Bagina, O.L., Minasyan, D.O., Petrova, G.N.",1976,Determination of the intensity of the ancient ...,Izv. Akad. Nauk. (in Russian),2,81-86,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...
375,395,766,"Shcherbakova V.V., Zhidkov G.A., Pavlov V.E..,...",2004,The paleointensity determinations on Early Pro...,Kazan',,61-66,,,,
376,397,768,"Goguitchaichvili A., Valdivia L.A., Morales J....",2000,New contributions to the Early Pliocene geomag...,Geofisica Int.,39,277-284,,,10.22201/igeof.00167169p.2000.39.3.331,


### Update references with initial DOIs
Now we return to the dataframe from earlier that holds the references that already had DOIs, and search for the links to their MagIC contributions. We can apply the *magic_link_from_doi* function to this part of our data. 

In [16]:
print( len(pint_refs_dois) )
pd.set_option('display.max_rows', 5)
pint_refs_dois

36


Unnamed: 0,index,REFNO,AUTHORS,YEAR,TITLE,JOURNAL,VOL,PAGES,DOI,MagIC
0,375,745,"Abdulghafur, F., Bowles, J.A.",2019,Absolute Paleointensity Study of Miocene Tiva ...,"Geochemistry, Geophysics, Geosystems",20,5818–5830,https://doi.org/10.1029/2019GC008728,
1,376,746,"Sánchez-Moreno, E.M., Calvo-Rathert, M., Gogui...",2019,Weak palaeointensity results over a Pliocene v...,Geophysical Journal International,220,1604–1618,https://doi.org/10.1093/gji/ggz533,
...,...,...,...,...,...,...,...,...,...,...
34,411,782,"Tauxe, L., Asefaw, H., Behar, N., Koppers, A. ...",2022,Paleointensity Estimates From the Pleistocene ...,"Geochemistry, Geophysics, Geosystems",23,e2022GC010473,https://doi.org/10.1029/2022GC010473,
35,412,783,"di Chiara, A., Tauxe, L., Staudigel, H., Flori...",2021,Earth's Magnetic Field Strength and the Cretac...,"Geochemistry, Geophysics, Geosystems",22,e2020GC009605,https://doi.org/10.1029/2020GC009605,


For the 36 PINT references that already list a DOI, some values include the "doi.org" prefix, but these functions only accept the bare DOI. The code below isolates the bare DOI. 

In [17]:
# isolate the bare DOI as a str
df_dois = pint_refs_dois['DOI'].values
doi_vals = []
for i in range(len(df_dois)): 
    # keep what follows '.org/' if that prefix is present; a bare DOI is kept unchanged.
    # maxsplit=1 guarantees one value is appended per row, so the list stays aligned with the dataframe
    doi_vals.append(df_dois[i].split('.org/', 1)[-1])
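
The same prefix removal can also be sketched in vectorized form with pandas string methods. The sample values below are made up for illustration (one uses an `http://dx.doi.org/` prefix to show that variant), and the regular expression is an assumption about which prefixes appear in the data.

```python
import pandas as pd

# Small made-up sample mixing full URLs and a bare DOI
sample = pd.Series(['https://doi.org/10.1029/2019GC008728',
                    'http://dx.doi.org/10.1093/gji/ggz533',
                    '10.1029/2020GC009605'])

# Remove a leading resolver prefix; bare DOIs pass through unchanged
bare = sample.str.replace(r'^https?://(?:dx\.)?doi\.org/', '', regex=True)
print(bare.tolist())
# ['10.1029/2019GC008728', '10.1093/gji/ggz533', '10.1029/2020GC009605']
```

Anchoring the pattern at the start of the string avoids touching any ".org/" that might appear later in a value.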

In [18]:
#doi_vals # check the values 

Now that every value from our "DOI" column is in the same format, we can run the search function.

In [19]:
# send DOI number to be searched in MagIC
magic_links = []
for i in range(len(doi_vals)): 
    magic_links.append(srch.magic_link_from_doi(doi_vals[i])) # takes ~5 mins

In [20]:
pint_refs_dois['MagIC_found'] = magic_links  # create a new column for our new MagIC links

pd.set_option('display.max_rows', 5)
pint_refs_dois  # no need to worry about the warning for now 

Unnamed: 0,index,REFNO,AUTHORS,YEAR,TITLE,JOURNAL,VOL,PAGES,DOI,MagIC,MagIC_found
0,375,745,"Abdulghafur, F., Bowles, J.A.",2019,Absolute Paleointensity Study of Miocene Tiva ...,"Geochemistry, Geophysics, Geosystems",20,5818–5830,https://doi.org/10.1029/2019GC008728,,http://dx.doi.org/10.7288/V4/MAGIC/16709
1,376,746,"Sánchez-Moreno, E.M., Calvo-Rathert, M., Gogui...",2019,Weak palaeointensity results over a Pliocene v...,Geophysical Journal International,220,1604–1618,https://doi.org/10.1093/gji/ggz533,,http://dx.doi.org/10.7288/V4/MAGIC/17131
...,...,...,...,...,...,...,...,...,...,...,...
34,411,782,"Tauxe, L., Asefaw, H., Behar, N., Koppers, A. ...",2022,Paleointensity Estimates From the Pleistocene ...,"Geochemistry, Geophysics, Geosystems",23,e2022GC010473,https://doi.org/10.1029/2022GC010473,,http://dx.doi.org/10.7288/V4/MAGIC/19491
35,412,783,"di Chiara, A., Tauxe, L., Staudigel, H., Flori...",2021,Earth's Magnetic Field Strength and the Cretac...,"Geochemistry, Geophysics, Geosystems",22,e2020GC009605,https://doi.org/10.1029/2020GC009605,,http://dx.doi.org/10.7288/V4/MAGIC/16869


Now we combine our two partial dataframes back into one dataframe ready for export.

In [21]:
# combining the two dataframes
pint_refs_links_filled = pd.concat([pint_refs_no_dois, pint_refs_dois])

pd.set_option('display.max_rows', 5)
pint_refs_links_filled

Unnamed: 0,index,REFNO,AUTHORS,YEAR,TITLE,JOURNAL,VOL,PAGES,DOI,MagIC,DOI_found,MagIC_found
0,0,1,"Aoki, Y., Kase, H., Ishibashi, K., Kinoshita, H.",1971,Evidence of anomalously weak geomagnetic field...,J. Geomag. Geoelect.,23,129-132,,,10.5636/jgg.23.129,http://dx.doi.org/10.7288/V4/MAGIC/18775
1,1,2,"Bagina, O.L., Minasyan, D.O., Petrova, G.N.",1976,Determination of the intensity of the ancient ...,Izv. Akad. Nauk. (in Russian),2,81-86,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...
34,411,782,"Tauxe, L., Asefaw, H., Behar, N., Koppers, A. ...",2022,Paleointensity Estimates From the Pleistocene ...,"Geochemistry, Geophysics, Geosystems",23,e2022GC010473,https://doi.org/10.1029/2022GC010473,,,http://dx.doi.org/10.7288/V4/MAGIC/19491
35,412,783,"di Chiara, A., Tauxe, L., Staudigel, H., Flori...",2021,Earth's Magnetic Field Strength and the Cretac...,"Geochemistry, Geophysics, Geosystems",22,e2020GC009605,https://doi.org/10.1029/2020GC009605,,,http://dx.doi.org/10.7288/V4/MAGIC/16869
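
After the concatenation, DOIs sit in two columns: "DOI" for the references that had one originally and "DOI_found" for those located by the search. Here is a minimal sketch of merging them into one column, using toy frames in place of the real data. The column name `DOI_best` is hypothetical, and `ignore_index=True` is an optional choice that gives the combined frame a fresh index.

```python
import pandas as pd

# Toy stand-ins for pint_refs_no_dois and pint_refs_dois
no_dois = pd.DataFrame({'REFNO': [1, 2],
                        'DOI': [None, None],
                        'DOI_found': ['10.5636/jgg.23.129', None]})
with_dois = pd.DataFrame({'REFNO': [745],
                          'DOI': ['https://doi.org/10.1029/2019GC008728']})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each frame's index
combined = pd.concat([no_dois, with_dois], ignore_index=True)

# Prefer the original DOI; fall back to the one found by the search.
# 'DOI_best' is a hypothetical column name.
combined['DOI_best'] = combined['DOI'].combine_first(combined['DOI_found'])
print(combined[['REFNO', 'DOI_best']])
```

Rows where neither column has a value remain missing in the merged column.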


The DOI and MagIC search results depend on the paper titles in our original spreadsheet. A search can fail when a title is misspelled or differs from the published paper title, when the paper is not in MagIC, or when no Earthref Data DOI has been assigned. 
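
One way to reduce purely cosmetic differences between titles is to simplify them before searching. The sketch below strips accents, extra whitespace, and a trailing period; `simplify_title` is a hypothetical helper, and whether this improves the match rate depends on how the search services compare text, which is not tested here.

```python
import unicodedata

def simplify_title(title):
    """Strip accents, extra whitespace, and a trailing period from a title."""
    decomposed = unicodedata.normalize('NFKD', title)
    no_accents = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    return ' '.join(no_accents.split()).rstrip('.')

print(simplify_title('  La Réunion,  Indian Ocean. '))
# La Reunion, Indian Ocean
```

Note that a title ending in an abbreviation such as "B.P." would also lose its final period.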

This is a summary of our results from this test case:

Out of our 413 references, we started off with: 
- 36 DOI links or IDs
- 0 working MagIC Earthref Data DOI links

In [22]:
print( pint_refs_links_filled['DOI_found'].describe() )
print( pint_refs_links_filled['MagIC_found'].describe() )

count                  320
unique                 319
top       10.1360/02yd9092
freq                     2
Name: DOI_found, dtype: object
count                                          222
unique                                         222
top       http://dx.doi.org/10.7288/V4/MAGIC/18775
freq                                             1
Name: MagIC_found, dtype: object


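This tells us we now have: 
- 320 newly found DOIs in the "DOI_found" column (77% of the 413 rows; the 93 NaN include the 36 references that already had a DOI). Only 319 are unique, so one DOI was returned for two references. 
- 222 MagIC EarthRef Data DOI links (54% of the 413 rows, with 191 NaN)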
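
The counts and percentages in this summary can be computed directly rather than read off `describe()`. This is a minimal sketch on a toy frame; `summarize_found` is a hypothetical helper and the values are made up.

```python
import pandas as pd

def summarize_found(df, column):
    """Return (count found, count missing, percent found) for one column."""
    found = int(df[column].notna().sum())
    missing = len(df) - found
    percent = 100 * found / len(df)
    return found, missing, percent

# Toy frame standing in for pint_refs_links_filled
toy = pd.DataFrame({'DOI_found': ['10.0000/a', None, '10.0000/b', None]})
found, missing, percent = summarize_found(toy, 'DOI_found')
print(f'{found} found, {missing} missing ({percent:.0f}%)')
# 2 found, 2 missing (50%)
```

Applied to the combined dataframe, the same call could be made for both "DOI_found" and "MagIC_found".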

To export this dataframe as an Excel sheet into our 'testdata' folder: 

In [23]:
pint_refs_links_filled.to_excel('../testdata/PINT_references_ex.xlsx', index=False)