## Find publishing authors in African countries
- Uses the [ADS API](https://github.com/adsabs/adsabs-dev-api)
- Uses the unoffical [Python wrapper for this API](https://ads.readthedocs.io/en/latest/)
- Still needs a lot of sorting at the end to remove duplicates 

In [50]:
import ads
import pandas as pd 

#There are better ways to do this - set an environment variable
ads.config.token = INCLUDE YOUR TOKEN HERE


In [51]:
# Read in the country list - this is all countries except South Africa

with open('ListCountries-SA.txt') as f:
    country_list = f.read().splitlines()
    
print(country_list)

#country_list = ["Nigeria","Burkina Faso"]

['Nigeria', 'Ethiopia', 'Egypt', 'Democratic Republic of the Congo', 'Republic of the Congo', 'DR Congo', 'DRC', 'The Congo', 'Congo', 'Tanzania', 'Kenya', 'Sudan', 'Algeria', 'Uganda', 'Morocco', 'Mozambique', 'Ghana', 'Angola', 'Ivory Coast', 'Madagascar', 'Cameroon', 'Niger', 'Burkina Faso', 'Mali', 'Malawi', 'Zambia', 'Somalia', 'Senegal', 'Chad', 'Zimbabwe', 'South Sudan', 'Rwanda', 'Tunisia', 'Guinea', 'Benin', 'Burundi', 'Togo', 'Eritrea', 'Sierra Leone', 'Libya', 'Central African Republic', 'Liberia', 'Mauritania', 'Namibia', 'Botswana', 'Gambia', 'Equatorial Guinea', 'Lesotho', 'Gabon', 'Guinea-Bissau', 'Guinea Bissau', 'Mauritius', 'Swaziland', 'Djibouti', 'Reunion', 'Réunion', 'Comoros', 'Cape Verde', 'Western Sahara', 'Mayotte', 'Sao Tome and Principe', 'São Tomé and Principe', 'Seychelles', 'Saint Helena, Ascension and Tristan da Cunha', 'Saint Helena', 'Ascension', 'Tristan da Cunha']


### Running the query
For each country, query all refereed articles between 2013-2018. Return authors, doi, title, affiliation and bibcode.
Then keep only *those authors* with **African affiliations**

In [52]:

# Set up empty lists to store the results
african_authors = []
african_dois = []
african_bibcodes = []
african_countries = []
african_years = []


for country in country_list:
    
    print(country)
    
    #Run the query
    qstring = "aff:"+country+" database:astronomy property:refereed year:2013-2018 doctype:article"
    
    
    # I think we need to check what the default number of rows returned is. It's 10 - just upped to 3000
    # because South Africa has ~2900 returns in this timeframe
    
    # Update - 3000 is over the limit so I have to search in batches 2013 - 2015 and 2016 - 2018
    
    q = ads.SearchQuery(q=qstring, fl="doi,author,title,aff,bibcode,year", rows=3000)
    
    for paper in q:
        # this bit of gobbledygook matches for the specific country in the individual author affiliations
        my_indices = [i for i, s in enumerate(paper.aff) if country in s]
    
        for i in my_indices:
            # now print the paper DOIs and the author names that are Sudanese
            # print(paper.doi, country,paper.author[i], paper.bibcode)
        
            #And append them to the african_authors list
            african_authors.append(paper.author[i])
            african_dois.append(paper.doi)
            african_bibcodes.append(paper.bibcode)
            african_countries.append(country)
            african_years.append(paper.year)

Nigeria


  "Setting this query's rows to {}".format(self.query['rows']))


Ethiopia
Egypt
Democratic Republic of the Congo
Republic of the Congo
DR Congo
DRC
The Congo
Congo
Tanzania
Kenya
Sudan
Algeria
Uganda
Morocco
Mozambique
Ghana
Angola
Ivory Coast
Madagascar
Cameroon
Niger
Burkina Faso
Mali
Malawi
Zambia
Somalia
Senegal
Chad
Zimbabwe
South Sudan
Rwanda
Tunisia
Guinea
Benin
Burundi
Togo
Eritrea
Sierra Leone
Libya
Central African Republic
Liberia
Mauritania
Namibia
Botswana
Gambia
Equatorial Guinea
Lesotho
Gabon
Guinea-Bissau
Guinea Bissau
Mauritius
Swaziland
Djibouti
Reunion
Réunion
Comoros
Cape Verde
Western Sahara
Mayotte
Sao Tome and Principe
São Tomé and Principe
Seychelles
Saint Helena, Ascension and Tristan da Cunha
Saint Helena
Ascension
Tristan da Cunha


In [53]:
q.response.get_ratelimits()



{'limit': '5000', 'remaining': '4904', 'reset': '1540277180'}

In [54]:
# Now convert these lists into data frames

d = {'Authors': african_authors, 'DOI' : african_dois, 'Bibcode' : african_bibcodes, 'Country' : african_countries,
     'Year': african_years}

df = pd.DataFrame(d)

Unnamed: 0,Authors,Bibcode,Country,DOI,Year
0,"Onah, C. I.",2014JApA...35..619O,Nigeria,[10.1007/s12036-014-9311-z],2014
1,"Ubachukwu, A. A.",2014JApA...35..619O,Nigeria,[10.1007/s12036-014-9311-z],2014
2,"Odo, F. C.",2014JApA...35..619O,Nigeria,[10.1007/s12036-014-9311-z],2014
3,"Ogbodo, C. S.",2017MNRAS.469.4788O,Nigeria,[10.1093/mnras/stx1154],2017
4,"Chibueze, J. O.",2017MNRAS.469.4788O,Nigeria,[10.1093/mnras/stx1154],2017
5,"Ubachukwu, A. A.",2017MNRAS.469.4788O,Nigeria,[10.1093/mnras/stx1154],2017
6,"Eze, R. N. C.",2017MNRAS.469.4788O,Nigeria,[10.1093/mnras/stx1154],2017
7,"Odo, F. C.",2014Ap&SS.349..939O,Nigeria,[10.1007/s10509-013-1694-9],2014
8,"Chukwude, A. E.",2014Ap&SS.349..939O,Nigeria,[10.1007/s10509-013-1694-9],2014
9,"Ubachukwu, A. A.",2014Ap&SS.349..939O,Nigeria,[10.1007/s10509-013-1694-9],2014


### Removing duplicates
Might be able to do a first pass with `pd.DataFrame.drop_duplicates` but this won't deal with slight changes in names.

In [55]:
unique_authors = df.drop_duplicates(subset='Authors')
unique_authors.sort_values('Authors')

Unnamed: 0,Authors,Bibcode,Country,DOI,Year
2321,"Abada, Abdessamad",2013PhRvD..88a6006A,Algeria,[10.1103/PhysRevD.88.016006],2013
2725,"Abate Essi, Jean Marcel",2018EP&S...70...42M,Cameroon,[10.1186/s40623-018-0812-x],2018
752,"Abbas, Abbas M.",2016JAsGe...5..147A,Egypt,[10.1016/j.nrjag.2016.01.003],2016
36,"Abbas, M.",2017JASTP.164..203B,Nigeria,[10.1016/j.jastp.2017.08.025],2017
1766,"Abbas, Mahmoud Ahmed",2014GeoJI.199.1625A,Egypt,[10.1093/gji/ggu354],2014
1816,"Abd Allah, Sabry",2016EP&S...68...76A,Egypt,[10.1186/s40623-016-0443-z],2016
1525,"Abd Allah, Saud A.",2014JAsGe...3...18M,Egypt,[10.1016/j.nrjag.2014.02.001],2014
737,"Abd EL-Razek, Enas M.",2018JAsGe...7..162A,Egypt,[10.1016/j.nrjag.2017.12.002],2018
1350,"Abd El Aziz, Mohamed",2017ExA....43..131S,Egypt,[10.1007/s10686-017-9524-7],2017
1321,"Abd El-Bar, S. E.",2014Ap&SS.350..507A,Egypt,[10.1007/s10509-014-1800-7],2014


In [56]:
unique_authors.count(0)

Authors    1671
Bibcode    1671
Country    1671
DOI        1649
Year       1671
dtype: int64

In [57]:
# Write out to CSV file
unique_authors.to_csv("Unique_authors_restofAfrica.csv", index = False, sep="|")
