This project aimed to determine the number of publishing astronomers on the African continent, and to see how the number of publications with African affiliations has increased over the recent past.
The data behind these figures were gathered by querying NASA's Astrophysics Data System (ADS) for peer-reviewed astronomy articles in which any author had an affiliation in one of the 54 countries in Africa. The search was further restricted to publications between 2013 and 2018.
The query used the new ADS API and the Python wrapper for this API.
For ADS query syntax see this link.
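As a rough illustration of the kind of query described above, the sketch below builds an ADS query string for one country and passes it to the `ads` Python package. The helper names `build_query` and `run_query`, the field list, and the `rows` value are assumptions made for this example and are not taken from the project's notebook. Running `run_query` requires an ADS API token (for instance in the `ADS_DEV_KEY` environment variable).

```python
def build_query(country, start_year=2013, end_year=2018):
    """Return an ADS query string for refereed astronomy papers
    with `country` somewhere in an author affiliation."""
    return (
        f'aff:"{country}" '
        f"year:{start_year}-{end_year} "
        "property:refereed database:astronomy"
    )


def run_query(country, rows=2000):
    """Run the query through the `ads` package (hypothetical helper).

    Needs network access and an ADS API token.
    """
    import ads  # imported here so build_query works without the package

    # Fields requested from ADS; chosen for this sketch.
    fields = ["author", "aff", "doi", "bibcode", "pubdate"]
    return list(ads.SearchQuery(q=build_query(country), fl=fields, rows=rows))


print(build_query("Namibia"))
# aff:"Namibia" year:2013-2018 property:refereed database:astronomy
```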
For each country in our list, the query output is parsed to search for that specific country in each affiliation. We then keep track of the author, DOI, bibcode and date for each article.
We used multiple spellings for each country, listed in the file `ListCountries.txt`, as a catch-all for alternative country names.
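The affiliation matching described above could look roughly like the sketch below. It assumes each record is a dict with parallel `author` and `aff` lists, as ADS returns them; the `SPELLINGS` table, the function names, and the sample record are made up for illustration and do not reflect the contents of `ListCountries.txt`.

```python
# Hypothetical spelling variants; the project's real list is in ListCountries.txt.
SPELLINGS = {
    "Eswatini": ["Eswatini", "Swaziland"],
    "Ivory Coast": ["Ivory Coast", "Cote d'Ivoire", "Côte d'Ivoire"],
}


def authors_in_country(record, spellings):
    """Return the authors of one record whose affiliation mentions any spelling."""
    needles = [s.lower() for s in spellings]
    matched = []
    # ADS lists affiliations in the same order as authors.
    for author, aff in zip(record["author"], record["aff"]):
        if any(needle in aff.lower() for needle in needles):
            matched.append(author)
    return matched


def collect_rows(records, country, spellings):
    """Build one row per matching author, keeping DOI, bibcode and date."""
    rows = []
    for rec in records:
        for author in authors_in_country(rec, spellings):
            rows.append({
                "country": country,
                "author": author,
                "doi": rec.get("doi"),
                "bibcode": rec["bibcode"],
                "date": rec["pubdate"],
            })
    return rows


# Made-up record, for illustration only.
example = {
    "author": ["Doe, J.", "Roe, R."],
    "aff": ["University of Eswatini, Kwaluseni, Swaziland",
            "Example Institute, Paris, France"],
    "doi": "10.0000/example",
    "bibcode": "2015Examp...1....1D",
    "pubdate": "2015-03-00",
}
rows = collect_rows([example], "Eswatini", SPELLINGS["Eswatini"])
print([row["author"] for row in rows])
```

Plain substring matching is used here for simplicity. It can over-match (for example, "Niger" also matches "Nigeria"), so a stricter match may be needed for some countries.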
We convert the lists into a pandas DataFrame and then drop any duplicate authors.
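A minimal sketch of this step, using made-up rows and assumed column names (`author`, `bibcode`, `date`); the notebook's actual column names may differ.

```python
import pandas as pd

# Made-up rows: the same author appears on two papers.
rows = [
    {"author": "Doe, J.", "bibcode": "2014Examp...1....1D", "date": "2014-05-00"},
    {"author": "Doe, J.", "bibcode": "2016Examp...2....7D", "date": "2016-01-00"},
    {"author": "Roe, R.", "bibcode": "2016Examp...2....7D", "date": "2016-01-00"},
]
df = pd.DataFrame(rows)

# Keep the first row for each exact author string.
unique_authors = df.drop_duplicates(subset="author")
print(len(df), len(unique_authors))  # 3 2
```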
The Jupyter notebook containing the query and an initial analysis is in this repository.
- The ADS query returns a maximum of around 2800 lines, even if you set the number of rows larger than this. For that reason, we've had to run separate queries for South Africa (for each year). This is a problem that will affect any query that returns a large number of rows from the database.
- The pandas `drop_duplicates` command will drop only literal duplicates, e.g. "Carignan, C" and "Carignan, Claude" will be viewed as different authors. We had to implement a final, manual check for duplicates.
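
The two workarounds above can be sketched as follows. `yearly_queries` splits one large query into one query string per year, and `candidate_duplicates` groups author strings that share a surname and first initial so they can be reviewed by hand. Both function names and the matching rule are assumptions made for this example. The grouping only suggests candidates, since two different people can share a surname and initial, so it does not replace the manual check.

```python
from collections import defaultdict


def yearly_queries(country, start_year=2013, end_year=2018):
    """One ADS query string per year, so each result set stays smaller."""
    return [
        f'aff:"{country}" year:{year} property:refereed database:astronomy'
        for year in range(start_year, end_year + 1)
    ]


def name_key(author):
    """Reduce 'Surname, Given' to (surname, first initial), lower-cased."""
    surname, _, given = author.partition(",")
    given = given.strip()
    return (surname.strip().lower(), given[:1].lower())


def candidate_duplicates(authors):
    """Group distinct spellings that share a key; each group needs a manual check."""
    groups = defaultdict(set)
    for author in authors:
        groups[name_key(author)].add(author)
    return [sorted(names) for names in groups.values() if len(names) > 1]


print(len(yearly_queries("South Africa")))  # 6
print(candidate_duplicates(["Carignan, C", "Carignan, Claude", "Doe, J."]))
# [['Carignan, C', 'Carignan, Claude']]
```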