Analyse citation data from Google Scholar

YuLab-SMU/scholar

 
 

scholar

The scholar R package provides functions to extract citation data from Google Scholar. In addition to retrieving basic information about a single scholar, the package also allows you to compare multiple scholars and predict future h-index values.

Installation

# from CRAN
install.packages("scholar")

# from GitHub
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
remotes::install_github('YuLab-SMU/scholar')

Basic features

Individual scholars are referenced by a unique character string, which can be found by searching for an author and inspecting the resulting scholar homepage. For example, the profile of physicist Richard Feynman is located at http://scholar.google.com/citations?user=B7vSqZsAAAAJ and so his unique id is B7vSqZsAAAAJ.
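If you already have a profile URL, the id can be pulled out programmatically. The following is a minimal sketch in base R; extract_scholar_id is a hypothetical helper, not a function from the scholar package, and it assumes the id is carried in the user query parameter, as in the URL above.

```r
# Hypothetical helper (not part of the scholar package): extract the value
# of the `user` query parameter from a Google Scholar profile URL.
extract_scholar_id <- function(url) {
  # Look for "user=" preceded by "?" or "&", then capture up to the next "&" or "#"
  m <- regmatches(url, regexpr("(?<=[?&]user=)[^&#]+", url, perl = TRUE))
  if (length(m) == 0) NA_character_ else m
}

id <- extract_scholar_id("http://scholar.google.com/citations?user=B7vSqZsAAAAJ")
id
```

The helper returns NA when the URL has no user parameter, so a bad link can be caught before any query is sent.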

Basic information on a scholar can be retrieved as follows:

# Define the id for Richard Feynman
id <- 'B7vSqZsAAAAJ'

# Get his profile and print his name
l <- get_profile(id)
l$name 

# Get his citation history, i.e. citations to his work in a given year 
get_citation_history(id)

# Get his publications (a large data frame)
get_publications(id)

Additional functions allow the user to query the publications list, e.g. get_num_articles, get_num_distinct_journals, get_oldest_article, get_num_top_journals. Note that Google doesn't explicitly categorize publications as journal articles, book chapters, etc., so "journal" or "article" in these function names is just a generic term for a publication.
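To give a feel for what such summaries amount to, the sketch below computes comparable numbers by hand from a small, invented data frame. The column names journal, year and cites are assumptions about the shape of the get_publications output, and the rows are made up for illustration; the package functions themselves take a scholar id and query Google Scholar.

```r
# Invented publication list; the column names are assumed, not guaranteed
pubs <- data.frame(
  title   = c("Paper A", "Paper B", "Paper C", "Paper D"),
  journal = c("Journal X", "Journal Y", "Journal X", "Book Z"),
  year    = c(1999, 2005, NA, 2012),
  cites   = c(120, 45, 7, 3),
  stringsAsFactors = FALSE
)

# Number of publications and of distinct venues ("journals" in the loose sense)
n_articles <- nrow(pubs)
n_journals <- length(unique(pubs$journal))

# Year of the oldest publication, ignoring entries with a missing year
oldest <- min(pubs$year, na.rm = TRUE)

# h-index: the largest h such that h publications have at least h citations
sorted_cites <- sort(pubs$cites, decreasing = TRUE)
h <- sum(sorted_cites >= seq_along(sorted_cites))

c(articles = n_articles, journals = n_journals, oldest = oldest, h = h)
```

Note that "Book Z" is counted as a venue like any other, which mirrors the point above that the publication type is not distinguished.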

Comparing scholars

You can also compare multiple scholars, as shown below. Note that these two particular scholars are rather prolific, so these queries can take a very long time to run.

# Compare Feynman and Stephen Hawking
ids <- c('B7vSqZsAAAAJ', 'qj74uXkAAAAJ')

# Get a data frame comparing the number of citations to their work in
# a given year 
compare_scholars(ids)

# Compare their career trajectories, based on year of first citation
compare_scholar_careers(ids)
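The idea behind comparing career trajectories, aligning each scholar on the year of their first citation, can be sketched offline on invented data. The column names id, year and cites are assumed here, and career_year is computed by hand; this is an illustration of the concept, not the package's implementation.

```r
# Invented yearly citation counts for two hypothetical scholars
hist <- data.frame(
  id    = c("A", "A", "A", "B", "B"),
  year  = c(2000, 2001, 2002, 2010, 2011),
  cites = c(1, 4, 9, 2, 3),
  stringsAsFactors = FALSE
)

# Career year 0 is each scholar's first year with recorded citations
hist$career_year <- hist$year - ave(hist$year, hist$id, FUN = min)

# Running total of citations within each scholar's career
hist$cum_cites <- ave(hist$cites, hist$id, FUN = cumsum)

hist
```

Once both scholars are on the same career_year axis, their curves can be compared directly even though their careers started a decade apart.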

Predicting future h-index values

Users can predict the future h-index of a scholar, based on the method of Acuna et al. Because the method was originally calibrated on data from neuroscientists, results for scholars from other disciplines should be taken with a large pinch of salt. A more general critique of the original paper is available here. Still, it's a bit of fun.

## Predict h-index of original method author, Daniel Acuna
id <- 'GAi23ssAAAAJ'
predict_h_index(id)

Formatting publications for CV

Finally, the format_publications function can be used (e.g., in conjunction with the vitae package) to format publications in APA style. The short name of the author of interest (e.g., the person whose CV is being made) can be highlighted in bold with the author.name argument. The function after the pipe lets rmarkdown render the output properly; the code chunk should be set to results = "asis".

# APA style:
format_publications("NrfwEncAAAAJ", "R Thériault") |> cat(sep='\n\n')

# Numbering format:
format_publications("NrfwEncAAAAJ", "R Thériault") |> print(quote=FALSE)
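As a rough illustration of what highlighting an author means, the sketch below wraps one name in Markdown bold markers within an author string. bold_author is a hypothetical helper, the author list is invented, and the use of ** markers is an assumption; it is not how format_publications is necessarily implemented.

```r
# Hypothetical helper: wrap every literal occurrence of `author.name`
# in Markdown bold markers
bold_author <- function(authors, author.name) {
  # fixed = TRUE treats the name as plain text, not as a regular expression
  gsub(author.name, paste0("**", author.name, "**"), authors, fixed = TRUE)
}

authors <- "A Smith, J Doe, B Jones"
highlighted <- bold_author(authors, "J Doe")

# With results = "asis", cat() emits the raw Markdown for rmarkdown to render
cat(highlighted, sep = "\n\n")
```

Matching on the literal short name is why the author.name argument should be spelled exactly as it appears in the publication list.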
