![DBC](Images/DBC.png)

# Scholar Metrics Scraper: ScholarCollabs notebook

**Introduction**

This R notebook creates a chord diagram to visualize publication collaborations between coauthors. It builds links between each author and their coauthors from the CSV file produced by the ScholarScraper.ipynb notebook, so run ScholarScraper first. This notebook does not group the authors into subgroups; for that, use GroupedCollabs.ipynb.

**Installation and Setup**
1. This notebook assumes you have loaded this project in Jupyter and have successfully run the ScholarScraper notebook, which creates an output CSV file with author data. If not, open ScholarScraper.ipynb and follow its instructions to set it up and run it.

2. Ensure that the CSV file created by ScholarScraper is in the same directory as this notebook. Make sure it has a column labeled 'Name' and a column labeled 'Coauthors'.
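Before running the remaining steps, you can check that the CSV meets these requirements. The sketch below uses a hypothetical helper, `check_ss_output()`, and an invented two-row CSV; the Python-dict-style format shown for the `Coauthors` column (`{'Name': count}`) is an assumption inferred from the regular expression used in step 6, not something the ScholarScraper documentation confirms.

```r
# Hypothetical helper (not part of the original notebook): stop early with a
# clear message if the ScholarScraper output lacks the columns used here.
check_ss_output <- function(df, required = c("Name", "Coauthors")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("CSV is missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage with a small in-memory CSV standing in for the real output file
# (assumed Coauthors format; names and counts are invented)
toy_csv <- "Name,Coauthors
Jane Doe,\"{'John Roe': 2}\"
John Roe,\"{'Jane Doe': 2}\""
toy_df <- read.csv(text = toy_csv, stringsAsFactors = FALSE)
check_ss_output(toy_df)
```

In the notebook itself you would call `check_ss_output(read.csv(ss_output_file))` right after loading the data.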





**Steps**

1. Install and load libraries

In [None]:
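### Uncomment and install packages if necessary ###
# install.packages(c("tidyverse", "viridis", "RColorBrewer"))
# patchwork and circlize are also available from CRAN via install.packages()
# devtools::install_github("thomasp85/patchwork")
# devtools::install_github("jokergoo/circlize")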

In [2]:
sessionInfo()

R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.6.3  IRdisplay_0.7.0 pbdZMQ_0.3-3    tools_3.6.3    
 [5] htmltools_0.3.6 base64enc_0.1-3 crayon_1.3.4    Rcpp_1.0.1     
 [9] uuid_0.1-2      IRkernel_1.3.2  jsonlite_1.6    digest_0.6.18  
[13] repr_0.19.2     evaluate_0.21  

In [None]:
library(tidyverse)
library(viridis)
library(patchwork)
library(circlize)
library(RColorBrewer)

2. Define the name of the data file (the CSV created by ScholarScraper).


In [None]:
# !!! Modify this to match the name of the CSV file that was created as output from the ScholarScraper notebook
ss_output_file = "ss_output_data.csv"

3. Define the title, the colors, and whether you want to create a weighted or a non-weighted diagram.

    View the available RColorBrewer palettes [here](https://www.r-graph-gallery.com/38-rcolorbrewers-palettes.html)

In [None]:
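# !!! Modify the diagram title
title = "Google Scholar Dynamic Brain Circuits Collaborations"

# !!! Modify the color palette. brewer.pal(n, name) returns n colors from the
# named palette; n cannot exceed that palette's maximum (12 for "Paired").
c_pallete <- brewer.pal(12, "Paired")

# !!! Modify this - set to TRUE for a weighted diagram (link width based on the
# coauthor counts in the CSV) or FALSE for a non-weighted diagram (equal widths).
weighted = TRUE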
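To see what the `weighted` switch changes, here is a sketch on a small invented edge list shaped like the `edge_l` data frame built in step 6 (the names and counts are made up). In weighted mode the third column sets each link's width; in non-weighted mode `table()` counts each origin/destination pair, and because each pair appears at most once in the edge list, every linked pair gets `Freq = 1` and every unlinked combination gets `Freq = 0`.

```r
# Toy edge list (invented names and counts) in the same shape as edge_l
toy_edges <- data.frame(
  origin      = c("Jane Doe", "Jane Doe", "John Roe"),
  destination = c("John Roe", "Ann Poe", "Ann Poe"),
  count       = c(5, 1, 2),
  stringsAsFactors = FALSE
)

# Weighted: the third column (count) sets each link's width
weighted_links <- toy_edges

# Non-weighted: one row per origin/destination combination, with Freq = 1
# for linked pairs and Freq = 0 for combinations that never occur
unweighted_links <- data.frame(with(toy_edges, table(origin, destination)))
print(unweighted_links)
```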

4. Load the collaboration data.


In [None]:
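# read names as character strings rather than factors (the default before R 4.0)
df_collab = read.csv(ss_output_file, stringsAsFactors = FALSE)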

5. Tidy the dataframe

In [None]:
# keep only the columns this notebook uses
df_collab = subset(df_collab, select = c(Name, Coauthors))

# drop rows in which every column is NA (rows with a name but no coauthors are kept)
df_collab = df_collab[rowSums(is.na(df_collab)) != ncol(df_collab), ]
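The filter in step 5 removes only rows that are entirely empty; a row that has a name but no coauthors survives, which is why the loop in step 6 checks `is.na()` on the `Coauthors` column. The sketch below uses invented rows to contrast it with a stricter `complete.cases()` filter, which the notebook does not use.

```r
# Invented example rows: row 2 is entirely NA, row 3 only lacks Coauthors
toy <- data.frame(
  Name      = c("Jane Doe", NA, "John Roe"),
  Coauthors = c("{'John Roe': 2}", NA, NA),
  stringsAsFactors = FALSE
)

# Same filter as step 5: keep rows where not every column is NA
kept <- toy[rowSums(is.na(toy)) != ncol(toy), ]
print(kept$Name)

# Stricter alternative: drop any row that contains an NA
strict <- toy[complete.cases(toy), ]
print(strict$Name)
```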

6. Set up the links

In [None]:
origin = c()
destination = c()
count = c()

for (i in 1:nrow(df_collab)) {
  x = df_collab$Name[i]
  for (n in 1:nrow(df_collab)) {
    if(is.na(df_collab$Coauthors[n]) == FALSE) {
      if(str_detect(df_collab$Coauthors[n], x) == TRUE) {
        origin = append(origin, df_collab$Name[n])
        destination = append(destination, x)
#       extracts digits that come after author name in df_collab$Collaborators[n]
        count = append(count, strtoi(
            str_extract(df_collab$Coauthors[n],paste("(?<=", df_collab$Name[i], "\': )\\d+", sep=""))))
      }
    }
  }
}

edge_l = data.frame(origin, destination, count)
# cleaning up the edge list by removing duplicates
edge_l = unique(edge_l)
edge_l$temp = apply(edge_l, 1, function(x) paste(sort(x), collapse=""))
edge_l = edge_l[!duplicated(edge_l$temp), 1:3]
                    
 # create an adjacency list. 
adjacencyData = data.frame(with(edge_l, table(origin, destination))) 
                    
# set the links to the edge list for a weighted diagram or adjacency list for non-weighted
if (weighted == TRUE){
    links = edge_l
} else {
    links = adjacencyData
}
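The two key operations in step 6 can be tried in isolation. This sketch uses invented names and an assumed `Coauthors` string format (`'<name>': <count>`, inferred from the notebook's regex). It swaps the stringr `str_extract()` call for the base-R equivalent `regmatches()`/`regexpr(perl = TRUE)` with the same lookbehind, then applies the sorted-key de-duplication to a pair of mirrored edges. `extract_count()` is a hypothetical helper.

```r
# Assumed Coauthors format (inferred from the notebook's regex)
coauthors_of_jane <- "{'John Roe': 3, 'Ann Poe': 1}"

# Hypothetical helper: base-R equivalent of the lookbehind extraction
extract_count <- function(coauthor_string, name) {
  pattern <- paste0("(?<=", name, "': )\\d+")
  m <- regmatches(coauthor_string, regexpr(pattern, coauthor_string, perl = TRUE))
  if (length(m) == 0) NA_integer_ else strtoi(m, base = 10L)
}

extract_count(coauthors_of_jane, "John Roe")  # 3
extract_count(coauthors_of_jane, "Ann Poe")   # 1
extract_count(coauthors_of_jane, "Bob Loe")   # NA (not a coauthor)

# Each collaboration is usually found twice (A lists B and B lists A);
# sorting the row's values into a key keeps one row per unordered pair
toy_edges <- data.frame(
  origin      = c("Jane Doe", "John Roe"),
  destination = c("John Roe", "Jane Doe"),
  count       = c(3, 3),
  stringsAsFactors = FALSE
)
key <- apply(toy_edges, 1, function(x) paste(sort(x), collapse = ""))
toy_edges <- toy_edges[!duplicated(key), ]
nrow(toy_edges)  # 1
```

Because the key includes the count, two mirrored edges with different counts (for example 3 and 4) are both kept. Names containing regex metacharacters such as `.` or `(` would also need escaping before being pasted into the pattern.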

7. Assign a color to each investigator.

In [None]:
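# Cycle through the palette so every investigator gets a color; colors repeat
# when there are more investigators than palette entries (works for any palette length)
color = rep_len(c_pallete, nrow(df_collab))

# adding the color column to the dataframe.
df_collab$color = color

# named vector mapping each investigator's name to their color (used as grid.col)
color_ind = structure(df_collab$color, names = df_collab$Name)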
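The color assignment amounts to cycling through the palette and wrapping around when there are more investigators than colors, then naming each color by investigator. In this sketch the hex codes and names are hard-coded stand-ins for `c_pallete` and `df_collab$Name`. The result is a named vector because `chordDiagram()` matches `grid.col` entries to sectors by name.

```r
# Hard-coded stand-ins for the palette and the investigator names
toy_palette <- c("#A6CEE3", "#1F78B4", "#B2DF8A")
authors <- c("Jane Doe", "John Roe", "Ann Poe", "Bob Loe", "Eve Moe")

# Recycle the palette so every author gets a color
author_colors <- rep_len(toy_palette, length(authors))

# Named vector: sector name -> color
color_ind_toy <- structure(author_colors, names = authors)
color_ind_toy[["Bob Loe"]]  # "#A6CEE3" (the palette wrapped around)
```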

8. Create the chord diagram. Modify the name of the output PDF file. You can make additional optional modifications as well (read the comments below).

In [None]:
# Modify this!!! (name of the output PDF file)
pdf("chord_diagram.pdf")

# set up the circular layout parameters
circos.clear()
circos.par(start.degree = 90, gap.degree = 1,
           track.margin = c(-0.1, 0.1),
           points.overflow.warning = FALSE,
           canvas.xlim = c(-1.3, 1.3),
           canvas.ylim = c(-1.3, 1.3))
par(mar = c(0, 0, 2, 0), xpd = TRUE, cex.main = 1.5)

# create the chord diagram
chordDiagram(links,
             grid.col = color_ind,
             transparency = 0.25,
             diffHeight = -0.04,
             annotationTrack = "grid",
             annotationTrackHeight = c(0.05, 0.1),
             link.sort = TRUE,
             link.largest.ontop = FALSE,
             self.link = 1,
             small.gap = 1,
             big.gap = 1)

# Add the text and the axis surrounding the diagram.
circos.trackPlotRegion(
  track.index = 1,
  bg.border = NA,
  panel.fun = function(x, y) {

    xlim = get.cell.meta.data("xlim")
    sector.index = get.cell.meta.data("sector.index")

    # Add names to the sectors.
    # Change cex to modify the font size of the names, and y to modify the
    # distance between the names and the circle.
    circos.text(
      x = mean(xlim),
      y = 6,
      labels = sector.index,
      facing = "clockwise",
      niceFacing = TRUE,
      cex = 0.7)

    # Add tick marks (graduations) on the axis; labels.cex = 0.001 makes the
    # tick labels effectively invisible
    circos.axis(
      h = "top",
      labels.cex = 0.001,
      minor.ticks = 2,
      major.tick.length = 0.1,
      labels.niceFacing = FALSE)
  }
)

# Add the title (you can modify the title in step 3)
title(title, outer = FALSE)

# close the PDF device so the file is written
dev.off()