![DBC](Images/DBC.png)

# ScholarCollabs

**Introduction**

This R notebook creates a chord diagram that visualizes publication collaborations between coauthors. It builds links between authors and their coauthors from the output CSV file produced by the ScholarScraper.ipynb notebook. This notebook does not divide the authors into sub-groups; for a grouped diagram, use GroupedCollabs.ipynb.

**Installation and Setup**
1. These steps assume you have this project loaded in Jupyter and have successfully run ScholarScraper, which creates an output CSV file with author data. If you have not, open ScholarScraper.ipynb and follow its instructions to set it up and run it.

2. Ensure that the CSV file created by ScholarScraper is in the same directory as this notebook. Make sure it has a column labeled 'Name' and a column labeled 'Coauthors'.

3. Create a CSV file for the groupings. This should contain the author names in the first column, the group names as column names, and the author names under their respective groups.
![Groupings](Images/ExampleGroupCSV.png)

4. Set ss_output_file and author_name_file to the names of your CSV files.

5. Modify the name of the output PDF file.

6. Modify "links" depending on whether you want a weighted or non-weighted diagram. 
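
The column requirement in step 2 can be checked before running the rest of the notebook. The following is a minimal sketch in base R; `check_columns` is a hypothetical helper that is not part of the original notebook, and the toy data frame (including the format of its Coauthors strings) only stands in for the real ScholarScraper output.

```r
# Hypothetical helper (not part of the original notebook): stop early if the
# data frame lacks the columns this notebook relies on.
check_columns <- function(df, required = c("Name", "Coauthors")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy data standing in for read.csv(ss_output_file); the Coauthors format
# shown here is an assumption
toy <- data.frame(
  Name = c("A. Author", "B. Author"),
  Coauthors = c("{'B. Author': 2}", NA),
  stringsAsFactors = FALSE
)

check_columns(toy)  # passes silently when both columns are present
```

With the real data, the call would be `check_columns(read.csv(ss_output_file))`.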



Install and load libraries

In [5]:
# Install packages (only needed the first time; install_github requires devtools)
devtools::install_github("thomasp85/patchwork")
devtools::install_github("jokergoo/circlize")
install.packages("RColorBrewer")

# Load libraries
library(tidyverse)
library(viridis)
library(patchwork)
library(circlize)
library(RColorBrewer)


Downloading GitHub repo thomasp85/patchwork@HEAD

✔  checking for file ‘/tmp/RtmpGXXOx1/remotes34c423df0c1/thomasp85-patchwork-79223d3/DESCRIPTION’ (667ms)
─  preparing ‘patchwork’:
✔  checking DESCRIPTION meta-information
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘patchwork_1.1.0.9000.tar.gz’

Installing package into ‘/home/jupyter/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)

Downloading GitHub repo jokergoo/circlize@HEAD

✔  checking for file ‘/tmp/RtmpGXXOx1/remotes34c51d62f7b/jokergoo-circlize-14116da/DESCRIPTION’ (659ms)
─  preparing ‘circlize’:
✔  checking DESCRIPTION meta-information
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
   Removed empty directory ‘circlize/example’
   Removed empty directory ‘circlize/test’
─  building ‘circlize_0.4.14.tar.gz’

Installing package into ‘/home/jupyter/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)

Installing package into ‘/home/jupyter/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)


Define the name of the data file (the CSV created by ScholarScraper) and the author name CSV file


In [6]:
ss_output_file = "ss_output_data.csv"
author_name_file = "DBC Investigators.csv"

Define the title and the color palette you want to use 

View color palettes: https://www.r-graph-gallery.com/38-rcolorbrewers-palettes.html

In [7]:
title = "Dynamic Brain Circuits"
c_pallete <- brewer.pal(12,"Paired")

Load in collaboration and investigator data


In [8]:
df_collab = read.csv(ss_output_file)
names_l = read.csv(author_name_file)

Assign colors.

In [9]:
color =  c()

for (i in 1:nrow(df_collab)) {
    j=i%%12
    if(j==0){
        j= 12
        }
      color = append(color,c_pallete[j])
}

# Add the color column to the data frame.
df_collab$color = color

# Named vector mapping each author name to a color (passed to grid.col below).
color_ind = structure(df_collab$color, names = df_collab$Name)
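
The color assignment cycles through the palette, so the 13th author reuses the first color. As a minimal sketch of that cycling, the example below uses a hypothetical 3-color palette and made-up author names in place of the 12-color Brewer palette and the real data, and compares explicit modulo indexing with base R's vectorized recycling.

```r
# Hypothetical 3-color palette standing in for the 12-color Brewer palette
demo_palette <- c("red", "green", "blue")
n_authors <- 7

# Explicit modulo indexing: author i gets palette entry ((i - 1) %% k) + 1
k <- length(demo_palette)
idx <- ((seq_len(n_authors) - 1) %% k) + 1
modulo_colors <- demo_palette[idx]

# Equivalent vectorized recycling
recycled_colors <- rep(demo_palette, length.out = n_authors)
stopifnot(identical(modulo_colors, recycled_colors))

# Named vector in the same shape as color_ind (names are illustrative)
demo_names <- paste("Author", seq_len(n_authors))
color_ind_demo <- structure(modulo_colors, names = demo_names)
print(color_ind_demo)
```

With more authors than palette entries, several sectors share a color, which is worth keeping in mind when reading the finished diagram.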

Tidy the dataframe

In [10]:
# Keep only the columns needed for the diagram.
df_collab = subset(df_collab, select = c(Name, Coauthors))

# Drop rows in which every column is NA.
df_collab=df_collab[rowSums(is.na(df_collab)) != ncol(df_collab), ]
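
The row filter above removes a row only when all of its columns are NA, so an author with a name but no coauthor data is kept. A small sketch on toy data (the names and Coauthors strings are made up for illustration) shows how this differs from dropping every row that has any NA:

```r
toy <- data.frame(
  Name = c("A. Author", NA, "C. Author"),
  Coauthors = c("{'C. Author': 1}", NA, NA),
  stringsAsFactors = FALSE
)

# Same filter as the notebook: keep a row unless every column is NA
kept <- toy[rowSums(is.na(toy)) != ncol(toy), ]

# For comparison: drop a row if any column is NA
complete_only <- toy[complete.cases(toy), ]

nrow(kept)           # 2: the all-NA row is gone, "C. Author" stays
nrow(complete_only)  # 1: only the fully populated row remains
```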

Set up the links

In [11]:
origin = c()
destination = c()
count = c()

for (i in seq_len(nrow(df_collab))) {
  x = df_collab$Name[i]
  for (n in seq_len(nrow(df_collab))) {
    if (!is.na(df_collab$Coauthors[n])) {
      # fixed() matches the name literally rather than as a regular expression
      if (str_detect(df_collab$Coauthors[n], fixed(x))) {
        origin = append(origin, df_collab$Name[n])
        destination = append(destination, x)
        # extract the digits that follow the author name in df_collab$Coauthors[n]
        count = append(count, strtoi(
            str_extract(df_collab$Coauthors[n], paste("(?<=", x, "\': )\\d+", sep=""))))
      }
    }
  }
}

edge_l = data.frame(origin, destination, count)
# Clean up the edge list: drop exact duplicates, then keep one row per
# unordered author pair (A-B and B-A are the same collaboration).
edge_l = unique(edge_l)
edge_l$temp = apply(edge_l[, c("origin", "destination")], 1, function(x) paste(sort(x), collapse="|"))
edge_l = edge_l[!duplicated(edge_l$temp), 1:3]

# Create an adjacency list.
adjacencyData = data.frame(with(edge_l, table(origin, destination)))
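
The inner loop above pulls each collaboration count out of the Coauthors string with a lookbehind regular expression. The sketch below reproduces that extraction in base R so it runs without stringr. The Coauthors format shown (a string such as `{'Jane Roe': 3, 'John Doe': 12}`) is an assumption inferred from the pattern, and `extract_count` is a hypothetical helper, not a function from the notebook.

```r
# Hypothetical helper: return the integer that follows "<name>': " in a
# Coauthors string, or NA if the name does not appear in that form.
extract_count <- function(coauthors, name) {
  pattern <- paste0("(?<=", name, "': )\\d+")
  hit <- regmatches(coauthors, regexpr(pattern, coauthors, perl = TRUE))
  if (length(hit) == 0) NA_integer_ else strtoi(hit)
}

# Assumed format of one Coauthors cell
coauthors <- "{'Jane Roe': 3, 'John Doe': 12}"

extract_count(coauthors, "John Doe")  # 12
extract_count(coauthors, "Jane Roe")  # 3
extract_count(coauthors, "Nobody")    # NA
```

Because the name is pasted into the pattern unescaped, a name containing regular-expression metacharacters such as parentheses could fail to match or raise an error; escaping the name first would be a reasonable hardening step.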

If you want a non-weighted diagram (all links the same width), set links to adjacencyData. If you want a weighted diagram (link width scaled by the count column), set links to edge_l.

In [18]:
links <- edge_l
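
To make the difference between the two choices concrete, the sketch below builds both data frames from a made-up three-author edge list. In the weighted form the third column holds the extracted counts; in the non-weighted form it is a 0/1 frequency from the cross-tabulation. This is an illustration on toy data only, and it assumes the third column of the data frame is what sets the link width.

```r
# Toy edge list (authors and counts are made up)
edge_l <- data.frame(
  origin = c("A", "A", "B"),
  destination = c("B", "C", "C"),
  count = c(5L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Non-weighted: cross-tabulate the pairs; Freq is 1 where a link exists, else 0
adjacencyData <- data.frame(with(edge_l, table(origin, destination)))

print(edge_l)         # third column: count (5, 1, 2)
print(adjacencyData)  # third column: Freq (0 or 1) for every origin/destination pair
```

Note that the cross-tabulation also lists pairs with no link (Freq of 0), so adjacencyData can have more rows than edge_l.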

Modify the name of the output PDF file, and create the chord diagram

In [19]:
# Modify the output file name
pdf("DBC_collab_diagram_weighted_NOV29.pdf") 

# set up the parameters
circos.clear()
circos.par(start.degree = 90,gap.degree = 1, 
           track.margin = c(-0.1, 0.1), 
           points.overflow.warning = FALSE, canvas.xlim = c(-1.3,1.3),
           canvas.ylim = c(-1.3,1.3))
par(mar = c(0,0,2,0),xpd = TRUE, cex.main = 1.5)

# create the chord diagram
chordDiagram(links,
             grid.col = color_ind,
             transparency = 0.25,
             diffHeight = -0.04,
             annotationTrack = "grid",
             annotationTrackHeight = c(0.05, 0.1),
             link.sort = TRUE,
             link.largest.ontop = FALSE,
             self.link = 1,
             small.gap = 1,
             big.gap = 1
)
                    

# Add the text and the axis surrounding the diagram.
circos.trackPlotRegion(
  track.index = 1, 
  bg.border = NA, 
  panel.fun = function(x, y) {
    
    xlim = get.cell.meta.data("xlim")
    sector.index = get.cell.meta.data("sector.index")
    
    # Add names to the sector. 
    circos.text(
      x = mean(xlim), 
      y = 5.2, 
      labels = sector.index, 
      facing = "clockwise", 
      niceFacing = TRUE,
      cex = 0.7
    )
    
    # Add tick marks on the axis (labels.cex is near zero, so tick labels are effectively hidden)
    circos.axis(
      h = "top", 
      labels.cex = 0.001,
      minor.ticks = 2, 
      major.tick.length = 0.1, 
      labels.niceFacing = FALSE)
      
  }
)

# Add a title
title(title,outer=FALSE)
                    
                    
dev.off()
                    