![DBC](Images/DBC.png)

# GroupedCollabs

**Introduction**

This R notebook creates a **grouped** chord diagram to visualize publication collaborations between coauthors. It creates links between authors and their coauthors from the output CSV file created by the ScholarScraper.ipynb notebook. It also takes another CSV file as input to specify groups. If you do not want a grouped diagram, use ScholarCollabs.ipynb. 

**Installation and Setup**
1. At this point we will assume you have this project loaded in Jupyter and have successfully run ScholarScraper, which has created an output CSV file with author data. If this is not the case, go to the ScholarScraper.ipynb file and follow the instructions to set up and run it.

2. Ensure that the CSV file created by ScholarScraper is in the same directory as this notebook. Make sure it has a column labeled 'Name' and a column labeled 'Coauthors'. 

3. Create a CSV file for the groupings. The first column should contain the author names (the notebook drops this column when it loads the file). Each remaining column header is a group name, with the names of that group's authors listed beneath it.
![Groupings](Images/ExampleGroupCSV.png)

4. Set ss_output_file, author_name_file, and group_file to the names of your files. Ensure the files are in the main project directory.

5. Modify the title of the diagram. 

6. Modify "links" depending on whether you want a weighted or non-weighted diagram. 

7. Modify the name of the output PDF file.
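
As a rough illustration of the layout described in step 3, the sketch below writes a small groupings file from R and reads it back. The author names, the group names, and the file name `example_groups.csv` are all made up for illustration. Empty cells are written as `NA` here on the assumption that the notebook's `na.omit()` step should be able to drop them.

```r
# Hypothetical example data: two groups with made-up author names.
groups <- list(
  Imaging   = c("A. Author", "B. Author"),
  Modelling = c("C. Author")
)

# The first column lists every author; the notebook drops it on load.
all_names <- unlist(groups, use.names = FALSE)
n_rows <- length(all_names)

# Pad each group with NA so that all columns have the same length.
padded <- lapply(groups, function(g) c(g, rep(NA, n_rows - length(g))))

example_groups <- data.frame(Name = all_names, padded, stringsAsFactors = FALSE)

# Hypothetical output location; point this at your project directory instead.
path <- file.path(tempdir(), "example_groups.csv")
write.csv(example_groups, path, row.names = FALSE)

# Read the file back to confirm the group names end up as column headers.
check <- read.csv(path, stringsAsFactors = FALSE)
print(colnames(check)[-1])
```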



Install and load libraries

In [1]:
library(tidyverse)
library(viridis)
devtools::install_github("thomasp85/patchwork")
devtools::install_github("jokergoo/circlize")
install.packages("RColorBrewer")
library(patchwork)
library(circlize)
library(RColorBrewer)



“running command 'timedatectl' had status 1”
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.5     ✔ purrr   0.3.4
✔ tibble  3.1.6     ✔ dplyr   1.0.7
✔ tidyr   1.1.3     ✔ stringr 1.4.0
✔ readr   1.4.0     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Loading required package: viridisLite

Skipping install of 'patchwork' from a github remote, the SHA1 (79223d30) has not changed since last install.
  Use `force = TRUE` to force installation

Skipping install of 'circlize' from a github remote, the SHA1 (14116da5) has not changed since last install.
  Use `fo
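
The install calls above run every time the cell is executed. One possible alternative, sketched below, is to check first which packages are missing. The helper name `missing_packages` is hypothetical, and the commented-out install line assumes the CRAN releases of patchwork and circlize are acceptable in place of the GitHub versions used above.

```r
# Return the subset of `pkgs` that cannot be loaded in this R session.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("tidyverse", "viridis", "patchwork", "circlize", "RColorBrewer")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install whatever is missing (assumes CRAN releases are fine):
# if (length(to_install) > 0) install.packages(to_install)
```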

Define the names of the data file (the CSV created by ScholarScraper), the investigator names CSV file, and the group CSV file

In [21]:
ss_output_file = "ss_output_data.csv"
author_name_file = "DBC Investigators.csv"
group_file = "DBC-Investigator-Faculty-Groups.csv"

Define the title and colors you want to use. 

View color options [here](http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf).

In [2]:
title = "Dynamic Brain Circuits"

c_pallete = c("red","green","blue","cyan","magenta")

Load in collaboration data and investigator names


In [15]:
df_collab = read.csv(ss_output_file)
names_l = read.csv(author_name_file)

Tidy the dataframe

In [None]:
# keep only the columns we need from the ScholarScraper output
df_collab = subset(df_collab, select = c(Name, Coauthors))

# drop rows in which every column is NA (rows with at least one value are kept)
df_collab = df_collab[rowSums(is.na(df_collab)) != ncol(df_collab), ]
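
To make the row filter above concrete, here is a small sketch on made-up data. The data frame `toy` and its contents are hypothetical; only the filtering expression mirrors the cell above.

```r
# Hypothetical data: row 2 is entirely NA, row 3 is only partly NA.
toy <- data.frame(
  Name      = c("A. Author", NA, "B. Author"),
  Coauthors = c("{'B. Author': 2}", NA, NA),
  stringsAsFactors = FALSE
)

# Same expression as in the notebook: keep a row unless all of its cells are NA.
kept <- toy[rowSums(is.na(toy)) != ncol(toy), ]
print(kept)
```

Only the all-NA row is removed. The row for "B. Author" survives even though its Coauthors cell is NA, which is why the edge-list loop still checks `is.na()` on Coauthors.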

Setup the links

In [15]:
##### create an edge list using for loop. ####
origin = c()
destination = c()
count = c()

for (i in seq_len(nrow(df_collab))) {
  x = df_collab$Name[i]
  for (n in seq_len(nrow(df_collab))) {
    if (!is.na(df_collab$Coauthors[n])) {
      if (str_detect(df_collab$Coauthors[n], x)) {
        origin = append(origin, df_collab$Name[n])
        destination = append(destination, x)
        # extract the digits that follow "<author name>': " in df_collab$Coauthors[n]
        count = append(count, strtoi(
          str_extract(df_collab$Coauthors[n], paste("(?<=", x, "\': )\\d+", sep = ""))))
      }
    }
  }
}

edge_l = data.frame(origin, destination, count)
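# clean up the edge list: drop exact duplicate rows, then drop any row whose
# sorted values (both names plus the count) repeat an earlier row, i.e. the
# same pair listed in the opposite direction with the same count
edge_l = unique(edge_l)
edge_l$temp = apply(edge_l, 1, function(x) paste(sort(x), collapse=""))
edge_l = edge_l[!duplicated(edge_l$temp), 1:3]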
                    
# create an adjacency list. 
adjacencyData = data.frame(with(edge_l, table(origin, destination)))
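
The count extraction in the loop above can be tried in isolation. The sketch below uses base R and assumes a Coauthors cell looks like `{'B. Author': 3, 'C. Author': 1}`. That format is inferred from the regular expression in the loop, not confirmed by the ScholarScraper output, and the names are made up.

```r
# Hypothetical Coauthors cell; the real format may differ.
coauthors <- "{'B. Author': 3, 'C. Author': 1}"

# Extract the digits that directly follow "<name>': " using a lookbehind.
extract_count <- function(cell, name) {
  pattern <- paste0("(?<=", name, "': )\\d+")
  hit <- regmatches(cell, regexpr(pattern, cell, perl = TRUE))
  strtoi(hit)
}

print(extract_count(coauthors, "B. Author"))
print(extract_count(coauthors, "C. Author"))
```

Note that the name is used as a regular expression, so a character such as `.` matches any single character rather than a literal period.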

Set up the groupings

In [40]:
# loading in the group names
df_group = read.csv(group_file)

# Setting up group names
Primary_c = colnames(df_group)[-1]

# Removing unnecessary first column (drop = FALSE keeps a data frame even with one group)
df_group <- df_group[, -1, drop = FALSE]

# Creating a group column by pivoting 
df_group = df_group %>%
  pivot_longer(all_of(Primary_c),
               names_to = "group",
               values_to = "name")

# Removing NAs
df_group = na.omit(df_group)

# Assigning a colour to each group. 
color =  c()
                    
for (i in 1:nrow(df_group)) {
  for (j in 1:length(Primary_c)) {
    if(df_group$group[i] == Primary_c[j]) {
      color = append(color,c_pallete[j])
    }
  }
}

# adding the color column to the dataframe.
df_group$color = color

# creating the groupings
group_ind = structure(df_group$group, names = df_group$name)

# creating colors for the groupings
color_ind = structure(df_group$color, names = df_group$name)
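
The nested loop that assigns colours can also be written as a single lookup with `match()`. The sketch below is one possible alternative on made-up data, not the notebook's own method; `group_names`, `palette_in`, and `member_group` are hypothetical stand-ins for `Primary_c`, `c_pallete`, and `df_group$group`.

```r
group_names  <- c("Imaging", "Modelling")             # stands in for Primary_c
palette_in   <- c("red", "green", "blue")             # stands in for c_pallete
member_group <- c("Imaging", "Modelling", "Imaging")  # stands in for df_group$group

# Fail early if there are more groups than colours; otherwise the lookup
# would quietly return NA for the extra groups.
stopifnot(length(palette_in) >= length(group_names))

# match() gives each member's group position, which then indexes the palette.
member_color <- palette_in[match(member_group, group_names)]
print(member_color)
```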

If you want a non-weighted diagram (all links the same width), set links to adjacencyData. If you want a weighted diagram, set links to edge_l.

In [None]:
links <- edge_l
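
To see how the two choices differ, the sketch below builds both tables from a tiny made-up edge list. The names and counts are hypothetical; the `table()` call mirrors the one used to create adjacencyData.

```r
# Hypothetical weighted edge list: one row per collaborating pair.
toy_edges <- data.frame(
  origin      = c("A", "B"),
  destination = c("B", "C"),
  count       = c(3, 1),
  stringsAsFactors = FALSE
)

# Unweighted version: cross-tabulate the pairs, as done for adjacencyData.
toy_adjacency <- data.frame(with(toy_edges, table(origin, destination)))

print(toy_edges)      # link widths come from the count column
print(toy_adjacency)  # Freq is 1 for observed pairs and 0 for unobserved ones
```

The cross-tabulated table lists every origin/destination combination, so it contains rows with a frequency of 0 for pairs that never collaborated.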

Modify the name of the output PDF file, and create the chord diagram

In [46]:

# Modify this!!!
pdf("DBC_collab_diagram_grouped_weighted.pdf") 

# set up the parameters
circos.clear()
circos.par(start.degree = 90,gap.degree = 1, 
           track.margin = c(-0.1, 0.1), 
           points.overflow.warning = FALSE, canvas.xlim = c(-1.3,1.3),
           canvas.ylim = c(-1.3,1.3))
par(mar = c(0,0,2,0),xpd = TRUE, cex.main = 1.5)

# create the chord diagram
chordDiagram(links, group = group_ind,
             grid.col = color_ind,
             transparency = 0.25,
             diffHeight  = -0.04,
             annotationTrack = "grid", 
             annotationTrackHeight = c(0.05, 0.1),
             link.sort = TRUE, 
             link.largest.ontop = FALSE,
             self.link = 1
)
                    

# Add the text and the axis surrounding the diagram.
circos.trackPlotRegion(
  track.index = 1, 
  bg.border = NA, 
  panel.fun = function(x, y) {
    
    xlim = get.cell.meta.data("xlim")
    sector.index = get.cell.meta.data("sector.index")
    
    # Add names to the sector. 
    circos.text(
      x = mean(xlim), 
      y = 5.2, 
      labels = sector.index, 
      facing = "clockwise", 
      niceFacing = TRUE,
      cex = 0.7
    )
    
    # Add graduations (tick marks) to the axis
    circos.axis(
      h = "top", 
      labels.cex = 0.001,
      minor.ticks = 2, 
      major.tick.length = 0.1, 
      labels.niceFacing = FALSE)
      
  }
)

# Add a title
title(title,outer=FALSE)
                    
                    
dev.off()
                    