# Gene set variation analysis (GSVA) on the most upregulated and downregulated genes of each cluster (top/bottom 50 genes)

The positive and negative transcriptional programs of each RNA cluster are defined as its 50 most upregulated and 50 most downregulated genes, ranked by average log2FC among the DEGs that are significant by adjusted p-value. “Bottom 50” means the most downregulated genes, i.e. the first 50 when the DEGs of a cluster are sorted by increasing average log2FC.

In [4]:
suppressWarnings({suppressPackageStartupMessages({
    library(GSVA)
    library(GSEABase)
    library(GSVAdata)
    library(Seurat)
    library(ggsci)
    library(ComplexHeatmap)
    library(oppar)
    library(circlize)
    library(parallel)
    library(org.Hs.eg.db) # Used below by mapIds(); loaded explicitly rather than relying on dependencies
})})

Loading the DEG analysis of the integrated batches (only significant results)

In [1]:
deg <- read.table(file = "All_vs_all_RNAclusters_DEG_signif.txt", sep = "\t")

Selecting the positive and the negative markers of each cluster.

In [3]:
pos_markers <- lapply(sort(unique(deg$cluster)), function(x) deg[deg$cluster == x & deg$avg_log2FC > 0, "gene"])
neg_markers <- lapply(sort(unique(deg$cluster)), function(x) deg[deg$cluster == x & deg$avg_log2FC < 0, "gene"])

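For each cluster, subsetting the DEG table to its positive (or negative) markers, sorting by avg log2FC in decreasing (or increasing) order, and keeping the first 50 genes. The loop index is compared directly with `deg$cluster`, so the clusters are assumed to be labelled 1 to 13.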

In [5]:
positive_transcriptional_programs <- list()
negative_transcriptional_programs <- list()

for(cluster in 1:length(pos_markers)){
    # Getting specific degs for the cluster and positive markers
    deg_specific <- deg[deg$cluster == cluster & deg$gene %in% unlist(pos_markers[cluster]), ]
    
    # Sorting degs by avg log2FC
    deg_specific <- deg_specific[order(deg_specific$avg_log2FC, decreasing = T), ]

    # Selecting the top 50
    positive_transcriptional_programs[[paste0("Positive_transcriptional_program_", cluster)]] <- deg_specific[1:50, "gene"]
}

# This could also be done without re-ordering, by selecting the genes directly from the lists above
for(cluster in 1:length(neg_markers)){
    # Getting specific degs for the cluster and negative markers
    deg_specific <- deg[deg$cluster == cluster & deg$gene %in% unlist(neg_markers[cluster]), ]
    
    # Sorting degs by avg log2FC
    deg_specific <- deg_specific[order(deg_specific$avg_log2FC, decreasing = FALSE), ] # Ascending order: most downregulated first

    # Selecting the 50 most downregulated genes
    negative_transcriptional_programs[[paste0("Negative_transcriptional_program_", cluster)]] <- deg_specific[1:50, "gene"]
}
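
The selection logic of the two loops can be checked on a toy example. The sketch below uses a made-up DEG table (`toy_deg`, with hypothetical gene names and values) and a hypothetical helper `top_n_genes()`; it is an illustration, not the code used for the analysis.

```r
# Toy DEG table: hypothetical genes and fold changes, two clusters
toy_deg <- data.frame(
    gene       = c("A", "B", "C", "D", "E", "F"),
    cluster    = c(1, 1, 1, 1, 2, 2),
    avg_log2FC = c(2.0, -1.5, 0.5, -3.0, 1.0, -0.2),
    stringsAsFactors = FALSE
)

# Hypothetical helper: the n most up- or downregulated genes of one cluster
top_n_genes <- function(deg, cluster, n, direction = c("up", "down")) {
    direction <- match.arg(direction)
    keep <- deg$cluster == cluster &
        (if (direction == "up") deg$avg_log2FC > 0 else deg$avg_log2FC < 0)
    sub <- deg[keep, ]
    sub <- sub[order(sub$avg_log2FC, decreasing = direction == "up"), ]
    # head() returns fewer than n genes instead of padding with NA
    head(sub$gene, n)
}

up_1   <- top_n_genes(toy_deg, cluster = 1, n = 2, direction = "up")   # "A" "C"
down_1 <- top_n_genes(toy_deg, cluster = 1, n = 2, direction = "down") # "D" "B"
```

Indexing with `[1:50, "gene"]`, as in the loops above, pads with `NA` when a cluster has fewer than 50 markers, which keeps all programs the same length for `data.frame()`; `head()` does not.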

Saving the transcriptional programs.

In [6]:
write.table(x = data.frame(positive_transcriptional_programs), 
            file = "TranscriptionalPrograms_top50_withDuplicates.txt", 
            sep = "\t", row.names = F, col.names = T, quote = FALSE)

write.table(x = data.frame(negative_transcriptional_programs), 
            file = "NEGATIVE_TranscriptionalPrograms_top50_withDuplicates.txt", 
            sep = "\t", row.names = F, col.names = T, quote = FALSE)

Loading the scRNA-Seq data.

In [7]:
sc_data <- readRDS(file = "HGSOC_CellHashing_CLUSTERED.RDS")

For GSVA, we need to get the average expression for the RNA clusters (also known as pseudo-bulk).

In [8]:
avg_rna_clusters <- AverageExpression(sc_data,
  group.by = "RNA_clusters",
  slot = "data",
  verbose = TRUE
)

“Exponentiation yielded infinite values. `data` may not be log-normed.”


Focusing on the SCT assay.

In [9]:
avg_rna_clusters <- avg_rna_clusters$SCT
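
As an illustration of what the pseudo-bulk step computes, the sketch below averages a toy log-normalized matrix per cluster in base R. It assumes, as documented for Seurat v4's `AverageExpression()` with `slot = "data"`, that values are back-transformed with `expm1()` before averaging, so the result is in non-log space. The matrix and the cluster labels are made up.

```r
# Toy log-normalized expression: 3 genes x 4 cells (made-up values)
log_expr <- matrix(
    log1p(c(1, 3, 0, 2,
            0, 0, 4, 6,
            5, 5, 5, 5)),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("g1", "g2", "g3"), paste0("cell", 1:4))
)
clusters <- c("1", "1", "2", "2") # cluster label of each cell

# Back-transform to non-log space, then average the cells of each cluster
pseudo_bulk <- sapply(split(seq_along(clusters), clusters), function(idx) {
    rowMeans(expm1(log_expr[, idx, drop = FALSE]))
})
pseudo_bulk
```

The result has one row per gene and one column per cluster, which is the shape that the GSVA step expects.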

For this dot plot heatmap, we pick the most upregulated and downregulated genes defined as positive and negative transcriptional programs, respectively.

In [10]:
positive_transcriptional_programs <- data.frame(positive_transcriptional_programs)
negative_transcriptional_programs <- data.frame(negative_transcriptional_programs)

In [11]:
head(positive_transcriptional_programs)

A data.frame: 6 × 13 (all columns are character)

| | Positive_transcriptional_program_1 | Positive_transcriptional_program_2 | Positive_transcriptional_program_3 | Positive_transcriptional_program_4 | Positive_transcriptional_program_5 | Positive_transcriptional_program_6 | Positive_transcriptional_program_7 | Positive_transcriptional_program_8 | Positive_transcriptional_program_9 | Positive_transcriptional_program_10 | Positive_transcriptional_program_11 | Positive_transcriptional_program_12 | Positive_transcriptional_program_13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | CRIP1 | LTB | PPP1R14A | MMP7 | CADM2 | CPA4 | SNHG25 | DLC1 | SOX2-OT | TEX14 | PLCG2 | CCL20 | TMEM178A |
| 2 | IFI27 | CDH13 | SERPINE1 | CERNA2 | DCXR | PURPL | ALDH3A1 | PLCB4 | LINC00536 | CCDC200 | AL627171.2 | CXCL3 | IGFBP2 |
| 3 | LAMA3 | FRMD6 | ANKRD1 | AL357507.1 | HIST1H4H | AREG | RPS12 | FMN2 | LINC00326 | AC007952.4 | RPS20 | CXCL8 | CLIC5 |
| 4 | BST2 | PLCB4 | EDN1 | TNFSF10 | TXNRD1 | SCHLAP1 | RPL39 | AC092445.1 | AC011287.1 | FOS | RPL36A | ATF3 | LYPD6B |
| 5 | SLPI | TMEM178B | IGFBP3 | SOX6 | PEG10 | AL356804.1 | FTL | SAMD3 | LINC01208 | EGR1 | MTRNR2L12 | SNHG12 | CXCL1 |
| 6 | ISG15 | PLAU | MFAP5 | ERBB4 | CCN2 | NCAM1 | RPSA | SVEP1 | DLG2 | LINC00910 | RPL37 | DNAJB1 | PDE1A |


In [12]:
head(negative_transcriptional_programs)

A data.frame: 6 × 13 (all columns are character)

| | Negative_transcriptional_program_1 | Negative_transcriptional_program_2 | Negative_transcriptional_program_3 | Negative_transcriptional_program_4 | Negative_transcriptional_program_5 | Negative_transcriptional_program_6 | Negative_transcriptional_program_7 | Negative_transcriptional_program_8 | Negative_transcriptional_program_9 | Negative_transcriptional_program_10 | Negative_transcriptional_program_11 | Negative_transcriptional_program_12 | Negative_transcriptional_program_13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ANKRD1 | IFI27 | IFI27 | IFI27 | LTB | NNMT | LAMA3 | IFI27 | MT1E | RHEX | MT-CO2 | RHEX | IFI27 |
| 2 | SERPINE1 | CRIP1 | CRIP1 | BST2 | CRYAB | AL357507.1 | FN1 | SLPI | CCN2 | AC024230.1 | PPP1R14A | AL357507.1 | UBB |
| 3 | PPP1R14A | SLPI | BST2 | CRIP1 | CXCL8 | CDH6 | MALAT1 | CRIP1 | PTPRJ | AL357507.1 | MT-CO3 | CDH6 | CRIP1 |
| 4 | IGFBP7 | WFDC2 | IFI6 | IFI6 | CAV1 | UGT2B7 | NEAT1 | BST2 | AREG | FRMD5 | MT-ND3 | PLCB4 | SLPI |
| 5 | SNHG25 | BST2 | ISG15 | SLPI | MMP7 | HMGA2 | RHEX | PPP1R14A | CDH6 | EFNA5 | MT-CYB | AC024230.1 | BST2 |
| 6 | AL357507.1 | RHEX | SLPI | ISG15 | CRIP1 | RHEX | ERBB4 | RHEX | MT1X | MAML2 | MT-ATP6 | PTPRJ | KRT19 |


Creating a vector with all the genes.

In [13]:
positive_genes_programs <- unlist(positive_transcriptional_programs, use.names = FALSE)
negative_genes_programs <- unlist(negative_transcriptional_programs, use.names = FALSE)
genes_programs <- c(positive_genes_programs, negative_genes_programs)

Removing duplicates.

In [14]:
genes_programs <- unique(genes_programs)

Filtering the pseudobulk for the genes in the transcriptional programs.

In [15]:
avg_rna_clusters_transcriptional_programs <- avg_rna_clusters[genes_programs, ]

Loading the GO biological process (GO:BP) gene sets, downloaded from MSigDB on March 24th, 2023.

In [16]:
go_bp <- getGmt("c5.go.bp.v2023.1.Hs.entrez.gmt", geneIdType = EntrezIdentifier(), 
                    collectionType = BroadCollection(category = "c5"))

Since the gene sets are defined by Entrez IDs, the gene symbols in the row names of the matrix need to be converted to Entrez IDs.

In [17]:
master_gene_table <- mapIds(org.Hs.eg.db, 
                            keys = rownames(avg_rna_clusters_transcriptional_programs), 
                            keytype = "SYMBOL", column = "ENTREZID")
master_gene_table <- as.data.frame(master_gene_table)

'select()' returned 1:1 mapping between keys and columns



Removing NAs and keeping in the average only the genes with an entrez ID.

In [18]:
master_gene_table <- na.omit(master_gene_table)
avg_rna_clusters_no_nas <- avg_rna_clusters_transcriptional_programs[rownames(master_gene_table), ]

Substituting gene symbols with entrez IDs.

In [19]:
rownames(avg_rna_clusters_no_nas) <- master_gene_table$master_gene_table
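
The three mapping steps above (map, drop unmapped symbols, rename rows) can be traced on a toy matrix. In the sketch below, `toy_map` is hand-written and stands in for the output of `mapIds()`, and `NOT_A_GENE` is a made-up symbol with no Entrez ID. The duplicate check at the end is an extra safeguard that is not part of the original workflow.

```r
# Toy pseudo-bulk matrix: genes x clusters (made-up values)
toy_avg <- matrix(1:6, nrow = 3,
                  dimnames = list(c("TP53", "NOT_A_GENE", "EGFR"), c("1", "2")))

# Stand-in for mapIds(): named character vector, NA for unmapped symbols
toy_map <- c(TP53 = "7157", NOT_A_GENE = NA, EGFR = "1956")

# Keep only the symbols with an Entrez ID
toy_table <- na.omit(as.data.frame(toy_map))
toy_avg_no_nas <- toy_avg[rownames(toy_table), , drop = FALSE]

# Replace gene symbols with Entrez IDs
rownames(toy_avg_no_nas) <- toy_table$toy_map

# Extra check (assumption): two symbols mapping to one ID would give duplicated row names
stopifnot(!anyDuplicated(rownames(toy_avg_no_nas)))
toy_avg_no_nas
```

The unmapped row is dropped before renaming, so the remaining rows keep their values and only their labels change.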

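Running GSVA on the GO:BP gene sets with 1000 bootstraps. This is a very long computation. The `no.bootstraps` argument and the `es.obs` and `bootstrap` fields used later are not part of the current `GSVA::gsva()` interface; since `oppar` is loaded after `GSVA`, the call below most likely resolves to the `gsva()` exported by `oppar`.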

In [20]:
go_bp_results_gsva <- gsva(expr = avg_rna_clusters_no_nas,
                           method = "gsva",
                           no.bootstraps = 1000,
                           gset.idx.list = go_bp,
                           min.sz = 10, # GO:BP has many overlapping sets; a higher minimum size drops small sets and speeds up the run
                           max.sz = 500)

Estimating GSVA scores for 834 gene sets.
Computing observed enrichment scores
Estimating ECDFs in microarray data with Gaussian kernels
Using parallel with 96 cores
Computing bootstrap enrichment scores
Sequential bootstrap...
Estimating ECDFs in microarray data with Gaussian kernels
Using parallel with 96 cores
[... repeated for the remaining bootstrap iterations ...]

Obtaining the GO terms that are most variable by calculating the coefficient of variation across the 13 Leiden clusters.

In [21]:
compute_cv <- function(x) sd(x) / mean(x)
go_bp_cv <- apply(go_bp_results_gsva$es.obs, 1, compute_cv)

Selecting the 5% most variable GO terms, i.e. those whose coefficient of variation ranks in the top 5%.

In [22]:
go_bp_results_gsva_variable <- go_bp_results_gsva$es.obs[rank(go_bp_cv) / length(go_bp_cv) > 1 - 0.05, ]
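
The coefficient-of-variation ranking can be reproduced on a toy score matrix to see which rows the filter keeps. The values below are made up and `top_fraction` is a hypothetical name. GSVA scores can be negative, so the mean in the denominator can be close to zero or negative; the sketch makes that visible rather than correcting it.

```r
compute_cv <- function(x) sd(x) / mean(x)

# Toy scores: 4 gene sets x 3 clusters (made-up values)
toy_scores <- rbind(
    set_flat     = c(0.50, 0.50, 0.50),   # no variation, CV = 0
    set_variable = c(0.10, 0.50, 0.90),   # positive mean, CV = 0.8
    set_mild     = c(0.40, 0.50, 0.60),   # positive mean, CV = 0.2
    set_negative = c(-0.90, -0.50, -0.10) # negative mean, CV = -0.8
)

toy_cv <- apply(toy_scores, 1, compute_cv)

# Keep the rows whose CV rank is in the top fraction (0.25 here, 0.05 in the analysis)
top_fraction <- 0.25
selected <- toy_scores[rank(toy_cv) / length(toy_cv) > 1 - top_fraction, , drop = FALSE]
rownames(selected)
```

Here `set_negative` varies as much as `set_variable` but gets the lowest rank because its mean is negative, so only `set_variable` is kept.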

In [23]:
nrow(go_bp_results_gsva_variable)

Producing the dot plot heatmap.

In [24]:
bootstrap_pval_go_gp <- go_bp_results_gsva$bootstrap$p.vals.sign
bootstrap_pval_go_gp <- bootstrap_pval_go_gp[rownames(go_bp_results_gsva_variable), ]

options(repr.plot.width = 10, repr.plot.height = 30)
col_fun = circlize::colorRamp2(c(-1, 0, 1), colors = c("#041562", "white", "#DA1212"))
# Each cell is drawn as a circle: the fill encodes the GSVA score, the radius scales with -log10(bootstrap p-value)
cell_fun = function(j, i, x, y, w, h, fill){
          grid.rect(x = x, y = y, width = w, height = h, 
                    gp = gpar(col = NA, fill = NA))
          grid.circle(x=x, y=y, r = 0.5*min(unit.c(w, h))*(-log10(bootstrap_pval_go_gp[i, j])), # 0.5 scales the radius so that -log10(p) = 1 fills the cell
                      gp = gpar(fill = col_fun(go_bp_results_gsva_variable[i, j]), lty = 1,
                                col = "black"))}

pdf(file = "GSVA_DotPlot_Heatmap_5percent_most_variable_GOBP.pdf", width = 20, height = 20)
draw(Heatmap(matrix = go_bp_results_gsva_variable, 
        row_labels = sapply(rownames(go_bp_results_gsva_variable), 
                            function(x) gsub(x = x, pattern = "_", replacement = " ")),
        cluster_columns = T,
        cluster_rows = T,
        top_annotation = HeatmapAnnotation(Cluster = colnames(go_bp_results_gsva_variable),
                                          col = list(Cluster = setNames(object = c(pal_npg()(10), pal_nejm()(3)),
                                                                       nm = colnames(go_bp_results_gsva_variable))), 
                                           show_legend = TRUE),
        show_column_names = FALSE, 
        border = TRUE,
        col = col_fun,
        cell_fun = cell_fun,
        name = "GSVA score", 
        rect_gp = gpar(type = "none"),
        width = unit(0.8, "cm")*10, 
        height = unit(0.8, "cm")*nrow(go_bp_results_gsva_variable), heatmap_legend_param = list(
               legend_direction = "horizontal", 
               legend_width = unit(3, "cm"))), 
     heatmap_legend_side = "bottom", annotation_legend_side = "bottom")
dev.off()
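
In the `cell_fun` above, the circle radius is `0.5 * min(w, h) * -log10(p)`, so a p-value of 0.1 gives a circle that exactly fills the cell and smaller p-values give larger circles. The sketch below only tabulates that relationship for a few made-up p-values. `relative_radius()` is a hypothetical helper, and the capped variant is one possible alternative, not what the notebook does.

```r
# Radius as a fraction of the smaller cell side, as in cell_fun: 0.5 * -log10(p)
relative_radius <- function(p) 0.5 * -log10(p)

# Hypothetical alternative: cap -log10(p) at max_log so the circle never exceeds the cell
relative_radius_capped <- function(p, max_log = 3) {
    0.5 * pmin(-log10(p), max_log) / max_log
}

p_values <- c(1, 0.1, 0.01, 0.001) # made-up bootstrap p-values
radius_table <- data.frame(
    p      = p_values,
    radius = relative_radius(p_values),       # 0, 0.5, 1, 1.5
    capped = relative_radius_capped(p_values) # 0, 1/6, 1/3, 0.5
)
radius_table
```

A relative radius of 0.5 corresponds to a circle whose diameter equals the smaller side of the cell.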

Saving the GSVA scores and the bootstrapped p-values of the selected GO terms.

In [25]:
write.table(x = go_bp_results_gsva_variable, file = "GSVA_data_heatmap_0.05.txt", sep = "\t", quote = F)
write.table(x = bootstrap_pval_go_gp, file = "BootstrappedPvalues_GSVA_data_heatmap_0.05.txt", sep = "\t", quote = F)

In [26]:
sessionInfo()

R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Rocky Linux 8.8 (Green Obsidian)

Matrix products: default
BLAS/LAPACK: /homedir01/adini22/.conda/envs/cellhashing_analyses/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] parallel  grid      stats4    stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] circlize_0.4.15       oppar_0.99.8          ComplexHeatmap_2.14.0
 [4] ggsci_3.0.0           SeuratObject_4.1.3    Seurat_4.3.0.9002    
 [7] GSVAdata_1.34.0       hgu95a.db_3.13.0      org.Hs.eg.db_3.16.0  
[10] GSEABase_1.60.0     