# 2D UMAPs of the individual models

Each model is processed as its own batch: the data are re-normalized with SCTransform, but the percentages of mitochondrial (MT) and ribosomal (RB) features and the cell cycle scores are kept as previously computed rather than recalculated.

In [1]:
suppressWarnings({suppressMessages({
    library(ggplot2)
    library(Seurat)
    library(readxl)
    library(AnnotationHub)
    library(ensembldb)
    library(tidyr)
    library(dplyr)
    library(viridis)
})})

Loading the data.

In [2]:
hgsoc <- readRDS("HGSOC_CellHashing_CLUSTERED.RDS")

First, we subset the object by model. Each subset is then re-normalized with SCTransform using the same approach as before, followed by PCA and UMAP. No Harmony integration is applied this time, because each model is analyzed on its own, as in its original sequencing batch. This lets us see how the different mechanisms of action (MoAs) separate within model-specific UMAPs.
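
The per-model steps described above could also be wrapped in a single helper, which avoids repeating the same call for each model. The following is only a sketch: `renormalize_model` and its `seed` argument are hypothetical names that do not appear in the notebook, and the settings simply mirror those used in the cells below.

```r
# Hypothetical helper (not part of the original notebook): re-normalize one
# model on its own and compute PCA and UMAP with the notebook's settings.
# Seurat functions are called via `Seurat::`, so defining the helper does not
# require the package to be attached.
renormalize_model <- function(object, model_name, seed = 1) {
  # Keep only the cells that belong to the requested model
  cells <- colnames(object)[object$model == model_name]
  obj <- subset(object, cells = cells)

  # Work on the raw counts
  Seurat::DefaultAssay(obj) <- "RNA"

  obj <- Seurat::SCTransform(
    obj,
    vars.to.regress = c("percent.rb", "percent.mt", "nFeature_RNA",
                        "nCount_RNA", "S.Score", "G2M.Score"),
    method = "glmGamPoi",
    return.only.var.genes = FALSE,
    variable.features.n = 2000,
    vst.flavor = "v2",
    verbose = FALSE
  )

  set.seed(seed)
  obj <- Seurat::RunPCA(obj, assay = "SCT", verbose = FALSE)
  set.seed(seed)
  Seurat::RunUMAP(obj, reduction = "pca", dims = 1:30,
                  n.neighbors = 5, min.dist = 0.5, verbose = FALSE)
}

# Usage (requires the clustered object `hgsoc` and an installed Seurat):
# models <- c("JHOS2", "PDC1", "PDC2")
# per_model <- lapply(setNames(models, models), renormalize_model, object = hgsoc)
```

Keeping the three objects in a named list would also let the later annotation and plotting steps be applied with `lapply()`, although the notebook keeps them as separate variables.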

In [3]:
JHOS2 <- subset(hgsoc, subset = model == "JHOS2")
PDC1 <- subset(hgsoc, subset = model == "PDC1")
PDC2 <- subset(hgsoc, subset = model == "PDC2")

The default assay must be RNA to work on the raw counts.

In [4]:
DefaultAssay(JHOS2) <- "RNA"
DefaultAssay(PDC1) <- "RNA"
DefaultAssay(PDC2) <- "RNA"

In [5]:
JHOS2 <- SCTransform(JHOS2,
                     vars.to.regress = c("percent.rb",
                                         "percent.mt",
                                         "nFeature_RNA",
                                         "nCount_RNA",
                                         "S.Score",
                                         "G2M.Score"),
                     method = "glmGamPoi",
                     return.only.var.genes = FALSE,
                     variable.features.n = 2000,
                     vst.flavor = "v2", verbose = FALSE)

In [7]:
PDC1 <- SCTransform(PDC1,
                    vars.to.regress = c("percent.rb",
                                        "percent.mt",
                                        "nFeature_RNA",
                                        "nCount_RNA",
                                        "S.Score",
                                        "G2M.Score"),
                    method = "glmGamPoi",
                    return.only.var.genes = FALSE,
                    variable.features.n = 2000,
                    vst.flavor = "v2", verbose = FALSE)

In [8]:
PDC2 <- SCTransform(PDC2,
                    vars.to.regress = c("percent.rb",
                                        "percent.mt",
                                        "nFeature_RNA",
                                        "nCount_RNA",
                                        "S.Score",
                                        "G2M.Score"),
                    method = "glmGamPoi",
                    return.only.var.genes = FALSE,
                    variable.features.n = 2000,
                    vst.flavor = "v2", verbose = FALSE)

In [9]:
set.seed(1)
# Run PCA on the SCT assay (the argument is `assay`, not `assay.use`)
JHOS2 <- RunPCA(JHOS2, verbose = FALSE, assay = "SCT")
PDC1 <- RunPCA(PDC1, verbose = FALSE, assay = "SCT")
PDC2 <- RunPCA(PDC2, verbose = FALSE, assay = "SCT")

In [10]:
set.seed(1)
JHOS2 <- RunUMAP(JHOS2, reduction = "pca", dims = 1:30, n.neighbors = 5, min.dist = 0.5, verbose = FALSE)
PDC1 <- RunUMAP(PDC1, reduction = "pca", dims = 1:30, n.neighbors = 5, min.dist = 0.5, verbose = FALSE)
PDC2 <- RunUMAP(PDC2, reduction = "pca", dims = 1:30, n.neighbors = 5, min.dist = 0.5, verbose = FALSE)

"The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session"


In [11]:
high_anno <- as.data.frame(read_xlsx(path = "mechanisms_of_actions.xlsx", sheet = 1))
high_anno <- high_anno[, c(1:3)]
rownames(high_anno) <- high_anno$`Preferred name`

[1m[22mNew names:
[36m•[39m `` -> `...4`
[36m•[39m `` -> `...5`


In [12]:
custom_palette <- c("Bcl-2 inhibitor" = "#2f4f4f", 
                    "BET inhibitor" = "#a52a2a", 
                    "Control" = "#bcbec0",
                    "CDK inhibitor" = "#228b22",
                    "CHK1 inhibitor" = "#4b0082",
                    "HDAC inhibitor" = "#ff8c00",
                    "IAPs/SMAC mimetic" = "#d2b48c",
                    "Multi-kinase inhibitor" = "#00ff00",
                    "PAK inhibitor" = "#00bfff",
                    "PARP inhibitor" = "#0000ff",
                    "PI3K/mTOR/AKT inhibitor" = "#ff1493",
                    "PLK1 inhibitor" = "#ffff54",
                    "Ras/Raf/MEK/ERK inhibitor" = "#dda0dd",
                    "XPO1/CRM1 inhibitor" = "#7fffd4")

In [13]:
# Look up each cell's higher-level MoA class by treatment name (the row names of high_anno).
# as.character() ensures rows are matched by name even if Treatment_group is a factor.
JHOS2@meta.data$higher_mechanism_class <- high_anno[as.character(JHOS2@meta.data$Treatment_group),
                                                    "Higher level classification"]
PDC1@meta.data$higher_mechanism_class <- high_anno[as.character(PDC1@meta.data$Treatment_group),
                                                   "Higher level classification"]
PDC2@meta.data$higher_mechanism_class <- high_anno[as.character(PDC2@meta.data$Treatment_group),
                                                   "Higher level classification"]
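
The row-name lookup used to annotate the cells can be illustrated on toy data. This is a minimal sketch: the compound names (`DrugA`, `DrugB`, `Unknown`) and the `toy_anno` table are made up for illustration and are not taken from `mechanisms_of_actions.xlsx`. It assumes the treatment labels might be stored as a factor, in which case converting to character keeps the lookup name-based.

```r
# Toy annotation table; compound names and class assignments are invented
toy_anno <- data.frame(
  `Preferred name` = c("DMSO", "DrugA", "DrugB"),
  `Higher level classification` = c("Control", "PARP inhibitor", "HDAC inhibitor"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
rownames(toy_anno) <- toy_anno$`Preferred name`

# Per-cell treatment labels, here stored as a factor
treatment <- factor(c("DrugB", "DMSO", "DrugA", "DrugB", "Unknown"))

# Index rows by name; as.character() avoids indexing by the factor's integer codes
moa <- toy_anno[as.character(treatment), "Higher level classification"]

# Treatments absent from the table come back as NA and can be listed for review
missing <- unique(as.character(treatment)[is.na(moa)])
```

Listing the unmatched treatments before plotting is a cheap way to catch spelling differences between the metadata and the annotation sheet.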

In [15]:
Idents(JHOS2) <- "higher_mechanism_class"
Idents(PDC1) <- "higher_mechanism_class"
Idents(PDC2) <- "higher_mechanism_class"
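# Put "Control" first, followed by the remaining classes in alphabetical order.
# This assumes all three models contain the same set of classes.
moa_levels <- c("Control", setdiff(sort(levels(JHOS2)), "Control"))
levels(JHOS2) <- levels(PDC1) <- levels(PDC2) <- moa_levels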
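
The level ordering, and its relationship to the named colour palette, can be sketched in base R. The example below is an illustration under assumptions: `classes` and `toy_palette` are small stand-ins (with colours copied from `custom_palette`), not the notebook's actual objects.

```r
# Classes as they might appear in one model
classes <- c("PARP inhibitor", "Control", "BET inhibitor", "HDAC inhibitor")

# "Control" first, then the remaining classes alphabetically
moa_levels <- c("Control", setdiff(sort(unique(classes)), "Control"))

# Subset of the notebook's palette, enough for this toy example
toy_palette <- c("BET inhibitor" = "#a52a2a", "Control" = "#bcbec0",
                 "HDAC inhibitor" = "#ff8c00", "PARP inhibitor" = "#0000ff")

# Every level should have a colour, otherwise some groups could not be matched by name
uncoloured <- setdiff(moa_levels, names(toy_palette))
stopifnot(length(uncoloured) == 0)

# Reorder the palette so that it follows the level order
ordered_palette <- toy_palette[moa_levels]
```

Because the palette is named, colours are matched to classes by name rather than by position, so the same class keeps the same colour across the three models.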

In [17]:
ggsave(filename = "JHOS2_UMAP_MoAs.pdf", width = 9, height = 7, 
       plot = DimPlot(JHOS2, pt.size = 1, cols = custom_palette) + NoAxes() + ggtitle("JHOS2"))

ggsave(filename = "PDC1_UMAP_MoAs.pdf", width = 9, height = 7, 
       plot = DimPlot(PDC1, pt.size = 1, cols = custom_palette) + NoAxes() + ggtitle("PDC1"))

ggsave(filename = "PDC2_UMAP_MoAs.pdf", width = 9, height = 7, 
       plot = DimPlot(PDC2, pt.size = 1, cols = custom_palette) + NoAxes() + ggtitle("PDC2"))

As for the other UMAPs, we save each plot both with and without the legend. The legend-free version is saved on a square 7 x 7 page, so the UMAP size is the same across models, and the legend can be added back later with graphic design software.
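
The naming and sizing convention for the two variants can be written down as a small table of output specifications. This is only a sketch of the convention visible in the `ggsave()` calls; the `specs` data frame is a hypothetical helper and is not used elsewhere in the notebook.

```r
models <- c("JHOS2", "PDC1", "PDC2")

# One row per output file: the legend version is 2 inches wider, the height is fixed
specs <- expand.grid(model = models, legend = c(TRUE, FALSE),
                     stringsAsFactors = FALSE)
specs$filename <- paste0(specs$model, "_UMAP_MoAs",
                         ifelse(specs$legend, "", "_nolegend"), ".pdf")
specs$width <- ifelse(specs$legend, 9, 7)
specs$height <- 7
```

Each row could then drive one `ggsave()` call, which keeps file names and page sizes consistent if more models are added.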

In [20]:
ggsave(filename = "JHOS2_UMAP_MoAs_nolegend.pdf", width = 7, height = 7, 
       plot = DimPlot(JHOS2, pt.size = 1, cols = custom_palette) + NoAxes() + NoLegend() + ggtitle("JHOS2"))

ggsave(filename = "PDC1_UMAP_MoAs_nolegend.pdf", width = 7, height = 7, 
       plot = DimPlot(PDC1, pt.size = 1, cols = custom_palette) + NoAxes() + NoLegend() + ggtitle("PDC1"))

ggsave(filename = "PDC2_UMAP_MoAs_nolegend.pdf", width = 7, height = 7, 
       plot = DimPlot(PDC2, pt.size = 1, cols = custom_palette) + NoAxes() + NoLegend() + ggtitle("PDC2"))

In [19]:
sessionInfo()

R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.9 (Ootpa)

Matrix products: default
BLAS/LAPACK: /homedir01/adini22/.conda/envs/cellhashing_preprocessing/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] viridis_0.6.2           viridisLite_0.4.2       dplyr_1.1.2            
 [4] tidyr_1.3.0             ensembldb_2.22.0        AnnotationFilter_1.22.0
 [7] GenomicFeatures_1.50.3  AnnotationDbi_1.60.0    Biobase_2.58.0         
[10] GenomicRanges_1