This notebook prepares each dataset for integration.

For each dataset, this involves:
* reading in the dataset
* checking that the existing metadata are correct
* adding metadata for site and cancer_subtype
* adding metadata for sample_type_major
* adding metadata for integration_id: samples that are not biologically distinct (e.g. two biopsies from one tumour) get the same id
* using integration_id to join and re-split the layers, so that the layers in each dataset represent the units that will be integrated
* excluding any samples with <100 myeloid cells
* recording the number of cells per sample
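
In the datasets shown in this notebook, integration_id is simply a copy of sample_id. As a minimal sketch of the other case (several samples sharing one id), the following base R example remaps two hypothetical biopsies of one tumour to a single integration_id. The data frame `meta`, the sample names and the `shared_id` lookup are all assumptions made for illustration and do not come from any dataset here.

```r
# Toy per-cell metadata; the sample names are hypothetical
meta <- data.frame(
  sample_id = c("DS1_pt1_biopsyA", "DS1_pt1_biopsyB", "DS1_pt2", "DS1_pt2"),
  stringsAsFactors = FALSE
)

# Default: every sample keeps its own id
meta$integration_id <- meta$sample_id

# Named lookup: sample_id -> shared integration_id (hypothetical mapping)
shared_id <- c(DS1_pt1_biopsyA = "DS1_pt1", DS1_pt1_biopsyB = "DS1_pt1")

# Overwrite only the cells whose sample_id appears in the lookup
to_merge <- meta$sample_id %in% names(shared_id)
meta$integration_id[to_merge] <- unname(shared_id[meta$sample_id[to_merge]])

print(table(meta$integration_id))
```

With a Seurat object, the same assignments would presumably be made on the object's meta.data slot before the layers are re-split.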

Backing up to RDM:
``` bash
rsync -azvhp /scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/ /QRISdata/Q5935/nikita/scdata/MYELOID_CELLS/Myeloid_Cells_Integrate
```

In [1]:
#set wd
getwd()
setwd('/scratch/user/s4436039/scdata/Myeloid_Cells')
getwd()

In [14]:
#Load packages
library(dplyr)
library(Seurat)
library(patchwork)


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Loading required package: SeuratObject

Loading required package: sp


Attaching package: ‘SeuratObject’


The following object is masked from ‘package:base’:

    intersect




## GSE184880

In [41]:
HGSOC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE184880_myeloid.RDS")

In [42]:
HGSOC
HGSOC@project.name
head(HGSOC@meta.data)

An object of class Seurat 
27984 features across 7799 samples within 1 assay 
Active assay: RNA (27984 features, 2000 variable features)
 25 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, scale.data
 2 dimensional reductions calculated: pca, umap

,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE184880_Cancer1_AAACCCACAGCTGCCA-1,GSE184880,9374,2655,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,15.980371,1,1
GSE184880_Cancer1_AAACCCACATGACGGA-1,GSE184880,2659,1246,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,8.837909,1,1
GSE184880_Cancer1_AAACGAACAGTAGTGG-1,GSE184880,3020,1206,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,13.807947,1,1
GSE184880_Cancer1_AAACGAATCACCCTCA-1,GSE184880,50940,6660,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,8.531606,1,1
GSE184880_Cancer1_AAACGCTTCTCCACTG-1,GSE184880,10129,2880,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,11.225195,1,1
GSE184880_Cancer1_AAACGCTTCTGCTCTG-1,GSE184880,12756,3352,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,9.321104,1,1


In [43]:
table(HGSOC$sample_type)
table(HGSOC$cancer_type)
table(HGSOC$patient_id)
table(HGSOC$sample_id)


Healthy_ovary        tumour 
         1457          6342 


Healthy   HGSOC 
   1457    6342 


Cancer1 Cancer2 Cancer3 Cancer4 Cancer5 Cancer6 Cancer7   Norm1   Norm2   Norm3 
   2298    1080     577     792     695     652     248      54     281     360 
  Norm4   Norm5 
    193     569 


GSE184880_Healthy_Norm1 GSE184880_Healthy_Norm2 GSE184880_Healthy_Norm3 
                     54                     281                     360 
GSE184880_Healthy_Norm4 GSE184880_Healthy_Norm5 GSE184880_HGSOC_Cancer1 
                    193                     569                    2298 
GSE184880_HGSOC_Cancer2 GSE184880_HGSOC_Cancer3 GSE184880_HGSOC_Cancer4 
                   1080                     577                     792 
GSE184880_HGSOC_Cancer5 GSE184880_HGSOC_Cancer6 GSE184880_HGSOC_Cancer7 
                    695                     652                     248 

In [44]:
#set site metadata
HGSOC@meta.data$site <- "ovary"

In [45]:
#set subtype metadata

#split by cancer_type
HGSOC_tumour <- subset(HGSOC, subset = cancer_type %in% c("HGSOC"))
HGSOC_healthy <- subset(HGSOC, subset = cancer_type %in% c("Healthy"))

HGSOC_tumour@meta.data$cancer_subtype <- "HGSOC"
HGSOC_healthy@meta.data$cancer_subtype <- "NA"

HGSOC_tumour@meta.data$sample_type_major <- "primary tumour"
HGSOC_healthy@meta.data$sample_type_major <- "healthy"

#Merge seurat objects back together
HGSOC <- merge(HGSOC_tumour, y = c(HGSOC_healthy), project = "GSE184880")
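
Splitting and merging is one way to set per-group values; in the object printed after the merge, the layers have been renamed (e.g. counts.1.1) and the pca and umap reductions are no longer listed. A possible alternative, sketched here on a toy data frame and not the method used in this notebook, is to assign the columns in place with `ifelse()`. The data frame `meta` is a hypothetical stand-in for `HGSOC@meta.data`; the labels are the ones used above.

```r
# Toy stand-in for HGSOC@meta.data (hypothetical rows; only cancer_type matters here)
meta <- data.frame(
  cancer_type = c("HGSOC", "Healthy", "HGSOC"),
  stringsAsFactors = FALSE
)

is_tumour <- meta$cancer_type == "HGSOC"

# Same labels as in the cell above, assigned without subsetting or merging
meta$cancer_subtype    <- ifelse(is_tumour, "HGSOC", "NA")
meta$sample_type_major <- ifelse(is_tumour, "primary tumour", "healthy")

print(meta)
```

Because the layers are joined and re-split by integration_id later in the workflow, either approach should lead to the same layer structure; the in-place version would additionally leave the existing reductions untouched.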

In [46]:
#set integration_id metadata
HGSOC@meta.data$integration_id <- HGSOC@meta.data$sample_id

In [47]:
HGSOC
HGSOC@project.name
head(HGSOC@meta.data)

An object of class Seurat 
27984 features across 7799 samples within 1 assay 
Active assay: RNA (27984 features, 2000 variable features)
 26 layers present: counts.1.1, counts.10.2, counts.11.2, counts.12.2, counts.2.1, counts.3.1, counts.4.1, counts.5.1, counts.6.1, counts.7.1, data.1.1, data.2.1, data.3.1, data.4.1, data.5.1, data.6.1, data.7.1, scale.data.1, counts.8.2, counts.9.2, data.8.2, data.9.2, data.10.2, data.11.2, data.12.2, scale.data.2

,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,cancer_subtype,sample_type_major,integration_id
,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE184880_Cancer1_AAACCCACAGCTGCCA-1,GSE184880,9374,2655,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,15.980371,1,1,ovary,HGSOC,primary tumour,GSE184880_HGSOC_Cancer1
GSE184880_Cancer1_AAACCCACATGACGGA-1,GSE184880,2659,1246,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,8.837909,1,1,ovary,HGSOC,primary tumour,GSE184880_HGSOC_Cancer1
GSE184880_Cancer1_AAACGAACAGTAGTGG-1,GSE184880,3020,1206,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,13.807947,1,1,ovary,HGSOC,primary tumour,GSE184880_HGSOC_Cancer1
GSE184880_Cancer1_AAACGAATCACCCTCA-1,GSE184880,50940,6660,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,8.531606,1,1,ovary,HGSOC,primary tumour,GSE184880_HGSOC_Cancer1
GSE184880_Cancer1_AAACGCTTCTCCACTG-1,GSE184880,10129,2880,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,11.225195,1,1,ovary,HGSOC,primary tumour,GSE184880_HGSOC_Cancer1
GSE184880_Cancer1_AAACGCTTCTGCTCTG-1,GSE184880,12756,3352,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,9.321104,1,1,ovary,HGSOC,primary tumour,GSE184880_HGSOC_Cancer1


In [48]:
#exclude any samples with <100 cells
table(HGSOC$integration_id)
#exclude Norm1 (54 cells)
HGSOC <- subset(HGSOC, subset = integration_id != "GSE184880_Healthy_Norm1")
table(HGSOC$integration_id)


GSE184880_Healthy_Norm1 GSE184880_Healthy_Norm2 GSE184880_Healthy_Norm3 
                     54                     281                     360 
GSE184880_Healthy_Norm4 GSE184880_Healthy_Norm5 GSE184880_HGSOC_Cancer1 
                    193                     569                    2298 
GSE184880_HGSOC_Cancer2 GSE184880_HGSOC_Cancer3 GSE184880_HGSOC_Cancer4 
                   1080                     577                     792 
GSE184880_HGSOC_Cancer5 GSE184880_HGSOC_Cancer6 GSE184880_HGSOC_Cancer7 
                    695                     652                     248 


GSE184880_Healthy_Norm2 GSE184880_Healthy_Norm3 GSE184880_Healthy_Norm4 
                    281                     360                     193 
GSE184880_Healthy_Norm5 GSE184880_HGSOC_Cancer1 GSE184880_HGSOC_Cancer2 
                    569                    2298                    1080 
GSE184880_HGSOC_Cancer3 GSE184880_HGSOC_Cancer4 GSE184880_HGSOC_Cancer5 
                    577                     792                     695 
GSE184880_HGSOC_Cancer6 GSE184880_HGSOC_Cancer7 
                    652                     248 
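
The sample to drop was picked by eye from the first table. A minimal sketch of finding such samples programmatically is shown below on a toy vector. `integration_id` here is a hypothetical stand-in for `HGSOC$integration_id` that reproduces only three of the counts above, and `min_cells` is an assumed name for the 100-cell threshold.

```r
# Toy per-cell integration_id vector reproducing three of the counts above
integration_id <- rep(
  c("GSE184880_Healthy_Norm1", "GSE184880_Healthy_Norm2", "GSE184880_HGSOC_Cancer7"),
  times = c(54, 281, 248)
)

min_cells <- 100  # threshold stated at the top of the notebook

# Count cells per id and collect the ids below the threshold
cell_counts <- table(integration_id)
too_small <- names(cell_counts)[cell_counts < min_cells]

# Logical vector marking the cells that belong to retained samples
keep_cells <- !(integration_id %in% too_small)

print(too_small)
print(table(integration_id[keep_cells]))
```

With a Seurat object, the retained cell names (for example `colnames(HGSOC)[keep_cells]`) could then be passed to the `cells` argument of `subset()`, which avoids hard-coding sample names in each section.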

In [49]:
#join layers and then split them by integration_id
Layers(HGSOC[["RNA"]])
#join layers
HGSOC[["RNA"]] <- JoinLayers(HGSOC[["RNA"]])
Layers(HGSOC[["RNA"]])
#split layers
HGSOC[["RNA"]] <- split(HGSOC[["RNA"]], f = HGSOC$integration_id)
Layers(HGSOC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
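
Joining and then splitting regroups the cells so that there is one counts layer and one data layer per integration_id. As a rough analogy only (this is base R on a plain matrix, not a description of how Seurat stores layers internally), splitting the columns of a matrix by a grouping vector looks like this. All object names and values are hypothetical.

```r
# Toy count matrix: 3 genes x 5 cells (values are arbitrary)
counts <- matrix(
  1:15, nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:5))
)

# Hypothetical integration_id for each cell (column)
integration_id <- c("s1", "s2", "s1", "s2", "s2")

# One sub-matrix per integration_id, analogous to one layer per sample
col_groups <- split(seq_len(ncol(counts)), integration_id)
layers <- lapply(col_groups, function(idx) counts[, idx, drop = FALSE])

print(sapply(layers, ncol))
```

Each element of `layers` holds all genes but only the cells of one sample, which is the grouping the integration step is meant to operate on.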



In [50]:
#record number of cells
table(HGSOC$integration_id)


GSE184880_Healthy_Norm2 GSE184880_Healthy_Norm3 GSE184880_Healthy_Norm4 
                    281                     360                     193 
GSE184880_Healthy_Norm5 GSE184880_HGSOC_Cancer1 GSE184880_HGSOC_Cancer2 
                    569                    2298                    1080 
GSE184880_HGSOC_Cancer3 GSE184880_HGSOC_Cancer4 GSE184880_HGSOC_Cancer5 
                    577                     792                     695 
GSE184880_HGSOC_Cancer6 GSE184880_HGSOC_Cancer7 
                    652                     248 

In [51]:
#re-export seurat object ready for integration
saveRDS(HGSOC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE184880_myeloid_int.RDS")

In [52]:
#remove all objects in R
rm(list = ls())

## GSE213243

In [53]:
HGSOC_tu <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE213243_Tumour_myeloid.RDS")
HGSOC_As <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE213243_Ascites_myeloid.RDS")

In [54]:
HGSOC_tu
HGSOC_tu@project.name
head(HGSOC_tu@meta.data)

HGSOC_As
HGSOC_As@project.name
head(HGSOC_As@meta.data)

An object of class Seurat 
58825 features across 804 samples within 1 assay 
Active assay: RNA (58825 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, umap

,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.5,seurat_clusters
,<fct>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE213243_tumour_AAAGGTACACGCAGTC-1,GSE213243,8050,2780,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,19.962733,3,3
GSE213243_tumour_AAATGGACACACGCCA-1,GSE213243,5854,2467,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,4.936795,3,3
GSE213243_tumour_AACAAAGCAATTTCCT-1,GSE213243,6073,2541,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,6.323069,3,3
GSE213243_tumour_AACACACGTAGCTTTG-1,GSE213243,13497,3862,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,5.319701,3,3
GSE213243_tumour_AACACACTCGCTGTTC-1,GSE213243,8644,3306,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,10.596946,3,3
GSE213243_tumour_AACAGGGCAACCCTAA-1,GSE213243,6263,2562,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,3.544627,3,3


An object of class Seurat 
58825 features across 2688 samples within 1 assay 
Active assay: RNA (58825 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, umap

,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.5,seurat_clusters
,<fct>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE213243_ascites_AAACCCAAGTAGCAAT-2,GSE213243,16943,4684,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,7.572449,5,5
GSE213243_ascites_AAACCCACAGTCGTTA-2,GSE213243,14219,3822,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,5.02145,1,1
GSE213243_ascites_AAACCCATCCGTAGTA-2,GSE213243,15634,4224,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,6.556224,5,5
GSE213243_ascites_AAACGAAAGTGCTCGC-2,GSE213243,3007,1377,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,28.766212,6,6
GSE213243_ascites_AAACGAAGTATGGTAA-2,GSE213243,13828,4227,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,4.122071,5,5
GSE213243_ascites_AAACGCTAGTATCTGC-2,GSE213243,12945,3944,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,8.937814,6,6


In [55]:
table(HGSOC_tu$sample_type)
table(HGSOC_tu$cancer_type)
table(HGSOC_tu$patient_id)
table(HGSOC_tu$sample_id)

table(HGSOC_As$sample_type)
table(HGSOC_As$cancer_type)
table(HGSOC_As$patient_id)
table(HGSOC_As$sample_id)


tumour 
   804 


HGSOC 
  804 


pt-1 
 804 


GSE213243_HGSOC_tumour 
                   804 


ascites 
   2688 


HGSOC 
 2688 


pt-1 
2688 


GSE213243_HGSOC_ascites 
                   2688 

In [56]:
#set site and sample_type_major metadata
HGSOC_tu@meta.data$site <- "ovary"
HGSOC_As@meta.data$site <- "ascites fluid"

HGSOC_tu@meta.data$sample_type_major <- "primary tumour"
HGSOC_As@meta.data$sample_type_major <- "ascites"

In [57]:
#set subtype metadata (both objects are HGSOC, so no split by cancer_type is needed)
HGSOC_tu@meta.data$cancer_subtype <- "HGSOC"
HGSOC_As@meta.data$cancer_subtype <- "HGSOC"

In [58]:
#merge objects
HGSOC <- merge(HGSOC_tu, y = c(HGSOC_As), project = "GSE213243")

In [59]:
#set integration_id metadata
HGSOC@meta.data$integration_id <- HGSOC@meta.data$sample_id

In [60]:
HGSOC
HGSOC@project.name
head(HGSOC@meta.data)
tail(HGSOC@meta.data)

An object of class Seurat 
58825 features across 3492 samples within 1 assay 
Active assay: RNA (58825 features, 2000 variable features)
 6 layers present: counts.1, counts.2, data.1, scale.data.1, data.2, scale.data.2

,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.5,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE213243_tumour_AAAGGTACACGCAGTC-1,GSE213243,8050,2780,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,19.962733,3,3,ovary,primary tumour,HGSOC,GSE213243_HGSOC_tumour
GSE213243_tumour_AAATGGACACACGCCA-1,GSE213243,5854,2467,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,4.936795,3,3,ovary,primary tumour,HGSOC,GSE213243_HGSOC_tumour
GSE213243_tumour_AACAAAGCAATTTCCT-1,GSE213243,6073,2541,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,6.323069,3,3,ovary,primary tumour,HGSOC,GSE213243_HGSOC_tumour
GSE213243_tumour_AACACACGTAGCTTTG-1,GSE213243,13497,3862,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,5.319701,3,3,ovary,primary tumour,HGSOC,GSE213243_HGSOC_tumour
GSE213243_tumour_AACACACTCGCTGTTC-1,GSE213243,8644,3306,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,10.596946,3,3,ovary,primary tumour,HGSOC,GSE213243_HGSOC_tumour
GSE213243_tumour_AACAGGGCAACCCTAA-1,GSE213243,6263,2562,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,3.544627,3,3,ovary,primary tumour,HGSOC,GSE213243_HGSOC_tumour


,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.5,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE213243_ascites_TTTGATCGTTAGGCCC-2,GSE213243,20342,4702,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,5.899125,5,5,ascites fluid,ascites,HGSOC,GSE213243_HGSOC_ascites
GSE213243_ascites_TTTGATCTCTCGGCTT-2,GSE213243,1614,820,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,34.262701,1,1,ascites fluid,ascites,HGSOC,GSE213243_HGSOC_ascites
GSE213243_ascites_TTTGGAGCACGTCTCT-2,GSE213243,10549,3639,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,11.119537,6,6,ascites fluid,ascites,HGSOC,GSE213243_HGSOC_ascites
GSE213243_ascites_TTTGGAGGTCCTGGGT-2,GSE213243,4613,2061,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,12.421418,1,1,ascites fluid,ascites,HGSOC,GSE213243_HGSOC_ascites
GSE213243_ascites_TTTGGTTCATCCTATT-2,GSE213243,6073,2678,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,11.954553,1,1,ascites fluid,ascites,HGSOC,GSE213243_HGSOC_ascites
GSE213243_ascites_TTTGTTGCATGATGCT-2,GSE213243,14293,4430,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,5.044427,6,6,ascites fluid,ascites,HGSOC,GSE213243_HGSOC_ascites


In [61]:
#exclude any samples with <100 cells
table(HGSOC$integration_id)
#none to exclude


GSE213243_HGSOC_ascites  GSE213243_HGSOC_tumour 
                   2688                     804 

In [62]:
#join layers and then split them by integration_id
Layers(HGSOC[["RNA"]])
#join layers
HGSOC[["RNA"]] <- JoinLayers(HGSOC[["RNA"]])
Layers(HGSOC[["RNA"]])
#split layers
HGSOC[["RNA"]] <- split(HGSOC[["RNA"]], f = HGSOC$integration_id)
Layers(HGSOC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [63]:
#record number of cells
table(HGSOC$integration_id)


GSE213243_HGSOC_ascites  GSE213243_HGSOC_tumour 
                   2688                     804 

In [64]:
#re-export seurat object ready for integration
saveRDS(HGSOC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE213243_myeloid_int.RDS")

In [65]:
#remove all objects in R
rm(list = ls())

## GSE217517

In [66]:
HGSOC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE217517_myeloid.RDS")

In [67]:
HGSOC
HGSOC@project.name
head(HGSOC@meta.data)

An object of class Seurat 
36601 features across 8457 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 17 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, scale.data
 2 dimensional reductions calculated: pca, umap

,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.5,seurat_clusters,RNA_snn_res.0.2
,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<fct>
GSE217517_pt1_AAACGAAAGAACCCGA-1,GSE217517,7268,2217,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,3.76995,9,1,1
GSE217517_pt1_AAAGAACCAGGGCTTC-1,GSE217517,20132,4339,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,7.634612,5,1,1
GSE217517_pt1_AAAGAACTCCATGAGT-1,GSE217517,4183,1410,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,35.142242,9,1,1
GSE217517_pt1_AAAGGATTCTATTTCG-1,GSE217517,3037,1274,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,6.914718,9,1,1
GSE217517_pt1_AAATGGACACTGAGGA-1,GSE217517,9516,2822,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,2.847835,5,1,1
GSE217517_pt1_AACAGGGGTCATCGGC-1,GSE217517,22104,4611,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,7.69544,9,1,1


In [68]:
table(HGSOC$sample_type)
table(HGSOC$cancer_type)
table(HGSOC$patient_id)
table(HGSOC$sample_id)


tumour 
  8457 


HGSOC 
 8457 


 pt1  pt2  pt3  pt4  pt5  pt6  pt7  pt8 
 842  966 2678 1517 1004   37 1054  359 


GSE217517_HGSOC_pt1 GSE217517_HGSOC_pt2 GSE217517_HGSOC_pt3 GSE217517_HGSOC_pt4 
                842                 966                2678                1517 
GSE217517_HGSOC_pt5 GSE217517_HGSOC_pt6 GSE217517_HGSOC_pt7 GSE217517_HGSOC_pt8 
               1004                  37                1054                 359 

In [70]:
#set site and sample_type_major metadata
HGSOC@meta.data$site <- "ovary"
HGSOC@meta.data$sample_type_major <- "primary tumour"

In [71]:
#set subtype metadata
HGSOC@meta.data$cancer_subtype <- "HGSOC"

In [72]:
#set integration_id metadata
HGSOC@meta.data$integration_id <- HGSOC@meta.data$sample_id

In [73]:
HGSOC
HGSOC@project.name
head(HGSOC@meta.data)

An object of class Seurat 
36601 features across 8457 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 17 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, scale.data
 2 dimensional reductions calculated: pca, umap

,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.5,seurat_clusters,RNA_snn_res.0.2,site,sample_type_major,cancer_subtype,integration_id
,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE217517_pt1_AAACGAAAGAACCCGA-1,GSE217517,7268,2217,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,3.76995,9,1,1,ovary,primary tumour,HGSOC,GSE217517_HGSOC_pt1
GSE217517_pt1_AAAGAACCAGGGCTTC-1,GSE217517,20132,4339,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,7.634612,5,1,1,ovary,primary tumour,HGSOC,GSE217517_HGSOC_pt1
GSE217517_pt1_AAAGAACTCCATGAGT-1,GSE217517,4183,1410,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,35.142242,9,1,1,ovary,primary tumour,HGSOC,GSE217517_HGSOC_pt1
GSE217517_pt1_AAAGGATTCTATTTCG-1,GSE217517,3037,1274,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,6.914718,9,1,1,ovary,primary tumour,HGSOC,GSE217517_HGSOC_pt1
GSE217517_pt1_AAATGGACACTGAGGA-1,GSE217517,9516,2822,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,2.847835,5,1,1,ovary,primary tumour,HGSOC,GSE217517_HGSOC_pt1
GSE217517_pt1_AACAGGGGTCATCGGC-1,GSE217517,22104,4611,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,7.69544,9,1,1,ovary,primary tumour,HGSOC,GSE217517_HGSOC_pt1


In [74]:
#exclude any samples with <100 cells
table(HGSOC$integration_id)
#exclude patient 6 (37 cells)
HGSOC <- subset(HGSOC, subset = integration_id != "GSE217517_HGSOC_pt6")
table(HGSOC$integration_id)


GSE217517_HGSOC_pt1 GSE217517_HGSOC_pt2 GSE217517_HGSOC_pt3 GSE217517_HGSOC_pt4 
                842                 966                2678                1517 
GSE217517_HGSOC_pt5 GSE217517_HGSOC_pt6 GSE217517_HGSOC_pt7 GSE217517_HGSOC_pt8 
               1004                  37                1054                 359 


GSE217517_HGSOC_pt1 GSE217517_HGSOC_pt2 GSE217517_HGSOC_pt3 GSE217517_HGSOC_pt4 
                842                 966                2678                1517 
GSE217517_HGSOC_pt5 GSE217517_HGSOC_pt7 GSE217517_HGSOC_pt8 
               1004                1054                 359 

In [75]:
#join layers and then split them by integration_id
Layers(HGSOC[["RNA"]])
#join layers
HGSOC[["RNA"]] <- JoinLayers(HGSOC[["RNA"]])
Layers(HGSOC[["RNA"]])
#split layers
HGSOC[["RNA"]] <- split(HGSOC[["RNA"]], f = HGSOC$integration_id)
Layers(HGSOC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [76]:
#record number of cells
table(HGSOC$integration_id)


GSE217517_HGSOC_pt1 GSE217517_HGSOC_pt2 GSE217517_HGSOC_pt3 GSE217517_HGSOC_pt4 
                842                 966                2678                1517 
GSE217517_HGSOC_pt5 GSE217517_HGSOC_pt7 GSE217517_HGSOC_pt8 
               1004                1054                 359 

In [77]:
#re-export seurat object ready for integration
saveRDS(HGSOC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE217517_myeloid_int.RDS")

In [78]:
#remove all objects in R
rm(list = ls())

## PRJCA005422

In [79]:
HGSOC_As <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/PRJCA005422_ascites_myeloid.RDS")
HGSOC_Tu <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/PRJCA005422_tumour_myeloid.RDS")

In [80]:
HGSOC_As
HGSOC_As@project.name
head(HGSOC_As@meta.data)

HGSOC_Tu
HGSOC_Tu@project.name
head(HGSOC_Tu@meta.data)

An object of class Seurat 
27127 features across 16120 samples within 1 assay 
Active assay: RNA (27127 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, umap

,orig.ident,nCount_RNA,nFeature_RNA,Cellname,Samples,Groups,Patients,percent.mt,percent.ribo,percent.HSP,⋯,maintypes_2,maintypes_3,UMAP_1,UMAP_2,sample_type,cancer_type,patient_id,sample_id,RNA_snn_res.0.2,seurat_clusters
,<fct>,<dbl>,<int>,<chr>,<chr>,<fct>,<fct>,<dbl>,<dbl>,<dbl>,⋯,<fct>,<chr>,<dbl>,<dbl>,<fct>,<chr>,<fct>,<chr>,<fct>,<fct>
PRJCA005422_EOC1_FS_cell_AACACGTGTCGGCACT,EOC1,24353,1770,EOC1_FS_cell_AACACGTGTCGGCACT,HGSOC1_AS,Ascites,HGSOC1,0.8007227,3.859894,0.151932,⋯,B,Lymphoid cells,0.1662883,13.255426,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8
PRJCA005422_EOC1_FS_cell_AACCGCGTCCCTAACC,EOC1,531,365,EOC1_FS_cell_AACCGCGTCCCTAACC,HGSOC1_AS,Ascites,HGSOC1,5.6497175,18.796992,0.1879699,⋯,B,Lymphoid cells,-6.0097427,11.557367,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,4,4
PRJCA005422_EOC1_FS_cell_AACTCCCAGTTTCCTT,EOC1,9273,3128,EOC1_FS_cell_AACTCCCAGTTTCCTT,HGSOC1_AS,Ascites,HGSOC1,3.5479349,10.673854,0.4743935,⋯,Proliferative cells,Proliferative cells,2.3708313,2.94219,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8
PRJCA005422_EOC1_FS_cell_AAGGCAGGTTAAAGTG,EOC1,4757,2120,EOC1_FS_cell_AAGGCAGGTTAAAGTG,HGSOC1_AS,Ascites,HGSOC1,9.7750683,15.762926,0.6094998,⋯,Proliferative cells,Proliferative cells,2.5225659,2.890291,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8
PRJCA005422_EOC1_FS_cell_AAGGCAGTCAACACTG,EOC1,19574,3727,EOC1_FS_cell_AAGGCAGTCAACACTG,HGSOC1_AS,Ascites,HGSOC1,7.1319097,24.63857,0.3473819,⋯,Proliferative cells,Proliferative cells,1.6956519,3.476046,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8
PRJCA005422_EOC1_FS_cell_ACACCAAAGCTAACTC,EOC1,22514,3971,EOC1_FS_cell_ACACCAAAGCTAACTC,HGSOC1_AS,Ascites,HGSOC1,5.7208848,20.034642,0.7683425,⋯,Proliferative cells,Proliferative cells,2.063814,12.25453,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8


An object of class Seurat 
27127 features across 13256 samples within 1 assay 
Active assay: RNA (27127 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, umap

,orig.ident,nCount_RNA,nFeature_RNA,Cellname,Samples,Groups,Patients,percent.mt,percent.ribo,percent.HSP,⋯,maintypes_2,maintypes_3,UMAP_1,UMAP_2,sample_type,cancer_type,patient_id,sample_id,RNA_snn_res.0.2,seurat_clusters
,<fct>,<dbl>,<int>,<chr>,<chr>,<fct>,<fct>,<dbl>,<dbl>,<dbl>,⋯,<fct>,<chr>,<dbl>,<dbl>,<fct>,<chr>,<fct>,<chr>,<fct>,<fct>
PRJCA005422_EOC1_OC_cell_CCACGGACACCAGGCT,EOC1,631,422,EOC1_OC_cell_CCACGGACACCAGGCT,HGSOC1_PT,Primary Tumor,HGSOC1,5.0713154,16.4557,1.2658228,⋯,Proliferative cells,Proliferative cells,1.570391,2.864907,Primary Tumor,HGSOC,HGSOC1,PRJCA005422_HGSOC1_PT,0,0
PRJCA005422_EOC1_OC_cell_CCTACACAGAGTCTGG,EOC1,639,368,EOC1_OC_cell_CCTACACAGAGTCTGG,HGSOC1_PT,Primary Tumor,HGSOC1,3.7558685,25.0,1.40625,⋯,Proliferative cells,Proliferative cells,2.063956,3.946421,Primary Tumor,HGSOC,HGSOC1,PRJCA005422_HGSOC1_PT,0,0
PRJCA005422_EOC1_TM_cell_AGTTGGTTCACGCATA,EOC1,651,394,EOC1_TM_cell_AGTTGGTTCACGCATA,HGSOC1_MT,Metastatic Tumor,HGSOC1,0.4608295,26.72811,0.1536098,⋯,Proliferative cells,Proliferative cells,1.906724,3.60265,Metastatic Tumor,HGSOC,HGSOC1,PRJCA005422_HGSOC1_MT,0,0
PRJCA005422_EOC1_TM_cell_CGATCGGCACGCTTTC,EOC1,1480,787,EOC1_TM_cell_CGATCGGCACGCTTTC,HGSOC1_MT,Metastatic Tumor,HGSOC1,0.4054054,25.60811,0.6081081,⋯,Proliferative cells,Proliferative cells,2.0147,3.495903,Metastatic Tumor,HGSOC,HGSOC1,PRJCA005422_HGSOC1_MT,0,0
PRJCA005422_EOC1_TM_cell_CTCTAATTCTTTACGT,EOC1,1067,522,EOC1_TM_cell_CTCTAATTCTTTACGT,HGSOC1_MT,Metastatic Tumor,HGSOC1,1.2183693,38.23805,0.1874414,⋯,Proliferative cells,Proliferative cells,1.414386,3.496294,Metastatic Tumor,HGSOC,HGSOC1,PRJCA005422_HGSOC1_MT,0,0
PRJCA005422_EOC1_TM_cell_GCCTCTACACGGTTTA,EOC1,1629,792,EOC1_TM_cell_GCCTCTACACGGTTTA,HGSOC1_MT,Metastatic Tumor,HGSOC1,0.0,26.27379,0.9821977,⋯,Proliferative cells,Proliferative cells,1.58388,3.464499,Metastatic Tumor,HGSOC,HGSOC1,PRJCA005422_HGSOC1_MT,0,0


In [81]:
table(HGSOC_As$sample_type)
table(HGSOC_As$cancer_type)
table(HGSOC_As$patient_id)
table(HGSOC_As$sample_id)

table(HGSOC_Tu$sample_type)
table(HGSOC_Tu$cancer_type)
table(HGSOC_Tu$patient_id)
table(HGSOC_Tu$sample_id)


   Primary Tumor Metastatic Tumor       Lymph Node          Ascites 
               0                0                0            16120 
            PBMC 
               0 


HGSOC 
16120 


 HGSOC1  HGSOC2  HGSOC3  HGSOC4  HGSOC5  HGSOC6  HGSOC7  HGSOC8  HGSOC9 HGSOC10 
   1149    6695     662       0    1743     829       0    1110    3589     343 
   ECO1    UOC1   OCCC1      C1 
      0       0       0       0 


 PRJCA005422_HGSOC1_AS PRJCA005422_HGSOC10_AS  PRJCA005422_HGSOC2_AS 
                  1149                    343                   6695 
 PRJCA005422_HGSOC3_AS  PRJCA005422_HGSOC5_AS  PRJCA005422_HGSOC6_AS 
                   662                   1743                    829 
 PRJCA005422_HGSOC8_AS  PRJCA005422_HGSOC9_AS 
                  1110                   3589 


   Primary Tumor Metastatic Tumor       Lymph Node          Ascites 
            8041             5215                0                0 
            PBMC 
               0 


HGSOC 
13256 


 HGSOC1  HGSOC2  HGSOC3  HGSOC4  HGSOC5  HGSOC6  HGSOC7  HGSOC8  HGSOC9 HGSOC10 
   2639     633    3523    1104      70    2150    1179     121    1087     750 
   ECO1    UOC1   OCCC1      C1 
      0       0       0       0 


 PRJCA005422_HGSOC1_MT  PRJCA005422_HGSOC1_PT PRJCA005422_HGSOC10_PT 
                  1231                   1408                    750 
 PRJCA005422_HGSOC2_PT  PRJCA005422_HGSOC3_MT  PRJCA005422_HGSOC3_PT 
                   633                   1711                   1812 
 PRJCA005422_HGSOC4_MT  PRJCA005422_HGSOC4_PT  PRJCA005422_HGSOC5_PT 
                   816                    288                     70 
 PRJCA005422_HGSOC6_MT  PRJCA005422_HGSOC6_PT  PRJCA005422_HGSOC7_PT 
                  1457                    693                   1179 
 PRJCA005422_HGSOC8_PT  PRJCA005422_HGSOC9_PT 
                   121                   1087 

In [82]:
#set site and sample_type_major metadata
HGSOC_As@meta.data$site <- "ascites fluid"
HGSOC_As@meta.data$sample_type_major <- "ascites"

#HGSOC_Tu needs to be split into primary and metastatic tumours, which come from different sites
HGSOC_Pr <- subset(HGSOC_Tu, subset = sample_type %in% c("Primary Tumor"))
HGSOC_Me <- subset(HGSOC_Tu, subset = sample_type %in% c("Metastatic Tumor"))

HGSOC_Pr@meta.data$site <- "ovary"
HGSOC_Me@meta.data$site <- "omentum"

HGSOC_Pr@meta.data$sample_type_major <- "primary tumour"
HGSOC_Me@meta.data$sample_type_major <- "metastatic tumour"

#Merge seurat objects back together
HGSOC <- merge(HGSOC_As, y = c(HGSOC_Pr, HGSOC_Me), project = "PRJCA005422")
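
Splitting into separate objects works, but the same per-cell values can also be derived from sample_type with a named lookup vector. The following is a sketch on a toy data frame, not the method used in this notebook; `meta`, `site_lookup` and `major_lookup` are hypothetical names, while the labels are the ones assigned above. sample_type is stored as a factor in this dataset, so it is converted to character before being used as an index.

```r
# Toy stand-in for the per-cell metadata (hypothetical rows)
meta <- data.frame(
  sample_type = factor(c("Ascites", "Primary Tumor", "Metastatic Tumor", "Primary Tumor"))
)

# sample_type -> site, using the labels assigned in the cell above
site_lookup <- c(
  "Ascites"          = "ascites fluid",
  "Primary Tumor"    = "ovary",
  "Metastatic Tumor" = "omentum"
)

# sample_type -> sample_type_major
major_lookup <- c(
  "Ascites"          = "ascites",
  "Primary Tumor"    = "primary tumour",
  "Metastatic Tumor" = "metastatic tumour"
)

# as.character() matters: indexing by a factor would use its integer codes
key <- as.character(meta$sample_type)
meta$site              <- unname(site_lookup[key])
meta$sample_type_major <- unname(major_lookup[key])

print(meta)
```

A lookup of this kind keeps the mapping from sample type to site in one place, which may make it easier to review when a dataset has more than two tissue sources.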

In [84]:
#set subtype metadata
HGSOC@meta.data$cancer_subtype <- "HGSOC"

In [85]:
#set integration_id metadata
HGSOC@meta.data$integration_id <- HGSOC@meta.data$sample_id

In [86]:
HGSOC
HGSOC@project.name
head(HGSOC@meta.data)

An object of class Seurat 
27127 features across 29376 samples within 1 assay 
Active assay: RNA (27127 features, 2000 variable features)
 9 layers present: counts.1, counts.2, counts.3, data.1, scale.data.1, data.2, scale.data.2, data.3, scale.data.3

,orig.ident,nCount_RNA,nFeature_RNA,Cellname,Samples,Groups,Patients,percent.mt,percent.ribo,percent.HSP,⋯,sample_type,cancer_type,patient_id,sample_id,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,⋯,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
PRJCA005422_EOC1_FS_cell_AACACGTGTCGGCACT,EOC1,24353,1770,EOC1_FS_cell_AACACGTGTCGGCACT,HGSOC1_AS,Ascites,HGSOC1,0.8007227,3.859894,0.151932,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_AACCGCGTCCCTAACC,EOC1,531,365,EOC1_FS_cell_AACCGCGTCCCTAACC,HGSOC1_AS,Ascites,HGSOC1,5.6497175,18.796992,0.1879699,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,4,4,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_AACTCCCAGTTTCCTT,EOC1,9273,3128,EOC1_FS_cell_AACTCCCAGTTTCCTT,HGSOC1_AS,Ascites,HGSOC1,3.5479349,10.673854,0.4743935,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_AAGGCAGGTTAAAGTG,EOC1,4757,2120,EOC1_FS_cell_AAGGCAGGTTAAAGTG,HGSOC1_AS,Ascites,HGSOC1,9.7750683,15.762926,0.6094998,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_AAGGCAGTCAACACTG,EOC1,19574,3727,EOC1_FS_cell_AAGGCAGTCAACACTG,HGSOC1_AS,Ascites,HGSOC1,7.1319097,24.63857,0.3473819,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_ACACCAAAGCTAACTC,EOC1,22514,3971,EOC1_FS_cell_ACACCAAAGCTAACTC,HGSOC1_AS,Ascites,HGSOC1,5.7208848,20.034642,0.7683425,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS


In [87]:
#exclude any samples with <100 cells
table(HGSOC$integration_id)
#exclude patient HGSOC5 primary tumour (70 cells)
HGSOC <- subset(HGSOC, subset = integration_id != "PRJCA005422_HGSOC5_PT")
table(HGSOC$integration_id)


 PRJCA005422_HGSOC1_AS  PRJCA005422_HGSOC1_MT  PRJCA005422_HGSOC1_PT 
                  1149                   1231                   1408 
PRJCA005422_HGSOC10_AS PRJCA005422_HGSOC10_PT  PRJCA005422_HGSOC2_AS 
                   343                    750                   6695 
 PRJCA005422_HGSOC2_PT  PRJCA005422_HGSOC3_AS  PRJCA005422_HGSOC3_MT 
                   633                    662                   1711 
 PRJCA005422_HGSOC3_PT  PRJCA005422_HGSOC4_MT  PRJCA005422_HGSOC4_PT 
                  1812                    816                    288 
 PRJCA005422_HGSOC5_AS  PRJCA005422_HGSOC5_PT  PRJCA005422_HGSOC6_AS 
                  1743                     70                    829 
 PRJCA005422_HGSOC6_MT  PRJCA005422_HGSOC6_PT  PRJCA005422_HGSOC7_PT 
                  1457                    693                   1179 
 PRJCA005422_HGSOC8_AS  PRJCA005422_HGSOC8_PT  PRJCA005422_HGSOC9_AS 
                  1110                    121                   3589 
 PRJCA005422_HGSOC9


 PRJCA005422_HGSOC1_AS  PRJCA005422_HGSOC1_MT  PRJCA005422_HGSOC1_PT 
                  1149                   1231                   1408 
PRJCA005422_HGSOC10_AS PRJCA005422_HGSOC10_PT  PRJCA005422_HGSOC2_AS 
                   343                    750                   6695 
 PRJCA005422_HGSOC2_PT  PRJCA005422_HGSOC3_AS  PRJCA005422_HGSOC3_MT 
                   633                    662                   1711 
 PRJCA005422_HGSOC3_PT  PRJCA005422_HGSOC4_MT  PRJCA005422_HGSOC4_PT 
                  1812                    816                    288 
 PRJCA005422_HGSOC5_AS  PRJCA005422_HGSOC6_AS  PRJCA005422_HGSOC6_MT 
                  1743                    829                   1457 
 PRJCA005422_HGSOC6_PT  PRJCA005422_HGSOC7_PT  PRJCA005422_HGSOC8_AS 
                   693                   1179                   1110 
 PRJCA005422_HGSOC8_PT  PRJCA005422_HGSOC9_AS  PRJCA005422_HGSOC9_PT 
                   121                   3589                   1087 
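
The samples to drop can also be derived from the table instead of being typed by hand. The following is a minimal base R sketch of that idea, not the method used in this notebook; `integration_id`, `min_cells`, `low_ids` and `keep` are hypothetical names and the IDs are toy values.

```r
# Toy per-cell integration IDs (hypothetical values, not from any dataset here)
integration_id <- c(rep("S1", 150), rep("S2", 70), rep("S3", 120))

# Threshold used throughout this notebook: samples need at least 100 cells
min_cells <- 100

# Count cells per ID and collect the IDs that fall below the threshold
counts_per_id <- table(integration_id)
low_ids <- names(counts_per_id)[counts_per_id < min_cells]

# Logical vector marking the cells that belong to retained samples
keep <- !(integration_id %in% low_ids)

print(low_ids)
print(table(integration_id[keep]))
```

With a Seurat object, a vector like `keep` could be used to select cell names, for example `colnames(obj)[keep]`, and passed to the `cells` argument of `subset()`.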

In [88]:
#join layers and then split them by integration_id
Layers(HGSOC[["RNA"]])
#join layers
HGSOC[["RNA"]] <- JoinLayers(HGSOC[["RNA"]])
Layers(HGSOC[["RNA"]])
#split layers
HGSOC[["RNA"]] <- split(HGSOC[["RNA"]], f = HGSOC$integration_id)
Layers(HGSOC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
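
The join-then-split pattern can be tried on a small toy object to see how the layer names change. This is a sketch that assumes Seurat v5 (an `Assay5` RNA assay); the matrix, the `toy` object and the `S1` to `S3` IDs are made up for illustration.

```r
library(Seurat)

# Toy count matrix: 20 genes x 30 cells (hypothetical data)
set.seed(1)
counts <- matrix(
  rpois(20 * 30, lambda = 2),
  nrow = 20,
  dimnames = list(paste0("gene", 1:20), paste0("cell", 1:30))
)
toy <- CreateSeuratObject(counts = counts)

# Assign each cell to one of three hypothetical integration IDs
toy$integration_id <- rep(c("S1", "S2", "S3"), each = 10)

# Split the assay so that each integration ID gets its own counts layer
toy[["RNA"]] <- split(toy[["RNA"]], f = toy$integration_id)
split_layers <- Layers(toy[["RNA"]])
print(split_layers)

# Join the layers again to return to a single counts layer
toy[["RNA"]] <- JoinLayers(toy[["RNA"]])
joined_layers <- Layers(toy[["RNA"]])
print(joined_layers)
```

Joining first and then splitting, as in the cell above, means the final layers follow `integration_id` regardless of how the object was split before.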



In [89]:
#record number of cells
HGSOC
HGSOC@project.name
head(HGSOC@meta.data)
tail(HGSOC@meta.data)
table(HGSOC$integration_id)

An object of class Seurat 
27127 features across 29306 samples within 1 assay 
Active assay: RNA (27127 features, 2000 variable features)
 43 layers present: counts.PRJCA005422_HGSOC1_AS, counts.PRJCA005422_HGSOC3_AS, counts.PRJCA005422_HGSOC2_AS, counts.PRJCA005422_HGSOC6_AS, counts.PRJCA005422_HGSOC5_AS, counts.PRJCA005422_HGSOC8_AS, counts.PRJCA005422_HGSOC9_AS, counts.PRJCA005422_HGSOC10_AS, counts.PRJCA005422_HGSOC1_PT, counts.PRJCA005422_HGSOC3_PT, counts.PRJCA005422_HGSOC2_PT, counts.PRJCA005422_HGSOC7_PT, counts.PRJCA005422_HGSOC6_PT, counts.PRJCA005422_HGSOC4_PT, counts.PRJCA005422_HGSOC8_PT, counts.PRJCA005422_HGSOC9_PT, counts.PRJCA005422_HGSOC10_PT, counts.PRJCA005422_HGSOC1_MT, counts.PRJCA005422_HGSOC3_MT, counts.PRJCA005422_HGSOC6_MT, counts.PRJCA005422_HGSOC4_MT, scale.data, data.PRJCA005422_HGSOC1_AS, data.PRJCA005422_HGSOC3_AS, data.PRJCA005422_HGSOC2_AS, data.PRJCA005422_HGSOC6_AS, data.PRJCA005422_HGSOC5_AS, data.PRJCA005422_HGSOC8_AS, data.PRJCA005422_HGSOC9_AS, da

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,Cellname,Samples,Groups,Patients,percent.mt,percent.ribo,percent.HSP,⋯,sample_type,cancer_type,patient_id,sample_id,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,⋯,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
PRJCA005422_EOC1_FS_cell_AACACGTGTCGGCACT,EOC1,24353,1770,EOC1_FS_cell_AACACGTGTCGGCACT,HGSOC1_AS,Ascites,HGSOC1,0.8007227,3.859894,0.151932,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_AACCGCGTCCCTAACC,EOC1,531,365,EOC1_FS_cell_AACCGCGTCCCTAACC,HGSOC1_AS,Ascites,HGSOC1,5.6497175,18.796992,0.1879699,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,4,4,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_AACTCCCAGTTTCCTT,EOC1,9273,3128,EOC1_FS_cell_AACTCCCAGTTTCCTT,HGSOC1_AS,Ascites,HGSOC1,3.5479349,10.673854,0.4743935,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_AAGGCAGGTTAAAGTG,EOC1,4757,2120,EOC1_FS_cell_AAGGCAGGTTAAAGTG,HGSOC1_AS,Ascites,HGSOC1,9.7750683,15.762926,0.6094998,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_AAGGCAGTCAACACTG,EOC1,19574,3727,EOC1_FS_cell_AAGGCAGTCAACACTG,HGSOC1_AS,Ascites,HGSOC1,7.1319097,24.63857,0.3473819,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_ACACCAAAGCTAACTC,EOC1,22514,3971,EOC1_FS_cell_ACACCAAAGCTAACTC,HGSOC1_AS,Ascites,HGSOC1,5.7208848,20.034642,0.7683425,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS


Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,Cellname,Samples,Groups,Patients,percent.mt,percent.ribo,percent.HSP,⋯,sample_type,cancer_type,patient_id,sample_id,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,⋯,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
PRJCA005422_EOC4_TM_cell_TTTCCTCGTTCAACCA,EOC4,12786,3090,EOC4_TM_cell_TTTCCTCGTTCAACCA,HGSOC4_MT,Metastatic Tumor,HGSOC4,0.9619897,7.382498,0.4223039,⋯,Metastatic Tumor,HGSOC,HGSOC4,PRJCA005422_HGSOC4_MT,0,0,omentum,metastatic tumour,HGSOC,PRJCA005422_HGSOC4_MT
PRJCA005422_EOC4_TM_cell_TTTCCTCTCTGGTATG,EOC4,13288,3440,EOC4_TM_cell_TTTCCTCTCTGGTATG,HGSOC4_MT,Metastatic Tumor,HGSOC4,3.6950632,15.937994,0.3988261,⋯,Metastatic Tumor,HGSOC,HGSOC4,PRJCA005422_HGSOC4_MT,0,0,omentum,metastatic tumour,HGSOC,PRJCA005422_HGSOC4_MT
PRJCA005422_EOC4_TM_cell_TTTCCTCTCTGGTTCC,EOC4,16083,3128,EOC4_TM_cell_TTTCCTCTCTGGTTCC,HGSOC4_MT,Metastatic Tumor,HGSOC4,2.8352919,7.181049,0.32952,⋯,Metastatic Tumor,HGSOC,HGSOC4,PRJCA005422_HGSOC4_MT,0,0,omentum,metastatic tumour,HGSOC,PRJCA005422_HGSOC4_MT
PRJCA005422_EOC4_TM_cell_TTTCCTCTCTGTCTCG,EOC4,23953,4481,EOC4_TM_cell_TTTCCTCTCTGTCTCG,HGSOC4_MT,Metastatic Tumor,HGSOC4,2.91404,8.449176,0.4049259,⋯,Metastatic Tumor,HGSOC,HGSOC4,PRJCA005422_HGSOC4_MT,0,0,omentum,metastatic tumour,HGSOC,PRJCA005422_HGSOC4_MT
PRJCA005422_EOC4_TM_cell_TTTGCGCAGGAGTCTG,EOC4,17788,3363,EOC4_TM_cell_TTTGCGCAGGAGTCTG,HGSOC4_MT,Metastatic Tumor,HGSOC4,4.1207556,15.797167,0.387902,⋯,Metastatic Tumor,HGSOC,HGSOC4,PRJCA005422_HGSOC4_MT,0,0,omentum,metastatic tumour,HGSOC,PRJCA005422_HGSOC4_MT
PRJCA005422_EOC4_TM_cell_TTTGTCATCCCAACGG,EOC4,16918,3537,EOC4_TM_cell_TTTGTCATCCCAACGG,HGSOC4_MT,Metastatic Tumor,HGSOC4,7.4063128,20.745907,0.4787517,⋯,Metastatic Tumor,HGSOC,HGSOC4,PRJCA005422_HGSOC4_MT,0,0,omentum,metastatic tumour,HGSOC,PRJCA005422_HGSOC4_MT



 PRJCA005422_HGSOC1_AS  PRJCA005422_HGSOC1_MT  PRJCA005422_HGSOC1_PT 
                  1149                   1231                   1408 
PRJCA005422_HGSOC10_AS PRJCA005422_HGSOC10_PT  PRJCA005422_HGSOC2_AS 
                   343                    750                   6695 
 PRJCA005422_HGSOC2_PT  PRJCA005422_HGSOC3_AS  PRJCA005422_HGSOC3_MT 
                   633                    662                   1711 
 PRJCA005422_HGSOC3_PT  PRJCA005422_HGSOC4_MT  PRJCA005422_HGSOC4_PT 
                  1812                    816                    288 
 PRJCA005422_HGSOC5_AS  PRJCA005422_HGSOC6_AS  PRJCA005422_HGSOC6_MT 
                  1743                    829                   1457 
 PRJCA005422_HGSOC6_PT  PRJCA005422_HGSOC7_PT  PRJCA005422_HGSOC8_AS 
                   693                   1179                   1110 
 PRJCA005422_HGSOC8_PT  PRJCA005422_HGSOC9_AS  PRJCA005422_HGSOC9_PT 
                   121                   3589                   1087 

In [90]:
#re-export seurat object ready for integration
saveRDS(HGSOC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/PRJCA005422_myeloid_int.RDS")
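
Before clearing the workspace it can be worth confirming that the export was written and can be read back. The sketch below shows the idea in base R using a temporary directory and a small stand-in object; `out_dir`, `out_file` and `obj` are hypothetical and do not refer to the real paths or objects in this notebook.

```r
# Stand-in output directory inside the session's temp folder (hypothetical)
out_dir <- file.path(tempdir(), "Myeloid_Cells_Integrate")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Small stand-in object; in the notebook this would be the Seurat object
obj <- list(dataset = "example", n_cells = 100L)

# Write the object and confirm the file exists
out_file <- file.path(out_dir, "example_myeloid_int.RDS")
saveRDS(obj, out_file)
stopifnot(file.exists(out_file))

# Read it back and check that nothing changed in the round trip
reloaded <- readRDS(out_file)
round_trip_ok <- identical(obj, reloaded)
print(round_trip_ok)
```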

In [91]:
#remove all objects in R
rm(list = ls())

## GSE200218

In [123]:
MEL <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE200218_myeloid.RDS")

In [124]:
MEL
MEL@project.name
head(MEL@meta.data)

An object of class Seurat 
36601 features across 10371 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 11 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, data.1, data.2, data.3, data.4, data.5, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE200218_MBM01_AAACCTGAGCTGCAAG-1,GSE200218,15791,4063,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,4.439238,0,0
GSE200218_MBM01_AAACCTGCAATCGGTT-1,GSE200218,29993,5932,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.270763,2,2
GSE200218_MBM01_AAACCTGGTACTTGAC-1,GSE200218,21267,5177,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.28208,7,7
GSE200218_MBM01_AAACCTGGTTAAGATG-1,GSE200218,25744,5563,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.899938,0,0
GSE200218_MBM01_AAACCTGTCACGGTTA-1,GSE200218,14369,3779,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.257012,0,0
GSE200218_MBM01_AAACCTGTCCGCATCT-1,GSE200218,3921,2039,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,4.41214,0,0


In [125]:
table(MEL$sample_type)
table(MEL$cancer_type)
table(MEL$patient_id)
table(MEL$sample_id)


metastasis 
     10371 


melanoma brain mets 
              10371 


MBM01 MBM02 MBM03 MBM04 MBM05 
 1411  2035  1945  3143  1837 


GSE200218_MBM01 GSE200218_MBM02 GSE200218_MBM03 GSE200218_MBM04 GSE200218_MBM05 
           1411            2035            1945            3143            1837 

In [126]:
#set site and sample_type_major metadata
MEL@meta.data$site <- "brain"
MEL@meta.data$sample_type_major <- "metastatic tumour"

In [127]:
#set subtype metadata
MEL@meta.data$cancer_subtype <- "Melanoma"

In [128]:
#set integration_id metadata
MEL@meta.data$integration_id <- MEL@meta.data$sample_id
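
Because `integration_id` decides which cells end up in the same layer, a quick sanity check is that every ID maps to exactly one patient. A minimal base R sketch, using a toy data frame in place of `meta.data` (all values are hypothetical):

```r
# Toy stand-in for the metadata table (hypothetical values)
meta <- data.frame(
  patient_id     = c("P1", "P1", "P2", "P2"),
  integration_id = c("X_P1", "X_P1", "X_P2", "X_P2"),
  stringsAsFactors = FALSE
)

# Number of distinct patients behind each integration ID
patients_per_id <- tapply(
  meta$patient_id,
  meta$integration_id,
  function(x) length(unique(x))
)
print(patients_per_id)

# Every integration ID should belong to a single patient
all_single_patient <- all(patients_per_id == 1)
print(all_single_patient)
```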

In [129]:
MEL
MEL@project.name
head(MEL@meta.data)

An object of class Seurat 
36601 features across 10371 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 11 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, data.1, data.2, data.3, data.4, data.5, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE200218_MBM01_AAACCTGAGCTGCAAG-1,GSE200218,15791,4063,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,4.439238,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGCAATCGGTT-1,GSE200218,29993,5932,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.270763,2,2,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGGTACTTGAC-1,GSE200218,21267,5177,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.28208,7,7,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGGTTAAGATG-1,GSE200218,25744,5563,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.899938,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGTCACGGTTA-1,GSE200218,14369,3779,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.257012,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGTCCGCATCT-1,GSE200218,3921,2039,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,4.41214,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM01


In [130]:
#exclude any samples with <100 cells
table(MEL$integration_id)
#none to exclude


GSE200218_MBM01 GSE200218_MBM02 GSE200218_MBM03 GSE200218_MBM04 GSE200218_MBM05 
           1411            2035            1945            3143            1837 

In [131]:
#join layers and then split them by integration_id
Layers(MEL[["RNA"]])
#join layers
MEL[["RNA"]] <- JoinLayers(MEL[["RNA"]])
Layers(MEL[["RNA"]])
#split layers
MEL[["RNA"]] <- split(MEL[["RNA"]], f = MEL$integration_id)
Layers(MEL[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [132]:
#record number of cells
MEL
MEL@project.name
head(MEL@meta.data)
tail(MEL@meta.data)
table(MEL$integration_id)

An object of class Seurat 
36601 features across 10371 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 11 layers present: data.GSE200218_MBM01, data.GSE200218_MBM02, data.GSE200218_MBM03, data.GSE200218_MBM04, data.GSE200218_MBM05, scale.data, counts.GSE200218_MBM01, counts.GSE200218_MBM02, counts.GSE200218_MBM03, counts.GSE200218_MBM04, counts.GSE200218_MBM05
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE200218_MBM01_AAACCTGAGCTGCAAG-1,GSE200218,15791,4063,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,4.439238,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGCAATCGGTT-1,GSE200218,29993,5932,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.270763,2,2,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGGTACTTGAC-1,GSE200218,21267,5177,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.28208,7,7,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGGTTAAGATG-1,GSE200218,25744,5563,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.899938,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGTCACGGTTA-1,GSE200218,14369,3779,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.257012,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGTCCGCATCT-1,GSE200218,3921,2039,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,4.41214,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM01


Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE200218_MBM05_TTTGCGCTCATGCTCC-1,GSE200218,9728,2563,metastasis,melanoma brain mets,MBM05,GSE200218_MBM05,9.477796,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM05
GSE200218_MBM05_TTTGCGCTCCTCGCAT-1,GSE200218,13511,3309,metastasis,melanoma brain mets,MBM05,GSE200218_MBM05,3.449042,2,2,brain,metastatic tumour,Melanoma,GSE200218_MBM05
GSE200218_MBM05_TTTGGTTTCTCTAAGG-1,GSE200218,15440,3540,metastasis,melanoma brain mets,MBM05,GSE200218_MBM05,4.410622,2,2,brain,metastatic tumour,Melanoma,GSE200218_MBM05
GSE200218_MBM05_TTTGTCAAGATGGCGT-1,GSE200218,10913,2889,metastasis,melanoma brain mets,MBM05,GSE200218_MBM05,4.370934,2,2,brain,metastatic tumour,Melanoma,GSE200218_MBM05
GSE200218_MBM05_TTTGTCAAGTTAGGTA-1,GSE200218,5539,2048,metastasis,melanoma brain mets,MBM05,GSE200218_MBM05,5.361979,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM05
GSE200218_MBM05_TTTGTCATCTTGGGTA-1,GSE200218,1780,1076,metastasis,melanoma brain mets,MBM05,GSE200218_MBM05,10.561798,7,7,brain,metastatic tumour,Melanoma,GSE200218_MBM05



GSE200218_MBM01 GSE200218_MBM02 GSE200218_MBM03 GSE200218_MBM04 GSE200218_MBM05 
           1411            2035            1945            3143            1837 
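
The per-sample counts printed above can also be kept as a small table for later reporting. This is a sketch in base R with toy IDs; `cell_counts` and the `dataset` label are hypothetical names, not objects used elsewhere in this notebook.

```r
# Toy per-cell integration IDs (hypothetical values)
integration_id <- c(rep("GSE_A", 3), rep("GSE_B", 2))

# Turn the table of counts into a data frame with one row per sample
cell_counts <- as.data.frame(table(integration_id), stringsAsFactors = FALSE)
names(cell_counts) <- c("integration_id", "n_cells")

# Label the rows with the dataset they came from
cell_counts$dataset <- "toy_dataset"

print(cell_counts)
print(sum(cell_counts$n_cells))
```

Rows from several datasets could then be stacked with `rbind()` to give one summary across the whole notebook.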

In [134]:
#re-export seurat object ready for integration
saveRDS(MEL, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE200218_myeloid_int.RDS")

In [135]:
#remove all objects in R
rm(list = ls())

## GSE215120

In [136]:
MEL_Ac <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE215120_AcMEL_myeloid.RDS")
MEL_Cu <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE215120_CuMEL_myeloid.RDS")

In [137]:
MEL_Ac
MEL_Ac@project.name
head(MEL_Ac@meta.data)

MEL_Cu
MEL_Cu@project.name
head(MEL_Cu@meta.data)

An object of class Seurat 
33538 features across 787 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 13 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, data.1, data.2, data.3, data.4, data.5, data.6, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE215120_AM1_AAACCTGGTTGCTCCT-1,GSE215120,20298,3789,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,0.9754656,12,12
GSE215120_AM1_AAAGATGTCCAAATGC-1,GSE215120,5574,1721,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,6.0459275,12,12
GSE215120_AM1_AAAGTAGTCGGTGTTA-1,GSE215120,13432,2759,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,2.1515783,12,12
GSE215120_AM1_AAATGCCCAGAGCCAA-1,GSE215120,17143,2659,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,1.2249898,12,12
GSE215120_AM1_AAATGCCGTTTGGCGC-1,GSE215120,3603,1012,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,3.6081044,12,12
GSE215120_AM1_AAATGCCTCATGTCCC-1,GSE215120,14482,2882,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,1.0357685,12,12


An object of class Seurat 
33538 features across 427 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 9 layers present: counts.1, counts.2, counts.3, counts.4, data.1, data.2, data.3, data.4, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE215120_CM1_AAATGCCCATTACCTT-1,GSE215120,7596,1914,tumour,Cutaneous Melanoma,CM1,GSE215120_Cut_MEL_CM1,3.1200632,11,11
GSE215120_CM1_AACTCCCAGCCGATTT-1,GSE215120,4828,1341,tumour,Cutaneous Melanoma,CM1,GSE215120_Cut_MEL_CM1,0.9113505,11,11
GSE215120_CM1_AACTCCCTCGGCGCAT-1,GSE215120,7064,1684,tumour,Cutaneous Melanoma,CM1,GSE215120_Cut_MEL_CM1,2.5622877,11,11
GSE215120_CM1_AATCCAGTCAGGCCCA-1,GSE215120,10178,2223,tumour,Cutaneous Melanoma,CM1,GSE215120_Cut_MEL_CM1,2.0141482,11,11
GSE215120_CM1_ACACCAAGTCTTCTCG-1,GSE215120,5097,1378,tumour,Cutaneous Melanoma,CM1,GSE215120_Cut_MEL_CM1,0.3923877,11,11
GSE215120_CM1_ACACCCTCAATACGCT-1,GSE215120,7358,1667,tumour,Cutaneous Melanoma,CM1,GSE215120_Cut_MEL_CM1,2.133732,11,11


In [138]:
table(MEL_Ac$sample_type)
table(MEL_Ac$cancer_type)
table(MEL_Ac$patient_id)
table(MEL_Ac$sample_id)

table(MEL_Cu$sample_type)
table(MEL_Cu$cancer_type)
table(MEL_Cu$patient_id)
table(MEL_Cu$sample_id)


tumour 
   787 


Acral Melanoma 
           787 


AM1 AM2 AM3 AM4 AM5 AM6 
260  23 101   9 279 115 


GSE215120_Acral_MEL_AM1 GSE215120_Acral_MEL_AM2 GSE215120_Acral_MEL_AM3 
                    260                      23                     101 
GSE215120_Acral_MEL_AM4 GSE215120_Acral_MEL_AM5 GSE215120_Acral_MEL_AM6 
                      9                     279                     115 


LN metastasis        tumour 
          162           265 


Cutaneous Melanoma 
               427 


CM1 CM2 CM3 
295  32 100 


 GSE215120_Cut_MEL_CM1  GSE215120_Cut_MEL_CM2  GSE215120_Cut_MEL_CM3 
                   133                     32                    100 
GSE215120_MEL_mets_CM1 
                   162 

In [139]:
#split cutaneous samples by sample_type
MEL_Cu_Tu <- subset(MEL_Cu, subset = sample_type %in% c("tumour"))
MEL_Cu_LN <- subset(MEL_Cu, subset = sample_type %in% c("LN metastasis"))

#set site and sample_type_major metadata
MEL_Ac@meta.data$site <- "skin"
MEL_Cu_Tu@meta.data$site <- "skin"
MEL_Cu_LN@meta.data$site <- "lymph node"

MEL_Ac@meta.data$sample_type_major <- "primary tumour"
MEL_Cu_Tu@meta.data$sample_type_major <- "primary tumour"
MEL_Cu_LN@meta.data$sample_type_major <- "metastatic tumour"

#set subtype metadata
MEL_Ac@meta.data$cancer_subtype <- "Acral Melanoma"
MEL_Cu_Tu@meta.data$cancer_subtype <- "Melanoma"
MEL_Cu_LN@meta.data$cancer_subtype <- "Melanoma"

#Merge seurat objects back together
MEL <- merge(MEL_Ac, y = c(MEL_Cu_Tu, MEL_Cu_LN), project = "GSE215120")
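
The same annotations could in principle be assigned without splitting and merging, by looking each cell's `sample_type` up in a named vector. The sketch below shows the lookup in base R on a toy data frame standing in for `meta.data`; `site_map` and `major_map` are hypothetical names, and this is an alternative sketch rather than what the cell above does. One reason to consider it is that the merged object printed in the next cell no longer lists the pca and umap reductions and its layers are renamed.

```r
# Toy stand-in for the metadata table (hypothetical rows)
meta <- data.frame(
  sample_type = c("tumour", "LN metastasis", "tumour"),
  stringsAsFactors = FALSE
)

# Lookup tables from sample_type to the new annotation columns
site_map  <- c("tumour" = "skin", "LN metastasis" = "lymph node")
major_map <- c("tumour" = "primary tumour", "LN metastasis" = "metastatic tumour")

# Vectorised lookup; unname() drops the names carried over from the maps
meta$site              <- unname(site_map[meta$sample_type])
meta$sample_type_major <- unname(major_map[meta$sample_type])

print(meta)
```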

In [140]:
#set integration_id metadata
MEL@meta.data$integration_id <- MEL@meta.data$sample_id

In [141]:
MEL
MEL@project.name
head(MEL@meta.data)

An object of class Seurat 
33538 features across 1214 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 23 layers present: counts.1.1, counts.1.2, counts.2.1, counts.2.2, counts.3.1, counts.3.2, counts.4.1, counts.4.3, counts.5.1, counts.6.1, data.1.1, data.2.1, data.3.1, data.4.1, data.5.1, data.6.1, scale.data.1, data.1.2, data.2.2, data.3.2, scale.data.2, data.4.3, scale.data.3

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE215120_AM1_AAACCTGGTTGCTCCT-1,GSE215120,20298,3789,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,0.9754656,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAAGATGTCCAAATGC-1,GSE215120,5574,1721,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,6.0459275,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAAGTAGTCGGTGTTA-1,GSE215120,13432,2759,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,2.1515783,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAATGCCCAGAGCCAA-1,GSE215120,17143,2659,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,1.2249898,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAATGCCGTTTGGCGC-1,GSE215120,3603,1012,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,3.6081044,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAATGCCTCATGTCCC-1,GSE215120,14482,2882,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,1.0357685,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1


In [142]:
#exclude any samples with <100 cells
table(MEL$integration_id)
#exclude AM2 (23 cells), AM4 (9 cells) and CM2 (32 cells)
MEL <- subset(MEL, cells = colnames(MEL)[!(MEL$integration_id %in% c("GSE215120_Acral_MEL_AM2","GSE215120_Acral_MEL_AM4","GSE215120_Cut_MEL_CM2"))])
table(MEL$integration_id)


GSE215120_Acral_MEL_AM1 GSE215120_Acral_MEL_AM2 GSE215120_Acral_MEL_AM3 
                    260                      23                     101 
GSE215120_Acral_MEL_AM4 GSE215120_Acral_MEL_AM5 GSE215120_Acral_MEL_AM6 
                      9                     279                     115 
  GSE215120_Cut_MEL_CM1   GSE215120_Cut_MEL_CM2   GSE215120_Cut_MEL_CM3 
                    133                      32                     100 
 GSE215120_MEL_mets_CM1 
                    162 


GSE215120_Acral_MEL_AM1 GSE215120_Acral_MEL_AM3 GSE215120_Acral_MEL_AM5 
                    260                     101                     279 
GSE215120_Acral_MEL_AM6   GSE215120_Cut_MEL_CM1   GSE215120_Cut_MEL_CM3 
                    115                     133                     100 
 GSE215120_MEL_mets_CM1 
                    162 

In [143]:
#join layers and then split them by integration_id
Layers(MEL[["RNA"]])
#join layers
MEL[["RNA"]] <- JoinLayers(MEL[["RNA"]])
Layers(MEL[["RNA"]])
#split layers
MEL[["RNA"]] <- split(MEL[["RNA"]], f = MEL$integration_id)
Layers(MEL[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [144]:
#record number of cells
MEL
MEL@project.name
head(MEL@meta.data)
tail(MEL@meta.data)
table(MEL$integration_id)

An object of class Seurat 
33538 features across 1150 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 15 layers present: counts.GSE215120_Acral_MEL_AM1, counts.GSE215120_Acral_MEL_AM3, counts.GSE215120_Acral_MEL_AM5, counts.GSE215120_Acral_MEL_AM6, counts.GSE215120_Cut_MEL_CM1, counts.GSE215120_Cut_MEL_CM3, counts.GSE215120_MEL_mets_CM1, scale.data, data.GSE215120_Acral_MEL_AM1, data.GSE215120_Acral_MEL_AM3, data.GSE215120_Acral_MEL_AM5, data.GSE215120_Acral_MEL_AM6, data.GSE215120_Cut_MEL_CM1, data.GSE215120_Cut_MEL_CM3, data.GSE215120_MEL_mets_CM1

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE215120_AM1_AAACCTGGTTGCTCCT-1,GSE215120,20298,3789,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,0.9754656,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAAGATGTCCAAATGC-1,GSE215120,5574,1721,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,6.0459275,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAAGTAGTCGGTGTTA-1,GSE215120,13432,2759,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,2.1515783,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAATGCCCAGAGCCAA-1,GSE215120,17143,2659,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,1.2249898,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAATGCCGTTTGGCGC-1,GSE215120,3603,1012,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,3.6081044,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAATGCCTCATGTCCC-1,GSE215120,14482,2882,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,1.0357685,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1


Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE215120_CM1_mets_TTGGTTTAGTGCAACG-1,GSE215120,627,355,LN metastasis,Cutaneous Melanoma,CM1,GSE215120_MEL_mets_CM1,3.668262,11,11,lymph node,metastatic tumour,Melanoma,GSE215120_MEL_mets_CM1
GSE215120_CM1_mets_TTTAGTCAGTTCTCTT-1,GSE215120,2740,748,LN metastasis,Cutaneous Melanoma,CM1,GSE215120_MEL_mets_CM1,4.19708,11,11,lymph node,metastatic tumour,Melanoma,GSE215120_MEL_mets_CM1
GSE215120_CM1_mets_TTTCACACACCGGAAA-1,GSE215120,1174,529,LN metastasis,Cutaneous Melanoma,CM1,GSE215120_MEL_mets_CM1,22.146508,11,11,lymph node,metastatic tumour,Melanoma,GSE215120_MEL_mets_CM1
GSE215120_CM1_mets_TTTCATGTCCACTAGA-1,GSE215120,732,395,LN metastasis,Cutaneous Melanoma,CM1,GSE215120_MEL_mets_CM1,2.322404,11,11,lymph node,metastatic tumour,Melanoma,GSE215120_MEL_mets_CM1
GSE215120_CM1_mets_TTTGGAGGTTGAGAGC-1,GSE215120,4265,1376,LN metastasis,Cutaneous Melanoma,CM1,GSE215120_MEL_mets_CM1,4.220399,11,11,lymph node,metastatic tumour,Melanoma,GSE215120_MEL_mets_CM1
GSE215120_CM1_mets_TTTGGTTTCGAGGCAA-1,GSE215120,671,384,LN metastasis,Cutaneous Melanoma,CM1,GSE215120_MEL_mets_CM1,1.043219,11,11,lymph node,metastatic tumour,Melanoma,GSE215120_MEL_mets_CM1



GSE215120_Acral_MEL_AM1 GSE215120_Acral_MEL_AM3 GSE215120_Acral_MEL_AM5 
                    260                     101                     279 
GSE215120_Acral_MEL_AM6   GSE215120_Cut_MEL_CM1   GSE215120_Cut_MEL_CM3 
                    115                     133                     100 
 GSE215120_MEL_mets_CM1 
                    162 

In [145]:
#re-export seurat object ready for integration
saveRDS(MEL, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE215120_myeloid_int.RDS")

In [146]:
#remove all objects in R
rm(list = ls())

## PRJNA907381

In [3]:
MEL <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/PRJNA907381_myeloid.RDS")

In [4]:
MEL
MEL@project.name
head(MEL@meta.data)

An object of class Seurat 
36601 features across 2723 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 17 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
PRJNA907381_MEL022_iLN_AAAGAACCAGCGCGTT-1,PRJNA907381,17285,3704,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,2.100087,6,6
PRJNA907381_MEL022_iLN_AAAGGGCTCCATAGAC-1,PRJNA907381,42925,6544,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,3.93477,6,6
PRJNA907381_MEL022_iLN_AACAAAGCAAGTATAG-1,PRJNA907381,16549,3569,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,6.689226,6,6
PRJNA907381_MEL022_iLN_AACAAGACAGGATTCT-1,PRJNA907381,18108,3854,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,4.870775,6,6
PRJNA907381_MEL022_iLN_AACCACATCTTTCCAA-1,PRJNA907381,31754,5097,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,3.231089,6,6
PRJNA907381_MEL022_iLN_AACCATGAGAAGTCTA-1,PRJNA907381,26158,4705,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,4.908632,6,6


In [5]:
table(MEL$sample_type)
table(MEL$cancer_type)
table(MEL$patient_id)
table(MEL$sample_id)


      LN mets uninvolved LN 
         1536          1187 


            Healthy Metastatic Melanoma 
               1187                1536 


MEL002 MEL009 MEL014 MEL018 MEL022 
   614    404    743    785    177 


PRJNA907381_MEL002_iLN PRJNA907381_MEL002_uLN PRJNA907381_MEL009_iLN 
                   164                    450                    404 
PRJNA907381_MEL014_iLN PRJNA907381_MEL014_uLN PRJNA907381_MEL018_iLN 
                   422                    321                    369 
PRJNA907381_MEL018_uLN PRJNA907381_MEL022_iLN 
                   416                    177 

In [6]:
#split by cancer_type
MEL_Tu <- subset(MEL, subset = cancer_type %in% c("Metastatic Melanoma"))
MEL_H <- subset(MEL, subset = cancer_type %in% c("Healthy"))

#set site and sample_type_major metadata
MEL_Tu@meta.data$site <- "lymph node"
MEL_H@meta.data$site <- "lymph node"

MEL_Tu@meta.data$sample_type_major <- "metastatic tumour"
MEL_H@meta.data$sample_type_major <- "healthy"

#set subtype metadata (healthy samples get the character string "NA", not a missing value)
MEL_Tu@meta.data$cancer_subtype <- "Melanoma"
MEL_H@meta.data$cancer_subtype <- "NA"

#Merge seurat objects back together
MEL <- merge(MEL_Tu, y = c(MEL_H), project = "PRJNA907381")

In [7]:
#set integration_id metadata
MEL@meta.data$integration_id <- MEL@meta.data$sample_id

In [9]:
MEL
MEL@project.name
head(MEL@meta.data)
tail(MEL@meta.data)

An object of class Seurat 
36601 features across 2723 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 18 layers present: counts.1.1, counts.3.1, counts.5.1, counts.6.1, counts.8.1, data.1.1, data.3.1, data.5.1, data.6.1, data.8.1, scale.data.1, counts.2.2, counts.4.2, counts.7.2, data.2.2, data.4.2, data.7.2, scale.data.2

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
PRJNA907381_MEL022_iLN_AAAGAACCAGCGCGTT-1,PRJNA907381,17285,3704,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,2.100087,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AAAGGGCTCCATAGAC-1,PRJNA907381,42925,6544,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,3.93477,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AACAAAGCAAGTATAG-1,PRJNA907381,16549,3569,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,6.689226,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AACAAGACAGGATTCT-1,PRJNA907381,18108,3854,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,4.870775,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AACCACATCTTTCCAA-1,PRJNA907381,31754,5097,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,3.231089,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AACCATGAGAAGTCTA-1,PRJNA907381,26158,4705,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,4.908632,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN


Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
PRJNA907381_MEL002_uLN_TTTACCACAAATCAAG-1,PRJNA907381,34326,5679,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,3.629902,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTATGCGTTGCCGCA-1,PRJNA907381,20339,4758,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,6.544078,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTCATGTCGACGACC-1,PRJNA907381,30296,5207,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,3.383285,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTCCTCGTTCGAACT-1,PRJNA907381,50410,5915,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,4.558619,16,16,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTGGAGAGTTCTCTT-1,PRJNA907381,13123,2408,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,15.133735,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTGGTTAGTGGCGAT-1,PRJNA907381,25688,5435,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,7.4237,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN


In [11]:
#exclude any samples with <100 cells
table(MEL$integration_id)
#none to exclude


PRJNA907381_MEL002_iLN PRJNA907381_MEL002_uLN PRJNA907381_MEL009_iLN 
                   164                    450                    404 
PRJNA907381_MEL014_iLN PRJNA907381_MEL014_uLN PRJNA907381_MEL018_iLN 
                   422                    321                    369 
PRJNA907381_MEL018_uLN PRJNA907381_MEL022_iLN 
                   416                    177 
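
The check above was done by eye. As a minimal sketch of how the same <100-cell rule could be applied programmatically, the snippet below hard-codes the counts printed above; the names `cell_counts`, `min_cells` and `too_small` are illustrative and not part of the original notebook. On the real object the counts would presumably come from `table(MEL$integration_id)`.

```r
# Cell counts per integration_id, copied from the table() output above.
cell_counts <- c(
  PRJNA907381_MEL002_iLN = 164, PRJNA907381_MEL002_uLN = 450,
  PRJNA907381_MEL009_iLN = 404, PRJNA907381_MEL014_iLN = 422,
  PRJNA907381_MEL014_uLN = 321, PRJNA907381_MEL018_iLN = 369,
  PRJNA907381_MEL018_uLN = 416, PRJNA907381_MEL022_iLN = 177
)

min_cells <- 100  # threshold used in this notebook (hypothetical variable name)

# Samples with strictly fewer than min_cells cells.
too_small <- names(cell_counts)[cell_counts < min_cells]
too_small          # empty here: none to exclude

# The counts also add up to the 2723 cells reported for the object.
sum(cell_counts)
```

For datasets with many samples, a vector built this way could feed the exclusion step instead of a hand-typed list of sample names.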

In [12]:
#join layers and then split them by integration_id
Layers(MEL[["RNA"]])
#join layers
MEL[["RNA"]] <- JoinLayers(MEL[["RNA"]])
Layers(MEL[["RNA"]])
#split layers
MEL[["RNA"]] <- split(MEL[["RNA"]], f = MEL$integration_id)
Layers(MEL[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
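
The message above implies how many layers to expect after the split: one `counts` and one `data` layer per `integration_id`, plus a single shared `scale.data`. A small arithmetic sketch follows (variable names are illustrative only); the result can be compared with the layer count in the object summary printed in the next cell.

```r
n_ids <- 8                            # distinct integration_id values in the table above
split_layers <- c("counts", "data")   # layers the message reports as split
unsplit_layers <- "scale.data"        # layer the message reports as not split

# One copy of each split layer per integration_id, plus the unsplit layer(s).
expected_layers <- n_ids * length(split_layers) + length(unsplit_layers)
expected_layers
```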



In [13]:
#record number of cells
MEL
MEL@project.name
head(MEL@meta.data)
tail(MEL@meta.data)
table(MEL$integration_id)

An object of class Seurat 
36601 features across 2723 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 17 layers present: counts.PRJNA907381_MEL022_iLN, counts.PRJNA907381_MEL018_iLN, counts.PRJNA907381_MEL014_iLN, counts.PRJNA907381_MEL009_iLN, counts.PRJNA907381_MEL002_iLN, counts.PRJNA907381_MEL018_uLN, counts.PRJNA907381_MEL014_uLN, counts.PRJNA907381_MEL002_uLN, scale.data, data.PRJNA907381_MEL022_iLN, data.PRJNA907381_MEL018_iLN, data.PRJNA907381_MEL014_iLN, data.PRJNA907381_MEL009_iLN, data.PRJNA907381_MEL002_iLN, data.PRJNA907381_MEL018_uLN, data.PRJNA907381_MEL014_uLN, data.PRJNA907381_MEL002_uLN

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
PRJNA907381_MEL022_iLN_AAAGAACCAGCGCGTT-1,PRJNA907381,17285,3704,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,2.100087,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AAAGGGCTCCATAGAC-1,PRJNA907381,42925,6544,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,3.93477,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AACAAAGCAAGTATAG-1,PRJNA907381,16549,3569,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,6.689226,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AACAAGACAGGATTCT-1,PRJNA907381,18108,3854,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,4.870775,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AACCACATCTTTCCAA-1,PRJNA907381,31754,5097,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,3.231089,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AACCATGAGAAGTCTA-1,PRJNA907381,26158,4705,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,4.908632,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN


Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
PRJNA907381_MEL002_uLN_TTTACCACAAATCAAG-1,PRJNA907381,34326,5679,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,3.629902,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTATGCGTTGCCGCA-1,PRJNA907381,20339,4758,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,6.544078,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTCATGTCGACGACC-1,PRJNA907381,30296,5207,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,3.383285,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTCCTCGTTCGAACT-1,PRJNA907381,50410,5915,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,4.558619,16,16,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTGGAGAGTTCTCTT-1,PRJNA907381,13123,2408,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,15.133735,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTGGTTAGTGGCGAT-1,PRJNA907381,25688,5435,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,7.4237,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN



PRJNA907381_MEL002_iLN PRJNA907381_MEL002_uLN PRJNA907381_MEL009_iLN 
                   164                    450                    404 
PRJNA907381_MEL014_iLN PRJNA907381_MEL014_uLN PRJNA907381_MEL018_iLN 
                   422                    321                    369 
PRJNA907381_MEL018_uLN PRJNA907381_MEL022_iLN 
                   416                    177 

In [14]:
#re-export seurat object ready for integration
saveRDS(MEL, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/PRJNA907381_myeloid_int.RDS")

In [15]:
#remove all objects in R
rm(list = ls())

## GSE161529

In [16]:
BRE <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE161529_myeloid.RDS")

In [17]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
33538 features across 24082 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 117 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, counts.19, counts.20, counts.21, counts.22, counts.23, counts.24, counts.25, counts.26, counts.27, counts.28, counts.29, counts.30, counts.31, counts.32, counts.33, counts.34, counts.35, counts.36, counts.37, counts.38, counts.39, counts.40, counts.41, counts.42, counts.43, counts.44, counts.45, counts.46, counts.47, counts.48, counts.49, counts.50, counts.51, counts.52, counts.53, counts.54, counts.55, counts.56, counts.57, counts.58, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, data.19, data.20, data.21, data.22, data.23, data.24, data.

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE161529_B10023_AAAGCAATCCAGTATG-1,GSE161529,576,348,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,5.555556,1,1
GSE161529_B10023_AACTCCCCACAAGACG-1,GSE161529,1279,534,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,2.345582,1,1
GSE161529_B10023_AAGGTTCTCCTTGCCA-1,GSE161529,3314,1095,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,4.405552,1,1
GSE161529_B10023_ACATACGGTGGCGAAT-1,GSE161529,2616,922,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,2.561162,1,1
GSE161529_B10023_ACCCACTCATATGGTC-1,GSE161529,3568,1209,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,3.559417,1,1
GSE161529_B10023_ACTGCTCTCATGCAAC-1,GSE161529,1140,463,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,8.421053,1,1


In [18]:
table(BRE$sample_type)
table(BRE$cancer_type)
table(BRE$patient_id)
table(BRE$sample_id)


Healthy breast        LN mets pre-neoplastic         tumour 
          1136            979            287          21680 


      BRCA1 pre-neoplastic                 BRCA1 TNBC 
                       287                       4946 
          ER breast cancer      ER breast cancer mets 
                      9298                        976 
                   Healthy         HER2 breast cancer 
                      1136                       4076 
     male ER breast cancer male ER breast cancer mets 
                      1673                          3 
                      TNBC 
                      1687 


0001 0019 0021 0023 0025 0029 0031 0032 0033 0040 0042 0043 0056 0064 0068 0069 
 452  122    6  117  578  416  905  100   73  637 2116 1454   74  246   10   60 
0090 0092 0095 0106 0114 0125 0126 0131 0135 0151 0161 0163 0167 0169 0173 0176 
 129   50   44  285  494  201  474   49  684  290  161  475  681  433 1610  979 
0177 0178 0230 0233 0275 0288 0308 0319 0337 0342 0360 0372 0554 0894 4031 
3084 1666   22  203    4    2  377  495 1594   57  202  118 1754   40   59 


      GSE161529_BRCA1_TNBC_0131       GSE161529_BRCA1_TNBC_0177 
                             49                            3084 
      GSE161529_BRCA1_TNBC_0554       GSE161529_BRCA1_TNBC_4031 
                           1754                              59 
     GSE161529_ER_breast_ER0001      GSE161529_ER_breast_ER0025 
                            452                             578 
  GSE161529_ER_breast_ER0029_7C   GSE161529_ER_breast_ER0029_9C 
                            248                             168 
     GSE161529_ER_breast_ER0032      GSE161529_ER_breast_ER0040 
                            100                             350 
     GSE161529_ER_breast_ER0042      GSE161529_ER_breast_ER0043 
                           2116                            1448 
     GSE161529_ER_breast_ER0056      GSE161529_ER_breast_ER0064 
                             20                             154 
     GSE161529_ER_breast_ER0114      GSE161529_ER_breast_ER0125 
                        

In [19]:
table(BRE$cancer_type)


      BRCA1 pre-neoplastic                 BRCA1 TNBC 
                       287                       4946 
          ER breast cancer      ER breast cancer mets 
                      9298                        976 
                   Healthy         HER2 breast cancer 
                      1136                       4076 
     male ER breast cancer male ER breast cancer mets 
                      1673                          3 
                      TNBC 
                      1687 

In [20]:
#set site metadata, split by sample_type
BRE_H <- subset(BRE, subset = sample_type %in% c("Healthy breast"))
BRE_LN <- subset(BRE, subset = sample_type %in% c("LN mets"))
BRE_pre <- subset(BRE, subset = sample_type %in% c("pre-neoplastic"))
BRE_T <- subset(BRE, subset = sample_type %in% c("tumour"))

BRE_H@meta.data$site <- "breast"
BRE_LN@meta.data$site <- "lymph node"
BRE_pre@meta.data$site <- "breast"
BRE_T@meta.data$site <- "breast"

#set sample_type_major metadata
BRE_H@meta.data$sample_type_major <- "healthy"
BRE_LN@meta.data$sample_type_major <- "metastatic tumour"
BRE_pre@meta.data$sample_type_major <- "pre-neoplastic BRCA1"
BRE_T@meta.data$sample_type_major <- "primary tumour"

#merge back together 
BRE <- merge(BRE_H, y = c(BRE_LN, BRE_pre, BRE_T), project = "GSE161529")

#set cancer_subtype metadata, split by cancer_type 
BRE_pre <- subset(BRE, subset = cancer_type %in% c("BRCA1 pre-neoplastic"))
BRE_B_TNBC <- subset(BRE, subset = cancer_type %in% c("BRCA1 TNBC"))
BRE_ER <- subset(BRE, subset = cancer_type %in% c("ER breast cancer"))
BRE_ER_mets <- subset(BRE, subset = cancer_type %in% c("ER breast cancer mets"))
BRE_H <- subset(BRE, subset = cancer_type %in% c("Healthy"))
BRE_HER2 <- subset(BRE, subset = cancer_type %in% c("HER2 breast cancer"))
BRE_m_ER <- subset(BRE, subset = cancer_type %in% c("male ER breast cancer"))
BRE_m_ER_mets <- subset(BRE, subset = cancer_type %in% c("male ER breast cancer mets"))
BRE_TNBC <- subset(BRE, subset = cancer_type %in% c("TNBC"))

BRE_pre@meta.data$cancer_subtype <- "NA"
BRE_B_TNBC@meta.data$cancer_subtype <- "BRCA1 TNBC" 
BRE_ER@meta.data$cancer_subtype <- "ER Breast Cancer" 
BRE_ER_mets@meta.data$cancer_subtype <- "ER Breast Cancer" 
BRE_H@meta.data$cancer_subtype <- "NA" 
BRE_HER2@meta.data$cancer_subtype <- "HER2 Breast Cancer" 
BRE_m_ER@meta.data$cancer_subtype <- "male ER Breast Cancer" 
BRE_m_ER_mets@meta.data$cancer_subtype <- "male ER Breast Cancer" 
BRE_TNBC@meta.data$cancer_subtype <- "TNBC" 

#merge back together 
BRE <- merge(BRE_pre, y = c(BRE_B_TNBC, BRE_ER, BRE_ER_mets, BRE_H, BRE_HER2, BRE_m_ER, BRE_m_ER_mets, BRE_TNBC), project = "GSE161529")
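
The subset, annotate and merge pattern above works, but each merge adds suffixes to the layer names. As a hedged alternative sketch (not what this notebook does), the same labels could be assigned with a named lookup vector. The example uses a toy data frame standing in for `BRE@meta.data`; `meta` and `subtype_map` are hypothetical names.

```r
# Toy stand-in for BRE@meta.data: one row per cancer_type level seen above.
meta <- data.frame(
  cancer_type = c("BRCA1 pre-neoplastic", "BRCA1 TNBC", "ER breast cancer",
                  "ER breast cancer mets", "Healthy", "HER2 breast cancer",
                  "male ER breast cancer", "male ER breast cancer mets", "TNBC"),
  stringsAsFactors = FALSE
)

# Names are cancer_type values; elements are the cancer_subtype labels
# assigned in the cell above ("NA" is a literal string, as in the notebook).
subtype_map <- c(
  "BRCA1 pre-neoplastic"       = "NA",
  "BRCA1 TNBC"                 = "BRCA1 TNBC",
  "ER breast cancer"           = "ER Breast Cancer",
  "ER breast cancer mets"      = "ER Breast Cancer",
  "Healthy"                    = "NA",
  "HER2 breast cancer"         = "HER2 Breast Cancer",
  "male ER breast cancer"      = "male ER Breast Cancer",
  "male ER breast cancer mets" = "male ER Breast Cancer",
  "TNBC"                       = "TNBC"
)

# Look up each row's label by its cancer_type.
meta$cancer_subtype <- unname(subtype_map[meta$cancer_type])

# Fail loudly if any cancer_type is missing from the map.
stopifnot(!anyNA(meta$cancer_subtype))
meta
```

On the real object the same lookup would be applied to the `cancer_type` metadata column in place; whether that is preferable to the merge approach is a judgment call.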

In [21]:
#set integration_id metadata
BRE@meta.data$integration_id <- BRE@meta.data$sample_id

In [22]:
BRE
BRE@project.name
head(BRE@meta.data)
tail(BRE@meta.data)

An object of class Seurat 
33538 features across 24082 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 125 layers present: counts.1.3.1, counts.2.3.1, counts.3.3.1, counts.4.3.1, data.1.3.1, data.2.3.1, data.3.3.1, data.4.3.1, scale.data.3.1, counts.55.4.2, counts.56.4.2, counts.57.4.2, counts.58.4.2, data.55.4.2, data.56.4.2, data.57.4.2, data.58.4.2, scale.data.4.2, counts.5.4.3, counts.6.4.3, counts.7.4.3, counts.8.4.3, counts.9.4.3, counts.10.4.3, counts.12.4.3, counts.13.4.3, counts.15.4.3, counts.17.4.3, counts.19.4.3, counts.20.4.3, counts.21.4.3, counts.22.4.3, counts.23.4.3, counts.25.4.3, counts.27.4.3, counts.28.4.3, data.5.4.3, data.6.4.3, data.7.4.3, data.8.4.3, data.9.4.3, data.10.4.3, data.12.4.3, data.13.4.3, data.15.4.3, data.17.4.3, data.19.4.3, data.20.4.3, data.21.4.3, data.22.4.3, data.23.4.3, data.25.4.3, data.27.4.3, data.28.4.3, scale.data.4.3, counts.11.2.4, counts.14.2.4, counts.16.2.4, counts.18.2.4, counts.24.2.4, counts.2

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE161529_B10023_AAAGCAATCCAGTATG-1,GSE161529,576,348,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,5.555556,1,1,breast,pre-neoplastic BRCA1,,GSE161529_pre-neo_B10023
GSE161529_B10023_AACTCCCCACAAGACG-1,GSE161529,1279,534,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,2.345582,1,1,breast,pre-neoplastic BRCA1,,GSE161529_pre-neo_B10023
GSE161529_B10023_AAGGTTCTCCTTGCCA-1,GSE161529,3314,1095,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,4.405552,1,1,breast,pre-neoplastic BRCA1,,GSE161529_pre-neo_B10023
GSE161529_B10023_ACATACGGTGGCGAAT-1,GSE161529,2616,922,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,2.561162,1,1,breast,pre-neoplastic BRCA1,,GSE161529_pre-neo_B10023
GSE161529_B10023_ACCCACTCATATGGTC-1,GSE161529,3568,1209,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,3.559417,1,1,breast,pre-neoplastic BRCA1,,GSE161529_pre-neo_B10023
GSE161529_B10023_ACTGCTCTCATGCAAC-1,GSE161529,1140,463,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,8.421053,1,1,breast,pre-neoplastic BRCA1,,GSE161529_pre-neo_B10023


Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE161529_TN0135_TTTCCTCTCGAGAGCA-1,GSE161529,3019,1094,tumour,TNBC,135,GSE161529_TNBC_0135,2.51739,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGCGCCACGGTTTA-1,GSE161529,2429,906,tumour,TNBC,135,GSE161529_TNBC_0135,3.252367,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGCGCGTAAACGCG-1,GSE161529,9095,2480,tumour,TNBC,135,GSE161529_TNBC_0135,5.17867,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGGTTTCGACCAGC-1,GSE161529,3207,1341,tumour,TNBC,135,GSE161529_TNBC_0135,4.70845,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGTCAAGTCCGTAT-1,GSE161529,4551,1365,tumour,TNBC,135,GSE161529_TNBC_0135,2.966381,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGTCACACAGGAGT-1,GSE161529,5355,1455,tumour,TNBC,135,GSE161529_TNBC_0135,2.745098,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135


In [24]:
#exclude any samples with <100 cells
table(BRE$integration_id)
#exclude 21 samples
BRE <- subset(BRE, subset = !(integration_id %in% c("GSE161529_BRCA1_TNBC_0131","GSE161529_BRCA1_TNBC_4031","GSE161529_ER_breast_ER0056","GSE161529_ER_breast_mets_ER0043","GSE161529_ER_breast_mets_ER0056","GSE161529_ER_breast_mets_ER0064","GSE161529_Healthy_breast_0021","GSE161529_Healthy_breast_0023","GSE161529_Healthy_breast_0064","GSE161529_Healthy_breast_0092","GSE161529_Healthy_breast_0095","GSE161529_Healthy_breast_0230","GSE161529_Healthy_breast_0275","GSE161529_Healthy_breast_0288","GSE161529_Healthy_breast_0342","GSE161529_HER2_breast_0069","GSE161529_mER_breast_0068","GSE161529_mER_breast_mets_0068","GSE161529_pre-neo_B10023","GSE161529_pre-neo_B10033","GSE161529_pre-neo_B10894")))
table(BRE$integration_id)


      GSE161529_BRCA1_TNBC_0131       GSE161529_BRCA1_TNBC_0177 
                             49                            3084 
      GSE161529_BRCA1_TNBC_0554       GSE161529_BRCA1_TNBC_4031 
                           1754                              59 
     GSE161529_ER_breast_ER0001      GSE161529_ER_breast_ER0025 
                            452                             578 
  GSE161529_ER_breast_ER0029_7C   GSE161529_ER_breast_ER0029_9C 
                            248                             168 
     GSE161529_ER_breast_ER0032      GSE161529_ER_breast_ER0040 
                            100                             350 
     GSE161529_ER_breast_ER0042      GSE161529_ER_breast_ER0043 
                           2116                            1448 
     GSE161529_ER_breast_ER0056      GSE161529_ER_breast_ER0064 
                             20                             154 
     GSE161529_ER_breast_ER0114      GSE161529_ER_breast_ER0125 
                        


      GSE161529_BRCA1_TNBC_0177       GSE161529_BRCA1_TNBC_0554 
                           3084                            1754 
     GSE161529_ER_breast_ER0001      GSE161529_ER_breast_ER0025 
                            452                             578 
  GSE161529_ER_breast_ER0029_7C   GSE161529_ER_breast_ER0029_9C 
                            248                             168 
     GSE161529_ER_breast_ER0032      GSE161529_ER_breast_ER0040 
                            100                             350 
     GSE161529_ER_breast_ER0042      GSE161529_ER_breast_ER0043 
                           2116                            1448 
     GSE161529_ER_breast_ER0064      GSE161529_ER_breast_ER0114 
                            154                             250 
     GSE161529_ER_breast_ER0125      GSE161529_ER_breast_ER0151 
                            201                             290 
     GSE161529_ER_breast_ER0163      GSE161529_ER_breast_ER0167 
                        

In [25]:
#check which categories are still present after the exclusion
table(BRE$sample_type)
table(BRE$cancer_type)
table(BRE$patient_id)
table(BRE$sample_id)


Healthy breast        LN mets pre-neoplastic         tumour 
           876            827            129          21485 


 BRCA1 pre-neoplastic            BRCA1 TNBC      ER breast cancer 
                  129                  4838                  9278 
ER breast cancer mets               Healthy    HER2 breast cancer 
                  827                   876                  4016 
male ER breast cancer                  TNBC 
                 1666                  1687 


0001 0019 0025 0029 0031 0032 0040 0042 0043 0064 0090 0106 0114 0125 0126 0135 
 452  122  578  416  905  100  637 2116 1448  154  129  285  494  201  474  684 
0151 0161 0163 0167 0169 0173 0176 0177 0178 0233 0308 0319 0337 0360 0372 0554 
 290  161  475  681  433 1610  979 3084 1666  203  377  495 1594  202  118 1754 


      GSE161529_BRCA1_TNBC_0177       GSE161529_BRCA1_TNBC_0554 
                           3084                            1754 
     GSE161529_ER_breast_ER0001      GSE161529_ER_breast_ER0025 
                            452                             578 
  GSE161529_ER_breast_ER0029_7C   GSE161529_ER_breast_ER0029_9C 
                            248                             168 
     GSE161529_ER_breast_ER0032      GSE161529_ER_breast_ER0040 
                            100                             350 
     GSE161529_ER_breast_ER0042      GSE161529_ER_breast_ER0043 
                           2116                            1448 
     GSE161529_ER_breast_ER0064      GSE161529_ER_breast_ER0114 
                            154                             250 
     GSE161529_ER_breast_ER0125      GSE161529_ER_breast_ER0151 
                            201                             290 
     GSE161529_ER_breast_ER0163      GSE161529_ER_breast_ER0167 
                        

In [26]:
#only one pre-neoplastic BRCA1 sample and one male ER breast cancer sample remain, so exclude both categories
BRE <- subset(BRE, subset = !(integration_id %in% c("GSE161529_pre-neo_B10090","GSE161529_mER_breast_0178")))
table(BRE$integration_id)


      GSE161529_BRCA1_TNBC_0177       GSE161529_BRCA1_TNBC_0554 
                           3084                            1754 
     GSE161529_ER_breast_ER0001      GSE161529_ER_breast_ER0025 
                            452                             578 
  GSE161529_ER_breast_ER0029_7C   GSE161529_ER_breast_ER0029_9C 
                            248                             168 
     GSE161529_ER_breast_ER0032      GSE161529_ER_breast_ER0040 
                            100                             350 
     GSE161529_ER_breast_ER0042      GSE161529_ER_breast_ER0043 
                           2116                            1448 
     GSE161529_ER_breast_ER0064      GSE161529_ER_breast_ER0114 
                            154                             250 
     GSE161529_ER_breast_ER0125      GSE161529_ER_breast_ER0151 
                            201                             290 
     GSE161529_ER_breast_ER0163      GSE161529_ER_breast_ER0167 
                        

In [28]:
#check which categories are still present
table(BRE$sample_type)
table(BRE$cancer_type)
table(BRE$patient_id)
table(BRE$sample_id)
table(BRE$site)
table(BRE$cancer_subtype)
table(BRE$sample_type_major)


Healthy breast        LN mets         tumour 
           876            827          19819 


           BRCA1 TNBC      ER breast cancer ER breast cancer mets 
                 4838                  9278                   827 
              Healthy    HER2 breast cancer                  TNBC 
                  876                  4016                  1687 


0001 0019 0025 0029 0031 0032 0040 0042 0043 0064 0106 0114 0125 0126 0135 0151 
 452  122  578  416  905  100  637 2116 1448  154  285  494  201  474  684  290 
0161 0163 0167 0169 0173 0176 0177 0233 0308 0319 0337 0360 0372 0554 
 161  475  681  433 1610  979 3084  203  377  495 1594  202  118 1754 


      GSE161529_BRCA1_TNBC_0177       GSE161529_BRCA1_TNBC_0554 
                           3084                            1754 
     GSE161529_ER_breast_ER0001      GSE161529_ER_breast_ER0025 
                            452                             578 
  GSE161529_ER_breast_ER0029_7C   GSE161529_ER_breast_ER0029_9C 
                            248                             168 
     GSE161529_ER_breast_ER0032      GSE161529_ER_breast_ER0040 
                            100                             350 
     GSE161529_ER_breast_ER0042      GSE161529_ER_breast_ER0043 
                           2116                            1448 
     GSE161529_ER_breast_ER0064      GSE161529_ER_breast_ER0114 
                            154                             250 
     GSE161529_ER_breast_ER0125      GSE161529_ER_breast_ER0151 
                            201                             290 
     GSE161529_ER_breast_ER0163      GSE161529_ER_breast_ER0167 
                        


    breast lymph node 
     20695        827 


        BRCA1 TNBC   ER Breast Cancer HER2 Breast Cancer                 NA 
              4838              10105               4016                876 
              TNBC 
              1687 


          healthy metastatic tumour    primary tumour 
              876               827             19819 
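
A quick consistency check on the tables above: every cell should carry exactly one value for each metadata column, so the three new columns should sum to the same total. The sketch below simply re-enters the printed counts; the vector names are illustrative.

```r
# Counts copied from the table() outputs above.
site_counts <- c("breast" = 20695, "lymph node" = 827)
subtype_counts <- c("BRCA1 TNBC" = 4838, "ER Breast Cancer" = 10105,
                    "HER2 Breast Cancer" = 4016, "NA" = 876, "TNBC" = 1687)
major_counts <- c("healthy" = 876, "metastatic tumour" = 827,
                  "primary tumour" = 19819)

# Total cells implied by each metadata column.
totals <- c(site = sum(site_counts),
            cancer_subtype = sum(subtype_counts),
            sample_type_major = sum(major_counts))
totals

# All three columns should account for the same number of cells.
stopifnot(length(unique(totals)) == 1)
```

All three totals come to 21522, and the 10105 "ER Breast Cancer" cells are the 9278 primary plus 827 metastatic ER cells from the `cancer_type` table.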

In [31]:
#two samples are not biologically distinct: GSE161529_ER_breast_ER0029_7C and GSE161529_ER_breast_ER0029_9C
#amend integration_id so that both share the same value

BRE_29 <- subset(BRE, subset = integration_id %in% c("GSE161529_ER_breast_ER0029_7C","GSE161529_ER_breast_ER0029_9C"))
BRE_else <- subset(BRE, subset = !(integration_id %in% c("GSE161529_ER_breast_ER0029_7C","GSE161529_ER_breast_ER0029_9C")))

BRE_29@meta.data$integration_id <- "GSE161529_ER_breast_ER0029"

In [32]:
BRE <- merge(BRE_29, y = c(BRE_else), project = "GSE161529")
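
The relabelling itself only needs to touch the `integration_id` values. As a minimal sketch of the recoding logic (an alternative to subsetting and merging, shown on a toy character vector rather than the Seurat object), using the cell counts reported earlier for these samples; `not_distinct` is a hypothetical name.

```r
# Toy stand-in for BRE$integration_id: the two ER0029 samples plus one other.
integration_id <- c(
  rep("GSE161529_ER_breast_ER0029_7C", 248),
  rep("GSE161529_ER_breast_ER0029_9C", 168),
  rep("GSE161529_ER_breast_ER0032", 100)
)

# Samples that are not biologically distinct and should share one id.
not_distinct <- c("GSE161529_ER_breast_ER0029_7C",
                  "GSE161529_ER_breast_ER0029_9C")

# Overwrite both ids with the shared value; all other ids are untouched.
integration_id[integration_id %in% not_distinct] <- "GSE161529_ER_breast_ER0029"

table(integration_id)
```

The merged id ends up with 248 + 168 = 416 cells, which is the count to expect for `GSE161529_ER_breast_ER0029` once the objects are merged.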

In [34]:
BRE
table(BRE$integration_id)

An object of class Seurat 
33538 features across 21522 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 72 layers present: counts.GSE161529_ER_breast_ER0029_7C.1, counts.GSE161529_ER_breast_ER0029_9C.1, scale.data.1, data.GSE161529_ER_breast_ER0029_7C.1, data.GSE161529_ER_breast_ER0029_9C.1, counts.GSE161529_BRCA1_TNBC_0177.2, counts.GSE161529_BRCA1_TNBC_0554.2, counts.GSE161529_ER_breast_ER0001.2, counts.GSE161529_ER_breast_ER0025.2, counts.GSE161529_ER_breast_ER0032.2, counts.GSE161529_ER_breast_ER0040.2, counts.GSE161529_ER_breast_ER0042.2, counts.GSE161529_ER_breast_ER0043.2, counts.GSE161529_ER_breast_ER0064.2, counts.GSE161529_ER_breast_ER0114.2, counts.GSE161529_ER_breast_ER0125.2, counts.GSE161529_ER_breast_ER0151.2, counts.GSE161529_ER_breast_ER0163.2, counts.GSE161529_ER_breast_ER0167.2, counts.GSE161529_ER_breast_ER0173.2, counts.GSE161529_ER_breast_ER0319.2, counts.GSE161529_ER_breast_ER0360.2, counts.GSE161529_ER_breast_mets_ER0040.2, cou


      GSE161529_BRCA1_TNBC_0177       GSE161529_BRCA1_TNBC_0554 
                           3084                            1754 
     GSE161529_ER_breast_ER0001      GSE161529_ER_breast_ER0025 
                            452                             578 
     GSE161529_ER_breast_ER0029      GSE161529_ER_breast_ER0032 
                            416                             100 
     GSE161529_ER_breast_ER0040      GSE161529_ER_breast_ER0042 
                            350                            2116 
     GSE161529_ER_breast_ER0043      GSE161529_ER_breast_ER0064 
                           1448                             154 
     GSE161529_ER_breast_ER0114      GSE161529_ER_breast_ER0125 
                            250                             201 
     GSE161529_ER_breast_ER0151      GSE161529_ER_breast_ER0163 
                            290                             475 
     GSE161529_ER_breast_ER0167      GSE161529_ER_breast_ER0173 
                        

In [35]:
#join layers and then split them by integration_id
Layers(BRE[["RNA"]])
#join layers
BRE[["RNA"]] <- JoinLayers(BRE[["RNA"]])
Layers(BRE[["RNA"]])
#split layers
BRE[["RNA"]] <- split(BRE[["RNA"]], f = BRE$integration_id)
Layers(BRE[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [36]:
#record number of cells
BRE
BRE@project.name
head(BRE@meta.data)
tail(BRE@meta.data)
table(BRE$integration_id)

An object of class Seurat 
33538 features across 21522 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 69 layers present: counts.GSE161529_ER_breast_ER0029, counts.GSE161529_BRCA1_TNBC_0177, counts.GSE161529_BRCA1_TNBC_0554, counts.GSE161529_ER_breast_ER0001, counts.GSE161529_ER_breast_ER0025, counts.GSE161529_ER_breast_ER0032, counts.GSE161529_ER_breast_ER0040, counts.GSE161529_ER_breast_ER0042, counts.GSE161529_ER_breast_ER0043, counts.GSE161529_ER_breast_ER0064, counts.GSE161529_ER_breast_ER0114, counts.GSE161529_ER_breast_ER0125, counts.GSE161529_ER_breast_ER0151, counts.GSE161529_ER_breast_ER0163, counts.GSE161529_ER_breast_ER0167, counts.GSE161529_ER_breast_ER0173, counts.GSE161529_ER_breast_ER0319, counts.GSE161529_ER_breast_ER0360, counts.GSE161529_ER_breast_mets_ER0040, counts.GSE161529_ER_breast_mets_ER0167, counts.GSE161529_ER_breast_mets_ER0173, counts.GSE161529_Healthy_breast_0019, counts.GSE161529_Healthy_breast_0169, counts.GSE161529_H

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE161529_ER0029_7C_AAAGTAGAGGAGTTGC-1,GSE161529,1592,723,tumour,ER breast cancer,29,GSE161529_ER_breast_ER0029_7C,4.899497,1,1,breast,primary tumour,ER Breast Cancer,GSE161529_ER_breast_ER0029
GSE161529_ER0029_7C_AAATGCCAGCTGTCTA-1,GSE161529,812,443,tumour,ER breast cancer,29,GSE161529_ER_breast_ER0029_7C,2.463054,1,1,breast,primary tumour,ER Breast Cancer,GSE161529_ER_breast_ER0029
GSE161529_ER0029_7C_AAATGCCTCAAACAAG-1,GSE161529,2514,1017,tumour,ER breast cancer,29,GSE161529_ER_breast_ER0029_7C,3.778839,1,1,breast,primary tumour,ER Breast Cancer,GSE161529_ER_breast_ER0029
GSE161529_ER0029_7C_AACACGTTCTTAACCT-1,GSE161529,2614,1052,tumour,ER breast cancer,29,GSE161529_ER_breast_ER0029_7C,4.20811,1,1,breast,primary tumour,ER Breast Cancer,GSE161529_ER_breast_ER0029
GSE161529_ER0029_7C_AACCATGGTACCGAGA-1,GSE161529,2055,857,tumour,ER breast cancer,29,GSE161529_ER_breast_ER0029_7C,7.055961,1,1,breast,primary tumour,ER Breast Cancer,GSE161529_ER_breast_ER0029
GSE161529_ER0029_7C_AACTCCCAGGCTAGCA-1,GSE161529,7702,2304,tumour,ER breast cancer,29,GSE161529_ER_breast_ER0029_7C,3.791223,1,1,breast,primary tumour,ER Breast Cancer,GSE161529_ER_breast_ER0029


Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE161529_TN0135_TTTCCTCTCGAGAGCA-1,GSE161529,3019,1094,tumour,TNBC,135,GSE161529_TNBC_0135,2.51739,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGCGCCACGGTTTA-1,GSE161529,2429,906,tumour,TNBC,135,GSE161529_TNBC_0135,3.252367,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGCGCGTAAACGCG-1,GSE161529,9095,2480,tumour,TNBC,135,GSE161529_TNBC_0135,5.17867,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGGTTTCGACCAGC-1,GSE161529,3207,1341,tumour,TNBC,135,GSE161529_TNBC_0135,4.70845,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGTCAAGTCCGTAT-1,GSE161529,4551,1365,tumour,TNBC,135,GSE161529_TNBC_0135,2.966381,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGTCACACAGGAGT-1,GSE161529,5355,1455,tumour,TNBC,135,GSE161529_TNBC_0135,2.745098,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135



      GSE161529_BRCA1_TNBC_0177       GSE161529_BRCA1_TNBC_0554 
                           3084                            1754 
     GSE161529_ER_breast_ER0001      GSE161529_ER_breast_ER0025 
                            452                             578 
     GSE161529_ER_breast_ER0029      GSE161529_ER_breast_ER0032 
                            416                             100 
     GSE161529_ER_breast_ER0040      GSE161529_ER_breast_ER0042 
                            350                            2116 
     GSE161529_ER_breast_ER0043      GSE161529_ER_breast_ER0064 
                           1448                             154 
     GSE161529_ER_breast_ER0114      GSE161529_ER_breast_ER0125 
                            250                             201 
     GSE161529_ER_breast_ER0151      GSE161529_ER_breast_ER0163 
                            290                             475 
     GSE161529_ER_breast_ER0167      GSE161529_ER_breast_ER0173 
                        

In [37]:
#re-export seurat object ready for integration
saveRDS(BRE, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE161529_myeloid_int.RDS")

In [38]:
#remove all objects in R
rm(list = ls())

## GSE176078

In [3]:
BRE <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE176078_myeloid.RDS")

In [5]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
29733 features across 9374 samples within 1 assay 
Active assay: RNA (29733 features, 2000 variable features)
 53 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, counts.19, counts.20, counts.21, counts.22, counts.23, counts.24, counts.25, counts.26, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, data.19, data.20, data.21, data.22, data.23, data.24, data.25, data.26, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,X,percent.mito,subtype,celltype_subset,celltype_minor,celltype_major,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE176078_HER2_CID3586_AACCATGCAGGTCGTC,CID3586,6925,1897,CID3586_AACCATGCAGGTCGTC,2.194946,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,2.194946,1,1
GSE176078_HER2_CID3586_AACTTTCGTGACCAAG,CID3586,8552,2318,CID3586_AACTTTCGTGACCAAG,2.958372,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,2.958372,1,1
GSE176078_HER2_CID3586_AAGGTTCAGTCCTCCT,CID3586,9355,2382,CID3586_AAGGTTCAGTCCTCCT,2.501336,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,2.501336,1,1
GSE176078_HER2_CID3586_ACTATCTGTCTAAAGA,CID3586,16706,2903,CID3586_ACTATCTGTCTAAAGA,4.579193,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,4.579193,1,1
GSE176078_HER2_CID3586_ATTACTCAGACTTTCG,CID3586,9537,2520,CID3586_ATTACTCAGACTTTCG,3.827199,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,3.827199,1,1
GSE176078_HER2_CID3586_CACTCCAGTTCGCTAA,CID3586,9162,2323,CID3586_CACTCCAGTTCGCTAA,2.619515,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,2.619515,1,1


In [6]:
table(BRE$sample_type)
table(BRE$cancer_type)
table(BRE$patient_id)
table(BRE$sample_id)


tumour 
  9374 


  ER Breast Cancer HER2 Breast Cancer               TNBC 
              1691               1341               6342 


 CID3586  CID3838  CID3921  CID3941  CID3946  CID3948  CID3963  CID4040 
     143      445      376       36      157      116      476       47 
 CID4066  CID4067 CID4290A  CID4398 CID44041  CID4461  CID4463  CID4465 
     213      263      339      198      103       44      101      177 
 CID4471  CID4495 CID44971 CID44991  CID4513  CID4515 CID45171  CID4523 
     250      889      608      199     2845      529      164      359 
CID4530N  CID4535 
      87      210 


   GSE176078_ER_breast_CID3941    GSE176078_ER_breast_CID3948 
                            36                            116 
   GSE176078_ER_breast_CID4040    GSE176078_ER_breast_CID4067 
                            47                            263 
  GSE176078_ER_breast_CID4290A    GSE176078_ER_breast_CID4398 
                           339                            198 
   GSE176078_ER_breast_CID4461    GSE176078_ER_breast_CID4463 
                            44                            101 
   GSE176078_ER_breast_CID4471   GSE176078_ER_breast_CID4530N 
                           250                             87 
   GSE176078_ER_breast_CID4535  GSE176078_HER2_breast_CID3586 
                           210                            143 
 GSE176078_HER2_breast_CID3838  GSE176078_HER2_breast_CID3921 
                           445                            376 
 GSE176078_HER2_breast_CID4066 GSE176078_HER2_breast_CID45171 
                           213                        

In [9]:
#set site metadata
BRE@meta.data$site <- "breast"

#set sample_type_major
BRE@meta.data$sample_type_major <- "primary tumour"

In [11]:
#set cancer_subtype metadata
BRE@meta.data$cancer_subtype <- BRE@meta.data$cancer_type

In [13]:
#set integration_id metadata
BRE@meta.data$integration_id <- BRE@meta.data$sample_id

In [14]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
29733 features across 9374 samples within 1 assay 
Active assay: RNA (29733 features, 2000 variable features)
 53 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, counts.19, counts.20, counts.21, counts.22, counts.23, counts.24, counts.25, counts.26, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, data.19, data.20, data.21, data.22, data.23, data.24, data.25, data.26, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,X,percent.mito,subtype,celltype_subset,celltype_minor,celltype_major,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE176078_HER2_CID3586_AACCATGCAGGTCGTC,CID3586,6925,1897,CID3586_AACCATGCAGGTCGTC,2.194946,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,2.194946,1,1,breast,primary tumour,HER2 Breast Cancer,GSE176078_HER2_breast_CID3586
GSE176078_HER2_CID3586_AACTTTCGTGACCAAG,CID3586,8552,2318,CID3586_AACTTTCGTGACCAAG,2.958372,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,2.958372,1,1,breast,primary tumour,HER2 Breast Cancer,GSE176078_HER2_breast_CID3586
GSE176078_HER2_CID3586_AAGGTTCAGTCCTCCT,CID3586,9355,2382,CID3586_AAGGTTCAGTCCTCCT,2.501336,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,2.501336,1,1,breast,primary tumour,HER2 Breast Cancer,GSE176078_HER2_breast_CID3586
GSE176078_HER2_CID3586_ACTATCTGTCTAAAGA,CID3586,16706,2903,CID3586_ACTATCTGTCTAAAGA,4.579193,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,4.579193,1,1,breast,primary tumour,HER2 Breast Cancer,GSE176078_HER2_breast_CID3586
GSE176078_HER2_CID3586_ATTACTCAGACTTTCG,CID3586,9537,2520,CID3586_ATTACTCAGACTTTCG,3.827199,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,3.827199,1,1,breast,primary tumour,HER2 Breast Cancer,GSE176078_HER2_breast_CID3586
GSE176078_HER2_CID3586_CACTCCAGTTCGCTAA,CID3586,9162,2323,CID3586_CACTCCAGTTCGCTAA,2.619515,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,2.619515,1,1,breast,primary tumour,HER2 Breast Cancer,GSE176078_HER2_breast_CID3586


In [17]:
#exclude any samples with <100 cells
table(BRE$integration_id)
#exclude ER_breast_CID3941, ER_breast_CID4040, ER_breast_CID4461, ER_breast_CID4530N
BRE <- subset(BRE, subset = !(integration_id %in% c("GSE176078_ER_breast_CID3941","GSE176078_ER_breast_CID4040","GSE176078_ER_breast_CID4461","GSE176078_ER_breast_CID4530N")))
table(BRE$integration_id)


   GSE176078_ER_breast_CID3941    GSE176078_ER_breast_CID3948 
                            36                            116 
   GSE176078_ER_breast_CID4040    GSE176078_ER_breast_CID4067 
                            47                            263 
  GSE176078_ER_breast_CID4290A    GSE176078_ER_breast_CID4398 
                           339                            198 
   GSE176078_ER_breast_CID4461    GSE176078_ER_breast_CID4463 
                            44                            101 
   GSE176078_ER_breast_CID4471   GSE176078_ER_breast_CID4530N 
                           250                             87 
   GSE176078_ER_breast_CID4535  GSE176078_HER2_breast_CID3586 
                           210                            143 
 GSE176078_HER2_breast_CID3838  GSE176078_HER2_breast_CID3921 
                           445                            376 
 GSE176078_HER2_breast_CID4066 GSE176078_HER2_breast_CID45171 
                           213                        


   GSE176078_ER_breast_CID3948    GSE176078_ER_breast_CID4067 
                           116                            263 
  GSE176078_ER_breast_CID4290A    GSE176078_ER_breast_CID4398 
                           339                            198 
   GSE176078_ER_breast_CID4463    GSE176078_ER_breast_CID4471 
                           101                            250 
   GSE176078_ER_breast_CID4535  GSE176078_HER2_breast_CID3586 
                           210                            143 
 GSE176078_HER2_breast_CID3838  GSE176078_HER2_breast_CID3921 
                           445                            376 
 GSE176078_HER2_breast_CID4066 GSE176078_HER2_breast_CID45171 
                           213                            164 
        GSE176078_TNBC_CID3946         GSE176078_TNBC_CID3963 
                           157                            476 
       GSE176078_TNBC_CID44041         GSE176078_TNBC_CID4465 
                           103                        
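The sample IDs to drop above were read off the table by hand. As a rough sketch, the same list can be derived from the counts. The toy `meta` data frame and the names `min_cells`, `small_ids` and `meta_kept` are illustrative assumptions, not objects from this notebook; the counts are copied from four rows of the table above.

```r
# Toy stand-in for BRE@meta.data (assumption: only integration_id matters here)
meta <- data.frame(
  integration_id = rep(
    c("GSE176078_ER_breast_CID3941", "GSE176078_ER_breast_CID3948",
      "GSE176078_ER_breast_CID4040", "GSE176078_ER_breast_CID4463"),
    times = c(36, 116, 47, 101)
  ),
  stringsAsFactors = FALSE
)

min_cells <- 100  # threshold used in this notebook

# Count cells per integration_id and pick the ids below the threshold
n_cells <- table(meta$integration_id)
small_ids <- names(n_cells)[n_cells < min_cells]
print(small_ids)

# Keep only the cells whose sample meets the threshold
meta_kept <- meta[!(meta$integration_id %in% small_ids), , drop = FALSE]
print(table(meta_kept$integration_id))
```

With the real object, `small_ids` could take the place of the hand-typed vector in the `subset()` call, which avoids copy errors when a dataset has many samples.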

In [18]:
#join layers and then split them by integration_id
Layers(BRE[["RNA"]])
#join layers
BRE[["RNA"]] <- JoinLayers(BRE[["RNA"]])
Layers(BRE[["RNA"]])
#split layers
BRE[["RNA"]] <- split(BRE[["RNA"]], f = BRE$integration_id)
Layers(BRE[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
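As a conceptual sketch only (this is not how Seurat stores layers internally), joining and then splitting by `integration_id` can be pictured as regrouping the columns of one count matrix by a per-cell label. The toy matrix, the labels `s1`/`s2` and the names `layers` and `joined` are assumptions for illustration.

```r
# Toy count matrix: 2 genes x 6 cells (assumption, not real data)
counts <- matrix(
  1:12, nrow = 2,
  dimnames = list(c("geneA", "geneB"), paste0("cell", 1:6))
)

# One label per cell, playing the role of BRE$integration_id
integration_id <- c("s1", "s1", "s2", "s2", "s2", "s1")

# "Split": group column indices by label, then slice one matrix per label
layers <- lapply(
  split(seq_len(ncol(counts)), integration_id),
  function(idx) counts[, idx, drop = FALSE]
)
names(layers) <- paste0("counts.", names(layers))
print(names(layers))

# "Join": bind the pieces back together and restore the original cell order
joined <- do.call(cbind, unname(layers))[, colnames(counts)]
print(identical(joined, counts))
```

The point of the join-then-split step is that the resulting layers follow `integration_id` rather than whatever grouping the object was loaded with.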



In [20]:
#record number of cells
table(BRE$integration_id)
BRE


   GSE176078_ER_breast_CID3948    GSE176078_ER_breast_CID4067 
                           116                            263 
  GSE176078_ER_breast_CID4290A    GSE176078_ER_breast_CID4398 
                           339                            198 
   GSE176078_ER_breast_CID4463    GSE176078_ER_breast_CID4471 
                           101                            250 
   GSE176078_ER_breast_CID4535  GSE176078_HER2_breast_CID3586 
                           210                            143 
 GSE176078_HER2_breast_CID3838  GSE176078_HER2_breast_CID3921 
                           445                            376 
 GSE176078_HER2_breast_CID4066 GSE176078_HER2_breast_CID45171 
                           213                            164 
        GSE176078_TNBC_CID3946         GSE176078_TNBC_CID3963 
                           157                            476 
       GSE176078_TNBC_CID44041         GSE176078_TNBC_CID4465 
                           103                        

An object of class Seurat 
29733 features across 9160 samples within 1 assay 
Active assay: RNA (29733 features, 2000 variable features)
 45 layers present: data.GSE176078_HER2_breast_CID3586, data.GSE176078_HER2_breast_CID3838, data.GSE176078_HER2_breast_CID3921, data.GSE176078_TNBC_CID3946, data.GSE176078_ER_breast_CID3948, data.GSE176078_TNBC_CID3963, data.GSE176078_HER2_breast_CID4066, data.GSE176078_ER_breast_CID4067, data.GSE176078_ER_breast_CID4290A, data.GSE176078_ER_breast_CID4398, data.GSE176078_ER_breast_CID4463, data.GSE176078_TNBC_CID4465, data.GSE176078_ER_breast_CID4471, data.GSE176078_TNBC_CID4495, data.GSE176078_TNBC_CID4513, data.GSE176078_TNBC_CID4515, data.GSE176078_TNBC_CID4523, data.GSE176078_ER_breast_CID4535, data.GSE176078_TNBC_CID44041, data.GSE176078_TNBC_CID44971, data.GSE176078_TNBC_CID44991, data.GSE176078_HER2_breast_CID45171, scale.data, counts.GSE176078_HER2_breast_CID3586, counts.GSE176078_HER2_breast_CID3838, counts.GSE176078_HER2_breast_CID3921, coun

In [21]:
#re-export seurat object ready for integration
saveRDS(BRE, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE176078_myeloid_int.RDS")

In [22]:
#remove all objects in R
rm(list = ls())

## GSE195861

In [27]:
BRE <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE195861_myeloid.RDS")

In [28]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
33538 features across 15286 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 41 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, counts.19, counts.20, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, data.19, data.20, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE195861_Healthy_AAACGAACACTGGACC-1,GSE195861,736,427,Healthy_breast,Healthy,Norm1,GSE195861_Healthy,1.766304,4,4
GSE195861_Healthy_AAATGGAGTCCAGGTC-1,GSE195861,1038,585,Healthy_breast,Healthy,Norm1,GSE195861_Healthy,10.693642,1,1
GSE195861_Healthy_AACAAAGAGTCATCCA-1,GSE195861,18003,4395,Healthy_breast,Healthy,Norm1,GSE195861_Healthy,10.381603,1,1
GSE195861_Healthy_AACCATGCACGACAAG-1,GSE195861,1212,825,Healthy_breast,Healthy,Norm1,GSE195861_Healthy,1.732673,1,1
GSE195861_Healthy_AACGTCAGTAGACAAT-1,GSE195861,680,462,Healthy_breast,Healthy,Norm1,GSE195861_Healthy,3.970588,1,1
GSE195861_Healthy_AAGCGTTCAGATAAAC-1,GSE195861,2446,1053,Healthy_breast,Healthy,Norm1,GSE195861_Healthy,2.085037,1,1


In [29]:
table(BRE$sample_type)
table(BRE$cancer_type)
table(BRE$patient_id)
table(BRE$sample_id)


Healthy_breast         LN_met         tumour 
           158           1278          13850 


   DCIS Healthy     IDC 
  12436     158    2692 


Norm1   pt1  pt10  pt11  pt12  pt13   pt2   pt3   pt4   pt5   pt6   pt7   pt8 
  158    62   274   131   254   246 10629   432   163   705   172   273   596 
  pt9 
 1191 


GSE195861_DCIS_tumour_pt1 GSE195861_DCIS_tumour_pt2 GSE195861_DCIS_tumour_pt3 
                       62                     10629                       432 
GSE195861_DCIS_tumour_pt4 GSE195861_DCIS_tumour_pt5 GSE195861_DCIS_tumour_pt6 
                      163                       705                       172 
GSE195861_DCIS_tumour_pt7         GSE195861_Healthy GSE195861_IDC_LN-met_pt10 
                      273                       158                       197 
GSE195861_IDC_LN-met_pt11 GSE195861_IDC_LN-met_pt12 GSE195861_IDC_LN-met_pt13 
                       97                       145                       155 
 GSE195861_IDC_LN-met_pt8  GSE195861_IDC_LN-met_pt9 GSE195861_IDC_tumour_pt10 
                      240                       444                        77 
GSE195861_IDC_tumour_pt11 GSE195861_IDC_tumour_pt12 GSE195861_IDC_tumour_pt13 
                       34                       109                        91 
 GSE195861_IDC_tumour_pt8  GSE195861_IDC_tumour_pt9

In [32]:
#set site and sample_type_major metadata

#split by sample_type
BRE_H <- subset(BRE, subset = sample_type %in% c("Healthy_breast"))
BRE_LN <- subset(BRE, subset = sample_type %in% c("LN_met"))
BRE_T <- subset(BRE, subset = sample_type %in% c("tumour"))

BRE_H@meta.data$site <- "breast"
BRE_LN@meta.data$site <- "lymph node"
BRE_T@meta.data$site <- "breast"

BRE_H@meta.data$sample_type_major <- "healthy"
BRE_LN@meta.data$sample_type_major <- "metastatic tumour"
BRE_T@meta.data$sample_type_major <- "primary tumour"

#merge back together 
BRE <- merge(BRE_H, y = c(BRE_LN, BRE_T), project = "GSE195861")

In [33]:
#set cancer_subtype metadata

#split by cancer_type
BRE_D <- subset(BRE, subset = cancer_type %in% c("DCIS"))
BRE_H <- subset(BRE, subset = cancer_type %in% c("Healthy"))
BRE_I <- subset(BRE, subset = cancer_type %in% c("IDC"))

BRE_D@meta.data$cancer_subtype <- "Breast DCIS"
BRE_H@meta.data$cancer_subtype <- "NA" #healthy tissue; this is the string "NA", not a missing value
BRE_I@meta.data$cancer_subtype <- "Breast IDC"

#merge back together 
BRE <- merge(BRE_D, y = c(BRE_H, BRE_I), project = "GSE195861")
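Splitting the object and merging it back is one way to set per-group metadata, but the merge also rebuilds the object (the layer names and some column types differ in the summary printed after this step). A possible alternative, sketched here on a toy data frame standing in for `BRE@meta.data`, is to map values with named lookup vectors. The names `site_map`, `major_map` and `subtype_map` are hypothetical; the labels are the ones assigned in the cells above.

```r
# Toy stand-in for BRE@meta.data (assumption: columns named as in the notebook)
meta <- data.frame(
  sample_type = c("Healthy_breast", "LN_met", "tumour", "tumour"),
  cancer_type = c("Healthy", "IDC", "DCIS", "IDC"),
  stringsAsFactors = FALSE
)

# Named lookup vectors: names are existing values, elements are the new labels
site_map <- c(Healthy_breast = "breast", LN_met = "lymph node", tumour = "breast")
major_map <- c(Healthy_breast = "healthy", LN_met = "metastatic tumour",
               tumour = "primary tumour")
subtype_map <- c(DCIS = "Breast DCIS", Healthy = "NA", IDC = "Breast IDC")

# Index each lookup by the existing column; unname() drops the carried-over names
meta$site <- unname(site_map[meta$sample_type])
meta$sample_type_major <- unname(major_map[meta$sample_type])
meta$cancer_subtype <- unname(subtype_map[meta$cancer_type])

# Guard against values that are missing from a lookup
stopifnot(!anyNA(meta$site), !anyNA(meta$sample_type_major),
          !anyNA(meta$cancer_subtype))
print(meta)
```

Applied to the real object, the same three assignments would target `BRE@meta.data` directly, with no subsetting or merging.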

In [34]:
#set integration_id metadata
BRE@meta.data$integration_id <- BRE@meta.data$sample_id

In [35]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
33538 features across 15286 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 44 layers present: counts.2.3.3.1, counts.3.3.3.1, counts.4.3.3.1, counts.5.3.3.1, counts.6.3.3.1, counts.7.3.3.1, counts.8.3.3.1, data.2.3.3.1, data.3.3.3.1, data.4.3.3.1, data.5.3.3.1, data.6.3.3.1, data.7.3.3.1, data.8.3.3.1, scale.data.3.3.1, counts.1.1.1.2, data.1.1.1.2, scale.data.1.1.2, counts.15.2.2.3, counts.16.2.2.3, counts.17.2.2.3, counts.18.2.2.3, counts.19.2.2.3, counts.20.2.2.3, data.15.2.2.3, data.16.2.2.3, data.17.2.2.3, data.18.2.2.3, data.19.2.2.3, data.20.2.2.3, scale.data.2.2.3, counts.10.3.3.3, counts.11.3.3.3, counts.12.3.3.3, counts.13.3.3.3, counts.14.3.3.3, counts.9.3.3.3, data.9.3.3.3, data.10.3.3.3, data.11.3.3.3, data.12.3.3.3, data.13.3.3.3, data.14.3.3.3, scale.data.3.3.3

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE195861_DCIS_tumour_pt1_AAAGGATCAAATCAGA-1,GSE195861,95030,7236,tumour,DCIS,pt1,GSE195861_DCIS_tumour_pt1,0.689256,4,4,breast,primary tumour,Breast DCIS,GSE195861_DCIS_tumour_pt1
GSE195861_DCIS_tumour_pt1_AAATGGACAATTGCTG-1,GSE195861,116697,8039,tumour,DCIS,pt1,GSE195861_DCIS_tumour_pt1,4.363437,4,4,breast,primary tumour,Breast DCIS,GSE195861_DCIS_tumour_pt1
GSE195861_DCIS_tumour_pt1_AACCTTTCAGCAATTC-1,GSE195861,65271,7326,tumour,DCIS,pt1,GSE195861_DCIS_tumour_pt1,5.464908,4,4,breast,primary tumour,Breast DCIS,GSE195861_DCIS_tumour_pt1
GSE195861_DCIS_tumour_pt1_ACAAAGACATCTTCGC-1,GSE195861,26262,4457,tumour,DCIS,pt1,GSE195861_DCIS_tumour_pt1,8.27812,1,1,breast,primary tumour,Breast DCIS,GSE195861_DCIS_tumour_pt1
GSE195861_DCIS_tumour_pt1_ACAAAGAGTGCCTATA-1,GSE195861,862,417,tumour,DCIS,pt1,GSE195861_DCIS_tumour_pt1,6.38051,4,4,breast,primary tumour,Breast DCIS,GSE195861_DCIS_tumour_pt1
GSE195861_DCIS_tumour_pt1_ACAAGCTGTGTCATGT-1,GSE195861,4024,1333,tumour,DCIS,pt1,GSE195861_DCIS_tumour_pt1,4.324056,4,4,breast,primary tumour,Breast DCIS,GSE195861_DCIS_tumour_pt1


In [37]:
#exclude any samples with <100 cells
table(BRE$integration_id)
#exclude GSE195861_DCIS_tumour_pt1, GSE195861_IDC_LN-met_pt11, GSE195861_IDC_tumour_pt10, GSE195861_IDC_tumour_pt11, GSE195861_IDC_tumour_pt13
BRE <- subset(BRE, subset = !(integration_id %in% c("GSE195861_DCIS_tumour_pt1","GSE195861_IDC_LN-met_pt11","GSE195861_IDC_tumour_pt10","GSE195861_IDC_tumour_pt11","GSE195861_IDC_tumour_pt13")))
table(BRE$integration_id)


GSE195861_DCIS_tumour_pt1 GSE195861_DCIS_tumour_pt2 GSE195861_DCIS_tumour_pt3 
                       62                     10629                       432 
GSE195861_DCIS_tumour_pt4 GSE195861_DCIS_tumour_pt5 GSE195861_DCIS_tumour_pt6 
                      163                       705                       172 
GSE195861_DCIS_tumour_pt7         GSE195861_Healthy GSE195861_IDC_LN-met_pt10 
                      273                       158                       197 
GSE195861_IDC_LN-met_pt11 GSE195861_IDC_LN-met_pt12 GSE195861_IDC_LN-met_pt13 
                       97                       145                       155 
 GSE195861_IDC_LN-met_pt8  GSE195861_IDC_LN-met_pt9 GSE195861_IDC_tumour_pt10 
                      240                       444                        77 
GSE195861_IDC_tumour_pt11 GSE195861_IDC_tumour_pt12 GSE195861_IDC_tumour_pt13 
                       34                       109                        91 
 GSE195861_IDC_tumour_pt8  GSE195861_IDC_tumour_pt9


GSE195861_DCIS_tumour_pt2 GSE195861_DCIS_tumour_pt3 GSE195861_DCIS_tumour_pt4 
                    10629                       432                       163 
GSE195861_DCIS_tumour_pt5 GSE195861_DCIS_tumour_pt6 GSE195861_DCIS_tumour_pt7 
                      705                       172                       273 
        GSE195861_Healthy GSE195861_IDC_LN-met_pt10 GSE195861_IDC_LN-met_pt12 
                      158                       197                       145 
GSE195861_IDC_LN-met_pt13  GSE195861_IDC_LN-met_pt8  GSE195861_IDC_LN-met_pt9 
                      155                       240                       444 
GSE195861_IDC_tumour_pt12  GSE195861_IDC_tumour_pt8  GSE195861_IDC_tumour_pt9 
                      109                       356                       747 

In [38]:
#join layers and then split them by integration_id
Layers(BRE[["RNA"]])
#join layers
BRE[["RNA"]] <- JoinLayers(BRE[["RNA"]])
Layers(BRE[["RNA"]])
#split layers
BRE[["RNA"]] <- split(BRE[["RNA"]], f = BRE$integration_id)
Layers(BRE[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [39]:
#record number of cells
table(BRE$integration_id)
BRE


GSE195861_DCIS_tumour_pt2 GSE195861_DCIS_tumour_pt3 GSE195861_DCIS_tumour_pt4 
                    10629                       432                       163 
GSE195861_DCIS_tumour_pt5 GSE195861_DCIS_tumour_pt6 GSE195861_DCIS_tumour_pt7 
                      705                       172                       273 
        GSE195861_Healthy GSE195861_IDC_LN-met_pt10 GSE195861_IDC_LN-met_pt12 
                      158                       197                       145 
GSE195861_IDC_LN-met_pt13  GSE195861_IDC_LN-met_pt8  GSE195861_IDC_LN-met_pt9 
                      155                       240                       444 
GSE195861_IDC_tumour_pt12  GSE195861_IDC_tumour_pt8  GSE195861_IDC_tumour_pt9 
                      109                       356                       747 

An object of class Seurat 
33538 features across 14925 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 31 layers present: counts.GSE195861_DCIS_tumour_pt2, counts.GSE195861_DCIS_tumour_pt3, counts.GSE195861_DCIS_tumour_pt4, counts.GSE195861_DCIS_tumour_pt5, counts.GSE195861_DCIS_tumour_pt6, counts.GSE195861_DCIS_tumour_pt7, counts.GSE195861_Healthy, counts.GSE195861_IDC_LN-met_pt8, counts.GSE195861_IDC_LN-met_pt9, counts.GSE195861_IDC_LN-met_pt10, counts.GSE195861_IDC_LN-met_pt12, counts.GSE195861_IDC_LN-met_pt13, counts.GSE195861_IDC_tumour_pt8, counts.GSE195861_IDC_tumour_pt9, counts.GSE195861_IDC_tumour_pt12, scale.data, data.GSE195861_DCIS_tumour_pt2, data.GSE195861_DCIS_tumour_pt3, data.GSE195861_DCIS_tumour_pt4, data.GSE195861_DCIS_tumour_pt5, data.GSE195861_DCIS_tumour_pt6, data.GSE195861_DCIS_tumour_pt7, data.GSE195861_Healthy, data.GSE195861_IDC_LN-met_pt8, data.GSE195861_IDC_LN-met_pt9, data.GSE195861_IDC_LN-met_pt10, data.GSE195861_IDC_LN-

In [40]:
#re-export seurat object ready for integration
saveRDS(BRE, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE195861_myeloid_int.RDS")

In [41]:
#remove all objects in R
rm(list = ls())

## GSE199515

In [3]:
BRE <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE199515_myeloid.RDS")

In [4]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
33694 features across 499 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 7 layers present: counts.1, counts.2, counts.3, data.1, data.2, data.3, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE199515_TNBC1_AAACCTGCACGGATAG-1,GSE199515,3229,1175,tumour,TNBC,TNBC1,GSE199515_TNBC1,1.982038,6,6
GSE199515_TNBC1_AAACGGGTCGTAGGTT-1,GSE199515,3356,1058,tumour,TNBC,TNBC1,GSE199515_TNBC1,2.532777,6,6
GSE199515_TNBC1_AAAGATGTCATTGCGA-1,GSE199515,4532,1215,tumour,TNBC,TNBC1,GSE199515_TNBC1,2.250662,6,6
GSE199515_TNBC1_AACTCCCAGTAATCCC-1,GSE199515,4175,999,tumour,TNBC,TNBC1,GSE199515_TNBC1,3.664671,6,6
GSE199515_TNBC1_AACTCCCTCTCTGTCG-1,GSE199515,2921,908,tumour,TNBC,TNBC1,GSE199515_TNBC1,11.982198,6,6
GSE199515_TNBC1_AAGGAGCGTTCCGTCT-1,GSE199515,3657,1308,tumour,TNBC,TNBC1,GSE199515_TNBC1,3.773585,6,6


In [5]:
table(BRE$sample_type)
table(BRE$cancer_type)
table(BRE$patient_id)
table(BRE$sample_id)


tumour 
   499 


TNBC 
 499 


TNBC1 TNBC2 TNBC3 
  301    64   134 


GSE199515_TNBC1 GSE199515_TNBC2 GSE199515_TNBC3 
            301              64             134 

In [6]:
#set site metadata
BRE@meta.data$site <- "breast"
#set sample_type_major metadata
BRE@meta.data$sample_type_major <- "primary tumour"
#set cancer_subtype metadata
BRE@meta.data$cancer_subtype <- "TNBC"
#set integration_id metadata
BRE@meta.data$integration_id <- BRE@meta.data$sample_id

In [7]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
33694 features across 499 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 7 layers present: counts.1, counts.2, counts.3, data.1, data.2, data.3, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE199515_TNBC1_AAACCTGCACGGATAG-1,GSE199515,3229,1175,tumour,TNBC,TNBC1,GSE199515_TNBC1,1.982038,6,6,breast,primary tumour,TNBC,GSE199515_TNBC1
GSE199515_TNBC1_AAACGGGTCGTAGGTT-1,GSE199515,3356,1058,tumour,TNBC,TNBC1,GSE199515_TNBC1,2.532777,6,6,breast,primary tumour,TNBC,GSE199515_TNBC1
GSE199515_TNBC1_AAAGATGTCATTGCGA-1,GSE199515,4532,1215,tumour,TNBC,TNBC1,GSE199515_TNBC1,2.250662,6,6,breast,primary tumour,TNBC,GSE199515_TNBC1
GSE199515_TNBC1_AACTCCCAGTAATCCC-1,GSE199515,4175,999,tumour,TNBC,TNBC1,GSE199515_TNBC1,3.664671,6,6,breast,primary tumour,TNBC,GSE199515_TNBC1
GSE199515_TNBC1_AACTCCCTCTCTGTCG-1,GSE199515,2921,908,tumour,TNBC,TNBC1,GSE199515_TNBC1,11.982198,6,6,breast,primary tumour,TNBC,GSE199515_TNBC1
GSE199515_TNBC1_AAGGAGCGTTCCGTCT-1,GSE199515,3657,1308,tumour,TNBC,TNBC1,GSE199515_TNBC1,3.773585,6,6,breast,primary tumour,TNBC,GSE199515_TNBC1


In [9]:
#exclude any samples with <100 cells
table(BRE$integration_id)
#exclude TNBC2
BRE <- subset(BRE, subset = !(integration_id %in% c("GSE199515_TNBC2")))
table(BRE$integration_id)


GSE199515_TNBC1 GSE199515_TNBC2 GSE199515_TNBC3 
            301              64             134 


GSE199515_TNBC1 GSE199515_TNBC3 
            301             134 

In [10]:
#join layers and then split them by integration_id
Layers(BRE[["RNA"]])
#join layers
BRE[["RNA"]] <- JoinLayers(BRE[["RNA"]])
Layers(BRE[["RNA"]])
#split layers
BRE[["RNA"]] <- split(BRE[["RNA"]], f = BRE$integration_id)
Layers(BRE[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [11]:
#record number of cells
table(BRE$integration_id)
BRE


GSE199515_TNBC1 GSE199515_TNBC3 
            301             134 

An object of class Seurat 
33694 features across 435 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 5 layers present: data.GSE199515_TNBC1, data.GSE199515_TNBC3, scale.data, counts.GSE199515_TNBC1, counts.GSE199515_TNBC3
 2 dimensional reductions calculated: pca, umap
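The cell counts are only recorded as printed output. A minimal sketch of keeping them in a file instead is shown below, using the two GSE199515 counts from the table above. The `cell_log` data frame, its column names and the temporary output path are assumptions for illustration, not part of this notebook.

```r
# Per-cell ids rebuilt from the counts printed above (301 and 134 cells)
ids <- rep(c("GSE199515_TNBC1", "GSE199515_TNBC3"), times = c(301, 134))

# Tabulate cells per integration_id and turn the table into a data frame
cell_log <- as.data.frame(table(integration_id = ids), stringsAsFactors = FALSE)
names(cell_log)[2] <- "n_cells"
cell_log$dataset <- "GSE199515"

# Total should match the number of cells in the filtered object
total_cells <- sum(cell_log$n_cells)
print(cell_log)
print(total_cells)

# Write to a temporary file (placeholder path; swap in a real location)
out_file <- tempfile(fileext = ".csv")
write.csv(cell_log, out_file, row.names = FALSE)
```

Appending one such data frame per dataset would give a single table of cell numbers across all datasets going into integration.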

In [12]:
#re-export seurat object ready for integration
saveRDS(BRE, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE199515_myeloid_int.RDS")

In [13]:
#remove all objects in R
rm(list = ls())

## GSE225600

In [14]:
BRE <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE225600_myeloid.RDS")

In [15]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
36601 features across 2135 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 17 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,cancer_type,sample_meta,sample_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE225600_LN_mets_pt2_AAAGGATCAATGAAAC-L2,GSE225600,1343,616,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,5.807893,8,8
GSE225600_LN_mets_pt2_AAAGTGATCAATCGGT-L2,GSE225600,579,439,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,2.417962,8,8
GSE225600_LN_mets_pt2_AACAACCAGCAAGCCA-L2,GSE225600,621,443,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,4.186795,8,8
GSE225600_LN_mets_pt2_AACACACTCTAATTCC-L2,GSE225600,1096,680,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,4.379562,8,8
GSE225600_LN_mets_pt2_AACCATGCAACCACGC-L2,GSE225600,407,294,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,13.267813,8,8
GSE225600_LN_mets_pt2_AACCCAAAGTAGTCAA-L2,GSE225600,606,279,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,42.574257,8,8


In [16]:
table(BRE$sample_type)
table(BRE$cancer_type)
table(BRE$patient_id)
table(BRE$sample_id)


LN mets  tumour 
    794    1341 


breast cancer 
         2135 


pt2 pt3 pt6 pt7 
501 180 601 853 


   GSE225600_BC_LN_mets_pt2    GSE225600_BC_LN_mets_pt3 
                        328                          94 
   GSE225600_BC_LN_mets_pt6    GSE225600_BC_LN_mets_pt7 
                         67                         305 
GSE225600_breast_tumour_pt2 GSE225600_breast_tumour_pt3 
                        173                          86 
GSE225600_breast_tumour_pt6 GSE225600_breast_tumour_pt7 
                        534                         548 

In [17]:
#set site and sample_type_major metadata

#split by sample_type
BRE_LN <- subset(BRE, subset = sample_type %in% c("LN mets"))
BRE_T <- subset(BRE, subset = sample_type %in% c("tumour"))

BRE_LN@meta.data$site <- "lymph node"
BRE_T@meta.data$site <- "breast"

BRE_LN@meta.data$sample_type_major <- "metastatic tumour"
BRE_T@meta.data$sample_type_major <- "primary tumour"

#merge back together 
BRE <- merge(BRE_LN, y = c(BRE_T), project = "GSE225600")
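Judging by the object printed after this step, the split-and-merge also appears to change the object itself: the layer names gain suffixes (`counts.1.1`, `scale.data.1`, ...), the pca/umap reductions are no longer listed, and the cluster columns print as `<chr>` rather than `<fct>`. A possible alternative, sketched below on a plain data frame standing in for `BRE@meta.data` (the toy rows are invented for illustration), is to assign the new columns in place with `ifelse()`, so that no merge is needed.

```r
# Toy stand-in for BRE@meta.data; the rows are invented for illustration.
meta <- data.frame(
  sample_type = c("LN mets", "tumour", "tumour", "LN mets"),
  stringsAsFactors = FALSE
)

# Derive the new columns row by row from sample_type, without splitting.
is_ln <- meta$sample_type == "LN mets"
meta$site <- ifelse(is_ln, "lymph node", "breast")
meta$sample_type_major <- ifelse(is_ln, "metastatic tumour", "primary tumour")

print(meta)
```

On the real object the same assignments would target `BRE@meta.data`; this sketch has not been run against the datasets in this notebook.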

In [18]:
#set cancer_subtype metadata
BRE@meta.data$cancer_subtype <- "Breast IDC"

In [19]:
#set integration_id metadata
BRE@meta.data$integration_id <- BRE@meta.data$sample_id

In [20]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
36601 features across 2135 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 18 layers present: counts.1.1, counts.2.1, counts.3.1, counts.4.1, data.1.1, data.2.1, data.3.1, data.4.1, scale.data.1, counts.5.2, counts.6.2, counts.7.2, counts.8.2, data.5.2, data.6.2, data.7.2, data.8.2, scale.data.2

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,cancer_type,sample_meta,sample_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE225600_LN_mets_pt2_AAAGGATCAATGAAAC-L2,GSE225600,1343,616,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,5.807893,8,8,lymph node,metastatic tumour,Breast IDC,GSE225600_BC_LN_mets_pt2
GSE225600_LN_mets_pt2_AAAGTGATCAATCGGT-L2,GSE225600,579,439,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,2.417962,8,8,lymph node,metastatic tumour,Breast IDC,GSE225600_BC_LN_mets_pt2
GSE225600_LN_mets_pt2_AACAACCAGCAAGCCA-L2,GSE225600,621,443,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,4.186795,8,8,lymph node,metastatic tumour,Breast IDC,GSE225600_BC_LN_mets_pt2
GSE225600_LN_mets_pt2_AACACACTCTAATTCC-L2,GSE225600,1096,680,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,4.379562,8,8,lymph node,metastatic tumour,Breast IDC,GSE225600_BC_LN_mets_pt2
GSE225600_LN_mets_pt2_AACCATGCAACCACGC-L2,GSE225600,407,294,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,13.267813,8,8,lymph node,metastatic tumour,Breast IDC,GSE225600_BC_LN_mets_pt2
GSE225600_LN_mets_pt2_AACCCAAAGTAGTCAA-L2,GSE225600,606,279,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,42.574257,8,8,lymph node,metastatic tumour,Breast IDC,GSE225600_BC_LN_mets_pt2


In [22]:
#exclude any samples with <100 cells
table(BRE$integration_id)
#exclude LN_pt3 (94 cells), LN_pt6 (67 cells), T_pt3 (86 cells)
BRE <- subset(BRE, subset = !(integration_id %in% c("GSE225600_BC_LN_mets_pt3","GSE225600_BC_LN_mets_pt6","GSE225600_breast_tumour_pt3")))
table(BRE$integration_id)


   GSE225600_BC_LN_mets_pt2    GSE225600_BC_LN_mets_pt3 
                        328                          94 
   GSE225600_BC_LN_mets_pt6    GSE225600_BC_LN_mets_pt7 
                         67                         305 
GSE225600_breast_tumour_pt2 GSE225600_breast_tumour_pt3 
                        173                          86 
GSE225600_breast_tumour_pt6 GSE225600_breast_tumour_pt7 
                        534                         548 


   GSE225600_BC_LN_mets_pt2    GSE225600_BC_LN_mets_pt7 
                        328                         305 
GSE225600_breast_tumour_pt2 GSE225600_breast_tumour_pt6 
                        173                         534 
GSE225600_breast_tumour_pt7 
                        548 
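The samples to drop were read off the table by eye. As a hedged sketch, the same list can be derived from the counts themselves; the named vector below copies the counts printed above, and `min_cells`, `drop_ids` and `keep_ids` are illustrative names rather than anything defined in this notebook.

```r
# Cell counts per integration_id, copied from the table printed above.
n_cells <- c(
  GSE225600_BC_LN_mets_pt2 = 328, GSE225600_BC_LN_mets_pt3 = 94,
  GSE225600_BC_LN_mets_pt6 = 67, GSE225600_BC_LN_mets_pt7 = 305,
  GSE225600_breast_tumour_pt2 = 173, GSE225600_breast_tumour_pt3 = 86,
  GSE225600_breast_tumour_pt6 = 534, GSE225600_breast_tumour_pt7 = 548
)

min_cells <- 100  # threshold used throughout this notebook

# Samples below the threshold, and the ones that remain.
drop_ids <- names(n_cells)[n_cells < min_cells]
keep_ids <- setdiff(names(n_cells), drop_ids)

print(drop_ids)
print(sum(n_cells[keep_ids]))  # cells remaining after exclusion
```

On the real object, `table(BRE$integration_id)` yields the same kind of named counts, so `drop_ids` could be passed to the `%in%` filter instead of typing the ids by hand.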

In [23]:
#join layers and then split them by integration_id
Layers(BRE[["RNA"]])
#join layers
BRE[["RNA"]] <- JoinLayers(BRE[["RNA"]])
Layers(BRE[["RNA"]])
#split layers
BRE[["RNA"]] <- split(BRE[["RNA"]], f = BRE$integration_id)
Layers(BRE[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [25]:
#record number of cells
table(BRE$integration_id)
BRE@project.name
BRE


   GSE225600_BC_LN_mets_pt2    GSE225600_BC_LN_mets_pt7 
                        328                         305 
GSE225600_breast_tumour_pt2 GSE225600_breast_tumour_pt6 
                        173                         534 
GSE225600_breast_tumour_pt7 
                        548 

An object of class Seurat 
36601 features across 1888 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 11 layers present: counts.GSE225600_BC_LN_mets_pt2, counts.GSE225600_BC_LN_mets_pt7, counts.GSE225600_breast_tumour_pt2, counts.GSE225600_breast_tumour_pt6, counts.GSE225600_breast_tumour_pt7, scale.data, data.GSE225600_BC_LN_mets_pt2, data.GSE225600_BC_LN_mets_pt7, data.GSE225600_breast_tumour_pt2, data.GSE225600_breast_tumour_pt6, data.GSE225600_breast_tumour_pt7
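After the join and split, the object should carry one `counts` and one `data` layer per integration_id plus a single `scale.data` layer, i.e. 2 x 5 + 1 = 11 layers here. A minimal sketch of that check, using the layer names printed above as input (`expected_layers` and `observed_layers` are illustrative names):

```r
# integration_ids kept after the <100-cell filter.
integration_ids <- c(
  "GSE225600_BC_LN_mets_pt2", "GSE225600_BC_LN_mets_pt7",
  "GSE225600_breast_tumour_pt2", "GSE225600_breast_tumour_pt6",
  "GSE225600_breast_tumour_pt7"
)

# One counts and one data layer per id, plus the unsplit scale.data.
expected_layers <- c(
  paste0("counts.", integration_ids),
  paste0("data.", integration_ids),
  "scale.data"
)

# Layer names as printed for the object above.
observed_layers <- c(
  "counts.GSE225600_BC_LN_mets_pt2", "counts.GSE225600_BC_LN_mets_pt7",
  "counts.GSE225600_breast_tumour_pt2", "counts.GSE225600_breast_tumour_pt6",
  "counts.GSE225600_breast_tumour_pt7", "scale.data",
  "data.GSE225600_BC_LN_mets_pt2", "data.GSE225600_BC_LN_mets_pt7",
  "data.GSE225600_breast_tumour_pt2", "data.GSE225600_breast_tumour_pt6",
  "data.GSE225600_breast_tumour_pt7"
)

print(length(expected_layers))
print(setequal(expected_layers, observed_layers))
```

In the notebook itself, `observed_layers` would come from `Layers(BRE[["RNA"]])` rather than being typed out.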

In [26]:
#re-export seurat object ready for integration
saveRDS(BRE, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE225600_myeloid_int.RDS")

In [27]:
#remove all objects in R
rm(list = ls())

## GSE162498

In [28]:
LUNG <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE162498_myeloid.RDS")

In [29]:
LUNG
LUNG@project.name
head(LUNG@meta.data)

An object of class Seurat 
45068 features across 28781 samples within 1 assay 
Active assay: RNA (45068 features, 2000 variable features)
 27 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE162498_NSCLC_P34_AAACGGGCATAGTAAG-1,GSE162498,980,450,tumour,NSCLC,P34,GSE162498_NSCLC_P34,8.265306,4,4
GSE162498_NSCLC_P34_AAAGTAGCAAACTGCT-1,GSE162498,11484,2224,tumour,NSCLC,P34,GSE162498_NSCLC_P34,3.692093,8,8
GSE162498_NSCLC_P34_AAATGCCCAGTCGATT-1,GSE162498,524,200,tumour,NSCLC,P34,GSE162498_NSCLC_P34,17.557252,4,4
GSE162498_NSCLC_P34_AAATGCCGTCACACGC-1,GSE162498,3104,300,tumour,NSCLC,P34,GSE162498_NSCLC_P34,83.82732,9,9
GSE162498_NSCLC_P34_AACACGTAGATACACA-1,GSE162498,604,311,tumour,NSCLC,P34,GSE162498_NSCLC_P34,17.218543,4,4
GSE162498_NSCLC_P34_AACACGTCAGGGTACA-1,GSE162498,15436,3018,tumour,NSCLC,P34,GSE162498_NSCLC_P34,5.93418,8,8


In [30]:
table(LUNG$sample_type)
table(LUNG$cancer_type)
table(LUNG$patient_id)
table(LUNG$sample_id)


adjacent healthy           tumour 
            2024            26757 


healthy   NSCLC 
   2024   26757 


 P34  P35  P42  P43  P46  P47  P55  P57  P58  P60  P61 
 482  685 4404 1868  999 7144 1579 5809  680 2654 2477 


GSE162498_healthy_P60 GSE162498_healthy_P61   GSE162498_NSCLC_P34 
                  809                  1215                   482 
  GSE162498_NSCLC_P35   GSE162498_NSCLC_P42   GSE162498_NSCLC_P43 
                  685                  4404                  1868 
  GSE162498_NSCLC_P46   GSE162498_NSCLC_P47   GSE162498_NSCLC_P55 
                  999                  7144                  1579 
  GSE162498_NSCLC_P57   GSE162498_NSCLC_P58   GSE162498_NSCLC_P60 
                 5809                   680                  1845 
  GSE162498_NSCLC_P61 
                 1262 

In [31]:
#set site metadata
LUNG@meta.data$site <- "lung"

#set sample_type_major metadata

#split by cancer_type
LUNG_H <- subset(LUNG, subset = cancer_type %in% c("healthy"))
LUNG_T <- subset(LUNG, subset = cancer_type %in% c("NSCLC"))

LUNG_H@meta.data$sample_type_major <- "healthy"
LUNG_T@meta.data$sample_type_major <- "primary tumour"

#set cancer_subtype metadata
#note: healthy samples get the character string "NA" here, not a missing value (NA_character_)
LUNG_H@meta.data$cancer_subtype <- "NA"
LUNG_T@meta.data$cancer_subtype <- "NSCLC"

#merge back together 
LUNG <- merge(LUNG_H, y = c(LUNG_T), project = "GSE162498")

In [32]:
#set integration_id metadata
LUNG@meta.data$integration_id <- LUNG@meta.data$sample_id

In [33]:
LUNG
LUNG@project.name
head(LUNG@meta.data)

An object of class Seurat 
45068 features across 28781 samples within 1 assay 
Active assay: RNA (45068 features, 2000 variable features)
 28 layers present: counts.10.1, counts.12.1, data.10.1, data.12.1, scale.data.1, counts.1.2, counts.2.2, counts.3.2, counts.4.2, counts.5.2, counts.6.2, counts.7.2, counts.8.2, counts.9.2, counts.11.2, counts.13.2, data.1.2, data.2.2, data.3.2, data.4.2, data.5.2, data.6.2, data.7.2, data.8.2, data.9.2, data.11.2, data.13.2, scale.data.2

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE162498_Healthy_P60_AAACCTGGTTCCACTC-1,GSE162498,1608,763,adjacent healthy,healthy,P60,GSE162498_healthy_P60,9.328358,4,4,lung,healthy,,GSE162498_healthy_P60
GSE162498_Healthy_P60_AAACCTGTCTCCGGTT-1,GSE162498,1054,502,adjacent healthy,healthy,P60,GSE162498_healthy_P60,21.157495,4,4,lung,healthy,,GSE162498_healthy_P60
GSE162498_Healthy_P60_AAACGGGCAATAGAGT-1,GSE162498,1262,718,adjacent healthy,healthy,P60,GSE162498_healthy_P60,13.946117,4,4,lung,healthy,,GSE162498_healthy_P60
GSE162498_Healthy_P60_AAACGGGGTTGTGGCC-1,GSE162498,1373,790,adjacent healthy,healthy,P60,GSE162498_healthy_P60,13.692644,4,4,lung,healthy,,GSE162498_healthy_P60
GSE162498_Healthy_P60_AAAGATGCAGTCAGAG-1,GSE162498,853,520,adjacent healthy,healthy,P60,GSE162498_healthy_P60,13.130129,4,4,lung,healthy,,GSE162498_healthy_P60
GSE162498_Healthy_P60_AAAGATGGTCAAAGCG-1,GSE162498,583,358,adjacent healthy,healthy,P60,GSE162498_healthy_P60,1.02916,6,6,lung,healthy,,GSE162498_healthy_P60


In [35]:
#exclude any samples with <100 cells
table(LUNG$integration_id)
#none to exclude (smallest sample has 482 cells)
#LUNG <- subset(LUNG, subset = !(integration_id %in% c("")))
#table(LUNG$integration_id)


GSE162498_healthy_P60 GSE162498_healthy_P61   GSE162498_NSCLC_P34 
                  809                  1215                   482 
  GSE162498_NSCLC_P35   GSE162498_NSCLC_P42   GSE162498_NSCLC_P43 
                  685                  4404                  1868 
  GSE162498_NSCLC_P46   GSE162498_NSCLC_P47   GSE162498_NSCLC_P55 
                  999                  7144                  1579 
  GSE162498_NSCLC_P57   GSE162498_NSCLC_P58   GSE162498_NSCLC_P60 
                 5809                   680                  1845 
  GSE162498_NSCLC_P61 
                 1262 

In [36]:
#join layers and then split them by integration_id
Layers(LUNG[["RNA"]])
#join layers
LUNG[["RNA"]] <- JoinLayers(LUNG[["RNA"]])
Layers(LUNG[["RNA"]])
#split layers
LUNG[["RNA"]] <- split(LUNG[["RNA"]], f = LUNG$integration_id)
Layers(LUNG[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [38]:
#record number of cells
table(LUNG$integration_id)
LUNG
LUNG@project.name


GSE162498_healthy_P60 GSE162498_healthy_P61   GSE162498_NSCLC_P34 
                  809                  1215                   482 
  GSE162498_NSCLC_P35   GSE162498_NSCLC_P42   GSE162498_NSCLC_P43 
                  685                  4404                  1868 
  GSE162498_NSCLC_P46   GSE162498_NSCLC_P47   GSE162498_NSCLC_P55 
                  999                  7144                  1579 
  GSE162498_NSCLC_P57   GSE162498_NSCLC_P58   GSE162498_NSCLC_P60 
                 5809                   680                  1845 
  GSE162498_NSCLC_P61 
                 1262 

An object of class Seurat 
45068 features across 28781 samples within 1 assay 
Active assay: RNA (45068 features, 2000 variable features)
 27 layers present: counts.GSE162498_healthy_P60, counts.GSE162498_healthy_P61, counts.GSE162498_NSCLC_P34, counts.GSE162498_NSCLC_P35, counts.GSE162498_NSCLC_P42, counts.GSE162498_NSCLC_P43, counts.GSE162498_NSCLC_P46, counts.GSE162498_NSCLC_P47, counts.GSE162498_NSCLC_P55, counts.GSE162498_NSCLC_P57, counts.GSE162498_NSCLC_P58, counts.GSE162498_NSCLC_P60, counts.GSE162498_NSCLC_P61, scale.data, data.GSE162498_healthy_P60, data.GSE162498_healthy_P61, data.GSE162498_NSCLC_P34, data.GSE162498_NSCLC_P35, data.GSE162498_NSCLC_P42, data.GSE162498_NSCLC_P43, data.GSE162498_NSCLC_P46, data.GSE162498_NSCLC_P47, data.GSE162498_NSCLC_P55, data.GSE162498_NSCLC_P57, data.GSE162498_NSCLC_P58, data.GSE162498_NSCLC_P60, data.GSE162498_NSCLC_P61
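The counts above are only printed. If a persistent record is wanted, one option is to turn the table into a small data frame that can be written to disk; the sketch below uses an invented toy id vector, and `cell_record` and the output file name are hypothetical.

```r
# Toy integration_id vector; on the real object this would be LUNG$integration_id.
ids <- rep(c("GSE162498_NSCLC_P34", "GSE162498_NSCLC_P35"), times = c(3, 2))

# Count cells per integration_id and store the result as a tidy record.
tab <- table(ids)
cell_record <- data.frame(
  dataset = "GSE162498",
  integration_id = names(tab),
  n_cells = as.integer(tab),
  stringsAsFactors = FALSE
)

print(cell_record)

# Hypothetical output path; uncomment to write the record to disk.
# write.csv(cell_record, "cell_counts_GSE162498.csv", row.names = FALSE)
```

Records built this way for each dataset could be combined with `rbind()` into a single table of cell numbers.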

In [39]:
#re-export seurat object ready for integration
saveRDS(LUNG, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE162498_myeloid_int.RDS")

In [40]:
#remove all objects in R
rm(list = ls())

## GSE131907

* Note: these samples are labelled LUAD in the original data. The paper describes LUAD as the most common subtype of NSCLC, so cancer_subtype is set to NSCLC here for consistency with the other datasets.

In [3]:
LUNG <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE131907_myeloid.RDS")

In [4]:
LUNG
LUNG@project.name
head(LUNG@meta.data)

An object of class Seurat 
29634 features across 36524 samples within 1 assay 
Active assay: RNA (29634 features, 2000 variable features)
 87 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, counts.19, counts.20, counts.21, counts.22, counts.23, counts.24, counts.25, counts.26, counts.27, counts.28, counts.29, counts.30, counts.31, counts.32, counts.33, counts.34, counts.35, counts.36, counts.37, counts.38, counts.39, counts.40, counts.41, counts.42, counts.43, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, data.19, data.20, data.21, data.22, data.23, data.24, data.25, data.26, data.27, data.28, data.29, data.30, data.31, data.32, data.33, data.34, data.35, data.36, data.37, data.38, data.39, data.40, data.41, data.42, data.43, 

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_meta,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE131907_LUAD_Tu_T0006_AAACCTGAGTTGCAGG_LUNG_T06,GSE131907,1695,711,LUNG_T06,tumour,LUAD,Pt_0006,GSE131907_LUAD_T0006,9.498525,1,1
GSE131907_LUAD_Tu_T0006_AAACCTGTCCAGAAGG_LUNG_T06,GSE131907,9826,2260,LUNG_T06,tumour,LUAD,Pt_0006,GSE131907_LUAD_T0006,2.869937,1,1
GSE131907_LUAD_Tu_T0006_AAAGATGTCTCATTCA_LUNG_T06,GSE131907,13178,3079,LUNG_T06,tumour,LUAD,Pt_0006,GSE131907_LUAD_T0006,4.932463,1,1
GSE131907_LUAD_Tu_T0006_AAAGCAAGTAATTGGA_LUNG_T06,GSE131907,6779,1826,LUNG_T06,tumour,LUAD,Pt_0006,GSE131907_LUAD_T0006,4.912229,1,1
GSE131907_LUAD_Tu_T0006_AAATGCCCATTACGAC_LUNG_T06,GSE131907,24381,4356,LUNG_T06,tumour,LUAD,Pt_0006,GSE131907_LUAD_T0006,4.335343,1,1
GSE131907_LUAD_Tu_T0006_AAATGCCTCGTGGGAA_LUNG_T06,GSE131907,9638,2226,LUNG_T06,tumour,LUAD,Pt_0006,GSE131907_LUAD_T0006,3.662586,1,1


In [5]:
table(LUNG$sample_type)
table(LUNG$cancer_type)
table(LUNG$patient_id)
table(LUNG$sample_id)


  brain mets Healthy Lung      LN mets       tumour 
        5405        16531         4953         9635 


Healthy    LUAD 
  16531   19993 


Pt_0001 Pt_0006 Pt_0008 Pt_0009 Pt_0018 Pt_0019 Pt_0020 Pt_0025 Pt_0028 Pt_0030 
   1311    1825    1581    2608    3021    2302    4368     581    1760    1890 
Pt_0031 Pt_0034 Pt_1006 Pt_1010 Pt_1011 Pt_1012 Pt_1013 Pt_1015 Pt_1019 Pt_1028 
   1737    2016     368     766     732    2062     138     310     322     100 
Pt_1049 Pt_1051 Pt_1058 Pt_3002 Pt_3003 Pt_3004 Pt_3006 Pt_3007 Pt_3012 Pt_3013 
    272     623     426     812     101     328     433     273     259     552 
Pt_3016 Pt_3017 Pt_3019 
    177     695    1775 


GSE131907_Healthy_N0001 GSE131907_Healthy_N0006 GSE131907_Healthy_N0008 
                   1311                    1272                    1324 
GSE131907_Healthy_N0009 GSE131907_Healthy_N0018 GSE131907_Healthy_N0019 
                   1144                    2050                    1144 
GSE131907_Healthy_N0020 GSE131907_Healthy_N0028 GSE131907_Healthy_N0030 
                   3489                     783                     955 
GSE131907_Healthy_N0031 GSE131907_Healthy_N0034    GSE131907_LUAD_B3002 
                   1337                    1722                     812 
   GSE131907_LUAD_B3003    GSE131907_LUAD_B3004    GSE131907_LUAD_B3006 
                    101                     328                     433 
   GSE131907_LUAD_B3007    GSE131907_LUAD_B3012    GSE131907_LUAD_B3013 
                    273                     259                     552 
   GSE131907_LUAD_B3016    GSE131907_LUAD_B3017    GSE131907_LUAD_B3019 
                    177                     695   

In [6]:
#split by sample_type
LUNG_B <- subset(LUNG, subset = sample_type %in% c("brain mets"))
LUNG_H <- subset(LUNG, subset = sample_type %in% c("Healthy Lung"))
LUNG_LN <- subset(LUNG, subset = sample_type %in% c("LN mets"))
LUNG_T <- subset(LUNG, subset = sample_type %in% c("tumour"))

#set site metadata
LUNG_B@meta.data$site <- "brain"
LUNG_H@meta.data$site <- "lung"
LUNG_LN@meta.data$site <- "lymph node"
LUNG_T@meta.data$site <- "lung"

#set sample_type_major metadata
LUNG_B@meta.data$sample_type_major <- "metastatic tumour"
LUNG_H@meta.data$sample_type_major <- "healthy"
LUNG_LN@meta.data$sample_type_major <- "metastatic tumour"
LUNG_T@meta.data$sample_type_major <- "primary tumour"

#set cancer_subtype metadata
#note: healthy samples get the character string "NA" here, not a missing value (NA_character_)
LUNG_B@meta.data$cancer_subtype <- "NSCLC"
LUNG_H@meta.data$cancer_subtype <- "NA"
LUNG_LN@meta.data$cancer_subtype <- "NSCLC"
LUNG_T@meta.data$cancer_subtype <- "NSCLC"

#merge back together 
LUNG <- merge(LUNG_B, y = c(LUNG_H, LUNG_LN, LUNG_T), project = "GSE131907")
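With four sample types, the per-subset assignments above amount to a lookup from `sample_type` to three new columns. A hedged sketch of that lookup, on a toy data frame standing in for `LUNG@meta.data` (the toy rows and the names `lookup`, `idx` and `new_cols` are illustrative; the mapped values are the ones assigned above):

```r
# Mapping from sample_type to the three new metadata columns.
lookup <- data.frame(
  sample_type = c("brain mets", "Healthy Lung", "LN mets", "tumour"),
  site = c("brain", "lung", "lymph node", "lung"),
  sample_type_major = c("metastatic tumour", "healthy",
                        "metastatic tumour", "primary tumour"),
  cancer_subtype = c("NSCLC", "NA", "NSCLC", "NSCLC"),
  stringsAsFactors = FALSE
)

# Toy stand-in for LUNG@meta.data; the rows are invented for illustration.
meta <- data.frame(
  sample_type = c("tumour", "Healthy Lung", "brain mets", "LN mets", "tumour"),
  stringsAsFactors = FALSE
)

# Find each cell's row in the lookup and fail loudly on unknown sample types.
idx <- match(meta$sample_type, lookup$sample_type)
stopifnot(!anyNA(idx))

# Copy the mapped values across as new columns.
new_cols <- c("site", "sample_type_major", "cancer_subtype")
meta[new_cols] <- lookup[idx, new_cols]

print(meta)
```

Keeping the mapping in one table makes it easy to review against the paper, and the `stopifnot()` guard catches a sample type that was missed.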

In [7]:
#set integration_id metadata
LUNG@meta.data$integration_id <- LUNG@meta.data$sample_id

In [8]:
LUNG
LUNG@project.name
head(LUNG@meta.data)

An object of class Seurat 
29634 features across 36524 samples within 1 assay 
Active assay: RNA (29634 features, 2000 variable features)
 90 layers present: counts.34.1, counts.35.1, counts.36.1, counts.37.1, counts.38.1, counts.39.1, counts.40.1, counts.41.1, counts.42.1, counts.43.1, data.34.1, data.35.1, data.36.1, data.37.1, data.38.1, data.39.1, data.40.1, data.41.1, data.42.1, data.43.1, scale.data.1, counts.16.2, counts.17.2, counts.18.2, counts.19.2, counts.20.2, counts.21.2, counts.22.2, counts.23.2, counts.24.2, counts.25.2, counts.26.2, data.16.2, data.17.2, data.18.2, data.19.2, data.20.2, data.21.2, data.22.2, data.23.2, data.24.2, data.25.2, data.26.2, scale.data.2, counts.27.3, counts.28.3, counts.29.3, counts.30.3, counts.31.3, counts.32.3, counts.33.3, data.27.3, data.28.3, data.29.3, data.30.3, data.31.3, data.32.3, data.33.3, scale.data.3, counts.1.4, counts.2.4, counts.3.4, counts.4.4, counts.5.4, counts.6.4, counts.7.4, counts.8.4, counts.9.4, counts.10.4, counts.

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_meta,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE131907_LUAD_Brain_mets_B3002_AAACCTGAGCAGACTG_NS_02,GSE131907,7771,1755,NS_02,brain mets,LUAD,Pt_3002,GSE131907_LUAD_B3002,2.792433,9,9,brain,metastatic tumour,NSCLC,GSE131907_LUAD_B3002
GSE131907_LUAD_Brain_mets_B3002_AAACCTGGTGTAATGA_NS_02,GSE131907,16681,3274,NS_02,brain mets,LUAD,Pt_3002,GSE131907_LUAD_B3002,3.08135,1,1,brain,metastatic tumour,NSCLC,GSE131907_LUAD_B3002
GSE131907_LUAD_Brain_mets_B3002_AAACGGGAGACCACGA_NS_02,GSE131907,12842,2797,NS_02,brain mets,LUAD,Pt_3002,GSE131907_LUAD_B3002,3.807818,1,1,brain,metastatic tumour,NSCLC,GSE131907_LUAD_B3002
GSE131907_LUAD_Brain_mets_B3002_AAACGGGAGTGAATTG_NS_02,GSE131907,23071,3522,NS_02,brain mets,LUAD,Pt_3002,GSE131907_LUAD_B3002,3.541242,1,1,brain,metastatic tumour,NSCLC,GSE131907_LUAD_B3002
GSE131907_LUAD_Brain_mets_B3002_AAACGGGAGTTCGATC_NS_02,GSE131907,8474,2098,NS_02,brain mets,LUAD,Pt_3002,GSE131907_LUAD_B3002,2.100543,1,1,brain,metastatic tumour,NSCLC,GSE131907_LUAD_B3002
GSE131907_LUAD_Brain_mets_B3002_AAACGGGCACGTGAGA_NS_02,GSE131907,1081,459,NS_02,brain mets,LUAD,Pt_3002,GSE131907_LUAD_B3002,4.810361,1,1,brain,metastatic tumour,NSCLC,GSE131907_LUAD_B3002


In [9]:
#exclude any samples with <100 cells
table(LUNG$integration_id)
#none to exclude
#LUNG <- subset(LUNG, subset = !(integration_id %in% c("")))
#table(LUNG$integration_id)


GSE131907_Healthy_N0001 GSE131907_Healthy_N0006 GSE131907_Healthy_N0008 
                   1311                    1272                    1324 
GSE131907_Healthy_N0009 GSE131907_Healthy_N0018 GSE131907_Healthy_N0019 
                   1144                    2050                    1144 
GSE131907_Healthy_N0020 GSE131907_Healthy_N0028 GSE131907_Healthy_N0030 
                   3489                     783                     955 
GSE131907_Healthy_N0031 GSE131907_Healthy_N0034    GSE131907_LUAD_B3002 
                   1337                    1722                     812 
   GSE131907_LUAD_B3003    GSE131907_LUAD_B3004    GSE131907_LUAD_B3006 
                    101                     328                     433 
   GSE131907_LUAD_B3007    GSE131907_LUAD_B3012    GSE131907_LUAD_B3013 
                    273                     259                     552 
   GSE131907_LUAD_B3016    GSE131907_LUAD_B3017    GSE131907_LUAD_B3019 
                    177                     695   

In [10]:
#join layers and then split them by integration_id
Layers(LUNG[["RNA"]])
#join layers
LUNG[["RNA"]] <- JoinLayers(LUNG[["RNA"]])
Layers(LUNG[["RNA"]])
#split layers
LUNG[["RNA"]] <- split(LUNG[["RNA"]], f = LUNG$integration_id)
Layers(LUNG[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [11]:
#record number of cells
table(LUNG$integration_id)
LUNG
LUNG@project.name


GSE131907_Healthy_N0001 GSE131907_Healthy_N0006 GSE131907_Healthy_N0008 
                   1311                    1272                    1324 
GSE131907_Healthy_N0009 GSE131907_Healthy_N0018 GSE131907_Healthy_N0019 
                   1144                    2050                    1144 
GSE131907_Healthy_N0020 GSE131907_Healthy_N0028 GSE131907_Healthy_N0030 
                   3489                     783                     955 
GSE131907_Healthy_N0031 GSE131907_Healthy_N0034    GSE131907_LUAD_B3002 
                   1337                    1722                     812 
   GSE131907_LUAD_B3003    GSE131907_LUAD_B3004    GSE131907_LUAD_B3006 
                    101                     328                     433 
   GSE131907_LUAD_B3007    GSE131907_LUAD_B3012    GSE131907_LUAD_B3013 
                    273                     259                     552 
   GSE131907_LUAD_B3016    GSE131907_LUAD_B3017    GSE131907_LUAD_B3019 
                    177                     695   

An object of class Seurat 
29634 features across 36524 samples within 1 assay 
Active assay: RNA (29634 features, 2000 variable features)
 87 layers present: counts.GSE131907_LUAD_B3002, counts.GSE131907_LUAD_B3003, counts.GSE131907_LUAD_B3004, counts.GSE131907_LUAD_B3006, counts.GSE131907_LUAD_B3007, counts.GSE131907_LUAD_B3012, counts.GSE131907_LUAD_B3013, counts.GSE131907_LUAD_B3016, counts.GSE131907_LUAD_B3017, counts.GSE131907_LUAD_B3019, counts.GSE131907_Healthy_N0001, counts.GSE131907_Healthy_N0006, counts.GSE131907_Healthy_N0008, counts.GSE131907_Healthy_N0009, counts.GSE131907_Healthy_N0018, counts.GSE131907_Healthy_N0019, counts.GSE131907_Healthy_N0020, counts.GSE131907_Healthy_N0028, counts.GSE131907_Healthy_N0030, counts.GSE131907_Healthy_N0031, counts.GSE131907_Healthy_N0034, counts.GSE131907_LUAD_L1010, counts.GSE131907_LUAD_L1011, counts.GSE131907_LUAD_L1012, counts.GSE131907_LUAD_L1013, counts.GSE131907_LUAD_L1015, counts.GSE131907_LUAD_L1019, counts.GSE131907_LUAD_L105

In [12]:
#re-export seurat object ready for integration
saveRDS(LUNG, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE131907_myeloid_int.RDS")

In [13]:
#remove all objects in R
rm(list = ls())

## PMID32561858

In [14]:
BC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/PMID32561858_BC_myeloid.RDS")
CRC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/PMID32561858_CRC_myeloid.RDS")

In [16]:
#do BC first

In [18]:
BC
BC@project.name
head(BC@meta.data)


An object of class Seurat 
33694 features across 3171 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 29 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,Cell,nGene,nUMI,CellFromTumor,PatientNumber,TumorType,TumorSite,CellType,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<int>,<int>,<lgl>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
PMID32561858_BC_Pt41_sc5rJUQ024_AAACCTGCAACAACCT,sc5rJUQ024,1624,894,sc5rJUQ024_AAACCTGCAACAACCT,894,1624,True,41,BC,Biopsy,DC,tumour,breast cancer,BC_41,PMID32561858_breast_41,2.216749,6,6
PMID32561858_BC_Pt41_sc5rJUQ024_AAACCTGTCAACGAAA,sc5rJUQ024,18953,4233,sc5rJUQ024_AAACCTGTCAACGAAA,4233,18953,True,41,BC,Biopsy,Myeloid,tumour,breast cancer,BC_41,PMID32561858_breast_41,4.505883,6,6
PMID32561858_BC_Pt41_sc5rJUQ024_AAACGGGCACAGAGGT,sc5rJUQ024,4891,1818,sc5rJUQ024_AAACGGGCACAGAGGT,1818,4891,True,41,BC,Biopsy,Myeloid,tumour,breast cancer,BC_41,PMID32561858_breast_41,5.050092,6,6
PMID32561858_BC_Pt41_sc5rJUQ024_AAATGCCAGCCTTGAT,sc5rJUQ024,15100,3675,sc5rJUQ024_AAATGCCAGCCTTGAT,3675,15100,True,41,BC,Biopsy,DC,tumour,breast cancer,BC_41,PMID32561858_breast_41,5.589404,6,6
PMID32561858_BC_Pt41_sc5rJUQ024_AACCATGCAGTAAGAT,sc5rJUQ024,13798,3473,sc5rJUQ024_AACCATGCAGTAAGAT,3473,13798,True,41,BC,Biopsy,Myeloid,tumour,breast cancer,BC_41,PMID32561858_breast_41,4.189013,6,6
PMID32561858_BC_Pt41_sc5rJUQ024_AACTCAGGTCCCGACA,sc5rJUQ024,7307,2150,sc5rJUQ024_AACTCAGGTCCCGACA,2150,7307,True,41,BC,Biopsy,DC,tumour,breast cancer,BC_41,PMID32561858_breast_41,11.140003,6,6


In [19]:
table(BC$sample_type)
table(BC$cancer_type)
table(BC$patient_id)
table(BC$sample_id)


tumour 
  3171 


breast cancer 
         3171 


BC_41 BC_42 BC_43 BC_44 BC_45 BC_46 BC_47 BC_48 BC_49 BC_50 BC_51 BC_52 BC_53 
  155   120   698   177   213    19   111   238   436   284   422   107    86 
BC_54 
  105 


PMID32561858_breast_41 PMID32561858_breast_42 PMID32561858_breast_43 
                   155                    120                    698 
PMID32561858_breast_44 PMID32561858_breast_45 PMID32561858_breast_46 
                   177                    213                     19 
PMID32561858_breast_47 PMID32561858_breast_48 PMID32561858_breast_49 
                   111                    238                    436 
PMID32561858_breast_50 PMID32561858_breast_51 PMID32561858_breast_52 
                   284                    422                    107 
PMID32561858_breast_53 PMID32561858_breast_54 
                    86                    105 

In [21]:
#according to the paper (PMID32561858), most of the breast cancer samples are IDC and two are not
#the paper uses a different numbering system, so it is not clear which two; cancer_subtype therefore stays as the generic "Breast Cancer"

In [22]:
#set site metadata
BC@meta.data$site <- "breast"

#set sample_type_major metadata
BC@meta.data$sample_type_major <- "primary tumour"

#set cancer_subtype metadata
BC@meta.data$cancer_subtype <- "Breast Cancer"

In [23]:
#set integration_id metadata
BC@meta.data$integration_id <- BC@meta.data$sample_id

In [24]:
BC
BC@project.name
head(BC@meta.data)

An object of class Seurat 
33694 features across 3171 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 29 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,Cell,nGene,nUMI,CellFromTumor,PatientNumber,TumorType,TumorSite,⋯,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<int>,<int>,<lgl>,<int>,<chr>,<chr>,⋯,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
PMID32561858_BC_Pt41_sc5rJUQ024_AAACCTGCAACAACCT,sc5rJUQ024,1624,894,sc5rJUQ024_AAACCTGCAACAACCT,894,1624,True,41,BC,Biopsy,⋯,breast cancer,BC_41,PMID32561858_breast_41,2.216749,6,6,breast,primary tumour,Breast Cancer,PMID32561858_breast_41
PMID32561858_BC_Pt41_sc5rJUQ024_AAACCTGTCAACGAAA,sc5rJUQ024,18953,4233,sc5rJUQ024_AAACCTGTCAACGAAA,4233,18953,True,41,BC,Biopsy,⋯,breast cancer,BC_41,PMID32561858_breast_41,4.505883,6,6,breast,primary tumour,Breast Cancer,PMID32561858_breast_41
PMID32561858_BC_Pt41_sc5rJUQ024_AAACGGGCACAGAGGT,sc5rJUQ024,4891,1818,sc5rJUQ024_AAACGGGCACAGAGGT,1818,4891,True,41,BC,Biopsy,⋯,breast cancer,BC_41,PMID32561858_breast_41,5.050092,6,6,breast,primary tumour,Breast Cancer,PMID32561858_breast_41
PMID32561858_BC_Pt41_sc5rJUQ024_AAATGCCAGCCTTGAT,sc5rJUQ024,15100,3675,sc5rJUQ024_AAATGCCAGCCTTGAT,3675,15100,True,41,BC,Biopsy,⋯,breast cancer,BC_41,PMID32561858_breast_41,5.589404,6,6,breast,primary tumour,Breast Cancer,PMID32561858_breast_41
PMID32561858_BC_Pt41_sc5rJUQ024_AACCATGCAGTAAGAT,sc5rJUQ024,13798,3473,sc5rJUQ024_AACCATGCAGTAAGAT,3473,13798,True,41,BC,Biopsy,⋯,breast cancer,BC_41,PMID32561858_breast_41,4.189013,6,6,breast,primary tumour,Breast Cancer,PMID32561858_breast_41
PMID32561858_BC_Pt41_sc5rJUQ024_AACTCAGGTCCCGACA,sc5rJUQ024,7307,2150,sc5rJUQ024_AACTCAGGTCCCGACA,2150,7307,True,41,BC,Biopsy,⋯,breast cancer,BC_41,PMID32561858_breast_41,11.140003,6,6,breast,primary tumour,Breast Cancer,PMID32561858_breast_41


In [26]:
#exclude any samples with <100 cells
table(BC$integration_id)
#exclude pt46 (19 cells), pt53 (86 cells)
BC <- subset(BC, subset = !(integration_id %in% c("PMID32561858_breast_46","PMID32561858_breast_53")))
table(BC$integration_id)


PMID32561858_breast_41 PMID32561858_breast_42 PMID32561858_breast_43 
                   155                    120                    698 
PMID32561858_breast_44 PMID32561858_breast_45 PMID32561858_breast_46 
                   177                    213                     19 
PMID32561858_breast_47 PMID32561858_breast_48 PMID32561858_breast_49 
                   111                    238                    436 
PMID32561858_breast_50 PMID32561858_breast_51 PMID32561858_breast_52 
                   284                    422                    107 
PMID32561858_breast_53 PMID32561858_breast_54 
                    86                    105 


PMID32561858_breast_41 PMID32561858_breast_42 PMID32561858_breast_43 
                   155                    120                    698 
PMID32561858_breast_44 PMID32561858_breast_45 PMID32561858_breast_47 
                   177                    213                    111 
PMID32561858_breast_48 PMID32561858_breast_49 PMID32561858_breast_50 
                   238                    436                    284 
PMID32561858_breast_51 PMID32561858_breast_52 PMID32561858_breast_54 
                   422                    107                    105 
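
The samples to drop were picked by eye from the table above. As a minimal sketch, the same selection could be derived from the counts themselves. The counts below are copied from the `table(BC$integration_id)` output; `min_cells` and `small_ids` are illustrative names, not part of the original notebook, and the Seurat lines are left commented because they were not run here.

```r
# Per-sample cell counts copied from the table(BC$integration_id) output above
counts <- c(
  PMID32561858_breast_41 = 155, PMID32561858_breast_42 = 120,
  PMID32561858_breast_43 = 698, PMID32561858_breast_44 = 177,
  PMID32561858_breast_45 = 213, PMID32561858_breast_46 = 19,
  PMID32561858_breast_47 = 111, PMID32561858_breast_48 = 238,
  PMID32561858_breast_49 = 436, PMID32561858_breast_50 = 284,
  PMID32561858_breast_51 = 422, PMID32561858_breast_52 = 107,
  PMID32561858_breast_53 = 86,  PMID32561858_breast_54 = 105
)

# Threshold from the notebook header: exclude samples with <100 myeloid cells
min_cells <- 100

# integration_ids that fall below the threshold
small_ids <- names(counts)[counts < min_cells]
print(small_ids)

# With the Seurat object loaded, the same idea might look like this (assumption, not run):
# counts    <- table(BC$integration_id)
# small_ids <- names(counts)[counts < min_cells]
# BC <- subset(BC, cells = colnames(BC)[!(BC$integration_id %in% small_ids)])
```

Deriving the list from the table avoids typing sample names by hand, which matters more for the larger datasets further down.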

In [28]:
#repeat for CRC

In [30]:
CRC
CRC@project.name
head(CRC@meta.data)

An object of class Seurat 
33694 features across 4219 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 43 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, counts.19, counts.20, counts.21, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, data.19, data.20, data.21, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,Cell,nGene,nUMI,CellFromTumor,PatientNumber,TumorType,TumorSite,CellType,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<int>,<int>,<lgl>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
PMID32561858_CRC_Pt31_Btumour_scrEXT002_AAAGATGAGTTCGATC,scrEXT002,2110,911,scrEXT002_AAAGATGAGTTCGATC,911,2110,True,31,CRC,B,Myeloid,tumour border,CRC,CRC_31,PMID32561858_CRC_31_BTu,4.218009,2,2
PMID32561858_CRC_Pt31_Btumour_scrEXT002_AAAGATGGTTCTGTTT,scrEXT002,1581,645,scrEXT002_AAAGATGGTTCTGTTT,645,1581,True,31,CRC,B,Myeloid,tumour border,CRC,CRC_31,PMID32561858_CRC_31_BTu,3.036053,2,2
PMID32561858_CRC_Pt31_Btumour_scrEXT002_AAAGATGTCAAACCGT,scrEXT002,17178,2873,scrEXT002_AAAGATGTCAAACCGT,2873,17178,True,31,CRC,B,Myeloid,tumour border,CRC,CRC_31,PMID32561858_CRC_31_BTu,1.792991,2,2
PMID32561858_CRC_Pt31_Btumour_scrEXT002_AAATGCCAGCGCTCCA,scrEXT002,5370,1748,scrEXT002_AAATGCCAGCGCTCCA,1748,5370,True,31,CRC,B,Myeloid,tumour border,CRC,CRC_31,PMID32561858_CRC_31_BTu,5.456238,2,2
PMID32561858_CRC_Pt31_Btumour_scrEXT002_AAATGCCGTAAGAGGA,scrEXT002,16689,3334,scrEXT002_AAATGCCGTAAGAGGA,3334,16689,True,31,CRC,B,Myeloid,tumour border,CRC,CRC_31,PMID32561858_CRC_31_BTu,3.954701,2,2
PMID32561858_CRC_Pt31_Btumour_scrEXT002_AACCATGAGGACTGGT,scrEXT002,13002,3024,scrEXT002_AACCATGAGGACTGGT,3024,13002,True,31,CRC,B,Myeloid,tumour border,CRC,CRC_31,PMID32561858_CRC_31_BTu,3.022612,2,2


In [31]:
table(CRC$sample_type)
table(CRC$cancer_type)
table(CRC$patient_id)
table(CRC$sample_id)


Healthy colon tumour border   tumour core 
         1222          1465          1532 


          CRC CRC - Healthy 
         2997          1222 


CRC_31 CRC_32 CRC_33 CRC_35 CRC_36 CRC_37 CRC_38 
  1336    963    784    456    219    188    273 


   PMID32561858_CRC_31_BTu    PMID32561858_CRC_31_CTu 
                       463                        439 
PMID32561858_CRC_31_Normal    PMID32561858_CRC_32_BTu 
                       434                        262 
   PMID32561858_CRC_32_CTu PMID32561858_CRC_32_Normal 
                       385                        316 
   PMID32561858_CRC_33_BTu    PMID32561858_CRC_33_CTu 
                       337                        312 
PMID32561858_CRC_33_Normal    PMID32561858_CRC_35_BTu 
                       135                        135 
   PMID32561858_CRC_35_CTu PMID32561858_CRC_35_Normal 
                       150                        171 
   PMID32561858_CRC_36_BTu    PMID32561858_CRC_36_CTu 
                        72                        124 
PMID32561858_CRC_36_Normal    PMID32561858_CRC_37_BTu 
                        23                        107 
   PMID32561858_CRC_37_CTu PMID32561858_CRC_37_Normal 
                        34                         47 
   PMID32

In [34]:
#split by cancer_type
CRC_T <- subset(CRC, subset = cancer_type %in% c("CRC"))
CRC_H <- subset(CRC, subset = cancer_type %in% c("CRC - Healthy"))

#set site metadata
CRC_T@meta.data$site <- "colon"
CRC_H@meta.data$site <- "colon"

#set sample_type_major metadata
CRC_T@meta.data$sample_type_major <- "primary tumour"
CRC_H@meta.data$sample_type_major <- "healthy"

#set cancer_subtype metadata (healthy samples get the string "NA", not a missing value)
CRC_T@meta.data$cancer_subtype <- "CRC"
CRC_H@meta.data$cancer_subtype <- "NA"
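
The cell above sets each metadata column by first splitting the object on `cancer_type`. A possible alternative, sketched here on a toy data frame standing in for `CRC@meta.data`, is a named-vector lookup keyed by `cancer_type`. The `*_map` names are hypothetical, and the values are the ones used in the cell above (including the literal string "NA" for healthy samples).

```r
# Toy metadata standing in for CRC@meta.data (cancer_type values from the tables above)
meta <- data.frame(
  cancer_type = c("CRC", "CRC - Healthy", "CRC"),
  stringsAsFactors = FALSE
)

# Lookup tables keyed by cancer_type (hypothetical helper objects)
sample_type_major_map <- c("CRC" = "primary tumour", "CRC - Healthy" = "healthy")
cancer_subtype_map    <- c("CRC" = "CRC",            "CRC - Healthy" = "NA")

# site is the same for every cell in this dataset
meta$site <- "colon"

# Index the lookup by each cell's cancer_type; unname() drops the lookup keys
meta$sample_type_major <- unname(sample_type_major_map[meta$cancer_type])
meta$cancer_subtype    <- unname(cancer_subtype_map[meta$cancer_type])

print(meta)
```

Because nothing is split or merged in this version, it only touches metadata columns.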

In [36]:
table(CRC_T$sample_type)


tumour border   tumour core 
         1465          1532 

In [37]:
table(CRC_T$sample_id)


PMID32561858_CRC_31_BTu PMID32561858_CRC_31_CTu PMID32561858_CRC_32_BTu 
                    463                     439                     262 
PMID32561858_CRC_32_CTu PMID32561858_CRC_33_BTu PMID32561858_CRC_33_CTu 
                    385                     337                     312 
PMID32561858_CRC_35_BTu PMID32561858_CRC_35_CTu PMID32561858_CRC_36_BTu 
                    135                     150                      72 
PMID32561858_CRC_36_CTu PMID32561858_CRC_37_BTu PMID32561858_CRC_37_CTu 
                    124                     107                      34 
PMID32561858_CRC_38_BTu PMID32561858_CRC_38_CTu 
                     89                      88 

In [43]:
#already split by cancer_type; now split the tumour cells by patient so that border (BTu) and core (CTu) samples share one integration_id
CRC_31 <- subset(CRC_T, subset = sample_id %in% c("PMID32561858_CRC_31_BTu","PMID32561858_CRC_31_CTu"))
CRC_32 <- subset(CRC_T, subset = sample_id %in% c("PMID32561858_CRC_32_BTu","PMID32561858_CRC_32_CTu"))
CRC_33 <- subset(CRC_T, subset = sample_id %in% c("PMID32561858_CRC_33_BTu","PMID32561858_CRC_33_CTu"))
CRC_35 <- subset(CRC_T, subset = sample_id %in% c("PMID32561858_CRC_35_BTu","PMID32561858_CRC_35_CTu"))
CRC_36 <- subset(CRC_T, subset = sample_id %in% c("PMID32561858_CRC_36_BTu","PMID32561858_CRC_36_CTu"))
CRC_37 <- subset(CRC_T, subset = sample_id %in% c("PMID32561858_CRC_37_BTu","PMID32561858_CRC_37_CTu"))
CRC_38 <- subset(CRC_T, subset = sample_id %in% c("PMID32561858_CRC_38_BTu","PMID32561858_CRC_38_CTu"))

#set integration_id metadata
CRC_31@meta.data$integration_id <- "PMID32561858_CRC_31_Tu"
CRC_32@meta.data$integration_id <- "PMID32561858_CRC_32_Tu"
CRC_33@meta.data$integration_id <- "PMID32561858_CRC_33_Tu"
CRC_35@meta.data$integration_id <- "PMID32561858_CRC_35_Tu"
CRC_36@meta.data$integration_id <- "PMID32561858_CRC_36_Tu"
CRC_37@meta.data$integration_id <- "PMID32561858_CRC_37_Tu"
CRC_38@meta.data$integration_id <- "PMID32561858_CRC_38_Tu"

CRC_H@meta.data$integration_id <- CRC_H@meta.data$sample_id

In [44]:
#merge back together 
CRC_T <- merge(CRC_31, y = c(CRC_32,CRC_33,CRC_35,CRC_36,CRC_37,CRC_38), project = "PMID32561858")
CRC <- merge(CRC_T, y = c(CRC_H), project = "PMID32561858")
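
The split, label, and merge steps above give border and core samples from one patient a shared `integration_id`, while normal samples keep their `sample_id`. A minimal sketch of the same mapping as a single string substitution is shown below. It assumes the `sample_id` naming seen in the tables above (`_BTu`, `_CTu`, `_Normal` suffixes); the Seurat line is an untested assumption.

```r
# Example sample_ids in the format shown by table(CRC$sample_id)
sample_id <- c(
  "PMID32561858_CRC_31_BTu",
  "PMID32561858_CRC_31_CTu",
  "PMID32561858_CRC_31_Normal",
  "PMID32561858_CRC_38_CTu"
)

# Collapse the border (BTu) and core (CTu) suffixes to a shared "_Tu";
# ids without those suffixes (the Normal samples) pass through unchanged
integration_id <- sub("_[BC]Tu$", "_Tu", sample_id)
print(integration_id)

# On the Seurat object this might be written as (assumption, not run):
# CRC$integration_id <- sub("_[BC]Tu$", "_Tu", CRC$sample_id)
```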

In [45]:
CRC
CRC@project.name
head(CRC@meta.data)
tail(CRC@meta.data)

An object of class Seurat 
33694 features across 4219 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 50 layers present: counts.1.1.1.1.1, counts.2.1.1.1.1, data.1.1.1.1.1, data.2.1.1.1.1, scale.data.1.1.1.1, counts.4.1.2.2.1, counts.5.1.2.2.1, data.4.1.2.2.1, data.5.1.2.2.1, scale.data.1.2.2.1, counts.7.1.3.3.1, counts.8.1.3.3.1, data.7.1.3.3.1, data.8.1.3.3.1, scale.data.1.3.3.1, counts.10.1.4.4.1, counts.11.1.4.4.1, data.10.1.4.4.1, data.11.1.4.4.1, scale.data.1.4.4.1, counts.13.1.5.5.1, counts.14.1.5.5.1, data.13.1.5.5.1, data.14.1.5.5.1, scale.data.1.5.5.1, counts.16.1.6.6.1, counts.17.1.6.6.1, data.16.1.6.6.1, data.17.1.6.6.1, scale.data.1.6.6.1, counts.19.1.7.7.1, counts.20.1.7.7.1, data.19.1.7.7.1, data.20.1.7.7.1, scale.data.1.7.7.1, counts.12.2.2, counts.15.2.2, counts.18.2.2, counts.21.2.2, counts.3.2.2, counts.6.2.2, counts.9.2.2, data.3.2.2, data.6.2.2, data.9.2.2, data.12.2.2, data.15.2.2, data.18.2.2, data.21.2.2, scale.data.2.2

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,Cell,nGene,nUMI,CellFromTumor,PatientNumber,TumorType,TumorSite,⋯,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<int>,<int>,<lgl>,<int>,<chr>,<chr>,⋯,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
PMID32561858_CRC_Pt31_Btumour_scrEXT002_AAAGATGAGTTCGATC,scrEXT002,2110,911,scrEXT002_AAAGATGAGTTCGATC,911,2110,True,31,CRC,B,⋯,CRC,CRC_31,PMID32561858_CRC_31_BTu,4.218009,2,2,colon,primary tumour,CRC,PMID32561858_CRC_31_Tu
PMID32561858_CRC_Pt31_Btumour_scrEXT002_AAAGATGGTTCTGTTT,scrEXT002,1581,645,scrEXT002_AAAGATGGTTCTGTTT,645,1581,True,31,CRC,B,⋯,CRC,CRC_31,PMID32561858_CRC_31_BTu,3.036053,2,2,colon,primary tumour,CRC,PMID32561858_CRC_31_Tu
PMID32561858_CRC_Pt31_Btumour_scrEXT002_AAAGATGTCAAACCGT,scrEXT002,17178,2873,scrEXT002_AAAGATGTCAAACCGT,2873,17178,True,31,CRC,B,⋯,CRC,CRC_31,PMID32561858_CRC_31_BTu,1.792991,2,2,colon,primary tumour,CRC,PMID32561858_CRC_31_Tu
PMID32561858_CRC_Pt31_Btumour_scrEXT002_AAATGCCAGCGCTCCA,scrEXT002,5370,1748,scrEXT002_AAATGCCAGCGCTCCA,1748,5370,True,31,CRC,B,⋯,CRC,CRC_31,PMID32561858_CRC_31_BTu,5.456238,2,2,colon,primary tumour,CRC,PMID32561858_CRC_31_Tu
PMID32561858_CRC_Pt31_Btumour_scrEXT002_AAATGCCGTAAGAGGA,scrEXT002,16689,3334,scrEXT002_AAATGCCGTAAGAGGA,3334,16689,True,31,CRC,B,⋯,CRC,CRC_31,PMID32561858_CRC_31_BTu,3.954701,2,2,colon,primary tumour,CRC,PMID32561858_CRC_31_Tu
PMID32561858_CRC_Pt31_Btumour_scrEXT002_AACCATGAGGACTGGT,scrEXT002,13002,3024,scrEXT002_AACCATGAGGACTGGT,3024,13002,True,31,CRC,B,⋯,CRC,CRC_31,PMID32561858_CRC_31_BTu,3.022612,2,2,colon,primary tumour,CRC,PMID32561858_CRC_31_Tu


Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,Cell,nGene,nUMI,CellFromTumor,PatientNumber,TumorType,TumorSite,⋯,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<int>,<int>,<lgl>,<int>,<chr>,<chr>,⋯,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
PMID32561858_CRC_Pt38_Normal_scrEXT029_TTCGGTCCAAGCTGGA,scrEXT029,4073,1394,scrEXT029_TTCGGTCCAAGCTGGA,1394,4073,False,38,CRC,N,⋯,CRC - Healthy,CRC_38,PMID32561858_CRC_38_Normal,6.604468,2,2,colon,healthy,,PMID32561858_CRC_38_Normal
PMID32561858_CRC_Pt38_Normal_scrEXT029_TTCGGTCTCTACTCAT,scrEXT029,6781,1934,scrEXT029_TTCGGTCTCTACTCAT,1934,6781,False,38,CRC,N,⋯,CRC - Healthy,CRC_38,PMID32561858_CRC_38_Normal,5.780858,2,2,colon,healthy,,PMID32561858_CRC_38_Normal
PMID32561858_CRC_Pt38_Normal_scrEXT029_TTGGCAACAATAGCGG,scrEXT029,4447,1398,scrEXT029_TTGGCAACAATAGCGG,1398,4447,False,38,CRC,N,⋯,CRC - Healthy,CRC_38,PMID32561858_CRC_38_Normal,4.272543,2,2,colon,healthy,,PMID32561858_CRC_38_Normal
PMID32561858_CRC_Pt38_Normal_scrEXT029_TTGTAGGGTAGTAGTA,scrEXT029,3037,1125,scrEXT029_TTGTAGGGTAGTAGTA,1125,3037,False,38,CRC,N,⋯,CRC - Healthy,CRC_38,PMID32561858_CRC_38_Normal,8.396444,2,2,colon,healthy,,PMID32561858_CRC_38_Normal
PMID32561858_CRC_Pt38_Normal_scrEXT029_TTTGTCAAGAAACGCC,scrEXT029,4979,1251,scrEXT029_TTTGTCAAGAAACGCC,1251,4979,False,38,CRC,N,⋯,CRC - Healthy,CRC_38,PMID32561858_CRC_38_Normal,9.600321,2,2,colon,healthy,,PMID32561858_CRC_38_Normal
PMID32561858_CRC_Pt38_Normal_scrEXT029_TTTGTCAAGGAGTAGA,scrEXT029,1013,525,scrEXT029_TTTGTCAAGGAGTAGA,525,1013,False,38,CRC,N,⋯,CRC - Healthy,CRC_38,PMID32561858_CRC_38_Normal,24.876604,2,2,colon,healthy,,PMID32561858_CRC_38_Normal


In [46]:
table(CRC$sample_id)
table(CRC$integration_id)


   PMID32561858_CRC_31_BTu    PMID32561858_CRC_31_CTu 
                       463                        439 
PMID32561858_CRC_31_Normal    PMID32561858_CRC_32_BTu 
                       434                        262 
   PMID32561858_CRC_32_CTu PMID32561858_CRC_32_Normal 
                       385                        316 
   PMID32561858_CRC_33_BTu    PMID32561858_CRC_33_CTu 
                       337                        312 
PMID32561858_CRC_33_Normal    PMID32561858_CRC_35_BTu 
                       135                        135 
   PMID32561858_CRC_35_CTu PMID32561858_CRC_35_Normal 
                       150                        171 
   PMID32561858_CRC_36_BTu    PMID32561858_CRC_36_CTu 
                        72                        124 
PMID32561858_CRC_36_Normal    PMID32561858_CRC_37_BTu 
                        23                        107 
   PMID32561858_CRC_37_CTu PMID32561858_CRC_37_Normal 
                        34                         47 
   PMID32


PMID32561858_CRC_31_Normal     PMID32561858_CRC_31_Tu 
                       434                        902 
PMID32561858_CRC_32_Normal     PMID32561858_CRC_32_Tu 
                       316                        647 
PMID32561858_CRC_33_Normal     PMID32561858_CRC_33_Tu 
                       135                        649 
PMID32561858_CRC_35_Normal     PMID32561858_CRC_35_Tu 
                       171                        285 
PMID32561858_CRC_36_Normal     PMID32561858_CRC_36_Tu 
                        23                        196 
PMID32561858_CRC_37_Normal     PMID32561858_CRC_37_Tu 
                        47                        141 
PMID32561858_CRC_38_Normal     PMID32561858_CRC_38_Tu 
                        96                        177 

In [49]:
#exclude any samples with <100 cells
table(CRC$integration_id)
#exclude Normal samples from Pt36 (23 cells), Pt37 (47 cells) and Pt38 (96 cells)
CRC <- subset(CRC, subset = !(integration_id %in% c("PMID32561858_CRC_36_Normal","PMID32561858_CRC_37_Normal","PMID32561858_CRC_38_Normal")))
table(CRC$integration_id)


PMID32561858_CRC_31_Normal     PMID32561858_CRC_31_Tu 
                       434                        902 
PMID32561858_CRC_32_Normal     PMID32561858_CRC_32_Tu 
                       316                        647 
PMID32561858_CRC_33_Normal     PMID32561858_CRC_33_Tu 
                       135                        649 
PMID32561858_CRC_35_Normal     PMID32561858_CRC_35_Tu 
                       171                        285 
PMID32561858_CRC_36_Normal     PMID32561858_CRC_36_Tu 
                        23                        196 
PMID32561858_CRC_37_Normal     PMID32561858_CRC_37_Tu 
                        47                        141 
PMID32561858_CRC_38_Normal     PMID32561858_CRC_38_Tu 
                        96                        177 


PMID32561858_CRC_31_Normal     PMID32561858_CRC_31_Tu 
                       434                        902 
PMID32561858_CRC_32_Normal     PMID32561858_CRC_32_Tu 
                       316                        647 
PMID32561858_CRC_33_Normal     PMID32561858_CRC_33_Tu 
                       135                        649 
PMID32561858_CRC_35_Normal     PMID32561858_CRC_35_Tu 
                       171                        285 
    PMID32561858_CRC_36_Tu     PMID32561858_CRC_37_Tu 
                       196                        141 
    PMID32561858_CRC_38_Tu 
                       177 

In [50]:
#merge Breast and CRC
PMID <- merge(CRC, y = c(BC), project = "PMID32561858")

In [51]:
#join layers and then split them by integration_id
Layers(PMID[["RNA"]])
#join layers
PMID[["RNA"]] <- JoinLayers(PMID[["RNA"]])
Layers(PMID[["RNA"]])
#split layers
PMID[["RNA"]] <- split(PMID[["RNA"]], f = PMID$integration_id)
Layers(PMID[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
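
After joining and re-splitting, the object summaries in this notebook show one `counts.<id>` and one `data.<id>` layer per `integration_id`, plus a single unsplit `scale.data` layer. A small sketch of a check built on that pattern follows. `expected_layers()` is a hypothetical helper, and `observed` is a hand-written stand-in for what `Layers(PMID[["RNA"]])` would return.

```r
# Hypothetical helper: layer names expected after splitting by integration_id
expected_layers <- function(ids) {
  ids <- unique(as.character(ids))
  # one counts and one data layer per id, plus the unsplit scale.data (2 * n + 1 in total)
  c(paste0("counts.", ids), paste0("data.", ids), "scale.data")
}

# Toy per-cell integration_id vector (two distinct ids)
ids <- c("PMID32561858_CRC_31_Tu", "PMID32561858_CRC_31_Tu", "PMID32561858_breast_41")

# Stand-in for Layers(PMID[["RNA"]]) on a two-sample object
observed <- c(
  "counts.PMID32561858_CRC_31_Tu", "counts.PMID32561858_breast_41",
  "scale.data",
  "data.PMID32561858_CRC_31_Tu", "data.PMID32561858_breast_41"
)

# Layers that should exist but do not, and layers that exist unexpectedly
missing_layers <- setdiff(expected_layers(ids), observed)
extra_layers   <- setdiff(observed, expected_layers(ids))

print(missing_layers)
print(extra_layers)
```

Both vectors being empty would indicate that the layers match the intended integration units.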



In [52]:
#record number of cells
table(PMID$integration_id)
PMID
PMID@project.name


    PMID32561858_breast_41     PMID32561858_breast_42 
                       155                        120 
    PMID32561858_breast_43     PMID32561858_breast_44 
                       698                        177 
    PMID32561858_breast_45     PMID32561858_breast_47 
                       213                        111 
    PMID32561858_breast_48     PMID32561858_breast_49 
                       238                        436 
    PMID32561858_breast_50     PMID32561858_breast_51 
                       284                        422 
    PMID32561858_breast_52     PMID32561858_breast_54 
                       107                        105 
PMID32561858_CRC_31_Normal     PMID32561858_CRC_31_Tu 
                       434                        902 
PMID32561858_CRC_32_Normal     PMID32561858_CRC_32_Tu 
                       316                        647 
PMID32561858_CRC_33_Normal     PMID32561858_CRC_33_Tu 
                       135                        649 
PMID32561

An object of class Seurat 
33694 features across 7119 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 47 layers present: counts.PMID32561858_CRC_31_Tu, counts.PMID32561858_CRC_32_Tu, counts.PMID32561858_CRC_33_Tu, counts.PMID32561858_CRC_35_Tu, counts.PMID32561858_CRC_36_Tu, counts.PMID32561858_CRC_37_Tu, counts.PMID32561858_CRC_38_Tu, counts.PMID32561858_CRC_31_Normal, counts.PMID32561858_CRC_32_Normal, counts.PMID32561858_CRC_33_Normal, counts.PMID32561858_CRC_35_Normal, counts.PMID32561858_breast_41, counts.PMID32561858_breast_42, counts.PMID32561858_breast_43, counts.PMID32561858_breast_44, counts.PMID32561858_breast_45, counts.PMID32561858_breast_47, counts.PMID32561858_breast_48, counts.PMID32561858_breast_49, counts.PMID32561858_breast_50, counts.PMID32561858_breast_51, counts.PMID32561858_breast_52, counts.PMID32561858_breast_54, scale.data, data.PMID32561858_CRC_31_Tu, data.PMID32561858_CRC_32_Tu, data.PMID32561858_CRC_33_Tu, data.PMID325618
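
As a cross-check on the recorded numbers, the per-sample counts can be summed and compared with the object total. The sketch below uses the post-filtering counts from the breast and CRC tables earlier in this section; `cell_record` is an illustrative name.

```r
# Post-filtering counts copied from the BC and CRC tables in this section
counts <- c(
  PMID32561858_breast_41 = 155, PMID32561858_breast_42 = 120,
  PMID32561858_breast_43 = 698, PMID32561858_breast_44 = 177,
  PMID32561858_breast_45 = 213, PMID32561858_breast_47 = 111,
  PMID32561858_breast_48 = 238, PMID32561858_breast_49 = 436,
  PMID32561858_breast_50 = 284, PMID32561858_breast_51 = 422,
  PMID32561858_breast_52 = 107, PMID32561858_breast_54 = 105,
  PMID32561858_CRC_31_Normal = 434, PMID32561858_CRC_31_Tu = 902,
  PMID32561858_CRC_32_Normal = 316, PMID32561858_CRC_32_Tu = 647,
  PMID32561858_CRC_33_Normal = 135, PMID32561858_CRC_33_Tu = 649,
  PMID32561858_CRC_35_Normal = 171, PMID32561858_CRC_35_Tu = 285,
  PMID32561858_CRC_36_Tu = 196, PMID32561858_CRC_37_Tu = 141,
  PMID32561858_CRC_38_Tu = 177
)

# One row per integration_id, ready to be written out if a record file is wanted
cell_record <- data.frame(
  integration_id = names(counts),
  n_cells = unname(counts),
  stringsAsFactors = FALSE
)

# The total should equal the number of samples (cells) reported for the merged object
total_cells <- sum(cell_record$n_cells)
print(total_cells)
print(nrow(cell_record))
```

The total of 7119 matches the merged object summary, and 23 integration ids correspond to the 47 layers reported (23 counts, 23 data, 1 scale.data).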

In [53]:
#re-export seurat object ready for integration
saveRDS(PMID, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/PMID32561858_myeloid_int.RDS")

In [54]:
#remove all objects in R
rm(list = ls())

## GSE112271

In [3]:
HCC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE112271_myeloid.RDS")

In [4]:
HCC
HCC@project.name
head(HCC@meta.data)

An object of class Seurat 
32738 features across 7452 samples within 1 assay 
Active assay: RNA (32738 features, 2000 variable features)
 15 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, data.1, data.2, data.3, data.4, data.5, data.6, data.7, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE112271_Pt13a_AAACCTGAGCACCGTC-1,GSE112271,5321,1472,tumour,HCC,Pt13,GSE112271_HCC_Pt13_region-a,9.058448,7,7
GSE112271_Pt13a_AAACCTGCACCGAAAG-1,GSE112271,2229,895,tumour,HCC,Pt13,GSE112271_HCC_Pt13_region-a,28.218932,7,7
GSE112271_Pt13a_AAACCTGTCCACGTTC-1,GSE112271,4390,1528,tumour,HCC,Pt13,GSE112271_HCC_Pt13_region-a,15.71754,7,7
GSE112271_Pt13a_AAACGGGTCGGAAACG-1,GSE112271,2733,974,tumour,HCC,Pt13,GSE112271_HCC_Pt13_region-a,5.122576,7,7
GSE112271_Pt13a_AAAGATGCACATTAGC-1,GSE112271,2213,957,tumour,HCC,Pt13,GSE112271_HCC_Pt13_region-a,24.356078,7,7
GSE112271_Pt13a_AAAGATGCACTGTCGG-1,GSE112271,3110,1013,tumour,HCC,Pt13,GSE112271_HCC_Pt13_region-a,5.016077,7,7


In [5]:
table(HCC$sample_type)
table(HCC$cancer_type)
table(HCC$patient_id)
table(HCC$sample_id)


tumour 
  7452 


 HCC 
7452 


Pt13 Pt14 
2993 4459 


GSE112271_HCC_Pt13_region-a GSE112271_HCC_Pt13_region-b 
                        934                         274 
GSE112271_HCC_Pt13_region-c GSE112271_HCC_Pt14_region-a 
                       1785                         184 
GSE112271_HCC_Pt14_region-b GSE112271_HCC_Pt14_region-c 
                        523                        1446 
GSE112271_HCC_Pt14_region-d 
                       2306 

In [6]:
#set site metadata
HCC@meta.data$site <- "liver"

#set sample_type_major metadata
HCC@meta.data$sample_type_major <- "primary tumour"

#set cancer_subtype metadata
HCC@meta.data$cancer_subtype <- "HCC"

In [7]:
#regions from the same patient are not biologically distinct, so give them the same integration_id

#split by patient_id
HCC_13 <- subset(HCC, subset = patient_id %in% c("Pt13"))
HCC_14 <- subset(HCC, subset = patient_id %in% c("Pt14"))

#set integration_id metadata
HCC_13@meta.data$integration_id <- "GSE112271_HCC_Pt13"
HCC_14@meta.data$integration_id <- "GSE112271_HCC_Pt14"

#merge back together 
HCC <- merge(HCC_13, y = c(HCC_14), project = "GSE112271")
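
Before merging regions, it can help to preview how many cells each merged `integration_id` will hold. The sketch below aggregates the per-region counts from `table(HCC$sample_id)` above. It assumes the `_region-<letter>` suffix is the only part that differs between regions of one patient.

```r
# Per-region counts copied from table(HCC$sample_id)
region_counts <- c(
  "GSE112271_HCC_Pt13_region-a" = 934,
  "GSE112271_HCC_Pt13_region-b" = 274,
  "GSE112271_HCC_Pt13_region-c" = 1785,
  "GSE112271_HCC_Pt14_region-a" = 184,
  "GSE112271_HCC_Pt14_region-b" = 523,
  "GSE112271_HCC_Pt14_region-c" = 1446,
  "GSE112271_HCC_Pt14_region-d" = 2306
)

# Strip the region suffix to get the per-patient integration_id
integration_id <- sub("_region-[a-z]$", "", names(region_counts))

# Sum the region counts within each integration_id
merged_counts <- tapply(region_counts, integration_id, sum)
print(merged_counts)
```

The sums (2993 for Pt13, 4459 for Pt14) agree with `table(HCC$patient_id)`, so every region is accounted for.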

In [8]:
HCC
HCC@project.name
head(HCC@meta.data)

An object of class Seurat 
32738 features across 7452 samples within 1 assay 
Active assay: RNA (32738 features, 2000 variable features)
 16 layers present: counts.1.1, counts.2.1, counts.3.1, data.1.1, data.2.1, data.3.1, scale.data.1, counts.4.2, counts.5.2, counts.6.2, counts.7.2, data.4.2, data.5.2, data.6.2, data.7.2, scale.data.2

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE112271_Pt13a_AAACCTGAGCACCGTC-1,GSE112271,5321,1472,tumour,HCC,Pt13,GSE112271_HCC_Pt13_region-a,9.058448,7,7,liver,primary tumour,HCC,GSE112271_HCC_Pt13
GSE112271_Pt13a_AAACCTGCACCGAAAG-1,GSE112271,2229,895,tumour,HCC,Pt13,GSE112271_HCC_Pt13_region-a,28.218932,7,7,liver,primary tumour,HCC,GSE112271_HCC_Pt13
GSE112271_Pt13a_AAACCTGTCCACGTTC-1,GSE112271,4390,1528,tumour,HCC,Pt13,GSE112271_HCC_Pt13_region-a,15.71754,7,7,liver,primary tumour,HCC,GSE112271_HCC_Pt13
GSE112271_Pt13a_AAACGGGTCGGAAACG-1,GSE112271,2733,974,tumour,HCC,Pt13,GSE112271_HCC_Pt13_region-a,5.122576,7,7,liver,primary tumour,HCC,GSE112271_HCC_Pt13
GSE112271_Pt13a_AAAGATGCACATTAGC-1,GSE112271,2213,957,tumour,HCC,Pt13,GSE112271_HCC_Pt13_region-a,24.356078,7,7,liver,primary tumour,HCC,GSE112271_HCC_Pt13
GSE112271_Pt13a_AAAGATGCACTGTCGG-1,GSE112271,3110,1013,tumour,HCC,Pt13,GSE112271_HCC_Pt13_region-a,5.016077,7,7,liver,primary tumour,HCC,GSE112271_HCC_Pt13


In [9]:
#exclude any samples with <100 cells
table(HCC$integration_id)
#none to exclude (both integration_ids have >100 cells)
#HCC <- subset(HCC, subset = !(integration_id %in% c("")))
#table(HCC$integration_id)


GSE112271_HCC_Pt13 GSE112271_HCC_Pt14 
              2993               4459 

In [10]:
#join layers and then split them by integration_id
Layers(HCC[["RNA"]])
#join layers
HCC[["RNA"]] <- JoinLayers(HCC[["RNA"]])
Layers(HCC[["RNA"]])
#split layers
HCC[["RNA"]] <- split(HCC[["RNA"]], f = HCC$integration_id)
Layers(HCC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [11]:
#record number of cells
table(HCC$integration_id)
HCC
HCC@project.name


GSE112271_HCC_Pt13 GSE112271_HCC_Pt14 
              2993               4459 

An object of class Seurat 
32738 features across 7452 samples within 1 assay 
Active assay: RNA (32738 features, 2000 variable features)
 5 layers present: counts.GSE112271_HCC_Pt13, counts.GSE112271_HCC_Pt14, scale.data, data.GSE112271_HCC_Pt13, data.GSE112271_HCC_Pt14

In [12]:
#re-export seurat object ready for integration
saveRDS(HCC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE112271_myeloid_int.RDS")

In [13]:
#remove all objects in R
rm(list = ls())

## GSE189903

In [14]:
Data <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE189903_myeloid.RDS")

In [15]:
Data
Data@project.name
head(Data@meta.data)

An object of class Seurat 
33538 features across 18630 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 69 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, counts.19, counts.20, counts.21, counts.22, counts.23, counts.24, counts.25, counts.26, counts.27, counts.28, counts.29, counts.30, counts.31, counts.32, counts.33, counts.34, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, data.19, data.20, data.21, data.22, data.23, data.24, data.25, data.26, data.27, data.28, data.29, data.30, data.31, data.32, data.33, data.34, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE189903_1CB_AAGACCTAGTGCGATG-1,GSE189903,776,330,tumour border,iCCA,Pt_1C,GSE189903_iCCA_Pt1C_border,1.03092784,5,5
GSE189903_1CB_AAGTCTGGTTAAAGAC-1,GSE189903,3415,1421,tumour border,iCCA,Pt_1C,GSE189903_iCCA_Pt1C_border,7.64275256,5,5
GSE189903_1CB_ACATACGGTCGATTGT-1,GSE189903,22969,3458,tumour border,iCCA,Pt_1C,GSE189903_iCCA_Pt1C_border,16.76172232,5,5
GSE189903_1CB_ACCAGTACATCCTAGA-1,GSE189903,16811,3819,tumour border,iCCA,Pt_1C,GSE189903_iCCA_Pt1C_border,2.40913687,5,5
GSE189903_1CB_ACTGATGCAAGCGATG-1,GSE189903,1596,619,tumour border,iCCA,Pt_1C,GSE189903_iCCA_Pt1C_border,37.6566416,5,5
GSE189903_1CB_ACTGATGGTAGAAGGA-1,GSE189903,1156,581,tumour border,iCCA,Pt_1C,GSE189903_iCCA_Pt1C_border,0.08650519,5,5


In [16]:
table(Data$sample_type)
table(Data$cancer_type)
table(Data$patient_id)
table(Data$sample_id)


Healthy liver        tumour tumour border 
         4411         11878          2341 


    HCC Healthy    iCCA 
   9897    4411    4322 


Pt_1C Pt_1H Pt_2C Pt_2H Pt_3C Pt_3H Pt_4H 
 1275  2968  1409  1467  1872  2144  7495 


   GSE189903_HCC_Pt1H_border    GSE189903_HCC_Pt1H_normal 
                         336                         1867 
 GSE189903_HCC_Pt1H_tumour_1  GSE189903_HCC_Pt1H_tumour_2 
                         488                          149 
 GSE189903_HCC_Pt1H_tumour_3    GSE189903_HCC_Pt2H_border 
                         128                          972 
   GSE189903_HCC_Pt2H_normal  GSE189903_HCC_Pt2H_tumour_1 
                          30                           79 
 GSE189903_HCC_Pt2H_tumour_2  GSE189903_HCC_Pt2H_tumour_3 
                         158                          228 
   GSE189903_HCC_Pt3H_border    GSE189903_HCC_Pt3H_normal 
                         239                          383 
 GSE189903_HCC_Pt3H_tumour_1  GSE189903_HCC_Pt3H_tumour_2 
                         183                          442 
 GSE189903_HCC_Pt3H_tumour_3    GSE189903_HCC_Pt4H_border 
                         897                          451 
   GSE189903_HCC_Pt4H_normal  GSE189903_HCC_Pt4H_tumour

In [17]:
#set site metadata
Data@meta.data$site <- "liver"

In [18]:
table(Data$cancer_type)


    HCC Healthy    iCCA 
   9897    4411    4322 

In [21]:
#split by cancer_type
Data_HCC <- subset(Data, subset = cancer_type %in% c("HCC"))
Data_H <- subset(Data, subset = cancer_type %in% c("Healthy"))
Data_iCCA <- subset(Data, subset = cancer_type %in% c("iCCA"))

#set sample_type_major metadata
Data_HCC@meta.data$sample_type_major <- "primary tumour"
Data_H@meta.data$sample_type_major <- "healthy"
Data_iCCA@meta.data$sample_type_major <- "primary tumour"

#set cancer_subtype metadata
Data_HCC@meta.data$cancer_subtype <- "HCC"
Data_H@meta.data$cancer_subtype <- "NA"
Data_iCCA@meta.data$cancer_subtype <- "iCCA"

In [24]:
table(Data_HCC$patient_id)
table(Data_iCCA$patient_id)


Pt_1H Pt_2H Pt_3H Pt_4H 
 1101  1437  1761  5598 


Pt_1C Pt_2C Pt_3C 
 1198  1405  1719 

In [25]:
#tumour regions from the same patient are not biologically distinct, so give them the same integration_id

#already split by cancer_type, now split tumour by patient_id
Data_1H <- subset(Data_HCC, subset = patient_id %in% c("Pt_1H"))
Data_2H <- subset(Data_HCC, subset = patient_id %in% c("Pt_2H"))
Data_3H <- subset(Data_HCC, subset = patient_id %in% c("Pt_3H"))
Data_4H <- subset(Data_HCC, subset = patient_id %in% c("Pt_4H"))

Data_1C <- subset(Data_iCCA, subset = patient_id %in% c("Pt_1C"))
Data_2C <- subset(Data_iCCA, subset = patient_id %in% c("Pt_2C"))
Data_3C <- subset(Data_iCCA, subset = patient_id %in% c("Pt_3C"))


#set integration_id metadata
Data_1H@meta.data$integration_id <- "GSE189903_HCC_Pt1H_Tu"
Data_2H@meta.data$integration_id <- "GSE189903_HCC_Pt2H_Tu"
Data_3H@meta.data$integration_id <- "GSE189903_HCC_Pt3H_Tu"
Data_4H@meta.data$integration_id <- "GSE189903_HCC_Pt4H_Tu"
Data_1C@meta.data$integration_id <- "GSE189903_iCCA_Pt1C_Tu"
Data_2C@meta.data$integration_id <- "GSE189903_iCCA_Pt2C_Tu"
Data_3C@meta.data$integration_id <- "GSE189903_iCCA_Pt3C_Tu"

Data_H@meta.data$integration_id <- Data_H@meta.data$sample_id

#merge back together 
Data <- merge(Data_1H, y = c(Data_2H, Data_3H, Data_4H, Data_1C, Data_2C, Data_3C, Data_H), project = "GSE189903")

In [26]:
Data
Data@project.name
head(Data@meta.data)

An object of class Seurat 
33538 features across 18630 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 76 layers present: counts.15.1.1, counts.17.1.1, counts.18.1.1, counts.19.1.1, data.15.1.1, data.17.1.1, data.18.1.1, data.19.1.1, scale.data.1.1, counts.20.1.2, counts.22.1.2, counts.23.1.2, counts.24.1.2, data.20.1.2, data.22.1.2, data.23.1.2, data.24.1.2, scale.data.1.2, counts.25.1.3, counts.27.1.3, counts.28.1.3, counts.29.1.3, data.25.1.3, data.27.1.3, data.28.1.3, data.29.1.3, scale.data.1.3, counts.30.1.4, counts.32.1.4, counts.33.1.4, counts.34.1.4, data.30.1.4, data.32.1.4, data.33.1.4, data.34.1.4, scale.data.1.4, counts.1.3.5, counts.3.3.5, counts.4.3.5, counts.5.3.5, data.1.3.5, data.3.3.5, data.4.3.5, data.5.3.5, scale.data.3.5, counts.6.3.6, counts.8.3.6, counts.9.3.6, counts.10.3.6, data.6.3.6, data.8.3.6, data.9.3.6, data.10.3.6, scale.data.3.6, counts.11.3.7, counts.13.3.7, counts.14.3.7, data.11.3.7, data.13.3.7, data.14.3.7, scal

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE189903_1HB_AAACGGGTCTTCGAGA-1,GSE189903,281,206,tumour border,HCC,Pt_1H,GSE189903_HCC_Pt1H_border,7.117438,6,6,liver,primary tumour,HCC,GSE189903_HCC_Pt1H_Tu
GSE189903_1HB_AACACGTCATACTCTT-1,GSE189903,497,300,tumour border,HCC,Pt_1H,GSE189903_HCC_Pt1H_border,6.639839,5,5,liver,primary tumour,HCC,GSE189903_HCC_Pt1H_Tu
GSE189903_1HB_AACCGCGAGCTATGCT-1,GSE189903,409,273,tumour border,HCC,Pt_1H,GSE189903_HCC_Pt1H_border,2.444988,5,5,liver,primary tumour,HCC,GSE189903_HCC_Pt1H_Tu
GSE189903_1HB_AACCGCGGTCGCATCG-1,GSE189903,275,201,tumour border,HCC,Pt_1H,GSE189903_HCC_Pt1H_border,6.545455,6,6,liver,primary tumour,HCC,GSE189903_HCC_Pt1H_Tu
GSE189903_1HB_AACTCAGCACATTCGA-1,GSE189903,533,335,tumour border,HCC,Pt_1H,GSE189903_HCC_Pt1H_border,4.878049,5,5,liver,primary tumour,HCC,GSE189903_HCC_Pt1H_Tu
GSE189903_1HB_AACTCAGCACGGACAA-1,GSE189903,934,380,tumour border,HCC,Pt_1H,GSE189903_HCC_Pt1H_border,9.957173,5,5,liver,primary tumour,HCC,GSE189903_HCC_Pt1H_Tu


In [27]:
table(Data$sample_id)
table(Data$integration_id)


   GSE189903_HCC_Pt1H_border    GSE189903_HCC_Pt1H_normal 
                         336                         1867 
 GSE189903_HCC_Pt1H_tumour_1  GSE189903_HCC_Pt1H_tumour_2 
                         488                          149 
 GSE189903_HCC_Pt1H_tumour_3    GSE189903_HCC_Pt2H_border 
                         128                          972 
   GSE189903_HCC_Pt2H_normal  GSE189903_HCC_Pt2H_tumour_1 
                          30                           79 
 GSE189903_HCC_Pt2H_tumour_2  GSE189903_HCC_Pt2H_tumour_3 
                         158                          228 
   GSE189903_HCC_Pt3H_border    GSE189903_HCC_Pt3H_normal 
                         239                          383 
 GSE189903_HCC_Pt3H_tumour_1  GSE189903_HCC_Pt3H_tumour_2 
                         183                          442 
 GSE189903_HCC_Pt3H_tumour_3    GSE189903_HCC_Pt4H_border 
                         897                          451 
   GSE189903_HCC_Pt4H_normal  GSE189903_HCC_Pt4H_tumour


 GSE189903_HCC_Pt1H_normal      GSE189903_HCC_Pt1H_Tu 
                      1867                       1101 
 GSE189903_HCC_Pt2H_normal      GSE189903_HCC_Pt2H_Tu 
                        30                       1437 
 GSE189903_HCC_Pt3H_normal      GSE189903_HCC_Pt3H_Tu 
                       383                       1761 
 GSE189903_HCC_Pt4H_normal      GSE189903_HCC_Pt4H_Tu 
                      1897                       5598 
GSE189903_iCCA_Pt1C_normal     GSE189903_iCCA_Pt1C_Tu 
                        77                       1198 
GSE189903_iCCA_Pt2C_normal     GSE189903_iCCA_Pt2C_Tu 
                         4                       1405 
GSE189903_iCCA_Pt3C_normal     GSE189903_iCCA_Pt3C_Tu 
                       153                       1719 

In [29]:
#exclude any samples with <100 cells
table(Data$integration_id)
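#exclude GSE189903_HCC_Pt2H_normal (30 cells), GSE189903_iCCA_Pt1C_normal (77 cells), GSE189903_iCCA_Pt2C_normal (4 cells)
#the leading ! negates the %in% test, so every integration_id not listed is kept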
Data <- subset(Data, !(subset = integration_id %in% c("GSE189903_HCC_Pt2H_normal","GSE189903_iCCA_Pt1C_normal","GSE189903_iCCA_Pt2C_normal")))
table(Data$integration_id)


 GSE189903_HCC_Pt1H_normal      GSE189903_HCC_Pt1H_Tu 
                      1867                       1101 
 GSE189903_HCC_Pt2H_normal      GSE189903_HCC_Pt2H_Tu 
                        30                       1437 
 GSE189903_HCC_Pt3H_normal      GSE189903_HCC_Pt3H_Tu 
                       383                       1761 
 GSE189903_HCC_Pt4H_normal      GSE189903_HCC_Pt4H_Tu 
                      1897                       5598 
GSE189903_iCCA_Pt1C_normal     GSE189903_iCCA_Pt1C_Tu 
                        77                       1198 
GSE189903_iCCA_Pt2C_normal     GSE189903_iCCA_Pt2C_Tu 
                         4                       1405 
GSE189903_iCCA_Pt3C_normal     GSE189903_iCCA_Pt3C_Tu 
                       153                       1719 


 GSE189903_HCC_Pt1H_normal      GSE189903_HCC_Pt1H_Tu 
                      1867                       1101 
     GSE189903_HCC_Pt2H_Tu  GSE189903_HCC_Pt3H_normal 
                      1437                        383 
     GSE189903_HCC_Pt3H_Tu  GSE189903_HCC_Pt4H_normal 
                      1761                       1897 
     GSE189903_HCC_Pt4H_Tu     GSE189903_iCCA_Pt1C_Tu 
                      5598                       1198 
    GSE189903_iCCA_Pt2C_Tu GSE189903_iCCA_Pt3C_normal 
                      1405                        153 
    GSE189903_iCCA_Pt3C_Tu 
                      1719 
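
The IDs to drop were picked by eye from the first table above. As a hedged sketch of how the same list could be derived programmatically, the base R snippet below re-enters the per-`integration_id` counts from that output as a named vector and applies the notebook's 100-cell threshold. The names `counts`, `min_cells`, `too_small` and `kept` are illustrative and are not part of the original notebook.

```r
# Per-integration_id cell counts, copied from the table() output above
counts <- c(
  GSE189903_HCC_Pt1H_normal  = 1867, GSE189903_HCC_Pt1H_Tu  = 1101,
  GSE189903_HCC_Pt2H_normal  = 30,   GSE189903_HCC_Pt2H_Tu  = 1437,
  GSE189903_HCC_Pt3H_normal  = 383,  GSE189903_HCC_Pt3H_Tu  = 1761,
  GSE189903_HCC_Pt4H_normal  = 1897, GSE189903_HCC_Pt4H_Tu  = 5598,
  GSE189903_iCCA_Pt1C_normal = 77,   GSE189903_iCCA_Pt1C_Tu = 1198,
  GSE189903_iCCA_Pt2C_normal = 4,    GSE189903_iCCA_Pt2C_Tu = 1405,
  GSE189903_iCCA_Pt3C_normal = 153,  GSE189903_iCCA_Pt3C_Tu = 1719
)

min_cells <- 100  # threshold used throughout this notebook (strictly fewer than 100 is dropped)

# IDs falling below the threshold, and the counts that survive it
too_small <- names(counts)[counts < min_cells]
kept <- counts[counts >= min_cells]

print(too_small)
print(sum(kept))  # 18519, the cell total reported for the filtered object
```

On the real object, `table(Data$integration_id)` would presumably take the place of the hand-typed vector, with `too_small` supplying the IDs passed to `subset()`.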

In [30]:
#join layers and then split them by integration_id
Layers(Data[["RNA"]])
#join layers
Data[["RNA"]] <- JoinLayers(Data[["RNA"]])
Layers(Data[["RNA"]])
#split layers
Data[["RNA"]] <- split(Data[["RNA"]], f = Data$integration_id)
Layers(Data[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [31]:
#record number of cells
table(Data$integration_id)
Data
Data@project.name


 GSE189903_HCC_Pt1H_normal      GSE189903_HCC_Pt1H_Tu 
                      1867                       1101 
     GSE189903_HCC_Pt2H_Tu  GSE189903_HCC_Pt3H_normal 
                      1437                        383 
     GSE189903_HCC_Pt3H_Tu  GSE189903_HCC_Pt4H_normal 
                      1761                       1897 
     GSE189903_HCC_Pt4H_Tu     GSE189903_iCCA_Pt1C_Tu 
                      5598                       1198 
    GSE189903_iCCA_Pt2C_Tu GSE189903_iCCA_Pt3C_normal 
                      1405                        153 
    GSE189903_iCCA_Pt3C_Tu 
                      1719 

An object of class Seurat 
33538 features across 18519 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 23 layers present: counts.GSE189903_HCC_Pt1H_Tu, counts.GSE189903_HCC_Pt2H_Tu, counts.GSE189903_HCC_Pt3H_Tu, counts.GSE189903_HCC_Pt4H_Tu, counts.GSE189903_iCCA_Pt1C_Tu, counts.GSE189903_iCCA_Pt2C_Tu, counts.GSE189903_iCCA_Pt3C_Tu, counts.GSE189903_iCCA_Pt3C_normal, counts.GSE189903_HCC_Pt1H_normal, counts.GSE189903_HCC_Pt3H_normal, counts.GSE189903_HCC_Pt4H_normal, scale.data, data.GSE189903_HCC_Pt1H_Tu, data.GSE189903_HCC_Pt2H_Tu, data.GSE189903_HCC_Pt3H_Tu, data.GSE189903_HCC_Pt4H_Tu, data.GSE189903_iCCA_Pt1C_Tu, data.GSE189903_iCCA_Pt2C_Tu, data.GSE189903_iCCA_Pt3C_Tu, data.GSE189903_iCCA_Pt3C_normal, data.GSE189903_HCC_Pt1H_normal, data.GSE189903_HCC_Pt3H_normal, data.GSE189903_HCC_Pt4H_normal
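
The layer count above can be cross-checked against the number of retained `integration_id` values: one `counts` and one `data` layer per ID, plus the single unsplit `scale.data`. The sketch below is a base R illustration only; `expected_layers()` is a hypothetical helper, not a Seurat function.

```r
# integration_ids retained for GSE189903 (copied from the table above)
ids <- c(
  "GSE189903_HCC_Pt1H_normal", "GSE189903_HCC_Pt1H_Tu",
  "GSE189903_HCC_Pt2H_Tu",
  "GSE189903_HCC_Pt3H_normal", "GSE189903_HCC_Pt3H_Tu",
  "GSE189903_HCC_Pt4H_normal", "GSE189903_HCC_Pt4H_Tu",
  "GSE189903_iCCA_Pt1C_Tu", "GSE189903_iCCA_Pt2C_Tu",
  "GSE189903_iCCA_Pt3C_normal", "GSE189903_iCCA_Pt3C_Tu"
)

# Hypothetical helper: layer names expected after splitting by integration_id,
# assuming counts and data are split and scale.data is left as one layer
expected_layers <- function(ids) {
  c(paste0("counts.", ids), paste0("data.", ids), "scale.data")
}

layers <- expected_layers(ids)
print(length(layers))  # 2 * 11 + 1 = 23
```

Comparing this vector with `Layers(Data[["RNA"]])` via `setequal()` would be one way to confirm that no excluded sample left a layer behind, assuming the naming pattern shown in the printed object holds.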

In [32]:
#re-export seurat object ready for integration
saveRDS(Data, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE189903_myeloid_int.RDS")

In [33]:
#remove all objects in R
rm(list = ls())

## GSE162025

In [34]:
NPC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE162025_myeloid.RDS")

In [35]:
NPC
NPC@project.name
head(NPC@meta.data)

An object of class Seurat 
20930 features across 2255 samples within 1 assay 
Active assay: RNA (20930 features, 2000 variable features)
 21 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,RNA_snn_res.0.4
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<fct>
GSE162025_npc_1802_tumor_CTCATTATCATGTCCC,GSE162025,4197,1865,tumour,NPC,pt-1802,GSE162025_NPC_1802,8.911127,8,8,13
GSE162025_npc_1802_tumor_CAGCTAAAGGCAATTA,GSE162025,2978,1443,tumour,NPC,pt-1802,GSE162025_NPC_1802,4.298187,8,8,13
GSE162025_npc_1802_tumor_GCTGCTTTCGTTTATC,GSE162025,13014,3649,tumour,NPC,pt-1802,GSE162025_NPC_1802,3.442447,8,8,18
GSE162025_npc_1802_tumor_GTATTCTTCTCAACTT,GSE162025,4208,1833,tumour,NPC,pt-1802,GSE162025_NPC_1802,3.089354,8,8,13
GSE162025_npc_1802_tumor_AAATGCCAGTACGTTC,GSE162025,3261,1576,tumour,NPC,pt-1802,GSE162025_NPC_1802,4.293162,8,8,13
GSE162025_npc_1802_tumor_AAGTCTGAGGACCACA,GSE162025,8358,2619,tumour,NPC,pt-1802,GSE162025_NPC_1802,4.450826,8,8,13


In [36]:
table(NPC$sample_type)
table(NPC$cancer_type)
table(NPC$patient_id)
table(NPC$sample_id)


tumour 
  2255 


 NPC 
2255 


pt-1802 pt-1805 pt-1806 pt-1807 pt-1808 pt-1810 pt-1811 pt-1813 pt-1815 pt-1816 
    644      60     154      94      32     149     104     443     102     473 


GSE162025_NPC_1802 GSE162025_NPC_1805 GSE162025_NPC_1806 GSE162025_NPC_1807 
               644                 60                154                 94 
GSE162025_NPC_1808 GSE162025_NPC_1810 GSE162025_NPC_1811 GSE162025_NPC_1813 
                32                149                104                443 
GSE162025_NPC_1815 GSE162025_NPC_1816 
               102                473 

In [37]:
#set site metadata
NPC@meta.data$site <- "nasopharynx"

#set sample_type_major metadata
NPC@meta.data$sample_type_major <- "primary tumour"

#set cancer_subtype metadata
NPC@meta.data$cancer_subtype <- "NPC"

#set integration_id metadata
NPC@meta.data$integration_id <- NPC@meta.data$sample_id

In [38]:
NPC
NPC@project.name
head(NPC@meta.data)

An object of class Seurat 
20930 features across 2255 samples within 1 assay 
Active assay: RNA (20930 features, 2000 variable features)
 21 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,RNA_snn_res.0.4,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE162025_npc_1802_tumor_CTCATTATCATGTCCC,GSE162025,4197,1865,tumour,NPC,pt-1802,GSE162025_NPC_1802,8.911127,8,8,13,nasopharynx,primary tumour,NPC,GSE162025_NPC_1802
GSE162025_npc_1802_tumor_CAGCTAAAGGCAATTA,GSE162025,2978,1443,tumour,NPC,pt-1802,GSE162025_NPC_1802,4.298187,8,8,13,nasopharynx,primary tumour,NPC,GSE162025_NPC_1802
GSE162025_npc_1802_tumor_GCTGCTTTCGTTTATC,GSE162025,13014,3649,tumour,NPC,pt-1802,GSE162025_NPC_1802,3.442447,8,8,18,nasopharynx,primary tumour,NPC,GSE162025_NPC_1802
GSE162025_npc_1802_tumor_GTATTCTTCTCAACTT,GSE162025,4208,1833,tumour,NPC,pt-1802,GSE162025_NPC_1802,3.089354,8,8,13,nasopharynx,primary tumour,NPC,GSE162025_NPC_1802
GSE162025_npc_1802_tumor_AAATGCCAGTACGTTC,GSE162025,3261,1576,tumour,NPC,pt-1802,GSE162025_NPC_1802,4.293162,8,8,13,nasopharynx,primary tumour,NPC,GSE162025_NPC_1802
GSE162025_npc_1802_tumor_AAGTCTGAGGACCACA,GSE162025,8358,2619,tumour,NPC,pt-1802,GSE162025_NPC_1802,4.450826,8,8,13,nasopharynx,primary tumour,NPC,GSE162025_NPC_1802


In [42]:
#exclude any samples with <100 cells
table(NPC$integration_id)
#exclude GSE162025_NPC_1805 (60 cells), GSE162025_NPC_1807 (94 cells), GSE162025_NPC_1808 (32 cells)
NPC <- subset(NPC, !(subset = integration_id %in% c("GSE162025_NPC_1805","GSE162025_NPC_1807","GSE162025_NPC_1808")))
table(NPC$integration_id)


GSE162025_NPC_1802 GSE162025_NPC_1805 GSE162025_NPC_1806 GSE162025_NPC_1807 
               644                 60                154                 94 
GSE162025_NPC_1808 GSE162025_NPC_1810 GSE162025_NPC_1811 GSE162025_NPC_1813 
                32                149                104                443 
GSE162025_NPC_1815 GSE162025_NPC_1816 
               102                473 


GSE162025_NPC_1802 GSE162025_NPC_1806 GSE162025_NPC_1810 GSE162025_NPC_1811 
               644                154                149                104 
GSE162025_NPC_1813 GSE162025_NPC_1815 GSE162025_NPC_1816 
               443                102                473 

In [43]:
#join layers and then split them by integration_id
Layers(NPC[["RNA"]])
#join layers
NPC[["RNA"]] <- JoinLayers(NPC[["RNA"]])
Layers(NPC[["RNA"]])
#split layers
NPC[["RNA"]] <- split(NPC[["RNA"]], f = NPC$integration_id)
Layers(NPC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
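
To make the join-then-split step concrete, here is a toy base R analogy that does not use Seurat at all: a single matrix (the "joined" state) is cut column-wise into one block per group and then bound back together. The object names (`m`, `group`, `layers`, `rejoined`) and the three-sample grouping are invented for illustration; real layers are handled internally by `JoinLayers()` and `split()`.

```r
# A tiny genes x cells matrix standing in for a joined counts layer
m <- matrix(
  1:12, nrow = 2,
  dimnames = list(c("geneA", "geneB"), paste0("cell", 1:6))
)

# One grouping label per cell, playing the role of integration_id
group <- c("s1", "s1", "s2", "s2", "s2", "s3")

# "Split": one sub-matrix per group, named in the counts.<id> pattern
layers <- lapply(split(seq_len(ncol(m)), group), function(j) m[, j, drop = FALSE])
names(layers) <- paste0("counts.", names(layers))

# "Join": bind the per-group blocks back into a single matrix
rejoined <- do.call(cbind, unname(layers))

print(names(layers))
print(sapply(layers, ncol))
```

Because the grouping decides which cells share a block, two biopsies given the same `integration_id` end up in the same layer, which is the point of re-splitting after the join.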



In [44]:
#record number of cells
table(NPC$integration_id)
NPC
NPC@project.name


GSE162025_NPC_1802 GSE162025_NPC_1806 GSE162025_NPC_1810 GSE162025_NPC_1811 
               644                154                149                104 
GSE162025_NPC_1813 GSE162025_NPC_1815 GSE162025_NPC_1816 
               443                102                473 

An object of class Seurat 
20930 features across 2069 samples within 1 assay 
Active assay: RNA (20930 features, 2000 variable features)
 15 layers present: data.GSE162025_NPC_1802, data.GSE162025_NPC_1806, data.GSE162025_NPC_1810, data.GSE162025_NPC_1811, data.GSE162025_NPC_1813, data.GSE162025_NPC_1815, data.GSE162025_NPC_1816, scale.data, counts.GSE162025_NPC_1802, counts.GSE162025_NPC_1806, counts.GSE162025_NPC_1810, counts.GSE162025_NPC_1811, counts.GSE162025_NPC_1813, counts.GSE162025_NPC_1815, counts.GSE162025_NPC_1816
 2 dimensional reductions calculated: pca, umap
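
The "record number of cells" step prints counts to the notebook only. A minimal sketch of keeping them in a tidy table instead is shown below, using the GSE162025 numbers from the output above. The `cell_log` data frame and its column names are assumptions made for illustration, not something the original notebook defines.

```r
# Retained samples and their cell counts (from the table above)
npc_counts <- c(
  GSE162025_NPC_1802 = 644, GSE162025_NPC_1806 = 154,
  GSE162025_NPC_1810 = 149, GSE162025_NPC_1811 = 104,
  GSE162025_NPC_1813 = 443, GSE162025_NPC_1815 = 102,
  GSE162025_NPC_1816 = 473
)

# Samples removed by the <100 cell filter (from the pre-filter table)
removed <- c(GSE162025_NPC_1805 = 60, GSE162025_NPC_1807 = 94, GSE162025_NPC_1808 = 32)

# Hypothetical log table: one row per integration_id
cell_log <- data.frame(
  dataset = "GSE162025",
  integration_id = names(npc_counts),
  n_cells = unname(npc_counts),
  stringsAsFactors = FALSE
)

# Reconcile with the object summaries: 2255 cells before, 2069 after
stopifnot(2255 - sum(removed) == sum(cell_log$n_cells))

print(cell_log)
```

Rows from each dataset could be appended with `rbind()` and written out once at the end, which would give a single record of cell numbers across all integration inputs.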

In [45]:
#re-export seurat object ready for integration
saveRDS(NPC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE162025_myeloid_int.RDS")

In [46]:
#remove all objects in R
rm(list = ls())

## GSE139324

In [47]:
HNSCC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE139324_myeloid.RDS")

In [48]:
HNSCC
HNSCC@project.name
head(HNSCC@meta.data)

An object of class Seurat 
33694 features across 8995 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 63 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, counts.19, counts.20, counts.21, counts.22, counts.23, counts.24, counts.25, counts.26, counts.27, counts.28, counts.29, counts.30, counts.31, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, data.19, data.20, data.21, data.22, data.23, data.24, data.25, data.26, data.27, data.28, data.29, data.30, data.31, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,RNA_snn_res.0.5
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<fct>
GSE139324_HNSCC_1_AAACCTGTCTTCGAGA-1,GSE139324,5976,1717,tumour,HNSCC,HNSCC_1,GSE139324_HNSCC_1,4.534806,5,5,15
GSE139324_HNSCC_1_AAAGATGAGCATGGCA-1,GSE139324,7038,1751,tumour,HNSCC,HNSCC_1,GSE139324_HNSCC_1,3.3106,5,5,5
GSE139324_HNSCC_1_AAAGATGAGCTAGTCT-1,GSE139324,9276,2396,tumour,HNSCC,HNSCC_1,GSE139324_HNSCC_1,4.65718,5,5,13
GSE139324_HNSCC_1_AACCGCGCACACTGCG-1,GSE139324,18633,3528,tumour,HNSCC,HNSCC_1,GSE139324_HNSCC_1,3.434766,11,11,13
GSE139324_HNSCC_1_AACTCCCAGATGAGAG-1,GSE139324,4313,1379,tumour,HNSCC,HNSCC_1,GSE139324_HNSCC_1,4.219801,5,5,13
GSE139324_HNSCC_1_AACTCCCGTCATATCG-1,GSE139324,4335,1408,tumour,HNSCC,HNSCC_1,GSE139324_HNSCC_1,4.72895,5,5,15


In [49]:
table(HNSCC$sample_type)
table(HNSCC$cancer_type)
table(HNSCC$patient_id)
table(HNSCC$sample_id)


healthy_tonsil         tumour 
           113           8882 


Healthy   HNSCC 
    113    8882 


    HD_1     HD_2     HD_3     HD_4     HD_5  HNSCC_1 HNSCC_10 HNSCC_11 
      14       18       22       27       32      275      242      656 
HNSCC_12 HNSCC_13 HNSCC_14 HNSCC_15 HNSCC_16 HNSCC_17 HNSCC_18 HNSCC_19 
     288      816      708      578      579      327      490       29 
 HNSCC_2 HNSCC_20 HNSCC_21 HNSCC_22 HNSCC_23 HNSCC_24 HNSCC_25 HNSCC_26 
     132       25       21       54      629      148      133      342 
 HNSCC_3  HNSCC_4  HNSCC_5  HNSCC_6  HNSCC_7  HNSCC_8  HNSCC_9 
     143      308      111      192      610      593      453 


    GSE139324_HD_1     GSE139324_HD_2     GSE139324_HD_3     GSE139324_HD_4 
                14                 18                 22                 27 
    GSE139324_HD_5  GSE139324_HNSCC_1 GSE139324_HNSCC_10 GSE139324_HNSCC_11 
                32                275                242                656 
GSE139324_HNSCC_12 GSE139324_HNSCC_13 GSE139324_HNSCC_14 GSE139324_HNSCC_15 
               288                816                708                578 
GSE139324_HNSCC_16 GSE139324_HNSCC_17 GSE139324_HNSCC_18 GSE139324_HNSCC_19 
               579                327                490                 29 
 GSE139324_HNSCC_2 GSE139324_HNSCC_20 GSE139324_HNSCC_21 GSE139324_HNSCC_22 
               132                 25                 21                 54 
GSE139324_HNSCC_23 GSE139324_HNSCC_24 GSE139324_HNSCC_25 GSE139324_HNSCC_26 
               629                148                133                342 
 GSE139324_HNSCC_3  GSE139324_HNSCC_4  GSE139324_HNSCC_5  GSE139324_HNSCC_6

In [50]:
table(HNSCC$cancer_type)


Healthy   HNSCC 
    113    8882 

In [51]:
#split by cancer_type
HNSCC_H <- subset(HNSCC, subset = cancer_type %in% c("Healthy"))
HNSCC_T <- subset(HNSCC, subset = cancer_type %in% c("HNSCC"))

#set site metadata
HNSCC_H@meta.data$site <- "tonsil"
HNSCC_T@meta.data$site <- "head and neck"

#set sample_type_major metadata
HNSCC_H@meta.data$sample_type_major <- "healthy"
HNSCC_T@meta.data$sample_type_major <- "primary tumour"

#set cancer_subtype metadata (healthy samples get the literal string "NA", not a missing value)
HNSCC_H@meta.data$cancer_subtype <- "NA"
HNSCC_T@meta.data$cancer_subtype <- "HNSCC"

#set integration_id metadata
HNSCC_H@meta.data$integration_id <- HNSCC_H@meta.data$sample_id
HNSCC_T@meta.data$integration_id <- HNSCC_T@meta.data$sample_id


#merge back together 
HNSCC <- merge(HNSCC_H, y = c(HNSCC_T), project = "GSE139324")
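
Splitting and re-merging works, but the printed output of the next cell shows side effects: layers are renamed (`counts.27.1`, `scale.data.2`, ...) and the cluster columns change from `<fct>` to `<chr>`. As an alternative sketch, not the method used in this notebook, the same per-group values could be assigned through a lookup table keyed on `cancer_type`. The example below runs on a toy data frame standing in for `HNSCC@meta.data`; `meta`, `lookup` and `idx` are illustrative names.

```r
# Toy stand-in for the metadata: three cells, values taken from the tables above
meta <- data.frame(
  cancer_type = c("Healthy", "HNSCC", "HNSCC"),
  sample_id = c("GSE139324_HD_1", "GSE139324_HNSCC_1", "GSE139324_HNSCC_2"),
  stringsAsFactors = FALSE
)

# One row per cancer_type with the values assigned in the cell above
# ("NA" is kept as a literal string, as in the notebook)
lookup <- data.frame(
  cancer_type = c("Healthy", "HNSCC"),
  site = c("tonsil", "head and neck"),
  sample_type_major = c("healthy", "primary tumour"),
  cancer_subtype = c("NA", "HNSCC"),
  stringsAsFactors = FALSE
)

# Row of the lookup table that applies to each cell
idx <- match(meta$cancer_type, lookup$cancer_type)
stopifnot(!anyNA(idx))  # every cancer_type must have a lookup entry

# Copy each new column across, then set integration_id from sample_id
for (col in c("site", "sample_type_major", "cancer_subtype")) {
  meta[[col]] <- lookup[[col]][idx]
}
meta$integration_id <- meta$sample_id

print(meta)
```

Applied to the real metadata, this would leave the object itself untouched, so no layers would be renamed; whether that matters depends on the later join and split, which rebuilds the layers anyway.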

In [52]:
HNSCC
HNSCC@project.name
head(HNSCC@meta.data)

An object of class Seurat 
33694 features across 8995 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 64 layers present: counts.27.1, counts.28.1, counts.29.1, counts.30.1, counts.31.1, data.27.1, data.28.1, data.29.1, data.30.1, data.31.1, scale.data.1, counts.1.2, counts.2.2, counts.3.2, counts.4.2, counts.5.2, counts.6.2, counts.7.2, counts.8.2, counts.9.2, counts.10.2, counts.11.2, counts.12.2, counts.13.2, counts.14.2, counts.15.2, counts.16.2, counts.17.2, counts.18.2, counts.19.2, counts.20.2, counts.21.2, counts.22.2, counts.23.2, counts.24.2, counts.25.2, counts.26.2, data.1.2, data.2.2, data.3.2, data.4.2, data.5.2, data.6.2, data.7.2, data.8.2, data.9.2, data.10.2, data.11.2, data.12.2, data.13.2, data.14.2, data.15.2, data.16.2, data.17.2, data.18.2, data.19.2, data.20.2, data.21.2, data.22.2, data.23.2, data.24.2, data.25.2, data.26.2, scale.data.2

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,RNA_snn_res.0.5,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE139324_HD_1_ACTGATGCAAGACACG-1,GSE139324,7974,2492,healthy_tonsil,Healthy,HD_1,GSE139324_HD_1,2.14447,11,11,13,tonsil,healthy,,GSE139324_HD_1
GSE139324_HD_1_ATCTGCCCAATGGACG-1,GSE139324,15504,2953,healthy_tonsil,Healthy,HD_1,GSE139324_HD_1,3.515222,5,5,13,tonsil,healthy,,GSE139324_HD_1
GSE139324_HD_1_ATGAGGGAGGAGTTGC-1,GSE139324,4505,1405,healthy_tonsil,Healthy,HD_1,GSE139324_HD_1,9.567148,5,5,13,tonsil,healthy,,GSE139324_HD_1
GSE139324_HD_1_CGTTCTGAGTCGATAA-1,GSE139324,13968,2671,healthy_tonsil,Healthy,HD_1,GSE139324_HD_1,3.665521,5,5,13,tonsil,healthy,,GSE139324_HD_1
GSE139324_HD_1_CTCCTAGAGTCTCAAC-1,GSE139324,20506,4055,healthy_tonsil,Healthy,HD_1,GSE139324_HD_1,3.833024,5,5,5,tonsil,healthy,,GSE139324_HD_1
GSE139324_HD_1_CTCGTCAAGCGTGAGT-1,GSE139324,18511,3042,healthy_tonsil,Healthy,HD_1,GSE139324_HD_1,2.917184,5,5,13,tonsil,healthy,,GSE139324_HD_1


In [54]:
#exclude any samples with <100 cells
table(HNSCC$integration_id)
#exclude GSE139324_HD_1 to GSE139324_HD_5 (14 to 32 cells each) and GSE139324_HNSCC_19 (29 cells), _20 (25 cells), _21 (21 cells), _22 (54 cells)
HNSCC <- subset(HNSCC, !(subset = integration_id %in% c("GSE139324_HD_1","GSE139324_HD_2","GSE139324_HD_3","GSE139324_HD_4","GSE139324_HD_5","GSE139324_HNSCC_19","GSE139324_HNSCC_20","GSE139324_HNSCC_21","GSE139324_HNSCC_22")))
table(HNSCC$integration_id)


    GSE139324_HD_1     GSE139324_HD_2     GSE139324_HD_3     GSE139324_HD_4 
                14                 18                 22                 27 
    GSE139324_HD_5  GSE139324_HNSCC_1 GSE139324_HNSCC_10 GSE139324_HNSCC_11 
                32                275                242                656 
GSE139324_HNSCC_12 GSE139324_HNSCC_13 GSE139324_HNSCC_14 GSE139324_HNSCC_15 
               288                816                708                578 
GSE139324_HNSCC_16 GSE139324_HNSCC_17 GSE139324_HNSCC_18 GSE139324_HNSCC_19 
               579                327                490                 29 
 GSE139324_HNSCC_2 GSE139324_HNSCC_20 GSE139324_HNSCC_21 GSE139324_HNSCC_22 
               132                 25                 21                 54 
GSE139324_HNSCC_23 GSE139324_HNSCC_24 GSE139324_HNSCC_25 GSE139324_HNSCC_26 
               629                148                133                342 
 GSE139324_HNSCC_3  GSE139324_HNSCC_4  GSE139324_HNSCC_5  GSE139324_HNSCC_6


 GSE139324_HNSCC_1 GSE139324_HNSCC_10 GSE139324_HNSCC_11 GSE139324_HNSCC_12 
               275                242                656                288 
GSE139324_HNSCC_13 GSE139324_HNSCC_14 GSE139324_HNSCC_15 GSE139324_HNSCC_16 
               816                708                578                579 
GSE139324_HNSCC_17 GSE139324_HNSCC_18  GSE139324_HNSCC_2 GSE139324_HNSCC_23 
               327                490                132                629 
GSE139324_HNSCC_24 GSE139324_HNSCC_25 GSE139324_HNSCC_26  GSE139324_HNSCC_3 
               148                133                342                143 
 GSE139324_HNSCC_4  GSE139324_HNSCC_5  GSE139324_HNSCC_6  GSE139324_HNSCC_7 
               308                111                192                610 
 GSE139324_HNSCC_8  GSE139324_HNSCC_9 
               593                453 

In [55]:
#join layers and then split them by integration_id
Layers(HNSCC[["RNA"]])
#join layers
HNSCC[["RNA"]] <- JoinLayers(HNSCC[["RNA"]])
Layers(HNSCC[["RNA"]])
#split layers
HNSCC[["RNA"]] <- split(HNSCC[["RNA"]], f = HNSCC$integration_id)
Layers(HNSCC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data.2’, ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [56]:
#record number of cells
table(HNSCC$integration_id)
HNSCC
HNSCC@project.name


 GSE139324_HNSCC_1 GSE139324_HNSCC_10 GSE139324_HNSCC_11 GSE139324_HNSCC_12 
               275                242                656                288 
GSE139324_HNSCC_13 GSE139324_HNSCC_14 GSE139324_HNSCC_15 GSE139324_HNSCC_16 
               816                708                578                579 
GSE139324_HNSCC_17 GSE139324_HNSCC_18  GSE139324_HNSCC_2 GSE139324_HNSCC_23 
               327                490                132                629 
GSE139324_HNSCC_24 GSE139324_HNSCC_25 GSE139324_HNSCC_26  GSE139324_HNSCC_3 
               148                133                342                143 
 GSE139324_HNSCC_4  GSE139324_HNSCC_5  GSE139324_HNSCC_6  GSE139324_HNSCC_7 
               308                111                192                610 
 GSE139324_HNSCC_8  GSE139324_HNSCC_9 
               593                453 

An object of class Seurat 
33694 features across 8753 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 46 layers present: data.GSE139324_HNSCC_1, data.GSE139324_HNSCC_2, data.GSE139324_HNSCC_3, data.GSE139324_HNSCC_4, data.GSE139324_HNSCC_5, data.GSE139324_HNSCC_6, data.GSE139324_HNSCC_7, data.GSE139324_HNSCC_8, data.GSE139324_HNSCC_9, data.GSE139324_HNSCC_10, data.GSE139324_HNSCC_11, data.GSE139324_HNSCC_12, data.GSE139324_HNSCC_13, data.GSE139324_HNSCC_14, data.GSE139324_HNSCC_15, data.GSE139324_HNSCC_16, data.GSE139324_HNSCC_17, data.GSE139324_HNSCC_18, data.GSE139324_HNSCC_23, data.GSE139324_HNSCC_24, data.GSE139324_HNSCC_25, data.GSE139324_HNSCC_26, scale.data.2, scale.data, counts.GSE139324_HNSCC_1, counts.GSE139324_HNSCC_2, counts.GSE139324_HNSCC_3, counts.GSE139324_HNSCC_4, counts.GSE139324_HNSCC_5, counts.GSE139324_HNSCC_6, counts.GSE139324_HNSCC_7, counts.GSE139324_HNSCC_8, counts.GSE139324_HNSCC_9, counts.GSE139324_HNSCC_10, counts.GSE13932

In [57]:
#re-export seurat object ready for integration
saveRDS(HNSCC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE139324_myeloid_int.RDS")

In [58]:
#remove all objects in R
rm(list = ls())

## GSE164690

In [59]:
HNSCC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE164690_myeloid.RDS")

In [60]:
HNSCC
HNSCC@project.name
head(HNSCC@meta.data)

An object of class Seurat 
33545 features across 9526 samples within 1 assay 
Active assay: RNA (33545 features, 2000 variable features)
 37 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.5,seurat_clusters,RNA_snn_res.0.2
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<fct>
GSE164690_HN01_AAACCTGCACCAGATT-1,GSE164690,9140,2104,tumour,HNSCC,HN01,GSE164690_HNSCC_HN01,3.030635,18,11,11
GSE164690_HN01_AAACCTGCAGGGTTAG-1,GSE164690,11142,3130,tumour,HNSCC,HN01,GSE164690_HNSCC_HN01,3.62592,5,2,2
GSE164690_HN01_AAACGGGCAAGTACCT-1,GSE164690,26390,4858,tumour,HNSCC,HN01,GSE164690_HNSCC_HN01,5.752179,13,2,2
GSE164690_HN01_AAACGGGCACTAGTAC-1,GSE164690,14516,3446,tumour,HNSCC,HN01,GSE164690_HNSCC_HN01,3.60981,5,2,2
GSE164690_HN01_AAAGTAGTCCATTCTA-1,GSE164690,15592,3281,tumour,HNSCC,HN01,GSE164690_HNSCC_HN01,3.2196,5,2,2
GSE164690_HN01_AAATGCCAGACATAAC-1,GSE164690,8071,2262,tumour,HNSCC,HN01,GSE164690_HNSCC_HN01,7.025152,5,2,2


In [61]:
table(HNSCC$sample_type)
table(HNSCC$cancer_type)
table(HNSCC$patient_id)
table(HNSCC$sample_id)


tumour 
  9526 


HNSCC 
 9526 


HN01 HN02 HN03 HN04 HN05 HN06 HN07 HN08 HN09 HN10 HN11 HN12 HN13 HN14 HN15 HN16 
 648  687  767  321  772  314  860  805  724  641  404  171  152  391  549  100 
HN17 HN18 
 395  825 


GSE164690_HNSCC_HN01 GSE164690_HNSCC_HN02 GSE164690_HNSCC_HN03 
                 648                  687                  767 
GSE164690_HNSCC_HN04 GSE164690_HNSCC_HN05 GSE164690_HNSCC_HN06 
                 321                  772                  314 
GSE164690_HNSCC_HN07 GSE164690_HNSCC_HN08 GSE164690_HNSCC_HN09 
                 860                  805                  724 
GSE164690_HNSCC_HN10 GSE164690_HNSCC_HN11 GSE164690_HNSCC_HN12 
                 641                  404                  171 
GSE164690_HNSCC_HN13 GSE164690_HNSCC_HN14 GSE164690_HNSCC_HN15 
                 152                  391                  549 
GSE164690_HNSCC_HN16 GSE164690_HNSCC_HN17 GSE164690_HNSCC_HN18 
                 100                  395                  825 

In [62]:
#set site metadata
HNSCC@meta.data$site <- "head and neck"

#set sample_type_major metadata
HNSCC@meta.data$sample_type_major <- "primary tumour"

#set cancer_subtype metadata
HNSCC@meta.data$cancer_subtype <- "HNSCC"

#set integration_id metadata
HNSCC@meta.data$integration_id <- HNSCC@meta.data$sample_id


In [63]:
HNSCC
HNSCC@project.name
head(HNSCC@meta.data)

An object of class Seurat 
33545 features across 9526 samples within 1 assay 
Active assay: RNA (33545 features, 2000 variable features)
 37 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.5,seurat_clusters,RNA_snn_res.0.2,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE164690_HN01_AAACCTGCACCAGATT-1,GSE164690,9140,2104,tumour,HNSCC,HN01,GSE164690_HNSCC_HN01,3.030635,18,11,11,head and neck,primary tumour,HNSCC,GSE164690_HNSCC_HN01
GSE164690_HN01_AAACCTGCAGGGTTAG-1,GSE164690,11142,3130,tumour,HNSCC,HN01,GSE164690_HNSCC_HN01,3.62592,5,2,2,head and neck,primary tumour,HNSCC,GSE164690_HNSCC_HN01
GSE164690_HN01_AAACGGGCAAGTACCT-1,GSE164690,26390,4858,tumour,HNSCC,HN01,GSE164690_HNSCC_HN01,5.752179,13,2,2,head and neck,primary tumour,HNSCC,GSE164690_HNSCC_HN01
GSE164690_HN01_AAACGGGCACTAGTAC-1,GSE164690,14516,3446,tumour,HNSCC,HN01,GSE164690_HNSCC_HN01,3.60981,5,2,2,head and neck,primary tumour,HNSCC,GSE164690_HNSCC_HN01
GSE164690_HN01_AAAGTAGTCCATTCTA-1,GSE164690,15592,3281,tumour,HNSCC,HN01,GSE164690_HNSCC_HN01,3.2196,5,2,2,head and neck,primary tumour,HNSCC,GSE164690_HNSCC_HN01
GSE164690_HN01_AAATGCCAGACATAAC-1,GSE164690,8071,2262,tumour,HNSCC,HN01,GSE164690_HNSCC_HN01,7.025152,5,2,2,head and neck,primary tumour,HNSCC,GSE164690_HNSCC_HN01


In [65]:
#exclude any samples with <100 cells
table(HNSCC$integration_id)
#none to exclude: the smallest sample, GSE164690_HNSCC_HN16, has exactly 100 cells, which passes the <100 cut-off


GSE164690_HNSCC_HN01 GSE164690_HNSCC_HN02 GSE164690_HNSCC_HN03 
                 648                  687                  767 
GSE164690_HNSCC_HN04 GSE164690_HNSCC_HN05 GSE164690_HNSCC_HN06 
                 321                  772                  314 
GSE164690_HNSCC_HN07 GSE164690_HNSCC_HN08 GSE164690_HNSCC_HN09 
                 860                  805                  724 
GSE164690_HNSCC_HN10 GSE164690_HNSCC_HN11 GSE164690_HNSCC_HN12 
                 641                  404                  171 
GSE164690_HNSCC_HN13 GSE164690_HNSCC_HN14 GSE164690_HNSCC_HN15 
                 152                  391                  549 
GSE164690_HNSCC_HN16 GSE164690_HNSCC_HN17 GSE164690_HNSCC_HN18 
                 100                  395                  825 

In [66]:
#join layers and then split them by integration_id
Layers(HNSCC[["RNA"]])
#join layers
HNSCC[["RNA"]] <- JoinLayers(HNSCC[["RNA"]])
Layers(HNSCC[["RNA"]])
#split layers
HNSCC[["RNA"]] <- split(HNSCC[["RNA"]], f = HNSCC$integration_id)
Layers(HNSCC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [67]:
#record number of cells
table(HNSCC$integration_id)
HNSCC
HNSCC@project.name


GSE164690_HNSCC_HN01 GSE164690_HNSCC_HN02 GSE164690_HNSCC_HN03 
                 648                  687                  767 
GSE164690_HNSCC_HN04 GSE164690_HNSCC_HN05 GSE164690_HNSCC_HN06 
                 321                  772                  314 
GSE164690_HNSCC_HN07 GSE164690_HNSCC_HN08 GSE164690_HNSCC_HN09 
                 860                  805                  724 
GSE164690_HNSCC_HN10 GSE164690_HNSCC_HN11 GSE164690_HNSCC_HN12 
                 641                  404                  171 
GSE164690_HNSCC_HN13 GSE164690_HNSCC_HN14 GSE164690_HNSCC_HN15 
                 152                  391                  549 
GSE164690_HNSCC_HN16 GSE164690_HNSCC_HN17 GSE164690_HNSCC_HN18 
                 100                  395                  825 

An object of class Seurat 
33545 features across 9526 samples within 1 assay 
Active assay: RNA (33545 features, 2000 variable features)
 37 layers present: data.GSE164690_HNSCC_HN01, data.GSE164690_HNSCC_HN02, data.GSE164690_HNSCC_HN03, data.GSE164690_HNSCC_HN04, data.GSE164690_HNSCC_HN05, data.GSE164690_HNSCC_HN06, data.GSE164690_HNSCC_HN07, data.GSE164690_HNSCC_HN08, data.GSE164690_HNSCC_HN09, data.GSE164690_HNSCC_HN10, data.GSE164690_HNSCC_HN11, data.GSE164690_HNSCC_HN12, data.GSE164690_HNSCC_HN13, data.GSE164690_HNSCC_HN14, data.GSE164690_HNSCC_HN15, data.GSE164690_HNSCC_HN16, data.GSE164690_HNSCC_HN17, data.GSE164690_HNSCC_HN18, scale.data, counts.GSE164690_HNSCC_HN01, counts.GSE164690_HNSCC_HN02, counts.GSE164690_HNSCC_HN03, counts.GSE164690_HNSCC_HN04, counts.GSE164690_HNSCC_HN05, counts.GSE164690_HNSCC_HN06, counts.GSE164690_HNSCC_HN07, counts.GSE164690_HNSCC_HN08, counts.GSE164690_HNSCC_HN09, counts.GSE164690_HNSCC_HN10, counts.GSE164690_HNSCC_HN11, counts.GSE164690_HNSCC_HN1

In [68]:
#re-export seurat object ready for integration
saveRDS(HNSCC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE164690_myeloid_int.RDS")

In [69]:
#remove all objects in R
rm(list = ls())

## GSE173468

In [70]:
HNSCC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE173468_myeloid.RDS")

In [71]:
HNSCC
HNSCC@project.name
head(HNSCC@meta.data)

An object of class Seurat 
36601 features across 4434 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 37 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE173468_N3_AAATGCCTCATAACCG-1,GSE173468,3796,1228,Healthy,Healthy,N3,GSE173468_Healthy_N3,4.3730242,5,5
GSE173468_N3_AACTGGTCAGCTGGCT-1,GSE173468,871,425,Healthy,Healthy,N3,GSE173468_Healthy_N3,1.1481056,5,5
GSE173468_N3_ACACCGGCAGCCTTTC-1,GSE173468,2879,1401,Healthy,Healthy,N3,GSE173468_Healthy_N3,0.24314,5,5
GSE173468_N3_ACACTGACAAGGTTCT-1,GSE173468,16743,2870,Healthy,Healthy,N3,GSE173468_Healthy_N3,0.185152,5,5
GSE173468_N3_AGAGTGGTCTGCAAGT-1,GSE173468,4944,1105,Healthy,Healthy,N3,GSE173468_Healthy_N3,0.2224919,5,5
GSE173468_N3_AGCTTGACAGGGTACA-1,GSE173468,615,357,Healthy,Healthy,N3,GSE173468_Healthy_N3,0.0,5,5


In [72]:
table(HNSCC$sample_type)
table(HNSCC$cancer_type)
table(HNSCC$patient_id)
table(HNSCC$sample_id)


    Healthy      tumour tumour-mets 
        184        4032         218 


Healthy   HNSCC 
    184    4250 


     N3      N5      N6      T1     T10     T13     T14     T19     T22     T25 
     59      40      85     294     232     130     259     481     204     357 
    T26     T27  T29Met T29Prim      T3      T5      T6      T8 
    308     411     218     302     316     446     130     162 


    GSE173468_Healthy_N3     GSE173468_Healthy_N5     GSE173468_Healthy_N6 
                      59                       40                       85 
GSE173468_HNSCC_Mets_T29 GSE173468_HNSCC_Prim_T29       GSE173468_HNSCC_T1 
                     218                      302                      294 
     GSE173468_HNSCC_T10      GSE173468_HNSCC_T13      GSE173468_HNSCC_T14 
                     232                      130                      259 
     GSE173468_HNSCC_T19      GSE173468_HNSCC_T22      GSE173468_HNSCC_T25 
                     481                      204                      357 
     GSE173468_HNSCC_T26      GSE173468_HNSCC_T27       GSE173468_HNSCC_T3 
                     308                      411                      316 
      GSE173468_HNSCC_T5       GSE173468_HNSCC_T6       GSE173468_HNSCC_T8 
                     446                      130                      162 

In [78]:
#exclude any samples with <100 cells
table(HNSCC$patient_id)
#exclude the 3 normal patients 
HNSCC <- subset(HNSCC, subset = !(patient_id %in% c("N3","N5","N6")))
table(HNSCC$patient_id)

#the site location of the one metastasis sample (T29Met) is unclear from the study, so exclude it as well
HNSCC <- subset(HNSCC, subset = !(patient_id %in% c("T29Met")))
table(HNSCC$patient_id)



     T1     T10     T13     T14     T19     T22     T25     T26     T27  T29Met 
    294     232     130     259     481     204     357     308     411     218 
T29Prim      T3      T5      T6      T8 
    302     316     446     130     162 


     T1     T10     T13     T14     T19     T22     T25     T26     T27  T29Met 
    294     232     130     259     481     204     357     308     411     218 
T29Prim      T3      T5      T6      T8 
    302     316     446     130     162 


     T1     T10     T13     T14     T19     T22     T25     T26     T27 T29Prim 
    294     232     130     259     481     204     357     308     411     302 
     T3      T5      T6      T8 
    316     446     130     162 

In [79]:
table(HNSCC$sample_type)
table(HNSCC$cancer_type)
table(HNSCC$patient_id)
table(HNSCC$sample_id)


tumour 
  4032 


HNSCC 
 4032 


     T1     T10     T13     T14     T19     T22     T25     T26     T27 T29Prim 
    294     232     130     259     481     204     357     308     411     302 
     T3      T5      T6      T8 
    316     446     130     162 


GSE173468_HNSCC_Prim_T29       GSE173468_HNSCC_T1      GSE173468_HNSCC_T10 
                     302                      294                      232 
     GSE173468_HNSCC_T13      GSE173468_HNSCC_T14      GSE173468_HNSCC_T19 
                     130                      259                      481 
     GSE173468_HNSCC_T22      GSE173468_HNSCC_T25      GSE173468_HNSCC_T26 
                     204                      357                      308 
     GSE173468_HNSCC_T27       GSE173468_HNSCC_T3       GSE173468_HNSCC_T5 
                     411                      316                      446 
      GSE173468_HNSCC_T6       GSE173468_HNSCC_T8 
                     130                      162 

In [81]:
#rename T29 patient and sample id to match others

#split by patient_id
HNSCC_29 <- subset(HNSCC, subset = patient_id %in% c("T29Prim"))
HNSCC_else <- subset(HNSCC, subset = !(patient_id %in% c("T29Prim")))

HNSCC_29@meta.data$patient_id <- "T29"
HNSCC_29@meta.data$sample_id <- "GSE173468_HNSCC_T29"

#merge back together 
HNSCC <- merge(HNSCC_29, y = c(HNSCC_else), project = "GSE173468")
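
The split, relabel and merge above can also be expressed as an in-place assignment on the metadata rows, which leaves the assay layers untouched (the merge appears to be what adds the `.1`/`.2` suffixes to the layer names in the next output). A minimal sketch on a toy data frame standing in for `HNSCC@meta.data`: the toy rows and the names `meta` and `is_t29` are assumptions for illustration, and it relies on `patient_id` and `sample_id` being character columns, as the `<chr>` types printed earlier indicate.

``` r
# Toy stand-in for HNSCC@meta.data (hypothetical rows, for illustration only)
meta <- data.frame(
  patient_id = c("T1", "T29Prim", "T29Prim", "T3"),
  sample_id  = c("GSE173468_HNSCC_T1", "GSE173468_HNSCC_Prim_T29",
                 "GSE173468_HNSCC_Prim_T29", "GSE173468_HNSCC_T3"),
  stringsAsFactors = FALSE
)

# Rows belonging to the T29 primary sample
is_t29 <- meta$patient_id == "T29Prim"

# Relabel only those rows; all other rows keep their values
meta$patient_id[is_t29] <- "T29"
meta$sample_id[is_t29]  <- "GSE173468_HNSCC_T29"

table(meta$patient_id)
```

On the real object the same two assignments would target `HNSCC@meta.data$patient_id` and `HNSCC@meta.data$sample_id`.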

In [82]:
table(HNSCC$sample_type)
table(HNSCC$cancer_type)
table(HNSCC$patient_id)
table(HNSCC$sample_id)


tumour 
  4032 


HNSCC 
 4032 


 T1 T10 T13 T14 T19 T22 T25 T26 T27 T29  T3  T5  T6  T8 
294 232 130 259 481 204 357 308 411 302 316 446 130 162 


 GSE173468_HNSCC_T1 GSE173468_HNSCC_T10 GSE173468_HNSCC_T13 GSE173468_HNSCC_T14 
                294                 232                 130                 259 
GSE173468_HNSCC_T19 GSE173468_HNSCC_T22 GSE173468_HNSCC_T25 GSE173468_HNSCC_T26 
                481                 204                 357                 308 
GSE173468_HNSCC_T27 GSE173468_HNSCC_T29  GSE173468_HNSCC_T3  GSE173468_HNSCC_T5 
                411                 302                 316                 446 
 GSE173468_HNSCC_T6  GSE173468_HNSCC_T8 
                130                 162 

In [83]:
#set site metadata
HNSCC@meta.data$site <- "head and neck"

#set sample_type_major metadata
HNSCC@meta.data$sample_type_major <- "primary tumour"

#set cancer_subtype metadata
HNSCC@meta.data$cancer_subtype <- "HNSCC"

#set integration_id metadata
HNSCC@meta.data$integration_id <- HNSCC@meta.data$sample_id

In [84]:
HNSCC
HNSCC@project.name
head(HNSCC@meta.data)

An object of class Seurat 
36601 features across 4032 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 30 layers present: counts.18.1, data.18.1, scale.data.1, counts.4.2, counts.5.2, counts.6.2, counts.7.2, counts.8.2, counts.9.2, counts.10.2, counts.11.2, counts.12.2, counts.13.2, counts.14.2, counts.15.2, counts.16.2, data.4.2, data.5.2, data.6.2, data.7.2, data.8.2, data.9.2, data.10.2, data.11.2, data.12.2, data.13.2, data.14.2, data.15.2, data.16.2, scale.data.2

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE173468_T29Primary_AAACCTGAGTCATGCT-1,GSE173468,12530,2679,tumour,HNSCC,T29,GSE173468_HNSCC_T29,3.735036,5,5,head and neck,primary tumour,HNSCC,GSE173468_HNSCC_T29
GSE173468_T29Primary_AAACGGGTCCGAAGAG-1,GSE173468,2934,1232,tumour,HNSCC,T29,GSE173468_HNSCC_T29,2.522154,5,5,head and neck,primary tumour,HNSCC,GSE173468_HNSCC_T29
GSE173468_T29Primary_AAAGATGTCTGGTTCC-1,GSE173468,15745,2989,tumour,HNSCC,T29,GSE173468_HNSCC_T29,4.903144,5,5,head and neck,primary tumour,HNSCC,GSE173468_HNSCC_T29
GSE173468_T29Primary_AACCGCGAGGATCGCA-1,GSE173468,11603,3338,tumour,HNSCC,T29,GSE173468_HNSCC_T29,7.937602,5,5,head and neck,primary tumour,HNSCC,GSE173468_HNSCC_T29
GSE173468_T29Primary_AACTCAGGTGCAGTAG-1,GSE173468,12396,2540,tumour,HNSCC,T29,GSE173468_HNSCC_T29,2.605679,5,5,head and neck,primary tumour,HNSCC,GSE173468_HNSCC_T29
GSE173468_T29Primary_AACTCCCGTACATGTC-1,GSE173468,4417,1648,tumour,HNSCC,T29,GSE173468_HNSCC_T29,7.516414,5,5,head and neck,primary tumour,HNSCC,GSE173468_HNSCC_T29


In [86]:
#exclude any samples with <100 cells
table(HNSCC$integration_id)
#no more to exclude 


 GSE173468_HNSCC_T1 GSE173468_HNSCC_T10 GSE173468_HNSCC_T13 GSE173468_HNSCC_T14 
                294                 232                 130                 259 
GSE173468_HNSCC_T19 GSE173468_HNSCC_T22 GSE173468_HNSCC_T25 GSE173468_HNSCC_T26 
                481                 204                 357                 308 
GSE173468_HNSCC_T27 GSE173468_HNSCC_T29  GSE173468_HNSCC_T3  GSE173468_HNSCC_T5 
                411                 302                 316                 446 
 GSE173468_HNSCC_T6  GSE173468_HNSCC_T8 
                130                 162 

In [87]:
#join layers and then split them by integration_id
Layers(HNSCC[["RNA"]])
#join layers
HNSCC[["RNA"]] <- JoinLayers(HNSCC[["RNA"]])
Layers(HNSCC[["RNA"]])
#split layers
HNSCC[["RNA"]] <- split(HNSCC[["RNA"]], f = HNSCC$integration_id)
Layers(HNSCC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
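
Conceptually, joining and re-splitting regroups the cell columns of one matrix by a new label. The sketch below imitates this with a plain base R matrix rather than a Seurat assay, so it illustrates the idea only and is not how `JoinLayers()` or `split()` are implemented. The matrix `counts` and the labels in `integration_id` are made-up toy values.

``` r
# Toy count matrix: 3 genes x 6 cells (made-up values)
counts <- matrix(1:18, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))

# Toy per-cell labels playing the role of integration_id
integration_id <- c("T1", "T1", "T3", "T3", "T3", "T5")

# "Split": one sub-matrix of cells per label
cell_groups <- split(colnames(counts), integration_id)
layers <- lapply(cell_groups, function(cells) counts[, cells, drop = FALSE])

# Number of cells in each piece, to compare against table(integration_id)
sapply(layers, ncol)

# "Join": bind the pieces back together and restore the original cell order
joined <- do.call(cbind, layers)[, colnames(counts), drop = FALSE]
all.equal(joined, counts)
```

Comparing the per-layer cell counts with `table(HNSCC$integration_id)` is a cheap way to confirm that the layers reflect the intended integration units.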



In [88]:
#record number of cells
table(HNSCC$integration_id)
HNSCC
HNSCC@project.name


 GSE173468_HNSCC_T1 GSE173468_HNSCC_T10 GSE173468_HNSCC_T13 GSE173468_HNSCC_T14 
                294                 232                 130                 259 
GSE173468_HNSCC_T19 GSE173468_HNSCC_T22 GSE173468_HNSCC_T25 GSE173468_HNSCC_T26 
                481                 204                 357                 308 
GSE173468_HNSCC_T27 GSE173468_HNSCC_T29  GSE173468_HNSCC_T3  GSE173468_HNSCC_T5 
                411                 302                 316                 446 
 GSE173468_HNSCC_T6  GSE173468_HNSCC_T8 
                130                 162 

An object of class Seurat 
36601 features across 4032 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 29 layers present: counts.GSE173468_HNSCC_T29, counts.GSE173468_HNSCC_T1, counts.GSE173468_HNSCC_T3, counts.GSE173468_HNSCC_T5, counts.GSE173468_HNSCC_T6, counts.GSE173468_HNSCC_T8, counts.GSE173468_HNSCC_T10, counts.GSE173468_HNSCC_T13, counts.GSE173468_HNSCC_T14, counts.GSE173468_HNSCC_T19, counts.GSE173468_HNSCC_T22, counts.GSE173468_HNSCC_T25, counts.GSE173468_HNSCC_T26, counts.GSE173468_HNSCC_T27, scale.data, data.GSE173468_HNSCC_T29, data.GSE173468_HNSCC_T1, data.GSE173468_HNSCC_T3, data.GSE173468_HNSCC_T5, data.GSE173468_HNSCC_T6, data.GSE173468_HNSCC_T8, data.GSE173468_HNSCC_T10, data.GSE173468_HNSCC_T13, data.GSE173468_HNSCC_T14, data.GSE173468_HNSCC_T19, data.GSE173468_HNSCC_T22, data.GSE173468_HNSCC_T25, data.GSE173468_HNSCC_T26, data.GSE173468_HNSCC_T27

In [89]:
#re-export seurat object ready for integration
saveRDS(HNSCC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE173468_myeloid_int.RDS")

In [90]:
#remove all objects in R
rm(list = ls())

## GSE188737

In [3]:
HNSCC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE188737_myeloid.RDS")

In [4]:
HNSCC
HNSCC@project.name
head(HNSCC@meta.data)

An object of class Seurat 
23148 features across 4897 samples within 1 assay 
Active assay: RNA (23148 features, 2000 variable features)
 29 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sampleID,percent.mt,origin,patientID,P_Mid,seurat_clusters,genecount,cell_type,sample_type,cancer_type,patient_id,sample_id,RNA_snn_res.0.2
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<dbl>,<dbl>,<chr>,<chr>,<chr>,<fct>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<fct>
GSE188737_HNSCC_HN237P_AAACGGGTCCTTGCCA-1,10X_hn,370,232,1,8.918919,HN237P,HN237,P,4,< 300,TAMs,primary tumour,HNSCC,Pt237,GSE188737_HNSCC_237_Primary,4
GSE188737_HNSCC_HN237P_AAAGCAAAGGAGTTGC-1,10X_hn,483,302,1,6.418219,HN237P,HN237,P,4,300-500,TAMs,primary tumour,HNSCC,Pt237,GSE188737_HNSCC_237_Primary,4
GSE188737_HNSCC_HN237P_AACCATGAGAGTCGGT-1,10X_hn,854,526,1,4.449649,HN237P,HN237,P,4,500-1000,TAMs,primary tumour,HNSCC,Pt237,GSE188737_HNSCC_237_Primary,4
GSE188737_HNSCC_HN237P_AACCATGTCTTTAGGG-1,10X_hn,580,383,1,4.827586,HN237P,HN237,P,4,300-500,TAMs,primary tumour,HNSCC,Pt237,GSE188737_HNSCC_237_Primary,4
GSE188737_HNSCC_HN237P_AACCGCGCACCGAAAG-1,10X_hn,480,283,1,9.375,HN237P,HN237,P,4,< 300,TAMs,primary tumour,HNSCC,Pt237,GSE188737_HNSCC_237_Primary,4
GSE188737_HNSCC_HN237P_ACCGTAATCCGAGCCA-1,10X_hn,549,313,1,4.735883,HN237P,HN237,P,4,300-500,TAMs,primary tumour,HNSCC,Pt237,GSE188737_HNSCC_237_Primary,4


In [5]:
table(HNSCC$sample_type)
table(HNSCC$cancer_type)
table(HNSCC$patient_id)
table(HNSCC$sample_id)


       LN mets primary tumour 
          2093           2804 


HNSCC 
 4897 


Pt237 Pt242 Pt251 Pt257 Pt263 Pt272 Pt279 
  145   189   319  1313  1462   884   585 


   GSE188737_HNSCC_237_mets GSE188737_HNSCC_237_Primary 
                         68                          77 
   GSE188737_HNSCC_242_mets GSE188737_HNSCC_242_Primary 
                         84                         105 
   GSE188737_HNSCC_251_mets GSE188737_HNSCC_251_Primary 
                        100                         219 
   GSE188737_HNSCC_257_mets GSE188737_HNSCC_257_Primary 
                        575                         738 
   GSE188737_HNSCC_263_mets GSE188737_HNSCC_263_Primary 
                        547                         915 
   GSE188737_HNSCC_272_mets GSE188737_HNSCC_272_Primary 
                        502                         382 
   GSE188737_HNSCC_279_mets GSE188737_HNSCC_279_Primary 
                        217                         368 

In [7]:
#split by sample_type
HNSCC_LN <- subset(HNSCC, subset = sample_type %in% c("LN mets"))
HNSCC_T <- subset(HNSCC, subset = sample_type %in% c("primary tumour"))

#set site metadata
HNSCC_LN@meta.data$site <- "lymph node"
HNSCC_T@meta.data$site <- "head and neck"

#set sample_type_major metadata
HNSCC_LN@meta.data$sample_type_major <- "metastatic tumour"
HNSCC_T@meta.data$sample_type_major <- "primary tumour"

#set cancer_subtype metadata
HNSCC_LN@meta.data$cancer_subtype <- "HNSCC"
HNSCC_T@meta.data$cancer_subtype <- "HNSCC"

#set integration_id metadata
HNSCC_LN@meta.data$integration_id <- HNSCC_LN@meta.data$sample_id
HNSCC_T@meta.data$integration_id <- HNSCC_T@meta.data$sample_id


#merge back together 
HNSCC <- merge(HNSCC_LN, y = c(HNSCC_T), project = "GSE188737")

In [8]:
HNSCC
HNSCC@project.name
head(HNSCC@meta.data)

An object of class Seurat 
23148 features across 4897 samples within 1 assay 
Active assay: RNA (23148 features, 2000 variable features)
 30 layers present: counts.8.1, counts.9.1, counts.10.1, counts.11.1, counts.12.1, counts.13.1, counts.14.1, data.8.1, data.9.1, data.10.1, data.11.1, data.12.1, data.13.1, data.14.1, scale.data.1, counts.1.2, counts.2.2, counts.3.2, counts.4.2, counts.5.2, counts.6.2, counts.7.2, data.1.2, data.2.2, data.3.2, data.4.2, data.5.2, data.6.2, data.7.2, scale.data.2

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sampleID,percent.mt,origin,patientID,P_Mid,seurat_clusters,genecount,cell_type,sample_type,cancer_type,patient_id,sample_id,RNA_snn_res.0.2,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<dbl>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE188737_HNSCC_HN237M_CAGCAGCAGATACACA-2,10X_hn,1745,1047,2,3.266476,HN237M,HN237,M,4,1000-3000,B cells,LN mets,HNSCC,Pt237,GSE188737_HNSCC_237_mets,4,lymph node,metastatic tumour,HNSCC,GSE188737_HNSCC_237_mets
GSE188737_HNSCC_HN237M_TTAGGACGTCTCAACA-2,10X_hn,419,273,2,2.863962,HN237M,HN237,M,4,< 300,B cells,LN mets,HNSCC,Pt237,GSE188737_HNSCC_237_mets,4,lymph node,metastatic tumour,HNSCC,GSE188737_HNSCC_237_mets
GSE188737_HNSCC_HN237M_AACTCAGGTTCAGCGC-2,10X_hn,968,541,2,4.028926,HN237M,HN237,M,4,500-1000,TAMs,LN mets,HNSCC,Pt237,GSE188737_HNSCC_237_mets,4,lymph node,metastatic tumour,HNSCC,GSE188737_HNSCC_237_mets
GSE188737_HNSCC_HN237M_AACTCAGTCCTAGAAC-2,10X_hn,654,456,2,3.975535,HN237M,HN237,M,4,300-500,TAMs,LN mets,HNSCC,Pt237,GSE188737_HNSCC_237_mets,4,lymph node,metastatic tumour,HNSCC,GSE188737_HNSCC_237_mets
GSE188737_HNSCC_HN237M_AACTCTTTCCAATGGT-2,10X_hn,1040,697,2,9.519231,HN237M,HN237,M,4,500-1000,TAMs,LN mets,HNSCC,Pt237,GSE188737_HNSCC_237_mets,4,lymph node,metastatic tumour,HNSCC,GSE188737_HNSCC_237_mets
GSE188737_HNSCC_HN237M_AGAGCTTAGTACCGGA-2,10X_hn,1104,695,2,4.257246,HN237M,HN237,M,4,500-1000,TAMs,LN mets,HNSCC,Pt237,GSE188737_HNSCC_237_mets,4,lymph node,metastatic tumour,HNSCC,GSE188737_HNSCC_237_mets


In [10]:
#exclude any samples with <100 cells
table(HNSCC$integration_id)
#exclude the 3 samples with <100 cells (237_mets, 237_Primary, 242_mets)
HNSCC <- subset(HNSCC, subset = !(integration_id %in% c("GSE188737_HNSCC_237_mets","GSE188737_HNSCC_237_Primary","GSE188737_HNSCC_242_mets")))
table(HNSCC$integration_id)


   GSE188737_HNSCC_237_mets GSE188737_HNSCC_237_Primary 
                         68                          77 
   GSE188737_HNSCC_242_mets GSE188737_HNSCC_242_Primary 
                         84                         105 
   GSE188737_HNSCC_251_mets GSE188737_HNSCC_251_Primary 
                        100                         219 
   GSE188737_HNSCC_257_mets GSE188737_HNSCC_257_Primary 
                        575                         738 
   GSE188737_HNSCC_263_mets GSE188737_HNSCC_263_Primary 
                        547                         915 
   GSE188737_HNSCC_272_mets GSE188737_HNSCC_272_Primary 
                        502                         382 
   GSE188737_HNSCC_279_mets GSE188737_HNSCC_279_Primary 
                        217                         368 


GSE188737_HNSCC_242_Primary    GSE188737_HNSCC_251_mets 
                        105                         100 
GSE188737_HNSCC_251_Primary    GSE188737_HNSCC_257_mets 
                        219                         575 
GSE188737_HNSCC_257_Primary    GSE188737_HNSCC_263_mets 
                        738                         547 
GSE188737_HNSCC_263_Primary    GSE188737_HNSCC_272_mets 
                        915                         502 
GSE188737_HNSCC_272_Primary    GSE188737_HNSCC_279_mets 
                        382                         217 
GSE188737_HNSCC_279_Primary 
                        368 
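
The samples to drop can also be derived from the counts rather than typed by hand. A small sketch using the per-sample counts from the first table in the output above. The names `n_cells`, `min_cells` and `to_exclude` are illustrative, and on the real object the counts would come from `table(HNSCC$integration_id)`.

``` r
# Per-sample myeloid cell counts, copied from the table printed before exclusion
n_cells <- c(
  GSE188737_HNSCC_237_mets = 68,  GSE188737_HNSCC_237_Primary = 77,
  GSE188737_HNSCC_242_mets = 84,  GSE188737_HNSCC_242_Primary = 105,
  GSE188737_HNSCC_251_mets = 100, GSE188737_HNSCC_251_Primary = 219,
  GSE188737_HNSCC_257_mets = 575, GSE188737_HNSCC_257_Primary = 738,
  GSE188737_HNSCC_263_mets = 547, GSE188737_HNSCC_263_Primary = 915,
  GSE188737_HNSCC_272_mets = 502, GSE188737_HNSCC_272_Primary = 382,
  GSE188737_HNSCC_279_mets = 217, GSE188737_HNSCC_279_Primary = 368
)

# Threshold used in this notebook: exclude samples with <100 cells
min_cells <- 100

# Strictly fewer than min_cells, so 251_mets (exactly 100 cells) is kept
to_exclude <- names(n_cells)[n_cells < min_cells]
to_exclude

# Cells remaining after exclusion
sum(n_cells[!(names(n_cells) %in% to_exclude)])
```

The resulting vector could then be passed to `subset()` in place of the hard-coded ids, which keeps the filter in step with the threshold if the counts change.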

In [11]:
#join layers and then split them by integration_id
Layers(HNSCC[["RNA"]])
#join layers
HNSCC[["RNA"]] <- JoinLayers(HNSCC[["RNA"]])
Layers(HNSCC[["RNA"]])
#split layers
HNSCC[["RNA"]] <- split(HNSCC[["RNA"]], f = HNSCC$integration_id)
Layers(HNSCC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [12]:
#record number of cells
table(HNSCC$integration_id)
HNSCC
HNSCC@project.name


GSE188737_HNSCC_242_Primary    GSE188737_HNSCC_251_mets 
                        105                         100 
GSE188737_HNSCC_251_Primary    GSE188737_HNSCC_257_mets 
                        219                         575 
GSE188737_HNSCC_257_Primary    GSE188737_HNSCC_263_mets 
                        738                         547 
GSE188737_HNSCC_263_Primary    GSE188737_HNSCC_272_mets 
                        915                         502 
GSE188737_HNSCC_272_Primary    GSE188737_HNSCC_279_mets 
                        382                         217 
GSE188737_HNSCC_279_Primary 
                        368 

An object of class Seurat 
23148 features across 4668 samples within 1 assay 
Active assay: RNA (23148 features, 2000 variable features)
 23 layers present: counts.GSE188737_HNSCC_251_mets, counts.GSE188737_HNSCC_257_mets, counts.GSE188737_HNSCC_263_mets, counts.GSE188737_HNSCC_272_mets, counts.GSE188737_HNSCC_279_mets, counts.GSE188737_HNSCC_242_Primary, counts.GSE188737_HNSCC_251_Primary, counts.GSE188737_HNSCC_257_Primary, counts.GSE188737_HNSCC_263_Primary, counts.GSE188737_HNSCC_272_Primary, counts.GSE188737_HNSCC_279_Primary, scale.data, data.GSE188737_HNSCC_251_mets, data.GSE188737_HNSCC_257_mets, data.GSE188737_HNSCC_263_mets, data.GSE188737_HNSCC_272_mets, data.GSE188737_HNSCC_279_mets, data.GSE188737_HNSCC_242_Primary, data.GSE188737_HNSCC_251_Primary, data.GSE188737_HNSCC_257_Primary, data.GSE188737_HNSCC_263_Primary, data.GSE188737_HNSCC_272_Primary, data.GSE188737_HNSCC_279_Primary

In [13]:
#re-export seurat object ready for integration
saveRDS(HNSCC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE188737_myeloid_int.RDS")

In [14]:
#remove all objects in R
rm(list = ls())

## GSE234933

In [3]:
HNSCC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE234933_myeloid.RDS")

In [4]:
HNSCC
HNSCC@project.name
head(HNSCC@meta.data)

An object of class Seurat 
33514 features across 42631 samples within 1 assay 
Active assay: RNA (33514 features, 2000 variable features)
 105 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, counts.19, counts.20, counts.21, counts.22, counts.23, counts.24, counts.25, counts.26, counts.27, counts.28, counts.29, counts.30, counts.31, counts.32, counts.33, counts.34, counts.35, counts.36, counts.37, counts.38, counts.39, counts.40, counts.41, counts.42, counts.43, counts.44, counts.45, counts.46, counts.47, counts.48, counts.49, counts.50, counts.51, counts.52, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, data.19, data.20, data.21, data.22, data.23, data.24, data.25, data.26, data.27, data.28, data.29, data.30, data.31, data.32,

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE234933_HNSCC_tu_HN1_AAACCTGTCGGCGGTT-1,GSE234933,909,444,primary tumour,HNSCC,HN1,GSE234933_HNSCC_tu_HN1,0.770077,2,2
GSE234933_HNSCC_tu_HN1_AAACGGGAGTACGCGA-1,GSE234933,1152,416,primary tumour,HNSCC,HN1,GSE234933_HNSCC_tu_HN1,0.5208333,2,2
GSE234933_HNSCC_tu_HN1_AAACGGGGTGACTACT-1,GSE234933,1715,647,primary tumour,HNSCC,HN1,GSE234933_HNSCC_tu_HN1,1.0495627,2,2
GSE234933_HNSCC_tu_HN1_AAAGATGGTCGCTTTC-1,GSE234933,11304,2947,primary tumour,HNSCC,HN1,GSE234933_HNSCC_tu_HN1,8.2271762,2,2
GSE234933_HNSCC_tu_HN1_AAATGCCTCAAGCCTA-1,GSE234933,4426,1562,primary tumour,HNSCC,HN1,GSE234933_HNSCC_tu_HN1,7.8400362,2,2
GSE234933_HNSCC_tu_HN1_AACTCCCCATAGAAAC-1,GSE234933,3999,1288,primary tumour,HNSCC,HN1,GSE234933_HNSCC_tu_HN1,7.6019005,2,2


In [5]:
table(HNSCC$sample_type)
table(HNSCC$cancer_type)
table(HNSCC$patient_id)
table(HNSCC$sample_id)


       metastasis    primary tumour tumour recurrence 
            13111             21826              7694 


HNSCC 
42631 


 HN1 HN13 HN14 HN17  HN2 HN20 HN21 HN22 HN23 HN25 HN26 HN27 HN28 HN29 HN30 HN31 
 658  138  163 1771 1821  189  391  304  939 1070 1082  445  111  376  385  631 
HN32 HN33 HN34 HN35 HN37 HN38 HN39 HN40 HN42 HN43 HN45 HN46 HN49 HN50 HN52 HN55 
 466 2835  310  854  215  263  236  804 3306 1408  535 3867 1138   29  111  345 
HN57 HN58 HN59 HN60 HN61 HN63 HN64 HN66 HN67 HN68  HN7 HN70 HN71 HN72 HN73 HN74 
 292  620 1056  765 1337  566 1078  114 1758 1736   49  536 1279  693  201  372 
HN75 HN76 HN77  HN8 
 710 1171 1086   16 


GSE234933_HNSCC_mets_HN13 GSE234933_HNSCC_mets_HN14  GSE234933_HNSCC_mets_HN2 
                      138                       163                      1821 
GSE234933_HNSCC_mets_HN21 GSE234933_HNSCC_mets_HN25 GSE234933_HNSCC_mets_HN27 
                      391                      1070                       445 
GSE234933_HNSCC_mets_HN33 GSE234933_HNSCC_mets_HN34 GSE234933_HNSCC_mets_HN42 
                     2835                       310                      3306 
GSE234933_HNSCC_mets_HN61 GSE234933_HNSCC_mets_HN71  GSE234933_HNSCC_mets_HN8 
                     1337                      1279                        16 
 GSE234933_HNSCC_rec_HN20  GSE234933_HNSCC_rec_HN22  GSE234933_HNSCC_rec_HN23 
                      189                       304                       939 
 GSE234933_HNSCC_rec_HN26  GSE234933_HNSCC_rec_HN28  GSE234933_HNSCC_rec_HN29 
                     1082                       111                       376 
 GSE234933_HNSCC_rec_HN32  GSE234933_HNSCC_rec_HN35

In [6]:
#set cancer_subtype metadata
HNSCC@meta.data$cancer_subtype <- "HNSCC"

#set integration_id metadata
HNSCC@meta.data$integration_id <- HNSCC@meta.data$sample_id

In [7]:
table(HNSCC$sample_type)


       metastasis    primary tumour tumour recurrence 
            13111             21826              7694 

In [8]:
#split by sample_type
HNSCC_M <- subset(HNSCC, subset = sample_type %in% c("metastasis"))
HNSCC_T <- subset(HNSCC, subset = sample_type %in% c("primary tumour"))
HNSCC_R <- subset(HNSCC, subset = sample_type %in% c("tumour recurrence"))


#set site metadata for tumours and recurrence
HNSCC_T@meta.data$site <- "head and neck"
HNSCC_R@meta.data$site <- "head and neck"

#set sample_type_major metadata
HNSCC_M@meta.data$sample_type_major <- "metastatic tumour"
HNSCC_T@meta.data$sample_type_major <- "primary tumour"
HNSCC_R@meta.data$sample_type_major <- "local recurrence"

In [10]:
table(HNSCC_M$sample_id)


GSE234933_HNSCC_mets_HN13 GSE234933_HNSCC_mets_HN14  GSE234933_HNSCC_mets_HN2 
                      138                       163                      1821 
GSE234933_HNSCC_mets_HN21 GSE234933_HNSCC_mets_HN25 GSE234933_HNSCC_mets_HN27 
                      391                      1070                       445 
GSE234933_HNSCC_mets_HN33 GSE234933_HNSCC_mets_HN34 GSE234933_HNSCC_mets_HN42 
                     2835                       310                      3306 
GSE234933_HNSCC_mets_HN61 GSE234933_HNSCC_mets_HN71  GSE234933_HNSCC_mets_HN8 
                     1337                      1279                        16 

In [11]:
# set site metadata for metastatic samples

#split by sample_id
HNSCC_M_13 <- subset(HNSCC_M, subset = sample_id %in% c("GSE234933_HNSCC_mets_HN13"))
HNSCC_M_14 <- subset(HNSCC_M, subset = sample_id %in% c("GSE234933_HNSCC_mets_HN14"))
HNSCC_M_02 <- subset(HNSCC_M, subset = sample_id %in% c("GSE234933_HNSCC_mets_HN2"))
HNSCC_M_21 <- subset(HNSCC_M, subset = sample_id %in% c("GSE234933_HNSCC_mets_HN21"))
HNSCC_M_25 <- subset(HNSCC_M, subset = sample_id %in% c("GSE234933_HNSCC_mets_HN25"))
HNSCC_M_27 <- subset(HNSCC_M, subset = sample_id %in% c("GSE234933_HNSCC_mets_HN27"))
HNSCC_M_33 <- subset(HNSCC_M, subset = sample_id %in% c("GSE234933_HNSCC_mets_HN33"))
HNSCC_M_34 <- subset(HNSCC_M, subset = sample_id %in% c("GSE234933_HNSCC_mets_HN34"))
HNSCC_M_42 <- subset(HNSCC_M, subset = sample_id %in% c("GSE234933_HNSCC_mets_HN42"))
HNSCC_M_61 <- subset(HNSCC_M, subset = sample_id %in% c("GSE234933_HNSCC_mets_HN61"))
HNSCC_M_71 <- subset(HNSCC_M, subset = sample_id %in% c("GSE234933_HNSCC_mets_HN71"))
HNSCC_M_08 <- subset(HNSCC_M, subset = sample_id %in% c("GSE234933_HNSCC_mets_HN8"))

#set site based on the sample information in the original GEO record
HNSCC_M_13@meta.data$site <- "skin"
HNSCC_M_14@meta.data$site <- "pleura"
HNSCC_M_02@meta.data$site <- "lung"
HNSCC_M_21@meta.data$site <- "liver"
HNSCC_M_25@meta.data$site <- "lung"
HNSCC_M_27@meta.data$site <- "lung"
HNSCC_M_33@meta.data$site <- "lung"
HNSCC_M_34@meta.data$site <- "lung"
HNSCC_M_42@meta.data$site <- "lung"
HNSCC_M_61@meta.data$site <- "liver"
HNSCC_M_71@meta.data$site <- "liver"
HNSCC_M_08@meta.data$site <- "sternum"

#merge back together 
HNSCC_M <- merge(HNSCC_M_13, y = c(HNSCC_M_14, HNSCC_M_02, HNSCC_M_21, HNSCC_M_25, HNSCC_M_27, HNSCC_M_33, HNSCC_M_34, HNSCC_M_42, HNSCC_M_61, HNSCC_M_71, HNSCC_M_08), project = "GSE234933")
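
The same site assignment can be written as a lookup from `sample_id` to site, which avoids the twelve subsets and the merge. A hedged sketch on a toy data frame in place of `HNSCC_M@meta.data`: the site values are the ones set above, while `site_lookup`, `meta` and the four example cells are introduced here for illustration only.

``` r
# sample_id -> metastatic site, using the same values assigned above
site_lookup <- c(
  GSE234933_HNSCC_mets_HN13 = "skin",
  GSE234933_HNSCC_mets_HN14 = "pleura",
  GSE234933_HNSCC_mets_HN2  = "lung",
  GSE234933_HNSCC_mets_HN21 = "liver",
  GSE234933_HNSCC_mets_HN25 = "lung",
  GSE234933_HNSCC_mets_HN27 = "lung",
  GSE234933_HNSCC_mets_HN33 = "lung",
  GSE234933_HNSCC_mets_HN34 = "lung",
  GSE234933_HNSCC_mets_HN42 = "lung",
  GSE234933_HNSCC_mets_HN61 = "liver",
  GSE234933_HNSCC_mets_HN71 = "liver",
  GSE234933_HNSCC_mets_HN8  = "sternum"
)

# Toy stand-in for HNSCC_M@meta.data: one row per cell (hypothetical cells)
meta <- data.frame(
  sample_id = c("GSE234933_HNSCC_mets_HN2", "GSE234933_HNSCC_mets_HN13",
                "GSE234933_HNSCC_mets_HN61", "GSE234933_HNSCC_mets_HN2"),
  stringsAsFactors = FALSE
)

# Index the lookup by each cell's sample_id
meta$site <- unname(site_lookup[meta$sample_id])

# A sample_id missing from the lookup would produce NA, so fail early if any appear
stopifnot(!anyNA(meta$site))
table(meta$site)
```

Because the metadata is edited in place, this variant would not go through `merge()` and so would not rename the layers.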

In [12]:
#merge metastasis, primary tumour and recurrence objects back together
HNSCC <- merge(HNSCC_M, y = c(HNSCC_T, HNSCC_R), project = "GSE234933")

In [13]:
HNSCC
HNSCC@project.name
head(HNSCC@meta.data)

An object of class Seurat 
33514 features across 42631 samples within 1 assay 
Active assay: RNA (33514 features, 2000 variable features)
 118 layers present: counts.5.1.1, data.5.1.1, scale.data.1.1, counts.6.2.1, data.6.2.1, scale.data.2.1, counts.2.3.1, data.2.3.1, scale.data.3.1, counts.9.4.1, data.9.4.1, scale.data.4.1, counts.12.5.1, data.12.5.1, scale.data.5.1, counts.14.6.1, data.14.6.1, scale.data.6.1, counts.20.7.1, data.20.7.1, scale.data.7.1, counts.21.8.1, data.21.8.1, scale.data.8.1, counts.27.9.1, data.27.9.1, scale.data.9.1, counts.39.10.1, data.39.10.1, scale.data.10.1, counts.46.11.1, data.46.11.1, scale.data.11.1, counts.4.12.1, data.4.12.1, scale.data.12.1, counts.1.2, counts.3.2, counts.7.2, counts.17.2, counts.18.2, counts.25.2, counts.26.2, counts.30.2, counts.31.2, counts.32.2, counts.33.2, counts.36.2, counts.37.2, counts.38.2, counts.40.2, counts.41.2, counts.43.2, counts.44.2, counts.45.2, counts.47.2, counts.49.2, counts.50.2, counts.51.2, counts.52.2, data.

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,cancer_subtype,integration_id,sample_type_major,site
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE234933_HNSCC_mets_HN13_AAACGGGCAAGTCTAC-1,GSE234933,722,377,metastasis,HNSCC,HN13,GSE234933_HNSCC_mets_HN13,5.540166,2,2,HNSCC,GSE234933_HNSCC_mets_HN13,metastatic tumour,skin
GSE234933_HNSCC_mets_HN13_AAATGCCAGAGCTATA-1,GSE234933,352,229,metastasis,HNSCC,HN13,GSE234933_HNSCC_mets_HN13,9.943182,2,2,HNSCC,GSE234933_HNSCC_mets_HN13,metastatic tumour,skin
GSE234933_HNSCC_mets_HN13_AACCATGAGAGGTTAT-1,GSE234933,2403,964,metastasis,HNSCC,HN13,GSE234933_HNSCC_mets_HN13,2.163962,2,2,HNSCC,GSE234933_HNSCC_mets_HN13,metastatic tumour,skin
GSE234933_HNSCC_mets_HN13_AACCGCGTCCAGGGCT-1,GSE234933,284,204,metastasis,HNSCC,HN13,GSE234933_HNSCC_mets_HN13,2.816901,2,2,HNSCC,GSE234933_HNSCC_mets_HN13,metastatic tumour,skin
GSE234933_HNSCC_mets_HN13_AACTTTCTCTTGACGA-1,GSE234933,1216,552,metastasis,HNSCC,HN13,GSE234933_HNSCC_mets_HN13,10.444079,2,2,HNSCC,GSE234933_HNSCC_mets_HN13,metastatic tumour,skin
GSE234933_HNSCC_mets_HN13_AATCGGTAGAGCTGGT-1,GSE234933,405,257,metastasis,HNSCC,HN13,GSE234933_HNSCC_mets_HN13,2.222222,2,2,HNSCC,GSE234933_HNSCC_mets_HN13,metastatic tumour,skin


In [15]:
#exclude any samples with <100 cells
table(HNSCC$integration_id)
#exclude GSE234933_HNSCC_mets_HN8 and GSE234933_HNSCC_tu_HN50 and GSE234933_HNSCC_tu_HN7
HNSCC <- subset(HNSCC, subset = !(integration_id %in% c("GSE234933_HNSCC_mets_HN8","GSE234933_HNSCC_tu_HN50","GSE234933_HNSCC_tu_HN7")))
table(HNSCC$integration_id)


GSE234933_HNSCC_mets_HN13 GSE234933_HNSCC_mets_HN14  GSE234933_HNSCC_mets_HN2 
                      138                       163                      1821 
GSE234933_HNSCC_mets_HN21 GSE234933_HNSCC_mets_HN25 GSE234933_HNSCC_mets_HN27 
                      391                      1070                       445 
GSE234933_HNSCC_mets_HN33 GSE234933_HNSCC_mets_HN34 GSE234933_HNSCC_mets_HN42 
                     2835                       310                      3306 
GSE234933_HNSCC_mets_HN61 GSE234933_HNSCC_mets_HN71  GSE234933_HNSCC_mets_HN8 
                     1337                      1279                        16 
 GSE234933_HNSCC_rec_HN20  GSE234933_HNSCC_rec_HN22  GSE234933_HNSCC_rec_HN23 
                      189                       304                       939 
 GSE234933_HNSCC_rec_HN26  GSE234933_HNSCC_rec_HN28  GSE234933_HNSCC_rec_HN29 
                     1082                       111                       376 
 GSE234933_HNSCC_rec_HN32  GSE234933_HNSCC_rec_HN35


GSE234933_HNSCC_mets_HN13 GSE234933_HNSCC_mets_HN14  GSE234933_HNSCC_mets_HN2 
                      138                       163                      1821 
GSE234933_HNSCC_mets_HN21 GSE234933_HNSCC_mets_HN25 GSE234933_HNSCC_mets_HN27 
                      391                      1070                       445 
GSE234933_HNSCC_mets_HN33 GSE234933_HNSCC_mets_HN34 GSE234933_HNSCC_mets_HN42 
                     2835                       310                      3306 
GSE234933_HNSCC_mets_HN61 GSE234933_HNSCC_mets_HN71  GSE234933_HNSCC_rec_HN20 
                     1337                      1279                       189 
 GSE234933_HNSCC_rec_HN22  GSE234933_HNSCC_rec_HN23  GSE234933_HNSCC_rec_HN26 
                      304                       939                      1082 
 GSE234933_HNSCC_rec_HN28  GSE234933_HNSCC_rec_HN29  GSE234933_HNSCC_rec_HN32 
                      111                       376                       466 
 GSE234933_HNSCC_rec_HN35  GSE234933_HNSCC_rec_HN37
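
The samples to drop are typed by hand after reading the table. As a minimal sketch of how the same list could be derived from the counts, the base-R example below works on a toy data frame standing in for the object's metadata. The sample names, counts and the variables `min_cells`, `low_samples` and `keep` are illustrative assumptions, not part of the original notebook.

```r
# Toy metadata standing in for HNSCC@meta.data; the sample names and
# cell counts here are made up for illustration.
meta <- data.frame(
  integration_id = rep(c("sample_A", "sample_B", "sample_C"),
                       times = c(138, 16, 163)),
  stringsAsFactors = FALSE
)

min_cells <- 100  # threshold used throughout this notebook

# Count cells per integration_id and pick out the samples below threshold
cell_counts <- table(meta$integration_id)
low_samples <- names(cell_counts)[cell_counts < min_cells]
print(low_samples)

# Keep only cells from samples that meet the threshold
keep <- !(meta$integration_id %in% low_samples)
meta_filtered <- meta[keep, , drop = FALSE]
print(table(meta_filtered$integration_id))
```

On a Seurat object, a logical vector built the same way from `HNSCC$integration_id` could probably be applied through the `cells` argument of `subset()`, for example `subset(HNSCC, cells = colnames(HNSCC)[keep])`. Checking the result with `table()` afterwards, as done in this notebook, remains worthwhile.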

In [16]:
table(HNSCC$site)


head and neck         liver          lung        pleura          skin 
        29442          3007          9787           163           138 

In [17]:
#join layers and then split them by integration_id
Layers(HNSCC[["RNA"]])
#join layers
HNSCC[["RNA"]] <- JoinLayers(HNSCC[["RNA"]])
Layers(HNSCC[["RNA"]])
#split layers
HNSCC[["RNA"]] <- split(HNSCC[["RNA"]], f = HNSCC$integration_id)
Layers(HNSCC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
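
The join-then-split step regroups cells so that each layer holds exactly one `integration_id`. The sketch below is a base-R analogue of that idea on a toy matrix. It is not how Seurat implements `split()` internally, and the gene names, cell names and ids are hypothetical.

```r
# Toy counts matrix: 2 genes x 6 cells (values are arbitrary)
counts <- matrix(1:12, nrow = 2,
                 dimnames = list(c("geneA", "geneB"), paste0("cell", 1:6)))

# Hypothetical per-cell integration_id, in the same order as the columns
integration_id <- c("s1", "s1", "s2", "s2", "s2", "s3")

# Group cell names by integration_id, then cut the matrix into one
# block of columns per group
cells_by_id <- split(colnames(counts), integration_id)
layers <- lapply(cells_by_id, function(cells) counts[, cells, drop = FALSE])

# Name the blocks the way the split layers are named in this notebook
names(layers) <- paste0("counts.", names(layers))

print(names(layers))
print(sapply(layers, ncol))
```

Joining first matters because the incoming layers (`counts.1`, `counts.2`, ...) follow the original per-file split, which need not match `integration_id`.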



In [18]:
#record number of cells
table(HNSCC$integration_id)
HNSCC
HNSCC@project.name


GSE234933_HNSCC_mets_HN13 GSE234933_HNSCC_mets_HN14  GSE234933_HNSCC_mets_HN2 
                      138                       163                      1821 
GSE234933_HNSCC_mets_HN21 GSE234933_HNSCC_mets_HN25 GSE234933_HNSCC_mets_HN27 
                      391                      1070                       445 
GSE234933_HNSCC_mets_HN33 GSE234933_HNSCC_mets_HN34 GSE234933_HNSCC_mets_HN42 
                     2835                       310                      3306 
GSE234933_HNSCC_mets_HN61 GSE234933_HNSCC_mets_HN71  GSE234933_HNSCC_rec_HN20 
                     1337                      1279                       189 
 GSE234933_HNSCC_rec_HN22  GSE234933_HNSCC_rec_HN23  GSE234933_HNSCC_rec_HN26 
                      304                       939                      1082 
 GSE234933_HNSCC_rec_HN28  GSE234933_HNSCC_rec_HN29  GSE234933_HNSCC_rec_HN32 
                      111                       376                       466 
 GSE234933_HNSCC_rec_HN35  GSE234933_HNSCC_rec_HN37

An object of class Seurat 
33514 features across 42537 samples within 1 assay 
Active assay: RNA (33514 features, 2000 variable features)
 99 layers present: counts.GSE234933_HNSCC_mets_HN13, counts.GSE234933_HNSCC_mets_HN14, counts.GSE234933_HNSCC_mets_HN2, counts.GSE234933_HNSCC_mets_HN21, counts.GSE234933_HNSCC_mets_HN25, counts.GSE234933_HNSCC_mets_HN27, counts.GSE234933_HNSCC_mets_HN33, counts.GSE234933_HNSCC_mets_HN34, counts.GSE234933_HNSCC_mets_HN42, counts.GSE234933_HNSCC_mets_HN61, counts.GSE234933_HNSCC_mets_HN71, counts.GSE234933_HNSCC_tu_HN1, counts.GSE234933_HNSCC_tu_HN17, counts.GSE234933_HNSCC_tu_HN30, counts.GSE234933_HNSCC_tu_HN31, counts.GSE234933_HNSCC_tu_HN39, counts.GSE234933_HNSCC_tu_HN40, counts.GSE234933_HNSCC_tu_HN46, counts.GSE234933_HNSCC_tu_HN49, counts.GSE234933_HNSCC_tu_HN52, counts.GSE234933_HNSCC_tu_HN58, counts.GSE234933_HNSCC_tu_HN59, counts.GSE234933_HNSCC_tu_HN60, counts.GSE234933_HNSCC_tu_HN63, counts.GSE234933_HNSCC_tu_HN64, counts.GSE234933_HNSCC

In [19]:
#re-export seurat object ready for integration
saveRDS(HNSCC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE234933_myeloid_int.RDS")

In [20]:
#remove all objects in R
rm(list = ls())

## GSE154778

In [23]:
PDAC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE154778_myeloid.RDS")

In [24]:
PDAC
PDAC@project.name
head(PDAC@meta.data)

An object of class Seurat 
51911 features across 1718 samples within 1 assay 
Active assay: RNA (51911 features, 2000 variable features)
 31 layers present: counts.1, counts.2, counts.3, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, data.1, data.2, data.3, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE154778_PDAC_P01_AACATATGTGGAAA-1,GSE154778,3365,363,primary tumour,PDAC,P01,GSE154778_PDAC_primary_01,0.05943536,2,2
GSE154778_PDAC_P01_AAGCAAGATAGCGT-1,GSE154778,32616,2221,primary tumour,PDAC,P01,GSE154778_PDAC_primary_01,0.79715477,2,2
GSE154778_PDAC_P01_AATTCCTGCTGATG-1,GSE154778,4215,1311,primary tumour,PDAC,P01,GSE154778_PDAC_primary_01,2.27758007,2,2
GSE154778_PDAC_P01_ACAAATTGTCCAAG-1,GSE154778,14291,916,primary tumour,PDAC,P01,GSE154778_PDAC_primary_01,0.37786019,2,2
GSE154778_PDAC_P01_ACCACGCTCTTATC-1,GSE154778,1097,411,primary tumour,PDAC,P01,GSE154778_PDAC_primary_01,5.10483136,2,2
GSE154778_PDAC_P01_ACTGGCCTGGAAGC-1,GSE154778,561,212,primary tumour,PDAC,P01,GSE154778_PDAC_primary_01,2.13903743,2,2


In [25]:
table(PDAC$sample_type)
table(PDAC$cancer_type)
table(PDAC$patient_id)
table(PDAC$sample_id)


    metastasis primary tumour 
           345           1373 


PDAC 
1718 


MET01 MET02 MET03 MET04 MET05 MET06   P01   P02   P03   P05   P06   P07   P08 
   12    19    29    99    93    93    50    23    14    88    62   630   115 
  P09   P10 
  293    98 


   GSE154778_PDAC_mets_01    GSE154778_PDAC_mets_02    GSE154778_PDAC_mets_03 
                       12                        19                        29 
   GSE154778_PDAC_mets_04    GSE154778_PDAC_mets_05    GSE154778_PDAC_mets_06 
                       99                        93                        93 
GSE154778_PDAC_primary_01 GSE154778_PDAC_primary_02 GSE154778_PDAC_primary_03 
                       50                        23                        14 
GSE154778_PDAC_primary_05 GSE154778_PDAC_primary_06 GSE154778_PDAC_primary_07 
                       88                        62                       630 
GSE154778_PDAC_primary_08 GSE154778_PDAC_primary_09 GSE154778_PDAC_primary_10 
                      115                       293                        98 

In [28]:
#most samples in this dataset have <100 myeloid cells,
#so keep only the 3 primary samples with >=100 cells (P07, P08, P09)

PDAC <- subset(PDAC, subset = sample_id %in% c("GSE154778_PDAC_primary_07","GSE154778_PDAC_primary_08","GSE154778_PDAC_primary_09"))

In [29]:
table(PDAC$sample_type)
table(PDAC$cancer_type)
table(PDAC$patient_id)
table(PDAC$sample_id)


primary tumour 
          1038 


PDAC 
1038 


P07 P08 P09 
630 115 293 


GSE154778_PDAC_primary_07 GSE154778_PDAC_primary_08 GSE154778_PDAC_primary_09 
                      630                       115                       293 

In [30]:
#set cancer_subtype metadata
PDAC@meta.data$cancer_subtype <- "PDAC"

#set integration_id metadata
PDAC@meta.data$integration_id <- PDAC@meta.data$sample_id

#set site metadata
PDAC@meta.data$site <- "pancreas"

#set sample_type_major metadata
PDAC@meta.data$sample_type_major <- "primary tumour"

In [31]:
PDAC
PDAC@project.name
head(PDAC@meta.data)

An object of class Seurat 
32738 features across 1038 samples within 1 assay 
Active assay: RNA (32738 features, 1834 variable features)
 7 layers present: counts.7, counts.8, counts.9, data.7, data.8, data.9, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,cancer_subtype,integration_id,site,sample_type_major
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE154778_PDAC_P07_AAACCTGAGGTTCCTA-1,GSE154778,19324,3375,primary tumour,PDAC,P07,GSE154778_PDAC_primary_07,3.881184,2,2,PDAC,GSE154778_PDAC_primary_07,pancreas,primary tumour
GSE154778_PDAC_P07_AAACCTGGTACAGTTC-1,GSE154778,7785,2260,primary tumour,PDAC,P07,GSE154778_PDAC_primary_07,2.491972,2,2,PDAC,GSE154778_PDAC_primary_07,pancreas,primary tumour
GSE154778_PDAC_P07_AAACGGGCAGTCAGCC-1,GSE154778,14092,2857,primary tumour,PDAC,P07,GSE154778_PDAC_primary_07,4.903491,2,2,PDAC,GSE154778_PDAC_primary_07,pancreas,primary tumour
GSE154778_PDAC_P07_AAACGGGGTAATCACC-1,GSE154778,11402,2870,primary tumour,PDAC,P07,GSE154778_PDAC_primary_07,2.429398,2,2,PDAC,GSE154778_PDAC_primary_07,pancreas,primary tumour
GSE154778_PDAC_P07_AAAGATGAGAAGGGTA-1,GSE154778,5437,1839,primary tumour,PDAC,P07,GSE154778_PDAC_primary_07,3.880817,2,2,PDAC,GSE154778_PDAC_primary_07,pancreas,primary tumour
GSE154778_PDAC_P07_AAAGTAGAGATAGTCA-1,GSE154778,21022,4366,primary tumour,PDAC,P07,GSE154778_PDAC_primary_07,6.545524,2,2,PDAC,GSE154778_PDAC_primary_07,pancreas,primary tumour


In [33]:
#exclude any samples with <100 cells
table(PDAC$integration_id)
#already excluded above


GSE154778_PDAC_primary_07 GSE154778_PDAC_primary_08 GSE154778_PDAC_primary_09 
                      630                       115                       293 

In [34]:
#join layers and then split them by integration_id
Layers(PDAC[["RNA"]])
#join layers
PDAC[["RNA"]] <- JoinLayers(PDAC[["RNA"]])
Layers(PDAC[["RNA"]])
#split layers
PDAC[["RNA"]] <- split(PDAC[["RNA"]], f = PDAC$integration_id)
Layers(PDAC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [35]:
#record number of cells
table(PDAC$integration_id)
PDAC
PDAC@project.name


GSE154778_PDAC_primary_07 GSE154778_PDAC_primary_08 GSE154778_PDAC_primary_09 
                      630                       115                       293 

An object of class Seurat 
32738 features across 1038 samples within 1 assay 
Active assay: RNA (32738 features, 1834 variable features)
 7 layers present: data.GSE154778_PDAC_primary_07, data.GSE154778_PDAC_primary_08, data.GSE154778_PDAC_primary_09, scale.data, counts.GSE154778_PDAC_primary_07, counts.GSE154778_PDAC_primary_08, counts.GSE154778_PDAC_primary_09
 2 dimensional reductions calculated: pca, umap
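
The per-sample counts are recorded here only as printed output. If a reusable record is wanted, one option is to write them to a small table. The sketch below is an assumption about how that could look: the toy ids, the `dataset` label and the output file name are hypothetical, and it writes to a temporary directory rather than any project path.

```r
# Toy per-cell ids standing in for PDAC$integration_id
integration_id <- rep(c("sample_A", "sample_B"), times = c(5, 3))

# Turn the table of counts into a tidy data frame
cell_counts <- as.data.frame(table(integration_id), stringsAsFactors = FALSE)
names(cell_counts) <- c("integration_id", "n_cells")
cell_counts$dataset <- "toy_dataset"  # hypothetical label, e.g. a GEO accession

# Write to a temporary file and read it back to confirm the round trip
out_file <- file.path(tempdir(), "toy_dataset_cell_counts.csv")
write.csv(cell_counts, out_file, row.names = FALSE)
recorded <- read.csv(out_file, stringsAsFactors = FALSE)
print(recorded)
```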

In [37]:
#re-export seurat object ready for integration
saveRDS(PDAC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE154778_myeloid_int.RDS")

In [38]:
#remove all objects in R
rm(list = ls())

## GSE156405

In [39]:
PDAC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE156405_myeloid.RDS")

In [40]:
PDAC
PDAC@project.name
head(PDAC@meta.data)

An object of class Seurat 
33694 features across 3621 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 17 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,graph_res.0.2,RNA_snn_res.0.3
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<fct>,<fct>
GSE156405_PDAC_P1_AAACCTGGTTCCCTTG-1,GSE156405,7115,2117,tumour,PDAC primary,P1,GSE156405_PDAC_Primary_P1,4.286718,5,5,2,5
GSE156405_PDAC_P1_AACTCAGCAAGAAAGG-1,GSE156405,7337,2388,tumour,PDAC primary,P1,GSE156405_PDAC_Primary_P1,3.952569,5,5,2,5
GSE156405_PDAC_P1_AAGGCAGGTTGGAGGT-1,GSE156405,2418,1069,tumour,PDAC primary,P1,GSE156405_PDAC_Primary_P1,3.598015,9,9,2,8
GSE156405_PDAC_P1_AATCCAGCATTAGCCA-1,GSE156405,3548,1461,tumour,PDAC primary,P1,GSE156405_PDAC_Primary_P1,3.523112,9,9,2,8
GSE156405_PDAC_P1_ACAGCTACAGGTGGAT-1,GSE156405,1928,922,tumour,PDAC primary,P1,GSE156405_PDAC_Primary_P1,3.73444,9,9,2,8
GSE156405_PDAC_P1_ACATACGCATGCAACT-1,GSE156405,1876,906,tumour,PDAC primary,P1,GSE156405_PDAC_Primary_P1,4.317697,9,9,2,8


In [42]:
table(PDAC$sample_type)
table(PDAC$cancer_type)
table(PDAC$patient_id)
table(PDAC$sample_id)


metastasis     tumour 
      1590       2031 


  PDAC Liver mets    PDAC Lung mets      PDAC primary PDAC vaginal mets 
              893               525              2031               172 


 LiM  LuM   P1   P2   P3   P4   P5   VM 
 893  525   69  203 1137  542   80  172 


  GSE156405_PDAC_liver_mets    GSE156405_PDAC_lung_mets 
                        893                         525 
  GSE156405_PDAC_Primary_P1   GSE156405_PDAC_Primary_P2 
                         69                         203 
  GSE156405_PDAC_Primary_P3   GSE156405_PDAC_Primary_P4 
                       1137                         542 
  GSE156405_PDAC_Primary_P5 GSE156405_PDAC_vaginal_mets 
                         80                         172 

In [43]:
table(PDAC$cancer_type)


  PDAC Liver mets    PDAC Lung mets      PDAC primary PDAC vaginal mets 
              893               525              2031               172 

In [44]:
#split by cancer_type
PDAC_LiM <- subset(PDAC, subset = cancer_type %in% c("PDAC Liver mets"))
PDAC_LuM <- subset(PDAC, subset = cancer_type %in% c("PDAC Lung mets"))
PDAC_T <- subset(PDAC, subset = cancer_type %in% c("PDAC primary"))
PDAC_V <- subset(PDAC, subset = cancer_type %in% c("PDAC vaginal mets"))

#set cancer_subtype metadata
PDAC_LiM@meta.data$cancer_subtype <- "PDAC"
PDAC_LuM@meta.data$cancer_subtype <- "PDAC"
PDAC_T@meta.data$cancer_subtype <- "PDAC"
PDAC_V@meta.data$cancer_subtype <- "PDAC"

#set integration_id metadata
PDAC_LiM@meta.data$integration_id <- PDAC_LiM@meta.data$sample_id
PDAC_LuM@meta.data$integration_id <- PDAC_LuM@meta.data$sample_id
PDAC_T@meta.data$integration_id <- PDAC_T@meta.data$sample_id
PDAC_V@meta.data$integration_id <- PDAC_V@meta.data$sample_id

#set site metadata 
PDAC_LiM@meta.data$site <- "liver"
PDAC_LuM@meta.data$site <- "lung"
PDAC_T@meta.data$site <- "pancreas"
PDAC_V@meta.data$site <- "vagina"

#set sample_type_major metadata
PDAC_LiM@meta.data$sample_type_major <- "metastatic tumour"
PDAC_LuM@meta.data$sample_type_major <- "metastatic tumour"
PDAC_T@meta.data$sample_type_major <- "primary tumour"
PDAC_V@meta.data$sample_type_major <- "metastatic tumour"

#merge back together 
PDAC <- merge(PDAC_LiM, y = c(PDAC_LuM, PDAC_T, PDAC_V), project = "GSE156405")
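
The split, annotate and merge pattern assigns `site` and `sample_type_major` per `cancer_type`. A lookup vector can express the same mapping in one pass. The sketch below shows the idea on a toy data frame. The mapping values are taken from the cell above, while `site_lookup` and `major_lookup` are hypothetical names, and this is an alternative sketch rather than what the notebook does.

```r
# Toy metadata with the four cancer_type values seen in GSE156405
meta <- data.frame(
  cancer_type = c("PDAC Liver mets", "PDAC Lung mets", "PDAC primary",
                  "PDAC vaginal mets", "PDAC primary"),
  stringsAsFactors = FALSE
)

# Named vectors: cancer_type -> site / sample_type_major
site_lookup <- c("PDAC Liver mets"   = "liver",
                 "PDAC Lung mets"    = "lung",
                 "PDAC primary"      = "pancreas",
                 "PDAC vaginal mets" = "vagina")
major_lookup <- c("PDAC Liver mets"   = "metastatic tumour",
                  "PDAC Lung mets"    = "metastatic tumour",
                  "PDAC primary"      = "primary tumour",
                  "PDAC vaginal mets" = "metastatic tumour")

# Index the lookups by each cell's cancer_type
meta$site <- unname(site_lookup[meta$cancer_type])
meta$sample_type_major <- unname(major_lookup[meta$cancer_type])

# Fail loudly if any cancer_type was missing from the lookups
stopifnot(!anyNA(meta$site), !anyNA(meta$sample_type_major))
print(meta)
```

Applied to a Seurat object's `meta.data`, this would leave the object itself untouched, so no merge would be needed before the join and split step.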

In [45]:
PDAC
PDAC@project.name
head(PDAC@meta.data)

An object of class Seurat 
33694 features across 3621 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 20 layers present: counts.6.1, data.6.1, scale.data.1, counts.7.2, data.7.2, scale.data.2, counts.1.3, counts.2.3, counts.3.3, counts.4.3, counts.5.3, data.1.3, data.2.3, data.3.3, data.4.3, data.5.3, scale.data.3, counts.8.4, data.8.4, scale.data.4

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,graph_res.0.2,RNA_snn_res.0.3,cancer_subtype,integration_id,site,sample_type_major
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE156405_PDAC_Li_mets_AAACCTGGTGTCGCTG-1,GSE156405,11801,2682,metastasis,PDAC Liver mets,LiM,GSE156405_PDAC_liver_mets,3.49123,5,5,2,5,PDAC,GSE156405_PDAC_liver_mets,liver,metastatic tumour
GSE156405_PDAC_Li_mets_AAACCTGTCAGTTCGA-1,GSE156405,5606,2007,metastasis,PDAC Liver mets,LiM,GSE156405_PDAC_liver_mets,5.119515,9,9,2,8,PDAC,GSE156405_PDAC_liver_mets,liver,metastatic tumour
GSE156405_PDAC_Li_mets_AAACGGGCACAGATTC-1,GSE156405,7533,2143,metastasis,PDAC Liver mets,LiM,GSE156405_PDAC_liver_mets,2.681535,5,5,2,5,PDAC,GSE156405_PDAC_liver_mets,liver,metastatic tumour
GSE156405_PDAC_Li_mets_AAACGGGTCACAACGT-1,GSE156405,11587,3053,metastasis,PDAC Liver mets,LiM,GSE156405_PDAC_liver_mets,1.967722,5,5,2,5,PDAC,GSE156405_PDAC_liver_mets,liver,metastatic tumour
GSE156405_PDAC_Li_mets_AAAGATGAGGCTATCT-1,GSE156405,7132,2160,metastasis,PDAC Liver mets,LiM,GSE156405_PDAC_liver_mets,4.641054,9,9,2,8,PDAC,GSE156405_PDAC_liver_mets,liver,metastatic tumour
GSE156405_PDAC_Li_mets_AAAGATGGTCTAGCGC-1,GSE156405,6772,1907,metastasis,PDAC Liver mets,LiM,GSE156405_PDAC_liver_mets,3.780272,5,5,2,5,PDAC,GSE156405_PDAC_liver_mets,liver,metastatic tumour


In [47]:
#exclude any samples with <100 cells
table(PDAC$integration_id)
#exclude GSE156405_PDAC_Primary_P1 and GSE156405_PDAC_Primary_P5
PDAC <- subset(PDAC, subset = !(integration_id %in% c("GSE156405_PDAC_Primary_P1","GSE156405_PDAC_Primary_P5")))
table(PDAC$integration_id)


  GSE156405_PDAC_liver_mets    GSE156405_PDAC_lung_mets 
                        893                         525 
  GSE156405_PDAC_Primary_P1   GSE156405_PDAC_Primary_P2 
                         69                         203 
  GSE156405_PDAC_Primary_P3   GSE156405_PDAC_Primary_P4 
                       1137                         542 
  GSE156405_PDAC_Primary_P5 GSE156405_PDAC_vaginal_mets 
                         80                         172 


  GSE156405_PDAC_liver_mets    GSE156405_PDAC_lung_mets 
                        893                         525 
  GSE156405_PDAC_Primary_P2   GSE156405_PDAC_Primary_P3 
                        203                        1137 
  GSE156405_PDAC_Primary_P4 GSE156405_PDAC_vaginal_mets 
                        542                         172 

In [48]:
#join layers and then split them by integration_id
Layers(PDAC[["RNA"]])
#join layers
PDAC[["RNA"]] <- JoinLayers(PDAC[["RNA"]])
Layers(PDAC[["RNA"]])
#split layers
PDAC[["RNA"]] <- split(PDAC[["RNA"]], f = PDAC$integration_id)
Layers(PDAC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [49]:
#record number of cells
table(PDAC$integration_id)
PDAC
PDAC@project.name


  GSE156405_PDAC_liver_mets    GSE156405_PDAC_lung_mets 
                        893                         525 
  GSE156405_PDAC_Primary_P2   GSE156405_PDAC_Primary_P3 
                        203                        1137 
  GSE156405_PDAC_Primary_P4 GSE156405_PDAC_vaginal_mets 
                        542                         172 

An object of class Seurat 
33694 features across 3472 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 13 layers present: counts.GSE156405_PDAC_liver_mets, counts.GSE156405_PDAC_lung_mets, counts.GSE156405_PDAC_Primary_P2, counts.GSE156405_PDAC_Primary_P3, counts.GSE156405_PDAC_Primary_P4, counts.GSE156405_PDAC_vaginal_mets, scale.data, data.GSE156405_PDAC_liver_mets, data.GSE156405_PDAC_lung_mets, data.GSE156405_PDAC_Primary_P2, data.GSE156405_PDAC_Primary_P3, data.GSE156405_PDAC_Primary_P4, data.GSE156405_PDAC_vaginal_mets

In [51]:
#re-export seurat object ready for integration
saveRDS(PDAC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE156405_myeloid_int.RDS")

In [52]:
#remove all objects in R
rm(list = ls())

## GSE197177

In [29]:
PDAC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE197177_myeloid.RDS")

In [30]:
PDAC
PDAC@project.name
head(PDAC@meta.data)

An object of class Seurat 
33538 features across 9726 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 17 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE197177_Case1-YF_AAACGCTCAAGGTCTT-1,GSE197177,24707,4808,tumour,PDAC,Case1,GSE197177_PDAC_Case1_tumour,18.043469,2,2
GSE197177_Case1-YF_AAAGAACAGTAGAGTT-1,GSE197177,12170,2973,tumour,PDAC,Case1,GSE197177_PDAC_Case1_tumour,8.085456,2,2
GSE197177_Case1-YF_AAAGTCCAGGATTTAG-1,GSE197177,9174,2759,tumour,PDAC,Case1,GSE197177_PDAC_Case1_tumour,7.030739,2,2
GSE197177_Case1-YF_AAAGTCCCAATCTAGC-1,GSE197177,4380,1743,tumour,PDAC,Case1,GSE197177_PDAC_Case1_tumour,6.232877,2,2
GSE197177_Case1-YF_AAAGTCCGTGAGCCAA-1,GSE197177,15420,3272,tumour,PDAC,Case1,GSE197177_PDAC_Case1_tumour,4.850843,2,2
GSE197177_Case1-YF_AAAGTCCGTTCACGAT-1,GSE197177,5544,1891,tumour,PDAC,Case1,GSE197177_PDAC_Case1_tumour,5.717893,2,2


In [31]:
table(PDAC$sample_type)
table(PDAC$cancer_type)
table(PDAC$patient_id)
table(PDAC$sample_id)


healthy_pancreas     hepatic_mets           tumour 
               4             6979             2743 


PDAC 
9726 


Case1 Case2 Case3 Case4 
 3750  3095  2784    97 


   GSE197177_PDAC_Case1_mets  GSE197177_PDAC_Case1_tumour 
                        3245                          505 
GSE197177_PDAC_Case2_healthy    GSE197177_PDAC_Case2_mets 
                           4                         1627 
 GSE197177_PDAC_Case2_tumour    GSE197177_PDAC_Case3_mets 
                        1464                         2010 
 GSE197177_PDAC_Case3_tumour    GSE197177_PDAC_Case4_mets 
                         774                           97 

In [32]:
table(PDAC$sample_type)


healthy_pancreas     hepatic_mets           tumour 
               4             6979             2743 

In [33]:
#split by sample_type
PDAC_H <- subset(PDAC, subset = sample_type %in% c("healthy_pancreas"))
PDAC_M <- subset(PDAC, subset = sample_type %in% c("hepatic_mets"))
PDAC_T <- subset(PDAC, subset = sample_type %in% c("tumour"))

#set cancer_subtype metadata
#healthy tissue has no cancer subtype; note this stores the string "NA", not a missing value
PDAC_H@meta.data$cancer_subtype <- "NA"
PDAC_M@meta.data$cancer_subtype <- "PDAC"
PDAC_T@meta.data$cancer_subtype <- "PDAC"

#set integration_id metadata
PDAC_H@meta.data$integration_id <- PDAC_H@meta.data$sample_id
PDAC_M@meta.data$integration_id <- PDAC_M@meta.data$sample_id
PDAC_T@meta.data$integration_id <- PDAC_T@meta.data$sample_id

#set site metadata 
PDAC_H@meta.data$site <- "pancreas"
PDAC_M@meta.data$site <- "liver"
PDAC_T@meta.data$site <- "pancreas"

#set sample_type_major metadata
PDAC_H@meta.data$sample_type_major <- "healthy"
PDAC_M@meta.data$sample_type_major <- "metastatic tumour"
PDAC_T@meta.data$sample_type_major <- "primary tumour"

#merge back together 
PDAC <- merge(PDAC_H, y = c(PDAC_M, PDAC_T), project = "GSE197177")

In [34]:
PDAC
PDAC@project.name
head(PDAC@meta.data)

An object of class Seurat 
33538 features across 9726 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 19 layers present: counts.4.1, data.4.1, scale.data.1, counts.2.2, counts.5.2, counts.7.2, counts.8.2, data.2.2, data.5.2, data.7.2, data.8.2, scale.data.2, counts.1.3, counts.3.3, counts.6.3, data.1.3, data.3.3, data.6.3, scale.data.3

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,cancer_subtype,integration_id,site,sample_type_major
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE197177_Case2-ZC_ATTTCACCAACGACAG-1,GSE197177,2948,969,healthy_pancreas,PDAC,Case2,GSE197177_PDAC_Case2_healthy,3.3921303,2,2,,GSE197177_PDAC_Case2_healthy,pancreas,healthy
GSE197177_Case2-ZC_CACAGATGTAAGCGGT-1,GSE197177,10656,2905,healthy_pancreas,PDAC,Case2,GSE197177_PDAC_Case2_healthy,2.7027027,2,2,,GSE197177_PDAC_Case2_healthy,pancreas,healthy
GSE197177_Case2-ZC_TCATCCGCAGCGTTGC-1,GSE197177,13849,2262,healthy_pancreas,PDAC,Case2,GSE197177_PDAC_Case2_healthy,5.2928009,2,2,,GSE197177_PDAC_Case2_healthy,pancreas,healthy
GSE197177_Case2-ZC_TCCTTCTAGGTAGCAC-1,GSE197177,9439,2895,healthy_pancreas,PDAC,Case2,GSE197177_PDAC_Case2_healthy,4.6615108,2,2,,GSE197177_PDAC_Case2_healthy,pancreas,healthy
GSE197177_Case1-ZY_AAACCCACAAATTAGG-1,GSE197177,22265,4345,hepatic_mets,PDAC,Case1,GSE197177_PDAC_Case1_mets,5.1785313,2,2,PDAC,GSE197177_PDAC_Case1_mets,liver,metastatic tumour
GSE197177_Case1-ZY_AAACCCACAGTGTGCC-1,GSE197177,1288,574,hepatic_mets,PDAC,Case1,GSE197177_PDAC_Case1_mets,0.4658385,2,2,PDAC,GSE197177_PDAC_Case1_mets,liver,metastatic tumour


In [35]:
#exclude any samples with <100 cells
table(PDAC$integration_id)
#exclude GSE197177_PDAC_Case2_healthy and GSE197177_PDAC_Case4_mets
PDAC <- subset(PDAC, subset = !(integration_id %in% c("GSE197177_PDAC_Case2_healthy","GSE197177_PDAC_Case4_mets")))
table(PDAC$integration_id)


   GSE197177_PDAC_Case1_mets  GSE197177_PDAC_Case1_tumour 
                        3245                          505 
GSE197177_PDAC_Case2_healthy    GSE197177_PDAC_Case2_mets 
                           4                         1627 
 GSE197177_PDAC_Case2_tumour    GSE197177_PDAC_Case3_mets 
                        1464                         2010 
 GSE197177_PDAC_Case3_tumour    GSE197177_PDAC_Case4_mets 
                         774                           97 


  GSE197177_PDAC_Case1_mets GSE197177_PDAC_Case1_tumour 
                       3245                         505 
  GSE197177_PDAC_Case2_mets GSE197177_PDAC_Case2_tumour 
                       1627                        1464 
  GSE197177_PDAC_Case3_mets GSE197177_PDAC_Case3_tumour 
                       2010                         774 

In [36]:
#join layers and then split them by integration_id
Layers(PDAC[["RNA"]])
#join layers
PDAC[["RNA"]] <- JoinLayers(PDAC[["RNA"]])
Layers(PDAC[["RNA"]])
#split layers
PDAC[["RNA"]] <- split(PDAC[["RNA"]], f = PDAC$integration_id)
Layers(PDAC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [37]:
table(PDAC$sample_type)
table(PDAC$cancer_type)
table(PDAC$site)
table(PDAC$sample_type_major)
table(PDAC$cancer_subtype)


hepatic_mets       tumour 
        6882         2743 


PDAC 
9625 


   liver pancreas 
    6882     2743 


metastatic tumour    primary tumour 
             6882              2743 


PDAC 
9625 
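
The separate tables above agree in their totals (6882 liver and metastatic, 2743 pancreas and primary), but a cross-tabulation checks the pairing per cell directly. The sketch below uses a toy data frame. The rule that each site maps to exactly one `sample_type_major` is an assumption that fits this dataset and would not hold for every dataset.

```r
# Toy metadata standing in for PDAC@meta.data
meta <- data.frame(
  site = c("liver", "liver", "pancreas", "pancreas", "liver"),
  sample_type_major = c("metastatic tumour", "metastatic tumour",
                        "primary tumour", "primary tumour",
                        "metastatic tumour"),
  stringsAsFactors = FALSE
)

# Rows are sites, columns are sample types; mismatches show up as
# non-zero cells off the expected pairing
xtab <- table(meta$site, meta$sample_type_major)
print(xtab)

# TRUE when every site is paired with exactly one sample_type_major
one_type_per_site <- all(rowSums(xtab > 0) == 1)
print(one_type_per_site)
```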

In [38]:
#record number of cells
table(PDAC$integration_id)
PDAC
PDAC@project.name


  GSE197177_PDAC_Case1_mets GSE197177_PDAC_Case1_tumour 
                       3245                         505 
  GSE197177_PDAC_Case2_mets GSE197177_PDAC_Case2_tumour 
                       1627                        1464 
  GSE197177_PDAC_Case3_mets GSE197177_PDAC_Case3_tumour 
                       2010                         774 

An object of class Seurat 
33538 features across 9625 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 13 layers present: counts.GSE197177_PDAC_Case1_mets, counts.GSE197177_PDAC_Case2_mets, counts.GSE197177_PDAC_Case3_mets, counts.GSE197177_PDAC_Case1_tumour, counts.GSE197177_PDAC_Case2_tumour, counts.GSE197177_PDAC_Case3_tumour, scale.data, data.GSE197177_PDAC_Case1_mets, data.GSE197177_PDAC_Case2_mets, data.GSE197177_PDAC_Case3_mets, data.GSE197177_PDAC_Case1_tumour, data.GSE197177_PDAC_Case2_tumour, data.GSE197177_PDAC_Case3_tumour

In [39]:
#re-export seurat object ready for integration
saveRDS(PDAC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE197177_myeloid_int.RDS")

In [40]:
#remove all objects in R
rm(list = ls())

## GSE214295

In [17]:
PDAC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE214295_myeloid.RDS")

In [18]:
PDAC
PDAC@project.name
head(PDAC@meta.data)

An object of class Seurat 
36601 features across 2710 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 7 layers present: counts.1, counts.2, counts.3, data.1, data.2, data.3, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE214295_PC1_AAAGGATGTTGGATCT-1,GSE214295,2356,1018,tumour,PC,PC1,GSE214295_PC1,11.290323,10,10
GSE214295_PC1_AAAGGTAGTACTGGGA-1,GSE214295,1212,549,tumour,PC,PC1,GSE214295_PC1,3.547855,13,13
GSE214295_PC1_AACAACCTCGTTTACT-1,GSE214295,2796,1075,tumour,PC,PC1,GSE214295_PC1,4.756795,10,10
GSE214295_PC1_AACCACAGTGTTTACG-1,GSE214295,6774,2018,tumour,PC,PC1,GSE214295_PC1,5.3292,6,6
GSE214295_PC1_AACCCAACAATACAGA-1,GSE214295,13540,2291,tumour,PC,PC1,GSE214295_PC1,4.268833,6,6
GSE214295_PC1_AACTTCTCAAGTTGGG-1,GSE214295,7191,2397,tumour,PC,PC1,GSE214295_PC1,7.884856,6,6


In [19]:
table(PDAC$sample_type)
table(PDAC$cancer_type)
table(PDAC$patient_id)
table(PDAC$sample_id)


tumour 
  2710 


  PC 
2710 


 PC1  PC2  PC3 
 308  429 1973 


GSE214295_PC1 GSE214295_PC2 GSE214295_PC3 
          308           429          1973 

In [20]:
#set cancer_subtype metadata
PDAC@meta.data$cancer_subtype <- "PDAC"

#set integration_id metadata
PDAC@meta.data$integration_id <- PDAC@meta.data$sample_id

#set site metadata 
PDAC@meta.data$site <- "pancreas"

#set sample_type_major metadata
PDAC@meta.data$sample_type_major <- "primary tumour"


In [21]:
PDAC
PDAC@project.name
head(PDAC@meta.data)

An object of class Seurat 
36601 features across 2710 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 7 layers present: counts.1, counts.2, counts.3, data.1, data.2, data.3, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,cancer_subtype,integration_id,site,sample_type_major
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE214295_PC1_AAAGGATGTTGGATCT-1,GSE214295,2356,1018,tumour,PC,PC1,GSE214295_PC1,11.290323,10,10,PDAC,GSE214295_PC1,pancreas,primary tumour
GSE214295_PC1_AAAGGTAGTACTGGGA-1,GSE214295,1212,549,tumour,PC,PC1,GSE214295_PC1,3.547855,13,13,PDAC,GSE214295_PC1,pancreas,primary tumour
GSE214295_PC1_AACAACCTCGTTTACT-1,GSE214295,2796,1075,tumour,PC,PC1,GSE214295_PC1,4.756795,10,10,PDAC,GSE214295_PC1,pancreas,primary tumour
GSE214295_PC1_AACCACAGTGTTTACG-1,GSE214295,6774,2018,tumour,PC,PC1,GSE214295_PC1,5.3292,6,6,PDAC,GSE214295_PC1,pancreas,primary tumour
GSE214295_PC1_AACCCAACAATACAGA-1,GSE214295,13540,2291,tumour,PC,PC1,GSE214295_PC1,4.268833,6,6,PDAC,GSE214295_PC1,pancreas,primary tumour
GSE214295_PC1_AACTTCTCAAGTTGGG-1,GSE214295,7191,2397,tumour,PC,PC1,GSE214295_PC1,7.884856,6,6,PDAC,GSE214295_PC1,pancreas,primary tumour


In [23]:
#exclude any samples with <100 cells
table(PDAC$integration_id)
#none to exclude


GSE214295_PC1 GSE214295_PC2 GSE214295_PC3 
          308           429          1973 

In [24]:
#join layers and then split them by integration_id
Layers(PDAC[["RNA"]])
#join layers
PDAC[["RNA"]] <- JoinLayers(PDAC[["RNA"]])
Layers(PDAC[["RNA"]])
#split layers
PDAC[["RNA"]] <- split(PDAC[["RNA"]], f = PDAC$integration_id)
Layers(PDAC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [25]:
#record number of cells
table(PDAC$integration_id)
PDAC
PDAC@project.name


GSE214295_PC1 GSE214295_PC2 GSE214295_PC3 
          308           429          1973 

An object of class Seurat 
36601 features across 2710 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 7 layers present: data.GSE214295_PC1, data.GSE214295_PC2, data.GSE214295_PC3, scale.data, counts.GSE214295_PC1, counts.GSE214295_PC2, counts.GSE214295_PC3
 2 dimensional reductions calculated: pca, umap

In [27]:
#re-export seurat object ready for integration
saveRDS(PDAC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE214295_myeloid_int.RDS")

In [28]:
#remove all objects in R
rm(list = ls())

## GSE231535

In [41]:
PDAC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE231535_myeloid.RDS")

In [42]:
PDAC
PDAC@project.name
head(PDAC@meta.data)

An object of class Seurat 
45068 features across 1832 samples within 1 assay 
Active assay: RNA (45068 features, 2000 variable features)
 5 layers present: counts.1, counts.2, data.1, data.2, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE231535_PDAC1_AAACCTGCAATAACGA-1,GSE231535,3585,1333,tumour,PDAC,PDAC1,GSE231535_PDAC1,3.0404463,3,3
GSE231535_PDAC1_AAACCTGGTCAGTGGA-1,GSE231535,2297,723,tumour,PDAC,PDAC1,GSE231535_PDAC1,14.6277754,3,3
GSE231535_PDAC1_AAACCTGTCCTCATTA-1,GSE231535,8631,2410,tumour,PDAC,PDAC1,GSE231535_PDAC1,1.216545,3,3
GSE231535_PDAC1_AAACGGGAGATCGATA-1,GSE231535,1895,692,tumour,PDAC,PDAC1,GSE231535_PDAC1,3.4828496,3,3
GSE231535_PDAC1_AAAGCAAAGCATCATC-1,GSE231535,2493,743,tumour,PDAC,PDAC1,GSE231535_PDAC1,0.8423586,3,3
GSE231535_PDAC1_AAAGCAAAGCCACGTC-1,GSE231535,3306,1344,tumour,PDAC,PDAC1,GSE231535_PDAC1,3.6600121,3,3


In [43]:
table(PDAC$sample_type)
table(PDAC$cancer_type)
table(PDAC$patient_id)
table(PDAC$sample_id)


tumour 
  1832 


PDAC 
1832 


PDAC1 PDAC2 
  450  1382 


GSE231535_PDAC1 GSE231535_PDAC2 
            450            1382 

In [44]:
#set cancer_subtype metadata
PDAC@meta.data$cancer_subtype <- "PDAC"

#set integration_id metadata
PDAC@meta.data$integration_id <- PDAC@meta.data$sample_id

#set site metadata 
PDAC@meta.data$site <- "pancreas"

#set sample_type_major metadata
PDAC@meta.data$sample_type_major <- "primary tumour"


In [45]:
PDAC
PDAC@project.name
head(PDAC@meta.data)

An object of class Seurat 
45068 features across 1832 samples within 1 assay 
Active assay: RNA (45068 features, 2000 variable features)
 5 layers present: counts.1, counts.2, data.1, data.2, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,cancer_subtype,integration_id,site,sample_type_major
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE231535_PDAC1_AAACCTGCAATAACGA-1,GSE231535,3585,1333,tumour,PDAC,PDAC1,GSE231535_PDAC1,3.0404463,3,3,PDAC,GSE231535_PDAC1,pancreas,primary tumour
GSE231535_PDAC1_AAACCTGGTCAGTGGA-1,GSE231535,2297,723,tumour,PDAC,PDAC1,GSE231535_PDAC1,14.6277754,3,3,PDAC,GSE231535_PDAC1,pancreas,primary tumour
GSE231535_PDAC1_AAACCTGTCCTCATTA-1,GSE231535,8631,2410,tumour,PDAC,PDAC1,GSE231535_PDAC1,1.216545,3,3,PDAC,GSE231535_PDAC1,pancreas,primary tumour
GSE231535_PDAC1_AAACGGGAGATCGATA-1,GSE231535,1895,692,tumour,PDAC,PDAC1,GSE231535_PDAC1,3.4828496,3,3,PDAC,GSE231535_PDAC1,pancreas,primary tumour
GSE231535_PDAC1_AAAGCAAAGCATCATC-1,GSE231535,2493,743,tumour,PDAC,PDAC1,GSE231535_PDAC1,0.8423586,3,3,PDAC,GSE231535_PDAC1,pancreas,primary tumour
GSE231535_PDAC1_AAAGCAAAGCCACGTC-1,GSE231535,3306,1344,tumour,PDAC,PDAC1,GSE231535_PDAC1,3.6600121,3,3,PDAC,GSE231535_PDAC1,pancreas,primary tumour


In [46]:
#exclude any samples with <100 cells
table(PDAC$integration_id)
#none to exclude


GSE231535_PDAC1 GSE231535_PDAC2 
            450            1382 

In [47]:
#join layers and then split them by integration_id
Layers(PDAC[["RNA"]])
#join layers
PDAC[["RNA"]] <- JoinLayers(PDAC[["RNA"]])
Layers(PDAC[["RNA"]])
#split layers
PDAC[["RNA"]] <- split(PDAC[["RNA"]], f = PDAC$integration_id)
Layers(PDAC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [48]:
#record number of cells
table(PDAC$integration_id)
PDAC
PDAC@project.name


GSE231535_PDAC1 GSE231535_PDAC2 
            450            1382 

An object of class Seurat 
45068 features across 1832 samples within 1 assay 
Active assay: RNA (45068 features, 2000 variable features)
 5 layers present: data.GSE231535_PDAC1, data.GSE231535_PDAC2, scale.data, counts.GSE231535_PDAC1, counts.GSE231535_PDAC2
 2 dimensional reductions calculated: pca, umap

In [49]:
#re-export seurat object ready for integration
saveRDS(PDAC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE231535_myeloid_int.RDS")

In [50]:
#remove all objects in R
rm(list = ls())

## GSE183916

In [45]:
CRC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE183916_myeloid.RDS")

In [46]:
CRC
CRC@project.name
head(CRC@meta.data)

An object of class Seurat 
33808 features across 1934 samples within 1 assay 
Active assay: RNA (33808 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,loc,pat_id,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE183916_CRC_prim_ptB_AAACCTGAGTTACGGG-1_1,SeuratProject,6088,3250,primary,B,tumour,CRC primary,B,GSE183916_CRC_primary_ptB,0.5091984,5,5
GSE183916_CRC_prim_ptB_AAACCTGTCCAGATCA-1_1,SeuratProject,9119,4019,primary,B,tumour,CRC primary,B,GSE183916_CRC_primary_ptB,0.2193223,5,5
GSE183916_CRC_prim_ptB_AAACCTGTCGCGGATC-1_1,SeuratProject,720,505,primary,B,tumour,CRC primary,B,GSE183916_CRC_primary_ptB,1.25,5,5
GSE183916_CRC_prim_ptB_AAAGCAACAGTCAGCC-1_1,SeuratProject,1145,801,primary,B,tumour,CRC primary,B,GSE183916_CRC_primary_ptB,0.349345,5,5
GSE183916_CRC_prim_ptB_AACCATGAGGAGCGAG-1_1,SeuratProject,5154,2160,primary,B,tumour,CRC primary,B,GSE183916_CRC_primary_ptB,0.4462553,5,5
GSE183916_CRC_prim_ptB_AACCATGTCGGAGCAA-1_1,SeuratProject,638,514,primary,B,tumour,CRC primary,B,GSE183916_CRC_primary_ptB,2.9780564,5,5


In [47]:
table(CRC$sample_type)
table(CRC$cancer_type)
table(CRC$patient_id)
table(CRC$sample_id)


tumour 
  1934 


   CRC mets CRC primary 
       1468         466 


   B    C    D    E    F 
1096   18   71  390  359 


   GSE183916_CRC_mets_ptB    GSE183916_CRC_mets_ptC    GSE183916_CRC_mets_ptD 
                      630                        18                        71 
   GSE183916_CRC_mets_ptE    GSE183916_CRC_mets_ptF GSE183916_CRC_primary_ptB 
                      390                       359                       466 

In [48]:
table(CRC$cancer_type)


   CRC mets CRC primary 
       1468         466 

In [49]:
#split by cancer_type
CRC_M <- subset(CRC, subset = cancer_type %in% c("CRC mets"))
CRC_T <- subset(CRC, subset = cancer_type %in% c("CRC primary"))

#set cancer_subtype metadata
CRC_M@meta.data$cancer_subtype <- "CRC"
CRC_T@meta.data$cancer_subtype <- "CRC"

#set integration_id metadata
CRC_M@meta.data$integration_id <- CRC_M@meta.data$sample_id
CRC_T@meta.data$integration_id <- CRC_T@meta.data$sample_id

#set sample_type_major metadata
CRC_M@meta.data$sample_type_major <- "metastatic tumour"
CRC_T@meta.data$sample_type_major <- "primary tumour"

#set site metadata 
CRC_M@meta.data$site <- "peritoneum"
CRC_T@meta.data$site <- "colon"

#merge back together 
CRC <- merge(CRC_M, y = c(CRC_T), project = "GSE183916")
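
The split, annotate and merge steps above can also be written as conditional assignments on the metadata table, which avoids the `merge()` call. A minimal sketch in base R, using a toy data frame `meta` as a stand-in for `CRC@meta.data` (the three toy rows and the `is_met` helper are illustrative assumptions, not part of the notebook):

```r
# Toy stand-in for CRC@meta.data (assumption: only the two columns used here)
meta <- data.frame(
  cancer_type = c("CRC mets", "CRC primary", "CRC mets"),
  sample_id   = c("GSE183916_CRC_mets_ptB",
                  "GSE183916_CRC_primary_ptB",
                  "GSE183916_CRC_mets_ptE"),
  stringsAsFactors = FALSE
)

# Logical flag marking metastatic cells (hypothetical helper name)
is_met <- meta$cancer_type == "CRC mets"

# Same values as in the cell above, assigned without subsetting
meta$cancer_subtype    <- "CRC"
meta$integration_id    <- meta$sample_id
meta$sample_type_major <- ifelse(is_met, "metastatic tumour", "primary tumour")
meta$site              <- ifelse(is_met, "peritoneum", "colon")

meta
```

On the real object the same four assignments would target `CRC@meta.data` directly. Whether this is preferable depends on whether the existing reductions and `scale.data` layer need to survive this step.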

In [50]:
CRC
CRC@project.name
head(CRC@meta.data)

An object of class Seurat 
33808 features across 1934 samples within 1 assay 
Active assay: RNA (33808 features, 0 variable features)
 2 layers present: counts, data

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,loc,pat_id,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,cancer_subtype,integration_id,sample_type_major,site
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE183916_CRC_mets_ptB_AAACGGGCAGGGTACA-1_2,SeuratProject,1035,753,metastasis,B,tumour,CRC mets,B,GSE183916_CRC_mets_ptB,0.28985507,5,5,CRC,GSE183916_CRC_mets_ptB,metastatic tumour,peritoneum
GSE183916_CRC_mets_ptB_AAACGGGGTCAGTGGA-1_2,SeuratProject,1433,977,metastasis,B,tumour,CRC mets,B,GSE183916_CRC_mets_ptB,0.76762038,5,5,CRC,GSE183916_CRC_mets_ptB,metastatic tumour,peritoneum
GSE183916_CRC_mets_ptB_AAAGATGAGCGTAGTG-1_2,SeuratProject,511,404,metastasis,B,tumour,CRC mets,B,GSE183916_CRC_mets_ptB,0.78277886,5,5,CRC,GSE183916_CRC_mets_ptB,metastatic tumour,peritoneum
GSE183916_CRC_mets_ptB_AAAGATGCACATAACC-1_2,SeuratProject,1239,843,metastasis,B,tumour,CRC mets,B,GSE183916_CRC_mets_ptB,0.08071025,5,5,CRC,GSE183916_CRC_mets_ptB,metastatic tumour,peritoneum
GSE183916_CRC_mets_ptB_AAAGATGGTCGAGTTT-1_2,SeuratProject,1172,836,metastasis,B,tumour,CRC mets,B,GSE183916_CRC_mets_ptB,0.93856655,5,5,CRC,GSE183916_CRC_mets_ptB,metastatic tumour,peritoneum
GSE183916_CRC_mets_ptB_AAAGCAATCCGTCAAA-1_2,SeuratProject,3323,1427,metastasis,B,tumour,CRC mets,B,GSE183916_CRC_mets_ptB,0.09027987,5,5,CRC,GSE183916_CRC_mets_ptB,metastatic tumour,peritoneum


In [51]:
#exclude any samples with <100 cells
table(CRC$integration_id)
#exclude ptC mets and ptD mets
CRC <- subset(CRC, subset = !(integration_id %in% c("GSE183916_CRC_mets_ptC","GSE183916_CRC_mets_ptD")))
table(CRC$integration_id)


   GSE183916_CRC_mets_ptB    GSE183916_CRC_mets_ptC    GSE183916_CRC_mets_ptD 
                      630                        18                        71 
   GSE183916_CRC_mets_ptE    GSE183916_CRC_mets_ptF GSE183916_CRC_primary_ptB 
                      390                       359                       466 


   GSE183916_CRC_mets_ptB    GSE183916_CRC_mets_ptE    GSE183916_CRC_mets_ptF 
                      630                       390                       359 
GSE183916_CRC_primary_ptB 
                      466 
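
The ids to drop can also be derived from the threshold instead of being typed by hand. A minimal sketch in base R, rebuilding the `integration_id` vector from the counts printed above (the names `min_cells`, `drop_ids` and `keep_ids` are illustrative assumptions):

```r
# Per-sample cell counts copied from the table above
counts <- c(
  GSE183916_CRC_mets_ptB    = 630,
  GSE183916_CRC_mets_ptC    = 18,
  GSE183916_CRC_mets_ptD    = 71,
  GSE183916_CRC_mets_ptE    = 390,
  GSE183916_CRC_mets_ptF    = 359,
  GSE183916_CRC_primary_ptB = 466
)

# Toy stand-in for CRC$integration_id: one entry per cell
integration_id <- rep(names(counts), times = counts)

min_cells <- 100  # threshold used throughout this notebook

# Tabulate cells per id, then split ids by the threshold
n_cells  <- table(integration_id)
drop_ids <- names(n_cells)[n_cells < min_cells]
keep_ids <- names(n_cells)[n_cells >= min_cells]

drop_ids
sum(n_cells[keep_ids])  # cells remaining after the filter

# On the real object (assumes Seurat is loaded) one option would be:
# CRC <- subset(CRC, cells = colnames(CRC)[CRC$integration_id %in% keep_ids])
```

The remaining-cell total is a quick cross-check against the object summary printed when the cell numbers are recorded.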

In [52]:
Layers(CRC[["RNA"]])
CRC[["RNA"]] <- split(CRC[["RNA"]], f = CRC$integration_id)
Layers(CRC[["RNA"]])

“Input is a v3 assay and `split()` only works for v5 assays; converting
• to a v5 assay”


“Assay RNA changing from Assay to Assay5”


In [53]:
#The cell above differs from the other datasets: JoinLayers is skipped because it failed here.
#The RNA assay is a v3 Assay (see the warning above), which has no JoinLayers method:
#Error: Error in UseMethod(generic = "JoinLayers", object = object): no applicable method for 'JoinLayers' applied to an object of class "c('Assay', 'KeyMixin')"

#usual workflow (join layers, then split by integration_id), kept for reference:
#Layers(CRC[["RNA"]])
#join layers
#CRC[["RNA"]] <- JoinLayers(CRC[["RNA"]])
#Layers(CRC[["RNA"]])
#split layers
#CRC[["RNA"]] <- split(CRC[["RNA"]], f = CRC$integration_id)
#Layers(CRC[["RNA"]])


In [54]:
#record number of cells
table(CRC$integration_id)
CRC
CRC@project.name


   GSE183916_CRC_mets_ptB    GSE183916_CRC_mets_ptE    GSE183916_CRC_mets_ptF 
                      630                       390                       359 
GSE183916_CRC_primary_ptB 
                      466 

An object of class Seurat 
33808 features across 1845 samples within 1 assay 
Active assay: RNA (33808 features, 0 variable features)
 8 layers present: counts.GSE183916_CRC_mets_ptB, counts.GSE183916_CRC_mets_ptE, counts.GSE183916_CRC_mets_ptF, counts.GSE183916_CRC_primary_ptB, data.GSE183916_CRC_mets_ptB, data.GSE183916_CRC_mets_ptE, data.GSE183916_CRC_mets_ptF, data.GSE183916_CRC_primary_ptB

In [55]:
#re-export seurat object ready for integration
saveRDS(CRC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE183916_myeloid_int.RDS")

In [56]:
#remove all objects in R
rm(list = ls())

## GSE224090

In [3]:
GLIO <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE224090_myeloid.RDS")

In [4]:
GLIO
GLIO@project.name
head(GLIO@meta.data)

An object of class Seurat 
36601 features across 8983 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 5 layers present: counts.1, counts.2, data.1, data.2, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE224090_GBM_NU02954_AAACCTGAGAGTACCG-1,GSE224090,2350,1233,tumour,GBM,pt-NU02954,GSE224090_GBM_NU02954,1.2340426,0,0
GSE224090_GBM_NU02954_AAACCTGCAGCCTATA-1,GSE224090,5452,1957,tumour,GBM,pt-NU02954,GSE224090_GBM_NU02954,1.6691123,0,0
GSE224090_GBM_NU02954_AAACCTGCAGCGTAAG-1,GSE224090,3364,1633,tumour,GBM,pt-NU02954,GSE224090_GBM_NU02954,1.3376932,0,0
GSE224090_GBM_NU02954_AAACCTGCAGCTATTG-1,GSE224090,14321,3352,tumour,GBM,pt-NU02954,GSE224090_GBM_NU02954,0.8309476,0,0
GSE224090_GBM_NU02954_AAACCTGGTAAATACG-1,GSE224090,3694,1587,tumour,GBM,pt-NU02954,GSE224090_GBM_NU02954,0.378993,0,0
GSE224090_GBM_NU02954_AAACCTGGTGCGAAAC-1,GSE224090,4302,1617,tumour,GBM,pt-NU02954,GSE224090_GBM_NU02954,2.5104603,0,0


In [5]:
table(GLIO$sample_type)
table(GLIO$cancer_type)
table(GLIO$patient_id)
table(GLIO$sample_id)


tumour 
  8983 


 GBM 
8983 


pt-NU02954 pt-NU03014 
      7663       1320 


GSE224090_GBM_NU02954 GSE224090_GBM_NU03014 
                 7663                  1320 

In [6]:
#set cancer_subtype metadata
GLIO@meta.data$cancer_subtype <- "GBM"

#set integration_id metadata
GLIO@meta.data$integration_id <- GLIO@meta.data$sample_id

#set site metadata 
GLIO@meta.data$site <- "brain"

#set sample_type_major metadata
GLIO@meta.data$sample_type_major <- "primary tumour"

In [7]:
GLIO
GLIO@project.name
head(GLIO@meta.data)

An object of class Seurat 
36601 features across 8983 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 5 layers present: counts.1, counts.2, data.1, data.2, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,cancer_subtype,integration_id,site,sample_type_major
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE224090_GBM_NU02954_AAACCTGAGAGTACCG-1,GSE224090,2350,1233,tumour,GBM,pt-NU02954,GSE224090_GBM_NU02954,1.2340426,0,0,GBM,GSE224090_GBM_NU02954,brain,primary tumour
GSE224090_GBM_NU02954_AAACCTGCAGCCTATA-1,GSE224090,5452,1957,tumour,GBM,pt-NU02954,GSE224090_GBM_NU02954,1.6691123,0,0,GBM,GSE224090_GBM_NU02954,brain,primary tumour
GSE224090_GBM_NU02954_AAACCTGCAGCGTAAG-1,GSE224090,3364,1633,tumour,GBM,pt-NU02954,GSE224090_GBM_NU02954,1.3376932,0,0,GBM,GSE224090_GBM_NU02954,brain,primary tumour
GSE224090_GBM_NU02954_AAACCTGCAGCTATTG-1,GSE224090,14321,3352,tumour,GBM,pt-NU02954,GSE224090_GBM_NU02954,0.8309476,0,0,GBM,GSE224090_GBM_NU02954,brain,primary tumour
GSE224090_GBM_NU02954_AAACCTGGTAAATACG-1,GSE224090,3694,1587,tumour,GBM,pt-NU02954,GSE224090_GBM_NU02954,0.378993,0,0,GBM,GSE224090_GBM_NU02954,brain,primary tumour
GSE224090_GBM_NU02954_AAACCTGGTGCGAAAC-1,GSE224090,4302,1617,tumour,GBM,pt-NU02954,GSE224090_GBM_NU02954,2.5104603,0,0,GBM,GSE224090_GBM_NU02954,brain,primary tumour


In [10]:
#exclude any samples with <100 cells
table(GLIO$integration_id)
#none to exclude


GSE224090_GBM_NU02954 GSE224090_GBM_NU03014 
                 7663                  1320 

In [11]:
#join layers and then split them by integration_id
Layers(GLIO[["RNA"]])
#join layers
GLIO[["RNA"]] <- JoinLayers(GLIO[["RNA"]])
Layers(GLIO[["RNA"]])
#split layers
GLIO[["RNA"]] <- split(GLIO[["RNA"]], f = GLIO$integration_id)
Layers(GLIO[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [13]:
#record number of cells
table(GLIO$integration_id)
GLIO
GLIO@project.name


GSE224090_GBM_NU02954 GSE224090_GBM_NU03014 
                 7663                  1320 

An object of class Seurat 
36601 features across 8983 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 5 layers present: data.GSE224090_GBM_NU02954, data.GSE224090_GBM_NU03014, scale.data, counts.GSE224090_GBM_NU02954, counts.GSE224090_GBM_NU03014
 2 dimensional reductions calculated: pca, umap

In [14]:
#re-export seurat object ready for integration
saveRDS(GLIO, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE224090_myeloid_int.RDS")

In [15]:
#remove all objects in R
rm(list = ls())

## GSE235676

In [3]:
GLIO <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE235676_myeloid.RDS")

In [4]:
GLIO
GLIO@project.name
head(GLIO@meta.data)

An object of class Seurat 
33694 features across 23362 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 37 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE235676_GLIO_Pt01_AAACCCACAACGCATT-1,GSE235676,36594,5585,tumour,GBM,Pt01,GSE235676_GBM_Pt01,5.14292,0,0
GSE235676_GLIO_Pt01_AAACCCATCCATTCAT-1,GSE235676,3611,1068,tumour,GBM,Pt01,GSE235676_GBM_Pt01,43.03517,0,0
GSE235676_GLIO_Pt01_AAACCCATCTTACCGC-1,GSE235676,2506,837,tumour,GBM,Pt01,GSE235676_GBM_Pt01,46.967279,0,0
GSE235676_GLIO_Pt01_AAACGAAAGCTCTGTA-1,GSE235676,1564,654,tumour,GBM,Pt01,GSE235676_GBM_Pt01,25.831202,0,0
GSE235676_GLIO_Pt01_AAACGAATCCCGAACG-1,GSE235676,21043,4385,tumour,GBM,Pt01,GSE235676_GBM_Pt01,6.729079,0,0
GSE235676_GLIO_Pt01_AAACGCTTCCACTGGG-1,GSE235676,3093,1448,tumour,GBM,Pt01,GSE235676_GBM_Pt01,1.875202,0,0


In [5]:
table(GLIO$sample_type)
table(GLIO$cancer_type)
table(GLIO$patient_id)
table(GLIO$sample_id)


tumour 
 23362 


  GBM 
23362 


Pt01 Pt02 Pt03 Pt04 Pt05 Pt06 Pt07 Pt08 Pt09 Pt10 Pt11 Pt12 Pt13 Pt14 Pt15 Pt16 
1699  178  510 1300 4619 1232  458 3397  114 1003  582  833 2045 1174  534  470 
Pt17 Pt18 
1415 1799 


GSE235676_GBM_Pt01 GSE235676_GBM_Pt02 GSE235676_GBM_Pt03 GSE235676_GBM_Pt04 
              1699                178                510               1300 
GSE235676_GBM_Pt05 GSE235676_GBM_Pt06 GSE235676_GBM_Pt07 GSE235676_GBM_Pt08 
              4619               1232                458               3397 
GSE235676_GBM_Pt09 GSE235676_GBM_Pt10 GSE235676_GBM_Pt11 GSE235676_GBM_Pt12 
               114               1003                582                833 
GSE235676_GBM_Pt13 GSE235676_GBM_Pt14 GSE235676_GBM_Pt15 GSE235676_GBM_Pt16 
              2045               1174                534                470 
GSE235676_GBM_Pt17 GSE235676_GBM_Pt18 
              1415               1799 

In [7]:
#set cancer_subtype metadata
GLIO@meta.data$cancer_subtype <- "GBM"

#set integration_id metadata
GLIO@meta.data$integration_id <- GLIO@meta.data$sample_id

#set site metadata 
GLIO@meta.data$site <- "brain"

#set sample_type_major metadata
GLIO@meta.data$sample_type_major <- "primary tumour"

In [8]:
GLIO
GLIO@project.name
head(GLIO@meta.data)

An object of class Seurat 
33694 features across 23362 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 37 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,cancer_subtype,integration_id,site,sample_type_major
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE235676_GLIO_Pt01_AAACCCACAACGCATT-1,GSE235676,36594,5585,tumour,GBM,Pt01,GSE235676_GBM_Pt01,5.14292,0,0,GBM,GSE235676_GBM_Pt01,brain,primary tumour
GSE235676_GLIO_Pt01_AAACCCATCCATTCAT-1,GSE235676,3611,1068,tumour,GBM,Pt01,GSE235676_GBM_Pt01,43.03517,0,0,GBM,GSE235676_GBM_Pt01,brain,primary tumour
GSE235676_GLIO_Pt01_AAACCCATCTTACCGC-1,GSE235676,2506,837,tumour,GBM,Pt01,GSE235676_GBM_Pt01,46.967279,0,0,GBM,GSE235676_GBM_Pt01,brain,primary tumour
GSE235676_GLIO_Pt01_AAACGAAAGCTCTGTA-1,GSE235676,1564,654,tumour,GBM,Pt01,GSE235676_GBM_Pt01,25.831202,0,0,GBM,GSE235676_GBM_Pt01,brain,primary tumour
GSE235676_GLIO_Pt01_AAACGAATCCCGAACG-1,GSE235676,21043,4385,tumour,GBM,Pt01,GSE235676_GBM_Pt01,6.729079,0,0,GBM,GSE235676_GBM_Pt01,brain,primary tumour
GSE235676_GLIO_Pt01_AAACGCTTCCACTGGG-1,GSE235676,3093,1448,tumour,GBM,Pt01,GSE235676_GBM_Pt01,1.875202,0,0,GBM,GSE235676_GBM_Pt01,brain,primary tumour


In [9]:
#exclude any samples with <100 cells
table(GLIO$integration_id)
#none to exclude


GSE235676_GBM_Pt01 GSE235676_GBM_Pt02 GSE235676_GBM_Pt03 GSE235676_GBM_Pt04 
              1699                178                510               1300 
GSE235676_GBM_Pt05 GSE235676_GBM_Pt06 GSE235676_GBM_Pt07 GSE235676_GBM_Pt08 
              4619               1232                458               3397 
GSE235676_GBM_Pt09 GSE235676_GBM_Pt10 GSE235676_GBM_Pt11 GSE235676_GBM_Pt12 
               114               1003                582                833 
GSE235676_GBM_Pt13 GSE235676_GBM_Pt14 GSE235676_GBM_Pt15 GSE235676_GBM_Pt16 
              2045               1174                534                470 
GSE235676_GBM_Pt17 GSE235676_GBM_Pt18 
              1415               1799 

In [10]:
#join layers and then split them by integration_id
Layers(GLIO[["RNA"]])
#join layers
GLIO[["RNA"]] <- JoinLayers(GLIO[["RNA"]])
Layers(GLIO[["RNA"]])
#split layers
GLIO[["RNA"]] <- split(GLIO[["RNA"]], f = GLIO$integration_id)
Layers(GLIO[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [11]:
#record number of cells
table(GLIO$integration_id)
GLIO
GLIO@project.name


GSE235676_GBM_Pt01 GSE235676_GBM_Pt02 GSE235676_GBM_Pt03 GSE235676_GBM_Pt04 
              1699                178                510               1300 
GSE235676_GBM_Pt05 GSE235676_GBM_Pt06 GSE235676_GBM_Pt07 GSE235676_GBM_Pt08 
              4619               1232                458               3397 
GSE235676_GBM_Pt09 GSE235676_GBM_Pt10 GSE235676_GBM_Pt11 GSE235676_GBM_Pt12 
               114               1003                582                833 
GSE235676_GBM_Pt13 GSE235676_GBM_Pt14 GSE235676_GBM_Pt15 GSE235676_GBM_Pt16 
              2045               1174                534                470 
GSE235676_GBM_Pt17 GSE235676_GBM_Pt18 
              1415               1799 

An object of class Seurat 
33694 features across 23362 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 37 layers present: data.GSE235676_GBM_Pt01, data.GSE235676_GBM_Pt02, data.GSE235676_GBM_Pt03, data.GSE235676_GBM_Pt04, data.GSE235676_GBM_Pt05, data.GSE235676_GBM_Pt06, data.GSE235676_GBM_Pt07, data.GSE235676_GBM_Pt08, data.GSE235676_GBM_Pt09, data.GSE235676_GBM_Pt10, data.GSE235676_GBM_Pt11, data.GSE235676_GBM_Pt12, data.GSE235676_GBM_Pt13, data.GSE235676_GBM_Pt14, data.GSE235676_GBM_Pt15, data.GSE235676_GBM_Pt16, data.GSE235676_GBM_Pt17, data.GSE235676_GBM_Pt18, scale.data, counts.GSE235676_GBM_Pt01, counts.GSE235676_GBM_Pt02, counts.GSE235676_GBM_Pt03, counts.GSE235676_GBM_Pt04, counts.GSE235676_GBM_Pt05, counts.GSE235676_GBM_Pt06, counts.GSE235676_GBM_Pt07, counts.GSE235676_GBM_Pt08, counts.GSE235676_GBM_Pt09, counts.GSE235676_GBM_Pt10, counts.GSE235676_GBM_Pt11, counts.GSE235676_GBM_Pt12, counts.GSE235676_GBM_Pt13, counts.GSE235676_GBM_Pt14, co

In [12]:
#re-export seurat object ready for integration
saveRDS(GLIO, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE235676_myeloid_int.RDS")

In [13]:
#remove all objects in R
rm(list = ls())

## GSE223063

In [14]:
GLIO <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE223063_myeloid.RDS")

In [15]:
GLIO
GLIO@project.name
head(GLIO@meta.data)

An object of class Seurat 
33538 features across 6298 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 13 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, data.1, data.2, data.3, data.4, data.5, data.6, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE223063_GLIO_HFC1_AAACCCAGTAACCCGC-1,GSE223063,8509,2336,tumour,Glioblastoma,1_HFC,GSE223063_HFC1,19.4970032,5,5
GSE223063_GLIO_HFC1_AAACGAACAGGTTCAT-1,GSE223063,22497,3943,tumour,Glioblastoma,1_HFC,GSE223063_HFC1,7.0542739,2,2
GSE223063_GLIO_HFC1_AAACGAATCCATTGTT-1,GSE223063,5392,1606,tumour,Glioblastoma,1_HFC,GSE223063_HFC1,0.3523739,10,10
GSE223063_GLIO_HFC1_AAACGCTAGTAATTGG-1,GSE223063,21959,4275,tumour,Glioblastoma,1_HFC,GSE223063_HFC1,7.1633499,2,2
GSE223063_GLIO_HFC1_AAAGGGCCAGCTCGGT-1,GSE223063,4470,1560,tumour,Glioblastoma,1_HFC,GSE223063_HFC1,5.704698,5,5
GSE223063_GLIO_HFC1_AAAGGTAAGAATCTAG-1,GSE223063,47645,6125,tumour,Glioblastoma,1_HFC,GSE223063_HFC1,7.8686116,2,2


In [16]:
table(GLIO$sample_type)
table(GLIO$cancer_type)
table(GLIO$patient_id)
table(GLIO$sample_id)


tumour 
  6298 


Glioblastoma 
        6298 


1_HFC 1_LFC 2_HFC 2_LFC 3_HFC 3_LFC 
  462  4087   348    99   472   830 


GSE223063_HFC1 GSE223063_HFC2 GSE223063_HFC3 GSE223063_LFC1 GSE223063_LFC2 
           462            348            472           4087             99 
GSE223063_LFC3 
           830 

In [17]:
#set cancer_subtype metadata
GLIO@meta.data$cancer_subtype <- "GBM"

#set site metadata 
GLIO@meta.data$site <- "brain"

#set sample_type_major metadata
GLIO@meta.data$sample_type_major <- "primary tumour"

In [18]:
table(GLIO$patient_id)


1_HFC 1_LFC 2_HFC 2_LFC 3_HFC 3_LFC 
  462  4087   348    99   472   830 

In [19]:
#subset per patient, pooling the HFC and LFC samples from the same patient
GLIO_1 <- subset(GLIO, subset = patient_id %in% c("1_HFC","1_LFC"))
GLIO_2 <- subset(GLIO, subset = patient_id %in% c("2_HFC","2_LFC"))
GLIO_3 <- subset(GLIO, subset = patient_id %in% c("3_HFC","3_LFC"))

#set integration_id metadata (one id per patient)
#note: the prefix here is "GGSE" (extra G), not "GSE"; the outputs below and the saved object carry these exact ids
GLIO_1@meta.data$integration_id <- "GGSE223063_GLIO_1"
GLIO_2@meta.data$integration_id <- "GGSE223063_GLIO_2"
GLIO_3@meta.data$integration_id <- "GGSE223063_GLIO_3"

#merge back together 
GLIO <- merge(GLIO_1, y = c(GLIO_2, GLIO_3), project = "GSE223063")
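
The per-patient ids can also be derived from `patient_id` with a regular expression, without subsetting and merging. A minimal sketch in base R, using the per-sample counts printed above; the `prefix` value and the variable names are illustrative assumptions, so swap in whatever prefix the object should carry:

```r
# patient_id levels and their cell counts, copied from the table above
patient_id <- c("1_HFC", "1_LFC", "2_HFC", "2_LFC", "3_HFC", "3_LFC")
n_cells    <- c(462, 4087, 348, 99, 472, 830)

# Strip the _HFC / _LFC suffix so both samples map to one patient number
patient_num <- sub("_(HFC|LFC)$", "", patient_id)

# Build one integration_id per patient (prefix is an assumption)
prefix <- "GSE223063_GLIO_"
integration_id <- paste0(prefix, patient_num)

# Pooled cell counts per integration_id
pooled <- tapply(n_cells, integration_id, sum)
pooled

# On the real object the equivalent assignment would be:
# GLIO$integration_id <- paste0(prefix, sub("_(HFC|LFC)$", "", GLIO$patient_id))
```

Pooling matters here because `2_LFC` has only 99 cells on its own, while the pooled totals per patient all clear the 100-cell threshold.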

In [20]:
GLIO
GLIO@project.name
head(GLIO@meta.data)

An object of class Seurat 
33538 features across 6298 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 15 layers present: counts.1.1, counts.4.1, data.1.1, data.4.1, scale.data.1, counts.2.2, counts.5.2, data.2.2, data.5.2, scale.data.2, counts.3.3, counts.6.3, data.3.3, data.6.3, scale.data.3

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,cancer_subtype,site,sample_type_major,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE223063_GLIO_HFC1_AAACCCAGTAACCCGC-1,GSE223063,8509,2336,tumour,Glioblastoma,1_HFC,GSE223063_HFC1,19.4970032,5,5,GBM,brain,primary tumour,GGSE223063_GLIO_1
GSE223063_GLIO_HFC1_AAACGAACAGGTTCAT-1,GSE223063,22497,3943,tumour,Glioblastoma,1_HFC,GSE223063_HFC1,7.0542739,2,2,GBM,brain,primary tumour,GGSE223063_GLIO_1
GSE223063_GLIO_HFC1_AAACGAATCCATTGTT-1,GSE223063,5392,1606,tumour,Glioblastoma,1_HFC,GSE223063_HFC1,0.3523739,10,10,GBM,brain,primary tumour,GGSE223063_GLIO_1
GSE223063_GLIO_HFC1_AAACGCTAGTAATTGG-1,GSE223063,21959,4275,tumour,Glioblastoma,1_HFC,GSE223063_HFC1,7.1633499,2,2,GBM,brain,primary tumour,GGSE223063_GLIO_1
GSE223063_GLIO_HFC1_AAAGGGCCAGCTCGGT-1,GSE223063,4470,1560,tumour,Glioblastoma,1_HFC,GSE223063_HFC1,5.704698,5,5,GBM,brain,primary tumour,GGSE223063_GLIO_1
GSE223063_GLIO_HFC1_AAAGGTAAGAATCTAG-1,GSE223063,47645,6125,tumour,Glioblastoma,1_HFC,GSE223063_HFC1,7.8686116,2,2,GBM,brain,primary tumour,GGSE223063_GLIO_1


In [21]:
#exclude any samples with <100 cells
table(GLIO$integration_id)
#none to exclude


GGSE223063_GLIO_1 GGSE223063_GLIO_2 GGSE223063_GLIO_3 
             4549               447              1302 

In [22]:
#join layers and then split them by integration_id
Layers(GLIO[["RNA"]])
#join layers
GLIO[["RNA"]] <- JoinLayers(GLIO[["RNA"]])
Layers(GLIO[["RNA"]])
#split layers
GLIO[["RNA"]] <- split(GLIO[["RNA"]], f = GLIO$integration_id)
Layers(GLIO[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [23]:
#record number of cells
table(GLIO$integration_id)
GLIO
GLIO@project.name


GGSE223063_GLIO_1 GGSE223063_GLIO_2 GGSE223063_GLIO_3 
             4549               447              1302 

An object of class Seurat 
33538 features across 6298 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 7 layers present: counts.GGSE223063_GLIO_1, counts.GGSE223063_GLIO_2, counts.GGSE223063_GLIO_3, scale.data, data.GGSE223063_GLIO_1, data.GGSE223063_GLIO_2, data.GGSE223063_GLIO_3

In [24]:
#re-export seurat object ready for integration
saveRDS(GLIO, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE223063_myeloid_int.RDS")

In [25]:
#remove all objects in R
rm(list = ls())

## GSE167297

In [57]:
GAC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE167297_myeloid.RDS")

In [58]:
GAC
GAC@project.name
head(GAC@meta.data)

An object of class Seurat 
32738 features across 2369 samples within 1 assay 
Active assay: RNA (32738 features, 2000 variable features)
 29 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE167297_GC_Pt1_Norm_ACAGCCGAGTAGCCGA-1,GSE167297,3291,716,Healthy,Healthy,Pt_1,GSE167297_GC_Pt1_Norm,0.8204193,5,5
GSE167297_GC_Pt1_Norm_ACTTGTTTCTTGCATT-1,GSE167297,704,356,Healthy,Healthy,Pt_1,GSE167297_GC_Pt1_Norm,7.1022727,5,5
GSE167297_GC_Pt1_Norm_CTCGTCAGTCAAAGAT-1,GSE167297,767,367,Healthy,Healthy,Pt_1,GSE167297_GC_Pt1_Norm,33.8983051,5,5
GSE167297_GC_Pt1_Norm_GCTTCCAAGCTACCGC-1,GSE167297,790,324,Healthy,Healthy,Pt_1,GSE167297_GC_Pt1_Norm,10.1265823,5,5
GSE167297_GC_Pt1_Norm_GGAATAAAGAATCTCC-1,GSE167297,2218,851,Healthy,Healthy,Pt_1,GSE167297_GC_Pt1_Norm,11.496844,5,5
GSE167297_GC_Pt1_Norm_GGAGCAATCCGAACGC-1,GSE167297,5094,1111,Healthy,Healthy,Pt_1,GSE167297_GC_Pt1_Norm,5.3788771,5,5


In [59]:
table(GAC$sample_type)
table(GAC$cancer_type)
table(GAC$patient_id)
table(GAC$sample_id)


Healthy  tumour 
     40    2329 


Diffuse-type GC Deep  Diffuse-type GC Sup              Healthy 
                1773                  556                   40 


Pt_1 Pt_2 Pt_3 Pt_4 Pt_5 
 177  457 1360  261  114 


GSE167297_GC_Pt1_Deep GSE167297_GC_Pt1_Norm  GSE167297_GC_Pt1_Sup 
                   58                    10                   109 
GSE167297_GC_Pt2_Deep  GSE167297_GC_Pt2_Sup GSE167297_GC_Pt3_Deep 
                  266                   191                  1231 
GSE167297_GC_Pt3_Norm  GSE167297_GC_Pt3_Sup GSE167297_GC_Pt4_Deep 
                    5                   124                   136 
GSE167297_GC_Pt4_Norm  GSE167297_GC_Pt4_Sup GSE167297_GC_Pt5_Deep 
                   13                   112                    82 
GSE167297_GC_Pt5_Norm  GSE167297_GC_Pt5_Sup 
                   12                    20 

In [60]:
#remove the healthy controls now, as every healthy sample has fewer than 100 cells
GAC <- subset(GAC, subset = !(cancer_type %in% c("Healthy")))

In [61]:
table(GAC$sample_type)
table(GAC$cancer_type)
table(GAC$patient_id)
table(GAC$sample_id)


tumour 
  2329 


Diffuse-type GC Deep  Diffuse-type GC Sup 
                1773                  556 


Pt_1 Pt_2 Pt_3 Pt_4 Pt_5 
 167  457 1355  248  102 


GSE167297_GC_Pt1_Deep  GSE167297_GC_Pt1_Sup GSE167297_GC_Pt2_Deep 
                   58                   109                   266 
 GSE167297_GC_Pt2_Sup GSE167297_GC_Pt3_Deep  GSE167297_GC_Pt3_Sup 
                  191                  1231                   124 
GSE167297_GC_Pt4_Deep  GSE167297_GC_Pt4_Sup GSE167297_GC_Pt5_Deep 
                  136                   112                    82 
 GSE167297_GC_Pt5_Sup 
                   20 

In [62]:
#set cancer_subtype metadata
GAC@meta.data$cancer_subtype <- "GAC"

#set site metadata 
GAC@meta.data$site <- "stomach"

#set sample_type_major metadata
GAC@meta.data$sample_type_major <- "primary tumour"

In [63]:
table(GAC$patient_id)


Pt_1 Pt_2 Pt_3 Pt_4 Pt_5 
 167  457 1355  248  102 

In [64]:
#split into one object per patient_id
GAC_1 <- subset(GAC, subset = patient_id %in% c("Pt_1"))
GAC_2 <- subset(GAC, subset = patient_id %in% c("Pt_2"))
GAC_3 <- subset(GAC, subset = patient_id %in% c("Pt_3"))
GAC_4 <- subset(GAC, subset = patient_id %in% c("Pt_4"))
GAC_5 <- subset(GAC, subset = patient_id %in% c("Pt_5"))

In [65]:
table(GAC_1$sample_id)
table(GAC_2$sample_id)
table(GAC_3$sample_id)
table(GAC_4$sample_id)
table(GAC_5$sample_id)


GSE167297_GC_Pt1_Deep  GSE167297_GC_Pt1_Sup 
                   58                   109 


GSE167297_GC_Pt2_Deep  GSE167297_GC_Pt2_Sup 
                  266                   191 


GSE167297_GC_Pt3_Deep  GSE167297_GC_Pt3_Sup 
                 1231                   124 


GSE167297_GC_Pt4_Deep  GSE167297_GC_Pt4_Sup 
                  136                   112 


GSE167297_GC_Pt5_Deep  GSE167297_GC_Pt5_Sup 
                   82                    20 

In [66]:
#set integration_id metadata
GAC_1@meta.data$integration_id <- "GSE167297_GAC_Pt1"
GAC_2@meta.data$integration_id <- "GSE167297_GAC_Pt2"
GAC_3@meta.data$integration_id <- "GSE167297_GAC_Pt3"
GAC_4@meta.data$integration_id <- "GSE167297_GAC_Pt4"
GAC_5@meta.data$integration_id <- "GSE167297_GAC_Pt5"

#merge back together 
GAC <- merge(GAC_1, y = c(GAC_2, GAC_3, GAC_4, GAC_5), project = "GSE167297")
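
Because `integration_id` in this dataset depends only on `patient_id`, it could also be derived in one vectorised step instead of subsetting and re-merging. The following is a minimal sketch of that alternative, not what the notebook ran; the `meta` data frame is a hypothetical stand-in for `GAC@meta.data`.

```r
# Toy stand-in for GAC@meta.data; only patient_id is needed here.
meta <- data.frame(
  patient_id = c("Pt_1", "Pt_1", "Pt_2", "Pt_5"),
  stringsAsFactors = FALSE
)

# Drop the underscore from patient_id and prepend the dataset/subtype prefix,
# giving labels of the same form as the per-patient assignments above.
meta$integration_id <- paste0(
  "GSE167297_GAC_",
  sub("_", "", meta$patient_id, fixed = TRUE)
)

print(unique(meta$integration_id))
```

On the real object the same expression could presumably be assigned to `GAC$integration_id` directly. Skipping the merge would also leave the original layer names unchanged until the join and split step.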

In [67]:
GAC
GAC@project.name
head(GAC@meta.data)

An object of class Seurat 
32738 features across 2329 samples within 1 assay 
Active assay: RNA (32738 features, 2000 variable features)
 25 layers present: counts.2.1, counts.3.1, data.2.1, data.3.1, scale.data.1, counts.4.2, counts.5.2, data.4.2, data.5.2, scale.data.2, counts.7.3, counts.8.3, data.7.3, data.8.3, scale.data.3, counts.10.4, counts.11.4, data.10.4, data.11.4, scale.data.4, counts.13.5, counts.14.5, data.13.5, data.14.5, scale.data.5

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,cancer_subtype,site,sample_type_major,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE167297_GC_Pt1_Sup_AAACCTGAGCGTTTAC-1,GSE167297,2462,667,tumour,Diffuse-type GC Sup,Pt_1,GSE167297_GC_Pt1_Sup,27.822908,5,5,GAC,stomach,primary tumour,GSE167297_GAC_Pt1
GSE167297_GC_Pt1_Sup_AAAGCAAGTTACGTCA-1,GSE167297,12409,1926,tumour,Diffuse-type GC Sup,Pt_1,GSE167297_GC_Pt1_Sup,3.884278,5,5,GAC,stomach,primary tumour,GSE167297_GAC_Pt1
GSE167297_GC_Pt1_Sup_AACCATGAGCCCAGCT-1,GSE167297,10298,1661,tumour,Diffuse-type GC Sup,Pt_1,GSE167297_GC_Pt1_Sup,3.641484,5,5,GAC,stomach,primary tumour,GSE167297_GAC_Pt1
GSE167297_GC_Pt1_Sup_AACTCCCAGAAGCCCA-1,GSE167297,2160,671,tumour,Diffuse-type GC Sup,Pt_1,GSE167297_GC_Pt1_Sup,32.5,5,5,GAC,stomach,primary tumour,GSE167297_GAC_Pt1
GSE167297_GC_Pt1_Sup_AAGACCTGTAGCTCCG-1,GSE167297,1653,622,tumour,Diffuse-type GC Sup,Pt_1,GSE167297_GC_Pt1_Sup,10.46582,5,5,GAC,stomach,primary tumour,GSE167297_GAC_Pt1
GSE167297_GC_Pt1_Sup_AATCCAGTCTCAAGTG-1,GSE167297,5282,1358,tumour,Diffuse-type GC Sup,Pt_1,GSE167297_GC_Pt1_Sup,3.653919,11,11,GAC,stomach,primary tumour,GSE167297_GAC_Pt1


In [68]:
#exclude any samples with <100 cells
table(GAC$integration_id)
#none to exclude


GSE167297_GAC_Pt1 GSE167297_GAC_Pt2 GSE167297_GAC_Pt3 GSE167297_GAC_Pt4 
              167               457              1355               248 
GSE167297_GAC_Pt5 
              102 

In [69]:
#join layers and then split them by integration_id
Layers(GAC[["RNA"]])
#join layers
GAC[["RNA"]] <- JoinLayers(GAC[["RNA"]])
Layers(GAC[["RNA"]])
#split layers
GAC[["RNA"]] <- split(GAC[["RNA"]], f = GAC$integration_id)
Layers(GAC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
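
The cell above first collapses the per-sample layers with `JoinLayers()` and then re-partitions them with `split()` so that there is one `counts.<id>` and one `data.<id>` layer per `integration_id`. As a conceptual illustration only (plain base R on a toy matrix, not Seurat's implementation), splitting the columns of a counts matrix by a per-cell label looks like this; all names and values are hypothetical.

```r
# Toy counts matrix: 3 genes x 5 cells (hypothetical values).
counts <- matrix(
  1:15, nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:5))
)

# One integration_id per cell, in column order.
integration_id <- c("Pt1", "Pt1", "Pt2", "Pt2", "Pt2")

# Group the cell names by id, then take the matching columns for each group.
cell_groups <- split(colnames(counts), integration_id)
layers <- lapply(cell_groups, function(cells) counts[, cells, drop = FALSE])
names(layers) <- paste0("counts.", names(layers))

print(names(layers))
print(sapply(layers, ncol))
```

Each cell ends up in exactly one sub-matrix, so binding the pieces back together column-wise recovers the original matrix.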



In [70]:
#record number of cells
table(GAC$integration_id)
GAC
GAC@project.name


GSE167297_GAC_Pt1 GSE167297_GAC_Pt2 GSE167297_GAC_Pt3 GSE167297_GAC_Pt4 
              167               457              1355               248 
GSE167297_GAC_Pt5 
              102 

An object of class Seurat 
32738 features across 2329 samples within 1 assay 
Active assay: RNA (32738 features, 2000 variable features)
 11 layers present: counts.GSE167297_GAC_Pt1, counts.GSE167297_GAC_Pt2, counts.GSE167297_GAC_Pt3, counts.GSE167297_GAC_Pt4, counts.GSE167297_GAC_Pt5, scale.data, data.GSE167297_GAC_Pt1, data.GSE167297_GAC_Pt2, data.GSE167297_GAC_Pt3, data.GSE167297_GAC_Pt4, data.GSE167297_GAC_Pt5

In [71]:
#re-export seurat object ready for integration
saveRDS(GAC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE167297_myeloid_int.RDS")

In [72]:
#remove all objects in R
rm(list = ls())

## GSE234129

In [73]:
GAC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE234129_myeloid.RDS")

In [74]:
GAC
GAC@project.name
head(GAC@meta.data)

An object of class Seurat 
27176 features across 1186 samples within 1 assay 
Active assay: RNA (27176 features, 2000 variable features)
 23 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,cell_barcodes,patient,sample,celltype,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE234129_GAC_ACTTGTTCACAGCCCA_N-QJJ-5,GSE234129,880,393,ACTTGTTCACAGCCCA_N-QJJ-5,MDA_Pt1,MDA_Pt1-Ad,TAM_C4,Healthy,Healthy,Pt-1,GSE234129_Healthy_Pt-1,6.590909,3,3
GSE234129_GAC_AGCATACCAACTGCTA_N-QJJ-5,GSE234129,2758,849,AGCATACCAACTGCTA_N-QJJ-5,MDA_Pt1,MDA_Pt1-Ad,TAM_C4,Healthy,Healthy,Pt-1,GSE234129_Healthy_Pt-1,2.066715,3,3
GSE234129_GAC_CTACACCCATTCGACA_N-QJJ-5,GSE234129,807,362,CTACACCCATTCGACA_N-QJJ-5,MDA_Pt1,MDA_Pt1-Ad,TAM_C4,Healthy,Healthy,Pt-1,GSE234129_Healthy_Pt-1,14.745973,3,3
GSE234129_GAC_GTATTCTAGTAGGTGC_N-QJJ-5,GSE234129,1128,454,GTATTCTAGTAGGTGC_N-QJJ-5,MDA_Pt1,MDA_Pt1-Ad,TAM_C4,Healthy,Healthy,Pt-1,GSE234129_Healthy_Pt-1,7.978723,3,3
GSE234129_GAC_AAACCTGAGGATGGAA_Ca-QJJ-5,GSE234129,12689,2845,AAACCTGAGGATGGAA_Ca-QJJ-5,MDA_Pt1,MDA_Pt1-Ca,TAM_C3,tumour,GAC primary,Pt-1,GSE234129_GAC_Pt-1,4.704862,3,3
GSE234129_GAC_AAACCTGGTCGAGTTT_Ca-QJJ-5,GSE234129,5271,2044,AAACCTGGTCGAGTTT_Ca-QJJ-5,MDA_Pt1,MDA_Pt1-Ca,TAM_C0,tumour,GAC primary,Pt-1,GSE234129_GAC_Pt-1,5.236198,3,3


In [75]:
table(GAC$sample_type)
table(GAC$cancer_type)
table(GAC$patient_id)
table(GAC$sample_id)


Healthy  tumour 
     81    1105 


  GAC liver mets GAC ovarian mets      GAC primary          Healthy 
             163              344              598               81 


Pt-1 Pt-2 Pt-3 Pt-4 Pt-5 Pt-9 
 197  165  115   70  445  194 


        GSE234129_GAC_Pt-1         GSE234129_GAC_Pt-2 
                       193                        165 
        GSE234129_GAC_Pt-3         GSE234129_GAC_Pt-4 
                       115                         70 
        GSE234129_GAC_Pt-5         GSE234129_GAC_Pt-9 
                        39                         16 
GSE234129_GAC-Li-mets_Pt-9 GSE234129_GAC-Ov-mets_Pt-5 
                       163                        344 
    GSE234129_Healthy_Pt-1     GSE234129_Healthy_Pt-5 
                         4                         62 
    GSE234129_Healthy_Pt-9 
                        15 

In [76]:
#remove the healthy controls now, as every healthy sample has fewer than 100 cells
GAC <- subset(GAC, subset = !(sample_type %in% c("Healthy")))

In [77]:
table(GAC$sample_type)
table(GAC$cancer_type)
table(GAC$patient_id)
table(GAC$sample_id)


tumour 
  1105 


  GAC liver mets GAC ovarian mets      GAC primary 
             163              344              598 


Pt-1 Pt-2 Pt-3 Pt-4 Pt-5 Pt-9 
 193  165  115   70  383  179 


        GSE234129_GAC_Pt-1         GSE234129_GAC_Pt-2 
                       193                        165 
        GSE234129_GAC_Pt-3         GSE234129_GAC_Pt-4 
                       115                         70 
        GSE234129_GAC_Pt-5         GSE234129_GAC_Pt-9 
                        39                         16 
GSE234129_GAC-Li-mets_Pt-9 GSE234129_GAC-Ov-mets_Pt-5 
                       163                        344 

In [78]:
#split by cancer_type 
GAC_L <- subset(GAC, subset = cancer_type %in% c("GAC liver mets"))
GAC_O <- subset(GAC, subset = cancer_type %in% c("GAC ovarian mets"))
GAC_T <- subset(GAC, subset = cancer_type %in% c("GAC primary"))

#set cancer_subtype metadata
GAC_L@meta.data$cancer_subtype <- "GAC"
GAC_O@meta.data$cancer_subtype <- "GAC"
GAC_T@meta.data$cancer_subtype <- "GAC"

#set site metadata 
GAC_L@meta.data$site <- "liver"
GAC_O@meta.data$site <- "ovary"
GAC_T@meta.data$site <- "stomach"

#set sample_type_major metadata
GAC_L@meta.data$sample_type_major <- "metastatic tumour"
GAC_O@meta.data$sample_type_major <- "metastatic tumour"
GAC_T@meta.data$sample_type_major <- "primary tumour"

#set integration_id metadata
GAC_L@meta.data$integration_id <- GAC_L@meta.data$sample_id
GAC_O@meta.data$integration_id <- GAC_O@meta.data$sample_id
GAC_T@meta.data$integration_id <- GAC_T@meta.data$sample_id

#merge back together 
GAC <- merge(GAC_L, y = c(GAC_O, GAC_T), project = "GSE234129")

In [79]:
GAC
GAC@project.name
head(GAC@meta.data)

An object of class Seurat 
27176 features across 1105 samples within 1 assay 
Active assay: RNA (27176 features, 2000 variable features)
 19 layers present: counts.11.1, data.11.1, scale.data.1, counts.8.2, data.8.2, scale.data.2, counts.2.3, counts.3.3, counts.4.3, counts.5.3, counts.7.3, counts.10.3, data.2.3, data.3.3, data.4.3, data.5.3, data.7.3, data.10.3, scale.data.3

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,cell_barcodes,patient,sample,celltype,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,cancer_subtype,site,sample_type_major,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE234129_GAC_AAACGGGCAAGCGATG_M1-0327,GSE234129,12754,2949,AAACGGGCAAGCGATG_M1-0327,MDA_Pt9,MDA_Pt9-Li,TAM_C0,tumour,GAC liver mets,Pt-9,GSE234129_GAC-Li-mets_Pt-9,4.382939,3,3,GAC,liver,metastatic tumour,GSE234129_GAC-Li-mets_Pt-9
GSE234129_GAC_AAAGCAACAACACGCC_M1-0327,GSE234129,4607,1612,AAAGCAACAACACGCC_M1-0327,MDA_Pt9,MDA_Pt9-Li,TAM_C3,tumour,GAC liver mets,Pt-9,GSE234129_GAC-Li-mets_Pt-9,5.578468,3,3,GAC,liver,metastatic tumour,GSE234129_GAC-Li-mets_Pt-9
GSE234129_GAC_AAATGCCAGGTGTGGT_M1-0327,GSE234129,3070,1403,AAATGCCAGGTGTGGT_M1-0327,MDA_Pt9,MDA_Pt9-Li,TAM_C3,tumour,GAC liver mets,Pt-9,GSE234129_GAC-Li-mets_Pt-9,11.856678,3,3,GAC,liver,metastatic tumour,GSE234129_GAC-Li-mets_Pt-9
GSE234129_GAC_AACCATGCACCACCAG_M1-0327,GSE234129,1458,736,AACCATGCACCACCAG_M1-0327,MDA_Pt9,MDA_Pt9-Li,Classical Mono_C2,tumour,GAC liver mets,Pt-9,GSE234129_GAC-Li-mets_Pt-9,10.836763,3,3,GAC,liver,metastatic tumour,GSE234129_GAC-Li-mets_Pt-9
GSE234129_GAC_AACGTTGGTTCCGTCT_M1-0327,GSE234129,3365,1394,AACGTTGGTTCCGTCT_M1-0327,MDA_Pt9,MDA_Pt9-Li,Classical Mono_C1,tumour,GAC liver mets,Pt-9,GSE234129_GAC-Li-mets_Pt-9,7.726597,3,3,GAC,liver,metastatic tumour,GSE234129_GAC-Li-mets_Pt-9
GSE234129_GAC_AACTCAGGTAAGTAGT_M1-0327,GSE234129,4682,802,AACTCAGGTAAGTAGT_M1-0327,MDA_Pt9,MDA_Pt9-Li,TAM_C4,tumour,GAC liver mets,Pt-9,GSE234129_GAC-Li-mets_Pt-9,13.114054,3,3,GAC,liver,metastatic tumour,GSE234129_GAC-Li-mets_Pt-9


In [81]:
#exclude any samples with <100 cells
table(GAC$integration_id)
#exclude GSE234129_GAC_Pt-4, GSE234129_GAC_Pt-5 and GSE234129_GAC_Pt-9 (70, 39 and 16 cells)
GAC <- subset(GAC, subset = !(integration_id %in% c("GSE234129_GAC_Pt-4","GSE234129_GAC_Pt-5","GSE234129_GAC_Pt-9")))
table(GAC$integration_id)


        GSE234129_GAC_Pt-1         GSE234129_GAC_Pt-2 
                       193                        165 
        GSE234129_GAC_Pt-3         GSE234129_GAC_Pt-4 
                       115                         70 
        GSE234129_GAC_Pt-5         GSE234129_GAC_Pt-9 
                        39                         16 
GSE234129_GAC-Li-mets_Pt-9 GSE234129_GAC-Ov-mets_Pt-5 
                       163                        344 


        GSE234129_GAC_Pt-1         GSE234129_GAC_Pt-2 
                       193                        165 
        GSE234129_GAC_Pt-3 GSE234129_GAC-Li-mets_Pt-9 
                       115                        163 
GSE234129_GAC-Ov-mets_Pt-5 
                       344 
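
The samples to drop were read off the table by eye. A hedged sketch of how the same list could be derived programmatically is shown below, using a per-cell label vector rebuilt from the counts printed above. The names `cell_counts`, `min_cells`, `too_small` and `keep` are illustrative, not from the notebook.

```r
# Per-cell integration_id vector rebuilt from the counts tabulated above.
cell_counts <- c(
  "GSE234129_GAC_Pt-1" = 193, "GSE234129_GAC_Pt-2" = 165,
  "GSE234129_GAC_Pt-3" = 115, "GSE234129_GAC_Pt-4" = 70,
  "GSE234129_GAC_Pt-5" = 39,  "GSE234129_GAC_Pt-9" = 16,
  "GSE234129_GAC-Li-mets_Pt-9" = 163, "GSE234129_GAC-Ov-mets_Pt-5" = 344
)
integration_id <- rep(names(cell_counts), times = cell_counts)

min_cells <- 100  # threshold stated in the notebook's introduction

# Tabulate cells per id and pick out the ids below the threshold.
n_per_id <- table(integration_id)
too_small <- names(n_per_id)[as.vector(n_per_id) < min_cells]

# Logical mask over cells: TRUE for cells whose sample is large enough.
keep <- !(integration_id %in% too_small)

print(too_small)
print(sum(keep))
```

With these counts the mask retains 980 cells, which agrees with the object summary printed after filtering.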

In [82]:
#join layers and then split them by integration_id
Layers(GAC[["RNA"]])
#join layers
GAC[["RNA"]] <- JoinLayers(GAC[["RNA"]])
Layers(GAC[["RNA"]])
#split layers
GAC[["RNA"]] <- split(GAC[["RNA"]], f = GAC$integration_id)
Layers(GAC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [83]:
#record number of cells
table(GAC$integration_id)
GAC
GAC@project.name


        GSE234129_GAC_Pt-1         GSE234129_GAC_Pt-2 
                       193                        165 
        GSE234129_GAC_Pt-3 GSE234129_GAC-Li-mets_Pt-9 
                       115                        163 
GSE234129_GAC-Ov-mets_Pt-5 
                       344 

An object of class Seurat 
27176 features across 980 samples within 1 assay 
Active assay: RNA (27176 features, 2000 variable features)
 11 layers present: counts.GSE234129_GAC-Li-mets_Pt-9, counts.GSE234129_GAC-Ov-mets_Pt-5, counts.GSE234129_GAC_Pt-1, counts.GSE234129_GAC_Pt-2, counts.GSE234129_GAC_Pt-3, scale.data, data.GSE234129_GAC-Li-mets_Pt-9, data.GSE234129_GAC-Ov-mets_Pt-5, data.GSE234129_GAC_Pt-1, data.GSE234129_GAC_Pt-2, data.GSE234129_GAC_Pt-3

In [84]:
#re-export seurat object ready for integration
saveRDS(GAC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE234129_myeloid_int.RDS")

In [85]:
#remove all objects in R
rm(list = ls())

## GSE180661

In [1]:
OC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE180661_myeloid.RDS")

In [2]:
OC
OC@project.name
head(OC@meta.data)

An object of class Seurat 
32223 features across 202223 samples within 1 assay 
Active assay: RNA (32223 features, 0 variable features)
 2 layers present: counts, data

Unnamed: 0_level_0,sample,cell_type,percent.mt,nCount_RNA,nFeature_RNA,umap50_1,umap50_2,cluster_label,cluster_label_sub,cell_type_super,⋯,tumor_site,tumor_supersite,sort_parameters,therapy,surgery,sample_type,cancer_type,sample_id,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<chr>,<dbl>,<dbl>,<int>,<dbl>,<dbl>,<chr>,<chr>,<chr>,⋯,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE180661_HGSOC_SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM_AACACACCAAGACCTT,SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM,Myeloid.cell,11.516333,10255,3341,-6.259316,-6.726644,M2.SELENOP,,Myeloid.super,⋯,Omentum,Omentum,"singlet, live, CD45+",pre-Rx,S1,Omentum,HGSOC,SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM,1,1
GSE180661_HGSOC_SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM_AACAGGGAGTCCCGGT,SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM,Myeloid.cell,6.619271,14488,4002,-6.357782,-7.509463,Clearing.M,,Myeloid.super,⋯,Omentum,Omentum,"singlet, live, CD45+",pre-Rx,S1,Omentum,HGSOC,SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM,1,1
GSE180661_HGSOC_SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM_AACCACAAGACAGCGT,SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM,Myeloid.cell,14.449618,12028,3274,-5.212832,-10.468597,M1.S100A8,,Myeloid.super,⋯,Omentum,Omentum,"singlet, live, CD45+",pre-Rx,S1,Omentum,HGSOC,SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM,1,1
GSE180661_HGSOC_SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM_AACCACAAGTCGAATA,SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM,Myeloid.cell,7.51928,4668,1787,-4.814623,-12.559104,M1.S100A8,,Myeloid.super,⋯,Omentum,Omentum,"singlet, live, CD45+",pre-Rx,S1,Omentum,HGSOC,SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM,5,5
GSE180661_HGSOC_SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM_AACCATGTCCTGTTAT,SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM,Myeloid.cell,3.620654,4861,1329,-5.429079,-8.997123,,,Myeloid.super,⋯,Omentum,Omentum,"singlet, live, CD45+",pre-Rx,S1,Omentum,HGSOC,SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM,1,1
GSE180661_HGSOC_SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM_AACTTCTCATCCAATG,SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM,Myeloid.cell,10.215054,8370,2586,-5.818494,-12.637597,M1.S100A8,,Myeloid.super,⋯,Omentum,Omentum,"singlet, live, CD45+",pre-Rx,S1,Omentum,HGSOC,SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM,5,5


In [3]:
table(OC$sample_type)
table(OC$cancer_type)
table(OC$patient_id)
table(OC$sample_id)


    Adnexa    Ascites      Bowel    Omentum      Other Peritoneum         UQ 
     82132      20853      17962      38303       7106      20966      14901 


 HGSOC 
202223 


SPECTRUM-OV-002 SPECTRUM-OV-003 SPECTRUM-OV-007 SPECTRUM-OV-008 SPECTRUM-OV-009 
            820            8260             454             619           20429 
SPECTRUM-OV-014 SPECTRUM-OV-022 SPECTRUM-OV-024 SPECTRUM-OV-025 SPECTRUM-OV-026 
           2073           11424            5143            5766            7808 
SPECTRUM-OV-031 SPECTRUM-OV-036 SPECTRUM-OV-037 SPECTRUM-OV-041 SPECTRUM-OV-042 
            985            1412            5771            3311            2971 
SPECTRUM-OV-045 SPECTRUM-OV-049 SPECTRUM-OV-050 SPECTRUM-OV-051 SPECTRUM-OV-052 
           5285            2387            7725            3235            7208 
SPECTRUM-OV-053 SPECTRUM-OV-054 SPECTRUM-OV-065 SPECTRUM-OV-067 SPECTRUM-OV-068 
           5198            1381            1730            1212            1313 
SPECTRUM-OV-070 SPECTRUM-OV-071 SPECTRUM-OV-075 SPECTRUM-OV-077 SPECTRUM-OV-080 
           9479            3205            4459            5641            3573 
SPECTRUM-OV-081 SPECTRUM-OV


                              SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM 
                                                                      228 
                                     SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY 
                                                                      592 
                           SPECTRUM-OV-003_S1_UNSORTED_INFRACOLIC_OMENTUM 
                                                                     1115 
                                  SPECTRUM-OV-003_S1_UNSORTED_LEFT_ADNEXA 
                                                                     1963 
                          SPECTRUM-OV-003_S1_UNSORTED_LEFT_UPPER_QUADRANT 
                                                                      925 
                            SPECTRUM-OV-003_S1_UNSORTED_PELVIC_PERITONEUM 
                                                                     2031 
                                 SPECTRUM-OV-003_S1_UNSORTED_RIGHT_ADNEXA 
                        

In [4]:
#set cancer_subtype metadata
OC@meta.data$cancer_subtype <- "HGSOC"

In [5]:
table(OC$sample_type)


    Adnexa    Ascites      Bowel    Omentum      Other Peritoneum         UQ 
     82132      20853      17962      38303       7106      20966      14901 

In [6]:
#note: adnexa samples are the primary tumour; every other site is a metastasis or ascites

#split by sample_type
OC_Ad <- subset(OC, subset = sample_type %in% c("Adnexa"))
OC_As <- subset(OC, subset = sample_type %in% c("Ascites"))
OC_Bo <- subset(OC, subset = sample_type %in% c("Bowel"))
OC_Om <- subset(OC, subset = sample_type %in% c("Omentum"))
OC_Ot <- subset(OC, subset = sample_type %in% c("Other"))
OC_Pe <- subset(OC, subset = sample_type %in% c("Peritoneum"))
OC_Uq <- subset(OC, subset = sample_type %in% c("UQ"))

#set sample_type_major metadata
OC_Ad@meta.data$sample_type_major <- "primary tumour"
OC_As@meta.data$sample_type_major <- "ascites"
OC_Bo@meta.data$sample_type_major <- "metastatic tumour"
OC_Om@meta.data$sample_type_major <- "metastatic tumour"
OC_Ot@meta.data$sample_type_major <- "metastatic tumour"
OC_Pe@meta.data$sample_type_major <- "metastatic tumour"
OC_Uq@meta.data$sample_type_major <- "metastatic tumour"

In [7]:
#set site metadata 
OC_Ad@meta.data$site <- "ovary"
OC_As@meta.data$site <- "ascites fluid"
OC_Bo@meta.data$site <- "bowel"
OC_Om@meta.data$site <- "omentum"
OC_Pe@meta.data$site <- "peritoneum"
OC_Uq@meta.data$site <- "upper abdomen"

In [8]:
#the "Other" category mixes several sites, so list its samples first and then set site for each one individually
table(OC_Ot$sample_id)


                         SPECTRUM-OV-007_S1_CD45P_ANTERIOR_ABDOMINAL_WALL 
                                                                       81 
SPECTRUM-OV-008_S1_CD45P_LEFT_PARARENAL_LYMPH_NODE_WITH_COLONIC_MESENTARY 
                                                                       54 
                           SPECTRUM-OV-031_S1_CD45P_INFRARENAL_LYMPH_NODE 
                                                                      124 
                                  SPECTRUM-OV-045_S1_CD45P_PELVIC_IMPLANT 
                                                                     1268 
                                 SPECTRUM-OV-070_S1_CD45P_HEPATIC_SURFACE 
                                                                     3206 
                                         SPECTRUM-OV-105_S1_CD45P_BLADDER 
                                                                     1257 
                          SPECTRUM-OV-116_S1_CD45P_RIGHT_PARACOLIC_GUTTER 
                        

In [9]:
#remove the pelvic implant sample from the "Other" group, plus the two samples with fewer than 100 cells
OC_Ot <- subset(OC_Ot, subset = !(sample_id %in% c("SPECTRUM-OV-045_S1_CD45P_PELVIC_IMPLANT","SPECTRUM-OV-007_S1_CD45P_ANTERIOR_ABDOMINAL_WALL","SPECTRUM-OV-008_S1_CD45P_LEFT_PARARENAL_LYMPH_NODE_WITH_COLONIC_MESENTARY")))


In [10]:
table(OC_Ot$sample_id)


 SPECTRUM-OV-031_S1_CD45P_INFRARENAL_LYMPH_NODE 
                                            124 
       SPECTRUM-OV-070_S1_CD45P_HEPATIC_SURFACE 
                                           3206 
               SPECTRUM-OV-105_S1_CD45P_BLADDER 
                                           1257 
SPECTRUM-OV-116_S1_CD45P_RIGHT_PARACOLIC_GUTTER 
                                           1116 

In [11]:
#split further
OC_Ot_31 <- subset(OC_Ot, subset = sample_id %in% c("SPECTRUM-OV-031_S1_CD45P_INFRARENAL_LYMPH_NODE"))
OC_Ot_70 <- subset(OC_Ot, subset = sample_id %in% c("SPECTRUM-OV-070_S1_CD45P_HEPATIC_SURFACE"))
OC_Ot_105 <- subset(OC_Ot, subset = sample_id %in% c("SPECTRUM-OV-105_S1_CD45P_BLADDER"))
OC_Ot_116 <- subset(OC_Ot, subset = sample_id %in% c("SPECTRUM-OV-116_S1_CD45P_RIGHT_PARACOLIC_GUTTER"))

OC_Ot_31@meta.data$site <- "lymph node"
OC_Ot_70@meta.data$site <- "liver"
OC_Ot_105@meta.data$site <- "bladder"
OC_Ot_116@meta.data$site <- "peritoneum"

#merge OC_Ots back together 
OC_Ot <- merge(OC_Ot_31, y = c(OC_Ot_70, OC_Ot_105, OC_Ot_116), project = "GSE180661")
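
The site assignment above amounts to a lookup by `sample_type`, with per-sample overrides for the "Other" group. As a sketch of that logic only (a toy data frame standing in for the Seurat metadata, with hypothetical object names), the mapping could be expressed with named vectors:

```r
# Toy metadata; the sample_id values appear in this notebook's tables.
meta <- data.frame(
  sample_type = c("Adnexa", "Omentum", "UQ", "Other", "Other"),
  sample_id = c(
    "SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY",
    "SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM",
    "SPECTRUM-OV-003_S1_UNSORTED_LEFT_UPPER_QUADRANT",
    "SPECTRUM-OV-070_S1_CD45P_HEPATIC_SURFACE",
    "SPECTRUM-OV-105_S1_CD45P_BLADDER"
  ),
  stringsAsFactors = FALSE
)

# Site by sample_type, as assigned in the cells above.
site_by_type <- c(
  Adnexa = "ovary", Ascites = "ascites fluid", Bowel = "bowel",
  Omentum = "omentum", Peritoneum = "peritoneum", UQ = "upper abdomen"
)

# Per-sample sites for the "Other" group, as assigned in the cell above.
site_other <- c(
  "SPECTRUM-OV-031_S1_CD45P_INFRARENAL_LYMPH_NODE" = "lymph node",
  "SPECTRUM-OV-070_S1_CD45P_HEPATIC_SURFACE" = "liver",
  "SPECTRUM-OV-105_S1_CD45P_BLADDER" = "bladder",
  "SPECTRUM-OV-116_S1_CD45P_RIGHT_PARACOLIC_GUTTER" = "peritoneum"
)

# First pass by sample_type ("Other" becomes NA), then fill from sample_id.
meta$site <- unname(site_by_type[meta$sample_type])
is_other <- meta$sample_type == "Other"
meta$site[is_other] <- unname(site_other[meta$sample_id[is_other]])

# Fail loudly if any cell is left without a site.
stopifnot(!anyNA(meta$site))
print(meta[, c("sample_type", "site")])
```

A lookup like this keeps the whole mapping in one place and would flag any unmapped sample as `NA`, at the cost of being less explicit than the per-subset assignments used in the notebook.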


In [12]:
#next, set integration_id: it should be distinct when samples come from different sides, but shared when multiple samples come from the same site
#checked below: whenever a patient has more than one sample at the same site, it is because they are left and right
#therefore can merge everything back together and set integration_id as sample_id, with GSE180661 added in front

In [13]:
table(OC_Ad$sample_id)
table(OC_As$sample_id)
table(OC_Bo$sample_id)
table(OC_Om$sample_id)
table(OC_Pe$sample_id)
table(OC_Uq$sample_id)
table(OC_Ot$sample_id)


         SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY 
                                          592 
      SPECTRUM-OV-003_S1_UNSORTED_LEFT_ADNEXA 
                                         1963 
     SPECTRUM-OV-003_S1_UNSORTED_RIGHT_ADNEXA 
                                         1095 
         SPECTRUM-OV-007_S1_CD45P_LEFT_ADNEXA 
                                           97 
          SPECTRUM-OV-009_S1_CD45P_LEFT_OVARY 
                                         3108 
         SPECTRUM-OV-009_S1_CD45P_RIGHT_OVARY 
                                         3608 
          SPECTRUM-OV-014_S1_CD45P_LEFT_OVARY 
                                          376 
         SPECTRUM-OV-022_S1_CD45P_LEFT_ADNEXA 
                                         4260 
        SPECTRUM-OV-022_S1_CD45P_RIGHT_ADNEXA 
                                         2967 
         SPECTRUM-OV-025_S1_CD45P_RIGHT_OVARY 
                                         1994 
         SPECTRUM-OV-026_S1_CD45P_LEFT_ADNEXA 
            


SPECTRUM-OV-007_S1_CD45P_ASCITES SPECTRUM-OV-009_S1_CD45P_ASCITES 
                              97                               52 
SPECTRUM-OV-014_S1_CD45P_ASCITES SPECTRUM-OV-022_S1_CD45P_ASCITES 
                             317                             2566 
SPECTRUM-OV-024_S1_CD45P_ASCITES SPECTRUM-OV-026_S1_CD45P_ASCITES 
                            1591                              313 
SPECTRUM-OV-037_S1_CD45P_ASCITES SPECTRUM-OV-041_S1_CD45P_ASCITES 
                              65                             1738 
SPECTRUM-OV-042_S1_CD45P_ASCITES SPECTRUM-OV-050_S1_CD45P_ASCITES 
                             994                              670 
SPECTRUM-OV-051_S1_CD45P_ASCITES SPECTRUM-OV-054_S1_CD45P_ASCITES 
                            1537                              207 
SPECTRUM-OV-065_S1_CD45P_ASCITES SPECTRUM-OV-068_S1_CD45P_ASCITES 
                              94                              157 
SPECTRUM-OV-070_S1_CD45P_ASCITES SPECTRUM-OV-071_S1_CD45P_ASC


      SPECTRUM-OV-007_S1_CD45P_BOWEL       SPECTRUM-OV-008_S1_CD45P_BOWEL 
                                  22                                  127 
      SPECTRUM-OV-009_S1_CD45P_BOWEL       SPECTRUM-OV-014_S1_CD45P_BOWEL 
                                1991                                  527 
      SPECTRUM-OV-022_S1_CD45P_BOWEL       SPECTRUM-OV-025_S1_CD45P_BOWEL 
                                1631                                 2226 
      SPECTRUM-OV-026_S1_CD45P_BOWEL SPECTRUM-OV-068_S1_CD45P_LARGE_BOWEL 
                                3232                                  405 
      SPECTRUM-OV-077_S1_CD45P_CECUM       SPECTRUM-OV-082_S1_CD45P_BOWEL 
                                1914                                 3017 
      SPECTRUM-OV-090_S1_CD45P_BOWEL       SPECTRUM-OV-107_S1_CD45P_BOWEL 
                                 299                                 1229 
      SPECTRUM-OV-110_S1_CD45P_BOWEL 
                                1342 


   SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM 
                                           228 
SPECTRUM-OV-003_S1_UNSORTED_INFRACOLIC_OMENTUM 
                                          1115 
   SPECTRUM-OV-007_S1_CD45P_INFRACOLIC_OMENTUM 
                                            77 
   SPECTRUM-OV-008_S1_CD45P_INFRACOLIC_OMENTUM 
                                           438 
   SPECTRUM-OV-009_S1_CD45P_INFRACOLIC_OMENTUM 
                                          3419 
   SPECTRUM-OV-024_S1_CD45P_INFRACOLIC_OMENTUM 
                                           442 
   SPECTRUM-OV-025_S1_CD45P_INFRACOLIC_OMENTUM 
                                          1546 
   SPECTRUM-OV-036_S1_CD45P_INFRACOLIC_OMENTUM 
                                           174 
   SPECTRUM-OV-037_S1_CD45P_INFRACOLIC_OMENTUM 
                                          1340 
   SPECTRUM-OV-041_S1_CD45P_INFRACOLIC_OMENTUM 
                                           595 
   SPECTRUM-OV-042_S1_CD45P_INFRACOLIC_


    SPECTRUM-OV-003_S1_UNSORTED_PELVIC_PERITONEUM 
                                             2031 
       SPECTRUM-OV-007_S1_CD45N_PELVIC_PERITONEUM 
                                               80 
       SPECTRUM-OV-009_S1_CD45P_PELVIC_PERITONEUM 
                                             3191 
       SPECTRUM-OV-014_S1_CD45P_PELVIC_PERITONEUM 
                                              486 
       SPECTRUM-OV-024_S1_CD45P_PELVIC_PERITONEUM 
                                             1071 
       SPECTRUM-OV-036_S1_CD45P_PELVIC_PERITONEUM 
                                               82 
       SPECTRUM-OV-042_S1_CD45P_PELVIC_PERITONEUM 
                                              632 
                  SPECTRUM-OV-070_S1_CD45P_PELVIS 
                                             2902 
SPECTRUM-OV-071_S1_CD45P_PELVIC_PERITONEAL_TUMOUR 
                                             1359 
       SPECTRUM-OV-080_S1_CD45P_PELVIC_PERITONEUM 
                              


 SPECTRUM-OV-003_S1_UNSORTED_LEFT_UPPER_QUADRANT 
                                             925 
SPECTRUM-OV-003_S1_UNSORTED_RIGHT_UPPER_QUADRANT 
                                            1131 
    SPECTRUM-OV-009_S1_CD45P_LEFT_UPPER_QUADRANT 
                                            2707 
   SPECTRUM-OV-009_S1_CD45P_RIGHT_UPPER_QUADRANT 
                                            2353 
   SPECTRUM-OV-014_S1_CD45P_RIGHT_UPPER_QUADRANT 
                                             367 
    SPECTRUM-OV-024_S1_CD45P_LEFT_UPPER_QUADRANT 
                                            2039 
            SPECTRUM-OV-037_S1_CD45P_LUQ_OMENTUM 
                                            1943 
        SPECTRUM-OV-042_S1_CD45P_RIGHT_DIAPHRAGM 
                                             696 
        SPECTRUM-OV-053_S1_CD45P_RIGHT_DIAPHRAGM 
                                            1767 
   SPECTRUM-OV-105_S1_CD45P_RIGHT_UPPER_QUADRANT 
                                             973 


 SPECTRUM-OV-031_S1_CD45P_INFRARENAL_LYMPH_NODE 
                                            124 
       SPECTRUM-OV-070_S1_CD45P_HEPATIC_SURFACE 
                                           3206 
               SPECTRUM-OV-105_S1_CD45P_BLADDER 
                                           1257 
SPECTRUM-OV-116_S1_CD45P_RIGHT_PARACOLIC_GUTTER 
                                           1116 

In [14]:
#merge back together 
OC <- merge(OC_Ad, y = c(OC_As, OC_Bo, OC_Om, OC_Pe, OC_Uq, OC_Ot), project = "GSE180661")

In [15]:
#set integration_id metadata
OC@meta.data$integration_id <- paste0("GSE180661_", OC@meta.data$sample_id)
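
The prefixing step above can be sketched on a toy data frame. This is a minimal illustration in base R, not the notebook's actual metadata: the `meta` data frame and its sample names are hypothetical stand-ins for `OC@meta.data`. It also checks that the mapping between `sample_id` and `integration_id` is one-to-one, which is what this dataset appears to rely on.

```r
# Toy metadata standing in for OC@meta.data (values are illustrative only)
meta <- data.frame(
  sample_id = c("S1_RIGHT_OVARY", "S1_RIGHT_OVARY", "S1_ASCITES"),
  stringsAsFactors = FALSE
)

# Prefix each sample_id with the dataset accession, as in the cell above
meta$integration_id <- paste0("GSE180661_", meta$sample_id)

# Check the mapping is one-to-one: no sample_id or integration_id repeats
# among the distinct (sample_id, integration_id) pairs
pairs <- unique(meta[, c("sample_id", "integration_id")])
one_to_one <- !any(duplicated(pairs$sample_id)) &&
  !any(duplicated(pairs$integration_id))
print(pairs)
print(one_to_one)
```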

In [16]:
table(OC$sample_id)
table(OC$integration_id)


      SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM 
                                              228 
             SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY 
                                              592 
   SPECTRUM-OV-003_S1_UNSORTED_INFRACOLIC_OMENTUM 
                                             1115 
          SPECTRUM-OV-003_S1_UNSORTED_LEFT_ADNEXA 
                                             1963 
  SPECTRUM-OV-003_S1_UNSORTED_LEFT_UPPER_QUADRANT 
                                              925 
    SPECTRUM-OV-003_S1_UNSORTED_PELVIC_PERITONEUM 
                                             2031 
         SPECTRUM-OV-003_S1_UNSORTED_RIGHT_ADNEXA 
                                             1095 
 SPECTRUM-OV-003_S1_UNSORTED_RIGHT_UPPER_QUADRANT 
                                             1131 
       SPECTRUM-OV-007_S1_CD45N_PELVIC_PERITONEUM 
                                               80 
                 SPECTRUM-OV-007_S1_CD45P_ASCITES 
                              


      GSE180661_SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM 
                                                        228 
             GSE180661_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY 
                                                        592 
   GSE180661_SPECTRUM-OV-003_S1_UNSORTED_INFRACOLIC_OMENTUM 
                                                       1115 
          GSE180661_SPECTRUM-OV-003_S1_UNSORTED_LEFT_ADNEXA 
                                                       1963 
  GSE180661_SPECTRUM-OV-003_S1_UNSORTED_LEFT_UPPER_QUADRANT 
                                                        925 
    GSE180661_SPECTRUM-OV-003_S1_UNSORTED_PELVIC_PERITONEUM 
                                                       2031 
         GSE180661_SPECTRUM-OV-003_S1_UNSORTED_RIGHT_ADNEXA 
                                                       1095 
 GSE180661_SPECTRUM-OV-003_S1_UNSORTED_RIGHT_UPPER_QUADRANT 
                                                       1131 
       GSE180661_SPECTR

In [17]:
OC
OC@project.name
head(OC@meta.data)

An object of class Seurat 
32223 features across 200820 samples within 1 assay 
Active assay: RNA (32223 features, 0 variable features)
 2 layers present: counts, data

Unnamed: 0_level_0,sample,cell_type,percent.mt,nCount_RNA,nFeature_RNA,umap50_1,umap50_2,cluster_label,cluster_label_sub,cell_type_super,⋯,surgery,sample_type,cancer_type,sample_id,RNA_snn_res.0.2,seurat_clusters,cancer_subtype,sample_type_major,site,integration_id
Unnamed: 0_level_1,<chr>,<chr>,<dbl>,<dbl>,<int>,<dbl>,<dbl>,<chr>,<chr>,<chr>,⋯,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE180661_HGSOC_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY_AAACCCATCACTTGGA,SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY,Myeloid.cell,9.9969568,19716,4604,-5.581082,-11.423828,Clearing.M,,Myeloid.super,⋯,S1,Adnexa,HGSOC,SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY,5,5,HGSOC,primary tumour,ovary,GSE180661_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY
GSE180661_HGSOC_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY_AAAGGATGTCGAACGA,SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY,Myeloid.cell,0.2450123,2857,1009,-2.26745,-9.315787,M2.MARCO,,Myeloid.super,⋯,S1,Adnexa,HGSOC,SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY,2,2,HGSOC,primary tumour,ovary,GSE180661_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY
GSE180661_HGSOC_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY_AAAGGTAAGAGAACCC,SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY,Myeloid.cell,9.9815695,10309,3237,-6.48956,-7.339522,M2.SELENOP,,Myeloid.super,⋯,S1,Adnexa,HGSOC,SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY,1,1,HGSOC,primary tumour,ovary,GSE180661_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY
GSE180661_HGSOC_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY_AAAGGTATCAAACTGC,SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY,Myeloid.cell,7.2906448,22426,4823,-5.466492,-11.45955,Clearing.M,,Myeloid.super,⋯,S1,Adnexa,HGSOC,SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY,5,5,HGSOC,primary tumour,ovary,GSE180661_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY
GSE180661_HGSOC_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY_AAAGTGAAGAGAGCGG,SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY,Myeloid.cell,6.6791025,19299,4588,-3.561478,-7.727219,Clearing.M,,Myeloid.super,⋯,S1,Adnexa,HGSOC,SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY,2,2,HGSOC,primary tumour,ovary,GSE180661_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY
GSE180661_HGSOC_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY_AAAGTGAAGTATAACG,SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY,Myeloid.cell,7.6674738,8673,2591,-6.008342,-11.60491,M1.S100A8,,Myeloid.super,⋯,S1,Adnexa,HGSOC,SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY,5,5,HGSOC,primary tumour,ovary,GSE180661_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY


In [18]:
#exclude any samples with <100 cells
table(OC$integration_id)
#exclude 10 samples
OC <- subset(OC, subset = !(integration_id %in% c("GSE180661_SPECTRUM-OV-007_S1_CD45N_PELVIC_PERITONEUM","GSE180661_SPECTRUM-OV-007_S1_CD45P_ASCITES","GSE180661_SPECTRUM-OV-007_S1_CD45P_BOWEL","GSE180661_SPECTRUM-OV-007_S1_CD45P_INFRACOLIC_OMENTUM","GSE180661_SPECTRUM-OV-007_S1_CD45P_LEFT_ADNEXA","GSE180661_SPECTRUM-OV-009_S1_CD45P_ASCITES","GSE180661_SPECTRUM-OV-036_S1_CD45P_PELVIC_PERITONEUM","GSE180661_SPECTRUM-OV-037_S1_CD45P_ASCITES","GSE180661_SPECTRUM-OV-065_S1_CD45P_ASCITES","GSE180661_SPECTRUM-OV-081_S1_CD45P_ASCITES")))
table(OC$integration_id)


      GSE180661_SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM 
                                                        228 
             GSE180661_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY 
                                                        592 
   GSE180661_SPECTRUM-OV-003_S1_UNSORTED_INFRACOLIC_OMENTUM 
                                                       1115 
          GSE180661_SPECTRUM-OV-003_S1_UNSORTED_LEFT_ADNEXA 
                                                       1963 
  GSE180661_SPECTRUM-OV-003_S1_UNSORTED_LEFT_UPPER_QUADRANT 
                                                        925 
    GSE180661_SPECTRUM-OV-003_S1_UNSORTED_PELVIC_PERITONEUM 
                                                       2031 
         GSE180661_SPECTRUM-OV-003_S1_UNSORTED_RIGHT_ADNEXA 
                                                       1095 
 GSE180661_SPECTRUM-OV-003_S1_UNSORTED_RIGHT_UPPER_QUADRANT 
                                                       1131 
       GSE180661_SPECTR


      GSE180661_SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM 
                                                        228 
             GSE180661_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY 
                                                        592 
   GSE180661_SPECTRUM-OV-003_S1_UNSORTED_INFRACOLIC_OMENTUM 
                                                       1115 
          GSE180661_SPECTRUM-OV-003_S1_UNSORTED_LEFT_ADNEXA 
                                                       1963 
  GSE180661_SPECTRUM-OV-003_S1_UNSORTED_LEFT_UPPER_QUADRANT 
                                                        925 
    GSE180661_SPECTRUM-OV-003_S1_UNSORTED_PELVIC_PERITONEUM 
                                                       2031 
         GSE180661_SPECTRUM-OV-003_S1_UNSORTED_RIGHT_ADNEXA 
                                                       1095 
 GSE180661_SPECTRUM-OV-003_S1_UNSORTED_RIGHT_UPPER_QUADRANT 
                                                       1131 
                   GSE1
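
The list of samples to exclude is typed by hand above. A minimal base R sketch of deriving it from the counts instead is shown here. The ids and cell counts are hypothetical, and `min_cells` and `low_ids` are names introduced only for this illustration.

```r
# Toy per-cell integration_id vector (hypothetical ids and counts)
integration_id <- c(rep("GSE_A", 150), rep("GSE_B", 80), rep("GSE_C", 100))

min_cells <- 100  # threshold used in this notebook: drop samples with <100 cells

# Count cells per id, then pick the ids below the threshold
cells_per_id <- table(integration_id)
low_ids <- names(cells_per_id)[cells_per_id < min_cells]
print(low_ids)

# Logical mask of cells to keep; sum(keep) is the expected post-filter cell count
keep <- !(integration_id %in% low_ids)
print(sum(keep))
```

On the real object, one option (untested here) would be to pass the kept cell names to `subset()` through its `cells` argument, for example `subset(OC, cells = colnames(OC)[keep])`, and then compare the resulting cell count with `sum(keep)`.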

In [19]:
#layers are already joined here (counts, data), so only split them by integration_id
Layers(OC[["RNA"]])
#split layers
OC[["RNA"]] <- split(OC[["RNA"]], f = OC$integration_id)
Layers(OC[["RNA"]])


“Input is a v3 assay and `split()` only works for v5 assays; converting to a v5 assay”
“Assay RNA changing from Assay to Assay5”
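
As a conceptual analogue of what splitting by `integration_id` does, the sketch below cuts a plain matrix into one block of cells per id. This is base R only and is not how Seurat implements `split()` on an assay; the matrix, gene names, and ids are made up for illustration.

```r
# Toy count matrix: 3 genes x 5 cells (values are made up)
counts <- matrix(1:15, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:5)))

# Hypothetical per-cell integration_id, in the same order as the columns
integration_id <- c("id_A", "id_B", "id_A", "id_B", "id_A")

# Group column indices by id, then cut the matrix into one block per id
cols_by_id <- split(seq_len(ncol(counts)), integration_id)
layers <- lapply(cols_by_id, function(j) counts[, j, drop = FALSE])
names(layers) <- paste0("counts.", names(layers))

print(names(layers))
print(sapply(layers, ncol))
```

Every cell ends up in exactly one block, so the block sizes sum to the original number of cells.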


In [20]:
#record number of cells
table(OC$sample_type_major)

OC
OC@project.name


          ascites metastatic tumour    primary tumour 
            20484             97574             82035 

An object of class Seurat 
32223 features across 200093 samples within 1 assay 
Active assay: RNA (32223 features, 0 variable features)
 268 layers present: counts.GSE180661_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY, counts.GSE180661_SPECTRUM-OV-003_S1_UNSORTED_LEFT_ADNEXA, counts.GSE180661_SPECTRUM-OV-003_S1_UNSORTED_RIGHT_ADNEXA, counts.GSE180661_SPECTRUM-OV-009_S1_CD45P_RIGHT_OVARY, counts.GSE180661_SPECTRUM-OV-009_S1_CD45P_LEFT_OVARY, counts.GSE180661_SPECTRUM-OV-014_S1_CD45P_LEFT_OVARY, counts.GSE180661_SPECTRUM-OV-022_S1_CD45P_RIGHT_ADNEXA, counts.GSE180661_SPECTRUM-OV-022_S1_CD45P_LEFT_ADNEXA, counts.GSE180661_SPECTRUM-OV-025_S1_CD45P_RIGHT_OVARY, counts.GSE180661_SPECTRUM-OV-026_S1_CD45P_RIGHT_OVARY, counts.GSE180661_SPECTRUM-OV-026_S1_CD45P_LEFT_ADNEXA, counts.GSE180661_SPECTRUM-OV-031_S1_CD45P_LEFT_FALLOPIAN_TUBE, counts.GSE180661_SPECTRUM-OV-036_S1_CD45P_RIGHT_ADNEXA, counts.GSE180661_SPECTRUM-OV-036_S1_CD45P_LEFT_ADNEXA, counts.GSE180661_SPECTRUM-OV-037_S1_CD45P_LEFT_OVARY, coun

In [None]:
#the cells below subset by sample_type_major only to count the number of samples per sample type
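
The same per-type sample count could also be read directly from the metadata without creating three subset objects. A minimal base R sketch follows; the `meta` data frame is a hypothetical stand-in for `OC@meta.data`.

```r
# Toy per-cell metadata standing in for OC@meta.data (hypothetical values)
meta <- data.frame(
  integration_id    = c("id_1", "id_1", "id_2", "id_3", "id_3", "id_4"),
  sample_type_major = c("ascites", "ascites", "ascites",
                        "primary tumour", "primary tumour", "metastatic tumour"),
  stringsAsFactors = FALSE
)

# Number of distinct integration_ids within each sample type
samples_per_type <- tapply(meta$integration_id, meta$sample_type_major,
                           function(x) length(unique(x)))
print(samples_per_type)
```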

In [22]:
OC_A <- subset(OC, subset = sample_type_major %in% c("ascites"))

In [23]:
OC_M <- subset(OC, subset = sample_type_major %in% c("metastatic tumour"))

In [24]:
OC_T <- subset(OC, subset = sample_type_major %in% c("primary tumour"))

In [25]:
table(OC_A$integration_id)
table(OC_M$integration_id)
table(OC_T$integration_id)


GSE180661_SPECTRUM-OV-014_S1_CD45P_ASCITES 
                                       317 
GSE180661_SPECTRUM-OV-022_S1_CD45P_ASCITES 
                                      2566 
GSE180661_SPECTRUM-OV-024_S1_CD45P_ASCITES 
                                      1591 
GSE180661_SPECTRUM-OV-026_S1_CD45P_ASCITES 
                                       313 
GSE180661_SPECTRUM-OV-041_S1_CD45P_ASCITES 
                                      1738 
GSE180661_SPECTRUM-OV-042_S1_CD45P_ASCITES 
                                       994 
GSE180661_SPECTRUM-OV-050_S1_CD45P_ASCITES 
                                       670 
GSE180661_SPECTRUM-OV-051_S1_CD45P_ASCITES 
                                      1537 
GSE180661_SPECTRUM-OV-054_S1_CD45P_ASCITES 
                                       207 
GSE180661_SPECTRUM-OV-068_S1_CD45P_ASCITES 
                                       157 
GSE180661_SPECTRUM-OV-070_S1_CD45P_ASCITES 
                                      1030 
GSE180661_SPECTRUM-OV-071_S1_CD


      GSE180661_SPECTRUM-OV-002_S1_CD45P_INFRACOLIC_OMENTUM 
                                                        228 
   GSE180661_SPECTRUM-OV-003_S1_UNSORTED_INFRACOLIC_OMENTUM 
                                                       1115 
  GSE180661_SPECTRUM-OV-003_S1_UNSORTED_LEFT_UPPER_QUADRANT 
                                                        925 
    GSE180661_SPECTRUM-OV-003_S1_UNSORTED_PELVIC_PERITONEUM 
                                                       2031 
 GSE180661_SPECTRUM-OV-003_S1_UNSORTED_RIGHT_UPPER_QUADRANT 
                                                       1131 
                   GSE180661_SPECTRUM-OV-008_S1_CD45P_BOWEL 
                                                        127 
      GSE180661_SPECTRUM-OV-008_S1_CD45P_INFRACOLIC_OMENTUM 
                                                        438 
                   GSE180661_SPECTRUM-OV-009_S1_CD45P_BOWEL 
                                                       1991 
      GSE180661_SPECTRU


         GSE180661_SPECTRUM-OV-002_S1_CD45P_RIGHT_OVARY 
                                                    592 
      GSE180661_SPECTRUM-OV-003_S1_UNSORTED_LEFT_ADNEXA 
                                                   1963 
     GSE180661_SPECTRUM-OV-003_S1_UNSORTED_RIGHT_ADNEXA 
                                                   1095 
          GSE180661_SPECTRUM-OV-009_S1_CD45P_LEFT_OVARY 
                                                   3108 
         GSE180661_SPECTRUM-OV-009_S1_CD45P_RIGHT_OVARY 
                                                   3608 
          GSE180661_SPECTRUM-OV-014_S1_CD45P_LEFT_OVARY 
                                                    376 
         GSE180661_SPECTRUM-OV-022_S1_CD45P_LEFT_ADNEXA 
                                                   4260 
        GSE180661_SPECTRUM-OV-022_S1_CD45P_RIGHT_ADNEXA 
                                                   2967 
         GSE180661_SPECTRUM-OV-025_S1_CD45P_RIGHT_OVARY 
                              

In [21]:
#re-export seurat object ready for integration
saveRDS(OC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE180661_myeloid_int.RDS")

In [26]:
#remove all objects in R
rm(list = ls())

After the above, move the entire Myeloid_Cells_Integrate folder from scratch to RDM:
``` bash
rsync -azvhp /scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/ /QRISdata/Q5935/nikita/scdata/Myeloid_Cells/Myeloid_Cells_Integrate
```

## GSE154826

In [15]:
LUNG <- readRDS("/scratch/user/s4436039/scdata/GSE154826/GSE154826_myeloid.RDS")

In [16]:
LUNG
LUNG@project.name
head(LUNG@meta.data)

An object of class Seurat 
33694 features across 80487 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 103 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, counts.19, counts.20, counts.21, counts.22, counts.23, counts.24, counts.25, counts.26, counts.27, counts.28, counts.29, counts.30, counts.31, counts.32, counts.33, counts.34, counts.35, counts.36, counts.37, counts.38, counts.39, counts.40, counts.41, counts.42, counts.43, counts.44, counts.45, counts.46, counts.47, counts.48, counts.49, counts.50, counts.51, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, data.19, data.20, data.21, data.22, data.23, data.24, data.25, data.26, data.27, data.28, data.29, data.30, data.31, data.32, data.33, d

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,site,cancer_subtype,sample_type_major,integration_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE154826_D1_AAACCTGCAAATCCGT-1,GSE154826,1204,520,Healthy Lung,Healthy,p370,GSE154826_Healthy_p370,lung,,healthy,GSE154826_Healthy_p370,3.820598,3,3
GSE154826_D1_AAACCTGGTTCGAATC-1,GSE154826,16307,3146,Healthy Lung,Healthy,p370,GSE154826_Healthy_p370,lung,,healthy,GSE154826_Healthy_p370,3.581284,2,2
GSE154826_D1_AAACGGGCACCAGATT-1,GSE154826,406,271,Healthy Lung,Healthy,p370,GSE154826_Healthy_p370,lung,,healthy,GSE154826_Healthy_p370,4.926108,3,3
GSE154826_D1_AAACGGGCATCTCGCT-1,GSE154826,19344,3235,Healthy Lung,Healthy,p370,GSE154826_Healthy_p370,lung,,healthy,GSE154826_Healthy_p370,4.414806,2,2
GSE154826_D1_AAAGATGAGATGTGTA-1,GSE154826,1040,511,Healthy Lung,Healthy,p370,GSE154826_Healthy_p370,lung,,healthy,GSE154826_Healthy_p370,3.461538,3,3
GSE154826_D1_AAAGCAATCTGAAAGA-1,GSE154826,1165,611,Healthy Lung,Healthy,p370,GSE154826_Healthy_p370,lung,,healthy,GSE154826_Healthy_p370,9.527897,2,2


In [17]:
table(LUNG$sample_type)
table(LUNG$cancer_type)
table(LUNG$patient_id)
table(LUNG$sample_id)
table(LUNG$site)
table(LUNG$cancer_subtype)
table(LUNG$sample_type_major)
table(LUNG$integration_id)


Healthy Lung       tumour 
       46304        34183 


Healthy    LUAD    LUSC 
  46304   30450    3733 


 p338  p370  p371  p377  p378  p393  p403  p406  p408  p410  p458  p460  p464 
  570   977  4564  4216 12186  2051  4880  2488  2257  3754   212   278    96 
 p514  p522  p532  p558  p564  p569  p570  p571  p572  p578  p581  p714  p725 
 1772  3466  1767   479  7526  1649  3314  1330  3927  3233  3823  2092  2142 
 p729  p800 
 4802   636 


GSE154826_Healthy_p370 GSE154826_Healthy_p371 GSE154826_Healthy_p377 
                   738                   1176                   3335 
GSE154826_Healthy_p378 GSE154826_Healthy_p403 GSE154826_Healthy_p406 
                 10732                   4197                   1000 
GSE154826_Healthy_p408 GSE154826_Healthy_p410 GSE154826_Healthy_p458 
                  1876                   2670                    117 
GSE154826_Healthy_p460 GSE154826_Healthy_p464 GSE154826_Healthy_p514 
                   206                     56                    956 
GSE154826_Healthy_p522 GSE154826_Healthy_p532 GSE154826_Healthy_p564 
                  1279                   1259                   3759 
GSE154826_Healthy_p569 GSE154826_Healthy_p570 GSE154826_Healthy_p572 
                   869                   2246                   2350 
GSE154826_Healthy_p578 GSE154826_Healthy_p581 GSE154826_Healthy_p714 
                  1217                   2208                   1427 
GSE154826_Healthy_p


 lung 
80487 


 LUAD    NA NSCLC 
 3388 46304 30795 


       healthy primary tumour 
         46304          34183 


GSE154826_Healthy_p370 GSE154826_Healthy_p371 GSE154826_Healthy_p377 
                   738                   1176                   3335 
GSE154826_Healthy_p378 GSE154826_Healthy_p403 GSE154826_Healthy_p406 
                 10732                   4197                   1000 
GSE154826_Healthy_p408 GSE154826_Healthy_p410 GSE154826_Healthy_p458 
                  1876                   2670                    117 
GSE154826_Healthy_p460 GSE154826_Healthy_p464 GSE154826_Healthy_p514 
                   206                     56                    956 
GSE154826_Healthy_p522 GSE154826_Healthy_p532 GSE154826_Healthy_p564 
                  1279                   1259                   3759 
GSE154826_Healthy_p569 GSE154826_Healthy_p570 GSE154826_Healthy_p572 
                   869                   2246                   2350 
GSE154826_Healthy_p578 GSE154826_Healthy_p581 GSE154826_Healthy_p714 
                  1217                   2208                   1427 
GSE154826_Healthy_p

In [18]:
#fix inconsistent metadata: some tumour cells are labelled LUAD in cancer_subtype instead of NSCLC
table(LUNG$cancer_subtype)


 LUAD    NA NSCLC 
 3388 46304 30795 

In [19]:
#split by cancer_subtype 
LUAD <- subset(LUNG, subset = cancer_subtype %in% c("LUAD"))
Other <- subset(LUNG, subset = cancer_subtype %in% c("NA","NSCLC"))

#set cancer_subtype metadata
LUAD@meta.data$cancer_subtype <- "NSCLC"

#merge back together 
LUNG <- merge(LUAD, y = c(Other), project = "GSE154826")

In [20]:
table(LUNG$cancer_subtype)


   NA NSCLC 
46304 34183 
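
The relabelling could also be done in place on the metadata column, which avoids splitting and re-merging the object. A minimal base R sketch on a toy data frame is below. The rows are hypothetical, and "NA" is treated as a literal string because the `subset()` call above matches it as one.

```r
# Toy per-cell metadata standing in for LUNG@meta.data (hypothetical rows)
meta <- data.frame(
  cancer_subtype = c("LUAD", "NA", "NSCLC", "LUAD", "NSCLC"),
  stringsAsFactors = FALSE
)

# Relabel LUAD as NSCLC in place; other labels (including the string "NA")
# are left untouched
meta$cancer_subtype[meta$cancer_subtype == "LUAD"] <- "NSCLC"
print(table(meta$cancer_subtype))
```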

In [21]:
LUNG

An object of class Seurat 
33694 features across 80487 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 104 layers present: counts.4.1, counts.40.2, counts.41.2, counts.42.2, counts.43.2, counts.44.2, counts.45.2, counts.46.2, counts.47.2, counts.48.2, counts.49.2, data.4.1, scale.data.1, counts.1.2, counts.2.2, counts.3.2, counts.5.2, counts.6.2, counts.7.2, counts.8.2, counts.9.2, counts.10.2, counts.11.2, counts.12.2, counts.13.2, counts.14.2, counts.15.2, counts.16.2, counts.17.2, counts.18.2, counts.19.2, counts.20.2, counts.21.2, counts.22.2, counts.23.2, counts.24.2, counts.25.2, counts.26.2, counts.27.2, counts.28.2, counts.29.2, counts.30.2, counts.31.2, counts.32.2, counts.33.2, counts.34.2, counts.35.2, counts.36.2, counts.37.2, counts.38.2, counts.39.2, counts.50.2, counts.51.2, data.1.2, data.2.2, data.3.2, data.5.2, data.6.2, data.7.2, data.8.2, data.9.2, data.10.2, data.11.2, data.12.2, data.13.2, data.14.2, data.15.2, data.16.2, data.1

In [22]:
#exclude any samples with <100 cells
table(LUNG$integration_id)
#exclude GSE154826_Healthy_p464, GSE154826_NSCLC_p458, GSE154826_NSCLC_p460, GSE154826_NSCLC_p464
LUNG <- subset(LUNG, subset = !(integration_id %in% c("GSE154826_Healthy_p464","GSE154826_NSCLC_p458","GSE154826_NSCLC_p460","GSE154826_NSCLC_p464")))
table(LUNG$integration_id)


GSE154826_Healthy_p370 GSE154826_Healthy_p371 GSE154826_Healthy_p377 
                   738                   1176                   3335 
GSE154826_Healthy_p378 GSE154826_Healthy_p403 GSE154826_Healthy_p406 
                 10732                   4197                   1000 
GSE154826_Healthy_p408 GSE154826_Healthy_p410 GSE154826_Healthy_p458 
                  1876                   2670                    117 
GSE154826_Healthy_p460 GSE154826_Healthy_p464 GSE154826_Healthy_p514 
                   206                     56                    956 
GSE154826_Healthy_p522 GSE154826_Healthy_p532 GSE154826_Healthy_p564 
                  1279                   1259                   3759 
GSE154826_Healthy_p569 GSE154826_Healthy_p570 GSE154826_Healthy_p572 
                   869                   2246                   2350 
GSE154826_Healthy_p578 GSE154826_Healthy_p581 GSE154826_Healthy_p714 
                  1217                   2208                   1427 
GSE154826_Healthy_p


GSE154826_Healthy_p370 GSE154826_Healthy_p371 GSE154826_Healthy_p377 
                   738                   1176                   3335 
GSE154826_Healthy_p378 GSE154826_Healthy_p403 GSE154826_Healthy_p406 
                 10732                   4197                   1000 
GSE154826_Healthy_p408 GSE154826_Healthy_p410 GSE154826_Healthy_p458 
                  1876                   2670                    117 
GSE154826_Healthy_p460 GSE154826_Healthy_p514 GSE154826_Healthy_p522 
                   206                    956                   1279 
GSE154826_Healthy_p532 GSE154826_Healthy_p564 GSE154826_Healthy_p569 
                  1259                   3759                    869 
GSE154826_Healthy_p570 GSE154826_Healthy_p572 GSE154826_Healthy_p578 
                  2246                   2350                   1217 
GSE154826_Healthy_p581 GSE154826_Healthy_p714 GSE154826_Healthy_p729 
                  2208                   1427                   2631 
  GSE154826_NSCLC_p

In [23]:
#join layers and then split them by integration_id
Layers(LUNG[["RNA"]])
#join layers
LUNG[["RNA"]] <- JoinLayers(LUNG[["RNA"]])
Layers(LUNG[["RNA"]])
#split layers
LUNG[["RNA"]] <- split(LUNG[["RNA"]], f = LUNG$integration_id)
Layers(LUNG[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
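
A quick arithmetic check on the layer count can help confirm the split behaved as intended. The sketch below assumes each split layer (`counts`, `data`) yields one layer per `integration_id` and that `scale.data` stays as a single layer, as the message above suggests. The helper names are hypothetical.

```r
# Expected layer count after split(): each split layer yields one layer per id,
# and unsplit layers (e.g. scale.data) are kept as single layers.
expected_layers <- function(n_ids, n_split = 2, n_unsplit = 1) {
  n_ids * n_split + n_unsplit
}

# Inverse: number of ids implied by an observed layer count
implied_ids <- function(n_layers, n_split = 2, n_unsplit = 1) {
  (n_layers - n_unsplit) / n_split
}

print(expected_layers(46))
print(implied_ids(93))
```

Under that assumption, the 93 layers reported for this object would correspond to 46 integration ids, which can be compared with `length(unique(LUNG$integration_id))`.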



In [26]:
#record number of cells
table(LUNG$sample_type_major)
table(LUNG$integration_id)
table(LUNG$integration_id, LUNG$sample_type_major)
LUNG
LUNG@project.name


       healthy primary tumour 
         46248          33976 


GSE154826_Healthy_p370 GSE154826_Healthy_p371 GSE154826_Healthy_p377 
                   738                   1176                   3335 
GSE154826_Healthy_p378 GSE154826_Healthy_p403 GSE154826_Healthy_p406 
                 10732                   4197                   1000 
GSE154826_Healthy_p408 GSE154826_Healthy_p410 GSE154826_Healthy_p458 
                  1876                   2670                    117 
GSE154826_Healthy_p460 GSE154826_Healthy_p514 GSE154826_Healthy_p522 
                   206                    956                   1279 
GSE154826_Healthy_p532 GSE154826_Healthy_p564 GSE154826_Healthy_p569 
                  1259                   3759                    869 
GSE154826_Healthy_p570 GSE154826_Healthy_p572 GSE154826_Healthy_p578 
                  2246                   2350                   1217 
GSE154826_Healthy_p581 GSE154826_Healthy_p714 GSE154826_Healthy_p729 
                  2208                   1427                   2631 
  GSE154826_NSCLC_p

                        
                         healthy primary tumour
  GSE154826_Healthy_p370     738              0
  GSE154826_Healthy_p371    1176              0
  GSE154826_Healthy_p377    3335              0
  GSE154826_Healthy_p378   10732              0
  GSE154826_Healthy_p403    4197              0
  GSE154826_Healthy_p406    1000              0
  GSE154826_Healthy_p408    1876              0
  GSE154826_Healthy_p410    2670              0
  GSE154826_Healthy_p458     117              0
  GSE154826_Healthy_p460     206              0
  GSE154826_Healthy_p514     956              0
  GSE154826_Healthy_p522    1279              0
  GSE154826_Healthy_p532    1259              0
  GSE154826_Healthy_p564    3759              0
  GSE154826_Healthy_p569     869              0
  GSE154826_Healthy_p570    2246              0
  GSE154826_Healthy_p572    2350              0
  GSE154826_Healthy_p578    1217              0
  GSE154826_Healthy_p581    2208              0
  GSE154826_Hea

An object of class Seurat 
33694 features across 80224 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 93 layers present: counts.GSE154826_NSCLC_p371, counts.GSE154826_Healthy_p370, counts.GSE154826_NSCLC_p370, counts.GSE154826_Healthy_p371, counts.GSE154826_Healthy_p377, counts.GSE154826_NSCLC_p377, counts.GSE154826_Healthy_p378, counts.GSE154826_NSCLC_p378, counts.GSE154826_NSCLC_p393, counts.GSE154826_Healthy_p403, counts.GSE154826_NSCLC_p403, counts.GSE154826_Healthy_p406, counts.GSE154826_NSCLC_p406, counts.GSE154826_Healthy_p408, counts.GSE154826_NSCLC_p408, counts.GSE154826_Healthy_p410, counts.GSE154826_NSCLC_p410, counts.GSE154826_Healthy_p458, counts.GSE154826_Healthy_p460, counts.GSE154826_Healthy_p514, counts.GSE154826_NSCLC_p514, counts.GSE154826_Healthy_p522, counts.GSE154826_NSCLC_p522, counts.GSE154826_Healthy_p532, counts.GSE154826_NSCLC_p532, counts.GSE154826_NSCLC_p558, counts.GSE154826_Healthy_p564, counts.GSE154826_NSCLC_p564, co
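
The cross-tabulation above can be turned into an explicit sanity check that every `integration_id` has cells in exactly one `sample_type_major` column. A minimal base R sketch with hypothetical ids:

```r
# Toy per-cell metadata (hypothetical ids)
integration_id    <- c("H_p1", "H_p1", "T_p1", "T_p2", "H_p2")
sample_type_major <- c("healthy", "healthy", "primary tumour",
                       "primary tumour", "healthy")

# Cross-tabulate ids against sample types, as in the cell above
xtab <- table(integration_id, sample_type_major)

# Each id should have cells in exactly one sample-type column
types_per_id <- rowSums(xtab > 0)
print(types_per_id)
print(all(types_per_id == 1))
```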

In [27]:
#re-export seurat object ready for integration
saveRDS(LUNG, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE154826_myeloid_int.RDS")

In [28]:
#remove all objects in R
rm(list = ls())